Abstract

A GIS-based study was carried out to map landslide susceptibility using two bivariate statistical models, frequency ratio (FR) and Shannon entropy (SE). A total of 280 landslides were identified and randomly divided into a training dataset (70%) and a validation dataset (30%). Eleven landslide conditioning factors (slope, elevation, aspect, curvature, topographic wetness index, normalized difference vegetation index, distance from road, distance from river, distance from faults, land use, and rainfall) were integrated with the training landslides to determine the weights of each conditioning factor and factor class under both models. The landslide susceptibility maps were produced by overlaying the weights of all conditioning factors using the Raster Calculator of the Spatial Analyst tool in ArcGIS 10.4. The final maps for both the FR and SE models were reclassified into very low, low, moderate, high, and very high susceptibility classes. The susceptibility maps were validated using the area under the curve (AUC). The success rates of the FR and SE models were 0.761 and 0.822, and the prediction rates were 0.753 and 0.826, respectively.

1. Introduction

Landslides are among the natural hazards that cause many casualties and substantial property losses all over the world [1, 2]. Natural hazards such as landslides, floods, earthquakes, and droughts cannot be avoided completely, but their processes and consequences can be mitigated [3, 4]. Landslides are more widespread than any other geological event and can occur anywhere in the world. They occur when large masses of soil, rock, or debris move down a slope due to a natural phenomenon or human activity [5, 6].

In Ethiopia, landslides, mostly manifested as rock falls, earth slides, debris flows, and mudflows, are among the major geo-hazards, especially in the steep and hilly areas of the highlands above 1500 m altitude [7, 8]. According to M. Meten et al. [9], from 1960 to 2010 about 388 people were reported dead and 24 injured, and a great deal of agricultural land, houses, and infrastructure was affected. The occurrence of landslides is an extremely complex phenomenon that depends on various factors such as geologic structure, lithological association, topography, rainfall, earthquakes, and human activity [10]. One of the most widely used approaches to reducing landslide damage is preparing a landslide susceptibility map using suitable models and effective conditioning factors [11, 12]. Over the last decades, many studies have contributed to landslide susceptibility mapping using qualitative and quantitative methods. These methods include the frequency ratio (FR) model [2, 4, 13–18], the weights of evidence (WOE) model [12, 25–29], and the Shannon entropy (SE) model [11, 30–33]. A combination of FR and SE has also been applied for landslide susceptibility mapping [19–24], as have combinations of the bivariate FR and WOE models [34], the frequency ratio and information value models [1, 10, 35], and the bivariate frequency ratio and multivariate logistic regression models [36–40]. The use of these methods has grown with the development of GIS techniques. GIS platforms help in the calculation and visualization of the cumulative effects of conditioning factors on landslides.

In this study, we used the Shannon entropy (SE) and frequency ratio (FR) models to develop landslide susceptibility maps of Dejen district, Ethiopia. Dejen district is one of the most landslide-prone districts in the northwestern part of Ethiopia and includes the Abay Gorge along the main highway from Addis Ababa to Bahir Dar. The Abay Gorge (Dejen to Goha Tsion) road is repeatedly damaged and interrupted by landslides, especially in the summer season. The results of this study can therefore help decision makers delineate the landslide-prone areas in the study area.

2. Materials and Methods

2.1. Study Area

Dejen district is located in the northwestern part of Ethiopia and covers an area of 557.48 km². In the UTM coordinate system, the Wereda lies approximately between 395000 m E and 425000 m E and between 1110000 m N and 1140000 m N, as shown in Figure 1. Topographically, the altitude ranges from 991 m to 2559 m and the slope angle varies from 0 to 66 degrees. In terms of land use, most of the Wereda is covered by scrub/shrub and agricultural land. The study area receives a high amount of rainfall during the summer season. The average recorded annual precipitation of the area was 1070 mm. The geology of the study area comprises eight distinct units in a stepwise succession. These formations date from the Palaeozoic and Mesozoic eras and are composed of sandstone, limestone, gypsum, and shale [22].

2.2. Data Source and Methodology

This research used both primary and secondary data to achieve its main objective. The primary data were collected through field survey and observation, and the secondary data were acquired from governmental institutions, journals, the internet, and other documents. The main data used were Sentinel-2 images and the ASTER GDEM of the area with a spatial resolution of 30 m, Google Earth imagery, and a topographical map of the area. The land use and NDVI layers were derived from the Sentinel-2 images, and the ASTER GDEM was used to create the slope, elevation, aspect, curvature, TWI, and river network layers and their extents through spatial analysis tools. The study also used average annual rainfall from meteorological data and a geological map of the study area. Various thematic maps were prepared by digitizing Google Earth imagery and the topographical and geological maps of the area: the main road networks were digitized from the topographical map, the fault layer was digitized from the geological map, and the landslide datasets were digitized from Google Earth imagery of the study area, as shown in Table 1. All the data layers were constructed and combined in ArcGIS 10.4, which was applied throughout the whole process. The FR and SE models were then used to generate the landslide susceptibility maps, and the AUC method was used to assess and validate them, as shown in Figure 2.

2.3. Landslide Inventory Map

Landslide inventory mapping is the systematic mapping of existing landslides in a region using various techniques such as field survey, aerial photograph or Google Earth imagery interpretation, satellite image interpretation, literature searches for historical landslide records, technical and scientific reports, governmental reports, and interviews with experts [41, 42]. In this research, the landslide inventory map, which contains a total of 280 individual landslide locations, was created from Google Earth imagery digitized into points using ArcGIS 10.4 and from field visits. Although there is no specific rule for allocating landslide occurrences to training and validation datasets [43], studies commonly use 70% of the landslide events as training data for building the susceptibility model and the remaining 30% for validating the output model [11, 14, 44]. In this study, 196 (70%) of the landslides were used to train the landslide susceptibility models and the remaining 84 (30%) were used for model validation, as shown in Figure 3.
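
The random 70/30 allocation described above can be sketched in a few lines of Python. This is only an illustration, not the ArcGIS workflow used in the study: the function name `split_inventory`, the use of integer landslide IDs, and the fixed random seed are assumptions introduced here so that the split is reproducible.

```python
import random


def split_inventory(landslide_ids, train_fraction=0.7, seed=42):
    """Randomly split landslide IDs into training and validation subsets."""
    ids = list(landslide_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 280 inventory points, identified here by hypothetical IDs 0..279
train_ids, valid_ids = split_inventory(range(280))
print(len(train_ids), len(valid_ids))
```

With 280 points and a 0.7 training fraction, the two subsets contain 196 and 84 landslides, matching the counts reported for this study.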

2.4. Landslide Conditioning Factors

Identifying landslide conditioning factors is a very complex task because there is no standard rule for selecting which factors to use; the choice depends on the nature of the area and data availability [45]. In this study, eleven conditioning factors were selected based on the literature, effectiveness, availability of data, and relevance to landslide occurrence [23]. These conditioning factors are slope, elevation, aspect, curvature, topographic wetness index, normalized difference vegetation index, distance from road, distance from river, distance from faults, land use, and rainfall. All the selected conditioning factors were used to perform the landslide susceptibility mapping. Each factor was converted to raster format with a spatial resolution of 30 × 30 m and classified using the Jenks natural breaks method in ArcGIS, as shown in Figure 4.

In landslide susceptibility studies, slope is considered one of the major conditioning factors of slope failure [21, 46]. Given its importance to landslide occurrence, the slope of the study area was classified into seven classes in degrees. As the slope angle increases, the possibility of landslide occurrence increases [19, 47, 48]. Elevation is an important conditioning factor in landslide susceptibility mapping, and it also affects environmental conditions on slopes such as human activity, vegetation, soil moisture, and climate [49, 50]. Curvature plays an important role in surface runoff and ground infiltration, thus affecting surface erosion and the groundwater condition of the region [17]. The curvature map of the study area was classified into concave (negative), convex (positive), and flat (zero) surfaces. For curvature, the more negative the value, the higher the probability of landslide occurrence [29]. Aspect represents the direction that a slope faces [49]. Slope aspect affects erosion, surface evaporation, desertification, solar heating, and surface weathering, thus affecting the occurrence of landslides [46, 51]. Topographic wetness index (TWI) is a widely used terrain attribute and one of the important factors responsible for landslides; it quantitatively describes the control of terrain on the spatial distribution of soil moisture. The TWI conditioning factor was obtained from the DEM with 30 m spatial resolution by the following equation:

$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right),$$

where $A_s$ is the specific catchment area (m²/m) and $\beta$ is the slope angle in degrees [52]. TWI is used to measure the topographic control of hydrological processes [53]. Rainfall is considered one of the landslide conditioning factors. The rainfall map was prepared from station locations in the study area through IDW interpolation of annual average precipitation (1990–2021).
Distance to road is one of the most effective factors for landslide occurrence in hilly areas [1]. Road construction near hillsides may change the natural conditions of an area. Distance to river networks also plays an important role in landslide occurrence because of the proximity to surface water. The NDVI conditioning factor was obtained from Sentinel-2 satellite imagery with 30 m spatial resolution by the following equation:

$$\mathrm{NDVI} = \frac{\mathrm{IR} - \mathrm{R}}{\mathrm{IR} + \mathrm{R}},$$

where IR is the infrared band and R is the red band of the electromagnetic spectrum. NDVI values range between −1.0 and 1.0: negative values are mainly generated by clouds, water, and snow; values near zero are mainly generated by rock and bare soil; and positive values indicate that the ground is covered by vegetation. Land use is an important conditioning factor that affects the occurrence of landslides. The land use map was derived from Sentinel-2 satellite imagery using a supervised classification technique in ArcGIS and was classified into six classes. The study area is predominantly covered by cropland and scrub. Distance to faults is considered a highly influential landslide conditioning factor in many studies [31, 37, 54, 55]. The strength of rocks decreases with the number of joints, which increases with proximity to faults [12].
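
As a per-pixel illustration of the two index calculations described in this section, the following sketch assumes the standard definitions TWI = ln(As / tan β) and NDVI = (IR − R) / (IR + R), which match the symbols defined in the text. The function names, the small slope floor for flat cells, and the zero returned for an empty NDVI denominator are assumptions made here, not details reported by the study.

```python
import math


def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index for one cell: ln(As / tan(beta))."""
    # Flat cells give tan(beta) = 0; the floor value is an assumption of this sketch.
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))


def ndvi(infrared, red):
    """Normalized difference vegetation index for one cell."""
    total = infrared + red
    if total == 0:
        return 0.0  # assumed convention for cells with no signal in either band
    return (infrared - red) / total


print(round(twi(100.0, 45.0), 3))  # As = 100 m2/m on a 45 degree slope
print(round(ndvi(0.5, 0.1), 3))    # reflectances typical of vegetated ground
```

In a raster workflow the same expressions would be applied to whole arrays rather than single cells.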

2.5. Landslide Susceptibility Modeling
2.5.1. Frequency Ratio (FR) Model

Frequency ratio is one of the most widely adopted and popular methods for landslide susceptibility assessment [14, 16]. FR is one of the most cited bivariate statistical analysis methods in natural hazard studies, such as flood, landslide, and drought hazard studies [56]. The frequency ratio is the ratio of the area where landslides occurred to the total study area, and it is also the ratio of the probability of landslide occurrence to that of non-occurrence for a given attribute [57, 58]. Generally, a greater ratio indicates a stronger relationship between a conditioning factor and landslides, and vice versa. A value of 1 is the average value for landslides occurring in the total area. An FR value greater than 1 indicates a high probability of landslide occurrence, and a value less than 1 indicates a low probability of landslide occurrence. The landslide susceptibility map (LSM) can be calculated by summing the FR values of all the factors considered:

$$\mathrm{LSM} = \sum_{j=1}^{n} \mathrm{FR}_j,$$

where LSM is the landslide susceptibility map, FR is the frequency ratio of each factor type or class, and n is the number of factors. FR was applied and the weights were assigned to each class of each conditioning factor. The FR can be obtained by the following equation:

$$\mathrm{FR} = \frac{N_{\mathrm{pix}}(SX_i) \Big/ \sum_{i=1}^{m} N_{\mathrm{pix}}(SX_i)}{N_{\mathrm{pix}}(X_j) \Big/ \sum_{j=1}^{n} N_{\mathrm{pix}}(X_j)},$$

where $N_{\mathrm{pix}}(SX_i)$ is the number of landslide pixels in class i of factor X; $N_{\mathrm{pix}}(X_j)$ is the total number of pixels within factor $X_j$; m is the number of classes in factor $X_i$; and n is the total number of factors in the study area [58].
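
The frequency ratio described above, the share of landslide pixels in a class divided by the share of all pixels in that class, can be illustrated with a short sketch. The pixel counts below are hypothetical and chosen only so that the arithmetic is easy to follow; the function name `frequency_ratio` is likewise an assumption of this example.

```python
def frequency_ratio(landslide_pixels, class_pixels):
    """Return the FR of each class of one conditioning factor.

    landslide_pixels[i]: landslide pixels falling in class i
    class_pixels[i]:     all pixels belonging to class i
    """
    total_landslide = sum(landslide_pixels)
    total_area = sum(class_pixels)
    ratios = []
    for ls, px in zip(landslide_pixels, class_pixels):
        landslide_share = ls / total_landslide  # landslide occurrence ratio
        area_share = px / total_area            # area ratio
        ratios.append(landslide_share / area_share)
    return ratios


# Hypothetical three-class factor (not data from this study)
fr = frequency_ratio(landslide_pixels=[10, 30, 60],
                     class_pixels=[5000, 3000, 2000])
print([round(v, 3) for v in fr])
```

In this toy case the first class holds 10% of the landslide pixels but 50% of the area (FR = 0.2), the second class is average (FR = 1.0), and the third holds 60% of the landslide pixels on 20% of the area (FR = 3.0).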

2.5.2. Shannon Entropy (SE) Model

The second model used for LSM in this study is the bivariate Shannon entropy model. Shannon's entropy model is an improvement on the frequency ratio model [59]. Shannon's entropy measures the instability, disturbance, or uncertainty of a system [20]. It provides a way to identify the main factors among the effective factors by assigning an objective weight to each factor in the index system. The following equations were used to calculate the information coefficient:

$$E_{ij} = \frac{\mathrm{FR}}{\sum_{i=1}^{S_j} \mathrm{FR}},$$

$$H_j = -\sum_{i=1}^{S_j} E_{ij} \log_2 E_{ij}, \quad j = 1, \ldots, n,$$

$$H_{j\max} = \log_2 S_j,$$

$$I_j = \frac{H_{j\max} - H_j}{H_{j\max}},$$

$$W_j = I_j \times \mathrm{FR},$$

where FR is the frequency ratio; $E_{ij}$ is the probability density for each class i in factor j; $H_j$ and $H_{j\max}$ are entropy values; $I_j$ is the information coefficient of factor j; $S_j$ is the number of classes; and $W_j$ is the final weight of each factor. The final landslide susceptibility map (LSM) was calculated using the following equation:

$$\mathrm{LSM} = \sum_{i=1}^{n} \frac{z}{m_i} \times C \times W_j,$$

where i is the number of the particular parametric map; z is the number of classes within the parametric map with the greatest number of classes; $m_i$ is the number of classes within the particular parametric map; C is the value of the class after secondary classification; and $W_j$ is the weight of a parameter.
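
The entropy-based weighting can be sketched for a single conditioning factor as follows. The sketch assumes the formulation commonly used with this model: the class FR values are normalized to a probability density Eij, the entropy Hj is compared with its maximum log2(Sj), and the resulting information coefficient Ij is scaled by FR. Taking the FR in the final weight as the mean FR of the factor's classes is one common reading and is an assumption here, as are the function name and the toy FR values.

```python
import math


def shannon_entropy_weight(fr_values):
    """Return (E_ij list, I_j, W_j) for one factor from its class FR values."""
    s_j = len(fr_values)
    if s_j < 2:
        raise ValueError("at least two classes are needed")
    total_fr = sum(fr_values)
    # Probability density of each class
    e_ij = [fr / total_fr for fr in fr_values]
    # Entropy of the factor; classes with zero density contribute nothing
    h_j = -sum(p * math.log2(p) for p in e_ij if p > 0)
    h_j_max = math.log2(s_j)
    info_coeff = (h_j_max - h_j) / h_j_max
    mean_fr = total_fr / s_j  # assumed meaning of FR in W_j = I_j x FR
    return e_ij, info_coeff, info_coeff * mean_fr


# Hypothetical class FR values for one factor (not data from this study)
e_ij, i_j, w_j = shannon_entropy_weight([0.2, 1.0, 3.0])
print([round(p, 3) for p in e_ij], round(i_j, 3), round(w_j, 3))
```

A factor whose classes all have the same FR yields an information coefficient of zero and therefore no weight, whereas a factor whose landslides concentrate in one class approaches an information coefficient of one.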

3. Results and Discussion

3.1. Frequency Ratio (FR) Model

FR was computed for each class of every landslide conditioning factor by dividing the landslide occurrence ratio by the area ratio. The results of the FR model for each class of the effective factors are shown in Table 2. In general, an FR value of 1 indicates an average correlation between landslide occurrence and the effective factors. An FR value greater than 1 indicates high landslide occurrence, and an FR value less than 1 indicates low landslide occurrence [43]. For slope, the 47.68°–57.21° class has the highest FR value (11.316), followed by the 57.21°–66.75°, 38.14°–47.68°, 28.61°–38.14°, and 19.07°–28.61° classes, with FR values of 9.613, 5.869, 3.924, and 2.027, respectively. The 0°–9.53° and 9.53°–19.07° slope classes (FR = 0.161 and FR = 0.694, respectively) indicate a low probability of landslide occurrence. In the study area, the probability of landslide occurrence increased with slope gradient up to a certain extent and then decreased, in agreement with the results of other studies [20]. This is because higher slope values strengthen the effect of gravity and increase shear stress [42]. For elevation, the 991 m–1215 m and 1215 m–1439 m ranges (FR = 0.599 and FR = 0.696, respectively) imply a low probability of landslide occurrence in the study area. The 1439 m–1663 m, 1663 m–1887 m, 1887 m–2111 m, and 2111 m–2335 m ranges have the highest FR values (1.211, 1.573, 1.872, and 2.746, respectively), indicating a high probability of landslide occurrence. In the study area, the probability of landslide occurrence increases with elevation up to a certain extent and then decreases.
For aspect, the east-facing (FR = 1.872), southeast-facing (FR = 2.147), south-facing (FR = 1.627), and southwest-facing (FR = 1.153) classes indicate a high probability of landslide occurrence. The remaining aspect classes have FR values less than 1, indicating a low probability of landslide occurrence. For land use, the agricultural land, scrub/shrub, and bare land classes have FR values of 1.105, 1.026, and 1.280, respectively, which implies a high probability of landslide occurrence. The high FR values of these classes are due to their exposure to erosion and soil moisture [37]. For curvature, the concave (−14.19 to −4.07) and convex (6.04 to 16.16) classes have the highest FR values (8.840 and 10.026, respectively), indicating a high probability of landslide occurrence. The flat class (−4.07 to 6.04) has a low FR value (0.968), indicating a low probability of landslide occurrence. For distance from road, the 1808 m–3616 m and 3616 m–5424 m classes, with FR values of 1.643 and 1.282, respectively, have the greatest impact on landslide occurrence. In the study area, the landslide frequency increases as the distance from roads decreases. The existing roads and ongoing construction disturb slope stability, thereby increasing the probability of landslide occurrence, in agreement with other studies [19, 20]. According to F. Guzzetti [60], landslide probability decreases with increasing distance from river networks. In this study area, the 0–500 m distance-from-river class exerts the highest influence on landslide occurrence, because permanent rivers are the main source of moisture for landslide occurrence.
For distance from faults, the 0–500 m class, with an FR value of 2.551, has the greatest impact on landslide occurrence in the study area. For NDVI, the FR value is greater than one in the −0.1–0.07, 0.15–0.17, and 0.17–0.20 classes, indicating a high probability of landslide occurrence. These NDVI classes correspond to bare land, built-up areas, and scrub. The remaining NDVI classes, which have relatively high vegetation coverage, have FR values less than one. For TWI, the 2.59–4.61, 10.68–12.70, and 14.72–16.74 classes have FR values greater than one (3.043, 1.216, and 10.231, respectively). For rainfall, the 1335–1350 mm, 1350–1365 mm, 1410–1425 mm, and 1425–1440 mm classes have higher FR values than the other classes and are the classes most prone to landslide occurrence.

3.2. Shannon Entropy (SE) Model

The SE results for the relationship between the effective factors and the occurrence of landslides are presented in Table 2. The weight of each conditioning factor in the Shannon entropy model was based on the frequency ratio (FR) values. The results showed that curvature, slope, and TWI are the most dominant conditioning factors for landslide susceptibility, with SE weights of Wj = 1.481, 0.964, and 0.758, respectively, followed by distance from faults, distance from river, elevation, aspect, and distance from road, with SE weights of Wj = 0.145, 0.135, 0.129, 0.129, and 0.103, respectively. The remaining conditioning factors are less significant for landslide occurrence in this study area. The Eij results show that the 47.68°–57.21° slope class has the highest probability of landslide occurrence, followed by the 57.21°–66.75° class; the other classes have low values. For elevation, the 2111 m–2335 m range has the highest probability of landslide occurrence among the elevation classes. For aspect, southeast-facing followed by east-facing slopes have the most landslide occurrences in the study area. For land use, bare land followed by agricultural land, both with relatively low vegetation coverage, indicate a high probability of landslide occurrence. For curvature, the convex (6.04 to 16.16) and concave (−14.19 to −4.07) classes have high Eij values of 0.505 and 0.446, respectively. For distance to roads, the 1808 m–3616 m class has the highest Eij value (0.344), followed by the 5424 m–7232 m class (0.268); the remaining classes have low landslide occurrence. For distance to rivers, the 0–500 m class has the highest Eij value among the classes, indicating high proneness to landslides.
Generally, the Eij value decreases as the distance to rivers increases, which suggests that bank erosion is one of the main triggering factors [30]. For distance from faults, the 0–500 m class, with an Eij value of 0.474, has the greatest impact on landslide occurrence. For NDVI, the −0.1–0.07 class has the highest Eij value (0.271) and the greatest effect on the occurrence of landslides. For TWI, the 14.72–16.74 class has by far the highest Eij value among the TWI classes, corresponding to its FR value of 10.231. For rainfall, the highest Eij value (0.261) occurs in the 1335–1350 mm class. The results of the Shannon entropy (SE) model show that slope, curvature, and TWI are the most important factors explaining landslide occurrence and distribution in the study area. It should be noted that landslide contributing factors may vary from place to place with the nature of the area and data availability [45].

3.3. Landslide Susceptibility Maps

The map of each conditioning factor was prepared in ArcGIS 10.4, and the frequency ratio values were then calculated. The FR values calculated for each pixel in the LSM indicate the relative susceptibility to landslide occurrence: higher pixel values indicate higher landslide susceptibility and lower pixel values indicate lower susceptibility. The landslide susceptibility map was calculated by adding the frequency ratio values determined in the training process for all eleven conditioning factors in the Raster Calculator of ArcGIS 10.4.

The LSM values for the frequency ratio model in the study area range from 218.78 to 611.49. These values were classified into five susceptibility classes (very low, low, moderate, high, and very high) using the geometrical interval method for visual interpretation (Figure 5(a)). According to the analysis carried out in ArcGIS 10.4 (Table 3), the very low and low susceptibility zones cover 13.68% and 27.19% of the study area, respectively, whereas the moderate, high, and very high susceptibility zones cover 30.46%, 20.72%, and 7.94% of the total area, respectively.
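
The overlay step for the FR model, in which each pixel receives the sum of the FR values of the classes it falls into, can be sketched on a tiny grid. The FR lookups below reuse the first three slope and elevation class values reported in Section 3.1, but the 2 × 2 class-ID grids and the function name `overlay_fr` are hypothetical, only two of the eleven factors are shown, and any scaling applied in the original ArcGIS workflow is not reproduced.

```python
def overlay_fr(factor_grids, fr_tables):
    """Sum, per pixel, the FR value of the class each factor assigns to it.

    factor_grids: {factor name: 2D list of class IDs}, all the same shape
    fr_tables:    {factor name: {class ID: FR value}}
    """
    first = next(iter(factor_grids.values()))
    rows, cols = len(first), len(first[0])
    lsm = [[0.0] * cols for _ in range(rows)]
    for name, grid in factor_grids.items():
        lookup = fr_tables[name]
        for r in range(rows):
            for c in range(cols):
                lsm[r][c] += lookup[grid[r][c]]
    return lsm


# FR values of the three lowest slope and elevation classes (Section 3.1)
fr_tables = {
    "slope": {1: 0.161, 2: 0.694, 3: 2.027},
    "elevation": {1: 0.599, 2: 0.696, 3: 1.211},
}
# Hypothetical class-ID rasters for a 2 x 2 window
factor_grids = {
    "slope": [[1, 2], [3, 1]],
    "elevation": [[1, 1], [2, 3]],
}
lsm = overlay_fr(factor_grids, fr_tables)
print([[round(v, 3) for v in row] for row in lsm])
```

Pixels that fall in high-FR classes of several factors accumulate the largest sums and end up in the higher susceptibility classes after reclassification.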

The landslide susceptibility map for the Shannon entropy model (Figure 5(b)) was produced as the weighted overlay of the conditioning factors, using the SE weights Wj defined in Section 2.5.2.

The LSM values for the Shannon entropy model range from 6.33 to 16.55. These values were classified into five susceptibility classes (very low, low, moderate, high, and very high) using the geometrical interval method. The very low susceptibility zone covers 16.77% of the total study area, whereas the low, moderate, high, and very high susceptibility zones cover 31.20%, 28.84%, 17.18%, and 6.00% of the total area, respectively (Table 3).

3.4. Validation of Landslide Susceptibility Maps

After obtaining the landslide susceptibility maps using the FR and SE models, validation is necessary to check their reliability; without it, landslide susceptibility maps are not meaningful. In the present study, the performance of the LSMs produced by the FR and SE models was evaluated using the area under the curve (AUC). The AUC measures the accuracy of the landslide susceptibility maps through success and prediction rate curves. The success rate curve represents how well the model fits the existing landslides and is obtained by comparing the training dataset with the landslide susceptibility map. The prediction rate curve indicates how well the model predicts future landslides and is obtained by comparing the validation dataset with the landslide susceptibility map [43, 61]. For this study, 196 (70%) of the landslides were used to train the landslide susceptibility models and the remaining 84 (30%) were used for model validation. The success and prediction rate curves were created for both the FR and SE models using the ROC module in the ArcGIS 10.4 tool. The curves were drawn for both the training and validation landslides, with the false positive rate on the x-axis and the true positive rate on the y-axis. The total AUC can be used to assess the prediction accuracy of the susceptibility map, where a larger area means higher accuracy. AUC values ranging from 0.5 to 1.0 are used to evaluate the accuracy of the model [61]. The qualitative relationship between AUC and prediction accuracy can be classified as follows: excellent (0.9–1.0), very good (0.8–0.9), good (0.7–0.8), average (0.6–0.7), and fair (0.5–0.6) [61]. If the AUC value is close to 1.0, the model has ideal performance, whereas a value equal to or less than 0.5 indicates poor performance [62].
The results showed that the AUC of the success rate curves was 0.761 for the FR model and 0.822 for the SE model, equivalent to 76.1% and 82.2% accuracy, respectively (Figure 6(a)). The AUC of the prediction rate curves was 0.753 for the FR model and 0.826 for the SE model, equivalent to 75.3% and 82.6% prediction accuracy, respectively (Figure 6(b)). For the FR model, the AUC of both the success rate and prediction rate curves lies between 0.7 and 0.8, indicating good performance. For the SE model, both values lie between 0.8 and 0.9, indicating very good performance. Therefore, based on the calculated AUC, the SE model gave the better result for landslide susceptibility mapping in the study area.
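
For readers who want to reproduce this kind of validation outside ArcGIS, the sketch below computes an ROC-based AUC with the trapezoidal rule and maps the result onto the qualitative classes listed above. It is not the ROC module used in the study: the function names, the treatment of tied scores as a single threshold, and the choice to make each class's lower bound inclusive are assumptions of this example.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve (x: false positive rate, y: true positive rate).

    scores: susceptibility value at each sample location
    labels: 1 for a landslide location, 0 for a non-landslide location
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    prev_fpr = prev_tpr = 0.0
    auc = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with equal scores enter the curve together
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0  # trapezoid
        prev_fpr, prev_tpr = fpr, tpr
    return auc


def rate_auc(auc):
    """Qualitative class for an AUC value (lower bounds assumed inclusive)."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "very good"
    if auc >= 0.7:
        return "good"
    if auc >= 0.6:
        return "average"
    if auc > 0.5:
        return "fair"
    return "poor"


# Hypothetical susceptibility scores at two landslide and two stable points
print(roc_auc([0.9, 0.7, 0.6, 0.3], [1, 0, 1, 0]))
print(rate_auc(0.761), "/", rate_auc(0.826))
```

Applied to the values reported here, `rate_auc` places the FR model (0.761 and 0.753) in the good class and the SE model (0.822 and 0.826) in the very good class.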

4. Conclusion

In this study, two bivariate models (frequency ratio and Shannon entropy) were used in a GIS environment to identify landslide-susceptible areas in Dejen Wereda, northwestern Ethiopia. Eleven landslide conditioning factors were selected based on data availability and effectiveness: slope, elevation, aspect, land use, curvature, distance from road, distance from river networks, distance from faults, NDVI, TWI, and rainfall. A landslide inventory map was prepared using Google Earth imagery and field survey assessment. In total, 280 landslide locations were identified and mapped, of which 70% (196) were used for training and 30% (84) for validation. The susceptibility maps produced by the FR and SE models were divided into five classes (very low, low, moderate, high, and very high susceptibility) using the geometrical interval method. The AUC quantitatively indicates the performance of the susceptibility maps. The Shannon entropy model achieved a success rate of 82.20% and a prediction rate of 82.60%, while the frequency ratio model achieved a success rate of 76.10% and a prediction rate of 75.30%. The Shannon entropy (SE) model therefore has a higher AUC than the frequency ratio (FR) model. Finally, this study confirmed that the FR and SE models are simple, reliable, and effective for landslide susceptibility mapping of the study area.
The final landslide susceptibility maps can serve as basic information for decision makers and the concerned governmental and non-governmental authorities in district- and zonal-level land use planning, so that proper actions can be taken to prevent and mitigate existing and future landslides.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors convey their thanks to the authors and staff of the Civil Engineering Department, University of Debre Markos, Ethiopia. This research was funded by the authors.