Mapping numerically classified soil taxa in Kilombero Valley, Tanzania using machine learning
Introduction
The Kilombero Valley in Tanzania has great potential for the expansion and intensification of rice production. The valley covers about 11,600 km² (Kato, 2007) and has been identified by the Government of Tanzania for financial and technological investment to expand and intensify rice production (TIC, 2013). Rice is the second most important cereal crop in Tanzania after maize (Bucheyeki et al., 2011). Demand for rice has been increasing for two reasons: the local population has shifted its preference from traditional staples to rice, and market demand from neighboring countries has grown. To develop and promote sustainable intensification of rice production, farmers and policy makers need to identify the most suitable areas and the appropriate management options for each. However, the up-to-date, detailed soil information needed to support this decision-making is currently lacking.
Accurate soil information is crucial for management recommendations that aim to increase agricultural productivity and overall food security. This is especially true in developing countries where GDP depends heavily on the agricultural sector (Cook et al., 2008, Msanya et al., 2002). Gathering such information through conventional soil inventory takes a relatively long time and generally requires substantial resources (McBratney et al., 2003). Recent developments in remote and proximal sensing, computational methods and information technology have made it possible to collect, share, communicate and update soil information more efficiently (Malone, 2013, McBratney et al., 2003, Scull et al., 2003, Vågen et al., 2013, Vågen et al., 2016, Winowiecki et al., 2016a, Winowiecki et al., 2016b). Predictive soil-landscape model frameworks such as the SCORPAN approach (McBratney et al., 2003) can be used to predict soil classes and soil attributes continuously across the landscape, which better represents soil spatial variability. High-resolution digital elevation models (DEMs) supply predictor variables for digital soil mapping. Their increasing availability, together with advances in machine learning techniques, has made it easier to generate spatial soil information and to depict its uncertainty (Hansen et al., 2009, Haring et al., 2012, Subburayalu and Slater, 2013, Subburayalu et al., 2014).
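The SCORPAN formulation (McBratney et al., 2003) models a soil class Sc or a soil attribute Sa as an empirical function of seven groups of environmental correlates:

```latex
S_c = f(s, c, o, r, p, a, n) \qquad \text{or} \qquad S_a = f(s, c, o, r, p, a, n)
```

Here s is soil information already available at or near the location, for example a legacy soil map. The remaining factors are c for climate, o for organisms (vegetation, fauna, human activity), r for relief or topography, p for parent material, a for age, and n for spatial position. DEM-derived terrain attributes enter through r. In a machine learning approach, f is the fitted classifier or regression model.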
The overall goal of this study was to develop a predictive soil map for a portion of the Kilombero Valley, Tanzania, to serve as a basis for quantitative land evaluation for intensified rice production. We used machine learning to map numerically derived soil classes, informed by a legacy soil map, newly collected field data and multiple sources of environmental correlates. In this paper we compare two machine learning algorithms (random forest and J48) and three sources of terrain data.
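To compare algorithms at different training-set sampling intensities, training sets of different sizes have to be drawn from the labelled locations. The snippets do not describe the sampling scheme. The sketch below is for illustration only and assumes a stratified design: a fixed fraction of the labelled locations is drawn from each soil class, so every class remains represented. The function `stratified_subsample` and its `intensity` and `seed` parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(samples, intensity, seed=0):
    """Draw about `intensity` (a fraction in (0, 1]) of the samples in each class.

    `samples` is a list of (class_label, covariates) pairs. At least one sample
    per class is kept so that rare classes remain represented in the training set.
    """
    if not 0 < intensity <= 1:
        raise ValueError("intensity must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed -> reproducible subsets
    by_class = defaultdict(list)
    for label, covariates in samples:
        by_class[label].append((label, covariates))
    subset = []
    for members in by_class.values():
        n = max(1, round(intensity * len(members)))
        subset.extend(rng.sample(members, n))
    return subset


# Toy example: 100 locations of class "A" and 20 of class "B", sampled at 10 %.
data = [("A", i) for i in range(100)] + [("B", i) for i in range(20)]
subset = stratified_subsample(data, 0.10)
print(sum(1 for label, _ in subset if label == "A"),
      sum(1 for label, _ in subset if label == "B"))  # 10 2
```

Under this assumed workflow, each subset would train both algorithms with each terrain source, and the resulting maps would be checked against held-out observations.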
Section snippets
Legacy soil map
The base map used to guide soil sampling was a reconnaissance legacy soil map developed in the late 1950s at a scale of 1:125,000 (FAO, 1961). The map was obtained as a scanned copy from the World Soil Survey Archive and Catalog (WOSSAC). It was prepared from aerial photo interpretation. The aerial photography, at a scale of 1:30,000, was flown by the British Royal Air Force in 1948, 1949 and 1950; Hunting Aerosurveys Ltd. in 1955; Fairey Air Surveys Ltd. in 1956; and Air…
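The map and photo scales translate directly into ground dimensions. The sketch below shows this generic map-scale arithmetic. It is not a procedure from the study, and the function names are illustrative.

```python
def ground_distance_m(map_distance_mm, scale_denominator):
    """Ground distance (m) represented by a distance measured on a map or photo (mm)."""
    # 1 mm on a 1:N map covers N mm on the ground; divide by 1000 to get metres.
    return map_distance_mm * scale_denominator / 1000.0


def ground_area_ha(map_area_cm2, scale_denominator):
    """Ground area (ha) represented by an area measured on a map or photo (cm^2)."""
    # 1 cm on a 1:N map covers N cm = N / 100 m on the ground.
    side_m = scale_denominator / 100.0
    # 1 ha = 10,000 m^2.
    return map_area_cm2 * side_m ** 2 / 10_000.0


print(ground_distance_m(1, 125_000))  # 125.0 m per mm on the legacy map
print(ground_distance_m(1, 30_000))   # 30.0 m per mm on the air photos
print(ground_area_ha(1, 125_000))     # 156.25 ha per cm^2 on the legacy map
```

A delineation 1 cm across on the legacy map therefore spans about 1.25 km on the ground. This limits how much short-range soil variation a reconnaissance map at this scale can show.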
Numerical classification
The numerical classification generated 13 soil classes (clusters), each grouping between one and five soil profiles. The legacy soil map identified 10 soil classes in the study area. This study may have identified more soil classes for two reasons. First, the legacy map was derived with coarser techniques such as air photo interpretation. Second, field data collection at the time was limited by floods and restrictive vegetation (FAO, 1961). It is also possible that some soil classes were…
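The snippet does not say which distance metric or grouping algorithm produced the 13 classes. The reference list does include work on distance metrics for soil profile data. The sketch below is for illustration only and rests on three assumptions. Each profile is summarized as a vector of numeric attributes. The attributes are z-scored. Profiles are then grouped by average-linkage agglomerative clustering on Euclidean distance, stopping at k clusters. All function names are hypothetical.

```python
import math
from itertools import combinations


def standardize(rows):
    """Z-score each attribute (column) so that no single unit dominates distances."""
    cols = list(zip(*rows))
    means = [sum(col) / len(col) for col in cols]
    # Population SD; a constant attribute gets SD 1.0 to avoid division by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
           for col, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(profiles, k):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    Returns one cluster id per profile, in input order.
    """
    if not 1 <= k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    clusters = [[i] for i in range(len(profiles))]  # start: one profile per cluster
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster profile pairs.
            total = sum(euclidean(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair
        del clusters[b]                  # a < b, so index a is still valid
    labels = [0] * len(profiles)
    for cid, members in enumerate(clusters):
        for i in members:
            labels[i] = cid
    return labels
```

A call such as `average_linkage_clusters(standardize(profile_vectors), k=13)` would yield 13 groups. Here `profile_vectors` is a hypothetical list with one attribute vector per profile. Fixing k at 13 only mirrors the number of classes reported above. The snippet does not state how the study chose the number of classes.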
Conclusion
This work used digital soil mapping (DSM) methods to map numerically classified soil clusters in a portion of the Kilombero Valley, Tanzania. Terrain-based predictors derived from the 1 arc-second SRTM DEM gave results similar to those derived from the 12 m WorldDEM, despite the difference in resolution. The random forest (RF) algorithm was also shown to be less sensitive to training-set sampling intensity than J48.
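Terrain predictors such as slope are computed from elevation differences between neighbouring cells, so they depend on cell size. The sketch below converts the 1 arc-second SRTM spacing to metres and computes slope by simple central differences. It assumes a spherical Earth with a mean radius of 6,371 km and a study-area latitude of roughly 8° S. It is not the terrain-analysis software used in the study, which the snippets do not name, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def arcsec_cell_size_m(arcsec, lat_deg):
    """Approximate (east-west, north-south) ground size in metres of a square
    geographic grid cell that is `arcsec` arc-seconds wide at latitude `lat_deg`."""
    dy = EARTH_RADIUS_M * math.radians(arcsec / 3600.0)  # arc length along a meridian
    dx = dy * math.cos(math.radians(lat_deg))            # parallels shrink with latitude
    return dx, dy


def slope_degrees(dem, dx, dy):
    """Slope in degrees from central differences; edge cells are left as None.

    `dem` is a list of rows of elevations in metres; dx, dy are cell sizes in metres.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * dx)
            dzdy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * dy)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out


# Assumed latitude of roughly 8 degrees south for the Kilombero Valley.
dx, dy = arcsec_cell_size_m(1, -8.0)
print(round(dx, 1), round(dy, 1))  # 30.6 30.9

# On a uniform 10 % plane, 12 m and ~30 m grids give the same slope.
for cell in (12.0, dx):
    plane = [[0.1 * c * cell for c in range(5)] for _ in range(5)]
    print(round(slope_degrees(plane, cell, cell)[2][2], 2))  # 5.71
```

On real terrain, the 12 m grid can resolve short slope segments that a grid of roughly 30 m averages out.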
We suggest combining the RF algorithm with the SRTM DEM for soil class mapping in the remainder of the Kilombero…
Acknowledgement
This work builds on the PhD research of Boniface H.J. Massawe at The Ohio State University, USA. The authors are grateful to USAID's Innovative Agricultural Research Initiative (iAGRI; Cooperative Agreement 621-A-00-11-000090-00) and the Norman Borlaug Leadership Enhancement in Agriculture Program (2013 Borlaug LEAP Fellow, Spring) for funding this work.
References (54)
- et al. Machine learning for predicting soil classes in three semi-arid landscapes. Geoderma (2015).
- et al. Numerical classification of soil profile data using distance metrics. Geoderma (2009).
- et al. Digital terron mapping. Geoderma (2005).
- et al. Inductively mapping expert-derived soil-landscape units within Dampo wetland catena using multispectral and topographic data. Geoderma (2009).
- et al. Spatial disaggregation of complex soil map units: a decision tree based approach in Bavarian forest soils. Geoderma (2012).
- et al. On digital soil mapping. Geoderma (2003).
- et al. The classification of soil profiles by traditional and numerical methods. Geoderma (1970).
- et al. Bottom-up digital soil mapping I. Soil layer classes. Geoderma (2011).
- et al. Bottom-up digital soil mapping II. Soil series classes. Geoderma (2011).
- et al. Optimization of soil adjusted vegetation indices. Remote Sens. Environ. (1996).
- Disaggregation of component soil series on an Ohio County soil survey map using possibilistic decision trees. Geoderma.
- Landsat-based approaches for mapping of land degradation prevalence and soil functional properties in Ethiopia. Remote Sens. Environ.
- Mapping of soil properties and land degradation risk in Africa using MODIS reflectance. Geoderma.
- Effects of land cover on ecosystem services in Tanzania: a spatial assessment of soil organic carbon. Geoderma.
- WorldDEM™: The New Standard of Global Elevation Models. Airbus DS/Infoterra GmbH, Germany (2014).
- Determination of total, organic and available forms of phosphorus in soils. Soil Sci.
- Total nitrogen.
- Spatial prediction of biological soil crust classes; value added DSM from soil survey.
- Assessment of rice production constraints and farmers preferences in Nzega and Igunga Districts. J. Adv. Dev. Res.
- A new global demand for digital soil information.
- Comparison of SRTM DEM and ASTER GDEM Derived Digital Elevation Models with elevation points over the Lebanese territory. Lebanese J. Geogr.
- The Rufiji Basin Tanganyika. FAO Exp. Techn. Ass. Progr. No. 1269 Vol. 7. Rome.
- Guidelines for Soil Description.
- Comparison of SRTM and ASTER derived Digital Elevation Models over two regions in Ghana - implications for hydrological and environmental modeling.
- Particle-size analysis.
- Use of a green channel in remote sensing of global vegetation from EOS-MODIS. Remote Sens. Environ.
- mColorConverter.
Cited by (26)
Incorporating machine learning models and remote sensing to assess the spatial distribution of saturated hydraulic conductivity in a light-textured soil
2023, Computers and Electronics in Agriculture

An integrated deep learning-based approach for automobile maintenance prediction with GIS data
2021, Reliability Engineering and System Safety
Citation excerpt: Another study of potential groundwater mapping is that Rahmati et al. [29] deployed random forest and maximum entropy models for groundwater potential mapping is investigated at Mehran Region, Iran. Massawe et al. [30] proposed a mapping approach for soil taxa mapping based on heterogeneous data, which was collected from different sources including satellite image, digital elevation map and digital soil map. The collected features include soil classes, effects of living organisms (vegetation), terrain parameters and spatial location.

Predicting weathering indices in soils using FTIR spectra and random forest models
2021, Catena
Citation excerpt: For instance, Liu et al., (2020) used 44 soil samples to successfully estimate soil heavy metals concentration using PLSR. Also, Massawe et al., (2018) used only 33 soil profiles to predict soil taxa by RF in an area covering 11,600 km2. Du et al., (2009) used 56 top soil samples to predict nitrogen, phosphorous, potassium and organic matter content using PLSR.

Using environmental variables and Fourier Transform Infrared Spectroscopy to predict soil organic carbon
2021, Catena
Citation excerpt: Over the past few decades, the sample size have considerably varied between different studies and one third of the studies have applied a sample with less than 150 soil samples, mostly for local or small-scale areas (Wadoux et al., 2020). For instance, Massawe et al. (2018) applied 33 soil profiles to estimate soil taxa using ML algorithms over a 11,600 km2. Vohland et al. (2014) used 60 soil samples to predict organic carbon, nitrogen and microbial biomass-C using partial least squares regression (PLSR) model and the combination of PLSR and competitive adaptive reweighted sampling (CARS) model.

IRAKA: The first Colombian soil information system with digital soil mapping products
2021, Catena
Citation excerpt: To analyze the stack, 3 sequential processes were carried out: first, a correlation analysis between properties and covariates was performed; then covariates with zero or close to zero variance were eliminated through the nearZeroVar function of the caret package (Kuhn et al., 2018); and lastly, the Spearman correlation coefficients (r) were calculated between environmental covariates, and those with a coefficient higher than 0.9 were eliminated (Zeraatpisheh et al., 2019). A regression matrix was built with the selected covariates, and this allowed extraction of the covariate values at the coordinates of each sampling point (Massawe et al., 2018). The dataset was stratified into eight strata according to the source of the studies described in Section 2.2.