Elsevier

Geoderma

Volume 311, 1 February 2018, Pages 143-148
Mapping numerically classified soil taxa in Kilombero Valley, Tanzania using machine learning

https://doi.org/10.1016/j.geoderma.2016.11.020

Highlights

  • RF is less sensitive to training set sampling intensity than the J48 algorithm.

  • Soil taxa predictions from 1 arc SRTM and 12 m WorldDEM are identical.

  • The RF and SRTM combination is suggested for predicting soil taxa in the Kilombero Valley.

Abstract

Inadequate spatial soil information is one of the factors limiting evidence-based decisions to improve food security and land management in developing countries. Various digital soil mapping (DSM) techniques have been applied in many parts of the world to improve the availability and usability of soil data, but less has been done in Africa, particularly in Tanzania, and at the scale necessary to make farm management decisions. The Kilombero Valley has been identified for intensified rice production. However, the valley lacks detailed and up-to-date soil information for decision-making. The overall objective of this study was to develop a predictive soil map of a portion of the Kilombero Valley using DSM techniques. Two widely used decision tree algorithms and three sources of Digital Elevation Models (DEMs) were evaluated for their predictive ability. First, a numerical classification was performed on the collected soil profile data to arrive at soil taxa. Second, the derived taxa were spatially predicted and mapped following the SCORPAN framework using the Random Forest (RF) and J48 machine learning algorithms. Datasets to train the models were derived from a legacy soil map, a RapidEye satellite image and three DEMs: 1 arc SRTM, 30 m ASTER, and 12 m WorldDEM. Separate predictive models were built using each DEM source. Mapping showed that RF was less sensitive than J48 to the training set sampling intensity. Results also showed that predictions of soil taxa using 1 arc SRTM and 12 m WorldDEM were identical. We suggest using the RF algorithm in combination with the freely available SRTM DEM for mapping the soils of the whole Kilombero Valley. This combination can be tested and applied in other areas with relatively flat terrain like the Kilombero Valley.

Introduction

The Kilombero Valley in Tanzania presents great potential for the expansion and intensification of rice production. This valley, covering an area of about 11,600 km² (Kato, 2007), has been identified by the Government of Tanzania for financial and technological investments to expand and intensify rice production (TIC, 2013). Rice is the second most important cereal crop in Tanzania after maize (Bucheyeki et al., 2011), and demand for it has been increasing following a shift in preference by the local population from traditional staples to rice, and increased market demand from neighboring countries. To develop and promote sustainable intensification of rice production, farmers and policy makers need to identify the most suitable areas and the respective management options. However, updated and detailed soil information to support this decision-making process is currently lacking.

Accurate soil information is crucial for informing management recommendations aimed at increasing agricultural productivity and overall food security, especially in developing countries where the GDP is heavily dependent on the agricultural sector (Cook et al., 2008, Msanya et al., 2002). Gathering such information through conventional soil inventory takes a relatively long time and generally requires a large amount of resources (McBratney et al., 2003). Recent developments in remote and proximal sensing, computational methods and information technology have provided means by which soil information can be collected, shared, communicated and updated more efficiently (Malone, 2013, McBratney et al., 2003, Scull et al., 2003, Vågen et al., 2013, Vågen et al., 2016, Winowiecki et al., 2016a, Winowiecki et al., 2016b). Predictive soil landscape model frameworks such as the SCORPAN approach (McBratney et al., 2003) could be used to predict continuous soil classes and soil attributes that better represent soil spatial variability. The increased availability of high resolution digital elevation models (DEMs), which provide predictive variables in digital soil mapping, together with advances in machine learning techniques, makes it easier to generate spatial soil information and depict uncertainty (Hansen et al., 2009, Haring et al., 2012, Subburayalu and Slater, 2013, Subburayalu et al., 2014).
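
For reference, the SCORPAN framework cited above is usually written as an empirical function of seven groups of environmental factors. The notation below follows the general form given by McBratney et al. (2003); it is a summary of that framework, not a statement of the specific covariates used in this study.

```latex
% General SCORPAN form (McBratney et al., 2003):
% S_c = soil class, S_a = soil attribute at a location
S_c = f(s, c, o, r, p, a, n) \qquad \text{or} \qquad S_a = f(s, c, o, r, p, a, n)
% s: other soil information (e.g. a legacy soil map)
% c: climate
% o: organisms (vegetation, fauna, human activity)
% r: relief (terrain attributes, typically derived from a DEM)
% p: parent material
% a: age (time)
% n: spatial position
```

In a class-mapping setting such as this one, f is the fitted classifier and S_c is the predicted soil taxon.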

The overall goal of this study was to develop a predictive soil map for a portion of the Kilombero Valley, Tanzania, to serve as a basis for quantitative land evaluation for intensified rice production. Machine learning informed by a legacy soil map, new field data collection, and multiple sources of environmental correlates was used to map numerically derived soil classes. In this paper we report comparisons of two machine learning algorithms and three sources of terrain data.
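
To make the role of the terrain data concrete, the following is a minimal sketch of how one terrain attribute (slope) could be derived from a DEM grid. It is an illustration only: this excerpt does not list the terrain predictors actually used, the central-difference method is one common choice among several, and the function name `slope_degrees` and the 3 x 3 grid are hypothetical.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees for interior cells of a 2-D elevation grid.

    Uses central differences; border cells have no full neighbourhood
    and are returned as None.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation gradient in the x (column) and y (row) directions
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


# Hypothetical grid (metres): elevation rises 1 m per 30 m cell eastward
dem = [
    [100.0, 101.0, 102.0],
    [100.0, 101.0, 102.0],
    [100.0, 101.0, 102.0],
]
slope = slope_degrees(dem, cell_size=30.0)
```

The same function applied to grids of different cell size (for example 12 m versus roughly 30 m) gives one way to see how DEM resolution feeds into the predictors; in very flat terrain the resulting values may differ little.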

Legacy soil map

The base map used to guide soil sampling was a reconnaissance legacy soil map developed in the late 1950s at a scale of 1:125,000 (FAO, 1961). The map was obtained in a scanned format from the World Soil Survey Archive and Catalog (WOSSAC). The legacy map was prepared based on aerial photo interpretation. The air photography at a scale of 1:30,000 was done by the British Royal Air Force in years 1948, 1949, and 1950; Hunting Aerosurveys Ltd. in 1955; Fairey Air Surveys Ltd. in 1956; and Air

Numerical classification

The numerical classification generated 13 soil classes (clusters). Each class grouped between one and five soil profiles. The legacy soil map identified 10 soil classes in the study area. The larger number of soil classes identified in this study could be due to the techniques used to derive the legacy map (e.g., air photo interpretation) and the limited field data collection (due to floods and restrictive vegetation at the time) (FAO, 1961). It is also possible that some soil classes were

Conclusion

This work used DSM methods to map numerically classified soil clusters of a portion of the Kilombero Valley, Tanzania. In this study, results from terrain-based predictors derived from the 1 arc SRTM DEM were similar to those from the 12 m WorldDEM despite the difference in resolution. It was also demonstrated that the RF algorithm was less sensitive to the training set sampling intensity than J48.
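
One simple way to quantify how similar two predicted class maps are is to compute the overall agreement and Cohen's kappa over co-located cells. The sketch below is an assumption about how such a comparison could be done; this excerpt does not state which measure the authors used, and the function name, class labels and cell values are hypothetical.

```python
from collections import Counter


def agreement_and_kappa(map_a, map_b):
    """Overall agreement and Cohen's kappa for two equal-length label lists.

    Each list holds the predicted class of the same cells, in the same order.
    """
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and of equal length")
    n = len(map_a)
    # Fraction of cells assigned the same class in both maps
    observed = sum(a == b for a, b in zip(map_a, map_b)) / n
    # Agreement expected by chance from the class frequencies of each map
    freq_a, freq_b = Counter(map_a), Counter(map_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both maps contain one and the same class; kappa is undefined
        return observed, float("nan")
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical predictions for six cells from two DEM-specific models
srtm_map = ["C1", "C1", "C2", "C3", "C2", "C1"]
worlddem_map = ["C1", "C1", "C2", "C3", "C3", "C1"]
observed, kappa = agreement_and_kappa(srtm_map, worlddem_map)
```

Identical maps give an agreement of 1 and, when more than one class is present, a kappa of 1; values below that indicate where the choice of DEM changes the predicted taxa.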

We suggest the use of RF algorithm and SRTM DEM combination for soil class mapping for the remainder of the Kilombero

Acknowledgement

This work builds on PhD research work by Boniface H.J. Massawe at the Ohio State University, USA. The authors are grateful to USAID's Innovative Agricultural Research Initiative (Cooperative Agreement 621-A-00-11-000090-00) (iAGRI) and Norman Borlaug Leadership Enhancement in Agriculture Program (2013 Borlaug LEAP Fellow, Spring) for funding this work.

References (54)

  • S.K. Subburayalu et al. Disaggregation of component soil series on an Ohio County soil survey map using possibilistic decision trees. Geoderma (2014)
  • T.-G. Vågen et al. Landsat-based approaches for mapping of land degradation prevalence and soil functional properties in Ethiopia. Remote Sens. Environ. (2013)
  • T.-G. Vågen et al. Mapping of soil properties and land degradation risk in Africa using MODIS reflectance. Geoderma (2016)
  • L. Winowiecki et al. Effects of land cover on ecosystem services in Tanzania: a spatial assessment of soil organic carbon. Geoderma (2016)
  • Airbus Defence and Space. WorldDEM™: The New Standard of Global Elevation Models. Airbus DS/Infoterra GmbH, Germany (2014)
  • R.H. Bray et al. Determination of total, organic and available forms of phosphorus in soils. Soil Sci. (1945)
  • J.M. Bremner et al. Total nitrogen
  • C.B. Brungard et al. Spatial prediction of biological soil crust classes; value added DSM from soil survey
  • T.L. Bucheyeki et al. Assessment of rice production constraints and farmers preferences in Nzega and Igunga Districts. J. Adv. Dev. Res. (2011)
  • S.E. Cook et al. A new global demand for digital soil information
  • J.A. Doumit. Comparison of SRTM DEM and ASTER GDEM Derived Digital Elevation Models with elevation points over the Lebanese territory. Lebanese J. Geogr. (2013)
  • FAO. The Rufiji Basin Tanganyika. FAO Exp. Techn. Ass. Progr. No. 1269 Vol. 7. Rome (1961)
  • FAO. Guidelines for Soil Description (2006)
  • G. Forkuor et al. Comparison of SRTM and ASTER derived Digital Elevation Models over two regions in Ghana - implications for hydrological and environmental modeling
  • G.W. Gee et al. Particle-size analysis
  • A.A. Gitelson et al. Use of a green channel in remote sensing of global vegetation from EOS-MODIS. Remote Sens. Environ. (1999)
  • Y. He. mColorConverter (2013)