By Annemarie van Groenestijn
Summary
A new routine is needed that facilitates accurate, rapid and relatively inexpensive soil property monitoring at a large scale. Soil property modelling based on visible and near-infrared (VNIR) spectroscopy is identified as very promising for this new routine. However, a major drawback is that VNIR-based soil property models still have poor robustness; in other words, they require local calibration. If soil samples are predicted using a model calibrated on soils other than the samples in question, prediction accuracies remain low. This limits the use of VNIR-based soil property prediction modelling for rapid, accurate monitoring of soil properties at a large scale. The literature repeatedly suggests solving this poor robustness by enlarging the calibration (training) set, creating spectral libraries that contain enormous numbers of samples. However, this does not give insight into the nature of the interactions between soil properties and the influence of these interactions on the soil properties’ absorption features. Therefore, this research aims to determine the causes of poor model robustness of VNIR-based soil property prediction models. The actual model performance is of less relevance, since the purpose of this research is to gain more insight into the processes and factors that cause poor model robustness. For this research a large dataset is available, covering a wide range of soil types and soil properties. Partial least squares regression (PLSR) modelling is used to create soil property prediction models, investigate model robustness and determine the positions (nm) of the absorption features of total nitrogen, soluble nitrogen, magnesium, soil organic matter (SOM), sodium and chloride.
Besides identifying causes of poor model robustness, the actual robustness of models created for different calibration and validation sets is investigated. It appears that soil property models are not robust when they have to predict concentrations of the soil property in question that are not included in the training set. This is widely recognized in the literature. However, this research also shows that building spectral libraries that span the range of possible soil compositions does not solve this poor model robustness. This conclusion is quite controversial, since many researchers claim to use spectral libraries as a solution for poor model robustness.
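The effect of predicting concentrations outside the calibration range can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (not the dataset of this research), a single spectral feature with a saturating response stands in for a full spectrum, and ordinary one-variable least squares stands in for PLSR. The helper names fit_line and rmse are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(x, y, slope, intercept):
    """Root mean squared error of the fitted line on (x, y)."""
    errors = [(slope * a + intercept - b) ** 2 for a, b in zip(x, y)]
    return math.sqrt(sum(errors) / len(errors))

def feature(concentration):
    # Assumed (synthetic) saturating link between concentration and
    # a spectral feature such as absorption depth.
    return 1.0 - math.exp(-concentration / 5.0)

# Calibration set: low concentrations only.
cal_conc = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
# Validation set: concentrations outside the calibration range.
val_conc = [8.0, 12.0, 16.0, 20.0]

cal_feat = [feature(c) for c in cal_conc]
val_feat = [feature(c) for c in val_conc]

# Calibrate on the low range, then predict both sets.
slope, intercept = fit_line(cal_feat, cal_conc)
rmse_cal = rmse(cal_feat, cal_conc, slope, intercept)
rmse_val = rmse(val_feat, val_conc, slope, intercept)

print(f"RMSE calibration: {rmse_cal:.2f}")
print(f"RMSE validation:  {rmse_val:.2f}")
```

In this toy setting the validation error is many times the calibration error, because the relation learned on the low range does not hold at higher concentrations. The sketch says nothing about the magnitude of the effect in real VNIR data.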
In this research four different causes of poor model robustness are identified. The first cause is that the dataset used does not fit completely within the analytical framework of multivariate regression.
The second cause of poor model robustness lies in data pre-processing. It is concluded that optimal pre-processing settings are site-specific and depend on the spectral range used for modelling.
The third cause of poor model robustness lies in soil property interrelations and overlapping positions (nm) of absorption features. The new, large-scale soil monitoring routine should be able to extrapolate models to soil samples that have a different composition from the samples used during calibration. To achieve this, spectroscopy-based soil property modelling requires fixed positions (nm) of the absorption features of soil properties. However, it is shown that the positions (nm) of absorption features are strongly site-specific, which causes great difficulties in extrapolating models to other prediction sets. More research is needed on this, especially since this conclusion would mean that robust modelling of soil properties using VNIR-based spectroscopy is hard to achieve.
The fourth cause of poor model robustness lies in internal shading. It is investigated whether classifying soil samples by texture reduces the grain size variation within a training or prediction set, and thereby the variability in reflectance due to internal shading. It is shown that soil property prediction models built on spectral sets containing reflectance data of soil samples with different textures are less robust than models built on spectral sets containing reflectance data of sandy textures only. Models calibrated for clay soils did not show better robustness than models calibrated on sets with a mix of textures.
Based on these findings, solutions for improving model robustness were investigated. One solution might be to create stratified calibration sets, each covering a certain range of a soil property. By using these calibration sets for modelling, the influence of outliers and of possibly non-linear soil property interrelations can be diminished. This was investigated for SOM and showed promising results. However, the approach is only useful if the soil property range can be estimated beforehand from some indicators, so that it is known in which range the concentration of the soil property falls and thus which model is needed. Therefore, an attempt was made to find logical relations between the ranges of the soil property in question and the texture of the samples, or the geographical pattern of the samples falling in that range. No relation could be found between the concentrations of soil properties and texture or geographical pattern, except for NaCl, for which high values were located in dune areas or near the seashore.
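The stratification step can be sketched as follows. This is a minimal illustration only: the SOM values and the range boundaries are hypothetical, not taken from this research, and the function name assign_strata is an assumption. Each resulting group of sample indices would then be used to calibrate its own model.

```python
from bisect import bisect_right

def assign_strata(values, edges):
    """Group sample indices by the soil property range they fall in.

    edges are ascending interior boundaries. Stratum 0 holds values
    below edges[0], stratum k holds edges[k-1] <= value < edges[k],
    and the last stratum holds values >= edges[-1].
    """
    strata = {k: [] for k in range(len(edges) + 1)}
    for index, value in enumerate(values):
        strata[bisect_right(edges, value)].append(index)
    return strata

# Hypothetical SOM contents (%) of seven samples.
som = [1.2, 3.5, 0.8, 12.0, 6.1, 2.9, 25.4]
# Hypothetical range boundaries (%), giving four strata.
edges = [2.0, 5.0, 10.0]

strata = assign_strata(som, edges)
for k, indices in strata.items():
    print(k, indices)
```

Choosing the model for a new sample still requires an estimate of the range its concentration falls in, which is the limitation described above.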
The overall conclusion is that VNIR modelling is a promising tool for monitoring soil properties. However, robust modelling is a key requirement for creating widely applicable, accurate and relatively inexpensive soil property models. For use in real-life situations the accuracy, and above all the robustness, still has to improve. This research shows that such improvement is possible by solving or overcoming the causes of poor model robustness identified here. However, some general characteristics of soil properties, such as shifts in the positions (nm) of absorption features, are hard to overcome. Further research is necessary on this.