Abstract

Soil spectral libraries (SSLs) are important big-data archives (spectra associated with soil properties) that are analyzed via machine-learning algorithms to estimate soil attributes. Since different spectral measurement protocols are applied when constructing SSLs, it is necessary to examine harmonization techniques to merge the data. In recent years, several techniques for harmonization have been proposed, among which the internal soil standard (ISS) protocol is the most widely applied and has demonstrated its capacity to rectify systematic effects during spectral measurements. Here, we postulate that a spectral transfer function (TF) can be extracted between existing (old) SSLs if a subset of samples from two (or more) different SSLs is remeasured using the ISS protocol. A machine-learning TF strategy was developed, assembling random forest (RF) spectral-based models to predict the ISS spectral condition using soil samples from two existing SSLs. These SSLs had already been measured using different protocols without any ISS treatment: the Brazilian (BSSL, generated in 2019) and the European (LUCAS, generated in 2009–2012) SSLs. To verify the TF’s ability to improve the spectral assessment of soil attributes after harmonizing the different SSLs’ protocols, RF spectral-based models for estimating organic carbon (OC) in soil were developed. The results showed high spectral similarities between the ISS and the ISS–TF spectral observations, indicating that post-ISS rectification is possible. Furthermore, after merging the SSLs with the TFs, the spectral-based assessment of OC was considerably improved, from R2 = 0.61, RMSE (g/kg) = 12.46 to R2 = 0.69, RMSE (g/kg) = 11.13. Given our results, this paper reinforces the importance of soil spectroscopy by contributing to analyses in remote sensing, soil surveys, and digital soil mapping.

1. Introduction

The importance of soil spectroscopy is expressed in the increasing number of extensive soil spectral libraries (SSLs) that are being generated worldwide [1]. The SSLs are utilized mainly to perform proxy estimations of soil properties by modeling the interaction between soil chromophores (i.e., minerals, organic matter (OM), and water) and their spectral responses [2]. Spectral features at wavelengths below 1000 nm usually result from electronic processes in iron oxides and OM; the well-defined absorption features around 1400 and 1900 nm are due to overtones of OH in minerals and water; the absorption peak near 2200 nm is attributed to OH-combination modes in clay minerals; and the features around 2315–2340 nm are attributed to overtones of CO₃²⁻ in carbonates. Where there is high organic carbon (OC) content, some absorption features across 2200–2500 nm are related to various organic groups [3, 4].

The SSLs are soil databases that mostly consist of reflectance spectra acquired by routine spectrometry using one protocol at each specific laboratory, accompanied by attributes of the soil samples measured using wet chemistry methods with agreed-upon protocols [2, 5]. SSLs have been established at local, regional, and even global scales. One well-established example is the land use/land cover area frame survey (LUCAS) [6] dataset, an extensive topsoil survey carried out across the European Union. The idea behind its development was to derive policy-relevant statistics on the effects of land management on soil characteristics. Soil spectra of approximately 20,000 topsoil samples were acquired in the range of 400–2500 nm by a steady, well-defined protocol, and the chemical analyses were performed in a single certified laboratory to avoid protocol uncertainty [7–9]. Another example is the Brazilian SSL (BSSL), which comprises around 30,000 spectra collected through partnerships at the national scale [10]. Whereas these examples can be considered regional SSLs, Viscarra Rossel et al. [1] provided the first example of a global SSL composed of 23,631 soil spectra contributed by 35 independent soil laboratories worldwide. To harmonize the global SSL (collected from SSLs using different protocols), Viscarra Rossel et al. [1] rectified noise effects using wavelet transformation, following the approach of Viscarra Rossel and Lark [11] to improve the performance of spectral-based models for soil properties characterization. The first online spectral service was recently published by Demattê et al. [12], who provided a user-friendly system for global soil spectra communication based on the BSSL and tested it with the spectra from 65 countries.

The accuracies of the harmonization platform of Demattê et al. [12] varied in the OC and texture predictions, as some examination cases showed good accuracies and others did not. Indeed, mathematical manipulations can contribute to relating SSLs from different laboratories, but it is still necessary to harmonize the initial “raw” data (non-preprocessed reflectance spectra) of the SSLs. Thus, the use of samples with different protocols (global SSL approach based on diverse and heterogeneous data) may lead to inaccuracies in the spectral-based modeling of soil attributes. This is because soil spectral information is susceptible to systematic measurement effects (i.e., the performance of sensors, the efficiency of optical fibers, and the illumination source) and nonsystematic effects (temperature, relative humidity, equipment operator, etc.) [13, 14].

Because SSLs vary due to these effects, recent studies have suggested using the internal soil standard (ISS) approach, where a well-agreed-upon soil sample is distributed across all laboratories and its reflectance spectrum, which is measured in a motherhood spectrometer, is used to rectify local measurements.

Accordingly, it is crucial to focus on agreed-upon standards and protocols for evolving SSLs [15]. This idea was first initiated by Pimstein et al. [16] and then polished by Ben-Dor et al. [15], who proposed two sandy samples from southern Australia to be considered as ISS samples. These samples were named after their locations of origin as Lucky Bay (LB) (33°59′ S; 122°13′E), 99% quartz, and Wylie Bay (WB) (33°49′S; 121°59′E), 90% quartz and 10% aragonite. Nowadays, many users recommend the ISS approach, and the review of SSLs from FAO GLOSOLAN reports that 61% of the world’s laboratories that answered the review are using the ISS procedure with the LB sample (https://www.fao.org/global-soil-partnership/glosolan/en/). Nonetheless, as many SSLs have already been established and measured without using the ISS approach, the discussed uncertainties within and between laboratories have not been corrected. Accordingly, as shown by Francos et al. [17], who examined the effect of the ISS in the spectral assessment of the clay content, there is a remaining issue with several existing large SSLs that cannot be adequately harmonized and merged with other SSLs. Nevertheless, Francos et al. [17] suggested a positive effect of the ISS protocol when it is used in the calibration and validation stages, as well as when executed on external unknown samples.

The possibility of merging two datasets from different origins has recently been examined in the mid-infrared (MIR) spectral region for OC content. Dangal and Sanderman [18] and Sanderman et al. [19] showed that applying a calibration transfer function (TF) to the SSLs in question enabled the merging of two SSLs from readings obtained by two different spectrometers. Moreover, Pittaki-Chrysodonta et al. [20] recently suggested that the spectral TF idea could be more efficient at minimizing the root mean square error (RMSE) in the spectral assessment of soil properties. It is interesting to note that Francos and Ben-Dor [21] presented a TF concept based on a random forest (RF) [22] algorithm to predict soil surface reflectance in the field using laboratory spectral measurements of Mediterranean soils from different countries (Italy, Israel, and Greece). Although these recent studies demonstrate the importance of the spectral TF process, it has only been applied to the MIR spectral region; it has not yet systematically been applied to the visible–near-infrared–shortwave infrared (VIS–NIR–SWIR) region (400–2500 nm). On the other hand, Seybold et al. [23] showed that using the same soil type (Mollisol in this case) enables the merging of two soil populations measured with different protocols using neither TF nor the ISS ideas.

The merging of SSL data from different sources is a hot topic because the soil science community has recognized the importance of SSLs in providing a global spatial view of selected soil properties [1]. Finding a way to postrectify existing SSLs to the ISS protocol may be another solution, and this can be done by selecting a group of samples from the SSLs and remeasuring them using the ISS protocol. The purpose of our study is to propose a method for post-ISS rectification using the TF concept to efficiently exploit the tremendous work that has already been done on SSLs worldwide and harmonize them accordingly.

In this study, we followed the approach of Francos and Ben-Dor [21], who demonstrated that it is possible to extract a TF between laboratory and field SSLs using RF. To that end, we developed a TF between selected samples from old, existing SSLs that were subjected to the ISS (LB) approach to rectify the entire old SSL. The idea of the present study was to check this approach’s ability to provide a fair harmonization of the already generated (old) SSLs using a TF developed with a subset of samples. A fair result would suggest that a TF based on the ISS could be a potential way to harmonize and merge the different SSLs that were already constructed with different methodologies. We thus proposed a systematic approach to extracting optimal harmonization based on the following assumptions:
(i) All SSLs in question were measured under a strict and stable protocol (although different and without the ISS procedure)
(ii) It is not practical to remeasure these SSLs with the ISS protocol (due to their large sizes)
(iii) Remeasuring a selected group of samples from these SSLs with the ISS protocol could provide a TF between the new ISS measurements and the old–original spectral measurements
(iv) The TF could be applied to all of the existing (old) SSL samples to rectify them, thereby improving SSL harmonization

2. Materials and Methods

2.1. Materials

For this study, we used two existing (old–original) large SSL datasets: LUCAS (sampling campaign of 2009–2012) [6] and BSSL (updated in 2020) [10], consisting of 19,036 and 29,363 samples, respectively. The first stage of this exercise was to set up an examination study that used 201 samples selected from both the LUCAS SSL (n = 124) [6] and the BSSL (n = 77) [10] to remeasure spectrally with and without the ISS protocol [15]. These samples were subjected to a new spectral measurement using the ISS protocol with LB as the ISS [15] at the University of Louvain-la-Neuve (for the LUCAS SSL) and at the University of São Paulo (for the BSSL). This manuscript refers to these samples as the “rectification group,” i.e., they were used to develop a spectral TF to predict the ISS rectification status and later harmonize the big SSLs that have been constructed independently in the past (old–original).

2.2. Chemical and Spectral Measurements

The soil attribute selected to carry out the analytical stage was OC. In both SSLs (the BSSL and the LUCAS), the soil OC content was measured using the same dry combustion method (ISO 10694:1995).

The old–original samples and the rectification group (new measurements) of the BSSL represent Brazilian soils and were measured using an analytical spectral devices (ASD) FieldSpec® (model FSP 350-2500P) with Halon (Spectralon, Labsphere Inc., North Sutton, NH, USA) as the white reference to extract the spectral reflectance. The spectroradiometer has the following three detectors that supply 2151 bands in the 350–2500 nm spectral range: VIS–NIR (350–1000 nm), SWIR1 (1001–1800 nm), and SWIR2 (1801–2500 nm), with a spectral resolution of 3 nm in the VIS–NIR and 8 nm in the SWIR (1 + 2) spectral regions. Each soil sample was measured three times, with 25 scans acquired per spectral measurement.

The old–original samples of the LUCAS SSL differed from those of the BSSL in that they represent only European soils and because they were measured with a different protocol [16, 24]. The old–original samples and the rectification group (new measurements) of the LUCAS SSL were measured using a FOSS XDS spectrometer with a 0.5 nm sampling interval, composed of a dual detector system for silicon (400–1100 nm) and lead sulfide (1100–2500 nm). The output spectrum of the FOSS XDS spectrometer contains 4202 spectral bands. Every spectral measurement was a product of the average of two spectral readings in which two scans were obtained for each spectral measurement. The measurements of the LUCAS SSL were performed following the FOSS XDS manual. As the LUCAS SSL is composed of spectral absorbance measurements, it was converted to reflectance before preprocessing using the following equation:

$$R = \frac{1}{10^{A}} \qquad (1)$$

where R represents the reflectance value and A is the apparent absorbance.
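The absorbance-to-reflectance conversion can be illustrated with a minimal sketch. It assumes the standard relation A = log10(1/R), so that R = 10^(-A); the function name and the example values are hypothetical and are not taken from the LUCAS data.

```python
import numpy as np

def absorbance_to_reflectance(absorbance):
    """Convert apparent absorbance A = log10(1/R) to reflectance R = 10**(-A)."""
    a = np.asarray(absorbance, dtype=float)
    return np.power(10.0, -a)

# Hypothetical absorbance values for three bands (illustration only).
example_absorbance = np.array([0.0, 0.3010, 1.0])
example_reflectance = absorbance_to_reflectance(example_absorbance)
```

An absorbance of 0 corresponds to a reflectance of 1, and each additional absorbance unit lowers the reflectance by a factor of ten.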

The new spectral measurements of the LUCAS SSL and BSSL (rectification groups) were conducted using the following two methods: (i) following the old–original protocol without ISS and (ii) according to the standard and protocols suggested by Ben-Dor et al. [15], using the LB ISS to align the soil spectral readings to a benchmark, thereby, minimizing systematic errors within the spectral measurement scheme.
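As an illustration of how an ISS alignment of this kind can work, the sketch below applies a per-wavelength correction factor derived from the LB sample. The formula is not stated in this text; it is an assumption based on the common formulation of the ISS protocol (see [15] for the exact procedure), and all names and values are hypothetical.

```python
import numpy as np

def iss_correct(sample_reflectance, lb_local, lb_master):
    """Align a locally measured spectrum to the master LB benchmark.

    Assumed correction factor per wavelength:
        CF = 1 - (LB_local - LB_master) / LB_local
    which is algebraically equal to LB_master / LB_local.
    """
    lb_local = np.asarray(lb_local, dtype=float)
    lb_master = np.asarray(lb_master, dtype=float)
    correction_factor = 1.0 - (lb_local - lb_master) / lb_local
    return np.asarray(sample_reflectance, dtype=float) * correction_factor

# Hypothetical LB spectra over three bands (illustration only).
lb_local_demo = np.array([0.50, 0.80, 0.70])
lb_master_demo = np.array([0.55, 0.76, 0.70])
# Correcting the local LB reading itself should reproduce the master LB spectrum.
lb_corrected_demo = iss_correct(lb_local_demo, lb_local_demo, lb_master_demo)
```

Under this assumption, any band where the local and master LB readings agree is left unchanged, so only systematic offsets between laboratories are rescaled.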

2.3. Operational Methods and Workflow

The rectification group samples of the BSSL (new and old–original) and the LUCAS SSL (new and old–original) were subset to a 10-nm sampling interval, as done by different scholars [1, 25], to enable (later) merging the SSLs, reduce processing time, and avoid multicollinearity between neighboring wavelengths. Then, TFs were developed and examined against the new group rectified by ISS in the following two ways:
(i) New rectification group: created using the new spectral measurements of the rectification groups without ISS rectification as input data (independent variables), and the new spectral measurements of the rectification groups after the ISS rectification per wavelength as output data (dependent variables). We refer to these TFs as TFs1.
(ii) Old–original rectification group: created using the old–original spectral measurements of the rectification groups without ISS rectification as input data (independent variables) and the new spectral measurements of the rectification groups after the ISS rectification per wavelength as output data (dependent variables). We refer to these TFs as TFs2.
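The 10-nm subsetting step can be sketched as follows. This is a minimal illustration that picks the nearest measured band for each target wavelength; the 400–2500 nm target grid is an assumption (the range shared by both instruments), and the function name and synthetic spectra are hypothetical.

```python
import numpy as np

def subset_to_interval(wavelengths, spectra, step_nm=10.0,
                       start_nm=400.0, stop_nm=2500.0):
    """Keep only the bands nearest to a regular grid (assumed 400-2500 nm)."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    target = np.arange(start_nm, stop_nm + step_nm / 2.0, step_nm)
    # For every target wavelength, find the index of the closest measured band.
    nearest = np.abs(wavelengths[:, None] - target[None, :]).argmin(axis=0)
    return wavelengths[nearest], spectra[:, nearest]

# ASD-like band grid: 350-2500 nm at 1 nm gives 2151 bands.
asd_wavelengths = np.arange(350, 2501)
# Synthetic reflectance values for three samples (illustration only).
synthetic_spectra = np.random.default_rng(0).uniform(
    0.05, 0.6, size=(3, asd_wavelengths.size))
wl_10nm, spectra_10nm = subset_to_interval(asd_wavelengths, synthetic_spectra)
```

Applying the same target grid to both libraries gives spectra with identical band positions, which is what makes the later merge possible.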

Later, the developed TFs were applied to all of the samples of the original SSLs that underwent a similar resampling procedure to 10 nm. Figure 1 provides a schematic outline of the stages performed to rectify the SSLs using TFs1 and TFs2. Note that each TF (TFs1 and TFs2) is composed of two different independent TFs that were developed for each SSL separately (LUCAS and BSSL). Both TFs aim to predict the ISS rectification of the new measurements in the rectification groups.

The old measurements of the rectification group refer to the old–original measurements of the whole SSLs in their respective sampling campaigns. For the rectification samples, the old–original measurements represent the LUCAS SSL 2009–2012 and the BSSL 2019 sampling campaign measurements of exactly the same samples from the rectification group that underwent the new measurements.

On the other hand, the new measurements refer to a new dataset that was created with a selected group of samples from both SSLs (LUCAS and BSSL). These new measurements cannot be considered an integral part of the original SSLs and were performed by acquiring LB soil measurements to later apply the ISS protocol. Thus, from these new measurements, it is possible to extract two outputs: one with ISS and one without ISS; the latter can also be termed “raw measurements” as they were not ISS-rectified.

Table 1 provides a list of the examined datasets, including their descriptions and a reference number. For all scenarios described in Table 1, spectral-based models were developed to predict OC content and used to evaluate the TF approach’s influence on the OC spectral assessment. In Sections 2.3.1–2.3.3, we introduce the different scenarios under which the effect of the TF concept was examined, and in Section 2.4, the spectral modeling of OC is addressed.

2.3.1. Rectification Samples: The New Group (Datasets 1, 2, and 3)

The rectification samples of the BSSL (n = 77) and the LUCAS SSL (n = 124) were subjected to new measurements and were corrected to the motherhood LB measurement at the CSIRO Perth laboratory (“new group”) (see [15] for the exact procedure). The new group was used to generate the TFs1 for each SSL, providing samples that underwent the ISS protocol as the target.

After these new spectral measurements, the ISS spectrum was predicted by a “loop,” in which the number of models is equal to the number of bands to be used. Thus, this concept takes advantage of the whole spectral range of each soil sample to predict every rectified band. Finally, the correction factor was calculated against the master LB soil to recalibrate each spectral measurement. The RF algorithm [22] was used in each prediction model for this task. Thus, three datasets were examined at this stage: one with ISS correction (number 1 in Table 1), one without TF and without ISS correction (number 2 in Table 1), and one after applying TFs1 corrections, which were developed using the new spectral measurements as predictors (number 3 in Table 1).
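The per-band "loop" can be sketched as follows: one RF regressor is trained for each output band, and every regressor receives the whole raw spectrum as predictors. This is a minimal sketch under stated assumptions, not the authors' implementation; it uses scikit-learn's `RandomForestRegressor`, and the hyperparameters, function names, and synthetic spectra are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_transfer_function(raw_spectra, iss_spectra, n_estimators=20, random_state=0):
    """Train one RF model per ISS-rectified band, using the full raw spectrum."""
    raw_spectra = np.asarray(raw_spectra, dtype=float)
    iss_spectra = np.asarray(iss_spectra, dtype=float)
    models = []
    for band in range(iss_spectra.shape[1]):
        model = RandomForestRegressor(n_estimators=n_estimators,
                                      random_state=random_state)
        model.fit(raw_spectra, iss_spectra[:, band])
        models.append(model)
    return models

def apply_transfer_function(models, raw_spectra):
    """Predict the ISS-like spectrum band by band and reassemble it."""
    raw_spectra = np.asarray(raw_spectra, dtype=float)
    return np.column_stack([model.predict(raw_spectra) for model in models])

# Synthetic stand-in for a rectification group: 40 samples, 8 bands.
rng = np.random.default_rng(0)
demo_raw = rng.uniform(0.1, 0.6, size=(40, 1)) + np.linspace(0.0, 0.2, 8)
demo_iss = 1.05 * demo_raw + 0.01  # hypothetical systematic offset
demo_models = fit_transfer_function(demo_raw, demo_iss)
demo_predicted = apply_transfer_function(demo_models, demo_raw)
```

In this arrangement, a TFs1-style function would be fitted with the new raw measurements as `raw_spectra`, while a TFs2-style function would use the old–original measurements instead, with the same ISS-rectified targets.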

2.3.2. Rectification Samples: Old–Original Group (Datasets 4, 5, and 6)

In Table 1, we refer to the old–original group of the rectification samples before any treatment as number 4. To study the feasibility of a post-ISS rectification process of SSLs created in the past, the TFs generated with the new group (TFs1) were executed on the old–original group. In Table 1, we refer to the product of this analysis as number 5. Then, new TFs (TFs2) were developed using the old group without ISS as predictors to estimate the ISS condition of the new group. The product of this analysis is referred to as number 6 in Table 1.

2.3.3. Old-Original SSLs (Datasets 7, 8, and 9)

At this stage, we executed the TFs, which were developed using the new and old–original rectification groups (TFs1 and TFs2, respectively), on the old–original SSLs. As the OC content of the rectification group ranged between 0 and 180 g/kg, we selected all of the old–original SSL samples whose OC content was lower than 180 g/kg. After this stage, we ended with 47,224 samples (17,861 for LUCAS and 29,363 for BSSL) out of the 48,399 samples in both SSLs in total. Initially, the LUCAS SSL and the BSSL contained 19,036 and 29,363 samples, respectively. Accordingly, none of these SSLs were significantly diluted by this procedure.
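The OC threshold and the resulting sample accounting can be checked with a short sketch. The sample counts are those reported above; the function name and the example OC values are hypothetical.

```python
import numpy as np

def oc_mask(oc_g_per_kg, max_oc=180.0):
    """Boolean mask of samples whose OC content is below the threshold."""
    return np.asarray(oc_g_per_kg, dtype=float) < max_oc

# Hypothetical OC values (g/kg) to illustrate the filter.
demo_mask = oc_mask([5.2, 179.9, 180.0, 310.4])

# Sample counts as reported in the text.
lucas_total, bssl_total = 19036, 29363
lucas_kept, bssl_kept = 17861, 29363
merged_total = lucas_total + bssl_total
merged_kept = lucas_kept + bssl_kept
retained_fraction = merged_kept / merged_total
```

All removed samples come from the LUCAS SSL, since the BSSL count is unchanged by the threshold.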

In Table 1, we refer to this group before executing any TF as number 7. The output dataset after execution of the TFs1 generated using the new group is referred to as number 8 in Table 1, and the result of executing the TFs2 developed using the old–original group is referred to as number 9.

2.4. Spectral Assessment of OC

After executing TFs1 and TFs2 with the rectification group (including the new and old–original measurements) and the original entire SSLs, all merged datasets were randomly grouped into separate calibration (cal) and validation (val) groups with similar distributions of OC (g/kg), as checked using the Kolmogorov−Smirnov (KS) test, where a high degree of similarity was obtained [26]. For the cal/val random sampling of the rectification group, the same 180 (cal) and 21 (val) samples were used to examine all of the scenarios, representing 90% for calibration and 10% for validation with similar OC (g/kg) content distributions according to the KS test. The same procedures were applied for the original SSLs, where 42,501 samples were used for calibration and 4,723 for validation, using the TFs1 and TFs2 generated from the new measurements of the rectification group as well as without any treatment (simple merging of the old–original SSLs). The performance of the TF correction was further examined by judging the accuracy of the spectral-based models for estimating OC content in the validation stages. The RMSE, ratio of performance to interquartile distance (RPIQ) [27], and R2 values were calculated to compare the scenarios. To generate the spectral-based models of OC, the RF algorithm [22] was used.
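The three validation metrics can be sketched as follows. This is an illustration under stated assumptions: R2 is computed as the coefficient of determination (the text does not say whether it is this or the squared correlation), RPIQ is taken as the interquartile range of the observed values divided by the RMSE, and the function names and example values are hypothetical.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error, in the units of the observations (g/kg for OC)."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def r2(observed, predicted):
    """Coefficient of determination (assumed definition of R2)."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rpiq(observed, predicted):
    """Interquartile range of the observations divided by the RMSE."""
    q25, q75 = np.percentile(np.asarray(observed, float), [25, 75])
    return float((q75 - q25) / rmse(observed, predicted))

# Hypothetical OC values (g/kg) for four validation samples.
demo_observed = [10.0, 20.0, 30.0, 40.0]
demo_predicted = [12.0, 18.0, 33.0, 39.0]
demo_scores = {
    "RMSE": rmse(demo_observed, demo_predicted),
    "R2": r2(demo_observed, demo_predicted),
    "RPIQ": rpiq(demo_observed, demo_predicted),
}
```

Because RPIQ scales the error by the spread of the observed values, it allows comparison between validation sets with different OC ranges, which RMSE alone does not.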

The methodological phases of this study are summarized in the flowcharts in Figure 2. The first flowchart (Figure 2(a)) shows the steps taken to generate the TFs and predict the OC with the rectification groups. Then, Figure 2(b) presents the steps for modeling OC content using the original SSLs after executing the TFs.

2.5. TF Validation by Spectral Similarity

In addition to the OC proximal modeling analysis, we examined the spectral similarities between the different treatments against the measurements performed with the ISS protocol in the rectification groups. To validate the spectral similarity between the old–original and rectified (no ISS, ISS, and after TF) spectra, two methods were examined: the average sum of deviations squared (ASDS) [28] and the spectral angle mapper (SAM) [29]. For both indicators, lower values indicate higher spectral similarity. The ASDS equations are as follows:

$$Rr_{n} = \frac{Rt_{n}}{Rtr_{n}} \qquad (2)$$

$$\mathrm{ASDS} = \frac{\sum_{n} \left( Rr_{n} - 1 \right)^{2}}{n} \qquad (3)$$

The SAM values are calculated as follows:

$$\mathrm{SAM} = \cos^{-1}\left( \frac{\sum_{n} Rt_{n}\, Rtr_{n}}{\sqrt{\sum_{n} Rt_{n}^{2}}\, \sqrt{\sum_{n} Rtr_{n}^{2}}} \right) \qquad (4)$$

where Rtn is the examined reflectance spectrum, Rtrn is the reference spectrum of the same target, Rrn is the ratio between the examined and reference spectra, and n is the number of wavelengths used.
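Both similarity indicators can be sketched in a few lines. The sketch assumes the usual definitions: ASDS as the mean squared deviation from 1 of the band-wise ratio between the examined and reference spectra, and SAM as the angle (in radians) between the two spectra treated as vectors. The function names and example spectra are hypothetical.

```python
import numpy as np

def asds(examined, reference):
    """Average sum of deviations squared of the examined/reference ratio."""
    ratio = np.asarray(examined, dtype=float) / np.asarray(reference, dtype=float)
    return float(np.mean((ratio - 1.0) ** 2))

def sam(examined, reference):
    """Spectral angle (radians) between two spectra treated as vectors."""
    e = np.asarray(examined, dtype=float)
    r = np.asarray(reference, dtype=float)
    cosine = np.dot(e, r) / (np.linalg.norm(e) * np.linalg.norm(r))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cosine, -1.0, 1.0)))

# Hypothetical reference spectrum and a copy scaled by a constant factor.
demo_reference = np.array([0.20, 0.30, 0.40, 0.35])
demo_scaled = 2.0 * demo_reference
demo_asds_same = asds(demo_reference, demo_reference)
demo_asds_scaled = asds(demo_scaled, demo_reference)
demo_sam_scaled = sam(demo_scaled, demo_reference)
```

The scaled example shows why the two indicators are complementary: SAM is insensitive to a constant brightness factor and responds only to changes in spectral shape, whereas ASDS also penalizes differences in overall albedo.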

3. Results

3.1. Rectification Samples: The New Group

For the new measurements, we examined two scenarios as follows:
(i) Corrected (ISS) vs. noncorrected samples
(ii) Corrected (ISS) vs. TFs1-corrected samples

Figure 3 shows six representative cases that were randomly selected to illustrate the performance of the TFs1 executions in the rectification group. Samples 0, 5, and 13 were selected from the BSSL, and samples 85, 100, and 83 from the LUCAS SSL; these samples were also used to check the TFs2’s performance later in the different examined scenarios.

The TFs1 spectra in red, the corrected (ISS) spectra in blue, and the measured spectra before the ISS rectification in black are provided. The ISS-corrected spectral signatures are similar to the TFs1 ones, indicating that the TFs1 worked quite well for producing spectral signatures that are highly similar to the ISS-rectified spectra.

To better judge the TFs1 against the basic spectra of all groups’ samples (before any transformation), we calculated two metrics, SAM and ASDS (eqs. 2, 3, and 4), before and after executing the TFs generated with the new measurements of the rectification groups; the results are presented in Figure 4 and Table 2. Whereas Figure 4 shows the distributions of the ASDS and SAM values, Table 2 highlights the means of the SAM and ASDS values. In Figure 4 and Table 2, the relationship between the corrected (ISS) and TFs1 spectra provided considerably lower ASDS and SAM values, thus indicating that the TFs’ contribution to each of the (new) SSLs is considerable. After merging the TF-corrected soils from both SSLs (BSSL and LUCAS), lower ASDS and SAM values were obtained in the “corrected vs. TF” relationship.

3.2. Rectification Samples: The Old–Original Group

Figure 5 illustrates some corrections of the TFs1 (trained with new measurements) after their execution on the old–original measurements of the rectification samples. Since these spectral measurements did not undergo the direct ISS protocol but were subjected to TFs1 for a post-ISS rectification, these spectral measurements were not considered a part of the TFs1 calibration; they are presented in Figure 5 just to show how TFs1 regenerates the post-ISS-rectification. The TFs1 rectification yielded reasonable spectral signatures that were very similar to direct ISS rectification, with the spectra retaining their original shape and no “new” spectral features encountered.

Given these results, we decided to generate a new TF for each SSL (termed TFs2), in which the predictors were selected from the old–original spectral measurements of the rectification groups to obtain the hypothetical ISS rectification. At this stage, we compared the predicted ISS rectification using the old measurements to the new measurements that underwent the ISS protocol. Figure 6 and Table 3 illustrate the relationship between the new measurements corrected using the ISS protocol and the old–original measurements subjected to the TFs2. This relationship provided higher ASDS and SAM values than the other relationship that compared the new measurements before and after the execution of the TFs1. Nevertheless, these results again indicated the positive effect of the TFs in each SSL, even using the old–original samples for the training stage of the TFs2, at least when these TFs are applied to the same old–original measurements.

Figure 7 shows the same six cases presented in Figures 3 and 5, but after executing TFs2. The new spectral measurements that were corrected using the ISS protocol (blue) are quite similar to the old–original measurements that underwent TFs2 (red), showing that effective TFs could also be generated using the old–original spectral measurements as predictors. In general, after the execution of TFs2, the samples of the LUCAS SSL presented a substantial correction and a better match with the original ISS-corrected samples. This correction was less significant for the BSSL, and in one representative case (Sample 13 (BSSL)), the TF correction worked poorly. This demonstrates that outliers in the TF procedure can occur, especially when the original and ISS-corrected spectra are similar.

As TFs2 was created using measurements that were not precisely derived from the exact measurements as done in the calibration stage of TFs1, TFs2 may present lower performance.

Figure 8 illustrates the results of the spectral-based models for OC content in their validation stage (n = 21) using the new and old–original spectral measurements of the rectification group. It also shows the new group after the ISS correction and the new and old–original groups after the execution of TFs1. The execution of TFs1 on the new measurements of the rectification group provides higher accuracies (RPIQ = 1.25, R2 = 0.76, RMSE (g/kg) = 13.07) than the model created with the same new measurements without any manipulation (RPIQ = 0.93, R2 = 0.58, RMSE (g/kg) = 17.46). As expected, the new measurements that underwent the ISS protocol provided the highest accuracies (RPIQ = 1.31, R2 = 0.79, RMSE (g/kg) = 12.44).

Additionally, Figure 8 indicates the OC contents predicted by the spectral-based model in its validation stage (n = 21), using the old–original measurements after the execution of TFs1 and TFs2 as predictors. By comparing the performances of TFs1 and TFs2 against the old–original measurements that did not undergo the ISS rectification (Figure 8), we can also see a considerable improvement: the nonmanipulated measurements showed the poorest accuracies (RPIQ = 0.92, R2 = 0.55, RMSE (g/kg) = 17.78), followed by TFs1 (RPIQ = 1.11, R2 = 0.71, RMSE (g/kg) = 14.65) and TFs2 (RPIQ = 1.18, R2 = 0.75, RMSE (g/kg) = 13.77).

Thus, the results presented in Figure 8 can be summarized by the following two marked patterns:
(i) New measurements: R2 (before TFs1) < R2 (TFs1) < R2 (ISS)
(ii) Old measurements: R2 (before TFs1 and TFs2) < R2 (TFs1) < R2 (TFs2)

3.3. Original SSLs

At this stage, the TFs (TFs1 and TFs2) generated for each SSL (LUCAS and BSSL) using the rectification groups were executed on the original SSLs that had been measured in the past, restricted to samples with OC content lower than 180 g/kg. Figure 9 shows the validation results of the spectral-based models created using the merged SSLs before and after applying the TFs, where TFs1 provided the best accuracies (RPIQ = 1.07, R2 = 0.69, RMSE (g/kg) = 11.13), followed by the non-TF correction (RPIQ = 0.96, R2 = 0.61, RMSE (g/kg) = 12.46), and then by TFs2 (RPIQ = 0.73, R2 = 0.33, RMSE (g/kg) = 16.21), which presented the poorest accuracies.

3.4. Spectral Similarity Analysis between the Old–Original and New Raw Spectral Measurements

Figure 10 shows the R2 and RMSE values per wavelength for the relationship between the old–original vs. new measurements of the rectification samples. At this stage, both SSLs showed very high spectral similarity.

However, as can be seen from the R2 and RMSE values, the samples of the LUCAS SSL showed higher spectral similarity than those of the BSSL, suggesting that the FOSS spectrometer and the LUCAS SSL protocol are stable over time. This observation is further confirmed in Figure 11, which presents the spectral signatures of six examples and supports the high stability of the FOSS spectrometer in space and time.

These results indicated that, in some cases, nonsystematic and systematic effects in the spectral measurements or sample preparation can be detrimental to the spectral signatures: the old–original and new measurements were highly similar in the LUCAS SSL, whereas in the BSSL they were slightly less similar. This strengthens the need to use the ISS procedure when merging data from different laboratories, as they do not all preserve the same quality. Still, it is necessary to mention that high spectral similarity was also encountered in the new vs. old–original comparison of the BSSL measurements.

4. Discussion

Today, many users strongly recommend using the ISS approach; 61% of the world’s laboratories are using the ISS procedure with LB, as reported by Fenny van Egmond in the GLOSOLAN review (https://www.fao.org/global-soil-partnership/glosolan/en/). The ISS approach works well for the new SSLs that are being generated. Users have recognized this issue and have recently established a working group (the P4005, https://sagroups.ieee.org/4005/) within the Institute of Electrical and Electronics Engineers Standards Association (IEEE SA) that aims to provide an ISO protocol for soil spectral measurements. Although SSLs of different origins have already been harmonized using mathematical manipulations [1], this evaluation has never been done to regenerate the laboratory protocol.

Following the approach of Francos and Ben-Dor [21], who extracted a TF between the laboratory and field levels to improve the spectral assessment of a soil surface-dependent property (water infiltration rate), the present study developed TFs between selected samples from old existing SSLs (LUCAS and BSSL) that underwent the ISS (LB) protocol.

Post-ISS rectification of old–original SSLs is possible, enabling the merging of SSLs from different origins. Although the accuracy is not equal to that obtained by a direct ISS measurement, the TF concept does improve the OC predictions in merged SSLs, as was shown with the rectification groups. The results obtained from the quality indicators (ASDS and SAM) showed high spectral similarities between the ISS and ISS–TF spectral observations for both TFs1 and TFs2 (shown in Figures 4 and 6, respectively).

Machine-learning TFs were developed using rectification groups that were remeasured with and without ISS. Then, the extracted TFs were applied to the original SSLs, and improvement in OC prediction was noticeable in a validation group that examined the performance of the merged SSLs before and after TF rectification. Nevertheless, this improvement was manifested only with TFs1; TFs2 provided unsatisfactory OC predictions (shown in Figure 9).

A reasonable explanation could be the existence of a more direct relationship between the new measurements before and after employing the ISS: within this relationship, TFs1 is trained on exactly the same samples, maintaining the same conditions before and after the ISS rectification. On the other hand, as TFs2 were trained using old measurements as predictors, the factors (systematic effects) that affected the previous spectral measurements would not be perfectly matched by the new measurements after the ISS correction. Thus, TFs2 could improve the OC assessment when it was executed on the old measurements of the rectification group, but it failed when executed on all of the samples of the original SSLs. To view this explanation from a different angle, we can consider two simple comparisons:
(a) New (no ISS) vs. new (ISS)
(b) Old (no ISS) vs. new (ISS)

In comparison (a), the same spectral measurement is used for each rectification sample, once without and once with the traditional ISS correction. This relationship is therefore much more direct than in comparison (b), because the old (no ISS) measurements were performed years ago by different users (with different experience and skills) and under different maintenance conditions for the spectrometer used. Thus, a TF created with new measurements of a selected rectification group generates a more direct transition between the raw and ISS spectra and may perform better for samples that are not part of the rectification group. In the case of TFs2, on the other hand, this transition can represent only the rectification group and may not be applicable to the whole dataset. Although the ISS rectification may be a suitable tool to harmonize SSLs, as suggested by Francos et al. [17], it is probably not perfect, as the corrected spectrum still contains part of the raw spectral information, which would explain the closer relationship in comparison (a).

As shown in Figure 9, the improvement in the prediction capability of the merged SSLs with the TFs1 transformation (from R2 = 0.61 to R2 = 0.69), set against the deterioration with TFs2 (from R2 = 0.61 to R2 = 0.33), demonstrates that the TF approach can also degrade the harmonization results and must therefore be applied with caution. Accordingly, we still recommend applying both TF approaches to the original SSLs and selecting the TFs with the best performance.
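
The selection step recommended above can be sketched as follows. This is a minimal illustration that assumes R2 is computed as the coefficient of determination on a common validation group; the function names and all OC values (g/kg) are made-up toy numbers, not results from this study.

```python
import math


def r2_score(observed, predicted):
    """Coefficient of determination of predictions against observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def select_best(observed, predictions_by_option):
    """Score every harmonization option and return the name with the highest R2."""
    scores = {name: r2_score(observed, pred)
              for name, pred in predictions_by_option.items()}
    return max(scores, key=scores.get), scores


# Toy validation group: observed OC and predictions from three merged libraries.
observed_oc = [5.0, 10.0, 20.0, 40.0]
predictions = {
    "no_tf": [8.0, 14.0, 15.0, 33.0],   # merged without any TF
    "TFs1": [6.0, 11.0, 18.0, 37.0],    # merged after TFs1
    "TFs2": [20.0, 20.0, 20.0, 20.0],   # merged after TFs2
}
best, scores = select_best(observed_oc, predictions)
print(best, {name: round(value, 3) for name, value in scores.items()})
```

Keeping the uncorrected merge ("no_tf") among the candidates ensures that a TF is adopted only if it actually outperforms the original harmonization.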

It is important to mention that this study used just a portion of two important and large SSLs for the rectification groups. We believe that with a bigger rectification group, the calculated TFs would have been even better. Finding the threshold for the optimal number of samples in the rectification group is a topic for future studies, which should take the following two points into account:
(i) It is not practical to remeasure entire large SSLs
(ii) An optimal number of samples in the rectification group should be used for practical purposes

The experiments performed in our study showed that it is better to use, as predictors for the TFs, the same measurements that subsequently underwent ISS rectification, in their previous condition (without preprocessing), as in TFs1. More generally, the results obtained in this study reaffirm the observations of Dangal and Sanderman [18] and Sanderman et al. [19], who suggested that it is possible to merge SSLs that are measured with different protocols by using TFs.

We can conclude that post-ISS rectification is possible, where a selected group of samples (rectification group) is measured again with and without the ISS protocol to then generate a TF. With the TF procedure, we can expect a considerable improvement in the spectral-based OC content assessment. To summarize, the TFs1 exercise presents a practical case for post-ISS rectification of existing SSL samples, but a similar exercise with different soil attributes (e.g., total nitrogen or textural parameters such as sand, silt, and clay), different rectification groups, and SSLs, along with an examination of the stability of the soil constituent in question, is highly recommended to recheck the TFs2 variation in future works.

However, caution must be taken when transferring SSL protocols with TFs. If one SSL (e.g., Brazilian SSL) is transferred to regenerate the protocol used by a different SSL (e.g., LUCAS), we believe that poor results will be obtained. Although this was not examined in the present study, we assume that such an approach will fail as systematic effects during the spectral measurements will not be correctly rectified. Of course, this will depend on the standards and protocols selected for each SSL and requires further investigation.

5. Summary and Conclusions

In this exercise, random forest (RF)-based transfer functions (TFs) were developed to rectify old SSLs with the ISS protocol, toward harmonizing the old SSLs without remeasuring all of their samples, the latter being impractical due to the SSLs’ large size. To this end, two TF scenarios were used: TFs1, calibrated with new nonrectified samples, and TFs2, calibrated with old nonrectified samples. In our exercise, TFs1 considerably improved the prediction of soil OC content in the merged old SSLs, whereas TFs2 provided the least promising prediction. However, we recommend that in all cases, both TFs1 and TFs2 be run in any post-ISS correction of old SSLs, and the best transformation be selected relative to the original noncorrected harmonized SSLs. More studies are required to determine the optimal number of samples to select for the rectification group that will best represent the SSLs without unduly increasing the workload. We strongly suggest that the ISS procedure be considered as part of any laboratory measurement to track possible systematic effects in generating SSLs. Caution must also be taken with respect to the stability of the wet-chemical analysis, which also affects SSLs over the elapsed time; this was beyond the scope of this paper and warrants a separate study.

Data Availability

The data used to support the findings of this study have not been made available.

Additional Points

(i) Two SSLs created using different protocols and instrumentation were harmonized. (ii) The ISS protocol was used to merge the SSLs and improve soil OC assessment. (iii) The harmonization improved soil OC assessment using two different SSLs. (iv) A transfer function to harmonize SSLs with different protocols was developed.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partially funded by the “WORLD SOILS” project supported by the European Space Agency developed within the EO Science for Society slice of the 5th Earth Observation Envelope and by the ProbeField project in the framework of the H2020 European Joint Programme for SOIL.