Abstract

Soluble solids content (SSC) is a key index for evaluating the internal quality of apples, and near-infrared (NIR) spectroscopy is a preferred technique for predicting apple SSC. Because of differences in fruit size, SSC prediction models have poor robustness and low prediction accuracy, so eliminating the effect of fruit size differences is important for improving the accuracy of fruit sorting models. In this study, NIR spectra of apples with different fruit sizes were collected with an online NIR detection device. After various spectral preprocessing methods were applied, partial least squares (PLS) models of apple SSC were established for each size group. The modeling set of the 75 mm–85 mm group was then used to predict the prediction sets of the 65 mm–75 mm and 85 mm–95 mm groups. To better address the effect of apple size differences, a data fusion technique was used to perform intermediate (mid-level) fusion of fruit diameter and spectra. First, competitive adaptive reweighted sampling (CARS) and the successive projections algorithm (SPA) were used to select spectral variables, and apple SSC prediction models were built on each selection. The models built with the 61 spectral variables selected by CARS performed better: the selection greatly reduced the amount of data involved in modeling, simplified the model, and improved its stability.
The apple size variable was then added to the wavelength variables selected by CARS, the data were normalized, and a PLS model of apple SSC was established on the normalized spectral and fruit diameter data. The size compensation model based on intermediate fusion gave the best prediction performance: for fruit diameters of 65 mm–75 mm, the prediction set Rp was 0.886 and RMSEP was 0.536%; for fruit diameters of 85 mm–95 mm, Rp was 0.913 and RMSEP was 0.497%. Therefore, adding the fruit diameter variable to build a size-compensated model of apple SSC can improve the prediction performance of the model.

1. Introduction

Apples are rich in vitamins and organic acids, and eating apples can help relieve fatigue and improve brain vitality. As living standards rise, the demand for high-quality apples is also increasing. As a fast, nondestructive, and green inspection method, online near-infrared spectroscopy has been applied to detect the internal quality of fruits such as apples, strawberries, citrus, pears, and watermelons [1–4]. However, differences in apple fruit diameter affect the performance of the resulting SSC models, so a size-compensated SSC model for apples is needed.

Researchers in China and abroad have done much research on the internal quality of fruits using NIR spectroscopy. Guo et al. [5] built an online detection system for apple heart rot based on NIR transmission, and the correlation coefficient of their prediction model was 0.92. Liu et al. [6] created an online NIR diffuse reflectance detection model for the SSC of navel oranges, with a prediction correlation coefficient of 0.90. Li et al. [7] built online nondestructive testing equipment for apples based on NIR spectroscopy and established an SSC prediction model whose correlation coefficient reached 0.949, with a prediction set root mean square error of 0.449. Han et al. [8, 9] used NIR transmission spectroscopy combined with a band screening method to discriminate two apple diseases, and the accuracy of their discriminant model reached 95.7%. Xu et al. [10] compared the effect of single-point and double-point detection on the accuracy of online detection of apple SSC: the double-branched fiber system showed better robustness, while the single-branched fiber system gave higher accuracy, with a prediction set coefficient of 0.63. These studies did not consider the effect of apple fruit diameter on the model, and the performance of the resulting models was limited [7, 8, 10]. Liu et al. [11] established NIR detection models for navel oranges of different sizes and found that MSC and SNV pretreatment can reduce the influence of fruit size differences and improve the accuracy of the prediction model. Tian et al. [12] established a discriminant model for moldy core in apples of different sizes based on NIR spectra. They found that NIR spectral intensity was exponentially related to the optical path length, corrected the NIR spectra accordingly, and built the model on the corrected spectra.
The prediction set discrimination accuracy reached 90.2%, showing that the method can correct the effect of fruit size on the transmission spectrum and improve the identification of diseased apples. McGlone et al. [13] developed two prototype online NIR systems, one based on a time-delay integration spectrometer and the other on a large-aperture spectrometer. The latter system had higher accuracy, with a root mean square error of prediction of 4.1% after PLS calibration. However, only apples with a mean equatorial diameter of 76 mm (SD = 2.8 mm) were used in the experiment, and the effect of fruit size on the detection of browning tissue was not investigated. Qin and Lu [14, 15] quantified light transmission in apples using Monte Carlo simulations and corrected the diffuse reflectance spectrum according to fruit size to eliminate the light intensity distortion caused by the curved fruit surface. In this paper, we collected NIR spectra of apples of different sizes and established models with various preprocessing methods, mixed-size models, and data fusion-based apple size compensation models. We compared the prediction performance of these three types of models to find the best way to eliminate the effect of apple size differences on model performance.

2. Materials and Methods

2.1. Experimental Materials

A total of 480 apple samples were purchased from an orchard in Yantai, Shandong Province: 160 with fruit diameters of 65 mm–75 mm, 160 of 75 mm–85 mm, and 160 of 85 mm–95 mm. Upon arrival, the apples were wiped with a wet paper towel to remove surface dust, which would otherwise scatter light and distort the transmission spectra, and were kept in a room at 25°C for 24 hours before their spectra were collected.

2.2. Experimental Device and Spectrum Acquisition

The near-infrared spectrum acquisition device used in this paper is a dynamic online diffuse transmission detection device developed by our group [16], as shown in Figure 1. The light source consists of two rows of halogen lamps, with 5 lamps per row (10 in total). Each lamp is rated at 12 V and 100 W and provides illumination for collecting spectral information in diffuse transmission mode. The apples are placed in fruit cups and carried into the dark box by a chain conveyor. The halogen lamps illuminate the passing apples, and the light transmitted through the apple interior is received by an optical fiber and passed through the spectrometer (wavelength range 350–1150 nm) to the computer; the exposure time is adjusted with the accompanying spectrum acquisition software. Before spectrum acquisition, the device was preheated for 30 min; the conveying speed was set to 0.5 m/s and the exposure time to 100 ms. The spectrum of each sample was collected four times at the equatorial region, and the average was taken as the experimental spectrum of that sample.

2.3. Parameter Measurement

The SSC of the apple samples was measured with a digital refractometer (PR-101a, Japan). Part of the flesh was cut with a fruit knife from each of the four sides of the spectrum collection region, and the juice was squeezed from the flesh onto the measuring surface of the refractometer to obtain the SSC reading for that side. The average of the four readings was taken as the SSC of the apple sample. The fruit diameter at the equator was measured with a digital vernier caliper (Mitutoyo-500, Japan); each apple was measured four times at the equator, and the average was taken as its fruit diameter.

2.4. Data Processing

The Kennard–Stone (K-S) algorithm was first applied to divide the collected apple spectra into modeling and prediction sets. The spectral data were then imported into the Unscrambler software to establish the SSC models of apples. Model performance was evaluated by the prediction set correlation coefficient (Rp) and the root mean square error of prediction (RMSEP), given by equations (1) and (2), respectively:

$$R_p = \sqrt{1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{1}$$

$$\mathrm{RMSEP} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \tag{2}$$

where n is the number of samples in the prediction set, ŷi is the predicted value of the i-th sample in the prediction set, yi is the true value of the i-th sample in the prediction set, and ȳ is the average of the true values of all samples in the prediction set.
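Equations (1) and (2) can be checked with a short Python sketch. It assumes Rp is computed as the square root of 1 − SSE/SST, which is consistent with the symbols defined above (n, ŷi, yi, ȳ); some chemometrics software instead reports the Pearson correlation between predicted and measured values, which can differ slightly. The function names `rp` and `rmsep` and the toy values are illustrative only.

```python
import numpy as np


def rp(y_true, y_pred):
    """Prediction-set correlation coefficient, Rp = sqrt(1 - SSE / SST)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_pred - y_true) ** 2)             # residual sum of squares
    sst = np.sum((y_true - y_true.mean()) ** 2)      # total sum of squares around y-bar
    return float(np.sqrt(max(0.0, 1.0 - sse / sst)))  # clipped at 0 when SSE > SST


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over the n prediction-set samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


# Toy example with three hypothetical SSC values (°Brix)
measured = [10.0, 12.0, 14.0]
predicted = [10.5, 11.5, 14.0]
print(round(rp(measured, predicted), 4), round(rmsep(measured, predicted), 4))  # 0.9682 0.4082
```

Here SSE = 0.5 and SST = 8, so Rp = √0.9375 and RMSEP = √(0.5/3).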

PLS is one of the most commonly used multivariate linear calibration techniques and is widely used with NIR spectroscopy to predict the internal quality of fruits quantitatively. The PLS prediction takes the form

$$Y = \sum_{i=1}^{n} \beta_i \lambda_i + B$$

where Y is the model prediction, i denotes the i-th wavelength point, βi is the regression coefficient corresponding to the i-th wavelength point, λi is the spectral energy value at the i-th wavelength point, n is the number of wavelength points, and B is the intercept.
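To make the linear form Y = Σ βi λi + B concrete, the sketch below fits a single-response PLS model with the NIPALS algorithm in NumPy and returns the coefficient vector β and the intercept B. It is a generic textbook PLS1 implementation written for illustration, not the Unscrambler routine used in this study; the function name `pls1_fit` and the synthetic data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Minimal PLS1 (NIPALS). Returns (beta, B) so that y_hat = X @ beta + B."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # mean-centred spectra and SSC values
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                          # direction of maximum covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                             # latent-variable scores
        tt = t @ t
        p = Xk.T @ t / tt                      # X loadings
        q = yk @ t / tt                        # y loading
        Xk = Xk - np.outer(t, p)               # deflate X
        yk = yk - q * t                        # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    beta = W @ np.linalg.solve(P.T @ W, Q)     # beta_i on the original wavelength axis
    B = y_mean - x_mean @ beta                 # intercept
    return beta, B


# Hypothetical data: 40 "spectra" with 6 wavelength points
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 6))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0, 0.2]) + 12.0 + 0.05 * rng.normal(size=40)
beta, B = pls1_fit(X_demo, y_demo, n_components=3)
y_hat = X_demo @ beta + B                      # Y = sum(beta_i * lambda_i) + B
```

When the number of components equals the rank of the centred spectra, PLS1 reduces to ordinary least squares, which is a convenient sanity check.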

3. Results

3.1. Analysis of Apple SSC and Measurement Results

The 480 apple samples were divided into modeling and prediction sets separately within each fruit diameter group using the K-S algorithm: of the 160 samples in each group, 120 were assigned to the modeling set and 40 to the prediction set. The SSC measurements are shown in Table 1. The SSC range of the modeling set (9.05–16.4 °Brix) was wider than, and covered, that of the prediction set (9.65–14.85 °Brix), which helps the apple SSC model predict the prediction set samples well.
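A minimal NumPy sketch of the Kennard–Stone selection behind the 120/40 split in each diameter group is shown below. It assumes Euclidean distances computed directly on the spectra (some implementations use principal component scores instead, and the text does not say which was used); `kennard_stone` and `ks_split` are hypothetical helper names, and the random matrix only stands in for one group's spectra.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return indices of n_select (>= 2) samples chosen by the Kennard-Stone algorithm."""
    X = np.asarray(X, dtype=float)
    sq = (X ** 2).sum(axis=1)
    # pairwise Euclidean distances via |a|^2 + |b|^2 - 2ab (avoids a 3-D array)
    dist = np.sqrt(np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None))
    # start with the two samples that are farthest apart
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # distance from each remaining sample to its nearest already-selected sample
        d_min = dist[np.ix_(remaining, selected)].min(axis=1)
        k = remaining[int(np.argmax(d_min))]   # add the sample farthest from the selection
        selected.append(k)
        remaining.remove(k)
    return np.array(selected)


def ks_split(X, n_model):
    """Modeling set = K-S selection; prediction set = the remaining samples."""
    model_idx = kennard_stone(X, n_model)
    pred_idx = np.setdiff1d(np.arange(len(X)), model_idx)
    return model_idx, pred_idx


rng = np.random.default_rng(42)
spectra = rng.normal(size=(160, 50))   # placeholder for one diameter group's spectra
model_idx, pred_idx = ks_split(spectra, 120)
print(len(model_idx), len(pred_idx))   # 120 40
```

Because K-S always picks samples at the edges of the data space first, the modeling set tends to span a wider range than the prediction set, as observed for SSC in Table 1.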

3.2. Analysis of NIR Spectrum Characteristics of Apples with Different Fruit Diameters

The average spectra of apples in the three fruit diameter groups are compared in Figure 2. The peaks at 640 nm, 710 nm, and 800 nm and the troughs at 675 nm and 755 nm are mainly related to overtones of the C–H and O–H stretching vibrations [17, 18]. The peak at 805 nm is mainly associated with the second-overtone absorption of C–H and N–H bonds [19, 20].

The spectral intensity of apples with fruit diameters of 65 mm–75 mm is higher than that of apples with fruit diameters of 75 mm–85 mm and 85 mm–95 mm. This is because the energy carried by near-infrared light inside the apple decays as the optical path length increases. At a given wavelength, the attenuation of light passing through the apple interior can be approximated by an exponential decay function [21]:

$$I = I_0 e^{-\mu_e d} \tag{3}$$

where I0 is the light intensity entering the apple, I is the light intensity received by the fiber optic probe below the apple, μe is the extinction coefficient, and d is the distance from the point where the light enters the apple to the point where it exits.

In the online NIR detection device shown in Figure 1, d in equation (3) corresponds to the fruit diameter of the apple. As the fruit diameter increases, more light energy is absorbed inside the apple, resulting in lower energy values in the collected NIR spectrum. Differences in fruit diameter are therefore expected to cause differences in the NIR spectra, which will affect the performance of the apple SSC prediction model built from them.
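Because equation (3) becomes linear after taking logarithms, ln I = ln I0 − μe d, the extinction coefficient at one wavelength (for example 750 nm, as in Figure 5) can be estimated from diameter–intensity pairs by ordinary least squares. The sketch below assumes d is approximated by the fruit diameter, as stated above, and uses made-up values of I0 and μe purely for illustration; `fit_extinction` is a hypothetical name, and real diffuse-transmission paths may be longer and less regular than d.

```python
import numpy as np


def fit_extinction(d, I):
    """Fit I = I0 * exp(-mu_e * d) by least squares on ln(I) = ln(I0) - mu_e * d."""
    d = np.asarray(d, dtype=float)
    log_I = np.log(np.asarray(I, dtype=float))
    slope, intercept = np.polyfit(d, log_I, 1)       # coefficients, highest power first
    return float(np.exp(intercept)), float(-slope)   # (I0, mu_e)


# Hypothetical example: diameters 65-95 mm in 2 mm steps (as in Section 3.5.1),
# with made-up I0 = 4000 counts and mu_e = 0.04 per mm
diameters = np.arange(65, 96, 2)
intensity = 4000.0 * np.exp(-0.04 * diameters)
I0_hat, mu_hat = fit_extinction(diameters, intensity)
```

With these assumed values, a 30 mm increase in diameter multiplies the transmitted intensity by exp(−1.2) ≈ 0.30, which illustrates why larger fruits give weaker spectra.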

3.3. Apple Soluble Solids Content Prediction Model for Each of the Three Fruit Diameters

PLS was used to build the apple SSC prediction models, and the number of latent variables (LVs) was allowed to range from 1 to 20 to prevent overfitting or underfitting. The spectra were pretreated separately with multiplicative scatter correction (MSC), standard normal variate (SNV), and Savitzky–Golay (S-G) smoothing, and the results of the PLS SSC models built for the three fruit diameter groups are shown in Table 2.
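The text does not state how the number of LVs within the 1–20 range was chosen. One common approach, assumed here, is to pick the number that minimizes the k-fold cross-validated RMSE (RMSECV) on the modeling set. The sketch includes a compact NIPALS PLS1 so that it runs on its own; the fold count, random seed, function names, and synthetic data are hypothetical choices.

```python
import numpy as np


def pls1_coef(X, y, n_components):
    """Compact NIPALS PLS1; returns (beta, intercept) for y_hat = X @ beta + intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:                      # nothing left to explain; stop early
            break
        w = w / norm
        t = Xk @ w
        tt = t @ t
        p, q = Xk.T @ t / tt, yk @ t / tt
        Xk, yk = Xk - np.outer(t, p), yk - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:                                 # degenerate case: constant y
        return np.zeros(X.shape[1]), y_mean
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    beta = W @ np.linalg.solve(P.T @ W, Q)
    return beta, y_mean - x_mean @ beta


def select_n_lv(X, y, max_lv=20, n_folds=5, seed=0):
    """Return (best number of LVs, RMSECV curve) using k-fold cross-validation."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    # centred training folds cannot support more LVs than this
    max_lv = min(max_lv, X.shape[1], len(y) - max(len(f) for f in folds) - 1)
    rmsecv = []
    for n_lv in range(1, max_lv + 1):
        sse = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            beta, b0 = pls1_coef(X[train_idx], y[train_idx], n_lv)
            sse += float(np.sum((X[test_idx] @ beta + b0 - y[test_idx]) ** 2))
        rmsecv.append(np.sqrt(sse / len(y)))
    rmsecv = np.array(rmsecv)
    return int(np.argmin(rmsecv)) + 1, rmsecv


# Hypothetical data: 120 spectra driven by 3 latent factors plus small noise
rng = np.random.default_rng(1)
latent = rng.normal(size=(120, 3))
X_demo = latent @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(120, 50))
y_demo = latent @ np.array([1.0, -0.5, 0.3]) + 10.0 + 0.05 * rng.normal(size=120)
best_lv, curve = select_n_lv(X_demo, y_demo, max_lv=20)
```

Too few LVs leave SSC-related variation unmodeled (underfitting), while too many start fitting noise (overfitting); the minimum of the RMSECV curve balances the two.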

For apples with fruit diameters of 65 mm–75 mm, the model built on SNV-pretreated spectra gave the best predictions, with an Rp of 0.885 and an RMSEP of 0.771%; its scatter diagram is shown in Figure 3(a). For the 75 mm–85 mm group, SNV pretreatment also gave the best model, with an Rp of 0.959 (RMSEP as listed in Table 2); its scatter diagram is shown in Figure 3(b). For the 85 mm–95 mm group, SNV again gave the best predictions, with an Rp of 0.937 and an RMSEP of 0.421%; the scatter diagram is shown in Figure 3(c). Comparing the models built on the original spectra with those built after SNV pretreatment for the three fruit sizes shows that SNV can, to some extent, alleviate the poor performance of the SSC prediction model caused by differences in apple size. SNV is a pretreatment method that reduces the effects of sample particle size, surface scattering, and optical path length variation [22], so it can counteract the spectral scattering effects caused by nonuniform sample sizes.
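SNV and MSC can be written in a few lines of NumPy, which also shows why they help with size effects: if a larger fruit scales the whole spectrum by a roughly constant factor and shifts it by an offset, SNV (per-spectrum centering and scaling) and MSC (regression against a reference spectrum) both remove that factor. This is a generic sketch, not the Unscrambler implementation; whether SNV uses the sample or the population standard deviation is an assumption here. S-G smoothing is available in SciPy as `scipy.signal.savgol_filter(x, window_length, polyorder)`.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and std."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)   # sample std; the ddof choice is an assumption
    return (X - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for k, x in enumerate(X):
        slope, offset = np.polyfit(ref, x, 1)    # model x = offset + slope * ref
        corrected[k] = (x - offset) / slope
    return corrected


# A hypothetical spectrum and a copy attenuated by 40 % with a small baseline shift
s = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
raw = np.vstack([s, 0.6 * s + 0.1])
print(np.allclose(snv(raw)[0], snv(raw)[1]))   # True
```

Real size effects are not purely multiplicative (μe varies with wavelength), which is consistent with the observation that SNV only partly removes the size influence.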

3.4. Predicting SSC of the Other Fruit Diameter Groups with the 75 mm–85 mm Group Model

As shown in Table 2, the SSC model of the 75 mm–85 mm group performed better than those of the other fruit diameter groups, so its modeling set was used to build a cross-size prediction model and investigate the performance of the apple SSC model when the fruit diameters in the modeling set differ from those in the prediction set. The modeling set of the 75 mm–85 mm group was used to predict the prediction sets of the 65 mm–75 mm and 85 mm–95 mm groups. The prediction results are shown in Table 3, and the scatter diagrams are shown in Figures 4(a) and 4(b). The results show that SSC prediction was poor when the fruit diameters of the modeling set and the prediction set differed substantially. Compared with the separate-group results in Table 2, the prediction set Rp for fruit diameters of 65 mm–75 mm decreased from 0.879 to 0.779 and RMSEP increased from 0.782% to 0.877%; for fruit diameters of 85 mm–95 mm, Rp decreased from 0.918 to 0.745 and RMSEP rose from 0.493% to 0.914%. Thus, differences in apple size had a significant influence on the SSC prediction model. On an actual fruit sorting line, apple sizes can vary greatly, which can degrade the performance of the sorting model, so size compensation of the apple SSC model is needed to improve the prediction performance of the sorting model.

3.5. Prediction Model of Soluble Solids Content under Apple Size Compensation
3.5.1. Analysis of the Relationship between Apple Size and Spectrum

Apples with fruit diameters from 65 mm to 95 mm at 2 mm intervals (65, 67, 69, ..., 95 mm) were selected to establish the relationship between fruit diameter and the light intensity of the collected NIR spectra at 750 nm, as shown in Figure 5. The figure shows that apple size affects the NIR spectrum and thus the performance of the SSC prediction model built from it, so size compensation of the apple SSC model is necessary to improve the prediction performance of the sorting model.

3.5.2. Mixed Apple Size Soluble Solids Content Prediction Model

Table 3 and Figure 5 show that apple size differences affect the NIR spectrum, so an apple size compensation model is needed to reduce the influence of apple size differences on the model. Using the K-S algorithm, 120 representative samples from each fruit size group were selected for the modeling set of the mixed-size SSC prediction model, and 40 samples from each group were assigned to the prediction set. PLS prediction models for the SSC of mixed apple sizes were then constructed; the model results are shown in Table 4, and the scatter diagrams are shown in Figure 6. Compared with the prediction models in Table 3, the mixed-size models fit better: the correlation coefficient Rp increased markedly, the prediction set RMSEP decreased markedly, and model stability improved, so the influence of apple size on the apple SSC model could be reduced.

As shown in Table 4, the mixed-size PLS SSC model improved prediction performance relative to the models in Table 3, where the modeling and prediction sets had different fruit diameters. Still, its prediction set RMSEP was as high as 0.911%. On an actual fruit sorting line, such a high error would lead to inaccurate apple quality sorting. Therefore, the size variable was added and the SSC prediction model was rebuilt; the prediction results are shown in Table 5.

As shown in Table 5, the prediction performance of the mixed-size SSC model improved after the size variable was added, with the prediction set RMSEP decreasing from 0.911% to 0.822%, but the improvement contributed by the size variable was limited. Because the modeling set of the mixed-size model contains samples from all three size groups, the effect of the size variable was diluted in the mixed-size model, so its impact was small.

3.5.3. Development of a Size-Compensated Soluble Solids Content Prediction Model for Apples

In this study, a data fusion technique was used to address the effect of apple fruit size on the NIR SSC model. The main objective of data fusion is to maximize the information about the measured sample property by exploiting the synergy between different measurements of the same sample [23]. Data fusion has three levels: low-level (primary), mid-level (intermediate), and high-level (advanced) fusion. Low-level fusion combines and models the raw data from multiple measurements; mid-level fusion first selects the effective variables from each measurement and then fuses and models them; high-level fusion models each measurement independently and then makes a decision after considering the results of all the models. In recent years, data fusion techniques have been applied in fields such as metabolomics [24, 25], artwork identification [26, 27], dye classification [28], and food testing [29, 30]. In this study, a preliminary analysis of the effect of apple fruit diameter on the visible/near-infrared nondestructive fruit inspection model was conducted using mid-level data fusion; the technical flow chart is shown in Figure 7.

CARS and SPA were used to select spectral variables from the modeling set of the 75 mm–85 mm fruit diameter group to eliminate uninformative variables, further optimize the prediction model, and improve detection speed. PLS models were built in two cases: (1) PLS models of apple SSC were established with the wavelength variables selected by CARS or SPA; (2) the wavelength variables selected by CARS or SPA were fused with the corresponding apple size data and normalized, and PLS models of apple SSC were established on the fused data. The wavelength points selected by CARS and SPA are shown in Figures 8(a) and 8(b), and the results of the PLS models of apple SSC are shown in Table 6. Most of the wavelength points selected by CARS and SPA lie between 650 nm and 850 nm, mostly at peaks and troughs, indicating that this spectral region carries much information on apple SSC. The SSC models built with the SPA-selected wavelengths performed poorly because SSC is represented at multiple locations in the spectrum, and SPA wavelength selection removed much useful information. For SPA, the prediction set Rp was 0.744 and RMSEP was 1.340% for fruit diameters of 65 mm–75 mm, and Rp was 0.665 and RMSEP was 1.942% for fruit diameters of 85 mm–95 mm. For the PLS models built on the CARS-selected wavelength variables, the prediction set Rp was 0.854 and RMSEP was 0.611% for fruit diameters of 65 mm–75 mm, and Rp was 0.863 and RMSEP was 0.586% for fruit diameters of 85 mm–95 mm. CARS reduced the number of wavelength variables used in the model from 1044 to 61, effectively simplifying the model and improving its stability.
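For readers unfamiliar with SPA, its core projection step is sketched below. Starting from one wavelength, each iteration projects all columns of the spectral matrix onto the subspace orthogonal to the most recently selected column and picks the column with the largest remaining norm, which yields wavelengths with minimal collinearity. The full algorithm as usually described also repeats this for every starting wavelength and chooses the final subset by validation RMSE, which is omitted here; `spa` is a hypothetical name, and the number of selected variables must not exceed the rank of the matrix. CARS is longer (Monte Carlo sampling plus repeated PLS fits) and is not sketched.

```python
import numpy as np


def spa(X, n_select, start):
    """Successive projections algorithm (projection step only).

    X: (n_samples, n_wavelengths) matrix; start: index of the first wavelength.
    Returns the list of selected column indices.
    """
    Xp = np.asarray(X, dtype=float).copy()
    selected = [int(start)]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]]
        # project every column onto the orthogonal complement of the last selected column
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0                 # never pick a wavelength twice
        selected.append(int(np.argmax(norms)))
    return selected


# Hypothetical usage: 120 samples x 200 wavelengths, start at the highest-variance column
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 200))
start = int(np.argmax(X_demo.var(axis=0)))
chosen = spa(X_demo - X_demo.mean(axis=0), n_select=10, start=start)
```

Because SPA favors mutually uncorrelated wavelengths, it can discard correlated bands that jointly carry SSC information, which is one plausible reading of the weaker SPA results above.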

The apple size variable was added to the wavelength variables selected by CARS, and the data were normalized because the apple size data and the spectral data have different units [31]. Normalization removes the effect of differing units and scales and makes the variables comparable, which is essential for model building [32]. A PLS model of apple SSC was then established on the normalized mid-level fusion of spectral and fruit diameter data; the model performance is shown in Table 7, and the scatter diagrams are shown in Figures 9(a) and 9(b).
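The mid-level fusion step can be sketched as appending the diameter column to the CARS-selected wavelengths and scaling every column to a common range. The text does not specify the normalization method; column-wise min–max scaling fitted on the modeling set only is assumed here, so that prediction samples do not leak into the scaling statistics. `fuse_and_normalize` and the toy numbers are hypothetical.

```python
import numpy as np


def fuse_and_normalize(selected_spectra, diameter, train_idx):
    """Mid-level fusion: append fruit diameter to the selected wavelengths, then
    min-max scale each column using statistics from the modeling set only."""
    Z = np.column_stack([np.asarray(selected_spectra, dtype=float),
                         np.asarray(diameter, dtype=float)])
    lo = Z[train_idx].min(axis=0)
    hi = Z[train_idx].max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (Z - lo) / span


# Hypothetical: 3 samples, 2 CARS-selected wavelengths, diameters in mm
fused = fuse_and_normalize([[0.2, 10.0], [0.4, 30.0], [0.3, 20.0]],
                           [70.0, 90.0, 80.0], train_idx=[0, 1])
```

The fused matrix would then be passed to a PLS model. Prediction samples can fall slightly outside [0, 1] if their values exceed the modeling-set range, which is expected and harmless for a linear model.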

As shown in Table 7, compared with the PLS model of apple SSC built with the CARS-selected wavelength variables, the size-compensated intermediate fusion model increased the prediction set Rp from 0.854 to 0.886 and reduced RMSEP from 0.611% to 0.536% for fruit diameters of 65 mm–75 mm, and increased the prediction set Rp from 0.863 to 0.913 and reduced RMSEP from 0.586% to 0.497% for fruit diameters of 85 mm–95 mm. Compared with the mixed-size model, its prediction set Rp was clearly higher and its RMSEP considerably lower. These results indicate that apple size influences the performance of the apple SSC model and that adding the fruit diameter variable to build a size-compensated SSC model can improve prediction performance.

4. Conclusion

This paper investigated the effect of apple fruit diameter differences on the SSC prediction model. The results showed that apple size differences affect the spectrum, with apple size and spectral light intensity following a logarithmic relationship, which ultimately affects the prediction performance of the PLS model of apple SSC built from the spectra. Therefore, preprocessing, mixed-size modeling, and size-compensation approaches to the apple size problem were studied. SNV, a pretreatment method that reduces the effects of sample particle size, surface scattering, and optical path length variation, can alleviate to a certain extent the poor performance of the SSC prediction model caused by apple size differences. The correlation coefficient Rp was significantly improved, the prediction set RMSEP was considerably reduced, and model stability was greatly enhanced, which could reduce the influence of apple size on the apple SSC model. The size-compensated intermediate fusion model performed best, with a prediction set Rp of 0.886 and RMSEP of 0.536% for fruit diameters of 65 mm–75 mm, and a prediction set Rp of 0.913 and RMSEP of 0.497% for fruit diameters of 85 mm–95 mm. Therefore, adding the fruit diameter variable to build a size compensation model of apple SSC can improve the model's prediction performance and meet the requirements for online detection of the SSC of apples with different fruit diameters.

Data Availability

The raw data in our research cannot be shared at this time as the data also form part of an ongoing study.

Conflicts of Interest

The authors report no conflicts of interest.

Authors’ Contributions

Xiaogang Jiang carried out the investigation, validation, and formal analysis. Mingwang Zhu wrote the original draft and was responsible for data curation and methodology. Jinliang Yao contributed to validation and software. Yuxiang Zhang and Yande Liu provided resources.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant number: 31760344) and the Science and Technology Research Project of Education Department of Jiangxi Province (grant number: GJJ200615).