Abstract

Wavelength selection is a challenging task in the detection of bruises on pears using hyperspectral imaging. Most existing studies use a feature wavelength set chosen by a single selection method, which is generally unable to handle the wide variability of hyperspectral data. This work proposes a novel framework that improves bruise detection performance by combining three state-of-the-art variable selection methods with the concept of feature-level integration. The successive projection algorithm, competitive adaptive reweighted sampling, and RELIEF were first applied independently to the spectra of Korla pears. The resulting feature wavelength subsets were then integrated to construct an optimal feature wavelength set. Finally, a classifier based on the extreme learning machine (ELM) was employed to identify pear bruises. Experimental results demonstrated that feature wavelength integration resulted in lower detection errors. The proposed method is simple and promising for bruise detection on Korla pears, and it could be applied to other types of fruit defects.

1. Introduction

Korla pear is an important fruit product in Xinjiang, China. It contributes greatly to local economic development and social life. Accordingly, the planting area and yields have continued to increase during the past decade. In 2017, the planting area reached 40,000 hectares and the annual output was 800,000 tons. However, bruises that occur during harvest operations and postharvest handling lower pear quality and consequently cause significant economic losses. From the orchard to the market, pears undergo a series of operations such as picking, sorting, packing, transportation, refrigeration, grading, and packaging. During these steps, pears inevitably suffer bruise damage. After a short period of storage, the damaged tissue can easily promote the growth of rot fungi, which accelerates the spread of rot to the surrounding undamaged pears. Thus, there is a strong need to investigate new ways to detect bruises on pears rapidly and accurately.

However, bruises on pears are difficult to detect. The color change of a pear after slight damage is not obvious. In addition, detection is easily affected by many factors, including the type of bruise, the severity of bruising, and the condition of the fruit [1]. Many technologies have been introduced into this field, such as light reflection, light transmission, fluorescence, ultrasound, dielectric properties, X-ray, gamma-ray, and magnetic resonance [2]. Near-infrared (NIR) spectroscopy and hyperspectral imaging analysis techniques are low-cost, rapid, and nondestructive and have been widely used in fruit bruise detection in recent years. Zhang reported a detection accuracy of 94.75% for rapid and nondestructive testing of bruises on apples using hyperspectral images in the 400–1000 nm region [3]. Li et al. applied the short-wave NIR hyperspectral imaging technique to peaches and segmented the raw hyperspectral images into bruised and intact regions with high accuracy [4]. Similar applications of NIR spectroscopy and hyperspectral imaging analysis techniques were also investigated to detect bruises on other fruits, such as blueberry [5], kiwifruit [6], and Lycium barbarum L. [7]. More extensive reviews of these applications were provided by Liu et al. [8] and Wang et al. [9].

Those studies demonstrated that NIR spectroscopy and hyperspectral imaging analysis are promising techniques for detecting bruise damage on fruits. However, compared with other fruits, only a few studies have been conducted on pears so far, and further exploration is still needed, although there have been many studies on the bruise susceptibility and storage time of pears [10, 11]. Dang et al. combined hyperspectral imaging with supervised classification techniques (K-nearest neighbor and support vector machine) to detect the scar area of pears [12]. Lee et al. introduced the F-value method to find the optimal band ratio of pear hyperspectral images and then compared the band ratio of each pixel with a predetermined threshold to segment the damaged area [13]. The effects of illumination inhomogeneity on the bruise detection of pears using hyperspectral imaging were studied by Zhao. Four methods were introduced to address this problem, including maximum likelihood classification (MLC), Euclidean distance classification (EDC), Mahalanobis distance classification (MDC), and spectral angle mapper (SAM). The experimental results demonstrated that MDC and SAM achieved better detection performance, with detection accuracies of 93.8% and 95.0%, respectively [14].

In particular, to the best of the authors’ knowledge, very few studies have investigated wavelength selection methods for bruise detection on pears using NIR spectroscopy and hyperspectral imaging analysis. Jiang extracted the feature wavelengths of hyperspectral images of pears by principal component analysis (PCA) and then identified the damage using partial least squares discriminant analysis [15]. Moreover, as mentioned before, the detection is easily affected by many factors, so it is difficult to find a single method that selects wavelengths well across highly variable hyperspectral images. The specific objective of this paper was to propose a feature wavelength selection method based on a feature-level integration framework, aiming at better bruise detection performance for Korla pears. The successive projection algorithm (SPA), competitive adaptive reweighted sampling (CARS), and RELIEF were applied independently to the spectra of all samples in the range from 400 to 1000 nm. Subsequently, the wavelength subsets selected by the three methods were further optimized by an integration strategy, and the output wavelength set served as the input variables of a classifier based on the extreme learning machine (ELM) to identify the bruises on pears. The main advantage of the proposed framework is that the complementarity of the feature wavelengths selected by different rules can improve the robustness and accuracy of bruise detection on pears.

The NIR spectrum of a tested sample often contains hundreds of wavelengths. The huge size of the data set increases the processing load of NIR spectral applications. Therefore, it is always desirable to explore advanced algorithms for selecting a minimum subset of wavelengths that carries the most information about the tested sample. During the past decades, a variety of wavelength selection methods have been proposed in the literature, such as branch and bound [16], uninformative variable elimination (UVE) [17], PCA [15], SPA [18, 19], CARS [20], random frog [21], and genetic algorithms (GAs) [22]. Numerous studies suggested that better performance can be achieved with a selected wavelength subset than with the full range of wavelengths [23, 24].

Several reviews have been published in this regard [8, 23–25]. Liu et al. divided the methods into three types: filter, wrapper, and embedded methods. After a brief analysis of the details and typical applications, the advantages and disadvantages of each method were discussed [8]. Yun et al. categorized the methods based on the approaches employed for variable initialization, modelling, evaluation, and wavelength selection and compared the similarities and differences of each type of method [24]. Dai et al. surveyed the methods based on the searching strategy for generating wavelength subsets. The fundamentals and applications of each algorithm were also provided [25]. These reviews provide a better understanding of the characteristics, advantages, and disadvantages of the existing wavelength selection methods from different perspectives. With their help, readers can choose an appropriate method and apply it correctly in their studies.

It is generally accepted that each method has its own characteristics and limitations, and that integrating existing methods may help to achieve better performance by combining the advantages of different kinds of methods [24, 25]. Many hybrid methods that combine two or three methods have been reported. Yun et al. proposed VCPA–GA and VCPA–IRIV, in which the variables selected by a modified VCPA were further optimized by GA or IRIV [26]. RF-BP, presented by Chen et al., generated a new comprehensive variable subset by combining random forest and a back propagation network [27]. Both studies claimed that the hybrid algorithm was a good and promising strategy for variable selection and could improve the performance of NIR spectral applications. However, most of the existing hybrid methods of wavelength selection were in cascade form: the former method made a rough selection, and the output wavelength subsets were refined by the latter one. If the former method failed to select the key wavelengths, the latter method could not recover them, and the advantages of the individual methods could not be exploited [24].

Motivated by the above, an integration framework was proposed by combining three state-of-the-art methods: SPA, CARS, and RELIEF. These three algorithms were chosen because of their good wavelength selection performance as reported in [18, 20, 28]. Unlike the previously mentioned hybrid methods, the three methods were combined in parallel in this work. This means that the algorithms select variables independently, so that they do not negatively affect one another. Each method has its own evaluation metric and selection strategy. The wavelength subsets selected by different methods are complementary, and their combination has a better ability to relate to the properties of interest. Therefore, the parallel combination is expected to produce better results than the serial form [29].

3. Materials and Methods

3.1. Pears and Bruising

Korla pears were purchased from a local market in Lin’an District, Hangzhou City. To ensure the reliability of the research, a total of 80 pears were manually selected from the same batch. They had similar shape and size, no obvious surface defects, and a uniform surface color. The samples were randomly divided into two groups of 40 each.

A hollow cylinder with a height of 60 cm and a diameter of about 7 cm was made by hand from cardboard. The pears were dropped one by one from the top of the cylinder to produce artificial damage at the equator of each pear. The intact and damaged pear categories were assigned the labels 1 and 2, respectively.

3.2. Hyperspectral Imaging System and Image Acquisition

Figure 1 depicts the experimental setup. The hyperspectral imaging system consisted of an industrial camera (SOC710-VP, Surface Optics Corp., USA), two fiber optic halogen lamps (150 W EKE, 3250 K, Techniquip, USA), and a stage. The whole system was fixed in a dark chamber aiming to reduce the effects of ambient light. The working range of the camera was from 400 nm to 1000 nm. The spectral resolution was 4.68 nm, and the band number was 128.

The pears were placed on the stage directly below the camera and scanned perpendicularly to the bruised region. The scanning speed was 30 lines per second, and the time for taking a hyperspectral image was 46.4 s.

3.3. Hyperspectral Image Calibration and Preprocessing

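To remove the influence of uneven illumination and the dark current noise, image calibration was conducted on the raw hyperspectral image in each experiment. The operation can be expressed using the following equation:

$$I_{\mathrm{c}} = \frac{I_{\mathrm{raw}} - I_{\mathrm{dark}}}{I_{\mathrm{white}} - I_{\mathrm{dark}}}, \tag{1}$$

where $I_{\mathrm{c}}$ is the calibrated image, $I_{\mathrm{raw}}$ denotes the original hyperspectral image, $I_{\mathrm{white}}$ represents the image of the white reference, which has a reflectance of 99%, and $I_{\mathrm{dark}}$ corresponds to the camera’s dark current, which was measured with the camera lens covered.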
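The calibration step can be illustrated with a minimal NumPy sketch. The function name `calibrate`, the `(rows, columns, bands)` array layout, the `eps` guard against division by zero, and the toy intensity values are assumptions made for illustration and are not taken from the experimental software.

```python
import numpy as np

def calibrate(raw, white, dark, eps=1e-12):
    """Convert raw intensities to relative reflectance, band by band."""
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = white - dark
    # Guard against division by zero where the white and dark frames coincide.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (raw - dark) / denom

# Toy cube: 2 x 2 pixels and 3 bands (the real cubes have 128 bands).
dark = np.full((2, 2, 3), 10.0)
white = np.full((2, 2, 3), 210.0)
raw = np.full((2, 2, 3), 110.0)
reflectance = calibrate(raw, white, dark)
```

In this toy case every pixel lies halfway between the dark and white references, so the calibrated value is 0.5 in all bands.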

Subsequently, the Savitzky–Golay smoothing method was applied to the calibrated hyperspectral images [30] to remove random noise and improve the quality of the hyperspectral data. The length of the sliding window was seven, and the polynomial order was two.
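A possible implementation of this smoothing step uses `scipy.signal.savgol_filter` with the window length and polynomial order stated above. It is assumed here that the filter runs along the wavelength axis of each spectrum; the helper name `smooth_spectra` and the synthetic spectrum are illustrative only.

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_spectra(spectra, window_length=7, polyorder=2):
    """Smooth each spectrum along its last (wavelength) axis."""
    return savgol_filter(np.asarray(spectra, dtype=float),
                         window_length=window_length,
                         polyorder=polyorder, axis=-1)

# Synthetic example: a smooth quadratic baseline plus random noise.
bands = np.linspace(400.0, 1000.0, 128)
clean = 0.3 + 1e-6 * (bands - 700.0) ** 2
rng = np.random.default_rng(0)
noisy = clean + rng.normal(0.0, 0.01, size=bands.size)
smoothed = smooth_spectra(noisy)
```

Because the filter fits a second-order polynomial inside each window, a purely quadratic spectrum passes through unchanged, while high-frequency noise is attenuated.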

3.4. Feature Wavelength Selection

As previously mentioned, wavelength selection becomes an essential step for the NIR spectral applications. A novel framework was proposed in this work to increase the accuracy of the bruise detection, through combining three state-of-the-art variable selection methods and the concept of feature-level integration.

3.4.1. SPA Method

SPA is a forward variable selection algorithm that minimizes the collinearity between spectral variables and extracts valid features from redundant spectral data. Simple projection operations in a vector space are performed first. Then, among all the remaining variables, the variable with the maximum projection value on the subspace orthogonal to the previously selected variables is selected [19, 31].
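The projection step of SPA can be sketched as follows. This is a simplified illustration rather than the full procedure: the starting column and the number of variables are fixed by the caller, whereas a complete SPA implementation chooses them by validation. The function name `spa_select` and the toy matrix are assumptions.

```python
import numpy as np

def spa_select(X, n_select, start=0):
    """Return indices of n_select columns chosen by successive projections."""
    X = np.asarray(X, dtype=float)
    P = X - X.mean(axis=0)              # work on mean-centred columns
    selected = [start]
    for _ in range(n_select - 1):
        v = P[:, selected[-1]]
        # Project every column onto the subspace orthogonal to v.
        P = P - np.outer(v, v @ P) / (v @ v)
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -1.0          # never pick a column twice
        selected.append(int(np.argmax(norms)))
    return selected

# Toy data: column 1 is collinear with column 0, column 2 is independent.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
X = np.column_stack([a, 2.0 * a, b])
chosen = spa_select(X, n_select=2, start=0)
```

Starting from column 0, the collinear column 1 has an almost zero projection, so the independent column 2 is chosen, which is the behaviour that reduces collinearity among the selected wavelengths.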

3.4.2. CARS Method

The CARS method selects variables by simulating the principle of “survival of the fittest” in Darwin’s theory of evolution. Each wavelength variable is taken as a unique individual. By an adaptive reweighted sampling technique, the individuals with larger absolute coefficients in the partial least squares regression model are retained, whereas those with small weights are eliminated. In this way, a collection of wavelength variable subsets is obtained. Finally, all the wavelength variable subsets are modeled by cross-validation, and the subset with the minimum root-mean-square error of cross-validation (RMSECV) is selected as the optimal wavelength variable subset [20, 32].
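In the CARS formulation described in the literature, the number of variables retained in each sampling run is controlled by an exponentially decreasing function before adaptive reweighted sampling is applied. The sketch below illustrates only this schedule, not the partial least squares modelling or the cross-validation. The total number of sampling runs (50), the rounding rule, and the function name `edf_retained_counts` are assumptions for illustration.

```python
import math

def edf_retained_counts(n_variables, n_runs):
    """Number of variables kept after each sampling run under the EDF schedule."""
    a = (n_variables / 2.0) ** (1.0 / (n_runs - 1))
    k = math.log(n_variables / 2.0) / (n_runs - 1)
    counts = []
    for i in range(1, n_runs + 1):
        ratio = a * math.exp(-k * i)    # ratio is 1 in the first run, 2/p in the last
        counts.append(max(2, round(ratio * n_variables)))
    return counts

# 128 bands, as in the camera used here; 50 runs is a hypothetical setting.
counts = edf_retained_counts(n_variables=128, n_runs=50)
```

The schedule removes many wavelengths in the early runs and few in the late runs, so that all 128 bands are kept in the first run and only two remain in the last.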

3.4.3. RELIEF Method

RELIEF is a filter-type feature selection method that evaluates each variable individually; it was proposed by Kira and Rendell in 1992 [33, 34]. It calculates a proxy statistic for each variable that estimates the difference between homogeneous neighbor samples and heterogeneous neighbor samples. Then, a relevance threshold is defined such that any variable with a relevance value larger than the threshold is selected.
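The weighting idea can be sketched in NumPy as follows. This is a simplified Relief-style variant under stated assumptions: every sample is visited once instead of drawing random samples, features are scaled to [0, 1], and absolute differences with Manhattan distance are used. The function names and the toy data are illustrative.

```python
import numpy as np

def relief_weights(X, y):
    """Relief-style relevance weight per feature (nearest hit and nearest miss)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                     # constant features give zero differences
    Xn = (X - X.min(axis=0)) / span           # scale each feature to [0, 1]
    n, p = Xn.shape
    w = np.zeros(p)
    for i in range(n):
        d = np.abs(Xn - Xn[i]).sum(axis=1)    # Manhattan distance to sample i
        d[i] = np.inf                         # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, d, np.inf))
        miss = np.argmin(np.where(~same, d, np.inf))
        # Reward differences to the nearest miss, penalise differences to the nearest hit.
        w += (np.abs(Xn[i] - Xn[miss]) - np.abs(Xn[i] - Xn[hit])) / n
    return w

def select_by_threshold(weights, threshold):
    """Keep the indices of features whose weight exceeds the threshold."""
    return [j for j, wj in enumerate(weights) if wj > threshold]

# Toy data: feature 0 separates the two classes, feature 1 does not.
X = np.array([[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]])
y = np.array([1, 1, 2, 2])
weights = relief_weights(X, y)
selected = select_by_threshold(weights, threshold=0.15)
```

In the toy data the discriminating feature receives a large positive weight and the uninformative one a negative weight, so only the former passes the threshold.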

3.4.4. Wavelength Selection Based on Feature-Level Integration

In past studies, the wavelengths were almost always selected according to a single principle, which results in low robustness of the bruise identification. It is generally accepted that the feature wavelengths selected by different methods are complementary and together have a better ability to describe the attributes of the target. With this in mind, a wavelength selection framework based on feature-level integration was proposed. Figure 2 illustrates the flowchart of the proposed framework.

SPA, CARS, and RELIEF were applied independently to the original hyperspectral data, yielding three subsets of wavelengths. Subsequently, the variables selected by the three methods were integrated by combining the three subsets and removing the duplicate wavelengths. The output was considered as the optimal feature wavelength set.
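The integration step amounts to a set union of the three subsets. A minimal sketch is given below; the function name `integrate_subsets` and the band indices are hypothetical and do not correspond to the wavelengths reported in Table 2.

```python
def integrate_subsets(*subsets):
    """Union of wavelength subsets with duplicates removed, sorted ascending."""
    merged = set()
    for subset in subsets:
        merged.update(subset)
    return sorted(merged)

# Hypothetical band indices (0-127), for illustration only.
spa_idx = [2, 5, 120]
cars_idx = [5, 40, 63, 120]
relief_idx = [1, 2, 125]
combined = integrate_subsets(spa_idx, cars_idx, relief_idx)
```

A wavelength chosen by any one of the methods is kept, so a key wavelength missed by one method can still reach the classifier through another.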

3.5. Extreme Learning Machine

Extreme learning machine (ELM) is a rapid learning algorithm based on the single hidden layer feedforward neural networks (SLFNs). It has been widely used for supervised learning or unsupervised learning [35]. As illustrated in Figure 3, ELM consists of an input layer, a hidden layer, and an output layer.

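Given a training set $\{(\mathbf{x}_j, \mathbf{t}_j)\}_{j=1}^{N}$ which contains $N$ samples, an activation function $g(\cdot)$, and the number of hidden-layer neurons $L$, the training step of ELM can be described as follows:
(1) Assign the input weights $\mathbf{w}_i$ and biases $b_i$ randomly, where $i = 1, 2, \ldots, L$.
(2) Calculate the output matrix $\mathbf{H}$ of the hidden layer.
(3) Obtain the output weights $\boldsymbol{\beta}$ according to the formula $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}$, where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of $\mathbf{H}$ and $\mathbf{T}$ is the target matrix.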

Compared with the traditional SLFN, ELM does not need to adjust the input weights and biases during the training process. The output weights $\boldsymbol{\beta}$ are obtained in closed form as a least-squares solution rather than by iterative learning. Therefore, the parameter optimization is easier and the training speed is significantly improved. Moreover, because no iterative gradient-based search is involved, the training does not become trapped in a local optimum. All these properties indicate that an ELM-based method can provide a real-time, accurate, and reliable way for bruise detection of pears.
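The three training steps can be sketched in NumPy as follows. This is a minimal illustration assuming a sigmoid activation, one-hot class targets, and input weights drawn uniformly from [-1, 1]; the class name `ELMClassifier` and the toy two-dimensional data are assumptions, not the implementation used in the experiments.

```python
import numpy as np

class ELMClassifier:
    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the randomly projected inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_ = np.unique(y)
        # One-hot target matrix T (one column per class).
        T = (np.asarray(y)[:, None] == self.classes_[None, :]).astype(float)
        # Step 1: random input weights and biases, never updated afterwards.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Step 2: hidden-layer output matrix H.
        H = self._hidden(X)
        # Step 3: output weights from the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]

# Toy data with two well-separated classes labelled 1 (intact) and 2 (damaged).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]])
y_train = np.array([1, 1, 1, 2, 2, 2])
model = ELMClassifier(n_hidden=20).fit(X_train, y_train)
predicted = model.predict(X_train)
```

Only the pseudoinverse in step 3 involves any fitting, which is why training reduces to a single linear-algebra operation.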

3.6. Performance Evaluation of the Bruise Detection

To evaluate the performance of the pear bruise detection quantitatively, the confusion matrix was introduced to this work, as shown in Table 1 [36].

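According to the confusion matrix, the accuracy $A$, precision $P$, and the recall rate $R$ can be expressed as follows:

$$A = \frac{TP + TN}{TP + TN + FP + FN}, \quad P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \tag{2}$$

where $TP$, $TN$, $FP$, and $FN$ are the numbers of true positives, true negatives, false positives, and false negatives in the confusion matrix.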
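These three measures follow directly from the four counts of a binary confusion matrix, as the short sketch below shows. The function name `detection_metrics` and the counts are hypothetical and are not the results reported in Table 3.

```python
def detection_metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Hypothetical counts for a 40-sample test set, for illustration only.
acc, prec, rec = detection_metrics(tp=18, tn=19, fp=1, fn=2)
```

With these counts the accuracy is 37/40, the precision is 18/19, and the recall is 18/20.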

4. Results and Analysis

4.1. Reflectance Spectra of Pears

The region of interest (ROI) was manually selected from the gray image at 994.36 nm of each sample. The size was 20 × 20 pixels. For the purpose of reducing the influence of the uneven illumination, ROIs of the intact pears were selected to be close to the damaged area of the bruised samples. According to the principle of the hyperspectral imaging, each pixel in the ROI has a complete reflectance curve taking wavelengths as variables. The average reflectance curve of all the pixels in each sample ROI was calculated. Figure 4(a) shows the mean reflectance spectra of 20 intact surface areas and 20 damaged surface areas. The mean and variance of the reflectance spectra are shown in Figure 4(b).
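The averaging of the ROI spectra can be sketched as follows, assuming the hyperspectral cube is stored as a `(rows, columns, bands)` array and the ROI is addressed by its top-left pixel. The function name `roi_mean_spectrum` and the synthetic cube are assumptions for illustration.

```python
import numpy as np

def roi_mean_spectrum(cube, row, col, size=20):
    """Average the spectra of a size x size ROI whose top-left pixel is (row, col)."""
    roi = cube[row:row + size, col:col + size, :]
    if roi.shape[0] != size or roi.shape[1] != size:
        raise ValueError("ROI falls outside the image")
    # Flatten the ROI to (pixels, bands) and average over the pixels.
    return roi.reshape(-1, cube.shape[2]).mean(axis=0)

# Synthetic cube with 128 bands; the ROI pixels share one known spectrum.
cube = np.zeros((50, 60, 128))
cube[10:30, 15:35, :] = np.linspace(0.2, 0.8, 128)
spectrum = roi_mean_spectrum(cube, row=10, col=15)
```

Each sample is thus reduced to a single 128-element reflectance vector, which is the input to the wavelength selection methods.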

It can be observed that the average reflectance of the intact samples and that of the bruised samples followed the same trend, but the reflectance values differed slightly, especially in the bands of 400∼450 nm, 600∼700 nm, and 780∼1000 nm. It was therefore feasible to identify the bruised samples by analyzing the reflectance spectra. However, from Figure 4(a), it can be seen that the reflectance curves of the intact samples and bruised samples overlapped with each other. Therefore, it was necessary to select discriminating feature wavelengths and to discard the wavelengths that were weakly correlated with the bruise information and would confuse the classifier.

4.2. Results of Wavelength Selection

The aforementioned three methods were applied to the original spectra. For the SPA method, the number of feature wavelengths was limited to the range of 5–30. The F-test was used to remove the wavelengths that were weakly correlated with the bruises. The significance level of the F-test was set to 0.25. According to the output of CARS, the minimum RMSECV was obtained at the 25th iteration, so the wavelength set of the 25th iteration was selected as the optimal feature wavelength subset. The threshold of RELIEF was 0.15. The wavelengths with weights less than the threshold were removed, and 18 wavelengths remained as the output of the RELIEF method. The wavelength selection processes of the three methods are shown in Figure 5.

The output subsets of the three methods were combined, followed by removing the duplicate wavelengths. The final results of the wavelength selection are listed in Table 2.

It can be seen from Table 2 that the wavelengths selected by the three methods overlapped in some regions. In particular, all three methods selected wavelengths in the bands of 400 nm∼460 nm and 960 nm∼1000 nm. It can be observed from Figure 4 that the spectra of the intact pears and the bruised samples had distinctly different reflectance values in these two regions. This indicated that all three methods were effective and could find the wavelengths that were highly correlated with the bruise. It can also be seen from Table 2 that the distributions of the wavelengths selected by the three methods were slightly different. For example, only CARS selected wavelengths in the band of 500 nm∼700 nm. It can be inferred that the three methods extract different features from different perspectives. Therefore, integrating the selected wavelengths can be expected to make the bruise identification more robust.

4.3. Results of the Bruise Detection

An ELM-based classifier was established for the detection of bruises on pears with the integrated wavelengths. After extensive experiments, the sigmoid function was selected as the activation function, and the number of hidden-layer neurons was set to 20. Both the training set and the test set contained 20 intact samples and 20 damaged samples. Table 3 lists the detection results of the ELM-based classifier. For comparison, the results of bruise detection using the wavelength subsets of the three individual methods are also provided in Table 3. The parameters of the ELM-based classifier were the same in all of these experiments.

The accuracy, precision, and recall rate, obtained according to equation (2), are also listed in Table 3. With each of the wavelength subsets, the ELM-based classifier identified the bruised samples with high accuracy. Accuracy, precision, and recall rate were all larger than 85%. From Table 3, it can be seen that the proposed framework always performed better than the classifiers using only a single wavelength selection method. This demonstrates the effectiveness of the proposed method.

5. Conclusion

In this work, a wavelength selection method based on feature-level integration was investigated. SPA, CARS, and RELIEF were applied to the spectra of pears, followed by a feature-level integration framework which can make full use of the complementarity of the wavelengths selected by different methods. Combined with an ELM-based classifier, high performance in detecting bruises on pears was achieved. The accuracy was 97.5%, the precision was 95.2%, and the recall rate was 100.0%, which was superior to the results of the three single selection methods. In conclusion, this method is feasible and might provide a reference for future research on bruise detection on Korla pears. However, the capacity of the method was verified only by experimental results. Further efforts are needed on its mathematical basis, aiming at interpreting the framework theoretically and finding an effective way to improve the performance. Moreover, many more pears with different sizes and shapes should be tested in the future to properly ascertain the identification capability of this method.

Data Availability

The hyperspectral images used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant no. 51565052) and the Open Research Foundation of Key Laboratory of Modern Agricultural Engineering, Tarim University, China (Grant no. TDNG20170301). It was also partially supported by a grant from the Postdoctoral Science Foundation of Zhejiang Province, China (Grant no. ZJ20180156).