Abstract

This paper presents a new multisupervised coupled metric learning (MS-CML) method for low-resolution face image matching. Although coupled metric learning has achieved good performance in degraded face recognition, most existing coupled metric learning methods adopt only the category label as supervision, which easily leads to changes in the distribution of samples in the coupled space, and the accuracy of degraded image matching is seriously affected by these changes. To address this problem, we propose an MS-CML method to train linear and nonlinear metric models, respectively, which project face pairs of different resolutions into the same latent feature space, in which the distance of each positive pair is reduced and that of each negative pair is enlarged. In this work, we define a novel multisupervised objective function, which consists of a main objective function and an auxiliary objective function. The supervised information of the main objective function is the category label, which plays the major supervisory role. The supervised information of the auxiliary objective function is the distribution relationship of the samples, which plays an auxiliary supervisory role. Under the supervision of the category label and distribution information, the learned model can better deal with the intraclass multimodal problem, and the features obtained in the coupled space are more easily matched correctly. Experimental results on three different face datasets validate the efficacy of the proposed method.

1. Introduction

Image matching is an important task in computer vision and multimedia analysis, and numerous methods have been proposed to solve this issue under controlled conditions [1–4]. However, these methods often fail to achieve good performance in real-world degraded image matching. To address the problem of low resolution, some researchers improve image quality by image reconstruction [5–8]. However, image reconstruction methods usually require large amounts of data to train the model, which is complicated and time-consuming. More importantly, invalid information is easily introduced into the reconstructed image, and this information sometimes even interferes with image matching. Therefore, to avoid the interference of invalid information introduced by image reconstruction, some researchers have proposed image matching based on coupled metric learning.

In 2009, linear coupled metric learning (LCML) [9–11] was proposed, which achieved good performance in low-resolution image matching. In order to better measure nonlinear data, some nonlinear coupled metric learning methods [12–14] combining the kernel technique were proposed. In addition, Zhang et al. [15] proposed a distance metric based on coupled edge discriminant mapping, which implemented an effective metric for data with different attributes. Jiang et al. [16, 17] proposed coupled discriminant multimanifold analysis for low-resolution face recognition, which effectively improved the recognition performance. Since 2014, deep learning has become increasingly widespread and has been introduced to improve metric learning models [18, 19]. Deep coupled metric methods map samples to a coupled space using deep networks, which can better extract nonlinear features and improve the performance of image matching. However, deep networks require huge datasets for training and cannot be applied to small datasets.

Although there have been many studies on coupled metric learning, the existing metric methods still have some shortcomings. When only category labels are used as supervision, it is difficult to overcome the intraclass multimodal problem, as shown in Figure 1: the distribution of samples cannot be well maintained in the coupled space, which results in a decrease in image-matching accuracy. Therefore, to better use the spatial distribution information of samples to supervise the metric learning, this paper proposes a multisupervised coupled metric learning method fusing the category label and the distribution information of samples. Meanwhile, the proposed method is suitable for small sample datasets and avoids the deficiencies of deep coupled metric methods.

In the proposed metric learning method, we first construct the category correlation matrix and the distribution correlation matrix, respectively. Under the supervision of these two correlation matrices, the main objective function and the auxiliary objective function are defined. Then, the general objective function is generated by fusing these two objective functions, and the new correlation matrix fusing category label and distribution information is obtained. In the general objective function, the category label is used as the main supervised information and the distribution information is used as auxiliary supervised information. Finally, the objective function is transformed into a generalized eigenvalue problem to solve. The experiments on the Yale-B, ORL, and UMIST face datasets demonstrate that the multisupervised coupled metric extends distance metric methods and effectively improves image matching performance. To sum up, there are two main contributions:

(1) We propose a multisupervised coupled metric learning method fusing category label and distribution relationship, which overcomes a defect of existing coupled metric methods. The method can map data in different spaces to the coupled space and, at the same time, better maintain the original distribution relationship between samples in the new space.

(2) We construct the linear and nonlinear multisupervised coupled metric learning objective functions, respectively. The nonlinear coupled metric learning can better map samples in different spaces to the coupled space for more reliable image matching.

The rest of this paper is organized as follows. The definition and objective functions of coupled metric learning are described in Section 2. The multisupervised coupled metric learning method is described in detail in Section 3. The performance evaluation experiments based on three different face datasets are implemented in Section 4. Finally, Section 5 concludes our work.

2. Preliminary

Suppose that IX and IY are two images with the same dimension. Conventionally, an image should be converted into a vector before distance measurement. Assuming that the vector forms are x and y, the distance between IX and IY is defined as follows:

$$d(I_X, I_Y) = \sqrt{(x - y)^{T} A (x - y)} = \|x - y\|_{A}, \qquad (1)$$

where $\|\cdot\|$ denotes the norm operation and A is the distance metric matrix, which is a positive semidefinite matrix. It can be denoted as $A = W W^{T}$; then,

$$d(I_X, I_Y) = \sqrt{(x - y)^{T} W W^{T} (x - y)} = \left\|W^{T} x - W^{T} y\right\|_{2}. \qquad (2)$$

Obviously, the distance metric is converted to a norm calculation after feature transformation. Therefore, the unified description of the distance metric is as follows:

$$d(I_X, I_Y) = \|f(x) - f(y)\|_{2}, \qquad (3)$$

where $f(x) = W^{T} x$ denotes the feature transformation.
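As a small illustration of (1)–(3), the following sketch (in Python with NumPy, not from the paper's implementation, with illustrative matrix sizes) verifies numerically that a metric $A = W W^{T}$ is equivalent to a Euclidean distance after the transformation f(x) = W^T x:

```python
# Minimal sketch of equations (1)-(3): any factorization A = W W^T turns the
# metric distance into an ordinary Euclidean distance after projecting with W.
import numpy as np

rng = np.random.default_rng(0)
Dx, Dc = 10, 4                          # illustrative dimensions, not from the paper
W = rng.standard_normal((Dx, Dc))
A = W @ W.T                             # positive semidefinite metric matrix

x = rng.standard_normal(Dx)
y = rng.standard_normal(Dx)

d_metric = np.sqrt((x - y) @ A @ (x - y))           # equation (1)
d_transformed = np.linalg.norm(W.T @ x - W.T @ y)   # equations (2)-(3)
assert np.isclose(d_metric, d_transformed)
```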

2.1. Definition of Coupled Metric Learning

The coupled metric is a distance function for samples from different datasets. Assuming that the two images IX and IY have different resolutions, the corresponding vectors are $x \in R^{D_x}$ and $y \in R^{D_y}$. The coupled distance metric is defined as follows:

$$d_{c}(I_X, I_Y) = \sqrt{\big(f(x) - g(y)\big)^{T} A \big(f(x) - g(y)\big)}, \qquad (4)$$

where f(⋅) and g(⋅) are the transformation functions that map samples into the coupled space and the matrix A denotes the transformation and distance measurement in the coupled space.

After transformation, the samples with the same category label should be as close as possible in the coupled space. Therefore, the optimal objective function of coupled metric learning is written as

$$J = \min \sum_{i=1}^{N} \sum_{j=1}^{N} \big\|f(x_i) - g(y_j)\big\|^{2} S_{ij}, \qquad (5)$$

where S is a category correlation matrix between datasets X and Y, which is defined as follows: assuming that sample xi ∈ X has category label lx(i) and sample yj ∈ Y has category label ly(j), the element in the matrix S is defined as

$$S_{ij} = \begin{cases} 1, & l_x(i) = l_y(j), \\ 0, & l_x(i) \neq l_y(j). \end{cases} \qquad (6)$$
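As a brief illustration (a sketch, not the authors' code), the category correlation matrix in (6) can be built directly from two label vectors; the toy labels below are hypothetical:

```python
# Sketch of equation (6): S[i, j] = 1 when x_i and y_j share a class label.
import numpy as np

labels_x = np.array([0, 0, 1, 2])   # hypothetical labels of set X
labels_y = np.array([0, 1, 1, 2])   # hypothetical labels of set Y
S = (labels_x[:, None] == labels_y[None, :]).astype(float)
print(S)
```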

Obviously, based on different supervised information, we can obtain different matrices S, so as to achieve different coupled metric approaches. Additionally, the mapping functions f(⋅) and g(⋅) may be linear or nonlinear functions, corresponding to linear or nonlinear coupled metrics.

2.2. Linear Coupled Metric Learning (LCML)

Assume that the datasets $X = \{x_1, x_2, \ldots, x_N\}$ and $Y = \{y_1, y_2, \ldots, y_N\}$ contain N samples, respectively. $S \in R^{N \times N}$ is the correlation matrix between X and Y, and Sij is the element in matrix S. The optimal objective function of coupled metric learning is as follows:

$$J_{L} = \min \sum_{i=1}^{N} \sum_{j=1}^{N} \big\|f(x_i) - g(y_j)\big\|^{2} S_{ij}, \qquad (7)$$

where xi ∈ X and yj ∈ Y.

Assume that f(⋅) and g(⋅) are linear parameterized functions, which are defined as $f(x) = W_x^{T} x$ and $g(y) = W_y^{T} y$.

Let $X = [x_1, x_2, \ldots, x_N]$ and $Y = [y_1, y_2, \ldots, y_N]$; then (7) is changed to

$$J_{L} = \min_{W_x, W_y} \sum_{i=1}^{N} \sum_{j=1}^{N} \big\|W_x^{T} x_i - W_y^{T} y_j\big\|^{2} S_{ij}. \qquad (8)$$

After simplification, (8) can be rewritten as

$$J_{L} = \min_{W_x, W_y} \operatorname{tr}\!\left(W_x^{T} X S_x X^{T} W_x - 2\, W_x^{T} X S Y^{T} W_y + W_y^{T} Y S_y Y^{T} W_y\right), \qquad (9)$$

where Sx and Sy are diagonal matrices, the diagonal elements in Sx are the cumulative sums of the corresponding rows of matrix S, and the diagonal elements in Sy are the cumulative sums of the corresponding columns of matrix S.
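A short sketch of how (8) and (9) relate, under the assumption that the data matrices X (Dx × N) and Y (Dy × N) hold samples column-wise; this is a numerical check of the simplification, not the paper's implementation:

```python
# Evaluate the LCML objective in the pairwise form (8) and the trace form (9)
# and confirm that they agree for arbitrary Wx, Wy.
import numpy as np

def lcml_pairwise(Wx, Wy, X, Y, S):
    total = 0.0
    for i in range(X.shape[1]):
        for j in range(Y.shape[1]):
            diff = Wx.T @ X[:, i] - Wy.T @ Y[:, j]
            total += S[i, j] * diff @ diff
    return total

def lcml_trace(Wx, Wy, X, Y, S):
    Sx = np.diag(S.sum(axis=1))        # row sums of S
    Sy = np.diag(S.sum(axis=0))        # column sums of S
    return np.trace(Wx.T @ X @ Sx @ X.T @ Wx
                    - 2.0 * Wx.T @ X @ S @ Y.T @ Wy
                    + Wy.T @ Y @ Sy @ Y.T @ Wy)

rng = np.random.default_rng(1)
X, Y = rng.standard_normal((10, 6)), rng.standard_normal((5, 6))
S = rng.integers(0, 2, size=(6, 6)).astype(float)
Wx, Wy = rng.standard_normal((10, 3)), rng.standard_normal((5, 3))
assert np.isclose(lcml_pairwise(Wx, Wy, X, Y, S), lcml_trace(Wx, Wy, X, Y, S))
```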

2.3. Nonlinear Coupled Metric Learning (NCML)

Assume that f(⋅) and g(⋅) are nonlinear functions; that is, $f(x_i) = \phi_x(x_i)$ and $g(y_i) = \phi_y(y_i)$. The samples x1, x2, …, xN and y1, y2, …, yN are transformed to a high-dimensional feature space through the mapping function ϕ: Rn → F. We obtain the feature representations ϕx(x1), ϕx(x2), …, ϕx(xN) and ϕy(y1), ϕy(y2), …, ϕy(yN); then (7) can be rewritten as

$$J_{N} = \min_{W_x, W_y} \sum_{i=1}^{N} \sum_{j=1}^{N} \big\|W_x^{T} \phi_x(x_i) - W_y^{T} \phi_y(y_j)\big\|^{2} S_{ij}. \qquad (10)$$

Let Φx(x) = [ϕx(x1), ϕx(x2), …, ϕx(xN)] and Φy(y) = [ϕy(y1), ϕy(y2), …, ϕy(yN)]. Based on kernel space theory, the transformation matrices can be represented by Φx(x) and Φy(y); that is, $W_x = \Phi_x(x)\,\alpha$ and $W_y = \Phi_y(y)\,\beta$, where α and β are the coefficient matrices. Then, (10) can be rewritten as follows:

$$J_{N} = \min_{\alpha, \beta} \operatorname{tr}\!\left(\alpha^{T} \Phi_x^{T}\Phi_x H_x \Phi_x^{T}\Phi_x \alpha - 2\, \alpha^{T} \Phi_x^{T}\Phi_x S \Phi_y^{T}\Phi_y \beta + \beta^{T} \Phi_y^{T}\Phi_y H_y \Phi_y^{T}\Phi_y \beta\right), \qquad (11)$$

where Hx and Hy are diagonal matrices, and the diagonal elements in these two matrices are the cumulative sums of the corresponding rows of matrix S and the cumulative sums of the corresponding columns of matrix S, respectively. Constructing the inner product relationship by the kernel function, (11) is changed to

$$J_{N} = \min_{\alpha, \beta} \operatorname{tr}\!\left(\alpha^{T} K_x H_x K_x \alpha - 2\, \alpha^{T} K_x S K_y \beta + \beta^{T} K_y H_y K_y \beta\right), \qquad (12)$$

where the matrices Kx and Ky based on the Gaussian kernel are defined as follows:

$$K_x(i, j) = \exp\!\left(-\frac{\|x_i - x_j\|^{2}}{2\sigma^{2}}\right), \qquad K_y(i, j) = \exp\!\left(-\frac{\|y_i - y_j\|^{2}}{2\sigma^{2}}\right). \qquad (13)$$
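The kernel matrices in (13) can be computed as below; this is a generic Gaussian-kernel sketch (assuming the usual exp(−‖·‖²/2σ²) form), with samples stored column-wise:

```python
# Sketch of equation (13): Gaussian kernel matrix of one sample set.
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    # X: d x N matrix, one sample per column
    sq = np.sum(X * X, axis=0)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Kx and Ky would be built from the high- and low-resolution training sets:
# Kx = gaussian_kernel_matrix(X, sigma); Ky = gaussian_kernel_matrix(Y, sigma)
```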

3. The Multisupervised Coupled Metric Learning Method

3.1. The Class Relation Matrix and Distribution Relation Matrix

Through the analysis in Section 2, it is known that the matrix S reflects the correlation between sets X and Y, which plays a decisive role in metric learning. In this work, we fuse the category label and the distribution information of samples to construct a novel correlation matrix S, where the category label supervises the distance metric between the two datasets and the distribution information supervises the distance metric inside each dataset. The correlation matrices are first constructed from the category label and the distribution information, respectively.

3.1.1. Category Correlation Matrix

Suppose that sample xi ∈ X has class label lx(i) and sample yj ∈ Y has class label ly(j). The category correlation matrix is C; the element in this matrix is defined as follows:

$$C_{ij} = \begin{cases} 1, & l_x(i) = l_y(j), \\ 0, & l_x(i) \neq l_y(j). \end{cases} \qquad (14)$$

3.1.2. Distribution Correlation Matrix

Assuming that in set X sample xi is a k-nearest neighbor of sample xj, we connect samples xj and xi to each other. Similarly, the same operation is implemented in set Y. Let Tx and Ty be the distribution correlation matrices for set X and set Y, respectively; the elements in Tx and Ty can be calculated as follows:

$$T_x(i, j) = \begin{cases} \exp\!\left(-\dfrac{\|x_i - x_j\|^{2}}{t}\right), & x_i \text{ and } x_j \text{ are connected}, \\[2mm] 0, & \text{otherwise}, \end{cases} \qquad (15)$$

$$T_y(i, j) = \begin{cases} \exp\!\left(-\dfrac{\|y_i - y_j\|^{2}}{t}\right), & y_i \text{ and } y_j \text{ are connected}, \\[2mm] 0, & \text{otherwise}, \end{cases} \qquad (16)$$

where the parameter t is the average value of the distance between all samples.
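One way to build Tx under this definition is sketched below (symmetric k-NN connections weighted by a heat kernel whose parameter t is the average pairwise distance); the exact weighting in (15)–(16) is our reading, so treat the exponent form as an assumption:

```python
# Sketch of equations (15)-(16): k-NN graph with heat-kernel weights.
import numpy as np

def distribution_matrix(X, n_neighbors):
    # X: d x N matrix, one sample per column
    N = X.shape[1]
    sq = np.sum(X * X, axis=0)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    dists = np.sqrt(sq_dists)
    t = dists[np.triu_indices(N, k=1)].mean()          # average pairwise distance
    T = np.zeros((N, N))
    for j in range(N):
        neighbours = np.argsort(dists[:, j])[1:n_neighbors + 1]   # skip the sample itself
        T[neighbours, j] = np.exp(-sq_dists[neighbours, j] / t)
    return np.maximum(T, T.T)   # "connect xj and xi to each other": symmetrize
```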

3.2. The Linear Multisupervised Coupled Metric Learning (LMS-CML)

In the linear situation, in order to realize the joint supervision of category label and distribution information for coupled metric learning, we need to fuse the above correlation matrices. Therefore, two coupled metric objective functions need to be constructed: (1) category-label-based coupled metric objective function and (2) distribution-information-based coupled metric objective function. Then, the fusion of multisupervised information is realized through the fusion of these two objective functions.

To achieve the first objective function, combining (9) and (14), we construct the linear optimal objective function based on the category correlation matrix C:

$$J_{L} = \min_{W_x, W_y} \operatorname{tr}\!\left(W_x^{T} X C_x X^{T} W_x - 2\, W_x^{T} X C Y^{T} W_y + W_y^{T} Y C_y Y^{T} W_y\right), \qquad (17)$$

where Wx and Wy are the coupled transformation matrices; the diagonal matrices Cx and Cy are defined in the same way as Sx and Sy in (9), with C in place of S.

To achieve the second objective function, combining (9), (15), and (16), we construct the auxiliary optimal objective functions for set X and set Y in the linear case as follows:

$$J_{LX} = \min_{W_x} \sum_{i=1}^{N} \sum_{j=1}^{N} \big\|W_x^{T} x_i - W_x^{T} x_j\big\|^{2} T_x(i, j) = \min_{W_x} 2\operatorname{tr}\!\left(W_x^{T} X (T_{xx} - T_x) X^{T} W_x\right), \qquad (18)$$

$$J_{LY} = \min_{W_y} \sum_{i=1}^{N} \sum_{j=1}^{N} \big\|W_y^{T} y_i - W_y^{T} y_j\big\|^{2} T_y(i, j) = \min_{W_y} 2\operatorname{tr}\!\left(W_y^{T} Y (T_{yy} - T_y) Y^{T} W_y\right), \qquad (19)$$

where Txx and Tyy are diagonal matrices, and the diagonal elements of these two matrices are the cumulative sums of the corresponding rows of matrix Tx and the cumulative sums of the corresponding columns of matrix Ty, respectively.

Then, we combine (17), (18), and (19) to obtain the linear general objective function:

$$J = J_{L} + \gamma J_{LX} + \eta J_{LY} = \min_{W_x, W_y} \operatorname{tr}\!\Big(W_x^{T} X \big(C_x + 2\gamma(T_{xx} - T_x)\big) X^{T} W_x - 2\, W_x^{T} X C Y^{T} W_y + W_y^{T} Y \big(C_y + 2\eta(T_{yy} - T_y)\big) Y^{T} W_y\Big), \qquad (20)$$

where Cx + 2γ(Txx − Tx) and Cy + 2η(Tyy − Ty) are the novel correlation matrices fusing category and distribution information. The coefficients γ and η are within [0, 1] and are used to control the strength with which the distribution information supervises the coupled metric learning.
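The fused blocks in (20) are straightforward to assemble once C, Tx, and Ty are available; a sketch follows (the names Mx and My for the fused blocks are ours, not the paper's):

```python
# Sketch of the fused correlation matrices in equation (20):
# Mx = Cx + 2*gamma*(Txx - Tx), My = Cy + 2*eta*(Tyy - Ty).
import numpy as np

def fused_correlation_matrices(C, Tx, Ty, gamma, eta):
    Cx = np.diag(C.sum(axis=1))    # row sums of C
    Cy = np.diag(C.sum(axis=0))    # column sums of C
    Txx = np.diag(Tx.sum(axis=1))  # row sums of Tx
    Tyy = np.diag(Ty.sum(axis=0))  # column sums of Ty (Ty is symmetric)
    Mx = Cx + 2.0 * gamma * (Txx - Tx)
    My = Cy + 2.0 * eta * (Tyy - Ty)
    return Mx, My
```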

Let

$$Z = \begin{bmatrix} X & 0 \\ 0 & Y \end{bmatrix}, \qquad A = \begin{bmatrix} W_x \\ W_y \end{bmatrix}, \qquad G = \begin{bmatrix} C_x + 2\gamma(T_{xx} - T_x) & -C \\ -C^{T} & C_y + 2\eta(T_{yy} - T_y) \end{bmatrix}.$$

The linear objective function (20) is converted into

$$J = \min_{A} \operatorname{tr}\!\left(A^{T} Z G Z^{T} A\right). \qquad (21)$$

The solution of the coupled transformation matrix A is equivalent to the calculation of the following generalized eigenvalue equation:

$$Z G Z^{T} a = \lambda\, Z Z^{T} a, \qquad (22)$$

where λ is an eigenvalue and a is the corresponding eigenvector.

After obtaining matrix A, the transformation matrix Wx consists of the 1st to Dx-th rows of matrix A, and its size is Dx × Dc, where Dc is the dimension of the coupled space. The transformation matrix Wy consists of the (Dx + 1)-th to (Dx + Dy)-th rows of matrix A, and its size is Dy × Dc.
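The sketch below solves the linear model end-to-end under our assumptions about (21)–(22): the block matrices Z and G defined above, the fused blocks Mx and My from the earlier sketch, and a constraint matrix Z Zᵀ (the paper's exact constraint may differ). SciPy's generalized symmetric eigensolver returns eigenvalues in ascending order, so the first Dc eigenvectors are kept.

```python
# Sketch of solving equation (22) and splitting A into Wx and Wy.
import numpy as np
from scipy.linalg import eigh

def solve_lms_cml(X, Y, C, Mx, My, Dc, reg=1e-6):
    Dx, Dy, N = X.shape[0], Y.shape[0], X.shape[1]
    Z = np.block([[X, np.zeros((Dx, N))],
                  [np.zeros((Dy, N)), Y]])
    G = np.block([[Mx, -C],
                  [-C.T, My]])
    left = Z @ G @ Z.T
    right = Z @ Z.T + reg * np.eye(Dx + Dy)   # small ridge keeps the right side positive definite
    _, vecs = eigh(left, right)               # generalized eigenproblem, ascending eigenvalues
    A = vecs[:, :Dc]                          # Dc eigenvectors of the smallest eigenvalues
    return A[:Dx, :], A[Dx:, :]               # Wx (Dx x Dc), Wy (Dy x Dc)
```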

3.3. The Nonlinear Multisupervised Coupled Metric Learning (NMS-CML)

To implement the coupled metric of nonlinear data, we exploit the kernel technique to improve the linear general objective function. The nonlinear optimal objective function is constructed based on the category correlation matrix C and the distribution correlation matrices Tx and Ty.

Firstly, we combine (12) and (14) to get the nonlinear main objective function:

$$J_{N} = \min_{\alpha, \beta} \operatorname{tr}\!\left(\alpha^{T} K_x H_x K_x \alpha - 2\, \alpha^{T} K_x C K_y \beta + \beta^{T} K_y H_y K_y \beta\right), \qquad (23)$$

where α and β are the coupled transformation matrices; the diagonal matrices Hx and Hy are defined in the same way as Hx and Hy in (12), with C in place of S.

Then, combining (12), (15), and (16), the nonlinear auxiliary optimal objective functions are constructed as follows:

$$J_{NX} = \min_{\alpha} 2\operatorname{tr}\!\left(\alpha^{T} K_x (T_{xx} - T_x) K_x \alpha\right), \qquad (24)$$

$$J_{NY} = \min_{\beta} 2\operatorname{tr}\!\left(\beta^{T} K_y (T_{yy} - T_y) K_y \beta\right), \qquad (25)$$

where Txx and Tyy are diagonal matrices, and the diagonal elements of these two matrices are the cumulative sums of the corresponding rows of matrix Tx and the cumulative sums of the corresponding columns of matrix Ty, respectively.

Finally, combining (23), (24), and (25), we get the nonlinear general objective function:

$$J = J_{N} + \gamma J_{NX} + \eta J_{NY} = \min_{\alpha, \beta} \operatorname{tr}\!\Big(\alpha^{T} K_x \big(H_x + 2\gamma(T_{xx} - T_x)\big) K_x \alpha - 2\, \alpha^{T} K_x C K_y \beta + \beta^{T} K_y \big(H_y + 2\eta(T_{yy} - T_y)\big) K_y \beta\Big), \qquad (26)$$

where Hx + 2γ(Txx − Tx) and Hy + 2η(Tyy − Ty) are the correlation matrices fusing category and distribution information in the nonlinear situation. The coefficients γ and η control the supervision strength of the distribution information.

Let

$$K = \begin{bmatrix} K_x & 0 \\ 0 & K_y \end{bmatrix}, \qquad W = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \qquad G_N = \begin{bmatrix} H_x + 2\gamma(T_{xx} - T_x) & -C \\ -C^{T} & H_y + 2\eta(T_{yy} - T_y) \end{bmatrix}.$$

The solution of the optimal objective function is transformed into the solution of the generalized eigenvalue problem

$$K G_N K\, w = \lambda\, K K\, w, \qquad (27)$$

where λ is an eigenvalue and w is the corresponding eigenvector. The matrix W is constructed from the eigenvectors corresponding to the Dc smallest eigenvalues. According to the definition W = [α β]T, we obtain α and β from the first N rows and the last N rows of W, respectively.
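An analogous sketch for the kernel case follows, again with our assumed constraint matrix K K; here Mx and My denote the fused blocks Hx + 2γ(Txx − Tx) and Hy + 2η(Tyy − Ty), and α and β are recovered as the top and bottom N rows of the eigenvector matrix.

```python
# Sketch of solving equation (27) and splitting W into alpha and beta.
import numpy as np
from scipy.linalg import eigh

def solve_nms_cml(Kx, Ky, C, Mx, My, Dc, reg=1e-6):
    N = Kx.shape[0]
    K = np.block([[Kx, np.zeros((N, N))],
                  [np.zeros((N, N)), Ky]])
    G = np.block([[Mx, -C],
                  [-C.T, My]])
    left = K @ G @ K                          # K is symmetric, so K G K^T = K G K
    right = K @ K + reg * np.eye(2 * N)       # ridge keeps the right side positive definite
    _, vecs = eigh(left, right)               # ascending eigenvalues
    W = vecs[:, :Dc]
    return W[:N, :], W[N:, :]                 # alpha (N x Dc), beta (N x Dc)
```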

3.4. Complete Steps of Algorithm Implementation

Assume that there are two training datasets X and Y. The elements in set X are high-dimensional data corresponding to clear images; the elements in set Y are low-dimensional data corresponding to low-resolution images. Additionally, there are some low-dimensional data as test samples. The implementation steps of the proposed method are as follows.

3.4.1. Training Process

Step 1. Based on the supervision of the category label, establish the connection relations between set X and set Y; then, according to (14), calculate the category correlation matrix C.
Step 2. Establish local neighbor relations inside set X and set Y, respectively, and then compute the corresponding distribution correlation matrices Tx and Ty according to (15) and (16).
Step 3. Construct the main objective function JL in (17) or JN in (23), where JL is the linear objective function and JN is the nonlinear objective function.
Step 4. Construct the linear auxiliary objective functions JLX and JLY in (18) and (19), or the nonlinear auxiliary objective functions JNX and JNY in (24) and (25).
Step 5. According to (20) and (26), construct the general objective functions in the linear and nonlinear cases, respectively.
Step 6. Solve the optimal objective function: the linear optimal function is solved according to (22), and the nonlinear optimal function is solved according to (27).

Finally, we obtain the coupled metric matrix Wx (linear case) or α (nonlinear case), and the features of the training samples in set X are calculated, namely, $x_i^{f} = W_x^{T} x_i$ or $x_i^{f} = \alpha^{T} k_x(x_i)$, where $k_x(x_i) = [K_x(1, i), \ldots, K_x(N, i)]^{T}$, xi ∈ X, and i = 1, 2, …, N.
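A compact sketch of this feature step follows (our reading of the kernel-side projection, where the kernel vector of a training sample is the corresponding column of Kx):

```python
# Coupled-space features of the training set X.
import numpy as np

def linear_features(Wx, X):
    return Wx.T @ X              # Dc x N: one column per training sample

def kernel_features(alpha, Kx):
    return alpha.T @ Kx          # Dc x N: column i uses Kx[:, i] in place of x_i
```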

3.4.2. Testing Process

Step 1. Identify the test samples. Let IT denote one low-resolution test image (in vector form). The coupled metric matrix Wy or β is used to extract the feature of the test image; namely, $I^{f} = W_y^{T} I_T$ or $I^{f} = \beta^{T} k_y(I_T)$, where $k_y(I_T) = [k(y_1, I_T), \ldots, k(y_N, I_T)]^{T}$.
Step 2. After coupled mapping, the features of the training samples and the test sample have the same dimension, so image matching is converted to the comparison of the features $x_i^{f}$ and $I^{f}$.
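The matching itself reduces to a nearest-neighbor search in the coupled space; the sketch below (with a hypothetical train_labels array) classifies one projected test feature:

```python
# Sketch of Step 2: nearest-neighbor matching of coupled-space features.
import numpy as np

def match(test_feature, train_features, train_labels):
    # train_features: Dc x N features of set X; test_feature: length-Dc vector
    dists = np.linalg.norm(train_features - test_feature[:, None], axis=0)
    return train_labels[np.argmin(dists)]
```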

Figure 2 illustrates the complete steps of the proposed method.

4. Experiment and Analysis

In this section, we evaluated the linear and nonlinear multisupervised coupled metric methods on recognition tasks involving resolution-inconsistent matching. The face datasets and experimental settings are briefly described in Section 4.1. The linear and nonlinear coupled metric experiments are presented in Sections 4.2 and 4.3. Face matching experiments with different resolutions are presented in Section 4.4. Finally, Section 4.5 presents comparative experiments between our method and other methods. Detailed experiments and results are described as follows.

4.1. Datasets and Experimental Settings

To prove the validity of the proposed coupled metric method, the Yale-B [20], ORL [21], and UMIST [22] face datasets were used for the low-resolution image matching experiments. Table 1 briefly shows the situation of these three face datasets.

In the Yale-B face dataset, each person's images are divided into 5 subsets based on illumination changes. We randomly selected half of the images from these 5 subsets as training samples, a total of 2880 faces. The rest of the face images were used as the normal test samples.

In the ORL face dataset, we adopted 5 images of each volunteer as training samples, a total of 200 faces. The remaining 200 faces were used as normal test samples.

In the UMIST face dataset, we randomly selected 18 faces of each person as training samples, a total of 360 faces. The other faces were used as normal test samples.

Images in the above three face datasets are all high-quality samples, and there are no low-resolution faces. To verify the effect of the proposed method in low-resolution image matching, we need to artificially produce some low-quality images. In the experiments, we used blurring and undersampling to produce low-resolution face images. For example, the resolution of the clear face images is normalized to 64 × 64, and the resolution of the corresponding low-quality faces is 16 × 16. So, the training dataset includes two sample sets; namely, the high-quality images form the set X and the low-quality images form the set Y. Then, these two sets are used to learn the coupled transformation matrices. In the testing process, the test images are also low-resolution images, which are produced by blurring or undersampling the normal test face images.
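For reference, a minimal sketch of such a degradation pipeline is given below (Gaussian blur followed by 4× undersampling from 64 × 64 to 16 × 16); the blur width is an assumption, since the paper does not state it.

```python
# Sketch of producing a low-resolution face from a 64x64 high-quality face.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def degrade(face_64x64, blur_sigma=1.5, factor=4):
    blurred = gaussian_filter(face_64x64.astype(float), sigma=blur_sigma)
    return zoom(blurred, 1.0 / factor, order=1)   # 64x64 -> 16x16
```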

In Sections 4.2, 4.3, and 4.5, the resolution of the degraded face image used in the training and testing process is set to 16 × 16. We repeated our experiments 10 times and took the average as the final matching results.

4.2. Linear Multisupervised Coupled Metric Experiments

In the proposed LMS-CML, there are three influencing factors: (1) the number k of nearest neighbors in the calculation of the distribution correlation matrix, (2) the dimension of preserved features in solving the generalized eigenvalue problem, and (3) the coefficients γ and η in the general objective function. Therefore, in the following experiments, we discuss and analyze the impact of these factors on image recognition performance.

4.2.1. Experiment 1

Firstly, we analyze the joint influence of the number k of nearest neighbors and the dimension of preserved features. Figure 3 illustrates the experimental results.

It can be seen that as the dimension of preserved features increases, the recognition rate rises rapidly and then slowly decreases. In the initial stage, as the feature dimension increases, more effective classification features are retained, so the recognition rate rises rapidly. When the feature dimension reaches the optimal value, the best recognition effect is achieved. After that, as the feature dimension continues to increase, unnecessary interference factors are introduced into the classification features, so the recognition rate slowly decreases.

In the Yale-B face dataset, when the dimension of preserved features is 15 and the number of nearest neighbors is 5, the highest recognition accuracy reaches 89.06%. In the ORL face dataset, when the dimension of reserved features is 35, the recognition rate is relatively high; when the number of neighbors is 7, the optimal recognition rate is 93%. In the UMIST face dataset, due to the great pose variation, it is necessary to reserve more features to obtain a better recognition effect; when the dimension of reserved features is 20 and the number of nearest neighbors is 9, the optimal recognition rate reaches 92.16%. Based on the above experiments, we see that the number of nearest neighbors has some influence on the recognition accuracy, but it does not change the overall trend of the recognition rate curves.

4.2.2. Experiment 2

In the linear general objective function, the coefficients γ and η control the strength of the supervision of distribution information. So in experiment 2, we analyzed their impact on the recognition effect. Figure 4 shows the experiment results.

Obviously, on different face datasets, the recognition results reach their optimal values when the parameters γ and η take different values, but the variation curves of the recognition results show a consistent overall trend. The optimal value of parameter γ lies within [0.6, 0.9], and the optimal value of parameter η lies within [0.3, 0.5]. It can be seen that the impact of the distribution information of high-resolution faces is stronger than that of low-resolution faces.

Additionally, we can see that when η = 0 and γ = 0, that is, when the distribution information is not used as supervision and only the category label is used, the recognition effect is not the worst, but it is not optimal either. This indicates that using the distribution information of samples as supervision effectively improves the performance of the coupled metric method.

As these two parameters increase, the recognition performance of LMS-CML decreases slightly. There are two reasons for this variation: (1) when more weight is given to the distribution information of samples, the main supervisory role of the category label is weakened; (2) in the construction of the distribution correlation matrix, the neighbor relations of samples within the same class are easily interfered with by samples of other categories. Therefore, as η and γ increase, the proportion of the auxiliary functions in the general objective function increases, which enlarges the interference intensity and affects the recognition effect.

4.3. Nonlinear Multisupervised Coupled Metric Experiments

Different from linear coupled metric learning, there are four influencing factors in the nonlinear coupled metric learning method: in addition to the three factors described in Section 4.2, the adjustable factor σ of the Gaussian kernel function is the fourth. In experiment 1 of this section, we mainly analyze the joint influence of the adjustable factor σ and the dimension of reserved features. The influence of the coefficients γ and η in the general objective function is discussed in experiment 2. In addition, the experimental results show that the influence of the number of nearest neighbors in the nonlinear case is the same as that in the linear case.

4.3.1. Experiment 1

In the following experiment, we analyzed the impact of different adjustable factors σ and varied dimensions of reserved features on the recognition rate. The results are shown in Figure 5.

Figure 5 shows the average recognition rates of the NMS-CML method on the different datasets. Compared with the experimental results in Section 4.2.1, the recognition effect is obviously improved in the nonlinear case. Furthermore, to achieve an optimal recognition rate, the dimension of reserved features in the nonlinear case is higher than that in the linear case. When the adjustable factor σ is near its optimal value, the recognition rate improves dramatically, which indicates that the adjustable factor has a greater impact on the recognition performance.

The curves indicate that when the dimension of features is 25 and σ = 0.5, the optimal face recognition rate of 91.15% is obtained on the Yale-B face dataset. When the dimension of features is 30 and σ = 0.3, the best recognition rate is 97% on the ORL face dataset. On the UMIST face dataset, when the dimension of features is 25 and σ = 0.7, the best recognition rate is 95.59%. These results illustrate that NMS-CML can better extract low-resolution and pose-varied face features, but the improvement in the recognition of low-resolution images under severe illumination is limited.

4.3.2. Experiment 2

We also experimented with and discussed the effects of the parameters γ and η in the nonlinear general objective function. The conclusion is that the supervisory effect of the distribution relationship between high-resolution images is stronger than that between low-resolution images. The optimal values of γ and η are 0.65 and 0.3, respectively, on the Yale-B face dataset; 0.7 and 0.3 on the ORL face dataset; and 0.7 and 0.45 on the UMIST face dataset. In general, the category label has the strongest supervisory effect in MS-CML, and the distribution information of samples in the high-resolution image set provides stronger supervision than that of the low-resolution image set.

4.4. Matching Experiments of Images with Different Resolutions

In the above experiments, the resolution of high-quality face images is set to 64 × 64 pixels, and the resolution of low-resolution face images is set to 16 × 16 pixels. To better illustrate the effectiveness of MS-CML method, in this section, we performed coupled metric experiments of low-quality face images with different resolutions. The resolution of low-quality face images is set to 32 × 32, 16 × 16, and 8 × 8 pixels. Figure 6 shows some faces with different resolutions, and the experiment results are summarized in Table 2.

The experimental results show that when the resolution of the low-quality faces is set to different values, the recognition performance of MS-CML is affected to some extent. When the resolution of the low-quality face images is set to 8 × 8 pixels, the face images contain less effective classification information; thus, the recognition rate drops more. In addition, due to the strong illumination interference in the Yale-B face dataset and the obvious pose variation in the UMIST face dataset, the effective classification information is further reduced, and the recognition rate decreases more than on the ORL face dataset. Moreover, the nonlinear coupled metric has a stronger feature extraction capability, so LMS-CML has a much lower recognition rate than NMS-CML.

4.5. Comparison Experiments with Other Methods

In order to evaluate the performance of the proposed coupled metric methods, we conducted comparison experiments with the feature extraction methods [23–26] applied after image reconstruction [27, 28] and with the methods in [9, 11, 13–15], respectively. The experimental results are shown in Table 3.

We can see that the recognition effect of feature extraction after image reconstruction is not ideal. The main reason is that some false information is introduced during image reconstruction, which brings no obvious improvement to the recognition effect and even interferes with recognition. In addition, the same person's face images exhibit large illumination and pose changes; this intraclass multimodal problem affects the metric learning and the final classification results. The methods in [9, 14] are distance metric learning methods based only on category label information, which cannot overcome the intraclass multimodal problem, so their recognition rates are lower. The distance metric methods in [11, 13] are based on the locality-preserving relation within the same class, which is advantageous for solving the intraclass multimodal problem, so compared with CML and KCML, their recognition effect is improved. The CMDM [15] combines discriminant information with the data distribution of local neighbors, so its recognition effect is similar to that of the proposed LMS-CML method. However, CMDM does not deal with nonlinear data, so its final recognition effect is not as good as that of our proposed NMS-CML method.

Through the comparison experiments, we draw two conclusions: (1) the fusion of category label and distribution information can better supervise the learning of the metric matrix; (2) the nonlinear coupled transformation can extract the nonlinear essential features more effectively. Therefore, in this paper, we propose the linear and nonlinear MS-CML methods, which make full use of the supervision of category label and distribution information. Finally, we obtain excellent recognition results in face recognition with low resolution and pose variations.

5. Conclusions

In this paper, a multisupervised coupled metric learning method was proposed to obtain the coupled mapping matrix, which can be used for low-resolution image matching. In the metric learning, we constructed the linear and nonlinear coupled metric learning objective functions, respectively. Compared with linear metric learning, the mapping matrix obtained by nonlinear metric learning can better improve the accuracy of low-resolution image matching. In addition, experimental results prove that the proposed MS-CML can overcome the intraclass multimodal problem of existing metric methods and effectively improve the matching accuracy of low-resolution images under the condition of small samples.

Data Availability

The experimental data of the face database settings and recognition rates used to support the findings of this study are included within the article. The download addresses of the three face databases have been added to the references, and the databases can be downloaded from there if necessary.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was funded by the Visiting Project Funds of Shandong University of Technology (Grant no. 20180831); Integration Funds of Shandong University of Technology and Zhangdian District (Grant no. 118228); and the Natural Science Foundation of China (Grant no. 61601266).