Abstract

This paper enhances the recognition capabilities of facial component-based techniques through improved Viola–Jones component detection and the weighting of facial components. Our method starts with enhanced Viola–Jones face component detection and cropping: the facial components are detected and cropped accurately under all pose-changing circumstances. The cropped components are represented by the histogram of oriented gradients (HOG). The weight of each component was determined using a validation process, and these weights were combined by a simple voting technique. Three public databases were used: the AT&T, PUT, and AR databases. Several improvements are observed using the weighted voting recognition method presented in this paper.

1. Introduction

Face recognition is an important application of pattern recognition in which a database is used to train a classifier that identifies each person in it. A number of studies on the face recognition problem are surveyed in [1]. Studies in cognitive science have found that both local and global features can be used for face recognition [2, 8]. There is considerable evidence that holistic, configural, and facial component information all play a role in human face perception [2, 15]. Additional studies in humans have concluded that some facial components are more important and useful for recognizing faces than others; for example, the upper face is more important than the lower face [13, 16]. Researchers have approached face recognition through two main methods: component-based and global-based face recognition.

1.1. Component-Based Face Recognition

This method relies on training multiple models, one for each component representing an image. This technique has not been researched as intensively as the global-based technique, and existing approaches are therefore limited [17]. Most of them use a raw-pixel representation, which makes them less robust. Several other component-based face recognition methods are discussed in [4, 12, 16]. The facial components used for recognition in this paper are the eye pair, the nose, and the mouth. The Viola–Jones object detection framework [18] was used to crop the facial components.

1.2. Global-Based Face Recognition

In contrast to the component-based concept, the global method of face recognition relies on a single array to represent the face. A comparison of the best-known global-based techniques, such as eigenfaces, Fisher's discriminant analysis, and kernel PCA, can be found in [19, 20]. Global-based face recognition techniques are weak against pose changes; such a technique must either include a face alignment phase or be adapted to meet the standards of a component-based recognition technique [21].

1.3. 3D Face Recognition

In [22], the 3D facial surface is encoded into an indexed collection of radial strings emanating from the nose tip; a partial matching mechanism then effectively eliminates the occluding parts. Facial curves express the deformation of the region that contains them, which can be used to detect occluded facial areas. In [23], a novel automatic method for facial landmark localization is proposed; it relies on geometrical properties of the 3D facial surface and works both on complete faces displaying different emotions and in the presence of occlusions.

The remainder of this paper is organized as follows: Section 2 explains the methods we used for component detection and cropping. The HOG features are explained in Section 3. Section 4 presents and compares the results, and Section 5 concludes the paper.

2. Component Detection and Cropping

The detection functionality is a vital process in our face recognition method. The components help to collect unique data for every person in the database. Two methods of component detection are used: the Viola–Jones object detection framework [18] combined with geometrical approaches, and landmark detection using face alignment with an ensemble of regression trees [24]. Both methods are employed so that the facial components can be detected in all circumstances (changes in illumination and pose). Accurate component cropping leads to better features: the more tightly the crop fits the facial component, the less useless information is included in the representation, and therefore more unique data participate in the learning process.

2.1. Viola–Jones Object Detection Framework

The Viola–Jones object detection framework is used to train a model that detects the facial components (eye pair, nose, and mouth) needed for the recognition process. It consists of the following parts, which are explained in detail in [18]: Haar-like features, the integral image, weak and strong classifiers, AdaBoost, and the cascade structure.
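
As a concrete illustration, the sketch below applies pre-trained Haar cascades of this kind with OpenCV. Only the frontal-face cascade ships with opencv-python; the eye-pair cascade file name and the input image path are assumptions (e.g., an MCS component cascade from the OpenCV contrib sources), not the models trained for this paper.

```python
import cv2

# The frontal-face cascade ships with opencv-python; the eye-pair cascade
# file below is an assumed local file (e.g., an MCS component cascade from
# the OpenCV contrib sources), not the model trained in this paper.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eyepair_cascade = cv2.CascadeClassifier("haarcascade_mcs_eyepair_big.xml")

img = cv2.imread("subject_01.png")  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns (x, y, w, h) boxes, one per detection.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    face_roi = gray[y:y + h, x:x + w]
    eye_pairs = eyepair_cascade.detectMultiScale(face_roi, 1.1, 5)
```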

2.2. Enhancing Viola–Jones with Geometrical Approaches

Viola–Jones is a robust object detection system. However, trained models may still miss or fail to detect objects. Our recognition method relies on the accurate detection of three components (the eye pair, the nose, and the mouth), so misdetections cannot be tolerated. The component-based face recognition system needs the components to be cropped and represented accurately; a misdetection may introduce useless data into the learning process (as shown in Figure 1) and thus lower the recognition success rate. The eye pair is the most crucial of the three extracted components: it carries the major unique information about a person's face, and it is also the reference object used in this algorithm to detect and crop the remaining facial components. An eye pair location prediction model is trained to estimate where the eye pair might be found in a face. Figure 2 demonstrates some cases where the eye pair was not found, along with the detection result after the proposed solution. The nose and mouth detectors might miss a component because the search area did not include the whole object, or because multiple objects are detected in the search area. If the object is not detected, the search area is expanded gradually until an object is found. Multiple-object detection occurred in the mouth area and was resolved by picking the object with the maximum y coordinate. A sketch of both fallbacks follows.
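
The helpers below are a minimal sketch of these two fallbacks, assuming any detector with an OpenCV-style detectMultiScale interface; the helper names, expansion step size, and number of attempts are illustrative choices, not values from the paper.

```python
def detect_with_expansion(detector, gray, x, y, w, h, step=0.1, max_tries=5):
    """Run a cascade in a region of `gray`; if nothing is found, grow the
    region by `step` of its size per attempt, clamped to the image."""
    H, W = gray.shape[:2]
    for i in range(max_tries):
        pad_w, pad_h = int(i * step * w), int(i * step * h)
        x0, y0 = max(0, x - pad_w), max(0, y - pad_h)
        x1, y1 = min(W, x + w + pad_w), min(H, y + h + pad_h)
        boxes = detector.detectMultiScale(gray[y0:y1, x0:x1], 1.1, 5)
        if len(boxes) > 0:
            # Map the boxes back to full-image coordinates.
            return [(bx + x0, by + y0, bw, bh) for (bx, by, bw, bh) in boxes]
    return []


def pick_lowest(boxes):
    """Resolve multiple mouth detections by keeping the lowest box,
    i.e., the one with the maximum y coordinate."""
    return max(boxes, key=lambda b: b[1])
```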

2.3. The Area Selection Process

The concept of the geometrical approaches is to concentrate the search for the components in the right areas. For example, the nose cannot be above the eye pair; it is located somewhere beneath it. The same applies to the mouth: it has to be under both the nose and the eye pair. Geometrical approaches aim to narrow the search areas to where the nose and the mouth may occur [25]. The area selection algorithm (Figure 3) consists of the following steps (a sketch follows the list):
(a) The face is the first component to look for.
(b) The eye pair is detected within the cropped face image.
(c) The area under the eye pair within the cropped face image becomes the search area for the nose.
(d) A specific area is used to detect the mouth (Figure 3). In case of multiple mouth detections, the object with the larger y-axis value (the lowest object) is chosen as the mouth component.
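
Reusing the hypothetical helpers sketched above, the following code chains steps (a)-(d); the fractions of the face height used to bound the nose and mouth search areas are illustrative guesses, not the exact regions of Figure 3.

```python
def select_components(gray, face_cascade, eyepair_cascade,
                      nose_cascade, mouth_cascade):
    # (a) Detect the face first (this sketch assumes one face is found).
    fx, fy, fw, fh = face_cascade.detectMultiScale(gray, 1.1, 5)[0]
    face = gray[fy:fy + fh, fx:fx + fw]

    # (b) Detect the eye pair inside the cropped face.
    ex, ey, ew, eh = eyepair_cascade.detectMultiScale(face, 1.1, 5)[0]

    # (c) Search for the nose only below the eye pair, expanding on failure.
    noses = detect_with_expansion(nose_cascade, face, ex, ey + eh, ew, fh // 3)

    # (d) Search for the mouth in the lower face; keep the lowest candidate.
    mouths = detect_with_expansion(mouth_cascade, face,
                                   0, ey + eh + fh // 4, fw, fh // 3)
    mouth = pick_lowest(mouths) if mouths else None
    return (ex, ey, ew, eh), (noses[0] if noses else None), mouth
```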

Several problems arise when the Viola–Jones object detection framework is used for component detection:
(1) Failure to detect the eye pair.
(2) Failure to detect the nose.
(3) Detection of multiple false mouths.

Figures 4, 5, and 6 show the misdetection problems and the corrections produced by our area selection algorithm.

3. Features

Pixel patches extracted from facial images are often too large to build a robust classifier from directly [24], so they are converted into feature vectors. A feature descriptor is an array of data that describes an image or a part of an image; it provides unique information about the image and supports recognizing the object in that image. In this paper, we use the histogram of oriented gradients (HOG) features [26].

3.1. HOG Features

Histogram of oriented gradients (HOG) is a feature descriptor that uses oriented gradient information [26]. The steps for calculating HOG are as follows:
(1) For each pixel I(x, y), the horizontal and vertical gradient values are obtained by central differences (for example, by filtering with the kernels $[-1, 0, 1]$ and $[-1, 0, 1]^{T}$):
$g_x(x, y) = I(x + 1, y) - I(x - 1, y)$, $g_y(x, y) = I(x, y + 1) - I(x, y - 1)$.
(2) The gradient magnitude $m$ and orientation $\theta$ are computed by
$m(x, y) = \sqrt{g_x(x, y)^{2} + g_y(x, y)^{2}}$, $\theta(x, y) = \arctan\bigl(g_y(x, y) / g_x(x, y)\bigr)$.
(3) The histogram is constructed by accumulating the gradient magnitudes into bins by orientation.

The image is divided into several small spatial regions (cells), and for each cell a local histogram of the gradient orientations is calculated by accumulating votes into orientation bins. The best performance is achieved when the gradient orientation is quantized into 9 bins over 0°–180° (unsigned gradients). Moreover, each vote is weighted by the gradient magnitude, allowing the histogram to take into account the importance of the gradient at a given pixel. Finally, the HOG descriptor is obtained by concatenating all local histograms into a single vector.

However, it is necessary to normalize the cell histograms because the gradients are affected by illumination variations. Figure 7 shows an example of obtaining the HOG feature vector.
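
For reference, a HOG descriptor with exactly these ingredients (9 unsigned orientation bins, per-cell histograms, block normalization, concatenation into one vector) can be computed with scikit-image, as sketched below; the cell and block sizes are common defaults and are not values reported in the paper, and the input file name is hypothetical.

```python
import cv2
from skimage.feature import hog

patch = cv2.imread("eyepair_crop.png", cv2.IMREAD_GRAYSCALE)  # hypothetical crop

# 9 orientation bins over 0-180 degrees (unsigned gradients), per-cell
# histograms, L2-Hys block normalization to counter illumination changes,
# and concatenation of all local histograms into a single vector.
descriptor = hog(patch,
                 orientations=9,
                 pixels_per_cell=(8, 8),   # assumed cell size
                 cells_per_block=(2, 2),   # assumed block size
                 block_norm="L2-Hys",
                 feature_vector=True)
```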

4. Experimental Results

4.1. Face Databases Setups

Three databases were studied in this paper. They were chosen to test the recognition accuracy under low resolution, missing components, and pose changes. We used the PUT [27], AT&T [28], and AR [29] databases. The PUT database consists of 50 people, each with 22 colored facial images under different poses and illumination conditions. The AT&T database consists of images of 40 persons, each with ten different facial images. The AR database consists of 50 persons, each with 26 different colored facial images. Table 1 shows the different random training sets (k folds). For example, for the PUT database with k = 2, we took 11 of the 22 images for training and 11 for testing. For images with a missing component, that component is substituted with components detected within the same learning/testing set, as shown in Figure 8.

The HOG features are calculated on a per-patch basis for each image. A patch is a part of an image cropped out because it carries useful information, for example, the eye pair, the nose, or the mouth. The HOG features can be calculated for patches with different aspect ratios, but to make the best use of these features, we maintain a fixed aspect ratio for all patches of a given component within a single database. Ratios of 1 : 4, 1 : 1, and 1 : 2 were chosen for the eye pair, the nose, and the mouth, respectively (Figure 9).
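
One way to enforce such fixed ratios is to resize every crop of a given component to a common target size before computing HOG. The sketch below reads each ratio as height : width; the absolute pixel sizes and the helper name are assumptions for illustration.

```python
import cv2

# Target sizes (width, height) respecting the stated ratios, read as
# height : width; the absolute pixel counts are assumptions.
TARGET_SIZE = {
    "eyepair": (128, 32),  # 1 : 4
    "nose":    (48, 48),   # 1 : 1
    "mouth":   (64, 32),   # 1 : 2
}

def normalize_patch(patch, component):
    """Resize a cropped component so all patches of that component share
    one aspect ratio and size before HOG extraction."""
    return cv2.resize(patch, TARGET_SIZE[component],
                      interpolation=cv2.INTER_AREA)
```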

4.2. The Validation Process

The purpose of this process is to determine how well each component's model performs on a given database and to calculate its priority accordingly. The better a particular component's score, the higher its priority.

We have divided our training sets into 2 sets: training (75%) and validation (25%).

This technique uses the validation results to assign weights to each component. The higher the weight assigned to a certain component, the heavier the impact it has on the final classification result. The process is demonstrated in Figure 10.
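A minimal sketch of this weighting scheme follows; the choice of a one-nearest-neighbor classifier per component is an assumption for illustration, not the paper's stated classifier. Each component's weight is its validation accuracy, and the test-time decision is the class with the largest weighted sum of component votes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def train_weighted_voter(train, val):
    """train/val map component name -> (X, y) arrays of HOG features."""
    models, weights = {}, {}
    for comp, (X_tr, y_tr) in train.items():
        clf = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
        X_va, y_va = val[comp]
        models[comp] = clf
        # The component's validation accuracy becomes its voting weight.
        weights[comp] = accuracy_score(y_va, clf.predict(X_va))
    return models, weights

def predict_weighted(models, weights, test, classes):
    """test maps component name -> X; returns weighted-vote labels."""
    n = next(iter(test.values())).shape[0]
    index = {c: j for j, c in enumerate(classes)}
    scores = np.zeros((n, len(classes)))
    for comp, clf in models.items():
        for i, label in enumerate(clf.predict(test[comp])):
            scores[i, index[label]] += weights[comp]  # weighted vote
    return [classes[j] for j in scores.argmax(axis=1)]
```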

4.3. Results

The results for the three databases are shown in the following subsections.

4.3.1. The PUT Database Recognition Results

Using our validation process, Table 2 shows the priority of each component for the PUT database. Combining these priorities with a voting technique reached a recognition rate of 100% for k = 5 (Table 3).

4.3.2. The AT&T Recognition Results

Table 4 shows the priority of each component for the AT&T database. The voting recognition rate reached 96% for k = 5 (Table 5).

4.3.3. The AR Database Recognition Results

Table 6 shows the priority of each component in the AR database. The voting criteria improved the recognition rate from 73% to 87% for k = 2 and from 84% to 94% for k = 5 (Table 7).

4.3.4. Summary of Results

Three public databases were used:
(i) The AT&T database, with 40 subjects and 400 images.
(ii) The PUT database, with 50 subjects and 1100 images.
(iii) The AR database, with 50 subjects and 1300 images.

Our method has the following advantages:
(i) Excellent accuracy in detecting facial components under all pose-changing circumstances.
(ii) Improved recognition accuracy by combining multiple classifications using majority voting.

5. Conclusion

Enhancing the recognition capabilities of facial component-based techniques was the objective of this paper. This was done by improving Viola–Jones component detection and by weighting the facial components. Each component was given a weight through a validation process, and a voting technique incorporated all of these weights. The component-weighted technique made it possible to involve multiple features in the final decision, using one feature's strength to compensate for another feature's weakness. The improvement of the weighted voting method is demonstrated on the databases we used: the voting technique boosted the recognition success rate by distributing the weights among the facial components rather than relying on a single dominant component.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.