Abstract

To discover the influence of commercial videos’ low-level features on their popularity, a feature selection method should be applied, after analyzing the source data and the audiences’ evaluations, to identify the video features that most strongly influence those evaluations. After extracting the videos’ low-level features, this paper improves the widely used Correlation-Based Feature Selection (CFS) method and proposes an algorithm named CFS-Spearman, which combines the Spearman correlation coefficient with classical CFS to select features. Four datasets from the UCI machine learning repository were employed as experiment data, and the results were compared with those of traditional CFS and Minimum Redundancy Maximum Relevance (mRMR), using an SVM classifier for testing. Finally, the proposed method was applied to commercial videos’ feature selection and the most influential feature set was obtained.

1. Introduction

As a major kind of commercial multimedia, commercial videos and their popularity concern the related companies and producers. Many factors influence the audiences’ evaluation of commercial videos, such as the actors, director, and story, but evaluating videos from these factors is too subjective, because they cannot be quantified in the evaluation procedure. Researchers should therefore recognize the importance of designing a model that analyzes the videos and computes their evaluations [1]. Article [2] proposed an objective video quality evaluation method based on motion and disparity information, and article [3] presented a video quality evaluation method based on Quaternion Singular Value Decomposition, but only a few video features were used in [2, 3]. The video features should include color features, motion features, and shot features, all of which are likely to influence the audiences’ evaluation of the videos. However, the most influential feature set and its relationship to video popularity are still unclear. From all the extracted features, we can select the most influential feature set with feature selection methods, according to the commercial videos’ evaluation data. Of course, feature dimension reduction methods could also be used here.

Different feature sets might be selected by different feature selection or dimension reduction methods. Dimension reduction algorithms, such as PCA [4, 5], FDA [6], and KPCA [7, 8], reduce the number of features, whereas feature selection algorithms select an optimal subset from the original features. Compared with dimension reduction, feature selection preserves the physical meaning of the original features, which makes it easier to analyze the data and the relationship between the features and the videos’ popularity [9].

The main idea of feature selection is to keep a few valuable features and remove the useless ones from all the extracted features. Methods such as embedded methods, Relief [10], mRMR [11], and CFS [12, 13] are widely used.

Among the embedded methods, sparsity regularized feature selection methods are widely used in research areas such as appearance modeling in visual tracking [14–16]. In these methods, $\ell_1$-norm [17] (called Lasso, proposed by Tibshirani in 1996) and $\ell_{2,1}$-norm based regularization models have been studied for selecting features with joint sparsity across different tasks [18]. These methods select features by adding a weighted penalty term to the objective function of the machine learning model, which restricts the weight of each feature. Feature selection is done during model training, and features whose coefficients are trained to zero are considered redundant. Recently, $\ell_p$-norm and $\ell_{2,p}$-norm based regularization methods have attracted more and more attention because they can obtain sparser solutions than methods based on the $\ell_1$-norm and $\ell_{2,1}$-norm [19]. These algorithms trade off a data-fitting loss term against a sparsity term, so a residual inevitably remains in the loss function; however, little is known about the impact of such a residual on the feature selection [20].

Among the other methods, the Relief algorithm considers only the correlation between each feature and the class, not the correlations between features, so the selected feature set is not optimal [21]. The mRMR algorithm considers both the feature-feature and the feature-class correlations to find the best feature set [22]; for the classifier, all the features selected by mRMR contribute equally, and the feature set is selected from the original features. The main idea of CFS is to select the feature set with low correlation between features and high correlation between each feature and the class; this removes the redundant features and the features that are not closely related to the class. Huanjing Wang and Ningqing Sun used the Pearson linear correlation coefficient in [21, 22] to compute the feature-feature and feature-class correlations, but the Pearson formula measures only linear correlation. For continuous data, discretization or kernel density estimation has to be employed, which introduces probability estimation error.

The effectiveness of the correlation calculation is therefore the key to the success of CFS. Several correlation measures have attracted researchers’ attention, and the best one should be selected according to experimental results. Besides the Pearson coefficient, other correlation measures are in use. Marie-Therese Puth [23] used the Spearman rank correlation coefficient to describe the correlation of two vectors, and the experimental results showed that it outperforms the Pearson coefficient. Xiaoyuan Xu [24] employed the Spearman rank correlation coefficient to describe the correlation between wind speed features. Jing Feng [25] proposed a nonparametric method based on the Spearman rank correlation coefficient to measure storage degradation failure.

The Spearman rank correlation coefficient has not yet been widely used in feature selection. In this paper, after extracting the commercial videos’ low-level features, including color features, motion features, and shot features, a CFS-Spearman algorithm is presented and applied to four datasets from the UCI machine learning repository: “Breast-cancer”, “Glass”, “Bank”, and “Credit”. The experimental results are compared with CFS and mRMR, and the LibSVM classifier is utilized to test the effectiveness of CFS-Spearman. The method is then employed to select the low-level features of commercial videos in order to predict their popularity. The results show that the proposed method outperforms CFS, mRMR, and $\ell_p$-norm based sparsity regularized feature selection.

2. Video Low-Level Features Extraction

2.1. Color Features

Color is an important visual feature. The color feature set consists of 10 features: the means and variances of Brightness, Contrast, Saturation, Colorfulness, and Simplicity. Brightness is calculated by averaging the brightness of every pixel of every frame in HSV space, and Saturation is computed in the same way. Contrast is computed by a formula from [26] in which r, g, and b represent the red, green, and blue values of a pixel and Var is the variance function.
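For illustration, a minimal Python sketch of these per-frame statistics follows (OpenCV is used for decoding, although the paper does not name its tooling; treating Contrast as the variance of the gray levels is our assumption):

```python
# Sketch of the per-frame color statistics described above.
import cv2
import numpy as np

def frame_color_stats(frame_bgr):
    """Return (brightness, saturation, contrast) for one BGR frame."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    brightness = hsv[:, :, 2].mean()          # mean of the V channel
    saturation = hsv[:, :, 1].mean()          # mean of the S channel
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    contrast = gray.var()                     # variance of gray levels (assumed reading of [26])
    return brightness, saturation, contrast

def video_color_features(path):
    """Mean and variance of each statistic over all frames of a video."""
    cap = cv2.VideoCapture(path)
    stats = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        stats.append(frame_color_stats(frame))
    cap.release()
    stats = np.array(stats)                   # shape: (n_frames, 3)
    return stats.mean(axis=0), stats.var(axis=0)
```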

Colorfulness is a parameter reflecting the combination of an image’s colors. It is defined by a formula from [27] in which Mean is the function that averages its input.
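A sketch of a colorfulness measure in this spirit is given below. It follows the well-known Hasler-Süsstrunk opponent-color metric; whether [27] uses exactly these weights is an assumption:

```python
# Opponent-color colorfulness measure (Hasler-Susstrunk style).
import numpy as np

def colorfulness(frame_bgr):
    b = frame_bgr[:, :, 0].astype(float)
    g = frame_bgr[:, :, 1].astype(float)
    r = frame_bgr[:, :, 2].astype(float)
    rg = r - g                                # opponent component R - G
    yb = 0.5 * (r + g) - b                    # opponent component (R + G)/2 - B
    std_root = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean_root = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return std_root + 0.3 * mean_root
```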

To attract the audiences’ attention during movie making, the director and the cameraman always keep the scene simpler than the objects in it. Simplicity is used to measure this characteristic in some articles; it is defined in [28], and the final Simplicity is the mean value over all frames.

2.2. Motion Features

Motion features reflect the rate of change of the scene or of the objects in the videos; they can be regarded as the moving speed between the camera and the objects while shooting. In this article, the motion features are calculated as follows. First, every frame is divided into blocks and the pixel barycenter of every block is computed. Then frames n and n+1 are compared to obtain the barycenter displacement of the corresponding blocks in the two neighboring frames. The motion-feature mean is the mean of the barycenter coordinate changes and the motion-feature variance is their variance [29].
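A possible reading of this procedure in Python is sketched below; the block grid size and the intensity-weighted barycenter are our assumptions, and `frames` is assumed to be a list of grayscale frames:

```python
import numpy as np

def block_barycenters(gray, n_blocks=8):
    """Intensity-weighted barycenter (row, col) of each block of a gray frame."""
    h, w = gray.shape
    bh, bw = h // n_blocks, w // n_blocks
    centers = []
    for i in range(n_blocks):
        for j in range(n_blocks):
            block = gray[i*bh:(i+1)*bh, j*bw:(j+1)*bw].astype(float)
            total = block.sum() + 1e-9
            rows, cols = np.indices(block.shape)
            centers.append(((rows * block).sum() / total,
                            (cols * block).sum() / total))
    return np.array(centers)                  # shape: (n_blocks**2, 2)

def motion_features(frames):
    """Mean and variance of barycenter displacement between neighbor frames."""
    shifts = []
    prev = block_barycenters(frames[0])
    for frame in frames[1:]:
        cur = block_barycenters(frame)
        shifts.append(np.linalg.norm(cur - prev, axis=1).mean())
        prev = cur
    shifts = np.array(shifts)
    return shifts.mean(), shifts.var()
```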

2.3. Shot Features

Shot features are also important for video evaluation. To segment a video into shots, the key frames, which lie at the boundaries of the shots, are selected first [30]; we compare the color histograms of every pair of neighboring frames to calculate their similarity. After key frame selection, four features are computed: “Shot length mean”, “Shot number”, “Shot length variance”, and “Video length”.
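A minimal sketch of this shot segmentation follows, assuming OpenCV histograms; the similarity threshold (0.6 here) is illustrative rather than taken from the paper:

```python
import cv2
import numpy as np

def hist_similarity(f1, f2, bins=16):
    """Correlation between the color histograms of two BGR frames."""
    h1 = cv2.calcHist([f1], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    h2 = cv2.calcHist([f2], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    return cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)

def shot_features(frames, fps, threshold=0.6):
    """Place shot boundaries where neighbor-frame similarity drops below a threshold."""
    boundaries = [0]
    for i in range(1, len(frames)):
        if hist_similarity(frames[i - 1], frames[i]) < threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    lengths = np.diff(boundaries) / fps       # shot lengths in seconds
    return {"shot_number": len(lengths),
            "shot_length_mean": lengths.mean(),
            "shot_length_var": lengths.var(),
            "video_length": len(frames) / fps}
```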

In total, 16 features are obtained, as shown in Figure 1: “Brightness mean”, “Contrast mean”, “Saturation mean”, “Colorfulness mean”, “Simplicity mean”, “Brightness variance”, “Contrast variance”, “Saturation variance”, “Colorfulness variance”, “Simplicity variance”, “Motion mean”, “Motion variance”, “Trailer length”, “Shot number”, “Shot length mean”, and “Shot length variance”.

3. Feature Selection Using CFS

Once the features are extracted, the relationship between the features and the video evaluation is still unclear, because some features influence the viewers’ evaluation while others do not. We should select the feature set that most influences the videos’ evaluation; several feature selection methods are introduced here.

3.1. mRMR

mRMR is a typical feature selection method that uses mutual information to measure the correlation between pairs of features and between each feature and the class. Its two criteria are

$$\max D(S, c), \quad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c)$$

$$\min R(S), \quad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j)$$

in which $S$ is the feature set, $|S|$ is the feature number, $c$ is the class, $I(x_i; c)$ is the mutual information between feature $x_i$ and class $c$, and $I(x_i; x_j)$ is the mutual information between features $x_i$ and $x_j$.

The mutual information between $x$ and $y$ here is defined as

$$I(x; y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}\, dx\, dy.$$

Combining the two criteria gives the feature selection criterion

$$\max_{S} \Phi(D, R), \quad \Phi = D - R.$$
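A greedy implementation of this criterion might look as follows. This is a sketch, not the authors’ code; scikit-learn’s `mutual_info_score` stands in for the mutual information, and continuous features are assumed to be discretized beforehand:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr(X, y, n_select):
    """Greedy mRMR: maximize I(x;c) minus the mean I(x;x_j) over selected x_j."""
    n_features = X.shape[1]
    relevance = [mutual_info_score(X[:, i], y) for i in range(n_features)]
    selected = [int(np.argmax(relevance))]    # start from the most relevant feature
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for i in range(n_features):
            if i in selected:
                continue
            redundancy = np.mean([mutual_info_score(X[:, i], X[:, j])
                                  for j in selected])
            score = relevance[i] - redundancy  # the D - R criterion
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```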

3.2. CFS Algorithm

CFS is a simple feature selection method. It calculates the correlation between every pair of features and between each feature and the class in order to select the features most closely related to the class. As shown in (9) and (10), the method measures the correlation of two variables $x$ and $y$ with $N$ samples by the Pearson correlation coefficient

$$r_{xy} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}} \qquad (9)$$

and tries to maximize the merit

$$M_S = \frac{k\, \overline{r_{cf}}}{\sqrt{k + k(k-1)\, \overline{r_{ff}}}} \qquad (10)$$

in which $M_S$ is the merit of a feature set $S$ with $k$ features, $\overline{r_{cf}}$ is the average correlation between the features and the class, and $\overline{r_{ff}}$ is the average correlation between pairs of features. According to formula (10), $M_S$ grows when the average correlation between features decreases and the average correlation between the features and the class increases; the feature set is then an optimized one. The correlation measure used here is the Pearson correlation coefficient.
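The merit of formula (10) and the usual greedy forward search can be sketched as follows, assuming the pairwise correlations have already been computed as absolute values:

```python
import numpy as np

def cfs_merit(corr_fc, corr_ff, subset):
    """Merit M_S of formula (10). corr_fc[i]: feature-class correlation;
    corr_ff[i][j]: feature-feature correlation (absolute values)."""
    k = len(subset)
    r_cf = np.mean([corr_fc[i] for i in subset])
    r_ff = (np.mean([corr_ff[i][j] for i in subset for j in subset if i != j])
            if k > 1 else 0.0)
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(corr_fc, corr_ff, n_select):
    """Greedy forward search: add the feature that raises the merit most."""
    selected, remaining = [], list(range(len(corr_fc)))
    while len(selected) < n_select:
        best = max(remaining,
                   key=lambda i: cfs_merit(corr_fc, corr_ff, selected + [i]))
        selected.append(best)
        remaining.remove(best)
    return selected
```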

3.3. Spearman Rank Correlation Coefficient

The Pearson correlation coefficient is employed in traditional CFS, but other correlation measures can be tested, and the Spearman rank correlation coefficient is one of them. Given two random variables $X$ and $Y$ with $N$ paired samples $(x_i, y_i)$, $i = 1, \dots, N$, let $R_i$ and $S_i$ be the ranks of $x_i$ and $y_i$ in their respective samples, and let $\bar{R}$ and $\bar{S}$ be the average ranks. The coefficient is then defined as

$$\rho_s = \frac{\sum_{i=1}^{N}(R_i - \bar{R})(S_i - \bar{S})}{\sqrt{\sum_{i=1}^{N}(R_i - \bar{R})^2 \sum_{i=1}^{N}(S_i - \bar{S})^2}}.$$

The Spearman rank correlation coefficient describes the monotonic dependence of the variables $X$ and $Y$, and the direction is given by the sign of $\rho_s$: it is positive when $Y$ increases as $X$ increases, negative in the opposite case, and zero when $Y$ shows no monotonic dependence on $X$.
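The coefficient is directly available in SciPy; the snippet below also checks it against the rank-based formula above:

```python
import numpy as np
from scipy.stats import spearmanr, rankdata

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 4.1, 8.9, 9.0])

rho, _ = spearmanr(x, y)                      # library call

R, S = rankdata(x), rankdata(y)               # the rank-based formula
rho_manual = (((R - R.mean()) * (S - S.mean())).sum()
              / np.sqrt(((R - R.mean())**2).sum() * ((S - S.mean())**2).sum()))
assert np.isclose(rho, rho_manual)            # both give 1.0 here (y is monotone in x)
```

For CFS-Spearman, the entries of `corr_fc` and `corr_ff` in the earlier CFS sketch are simply filled with absolute Spearman coefficients instead of Pearson ones.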

The linear correlation coefficient is a widely used correlation measure and is easy to calculate. When the random variables follow an elliptical distribution, it expresses their correlation well. Its shortcomings, however, are that it does not exist when the variables’ first- and second-order moments cannot be obtained, that its value changes when the variables’ distribution functions change, and that a strictly increasing nonlinear transform changes the linear relationship between the variables [24]. Most importantly, the relationship cannot be expressed accurately when the variables do not follow an elliptical distribution.

The Spearman rank correlation coefficient is a nonparametric statistic. Let the rank correlation coefficient of the two variables $X$ and $Y$ be $\rho_s$; then

$$\rho_s = \rho\big(F_X(X),\, F_Y(Y)\big)$$

in which $F_X(X)$ and $F_Y(Y)$ are the cumulative probabilities of $X$ and $Y$, respectively.

The rank correlation coefficient of two random variables is thus the linear correlation coefficient of their cumulative probability distribution functions, expressed as $\rho_s = \rho(F_X(X), F_Y(Y))$. If the inverse of the distribution function exists, the transformed variable $U = F_X(X)$ is uniformly distributed on $[0, 1]$ because

$$P(U \le u) = P\big(X \le F_X^{-1}(u)\big) = F_X\big(F_X^{-1}(u)\big) = u, \quad u \in [0, 1].$$

So the rank correlation coefficient simply expresses the relationship after the original variables are transformed to uniformly distributed ones. Compared with the linear correlation coefficient, the rank correlation coefficient has the following advantages: it always exists; it does not change with the marginal distributions; and it does not change under strictly increasing nonlinear transforms. It is therefore chosen in this paper to measure the relationships between features and between each feature and the class.
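The invariance under strictly increasing transforms is easy to verify numerically:

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + 0.1 * rng.normal(size=200)

y_t = np.exp(3 * y)                           # strictly increasing nonlinear transform

print(pearsonr(x, y)[0], pearsonr(x, y_t)[0])    # Pearson changes
print(spearmanr(x, y)[0], spearmanr(x, y_t)[0])  # Spearman is identical (ranks preserved)
```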

4. Experiment Analyzing for Feature Selection

To demonstrate the effectiveness of the CFS-Spearman feature selection method proposed in this paper, four datasets from the UCI machine learning repository, “Breast-cancer”, “Glass”, “Bank”, and “Credit”, are used as experiment data; their details are listed in Table 1. In the experiments, CFS-Spearman was employed to select the most important features and an SVM classifier was used to test the selected feature sets. One tenth of the samples were randomly chosen as test data and the rest were used as training data; this classification procedure was repeated 10 times for every dataset. The mean correct classification rates are reported in the tables and figures to demonstrate the effectiveness of the method.
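The evaluation protocol can be sketched as follows (scikit-learn’s `SVC` is used here in place of the LibSVM toolbox the authors ran; the function name is illustrative):

```python
# Random 1/10 test split, repeated 10 times, scored with an SVM.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def evaluate_subset(X, y, feature_idx, n_repeats=10):
    """Mean SVM accuracy of a feature subset over repeated random splits."""
    accs = []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X[:, feature_idx], y, test_size=0.1, random_state=seed)
        accs.append(SVC().fit(X_tr, y_tr).score(X_te, y_te))
    return np.mean(accs)
```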

The experimental results were obtained with Matlab and the LibSVM toolbox. The results of CFS-Spearman were also compared with the original CFS, mRMR, and $\ell_p$-norm based sparsity regularized feature selection, as shown in the tables. In Tables 3 and 8, the bold entries mark the cases where the correct classification rate of CFS-Spearman is higher than or equal to that of CFS, mRMR, and the $\ell_p$-norm method. For the $\ell_p$-norm method, a different setting of $p$ was obtained in every experiment, selected according to the SVM classification results.

Table 2 lists the 8 features and the classes of the dataset “Breast-cancer”. As shown in Table 3, CFS-Spearman, the original CFS, mRMR, and the $\ell_p$-norm method ($p=0.9$) were used to select the features in this dataset. The selected features differ in most feature sets, but with the features selected by CFS-Spearman the correct classification rates were higher than or equal to those of the other methods in all cases.

The second experiment uses the dataset “Glass”, in which we estimate whether a piece of glass is “float glass” or not from its chemical elements, listed in Table 4. The third experiment concerns whether a person is a bank member or not; its 9 features, such as “Age”, “Living area”, and “Income”, are listed in Table 5. The last experiment is about “Credit degree”; there are 14 features in this dataset, as shown in Table 6.

The correct classification rates are shown in Figure 2. Subfigure (a) shows the classification results for dataset “Breast-cancer”: the red curve is the correct classification rate of CFS for 1 to 7 selected features, the green curve is mRMR, the blue one is CFS-Spearman, and the pink one is the $\ell_p$-norm method ($p=0.9$). The results in this subfigure match the data in Table 3.

Figure 2(b) shows the classification results for dataset “Glass” using the different feature selection methods. The correct classification rates of CFS-Spearman for 1 to 7 selected features were better than those of the original CFS, mRMR, and the $\ell_p$-norm method ($p=0.7$), but with 8 selected features the rate of CFS-Spearman was lower. The rate was highest when 4 features were selected; this set comprises features 1, 3, 4, and 5 of Table 4.

As shown in Figure 2(c), for dataset “Bank” the correct classification rates of CFS-Spearman for 1 to 8 selected features were better than or equal to those of the original CFS, mRMR, and the $\ell_p$-norm method ($p=0.7$); the rates were all 81.25%. Referring to Table 5, with 2 selected features the set consists of features 1 and 6, and with 4 selected features it consists of features 1, 6, 7, and 8.

Figure 2(d) shows the classification results for dataset “Credit” using the different feature selection methods. The correct classification rates of CFS-Spearman for 1 to 13 selected features were better than or equal to those of the original CFS, mRMR, and the $\ell_p$-norm method ($p=0.9$). With 9 or more selected features the rate reached its highest value, 84.2%; the 9-feature set comprises features 3, 5, 6, 7, 9, 11, 12, 13, and 14 of Table 6.

Figure 2 shows that in most cases the correct classification rate increases with the number of selected features up to a certain point and then decreases. So, for a classification task with features like those shown in Table 8, not all features are necessary; the more important ones should be chosen by an appropriate feature selection method.

5. Videos Popularity Prediction

In this paper, the CFS-Spearman method was applied to low-level feature selection for video popularity prediction. The low-level feature set includes the 16 features described in Section 2. 300 videos downloaded from YouTube were used in the experiments; the 16 features were extracted from each video, and their serial numbers are shown in Table 7.

To evaluate their popularity, the numbers of the audiences’ “Like/Dislike” votes were employed to calculate a popularity score, in which LN is the number of “like” votes and DN is the number of “dislike” votes for a video. We set 4.7 as the classification threshold between “good” and “bad” videos: a video whose score is higher than or equal to 4.7 is a good video; otherwise it is regarded as a bad one. All the extracted features are listed in Table 7. The four methods were used to select the features, with the results shown in Table 8; when the selected features are used to classify the videos into the two classes, good and bad, the correct classification rates are those listed in Table 8.
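Since the score formula itself is not reproduced above, the snippet below uses a purely hypothetical reading of it, the like ratio mapped to a 0-5 scale, which is at least consistent with the 4.7 threshold; the paper’s exact expression may differ:

```python
# Hypothetical popularity score: like ratio scaled to [0, 5].
def popularity_score(LN, DN):
    return 5.0 * LN / (LN + DN)

def label(LN, DN, threshold=4.7):
    return "good" if popularity_score(LN, DN) >= threshold else "bad"

print(label(95, 5))   # score 4.75 -> good
print(label(90, 10))  # score 4.50 -> bad
```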

The SVM classification results show that most correct classification rates of the CFS-Spearman method were higher than those of the other methods. With 3 selected features the correct classification rate is highest, 78%; in this case the features “Mean of contrast”, “Variance of Simplicity”, and “Variance of shot length” were selected as the most influential ones, so they can be used as the feature set to predict the popularity of commercial videos.

The correct classification rate curves are drawn in Figure 3.

The results in Table 8 were obtained by an SVM using the selected feature sets. Recently, classification methods based on deep learning have received wide attention; the CNN (convolutional neural network) is a popular one. To compare the classification effectiveness, a CNN was designed to test the prediction of the videos’ popularity. The frame-wise feature subset consisting of features 1, 3, 4, 5, 6, and 7 was employed as the input data of the CNN; this is because the other features, such as “Variance of motion” and “Variance of shot length”, are single scalars rather than vectors or matrices. The design of the CNN used in this paper is shown in Figure 4.

Unlike the SVM experiments, the 6 features (features 1, 3, 4, 5, 6, and 7) were not averaged over every frame of the whole video; instead, 500 frames were uniformly sampled from each video and the 6 features were calculated for every sampled frame, giving CNN input data of size 500×6. The parameters of this framework were set as follows. The convolution and pooling procedure was repeated 3 times, and since the 6 features have no relationship with each other, the convolution kernels’ size was set to [m, 1].

In the first stage, m1=51, k1=500, and there were 4 convolution kernels; the size of feature map C1 was 4@(450×6). After the first pooling step with p=2, the size of feature map S1 was 4@(225×6). In the second stage, m2=76, k2=225, and there were 10 convolution kernels; the size of feature map C2 was 10@(150×6) and the size of feature map S2 was 10@(75×6). In the third stage, m3=75, k3=75, and the size of feature map S3 was 10@(1×6). S3 was converted to a column vector whose size was (60×1), so the video’s features were transformed to a new feature set of 60 numbers, and the results were obtained after Softmax classification. The experiments were repeated 10 times, as shown in Table 9, just like the SVM experiments. The mean correct classification rate was 61.67%, much lower than that of the SVM using the 3 selected features; this is because most of the 16 features are scalars, which are not well suited to CNN classification.
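Under the sizes just listed, the network can be reconstructed as the following Keras sketch; the toolkit, activations, and optimizer are our assumptions, since the paper states none of them:

```python
# Reconstruction of the CNN of Figure 4 from the feature-map sizes above.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(500, 6, 1)),                # 500 frames x 6 per-frame features
    layers.Conv2D(4, (51, 1), activation="relu"),   # C1: 4@(450x6)
    layers.MaxPooling2D((2, 1)),                    # S1: 4@(225x6)
    layers.Conv2D(10, (76, 1), activation="relu"),  # C2: 10@(150x6)
    layers.MaxPooling2D((2, 1)),                    # S2: 10@(75x6)
    layers.Conv2D(10, (75, 1), activation="relu"),  # S3: 10@(1x6)
    layers.Flatten(),                               # 60-dimensional vector
    layers.Dense(2, activation="softmax"),          # good / bad
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The [m, 1] kernels span only the time axis, matching the statement that the 6 features have no relationship with each other.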

If whole videos or sampled frames were used as the input to a CNN, the network, by its nature, would only attend to differences between the input data in the convolution procedure, or try to extract edges in the images or videos. Features such as the 3 selected ones, which were shown to influence the popularity of the videos, would therefore not be computed inside the CNN framework.

6. Conclusions

This paper studied low-level feature selection methods for commercial videos’ popularity prediction. To select the more influential features in the feature set, the paper proposed a CFS-based method in which the Spearman correlation coefficient replaces the original Pearson correlation coefficient. To test the method, data from the UCI machine learning repository were used as the experimental data source. The widely used mRMR algorithm, the original CFS, $\ell_p$-norm based sparsity regularized feature selection, and the proposed CFS-Spearman algorithm were compared on these data, and the results on the four datasets showed that the proposed method outperformed the others.

Then, to select the influential low-level features of commercial videos, 300 videos were downloaded from YouTube. For each video, 16 features were extracted, and the videos were separated into two classes, “good” and “bad”, according to scores calculated from the numbers of “like/dislike” votes cast by audiences. With 3 features selected by CFS-Spearman, the correct classification rate reaches its highest value, 78%; the features “Mean of contrast”, “Variance of Simplicity”, and “Variance of shot length” were selected as the most influential ones.

Finally, the SVM classification was compared with the popular CNN classification method. The results showed that because most of the 16 features are scalars, the SVM is more suitable for this task.

Appendix

See Table 10.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research work of this paper was supported by the Science and Technology Development Program Fund of the Science and Technology Department of Jilin Province, China (no. 20150414051GH).