Abstract

Personalized recommendation systems have been widely used as an effective way to deal with information overload. The common approach in these systems, item-based collaborative filtering (CF), has been shown to be vulnerable to “shilling” attacks. To improve the robustness of item-based CF, the authors propose a novel CF approach based on the most commonly used relationships between users. In the paper, the three most commonly used relationships between users are first analyzed and applied to construct several user models. DBSCAN clustering is then used to select the valid user model according to how well each model supports the detection of spam users. The selected model is used to detect the spam user group. Finally, a detection-based CF method is proposed for the calculation of item-item similarities and rating prediction, which sets different weights for suspicious spam users and normal users. The experimental results demonstrate that the proposed approach is more robust than the typical item-based kNN (k nearest neighbor) CF approach.

1. Introduction

Nowadays, personalized recommendation systems are widely used as an effective way to help people cope with information overload [1, 2]. Such systems automatically adjust, restructure, and present tailored information for individuals by analyzing user information, creating one-to-one relationships, or understanding user needs in different contexts [3–6]. To date, CF is the most popular approach used in personalized recommendation systems. Approaches for CF recommendation can be grouped into two general classes [7–11]: user-based and item-based.

Both the typical user-based and item-based CF approaches, however, suffer from “shilling” attacks [12] because users of online systems can multiply their profiles and identities nearly indefinitely. Thus, systems that depend on such profiles are subject to control by an attacker bent on making the system recommend as he or she desires [12–17]. It is common knowledge that some users’ ratings in recommendation systems are more valuable than those of others. If the credit ratings (or ranks, weights) of the spam users [15] made up by an attacker can be made lower than those of normal users, the attack resistance of recommendation systems will improve.

Several kinds of relationships between users, such as similarities and correlations, are commonly used in item-based CF. In this paper, an approach based on these relationships is proposed to calculate the relative weights of users and thereby further improve the attack resistance of typical item-based CF approaches. The proposed approach consists of the following four steps: three kinds of relationships between users are selected to construct user models; a density-based clustering algorithm is then used to select the best user model; the model is then applied to detect spam users; the detection results are incorporated into an approach for the calculation of item-item similarities and rating prediction. Finally, the experimental results illustrate that the proposed approach provides better robustness (stability of prediction and hit ratio) than (1) a commonly used item-based kNN CF (similarity-based CF) recommendation approach and (2) other robust recommendation approaches.

The rest of this paper is organized as follows: Section 2 presents the background of item-based CF approaches and their related problems. Section 3 presents the proposed methods for selecting user models, for detecting and marking suspicious spam users and normal users, and for calculating item-item similarities and predictions according to the detection. Section 4 presents experimental results of the proposed approach on the MovieLens dataset and analyzes whether the approach is effective in comparison with the typical item-based CF approach and other robust recommendation approaches. Section 5 draws conclusions.

2. Background and Associated Problem of Item-Based CF Approaches

CF is the most widely used and most successful recommendation technique to date [18–20]. The traditional CF, user-based CF, predicts the rating of an item for a target user based on the opinions of other like-minded users. It was remarkably successful in the past, but some challenges have arisen [21], such as scalability: the computational complexity grows rapidly with the number of users. Item-based CF has been shown to address this problem [9]. Both the user-based and item-based CF approaches, however, suffer from “shilling” attacks.

2.1. Shilling Attack Problem

One way to attack a recommendation system is to arrange for a group of users, named shills [20] or spam users [14], to enter the system and vouch for the items in question. Their ratings are intended to mislead other users. The attacks are, therefore, called shilling attacks (or profile injection attacks [12]).

An attack consists of a set of attack profiles (also named attack ratings). An attack model is an approach to construct attack profiles. The general form of an attack profile is shown in Figure 1 [14].

Suppose that there are $n$ items in total in a recommendation system; an attack profile consists of an $n$-dimensional vector of ratings. The $n$-dimensional vector can be divided into 4 sets: $I_F$, $I_\emptyset$, $I_S$, and $I_T$. Here, $n = |I_F| + |I_\emptyset| + |I_S| + |I_T|$. $I_F$ is a set of randomly selected filler items. $I_\emptyset$ is a set of unrated items. $I_S$ is a set of selected items which have some relationship with the target items. $I_T$ is a set of target items. Several attack models have been identified, such as random attack and average attack [13], and the newer models, bandwagon and segment [22]. The bandwagon attack model is designed by giving high ratings to the most popular items [14] and has the following characteristics: (1) $I_S$: all items in $I_S$ are the most popular items and are assigned the maximum rating $r_{\max}$; (2) $I_F$: all items in $I_F$ are assigned random values drawn from a normal distribution $N(\mu, \sigma^2)$; (3) $I_T$: all items in $I_T$ are assigned $r_{\max}$.

The segment attack model is designed to push an item to a targeted group (segment) of users with known or easily predicted preferences [22]. It has the following characteristics: (1) $I_S$: all items in $I_S$ are assigned the maximum rating $r_{\max}$; (2) $I_F$: all items in $I_F$ are assigned the minimum rating $r_{\min}$; (3) $I_T$: all items in $I_T$ are assigned $r_{\max}$.
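The two profile layouts can be illustrated with a short sketch. It is not the authors' generator: the function name `build_profile`, the 1 to 5 rating scale, and the default mean and standard deviation are assumptions made for illustration, and the segment branch follows the usual description of that model (maximum rating for selected and target items, minimum rating for fillers).

```python
import random

R_MIN, R_MAX = 1, 5  # assumed rating scale (MovieLens-style)

def build_profile(n_items, selected, target, n_filler, model, rng,
                  mean=3.5, std=1.0):
    """Return {item: rating} for one push-attack profile.

    Items missing from the result are the unrated ones.
    `mean` and `std` are illustrative defaults, not values from the paper.
    """
    if model not in ("bandwagon", "segment"):
        raise ValueError(f"unknown attack model: {model}")
    profile = {i: R_MAX for i in selected}        # selected items get the maximum rating
    profile.update({i: R_MAX for i in target})    # target items are pushed
    candidates = [i for i in range(n_items) if i not in profile]
    for i in rng.sample(candidates, n_filler):    # randomly chosen filler items
        if model == "bandwagon":
            r = round(rng.gauss(mean, std))       # normally distributed rating
            profile[i] = min(R_MAX, max(R_MIN, r))  # clip to the rating scale
        else:
            profile[i] = R_MIN                    # segment model: fillers get the minimum
    return profile

spam = build_profile(50, selected=[0, 1, 2], target=[10], n_filler=5,
                     model="bandwagon", rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the generated profiles reproducible across runs.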

Research in the area of shilling attacks has made significant advances in recent years. User-based CF makes recommendations by finding peers with similar preference profiles; consequently, profiles with biased data can easily result in biased recommendations. Item-based CF looks for items with similar profiles and makes predictions based on a user’s own ratings of the peer items; therefore, item-based CF also suffers from the attacks.

Random attack and average attack models are successful against user-based CF algorithms; however, they fall short of having a significant impact on item-based CF algorithms [13]. The newer models, bandwagon and segment, are quite successful against item-based CF algorithms [22]. Among these attack models, random and bandwagon attacks are low-knowledge attacks [13], which need minimal knowledge of recommendation systems and user profiles. For experimental purposes, the bandwagon attack is adopted in the paper since it is a low-knowledge attack and quite successful against item-based CF.

2.2. Shilling Attack Resistant CF

A number of recent studies have focused on robust CF because recommendation systems are easy to attack. O’Donovan and Smyth [23] proposed that the trustworthiness of users should be taken into consideration in recommendation systems. Their trust models can improve the predictive accuracy. Massa and Avesani [24] proposed a robust CF approach, also called trust-based CF, based on a “web of trust.” The approach increases the coverage of recommendation systems while preserving the quality of predictions, especially for new users. However, predictive accuracy and coverage are not the essential metrics for robust recommendation systems [25]. Zhang [26] proposed a trust-aware CF based on users’ multiple interests. He proposed a topic-level trust model and a CF approach based on the model. The approach improves the robustness of the recommendations. However, all three levels of the trust model are based on the number of user ratings.

The relationships and weights among users are essential to a recommendation system. Yu et al. [27] proposed a reputation-based approach for decoding information from noisy, redundant, and intentionally distorted sources. Zhou et al. [28] proposed a correlation-based reputation algorithm to solve the ranking problem of rating systems. Shang et al. [29] showed that relevance information can outperform the commonly used Pearson correlation coefficient under the standard collaborative filtering framework, especially for sparse data sets. These studies provide valuable input to our approach.

In the paper, the user models are formed from the relationships of users, in which not only the numbers of user ratings but also the ratings themselves are taken into account. First, three commonly used relationships between users are selected to construct user models. The best user models are then selected experimentally for detecting and weighting users. Finally, the rating weights of the users are incorporated into a typical item-based CF. The proposed models and the approach can further improve the robustness of recommendation approaches. They are discussed in detail in Section 3.

3. User Relationships-Based Robust CF

To achieve a robust collaborative recommendation approach, the spam users are detected based on users’ relationships, and the detection results, represented by weights, are incorporated into the item similarity computation (see Figure 2). The paper adopts the following definition of robustness for collaborative recommendation: the ability to make recommendations despite noisy product ratings [23]. The approach takes the rating matrix as input and produces predicted ratings as output. In the data modeling module, three kinds of user relationships are taken into consideration: interest similarity, rating similarity, and rating linear dependence. In the user weighting module, clustering-based detection results are applied to produce the weights of users. The weights are then incorporated into the item-item similarity calculations and the subsequent predictions.

3.1. The Analysis of User Relationships

There are different relationships between users in a recommendation system, just as there are various relationships in any social group. The relationships are exploited to construct user models for the detection of spam users.

Traditionally, rating similarity is the most commonly used relationship between users in recommendation systems [18]. The rating similarity, abbreviated as R_Sim, measures how similar two users’ ratings are to each other.

The rating similarity, however, is only one aspect of the user relationship. There are other relationships behind the ratings [29]. For example, suppose many items are rated by both user $u$ and user $v$, but their ratings are extremely different. In this case, their rating similarity is very low. Nevertheless, the two users should still be considered similar in another sense, since the sets of items they rated are similar. In particular, if the data set is very sparse, rating the same items is more important than giving the same ratings [29]. In the paper, this relationship is called interest similarity, abbreviated as In_Sim, which represents how far two users are interested in the same items in a recommendation system.

In addition to those relationships, Gao and Wu [21] pointed out that the covariance between ratings is an important measure because it represents the linear dependence between the ratings of users. In practice, however, the correlation coefficient (Corr_coef) is usually used instead of the covariance to measure the linear dependence between two variables because it gives a value between −1 and 1 inclusive. The linear dependence is also commonly used as a user similarity in recommendation systems. Thus, in the paper, the linear dependence (L_depd) is considered as the third relationship in the model; it describes how the ratings of two users change together.

Therefore, these three kinds of relationships, interest similarity, rating similarity, and linear dependence, are taken into consideration in the research.

The interest similarity of users $u$ and $v$ is calculated by (1). The more items have been rated by both $u$ and $v$, the closer the users are [19]. We define $I_u$ as the set of items rated by user $u$; $I_v$ is defined similarly. $I_{uv} = I_u \cap I_v$ is the set of items rated by both users $u$ and $v$.
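A minimal sketch of an interest similarity is given here for illustration. It assumes a Jaccard-style overlap (co-rated items divided by all items rated by either user), which grows with the number of co-rated items and lies between 0 and 1; the exact form of (1) may differ, and the function name is hypothetical.

```python
def interest_similarity(items_u, items_v):
    """Jaccard-style overlap of two users' rated-item sets (an assumed form)."""
    items_u, items_v = set(items_u), set(items_v)
    union = items_u | items_v
    if not union:
        return 0.0                      # neither user has rated anything
    return len(items_u & items_v) / len(union)

# Two users who share two of four distinct rated items.
overlap = interest_similarity({1, 2, 3}, {2, 3, 4})
```

The measure ignores the rating values entirely, which is what separates it from the rating similarity.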

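The rating similarity of users $u$ and $v$ is calculated by cosine, the most commonly used measure of similarity among users (see (2)). Here, the rating $r_{u,i}$ expresses how much user $u$ prefers item $i$; $r_{v,i}$ is defined similarly. $I$ is the set of items. Consider

$$\mathrm{R\_Sim}(u, v) = \frac{\sum_{i \in I} r_{u,i}\, r_{v,i}}{\sqrt{\sum_{i \in I} r_{u,i}^{2}}\,\sqrt{\sum_{i \in I} r_{v,i}^{2}}}. \tag{2}$$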

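The linear dependence between the ratings of user $u$ and those of user $v$ is calculated by the Pearson Corr_coef (see (3)). The Corr_coef is defined as the covariance of the variables divided by the product of their standard deviations. Consider

$$\mathrm{L\_depd}(u, v) = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^{2}}\,\sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^{2}}}. \tag{3}$$

Here, $r_{u,i}$ is the rating of user $u$ on item $i$, $I_{uv}$ is the set of items rated by both users, and $\bar{r}_u$ is the average of user $u$'s ratings on the items in $I_{uv}$; $\bar{r}_v$ is defined similarly.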
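The cosine and Pearson measures can be sketched as follows. This is an illustrative implementation under stated assumptions: each user's ratings are held in a dictionary mapping item to rating, an unrated item contributes zero to the cosine, and the Pearson coefficient is taken over co-rated items only. The function names are hypothetical.

```python
import math

def rating_similarity(ru, rv):
    """Cosine similarity of two rating dicts; unrated items count as 0."""
    dot = sum(r * rv[i] for i, r in ru.items() if i in rv)
    norm_u = math.sqrt(sum(r * r for r in ru.values()))
    norm_v = math.sqrt(sum(r * r for r in rv.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def linear_dependence(ru, rv):
    """Pearson correlation over the items rated by both users."""
    common = [i for i in ru if i in rv]
    if len(common) < 2:
        return 0.0                                  # too few co-rated items
    mean_u = sum(ru[i] for i in common) / len(common)
    mean_v = sum(rv[i] for i in common) / len(common)
    cov = sum((ru[i] - mean_u) * (rv[i] - mean_v) for i in common)
    dev_u = math.sqrt(sum((ru[i] - mean_u) ** 2 for i in common))
    dev_v = math.sqrt(sum((rv[i] - mean_v) ** 2 for i in common))
    return cov / (dev_u * dev_v) if dev_u and dev_v else 0.0

u = {1: 5, 2: 3, 3: 1}
v = {1: 4, 2: 3, 3: 2}
r_sim, l_depd = rating_similarity(u, v), linear_dependence(u, v)
```

Returning 0 when a denominator vanishes is a convenience choice here, not something the text prescribes.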

So far, the three relationships form three matrices: R_Sim, In_Sim, and L_depd. Table 1 shows the three pairwise correlations between the R_Sim, In_Sim, and L_depd matrices before and after bandwagon attacks with 10% attack size and 10% filler size.

3.2. Construction of User Models

The combinations of the matrices In_Sim, R_Sim, and L_depd can form seven different user models, such as (In_Sim, R_Sim, L_depd) and (In_Sim, R_Sim). Note that the order does not matter: the user model constructed by (R_Sim, In_Sim) is the same as the model constructed by (In_Sim, R_Sim).
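The count of seven follows from taking every non-empty, order-independent subset of the three matrices (3 singles, 3 pairs, 1 triple). A small sketch, with a hypothetical helper name:

```python
from itertools import combinations

MATRICES = ("In_Sim", "R_Sim", "L_depd")

def user_models(names=MATRICES):
    """All non-empty combinations of the relationship matrices, order ignored."""
    return [combo
            for size in range(1, len(names) + 1)
            for combo in combinations(names, size)]

models = user_models()
```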

All three matrices are $|U| \times |U|$ matrices, where $|U|$ is the cardinality of the set of users. A vector drawn from the combinations of the three matrices can be used to represent a user, but it is high-dimensional data. To reduce the dimension, the matrices are analyzed experimentally. It is found that the In_Sim and R_Sim values for every user can each be divided into 10 slots (0 to 1, 0.1 intervals); the L_depd values for every user can be divided into 20 slots (−1 to 1, 0.1 intervals).

Figure 3 is the distribution chart of Slotted In_Sim, Slotted R_Sim, and Slotted L_depd.

Slotted In_Sim is a matrix that records the distribution of the interest similarities for all users. It is formed by ten attributes, which are the slots from 0 to 1 at 0.1 intervals. The values of the attributes are in .

Slotted R_Sim is a matrix that records the distribution of the rating similarities for all users. The definitions and values of attributes of slotted R_Sim are similar to those of slotted In_Sim.

Slotted L_depd is a matrix that records the distribution of linear dependence for all users. The twenty attributes of Slotted L_depd are the slots from −1 to 1, 0.1 intervals. The values of the attributes are in .

Thus, the seven user models formed by the combinations can be simplified to the combinations of slotted In_Sim, slotted R_Sim, and slotted L_depd. In those user models, each user can be represented by ten to forty attributes.
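The slotting step can be sketched as a per-user histogram. The sketch below assumes that each attribute holds the fraction of a user's similarity values that fall into the slot; the normalization, the boundary handling, and the helper names are assumptions made for illustration.

```python
def slot_row(values, low, high, width=0.1):
    """Fraction of `values` in each slot of size `width` between low and high."""
    n_slots = round((high - low) / width)
    counts = [0] * n_slots
    for v in values:
        k = int((v - low) / width + 1e-9)          # epsilon guards float boundaries
        counts[min(max(k, 0), n_slots - 1)] += 1   # v == high lands in the last slot
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_slots

def user_vector(in_sim_row, r_sim_row):
    """Twenty attributes: slotted In_Sim followed by slotted R_Sim for one user."""
    return slot_row(in_sim_row, 0.0, 1.0) + slot_row(r_sim_row, 0.0, 1.0)

slots = slot_row([0.05, 0.15, 0.95, 1.0], 0.0, 1.0)
```

With ten slots for In_Sim or R_Sim and twenty for L_depd, combining the slotted matrices gives between ten and forty attributes per user, in line with the text.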

Attacks make the similarities among spam users greater than the similarities among normal users. Therefore, the weighting problem can be seen as a clustering-related problem. The density-based clustering algorithm DBSCAN [30, 31] is chosen to group users in the research because it can discover clusters of arbitrary shape and is efficient on large databases. DBSCAN groups users who are densely located and can be connected into a single cluster. DBSCAN is applied to all the user models to find which one is most helpful for detecting the group of spam users.

In the DBSCAN algorithm, a user is a core of a group when the number of his/her neighbors reaches the minimum number of points required by the algorithm. Two users are neighbors when the distance between their attributes is less than 0.05. The bandwagon attack is used to analyze how useful the attributes are for the clustering. The attack size and filler size are 5% and 5%, and 10% and 10%; the number of attacked items is 1. An attack can be a push attack or a nuke attack according to whether it is meant to raise the predicted rating of a target item: a push attack raises the rating, whereas a nuke attack lowers it. Push attacks are considered in this paper.
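The clustering step can be sketched with a compact, unoptimized DBSCAN. The neighbor radius of 0.05 comes from the text; the Euclidean distance, the `min_pts` value, and the sample points are assumptions chosen only for illustration.

```python
import math

def dbscan(points, eps=0.05, min_pts=4):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise.

    eps follows the text; min_pts and the Euclidean metric are assumptions.
    """
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbors(p):
        # Includes p itself, as in the usual DBSCAN definition.
        return [q for q in range(len(points))
                if math.dist(points[p], points[q]) < eps]

    cluster = -1
    for p in range(len(points)):
        if labels[p] is not UNSEEN:
            continue
        seeds = neighbors(p)
        if len(seeds) < min_pts:
            labels[p] = NOISE                 # may later become a border point
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster           # border point joins the cluster
            if labels[q] is not UNSEEN:
                continue
            labels[q] = cluster
            q_neighbors = neighbors(q)
            if len(q_neighbors) >= min_pts:   # q is a core point: keep expanding
                queue.extend(q_neighbors)
    return labels

pts = [(0.0, 0.0), (0.01, 0.0), (0.02, 0.0), (0.0, 0.01), (0.01, 0.01),
       (0.5, 0.5), (0.9, 0.1)]
labels = dbscan(pts)
```

In this toy input the five tightly packed points form one dense group, which mirrors how near-identical spam profiles would be expected to cluster; the quadratic neighbor search is acceptable only for small examples.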

Figures 4 and 5 represent the distributions of Slotted L_depd and Slotted R_Sim values of normal users and spam users. The attack sizes and filler sizes are 5%, 5%; 10%, 10%; and 20%, 20%, respectively, in Figure 4. Those are 5%, 5%; 10%, 10%; and 15%, 5%, respectively, in Figure 5. In these figures, the distributions of spam users become more obviously different from those of normal users as the attack size and filler size increase.

As seen from Table 2, (Slotted In_Sim, Slotted R_Sim) is the best combination among them. Consequently, the attributes from Slotted In_Sim and Slotted R_Sim are chosen to detect spam users. The precisions of the other user models not listed in the table are no more than 20%. Most of those models cannot find any spam user at all. As the attack size, filler size, and number of attack items increase, most of the user models achieve remarkable results, because the characteristics of attack users become much more obvious.

3.3. Detection-Based Item Similarity Calculation and Rating Prediction

As discussed previously, item-based CF computes the similarities between items and then chooses the most similar items for prediction [18]. The underlying idea is to compare items based on the pattern of ratings across users.

In the research, the rating weights of users are incorporated into one of the similarity-based algorithms [1], named item-based kNN collaborative filtering (shortened to IKCF).

As mentioned in Section 3.2, the sets of suspicious users will be obtained when DBSCAN algorithm is applied to the twenty attributes of Slotted R_Sim and Slotted In_Sim.

The proposed algorithm is a weighted item-based kNN collaborative filtering approach (named WIKCF). If a user is in the spam user group, his/her weight should be extremely small; otherwise, the weight should be large. In the research, the weight of a user is simply set to 1 when he/she is not in the suspicious spam group and to 0 when he/she is.

There are several algorithms for computing item-item similarities, such as cosine, correlation, and adjusted cosine-based similarity [18]. Adjusted cosine is the most commonly used algorithm for calculating the similarities between items because it is reasonably accurate and easily analyzed [25]. Thus, in the WIKCF, adjusted cosine is utilized to calculate item similarities:

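Here, $U_i$ is the set of users who have rated item $i$. Formally, $U_i = \{u \mid r_{u,i} \text{ exists}\}$, where $r_{u,i}$ is the rating of user $u$ on item $i$. $\bar{r}_u$ is the average rating of user $u$, and $w_u$ is the weight of user $u$.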
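A sketch of a weighted adjusted cosine is given here for illustration. It assumes that each user's weight multiplies that user's terms in both the numerator and the denominator; the paper's exact placement of the weights may differ, although with weights restricted to 0 and 1 any such placement amounts to dropping the suspicious users. The data layout and names are hypothetical.

```python
import math

def weighted_adjusted_cosine(ratings, weights, i, j):
    """Adjusted cosine between items i and j with per-user weights.

    ratings: {user: {item: rating}}; weights: {user: weight}, default 1.
    The weight placement is an assumption, not the paper's stated formula.
    """
    num = den_i = den_j = 0.0
    for u, r in ratings.items():
        if i in r and j in r:                      # users who rated both items
            w = weights.get(u, 1.0)
            mean_u = sum(r.values()) / len(r)      # user's average rating
            di, dj = r[i] - mean_u, r[j] - mean_u
            num += w * di * dj
            den_i += w * di * di
            den_j += w * dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / math.sqrt(den_i * den_j)

ratings = {"a": {1: 5, 2: 5, 3: 1},
           "b": {1: 4, 2: 4, 3: 2},
           "s": {1: 5, 2: 1, 3: 3}}               # "s" plays the suspicious user
plain = weighted_adjusted_cosine(ratings, {}, 1, 2)
robust = weighted_adjusted_cosine(ratings, {"s": 0.0}, 1, 2)
```

In this toy data the single suspicious profile is enough to turn the similarity of items 1 and 2 negative, while zeroing its weight restores the agreement of the two normal users.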

To estimate a rating, the commonly used weighted sum is applied to predict ratings for users, which is the crucial step in a CF recommendation system. In the weighted sum, $I_u$ is the set of items rated by user $u$.
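The weighted-sum prediction can be sketched as below. It assumes the common form in which the user's own ratings on the k most similar items are averaged with the item similarities as coefficients and the sum of absolute similarities as the normalizer; k = 20 matches the experimental setting, while the function name and data layout are hypothetical.

```python
def predict_rating(user_ratings, sims_to_target, k=20):
    """Weighted-sum prediction for one user and one target item.

    user_ratings: {item: rating} given by the user.
    sims_to_target: {item: similarity between that item and the target}.
    Returns None when no rated neighbor with a non-zero similarity exists.
    """
    rated = [j for j in user_ratings if j in sims_to_target]
    nearest = sorted(rated, key=lambda j: sims_to_target[j], reverse=True)[:k]
    num = sum(sims_to_target[j] * user_ratings[j] for j in nearest)
    den = sum(abs(sims_to_target[j]) for j in nearest)
    return num / den if den else None

estimate = predict_rating({1: 4, 2: 2}, {1: 0.75, 2: 0.25})
```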

4. Experimental Evaluations

4.1. Dataset

The widely used MovieLens dataset is utilized to evaluate the proposed approach. MovieLens [32] is a free service provided by GroupLens Research at the University of Minnesota (http://www.movielens.org). The site had over 43,000 users who had rated more than 3,500 different movies.

There are two datasets in the MovieLens project. One includes 1,000,209 anonymous ratings (1–5) of approximately 3,900 movies made by 6,040 users who joined MovieLens in 2000. The other dataset consists of 100,000 ratings from 943 users on 1,682 movies. Each user has rated at least 20 movies. The latter dataset has been used in the experiments. The dataset was randomly divided into a training set (80,000 ratings) and a test set (20,000 ratings) 50 times. The training and test sets are named Uibase and Uitest, respectively.

4.2. Evaluation Metric

Three metrics are used to evaluate the algorithms: mean absolute error (MAE [19]), prediction shift [18], and hit ratio shift [14]. MAE is a widely used metric for the deviation of predictions from their true values. Prediction shift and hit ratio shift are commonly used metrics for measuring the robustness of recommendation systems.

For all $N$ predictions $p_k$ and corresponding real ratings $r_k$, MAE is the average absolute error over all pairs: $\mathrm{MAE} = \frac{1}{N} \sum_{k=1}^{N} |p_k - r_k|$. The lower the MAE is, the better the proposed approach is.
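As a worked illustration of the metric, a minimal MAE computation over paired lists of predictions and real ratings (the function name is hypothetical):

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between paired predictions and real ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)

mae = mean_absolute_error([4, 3, 5], [5, 3, 3])   # errors of 1, 0, and 2
```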

Prediction shift models the difference between average predicted ratings of all the ratings in the test set, after and before the attacks [18]:

In the formula, $p'_{u,i}$ and $p_{u,i}$ are the predicted ratings after and before the attacks, $U$ is the set of users and $I$ is the set of items in the test set, and the abs function returns the absolute value of its argument.
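A sketch of the prediction shift, assuming the absolute differences are averaged over the (user, item) pairs of the test set that have a prediction both before and after the attack; the normalization and the function name are assumptions.

```python
def prediction_shift(before, after):
    """Mean absolute change in predicted rating over shared (user, item) pairs.

    before, after: {(user, item): predicted rating}.
    """
    pairs = before.keys() & after.keys()
    if not pairs:
        return 0.0
    return sum(abs(after[p] - before[p]) for p in pairs) / len(pairs)

shift = prediction_shift({("u", 1): 3.0, ("u", 2): 4.0},
                         {("u", 1): 3.5, ("u", 2): 4.0})
```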

In a recommendation system, users are usually interested in the first $N$ items in the recommendation list. Changes in the predicted values may not change the recommendation list. Hit ratio is the average number of hits across all the users in the test set [14]. In the paper, the hit ratio indicates the ratio at which the first $N$ items in the recommendation list hit the first $N$ items in the test set. Hit ratio shift models the difference between the average hit ratios of all users, after and before the attacks:

Here, $h'_u$ and $h_u$ are the hit ratios of user $u$ in the test set after and before the attacks.
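A sketch of the hit ratio and its shift, under stated assumptions: a hit is a top-N recommended item that also appears among the user's top-N test items, recommendation lists contain no duplicates, and the shift is reported as the absolute difference between the two averages. The names and data layout are hypothetical.

```python
def hit_ratio(recommended, relevant, n=10):
    """Share of the user's top-n test items found in the top-n recommendations."""
    top_relevant = set(relevant[:n])
    if not top_relevant:
        return 0.0
    hits = sum(1 for item in recommended[:n] if item in top_relevant)
    return hits / len(top_relevant)

def hit_ratio_shift(recs_before, recs_after, relevant, n=10):
    """Absolute change in the average hit ratio caused by an attack."""
    users = [u for u in relevant if u in recs_before and u in recs_after]
    if not users:
        return 0.0
    before = sum(hit_ratio(recs_before[u], relevant[u], n) for u in users) / len(users)
    after = sum(hit_ratio(recs_after[u], relevant[u], n) for u in users) / len(users)
    return abs(after - before)

shift = hit_ratio_shift({"u": [1, 2, 9, 8]}, {"u": [1, 7, 9, 8]},
                        {"u": [1, 2, 3, 4]}, n=4)
```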

4.3. Experimental Methodology

In the experiments, 10, 15, and 20 items are randomly selected as the target items, respectively. The two metrics of prediction shift and hit ratio shift are used to measure the relative robustness of the algorithms. The values of these metrics are plotted against the size of the attacks, reported as the number of spam users and as a percentage of the total number of users in the system. The $k$ for the kNN of items was set to 20. The users in the segment had similar ratings on 10 randomly selected items.

To test the robustness of the recommendation algorithms, the applied attack model, attack sizes, and filler sizes are listed below.
(i) Attack model is bandwagon attack.
(ii) Attack size is the percentage of attack profiles, valued 5%, 10%, 15%, and 20%, respectively.
(iii) Filler size is the percentage of the filler ratings in the attacks, valued 5% and 10%, respectively.

The settings of the attack profiles are as follows:
(i) Filler items ($I_F$): the randomly selected filler items were assigned random values determined by their mean and variance;
(ii) Selected items ($I_S$): the selected items were the first items rated by the most users and were assigned the maximum rating $r_{\max}$;
(iii) Target items ($I_T$): the target items were assigned the maximum rating $r_{\max}$.

The experimental procedure included the following steps:
(1) to get R_Sim_Csn, R_AdjSim_Csn, and In_Sim of users,
(2) to calculate their SRSC, SRSA, and SIS,
(3) to compute the rating weights of users applying the DBSCAN algorithm,
(4) to predict ratings in Uitest using WIKCF and compare the predicted ratings with the real ratings in Uitest to get the values of MAE, prediction shift, and hit ratio shift,
(5) to predict ratings in Uitest applying IKCF and calculate the values of MAE, prediction shift, and hit ratio shift,
(6) to inject attacks into the rating matrix (Uibase) with different attack sizes and filler sizes and then repeat steps 1–5 several times (see the above settings).

4.4. The Experimental Results and Analysis
4.4.1. Comparisons of Prediction Shift Values

The prediction shift values are shown in Figure 6, in which the impact of the attack is compared between IKCF and WIKCF. The $x$-axis depicts the different attack sizes and filler sizes: the former are 5%, 10%, 15%, and 20%; the latter are 5% and 10%. The $y$-axis indicates the prediction shift values.

In Figure 6, the light and dark gray bars are the results of IKCF; the light and dark blue bars are the results of WIKCF. The bars indicate the prediction shifts when the system suffers from the attacks. In the attacks, the numbers of the target items are 10 and 20. The figure illustrates that the predicted ratings of IKCF, which uses the adjusted cosine algorithm, change substantially when the system suffers from attacks with different attack sizes and filler sizes. The greater the attack sizes and filler sizes, the greater the change. Compared with IKCF, the predicted ratings of WIKCF change only slightly at any attack size and filler size.

4.4.2. Comparisons of the Values of Hit Ratio Shift

The hit ratio shifts are shown in Figure 7, in which the impact of the attack is compared between the IKCF and WIKCF algorithms. Similar to Figure 6, the $x$-axis depicts the different attack sizes and filler sizes: the former are 5%, 10%, 15%, and 20%; the latter are 5% and 10%. The $y$-axis indicates the values of hit ratio shifts.

In Figure 7, the light and dark gray bars are the results of IKCF; the light and dark blue bars are the results of WIKCF. The bars indicate the hit ratio shifts under the attacks. The number of the target items is 10 in the attacks. The hit ratios were computed according to the top 10 and 20 items in the recommendation list and Uitest. The figure shows that the hit ratio of IKCF changes substantially when the system suffers from attacks with different attack sizes and filler sizes. The greater the attack sizes and filler sizes, the greater the change of IKCF. Compared with IKCF, the hit ratio values of WIKCF change little at any attack size and filler size.

4.4.3. Comparison of MAE Values

As illustrated in Table 3, the MAE values of the two algorithms are almost the same.

4.4.4. Experimental Analysis

Table 3 and Figures 6 and 7 show that WIKCF is more robust than IKCF while its MAE values remain comparable to those of IKCF. The robustness is demonstrated by the following: (1) the prediction shift and hit ratio shift of WIKCF are less than those of IKCF, and (2) as the attack size and filler size increase, the impact of the attack on IKCF grows, whereas the impact on WIKCF remains stable. A possible reason is that the rating weights of the users are not taken into consideration in the baseline approach; in other words, the weights of spam users and normal users are the same.

4.5. The Comparisons with Related Works

Zhang [26] proposed a trust-aware CF approach based on users’ multiple interests to provide robust recommendations and tested it on the MovieLens dataset. He applied random and average attack models to test his user-based CF algorithm. Similar results for user-based CF can be found in Mehta and Nejdl [33], in which a matrix factorization strategy (VarSelectSVD) is used, under 5% average attacks and 7% filler. As mentioned before, those models are successful against user-based CF rather than item-based CF algorithms, whereas the bandwagon and segment models are quite successful against item-based CF algorithms. Therefore, in the research, the bandwagon model is applied against the proposed item-based CF algorithm. Mobasher et al. [13] applied kNN supervised classification for user-based and item-based CF on the MovieLens 100K dataset by using 15 detection attributes that include six generic attributes, six attributes of the average attack model, and three attributes of the group attack model.

Despite the weak comparability, the experimental results are given for reference: the prediction shifts in Zhang’s research [14] are in the range of 0.2 to 0.5, whereas the prediction shifts in this research are less than 0.1; the hit ratio shifts of his work are similar to those of this research. The prediction shifts reported by Hurley are about 0.1 to 0.3 [34] under bandwagon attacks, whereas the results in this research are less than 0.1.

5. Conclusions

In this paper, three commonly used user relationships and the construction of user models have first been analyzed. Then the best user models have been selected with a clustering method according to the results of spam user detection. Finally, a detection-based approach has been proposed for the calculation of item similarities and rating prediction. The experimental results in this research demonstrate that the most commonly used relationships, interest similarity and rating similarity, are important for detecting spam users; that a density-based clustering algorithm is effective at detecting spam users; and that the detection-based filtering approach improves the robustness of the typical item-based kNN CF recommendation approach.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (71102065), the National Key Basic Research Program of China (973) (2013CB328903), and the China Postdoctoral Science Foundation (2012M521680).