Abstract

Group activities on social networks are increasing rapidly with the development of mobile devices and IoT terminals, creating a huge demand for group recommendation. However, group recommender systems face an important problem: the leakage of users' historical data and preferences. Existing solutions usually protect the historical data but ignore the privacy of preferences. In this paper, we design a privacy-preserving group recommendation scheme consisting of a personalized recommendation algorithm and a preference aggregation algorithm. With carefully introduced local differential privacy (LDP), our personalized recommendation algorithm protects each user's historical data within a specific group. We also propose an Intra-group transfer Privacy-preserving Preference Aggregation algorithm (IntPPA). IntPPA protects each group member's personal preference against both untrusted servers and other users, and it can also defend against long-term observation attacks. We conduct several experiments to compare the privacy-preserving effect and usability of our scheme with closely related schemes. Experimental results on two datasets show the utility and privacy of our scheme and further illustrate its advantages.

1. Introduction

With the development of social networks and mobile devices such as smartphones and IoT terminals [1, 2], recommender systems, which recommend items to each individual, play an important role in our daily life. Nowadays, more and more people tend to gather together for social activities, such as watching movies, hiking, or having dinner with friends, or use apps such as Meetup or Douban to find group activities with strangers who share common interests. With the latest developments in smart devices and social networking services, it is convenient for people to form a group. Such trends bring a new challenge to existing recommender systems: which items or events (e.g., movie/attraction/restaurant) should be recommended in order to satisfy all or most of the group members? Under this circumstance, group recommendation has gradually become one of the hotspots in the field of recommender systems [38].

However, privacy issues are obstructing the development of recommender systems. In 2018, the information of about 87 million Facebook users was leaked to the Cambridge Analytica company. By utilizing such information, the company built user models and obtained users' personal preferences. Based on these preferences, the company pushed targeted promotional content about the US election to these users. This affected users' votes to some extent, causing serious violations of human rights. Therefore, we should pay close attention to protecting users' privacy in recommender systems, including both their historical data and their personal preferences.

Many privacy-preserving approaches have been proposed in recent years to protect users' historical data during recommendation. Yang et al. [9] proposed a framework called PrivRank, in which both the historical data and the activity data of users are protected by obfuscation. Meanwhile, the obfuscated data can still support high-quality personalized ranking-based recommendation services. In the eHealthcare setting, Xu et al. [10] proposed a privacy-preserving online medical service recommendation scheme. This scheme uses the Paillier encryption algorithm to match each patient's needs with doctors' information, recommending appropriate doctors to patients without knowing the patients' exact demands.

Although these works protect users' historical data during recommendation, they ignore the fact that the results of personalized recommendation also leak users' privacy. For example, if an untrusted server learns a specific user's preferences on restaurants, even without knowing the exact restaurants that the user visited, the server can easily figure out the user's food taste, the places he or she usually goes, and how much he or she may spend on food. The server could then infer the user's salary or even home address from these preferences.

To address this concern, we take the privacy of personal preferences into consideration. In that case, there are three challenges to solve. First, we need to hide both users' historical data and their preferences from honest-but-curious parties during recommendation, while guaranteeing the accuracy of the recommendation results. Second, since a group is dynamic, we should defend against long-term observation attacks launched by malicious users in group recommendation. Third, how to evaluate our recommendation scheme is also a tricky issue.

The main contributions of our paper are summarized as follows:
(i) We propose a privacy-aware group recommendation scheme that protects each user's historical data and personal preferences at the same time. The scheme consists of two algorithms. The former focuses on the personalized recommendation problem and guarantees $\epsilon$-LDP for each user. The latter is a privacy-preserving preference aggregation algorithm called IntPPA, designed to solve the group recommendation problem. By adding noise to each member's preference profile and transferring the profiles within the group, users' preferences are protected while the accuracy of the group recommendation is also guaranteed.
(ii) Under the IntPPA algorithm, no one can easily infer a member's preferences through long-term observation attacks. We also adopt a median aggregation strategy to prevent malicious group members from tampering with the data.
(iii) We conduct several experiments on two real-world datasets. We use RMSE and F-score to measure the utility, and use RMSE and "matched pairs" to measure the privacy-preserving effect under long-term observation attacks. The results show that our scheme achieves good utility and a good privacy-preserving effect.

The rest of this paper is organized as follows. In Section 2, we review some recent work. Section 3 introduces the preliminaries. In Section 4, we elaborate on the details of our proposed methods, and we demonstrate our evaluation results in Section 5. Finally, we conclude the paper in Section 6.

2. Related Work

2.1. Privacy-Preserving Personalized Recommendation

Differential privacy (DP) has been widely utilized in personalized recommender systems to protect users' privacy. Under DP, the recommender system cannot infer a user's profile from changes in the recommendation results. McSherry and Mironov [11] proposed a differentially private collaborative filtering method. They added noise not only to the sums, counts, and averages but also to the covariance matrix before sending it to the recommender system. Even though the method protects users' privacy, its utility is low since too much noise is added. Liu et al. [12] and Balu and Furon [13] later proposed recommendation algorithms based on DP with improved performance. However, most of these works assume that the recommender system is a trusted third party.

Hua et al. [14] achieved differentially private matrix factorization based on a gradient descent algorithm, which protects users' private ratings from the untrusted recommender. Shen et al. [15] proposed a differentially private framework without any trusted third party. By solving optimization problems, the method perturbs users' historical records and guarantees $\epsilon$-DP for users' category preferences.

Some solutions utilize local differential privacy to construct privacy-preserving recommender systems. Shin et al. [16] proposed a personalized recommendation algorithm based on matrix factorization and applied an LDP mechanism to the gradients of user profiles. The algorithm protects not only users' ratings but also the items rated by the users. Shin et al. [16] further enhanced their algorithm via dimension reduction in order to improve the accuracy and decrease the computing costs. Since this algorithm produces the recommendation results on the user side, only the users can learn the recommended results.

2.2. Group Recommendation

Group recommendation tries to satisfy the preferences of a group of users. It can be applied in various areas such as video services [17], shopping [18], traveling [19], dining [20], and even IoT scenarios [21]. How to extract the common preferences of group members, reduce the preference conflicts among them, and make the recommendation results satisfy the needs of all group members as much as possible are the key problems in group recommender systems.

Solutions to group recommendation are usually divided into two categories. One category aggregates the profiles of the group members into one profile, regards it as a single user profile, and makes personalized recommendations based on the aggregated profile [22, 23]. The other category first makes personalized recommendations for each group member and then aggregates their recommendation results into the group recommendation result via aggregation strategies [24, 25]. Compared with the former, the latter methods are more flexible and have attracted more attention.

A few works address the privacy problems in group recommendation. Luo and Chen [26] proposed a group-based privacy-preserving method. The algorithm perturbs the preference data of each group member and then utilizes CF-based group recommendation to aggregate the data. Shang et al. [27] proposed a ranking-based privacy-preserving group recommendation method in which group members exchange their obfuscated profiles to protect sensitive information. They constructed a hybrid collaborative filtering model based on Markov random walks to provide recommendations and predictions to group members.

3. Preliminaries

3.1. Motivation and Basic Idea

There is a huge privacy risk in personalized recommendation. Once an attacker collects a user's historical data from the recommender system, the user's sensitive information can be inferred. For example, we often rate the restaurants we have visited on Dianping or Yelp. From these data, an attacker can easily infer the user's historical geographical locations and even further speculate about the user's financial condition and home address.

On the other hand, with the development of social networks, the interaction between users increases. Recommender systems are no longer limited to recommending items to a single user but have gradually extended to group-oriented services.

Most group recommendation algorithms need to integrate the personalized recommendation results of each group member, so the risks in personalized recommendation also apply to group recommendation. However, in addition to historical data, users' personalized recommendation results are also very sensitive. Personalized recommendation results reflect users' preferences and behavioral habits, involving food preferences, entertainment interests, social network relationships, political tendencies, and other aspects, thus forming a "user portrait" of people.

Existing privacy-preserving algorithms for recommender systems only protect users' historical data and ignore the disclosure risk of personalized recommendation results. Moreover, a group tends to change and update, so an attacker is likely to learn more user information through long-term observation. How to further protect the personalized recommendation results and resist long-term observation attacks in group recommendation is the main problem to be solved in this paper.

We design a privacy-aware group recommendation scheme. Figure 1 illustrates our basic idea, which can be divided into two parts. Steps 1, 2, and 3 show the personalized recommendation part. Users interact with the recommendation server under local differential privacy in order to train the personalized recommendation model via matrix factorization without privacy leakage, and the personalized recommendation results are then generated on the client side. After the personalized recommendation part, each user obtains his or her own preferences. Steps 4 and 5 represent the preference aggregation part. When a group needs a recommendation, the group executes the IntPPA algorithm. Each group member's personal preference is perturbed and transferred inside the group before being sent to the server. In this way, the real preferences of group members are hidden during the random transmission and are not exposed to others. After receiving all members' perturbed preferences, the server adopts the median aggregation strategy to fuse them and obtains the final group preference.

3.2. Local Differential Privacy

Local differential privacy is a state-of-the-art notion that allows users to share private data safely.

Definition 1. A privacy mechanism $M$ satisfies $\epsilon$-local differential privacy ($\epsilon$-LDP) if for any two different records $x$ and $x'$ and for any output $y \in \mathrm{Range}(M)$,
$$\Pr[M(x) = y] \leq e^{\epsilon} \cdot \Pr[M(x') = y].$$

Local differential privacy has the same sequential composability property as differential privacy. Sequential composability means that the privacy budget can be split across the different steps of an algorithm.

Property 2 (sequential composability). Let $M_1, M_2, \ldots, M_t$ be privacy-preserving algorithms. If each algorithm $M_i$ satisfies $\epsilon_i$-local differential privacy, then their sequential combination $M$ satisfies $\epsilon$-LDP, where $\epsilon = \sum_{i=1}^{t} \epsilon_i$.
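To make the definition and the composability property concrete, the following Python sketch (not part of the original scheme) shows the classical randomized-response mechanism, which satisfies $\epsilon$-LDP, and how a total budget might be split evenly across $T$ interactions as Property 2 allows; the function name and parameters are illustrative.

```python
import math
import random

def randomized_response(bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1) and flip it
    otherwise; the ratio of output probabilities for any two inputs is at most
    e^eps, so the mechanism satisfies eps-LDP (Definition 1)."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_keep else 1 - bit

# Sequential composability (Property 2): if a user answers T perturbed queries,
# each with budget eps / T, the whole interaction satisfies eps-LDP.
epsilon_total, T = 1.0, 10
per_round = epsilon_total / T
reports = [randomized_response(1, per_round) for _ in range(T)]
print(reports)
```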

3.3. Problem Statement

We take a movie recommender system as an example to introduce our scheme. We assume that the recommender system consists of $m$ movies and $n$ users. We denote the rating generated by user $u$ for movie $i$ as $r_{ui}$; if user $u$ has not rated movie $i$, $r_{ui} = 0$. The movie recommender system contains many groups formed by users. We denote a typical group with $g$ users as $G = \{u_1, u_2, \ldots, u_g\}$. The notations are listed in Table 1.

A group recommender system first collects users' historical data to predict the preference rating $\hat{r}_{ui}$ for each movie $i$ that user $u$ has not rated and then aggregates $\hat{r}_{ui}$ (for all $u \in G$) for each movie to obtain the group preference rating $\hat{r}_{Gi}$ for movie $i$. Usually, the sensitive historical ratings and preference ratings are transmitted between users and the server without protection, which increases the risk of privacy leakage. In our work, we assume that the server is untrusted and that each user trusts only himself or herself. The adversaries are assumed to have access to the output data of all users and to know the privacy-preserving scheme adopted by the users. Since groups tend to change and update, attackers may also try to link users' identities with their sensitive data by carrying out long-term observation attacks. Under these conditions, how to build a group recommender system without privacy leakage is our main problem.

We face several challenges in solving this problem. First, how can we prevent users' personal preference ratings from leaking to anyone except the rating owner while guaranteeing the accuracy of the group recommendation results? Second, how can we defend against long-term observation attacks? Third, how can we measure the privacy-preserving effect against long-term observation attacks?

4. Our Proposed Scheme

4.1. Personalized Recommendation under Local Differential Privacy Algorithm

To fully protect users' personal profiles, the privacy-preserving algorithm should not only prevent users' historical data from leaking but also pay attention to their personalized recommendation results during the personalized recommendation step. Based on the algorithm proposed by Shin et al. [16], we therefore refine the matrix factorization method to fit our group recommendation scenario.

To solve the matrix factorization problem without overfitting, we minimize the following objective:
$$\min_{U, V}\; \frac{1}{M}\sum_{u=1}^{n}\sum_{i=1}^{m} y_{ui}\,\bigl(r_{ui} - u_u^{\top} v_i\bigr)^{2} \;+\; \lambda\Bigl(\sum_{u=1}^{n}\lVert u_u\rVert^{2} + \sum_{i=1}^{m}\lVert v_i\rVert^{2}\Bigr). \tag{2}$$

In formula (2), the user vector $u_u \in \mathbb{R}^d$ represents user $u$'s relations with the $d$ latent factors. These factors also have interdependencies with the movies, which are captured by the item vector $v_i \in \mathbb{R}^d$ for movie $i$. The matrix $Y = (y_{ui})$ indicates whether a user has rated a movie: if $r_{ui} \neq 0$, then $y_{ui} = 1$; otherwise, $y_{ui} = 0$. $M = \sum_{u,i} y_{ui}$ is the total number of ratings in the system, and $\lambda$ is the regularization parameter.
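As a rough illustration of this objective, the following Python sketch evaluates a regularized squared-error loss of the form described above; the exact weighting and regularization in formula (2) may differ, and all names are illustrative.

```python
import numpy as np

def mf_objective(R, Y, U, V, lam):
    """Regularized squared-error objective in the spirit of formula (2).
    R: n x m ratings, Y: n x m 0/1 indicators of observed ratings,
    U: n x d user vectors, V: m x d item vectors, lam: regularization weight."""
    M = Y.sum()                              # total number of observed ratings
    err = Y * (R - U @ V.T)                  # errors on observed entries only
    return (err ** 2).sum() / M + lam * ((U ** 2).sum() + (V ** 2).sum())

# Tiny usage example with random data.
n, m, d = 4, 5, 2
R = np.random.randint(0, 6, size=(n, m)).astype(float)
Y = (R > 0).astype(float)
print(mf_objective(R, Y, np.random.rand(n, d), np.random.rand(m, d), lam=0.1))
```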

We utilize a privacy-preserving gradient descent algorithm to approach the minimum of formula (2).

Users separately interact with the server for $T$ iterations to carry out the gradient descent algorithm. According to Property 2, if the whole algorithm needs to satisfy $\epsilon$-LDP, then each iteration needs to satisfy $(\epsilon/T)$-LDP. Before the iterations start, each user initializes the user vector $u_u$ and the server initializes the item matrix $V$ on their own sides. During each iteration, $V$ is first sent to every user; then user $u$ computes a gradient matrix $G_u$ whose elements $g_{ij}$ are derived from formula (2) using the user's own ratings and vector. According to Shin's work, each $g_{ij}$ needs to be normalized into the range $[-1, 1]$. Min–max normalization is usually used for this purpose, but it leaks the range of each user's gradients. We therefore adopt a normalization method that leaks less information: each user computes a bound $s$ on his or her gradient entries and normalizes each $g_{ij}$ into $[-1, 1]$ as $\tilde{g}_{ij} = g_{ij}/s$. Then, each user applies the LDP mechanism to perturb the gradient: user $u$ randomly chooses a row index $i$ and a column index $j$ and takes $\tilde{g}_{ij}$ as the value to be perturbed.

Next, user $u$ perturbs $\tilde{g}_{ij}$ according to the following distribution and obtains the perturbed value $g^{*}$. We denote $c_{\epsilon} = \dfrac{e^{\epsilon/T} + 1}{e^{\epsilon/T} - 1}$:
$$g^{*} = \begin{cases} c_{\epsilon}, & \text{with probability } \dfrac{1}{2} + \dfrac{\tilde{g}_{ij}}{2 c_{\epsilon}}, \\[4pt] -c_{\epsilon}, & \text{otherwise.} \end{cases}$$

After that, each user sends the tuple $(i, j, g^{*})$ to the server and then updates his or her user vector $u_u$ by one gradient descent step with learning rate $\gamma$. After receiving the reports from all users, the server adds each $g^{*}$ to the $i$-th row and $j$-th column of the gradient matrix in order to estimate the gradient of $V$. The server then updates $V$ and sends it back to all users.

Users and the server interact for $T$ rounds as above, and finally each user computes $\hat{r}_{ui} = u_u^{\top} v_i$ on his or her own side to obtain the predicted rating of every item.
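To illustrate the flavor of one iteration, here is a hedged Python sketch of the client-side perturbation and the server-side aggregation, assuming a Duchi-style two-point mechanism with uniform entry sampling in the spirit of Shin et al. [16]; the scaling constants, function names, and exact normalization are assumptions rather than the paper's exact procedure.

```python
import math
import numpy as np

def perturb_gradient(G_norm: np.ndarray, eps_iter: float):
    """Client side: sample one entry of the normalized gradient matrix
    (values in [-1, 1]) and report a two-point perturbation of it.
    E[g_star] equals the sampled entry, and the report is eps_iter-LDP."""
    m, d = G_norm.shape
    i, j = np.random.randint(m), np.random.randint(d)
    c = (math.exp(eps_iter) + 1.0) / (math.exp(eps_iter) - 1.0)
    p_pos = 0.5 + G_norm[i, j] / (2.0 * c)      # probability of reporting +c
    g_star = c if np.random.rand() < p_pos else -c
    return i, j, g_star

def estimate_gradient(reports, m: int, d: int) -> np.ndarray:
    """Server side: accumulate each report into an m x d matrix and rescale by
    m * d / n so the result is an unbiased estimate of the average gradient."""
    est = np.zeros((m, d))
    for i, j, g_star in reports:
        est[i, j] += g_star
    return est * (m * d) / len(reports)
```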

The following theorems state the privacy and utility of this algorithm; similar results have been proved in [16], so we omit the proofs here.

Theorem 3. For each user, the algorithm in Section 4.1 satisfies $\epsilon$-local differential privacy.

Theorem 4. $\mathbb{E}[v^{*}_{ij}] = v_{ij}$, where $v_{ij}$ represents the true value of an item-matrix element under no privacy-preserving algorithm and $v^{*}_{ij}$ represents the perturbed value of that element under this algorithm.

Theorem 3 indicates the privacy protection provided by our algorithm, while Theorem 4 shows its utility, since $v^{*}_{ij}$ is an unbiased estimator of $v_{ij}$.

4.2. IntPPA Algorithm

This algorithm corresponds to steps 4 and 5 in Figure 1. It consists of an intra-group random transmission and an aggregation strategy. We first explain why we choose the median strategy as our aggregation strategy.

There are many aggregation strategies in group recommendation, such as the average strategy, under which the average of the group members' ratings represents the group preference. Another typical strategy is least misery, under which the minimum of the group members' ratings represents the group preference.

Suppose a group consists of five members: Alice, Bob, Charles, David, and Eric. Each member has his or her own preferences on five movies (predicted via personalized recommendation), as shown in Table 2. The server is supposed to select two movies that satisfy most of the members. If the average strategy is chosen, the group preferences for these five movies are 3.2, 3.4, 3.8, 2.8, and 2.8, respectively, and Movies 2 and 3 will be recommended to the group. If the least misery strategy is chosen, the group preferences are 1, 2, 2, 1, and 1, respectively, and the server will also recommend Movies 2 and 3. In this paper, we choose the median strategy to measure the group preference: the median of the members' ratings represents the group preference. The group preferences are then 3, 4, 4, 3, and 3, respectively, and Movies 2 and 3 will again be recommended. By definition, the median strategy is less affected by extreme ratings. Now assume there is a dishonest user, Gloria, who is a new member of this group and wants the server to choose Movie 4, so she changes her predicted ratings to 1, 1, 1, 5, 1. In this case, the group will recommend Movies 3 and 4 under the average aggregation strategy. Under the least misery strategy, all movies have the same preference, which is indistinguishable for the server, so this strategy is unusable in this situation. Under our median aggregation strategy, Gloria's plan does not work: Movies 2 and 3 still have the highest ratings, which means Movie 4 will not be recommended. Based on this analysis, it is hard for malicious users in the group to affect the group preference by changing their predicted scores under the median aggregation strategy.
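The following Python sketch contrasts the three aggregation strategies on a hypothetical rating table (the real values are in Table 2); it is meant only to show how the strategies are computed and why a single tampered profile moves the average much more than the median.

```python
import numpy as np

# Hypothetical per-member predicted ratings (rows: members, columns: movies);
# the real values used in the paper's example are in Table 2.
ratings = np.array([
    [3, 4, 5, 2, 3],   # Alice
    [4, 4, 4, 3, 2],   # Bob
    [1, 2, 3, 5, 4],   # Charles
    [5, 3, 4, 1, 2],   # David
    [3, 4, 3, 3, 3],   # Eric
])

def top_k(group_pref: np.ndarray, k: int = 2) -> np.ndarray:
    """Indices of the k movies with the highest group preference."""
    return np.argsort(-group_pref)[:k]

average      = ratings.mean(axis=0)        # average strategy
least_misery = ratings.min(axis=0)         # least misery strategy
median       = np.median(ratings, axis=0)  # median strategy (ours)

print(top_k(average), top_k(least_misery), top_k(median))

# Appending one tampered profile (e.g., all 1s with a single 5 on a target
# movie) shifts the averages noticeably, but the medians barely move.
```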

Next, we introduce the random transmission. Suppose there is a group $G$ with $g$ members. The group members aim to send their preference profiles $p_u$ to the server without privacy leakage, while the server still obtains the group preference for each item. In order to prevent the server from learning users' personal preferences, we design a random transmission mechanism inside the group, called IntPPA.

The server sets two time parameters, $t_{start}$ and $t_{end}$. All members have to start the IntPPA algorithm after time $t_{start}$ and finish before time $t_{end}$. We propose a perturbation algorithm $F$: for each element $p_{ui}$ of a profile $p_u$, $F(p_{ui}) = p_{ui} \pm c$, where $c$ is a fixed value that we call the "change value."

Group members first perturb their profiles into $F(p_u)$. Then, they send the perturbed profile to the server with probability $1 - q$ and to a randomly chosen group member with probability $q$ (the transfer probability). When a member receives another member's profile, he or she also applies the perturbation algorithm $F$ to update it first and then forwards it to a group member or to the server according to the probability distribution described above. A member's profile keeps being transmitted until it is sent to the server or the time exceeds $t_{end}$.

The server finally receives all the perturbed profiles and computes the group recommendation result $p_G = (p_{G1}, p_{G2}, \ldots, p_{Gm})$, in which $p_{Gi}$ represents the group preference for movie $i$. Figure 2 shows an example of the IntPPA process. A user holds a profile (i.e., his or her personalized recommendation results) $p_u$. He or she first executes the perturbation algorithm, obtains $F(p_u)$, and transmits the perturbed profile to another group member. The profile is then forwarded from member to member, being perturbed at every hop, until it finally reaches the server. The server thus receives a profile that has been perturbed several times.
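As a minimal simulation of this process, the following Python sketch applies a random $\pm c$ perturbation at every hop, forwards the profile to a random member with probability $q$ and to the server otherwise, and aggregates with the median; the perturbation form, the probability semantics, and the hop limit standing in for $t_{end}$ are our assumptions rather than the exact protocol.

```python
import random
import numpy as np

def intppa_round(profiles, c=0.5, q=0.7, max_hops=20):
    """Simulate one IntPPA execution: each profile gets a random +/-c
    perturbation at every hop and is forwarded to a (random) group member
    with probability q or handed to the server with probability 1 - q;
    max_hops stands in for the time limit t_end."""
    received = []
    for p in profiles:
        p = np.array(p, dtype=float)
        for _ in range(max_hops):
            p = p + np.random.choice([c, -c], size=p.shape)  # perturbation F
            if random.random() >= q:                         # goes to the server
                break
            # otherwise another member receives it and repeats the step
        received.append(p)
    return np.median(np.vstack(received), axis=0)            # median aggregation

group_profiles = [[3, 4, 5, 2, 3], [4, 4, 4, 3, 2], [1, 2, 3, 5, 4]]
print(intppa_round(group_profiles))
```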

Under the IntPPA algorithm, neither the server nor the users can recognize whom a received profile originally comes from. When a new member joins the group, the server also cannot single out the new member's profile by comparing the results computed before and after the member joins.

5. Performance Evaluation

5.1. Experimental Settings
5.1.1. Datasets

We evaluate our experiments on two datasets. The first dataset is MovieLens-100k [28], which contains one hundred thousand movie ratings from approximately 1000 users on over 1500 movies. The second dataset is FilmTrust [29]. FilmTrust is a dataset crawled from the entire FilmTrust website. Table 3 shows the details of the two datasets.

5.1.2. Comparison Methods

In Section 2.2, we mentioned several privacy-preserving group recommendation works. However, we do not compare our scheme with theirs. Although those works protect users' ratings and can even protect users' preferences to some extent, they do not consider long-term observation attacks during group recommendation. Since we assume a stronger attack model, a direct comparison would not be appropriate. Instead, we compare our scheme (priv-MF-IntPPA) with several baselines in our experiments: priv-MF-med, priv-MF-avg, and priv-MF-lm.

The difference between our scheme and the other three methods lies in the aggregation strategy. Priv-MF-med, priv-MF-avg, and priv-MF-lm choose the median, average, and least misery aggregation strategies, respectively. They use the same privacy-preserving personalized recommendation method as ours but do not apply any privacy protection in the preference aggregation part.

5.1.3. Utility Metrics

We evaluate the data utility of the personalized recommendation algorithm by computing the RMSE (root mean squared error) between the item scores predicted from the training set and the actual scores in the test set. We evaluate the group recommendation accuracy using precision ($P@K$) and recall ($R@K$), where $K$ is the number of recommended items; the experiments are evaluated under a fixed $K$. $P@K$ is the fraction of the top $K$ recommendations that are selected by the group, and $R@K$ is the fraction of the true items that are retrieved in the top $K$ recommendations.
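For concreteness, the following Python sketch shows how these metrics might be computed, assuming the standard definitions of RMSE, precision@$K$, and recall@$K$; the helper names and example values are illustrative.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual scores."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

def precision_recall_at_k(recommended, relevant, k):
    """recommended: ranked list of item ids; relevant: set of items the group
    actually adopted (e.g., rated 4 or above in MovieLens)."""
    hits = len(set(recommended[:k]) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

print(rmse([3.5, 4.0], [3.0, 5.0]))
print(precision_recall_at_k([10, 4, 7, 21, 3], {4, 3, 99, 54}, k=5))
```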

5.1.4. Privacy Metrics

We use two metrics to evaluate the privacy-preserving effect of the IntPPA algorithm. First, we compute the RMSE between each user's profile before and after the IntPPA algorithm. Second, we propose a "matched pairs" metric to describe the privacy-preserving effect against long-term observation attacks.

We execute the IntPPA algorithm on the same group multiple times and compute the differences between the perturbed profiles. For example, if user $a$'s profile in execution one is closest to user $b$'s profile in execution two, we call them "a pair." If these two members have the same identity, i.e., user $a$ is actually user $b$, we call the pair "matched." We denote the number of matched pairs in a group as $N_{mp}$ and normalize it to the interval $[0, 1]$. Obviously, the smaller $N_{mp}$ is, the better the privacy protection performs against long-term observation attacks.
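One way to compute such a metric is sketched below in Python, assuming Euclidean distance is used to find the closest profile (the paper does not fix the distance measure); all names and example values are illustrative.

```python
import numpy as np

def matched_pairs_ratio(run1, run2):
    """run1, run2: dicts mapping user id -> perturbed profile (np.ndarray)
    from two executions of IntPPA on the same group. For every user in run1,
    find the nearest profile in run2; the pair is 'matched' when that profile
    belongs to the same user. Returns the matched fraction in [0, 1]."""
    matched = 0
    for uid, p1 in run1.items():
        nearest = min(run2, key=lambda v: np.linalg.norm(run2[v] - p1))
        matched += int(nearest == uid)
    return matched / len(run1)

run1 = {"alice": np.array([3.1, 4.2]), "bob": np.array([1.0, 2.3])}
run2 = {"alice": np.array([2.8, 4.5]), "bob": np.array([1.4, 2.1])}
print(matched_pairs_ratio(run1, run2))
```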

5.2. Evaluation Results

We follow the evaluation method of privacy protection proposed in Li et al. [30] to measure our scheme.

We first ran some pilot experiments to screen appropriate parameters for the method. According to the pilot experiments, we fixed the number of latent factors and the number of iterations, and set the remaining hyperparameters separately for the MovieLens and FilmTrust experiments.

5.2.1. Personalized Recommendation Utility Analysis

We divide each dataset into five folds, using four folds as the training set and the remaining one as the testing set. We use RMSE to measure the utility of the personalized recommendation algorithm. In Figures 3(a) and 3(b), the dotted line shows the utility of the non-private matrix factorization algorithm, while the red line shows the utility of ours. In both datasets, the red line approaches the dotted line as the privacy budget $\epsilon$ increases, which means that a larger $\epsilon$ yields better utility for our personalized recommendation algorithm. As we know, a smaller $\epsilon$ provides a stronger privacy-preserving effect. So, in order to balance privacy and utility, we fix $\epsilon$ at a moderate value for both datasets in the following experiments.

5.2.2. IntPPA Algorithm Privacy-Preserving Effect Analysis

We utilize two methods to measure the privacy-preserving effect of IntPPA. Since these measurements only depend on the change value $c$, the transfer probability $q$, and the group size $g$, we test them on the MovieLens dataset only.

Figure 4(a) shows that a larger change value $c$ or a larger transfer probability $q$ achieves better privacy protection. In terms of parameter selection, when the data in the dataset have a wider range, we need to increase $c$ or $q$ to guarantee a suitable perturbation.

Second, we utilize "matched pairs" to measure the privacy. Figure 4(b) shows that when the transfer probability $q$ increases, the number of matched pairs decreases, which means better privacy protection. The change value $c$ and the group size $g$ are also negatively correlated with $N_{mp}$. However, once these parameters are too small, the IntPPA algorithm cannot resist the long-term observation attack.

According to the above experiments, we choose the same parameter setting for both the MovieLens and FilmTrust datasets.

5.2.3. Group Recommendation Accuracy Analysis

Since neither MovieLens nor FilmTrust contains group information, we build groups by randomly choosing users from the datasets.

We use $P@K$ and $R@K$ to evaluate our scheme's recommendation accuracy. In MovieLens (FilmTrust), if a group member gives a score of 4 (3) or above to a movie, we assume that the movie is adopted by that member.

Figures 5 and 6 report the $P@K$ and $R@K$ values for the two datasets under different values of $K$ and group sizes. We compare our scheme (priv-MF-IntPPA) with the other group recommendation algorithms (priv-MF-med, priv-MF-avg, and priv-MF-lm) described in Section 5.1. We observe from the two figures that
(i) For priv-MF-IntPPA, as $K$ becomes larger, the precision tends to become smaller while the recall becomes larger.
(ii) Group size has little effect on the recall and precision of our group recommendation.
(iii) In the MovieLens dataset, the precision of our scheme is close to that of the others and is sometimes even better due to randomness. However, the recall of our scheme is a little weaker. In Figure 5(c), for the smaller group size, the recall values of priv-MF-IntPPA and priv-MF-med are 0.0021 and 0.0039, so priv-MF-med improves 85.7% over our scheme. For the larger group size, priv-MF-med only improves 22.9% over our scheme. So, as the group size becomes larger, the difference becomes acceptable.
(iv) In the FilmTrust dataset, the precision and recall values of every method are much lower than in the MovieLens dataset, probably because FilmTrust has a lower rating density. Figure 6(c) shows that our scheme performs much better than the other methods, which suggests that our algorithm is more suitable for sparse data than the other algorithms.

From the above, our scheme provides accurate group recommendations while ensuring privacy.

5.3. Communication and Time Cost

For the personalized recommendation, we only analyze the communication cost of one iteration. For each user, no matter how many items the user rates, only three elements are transmitted to the server, which amounts to just a few bytes for both the MovieLens and FilmTrust datasets. During each iteration, the server transmits the item matrix to each user, which is on the order of megabytes for MovieLens and FilmTrust.

In the IntPPA algorithm, the server has no communication cost. Users transmit their perturbed profiles to the server and to other group members, which amounts to less than a few megabytes in total for both datasets.

According to our tests, on the MovieLens (FilmTrust) dataset, each iteration of the personalized recommendation takes no more than 90 (180) seconds. The time consumed by the other parts is negligible.

6. Conclusions

In this paper, we proposed a privacy-aware group recommendation scheme consisting of a personalized recommendation algorithm and a preference aggregation algorithm. For the personalized recommendation, we employed local differential privacy to protect users' historical data and to prevent the predicted preferences from leaking. We also designed an intra-group transfer privacy-preserving preference aggregation algorithm called IntPPA. IntPPA not only protects users' privacy but also defends against long-term observation attacks. Moreover, we presented several experiments to measure the privacy-preserving effect and usability of our proposed scheme. The results on the MovieLens and FilmTrust datasets demonstrate its effectiveness and efficiency.

Data Availability

Our data are from the publicly available MovieLens-100k and FilmTrust datasets. The data supporting our recommender system are from previously reported studies and datasets, which have been cited [28, 29].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61932015, 61872441, 61872100), the National Key R&D Program of China (2017YFB0802203), and the Youth Innovation Promotion Association, Chinese Academy of Sciences (2018196).