Abstract

With the ever-increasing popularity of mobile computing technology, a wide range of computational resources or services (e.g., movies, food, and places of interest) are migrating to mobile infrastructure or devices (e.g., mobile phones, PDAs, and smart watches), imposing heavy burdens on users' service selection decisions. In this situation, service recommendation has become a promising way to alleviate such burdens. In general, the service usage data used for service recommendation are produced by various mobile devices and collected by distributed edge platforms, which can lead to leakage of user privacy during the subsequent cross-platform data collaboration and service recommendation process. The Locality-Sensitive Hashing (LSH) technique has recently been introduced to realize privacy-preserving distributed service recommendation. However, existing LSH-based recommendation approaches often consider only one quality dimension of services, ignoring multidimensional recommendation scenarios, which are more complex but more common. In view of this drawback, we improve the traditional LSH and put forward a novel LSH-based service recommendation approach named , to protect users’ privacy over multiple quality dimensions during the distributed mobile service recommendation process.

1. Introduction

With the advent of the mobile computing age, an increasing number of web services are migrating to mobile infrastructure or devices (e.g., mobile phones, PDAs, and wearable devices), producing a wide range of mobile services tailored to the mobile environment, such as mobile payment and mobile shopping [1–3]. On the one hand, this rapid growth in the number of mobile services provides more candidates that can satisfy users’ functional or nonfunctional requirements; on the other hand, it places a heavy burden on users’ service selection decisions, as selecting an appropriate mobile service is a tiresome and time-consuming job [4, 5]. In this situation, various recommendation techniques, such as the well-known collaborative filtering (CF), have been proposed to alleviate this burden. Typically, by analyzing the historical service usage data generated from past service invocations, CF finds the similar friends of a target user and then recommends services based on the tastes of these friends, as similar users often hold the same or similar preferences.

However, in the mobile environment, the historical service usage data generated by mobile terminals are often collected and stored on various fog platforms instead of being transferred directly to a remote cloud platform, owing to the large data volume and heavy transmission cost. As a result, the historical service usage data used for service recommendation are distributed rather than centralized, which raises the new topic of distributed service recommendation, in which user privacy may be exposed [6, 7]. Due to privacy concerns, a fog platform is often reluctant to share its internal data with other platforms, which significantly impedes cross-platform data collaboration and subsequent service recommendation. Therefore, how to integrate or fuse the distributed service usage data across different fog platforms while guaranteeing user privacy is a challenging task that needs further investigation.

In view of this challenge, in our previous work [8–11], the Locality-Sensitive Hashing (LSH) technique was introduced into distributed service recommendation to protect users’ private information. However, that work considers only one QoS (quality of service) dimension (e.g., response time) of mobile services. In this paper, we extend it by considering multiple quality dimensions simultaneously and put forward a novel service recommendation approach that protects user privacy over multiple QoS dimensions, named . Our proposed approach can significantly extend the applicability of traditional LSH-based service recommendation.

In general, our contributions are twofold.

(1) We improve the traditional LSH-based service recommendation approach by considering multiple QoS dimensions instead of only one, so that the applicability of LSH in the recommendation domain is significantly enlarged.

(2) Extensive experiments are conducted on a real distributed service quality dataset, WS-DREAM, to validate the feasibility of our proposed approach. Experiment results show that our proposal outperforms the other state-of-the-art approaches in efficiency, and that its accuracy is close to that of the benchmark UPCC and higher than that of the privacy-preserving approaches PPICF and P-UIPCC, while guaranteeing user privacy.

The remainder of this paper is organized as follows. In Section 2, we formulate the multi-QoS-aware distributed service recommendation problem and present the motivation of our paper. In Section 3, we briefly introduce the main idea of the LSH technique and then propose a novel multi-QoS-aware distributed service recommendation approach, that is, . A set of experiments is conducted in Section 4 to validate the feasibility and advantages of our proposal. Related work and comparison analyses are presented in Section 5. Finally, in Section 6, we summarize the paper and point out future research directions.

2. Formulation and Motivation

In Section 2.1, we formulate the multi-QoS-aware distributed service recommendation problem; afterwards, in Section 2.2, we present an intuitive example to demonstrate the motivation of our paper.

2.1. Problem Formulation

The distributed mobile service recommendation problem with multiple QoS dimensions can be specified by a five-tuple SerRer(FOG, MS, U, $u_{target}$, Q), where:
(1) FOG denotes the set of fog platforms, each of which hosts a set of mobile services;
(2) $MS = \{ms_1, \dots, ms_n\}$ represents the set of mobile services that are ready to be recommended to potential users;
(3) $U = \{u_1, \dots, u_m\}$ denotes the set of users in the user-service invocation network;
(4) $u_{target}$ is the target user to whom the recommender system intends to recommend services; here, $u_{target} \in U$ holds;
(5) $Q = \{q_1, \dots, q_k\}$ represents the set of QoS dimensions of mobile services.

With the above formulation, the multi-QoS-aware mobile service recommendation problem in the distributed fog environment can be specified as follows: based on the historical QoS data (over the dimensions in Q) that users in U have observed on mobile services in MS hosted by different fog platforms in FOG, predict the quality of the mobile services that the target user has never invoked and recommend the quality-optimal mobile service to the target user. As a user’s historical service quality data are often distributed across different fog platforms, service recommendation often calls for cross-platform data collaboration, during which users’ private information (e.g., the historical QoS data observed by a user) may be exposed to other parties. In this paper, we investigate how to make accurate service recommendations while guaranteeing users’ privacy over multiple QoS dimensions.

2.2. Motivation

In this subsection, an intuitive example is presented in Figure 1 to motivate our paper. In Figure 1, there are two users, the target user and u1, whose historical service quality data are located on the Microsoft and SAP fog platforms, respectively; each candidate mobile service has k QoS dimensions. According to collaborative filtering theory, if a recommender system intends to recommend new services to the target user, it needs to calculate the similarity between the target user and u1 (e.g., using the well-known PCC) by fusing or integrating the historical QoS data across both the Microsoft and SAP fog platforms. However, such a cross-platform collaboration process may reveal the private information of both users, which significantly impedes the recommendation process for the target user.
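As a hedged illustration of why this step leaks privacy, the following Python sketch (ours, not part of the original approach) computes the PCC similarity of two users over the services both have invoked, for a single QoS dimension. The function name `pcc` and the dictionary layout are assumptions; the point is that the computation needs both users’ raw QoS values, which reside on different fog platforms.

```python
import math


def pcc(ratings_a: dict, ratings_b: dict) -> float:
    """Pearson Correlation Coefficient between two users over the services
    both have invoked, for one QoS dimension (e.g., response time).

    ratings_a / ratings_b map a service id to the QoS value observed by the
    user. Whoever runs this function must see BOTH dictionaries, which is
    exactly the cross-platform exposure described above.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # too few co-invoked services to correlate
    xs = [ratings_a[s] for s in common]
    ys = [ratings_b[s] for s in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has undefined correlation; treat as 0
    return cov / math.sqrt(var_x * var_y)
```

For example, `pcc({"ms1": 0.3, "ms2": 1.2, "ms3": 0.8}, {"ms1": 0.4, "ms2": 1.5, "ms3": 0.9, "ms4": 2.0})` uses only ms1 to ms3, the services invoked by both users.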

To address this challenge, we introduce the LSH technique into cross-platform, privacy-preserving service recommendation and adapt it to recommendation scenarios in which multiple QoS dimensions of mobile services are involved. We introduce our proposal in detail in the next section.

3. Privacy-Aware Multidimensional Mobile Service Recommendation

In this section, we first briefly introduce the LSH technique used in our proposal; afterwards, we adapt traditional LSH to the multi-QoS-aware service recommendation scenario so as to tackle the privacy-preservation problem raised in the distributed fog environment.

3.1. Locality-Sensitive Hashing (LSH)

Locality-Sensitive Hashing, a fast lookup technique for approximate nearest neighbor (ANN) search over massive, high-dimensional data, was introduced in the late 1990s [12]. The key advantage of LSH in ANN search is its “similarity-keeping” property: two points that are close to each other in the raw data space are projected into the same bucket of a hash table with high probability, whereas two points that are far away from each other are projected into different buckets with high probability. Thus, we can use hash values, which carry little private information, to realize ANN search, so that private information is protected during the search process.

The main idea behind LSH can be illustrated more intuitively by Figure 2, where general hashing and LSH are compared. In Figure 2(a), a general hash function projects the raw data points into corresponding buckets of a hash table without following the “similarity-keeping” rule. In contrast, the hashing process in Figure 2(b) follows the “similarity-keeping” rule: the blue and yellow points, which are close to each other, are projected into the same bucket with high probability, while the green and red points, which are far away from each other, are projected into bucket b1 and a different bucket, respectively. Thus, if a target user (marked with a green five-pointed star) is projected into bucket b1, we can conclude that the target user is similar to the green point with high probability. This is the main idea of LSH-based ANN search; in this paper, we use it to find the similar friends of a target user.
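The similarity-keeping behaviour can be checked empirically with a minimal Python sketch. It uses random-hyperplane hashing, a standard LSH family for cosine similarity, which is closely related to PCC once vectors are mean-centered. The Gaussian hyperplanes, the function names, and the parameter values are illustrative assumptions rather than the exact hash family of this paper.

```python
import random


def make_planes(dim, r, rng):
    """Draw r random hyperplanes (Gaussian normal vectors) for one hash table."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(r)]


def bucket(vec, planes):
    """Bucket key: one bit per hyperplane, 1 if vec lies on its positive side."""
    return tuple(1 if sum(x * p for x, p in zip(vec, plane)) > 0 else 0
                 for plane in planes)


def collision_rate(a, b, r=4, trials=500, seed=7):
    """Fraction of independently drawn hash tables in which a and b share a bucket."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        planes = make_planes(len(a), r, rng)
        if bucket(a, planes) == bucket(b, planes):
            hits += 1
    return hits / trials


point = [1.0, 2.0, 3.0]
near = [1.1, 2.1, 2.9]    # almost the same direction as point
far = [-3.0, 1.0, -2.0]   # about 120 degrees away from point
print(collision_rate(point, near), collision_rate(point, far))
```

The nearby pair lands in the same bucket in most tables, while the distant pair rarely does; raising r makes collisions stricter for both pairs.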

3.2. Multi-QoS-Aware and Privacy-Preserving Distributed Service Recommendation Approach Based on LSH

In this subsection, we introduce our proposed multidimensional service quality prediction and recommendation approach, that is, . Concretely, the approach consists of the three steps in Algorithm 1. Here, u and ms denote a user in set U and a mobile service in set MS, respectively; L and r denote the number of hash tables and the number of hash functions in each hash table, respectively.

Algorithm 1
Step 1 (build multidimensional user indices offline). For each u ∈ U, build his/her index offline from his/her observed QoS data over the k QoS dimensions, based on r LSH functions. Repeat the index building process L times so as to obtain L hash tables.
Step 2 (search for similar friends of the target user online). For each u ∈ U, compare the index of u with that of the target user online; if the two indices are identical in any of the L hash tables, then u is regarded as similar to the target user and put into set Friend.
Step 3 (service recommendation). For each ms ∈ MS never invoked by the target user, predict its quality over the k QoS dimensions based on set Friend obtained in Step 2. Finally, return the quality-optimal mobile service to the target user.

Step 1 (build multidimensional user indices offline). In this step, we build an index for each user u based on u’s historical QoS data over the k quality dimensions. Concretely, u’s historical QoS data can be specified by the QoS matrix in Figure 3, where n denotes the number of candidate mobile services. The index of user u, which consists of r Boolean hash values (r being the number of hash functions in each hash table), is then calculated as shown in Figure 3. Each Boolean hash value of user u is calculated by the hash function in (1)-(2). In (2), $X^{T}$ denotes the transpose of vector X, the symbol “·” denotes the dot product of two vectors, and a random value is drawn from a range given in [13] (according to LSH theory, each distance metric corresponds to a specific LSH function; as the Pearson Correlation Coefficient (PCC) is often taken as the similarity metric in recommender systems, we adopt the LSH function corresponding to PCC distance here, and the random value is selected accordingly). In this way, each hash function yields one Boolean hash value.

Repeating the above process until the user index in Figure 3 is obtained yields a hash table (denoted by H_Table). We then repeat the hash table building process offline to derive L hash tables. The advantages of hash tables are twofold: first, a hash table only stores the less sensitive hash values of users, without revealing their private information; second, hash tables can be built offline before a service recommendation request arrives, which significantly improves the efficiency of subsequent service recommendation.
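Step 1 can be sketched in Python as follows. Because equations (1)-(2) are not reproduced here, the sketch makes several explicit assumptions: each user’s k × n QoS matrix is mean-centered per dimension and flattened into one vector of length k·n (so that cosine similarity approximates PCC), unobserved entries are encoded as 0, and the random projection vectors are Gaussian. The function names `flatten_centered`, `user_index`, and `build_tables` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # k rows (QoS dimensions) x n columns (services)
Key = Tuple[int, ...]               # an r-bit user index


def flatten_centered(qos: Matrix) -> List[float]:
    """Assumption: center each QoS dimension on its observed entries, keep
    unobserved entries (encoded as 0) at 0, and concatenate the k rows."""
    flat: List[float] = []
    for row in qos:
        observed = [x for x in row if x != 0]
        mean = sum(observed) / len(observed) if observed else 0.0
        flat.extend(x - mean if x != 0 else 0.0 for x in row)
    return flat


def user_index(vec: Sequence[float], planes: Sequence[Sequence[float]]) -> Key:
    """r Boolean hash values: 1 if the projection onto a random vector is positive."""
    return tuple(1 if sum(x * p for x, p in zip(vec, plane)) > 0 else 0
                 for plane in planes)


def build_tables(users: Dict[str, Matrix], L: int, r: int, seed: int = 42):
    """Offline: build L hash tables, each mapping an r-bit index to user ids.
    The random vectors are returned so the target user can be hashed identically."""
    rng = random.Random(seed)
    vectors = {uid: flatten_centered(q) for uid, q in users.items()}
    dim = len(next(iter(vectors.values())))
    all_planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(r)]
                  for _ in range(L)]
    tables: List[Dict[Key, List[str]]] = []
    for planes in all_planes:
        table: Dict[Key, List[str]] = {}
        for uid, vec in vectors.items():
            table.setdefault(user_index(vec, planes), []).append(uid)
        tables.append(table)
    return tables, all_planes
```

In a distributed deployment, every fog platform would have to hash with the same random vectors (e.g., a shared seed), and only the resulting keys would leave a platform. Note also that this encoding treats a genuine QoS value of 0 as missing, which is a further simplification.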

Step 2 (search for similar friends of the target user online). In Step 1, we derived the index of each user u ∈ U. Likewise, in this step, we obtain the index of the target user. Next, if the index of u equals that of the target user in any of the L hash tables, then, according to LSH theory, u is a similar friend of the target user with high probability; so we put u into a new set Friend, which contains all the similar friends of the target user.
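Given the L hash tables from Step 1 and the target user’s L indices (computed with the same hash functions), the online search reduces to bucket lookups. The following Python sketch assumes each table is a dictionary from an r-bit index to user ids; the name `find_friends` is hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Key = Tuple[int, ...]  # an r-bit user index


def find_friends(target_keys: Sequence[Key],
                 tables: Sequence[Dict[Key, List[str]]],
                 target_id: str) -> Set[str]:
    """Step 2 sketch: u joins Friend if its index equals the target user's
    index in at least one of the L hash tables (an OR over tables)."""
    friends: Set[str] = set()
    for key, table in zip(target_keys, tables):
        friends.update(table.get(key, []))  # users sharing the target's bucket
    friends.discard(target_id)              # the target is not its own friend
    return friends


tables = [{(1, 0): ["u0", "u1", "u2"], (0, 1): ["u3"]},
          {(1, 1): ["u0", "u3"], (0, 0): ["u1"]}]
print(sorted(find_friends([(1, 0), (1, 1)], tables, "u0")))  # prints ['u1', 'u2', 'u3']
```

Grouping users by index lets each of the L lookups be a single dictionary access instead of a scan over all users; this bucketing is a design choice of the sketch rather than a detail stated in the text.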

Step 3 (service recommendation). In this step, we make mobile service recommendations to the target user based on the friend set Friend derived in Step 2. Concretely, for each mobile service ms that the target user has never invoked, we predict its quality over each dimension from the QoS values of ms over that dimension observed by the users in Friend, based on (3). Next, we predict the comprehensive quality of ms by considering all k quality dimensions, as calculated by (4), in which a normalization operation eliminates the interference of different quality units. Finally, we select the candidate mobile service with the optimal predicted quality and return it to the target user, which completes the distributed mobile service recommendation process.
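Because equations (3) and (4) are not reproduced here, the following Python sketch of Step 3 rests on stated assumptions: (3) is taken to be the average of the friends’ observed values on each dimension, and (4) is taken to be the sum over dimensions of min-max-normalized predictions, with every dimension treated as smaller-is-better (like response time). The names `predict` and `recommend` and the data layout are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# qos[u][ms] = list of k values observed by user u on service ms (smaller is better)
QoS = Dict[str, Dict[str, List[float]]]


def predict(ms: str, friends: Sequence[str], qos: QoS, k: int) -> Optional[List[float]]:
    """Assumed form of (3): per-dimension mean over the friends who invoked ms."""
    observed = [qos[u][ms] for u in friends if ms in qos.get(u, {})]
    if not observed:
        return None  # no friend has invoked ms, so it cannot be predicted
    return [sum(v[d] for v in observed) / len(observed) for d in range(k)]


def recommend(candidates: Sequence[str], friends: Sequence[str],
              qos: QoS, k: int) -> Optional[str]:
    """Assumed form of (4): min-max normalize each dimension over the candidates,
    sum the k normalized values, and return the candidate with the lowest sum."""
    preds: Dict[str, List[float]] = {}
    for ms in candidates:
        p = predict(ms, friends, qos, k)
        if p is not None:
            preds[ms] = p
    if not preds:
        return None
    scores = {ms: 0.0 for ms in preds}
    for d in range(k):
        column = [p[d] for p in preds.values()]
        lo, hi = min(column), max(column)
        for ms, p in preds.items():
            # normalization removes unit differences between dimensions
            scores[ms] += 0.0 if hi == lo else (p[d] - lo) / (hi - lo)
    return min(scores, key=scores.get)


qos = {"u1": {"msA": [1.0, 10.0], "msB": [2.0, 5.0]},
       "u2": {"msA": [3.0, 10.0], "msB": [2.0, 7.0]}}
print(recommend(["msA", "msB", "msC"], ["u1", "u2"], qos, k=2))  # prints msB
```

A weighted sum would be a natural extension if dimension weights were available; the sketch weights all dimensions equally.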

4. Experiments

In this section, a set of experiments is designed and conducted to validate the feasibility of our proposed service recommendation approach, that is, .

4.1. Experiment Configurations

Our experiments are conducted on WS-DREAM [14], a real-world distributed service quality dataset that contains QoS data of 5825 web services observed by 339 users. To simulate the distributed service recommendation scenario, we treat each country that hosts a set of services as an independent fog platform. As the service QoS matrix in WS-DREAM is very dense, we randomly remove 90% of its entries to simulate the missing QoS prediction and service recommendation requirements. Moreover, two quality dimensions of services are considered: response time, and a second dimension whose QoS values are randomly generated within the range of the response time values. (Although WS-DREAM provides QoS data for two quality dimensions, response time and throughput, the distribution of throughput data is highly skewed, which, according to LSH theory, makes it poorly suited to LSH-based ANN search.)
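The data preparation described above can be sketched in Python as follows. Uniform sampling of the second dimension within the observed response-time range is our reading of the setup, and the helper names `sparsify` and `synthetic_dimension` are hypothetical.

```python
import random
from typing import List, Optional


def sparsify(matrix: List[List[float]], remove_ratio: float = 0.9,
             seed: int = 0) -> List[List[Optional[float]]]:
    """Hide remove_ratio of all entries (replaced by None), chosen uniformly at random."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    hidden = set(rng.sample(cells, int(round(len(cells) * remove_ratio))))
    return [[None if (i, j) in hidden else v for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


def synthetic_dimension(rt: List[List[float]], seed: int = 1) -> List[List[float]]:
    """Assumed second QoS dimension: uniform samples within [min, max] of response time."""
    rng = random.Random(seed)
    lo = min(min(row) for row in rt)
    hi = max(max(row) for row in rt)
    return [[rng.uniform(lo, hi) for _ in row] for row in rt]
```

Applied to the 339 × 5825 response-time matrix, `sparsify` keeps roughly 10% of the entries as observed data; how the removed entries are then used for evaluation is not detailed in the text.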

The following two evaluation criteria are tested in the experiments:

(1) time cost: time consumed for generating the final recommended service.

(2) MAE (Mean Absolute Error; the smaller, the better): the average absolute difference between the predicted and the real service quality of the recommended services.
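Criterion (2) can be computed as in the minimal Python sketch below; which predicted and real values are paired (e.g., over held-out entries) is an assumption, as the text does not detail it, and the function name `mae` is hypothetical.

```python
from typing import Sequence


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean Absolute Error: the average of |predicted - actual|; smaller is better."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


print(mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # prints 0.5
```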

To validate the feasibility and advantages of our proposed approach, we compare it with three other state-of-the-art approaches: UPCC [5], P-UIPCC [6], and PPICF [7]. Concretely, UPCC is a benchmark collaborative filtering recommendation approach, while P-UIPCC and PPICF use data perturbation and data division techniques, respectively, to protect users’ private information.

The experiments were conducted on a Lenovo laptop with 2.40 GHz processors and 12.0 GB of RAM, running Windows 10, Java 8, and MySQL 5.7. Each experiment was performed 10 times, and the average results are reported.

4.2. Experiment Results

Four profiles are tested and compared. Here, m and n denote the number of users and the number of services, respectively; L and r denote the number of hash tables and the number of hash functions in each hash table, respectively.

Profile 1 (recommendation accuracy comparison of four approaches). In this profile, we test the recommendation accuracy of our proposed approach and compare it with that of the other three approaches. The experiment parameters are set as follows: m is varied from 100 to 300, n is varied from 1000 to 3000, L = 10, and . The results are shown in Figure 4. As Figure 4 indicates, the recommendation accuracy of our proposal is close to that of the benchmark approach UPCC. Moreover, our approach outperforms PPICF and P-UIPCC in recommendation accuracy. This is because our proposal recruits only the “most similar” friends of a target user to make service recommendations, whereas PPICF and P-UIPCC apply additional perturbation or division operations to the real service QoS data to protect user privacy, which reduces recommendation accuracy to some extent.

Profile 2 (recommendation efficiency comparison of four approaches). Recommendation efficiency is an important metric for evaluating the performance of a recommender system. In this profile, we therefore test and compare the recommendation efficiency and scalability of the four approaches. Experiment parameters are set as follows: m is varied from 100 to 300, n is varied from 1000 to 3000, L = 10, and . The results are presented in Figure 5. As Figure 5 shows, the time cost of our approach is low and significantly lower than that of the other three approaches. This is because most of the work in our approach (e.g., building user indices) can be done offline before a service recommendation request arrives, while the remaining work (e.g., the online search for similar friends) can be done quickly owing to its low time complexity [13]. This low time complexity means that our approach can satisfy the quick response requirements of mobile users.

Profile 3 (recommendation accuracy of our approach with respect to L and r). As LSH is essentially a probability-based ANN search technique, the recommendation accuracy of our approach depends on the number of hash tables (i.e., L) and the number of hash functions in each hash table (i.e., r). In this profile, we therefore test the relationship between recommendation accuracy and the parameter combination (L, r). Experiment parameters are set as follows: m = 200, , L is varied from 6 to 14, and r is varied from 2 to 6. The results are illustrated in Figure 6.

As Figure 6 indicates, the recommendation accuracy of our proposal does not vary monotonically with either L or r, because L and r affect recommendation accuracy simultaneously: increasing r makes a hash collision stricter (all r hash values must match) and thus shrinks the friend set, whereas increasing L makes the search more lenient (a match in any table suffices) and thus enlarges it. Specifically, our approach achieves the highest recommendation accuracy (i.e., the lowest MAE) when (L, r) = (10, 2). Note, however, that our LSH-based approach is sensitive to the experiment data; that is, (10, 2) is not necessarily the optimal parameter combination when different experiment data are used.

Profile 4 (recommendation efficiency of our approach with respect to L and r). In this profile, we test the time cost of our approach with respect to L and r. Experiment parameters are set as follows: m = 200, , L is varied from 6 to 14, and r is varied from 2 to 6. The results are illustrated in Figure 7. As Figure 7 shows, the time cost of our proposal increases with both L and r, because more comparison operations are incurred during the online friend search when L and r grow. Nevertheless, the time cost of our proposal remains low (<0.3 s) in most cases, which means that our approach can satisfy users’ quick response requirements across the tested values of L and r.

5. Related Work

The distribution of the service QoS data used for service recommendation has raised the problem of distributed, privacy-preserving service recommendation. Many researchers have investigated this problem and put forward their respective privacy-preservation solutions. In [15], the authors suggest that a user release only part of his/her historical QoS data to the public, so that the majority of the QoS data remain secure; however, the released partial QoS data can still reveal part of the user’s privacy. In [16], the authors propose a K-anonymity approach to protect sensitive user information; however, data availability after K-anonymization is often reduced considerably, which decreases service recommendation accuracy to some extent. In [17], a homomorphic encryption-based approach is proposed to make e-commerce recommendations while preserving privacy; however, like other encryption-based approaches, it is often heavyweight and hence cannot satisfy the lightweight service recommendation requirements of mobile users. In [6], a data perturbation technique is used to protect the real service QoS data observed by users, so that only obfuscated QoS data are released to the public; however, as the QoS data used for service recommendation have been obfuscated, recommendation accuracy is reduced accordingly. In [7], a data division technique is adopted to protect user privacy: each piece of QoS data is divided into several QoS segments that carry little private information, and these segments are then used to calculate user similarity and make service recommendations. However, this approach can still reveal part of users’ privacy, such as the set of services commonly invoked by two users.

In our previous work [8–11], the LSH technique was used to protect users’ private information during distributed service recommendation. However, only one quality dimension (e.g., response time) of services is considered in that research, whereas in practice multiple dimensions are more common [18–27]. In view of this drawback, in this paper, we extend our previous work by considering multiple quality dimensions and put forward a multi-QoS-aware mobile service recommendation approach based on LSH. Through experiments on the real distributed service quality dataset WS-DREAM, we validate the feasibility of our proposal in terms of recommendation accuracy and efficiency while guaranteeing privacy preservation.

However, our approach still has several shortcomings. First, we only consider quality dimensions whose values are real and continuous (e.g., response time), while neglecting dimensions whose values are discrete [28–38], binary [39], or fuzzy [40–42]. In the future, we will therefore investigate how to integrate quality dimensions with different data types. Second, in multidimensional applications, each dimension is often assigned a weight to indicate its importance [43–49]; we will therefore take the weights of different quality dimensions into consideration to make recommendation decisions more reasonable.

6. Conclusions

The service QoS data generated by mobile devices are often first handled and filtered by different fog platforms instead of being sent directly to a remote cloud platform, so as to reduce the data transmission cost from mobile devices to the cloud. However, such cross-platform data distribution raises the novel problem of distributed mobile service recommendation, together with the resulting risk of privacy leakage. Existing research often falls short in protecting users’ sensitive information during distributed service recommendation, especially when multiple QoS dimensions are involved. In view of this drawback, we put forward an LSH-based multi-QoS-aware mobile service recommendation approach to protect user privacy in the distributed fog environment. Through a set of experiments on the real-world distributed service quality dataset WS-DREAM, we validate the feasibility of our proposal in terms of service recommendation accuracy and efficiency while guaranteeing privacy preservation. In the future, we plan to improve our approach by taking into consideration the data types and weights of different quality dimensions.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper is partially supported by the Open Project of the State Key Laboratory for Novel Software Technology (no. KFKT2016B22).