Abstract

The Internet of Things (IoT) is one of the latest internet evolutions. Cloud computing is an important technique that meets the computational demand of large-scale distributed IoT devices and sensors by employing various machine learning models. Gradient descent methods are widely employed to find the optimal coefficients of a machine learning model in cloud computing. Commonly, the data are distributed among multiple data owners, whereas the target function is held by the model owner. The model owner can train its model over the data owners' data and provide predictions. However, the confidentiality of the dataset or of the target function may not be preserved during these computations, so security threats and privacy risks arise. To address the data and model privacy issues mentioned above, we present two new outsourced privacy-preserving gradient descent (OPPGD) schemes, over horizontally and over vertically partitioned data among multiple parties. Compared to previously proposed solutions, our methods are more comprehensive and apply to a more general setting. Data privacy and model privacy are preserved during the whole learning and prediction procedure. In addition, the performance evaluation demonstrates that our schemes help the model owner optimize its target function and provide exact predictions with high efficiency and accuracy.

1. Introduction

The Internet of Things (IoT) is the latest internet evolution; it provides multifarious novel digital, smart services and products by integrating abundant devices into networks [1] and enables communication between the physical world and cyberspace [2]. An IoT system comprises radio-frequency identification, wireless sensor networks, and cloud computing [3]. Cloud computing meets the computational demand of large-scale distributed IoT devices or sensors through various machine learning methods. Since IoT devices have limited memory, the collected data need to be stored and managed by cloud servers [35]. Data can then be downloaded from the cloud for different purposes such as machine learning. However, these data may be sensitive, including physiological data, location data, and other data closely related to personal information [6], which exposes them to security breaches. Therefore, IoT not only provides convenience but also brings about security and privacy issues [7]. How to deal with security, privacy, and trust has been one of the main barriers to deploying IoT in the real world [8, 9]. Most of the existing work on the protection of sensitive data is based on secure communication channels and authorization [10]. In this paper, we focus on the protection of sensitive data in machine learning and deep learning. The data can be protected during the transmission, computation, and prediction phases. Furthermore, the privacy of the computation and prediction results is also preserved.

In machine learning and deep learning, the prediction function is usually called the decision model. The quality of the model coefficients determines the accuracy of the model, so finding the optimal coefficients is indispensable for minimizing the model's error. This process is called model learning. Gradient descent methods are effective at finding the optimal coefficients of decision models such as linear regression, hyperplane decision classifiers, and neural networks. Gradient descent methods include four main variants: classical gradient descent (GD), stochastic gradient descent (SGD), minibatch stochastic gradient descent (minibatch SGD), and momentum. Through these methods, the optimal prediction function can be obtained after several iterations.

In cloud computing, the cloud server offers huge storage and computing capacity. The model owner initializes the prediction function, and the training data are distributed among different data owners who hope to obtain the desired results from these data via cloud servers without exposing their privacy. These data form an enormous training dataset which is divided into disjoint subsets held by different data owners. The dataset partition can be horizontal or vertical, and the number of data owners can be two or more. As is well known, transmission channels are not secure in real life. In addition, the data owners, the model owner, and the cloud server do not trust each other. When they train a decision model together, each worries that another participant may extract information from its own data. So, before delivering their data to the cloud server, they encrypt the training data or the decision model with their own public keys, or blind their data, to preserve confidentiality. The training data and the decision model thus remain confidential throughout the cloud computation. With the help of the cloud server, the model owner learns the model securely over the training dataset. Afterwards, clients can obtain predictions about their request data from the cloud server according to this decision model.

At present, although many researchers focus on data privacy protection or model privacy protection when gradient descent methods are used to optimize machine learning models, few schemes can provide both data privacy and model privacy at the same time. Beyond that, some privacy-preserving gradient descent schemes can protect data owners' privacy, but they do not apply to outsourced computation. In addition, the dataset partition in a distributed system is usually horizontal or vertical, and in much of the previous literature, few schemes can handle both kinds of partitioned datasets at the same time. Besides, in prior work, both the training data and the decision model are held only by data owners rather than the model owner; in practice, it is more realistic for the model to be held by the model owner rather than a data owner. Motivated by the above, we construct two novel outsourced gradient descent methods to solve these problems.

Generally speaking, it is necessary to preserve the privacy of the training data, the decision model, and the request data during model training. Assume that there exists a training dataset $X$ whose corresponding label vector is $Y$. Each row of the dataset represents one sample with a set of attributes. By $f$, we denote the prediction function which maps a sample $x_i$ into its corresponding category label $y_i$. According to the partition of the dataset, each data owner has part of the data samples or part of the attributes. The model owner holds the coefficients $w$ of the prediction function $f$. The goal of the data owners and the model owner is to minimize the error of the prediction function and ultimately obtain the optimal coefficients through gradient descent methods. The model owner then holds the optimal decision model and can provide clients with accurate predictions. In this paper, we focus on outsourced gradient descent methods over distributed data among multiple parties, which include data owners, the model owner, the cloud server, and the client. Both horizontal and vertical partitions of the dataset are discussed. For a horizontally partitioned dataset, two or more data owners hold different samples with the same attributes, whereas for a vertically partitioned dataset, two or more data owners hold the same samples but with different sets of attributes.

1.1. Contributions

To address the privacy issues that arise when gradient descent methods are performed by multiple parties via cloud computing, we propose two OPPGD schemes over horizontally or vertically distributed data. The main contributions of this paper are summarized as follows:
(1) We design an outsourced privacy-preserving scalar product (OPPSP) algorithm, with which the cloud server securely computes the inner product of two vectors encrypted under different keys. For example, a data owner and the model owner each hold one vector. Both parties first encrypt their own vector with their own key and send the encrypted vector to the cloud server. Then, the cloud server computes the scalar product of the two encrypted vectors.
(2) We propose two secure and comprehensive schemes to perform OPPGD over a horizontally or vertically distributed dataset, respectively. The number of data owners can be two or more. The prediction functions are linear regression or neural networks. The OPPGD schemes apply to classical GD, SGD, minibatch SGD, and momentum. It is worth noting that our schemes offer higher applicability and practicability than other schemes.
(3) We demonstrate that our OPPGD schemes are privacy-preserving and analyze their computational cost and communication complexity. The analyses show that our OPPGD schemes achieve high efficiency and accuracy.

1.2. Organization

The remainder of this paper is organized as follows. In Section 2, we discuss the related work on privacy-preserving gradient descent methods. In Section 3, we briefly introduce some preliminaries: the Elgamal homomorphic cryptosystem [11] and gradient descent methods. In Section 4, we describe the system model, problem statement, threat model, and system requirements. We present two OPPGD schemes and prove their correctness, security, and complexity in Section 5. The performance of the schemes is evaluated in Section 6. Section 7 concludes the paper.

2. Related Works

In this section, we review works on privacy-preserving gradient descent methods among multiple parties. According to the presence or absence of cloud servers, the existing works can be classified into two categories.

2.1. The Absence of Cloud Servers

Wan et al. [12] presented the first privacy-preserving scheme for gradient descent methods. They proposed a generic formulation of gradient descent methods by defining the prediction function as a composition $f = g \circ h$. The formulation is used to perform the specific iteration-based algorithm in linear regression or neural networks, and we adopt it in this paper as well. However, the dataset partition discussed in their scheme [12] is only vertical. Han et al. [13] extended the scheme [12] to horizontally distributed datasets and proposed a least-squares approach to perform gradient descent methods. Both schemes [12, 13] rely on a secure scalar product for their privacy preservation, but they cannot be applied to the outsourced model. Danner and Jelasity [14] designed a novel fully distributed privacy-preserving minibatch SGD that avoids collecting any personal data centrally. Their scheme does not require the precise sum of gradients; a tree topology and homomorphic encryption are employed to produce a "quick and dirty" partial sum, and the protocol can resist collusion attacks. Hegedus and Jelasity [15] adopted differential privacy to realize privacy-preserving distributed stochastic gradient descent. Mehnaz et al. [16] designed two secure gradient descent schemes over horizontally and vertically partitioned data via a secure sum protocol. Later, they designed a secure gradient descent scheme [17] without Yao's circuits over arbitrarily partitioned datasets. Based on output perturbation, Wu et al. [18] devised a novel "bolt-on" differentially private algorithm for stochastic gradient descent.

2.2. The Existence of Cloud Servers

Liu et al. [19] designed an encrypted gradient descent method in which the data owners and the cloud server collaboratively learn the target function without leaking any data privacy, and they extended their scheme to the outsourced model by utilizing the BGN cryptosystem. However, their protocol is only suitable for a two-party scenario. Shokri and Shmatikov [20] trained an accurate neural network model without sharing input datasets by using stochastic gradient descent: after the parameter server initializes the parameter vector, it updates the parameters with the help of the cloud server without leaking any privacy. Kim et al. [21] provided a practical framework for mainstream learning models such as logistic regression. They computed the gradient descent step securely by using homomorphic encryption, but this is inefficient: the required bit length of the ciphertext modulus per iteration is long, so it also takes up too much space. Francisco-Javier et al. [22] realized training supervised machine learning over ciphertext; through gradient descent, the server optimizes the training model without exposing the data or the model. Mohassel and Zhang [23] used stochastic gradient descent to construct new and efficient privacy-preserving machine learning protocols for linear regression, logistic regression, and neural networks. Their protocol involves a two-server model: data providers distribute their private data among two noncolluding servers, and the servers train models on the joint data through secure two-party computation techniques. Li et al. [24] presented a multikey privacy-preserving deep learning scheme in the cloud computing environment; their protocols realize outsourced multilayer backpropagation network learning via gradient descent methods. Ma et al. [25] took advantage of a two-noncolluding-server framework to build a new outsourced model of a privacy-preserving neural network. However, in their scheme, the model owner can only make predictions rather than learn the model itself.

2.3. The Other Works on Privacy Preservation for Machine Learning

Aside from the above privacy-preserving gradient descent methods, there are also plenty of other works on privacy-preserving computation over distributed data among multiple parties in the cloud environment. Liu et al. [26] constructed an efficient privacy-preserving method to compute over outsourced data. They [27] also proposed a privacy-preserving outsourced calculation toolkit, which allows data owners to securely outsource their data to the cloud for storage and calculation. Rady et al. [28] designed a new architecture that achieves the confidentiality and integrity of query results over an outsourced database. Yu et al. [29] devised a verifiable outsourced computation scheme over encrypted data by employing fully homomorphic encryption and a polynomial factorization algorithm. Chamikara et al. [30] presented an efficient and scalable nonreversible perturbation algorithm, based on optimal geometric transformations, for big data mining without leaking privacy. Li et al. [31] proposed a novel outsourced privacy-preserving classification scheme based on homomorphic encryption, in which multiple parties securely outsource their sensitive data to an untrusted evaluator for storage and processing. Li et al. [32] devised a novel scheme in which a classifier owner provides users with a privacy-preserving classification service by delegating a cloud server; however, they focus on two concrete secure classification protocols, the naive Bayes classifier and the hyperplane decision classifier. Park et al. [33] described a privacy-preserving naive Bayes protocol that requires no intermediate interaction between the server and the clients; hence, their protocol can alleviate the heavy computational cost of fully homomorphic encryption. Li et al. [34] proposed an outsourced privacy-preserving C4.5 decision tree algorithm over both horizontally and vertically partitioned datasets; they used the BCP cryptosystem to build an outsourced privacy-preserving weighted average protocol. Rong et al. [35] presented a series of building blocks for verifiable and privacy-preserving association rule mining in a hybrid cloud environment. Li et al. [36] used an efficient homomorphic encryption scheme with multiple keys to design an outsourced privacy-preserving ID3 data mining solution. Xue et al. [37] built a differential privacy-based classification system for secure edge computing. Yang et al. [38] realized privacy-preserving medical record sharing in the cloud computing environment. Kaur et al. [39] devised an efficient privacy-preserving collaborative filtering scheme for healthcare recommender systems over arbitrarily distributed data. In our work, we aim at designing outsourced privacy-preserving gradient descent methods among multiple parties. To the best of our knowledge, no previous work addresses this issue comprehensively.

3. Preliminaries

In this section, we introduce some preliminaries for our outsourced privacy-preserving gradient descent schemes.

3.1. The Elgamal Homomorphic Cryptosystem

The Elgamal cryptosystem [11] comprises the following algorithms: preparation, key generation, encryption, and decryption.
Preparation($\kappa$): given a security parameter $\kappa$, the system generates the public parameter PP as follows. It first chooses a large prime number $p$ and a generator $g$ of the multiplicative cyclic group $\mathbb{Z}_p^*$. The public parameter is PP $= (p, g)$.
KeyGen(PP): taking PP as the input, each party randomly selects a number $sk < p - 1$ as its private key and computes $pk = g^{sk} \bmod p$ as its public key.
Enc($pk, m$): the encryptor selects a random integer $r$ coprime to $p - 1$ and encrypts its plaintext $m$ with the public key $pk$ to generate the ciphertext $c = (c_1, c_2) = (g^{r} \bmod p,\ m \cdot pk^{r} \bmod p)$.
Dec($sk, c$): each party decrypts $c$ with its secret key $sk$ and obtains the plaintext $m$. The decryption process is $m = c_2 \cdot (c_1^{sk})^{-1} \bmod p$.

Its correctness is easily verified.

The semantic security of the Elgamal cryptosystem rests on the hardness of the decisional Diffie–Hellman problem (and hence of the discrete logarithm problem) over finite fields.
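To make the cryptosystem concrete, here is a minimal Python sketch of textbook Elgamal, including the multiplicative homomorphism that our schemes exploit. The tiny hard-coded parameters are for illustration only and provide no security; they are not the parameters used in our evaluation.

```python
import random

# Toy public parameters: a small prime p and a base g of Z_p^*.
# Real deployments use primes of 2048 bits or more.
P = 2579
G = 2

def keygen():
    """Pick a private key sk and compute the public key pk = g^sk mod p."""
    sk = random.randrange(2, P - 1)
    return pow(G, sk, P), sk

def encrypt(pk, m):
    """Elgamal encryption: (c1, c2) = (g^r, m * pk^r) mod p."""
    r = random.randrange(2, P - 1)
    return pow(G, r, P), (m * pow(pk, r, P)) % P

def decrypt(sk, ct):
    """Recover m = c2 * (c1^sk)^(-1) mod p (Fermat inverse, p prime)."""
    c1, c2 = ct
    return (c2 * pow(pow(c1, sk, P), P - 2, P)) % P

if __name__ == "__main__":
    pk, sk = keygen()
    a, b = 7, 12
    ca, cb = encrypt(pk, a), encrypt(pk, b)
    # Multiplicative homomorphism: Enc(a) * Enc(b) decrypts to a * b mod p.
    prod = ((ca[0] * cb[0]) % P, (ca[1] * cb[1]) % P)
    assert decrypt(sk, prod) == (a * b) % P
    print("Dec(Enc(7) * Enc(12)) =", decrypt(sk, prod))
```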

3.2. The Key Conversion System

As for secure outsourced computation over a dataset distributed among multiple parties, the essential difficulty is how to deal with ciphertexts encrypted under the different keys sent by multiple parties. Based on Gentry's fully homomorphic encryption [40], we transform ciphertexts under different keys into ciphertexts under the same key. Take two parties, Alice and Bob, as an example. Assume that their respective key pairs are $(pk_A, sk_A)$ and $(pk_B, sk_B)$. For a plaintext $m$, its ciphertext encrypted under key $pk_A$ is $c_A$. The goal is to switch $c_A$ into a new ciphertext $c_B$ which is encrypted under the public key $pk_B$. The conversion can be divided into the following steps:
Rekey generation ($sk_A$, $pk_B$): taking $sk_A$ and $pk_B$ as the input, it outputs the rekey $rk_{A \to B} = \{\mathrm{Enc}_{pk_B}(sk_A^{(i)})\}_i$, where $sk_A^{(i)}$ is the $i$-th bit of the binary representation of $sk_A$.
Reencryption ($pk_B$, $c_A$): taking the public key $pk_B$ and the ciphertext $c_A$ as inputs, it outputs $\bar{c} = \mathrm{Enc}_{pk_B}(c_A)$, i.e., an encryption of the ciphertext itself under $pk_B$.
Evaluation algorithm ($pk_B$, $rk_{A \to B}$, $\bar{c}$, $D$): taking the public key $pk_B$, the rekey $rk_{A \to B}$, the ciphertext $\bar{c}$, and the decryption circuit $D$ of the original encryption, it homomorphically evaluates $D$ and outputs $c_B = \mathrm{Enc}_{pk_B}(m)$.
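The conversion above evaluates a decryption circuit homomorphically, which does not lend itself to a compact example. As a simplified stand-in for the key-switching idea only, the following Python sketch uses classic BBS98-style Elgamal proxy re-encryption, where a rekey $rk = sk_B / sk_A$ lets an untrusted party move a ciphertext from Alice's key to Bob's without seeing the plaintext. This is a different, simpler mechanism than the Gentry-based conversion described here.

```python
import random

# Schnorr-group toy parameters: p = 2q + 1 with q prime; g = 4 generates
# the order-q subgroup. Illustration only -- not secure sizes.
P, Q, G = 2579, 1289, 4

def keygen():
    sk = random.randrange(1, Q)
    return pow(G, sk, P), sk

def encrypt(pk, m):
    # BBS98 variant: ct = (m * g^r, pk^r); m must lie in the subgroup.
    r = random.randrange(1, Q)
    return (m * pow(G, r, P)) % P, pow(pk, r, P)

def rekey(sk_a, sk_b):
    # rk = sk_b / sk_a mod q; rk alone reveals neither secret key.
    return (sk_b * pow(sk_a, Q - 2, Q)) % Q

def reencrypt(rk, ct):
    # (m * g^r, pk_a^r) -> (m * g^r, pk_b^r): now a ciphertext under pk_b.
    c1, c2 = ct
    return c1, pow(c2, rk, P)

def decrypt(sk, ct):
    # m = c1 / c2^(1/sk); the exponent inverse is taken mod the order q.
    c1, c2 = ct
    gr = pow(c2, pow(sk, Q - 2, Q), P)   # recovers g^r
    return (c1 * pow(gr, P - 2, P)) % P

if __name__ == "__main__":
    pk_a, sk_a = keygen()
    pk_b, sk_b = keygen()
    m = pow(G, 321, P)                   # encode the message as a group element
    ct_b = reencrypt(rekey(sk_a, sk_b), encrypt(pk_a, m))
    assert decrypt(sk_b, ct_b) == m
```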

3.3. Gradient Descent Methods

Assume that $X = \{x_1, x_2, \ldots, x_n\}$ is a dataset of $n$ data samples, where the vector $x_i$ represents the $i$-th sample's attributes and $y_i$ denotes its target attribute. The goal is to determine a prediction function $f$ such that $f(x_i)$ is as close to $y_i$ as possible; the basic strategy when making predictions about test data is thus to make the prediction function produce the smallest error. Gradient descent methods are commonly applied to search for the optimal coefficients of $f$, minimizing the prediction error. The whole process can be described as follows. At the beginning, one determines the loss function $E$, randomly initializes the coefficient vector $w$ of $f$, and calculates the current error over the learning dataset. If the current error is not acceptable, one takes the derivative of $E$ with respect to $w$ and updates the coefficient vector based on the derivative. Then, one recalculates the loss and repeats optimizing the model until the minimum error appears. In this way, the optimal value of $w$ is obtained through several iterations.

There are four main gradient descent methods: classical GD, SGD, minibatch SGD, and momentum. In classical GD, the loss function is determined by all samples in each iteration, which leads to high computational complexity. In SGD, the loss function is determined by a single random sample in every iteration, which reduces the computing overhead; however, this method has the weakness that the final coefficient vector may sometimes be a local optimum rather than the global optimum. When the loss function is determined by a batch of random samples in every iteration, the method is called minibatch SGD; it combines the advantages of classical GD and SGD and overcomes their weaknesses. So far, SGD is the most widely applied method in machine learning. Momentum is the latest gradient descent method and greatly improves the accuracy and speed of prediction: besides the learning rate $\eta$, the coefficient update in momentum involves a new parameter $\gamma$, the attenuation rate. Our schemes can be applied to all four of these gradient descent methods.

The error function of every sample $x_i$ is $e_i = \frac{1}{2}(f(x_i) - y_i)^2$. Given arbitrary $t$ samples, the loss function is
$$E(w) = \sum_{i=1}^{t} e_i = \frac{1}{2} \sum_{i=1}^{t} \left(f(x_i) - y_i\right)^2. \quad (1)$$

The prediction function $f$ is a composition of two functions $g$ and $h$, where $g$ is any differentiable function and $h$ is a linearly separable function: $f(x_i) = g(h(x_i))$ with $h(x_i) = w \cdot x_i = \sum_{j=1}^{m} w_j x_{ij}$, where $w$ is the coefficient vector of the prediction function. When $t = 1$, the method is SGD; when $1 < t < n$, the method is minibatch SGD; whereas when $t = n$, the method is GD. Subsequently, we update the coefficient vector as $w \leftarrow w - \eta \nabla E(w)$, where $\nabla E(w) = \partial E / \partial w$ and $\eta$ is a constant parameter called the learning rate. When the coefficient vector is updated as $v \leftarrow \gamma v - \eta \nabla E(w)$ and $w \leftarrow w + v$, where $\gamma$ is a constant parameter called the attenuation rate, the method is momentum.
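For reference, the following Python sketch implements the four update rules above in the clear (no encryption is involved), using the squared-error loss and the $\eta$ (eta) and $\gamma$ (gamma) notation of this section.

```python
import numpy as np

def gradient(w, X, y):
    """Gradient of E(w) = 1/2 * sum (w.x_i - y_i)^2 over the given batch."""
    return X.T @ (X @ w - y)

def train(X, y, method="sgd", eta=0.01, gamma=0.9, epochs=50, batch=16):
    rng = np.random.default_rng(0)
    n, m = X.shape
    w = np.zeros(m)
    v = np.zeros(m)                       # momentum buffer
    for _ in range(epochs):
        if method == "gd":                # t = n: full batch
            idx = np.arange(n)
        elif method == "sgd":             # t = 1: one random sample
            idx = rng.integers(0, n, size=1)
        else:                             # 1 < t < n: minibatch (momentum too)
            idx = rng.choice(n, size=batch, replace=False)
        g = gradient(w, X[idx], y[idx])
        if method == "momentum":
            v = gamma * v - eta * g       # v <- gamma*v - eta*grad
            w = w + v                     # w <- w + v
        else:
            w = w - eta * g               # w <- w - eta*grad
    return w
```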

For each sample $x_i$, there is a derivative $\partial e_i / \partial w$. Thus, we calculate
$$\nabla E(w) = \sum_{i=1}^{t} \frac{\partial e_i}{\partial w} = \sum_{i=1}^{t} \left(f(x_i) - y_i\right) g'(h(x_i))\, x_i. \quad (2)$$

As the function $g$ changes, $\nabla E(w)$ also differs. Here, we discuss two specific functions used in linear regression and neural networks.

In linear regression, $g$ is the identity function, so the prediction function for an arbitrary sample $x_i$ is $f(x_i) = w \cdot x_i$. Then,
$$\nabla E(w) = \sum_{i=1}^{t} \left(w \cdot x_i - y_i\right) x_i. \quad (3)$$

In neural networks, $g$ is also called the activation function and is typically a sigmoid function, $g(z) = \frac{1}{1 + e^{-z}}$, or a tanh function, $g(z) = \tanh(z)$. If $g$ is the sigmoid function, the prediction function for an arbitrary sample $x_i$ is $f(x_i) = \frac{1}{1 + e^{-w \cdot x_i}}$. Then,
$$\nabla E(w) = \sum_{i=1}^{t} \left(f(x_i) - y_i\right) f(x_i) \left(1 - f(x_i)\right) x_i. \quad (4)$$

Through the Taylor expansion formula, the sigmoid function can be expanded into a polynomial. Then, we have
$$g(z) = \frac{1}{1 + e^{-z}} = \frac{1}{2} + \frac{z}{4} - \frac{z^3}{48} + \frac{z^5}{480} - \cdots, \quad (5)$$
so that $\nabla E(w)$ can be evaluated using only additions and multiplications over $w \cdot x_i$.
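A quick numeric check of this approximation, using the standard Maclaurin expansion written above, shows that the degree-5 polynomial tracks the sigmoid closely near $z = 0$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_poly(z):
    # Degree-5 Maclaurin polynomial of the sigmoid (accurate near z = 0).
    return 0.5 + z / 4 - z**3 / 48 + z**5 / 480

if __name__ == "__main__":
    for z in (0.0, 0.5, 1.0, 2.0):
        exact, approx = sigmoid(z), sigmoid_poly(z)
        print(f"z={z:4.1f}  sigmoid={exact:.5f}  poly={approx:.5f}  "
              f"err={abs(exact - approx):.5f}")
```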

4. Models and Requirements

4.1. System Model

As shown in Figure 1, the system comprises five entities: data owners, a model owner, a cloud server, a key conversion server, and a trusty decryption server. Each entity is described as follows:
Data owner (DO): after receiving the public parameter PP, each DO generates its own key pair and encrypts its data. Then, the DOs send their respective ciphertexts to the cloud server, depicted as Step 1 in Figure 1. After MO has finished training the model, a DO can request the CS and MO to make predictions.
Cloud server (CS): we assume that CS can provide the DOs and MO with unlimited computation and storage services. After receiving the vectors encrypted by every DO and by MO, the CS executes the OPPSP algorithm and sends the encrypted results back to the DOs, as in Step 2.
Model owner (MO): MO holds the target function, which comprises the coefficient vector, the learning rate, and, for momentum, the attenuation rate. MO encrypts the target function's coefficients with its own key, sends the ciphertext to the CS, and participates in the OPPSP algorithm. After receiving the decrypted gradient information, MO updates its model until it obtains the optimal coefficients. Moreover, it can provide the DOs with prediction services.
Key conversion server (KCS): KCS runs the conversion algorithm and switches the ciphertexts encrypted under the DOs' respective keys into new intermediate ciphertexts under the same key, which is depicted as Step 3.
Trusty decryption server (TDS): we assume that TDS is trustworthy; it only provides a decryption service and will not conspire with other parties. After receiving the new encrypted results from the KCS in Step 4, the TDS decrypts these results and performs a few computations to acquire the final results. In the end, TDS sends the intermediate results back to the MO, as depicted in Step 5.

In our system model, each entity is semihonest except TDS. All the entities have some background knowledge of the attribute names, the class names, and the number of attributes. Each data owner has a part of the complete dataset, which can be partitioned horizontally or vertically. When the dataset is distributed vertically, all data owners have the class value vector. The complete attribute dataset $X$ is of size $n \times m$, and the target vector is represented as $Y = (y_1, y_2, \ldots, y_n)^{T}$, where $y_i$ is $x_i$'s corresponding class value.

For a horizontally partitioned dataset, each data owner has a subset of the samples with all the attributes and the corresponding class values, as described in Figure 2. For a vertically partitioned dataset, each data owner has a subset of the attributes for all the samples together with the corresponding class values; the data owner DO_i's data are depicted in Figure 3.

The scheme consists of the preparation phase, the training phase, and the prediction phase. An overview of the scheme can be described as follows:
Preparation phase: according to the public parameter PP, the DOs and MO generate their respective key pairs; they also share a secret value $s$ in advance. Then, the DOs encrypt their datasets with their respective keys, while MO encrypts the coefficient vector of its model with its public key. Finally, the DOs and MO send their ciphertexts to the CS.
Training phase: CS performs the OPPSP algorithm and sends the results back to the DOs. Next, the DOs perform decryption and send the results to the KCS. KCS switches these encrypted results and sends them to the TDS. TDS decrypts the results and sends them to the MO, who can then update the coefficients to optimize the model.
Prediction phase: with the help of the CS, the MO makes predictions for the DO's queries.

4.2. Problem Statement

Let $X_i$ be the dataset of data owner DO_i. All datasets are disjoint and together compose the complete dataset $X$. Each dataset $X_i$ is of size $n_i \times m_i$, where the integers $n_i \le n$ and $m_i \le m$. If the dataset is partitioned horizontally, $\sum_i n_i = n$ and $m_i = m$. If the dataset is partitioned vertically, $\sum_i m_i = m$ and $n_i = n$. MO holds the coefficient vector $w$ of the target function $f$ and the target vector $Y$.

Our goal is to train MO's target function over the DOs' datasets. MO needs to obtain $\nabla E(w)$ in order to optimize the coefficients of the target function by repeatedly renewing them over MO's coefficients and the DOs' datasets. We discuss two kinds of machine learning models: linear regression and neural networks. For linear regression, MO's task is to obtain the encrypted error term $w \cdot x_i - y_i$ of every sample $x_i$ with the help of the CS. For the neural network, MO's task is to obtain the corresponding encrypted gradient component of every sample $x_i$. After getting these results, MO chooses one gradient descent method to refresh its coefficients. In the end, MO can provide accurate prediction services for queries through its optimal target function.

Since each DO_i encrypts $X_i$ with its public key $pk_i$ and MO encrypts its coefficients with its public key $pk_{MO}$, CS performs computations only over encrypted data. TDS performs the decryption of the final results, while the DOs and MO share a secret value $s$; this blinding prevents the TDS from learning anything about the coefficients.
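The blinding idea can be illustrated with Elgamal's multiplicative homomorphism: multiplying a ciphertext by an encryption of $s$ yields an encryption of $s \cdot m$, so a decrypting party only ever sees the blinded value, while a party knowing $s$ can remove it. The Python sketch below is illustrative only, with toy parameters and a single key rather than the multikey setting of our schemes.

```python
import random

P, G = 2579, 2  # toy Elgamal parameters; illustration only

def enc(pk, m):
    r = random.randrange(2, P - 1)
    return pow(G, r, P), (m * pow(pk, r, P)) % P

def dec(sk, ct):
    c1, c2 = ct
    return (c2 * pow(pow(c1, sk, P), P - 2, P)) % P

if __name__ == "__main__":
    sk = random.randrange(2, P - 1); pk = pow(G, sk, P)
    m, s = 42, 7                      # m: secret result, s: shared blinding value
    # Homomorphic blinding: Enc(m) * Enc(s) = Enc(m * s mod p).
    ct, ct_s = enc(pk, m), enc(pk, s)
    blinded = (ct[0] * ct_s[0] % P, ct[1] * ct_s[1] % P)
    ms = dec(sk, blinded)             # the decrypting party sees only m * s ...
    assert ms == (m * s) % P
    # ... and the party holding s removes the blinding: m = (m*s) * s^(-1) mod p.
    assert (ms * pow(s, -1, P)) % P == m
```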

4.3. Threat Model

Assume that all the entities except TDS are semihonest, i.e., honest-but-curious. In other words, these entities follow the protocol, but they may try to obtain as much secret information as possible from the messages they receive.

Consider two kinds of adversaries in this model: external and internal. An external adversary may obtain some information, i.e., encrypted data or encrypted results, during every iteration via the public channels. An internal adversary could be a malicious data owner DO, the model owner MO, the cloud server CS, or the key conversion server KCS. The goal of a malicious DO is to extract the coefficients of the target function $f$. An internal adversary KCS tries to extract the intermediate results and MO's coefficient vector, while the goal of an adversarial MO is to reveal information about each DO's partitioned dataset. In addition, if the CS is an internal adversary, it tries to acquire MO's coefficients or the DOs' datasets.

4.4. Privacy Requirements

In outsourced gradient descent schemes, privacy preservation is essential. In our model, we assume that the cloud server is semihonest. In order to measure the extent of privacy preservation, we define two privacy preservation levels.

Definition 1. Explicit privacy leakage means that privacy may be exposed during the computation of the cloud server or during message transmission over public channels. If an outsourced computation scheme can prevent explicit privacy leakage, we say it achieves level-1 privacy.

Definition 2. Implicit privacy leakage means that one's privacy may be leaked by deduction from the results of the cloud server. If an outsourced computing scheme can prevent implicit privacy leakage, we say it achieves level-2 privacy.
In our OPPGD schemes, the DOs' data and MO's coefficient vector are uploaded to the cloud server as ciphertext. Here, explicit privacy leakage would mean that the DOs' data, MO's coefficient vector, or the final desired results are leaked during the scheme; preventing implicit privacy leakage means that the DOs' data and MO's coefficient vector cannot be deduced from intermediate results. Our OPPGD schemes realize both level-1 and level-2 privacy.

5. Two OPPGD Schemes

In this section, we present two outsourced privacy-preserving gradient descent schemes, over horizontally partitioned data and over vertically partitioned data. For simplicity, we make the following assumptions: when the data are horizontally partitioned, each DO has only one record with all the attributes and the class value; when the data are vertically partitioned, each DO has one attribute of all the samples together with the corresponding class vector. An outsourced privacy-preserving gradient descent scheme is composed of the preparation phase, the training phase, and the prediction phase. We first describe the OPPGD scheme over horizontally partitioned data.

5.1. OPPGD Scheme over Horizontally Partitioned Data
5.1.1. Preparation Phase

This phase involves several essential algorithms: parameter generation, key pair generation, and encryption.
Step 1: the system runs Algorithm 1 to generate the public parameter PP and the secret value $s$.
Step 2: after receiving PP, the DOs, MO, and TDS run Algorithm 2 to obtain their own key pairs $(pk_i, sk_i)$, $(pk_{MO}, sk_{MO})$, and $(pk_{TDS}, sk_{TDS})$.
Step 3: each DO_i encrypts its $x_i$ and $y_i$ into $[x_i]$ and $[y_i]$, and MO encrypts its coefficient vector $w$ into $[w]$ by Algorithm 3, where $[x_i] = \mathrm{Enc}_{pk_i}(x_i)$ and $[w] = \mathrm{Enc}_{pk_{MO}}(w)$.

Algorithm 1: parameter generation.
Input: the security parameter $\kappa$
Output: the public parameter PP $= (p, g)$, a secret value $s$
(1) generate a large prime $p$, choose a primitive element $g$ in $\mathbb{Z}_p^*$
(2) generate a secret value $s$
(3) end

Algorithm 2: key generation.
Input: the public parameter PP and a secret value $s$
Output: the key pair $(pk, sk)$
(1) choose $sk \in \mathbb{Z}_{p-1}^*$
(2) compute $pk = g^{sk} \bmod p$
(3) end
(4) return $(pk, sk)$

Algorithm 3: encryption.
Input: the public key $pk$ and a message $m$
Output: the encrypted message $[m] = (c_1, c_2)$
(1) choose a random integer $r$ which is coprime to $p - 1$
(2) compute $c_1 = g^{r} \bmod p$, $c_2 = m \cdot pk^{r} \bmod p$
(3) end
(4) return $c_1$ and $c_2$
5.1.2. Training Phase

Step 4: each DO_i sends its encrypted $[x_i]$ and $[y_i]$ to the CS, and MO sends $[w]$ to the CS.
Step 5: after receiving $[x_i]$, $[y_i]$, and $[w]$ from the DOs and MO, CS runs Algorithm 4 and obtains the encrypted scalar product vector $S$, whose $i$-th component encrypts $w \cdot x_i$. In addition, CS performs some further computations over components of $S$: in the linear regression model, it computes the encrypted error terms derived from $w \cdot x_i$ and $y_i$, while in the neural network model, it computes the additional encrypted powers of $w \cdot x_i$ required by the polynomial approximation of the sigmoid function.
Step 6: CS sends the above encrypted results to the DOs. After receiving the encrypted scalar products, each DO performs a decryption operation; the TDS and the MO also perform decryption as shown in Algorithm 5.
Step 7: once the DOs receive the encrypted results from the CS, each DO_i runs Algorithm 5 to remove its own encryption layer and obtain new ciphertexts.
Step 8: DO_i blinds the above ciphered data with the shared secret value $s$, in both the linear regression model and the neural network model.
Step 9: DO_i sends these blinded encrypted results to the KCS.
Step 10: KCS runs Algorithm 6 to convert the blinded encrypted results into new results that are all encrypted under the TDS's key $pk_{TDS}$.
Step 11: subsequently, the KCS sends the above intermediate results to the TDS.
Step 12: TDS runs Algorithm 5, decrypts the intermediate results, and then performs some simple computations to assemble the final blinded results for the linear regression model or the neural network model.
Step 13: TDS sends the final results to the MO.

Algorithm 4: OPPSP.
Input: two encrypted vectors $[w]$ and $[x_i]$, where $[w] = \mathrm{Enc}_{pk_{MO}}(w)$ and $[x_i] = \mathrm{Enc}_{pk_i}(x_i)$
Output: the encrypted scalar product $S$
(1) CS computes $S$ componentwise from the ciphertext products of $[w]$ and $[x_i]$
(2) end
(3) return $S$
Algorithm 5: decryption.
Input: an encrypted message $[m] = (c_1, c_2)$ and its corresponding key pair $(pk, sk)$
Output: $m$
(1) compute: $m = c_2 \cdot (c_1^{sk})^{-1} \bmod p$
(2) end
(3) return $m$
Algorithm 6: ciphertext conversion.
Input: the ciphertext of the message under the original public key, the decryption circuit $D$ of the original encryption, and the target key $pk_{TDS}$
Output: the ciphertext of the (blinded) message $m \cdot s$ under $pk_{TDS}$
(1) compute: homomorphically evaluate $D$ over the reencrypted ciphertext using the rekey, as described in Section 3.2
(2) end
(3) return the converted ciphertext
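To picture the core step of Algorithm 4, the sketch below gives a deliberately simplified, single-key reading of OPPSP: componentwise ciphertext products yield encryptions of $w_j x_{ij}$, and the scalar product is assembled only after decryption. The blinding, key conversion, and separate servers of the actual scheme are omitted, so this is a toy illustration, not the protocol itself.

```python
import random

P, G = 2579, 2  # toy Elgamal group; illustration only

def keygen():
    sk = random.randrange(2, P - 1)
    return pow(G, sk, P), sk

def enc(pk, m):
    r = random.randrange(2, P - 1)
    return pow(G, r, P), (m * pow(pk, r, P)) % P

def dec(sk, ct):
    c1, c2 = ct
    return (c2 * pow(pow(c1, sk, P), P - 2, P)) % P

def oppsp(enc_w, enc_x):
    """Cloud-side step: componentwise ciphertext product -> Enc(w_j * x_j)."""
    return [(a1 * b1 % P, a2 * b2 % P)
            for (a1, a2), (b1, b2) in zip(enc_w, enc_x)]

if __name__ == "__main__":
    pk, sk = keygen()
    w = [3, 1, 4]                      # MO's coefficients
    x = [2, 7, 1]                      # one DO sample
    S = oppsp([enc(pk, v) for v in w], [enc(pk, v) for v in x])
    # After decryption, the per-component products sum to the scalar product w.x.
    assert sum(dec(sk, c) for c in S) % P == sum(a * b for a, b in zip(w, x)) % P
```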
5.1.3. Prediction Phase

In this phase, a DO requests a prediction with the help of the CS and MO.
Step 14: MO receives the final results and removes the blinding value $s$ to obtain the gradient information of each sample $x_i$.
Step 15: then, MO chooses one gradient descent method and optimizes its coefficient vector through Algorithm 7.
Step 16: the DO encrypts a query feature vector $q$ as $[q]$, and MO encrypts its optimal coefficient vector $w$ as $[w]$.
Step 17: the DO and MO send $[q]$ and $[w]$ to the CS, respectively.
Step 18: finally, MO, CS, and the DO operate together to help the DO extract the prediction result by running the prediction subprotocol (Algorithm 8).

Algorithm 7: coefficient update.
Input: the update information $\nabla E(w)$, the coefficient vector $w$, the learning rate $\eta$, and the attenuation rate $\gamma$
Output: the renewed coefficient vector $w$
(1) compute $w \leftarrow w - \eta \nabla E(w)$, or, for momentum, $v \leftarrow \gamma v - \eta \nabla E(w)$ and $w \leftarrow w + v$
(2) end
(3) return $w$
Algorithm 8: prediction.
Initialization: the DO's encrypted query feature vector $[q]$ with the corresponding key pair $(pk_{DO}, sk_{DO})$, and MO's encrypted coefficient vector $[w]$ with the corresponding key pair $(pk_{MO}, sk_{MO})$, where $[q] = \mathrm{Enc}_{pk_{DO}}(q)$ and $[w] = \mathrm{Enc}_{pk_{MO}}(w)$
Target: the prediction result
Step 1: the DO and MO send $[q]$ and $[w]$ to the CS, respectively.
Step 2: CS computes the encrypted prediction pr from $[q]$ and $[w]$.
Step 3: CS sends pr to the MO.
Step 4: MO runs Algorithm 5, decrypts pr with its key pair $(pk_{MO}, sk_{MO})$, and obtains a result still encrypted under the DO's key.
Step 5: MO sends this result to the DO.
Step 6: the DO runs Algorithm 5 to decrypt it with its key pair $(pk_{DO}, sk_{DO})$ and obtains the desired prediction result.
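Algorithm 8's two-step decryption can be mimicked with distributed Elgamal under a combined public key: the CS-side result is encrypted under $pk_{DO} \cdot pk_{MO}$, MO strips its layer to leave an ordinary ciphertext under the DO's key, and the DO decrypts alone. The sketch below is a hedged stand-in under these assumptions, not a transcription of the protocol's exact mechanics.

```python
import random

P, G = 2579, 2  # toy group parameters; illustration only

def keygen():
    sk = random.randrange(2, P - 1)
    return pow(G, sk, P), sk

def enc(pk, m):
    r = random.randrange(2, P - 1)
    return pow(G, r, P), (m * pow(pk, r, P)) % P

def strip_layer(sk, ct):
    """Remove one party's layer: divide c2 by c1^sk. The result is still an
    Elgamal ciphertext under the remaining party's key."""
    c1, c2 = ct
    return c1, (c2 * pow(pow(c1, sk, P), P - 2, P)) % P

if __name__ == "__main__":
    pk_do, sk_do = keygen()
    pk_mo, sk_mo = keygen()
    joint_pk = (pk_do * pk_mo) % P        # g^(sk_do + sk_mo)
    prediction = 57
    ct = enc(joint_pk, prediction)        # e.g., the result computed at CS
    ct_do = strip_layer(sk_mo, ct)        # MO removes its layer ...
    _, m = strip_layer(sk_do, ct_do)      # ... and the DO removes the last one
    assert m == prediction
```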
5.2. OPPGD Scheme over Vertically Partitioned Data

The OPPGD scheme over vertically partitioned data differs slightly from the scheme over horizontally partitioned data. After receiving $[x_i]$, $[y_i]$, and $[w]$, CS executes Algorithm 4 $n$ times in the first scheme, whereas CS executes Algorithm 4 $nm$ times in the second scheme. This is because each record's attributes are sent to the CS by different DOs. In addition, when the KCS receives the blinded encrypted results, it needs to add $m$ blinded encrypted results together to obtain the inner product of a record and the coefficient vector. For simplicity, we omit the steps of the OPPGD scheme over vertically partitioned data that coincide with those of the scheme over horizontally partitioned data.

5.3. Scheme Correctness

Now, we prove the correctness of our proposed OPPGD scheme over horizontally partitioned data. The correctness of the other scheme can be verified in a similar manner.

Theorem 1. MO can correctly obtain $\nabla E(w)$ to update its coefficient vector.

Proof. After receiving $[x_i]$, $[y_i]$, and $[w]$, CS computes the encrypted scalar product vector $S$, whose $i$-th component encrypts $w \cdot x_i$. For linear regression, CS calculates the encrypted error terms from $S$ and $[y_i]$, whereas for the neural network, CS additionally calculates the encrypted powers of $w \cdot x_i$ needed by the polynomial approximation. After receiving the encrypted results from the CS, each DO decrypts the message sent from the CS to remove its own encryption layer. Then, it blinds these encrypted results with the shared secret value $s$ and sends them to the KCS. Consequently, KCS converts the ciphertexts into ciphertexts under the key of the TDS. TDS decrypts the intermediate results through Algorithm 5, performs the remaining simple computations, and generates the final blinded results for linear regression or the neural network. Ultimately, after the MO receives them, it removes the blinding value $s$ and obtains $\nabla E(w)$ in linear regression or in the neural network, equal to equation (3) or equation (5), respectively. Then, MO achieves an accurate $\nabla E(w)$.

6. Privacy and Complexity Analysis

We now analyze the privacy, computational cost, and communication overhead of the OPPGD scheme over horizontally partitioned data. The OPPGD scheme over vertically partitioned data can be analyzed in almost the same way; for simplicity, we omit it.

6.1. Privacy Analysis

According to the definitions of two different privacy levels in Section 4.4, we conduct the privacy analysis of our proposed OPPGD scheme over horizontally partitioned data.

Theorem 2. Under the hardness assumption of the Diffie–Hellman problem, our proposed OPPGD schemes achieve level-1 privacy against any probabilistic polynomial-time adversary.

Proof. We now show that our scheme preserves MO's model privacy and the DOs' data privacy.
In Step 3 (Algorithm 3), MO and the DOs hide their inputs via Elgamal encryption. After receiving $[x_i]$, $[y_i]$, and $[w]$, the CS runs Algorithm 4 and obtains the encrypted scalar product $S$; in particular, MO's and every DO's inputs appear only as the ciphertexts $[w]$ and $[x_i]$. Under the hardness assumption of the Diffie–Hellman problem, although CS knows MO's and the DOs' public keys $pk_{MO}$ and $pk_i$, it is still impossible for it to acquire the secret keys $sk_{MO}$ and $sk_i$. Since the random values used in encryption are chosen by the DOs and MO independently, any adversary attempting to recover the secret keys from the public keys faces instances of the Diffie–Hellman problem. Thus, the DOs' $x_i$ and $y_i$ and MO's $w$ will not be exposed to other parties. When the KCS performs Algorithm 6 to convert the encrypted results, it receives MO's and the DOs' secret keys encrypted under the TDS's public key. However, TDS is a trusty decryption server, so KCS cannot obtain TDS's secret key, which means KCS learns nothing about MO's and the DOs' secret keys or their private values; hence, the converted encrypted results cannot leak any secret information. Next, TDS runs Algorithm 5 and obtains the blinded results; however, without the secret value $s$, TDS cannot recover $\nabla E(w)$. Hence, MO's model parameters will not be exposed.
Since MO's coefficient vector, the gradient $\nabla E(w)$, and the DOs' data do not face privacy problems, our OPPGD schemes provide level-1 privacy.

Theorem 3. Under the hardness assumption of knapsack problems, our OPPGD schemes can provide level-2 privacy against any probabilistic polynomial-time adversary.

Proof. After receiving the encrypted results from the CS, the DOs run Algorithm 5 to generate new encrypted results under MO's key. In linear regression, a DO thus learns a set of blinded intermediate values, and in neural networks a few more of them. However, even with this information, it is still impossible to acquire $w$: the knapsack problem is assumed to be hard, i.e., given a scalar product $z$ and a vector $a$, it is hard to find the vector $b$ that satisfies $z = a \cdot b$.
Consequently, MO's coefficient vector and gradient results cannot be deduced from the intermediate results throughout the scheme.
Therefore, we conclude that our schemes achieve level-2 privacy.

6.2. Theoretical Efficiency Analysis

Now, we carry out a theoretical efficiency analysis of the schemes. We consider the case of linear regression and assume that MO chooses the SGD method to update its coefficient vector within one epoch; in essence, MO optimizes its coefficients over several epochs. In the following, we analyze the feasibility of our proposed schemes in detail in terms of computational cost and communication overhead. Both are summarized in Table 1.

6.2.1. Computational Cost

Assume that the dataset in the OPPGD scheme over horizontally partitioned data contains n records, each of which has m attributes and one class value. In Step 3, the DOs and MO run Algorithm 3 to encrypt the records and the coefficient vector, respectively, which requires a number of modular multiplications linear in nm. In Step 5, CS performs OPPSP to calculate the encrypted scalar products S, which likewise requires ciphertext multiplications and additions linear in nm. In Step 7, the DOs perform nm decryptions to generate the encrypted results under MO's key. In Step 8, the DOs perform multiplications to blind the encrypted results with the secret value s. In Step 10, KCS performs multiplications to convert the encrypted results into new results. In Step 12, TDS performs decryptions and makes mn subtractions to obtain the final results. In Step 14, MO performs multiplications to remove the blinding and obtain the gradient; MO then runs SGD to update its coefficient vector, which takes m multiplications and additions. In Step 16, both the DO and MO perform encryption operations to encrypt the query and the optimal coefficient vector. In Step 18, in order to generate the prediction result, CS performs m multiplications and (m - 1) additions, while the DO and MO each perform one decryption operation.

6.2.2. Communication Overhead

Next, we analyze the communication complexity of each entity in our proposed scheme. In Step 4, each DO and MO communicate with CS in one round each, and the encrypted dataset and coefficient vector amount to O(nm) ciphertexts. In Step 6, CS sends the encrypted results back to the DOs with a communication overhead of the same order. Moreover, in Step 9, each DO sends its blinded encrypted results to the KCS. In Step 11, KCS sends the new intermediate results to TDS in one round. In Step 13, TDS transmits the final results to MO. In Step 17, the DO and MO send their encrypted vectors to CS in one round each, with a communication overhead of O(m) ciphertexts. In Step 18, one more round is required for the DO to obtain its desired prediction result. Hence, the communication cost of the scheme is O(nm) ciphertexts in total.

7. Performance Evaluation

In this section, we evaluate the efficiency of the OPPGD scheme over horizontally partitioned data by using a custom simulator built in Java; the running time of the OPPGD scheme over vertically partitioned data can be evaluated in a similar way. The scenario we focus on in this paper is one where the data are partitioned among multiple data owners and the target function is owned by the model owner, who can not only train its model over the data owners' data but also provide users with predictions. To the best of our knowledge, no other prior work in the literature discusses this scenario, so we present a detailed performance evaluation of our schemes rather than comparing them to previous works. There are five entities in the scheme: the model owner MO, the data owners DOs, the cloud server CS, the key conversion server KCS, and the trusty decryption server TDS.

We run the data owners DOs and the model owner MO on a laptop with an Intel Xeon(R) E5-1620 3.50 GHz CPU and 16 GB of RAM. The cloud server CS, the key conversion server KCS, and the trusty decryption server TDS are run on a computer with an Intel(R) Core(TM) i7-4770 3.40 GHz CPU and 16 GB of RAM.

In our experiments, a DO's data X are represented as one n × m matrix, where n ranges from 1000 to 6000. We evaluate the computational efficiency of our OPPGD schemes without considering communication latency. We simulate four stages: the KeyGen algorithm, the encryption algorithm, the training phase, and the prediction phase. As the data size changes, the corresponding time cost also differs. When the key bit-length is 2048 bits, the running time of each stage of the scheme for different numbers of data tuples is reported in Table 2. The computation of the OPPGD scheme lies mainly in the training stage, while the cost of the remaining stages is very low. We use histograms to present the running time of the KeyGen algorithm and the encryption algorithm in Figures 4 and 5, and the running time of the KeyGen algorithm, the encryption algorithm, and the training phase in Figure 6. In addition, when the number of data tuples is 6000, the running time of the KeyGen algorithm, the encryption algorithm, and the training phase varies with the key bit-length; we simulate these stages and report the running time in Table 3 and Figure 7. When the key bit-length is 2048 bits, the total running time of each entity in our OPPGD scheme is shown in Table 4. The running time of each party under varying numbers of tuples or key bit-lengths is shown in Figure 8.

8. Conclusion

Most existing work on the protection of sensitive IoT data is based on secure communication channels and authorization. In this paper, we focus on protecting data that are collected by IoT devices and then stored and processed in the cloud, as well as the privacy of the machine learning model held by the MO. Gradient descent methods are widely employed to train machine learning models in the cloud computing environment. In order to preserve data privacy and model privacy during cloud computation, we propose two secure schemes to perform outsourced privacy-preserving gradient descent over a horizontally or vertically distributed dataset. The proposed schemes enable the model owner (MO) to train its learning model and obtain the optimal coefficient vector over the dataset owned by the DOs with the help of the CS, TDS, and KCS. After the MO improves its model, it can offer a prediction service to the DOs. Both the privacy of MO's model and that of the DOs' datasets are protected. Complexity and performance evaluations are also given in detail. In future work, we will try to optimize our system to reduce the number of entities.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by the Ministry of Science and Technology of the People’s Republic of China (Grant no. 2018YFB0803505), the National Natural Science Foundation of China (Grant nos. 61862028 and 61702238), the Natural Science Foundation of Jiangxi Province (Grant no. 20181BAB202016), and the Science and Technology Project of Provincial Education Department of Jiangxi (GJJ160430 and GJJ180288).