Abstract

Homomorphic encryption (HE) is considered one of the most powerful solutions for securely protecting clients’ data from malicious users and even servers in cloud computing. However, although HE can protect data in theory, it has not been widely utilized because many HE operations, especially multiplication, are too slow. In addition, existing data mining studies on encrypted data focus on implementing only specific algorithms without addressing this fundamental problem of HE. In this paper, we propose a fundamental design and implementation of data mining algorithms through logical gates. To do this, we design the logic of various atomic operations in the encrypted domain and apply these operations to well-known data mining algorithms. We also analyze the execution time of the atomic operations and the advanced algorithms.

1. Introduction

With the progress of cloud storage, advanced data processing and analysis using machine learning and data mining techniques have been developed to extract valuable information. However, concerns about data privacy and security have arisen in storing and managing information on cloud servers. This is because the server must decrypt the data in order to process data encrypted with conventional cryptosystems such as AES and DES, even if the client transmits the data to the server in encrypted form. Eventually, users must share the decryption key with the cloud, which can lead to data infringement by a malicious server.

Homomorphic encryption (HE) [1, 2] is regarded as one of the most powerful solutions to the data security problem in the cloud, since data can be processed in the encrypted domain without decryption. However, data analysis with HE is not yet popular in the real world, although it is highly recommended for providing proper security in the cloud. The major reason is that it is difficult to link HE and machine learning: HE is a relatively new cryptosystem built on deep mathematical properties of lattices, which makes it difficult for data scientists to understand and use.

In addition, a few well-known HE algorithms support only very simple operations such as addition and multiplication between integers. Although Gentry [3] presented fully homomorphic encryption (FHE), which in theory allows an unlimited number of operations on ciphertexts, it had many limitations when adapted to a real cloud model [4]. Since the implementation and development of encryption algorithms are not the main interest of theoretical cryptographers, practical usage and implementation have rarely been developed compared to the theoretical progress in FHE. Therefore, to date, FHE has been applied only to specific algorithms, without solving its fundamental problems [5–11].

From this point of view, we propose an FHE computation method based on bitwise logical circuits that can be applied more generally, rather than algorithms that operate only under certain conditions. By designing the basic operations necessary for machine learning, we build a universal link between HE and machine learning. Researchers studying FHE can easily apply machine learning with homomorphic operations, and machine learning researchers will be able to run data-driven analysis algorithms on encrypted data even if they have no knowledge of FHE at all.

Our contribution in this paper is threefold:
(i) In order to build simple data mining techniques with FHE, we design various atomic operations, including absolute value, multiplication, comparison, and sorting, through the gate operations provided by the TFHE library.
(ii) In contrast to integer-based FHE schemes, in which the possible operations are limited, all operations, including division, can be designed in the bit-based FHE scheme.
(iii) We finally demonstrate the applicability of several well-known data mining techniques using our proposed bitwise FHE scheme: linear regression, logistic regression, the k-NN classifier, and k-means clustering.

2. Background

2.1. Homomorphic Encryption

Homomorphic encryption (HE) [1, 2] is a cryptosystem in which the result of operations between ciphertexts is, when decrypted, equal to the result of the corresponding operations between plaintexts. The operations on the ciphertexts of a and b can be expressed as $D(E(a) \circ E(b)) = a \circ b$, where $E(\cdot)$ and $D(\cdot)$ denote encryption and decryption, respectively.

The concept of HE was first presented in 1978 by Rivest et al. [12]. Many HE schemes have been introduced since then; the most popular was the Paillier cryptosystem, proposed by Paillier [13] in 1999. However, these were partially homomorphic schemes with a limited number of operations, since the encryption noise is amplified each time an operation is performed. The solution to this noise accumulation problem was the fully homomorphic encryption (FHE) of Gentry [3] in 2009, who proposed a bootstrapping algorithm that removes the accumulated noise, thereby eliminating the limit on the number of operations. However, Gentry's scheme [3] had to encrypt the plaintext bit by bit, which was a heavy burden on memory because the ciphertexts were so large. In addition, the bootstrapping operation was performed with a very complicated algorithm, taking dozens of minutes per bit. For these reasons, many FHE libraries now use integer-based schemes, but these have the disadvantage that the possible operations are very limited.

2.1.1. TFHE Library for FHE

In 2017, Ilaria Chillotti, Nicolas Gama, Mariya Georgieva, and Malika Izabachène proposed the TFHE library [14], an improved version of the FHEW library [15]. It uses a bit-by-bit encryption scheme similar to Gentry's initial FHE [3]. However, unlike [3], TFHE constructs operations in a more fundamental way than addition and multiplication between ciphertexts: binary circuits are used to operate on the encrypted bits. In other words, TFHE supports NOT, AND, OR, NAND, NOR, XOR, and XNOR gate operations between encrypted bits, allowing users to construct encrypted circuits from these logical operations. Another advantage of TFHE is that it efficiently solves the bootstrapping problem, which was the biggest obstacle to using FHE: TFHE performs bootstrapping automatically whenever a single gate operation is performed, unlike conventional FHE, in which bootstrapping must be invoked explicitly to remove noise after a certain number of operations. In other words, it is possible to perform computations without limitation. Here, the bootstrapping algorithm runs in less than 0.1 second, the fastest performance among all of the preceding FHE schemes.

In addition, by supporting a multiplexer function, TFHE further improves both convenience of implementation and circuit speed. In the function below, a is the multiplexer factor, and the output is either b or c depending on its value: $\mathrm{MUX}(a, b, c) = (a \wedge b) \vee (\lnot a \wedge c)$, which evaluates to b when a encrypts 1 and to c when a encrypts 0.
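
For illustration, a minimal C++ sketch using the TFHE gate bootstrapping API selects between two encrypted bits with an encrypted selector (the wrapper function name select_bit is ours):

    #include <tfhe/tfhe.h>

    // result = a ? b : c, computed entirely on ciphertexts.
    // bk is the public cloud key, so the evaluator never sees any plaintext bit.
    void select_bit(LweSample* result, const LweSample* a, const LweSample* b,
                    const LweSample* c, const TFheGateBootstrappingCloudKeySet* bk) {
        bootsMUX(result, a, b, c, bk);  // built-in bootstrapped multiplexer
    }

Built from basic gates, the same selection would cost three gate evaluations; as discussed in Section 5.1, the dedicated MUX gate is cheaper than two basic gates.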

2.2. Data Mining and Machine Learning Algorithms
2.2.1. Linear Regression

Linear regression is the most popular model for predicting a target value y. It estimates the coefficients of a linear equation involving one or more independent variables. Several procedures exist to optimize the values of the coefficients; we focus on gradient descent, which iteratively minimizes the error on the training data.

2.2.2. Logistic Regression

Logistic regression is a special case of the generalized linear model in which the target variable is binary, such as pass or fail, alive or dead, etc. In general, logistic regression infers the parameters of a sigmoid function that models binary or categorical dependent variables.

2.2.3. k-Nearest Neighbors (kNN) Classification

In data mining, the k-nearest neighbors algorithm is one of the most well-known and useful supervised methods for classifying a dataset. Given data labeled with several classes, kNN determines the class of a new input datum based on its neighbors: the label of the input datum is set to the majority label among the k closest data points. There are many ways to calculate the distance between data, typically the Euclidean distance, and different distance measures may yield different results.

2.2.4. k-Means Clustering

Unlike kNN, the k-means clustering algorithm captures the structure of unlabeled data and partitions them into k clusters. The k-means algorithm sets a representative value for each cluster and assigns each datum to the cluster with the closest representative value. After forming clusters around k initially arbitrary representative values, the mean of each cluster becomes its new representative value. This process is repeated until the clusters converge, and finally, the data are partitioned into k clusters. Like kNN, the result of k-means is affected by the distance measure.

3. Problems

3.1. FHE for Machine Learning

Although machine learning and FHE both have long histories, research on them was conducted separately for a long time. Recently, with the advent of cloud computing, privacy-preserving machine learning and data mining have become a hot topic, and there have been several studies on connecting FHE and machine learning [6, 7, 9, 10, 16].

However, as mentioned in Section 2, it is difficult to apply FHE to machine learning algorithms because most libraries support only limited operations such as addition and multiplication. Accordingly, existing machine learning studies using encrypted data have focused on implementing specific algorithms such as the Naïve Bayes classifier [9] or linear regression [16]. In addition, since FHE requires complex theoretical knowledge, it is difficult for general machine learning engineers to understand its concepts. Worse, in order to use an FHE scheme, we need a technique to replace all operations on plaintext with homomorphic operations.

In this paper, we focus on how to efficiently implement basic atomic operations and apply them universally to various machine learning algorithms. These building blocks serve as a bridge between FHE and machine learning.

3.2. Integer-Level Encryption vs. Bitwise Encryption

The FHE schemes that operate in integer space take scalar integers or polynomials with integer coefficients as input and then perform operations on an integer basis. Therefore, additional integer encoding is required for real-valued data. Previous research on FHE applications has used a rounding function to convert real numbers to integers in the encoding and decoding processes. Most of them multiply by a scaling constant k before rounding to preserve the original number; to recover the encoded value, the decrypted result must be divided by k as follows:
(1) Encoding: $m' = \lfloor k \cdot m \rceil$ for a plaintext $m$
(2) Encryption: $c = E(m')$
(3) Decryption and decoding: $m \approx D(c)/k$

However, there is a problem with this method: it uses an approximation rather than the exact data. The approximation accuracy is determined by the scaling constant, and the user must also choose this constant.

On the other hand, bitwise encryption does not require an encoding process to integers because all real-valued data can be represented in bits. In addition, since a computer stores and processes data on a bit-by-bit basis, a generalized encryption scheme can easily be applied to any data.

In this paper, we introduce the logic of various operations for the bitwise encryption scheme using the TFHE library. We present a method for constructing atomic operations using circuit operations on each bit after converting integer data into bits. Table 1 shows the logical operators used in this study and their notation.

4. Designing Homomorphic Atomic Operations

Our method uses the TFHE library, so we perform all operations on a bit-by-bit basis. This is similar to the way binary data in plaintext are processed by a computer using AND/OR/NAND/NOR/XOR/XNOR/NOT gates. However, since we do not know the actual values being computed, the operations must be designed differently from plaintext algorithms; for example, we cannot use a ciphertext in an if-statement (e.g., “if ciphertext c > 0, then follow the command below”), because a ciphertext cannot be compared with a constant in the usual way. Considering these characteristics, we introduce a new design of the atomic operations in this section. The atomic operations include addition, 2’s complement, subtraction, equivalent comparison, large and small comparison, shift, absolute value, multiplication, and division. Note that addition, subtraction, and multiplication have already been introduced in the literature [14, 17]. However, the other atomic operations have rarely been studied although they are highly significant for numerical computation. We describe the algorithms of both the previously studied and the rarely studied atomic operations in this section because they are classified as homomorphic atomic operations, separate from the advanced homomorphic data mining algorithms of Section 6.

All algorithms introduced in this paper are implemented and evaluated on an Intel Core i7-7700 at 3.60 GHz with 8.0 GB RAM, running Ubuntu 16.04.4 LTS.

4.1. Addition Operation

Addition is one of the most basic operations. There are many ways to implement a full adder circuit with basic gates, such as with 9 NAND gates, or with 7 NOR and 5 NOT gates. However, since the number of basic gates is proportional to the execution time of the circuit in the TFHE library, the adder can be designed more efficiently using only 2 XOR, 2 AND, and 1 OR gates. More details are shown in Figure 1.

In the circuit diagram of Figure 1, the bits of a and b are input starting from the least significant bit (lsb), and $c_0$, the lsb of the carry, is initialized to E[0]. The output $s_i$ passing through the circuit is the sum of the corresponding bits, and $c_{i+1}$ is the carry into the next bit.
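
A minimal C++ sketch of this adder on the TFHE gate API follows; it uses exactly the 2 XOR, 2 AND, and 1 OR gates mentioned above (the temporary-buffer handling and the function name are ours):

    // Homomorphic full adder: a, b are encrypted input bits, carry_in the
    // encrypted carry; bk is the cloud key used by every bootstrapped gate.
    void full_adder(LweSample* sum, LweSample* carry_out,
                    const LweSample* a, const LweSample* b, const LweSample* carry_in,
                    const TFheGateBootstrappingCloudKeySet* bk,
                    const TFheGateBootstrappingParameterSet* params) {
        LweSample* t = new_gate_bootstrapping_ciphertext_array(3, params);
        bootsXOR(&t[0], a, b, bk);             // t0 = a XOR b
        bootsXOR(sum, &t[0], carry_in, bk);    // sum = a XOR b XOR carry_in
        bootsAND(&t[1], a, b, bk);             // t1 = a AND b
        bootsAND(&t[2], &t[0], carry_in, bk);  // t2 = (a XOR b) AND carry_in
        bootsOR(carry_out, &t[1], &t[2], bk);  // carry_out = t1 OR t2
        delete_gate_bootstrapping_ciphertext_array(3, t);
    }

An l-bit addition ripples this circuit from the lsb, with the initial carry produced by bootsCONSTANT(..., 0, bk).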

4.2. 2’s Complement Operation

It is necessary to express negative numbers in order to perform integer binary operations. There are two main ways to represent negative numbers in a computer: the 1’s complement method and the 2’s complement method. The 1’s complement method has the advantage of being simpler than the 2’s complement method when representing a negative number: each bit of the desired number is XORed with the constant bit 1, which can be replaced by taking a NOT gate on every bit of the number. This is preferable because the NOT gate is significantly faster than the XOR gate. However, the 1’s complement method has two representations of 0, and it requires logic different from the plaintext case to perform operations such as addition and subtraction. The improvement is the 2’s complement method, which adds the integer 1 to the 1’s complement. Since the binary representation of 1 is filled with zeros except for the lsb, only the carry needs to be propagated to the next bit during the addition. Therefore, the execution time can be reduced by adding a NOT gate to a half adder, without using the full adder above: $s_i = \lnot a_i \oplus c_i$ and $c_{i+1} = \lnot a_i \wedge c_i$. This is expressed as a circuit in Figure 2.

Let a be the number whose 2’s complement is taken and b = 2COMP(a) the result, with bits input from the lsb. The output $s_i$ from the above circuit is the result for the corresponding bit, and the carry $c_{i+1}$ is the input to the next stage.
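
A sketch of the whole l-bit 2’s complement built from this NOT-augmented half adder (the helper name is ours):

    // out = -a in 2's complement: NOT every bit, then add 1 via half adders.
    void twos_complement(LweSample* out, const LweSample* a, int l,
                         const TFheGateBootstrappingCloudKeySet* bk,
                         const TFheGateBootstrappingParameterSet* params) {
        LweSample* carry = new_gate_bootstrapping_ciphertext_array(l + 1, params);
        LweSample* na = new_gate_bootstrapping_ciphertext(params);
        bootsCONSTANT(&carry[0], 1, bk);                // adding 1: initial carry is E[1]
        for (int i = 0; i < l; i++) {
            bootsNOT(na, &a[i], bk);                    // 1's complement of bit i
            bootsXOR(&out[i], na, &carry[i], bk);       // s_i = NOT a_i XOR c_i
            bootsAND(&carry[i + 1], na, &carry[i], bk); // c_{i+1} = NOT a_i AND c_i
        }
        delete_gate_bootstrapping_ciphertext(na);
        delete_gate_bootstrapping_ciphertext_array(l + 1, carry);
    }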

4.3. Subtraction Operation

In a typical computer environment, subtraction is implemented using the 2’s complement method and addition, so separate subtraction logic is not needed. Subtraction on ciphertexts could likewise be processed with the 2’s complement method and addition as in plaintext, but it can also be implemented directly with 2 XOR, 2 AND, 1 OR, and 2 NOT gates. The detailed circuit diagram is shown in Figure 3.

Subtraction takes its inputs from the lsb of a and b. The output $s_i$ passed through the circuit is the result of the subtraction for that bit, and $c_{i+1}$ is the borrow into the next bit. $s_i$ and $c_{i+1}$ are defined according to the value of $D = a_i - b_i - c_i$ as follows: if $D = 1$, then $s_i = 1$ and $c_{i+1} = 0$; if $D = 0$, then $s_i = 0$ and $c_{i+1} = 0$; if $D = -1$, then $s_i = 1$ and $c_{i+1} = 1$; and if $D = -2$, then $s_i = 0$ and $c_{i+1} = 1$.
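
A per-bit sketch matching this gate count (2 XOR, 2 AND, 1 OR, 2 NOT; the buffer handling is ours):

    // Homomorphic full subtractor for one bit position of a - b.
    void full_subtractor(LweSample* diff, LweSample* borrow_out,
                         const LweSample* a, const LweSample* b, const LweSample* borrow_in,
                         const TFheGateBootstrappingCloudKeySet* bk,
                         const TFheGateBootstrappingParameterSet* params) {
        LweSample* t = new_gate_bootstrapping_ciphertext_array(4, params);
        bootsXOR(&t[0], a, b, bk);              // t0 = a XOR b
        bootsXOR(diff, &t[0], borrow_in, bk);   // diff = a XOR b XOR borrow_in
        bootsNOT(&t[1], a, bk);                 // t1 = NOT a
        bootsAND(&t[2], &t[1], b, bk);          // t2 = (NOT a) AND b
        bootsNOT(&t[3], &t[0], bk);             // t3 = NOT (a XOR b)
        bootsAND(&t[1], &t[3], borrow_in, bk);  // t1 reused = t3 AND borrow_in
        bootsOR(borrow_out, &t[2], &t[1], bk);  // borrow_out = t2 OR t1
        delete_gate_bootstrapping_ciphertext_array(4, t);
    }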

4.4. Comparison Operation
4.4.1. Equivalent Comparison

Equivalent comparison in plaintext compares each bit of two input values and outputs 1 if all bits are equal and 0 otherwise. In the encrypted domain, it is possible to test whether each pair of bits is the same through an XNOR gate, but since the result is encrypted, its value cannot be inspected. Therefore, to obtain the desired result, all the outputs of the bitwise XNOR gates are combined with AND gates as shown in Figure 4. Then, E[0] is output when the two inputs differ in any bit, and E[1] is output when every bit is the same.
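
A sketch of this circuit as a loop over the bits (the XNOR outputs are folded with AND; the helper name is ours):

    // result = E[1] iff a == b over l bits.
    void equal(LweSample* result, const LweSample* a, const LweSample* b, int l,
               const TFheGateBootstrappingCloudKeySet* bk,
               const TFheGateBootstrappingParameterSet* params) {
        LweSample* bit_eq = new_gate_bootstrapping_ciphertext(params);
        bootsCONSTANT(result, 1, bk);              // assume equal until a mismatch appears
        for (int i = 0; i < l; i++) {
            bootsXNOR(bit_eq, &a[i], &b[i], bk);   // E[1] iff the i-th bits match
            bootsAND(result, result, bit_eq, bk);  // any E[0] forces the result to E[0]
        }
        delete_gate_bootstrapping_ciphertext(bit_eq);
    }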

4.4.2. Large and Small Comparison

We explain only the large comparison because the large comparison and the small comparison are logically similar. In a computer, a large comparison outputs its result as soon as bits with different values are found while scanning from the upper bits to the lower bits. However, since we cannot tell whether the comparison result of each bit is a ciphertext of 1 or of 0, we cannot know which bit differs or which of the two numbers is larger. Thus, we have to use new logic.

First, consider the sign bit of the result of subtracting the preceding number from the latter number of the two inputs. If the preceding number is less than or equal to the latter number, E[0] is output as the sign bit, and E[1] is output if it is larger. Therefore, we could use this subtraction to build a large comparison. However, considering the speed of the circuit, we instead use a method based on the multiplexer function and the XNOR gate. The detailed circuit diagram is shown in Figure 5. The comparison result is obtained by repeating the circuit for the length of the data.

A larger-than-or-equal or smaller-than-or-equal comparison can be obtained by taking a NOT gate on the result of a small comparison or a large comparison, respectively.
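
A sketch of the XNOR/MUX comparison loop for unsigned operands (signed operands can be handled with the subtraction-sign method mentioned above; the helper name is ours):

    // result = E[1] iff a > b, scanning from lsb to msb.
    void greater_than(LweSample* result, const LweSample* a, const LweSample* b, int l,
                      const TFheGateBootstrappingCloudKeySet* bk,
                      const TFheGateBootstrappingParameterSet* params) {
        LweSample* eq = new_gate_bootstrapping_ciphertext(params);
        bootsCONSTANT(result, 0, bk);                // "equal so far" means "not greater"
        for (int i = 0; i < l; i++) {
            bootsXNOR(eq, &a[i], &b[i], bk);         // E[1] iff the i-th bits are equal
            bootsMUX(result, eq, result, &a[i], bk); // equal: keep verdict; else a_i decides
        }
        delete_gate_bootstrapping_ciphertext(eq);
    }

Since the verdict is kept whenever the current bits are equal, after the last iteration the highest differing bit has decided the result.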

4.5. Shift Operation

Since the ciphertext is encrypted bitwise, it can be shifted in the same way as a shift in plaintext. Shifting k bits to the left fills the emptied right k bits with E[0]. Shifting left by k bits has the effect of multiplying by $2^k$, as shown in Algorithm 1.

Input: , k
Output: LSHIFT(a, k)
(1)for do
(2)
(3)end for
(4)for do
(5)
(6)end for
(7)return

In this algorithm, “HomCONSTANT” is a function that produces a one-bit ciphertext corresponding to the input plaintext bit, and “HomCOPY” is a function that produces a ciphertext that decrypts to the same bit as its input while being a different ciphertext.
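
These plausibly correspond to TFHE’s bootsCONSTANT and bootsCOPY, both inexpensive because they require no bootstrapping. A sketch of the left shift (the helper name is ours):

    // LSHIFT(a, k): multiply by 2^k. Writing high bits first allows out == a.
    void left_shift(LweSample* out, const LweSample* a, int l, int k,
                    const TFheGateBootstrappingCloudKeySet* bk) {
        for (int i = l - 1; i >= k; i--)
            bootsCOPY(&out[i], &a[i - k], bk);  // HomCOPY: fresh ciphertext, same bit
        for (int i = 0; i < k; i++)
            bootsCONSTANT(&out[i], 0, bk);      // HomCONSTANT: vacated bits become E[0]
    }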

The right shift comes in two forms: a logical shift, which fills the upper k bits with E[0] after shifting, analogous to the left shift, and an arithmetic shift, which fills the upper k bits with the value of the sign bit. The arithmetic shift is mainly used, and shifting right by k bits has the effect of dividing by $2^k$.

4.6. Absolute Value Operation

In plaintext, the absolute value algorithm outputs the value as is if the most significant bit is 0 and takes the 2’s complement if the most significant bit is 1. Since the value of the most significant bit is not known for a ciphertext, a new algorithm must be designed. Let the original value be a and let b be the 2’s complement of a; then one of them is positive and the other is negative (except for 0). Now, let the sign bit of a be the multiplexer factor, which returns a or b depending on its value: $|a| = \mathrm{MUX}(a_{l-1}, b, a)$.
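
A sketch combining the Section 4.2 two’s complement with the MUX selection (the helper names are ours):

    // |a| = MUX(sign(a), 2COMP(a), a), selected bit by bit on the encrypted sign.
    void absolute(LweSample* out, const LweSample* a, int l,
                  const TFheGateBootstrappingCloudKeySet* bk,
                  const TFheGateBootstrappingParameterSet* params) {
        LweSample* neg = new_gate_bootstrapping_ciphertext_array(l, params);
        twos_complement(neg, a, l, bk, params);          // candidate -a (Section 4.2 sketch)
        const LweSample* sign = &a[l - 1];               // encrypted sign bit (msb)
        for (int i = 0; i < l; i++)
            bootsMUX(&out[i], sign, &neg[i], &a[i], bk); // sign ? -a : a
        delete_gate_bootstrapping_ciphertext_array(l, neg);
    }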

4.7. Multiplication Operation

In general multiplication, multiplying m bits by n bits results in m + n bits. When the two numbers to be multiplied are positive, the multiplicand is multiplied by each bit of the multiplier from the lsb to the upper bits, as if calculated by hand; the result is the sum of all the partial products, left-shifted as the bit position of the multiplier increases. Thus, the fewer 1-bits the multiplier contains, the more efficient the computation, so we decompose the multiplier using additions or subtractions to reduce the number of 1-bits as much as possible. However, this algorithm applies only to positive numbers, so a more advanced algorithm is needed to handle negative numbers. This is because, in the case of unencrypted plaintext data, the sign of the data can be inspected by checking the msb, but in the case of encrypted data, the value of the msb cannot be read. That is, a new algorithm must be designed that outputs the correct result regardless of the signs of the given data. To solve this problem, we compute the product of the absolute values and then take the 2’s complement of the result according to the sign. That is, for the multiplication of a and b, we compute $E[a \times b] = \mathrm{MUX}(s_a \oplus s_b,\ \mathrm{2COMP}(|a| \times |b|),\ |a| \times |b|)$, where $s_a$ and $s_b$ denote the encrypted sign bits of a and b.

Our algorithm adopts this method, and its circuit diagram is shown in Figure 6.
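
For reference, a sketch of the unsigned shift-and-add core on which the signed version is built; it omits the optimization of reducing 1-bits in the multiplier, full_adder is the Section 4.1 circuit, and the names are ours:

    // Textbook shift-and-add multiplier for nonnegative inputs; prod holds 2l bits.
    void multiply_unsigned(LweSample* prod, const LweSample* a, const LweSample* b, int l,
                           const TFheGateBootstrappingCloudKeySet* bk,
                           const TFheGateBootstrappingParameterSet* params) {
        LweSample* pp = new_gate_bootstrapping_ciphertext_array(l, params);
        LweSample* sum = new_gate_bootstrapping_ciphertext(params);
        LweSample* carry = new_gate_bootstrapping_ciphertext(params);
        for (int i = 0; i < 2 * l; i++)
            bootsCONSTANT(&prod[i], 0, bk);          // prod = 0
        for (int j = 0; j < l; j++) {
            for (int i = 0; i < l; i++)
                bootsAND(&pp[i], &a[i], &b[j], bk);  // partial product a AND b_j
            bootsCONSTANT(carry, 0, bk);
            for (int i = 0; i < l; i++) {            // ripple-carry add at offset j
                full_adder(sum, carry, &prod[i + j], &pp[i], carry, bk, params);
                bootsCOPY(&prod[i + j], sum, bk);
            }
            bootsCOPY(&prod[l + j], carry, bk);      // final carry of this row
        }
        delete_gate_bootstrapping_ciphertext(carry);
        delete_gate_bootstrapping_ciphertext(sum);
        delete_gate_bootstrapping_ciphertext_array(l, pp);
    }

The signed multiplier then wraps this core with the absolute value, an XOR of the sign bits, and a final MUX-selected 2’s complement, as in the equation above.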

4.8. Division Operation

Binary division algorithms can be considered separately for positive and negative inputs. First, consider the case where both the divisor and the dividend are positive. Let the arrays M, Q, and A have the same length l, and initialize M to the divisor, Q to the dividend, and A to zero. The count value is the dividend length, l. Let AQ be the concatenation of A and Q, with a length of 2l, and start the main part of the algorithm (Algorithm 2).

If the divisor or dividend is negative, a slightly different algorithm is needed. One option is to modify Algorithm 2 into a negative binary division algorithm. However, since the sign of the input cannot be known, both algorithms would have to be performed and a single result selected according to the sign of the input; this is inefficient because it takes the time of running Algorithm 2 twice. Therefore, we implement the signed binary division using a second method that, as in multiplication, uses absolute values and the multiplexer function. That is, for signed binary division with divisor M and dividend Q, we compute $E[Q \div M] = \mathrm{MUX}(s_M \oplus s_Q,\ \mathrm{2COMP}(|Q| \div |M|),\ |Q| \div |M|)$, where $s_M$ and $s_Q$ denote the encrypted sign bits.

Input: divisor M, dividend Q
(1)Shift AQ to the left by one bit and let A be the upper l bits of AQ.
(2)Calculate A − M and put the result in A.
(3)If A is negative, the last bit of AQ becomes 0, and A + M is calculated and put back in A to restore the value before step 2.
(4)If A is positive or zero, the last bit of AQ becomes 1.
(5)The count value is decremented by 1.
(6)If the count is not 0, go to step 1 and continue the algorithm.
(7)If the count value is 0, the result of the algorithm is output (the lower l bits of AQ become the quotient and the upper l bits become the remainder).
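
The branchless trick behind steps 2–4 can be sketched as follows: both the subtracted and original values of A are kept, and the encrypted sign bit of A − M selects between them with MUX (subtract is the Section 4.3 circuit; the names are ours):

    // Branchless core of one division step on ciphertexts.
    void division_step(LweSample* A, LweSample* q_bit, const LweSample* M, int l,
                       const TFheGateBootstrappingCloudKeySet* bk,
                       const TFheGateBootstrappingParameterSet* params) {
        LweSample* diff = new_gate_bootstrapping_ciphertext_array(l, params);
        subtract(diff, A, M, l, bk, params);       // step 2: candidate A - M
        const LweSample* neg = &diff[l - 1];       // encrypted sign bit of A - M
        bootsNOT(q_bit, neg, bk);                  // step 4: quotient bit = E[1] iff A - M >= 0
        for (int i = 0; i < l; i++)                // step 3: keep old A if it went negative
            bootsMUX(&A[i], neg, &A[i], &diff[i], bk);
        delete_gate_bootstrapping_ciphertext_array(l, diff);
    }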

5. Experiments

5.1. Basic Gate Experiment

We implemented the operations of Section 4 using these basic gates and measured the execution time of each 1-bit basic gate operation in TFHE over 1000 runs.

As shown in Table 2, the basic gates except the NOT gate all take the same time, and the NOT gate is significantly faster than the other gates. Also, the multiplexer function is implemented differently from the basic gates, so its speed differs: it can be seen that the multiplexer function is faster than computing a basic gate twice.

5.2. Number of Gates Used in Designed Homomorphic Atomic Operations

Since all gates except the NOT gate and the MUX gate take the same time, we denote the execution time of these gates by $t_{gate}$ and that of the MUX gate by $t_{mux}$; the NOT gate is omitted because its execution time is negligible. Table 3 shows the number of gates used when performing the designed homomorphic operations on l-bit inputs.

Most of the operations listed in Table 3 are linear in the data length. The shift operation moves bit positions without using any gate operations, and the number of gates in the multiplication and division operations is proportional to the square of the data length.

5.3. Execution Time of the Homomorphic Atomic Operations

In Table 4, we measure the execution time of the operations on 16-bit inputs. The time of the shift operation is not measured because no gates are used; for the nonlinear operations, we measured 8, 16, and 32 bits to see the change in execution time.

Looking at the measured values, doubling the length of the data increases the execution time of both algorithms by about four times. This is because the time of the addition, subtraction, and comparison operations constituting multiplication and division increases linearly with the data length, and the number of iterations of the algorithm is also proportional to the data length.

6. Applications

6.1. Linear Regression

Given a d-dimensional input variable $x_i$ and its corresponding target variable $y_i$ for $i = 1, \ldots, N$, an inference on the parameters of the linear function within the hypotheses is defined as $h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_d x_d = \theta^T x$ for parameters $\theta$ and number of features $d$. This regression describes a hyperplane in the d-dimensional space of the independent variables $x$.

In general, the linear regression can be easily estimated by using least square estimation as follows: $\hat{\theta} = (X^T X)^{-1} X^T y$, where $X$ is the design matrix and $y$ the target vector. However, in FHE, it is rather difficult to design and implement the matrix inversion of equation (6). Therefore, instead of the exact solution, we choose an approximate estimation based on the gradient descent update in order to avoid the calculation of the inverse matrix.

The approximate estimation uses the error function $J(\theta) = \frac{1}{2N}\sum_{i=1}^{N}\left(h_\theta(x_i) - y_i\right)^2$ to optimize the parameters of both simple and multiple linear regression.

The main goal of linear regression is to fit a straight line through the data, so we minimize the error function $J(\theta)$. Gradient descent is an algorithm that starts with an initial θ and repeatedly performs the update $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$, where α denotes the learning rate. The parameters are updated concurrently at every iteration until convergence. Our linear regression algorithm is given in Algorithm 3.

Input: training data (X, y), learning rate α, number of iterations p
Output: Parameter θ
(1)Initialize parameter θ to 0
(2)Gradient descent part 1: calculate partial derivative of cost function
(3)Gradient descent part 2: multiply α with the value of part 1
(4)Gradient descent part 3: update θ, repeating for the given number of iterations
(5)return each of θ’s

The method of implementing linear regression is very similar to the plaintext operation; however, it is calculated in an encrypted state. In the encrypted domain, we can compute all the operations of the gradient descent algorithm, which consists of multiplication, addition, and subtraction. We initialized the parameters θ to 0 and updated them using the linear regression function with FHE operations.
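
As an illustration, the residual computation inside one gradient descent step for the simple case (d = 1) could look as follows; add, subtract, and multiply_signed stand for the l-bit circuits of Section 4, and the fixed-point rescaling of the product is elided in this sketch:

    // r = (theta0 + theta1 * x) - y for one training sample.
    void residual(LweSample* r, const LweSample* theta0, const LweSample* theta1,
                  const LweSample* x, const LweSample* y, int l,
                  const TFheGateBootstrappingCloudKeySet* bk,
                  const TFheGateBootstrappingParameterSet* params) {
        LweSample* t = new_gate_bootstrapping_ciphertext_array(l, params);
        multiply_signed(t, theta1, x, l, bk, params);  // theta1 * x (truncated to l bits)
        add(t, t, theta0, l, bk, params);              // h(x) = theta0 + theta1 * x
        subtract(r, t, y, l, bk, params);              // h(x) - y
        delete_gate_bootstrapping_ciphertext_array(l, t);
    }

The gradient for each θ_j is then the accumulated sum of these residuals (times the corresponding feature), which is multiplied by α and subtracted from θ_j using the same primitives.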

6.1.1. Performance Evaluation of FHE Linear Regression

We performed two experiments with varying d: the simple linear regression (d = 1) and the multiple linear regression (d = 2). We set the number of data (N), the number of dimensions (d), the length of data (l), and the number of iterations of the algorithm (p) as factors for the linear regression algorithm. Then, the number of gates (T) can be expressed as a function of these factors.

For the simple linear regression, we set the initial values to (N, d, l, p) = (10, 1, 16, 1) for the experiment. The dataset consists of a feature vector and a target variable with 10 artificially created data points, and one pass takes 554 seconds with a learning rate of 0.01. The iteration proceeded for 100 steps to converge within the given threshold.

For the multiple linear regression (d = 2), we set the initial values to (N, d, l, p) = (10, 2, 16, 1) for the experiment. The dataset consists of two feature vectors and a target variable with 10 artificially created data points, and one pass takes 1047 seconds with a learning rate of 0.01. The iteration proceeded for 50 steps to converge within the given threshold.

6.2. Logistic Regression

Implementation of various algorithms such as linear regression is easily facilitated by our FHE arithmetic operations. However, logistic regression involves a nonlinear function, which requires the equation to be transformed before it can be computed. Therefore, the key point of deriving FHE logistic regression lies in designing the nonlinear sigmoid function. We first give a brief derivation and structure of FHE logistic regression and then explain two ways of constructing the logistic function.

Given an input variable $x_i$ and its corresponding target variable $y_i \in \{0, 1\}$ for $i = 1, \ldots, N$, an inference on the parameters of the logistic function within the hypotheses is defined as $h_\theta(x) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$ for parameters $\theta$ and number of features $d$. We also denote by $x_{ij}$ the element of the data matrix in the $i$th row and $j$th column.

The logistic regression uses a likelihood function to estimate the weights θ. If we let $P(y = 1 \mid x; \theta) = h_\theta(x)$ and $P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$, the likelihood for a single data point is $P(y \mid x; \theta) = h_\theta(x)^{y}\left(1 - h_\theta(x)\right)^{1-y}$.

Finally, the likelihood function for the whole data, $L(\theta) = \prod_{i=1}^{N} P(y_i \mid x_i; \theta)$, is the product of the likelihoods of each data point. Next, the log operation is performed to enumerate the log likelihoods in a linear combination as follows: $\ell(\theta) = \sum_{i=1}^{N}\left[y_i \log h_\theta(x_i) + (1 - y_i)\log\left(1 - h_\theta(x_i)\right)\right]$.

In order to maximize the likelihood $\ell(\theta)$, we chose to perform the gradient descent algorithm that iteratively minimizes the cost function $J(\theta) = -\ell(\theta)$. Therefore, θ is updated with the following equation: $\theta_j := \theta_j + \alpha \sum_{i=1}^{N}\left(y_i - h_\theta(x_i)\right)x_{ij}$.

Existing literature [18] designed a nonlinear logistic function by two approximation techniques, namely, the Taylor series method and least square approximation. In this paper, we show the feasibility of constructing both approximation techniques on top of our proposed bitwise FHE operations to perform logistic regression.

6.2.1. Taylor Series Method

It is well known that Taylor expansion enables a differentiable real-valued function $f$ to be expanded in a series at $x = a$ such that $f(x) = \sum_{r=0}^{\infty} \frac{f^{(r)}(a)}{r!}(x - a)^r$, where $f^{(r)}$ denotes the $r$th derivative of $f$.

Bos et al. [18] applied Taylor series expansion to the logistic function, which facilitates calculation of the nonlinear function since the altered equation incorporates only the four fundamental operations. The Taylor series polynomial of degree 9 for the sigmoid function can be derived as $\sigma(x) \approx \frac{1}{2} + \frac{x}{4} - \frac{x^3}{48} + \frac{x^5}{480} - \frac{17x^7}{80640} + \frac{31x^9}{1451520}$.

Using our basic bitwise FHE operations presented in the previous section, we can construct the approximate logistic function by Algorithm 4. In addition, we refer to the nonzero coefficients $a_0 = \frac{1}{2}$, $a_1 = \frac{1}{4}$, $a_3 = -\frac{1}{48}$, $a_5 = \frac{1}{480}$, $a_7 = -\frac{17}{80640}$, and $a_9 = \frac{31}{1451520}$.

Input: a training data
Output: logistic value of w.r.t the Taylor expansion method
(1)Convert coefficient into arrays
(2)Construct power series of x to 9th power
(3)Multiply with corresponding power of x
(4)Add all the derived terms in step 3
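
A sketch of Algorithm 4 with our primitives; copy_word, add, and multiply_signed are assumed l-bit helpers built from the Section 4 circuits (allowing in-place outputs), and the coefficient encoding and rescaling details are elided:

    // Evaluate the degree-9 Taylor polynomial of the sigmoid on encrypted x.
    // coeff[r] holds the fixed-point-encoded coefficient a_r (a_r = 0 for even r > 0).
    void sigmoid_taylor(LweSample* out, const LweSample* x, LweSample* const* coeff,
                        int l, const TFheGateBootstrappingCloudKeySet* bk,
                        const TFheGateBootstrappingParameterSet* params) {
        LweSample* xpow = new_gate_bootstrapping_ciphertext_array(l, params);
        LweSample* term = new_gate_bootstrapping_ciphertext_array(l, params);
        copy_word(xpow, x, l, bk);                                // running power, x^1
        copy_word(out, coeff[0], l, bk);                          // a_0 = 1/2
        for (int r = 1; r <= 9; r++) {
            multiply_signed(term, coeff[r], xpow, l, bk, params); // a_r * x^r
            add(out, out, term, l, bk, params);                   // accumulate the series
            if (r < 9)
                multiply_signed(xpow, xpow, x, l, bk, params);    // next power of x
        }
        delete_gate_bootstrapping_ciphertext_array(l, term);
        delete_gate_bootstrapping_ciphertext_array(l, xpow);
    }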

Figure 7 illustrates the approximated logistic function with respect to the Taylor series expansion. Our approach guarantees a narrower valid interval than the existing literature [18]. This is because the 4th and 5th nonzero coefficients round to 0 at the input length designated by 32 bits. This can be solved by assigning a larger length to represent the coefficient numbers.

6.2.2. Least Square Approximation

Kim et al. and Cheon et al. proposed a least square polynomial that broadens the bounded domain of the Taylor series expansion [19, 20]. The underlying principle is to derive a polynomial $p$ that minimizes the mean squared error (MSE) such that $\mathrm{MSE} = \frac{1}{|I|}\int_{I}\left(\sigma(x) - p(x)\right)^2 dx$, where $|I|$ denotes the length of the interval $I$.

We omit an algorithm for implementing the least square approximation in our scheme since it follows a procedure similar to Algorithm 4. A visual comparison of the real sigmoid function with our approach and with the existing literature [18] is given in Figure 8, verifying that our FHE scheme can approximate the desired function as well as the existing literature.

6.2.3. FHE Gradient Descent Algorithm

When the logistic function is designed either by the Taylor series or the least square approximation technique, we are able to perform the gradient descent algorithm for parameter estimate. The process of logistic regression is indicated in Algorithm 5.

Input: training data
Output: parameter θ
(1)Set parameter θ to 0
(2)Assign learning rate α and iteration number p, respectively
(3)Calculate partial derivative of cost function for the training data
(4)Multiply α by the previous outcome
(5)Update θ by the result of step 4
(6)Repeat steps 3 to 5 p times to obtain θ
6.2.4. Performance Evaluation of the FHE Logistic Regression

We implemented the logistic regression with the two strategies mentioned previously. From Algorithm 4, we claim that the number of data (N), the length of data (l), the dimension (d), and the number of iterations (p) are the principal factors of the time complexity (T) of both the Taylor series and least square approximation methods.

Since the experiment requires a fairly significant amount of time, we set the number of data, the dimension, and the number of iterations to 10, 2, and 1, respectively. The summary of the time performance with 16-bit data is given in Table 5.

6.3. kNN Classifier

The bitwise FHE method of implementing the kNN algorithm in Algorithm 6 is very similar to that of the plaintext, except for the sorting operation, which is described in the next section. The conventional kNN algorithm uses the Euclidean distance between data, but our algorithm replaces it with the sum of absolute differences for speed efficiency. Also, when sorting the calculated distances, we search for only the k smallest values to reduce the computation time. As shown in Algorithm 6, we need to design two additional homomorphic operations for the homomorphic kNN classifier: the sort of Algorithm 7 and the conditional swap of Algorithm 8.

Input: training data (X, Y, l), test data (x, y), and the number of neighbors, k
Output: test label
(1)Calculate distance with training data (X, Y) and test data (x, y) with absolute value operation.
(2)Sort the smallest k distance using conditional swap operation on selection sort algorithm.
(3)Output the majority label among the labels of the nearest k data.
Input:
Output: SORT()
(1)for do
(2)for do
(3)  COND_SWAP()
(4)end for
(5)end for
(6)return
Input: , , S
Output: COND_SWAP()
(1) L_COMP(): large comparison
(2)
(3)
(4)
(5)return

When sorting is completed, we check the labels of the nearest k data and output the majority label. Since the labels are also encrypted, it is not possible to know which label is the most frequent. In order to attain the most frequent label, we first count the number of data with the same label. Since the counting numbers are encrypted, we perform the equivalent comparison operation of a label against the other labels. Lastly, we add all the output numbers and sort them in descending order to pick the largest number, which corresponds to our desired label. Algorithm 9 represents the pseudocode that finds the majority label among the labels of the k-nearest data in our kNN algorithm.

Input:
Output:
(1)for do
(2)
(3)for do
(4)  if then
(5)   
(6)  else
(7)   
(8)  end if
(9)  
(10)end for
(11)end for
(12) for
(13)return
6.3.1. Sorting for kNN Algorithm

The kNN algorithm in the encrypted domain requires a sorting algorithm to find the nearest neighbors, so we design a new sort algorithm for ciphertexts. Algorithm 7 represents the pseudocode that sorts the numbers in arr[n] by the selection sort algorithm.

A swap operation on ciphertexts that simply exchanges locations would behave as in plaintext, but to apply the selection sort algorithm to ciphertexts, we must decide whether to relocate the elements through a large or small comparison. Therefore, we input a factor that determines whether to swap or not, and we call this operation conditional swap.

6.3.2. Conditional Swap

Conditional swap operation runs swapping if a determining factor is E[1]. Otherwise, data are not swapped. In the selection sort algorithm, if arr[i] is bigger than arr[j], it has to be swapped. Therefore, it outputs E[1] through a large comparison operation and puts it into the factor to decide whether to swap or not. If arr[i] is less than or equal to arr[j], swap will not occur because it outputs E[0] through the large comparison operation. Algorithm 8 represents the conditional swap pseudocode that takes this situation into consideration.

When S is E[1], arr[i]′ = (¬S ∧ arr[i]) ∨ (S ∧ arr[j]) evaluates to arr[j], and arr[j]′ = (S ∧ arr[i]) ∨ (¬S ∧ arr[j]) evaluates to arr[i]; thus, the swap occurs. The other way, in the case S = E[0], arr[i]′ evaluates to arr[i] and arr[j]′ to arr[j]; thus, no swap occurs.
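
A direct sketch of this operation on l-bit words; the MUX formulation is logically equivalent to the AND/OR expression above but uses the faster multiplexer gate:

    // Conditionally exchange the l-bit words u and v, controlled by encrypted S.
    void cond_swap(LweSample* u, LweSample* v, const LweSample* S, int l,
                   const TFheGateBootstrappingCloudKeySet* bk,
                   const TFheGateBootstrappingParameterSet* params) {
        LweSample* t = new_gate_bootstrapping_ciphertext(params);
        for (int i = 0; i < l; i++) {
            bootsMUX(t, S, &v[i], &u[i], bk);      // t   = S ? v_i : u_i
            bootsMUX(&v[i], S, &u[i], &v[i], bk);  // v_i = S ? u_i : v_i
            bootsCOPY(&u[i], t, bk);               // u_i = t
        }
        delete_gate_bootstrapping_ciphertext(t);
    }

Note that both branches produce fresh ciphertexts, so an observer cannot tell from the outputs whether a swap actually occurred.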

6.3.3. Performance Evaluation of the FHE kNN Classifier

We set the number of data (N), the dimension of data (d), the length of data (l), the number of nearest neighbors (k), and the length of the label (L) as factors of the kNN algorithm. Then, the time complexity of the kNN algorithm (T) can be expressed as a function of these factors.

We set the initial values to (N, d, l, k, L) = (64, 1, 10, 3, 1) for the experiment; with these values, the run took 226 seconds. Then, we performed the experiment by changing the value of each factor one by one. As a result, because the algorithm consists solely of operations linear in every factor except k, we confirmed that the execution time of the algorithm is almost proportional to the value of each factor.

6.4. k-Means Algorithm for Image Segmentation

We also performed gray-color image segmentation using the k-means algorithm. The target image for the homomorphic segmentation has an 8-bit gray color at each pixel, and the k-means algorithm takes the encrypted color values of the pixels as input. To do this, the cloud server first obtains the encrypted values of the pixels at N random locations, rather than all encrypted pixels, for efficient computation. Afterwards, the k-means algorithm is applied to partition the N encrypted pixels into k clusters, and the cloud server calculates the representative values of the k clusters homomorphically. After deciphering the representative values on the client's side, the colors of all the pixels in the image are compared with the representative values, and image segmentation is performed by replacing each color with the representative value of the nearest cluster. Our k-means algorithm is given in Algorithm 10; we performed the algorithm by expanding the total data size to 10 bits, considering the 8-bit original data, the sign bit, and the addition operation.

Input: data X, the number of neighbors k, and the initial value of clusters
Output: labeled data (X, l)
(1)Obtain the distance between each cluster and the data X.
(2)For each datum, assign the label of the closest cluster.
(3)Update each cluster's representative value by averaging the data with the same cluster label.
(4)Repeat steps 1 to 3 to obtain converged clusters and return the labeled data.

In general, the Euclidean distance is used when calculating the distance between two points. In this experiment, however, another method can be used because the data are one-dimensional: after calculating the midpoints between the representative values of the clusters, labeling is performed by comparing the data with these midpoints, without computing distances.

Since the values of the data are not known in the encrypted state, when the representative values of the clusters are given, it is not known which representative value is closest to each datum. However, we can set an encrypted label that distinguishes whether a datum is nearest to the representative value of a given cluster; let E[1] and E[0] denote membership and nonmembership, respectively. Through an AND operation of each data bit with its label, we divide all data into zero words, all of whose bits are E[0], and unchanged nonzero values (if a datum's label is E[0], the result of the AND operation is 0; otherwise, the datum is unchanged). Now, we can obtain the average of each cluster by dividing the sum of the masked data by the sum of the labels.
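
A sketch of this masking step (label is a single encrypted bit; the summation and the final division reuse the Section 4 adder and divider):

    // Mask a data word by its cluster label: label = E[0] zeroes every bit,
    // label = E[1] keeps x unchanged (as fresh ciphertexts).
    void mask_by_label(LweSample* out, const LweSample* x, const LweSample* label, int l,
                       const TFheGateBootstrappingCloudKeySet* bk) {
        for (int i = 0; i < l; i++)
            bootsAND(&out[i], &x[i], label, bk);
    }

Summing the masked words over all data and dividing by the sum of the label bits then yields the new representative value of the cluster.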

6.4.1. Performance Evaluation of the FHE k-Means Clustering Algorithm

We set the number of data (N), the length of data (l), the number of clusters (k), and the number of iterations of the algorithm (p) as factors for the k-means algorithm. Then, the number of gates (T) can be expressed as a function of these factors.

The algorithm took 148 seconds given the initial values (N, l, k, p) = (64, 10, 3, 1). The experiment was then set up with the image in Figure 9(a) as input, where the parameters were given as 64, 10, 3, and 10, respectively. The representative values for the three clusters were recorded as 23, 56, and 170. The experiment took approximately 1,500 seconds, and the result of the segmentation can be seen in Figure 9(b).

7. Discussion

In this section, we describe the limitations of our proposed approach in practice. The main concern is that its computation is extremely slow and requires a large memory space.

Currently, it is true that bit-based schemes are inefficient in terms of speed and memory compared to integer-based schemes. However, integer-based schemes have the fatal disadvantage that the possible operations are limited and can only be used for specific algorithms. This is a fundamental problem that is hard to overcome. On the other hand, the speed of computation, which is the disadvantage of bit-based schemes, can be improved much more flexibly.

Our current approach is not yet optimized, so each operation in the encrypted domain is extremely time consuming. However, this problem may be addressed by accelerating the computation with state-of-the-art techniques. For instance, the atomic operations can be implemented at the hardware level rather than the software level; FPGAs and ASICs would be good candidates for such an implementation. Additionally, we can reduce the computation time by optimizing the logic and program code at the software level. We can also save computation time using graphics processing units (GPUs) and parallel computing.

8. Conclusion

In this paper, we have proposed basic homomorphic arithmetic operations using bitwise homomorphic gates. We applied these bitwise homomorphic operations to several well-known data mining techniques: linear regression, logistic regression, the k-NN classifier, and k-means clustering. To implement the algorithms, we introduced advanced bitwise operations such as sorting and conditional swap, which are specific to bitwise homomorphic computation. With our proposed bitwise homomorphic atomic and additional operations, even data scientists without any knowledge of FHE can easily analyze and process data in the encrypted domain.

Data Availability

The training data and image data used to support the findings of this study have not been made available because they are artificially created and small enough for the reader to easily recreate.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was supported by the Institute for Information and Communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (no. 2017-0-00545).