Abstract

An increasing amount of active research is being conducted to protect the locations of mobile device users. To hide their location, users must tune to more data than they would like. In particular, if a user issues a nearest-neighbor (NN) query, the number of objects the user must receive may increase. Several studies have been proposed to solve these problems. However, problems such as errors and increased query processing times have been identified during query processing. When the tuning time increases, the amount of data to download and the battery consumption of the client also increase. In this study, we propose the Privacy-preserving Spatial Index (PSI), an index that allows users to reduce their tuning time while still obtaining satisfactory query results. The querier (q) requests from the server the objects in an area that protects his/her location. The server sends the requested data of points of interest (POIs) (DPOIs) in the Privacy-preserving Region (PR) to q. Finally, q reduces tuning time by selectively tuning to the desired data of POIs (Dw) through PSI. Experiments show that PSI outperforms previous techniques.

1. Introduction

With the rapid recent growth in the use of mobile devices, the use of location-based services (LBS) based on GPS has also increased. LBS refers to the various information services that an LBS server provides based on the location of mobile users, such as finding nearby points of interest (POIs), navigation, location tracking, and maps [1-5].

However, users must reveal their location information in order to access LBS. When the location information of a user is sent to the server, the server can precisely identify the location of the user. If the server is hacked or otherwise abused, the location of the user can be revealed, potentially causing serious damage. Active research is therefore being conducted on the protection of user locations [6-10]. The D-Bcast model was proposed to enable clients that have not received data from the main server, or clients that have moved from other main servers, to listen to data effectively [6]. The method proposed in [7] stores received data in a cache and reuses them in order to minimize the exposure of user query data to the unreliable LBS server. Reference [9] proposes a method that makes the user location ambiguous through location anonymity. However, [10] points out that location anonymity is also an extension of the server and cannot be trusted. The cloaking method can prevent the exposure of the specific location of a user, such as building information, because queries are sent from a generalized area that includes the user instead of from the user's specific location. If the location of a user is continuously revealed to the server, the movement path of that user can be exposed [11-16]. For example, assume that a mobile user sends queries along a certain path from a starting point to a destination. The server can predict the moving path of the user by connecting the locations of the query points from the starting point to the destination. If this path is revealed to a malicious attacker, the living pattern as well as the home and work addresses of the user can be revealed, along with other information such as the hospitals that the user has visited. This can lead to privacy issues.
Therefore, when using LBS it is critical to protect both location information in general and trajectory information, which is a set of location data. However, in order to hide their location, users must increase the number of areas or paths involved in query processing and tune to more data than they would like. If the tuning time increases, the data to be downloaded by the client and the battery consumption increase. Recently, methods that decrease the number of objects that need to be checked while protecting the location and query of a mobile user have been proposed [17-20]. Reference [18] proposes a method that addresses the needs of both users and servers: users want to hide their location, and the server may not want to process the queries of all users. A system model that satisfies both was therefore developed by introducing a mobile service provider that gathers the queries of multiple users. Reference [20] considers various types of objects in a method for supporting effective approximate kNN queries. This method does not have to reveal the accurate location of the querier (q) to the server because the query is requested from the center of a cell of the grid into which the server divides the map, instead of from the precise location of the q. Because the query is requested from the center of a grid cell, however, an error may occur relative to a query made from the actual location of the q. When the q provides cloaking regions (CRs), for example CR1 and CR2, and sends a query Q to the server instead of his/her location, the server returns the objects produced by Response Generation (RG), RG(Q, CR1,2), thereby protecting the location of the q. However, when the objects are searched from the center of each CR, an error may occur, as shown in Figure 1.

Figure 1 shows that the entire map has been quartered into CR1 to CR4 and that three objects exist in each CRi. Assume that, to protect his/her location, the q selects CR1 and CR2 and looks for the two objects closest to himself/herself. With the method outlined in [20], the two objects are searched from the center of each CRi, so the results obtained for CR1 and CR2 can differ from the result obtained when the two objects are searched from the actual location of the q. Furthermore, the q who receives RG from the server additionally requires a decryption process for the encrypted data, and the probability of exposing the q’s location increases with k-anonymity and the grid size. In addition, as anonymous servers are removed, users should create anonymous zones themselves to protect their location, which is feasible because the computing performance of users’ smart devices has improved in recent years. Where privacy is becoming an issue, it is also considered more important to remove middleware and to be 100% reliable in terms of location protection. On the other hand, if the number of candidate objects that users receive from the server increases, the cost of exploring the objects in the query results may increase. Thus, in this study, we propose the Privacy-preserving Spatial Index (PSI), which allows for the selective tuning of required object data while protecting the locations of users. As far as we know, this is the first study that enables the tuning of object data in a location protection method. The key contributions of this study are as follows:
(i) There are many methods to protect user location. We propose PSI, which is a general index structure that can support the existing methods.
(ii) Since the user directly makes an anonymous request, no third party can expose his or her information.
(iii) If users set a large query region to protect their location, they must receive data for all of the objects in the set range. On the other hand, if the query range is narrowed in order to reduce the amount of received data, the probability of the user location being exposed can increase. Therefore, we reduce the tuning time by selectively tuning to the object data that must be received from the server.
(iv) We have shown through experimentation that the proposed method exhibits better performance than the existing location protection methods.

This paper is organized as follows: Section 2 describes related work on the protection of locations and trajectories. Section 3 gives the background of the PSI, Section 4 describes our model, and Section 5 proposes various queries using the PSI index. Section 6 compares the performance of the PSI with that of the existing methods through experimental results. Finally, Section 7 outlines the conclusions.

2. Related Work

LBS queries are generally classified as either snapshot queries or continuous queries [1-14]. For snapshot queries, methods that use k-anonymity to protect user privacy have recently been proposed [5-8]. Reference [8] constructs a CR by combining the q with k-1 other users and then sends the CR to the LBS server instead of the actual location of the q. Reference [19] proposes a dynamic grid system (DGS) that allows users to protect their personal information. Through the DGS, users can protect their location within the grid radius from unreliable servers through a process in which encrypted queries are sent to the query server and the content of the queries is transmitted to the LBS server. However, the encryption and decryption between users and servers can increase the query processing time. Reference [20] addresses the problems that can occur when clients are grouped by k and then move, in a technique that protects the locations of users. However, this method has a problem because users must obtain consent from surrounding clients and because the movement time and direction need to be considered. Reference [21] proposes a method for protecting the locations of users with dummies, using an enhanced dummy location selection scenario. However, it is difficult to apply to continuous queries because it considers only snapshots. Reference [22] proposes a method of efficiently placing k dummies to protect the locations of users. However, the dummies may be concentrated at the center if they are placed only by angles that depend on the number of dummies. Reference [23] addresses the case in which dummies are created randomly while users are moving, and suggests a method that prevents dummies from being generated in arbitrary directions while users move in certain directions. The proposed method prohibits the users from moving out of a specific range using the radius d. However, even if dummies are generated within the radius d, they are likely to be generated in zigzags that contrast with the moving path of the users, so the locations of users may still be exposed. Reference [24] protects the user information in continuous LBS based on the method of [13]. However, the probability of protecting the user location may decrease because the method does not consider various situations (obstacles) during the generation of dummies.

Aside from that method, there has been research into methods using dummies [20-23] as well as into the encryption of user information [24, 25]. However, the above studies require middleware (hereinafter referred to as an “anonymous server”).

Because k-anonymity relies on an anonymous server, client information can be revealed if a third party attacks the anonymous server. To address this, a k-anonymity method was proposed that uses a peer-to-peer (P2P) process instead of the anonymous server [26]. Although the privacy level is high because users communicate among themselves without the use of an anonymous server, personal privacy can still be compromised because other users cannot be fully trusted.

Reference [20] proposed a method for supporting effective approximate kNN queries. The query process of this method consists of three steps: Query Generation (QG), Response Generation (RG), and Response Retrieval (RR). In QG, the q sends a query to the server. QG is equal to (Q, s), where Q includes the CR, the n×n cells, the m POI types (t), the location of q (i, j), and the number of objects to be found, and s is used to protect Q. In RG, the server receives (Q, s) from the q and sends the objects that satisfy the query (R) to the q from the database (D) in which the POIs are stored; this is referred to as RG(Q, D). Finally, RR outputs k objects from the R received from the server, considering the k and t requested by the q; this is referred to as kNN = RR(R, s).

Continuous queries are queries sent continuously to the LBS server in real time until the user reaches the destination. They consist of multiple snapshots, and connecting the locations of the snapshots yields the trajectory of the user.

Cloaking methods used to protect continuous queries or the trajectory of the user include the k-anonymity method and the dummy trajectory creation method. The trajectory k-anonymity method retrieves trajectories similar to that of the q from the database stored in the anonymous server and groups the q with the locations of k-1 other users. Queries are then randomly made. However, this method requires an anonymous server, and there must be other users near the query location. If the other users are somewhat far away, the CR becomes large and the amount of searched data increases, lowering the query processing efficiency.

3. Background

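The PSI consists of α, β, either a bitmap or a list of object cell coordinates, and a data arrival time table. α and β denote the numbers of divisions of the x-axis and y-axis, respectively. The bitmap covers the α×β grid, and the cell-coordinate list records the coordinates of the cells that contain objects. If one of the two is larger than the other, the server can provide the smaller one. The sizes of the bitmap and of the cell-coordinate list are measured by (1) and (2).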

In (1) and (2), the Privacy-preserving Region (PR) is the range requested by the user, and the number of POIs in the PR is also taken into account. The server provides either the bitmap or the cell-coordinate list to the q based on the size of the PR and the number of POIs in it.
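The trade-off between sending a bitmap and sending object cell coordinates can be illustrated with a minimal sketch. The encodings below (one bit per grid cell for the bitmap, and one fixed-width (i, j) pair per object for the coordinates) are assumptions made for illustration; they are not the paper's equations (1) and (2), and the function names are hypothetical.

```python
def bitmap_size_bits(alpha: int, beta: int) -> int:
    """Assumed bitmap encoding: one bit per cell of the alpha x beta grid."""
    return alpha * beta


def coord_list_size_bits(alpha: int, beta: int, n_pois: int) -> int:
    """Assumed coordinate encoding: one fixed-width (i, j) pair per object."""
    bits_i = max(1, (alpha - 1).bit_length())  # bits needed for 0 <= i < alpha
    bits_j = max(1, (beta - 1).bit_length())   # bits needed for 0 <= j < beta
    return n_pois * (bits_i + bits_j)


def choose_representation(alpha: int, beta: int, n_pois: int):
    """Return the smaller of the two representations and its size in bits."""
    b = bitmap_size_bits(alpha, beta)
    c = coord_list_size_bits(alpha, beta, n_pois)
    return ("bitmap", b) if b <= c else ("coordinates", c)
```

Under these assumptions, a dense 4×4 grid with 12 objects favors the bitmap, whereas a sparse 100×100 grid with 10 objects favors the coordinate list.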

The purpose of our study is to protect the location of users from the server and to enable effective data tuning. The query process is divided into three steps as follows.

Spatial query (SQ): to protect his/her location, the user sets a PR based on his/her current location and the map data that he/she has. The q then requests the POIs included in the PR from the server.

Privacy-preserving Spatial Index (PSI): the server manages the dataset (D) of every POI in the map. The server also divides the PR by α for the x-axis and by β for the y-axis (α = β is possible, depending on the distribution of objects). A cell coordinate (i, j) is set for each grid cell. Each grid cell can hold at most one object; bit 1 is saved if the cell has an object, and bit 0 is saved otherwise. Figure 2 shows how the bit sequence of the bitmap is set based on the cell coordinates. The ranges of i and j are 0 ≤ i < α and 0 ≤ j < β.

If the data sizes of the POIs are identical, the data arrival time can be confirmed through the PSI. If the data sizes of the POIs differ, the data arrival time table is additionally configured. The size of each entry in the data arrival time table is assumed to be identical. Finally, the data of all POIs in the PR are sent to the q.

Dataset to the SQ (DSQ): the q first receives the PSI and selects the desired objects through it. The q can then confirm the locations and sending times of the desired POIs through the PSI. Thus, the q selectively tunes to only the desired data (Dw) among the DPOIs.
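The server-side step of the process above, in which the PR is divided into a grid and one bit is saved per cell, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the row-major bit order, the rectangular PR given as (x_min, y_min, x_max, y_max), and the function name build_bitmap are all assumptions.

```python
def build_bitmap(pr, alpha, beta, pois):
    """Divide the PR into alpha x beta cells and mark the cells that hold a POI.

    pr   : (x_min, y_min, x_max, y_max), an assumed rectangular PR
    pois : iterable of (x, y) POI locations
    Returns a bit string in row-major order (an assumed ordering).
    """
    x_min, y_min, x_max, y_max = pr
    cell_w = (x_max - x_min) / alpha
    cell_h = (y_max - y_min) / beta
    bits = [0] * (alpha * beta)
    for x, y in pois:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # POIs outside the PR are not part of the answer
        # Clamp so that points on the upper boundary fall into the last cell.
        i = min(int((x - x_min) / cell_w), alpha - 1)
        j = min(int((y - y_min) / cell_h), beta - 1)
        idx = j * alpha + i
        if bits[idx]:
            # The text assumes at most one object per cell.
            raise ValueError("two POIs in one cell; a finer grid is needed")
        bits[idx] = 1
    return "".join(str(b) for b in bits)
```

For example, with a PR of (0, 0, 4, 4), α = β = 2, and POIs at (0.5, 0.5) and (3.0, 3.0), the sketch yields the bit string "1001".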

4. Our Model

4.1. Our System Model

As shown in Figure 3, the basic system model is composed of a mobile device, a positioning system, and a single LBS server. If an attacker attacks the LBS server or if the LBS server is unreliable, various pieces of information about the q can be exposed. Therefore, protecting the location information is critical in LBS.

The existing system is composed of an LBS server, an anonymous server, and mobile users. However, the anonymous server cannot be trusted. The anonymous server is a single point of failure, and if it is attacked, some or all services will fail. In general, the q sends his/her location information to the LBS server to receive information, and this causes the problem of location exposure. Therefore, we assume that the q acquires map information from the LBS server through the broadcast method. The advantage of the broadcast method is that the client can obtain map information without exposing its location information.

The server manages the locations and other information (e.g., price, discount, advertisement) of objects that the q does not have (e.g., gas stations, hotels, restaurants). For example, the q creates a PR based on his/her own location, which is confirmed through the map information stored in the terminal and through satellites. The q then requests the locations and prices of the nearby gas stations in the PR. The server provides the locations and other information of the gas stations that exist in the PR requested by the q. If ten gas stations are returned and the q selects only the nearest one, he or she can tune to the data of just that one gas station among the ten.

4.2. PSI Index Structure and Query Process

The PSI structure is composed of α, β, either the bitmap or the cell-coordinate list (which one is used varies according to the number of objects), and the data arrival time table, as shown in Figure 4.
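One possible in-memory layout for this structure is sketched below. The field names, the use of byte offsets as "arrival times", and the helper functions are assumptions made for illustration; the sketch covers only the bitmap variant and leaves out the cell-coordinate alternative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PSI:
    alpha: int                                 # divisions of the x-axis
    beta: int                                  # divisions of the y-axis
    bitmap: str                                # one bit per cell (assumed row-major)
    arrival_table: Optional[List[int]] = None  # per-POI offsets; None if sizes are identical


def build_arrival_table(poi_sizes: List[int]) -> List[int]:
    """Offset of each POI in the broadcast, as a running sum of the preceding sizes."""
    offsets, position = [], 0
    for size in poi_sizes:
        offsets.append(position)
        position += size
    return offsets


def arrival_offset(psi: PSI, frame_no: int, poi_size: Optional[int] = None) -> int:
    """Offset at which POI number frame_no (0-based) starts to arrive."""
    if psi.arrival_table is not None:
        return psi.arrival_table[frame_no]     # POI sizes differ: look up the table
    if poi_size is None:
        raise ValueError("poi_size is required when no arrival table is present")
    return frame_no * poi_size                 # identical sizes: simple multiplication
```

When all POIs have the same size, the table can be omitted and the offset follows from the POI number alone, which matches the case where the arrival time is confirmed without a table.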

5. Various Queries Using the PSI Index

In this section, we introduce the query process obtained by applying PSI to the existing methods for protecting the user location. Three existing methods are mainly used, and they are defined as follows.

Definition 1 (cloaking-based spatial query (CSQ)). In general, users set an area that is equal to or greater than their desired area as the PR in order to protect their location. Users request information about the objects in the PR without providing their location to the server. The server cannot verify the location of the user because it only receives information about the PR from the user and sends only information on the objects in the PR to the user. The users have the advantage of not revealing their location, but they do have to check all objects in the PR. Meanwhile, the server incurs no additional costs (e.g., searching for the object that is closest to the user) because it does not know the user’s precise location.
Figure 5 shows an example of processing the cloaking-based spatial query using PSI.
The CSQ process is as follows:

Step 1. The q requests query results from the server via SQ. The structural elements of SQ in CSQ are as follows: First, the PR is set in a rectangular shape (this shape can vary by the request of the q). The PR of the CSQ is specified by its minimum x coordinate, minimum y coordinate, maximum x coordinate, and maximum y coordinate. The q then requests the POIs that exist in this PR from the server.

Step 2. The server searches the location-based D under its control for the POIs in the requested PR. After checking the distribution of these POIs, the server divides the PR by α for the x-axis and by β for the y-axis so that only one POI exists in each grid cell (α = β is possible, depending on the distribution of objects). The bitmap is configured through (3). Figure 4 shows a bitmap configured as “0101111110111101” according to the distribution of POIs. Finally, the server sends the PSI and the DPOIs to the q.

Step 3. The q divides the map using the PR that he/she requested as well as the α and β values of the PSI and checks the locations of the objects through the bitmap. The POIs included in the search region (SR) that the user wants to search are checked, and the frame number of each POI is checked through the PSI. Figure 5 shows the POIs included in the SR. The q can determine the POI number by summing the 1 bits along the bitmap sequence. Finally, the q selectively tunes to the PSI and the desired data (Dw) only (Algorithm 1).

Algorithm 1
Input: SQ (e.g., CSQ, p-ASQ, s-TrSQ) of q
Output: PSI and the selectively tuned POI data
Procedure:
01: The server checks the POIs in the PR requested by the SQ
02: The PR is divided by α for the x-axis and by β for the y-axis
03: Bit 1 is saved if a grid cell has an object; bit 0 is saved otherwise
04: The server computes the PSI and sends it to q
05: The q checks the PSI
06: The q checks the locations of the objects through the bitmap
07: The q can check the POI number by summing the 1 bits along the bitmap sequence
08: The q selectively tunes to the PSI and the desired data (Dw)
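The client-side steps of the procedure above can be sketched with the example bitmap “0101111110111101”. The sketch assumes α = β = 4, a row-major bit order, 0-based frame numbers, and a hypothetical search region of four central cells; none of these details are confirmed by the text, and the function name is illustrative.

```python
def frames_in_search_region(bitmap, alpha, beta, sr_cells):
    """Return the 0-based POI (frame) numbers whose grid cell lies in the search region.

    The POI number of a cell is the count of 1 bits that precede it in the
    bitmap sequence, so the client never has to inspect the POI data itself.
    """
    if len(bitmap) != alpha * beta:
        raise ValueError("bitmap length must equal alpha * beta")
    wanted = set(sr_cells)
    frames = []
    frame_no = 0
    for idx, bit in enumerate(bitmap):
        if bit != "1":
            continue                          # empty cell: no POI, no frame
        i, j = idx % alpha, idx // alpha      # assumed row-major layout
        if (i, j) in wanted:
            frames.append(frame_no)
        frame_no += 1                         # every 1 bit consumes one frame
    return frames


# Hypothetical search region: the four central cells of the 4 x 4 grid.
sr = [(1, 1), (2, 1), (1, 2), (2, 2)]
selected = frames_in_search_region("0101111110111101", 4, 4, sr)
```

Under these assumptions, the client would tune only to frames 3, 4, and 7 out of the 12 POIs broadcast for the PR and could stay idle for the remaining frames.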

Definition 2 (p-Anonymity-based Spatial Query (p-ASQ)). We define p-anonymity in order to prevent confusion with the k in kNN and k-anonymity. p is the number of query locations that the q provides to the server to obfuscate his/her location: the location of the q together with p-1 virtual locations, which the server cannot distinguish from one another. As proposed in [19], we also assume that the query is sent to the server with the location of the q and the p-1 virtual locations set in the grid area. The size of the area needed to guarantee the accuracy of the query result when a query is requested based on the grid is expressed by (5). In (5), r generates a circle based on the longer of the x-axis and y-axis lengths. All the grid cells included in this circle form the area in which the POIs that the q wants will exist.
Figure 6 shows an example of p-ASQ processing using PSI.
The process of p-ASQ is as follows.

Step 1. The q requests query results from the server via SQ. The structural elements of SQ in p-ASQ are as follows: First, the locations of the q and of the p-1 virtual points are specified, and each PR is set in a rectangular shape. The PR of p-ASQ consists of its extent along the x-coordinate and its extent along the y-coordinate. After this extent is randomly set based on the p points requested by the q, p PRs are created and k POIs are requested.

Step 2. The server verifies the PRs requested by the q against the location-based D under its control. Then, for the accuracy of the query result, the server adds δ to the x- and y-extents of each PR. Then, k POIs are searched for based on the p grids.

In (6), λ denotes the number of additional POIs included in the expanded PR. If k includes λ, then λ = λ - k. σ denotes the POIs that overlap among the k POIs of the p PRs.

The server sets an area that includes the p PRs and divides the map according to the distribution of POIs (the same process as for CSQ). The server finally sends the PSI and the DPOIs to the q.

Step 3. The q verifies the coordinates of this area through the PSI received from the server and divides the corresponding map by the α and β values. The locations of objects are verified through the bitmap. After the k POIs are verified based on the q’s own location, the frame number of each POI is verified through the PSI. Finally, the q selectively tunes to the PSI and the desired data (Dw) only (Algorithm 1).
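Step 1 of p-ASQ, in which the q hides among p-1 virtual points and reports grid-aligned PRs rather than exact positions, can be sketched as follows. Uniform random placement of the virtual points, the fixed cell size (dx, dy), and both function names are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def make_query_points(q, p, map_bounds, rng=None):
    """Return p points: the real location q plus p-1 uniformly random virtual points."""
    rng = rng or random.Random()
    x_min, y_min, x_max, y_max = map_bounds
    points = [q]
    for _ in range(p - 1):
        points.append((rng.uniform(x_min, x_max), rng.uniform(y_min, y_max)))
    rng.shuffle(points)  # the position in the list must not reveal which point is real
    return points


def cell_pr(point, dx, dy):
    """Rectangular PR: the dx x dy grid cell that contains the point."""
    x, y = point
    gx = math.floor(x / dx) * dx
    gy = math.floor(y / dy) * dy
    return (gx, gy, gx + dx, gy + dy)


# Usage: five PRs are sent to the server; only the client knows which one holds q.
points = make_query_points((12.5, 7.5), 5, (0, 0, 100, 100), random.Random(0))
prs = [cell_pr(pt, 10, 10) for pt in points]
```

Because every point is reported only as the grid cell that contains it, the server sees p rectangles of identical size and cannot tell from their geometry which one holds the real q.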

Definition 3 (s-Trajectory based Spatial Query (s-TrSQ)). s-TrSQ sets the path from the starting location to the ending location along which the user will query. The trajectory of the q and the s-1 additional trajectories tri are all assumed to have the same trajectory distance. To prevent the exposure of his/her own trajectory, the q additionally creates s-1 tri and then sends a query to the server. The trajectories are connected through nodes (n). The server cannot distinguish between the trajectory of the q and the tri. Therefore, the server sends the query result to the q based on the Tr that the q requested.

Figure 7 shows an example of s-TrSQ processing using PSI.

The s-TrSQ process is as follows.

Step 1. The q requests query results from the server via SQ. The structural elements of SQ in s-TrSQ are as follows: First, the trajectory of the q is set. To create the s-1 trs other than his/her own, the q sets a creation range and randomly sets s-1 trs within it. As shown in Figure 7, the q sets the search range based on the Tr that he/she created, sends it to the server, and then requests the POIs in the search range of this Tr.

Step 2. The server searches the location-based D under its control for the POIs in the requested search range. After checking the distribution of these POIs, the server divides the PR by α for the x-axis and by β for the y-axis so that only one POI exists in each grid cell. The server configures the bitmap from the area where the divided grid and the search range of the Tr overlap. Finally, the server sends the PSI and the DPOIs to the q.

Step 3. The q divides the map using the search range that he/she requested and the α and β values of the PSI, and then checks the locations of the objects through the bitmap. The POIs included in the search region that the user wants to search are checked, and the frame number of each POI is checked through the PSI. Figure 7 shows the POIs included in the search region. The q can check the POI number by summing the 1 bits along the bitmap sequence. Finally, the q selectively tunes to the PSI and the desired data (Dw) only.
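The creation of the s-1 additional trajectories in Step 1 can be sketched as below. The sketch simplifies each trajectory to a straight segment with a random start point and direction, whereas the text connects trajectories through nodes; the equal-length constraint follows Definition 3, and the function name and parameters are hypothetical.

```python
import math
import random


def make_trajectories(tr_q, s, creation_range, rng=None):
    """Return s trajectories: the real one plus s-1 dummies of the same length.

    tr_q           : ((x_start, y_start), (x_end, y_end)), the trajectory of q
    creation_range : (x_min, y_min, x_max, y_max) for the dummy start points
    Dummy end points may fall outside the creation range in this simple sketch.
    """
    rng = rng or random.Random()
    (x0, y0), (x1, y1) = tr_q
    length = math.hypot(x1 - x0, y1 - y0)
    x_min, y_min, x_max, y_max = creation_range
    trs = [tr_q]
    for _ in range(s - 1):
        sx, sy = rng.uniform(x_min, x_max), rng.uniform(y_min, y_max)
        theta = rng.uniform(0.0, 2.0 * math.pi)   # random direction
        end = (sx + length * math.cos(theta), sy + length * math.sin(theta))
        trs.append(((sx, sy), end))
    rng.shuffle(trs)  # hide the position of the real trajectory in the list
    return trs
```

Since all s trajectories have the same length and are shuffled before being sent, the server has no geometric cue for singling out the trajectory of the q in this simplified setting.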

6. Experimental Results

6.1. Experimental Environment

In this section, we discuss the experiments conducted for CSQ, p-ASQ, and s-TrSQ using PSI and compare them with the Original (Ori) CSQ, p-ASQ, and s-TrSQ. The algorithms were implemented in C++ and run on a 3.3-GHz CPU with 8 GB of main memory. To evaluate the performance, we assumed the basic parameter settings shown in Table 1. We also discuss experiments conducted for CSQ, p-ASQ, and s-TrSQ using only the indexes of each method. In these experiments, each variable is set to its default value, which is shown in parentheses in Table 1, unless it is the variable being varied. Furthermore, the sizes of the bitmap and of the cell-coordinate list are configured by (1) and (2) because they vary by query type. The size of a single grid cell is assumed to be 10 m². The experimental environment comprised a server, a client in 2D space, and a wireless broadcasting channel used by the client to obtain information. Because tuning time can differ depending on bandwidth and transfer rate, the y-axis of each graph reports the data size.

6.2. Experimental Results of CSQ

The PR is 10% of the total map, and the SR is set to 20% of the PR.

In Figure 8, the x-axis variable is divided into the data sizes of 128, 256, 512, and 1024 K bytes for comparison.

Figure 8 shows the variations in tuning time according to the data size. The performance of PSI is 80% better than that of Ori-CSQ. This is because the number of POIs that PSI must search is smaller than the number that Ori-CSQ must search. Therefore, as the data size increases, the difference in tuning time increases.
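A back-of-the-envelope calculation shows why a saving of this order is plausible when the search region covers about 20% of the PR. The numbers below (100 POIs in the PR, 256 KB per POI, a 1 KB index, and uniformly distributed POIs) are hypothetical and are not taken from the paper's measurements.

```python
def tuning_sizes(n_pr, sr_fraction, poi_size_bytes, psi_size_bytes):
    """Compare the data tuned without and with an index.

    Without the index, the client tunes to every POI in the PR. With it, the
    client tunes to the index plus only the POIs that fall inside the search
    region (assumed proportional to the area under a uniform distribution).
    """
    ori = n_pr * poi_size_bytes
    with_index = psi_size_bytes + round(n_pr * sr_fraction) * poi_size_bytes
    saving = 1 - with_index / ori
    return ori, with_index, saving


ori, with_index, saving = tuning_sizes(
    n_pr=100, sr_fraction=0.2, poi_size_bytes=256 * 1024, psi_size_bytes=1024
)
```

With these illustrative values the saving is just under 80%, because the index itself is small compared with the POI data that no longer has to be received.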

Figure 9 shows the variations in tuning time according to the size of the search range desired by the q. The default settings are shown in parentheses in Table 1. The SR was set to 0%, 20%, 30%, and 50% of the size of the PR. The performance of PSI is 71.5% better on average than that of Ori-CSQ. The improvement is smaller for larger search ranges because, as the SR grows, the number of POIs in the SR increases and the tuning time of PSI therefore also increases.

6.3. Experimental Results of p-ASQ

To process p-ASQ, we set the default values listed in Table 1. In Figure 10, the default value p is 100, the data size is 256 K bytes, and the variable of the x-axis is k POIs that are closest to the q, which are divided into 10, 20, 30, and 50 for comparison.

Figure 10 shows the variations in tuning time according to the size of k. The performance of PSI is 98.8% higher than that of Ori-p-ASQ because Ori-p-ASQ must tune to all k POIs corresponding to each of the p locations.

Figure 11 shows the variations in tuning time according to the size of p. The variable of the x-axis is p, which is set to 50, 100, 200, and 300. As p increases, the tuning time of PSI stays constant, but the tuning time of Ori-p-ASQ greatly increases. In the case of PSI, only k POIs need to be calculated because the location of the q is already known to the client. With Ori-p-ASQ, however, the POIs corresponding to (6) must be tuned as p increases, which greatly increases the tuning time. Therefore, the performance of PSI is 99.2% better on average than that of Ori-p-ASQ.

6.4. Experimental Results of s-TrSQ

To process s-TrSQ, we set the default values listed in Table 1. The default trajectory length in Figure 12 is 50 km and the data size is 256 K bytes. The variable of the x-axis is the number of trajectories s, including the trajectory of the q, which is set to 10, 20, 30, and 50 for comparison.

Figure 12 shows the variations in tuning time according to the size of s. The performance of PSI is higher by 96.1% than that of Ori-s-TrSQ. This is because the Ori-s-TrSQ must tune to all grids included in s paths.

Figure 13 shows the variations in tuning time according to the length of the trajectory. The variable of the x-axis is the trajectory length, which is set to 10, 50, 200, and 300 km. As the trajectory length increases, the tuning time of PSI increases at a fixed low rate, whereas the tuning time of Ori-s-TrSQ sharply increases. In the case of PSI, only the POIs in the grid cells included in the path of the q need to be received. Therefore, the performance of PSI is 94.9% better on average than that of Ori-s-TrSQ.

7. Conclusions

In this study, we proposed PSI, which can selectively tune to only the data desired by the q while protecting the location of the q. Furthermore, we proposed PSI as a general index structure applicable to the conventional location protection methods. Finally, compared with the conventional methods, the experiments showed that selectively tuning to the data of the objects received from the server reduces both the tuning of unnecessary data and the battery consumption of the device. In the future, we plan to research a spatial query processing method that considers both the type and the location of POIs.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

Doohee Song, Moonbae Song, and Kwangjin Park declare that there are no conflicts of interest regarding the publication of this manuscript.

Acknowledgments

This paper was supported by Wonkwang University in 2018.