Abstract

Given a set of positive-weighted points and a query rectangle r (specified by a client) of given extents, the goal of a maximizing range sum (MaxRS) query is to find the optimal location of r such that the total weight of all the points covered by r is maximized. All existing methods for processing MaxRS queries assume the Euclidean distance metric. In many location-based applications, however, the motion of a client may be constrained by an underlying (spatial) road network; that is, the client cannot move freely in space. This paper addresses the problem of processing MaxRS queries in a road network. We propose an external-memory algorithm that is suited to large road network databases. In addition, in contrast to the existing methods, which retrieve only one optimal location, our proposed algorithm retrieves all the possible optimal locations. Through simulations, we evaluate the performance of the proposed algorithm.

1. Introduction

With the widespread use of mobile computing devices [17], location-based services [8] have attracted much attention as one of the most promising applications, whose main functionality is to process location-related queries on spatial databases. Most traditional research on spatial databases has focused on finding nearby data objects (e.g., range queries and nearest neighbor queries [9]) rather than on finding the best location to optimize a certain objective. Recently, the maximizing range sum (MaxRS) query was introduced in [10]. This query is useful in many location-based applications, such as finding the most representative place in a city for a tourist with a limited reachable range, or finding the best location for a pizza store with a limited delivery range. Given a set of positive-weighted points and a query rectangle $r$ (specified by a client) of a given size, the goal of a MaxRS query is to find the optimal location of $r$ such that the sum of the weights of all the points covered by $r$ is maximized.

Figure 1 shows an example of the MaxRS query with a query rectangle $r$ of fixed size, where all the points are assumed to have weight 1. In the figure, the center of the solid-lined rectangle is the optimal location of $r$ because the solid-lined rectangle covers the largest number of points (i.e., 3).

To process MaxRS queries, Choi et al. [10] proposed an external-memory algorithm, while Imai and Asano [11] proposed an internal-memory algorithm. Tao et al. [12] proposed a solution for approximate MaxRS queries, each of which retrieves a rectangle whose covered weight is at least $(1-\varepsilon)w^*$, where $w^*$ is the optimal covered weight and $\varepsilon$ is an arbitrary constant between 0 and 1. All of these studies target Euclidean space. In many real-life location-based services, however, the motion of a client may be constrained by an underlying (spatial) road network; that is, the client cannot move freely in space. Consider a tourist service, where a tourist (i.e., client) wants to find a hotel located close to as many sightseeing spots as possible (e.g., within 1.5 km walking distance of the hotel). A MaxRS query fits this scenario, but the existing MaxRS query processing methods cannot be applied to it, because the distance between the hotel and each sightseeing spot is determined by the underlying road network, and the actual distance between two locations can differ significantly from their Euclidean distance. Figure 2 illustrates this difference: the Euclidean distance between two of its locations is about 1.24, whereas traveling between them in the road network requires passing through two intermediate points, for a total length of about 3.74, roughly three times the Euclidean distance. With this problem in mind, we study, for the first time to the best of our knowledge, the problem of processing MaxRS queries in a road network, where the distance between two points is determined by the length of the shortest path connecting them (i.e., network distance [13]).

Figure 2 shows an example road network, which consists of 5 nodes (square vertices) and 7 edges. In the figure, there are 4 facilities (weighted points), each of which, denoted by $f$, is associated with a positive weight $w(f)$ indicating the importance of $f$. The numbers in parentheses next to nodes and facilities show their respective coordinates. Note that in this paper, all facilities are assumed to be located on edges of the road network. A MaxRS query in a road network is then defined as follows. Given a set of facilities and a radius $\delta$, the MaxRS query finds all the locations $p$ (on the road network) that maximize the total weight of all the facilities whose network distance to $p$ is less than or equal to $\delta$.

For the road network in Figure 2, Figure 3 shows an example of a MaxRS query with radius 1.5 (km), where the weight of each facility is 1. The network distance from every point on the segment marked in Figure 3 to each of three facilities is less than or equal to 1.5. Hence, the total weight of the facilities within network distance 1.5 of any point on this segment is 3, which is the maximum in this scenario. This segment is therefore an optimal result of the MaxRS query, and the user can choose any hotel on it.

In this paper, we propose an external-memory algorithm for MaxRS queries in a road network. The proposed algorithm is suitable for large road network databases. In addition, in contrast to the existing methods, which find only one optimal location, our proposed algorithm finds all the possible optimal locations. This can help clients with diverse interests choose their own best locations by considering additional conditions.

The remainder of this paper is organized as follows. In Section 2, the problem is formally defined, and in Section 3, the details of the proposed algorithm are provided. In Section 4, the performance evaluation results are presented. In Section 5, some related work is reviewed. Finally, Section 6 concludes the paper.

2. Problem Formulation

A road network is represented by an undirected graph $G = (V, E)$, where $V$ is a set of vertices (i.e., nodes) and $E$ is a set of edges. Let $F$ be a set of facilities, each of which, denoted by $f$, is located on an edge (in $E$) and is associated with a positive weight $w(f)$.

Definition 1 (network range and network radius). The network range $N_\delta(p)$ of a point $p$ in a road network consists of all points (in the network) whose network distance to $p$ is less than or equal to $\delta$, where $\delta$ is called the network radius of $N_\delta(p)$.

Definition 2 (MaxRS query in a road network). Given $G$, a set of positive-weighted points $F$, and a network radius $\delta$, let $N_\delta(p)$ be the network range of a point $p$ in the network and $F_p \subseteq F$ the set of facilities covered by $N_\delta(p)$. Then, a maximizing range sum (MaxRS) query in a road network finds all points $p$ (in $G$) that maximize

$$\sum_{f \in F_p} w(f).$$
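To make Definition 2 concrete, the following Python sketch evaluates the objective for one candidate location $p$ by brute force: it runs Dijkstra's algorithm from $p$ and sums the weights of the facilities within network distance $\delta$. The encodings (edges as `(u, v, length)`, locations as `(u, v, offset)` with the offset measured from `u`) and the function names are hypothetical choices for illustration. This is not the proposed algorithm, which avoids evaluating candidate points one at a time.

```python
import heapq
from collections import defaultdict

EPS = 1e-9  # tolerance for floating-point distance comparisons


def dijkstra_from_location(adj, length, loc):
    """Shortest network distance from a location on an edge to every reachable node."""
    u, v, off = loc
    dist = {u: off, v: length[(u, v)] - off}
    heap = [(d, n) for n, d in dist.items()]
    heapq.heapify(heap)
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist[n]:
            continue  # stale heap entry
        for m, w in adj[n]:
            if d + w < dist.get(m, float("inf")):
                dist[m] = d + w
                heapq.heappush(heap, (d + w, m))
    return dist


def covered_weight(edges, facilities, p, radius):
    """Objective of Definition 2 at p: total weight of facilities within `radius`."""
    adj, length = defaultdict(list), {}
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))
        length[(a, b)] = length[(b, a)] = w
    dist = dijkstra_from_location(adj, length, p)
    inf = float("inf")
    pu, pv, poff = p
    total = 0
    for (c, d, o), weight in facilities:
        # A path to a facility inside edge (c, d) enters through c or d ...
        best = min(dist.get(c, inf) + o, dist.get(d, inf) + length[(c, d)] - o)
        # ... unless p and the facility lie on the same edge.
        if {c, d} == {pu, pv}:
            o_from_pu = o if (c, d) == (pu, pv) else length[(c, d)] - o
            best = min(best, abs(o_from_pu - poff))
        if best <= radius + EPS:
            total += weight
    return total
```

For example, on a path a, b, c with two edges of length 2, a location at node b covers the facilities at offset 0.5 on (a, b) and offset 1.0 on (b, c) when $\delta = 1.5$, for a covered weight of 2.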

3. The Proposed Method

3.1. Preliminaries

In this subsection, we review the idea, discussed in [14], of transforming the max-enclosing rectangle query into the rectangle intersection query; this idea is fundamental to processing MaxRS queries in Euclidean space [10].

Definition 3 (max-enclosing rectangle query). Given a set of points $P$ and a rectangle $r$ of a given size, a max-enclosing rectangle query finds the location of $r$ such that $r$ encloses the maximum number of points in $P$.
The MaxRS query computes the total weight of the points in the rectangle, while the max-enclosing rectangle query counts them. Note that when all points have weight 1, the result of the MaxRS query equals that of the max-enclosing rectangle query.

Definition 4 (rectangle intersection query). Given a set of rectangles, a rectangle intersection query finds the region where the largest number of rectangles overlap.
Figure 4 shows an example of the max-enclosing rectangle query and the corresponding rectangle intersection query. In the transformation, every point in $P$ is replaced by a rectangle of the same size as $r$ centered at that point, so $r$ centered at a location $c$ encloses a point exactly when $c$ lies inside that point's rectangle. Hence, as the figure shows, the optimal location in the max-enclosing rectangle query can be any point in the most overlapped area (i.e., the gray area, where 3 rectangles overlap), which is the outcome of the rectangle intersection query.
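The equivalence between Definitions 3 and 4 can be checked with a small brute-force Python sketch (not the algorithms of [10], [11], or [14]): each point becomes a rectangle of the query size centered on it, the deepest overlap is found by testing candidate corners, and the resulting center encloses as many points as the overlap depth. The function names and the corner-testing shortcut are illustrative assumptions; the search costs $O(n^3)$ and is meant only for tiny inputs.

```python
from itertools import product


def rects_around(points, width, height):
    """Definition 4 input: a closed width x height rectangle centered on each point."""
    return [(x - width / 2, y - height / 2, x + width / 2, y + height / 2)
            for x, y in points]


def count_enclosed(points, center, width, height):
    """Definition 3 objective: points enclosed by r when r is centered at `center`."""
    cx, cy = center
    return sum(abs(x - cx) <= width / 2 and abs(y - cy) <= height / 2
               for x, y in points)


def max_overlap_point(rects):
    """Brute-force rectangle intersection query.

    The deepest overlap of closed axis-parallel rectangles is itself a rectangle whose
    lower-left corner is (largest left edge, largest bottom edge) of the rectangles
    involved, so testing every (left edge, bottom edge) pair finds a point inside it.
    """
    best_depth, best_point = -1, None
    for x, y in product([r[0] for r in rects], [r[1] for r in rects]):
        depth = sum(r[0] <= x <= r[2] and r[1] <= y <= r[3] for r in rects)
        if depth > best_depth:
            best_depth, best_point = depth, (x, y)
    return best_depth, best_point
```

For the points (1, 1), (2, 2), (2, 1), and (6, 6) with a 2 x 2 query rectangle, the deepest overlap has depth 3, and centering $r$ at the returned point encloses the first three points.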
Our solution is based on the above idea. Consider the example of a MaxRS query in a road network shown in Figure 5. To simplify the discussion, we use a simple road network that consists of two edges and two facilities located on them.
In this example, we assume that the weight of each facility is 1 and the network radius $\delta$ is 1. In Figure 5, the gray solid segments indicate the network range of the first facility, and the gray dotted segments indicate the network range of the second facility. Let $S$ be the set of all segments that make up the network ranges of all facilities in the road network, where each segment carries the weight of the facility that generated it. Then, we define the following two important notions for the MaxRS query in a road network.

Definition 5 (location-weight). Let $p$ be a location in the road network. The location-weight of $p$ with regard to $S$ equals the total weight of all the segments (in $S$) that cover $p$.

Definition 6 (max-segment). A max-segment $s^*$ with regard to $S$ is a segment such that every point in $s^*$ has the same location-weight $w^*$, and no point in the network has a location-weight higher than $w^*$.
By the transformation idea described above, the overlapping segment in Figure 5 is a max-segment. Because network distance is symmetric in an undirected graph, a location $p$ lies in the network range of a facility $f$ exactly when $f$ lies in $N_\delta(p)$; hence, the location-weight of $p$ equals the objective of Definition 2. Since the max-segments contain all the optimal locations (i.e., the result of the MaxRS query in the road network), we need to find all max-segments in the network to evaluate the MaxRS query.

3.2. Storage System

Similar to the disk-based storage model proposed in [13], the road network and the facility set are stored in secondary storage.

Figure 6 shows the files and indexes for the network and the facility set. In this storage model, the network (adjacency list) is stored in a flat file indexed by a B+-tree. For each node, besides the information of the node itself (i.e., node identifier and coordinates), we also store information on all its adjacent nodes, including each adjacent node's identifier and the Euclidean distance between the node and that adjacent node (e.g., the length of one edge in Figure 6 is 2.236). Similarly, the facility list is stored in a flat file indexed by a B+-tree. To support the algorithm efficiently, besides the information of each facility (i.e., facility identifier, coordinates, and weight), we store information on the edge that contains the facility, including the start node identifier, the end node identifier, and the Euclidean distance (offset) between the start node and the facility (e.g., in Figure 6, one facility lies at offset 1.0 from the start node of its edge).
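The record layouts described above can be sketched in Python as follows. Plain dicts keyed by identifier stand in for the B+-tree-indexed flat files (they provide lookup by key but not the block-oriented on-disk layout), and the class names, field names, and input tuple formats are hypothetical. Edge lengths and offsets are computed as Euclidean distances, as in the storage model.

```python
import math
from dataclasses import dataclass, field


@dataclass
class NodeRecord:
    node_id: int
    x: float
    y: float
    # (adjacent node id, edge length) pairs; the length is the Euclidean distance.
    neighbors: list = field(default_factory=list)


@dataclass
class FacilityRecord:
    fac_id: int
    x: float
    y: float
    weight: float
    start_id: int  # start node of the edge that contains the facility
    end_id: int    # end node of that edge
    offset: float  # Euclidean distance from the start node to the facility


def build_files(nodes, edges, facilities):
    """nodes: {id: (x, y)}; edges: [(u, v)]; facilities: [(id, x, y, weight, u, v)].

    Plain dicts keyed by identifier stand in for the B+-tree-indexed flat files.
    """
    node_file = {nid: NodeRecord(nid, x, y) for nid, (x, y) in nodes.items()}
    for u, v in edges:
        length = math.dist(nodes[u], nodes[v])
        node_file[u].neighbors.append((v, length))
        node_file[v].neighbors.append((u, length))
    fac_file = {
        fid: FacilityRecord(fid, x, y, w, u, v, math.dist(nodes[u], (x, y)))
        for fid, x, y, w, u, v in facilities
    }
    return node_file, fac_file
```

With hypothetical nodes at (0, 0), (1, 2), and (3, 4), the first edge has length about 2.236, and a facility at (0.6, 0.8) on the second edge has offset 1.0 from its start node.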

3.3. Main Algorithm
3.3.1. Overview

Our algorithm is based on the idea described in Section 3.1. From each facility $f$, we generate segments that cover its network range $N_\delta(f)$. Each segment generated by $f$ carries the weight $w(f)$. These segments are organized in a seg-file, which we then process to find all max-segments. The proposed algorithm consists of three main steps: (1) generating segments; (2) inserting segments into the seg-file; (3) processing the seg-file to find max-segments.

3.3.2. Generating Segments

In this step, we generate segments from all facilities in the facility flat file. For each facility $f$, we generate the segments that cover the entire network range $N_\delta(f)$. This process is described in Algorithm 1. First, we retrieve the edge that contains $f$, together with its start node and end node. Then, we generate the segments on the start node side (lines 8–16), followed by those on the end node side (lines 17–26). If the distance between $f$ and the start node is greater than or equal to the network radius $\delta$, we only need to generate one segment of length $\delta$ (lines 9-10). Otherwise, we generate the segment between $f$ and the start node (whose length equals the offset of the facility, lines 13-14) and continue generating segments from the start node with the remaining network radius by calling the function recursiveGenerateSegs (line 15), which is described in Algorithm 2. We proceed in the same way on the end node side, where the offset is replaced by the distance from $f$ to the end node (line 17). Each newly generated segment has the weight $w(f)$ and stores the identifier of $f$; this identifier is used for merging when more than one segment of $f$ is generated on the same edge. The new segments are inserted into the seg-file together with the edge that contains them. The algorithm keeps a list of the edges that have been completely processed while generating the segments of a facility (finished-edge-list); these edges are not processed again during invocations of recursiveGenerateSegs. After the segments of $f$ have been generated, the finished-edge-list is cleared before the next facility is processed (line 27).

Algorithm 1: Generating segments.
Input NF: adjacency list flat file, FF: facilities flat file, δ: network radius
(1)    Initialize the list finishedEdges
(2)    for each facility f in FF do
(3)      startN = getNode(f.startId)
(4)      endN = getNode(f.endId)
(5)      edge = (startN, endN)
(6)      finishedEdges.add(edge)
(7)      create a new node fN at the location of f
(8)      if (f.offset ≥ δ) then
(9)        create new node nN between fN and startN, dist(fN, nN) = δ
(10)       newS = new segment(fN, nN, f.weight, f.Id)
(11)       insertSegment(edge, newS)
(12)     else
(13)       newS = new segment(fN, startN, f.weight, f.Id)
(14)       insertSegment(edge, newS)
(15)       recursiveGenerateSegs(startN, endN, δ − f.offset, f)
(16)     end if
(17)     endOff = edge.length − f.offset
(18)     if (endOff ≥ δ) then
(19)       create new node nN between fN and endN, dist(fN, nN) = δ
(20)       newS = new segment(fN, nN, f.weight, f.Id)
(21)       insertSegment(edge, newS)
(22)     else
(23)       newS = new segment(fN, endN, f.weight, f.Id)
(24)       insertSegment(edge, newS)
(25)       recursiveGenerateSegs(endN, startN, δ − endOff, f)
(26)     end if
(27)     finishedEdges.clear()
(28)   end for

Algorithm 2: recursiveGenerateSegs(curN, oldN, newR, f).
Input curN: the node (vertex) to be processed, oldN: the node that has already been processed,
newR: the remaining network radius from curN, f: the original facility
(1)    neighList = curN.getNeighborList() \ {oldN}
(2)    for each neighN in neighList do
(3)      edge = (curN, neighN)
(4)      if (edge not in finishedEdges) then
(5)        if (edge.length ≥ newR) then
(6)          create new node nN between curN and neighN, dist(curN, nN) = newR
(7)          newS = new segment(curN, nN, f.weight, f.Id)
(8)          insertSegment(edge, newS)
(9)        else
(10)         finishedEdges.add(edge)
(11)         newS = new segment(curN, neighN, f.weight, f.Id)
(12)         insertSegment(edge, newS)
(13)         recursiveGenerateSegs(neighN, curN, newR − edge.length, f)
(14)       end if
(15)     else
(16)       if (newR − edge.length > 0) then
(17)         recursiveGenerateSegs(neighN, curN, newR − edge.length, f)
(18)       end if
(19)     end if
(20)   end for

After the segments from facility $f$ to the start node (or the end node) have been generated in Algorithm 1, if the network radius is greater than the distance between $f$ and that node, the generation process continues from that node with the remaining network radius (lines 15 and 25). This process, described in Algorithm 2, extends the segments over the rest of the network range $N_\delta(f)$.

In Algorithm 2, we process all edges incident to the current node (i.e., the node from which we continue generating segments). These edges are obtained from the neighbor list of the current node, excluding the previous node, which has already been processed (line 1). For each edge, we consider two situations. In the first situation, the edge is not in the finished-edge-list (line 4). If the length of the edge is greater than or equal to the remaining network radius, we only need to create one new segment that starts at the current node, runs toward the neighbor node, and has a length equal to the remaining radius; we then insert this segment into the seg-file (lines 6–8). If the length of the edge is smaller than the remaining network radius, we mark the edge as finished, create a segment covering the whole edge between the current node and the neighbor node, insert it into the seg-file, and continue generating segments from the neighbor node with the shortened network radius (lines 10–13). In the second situation, the edge is already in the finished-edge-list (lines 15–19). If the length of the edge is smaller than the remaining network radius, we only need to continue generating segments from the neighbor node with the shortened network radius (line 17). This process continues until the generated segments cover the network range of the original facility.
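The recursive expansion above can also be expressed as a shortest-path computation. The following Python sketch swaps in Dijkstra's algorithm in place of recursiveGenerateSegs and the finished-edge-list: it computes the network distance from a facility to every node and then derives, for each edge, the merged intervals lying within $\delta$ of the facility. It is an alternative formulation for illustration, not the authors' procedure; the tuple encodings and the name `generate_segments` are hypothetical, and everything is kept in memory rather than written to a seg-file.

```python
import heapq
from collections import defaultdict


def generate_segments(edges, facility, radius):
    """Segments covering the network range of one facility.

    edges: [(u, v, length)]; facility: (fac_id, weight, u, v, offset) with the offset
    measured from u along edge (u, v). Returns {(u, v): [(start, end, weight, fac_id)]},
    with start/end measured from u, one merged list of intervals per edge.
    """
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    fid, weight, fu, fv, off = facility
    flen = next(length for u, v, length in edges if (u, v) == (fu, fv))

    # Dijkstra from the facility: it reaches fu after `off` and fv after `flen - off`.
    dist = defaultdict(lambda: float("inf"))
    dist[fu], dist[fv] = off, flen - off
    heap = [(off, fu), (flen - off, fv)]
    heapq.heapify(heap)
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist[n]:
            continue  # stale heap entry
        for m, w in adj[n]:
            if d + w < dist[m]:
                dist[m] = d + w
                heapq.heappush(heap, (d + w, m))

    segs = {}
    for u, v, length in edges:
        parts = []
        if dist[u] <= radius:  # the range enters the edge from u
            parts.append((0.0, min(length, radius - dist[u])))
        if dist[v] <= radius:  # the range enters the edge from v
            parts.append((max(0.0, length - (radius - dist[v])), length))
        if (u, v) == (fu, fv):  # the range spreads directly from the facility
            parts.append((max(0.0, off - radius), min(length, off + radius)))
        parts.sort()
        merged = []
        for a, b in parts:
            if merged and a <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], b))
            else:
                merged.append((a, b))
        if merged:
            segs[(u, v)] = [(a, b, weight, fid) for a, b in merged]
    return segs
```

Because every point inside an edge is reached either through one of its end nodes or, on the facility's own edge, directly from the facility, the union of these at most three intervals per edge is exactly the part of the edge within network distance $\delta$.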

Figure 7 shows the process of generating the segments of one facility in the road network shown in Figure 3, with network radius 1.5. First, we generate a segment of length 1 and then two segments of length 0.5 on two edges (1 + 0.5 = 1.5). After that, we generate a segment of length 0.803 and then three segments, each of length 0.697, on three edges (0.803 + 0.697 = 1.5). The numbers next to the segments show the order in which they are generated.

3.3.3. Inserting Segments into Seg-File

Segments generated in step 1 are inserted into the seg-file (together with the information of the edge that contains them). Algorithm 3 describes this insertion process. An important property of the seg-file is that all segments on the same edge are grouped into one record (edge-record), so each edge-record consists of an edge and the list of segments on it. The seg-file is indexed by a B+-tree. This structure allows max-segments to be found efficiently.

Algorithm 3: insertSegment(e, s).
Input SF: segment file, e: the edge containing the segment, s: a segment
(1)    edgeRecord = SF.getRecord(e)
(2)    if (edgeRecord is null) then
(3)      edgeRecord = new SegmentRecord(e, s)
(4)      SF.insert(edgeRecord)
(5)    else
(6)      for each segment seg in edgeRecord do
(7)        if (seg.facId = s.facId) then
(8)          mergeSeg = mergeSegment(seg, s)
(9)          if (mergeSeg is not null) then
(10)           edgeRecord.remove(seg)
(11)           s = mergeSeg
(12)         end if
(13)       end if
(14)     end for
(15)     edgeRecord.add(s)
(16)     SF.update(edgeRecord)
(17)   end if

When we insert a segment into the seg-file and no edge-record exists for its edge, we create a new edge-record containing the segment and insert it into the seg-file (lines 3-4). If an edge-record already exists, we check whether it contains any segments of the same facility; if so, we merge these existing segments with the new segment (lines 7–13). The mergeSegment function merges two segments of the same facility on the same edge (line 8). Figure 8 shows the possible relative positions of two such segments on an edge: in the first three situations, mergeSegment returns one new segment, whereas in the last situation, it returns null (the two segments cannot be merged). After the segment list of the edge-record has been updated, we write the edge-record back to the seg-file (lines 15-16).
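A minimal in-memory Python sketch of Algorithm 3 follows. Segments are hypothetical `(start, end, weight, fac_id)` tuples, and a dict from edge to segment list stands in for the B+-tree-indexed seg-file. Since the cases of Figure 8 are only described informally, the sketch assumes that segments which overlap, contain one another, or merely touch are merged, and that only disjoint segments are kept apart.

```python
def merge_segment(a, b):
    """Merge two segments (start, end, weight, fac_id) of the same facility on one edge.

    Returns the merged segment when they overlap or touch, and None when they are
    disjoint (the case in which the two segments cannot be merged).
    """
    if a[3] != b[3]:
        raise ValueError("only segments of the same facility can be merged")
    if a[1] < b[0] or b[1] < a[0]:
        return None
    return (min(a[0], b[0]), max(a[1], b[1]), a[2], a[3])


def insert_segment(seg_file, edge, seg):
    """Insert seg into the edge-record of `edge`, merging with same-facility segments.

    seg_file is a dict {edge: [segments]} standing in for the B+-tree-indexed seg-file.
    """
    record = seg_file.get(edge)
    if record is None:  # no edge-record yet: create one holding just this segment
        seg_file[edge] = [seg]
        return
    kept = []
    for old in record:
        merged = merge_segment(old, seg) if old[3] == seg[3] else None
        if merged is None:
            kept.append(old)
        else:
            seg = merged  # absorb the old segment and keep growing the new one
    kept.append(seg)
    seg_file[edge] = kept  # write the updated edge-record back
```

Building a new list instead of removing entries while iterating avoids mutating the record during the loop; segments of different facilities on the same edge are never merged, so each keeps contributing its own facility's weight.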

Figure 9 shows the records in the seg-file after the segment generation and insertion steps. In the figure, the segments generated from the four facilities are distinguished by line style (gray dotted, gray solid, black solid, and black dotted). Each record is associated with one edge (drawn as a thin solid line). For example, the first record in Figure 9 is associated with a single edge and contains one segment generated from one facility.

3.3.4. Finding Max-Segments

After the seg-file has been constructed, Algorithm 4 is invoked to find the max-segments from the seg-file.

Algorithm 4: Finding max-segments.
Input SF: seg-file
Output maxSegs: list of segments with the maximum weight
(1)    Initialize an empty list maxSegs
(2)    maxWeight = 0
(3)    for each segRecord in SF do
(4)      localMaxSegs = lineSweep(segRecord)
(5)      localMaxWeight = localMaxSegs[0].weight
(6)      if (localMaxWeight ≥ maxWeight) then
(7)        if (localMaxWeight > maxWeight) then
(8)          maxWeight = localMaxWeight
(9)          maxSegs.clear()
(10)       end if
(11)       for each seg in localMaxSegs do
(12)         maxSegs.add(seg)
(13)       end for
(14)     end if
(15)   end for

In this algorithm, we first find the local optimal segments in each edge-record (line 4) and then compare the maximum weights across edge-records; the segments with the maximum weight are added to the result list (lines 6–14). The local optimal segments are found by the function lineSweep, a one-dimensional version of the plane-sweep algorithm proposed in [11].

Figure 10 illustrates lineSweep on the record associated with one edge. While sweeping along the edge, when we meet the start point of a segment (e.g., position 1 for segment 2), the weight of that segment is added to the running weight used to find the local maximum-weighted segment; when we meet the end point of a segment (e.g., position 4 for segment 2), its weight is removed. In the figure, the part of the edge from position 3 to position 4 is the local maximum-weighted segment of this record.
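A compact Python sketch of lineSweep and the outer loop of Algorithm 4 follows, using the same hypothetical `(start, end, weight, fac_id)` tuples and a dict-based stand-in for the seg-file. Start events are processed before end events at the same position, so segments are treated as closed; a local max-segment can therefore degenerate to a single point, for example where one facility's range ends exactly where another's begins. The sketch sorts events in memory and only models the sweep; it is not the external-memory implementation.

```python
EPS = 1e-9  # tolerance when comparing accumulated weights


def line_sweep(segments):
    """Local max-segments of one edge-record.

    segments: non-empty list of closed intervals (start, end, weight, fac_id).
    Returns (max_weight, [(start, end), ...]); a max-segment may be a single point.
    """
    events = sorted([(a, 0, w) for a, _, w, _ in segments] +   # 0 = start event
                    [(b, 1, w) for _, b, w, _ in segments])    # 1 = end event
    pieces, cur, i = [], 0, 0
    while i < len(events):
        x = events[i][0]
        while i < len(events) and events[i][0] == x and events[i][1] == 0:
            cur += events[i][2]  # segments starting at x cover x
            i += 1
        pieces.append((x, x, cur))  # weight of the single point x
        while i < len(events) and events[i][0] == x:
            cur -= events[i][2]  # segments ending at x stop after x
            i += 1
        if i < len(events):
            pieces.append((x, events[i][0], cur))  # weight strictly between events
    best = max(w for _, _, w in pieces)
    result, prev_is_max = [], False
    for a, b, w in pieces:
        is_max = abs(w - best) <= EPS
        if is_max and prev_is_max:
            result[-1] = (result[-1][0], b)  # extend the current max-segment
        elif is_max:
            result.append((a, b))
        prev_is_max = is_max
    return best, result


def find_max_segments(seg_file):
    """Algorithm 4 over a dict {edge: [segments]}: global max weight and max-segments."""
    max_weight, max_segs = 0, []
    for edge, segments in seg_file.items():
        local_weight, local = line_sweep(segments)
        if local_weight > max_weight + EPS:  # strictly better: restart the result list
            max_weight, max_segs = local_weight, []
        if abs(local_weight - max_weight) <= EPS:
            max_segs.extend((edge, a, b) for a, b in local)
    return max_weight, max_segs
```

On an edge-record with three unit-weight segments [0, 2], [1, 3], and [1.5, 2.5], the sweep reports the max-segment [1.5, 2] with weight 3.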

After the max-segment finding step, Figure 11 shows that two segments, each on a different edge, are the max-segments with the maximum weight (i.e., 3) in the example of Figure 3 (where the weight of each facility is 1).

4. Performance Evaluation

4.1. Simulation Setup

We use two real datasets, namely, the North America (NA) road network and the San Francisco (SF) road network. These datasets are depicted in Figure 12. The NA dataset is obtained from http://www.cs.fsu.edu/~lifeifei/SpatialDataset.htm and the SF dataset is obtained from [15]. The cardinalities of the datasets are shown in Table 1.

Because this is the first work on processing MaxRS queries in a road network database, we implement a naive algorithm for comparison with the proposed algorithm. The naive algorithm uses an unstructured seg-file: in step 2, generated segments are appended directly to the seg-file without grouping the segments on the same edge into one edge-record. In step 3, the naive algorithm reads the segments from the seg-file, groups the segments on the same edge, and finds the max-segments.

Because we use a disk-based storage model to store very large road network databases, the performance metric in our simulations is the number of I/Os, i.e., the number of blocks read from or written to files. We do not consider CPU time because it is dominated by the I/O cost [10, 12, 16]. The default parameter values are shown in Table 2.

4.2. Simulation Results
4.2.1. Effect of the Number of Facilities

Figure 13 shows the effect of the number of facilities on the I/O cost. For both datasets NA and SF, when the number of facilities increases, the I/O cost increases. However, the proposed method is much less sensitive to this parameter than the naive algorithm.

4.2.2. Effect of the Network Radius

Figure 14 shows the results for varying network radius (network range). As the network radius increases, the number of generated segments increases, and thus the I/O cost also increases. The I/O cost grows faster on the SF dataset than on the NA dataset because the edge density of SF is higher than that of NA, so more segments are generated for SF.

4.2.3. Effect of the Buffer Size

Figure 15 shows the results for varying buffer size. Although both algorithms perform better as the buffer size increases, the proposed algorithm is more sensitive to the buffer size than the naive algorithm.

4.2.4. Effect of the Block Size

Figure 16 shows the results for varying block size. As the block size increases, the I/O cost decreases, because each block stores more objects, so fewer blocks need to be read or written. As with the buffer size, the proposed algorithm is more sensitive to the block size than the naive algorithm.

5. Related Work

In this section, we review related work on the facility location optimization problem in general and the MaxRS problem in particular.

Facility Location Optimization Problem. The MaxRS problem can be seen as an instance of the facility location optimization problem, which has been studied extensively in recent years. The aim of this problem is to find an optimal location that maximizes or minimizes an objective function. Cabello et al. [17] introduced and investigated optimization problems based on the bichromatic reverse nearest neighbor (BRNN) rule, while Wong et al. [18] studied a related problem called MaxBRNN, which finds an optimal region that maximizes the size of the BRNN set. These two problems are studied in Euclidean space. Du et al. [19] proposed the optimal-location query, which returns a location with maximum influence, where the influence of a location is the total weight of its RNNs. In an extended version of [19], Zhang et al. [20] proposed and solved the min-dist optimal-location query.

Several studies address facility location optimization specifically in road network databases. Xiao et al. [21] studied optimal location queries in road networks and introduced three important types of them: the competitive location query, the MinSum location query, and the MinMax location query. Yan et al. [22] proposed algorithms for finding the optimal meeting point, i.e., the point with the smallest sum of network distances to all the points in a given set in a road network.

MaxRS Problem. Imai and Asano [11] proposed an optimal algorithm for the max-enclosing rectangle problem with time complexity $O(n \log n)$, where $n$ is the number of rectangles. Nandy and Bhattacharya [14] presented another algorithm with the same cost, based on the interval tree data structure. Both are internal-memory algorithms. Choi et al. [10] proposed an algorithm for the MaxRS problem in external memory with optimal I/O cost. Tao et al. [12] proposed a new problem called $(1-\varepsilon)$-approximate MaxRS, which returns a solution whose covered weight is at least $(1-\varepsilon)$ times the optimal one, where $\varepsilon$ is an arbitrarily small constant between 0 and 1.

Another version of the MaxRS problem is the maximizing circular range sum (MaxCRS) problem, in which the query range is a circle instead of a rectangle. Chazelle and Lee [23] proposed an $O(n^2)$-time algorithm for the max-enclosing circle problem. As the max-enclosing circle problem is 3SUM-hard [24], many studies have used approximate approaches to solve it. Aronov and Har-Peled [25] gave a Monte Carlo $(1-\varepsilon)$-approximation algorithm for unweighted point sets, which can be extended to the weighted case. de Berg et al. [26] proposed another approximation algorithm for the max-enclosing circle problem. The MaxCRS problem is also addressed in [10] through a reduction that converts the MaxCRS problem into the MaxRS problem.

6. Conclusions

The MaxRS problem can be used in location-based applications to find the most profitable or most serviceable place. All previous studies assume the Euclidean distance; however, many location-based applications use the network distance instead. This paper proposed an efficient algorithm for the MaxRS problem in road network databases. The proposed external-memory algorithm is suitable for large road network datasets. Our algorithm returns all optimal locations (max-segments) in the network, whereas all previous methods return only one result. This can help clients with diverse interests choose their own best locations by considering additional conditions. In future work, we plan to improve our method and analyze the complexity of the algorithm.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2013R1A1A2061269) and this research was funded by the MSIP (Ministry of Science, ICT & Future Planning), Korea, in the ICT R&D Program 2013.