Abstract

Human trajectory prediction is an essential task for various applications such as travel recommendation, location-sensitive advertisement, and traffic planning. Most existing approaches are sequential-model based and produce a prediction by mining behavior patterns. However, pattern-based methods are less effective than expected in real-life conditions, such as when data are sparse or missing. Moreover, due to the technical limitations of sensors or the traffic situation at the given time, people going to the same place may produce different trajectories. Even for people traveling along the same route, the observed transit records are not exactly the same. Therefore, trajectories are always diverse, and extracting user intention from trajectories is difficult. In this paper, we propose an augmented-intention recurrent neural network (AI-RNN) model to predict locations in diverse trajectories. We first propose three strategies to generate graph structures that represent travel context and then leverage graph convolutional networks to augment user travel intentions from a graph view. Finally, we use gated recurrent units with augmented node vectors to predict human trajectories. We experiment with two representative real-life datasets and evaluate the performance of the proposed model by comparing its results with those of other state-of-the-art models. The results demonstrate that the AI-RNN model outperforms other methods in terms of top-k accuracy, especially in scenarios with low similarity.

1. Introduction

Movement records bring great opportunities and challenges for mining knowledge about public transit behavior in many location-based-service applications, such as travel recommendation, location-sensitive advertisement, and traffic planning. In contrast with other types of applications, next-location prediction is considered a difficult problem because of inherent uncertainties tied to its unique features [1]. Transit uncertainties have two main causes: the first is randomness, which stems from the nondeterministic nature of choice behavior, and the second is vagueness, which stems from a lack of familiarity with road networks and from the linguistic information of the network attributes [2]. In line with these two sources of uncertainty, Song et al. point out, based on their study of a million users, that the movement patterns of users can easily appear random and unpredictable [3].

In particular, the dissimilarity of movement behavior in massive-scale data is an obvious problem in real-life scenarios. For example, Figure 1 shows urban travel behavior that we collected with Wi-Fi sensors in the city, which record both geographical and temporal information. We evaluate the regularity of people's travel patterns by using similarity algorithms (e.g., Jaccard similarity) and find that the places visited by different residents differ completely in daily life. Moreover, even the main daily trajectories of the same user differ from one another. Based on our experimental results, we conclude that most users do not have regular daily transit patterns: fewer than 20% of people have consistent patterns.

Although the large-scale observations above point to irregular movement, more recent work relies on statistical or pattern-based methods, which assume that user behavior patterns are repeatable and predictable and thereby ignore the randomness and vagueness of real-life scenarios. Below, we list three key challenges of next-location prediction:
(i) First, the quality of movement data depends on user activities and technical limitations. For example, taxi trajectories recorded by GPS devices are continuous and highly precise. In contrast, although people actively post the places they visit to location-based-service applications to mark their tracks, these data have a low sampling rate and low accuracy. User activities may also be missed because of technical limitations, e.g., a Wi-Fi sensor cannot capture the MAC information of a mobile device that is switched off or in airplane mode.
(ii) Second, existing work leverages point-level sequential models to capture movement regularities. However, mobility patterns are complex and time sensitive and depend on public transport systems, activity time, weather conditions, and possible emergencies. All of these factors interact to complicate predictions.
(iii) Third, residents living in different areas of a city have different trajectories, but the intent behind these completely different activities may be similar. Finding the underlying semantic context of each trajectory from its points is a difficult but important task for predicting the next location.

These challenges are illustrated in Figure 2, where trajectory a is an actual path with six nodes . Trajectories b and c are two paths observed by the Wi-Fi sensors. For trajectory b, node is lost due to technical limitations or user activities. For trajectory c, the user went through nodes and , as recorded by nearby sensors and , and the observed trajectory is . Therefore, the tracks b and c only present parts of actual paths with some errors, which degrades the capability of the statistical and pattern-based prediction models.

In past decades, probability models and fuzzy models have been used to deal with the problems of randomness and vagueness. Researchers now use hybrid models to interpret incomplete or discrete datasets, as exemplified by the FLOWSIM travel-choice model [4], the LCML model [2], the AHP model [5], and the FITA model [6]. However, all of these approaches are based on predefined features, such as distance, speed, parking conditions, and comfort level, which are not available in the present study. Furthermore, neural network models perform better for predicting general locations [7, 8], and graph neural networks (GNNs) use structural interpretation for classification. Our intuition and motivation for building our model are based on these previous studies.

We thus propose an augmented-intention recurrent neural network (AI-RNN) model for location prediction under randomness and vagueness. In the AI-RNN model, we use a graph of nodes built from each user's historical transit logs to represent that user's intent; each position node is embedded and combined with the features of its adjacent nodes. AI-RNN can therefore describe the distinct characteristics of different locations and leverage graph convolutional network (GCN) models to extend the semantic context of points. To choose suitable nodes for generating transit intention, we also propose three selection strategies that explain each trajectory in specific transit scenarios. We then use a GCN to augment the vector of each point in the trajectory and feed these augmented-intention points to an RNN to predict the next location.

The contributions of this work are summarized as follows:
(i) We propose the AI-RNN model to handle the randomness and vagueness found in large-scale transit records. AI-RNN is an end-to-end trajectory-prediction approach that considers both the graph structure of trajectories and their sequential patterns.
(ii) We design three context-selection strategies to augment user intention: random selection, path-direction-oriented selection, and path-probability selection, which correspond to different situations of user movement. We evaluate these strategies on various trajectories and determine which selection strategy works best in specific scenarios.
(iii) We conduct extensive experiments on two real-life datasets: our urban travel dataset and the public Foursquare dataset. The results show that the AI-RNN method outperforms other statistical methods and the RNN method [7–9] in top-k accuracy. Furthermore, compared with randomly selected nodes, a well-defined selection strategy finds nodes related to the semantics of the current trajectory and improves the accuracy.

The rest of the paper is organized as follows: we first formulate the problem and introduce the concepts used in the AI-RNN in Section 2. Section 3 gives the architecture of the AI-RNN and proposes three selection strategies. Next, in Section 4, we discuss experiments with two real-life datasets and evaluate the performance of the proposed model by comparing its results with those of the existing methods. Related work is discussed in Section 5, and Section 6 concludes the paper.

2. Preliminaries and Motivation

First, we formally define the location-prediction problem under diverse trajectories and then briefly introduce the gated recurrent unit (GRU), a type of RNN, and graph convolutional networks (GCNs). Finally, we discuss the motivation and present an overview of our solution.

2.1. Problem Formulation

Definition 1. A trajectory sequence is a concept of a spatial-temporal sequence generalized as , where , . Trajectory sequence contains geographical information like longitude and latitude and other information about trajectory like timestamp .
For given trajectories, two preprocessing steps are done: data cleaning to remove potential precision errors of Wi-Fi probes and trajectory compression to deal with redundant data acquired during data collecting.

Definition 2. The concept of regularity represents the similarity of a given user’s trajectories. For example, user u has several trajectories . For each trajectory , we calculate the regularity by comparing with other trajectories via sequence similarity functions. The regularity of user u is defined as .
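
As a rough illustration of Definition 2, the sketch below computes a regularity score from Jaccard similarity, the similarity function named in the Introduction. The aggregation is an assumption: the score is taken as the mean pairwise similarity over a user's trajectories, which may differ from the exact definition used in the paper. The function names `jaccard` and `regularity` are hypothetical.

```python
from itertools import combinations


def jaccard(traj_a, traj_b):
    """Jaccard similarity of the sets of locations visited in two trajectories."""
    set_a, set_b = set(traj_a), set(traj_b)
    if not set_a and not set_b:
        return 1.0  # convention: two empty trajectories are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def regularity(trajectories):
    """Mean pairwise Jaccard similarity of one user's trajectories.

    Assumption: the mean over all pairs is used as the aggregation.
    """
    pairs = list(combinations(trajectories, 2))
    if not pairs:
        return 0.0  # convention: a single trajectory has nothing to compare with
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

For example, a user whose trajectories always visit the same set of locations gets a regularity of 1.0, while a user whose trajectories never share a location gets 0.0.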

Definition 3. The trajectory intention is a representation of the semantic information for a trajectory. Given a trajectory , we can extract a set of features as this trajectory’s intention. Formally,  = TrajectoryLength, TrajectoryComplexity, Speed, Duration Time, etc. Analyzing a trajectory intention is not only beneficial to models based on statistics but also provides valuable features as input for models and plays a significant role in understanding the potential semantics information of users.

Definition 4. Location-prediction problem: given the first few points in a trajectory, , then for each location in the trajectory we make prediction about top-k next location. To find out the correctness of predictions, we check whether the real next location exists in predictions. For instance, we have a length of 10 trajectory sequence and next-location prediction . The true next point sequence is , and then for each point in , check if it exists in prediction , .

2.2. Gated Recurrent Unit

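GRU is a gating mechanism in recurrent neural networks, introduced in 2014 by Cho et al. [10] to make each recurrent unit adaptively capture dependencies of different time scales. GRU has gating units that modulate the flow of information inside the unit. It has three major parts: the update gate $z_t$, the reset gate $r_t$, and the activation unit $h_t$. First, the update gate is computed by

$$z_t = \sigma(W_z x_t + U_z h_{t-1}).$$

Similarly, the reset gate is computed by

$$r_t = \sigma(W_r x_t + U_r h_{t-1}).$$

The activation of the GRU at time t is a linear interpolation between the previous activation $h_{t-1}$ and the candidate activation $\tilde{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))$:

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,$$

where $x_t$ is the input at time t, $\sigma$ is the logistic sigmoid, $\odot$ denotes element-wise multiplication, and the $W$ and $U$ matrices are learned weights.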

GRU was first introduced for machine translation, where it was shown to learn a semantically and syntactically meaningful representation of linguistic phrases. By analogy, in trajectory prediction it can not only remember what happened earlier but also learn the internal semantics of each user's trajectories.
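
To make the gating mechanism concrete, here is a minimal NumPy sketch of a single GRU step in its standard formulation. It is illustrative only and is not the implementation used in this paper: bias terms are omitted, and the names `gru_step` and `init_params` as well as the weight initialization are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x_t, h_prev, params):
    """One GRU step; params = [W_z, U_z, W_r, U_r, W_h, U_h] (biases omitted)."""
    W_z, U_z, W_r, U_r, W_h, U_h = params
    z = sigmoid(W_z @ x_t + U_z @ h_prev)              # update gate
    r = sigmoid(W_r @ x_t + U_r @ h_prev)              # reset gate
    h_cand = np.tanh(W_h @ x_t + U_h @ (r * h_prev))   # candidate activation
    # Linear interpolation between the previous and the candidate activation.
    return (1.0 - z) * h_prev + z * h_cand


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights (hypothetical initialization, for the demo only)."""
    rng = np.random.default_rng(seed)
    shapes = [(hidden_dim, input_dim), (hidden_dim, hidden_dim)] * 3
    return [rng.normal(scale=0.1, size=s) for s in shapes]
```

Running `gru_step` over the embedded points of a trajectory, carrying `h` from one step to the next, yields a hidden state that summarizes the path seen so far.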

2.3. Graph Neural Networks

GNNs were first introduced by Gori et al. [11] and modified by Scarselli et al. as a form of RNN [12]. The idea of GNNs is to use the graph structure and node features to learn a representation vector of each node. Li et al. [13] improved the original GNNs by introducing a modern strategy for RNN training. This strategy borrows from the idea of aggregation and combination, in which the representation of a node is iteratively updated by aggregating the representations of its neighbors.

Spectral approaches work with a spectral representation of the graphs and have been successfully applied to the node classification problem. Bruna et al. [14] first introduced a convolution operation defined in the Fourier domain by computing the eigendecomposition of the graph Laplacian; however, it incurs a high computational cost and lacks spatially localized filters. These issues were addressed in [15, 16]. Kipf and Welling [17] presented the most widely used GCN model, which is designed for semisupervised learning in transductive settings. It integrates the operations of aggregation and combination and also applies element-wise mean pooling, which reduces the number of parameters and scales to large graphs. Formally, the GCN model is

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right),$$

where $\tilde{A} = A + I$ is the adjacency matrix with added self-loops, $\tilde{D}$ is its diagonal degree matrix, $H^{(l)}$ is the matrix of node representations at layer l, $W^{(l)}$ is a trainable weight matrix, and $\sigma$ is an activation function.
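
As a rough illustration of the propagation rule of Kipf and Welling [17], the following NumPy sketch applies one GCN layer to a small dense graph. The function name `gcn_layer`, the dense adjacency representation, and the choice of ReLU as the activation are assumptions made for this example.

```python
import numpy as np


def gcn_layer(adj, features, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W) on a dense adjacency matrix."""
    a_tilde = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_tilde.sum(axis=1)              # degrees are >= 1 for non-negative weights
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization of the adjacency matrix.
    a_hat = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ features @ weight, 0.0)
```

Each output row mixes a node's own features with those of its neighbors, which is the mechanism AI-RNN relies on to fuse a trajectory point with its selected context points.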

2.4. Motivation Overview

The trajectory of an individual is a sequence of movement records with geographical and temporal information. Thus, a suitable sequence model such as an RNN could solve the sequential-prediction problem. However, end-to-end prediction methods depend on the quality and quantity of data. In real-life cases with diverse and sparse trajectories, the prediction accuracy degrades rapidly. The main limitation of mobility prediction from a single-point view is that we cannot capture the complete intention of the trajectory, either because of the user's frequent activities or because of the technical limitations described in Figure 2. Therefore, we need to repair the context of each point and determine the user's trajectory intention before using sequential-prediction methods.

Our motivation is to find the “relevant” points of each point via the user’s historical trajectory and let the target point fuse the information of the relevant points to form a fuzzy vector representation, as represented in the last chart of Figure 2. We call this process an “intent augment” and merge it into our prediction task to improve the representation.

3. Augment Intent Neural Network Framework

Figure 3 presents the architecture of AI-RNN, which comprises three key components: the intention-augment module, the semantic-selection module, and the sequential-prediction module.

3.1. Intention-Augment Module

Due to the randomness and vagueness of trajectories, we need to merge richer semantic information to represent user intention. Here, GNN-based models can be used to describe the relations between points based on users' historical records [18].

Inspired by a relational inductive model based on graph neural networks that provides a straightforward interface for manipulating structured knowledge and producing structured behaviors [19], we design the intention-augment module to enrich the semantics of trajectories. We construct a tuple , where u is the user, is the set of nodes, and each is a point in historical trajectories. is the set of edges, where is the weight of edge, is the source node, and is the target node.

The intention augmentation of a point p is a trajectory context built from previously visited points, which can be represented as an edge-weighted graph. GCN-based methods then fuse the related points to augment the semantics of point p.

3.2. Context Generation Strategy

The critical part of AI-RNN is how to build the semantic context for each trajectory. The intention of a trajectory has two key characteristics: time sensitivity and scenario sensitivity [1]. We propose three selection strategies to represent travel semantics: a random selection strategy, a direction-oriented strategy, and a maximum probability strategy. All of these strategies can be combined with graph-based learning models such as [20] or [21].

3.2.1. Random Selection Strategy

The naive selection approach is to choose some of the nearby points at random. Since merging more points yields a richer semantic representation of a trajectory, the context can be drawn anywhere between the minimum level (i.e., the point node itself) and the maximum level (i.e., all the neighborhood nodes). We use a threshold γ to decide whether to consider a point as augmented information by the following equation, where represents the weight of each node, which is defined by the number of that node's neighbors, the function of is a random function which outputs a , and γ is a threshold that decides whether we consider this point as an expanded branch. In this paper, we chose γ = 0.6, which provided the best result in our experiments.

Candidate points are selected from historical behaviors but should follow several criteria as shown in Algorithm 1. In Figure 4(a), we augment information of point by merging (red dashed line), as well as point .

Input: current trajectory for user u
Output: A dict for candidate nodes
(1) initialize: , ,
(2)for each node do
(3)  for each node adjacent nodes of do
(4)   if distance between and  < 
(5) and then
(6)    
(7)   end if
(8)  end for
(9)end for
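
The loop structure of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: a neighbor is kept when it lies within a distance limit `max_dist` and a uniform random draw exceeds γ, the node weight from the selection equation is ignored, and coordinates are treated as planar. The names `select_candidates`, `adjacency`, `coords`, and `max_dist` are hypothetical.

```python
import math
import random


def select_candidates(trajectory, adjacency, coords, max_dist, gamma=0.6, rng=None):
    """Randomly pick nearby historical neighbors for each node of a trajectory.

    adjacency: dict mapping a node to its neighbors in the user's history graph.
    coords:    dict mapping a node to planar (x, y) coordinates.
    Assumption: a neighbor is kept when it is closer than max_dist and a
    uniform random draw in [0, 1) exceeds gamma.
    """
    rng = rng or random.Random()
    candidates = {}
    for node in trajectory:
        for neighbor in adjacency.get(node, ()):
            is_close = math.dist(coords[node], coords[neighbor]) < max_dist
            if is_close and rng.random() > gamma:
                candidates.setdefault(node, []).append(neighbor)
    return candidates
```

With γ = 0.6, roughly 40 percent of the nearby neighbors are kept on average under this assumed rule; passing a seeded `random.Random` makes the selection reproducible.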

3.2.2. Direction-Oriented Strategy

This strategy is motivated by the idea that most people traveling in the city prefer a simple and direct path. Therefore, we choose nodes that preserve the direction of the original path, subject to two criteria.

First of all, the selected segments are in an individual’s historical trajectories, which means the probability of transiting through these paths is relatively high. Secondly, the direction of the observed path and the additional path is similar, which means the direction difference between those two paths should be less than a threshold.

For example, as Figure 4(b) presents, given a trajectory segment in graph , the direction of , denoted by , is defined to be the angle of the anticlockwise rotation from the x-axis to a vector from to . We then find a point from historical trajectories and get the direction of as well. The “angular difference” is defined to be the minimum of the angles of the clockwise and anticlockwise rotations from to . The criterion of the direction-oriented strategy is to filter out those points whose angular difference from the current trajectory is greater than or equal to σ.
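
The direction and angular-difference definitions above can be sketched as follows. Points are assumed to be planar (x, y) pairs and segments are pairs of points; how the candidate segment is formed from a historical point is left to the caller, since the sketch does not fix that choice. The function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def direction(p, q):
    """Anticlockwise angle from the x-axis to the vector p -> q, normalized to 0..2*pi."""
    return math.atan2(q[1] - p[1], q[0] - p[0]) % TWO_PI


def angular_difference(theta_a, theta_b):
    """Smaller of the clockwise and anticlockwise rotations between two directions."""
    diff = abs(theta_a - theta_b) % TWO_PI
    return min(diff, TWO_PI - diff)


def keep_segment(observed, candidate, sigma):
    """Keep a candidate segment only if its angular difference is below sigma."""
    theta_obs = direction(*observed)
    theta_cand = direction(*candidate)
    return angular_difference(theta_obs, theta_cand) < sigma
```

A candidate that continues roughly straight ahead passes the filter, whereas one that doubles back has an angular difference close to π and is dropped for any reasonable σ.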

3.2.3. Maximum Probability Strategy

This strategy models the probability that a user chooses to follow a certain path. In a given segment , is a previous node, is a candidate node, and is the augmented point.

In the maximum probability strategy, we first split the whole trajectory into pairs of consecutive points and calculate the probability of each pair . After obtaining all the two-point segment selection probabilities, we calculate the probability of a three-point segment from them. Note that, for simplicity, we assume that all segments are independent, so the probability of a three-point segment is the product of the probabilities of its two consecutive two-point segments. More complex methods could be used to calculate the selection probability of a composite trajectory.

In reality, due to the limitation of techniques, some user traces are ignored. According to this strategy, trajectory semantics could be compensated by past behavior probabilities. For instance, as Figure 4(c) represents, for two candidate nodes ( and ), we compare probability value of two composited segments and . If the path through has higher confidence, we put into our augmented structure.
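
A minimal sketch of this strategy is given below. It assumes that the two-point probability is estimated as a conditional frequency from the user's historical trajectories (Algorithm 2 refers to this strategy as based on conditional probability) and that segment probabilities multiply under the independence assumption. The function names are hypothetical.

```python
from collections import Counter


def transition_probabilities(trajectories):
    """Estimate P(b | a) for consecutive pairs (a, b) from historical trajectories."""
    pair_counts = Counter()
    source_counts = Counter()
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            pair_counts[(a, b)] += 1
            source_counts[a] += 1
    return {pair: n / source_counts[pair[0]] for pair, n in pair_counts.items()}


def path_probability(probs, path):
    """Probability of a path as the product of its two-point segment probabilities."""
    p = 1.0
    for pair in zip(path, path[1:]):
        p *= probs.get(pair, 0.0)  # unseen segments get probability 0
    return p


def best_candidate(probs, prev_node, candidates, next_node):
    """Pick the candidate whose three-point segment has the highest probability."""
    return max(candidates, key=lambda c: path_probability(probs, (prev_node, c, next_node)))
```

Given two candidate nodes between the same pair of observed points, `best_candidate` returns the one whose composite segment has the higher estimated probability, which mirrors the comparison described for Figure 4(c).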

3.3. Training Algorithm

Algorithm 2 outlines the training process of AI-RNN. AI-RNN works in an end-to-end manner without requiring hand-crafted business features. We choose one of the three strategies to augment user intention (lines 3–5), fuse the semantic context, and use a recurrent neural network to predict the next location.

Input: trajectory: , candidate nodes: , window size:
Output: trained model
(1)for each user do
(2)  for each traj. do
(3)   strategy 1: random selection
(4)   strategy 2: based on angle
(5)   strategy 3: based on conditional probability
(6)   //Use one of the above strategies applied on each traj.
(7)   Initialize: parameters θ
(8)   for each batch nodes do
(9)    
(10)     Neural Network (GCN ({Points: , Adj_Points: }))
(11)   end for
(12)  end for
(13)end for

4. Performance Evaluation

4.1. Dataset

We collect two representative real-life spatiotemporal datasets to evaluate the performance of the proposed model. The first one is urban travel data (https://github.com/jincanghong/next_place_prediction) collected by Wi-Fi sensors all over the city. The second one is the public Foursquare check-in data. Since the characteristics of the spatiotemporal data in these two datasets are quite different, we describe the processing of each in detail below.

4.1.1. Urban Travel Data

Our real-life dataset was collected by Wi-Fi sensors installed in a city in eastern China and includes MAC addresses, timestamps, and geo-information. We select a small subset of residents and remove private attributes for our experiment. Since our study focuses on the mobility patterns of trajectories, we preprocess the data in three steps to retain actual residents. First, based on the top-10 best-selling and most popular phones in China, we check whether each MAC address belongs to one of the following Android brands: Huawei, OPPO, Vivo, Xiaomi, Meizu, Gionee, Samsung, Letv, and Lephone, which together accounted for over 75 percent of the market in 2017. Second, we select residents who have enough activity records in total and at least two weeks of data in a month. Finally, we generate trajectories under the criterion that the distance and time interval between any two consecutive trajectory points are less than 1 kilometer and 30 minutes, respectively. We calculate the visit count n of each node and then draw the heat map with function as shown in Figure 5.
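
The trajectory-generation criterion can be sketched as a simple segmentation pass over one device's time-sorted records. This is an illustrative sketch, not the authors' pipeline: the record layout `(timestamp_seconds, latitude, longitude)` and the use of the haversine great-circle distance are assumptions, and the function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def split_trajectories(records, max_km=1.0, max_minutes=30.0):
    """Split time-sorted (timestamp_seconds, lat, lon) records into trajectories.

    A new trajectory starts whenever two consecutive records are at least
    max_km apart or at least max_minutes apart.
    """
    trajectories, current = [], []
    for rec in records:
        if current:
            t_prev, lat_prev, lon_prev = current[-1]
            gap_minutes = (rec[0] - t_prev) / 60.0
            dist_km = haversine_km(lat_prev, lon_prev, rec[1], rec[2])
            if gap_minutes >= max_minutes or dist_km >= max_km:
                trajectories.append(current)
                current = []
        current.append(rec)
    if current:
        trajectories.append(current)
    return trajectories
```

The same function could be reused for check-in data by raising the time threshold, for example to the 3-day interval used for Foursquare.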

4.1.2. Foursquare Check-In Data

This dataset contains check-ins in NYC and Tokyo collected for about 10 months (from 12 April 2012 to 16 February 2013). It contains 227,428 check-ins in New York city and 573,703 check-ins in Tokyo. Each check-in is associated with its timestamp, GPS coordinates, and semantic meaning (represented by fine-grained venue categories). Since the moving records are not continuous, we set the time interval to 3 days to generate trajectories.

In addition to reporting the statistics of the three datasets in Table 1, we measure the regularity of trajectories with the Jaccard function. In Figure 6, we note that most paths are short (fewer than 50 points) and that the longer the path, the lower the regularity.

4.2. Experimental Setup

4.2.1. Platform

All the experiments are conducted in two environments. The first is a Cloudera platform with 24 physical machines, which is used to preprocess and generate the dataset. The other is a 64-bit Dell server (16 CPU cores at 2.6 GHz each, a GTX 1080 Ti GPU, and 32 GB of main memory). The algorithms and models in our paper were implemented in Python 3.

4.2.2. Evaluation Criteria

We calculate precision@k to measure performance: a prediction counts as “correct” if the true next point is among the k candidates with the highest predicted probabilities. Given a trajectory with n nodes, the top-k accuracy of the trajectory is the average accuracy over all of its nodes. The experiments use a train-test split in which the first 70 percent of trajectories are used for training and the remaining 30 percent for testing.
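
The evaluation protocol can be sketched as follows. The sketch assumes that the model outputs one score per candidate location for each prediction step and that locations are encoded as integer indices; `top_k_accuracy` and `chronological_split` are hypothetical helper names.

```python
import numpy as np


def top_k_accuracy(scores, targets, k):
    """Fraction of steps whose true next location is among the k highest scores.

    scores:  array of shape (n_steps, n_locations) with predicted scores.
    targets: array of shape (n_steps,) with the true next-location indices.
    """
    scores = np.asarray(scores)
    targets = np.asarray(targets)
    top_k = np.argsort(-scores, axis=1)[:, :k]        # indices of the k best scores
    hits = (top_k == targets[:, None]).any(axis=1)
    return float(hits.mean())


def chronological_split(trajectories, train_fraction=0.7):
    """Use the first train_fraction of trajectories for training, the rest for testing."""
    cut = int(len(trajectories) * train_fraction)
    return trajectories[:cut], trajectories[cut:]
```

Averaging the hits over all steps of a trajectory gives its top-k accuracy, and increasing k can only keep the score the same or raise it.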

4.3. Models and Performance Results

To evaluate the accuracy of our model, we compare it with several classical and state-of-the-art methods: (1) The statistical model is a simple baseline that chooses as the next node the one with the highest frequency in the user's historical records. (2) The Markov model, which has been widely used for location prediction for decades, sets up a spatiotemporal mixture model to cover the prediction task [9]. (3) The RNN model treats the locations in a trajectory as a sequence and focuses on modeling the continuous spatiotemporal information. The original RNN works as a sequence predictor, and the updated model DeepMove [7] adds historical attention to the recurrent neural network. Since DeepMove has been shown to perform better than the original RNN, we select DeepMove as the state-of-the-art RNN model for comparison. We implement all three submodels of DeepMove with different parameters as , and . (4) AI-RNN is our proposed model, which considers related external nodes to augment intention via RNN models. The first three models make predictions based on a series of concrete and complete nodes. Our model, in contrast, assumes that the observed nodes in a trajectory are incomplete and uncovers the latent intention during the prediction process.

The parameters of our model are set as follows: the learning rate lr is in [3e−4, 3e−3], the decay of ls is in [1e−5, 1e−4], the embedding size of each location is 500, the embedding size of the time section is 48, and the embedding size of the user is 10.

4.4. Performance Comparisons

From Table 2, we make several observations that confirm our research motivation. First, the RNN-based methods outperform the statistical model and the Markov model on the urban travel dataset, and our AI-RNN model outperforms DeepMove by nearly 5–10 percent. In contrast, on the Foursquare NY and Foursquare TKY datasets, the Markov model performs somewhat better but still falls behind DeepMove and AI-RNN as k grows.

Moreover, DeepMove and AI-RNN have similar results on the Foursquare dataset, which shows that augmented intention is less effective there. We observe that if the time interval or distance between points in a trajectory is large, the intention is blurred, which reduces the effectiveness of AI-RNN. Interestingly, for AI-RNN, the maximum probability strategy is more accurate than the other two strategies on the Foursquare dataset, which suggests that in LBS check-in applications users are more likely to choose a place of interest than a place along the way.

4.5. Case Study and Insight Analysis

In this section, we visualize several points on the city map and illustrate the difference of semantic context in a trajectory.

Figure 7 provides examples of representative trajectory context by different strategies. Figure 7(a) represents the historical paths of the given user, in which each trajectory has a different color and the geographical dimensions are demonstrated on the city map. We can see that Figure 7(b) is the actual path and Figure 7(c) selects nodes , and randomly which cannot draw a clear travel intention. Figure 7(d) shows that users are one way or round trip with clear directions, where points are merged for computing. As visualized in Figure 7(e), points and are generated alternate paths for prediction.

Besides Figure 7, we further evaluate the performance under different conditions and discuss model variations via three strategies.

We first analyze the prediction results for different user groups. Figure 6 shows that the regularity of movement patterns varies widely across users. Based on these statistics, we select two user groups by trajectory similarity: users in the regular group have a regularity value larger than 0.4, while users in the irregular group have a value less than 0.2. The results are presented in Table 3 and suggest that regular users are much easier to predict, as one would expect from everyday experience.

In addition, we evaluate the model performance in different time sections of the day. We divide trajectories into groups by their attributes: date, time, distance, duration, and length. There is no difference in prediction score between the workday and holiday groups. When we analyze the influence of time on prediction performance, as shown in Figure 8, we find that trajectories during late evening and early morning are more predictable than those in other time sections, whereas trajectories during working hours are harder to predict. This suggests that users who travel at night have a clearer travel intention.

According to the statistics and the power-law distribution shown in Figure 6, most trajectories in the urban travel dataset have lengths between 5 and 50. Figure 9 plots the prediction accuracy obtained by all three AI-RNN models. The accuracy varies significantly but follows the same trend for all models: performance increases with trajectory length, which is contrary to the finding reported in [7].

5. Related Work

Work related to our task can be classified into two categories: pattern-based methods and graph-based methods. In what follows, we provide a brief overview of related work in these fields.

5.1. Pattern-Based Methods

A large number of statistical approaches to trajectory prediction, such as probability-based approaches, time series, and Markov predictors, were proposed before the rise of deep learning. Wiest et al. [22] introduced probabilistic modeling with Gaussian mixture models, which not only yields a prediction but also learns a whole distribution over future trajectories. Similar approaches were introduced in [23, 24]. Scellato et al. [25] introduced a spatiotemporal prediction framework based on nonlinear time series analysis [26, 27], which can estimate future arrival times and residence times at different significant places as well as where the user will be after a given time interval. Markov-based techniques have also been applied to predicting the destinations (geographical locations) of vehicles using, for example, partial trajectories [28]. Gambs et al. and Mathew et al. [29, 30] developed location-prediction models with Markov chains and hidden Markov models, respectively.

5.2. Relation Inductive Model

Graph-based semisupervised learning methods are used to classify nodes and interpret the relationships between nodes [19, 31]. Moreover, since graphs are often related to other disciplines through their structures, deep learning models for graphs are important for decision-making problems. The authors of [18, 32] summarized how to use GNNs and GCNs for relational reasoning within a unified framework called graph networks, and Lee et al. [33] reviewed attention models for graphs. Building on this structural learning, Yao et al. proposed a cluster-based model for location prediction, which aims at discovering groups of similar trajectories and revealing movement patterns [20]. The article [21] introduces a CNN-based approach for representing semantic trajectories and predicting future locations.

6. Conclusion

In this paper, we proposed the AI-RNN model, which combines a graph neural network with a recurrent neural network. By augmenting the latent semantics of a trajectory via suitable strategies, AI-RNN improves the accuracy of next-location prediction for random and vague travel behavior in the city. Compared with existing sequential-prediction algorithms, AI-RNN has the advantage of partly using the context information of every node in a trajectory during the training phase. Experimental results on a real-world urban travel dataset and an LBS check-in dataset show that AI-RNN outperforms the state-of-the-art baselines when the trajectories are continuous with short time intervals. Additional experimental results show that duration and distance also have a notable influence on performance.

Data Availability

The sample data have been submitted and are available on GitHub. The whole dataset can be obtained from the authors by email.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research was supported by Research Support Project of Education of Zhejiang Province (No. Y201941372).