Abstract

A novel modeling approach is proposed to estimate the spatiotemporal distribution of demand for free-floating carsharing. The proposed model is based on a Poisson regression model for right-censored data and estimates possibly time-varying demand rates of small subareas of a service region based on booking data with spatiotemporal information on pickups and dropoffs of cars. The approach allows operators to gain insights into the spatiotemporal distribution of demand for their service and to estimate the loss of demand due to unavailability of cars. Moreover, it can be used as an input to improve the design of the service through relocation techniques or to analyze the service with macrosimulation models. The approach is applied to a case study with real data.

1. Introduction

Carsharing is a collaborative mode of transportation that, if used appropriately, can improve urban transport services from a user and environmental perspective. Among the environmental impacts that have attracted the attention of scientists are reductions in vehicle kilometers travelled [1], emission of pollutants (up to 56% according to [2]), energy consumption [2], and congestion [3]. Among the social impacts, a reduction in the number of privately owned vehicles has been highlighted (according to [2], one shared car could replace up to 13 vehicles).

Carsharing may be classified into station-based and free-floating systems [4]. In station-based systems, users start and end their trip at stations distributed within the service region. In free-floating systems, in turn, users pick up a vehicle parked near the origin of their trip using an app for booking and end their trip by dropping the vehicle at a parking spot of their choice within the service region. In comparison, station-based carsharing seems to be easier to operate because vehicles are concentrated in a few known locations, while free-floating carsharing offers the user higher flexibility, with the consequence that vehicles are spread throughout the service region.

Due to its flexibility, free-floating carsharing often suffers from a mismatch between the positioning of supply and the orientation of demand; i.e., the dropoff places do not correspond to where other users want to pick up the car [5]. Therefore, to operate free-floating carsharing efficiently by positioning vehicles where demand exists, it is crucial to know how demand is distributed across the service region. The contribution of this article is to develop an approach specifically suited to estimate the spatiotemporal distribution of demand in free-floating carsharing systems.

The challenges that carsharing research faces may be attributable to the following three categories: (i) the definition of a transport system that is congenial to the needs of users (trying to discover which these needs are), (ii) analysis of the environmental impact of this service, and (iii) the economic efficiency of the supply service (aligned with the expected demand). Demand is a crucial input for tackling any of these challenges, and therefore, we believe that a good demand model estimation is essential for improving research for free-floating carsharing systems.

In this paper, we develop an approach to estimate the spatiotemporal distribution of demand in a local free-floating carsharing system. We define the total demand as the number of cars that would have been booked in the presence of an infinite number of available cars distributed across the service region, i.e., as the number of car pickups (observed demand) plus the loss of demand due to the unavailability of cars. This estimation problem is complex: demand in free-floating carsharing systems fluctuates significantly with the time of day and the area of a city [6], and it is stochastic, since it varies even under identical circumstances. In addition, demand can hardly ever be measured directly. In most cases, only data on effective bookings are available, while data on prematurely cancelled bookings are often unstructured or incomplete. Therefore, the total demand needs to be estimated from incomplete data, using advanced statistical techniques and additional assumptions on how people search for and decide to book cars. Our approach requires the following four inputs, which we believe to be available in most cases: (i) position and time of available cars and pickup place and time, (ii) a division of the study area into cells and of time into intervals, with shape and dimension of the cells and the interval length being model inputs, (iii) assumptions on how users search for and decide to book a car, and (iv) specification of a function for how the total demand varies across time, including unknown parameters to be estimated from the data. Regarding (iii), our approach supports various assumptions, but we will assume that users start searching with a preferred pickup place in mind and book only if a car is available within a circle around that place, with the radius of that circle being a model input. The time function in (iv) can include month, weekday, and time-of-day effects.

The remaining part of the paper is structured as follows. In Section 2, the relevant literature review is presented. The postulated statistical model to estimate the spatiotemporal distribution of the total demand is described in Section 3. In Section 4, the approach is tested on real data from a service in a major city in Switzerland, and Section 5 summarizes the work together with the conclusions.

2. Literature Review

The last 10 years of research have seen great growth in interest in vehicle sharing [7, 8], but many questions still need to be addressed. The main reason for these unanswered questions is that vehicle sharing still lacks flexible service strategies that can maintain a high level of service and guarantee long-term profitability. In this section, we summarize the literature on demand forecasting methods for vehicle-sharing services.

This section is organized according to five approaches. First, research that combines demand forecasting with relocation strategies is discussed. Then, studies using multiagent simulation tools are described, followed by studies based on the stated preferences of users. Afterwards, we focus on studies that use statistical models to estimate demand, and finally, selected research using neural networks and techniques for censored data is presented. The section ends with the identification of the research gap that we try to fill with this work.

2.1. Demand Estimation Combined with Relocation Strategies

To be successful, carsharing requires the availability of resources (in terms of available vehicles and available parking spots) in the proximity of the desired origin and destination of a trip to keep the service attractive [9]. As introduced earlier, our approach yields an estimate of the loss of demand that could be used as an input for a free-floating user-based relocation system, i.e., a solution that rebalances the random spatial distribution of cars by steering them towards the expected demand. The relocation problem is the most studied problem on the supply side of carsharing and has its first references in [10, 11]. Further information about user-based relocation approaches can be found in [12–16].

Some papers include both a relocation strategy and its relative demand estimation methodology. Wang et al. [17], for example, adapt a model of logistical inventory management to forecast demand and relocate vehicles in a one-way carsharing system. This work focuses on the specific forecasting method. Cucu et al. [18] check the balance between the stations, investigating them in relation to the time of departure, the day of the week, the weather conditions, and the traffic conditions associated with their addresses. Stokkink and Geroliminis [9] develop a user-based vehicle relocation approach through the incentivization of customers and a predictive model for the state of the system based on Markov chains, following the concepts of the previous work of Repoux et al. [19]. This approach is specifically designed for one-way station-based carsharing systems, which are different from our free-floating case. In this approach, the input demand for the Markov chains is computed using the approximation method described by Raviv and Kolka [20]. This method assumes that the arrival processes of renters and returners are nonhomogeneous Poisson processes and estimates the rates using an approximation of a user-defined function. Jian et al. [21] develop a discrete choice model that includes vehicle availability as a parameter that directly affects the user's mode choice. In this way, supply and demand are strictly linked together. The model aims to determine the optimal relocation decisions to maximize the carsharing profit. The decision variables are, for each time step, (i) the number of vehicles relocated between each pair of nodes, (ii) the number of vehicles available at each node, (iii) the number of users booking one-way trips between each pair of nodes, and (iv) the number of users booking round trips between each pair of nodes. This work has an interesting supply-demand focus, but it is not applied to free-floating carsharing, and as input, it also needs the total travel demand of each origin-destination pair at each time step (sum of demand from carsharing and other transport services).

2.2. Activity-Based Multiagent Simulations

A large share of the current platforms used for demand estimation of new one-way carsharing systems is based on activity-based multiagent simulations, i.e., microscale computational models for simulating the actions and interactions of autonomous agents [22] that allow modelling the interaction of supply and demand. For station-based operators, Benarbia et al. [23] propose an agent-based relocation strategy based on real-time inventory control within the framework of generalized stochastic Petri nets (PNs) and a discrete event simulation. The work of Balac et al. [24], using a multiagent simulation tool (MATSim), investigates the effects of supply on the demand of existing round-trip carsharing (also implemented for one-way station-based carsharing). The use of MATSim with relocation agents is described by Paschke et al. [25]. MATSim was also used by Ciari et al. [26] to estimate the demand for one-way carsharing in the urban area of Zurich. This type of solution needs complex inputs, such as the entire transport network (including public transit schedules) and population data, from which the daily plans of each user are generated. This methodology analyses in detail the convenience of carsharing for each user, shows the potential demand for this service, and is useful for analyzing the possible impact of new policies and new services. On the other hand, it does not allow analyzing the positioning of the vehicles with respect to the users who actually use the service. The analysis tool is therefore more complicated to set up than the one we propose, and the goal is slightly different. Furthermore, this type of demand estimation has not yet been carried out for free-floating carsharing.

2.3. Stated Preference Technique

Many of the initial studies aimed at understanding the potential demand of station-based carsharing in the urban modal split. For example, Catalano et al. [27] calibrate a modal split model using the stated preference (SP) technique. An overview of the literature up to 2013 can be found in Jorge and Correia [28]. Until that time, demand estimation had been developed almost exclusively for round-trip station-based carsharing. The stated preference technique is still used to capture behavioural patterns related to specific locations. Recent examples can be found in [29, 30]. Lately, Zhou et al. [31] have also adopted a stated preference methodology to elicit consumers’ valuation of vehicle self-driving capability, a factor rarely examined in the literature. Regression models indicate that latent demand for this new technology is associated with respondents’ travel patterns, demographics, values, lifestyles, and environmental concern [32]. Stated preference techniques provide choice data on some individuals, which can then be extrapolated to a larger scale of the same environment on the basis of a series of hypotheses. However, these methods require time to assess the service, they are not adaptable to territories with different characteristics, and they do not provide information on the latent demand related to vehicles’ positioning.

2.4. Regression Models

Descriptive statistics or regression models may also be used for demand estimation. For example, Wagner et al. [13] predict future demand for free-floating carsharing using neighbourhood data and point of interest (POI) data. The technique includes zero-inflated and geographically weighted regression models, from which they derive indicators for the area’s attractiveness. Willing et al. [33] extend that approach by additionally including time-of-day and weekday effects in the model. Within this family of methodologies, but without regression models, Gammelli et al. [34] predict shared mobility demand with a Gaussian process model whose censored likelihood function is capable of handling time-varying supply. Finally, Negahban [35] proposes a methodology that combines simulation, bootstrapping, and subset selection to estimate the true demand in a bike-sharing service. Among these approaches, only the last two take into account how supply and demand are interconnected. Compared to the neural network approaches discussed in the next subsection, they may be less accurate in predicting future demand but are easier to apply in different scenarios.

2.5. Demand Forecasting Using Neural Networks

Neural networks can provide accurate forecasts for future demand but require a large number of parameters and context validation to be set up. Furthermore, loss of demand due to the unavailability of cars is rarely taken into account by practical implementations. Wang et al. [36] study relevant indicators affecting the demand for a carsharing service at the operational level and construct a microdemand forecasting model for one-way electric carsharing systems, combining long short-term memory (LSTM) networks with the Granger causality test [37]. Yu et al. [38] propose a new approach based on deep learning techniques to assess the operation of a station-based carsharing system. They employ an LSTM structure to forecast short-term future vehicle use. Alencar et al. [39] evaluate seven state-of-the-art forecasting models on a given free-floating carsharing service, highlighting the potential of each technique. The assessed models include ARIMA and SARIMA, Prophet, variants of boosting algorithms, and LSTM. Guidon et al. [40] apply a Cox proportional-hazards model and random survival forests to a free-floating e-bike-sharing system, using locational characteristics, weather, day of the week, and bikes in the vicinity to predict the time to pickup for each zone. Huttel et al. [41], instead, address the problem of censored mobility demand and propose to estimate the entire distribution of latent mobility demand via multioutput censored quantile regression neural networks. These methods try to model reality but need parameterization and a context-specific initial study.

2.6. Supply-Demand Interaction in the Analyzed Literature

Many of the abovementioned studies ignore situations where there are not enough vehicles available, in which case a part of the demand is lost. Vehicle availability is a key factor to attract new users for a carsharing service, and for this reason, low availability can limit the creation of new demand [42]. Demand for carsharing is difficult to model since the availability of vehicles is intrinsically dependent on the number of trips and vice versa [28]. The supply-demand interaction for shared cars is illustrated by Li et al. [43], who analyze a free-floating carsharing system in a dynamic user equilibrium model. Among the abovementioned studies, Stokkink and Geroliminis [9] and Repoux et al. [19] focus on the loss of demand, while Raviv and Kolka [20], Jian et al. [21], Balac et al. [24], and Ciari et al. [26] focus on how supply influences demand, going a step beyond the simple supply-demand balance. Finally, Negahban [35], Gammelli et al. [34], and Huttel et al. [41] (together with our work) model the relationship between supply and demand by taking into consideration the influence of supply on demand: they treat the number of pickups (the observed demand) as a censored measurement of the total demand (the number of pickups plus the loss of demand due to unavailability of cars).

2.7. Related Research Conclusions

Data detail, accessibility and reliability, high computational time, calibration, and validation remain major challenges for travel demand estimation for carsharing systems [22]. Local characteristics make it difficult to standardize many of the listed methods. For this reason, we tried to develop a method that is easy to apply and flexible while remaining reliable. This flexibility allows future integration with an origin-destination commuting matrix (including the estimate of its carsharing modal split). Among the listed methods, some are more easily transferable than others, but we believe that ours is easier to reapply while still producing convincing results. This transferability does not limit high-detail spatial and temporal analysis. Finally, this is also one of the first carsharing demand estimation methods suited for a free-floating service.

Carsharing with shared autonomous vehicles can provide the combined benefits of autonomous driving technology and access-based consumption [44]. The advent of self-driving vehicles will address carsharing’s problems related to parking and noncompetitive access times. Solving these problems will make carsharing almost equivalent to an automated-vehicle taxi service. In these future scenarios, users will not need to walk to pick up a car that is parked far from them, because the car will come to their position. This means that the latent demand connected to vehicles’ positioning (related to accessibility) will be highly reduced, but it will still remain important to properly distribute cars following the expected demand, to further reduce the time between the request and the start of the trip.

The problem that we reviewed does not have a universally optimal solution methodology, but a series of parallel methodologies, to be used on the basis of the characteristics and constraints of the analyzed service, the availability of data, and the granularity and logic of the desired outputs. The methodology that we are presenting, compared to the state of the art, brings together different characteristics and allows making demand forecasts with high resolution on free-floating carsharing services, using little input data. When total origin-destination demand data are available, some of the analyzed works, such as the study by Jian et al. [21], could also be used to complement our method and refine our output. To summarise the gap that we want to fill, our methodology for carsharing total-demand estimation offers the following four strengths: (1) suitability (and application) of the model on a free-floating carsharing system, (2) high temporal and spatial resolution, (3) transferability due to a low amount of local-geography-related inputs, and (4) computation of the loss of demand, given a certain supply configuration. As far as we know, no previous study has used a methodology that addresses all four of these targets in parallel. We believe that this combination of strengths makes the proposed methodology a valuable tool for transportation research, especially for contexts that require an agile application and do not allow for time-consuming data preparation.

3. Space-Time Model

We postulate and implement a statistical model to estimate the total demand for cars of a free-floating carsharing system at a given time and location within a service region. For this, the service region is divided into a grid of disjoint, bounded, and equally sized cells $i = 1, \ldots, n$, and time is divided into discrete, consecutive, and equally long intervals $t = 1, \ldots, T$. The index $i$ refers to a two-dimensional square-shaped cell defined by its center coordinates and a common side length (e.g., 250 m), and the index $t$ to the interval defined by its time stamp (e.g., June 16, 2022, 08:00:00) and the common interval length $\Delta$ (e.g., 1 hour). For illustration, Figure 1 shows a discretization of a major city in Switzerland into 222 adjacent cells with 500 m side length.
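
The discretization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of projected coordinates in metres, the grid origin, and the row-wise, 1-based cell numbering are all hypothetical choices and are not taken from the paper.

```python
from datetime import datetime, timedelta
import math


def cell_index(x_m, y_m, x0_m, y0_m, side_m, n_cols):
    """Map projected coordinates (metres) to a 1-based, row-wise cell index.

    (x0_m, y0_m) is the assumed lower-left corner of the grid, side_m the
    common side length of the square cells. No bounds check is performed.
    """
    col = math.floor((x_m - x0_m) / side_m)
    row = math.floor((y_m - y0_m) / side_m)
    return row * n_cols + col + 1


def interval_index(ts, t0, length):
    """Map a timestamp to a 1-based interval index.

    t0 is the start of the first interval, length the common interval
    length as a timedelta (e.g., one hour).
    """
    return (ts - t0) // length + 1


# Example: a pickup 620 m east and 130 m north of the grid origin,
# with 250 m cells and 5 columns, observed at 08:30.
i = cell_index(620.0, 130.0, 0.0, 0.0, 250.0, 5)
t = interval_index(datetime(2022, 6, 16, 8, 30),
                   datetime(2022, 6, 16, 0, 0),
                   timedelta(hours=1))
print(i, t)
```

Each booking record can be tagged this way with a pair of cell and interval indices before pickups and available cars are counted per cell and interval.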

Square-shaped cells are chosen for simplicity and because they can be scaled up easily to any region. The model, however, can use cells of any other shape, such as hexagons or shapes better adapted to the service region. The size of the cells should be chosen just small enough to allow precise conclusions about the spatial distribution of the total demand, but not smaller, as smaller cell sizes increase the number of parameters and therewith the expected computational complexity. For square-shaped cells, we recommend setting the side length between 50 m and 500 m. The duration of the intervals, $\Delta$, should be chosen as large as possible under the condition that the total demand can be assumed to be constant within each interval, which is an implicit model assumption. Setting $\Delta$ to 1 hour, as in our case study of Section 4, may be a good rule of thumb.

The variable of interest is the total demand for booking a free-floating vehicle. Let $D_{it}$ be a random variable for the number of users considering booking a car at cell $i$ and time interval $t$. We assume that $D_{it}$ is Poisson distributed with rate $\lambda_{it}$, so that $D_{it}$ has the density $P(D_{it} = k) = \lambda_{it}^{k} e^{-\lambda_{it}} / k!$ for $k = 0, 1, 2, \ldots$ and consequently the expectation $E(D_{it}) = \lambda_{it}$. Note that, throughout this document, random variables are denoted with capital letters and associated observations with small letters.

The total demand is not directly observable because, when vehicles are not available, the system does not register any information about users looking for a vehicle. We try to estimate the abovementioned rate $\lambda_{it}$ based on the number of cars available for rent, denoted by $c_{it}$, and the number of pickups observed, denoted by $y_{it}$, in the proximity of cell $i$ at interval $t$. The model accounts for four situations and is built sequentially: first, we formulate a model that predicts pickups of available cars by linear combinations of the total demand rates $\lambda_{it}$. Second, we extend that model by allowing the rates to vary across time. Third, we take into account situations where the number of available cars was potentially insufficient to satisfy the total demand. Fourth, a smoothing approach is proposed to consider that neighboring cells are expected to have a similar total demand and to simplify the parameter estimation.

To build up the model, we first discuss specific small case examples to outline the logic of how the model spatially links the total demand rates with the observable pickups, and how the model can deal with time-varying total demand. Afterwards, the model is generalized to any grid and extended to situations where not enough cars are available.

3.1. Spatially Linking the Total Demand and Pickups

A picked car at cell $i$ and interval $t$ (that is, $y_{it} \geq 1$) does not necessarily imply that the demand originates from cell $i$. It is also possible that the user would have preferred to pick up the car from an adjacent cell, where there was no car available at the time. To model the number of pickups $Y_{it}$ as a function of the total demand rates $\lambda_{it}$, we assume that users have a preferred pickup cell (i.e., the origin of the demand) but choose with equal probability any car standing in a cell not further away than a radius $r$ (e.g., 500 m) from that origin cell, as measured by the distance between the centers of the cells. If no car is close enough, the demand gets lost. Note that $r$ becomes operative only if the center-to-center distances of neighboring cells are smaller than $r$; otherwise, a pickup is simply linked to the demand from the same cell.

The assumption is used for its simplicity, while the following model can accept other assumptions better suited for the considered problem. For example, $r$ may be set to different values across sections of the service region. The prerequisite for an alternative assumption is that it defines for each cell $i$ a corresponding set of cells that could be the origin of demand for a pickup from cell $i$. In practice, $r$ is generally unknown and may be determined by using the rule of thumb of Seign and Bogenberger [45] of about 300−500 m, by conducting a survey, or by choosing $r$ such that a goodness-of-fit measure (such as the likelihood criterion) is optimized.

Figure 2 shows a square grid with 5 × 5 cells in which cars (one or more) are available only at cell 18. As an example, we arbitrarily set $r$ such that a picked up car can be assigned to a demand from cell 18 or the cells around it. We call the corresponding set of cells, which is highlighted in yellow in Figure 2, the demand area for cars in cell 18. If we assume temporarily that the rate parameters do not vary over time (i.e., $\lambda_{it} = \lambda_i$) and that the number of cars available in cell 18 is higher than necessary to satisfy the total demand for cars (i.e., $y_{18,t} < c_{18,t}$), then the expected number of pickups from cell 18 is equal to the sum of the expected total demand of the individual cells:

\[ E(Y_{18,t}) = \lambda_{12} + \lambda_{13} + \lambda_{14} + \lambda_{17} + \lambda_{18} + \lambda_{19} + \lambda_{22} + \lambda_{23} + \lambda_{24}. \tag{1} \]
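
The construction of a demand area can be illustrated with a short Python sketch. It assumes, hypothetically, that cells are numbered row-wise starting at 1, that distances are measured in units of the cell side length, and that the radius is 1.5 side lengths so that diagonal neighbors are included; none of these choices is prescribed by the text.

```python
import math


def cell_center(cell, n_cols, side=1.0):
    """Center of a 1-based, row-wise numbered square cell."""
    row, col = divmod(cell - 1, n_cols)
    return ((col + 0.5) * side, (row + 0.5) * side)


def demand_area(cell, n_rows, n_cols, r, side=1.0):
    """All cells whose center lies within distance r of the given cell's center."""
    cx, cy = cell_center(cell, n_cols, side)
    area = []
    for j in range(1, n_rows * n_cols + 1):
        jx, jy = cell_center(j, n_cols, side)
        if math.hypot(jx - cx, jy - cy) <= r:
            area.append(j)
    return area


# Demand area of cell 18 in a 5 x 5 grid with r = 1.5 side lengths.
print(demand_area(18, 5, 5, r=1.5))
```

Under these assumptions, the demand area of cell 18 consists of cell 18 and its eight surrounding cells, which is the situation the example describes.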

Since the pickups from cell 18 are in this situation a sum of independent Poisson distributed demands, $Y_{18,t}$ is also Poisson distributed with parameter $\lambda_{12} + \lambda_{13} + \lambda_{14} + \lambda_{17} + \lambda_{18} + \lambda_{19} + \lambda_{22} + \lambda_{23} + \lambda_{24}$ (see e.g., [46] Exercise 4.40). Given that the assumption holds, this equation collects all relevant information for estimating the parameters $\lambda_j$ for the situation shown in Figure 2. The demand of cells further away than $r$ from cell 18 cannot be served because there is no car in proximity, and therefore, the situation does not provide information on the corresponding parameters $\lambda_j$. In order to get estimates for each of the parameters $\lambda_j$, we need data from various moments in time such that each cell is part of the demand area of a standing car at least once. If a cell is never part of a demand area, the corresponding parameter cannot be identified.

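In reality, users might have more than one vehicle in their proximity. For example, Figure 3 presents a situation where two cars are available, one at cell 9 and another at cell 18. Here, the demand areas around cells 9 and 18 overlap at cells 13 and 14. To handle such situations, the total demand rates of cells that have more than one vehicle within reach (i.e., closer than distance $r$) may be split in half, such that each of the two cells with vehicles obtains half of the total demand rates. This results in the following two equations for the expected number of pickups:

\[ \begin{aligned} E(Y_{9,t}) &= \lambda_{3} + \lambda_{4} + \lambda_{5} + \lambda_{8} + \lambda_{9} + \lambda_{10} + \tfrac{1}{2}\lambda_{13} + \tfrac{1}{2}\lambda_{14} + \lambda_{15}, \\ E(Y_{18,t}) &= \lambda_{12} + \tfrac{1}{2}\lambda_{13} + \tfrac{1}{2}\lambda_{14} + \lambda_{17} + \lambda_{18} + \lambda_{19} + \lambda_{22} + \lambda_{23} + \lambda_{24}. \end{aligned} \tag{2} \]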

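Alternative rules for dividing the cells in intersections of demand areas could be considered. For example, if we believe that users always pick up the closest car, then we would assign $\lambda_{13}$ to the cars of cell 18 and $\lambda_{14}$ to the cars of cell 9. Therewith,

\[ \begin{aligned} E(Y_{9,t}) &= \lambda_{3} + \lambda_{4} + \lambda_{5} + \lambda_{8} + \lambda_{9} + \lambda_{10} + \lambda_{14} + \lambda_{15}, \\ E(Y_{18,t}) &= \lambda_{12} + \lambda_{13} + \lambda_{17} + \lambda_{18} + \lambda_{19} + \lambda_{22} + \lambda_{23} + \lambda_{24}. \end{aligned} \tag{3} \]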

It is not always possible to fully separate the demand areas with the closest car rule. If, for example, the car of cell 18 in Figure 3 is moved to cell 17, then cell 13 has the same distance to both cars. In these cases we split cell 13 in half as in equation (2).
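
The two splitting rules discussed above (equal split and closest car with ties split equally) can be sketched as follows. The function names, the row-wise cell numbering, and the radius of 1.5 side lengths are illustrative assumptions; the returned dictionary is merely one convenient way to hold the shares of each origin cell's demand rate that are assigned to each cell with cars.

```python
import math


def center(cell, n_cols):
    """Center of a 1-based, row-wise numbered unit cell."""
    row, col = divmod(cell - 1, n_cols)
    return (col + 0.5, row + 0.5)


def dist(a, b, n_cols):
    (ax, ay), (bx, by) = center(a, n_cols), center(b, n_cols)
    return math.hypot(ax - bx, ay - by)


def shares(car_cells, n_rows, n_cols, r, rule="equal"):
    """Return {car_cell: {origin_cell: share}} for one time interval.

    rule="equal":   demand of an origin cell is split equally among all
                    cells with cars within distance r.
    rule="closest": demand goes to the closest cell with cars; exact ties
                    are split equally.
    """
    out = {i: {} for i in car_cells}
    for j in range(1, n_rows * n_cols + 1):
        d = {i: dist(i, j, n_cols) for i in car_cells}
        reachable = [i for i in car_cells if d[i] <= r]
        if not reachable:
            continue  # demand of cell j is lost and carries no information
        if rule == "closest":
            d_min = min(d[i] for i in reachable)
            reachable = [i for i in reachable if math.isclose(d[i], d_min)]
        for i in reachable:
            out[i][j] = 1.0 / len(reachable)
    return out


equal = shares([9, 18], 5, 5, r=1.5, rule="equal")
closest = shares([9, 18], 5, 5, r=1.5, rule="closest")
tie = shares([9, 17], 5, 5, r=1.5, rule="closest")
print(equal[9][13], closest[18].get(13), tie[9][13])
```

With cars in cells 9 and 18, the equal rule gives each of them half of the demand of cells 13 and 14, while the closest-car rule assigns cell 13 to cell 18 and cell 14 to cell 9. Moving the car from cell 18 to cell 17 produces the tie at cell 13 described in the text.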

The second step of the model accounts for total demand variation across time. For example, total demand may change between mornings and evenings, weekdays and weekends, or seasons. To account for these variations, we adapt the equations from above with further parameters. For example, suppose that the rates vary between weekdays (Monday–Friday) and weekends (Saturday-Sunday), so that the total demand rate of cell $i$ takes one value on weekdays and another on weekends. Let $w_t$ be an indicator with value 1 if time interval $t$ corresponds to a weekend, and 0 otherwise. Equation (2) for pickups of cars from cell 9 is then extended by an additional weekend effect that enters through $w_t$, which gives equation (4).
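
As a purely illustrative worked form of such an extension, assume a single additive weekend shift $\gamma$ that is shared by all cells. This parametrization and the symbol $\gamma$ as used here are assumptions chosen for concreteness and may differ from the specification actually used by the authors. Because the shares assigned to cell 9 under the equal-split rule sum to $7 + 2 \cdot \tfrac{1}{2} = 8$, the weekend shift then enters with the factor 8:

```latex
\[
\lambda_{it} = \lambda_i + \gamma\, w_t ,
\qquad
E(Y_{9,t}) = \lambda_{3} + \lambda_{4} + \lambda_{5} + \lambda_{8} + \lambda_{9} + \lambda_{10}
           + \tfrac{1}{2}\lambda_{13} + \tfrac{1}{2}\lambda_{14} + \lambda_{15}
           + 8\,\gamma\, w_t .
\]
```

The point of the example is only that the expected pickups remain linear in the unknown parameters once a time indicator is added.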

3.2. Basic Model

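Equation (4) refers to a specific situation for the considered 5 × 5 grid and a simple specification for time effects. For general situations, grids, and specifications for time effects, we relate the total demand for available cars in some cell $i$ and interval $t$, i.e., the expected number of pickups $\mu_{it} = E(Y_{it})$, to a linear combination of the expected total demand $\beta_j$ of the individual cells $j$ at a chosen reference time interval and a linear combination of further parameters $\gamma_l$ multiplied with time-related variables $z_{itl}$, as follows:

\[ \mu_{it} = \sum_{j=1}^{n} x_{itj}\, \beta_j + \sum_{l=1}^{p} z_{itl}\, \gamma_l = x_{it}^{\top} \beta + z_{it}^{\top} \gamma. \tag{5} \]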

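Elements $x_{itj}$ indicate the share of the total demand rate of cell $j$ that is assigned to cars standing in cell $i$. They take values between 0 and 1, and for every cell $j$ within reach of at least one car, the sum over all cells with vehicles must be 1 (i.e., $\sum_{i} x_{itj} = 1$). The linear predictor on the right of equation (5) has two components: $x_{it}$ and $\beta$ are design and parameter vectors of length $n$ (number of cells) to predict the expected number of pickups of cars from cell $i$ for some reference time interval, and $z_{it}$ and $\gamma$ are design and parameter vectors of length $p$ that take into account time effects relative to the reference time. For example, expressing equation (4) with equation (5) yields a design vector $x_{9,t}$ with entries 1 for cells 3, 4, 5, 8, 9, 10, and 15, entries 1/2 for cells 13 and 14, and 0 otherwise, together with a $z_{9,t}$ that is determined by the weekend indicator.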

The proposed model is quite flexible regarding how total demand is spatially distributed and which time effects are taken into account. The main restriction is that the dependencies between the expected pickups and the possibly time-varying total demand rates have to be linear in the unknown parameters. The linearity restriction is not as limiting as it might appear at first sight. Nonlinear evolution over time may be modeled using dummy variables or polynomials, which are still linear in their parameters. Section 4 presents a case study with real data to provide a hands-on specification.

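If we assume that the number of cars is always sufficient to satisfy the total demand of the corresponding demand areas, then the parameters $\beta$ and $\gamma$ can be estimated using tools for Poisson regression models, such as maximum likelihood estimation. For some cell $i$ with cars at interval $t$, the probability of the observed number of pickups $y_{it}$ is as follows:

\[ P(Y_{it} = y_{it}) = \frac{\mu_{it}^{\,y_{it}}\, e^{-\mu_{it}}}{y_{it}!}, \qquad \mu_{it} = x_{it}^{\top} \beta + z_{it}^{\top} \gamma, \tag{6} \]

and $\beta$ and $\gamma$ can be estimated by maximizing the product of these probabilities over all cells $i$ and intervals $t$ with $c_{it} > 0$, with $c_{it}$ the number of cars available for booking in cell $i$ and interval $t$.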

The assumption that the number of cars available is always sufficient to satisfy the total demand from the corresponding demand areas may be realistic if the interval length $\Delta$ is chosen small enough to not expect more than one pickup within the time intervals. In general, however, this assumption does not hold, e.g., when two users want to book the same and only car available at practically the same time. Moreover, bypassing the assumption by decreasing $\Delta$ blows up the data volume and therewith increases the already considerable computational effort for estimation even more.

Using a Poisson model for right-censored data [47] allows us to account for situations where the total demand possibly exceeded the number of available cars. The censored Poisson model assigns different probabilities depending on whether the number of pickups is smaller than or equal to the number of available cars. In the first case, we assume that the demand was fully satisfied and compute its probability using the Poisson density function. In the second case, where the number of picked cars equals the number of available cars, we assume that the number of picked cars is right censored, i.e., it could have been larger if more cars had been available. Therefore, we compute its probability as the upper tail of the Poisson distribution, summing the Poisson density from the number of picked cars to infinity. Expressed mathematically,

\[ P(Y_{it} = y_{it}) = \begin{cases} \dfrac{\mu_{it}^{\,y_{it}}\, e^{-\mu_{it}}}{y_{it}!} & \text{if } y_{it} < c_{it}, \\[2ex] \displaystyle\sum_{k = y_{it}}^{\infty} \dfrac{\mu_{it}^{\,k}\, e^{-\mu_{it}}}{k!} & \text{if } y_{it} = c_{it}. \end{cases} \tag{7} \]

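Estimates for $\beta$ and $\gamma$ can be obtained by maximizing the log-likelihood,

\[ \ell(\beta, \gamma) = \sum_{(i,t):\, c_{it} > 0} \log P(Y_{it} = y_{it}), \tag{8} \]

where the probabilities are those of the censored Poisson model.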

Since no closed-form solution exists to maximize equation (8), we developed a gradient-based implementation in R [48] based on the optimizer function nlminb() [49]. The implementation allows the parameters $\beta$ (and possibly $\gamma$) to be log-transformed to avoid negative estimates for rate parameters and automatically drops parameters associated with cells that were never in a demand area and therefore cannot be assessed. The estimating equations and the developed R functions are available on request.
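
To make the censored likelihood concrete, the following Python sketch evaluates the log-likelihood of a right-censored Poisson model and fits a single constant rate by a crude grid search. It is not the authors' R implementation: the function names, the toy data, and the grid search (used here instead of a gradient-based optimizer to keep the example dependency-free) are assumptions made for illustration only.

```python
import math


def poisson_logpmf(y, mu):
    """log P(Y = y) for Y ~ Poisson(mu)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def poisson_log_upper_tail(y, mu):
    """log P(Y >= y) for Y ~ Poisson(mu), computed as 1 - P(Y <= y - 1)."""
    if y == 0:
        return 0.0
    cdf = sum(math.exp(poisson_logpmf(k, mu)) for k in range(y))
    return math.log(max(1.0 - cdf, 1e-300))  # guard against log(0)


def censored_loglik(y, c, mu):
    """Log-likelihood of pickups y given available cars c and rates mu.

    Observations with y < c are treated as exact counts; observations with
    y == c are treated as right censored. Entries with c == 0 are skipped.
    """
    total = 0.0
    for y_i, c_i, mu_i in zip(y, c, mu):
        if c_i <= 0:
            continue
        if y_i < c_i:
            total += poisson_logpmf(y_i, mu_i)
        else:
            total += poisson_log_upper_tail(y_i, mu_i)
    return total


def fit_constant_rate(y, c, censored=True):
    """Grid-search estimate of one common rate (toy stand-in for an optimizer)."""
    grid = [k / 100 for k in range(1, 501)]

    def objective(mu):
        if censored:
            return censored_loglik(y, c, [mu] * len(y))
        return sum(poisson_logpmf(y_i, mu) for y_i in y)

    return max(grid, key=objective)


# Toy data: in two of four observations all available cars were picked up.
y = [1, 0, 2, 1]
c = [1, 3, 2, 4]
print(fit_constant_rate(y, c, censored=False), fit_constant_rate(y, c, censored=True))
```

In this toy example the censored fit yields a higher rate than the uncensored Poisson fit, which reflects the demand that may have been lost when all available cars were taken.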

3.3. Smoothing

The postulated model does not assume any relationship between total demand rates from adjacent cells. In general, it is reasonable to think that adjacent cells might have similar rates, or that the spatial distribution of these parameters should change smoothly across the service region. Only in particular cases, like geographical circumstances (e.g., a river) or other demand singularities (e.g., location of a big demand attractor), the total demand rate parameters might experience an abrupt spatial change.

We propose to use a kernel smoothing approach (e.g., [50] Chapter 6) to construct dependencies between the total demand rate parameters. The idea is to estimate “pseudo” total demand rates for chosen supporting points, and calculate the total demand rates of the cells as weighted sums of these , that is,

Figure 4 shows, as an example, the use of nine supporting points for a grid with 5 × 5 cells. The supporting points are located at the edges of the cells, although this is not a requirement.

Any kernel function can be used to compute the weights , such as the Epanechnikov [51] or the Gaussian kernel. We propose an implementation that uses higher weights if a supporting point is closer to cell : let be the Euclidean distance between the center coordinates of cell and the supporting point . Using the standard Gaussian kernel, the are computed as follows:

Now, we can rewrite the linear predictor of our model to the following equation:which is again linear in its parameters and and can be estimated with the previously described tools.
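The kernel smoothing step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the weights are taken as the standard Gaussian density of the Euclidean distance divided by a hypothetical `bandwidth` parameter, no normalization of the weights is applied, and the grid layout and all names are chosen for the example only.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian density evaluated at u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_weights(cell_centers, support_points, bandwidth=1.0):
    """Weight matrix: one row per cell, one column per supporting point."""
    weights = []
    for cx, cy in cell_centers:
        row = []
        for sx, sy in support_points:
            dist = math.hypot(cx - sx, cy - sy)  # Euclidean distance
            row.append(gaussian_kernel(dist / bandwidth))
        weights.append(row)
    return weights

def smoothed_rates(weights, pseudo_rates):
    """Cell rates as weighted sums of the pseudo rates at the supporting points."""
    return [sum(w * g for w, g in zip(row, pseudo_rates)) for row in weights]

# Assumed layout: 5 x 5 cells of side 0.2 and 3 x 3 supporting points.
cells = [(0.1 + 0.2 * i, 0.1 + 0.2 * j) for j in range(5) for i in range(5)]
supports = [(x, y) for y in (0.0, 0.5, 1.0) for x in (0.0, 0.5, 1.0)]
W = kernel_weights(cells, supports, bandwidth=0.3)
rates = smoothed_rates(W, [1.0] * 9)
```

Because the cell rates are linear in the pseudo rates, the weight matrix can be multiplied into the design vectors once, before estimation, so the same estimation routine can be reused with fewer parameters.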

The simulation studies in the Appendix show that estimates for from the smoothing approach can have lower variance than those from the original model, due to the smaller number of unknown parameters involved. The downsides of this are potential biases for the s, which can also be seen in the Appendix.

The smoothing approach involves specifying the location and the number of supporting points. An equally spaced grid is used most frequently, for simplicity. More supporting points allow capturing demand distributions with finer structures [52] but increase the wiggliness of the estimated surface and the computational effort. A practical implementation is provided in Section 4.

4. Case Study

We consider data provided by the Swiss commercial company Mobility (https://www.mobility.ch) from their so-called Mobility-Go free-floating carsharing service in a major city of Switzerland during 2021, where the service was operated with about 128 cars. The raw data consisted of 28,682 records on individual rentals without service trips and included information on the vehicle number, coordinates, and time stamps of the pickup and dropoff. To discretize space and time, each record was assigned to one of 747 cells of 250 m side length, based on the pickup coordinates, and to one of 8,759 hourly intervals (e.g., June 16, 8 to 9 o’clock) based on the pickup time. The number of cars available for an interval was computed as the number of cars at the beginning of that interval plus the number of cars dropped off during the interval.
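The discretization described above might look as follows in Python. This is a sketch only: it assumes that coordinates are already projected to meters relative to a hypothetical grid origin, and the function names and data layout are not taken from the original study.

```python
from collections import defaultdict
from datetime import datetime

CELL_SIZE_M = 250.0  # side length of a grid cell in meters

def cell_of(x_m, y_m, x0=0.0, y0=0.0):
    """Grid cell (column, row) of a point given in projected meters."""
    return (int((x_m - x0) // CELL_SIZE_M), int((y_m - y0) // CELL_SIZE_M))

def hour_of(ts):
    """Start of the hourly interval that contains the time stamp."""
    return ts.replace(minute=0, second=0, microsecond=0)

def available_cars(parked_at_start, dropoffs):
    """Cars available per (cell, hour): cars parked at the start of the
    interval plus cars dropped off during the interval."""
    counts = defaultdict(int, parked_at_start)
    for x_m, y_m, ts in dropoffs:
        counts[(cell_of(x_m, y_m), hour_of(ts))] += 1
    return dict(counts)

# Hypothetical example: two cars parked at 8:00, one dropoff at 8:42.
slot = ((1, 0), datetime(2021, 6, 16, 8))
avail = available_cars({slot: 2}, [(260.0, 10.0, datetime(2021, 6, 16, 8, 42))])
```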

4.1. Descriptive Analysis

To provide an overview of the data used, we divided Basel into nine equally sized districts, arranged according to the cardinal directions from the center of the city. Figure 5 shows the average number of pickups per hour by time of day, weekday, and month for each district, together with the average of the nine districts.

The time-of-day profile shows a classical temporal demand pattern with low demand during the night (below 0.2 pickups per hour (PU/h) per district), a first strong increase in the morning between 7:00 and 9:00, and a moderate continuous increase until the daily peak at 19:30. The values are higher in the afternoon because the graph also includes nonworking days. If we only consider working days, this daytime profile is much more balanced between morning and afternoon.

Regarding the days of the week, we can see that, as expected, Saturday is the busiest day with about 0.45 pickups per hour and district, followed by Friday and Sunday. For workdays, we see a slight growth from Monday to Thursday, while we could have expected a flat profile for these days. Across the months, we see a tendency toward higher values in cold months and lower values during summer. The relative maximum in May is not self-explanatory, but it is important to remember that little can be inferred from this graph, because it shows a pattern observed only once and not a pattern that was repeated many times, such as the daily profile. The pattern across the months could have been influenced by factors such as pricing, policies, information campaigns, or the COVID-19 pandemic that was still relevant in 2021.

The nine districts into which we divided the city differ in some cases from the average profile, both in terms of frequency and in terms of profile shape. Readers can examine Figure 5 to understand differences between the districts.

As mentioned above (Section 2), some relocation models (classified as nonpredictive) rely on few indicators to characterize the demand. Reiss and Bogenberger [53] used three indicators for each district to detect the attractiveness of a district: demand factor, origin-destination factor, and idle times. Here, we check the balance between supply and demand in the nine districts by using their demand factor, which is defined as the ratio between rentals and vehicles in a district.

In Table 1, the demand factor can be used as a parameter to evaluate where relocation should be carried out (from districts with many cars and a low demand factor to districts with a high demand factor). We note that the south-west and north-west districts have higher availabilities of cars than the center, despite having a considerably lower number of pickups. Candidates for receiving cars through relocation are the center and the north districts, which exhibit the highest ratios between pickups and car availability (84% and 77%, respectively). In contrast, the south-east district has, on average, twice as many cars available as pickups and is therefore the best candidate for giving up cars for relocation. Here, the demand factor is applied on a large scale but could give further information if applied to smaller districts. Similar conclusions on relocation can be obtained from the proposed modelling approach applied in the subsequent analyses, which uses a much more detailed spatial resolution by design.
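As a small illustration of how the demand factor ranks districts for relocation, consider the following Python sketch. The district names and numbers are invented for the example and are not taken from Table 1; the function name is hypothetical.

```python
def demand_factor(pickups, cars_available):
    """Ratio between rentals and available vehicles in a district."""
    if cars_available <= 0:
        raise ValueError("demand factor is undefined without available cars")
    return pickups / cars_available

# Hypothetical averages per hour: (pickups, cars available).
districts = {"A": (0.40, 0.50), "B": (0.10, 0.25), "C": (0.30, 0.40)}

factors = {name: demand_factor(p, c) for name, (p, c) in districts.items()}

# Districts with the highest factor are candidates to receive cars,
# those with the lowest factor are candidates to give cars away.
ranking = sorted(factors, key=factors.get, reverse=True)
```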

4.2. Spatio-Temporal Model Specifications

In this section, a spatiotemporal model is estimated based on the previously described data. Following Seign and Bogenberger [45], who suggest that cars should be available within 300−500 m walking distance, we set the maximum distance to 500 m for all models. This implies that users book a car only if they find an available vehicle no further than 500 m from the preferred cell, as measured by the distance between the centers of the cells. Furthermore, we assumed the total demand rates to vary across months (Jan, Feb, …, Dec), weekdays (Mon, Tue, …, Sun), and day intervals (night: [0–6), morning: [6–12), afternoon: [12–18), and evening: [18–24)). Such time effects were implemented with dummy variables, where the reference categories (September, Monday, and night) are those with the lowest average number of pickups. Using the dummy variable specification allows the model to capture a nonlinear evolution over time, which can be identified from Figure 5.
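The dummy coding of the time effects can be sketched in Python as follows. The reference categories (September, Monday, and night) follow the text; the ordering of the dummies and the function name are assumptions made for this illustration.

```python
from datetime import datetime

# Non-reference categories; September, Monday, and night are the references.
MONTHS = [1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12]  # calendar months without 9
WEEKDAYS = [1, 2, 3, 4, 5, 6]                  # datetime.weekday(), Monday = 0
DAY_INTERVALS = [1, 2, 3]                      # hour // 6, night = 0

def time_dummies(ts):
    """0/1 vector with 11 month, 6 weekday, and 3 day-interval dummies."""
    interval = ts.hour // 6
    return (
        [1 if ts.month == m else 0 for m in MONTHS]
        + [1 if ts.weekday() == d else 0 for d in WEEKDAYS]
        + [1 if interval == k else 0 for k in DAY_INTERVALS]
    )

baseline = time_dummies(datetime(2021, 9, 6, 3))   # a Monday night in September
example = time_dummies(datetime(2021, 6, 16, 8))   # a Wednesday morning in June
```

The vector has 11 + 6 + 3 = 20 entries, and an interval in the reference categories yields a vector of zeros, so its rate equals the baseline rate of the cell.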

We implemented four specifications of the proposed model ( to ), which differ by how overlaps of demand areas are treated (cf. Section 3, Spatially linking the total demand and pickups), and whether or not smoothing (cf. Section 3, smoothing) was applied. The models and use overlapping demand areas (total demand rates from cells that have more than one vehicle within reach are distributed evenly across the intersecting demand areas), whereas and use the closest car demand areas (total demand rates are assigned, whenever possible, to the demand area of the closest car). Smoothing was applied only on the models and . A grid of 81 uniformly spread supporting points was used for smoothing, reducing the total number of unknown parameters from 767 to 101.

To estimate the models, corresponding design vectors and had to be prepared based on the stated assumptions for the demand areas around cars and time effects. Smoothing additionally required the preparation of the weights , see equation (11).

The number of unknown parameters is 767 for the basic models (747 spatial parameters, one for each cell, and 20 temporal parameters) and 101 for the smoothing models. For estimation, all parameters λ and were estimated on the log scale. This ensures that the estimated total demand rates are always larger than zero.

4.3. Spatial Distribution of the Total Demand Rates

Figure 6 shows the estimated coefficients, which refer to the estimated total demand rates (per hour) on a Monday in September 2021 from 0 to 6 o’clock AM. Most obvious is that the basic models result in a patchwork of estimated rates, while the smoothing models do not. Comparing the two basic models, it can be seen that overlapping demand areas result in more peaks than the closest car demand areas. This may be related to our findings from the simulation studies in the Appendix, according to which the total demand rate estimates of overlapping demand areas have higher variance, see Figure 7.

Figure 3 shows that the basic models may not be able to estimate the total demand rates for all cells, see the black squares in the north-east. This is because there was never a car available for rent within circle around these cells. Smoothing yields estimates for these cells; however, these estimates should be interpreted carefully since kernel smoothing approaches are known for boundary bias (e.g., [50] Sec. 6.1).

In terms of model fit, we found that the log-likelihood, the Akaike information criterion (AIC) [54], the root mean squared error (RMSE), and the mean absolute error (MAE) of the basic models are slightly superior to those of the smoothing models, see Table 2. While the superiority of the basic models regarding the log-likelihood, RMSE, and MAE was expected because the smoothing models are merely restricted submodels with fewer parameters, the superiority regarding the AIC indicates that the smoothing is too strong and should be improved, e.g., by adding supporting points or placing them more efficiently. Furthermore, the models and with overlapping demand areas perform insignificantly better than the corresponding models and with closest car demand areas regarding the log-likelihood, AIC, and RMSE, but insignificantly worse regarding the MAE.
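The role of the parameter count in the AIC comparison can be made explicit with a short Python sketch. The parameter counts (767 and 101) are those stated above; the log-likelihood values are hypothetical placeholders and not the values from Table 2.

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood (lower is better)."""
    return 2 * n_params - 2 * loglik

def min_loglik_gain(n_params_large, n_params_small):
    """Log-likelihood gain above which the larger model has the lower AIC."""
    return n_params_large - n_params_small

K_BASIC, K_SMOOTH = 767, 101
threshold = min_loglik_gain(K_BASIC, K_SMOOTH)

# Hypothetical log-likelihoods, for illustration only.
loglik_smooth = -50000.0
loglik_basic = loglik_smooth + threshold + 1.0
basic_preferred = aic(loglik_basic, K_BASIC) < aic(loglik_smooth, K_SMOOTH)
```

With these parameter counts, a basic model attains the lower AIC only if its log-likelihood exceeds that of the smoothing model by more than 666.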

Figure 8 shows the estimated total demand rates of the model with overlapping demand areas and smoothing, which is the best among the smoothing models according to the log-likelihood. Figure 8 is identical to the top right plot of Figure 6, but with a finer color scale to facilitate closer examination. The plot highlights two regions with higher total demand within the center of the service region, which can be attributed to regions close to the train station and the old town, and two local peaks in the north-east and south-east.

Figures 6 and 8 present the estimated distribution of the total demand rates for the reference time, a Monday in September 2021 from 0 to 6 o’clock AM. The time effects discussed in the following section allow total demand rates to vary over time, e.g., because the demand might vary across weekdays. However, because the considered models assume that time effects are constant across the whole service region, the estimated spatial distribution will not change and only the rates will be increased in every cell by the same factor.

4.4. Temporal Distribution of Total Demand Rates

Figure 9 shows the estimates of the four models regarding the three considered types of time effects. The shape of the coefficients over time is very similar across the four models. The plot of month effects on the top left reveals that the total demand was highest in January and that there was a temporary peak in May–June. We expect this pattern to be related to the COVID-19 pandemic and not to be repeated in 2022. The estimates for weekday and daytime effects are more in line with expectations: we find a clear peak in total demand on Saturdays, and higher total demand in the afternoon and evening than at night and in the morning.

Some coefficients almost reach the value zero, which is the lowest possible value because the coefficients are estimated on the log scale. This is especially notable for the month effects, where the estimates indicate that the total demand in July, August, October, and November was practically the same as in the reference month September. To find out whether those estimates close to 0 relate to convergence difficulties, we used different optimizer routines and applied a number of small model modifications, such as changing the reference categories and the side lengths of the cells. However, the optimizer routines reported convergence, and the model modifications neither eliminated the close-to-zero coefficients nor fundamentally changed the findings for the time effects. Moreover, the order of the estimated time effects is consistent with the results from the descriptive analyses, cf. Figure 9. For these reasons, we assume that the estimated coefficients are reliable.

4.5. Loss of Demand

The estimated models may be used to estimate the loss of demand. In line with our model, we distinguish between the following two types of loss of demand:

(1) No cars in proximity: In situations where there is no offer, the entire demand is lost. For some cell and time interval that is further away than from the nearest cell with cars, we estimate this type of loss of demand as , i.e., as the estimated total demand at the baseline setting (in our case: Mondays in September 2021, 0 to 6 o’clock AM) plus the estimated time effect for interval .

(2) Not enough cars in proximity: In situations where all available cars are picked up, loss of demand occurs because more cars could have been rented with a larger offer. Consider some cell with cars and pickups at time . The conditional expectation for the total demand is in this situation

where the model estimate for is . Therefore, we estimate this type of loss of demand as , i.e., the estimated conditional expectation minus the number of pickups. The peculiarity of this type of loss of demand is that our model implies that it cannot solely be attributed to cell , but to all cells not further away than from cell . However, we did not find an analytical formula for how to divide among the neighboring cells, and therefore, it is attributed to the cell containing the cars in the following results. It should be noted that in the presented case study this second type of loss of demand is practically negligible compared to the first type.
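The second type of loss of demand can be illustrated with a short Python sketch. It assumes that the total demand of a demand area is Poisson distributed with an aggregated rate, here called `mu` (a hypothetical name), and uses the standard identity E[D | D ≥ c] = mu · P(D ≥ c − 1) / P(D ≥ c) for a Poisson variable D. The function names are not taken from the original implementation.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability P(D = k) for rate mu > 0."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def upper_tail(k, mu):
    """P(D >= k) for D ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    return 1.0 - sum(poisson_pmf(j, mu) for j in range(k))

def expected_demand_given_sellout(cars, mu):
    """E[D | D >= cars]: expected demand given that all cars were taken."""
    return mu * upper_tail(cars - 1, mu) / upper_tail(cars, mu)

def loss_not_enough_cars(cars, pickups, mu):
    """Expected unserved demand; zero unless every available car was picked up."""
    if pickups < cars:
        return 0.0
    return expected_demand_given_sellout(cars, mu) - pickups

# Hypothetical example: one car, one pickup, aggregated rate 0.5 per hour.
loss = loss_not_enough_cars(cars=1, pickups=1, mu=0.5)
```

For very small rates and many cars, the upper tail computed as one minus a sum loses numerical precision, so a production implementation would likely use a dedicated survival function.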

The two proposed estimates for loss of demand above refer to the number of cars not rented by the carsharing system compared to the same system with the same demand but an infinite number of cars available. The following results on the estimated loss of demand are based on model (overlapping demand areas and smoothing) and the 2021 data used for estimating the model.

For the entire service region, we estimated an average loss of demand of 14.2 cars per day, of which 12.9 were lost because there were no cars in proximity and 1.31 because there were not enough cars in proximity. Compared to the average number of pickups per day of 77.0, this means that the number of rentals could be increased by about 18.5% by providing an unlimited number of cars, assuming that the increased supply would not itself increase the demand.

To detect the loss of demand locally, Figure 10 shows the spatial distribution of the average loss of demand per day, based on the model . According to Figure 10, loss of demand is especially pronounced in the center of the city and not exactly at the total demand peaks situated on the left and bottom of the center, see Figure 8.

5. Summary and Conclusions

This article proposes a novel model approach to estimate the spatial and temporal distribution of total demand rates for free-floating carsharing. The proposed model is based on a Poisson regression model for right censored data and estimates possibly time-varying demand rates of discrete cells of the service region based on booking data with spatiotemporal information on pickups and dropoffs of cars. The model is quite flexible as it can accommodate various shapes of cells of selectable size and different temporal effects. The model was successfully applied for a case study in a major city of Switzerland with data from year 2021.

The proposed model is useful for the following purposes: first, the model provides insights to operators on how total demand was spatially distributed and evolved over time. This insight can hardly be gained using simple descriptive statistics, because total demand is often not directly observable and therefore must be estimated using auxiliary variables such as the number of pickups, and an advanced modelling technique such as regression. Second, the model may be used to estimate the loss of demand due to unavailability of cars. These insights may prove useful to designate convenient dropoff places in incentive schemes for user-based relocations or to extract input parameters for macrosimulation models.

5.1. Limitations

The total demand rates estimated with our approach refer to the free-floating carsharing service that provided the data. Therefore, for competitive situations with multiple services, they cannot be interpreted as the global demand rates of the considered service region. If global demand rates are of interest, the model must be estimated using data that combine the competing services. Moreover, the estimated total demand rates do not account for other transport services such as public transport. Therefore, they refer to a given split of available transport services and may be sensitive to launches or discontinuations of other transport services.

5.2. Future Work

Further investigations could focus on practical aspects of the model. Implementations for larger and more frequented service regions would help to define the scope of our approach and to improve guidelines for model specification. Furthermore, operators may be interested in forecasting future total demand. Forecasting involves extrapolation and has not yet been elaborated with our model approach, partly because it seemed difficult to implement for data from the COVID-19 era. A forecasting approach should additionally take into account auxiliary predictor variables such as weather and should be able to deal with temporal correlation (e.g., by using a model with autoregressive errors) to provide reliable prediction intervals.

Appendix

To validate our method, we performed a simulation study using a grid of square-shaped cells with side lengths 0.2. The total demand rates of the cells were proportional to a multivariate normal density centered at the center cell 13 and varying between 0.05 and 0.3 cars per hour. To include time effects, the individual total demand rates were increased by 0.1 in the evenings (18–24 o’clock) and by 0.2 on weekend days. Figure 11 illustrates the specified total demand rates by a map and a scatterplot.

We generated pickups for independent hourly intervals. For this, we first generated for each interval and cell the number of cars available using a Poisson distribution with a common rate for all cells, and then generated the corresponding pickups based on our postulated model (equation (6)) and the specified total demand rates. Simulations were performed for six scenarios regarding data generation and model specification. Each scenario was replicated 512 times, resulting in 512 estimated models per scenario.

For the baseline scenario, we used a car rate per cell of 0.16, which corresponds to the average of the total demand rates on weekdays between 0 and 18 o’clock used in this simulation study. Demand areas, which need to be found to construct the vectors of the postulated model, included adjacent and diagonally adjacent cells of the cells with cars (). Data for 4,321 hourly intervals were generated, which corresponds to the number of hours of the first half year of 2021 (including summer time changeover). Fitted models from the baseline scenario are correctly specified and therefore should identify the data generating total demand rates.
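The data-generating step of such a simulation can be sketched in Python. This is a deliberately simplified illustration: it assumes that each cell is served only by its own cars (no demand areas spanning neighboring cells) and omits time effects, so it does not reproduce the postulated model of equation (6). The sampler and all names are hypothetical.

```python
import math
import random

def rpois(rng, mu):
    """Draw from Poisson(mu) with Knuth's multiplication method."""
    limit = math.exp(-mu)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate(rates, car_rate, n_intervals, seed=1):
    """Return tuples (interval, cell, cars, pickups) with censored pickups."""
    rng = random.Random(seed)
    data = []
    for t in range(n_intervals):
        for cell, rate in enumerate(rates):
            cars = rpois(rng, car_rate)   # cars available in this cell
            demand = rpois(rng, rate)     # unobserved total demand
            data.append((t, cell, cars, min(cars, demand)))
    return data

# Hypothetical rates within the range used in the simulation study.
sample = simulate(rates=[0.05, 0.16, 0.30], car_rate=0.16, n_intervals=1000)
```

The recorded pickups never exceed the number of cars, which is precisely the right-censoring that the estimation procedure has to account for.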

For the alternative scenarios, we halved and doubled the car rate, misspecified the parameter for estimation ( and instead of ), halved and doubled the number of time intervals, used the smoothing approach with 3 × 3 supporting points, and considered demand areas that include all adjacent or diagonally adjacent cells, including cells that have a closer car somewhere else.

A. Results

Figure 12 shows the distribution of the estimated parameters for the baseline scenario. The estimates vary around the predefined total demand rates, suggesting that the estimation procedure is able to identify the data generating total demand rates if the model is correctly specified.

Figure 13 compares the estimated parameters for cells 1 (lowest total demand rate) and 13 (highest total demand rate) between the baseline and three alternative scenarios. The top left panel shows the effect of increasing the number of cars available. It can be seen that increasing the number of cars increases the accuracy of the estimates. Interestingly, the accuracy of the estimates for fewer cars is about the same or slightly better than for the baseline scenario.

The middle panel of Figure 13 shows the effect of misspecifying the parameter , i.e., the maximum deviation users would accept from the preferred pickup cell. While misspecifying seems not to affect the estimation of the total demand rate of cell 1, it does for cell 13. Specifically, choosing too small results in an upward bias, and choosing too large in a downward bias. This seems plausible because increasing implies that the total demand is spread over more cells.

The right panel of Figure 13 shows the effect of decreasing or increasing the number of observations. As expected, the accuracy improves with an increasing data size.

The left-hand side of Figure 7 compares parameter estimates between the baseline scenario and a smoothed estimation with 9 supporting points, which were evenly distributed within the surface of the grid. As could have been expected, the smoothing approach decreases the variance of the estimates; however, in the case of cell 1, it introduces a bias by overestimating the total demand rate.

The right-hand side of Figure 7 compares the parameter estimates between the overlapping and the closest car demand areas. In both cases, the estimates vary around the data-generating total demand rates. The estimates for the overlapping areas around cars have slightly higher variance.

Data Availability

Access to the booking data used to support the findings of this study requires the approval of the data owner and can be requested from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This paper represents independent research funded by the Swiss Innovation Agency Innosuisse (project title: User-Based Redistribution für Free-Floating Carsharing). The authors are grateful to Innosuisse for its financial support.