The application of the negative binomial model starts with the variable selection. The first selection criterion is the non-redundancy of the variables. This reduced set of variables is used as explanatory factors for each GLM model, whereby the significance of the variable is a further selection step.
5.2. Significance Check
The 108 non-redundant variables form the initial dataset for the regression model. We first distinguish two NB models: Model I includes all non-redundant variables that are significant at the 95% level; Model II consists only of the very highly significant variables, namely those significant at the 99.9% level (Table 1).
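The two-threshold selection can be sketched as a simple filter on p-values. This is a minimal illustration, not the authors' code: the variable names and p-values below are hypothetical and are not taken from Table 1, and the helper `select_significant` is an assumed name.

```python
# Hypothetical p-values for illustration only; not taken from the paper's tables.
p_values = {
    "street_length": 0.0002,
    "rent_index": 0.0300,
    "bar_density": 0.0008,
    "household_size": 0.2000,
}


def select_significant(p_values, alpha):
    """Return the sorted names of variables whose p-value is below alpha."""
    return sorted(name for name, p in p_values.items() if p < alpha)


# Model I keeps variables significant at the 95% level (p < 0.05).
model_1_vars = select_significant(p_values, alpha=0.05)
# Model II keeps only variables significant at the 99.9% level (p < 0.001).
model_2_vars = select_significant(p_values, alpha=0.001)
```

By construction, every variable of Model II is also contained in Model I, since the stricter threshold selects a subset.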
The McFadden’s $R^2$ values of all models do not differ much, which means that the highly significant variables carry most of the explanatory power. As explained in [30], an $R^2$ value between 0.2 and 0.4 is usually a good result, as it is not measured on the same scale as that of a linear regression. The models nevertheless have a quite disappointing value, which would mean an explanation rate of around 30% of the data. We therefore want to discuss a third regression model that skips the redundancy check and contains all variables from the four datasets. Model III consists of all variables that are significant at the 95% level, some of which are redundant; they are listed in Table 2. The fit of this model, however, is not significantly better.
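For reference, McFadden's pseudo $R^2$ compares the log-likelihood of the fitted model with that of an intercept-only (null) model: $R^2_{McF} = 1 - \ln L_{model} / \ln L_{null}$. The sketch below shows the calculation; the log-likelihood values are hypothetical and were chosen only to land near the magnitude discussed in the text.

```python
def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo R^2 = 1 - lnL(model) / lnL(null)."""
    if loglik_null == 0:
        raise ValueError("null log-likelihood must be non-zero")
    return 1.0 - loglik_model / loglik_null


# Hypothetical log-likelihoods, for illustration only.
r2 = mcfadden_r2(loglik_model=-700.0, loglik_null=-1000.0)
```

With these assumed inputs the value is approximately 0.3. Because the measure is a ratio of log-likelihoods, it should not be read as a share of explained variance in the sense of a linear regression.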
There are several possible reasons for the rather low pseudo $R^2$ values. One common explanation is a poor choice of statistical model. This is unlikely here, since the residual plot in Figure 2 shows approximately normally distributed residuals with only a few outliers. The residual plots of the Poisson and quasi-Poisson models were worse when we tried these approaches. Moreover, the NB model can account for the overdispersion indicated in the data.
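A quick way to see why overdispersion matters: a Poisson model implies that the variance of the counts equals their mean, so a variance-to-mean ratio well above 1 points towards a model with an extra dispersion parameter such as the NB. The text does not say how overdispersion was diagnosed; the following is only a minimal sketch of one common check, with hypothetical booking counts.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by the sample mean of a list of counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean of counts must be positive")
    return variance(counts) / m


# Hypothetical booking counts per district, for illustration only.
counts = [0, 1, 1, 2, 3, 5, 8, 20, 40]
ratio = dispersion_index(counts)  # far above 1 for these made-up counts
```

This raw ratio ignores the explanatory variables; a check on the residuals of a fitted Poisson model would be the stricter variant.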
Another simple reason for the low pseudo $R^2$ values could be that important variables are missing from the model, or that some variables, such as the street length, do not sufficiently represent the supposed influence of parking lots. We assume that some effects appear locally and cannot be represented by the very general set of variables. For instance, the number of bars might be a plausible influence on the number of FFCS bookings. However, if several bars lie within a pedestrian precinct in one place and a similar number of bars elsewhere have infrastructure available for cars, the booking demand is expected to differ.
The results of our modeling approach show that precise forecasts with our chosen datasets are not possible. Nevertheless, we are able to find trends and general positive and negative tendencies of spatial characteristics that can be demonstrated by the significance and signs of the coefficients.
5.3. Interpretation of the Variables’ Effect
The interpretation is focused on the models with only non-redundant, significant variables (Model I). The variable selection process helped to focus on the factors that can explain the demand in the best way.
The different scales of the variables do not allow for a comparison between them; hence, the interpretation is focused on the sign of the estimate in Table 2. We assume that, because of the redundancy selection process performed at the beginning of the analysis, these variables also represent other, similar ones from the original, larger set. The interpretation therefore aims at finding categories of significant variables.
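Why the sign alone is informative can be seen from the usual form of an NB regression. Assuming the standard log link (the text does not state the link function), the expected number of bookings $\mu_i$ in district $i$ and the effect of a one-unit increase in variable $x_{ij}$ are:

```latex
\mathbb{E}[y_i] = \mu_i = \exp\Bigl(\beta_0 + \sum_j \beta_j x_{ij}\Bigr),
\qquad
\frac{\mu_i(x_{ij} + 1)}{\mu_i(x_{ij})} = \exp(\beta_j).
```

A positive $\beta_j$ therefore multiplies the expected bookings by a factor greater than one, and a negative $\beta_j$ by a factor smaller than one. The size of the factor depends on the unit in which $x_{ij}$ is measured, which is why magnitudes are not compared across variables here.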
There are some explanatory variables that are obviously related to mobility behavior. The average number of private cars and the index for registered vehicles describe the affinity of citizens in the district towards private car ownership, thus representing what we call the type of car user. The greater the percentage of people who own a vehicle, the lower the frequency of FFCS bookings.
Considered on a city level, the private car density also indicates, like rent, the centrality of an area. Berlin’s citizens tend not to own a car in central districts. Rents are also higher in these areas, and the sign of this variable is, therefore, positive. The distance to the city center having a coefficient with a negative sign can be assigned to the centrality category, as well. A high density of bars and companies in general is positive for the FFCS demand. The absolute number of buildings, however, has a negative influence, which may be caused by the fact that in dense areas, the absolute number is lower, but the number of units per building is significantly higher than in the periphery.
The rent in a district also represents a certain measure of the attractiveness of a place, as well as how much money the residents of this area can afford to pay for living in it. Thus, the variable also represents the financial situation of the users. FFCS is a means of transport that is not affordable for every social class. A 10-min trip is as expensive as an inner city ride by public transport. Customers of flexible carsharing should not be too price sensitive since they value convenience. A high number of households from a low social class therefore has a negative influence on the demand. An overly profit-oriented population is the other extreme and reduces the number of booking starts, as well.
The street length is not difficult to interpret either. The variable was inserted into the model as a proxy for parking opportunities. The positive sign shows that public parking is probably essential for a high number of bookings and more relevant than the size of a district.
A political party that turns out to be non-redundant and very significant is the far-right party NPD. Voters for this party are assumed to come from a very conservative milieu that tends to refuse the usage of new modes of transport. The negative sign thus indicates the low open-mindedness of citizens in these districts, which is recognizable in their negative attitude towards FFCS. The percentage of foreigners and the affinity towards analog telephones can also be interpreted as indicators of traditional households, which do not positively influence the FFCS demand.
The age variable (population aged 03–05 years) and the household size form a category that can best be characterized by the expression family situation. The factors represent the percentage of young families in a district. Because the birth of a child is still a reason for most parents to buy a car (thereby altering mobility behavior completely), the variables have a natural negative impact on the number of carsharing bookings. It may also play a role that baby seats are not part of the standard equipment and that only backless booster seats are available on some rental vehicles. Despite all efforts of the FFCS provider, some customers take the equipment away from the car’s trunk. The variable 10–14 years describes families in a different situation. They are usually financially better situated and may use FFCS as a substitute for a second car of their own.
The rest of the variables are not always easy to categorize. Surprisingly, the residents’ density has a negative impact on the booking frequency. Intuitively, more people in an area would mean more potential customers. It is likely that the density has to be considered as a compensation for other variables in the model. Another possible reason is that a high density of citizens is positive for the demand for FFCS vehicles, but this demand can only be satisfied if sufficient parking space is available. This is often not the case in central districts. Especially in districts with many old buildings, curb parking is often scarce.
An interesting fact is that at least one variable of these categories also appears in Model II. This indicates that a limitation of the model to the highly significant variables makes sense.
Some variables (such as the votes for the Greens) surprisingly do not appear in Model I or II because they were redundant. To ensure that we do not neglect any important group of variables in our interpretation, we also consider the variables that are significant at the 95% level or higher in the model containing all variables (Model III, Table 2).
Almost exclusively census data variables prove to be redundant. The reason for this is that many variables appear in several specifications. For instance, the age groups for male and female are clearly correlated with the age groups in total, and the indices are related to the corresponding variable in EUR. It is quite surprising that the voting results of the Bundestag election are, for some parties, already expressed by other variables and therefore turn out to be redundant.
A look at the variables of Model III shows that the parties are mostly significant, and the Greens have as expected a positive estimate.
There are also more age variables appearing in the list. Again, residents with very young children (up to 10 years) or between 35 and 44 years have a negative influence, while households with one or two persons have a positive impact. A higher density of registered vehicles again appears to be unfavorable for FFCS. Centrality also proves to be an important influencing factor in this model.
We can thus conclude that six important variable categories were found to have a statistically significant influence on the spatial distribution of the demand for FFCS in all three models. These are:
Open-mindedness;
Type of car user;
Financial situation;
Centrality;
Parking availability;
Family situation.
Some of the variables have already appeared to be significant in a linear regression model applied to other smaller datasets [26]. The results of these models are, however, more reliable due to a better model fit.
An important question remains as to whether these categories, which primarily characterize each district, can also be used to describe the typical customers of FFCS. As has been said already, the authors see the study from Seign as a reason for transferring socio-demographic characteristics of the district to the users. This conclusion is confirmed by the findings of Mueller et al. [31], who present the results of surveys from onboard units of the vehicles in which users were asked to state the purpose of their trip: most of the customers use carsharing for their trip back home.
5.4. Transfer of the Berlin Model to Munich and Cologne
In the following, the estimated NB model is applied to the cities of Munich and Cologne to assess its usability and transferability to other cities in order to predict potential hot spots for FFCS. It is also applied to Berlin itself to show the performance of the model in predicting its own estimation data. As mentioned above, none of the three models considered has enough explanatory power to be used as a precise forecast model for the absolute number of bookings, which would be required to solve some operational problems of the carsharing operator [32,33,34]. It is also not possible to estimate the exact number of bookings in another city, since the fleet size and the number of customers influence the absolute booking frequency. Rather, the model could be useful for predicting booking hot spots. Categorizing the predicted trips and the observed bookings is hence a necessary step. The negative binomial model is validated by applying it to other cities and comparing the results with observed booking data, distinguishing five categories between low demand and high demand.
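The categorization step can be sketched as follows. The text does not state how the boundaries of the five categories are chosen, so the rank-based quantile classes used here are an assumption, as are the function name `to_categories` and the example counts.

```python
def to_categories(values, n_classes=5):
    """Assign each value a class 1..n_classes by its rank (quantile classes).

    Assumption: equal-sized classes by rank. Tied values are split by their
    position in the input, which a real analysis might handle differently.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    categories = [0] * len(values)
    for rank, idx in enumerate(order):
        categories[idx] = rank * n_classes // len(values) + 1
    return categories


# Hypothetical booking counts for ten districts, for illustration only.
bookings = [5, 50, 1, 20, 9, 100, 3, 70, 30, 12]
categories = to_categories(bookings)
```

Because the classes depend only on the ranking within a city, this kind of categorization removes the influence of fleet size and customer numbers on the absolute booking level, which is the reason given above for comparing categories rather than counts.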
The results are presented in two ways. Figure 3, Figure 4 and Figure 5 show maps with the observed booking demand. This is calculated simply by aggregating the positions of trip starts over the district grid and dividing the result into five categories. The maps below show those results for Model II on the right, and for a better comparison between observed and predicted categories, a difference plot is mapped on the left.
The other results are quantitative. Table 3 shows the rate of correctly predicted districts (zero), as well as the rate of the underestimated (negative values) and overestimated ones (positive values) for each city.
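The rates described above can be computed from the category difference, predicted minus observed, so that negative values mean underestimation and positive values overestimation. The following sketch uses hypothetical categories for eight districts; `difference_rates` is an assumed helper name.

```python
from collections import Counter


def difference_rates(predicted, observed):
    """Share of districts per category difference (predicted - observed)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = Counter(p - o for p, o in zip(predicted, observed))
    n = len(predicted)
    return {d: diffs[d] / n for d in sorted(diffs)}


# Hypothetical categories (1 = low demand, 5 = high demand), illustration only.
observed = [1, 2, 3, 5, 4, 2, 1, 3]
predicted = [1, 3, 3, 4, 4, 2, 2, 5]

rates = difference_rates(predicted, observed)
correct = rates.get(0, 0.0)  # share of districts in the right category
within_one = sum(share for d, share in rates.items() if abs(d) <= 1)
```

For these made-up inputs, half of the districts fall into the correct category and 87.5% deviate by at most one category.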
Figure 3 and Table 3 show the results of the model application for Berlin, which are, as expected, very good. Underestimated districts in the difference plot are colored green, overestimated regions red. A correct prediction of the frequency category leads to yellow-colored cells. The model works more than satisfactorily: more than 45% of the districts are predicted correctly, and over 85% deviate by no more than ±1 category.
The observed data for Munich (Figure 4, Table 3) also show a strong centrality. Some northern parts also have a high demand, whereas southern regions show fewer bookings. The southern districts are overestimated, whereas the area around the BMW headquarters in the north is slightly underestimated. This is a good example of an additional local effect that cannot be predicted by transferring the model of another city. Nearly 70% of the cells are classified with good accuracy, and around 30% of the cells are assigned to the correct category.
Model II is also used to predict the demand for the city of Cologne (Figure 5, Table 3). The model fails only in some northern parts of Cologne: 37% of all districts are predicted correctly; 78% show only a slight deviation.
The validation of the models by applying them to other cities shows generally satisfactory results. Even if the number of variables is reduced to the very significant ones, the NB model can be used as an excellent instrument to explain and predict hot spots of FFCS demand. The success can easily be observed by looking at each difference plot.
Nevertheless, there are local effects that affect the demand for bookings and are not represented in the model. One example is an above-average popularity of carsharing at a company. The BMW headquarters is such a case, but other companies may also have special agreements with the carsharing operator. Peripheral areas appear to be harder to predict, as well. The POIs outside of the operating area are a possible cause of this effect. Furthermore, some inner city areas vary slightly from the model prediction, which could be caused by local parking restrictions.