IEEE TRANSACTIONS ON MOBILE COMPUTING 1

A deep dive into the accuracy of IP Geolocation Databases and its impact on online advertising
Patricia Callejo, Marco Gramaglia, Rubén Cuevas, and Ángel Cuevas

Abstract—The quest for an ever more personalized Internet experience relies on enriched contextual information about each user. Online advertising also follows this approach. Location is certainly among the context information that advertising stakeholders leverage. However, when this information is not directly available from the end users, advertising stakeholders infer it using geolocation databases, matching IP addresses to a position on earth. The accuracy of this approach has often been questioned in the past; however, a reality check on an advertising stakeholder shows that this technique accounts for a large fraction of the served advertisements. In this paper, we revisit the work in the field, most of which dates from almost one decade ago, through the lens of big data. More specifically, we i) benchmark two commercial Internet geolocation databases, evaluating the quality of their information using a ground-truth database of user positions containing over 2 billion samples, ii) analyze the internals of these databases, devising a theoretical upper bound for the quality of the Internet geolocation approach, and iii) run an empirical study that unveils the monetary impact of this technology by considering the costs associated with a real-world ad impressions dataset.

arXiv:2109.13665v2 [cs.CY] 1 Jun 2022

Index Terms—IP geolocation, GeoIP, Online advertising, Performance Evaluation

• Authors are with University Carlos III of Madrid.
• P. Callejo, R. Cuevas, and A. Cuevas are also with UC3M-Santander Big Data Institute.
• Corresponding e-mail: [email protected]

1 INTRODUCTION

The ubiquitous connectivity provided by modern cellular technologies, and the success of the mobile web and application paradigms, introduced the need to contextualize the service provided with location information. While smartphones have supported this capability since their infancy, the complexity of the World Wide Web (which is the common back end for practically the entire landscape of mobile applications) and the growing concerns about user privacy requirements make the structured gathering of such information difficult. For instance, geographic extensions of HTTP headers were proposed [1] but never approved by the IETF, leaving this information available only at the application level, either through JavaScript [2] or through OS APIs, upon user permission. Hence, user devices such as mobile phones only share precise positioning data (such as that obtained from GPS devices) if they are explicitly configured to do so, and only with applications and servers that have obtained specific user permission.

Still, locating users and terminals can also be useful outside the application domain, so network operators and third-party developers constantly use alternative technologies to obtain location knowledge. For instance, network operators (who have access to information coming from the lower layers of the network) can reconstruct users' positions and trajectories by inferring them from the visited cell towers [3]. However, the most used technique to get positioning information without explicitly gathering GPS data is IP Geolocation, GeoIP for short. This practice is very common, and it is used for important tasks in the online services landscape, including geofencing [4], fraud detection, and online advertising [5], [6], our focus.

As demonstrated by recent studies [7], [8], [9], [10], [11], data brokers and ad-tech providers track and create profiles from user activities using any kind of data item, and the physical location of end users is just one of them. It is important to note that GeoIP is by far the most employed methodology. According to our data, coming from an online advertising stakeholder that we describe in detail in §3, at least 50% of the ad-requests handled by this stakeholder included a user location inferred through an IP Geolocation Database, GeoIP Database for short.

Physically pinpointing Internet hosts on the earth is a problem that has been studied since the early 2000s [12] with active measurement technologies, and several GeoIP databases have been available for over ten years for obtaining the latitude and longitude associated with an IP address. However, their precision has been disputed [13] basically since their beginnings. In contrast to the literature, which we thoroughly review in §7, in this work we tackle this problem from a different perspective. For the first time, we analyze the performance of GeoIP databases within the context of a business use case, i.e., online advertising. The larger scale and richness of our dataset allow us to contribute the first upper bound of GeoIP performance, as well as to add more insights to the state-of-the-art work already available on this subject:

- Ground-truth: the extensive data we analyze contains the ground-truth position for each of the mobile terminals (e.g., smartphones or tablets) as well as their associated IP addresses. This precise ground-truth location, gathered by using high-precision geolocation technologies such as GPS, allows us to compute the error that GeoIP generates compared to the actual ground-truth position of the devices, without resorting to other kinds of approximations for the real location of the IP address of the end users.
- Pervasiveness: previous studies usually either resort to a small number of devices that can simultaneously record the IP address and the position of the terminal, or limit their study to a subset of hosts whose position is known a priori. Instead, the data we use in this study covers millions of mobile terminals over three countries (Spain, France and Great Britain) for one month. To the best of our knowledge, this is the first study that unveils the performance of GeoIP at this scale.

- Heterogeneity: previous works usually focus on a specific network access technology: either fixed hosts using cabled access, or mobile hosts connecting to the Internet through wireless mobile broadband. The dataset we analyze in this study contains both types of connectivity, gathered through the analysis of mobile phone data. Namely, we study WiFi connectivity (hence covering IP addresses assigned by network operators to fixed Internet access) and cellular connectivity (fully covering IP addresses assigned by mobile network operators). This allows us to assess the performance of GeoIP for many use cases and to detail the monetary implications of this technology for online advertising.

- Deepness: besides evaluating the performance of GeoIP, thanks to the extensiveness of the datasets we analyze in this work, we are able to provide deeper insights and draw possible theoretical upper bounds for the performance of GeoIP, which could be substantially improved, according to our analysis.

- Impact on technology: motivated by the extensive usage of GeoIP in the context of online advertising, we run an empirical evaluation to assess the convenience of using GeoIP databases in ad campaigns from a budgetary point of view. Contrary to the conventional wisdom, our results show that, despite the obvious lack of accuracy of current GeoIP Databases, the potentially higher cost of more precise technologies, such as GPS, may make GeoIP the most economically efficient location technology under certain configurations of ad campaigns.

As discussed above, in this paper we analyze the effectiveness of two major GeoIP databases, providing these specific contributions:

• We revisit the findings of works published around one decade ago on the precision of GeoIP, leveraging a large-scale ground-truth dataset. We assess the performance of GeoIP for several metrics, and quantify its best theoretical performance.
• We analyze the effectiveness of GeoIP databases when dealing with different use cases. In particular, we analyze the monetary implications of GeoIP in the context of online advertising.

The paper is structured as follows: in §2 we discuss the main motivation of this work: location-based advertising and its ecosystem. In §3 we introduce the large-scale datasets we used to evaluate the performance of two widely used GeoIP databases, first by assessing their quality under different scenarios (§4), then by analyzing their internals (§5), and eventually by quantifying their impact on real-world online advertising campaigns (§6). We finally position our work in the state-of-the-art in §7 before concluding in §8.

2 BACKGROUND

In this paper, we study the performance of GeoIP databases in their usage in online advertising. To this end, in this section, we briefly summarize the background on these topics.

2.1 GeoIP Databases

A GeoIP Database provides the mapping between any IP address in the world and a lat,long coordinate.

In contrast with other options, such as gathering user GPS data, GeoIP offers very high scalability and pervasiveness, as i) an IP address labels every host in the Internet (even the ones that do not have a GPS device active), and ii) it is information available to any network element, with no specific permission granted by the end user. Hence, GeoIP databases are arguably the only geolocation technology that meets the needs of scale and coverage for businesses such as online advertising, fraud detection, or antipiracy.

For these reasons, although their accuracy has always been questioned, GeoIP databases are the de facto most used technology for location-based services on the Internet.

The details of the specific IP-to-location algorithms used by free and commercial [14], [15], [16], [17] databases are often not disclosed, and range from very simple WHOIS lookups up to measurements of the delay associated with an address from different vantage points on the network infrastructure. However, the technical information shared by some providers [14], [15], as well as previous academic papers studying them [13], [18], agree on the fact that GeoIP databases are built following a common approach. Providers divide the space of IP addresses into autonomous systems and further split them into variable-sized IP prefixes. Then, by using active and passive measurements, they map the overall set of prefixes onto a geographical grid of anchor points, which are their best estimation of the position of each prefix. We will also use this assumption throughout the paper.

2.2 Location Data in online advertising

The differentiating factor of online advertising compared to other forms of advertising is its capacity to perform fine-grained ad targeting campaigns based on audiences defined by the targeted users' demography (e.g., age and gender), preferences and interests (e.g., sports, restaurants, etc.), and location information. Location is thus a fundamental parameter in the definition of online advertising campaigns [5], [6].

2.2.1 A primer on online advertising

Advertisers configure their campaigns in technological platforms referred to as Demand Side Platforms (DSPs), which receive offers of available ad spaces from tens of thousands of different publishers through Ad Exchanges (AdX).

The AdX maps each ad-request from a publisher into a bid-request message, which is sent to several DSPs. Each DSP checks if the properties of the ad-request (e.g., user's demographic, interests and location) match the configuration parameters of any of its campaigns. If so, the DSP
returns a bid-response, including the price it is willing to pay for the offered ad space. The AdX runs a real-time auction process based on the received bid-responses and selects the winning DSP, which will handle the delivery of the ad impression to the user.

The ad delivery process is (obviously) subject to a monetary transaction. The most common pricing schemes in online advertising are CPM (cost per one thousand impressions) and CPC (cost per click on an ad impression). Note that CPM and CPC are metrics that are known a posteriori, once the campaign is finished. A proxy metric for the cost of an ad impression is the bid floor, a variable in the bid-requests that indicates the minimum bidding price accepted by the publisher offering the ad space.

2.2.2 Location data sources

DSPs have access to the location associated with an ad space through the location information embedded in bid-requests, which comes from three possible data sources [19].

- User: The location data is provided by the user and embedded in the ad-request. For instance, location information (e.g., an address) provided by the user through a registration form. This type of location rarely appears in bid-requests.
- GPS/Location Services: This type of data is expected to provide high precision and, in practice, it should come directly from the positioning device of the user, offering GPS precision. Given the high precision of the data, bid-requests including this type of location data are expected to have a higher starting bid price for the auction.
- IP address: A significant number of ad-requests leave the user device without any location information. Due to the importance of location in online advertising, it is common that one of the intermediaries in the ecosystem (e.g., the AdX) enriches the ad-request or its corresponding bid-request with location information based on the IP address of the device. To this end, they use the GeoIP databases described above.

To understand the importance of GeoIP in the online advertising ecosystem, we have computed the fraction of daily bid-requests including a GeoIP, GPS, or unavailable location received by TAPTAP Digital [20], a mid-size DSP (see details in §3), in its bid stream (i.e., the bid-requests flow). In particular, we have measured this metric for the bid-requests of the three countries analyzed in this paper (Spain, France, and Great Britain) during a period of 16 days. The results show that the average fraction of daily bid-requests across the considered countries including GeoIP, GPS, or unavailable location data is 52%, 18% and 30%, respectively. In particular, 48.0%, 50.8%, and 57.3% of the bid-requests for Spain, France, and Great Britain, respectively, include GeoIP location information. It is important to remark that most DSPs also process bid-requests with unavailable location data: they extract the IP address of the device from the bid-request and obtain an associated location from a GeoIP database. This location data assignment technique allows DSPs to effectively have a location for all received bid-requests. In summary, roughly half of the bid-requests (and up to 80% in those DSPs using the described location data assignment technique) include locations extracted from GeoIP based on our dataset. These values further corroborate the fact that GeoIP is the most common technology for providing location information in online advertising.

If an ad-request does not include a location context, then, depending on the kind of ad campaign, it is unlikely to find a matching user. Hence, location is definitely a very sensitive parameter for the efficiency of an ad campaign, and its granularity may range from coarse levels (i.e., country) to very fine ones, targeting users at a zip code level.

2.3 Ground-truth location data

To achieve our goal of assessing the accuracy of GeoIP location data and its impact on online advertising, we had to resort to a data source that provides high-precision geolocation information for an extremely high volume of users.

Multiple location providers collect high-precision location information from users. Some examples are Safegraph [21], Cuebiq [22], Foursquare [23], and Tamoco [24], to name a few. These providers use different techniques to obtain accurate location information from users, as described next:

- Embedded SDKs in mobile apps: The location provider agrees with a given app developer to include its SDK in the developer's mobile app(s). This SDK leverages the permission granted by the end users to the "host" application and collects the GPS location information from the device, as well as other parameters, including the IP address.
- Check-ins: The user proactively registers a check-in at a specific venue (e.g., restaurants, coffee shops, etc.) at the moment it happens (usually to contextualize posts on online platforms). The accurate location of the venue is well known, and thus users can be located with high precision and with total transparency to them, as they are consciously interacting with the app to provide such information.

In this paper, we use a dataset from a location provider distributing an embedded SDK in mobile apps (see details in §3) as our ground-truth information for the precise positioning of end users.

3 DATASETS

This section describes the datasets and the evaluation scenarios we use in the remainder of the paper. In our study, we limit the analysis to three major European countries (Spain, France, and Great Britain) where online advertising presents a strong penetration and for which we have good coverage in our datasets.

3.1 GeoIP Databases

We leverage two of the most widely used GeoIP databases to analyze the performance of the GeoIP location technology. We keep the names of these providers anonymous since our research aims not to scrutinize specific providers but rather to assess the performance of GeoIP in the context of online advertising. In the rest of the paper, we refer to the datasets of these GeoIP databases as GeoIP-DB-A and GeoIP-DB-B, respectively. Both providers offer their database as commercial products, have wide coverage in the three considered countries, and update their database
weekly. We collected regular snapshots of such databases to check the consistency of the data over time. For instance, GeoIP-DB-A includes all the IP-location samples between July-2020 and May-2021. For GeoIP-DB-B, instead, we gathered data for the period between April-2021 and May-2021. These databases include all the information needed to match an IP address to a position, plus side information such as the kind of access technology associated with a given IP.

3.2 Bid stream dataset

In this work, we measure the impact of the accuracy of GeoIP for location-based online advertising by analyzing real bid-request flows (a.k.a. bid stream) gathered from the Sonata DSP [25] operated by TAPTAP Digital [20], a digital marketing company operating in 15 countries. Sonata is a mid-size DSP whose bid stream includes a large-scale sample of bid-requests generated from Spain, France, and Great Britain. In particular, we processed the bid stream collected by Sonata between 1-May-2021 and 17-May-2021, which includes an average number of daily bid-requests of 257.6M, 64.5M, and 54.1M for ES, FR, and GB, respectively. While a bid-request may include several features related to the user context, we only process the fields that are relevant for our study, namely: <timestamp; IP address; Location Source; lat,long>. The location source field corresponds to the sources defined in §2.2.2: GPS, GeoIP, or User, or unavailable in case no source is reported. Note that for the analysis we only select the GPS information, as we can use it as ground-truth information.

3.3 (Ground-truth) GPS location data

To validate the performance of GeoIP, we use a dataset from a location provider distributing an embedded SDK in mobile apps.¹ This dataset reports GPS location coordinates, which we consider as the reference ground-truth position for the end users. This location data provider operates in more than 15 international markets including Spain, France, Italy, Great Britain, US, Mexico, Argentina, Colombia, South Africa, etc. Its SDK is embedded in dozens of applications, including popular applications such as weather, news or radio apps. It offers a coverage of at least 5% of the population in the main markets where it operates. Finally, our provider's location data is used by customers across different sectors such as online advertising, retail, e-commerce, real estate and financial services, among others.

In particular, this dataset includes the following data tuple per location event: <timestamp; lat,long; IP address; carrier>. The dataset spans a period of 30 days (from 1-Sep-20 to 30-Sep-20) and provides a very reliable snapshot of the mobile users. On average, the number of daily location samples is 31M, 16M, and 20M for ES, FR, and GB, respectively. In total, we have more than 2.05B data samples for the three countries over the considered period. To the best of the authors' knowledge, this is the largest ground-truth dataset ever used for analyzing the performance of GeoIP databases, several orders of magnitude larger than the datasets used in previous studies. We refer to it as GT-DB in the remainder of the paper. As we report in §4.7, the results obtained with the ground-truth data from our provider are aligned with those reported by a major GeoIP Database provider. We believe that this represents a significant hint about the quality of the ground-truth data used.

¹ The name of this provider is kept anonymous at its express request.

         Spain                  France       Great Britain
Level 1  Post Code              Post Code    Post Code
Level 2  City/Municipality      Commune      Local Authority District
Level 3  Province               Department   County / Region
Level 4  Autonomous Community   Region       Country
Level 5  Country                Country      Kingdom

TABLE 1: Administrative Levels considered in each country.

4 GEOIP PERFORMANCE

This section evaluates the performance obtained by GeoIP under several scenarios relevant to the online advertising market. We present the overall methodology implemented to compute GeoIP performance in §4.1, before analyzing the different results in the following subsections.

4.1 Methodology

We benchmark GeoIP-DB-A and GeoIP-DB-B using as reference the GT-DB database, which provides high-accuracy location for several million users, together with precise time information that allows us to compare the ground-truth samples to the proper instance of the GeoIP databases.

By joining GT-DB with GeoIP-DB-A and GeoIP-DB-B we obtain two latitude,longitude pairs for the same IP address at a specific time, one belonging to GT-DB (used as ground-truth, pos_GT) and the other belonging to the GeoIP generated instance, pos_IP. Thus, for all the IP addresses of the GT-DB database we compute the distance between pos_GT and pos_IP using the Haversine distance [26] formula, which yields the distance between any two latitude,longitude pairs on the earth. Formally,

\[ E = \mathrm{hav}(pos_{GT}, pos_{IP}) \qquad (1) \]

We use this approach in the rest of this section for assessing the databases' Precision, which is a metric that evaluates the pure distance.

However, ad campaigns usually include specific location targets that often correspond to concrete administrative boundaries, such as countries, regions, cities, or zip codes. Therefore, in the context of online advertising, the performance of a GeoIP service should be measured by its capacity to properly locate users within the targeted administrative region. We refer to this metric as Accuracy in this paper.

For the accuracy analysis, we used the Shapefiles available on the open data portals [27], [28] of the different countries we analyzed and extracted the geographical extent information related to the different administrative regions. In order to increase the scalability of this analysis, we divided the space into a fixed grid using the Uber H3 [29] geographical spatial index, to transform geographical joins into standard joins.

We formally define the accuracy metric A as follows:
\[ A = \frac{pos_{IP} \in R \mid pos_{GT} \in R}{pos_{IP} \in R} \qquad (2) \]

where R is the targeted spatial region, associated with, e.g., an administrative division. Our accuracy analysis considers 5 different administrative levels, from smaller (Level 1) to larger (Level 5) size, as reported in Tab. 1.²

² Note that Levels 1 and 2 are not always hierarchical. For instance, there are some zip codes in rural areas in Spain that include several villages.

4.2 Space and Time variability

Before analyzing the performance of GeoIP, we discuss in this subsection some overall statistics of the analyzed GeoIP datasets. As introduced in §2, location information is actually inferred based on IP prefixes rather than IP addresses (i.e., contiguous IP addresses usually share the same position).

We report in Table 2 the GeoIP-DB-A extent in the three countries under study, obtained by performing an exhaustive search on the entire IP address space. Besides the number of different ranges and anchor points, we also compute the reuse factor, i.e., the number of IP ranges that are mapped to the same position.

                  Spain    France   GB
# IP ranges       139687   399500   1051937
# anchor points   5288     16367    10448
Reuse factor      26.41    24.40    100.68

TABLE 2: Extent of the GeoIP-DB-A database.

The analysis of the reuse factor shows a good correlation with the average population density in the specific countries: Spain and France, with a population density of 92.76 and 123.28 persons per km², present a reuse factor around 25 (26.41 and 24.40, respectively). Great Britain, instead, has a much higher population density (279.95 persons per km²) that is reflected by a higher reuse factor as well, 100.68.

Fig. 1 depicts the spatial landscape of the |pos_IP| set, which leads to a similar conclusion. The algorithms implemented by the GeoIP-DB-A database accurately match the most densely populated areas of Spain, France, and Great Britain, such as the capitals and the most populous cities (e.g., Barcelona, Marseille, and the Liverpool–Manchester Megalopolis). In contrast, they present much lower resolution in rural areas such as the Castilla-La Mancha region in Spain, the Massif Central in France, and the Scottish Highlands. As we quantify in §4.4, the lack of an anchor point for these zones introduces a large error in the location estimation for the (fewer) users located there.

Fig. 1: Number of anchor points in the analyzed countries: (a) Spain, (b) France, (c) Great Britain (axes: lon, lat; color scale: Density). Brighter colors indicate a larger concentration of anchors. (Figure best viewed in colors).

Both GeoIP-DB-A and GeoIP-DB-B are periodically updated to account for movement among IP ranges and to refine the location estimation according to their algorithm. However, we did not notice any substantial deviation in the computed precision over time. Considering a 30-day time window, the median daily error recorded from the two databases only showed a variance of 1.34 m, 2.51 m, and 11.64 m for Spain, France, and Great Britain, respectively.³ For this reason, unless otherwise stated, in the rest of the paper we analyze a time window of 30 days without distinguishing between weekends and weekdays, or day and night, as the time dynamics involved in the update process are probably longer.

³ This corroborates the fact that providers are constantly improving their records, as the average deviation of the reported IP prefix positions is much higher, as reported by [18].

4.3 Global cross-country comparison

4.3.1 Precision

We first study the overall precision attained by GeoIP-DB-A and GeoIP-DB-B in the reference countries by showing the CDF of E in Fig. 3. The precision distribution shows poor behavior in the three countries and in the two explored GeoIP databases, which show very similar results. For instance, the median error for GeoIP-DB-A (GeoIP-DB-B) in Spain, France, and Great Britain is 14.01 km (14.91 km), 13.61 km (14.56 km), and 15.70 km (18.9 km), respectively. In addition, there are only a few samples with an error below 1 km (at best 12.1% for GeoIP-DB-A in France and 10.6% for GeoIP-DB-B in Spain), while the percentage of samples with very low precision (an error beyond 100 km) grows up to 24% in the best case (in Great Britain, for GeoIP-DB-B).
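The precision statistics quoted above (median error, share of samples below 1 km, share beyond 100 km) can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the paper: the function names, the Earth radius value, and the sample coordinates are assumptions introduced here, and the input is taken to be a list of already joined (pos_GT, pos_IP) pairs.

```python
import math
from statistics import median

# Mean Earth radius in km. Assumption: the paper does not state the value it uses.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def precision_summary(pairs):
    """Summarize the error E over ((lat_gt, lon_gt), (lat_ip, lon_ip)) pairs."""
    errors = [haversine_km(*gt, *ip) for gt, ip in pairs]
    n = len(errors)
    return {
        "median_km": median(errors),
        "share_below_1km": sum(e < 1.0 for e in errors) / n,
        "share_beyond_100km": sum(e > 100.0 for e in errors) / n,
    }


if __name__ == "__main__":
    # Hypothetical samples: (ground-truth position, GeoIP anchor position).
    sample_pairs = [
        ((40.0, -3.0), (40.0, -3.0)),
        ((0.0, 0.0), (0.0, 1.0)),
        ((0.0, 0.0), (0.0, 0.1)),
    ]
    print(precision_summary(sample_pairs))
```

At the scale of GT-DB (billions of samples) a vectorized or distributed implementation would be needed; the pure-Python version only shows how E in Eq. (1) turns into the summary figures.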
Globally, we notice two main differences in terms of E among the countries: while Spain and France follow a similar curve, the Great Britain case yields a lower precision when dealing with shorter E for both databases, compensating with a lower percentage of samples that are associated with very high values of E. This is quite remarkable when looking at Fig. 1c, which showcases large areas without any anchor point, showing how the two GeoIP providers can target the potential audience better in GB.

Fig. 3: Precision across countries (panels: GeoIP-DB-A, GeoIP-DB-B; y-axis: Empirical CDF; x-axis: 100 m to 1000 km; series: ES, FR, GB).

4.3.2 Accuracy

When translating the achieved precision E into the accuracy A, the impact of a possible misplacement depends not only on how large it is, but also on where it is committed. An error of 5 km between the real and the estimated position may be tolerable if the goal is to locate a user within their province, while it could be too much if the target is a zip code.

Fig. 2 shows the accuracy results for the two GeoIP databases across Spain, France, and Great Britain, and the five administrative regions considered in our analysis (see Tab. 1). First, as expected, the accuracy grows as the size of the considered region increases in all cases. Except for the case of Level 5 (country), where users are correctly located, we observe a quite poor behavior in the remaining levels, which is rather similar in both GeoIP databases. For Levels 4 to 1, we roughly observe that (at least) 33%, 40%, 70%, and 80% of the location samples failed to be located in the correct administrative region, respectively. In addition, although the E distribution is similar across countries, we observe relevant differences when dealing with accuracy. For instance, while GeoIP-DB-A achieves the highest accuracy in France for Level 1 regions (26%) and the lowest for Great Britain (6%), their roles are swapped when the task is to locate users within the Level 2 boundaries (22% for France and 38% for Great Britain, respectively).

Fig. 2: Accuracy by administrative regions (panels: GeoIP-DB-A, GeoIP-DB-B; y-axis: 0% to 100%; x-axis: Level 1 to Level 5; series: ES, FR, GB).

As the E and A achieved by GeoIP-DB-A and GeoIP-DB-B are almost equal, in the remainder of the section we focus on the analysis of the performance achieved by GeoIP-DB-A only. Also, for the accuracy evaluation, we limit the discussion up to Level 4, as Level 5 yields full accuracy.

for different businesses. For instance, should advertisers expect the same level of location precision (accuracy) on the delivered ads from campaigns targeting urban areas vs. rural areas? How difficult is pinpointing the location of a fraudulent use of a credit card when it is committed from an urban vs. a rural area? Understanding the precision (accuracy) offered by GeoIP across areas with different urbanization levels is key to answering the previous questions. Thus, following the classification [30] provided by the EU countries to distinguish between urban, semi-urban, and rural areas, we categorize E depending on whether pos_GT is in one of the previously mentioned areas.

The results in Fig. 4 show the precision achieved per country and urbanization level. As in the global country analysis (see §4.3.1), Spain and France show similar behavior in precision. First, urban areas present a precision ≤10 km for 60% of the samples in both countries. For the same portion of samples, the precision is close to 100 km in both cases for rural areas. It is also interesting to note that the portion of samples with very high precision (P < 2 km) is higher in semi-urban areas than in urban areas for these countries. Finally, it is worth mentioning that Spain exhibits a higher spread than France between urban and rural areas. In contrast to France and Spain, Great Britain shows a consistent trend in the E distribution, with a constant 25% gap between rural and urban areas for all the considered distances.

As we observe in Fig. 1, the anchor point distribution is clearly biased towards more densely populated areas, a bias that is intrinsically also present in the GT-DB dataset. Hence, to further understand the achieved precision depending on the ground-truth location of the users, we compute the correlation between the achieved precision and the number of available anchor points in a given area, using the grid we employ for the accuracy computation.

That is, for every cell in the tessellation we created for the three countries, we first compute the median P and the count of available anchor points (both on a logarithmic scale), and correlate these two variables with Pearson's R coefficient. The achieved values (-0.44, 0.53, and -0.57 for Spain, France, and Great Britain respectively) show a very high negative correlation between them: increasing the number of anchor points in a given area (a common
circumstance when moving from Rural to Semi-Urban, and
4.4 The impact of the urbanization level
from Semi-Urban to Urban) can very likely correspond to a
4.4.1 Precision precision improvement of one order of magnitude.
As we anticipated with the discussion of Fig. 1, the perfor-
mance achieved by GeoIP could be quite uneven depending 4.4.2 Accuracy
on the location of the real users, because the anchor points The accuracy A split across the different administrative
distribution targets most densely populated areas. Also, the levels is presented in Fig. 5. Spain and Great Britain show
urbanization level can be a very important factor to consider significant differences between urban and rural areas in all
[Figure 4: empirical CDF of the precision E (x-axis: 100m to 1000Km) for Spain, France, and Great Britain, with one curve each for urban, semi-urban, and rural areas.]
Fig. 4: Precision breakdown per urbanization level

[Figure 5: accuracy A (0% to 100%) at Levels 1 to 4 for Spain, France, and Great Britain, with one series each for urban, semi-urban, and rural areas.]
Fig. 5: Accuracy by administrative regions and level of urbanization (dashed lines represent the overall accuracy).
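
The cell-level correlation described in §4.4.1 can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names, the input layout (one pair of median error and anchor count per grid cell), and the sample values are assumptions made for the example, and cells without anchor points are skipped because their logarithm is undefined.

```python
import math

def pearson_r(xs, ys):
    """Pearson's R for two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def log_log_correlation(cells):
    """cells: iterable of (median_error_km, anchor_count), one per grid cell.

    Both variables are moved to a base-10 logarithmic scale before
    correlating them; cells with a zero value are skipped (assumption).
    """
    usable = [(e, n) for e, n in cells if e > 0 and n > 0]
    log_error = [math.log10(e) for e, _ in usable]
    log_count = [math.log10(n) for _, n in usable]
    return pearson_r(log_error, log_count)

# Hypothetical cells: (median E in Km, number of anchor points in the cell).
cells = [(100.0, 1), (10.0, 10), (1.0, 100), (50.0, 0)]
r = log_log_correlation(cells)
print(round(r, 3))
```

In this toy input the error drops by one order of magnitude each time the anchor count grows by one, so the coefficient sits at the negative extreme; real cells would be far noisier.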

administrative levels, except Level 1 (i.e., zip code). The Spanish case, in particular, showcases very large differences between A measured in urban areas and rural areas: for Level 2 we can observe a dramatic drop from 47% to 11%, further corroborating the considerations made in §4.3.2. In contrast, France presents comparable results across the urbanization degrees, being the least unequal of the analyzed countries. This effect may be due to the more uniform spread of anchor points across the country.

4.5 The influence of the access technology
4.5.1 Precision
Regardless of the type of device used to gain access to the Internet (e.g., a desktop PC, a laptop, or a mobile phone), a factor that likely affects the precision of GeoIP services is the access network technology. It seems obvious that pinpointing the location of a mobile device will generate larger errors than estimating the position of a device connected through a broadband fixed-access technology (e.g., Fiber, ADSL, or WiFi).

Databases such as GeoIP-DB-A and GeoIP-DB-B usually also offer, for user targeting purposes, the kind of access technology associated with a given IP address. However, our ground-truth database GT-DB is built using data coming from mobile terminals, which likely only have two types of access network technologies: WiFi and mobile.

Fig. 6 shows E according to the connection interface inferred from GeoIP-DB-A for Spain, France, and Great Britain. The behavior is consistent across the countries. As expected, cellular connections lead to much larger errors than WiFi. For most percentiles in the distribution, the gap exceeds one order of magnitude in all the countries. Moreover, the location samples obtained through the mobile network where E ≤1Km are anecdotal (3% for the best case, in France). The extremely bad performance of cellular connections is not compensated by WiFi. Even for the best case, in Spain, only 17% of the users could be located within 1Km, corroborating that this technology, at least for the analyzed databases, is a no-go for precise location-targeted advertising.

4.5.2 Accuracy
It is indeed remarkable that not even the use of WiFi yields good results for the most challenging scenarios: the best case (Level 2 in GB) only achieves A equal to 52.7%, while in France this value drops to 29.0% for the same administrative level. Fig. 7 confirms the pronounced unreliability of IP-based geolocation for cellular access technologies, with values of A that are often below 10% for the most challenging scenarios and around 50% for the least challenging ones (only Level 4 in GB seems to be well mapped).

4.6 The variation across different ISPs
4.6.1 Precision
The algorithms employed by GeoIP-DB-A to map IP addresses to a posIP may rely on active latency measurements between known milestones on the Internet. Hence, the number and the internal configuration of prefixes for the different ISPs could have a relevant impact on E. We assess this by further splitting the precision achieved by users connected through their mobile interface (discussed in §4.5) into the different ISPs. For this purpose, we use the information available in the GT-DB database, which collects the carrier name displayed in the mobile terminal. Fig. 8 shows the achieved E for the four most relevant carriers in each of the considered countries.

We observe notably different behavior in the achieved E for all the countries. In Spain, there is almost an order of magnitude difference in the median E between the least and the most precise ISPs. For the French case, this difference is even broader, with SFR as the best option with a remarkably high median precision of 4.5Km, and Orange as the worst case for GeoIP location purposes with a median E = 173.2Km. Considering that both operators are the most popular in France, according to their popularity in the GT-DB dataset, we ascribe this difference to a worse performance of the position matching algorithm used by GeoIP-DB-A. Finally, the ISP choice in GB has the lowest impact among the analyzed countries, with a close gap between the median E of the analyzed operators. In a nutshell, apart from a few cases, for all the analyzed operators, the 25th
[Figure 6: empirical CDF of the precision E (x-axis: 100m to 1000Km) for Spain, France, and Great Britain, with one curve each for WiFi and Mobile.]
Fig. 6: Precision for different access technology: fixed vs. cellular.

[Figure 7: accuracy A (0% to 100%) at Levels 1 to 4 for Spain, France, and Great Britain, with one series each for WiFi and Mobile.]
Fig. 7: Accuracy by administrative regions and connection type (dashed lines represent the overall accuracy).

[Figure 8: precision E (y-axis: 1Km to 1000Km) per carrier. Spain: Yoigo, Vodafone, Movistar, Orange. France: Orange, Bouygues, NRJ, SFR. Great Britain: Three, Sky, TMobile, O2.]
Fig. 8: Precision across different ISPs

[Figure 9: accuracy A (0% to 100%) at Levels 1 to 4 per carrier. Spain: Yoigo, Vodafone, Movistar, Orange. France: Orange, Bouygues, NRJ, SFR. Great Britain: Three, Sky, TMobile, O2.]
Fig. 9: Accuracy by administrative regions and ISPs (dashed lines represent the overall accuracy)
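
The per-carrier comparison in §4.6.1 relies on percentiles of E (the median and the 25th percentile). A minimal sketch of such a summary is given below; the carrier labels, the sample errors, and the linear-interpolation percentile rule are assumptions chosen for illustration, since the paper does not state how its percentiles are computed.

```python
import math
from collections import defaultdict

def percentile(values, q):
    """q-th percentile (0 to 100), interpolating linearly between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def summarize_by_carrier(samples):
    """samples: iterable of (carrier_name, error_km) pairs."""
    by_carrier = defaultdict(list)
    for carrier, error_km in samples:
        by_carrier[carrier].append(error_km)
    return {
        carrier: {"p25": percentile(errs, 25), "median": percentile(errs, 50)}
        for carrier, errs in by_carrier.items()
    }

# Hypothetical location errors (Km) for two made-up carriers.
samples = [("carrier-x", e) for e in (12.0, 20.0, 40.0, 80.0, 200.0)]
samples += [("carrier-y", e) for e in (2.0, 4.0, 6.0)]
summary = summarize_by_carrier(samples)

# Carriers whose 25th percentile of E exceeds 10Km (the threshold quoted in the text).
imprecise = sorted(c for c, s in summary.items() if s["p25"] > 10.0)
print(summary, imprecise)
```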

percentile of E is above 10Km. This confirms that making a fine-grained selection of GeoIP locations per operator cannot be used for precise location-targeted advertising or other similar services.

4.6.2 Accuracy
We measure A for the different carriers in Fig. 9. As expected, the carriers that yield a lower median E generally translate into a higher A. However, the quite large differences in the median precision observed in France do not translate into very large differences in terms of accuracy, while the less dispersed situation in GB yields quite diverse performance for some carriers, especially for the Level 2 divisions. Instead, the differences in Spain in terms of E have a more direct relationship to A, as GeoIP-DB-A reaches the lowest values for the Yoigo operator. This hints at the complexity of the task GeoIP databases perform: a complex mix of active and passive measurements that blackbox the core networks and the interconnections of ISPs.

4.7 Providers’ reported performance
Most GeoIP databases provide high-level reporting about the offered precision and/or accuracy, except for Maxmind, which offers detailed reporting [31]. Although Maxmind’s report does not cover as many dimensions as we cover in our research, it offers precision data at several thresholds (10, 25, 50, 100, and 250Km) and accuracy values at two administrative levels (zip code and city) for around a hundred countries. An important difference with Maxmind is that we provide a detailed description of our methodology to study the precision and accuracy of GeoIP, whereas Maxmind does not disclose its methodology.

We have compared Maxmind’s outcome and ours for the three countries analyzed in this paper. The results are,
[Figure 10: three panels for ES, FR, and GB. Left: share of samples mapped to the best possible anchor point (x-axis: 0% to 20%). Center: empirical CDF of E (x-axis: 100m to 100Km). Right: A (0% to 100%) at Levels 1 to 4.]
Fig. 10: A for GeoIP-DB-A to the best possible anchor point (left), E (center) and A (right) achieved by the best possible
scenario.
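
The check behind the internal accuracy of §5.2 and the what-if scenario of §5.3 can be sketched as follows. A sample has posGT and posIP in the same Voronoi cell exactly when the anchor point closest to posGT is the one that was assigned, so a nearest-anchor search is enough and the tessellation does not need to be built explicitly. The sketch is illustrative only: it assumes planar coordinates in Km (a real implementation would project latitude and longitude first), uses a brute-force search, and all names and values are hypothetical.

```python
import math

def nearest_anchor(point, anchors):
    """Index of the anchor with the least Euclidean distance to point."""
    return min(range(len(anchors)), key=lambda i: math.dist(point, anchors[i]))

def internal_accuracy(samples, anchors):
    """Fraction of samples whose assigned anchor is the closest one.

    samples: list of (pos_gt, assigned_anchor_index) pairs.
    """
    hits = sum(1 for pos_gt, idx in samples if nearest_anchor(pos_gt, anchors) == idx)
    return hits / len(samples)

def optimal_errors(samples, anchors):
    """Error E of each sample when it is mapped to its closest anchor."""
    return [math.dist(pos_gt, anchors[nearest_anchor(pos_gt, anchors)])
            for pos_gt, _ in samples]

# Hypothetical planar coordinates in Km.
anchors = [(0.0, 0.0), (10.0, 0.0)]
samples = [((1.0, 0.0), 0), ((9.0, 0.0), 0), ((6.0, 0.0), 1), ((4.0, 0.0), 1)]

share_best = internal_accuracy(samples, anchors)
best_case_e = optimal_errors(samples, anchors)
print(share_best, best_case_e)
```

For country-scale anchor sets a spatial index would replace the linear scan, but the comparison itself stays the same.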

in general, well-aligned. This means our study is the first academic validation of the correctness of the precision and accuracy results reported by Maxmind.

5 DISSECTING THE GEOIP INTERNALS
In this section, we go one step further and try to analyze the different components that may contribute to the lack of performance analyzed in §4. Neither GeoIP-DB-A nor GeoIP-DB-B discloses the algorithms and techniques used to provide the mapping between IP and location, although we know from generic statements published on the vendor websites and from the literature [] that they are likely using a mixture of active and passive measurements, possibly combined with machine learning technologies and datasets close to GT-DB.

While proposing improved solutions for GeoIP is out of the scope of this paper, in this section we propose a methodology to discover the upper bound of the GeoIP performance, a metric that we will leverage for the online advertising case study discussed in §6.

5.1 Methodology
Mapping an IP address to its posIP, as performed by the two databases, is a function internally composed of two sub-tasks: i) create a grid of possible anchor points, based on passive and active measurements, and ii) assign each IP prefix to such a grid. While these two operations are likely to be conducted together, thus creating a “moving” map of the anchor points which may have considerable drifts, as reported in [18], in this section we analyze the precision of the technology by splitting these two tasks into two independent phases, assuming that the mapping is performed on top of a set of already defined anchor points. While this operation is not feasible for GeoIP-DB-A and GeoIP-DB-B, it serves as a best case for GeoIP.

To this end, we take the set of posIP shown in Fig. 1 in each of the countries and compute its Voronoi tessellation [32], which defines the areas in which the errors are minimized. We measure the performance of the GeoIP databases using the same metrics discussed in §4.1: E and A. We next analyze this scenario under different configurations.

5.2 Internal accuracy
The first question that we want to answer is how frequently the mapped posIP is actually the best possible one. We answer this question by computing the fraction of users that are mapped with both posGT and posIP within the same Voronoi cell. If this happens, then the selected posIP is actually the best possible one among the set of available anchor points. If not, there was an anchor point closer to posGT than posIP that was not selected. The results of this analysis are shown in the left part of Fig. 10.

In this situation, GeoIP-DB-A cannot go beyond an overall accuracy of 20% in the best case (Spain), i.e., more than 80% of the IP addresses are not mapped to the best location. This is even more dramatic for the GB case, where just 6% of the addresses are mapped to the best anchor points.

This corroborates the complexity of the tasks that GeoIP providers face: while they can quite effectively map more densely populated areas with more anchor points, the users’ geographical spread (especially for the ones using the mobile network, as we analyze in §4.5.1) makes it very difficult to condense IP ranges into the best possible anchor point.

5.3 Anchor points optimal granularity
In this experiment, we go one step further and analyze a what-if scenario in which we assume that the mapping to the posIP is dynamically performed at each query over a fixed grid of predefined anchor points (i.e., the ones already present in GeoIP-DB-A), selecting the best possible anchor point, i.e., the one with the least Euclidean distance to posGT. This approach allows us to remove any possible error due to the end users’ movement, so that we can evaluate the quality of the measurements performed to map a given prefix onto a posIP. That is, the GeoIP-DB-A developers could find a prefix whose unique characteristics are uniquely identified through their algorithms by a lat,long pair, showing the potential of their system.

We evaluate this by repeating the analysis we performed in §4.3, assessing the improvement in terms of E and A. Results are shown in Fig. 10. Improvements are dramatic: 99.9% of all our samples fall below the 10Km E mark, a remarkable precision that could be a game-changer for many applications, including online advertising. Indeed, A also grows towards the highest accuracy levels, with the accuracy for Level 1 (the most challenging one) growing from around 10% (see Fig. 2) to more than 60%, and being practically error free at Levels 3 and 4.

These results show that i) GeoIP-DB-A technology can accurately measure where users actually are at a coarser
granularity, as the GT-DB population has an anchor point within 10Km in the vast majority of cases, but ii) end users’ micro-mobility largely spoils the achieved granularity. We claim that if GeoIP providers were able to account for this micro-mobility in their mapping, the impact on final applications such as online advertising would be huge, as we discuss in the following section.

6 THE IMPACT OF GEOIP ON ONLINE ADVERTISING
This section aims to estimate the impact of using GeoIP locations on online advertising campaigns. While the extensiveness of the GT-DB dataset used in §4 allowed us to understand the performance of GeoIP overall, that dataset may include all kinds of users, not only the ones who are actively targeted by ad providers. Therefore, we use our bid stream dataset to generate the ground-truth data to guarantee that all the location samples are actually linked to users targeted by online advertising campaigns.

To measure the performance of a campaign, we rely on the accuracy (A) measure described in §4. However, to understand the best buying strategy from an economic point of view, in addition to the accuracy, we also have to consider the monetary cost C associated with different types of bid-requests, i.e., including GeoIP or GPS information. In order to isolate the monetary impact that the type of location data has on advertising campaigns, we need to factor out other elements affecting the economic performance of a campaign. To this end, we make the following assumptions: i) the bid stream has been filtered so that the available bid-requests already meet the goals of the campaign in terms of the targeted audience; ii) there is sufficient ad inventory of each type of location data (GeoIP vs. GPS) to meet the defined objective of the campaign in terms of the number of ad impressions delivered, so that the advertiser/DSP can freely choose to buy any combination of GeoIP and GPS bid-requests to meet such objective.

6.1 Methodology
6.1.1 Best bidding strategy
In this section, we model the best strategy that could be followed by an advertiser to issue a specific targeting ad campaign, based on the characteristics of the location technology and their associated cost. The goal of the advertiser is to maximize the value for money for every ad campaign.

Let us introduce this with a toy example, where the GPS accuracy is 100% by definition and the GeoIP accuracy of the bid requests in this campaign is 20% (i.e., the location of the targeted user matches the location defined by the ad campaign once every five times). In this case, if the average cost of the GPS bid request is twice the cost of the GeoIP bid request, it would be more economically effective to buy GPS bid requests. However, if the cost of a GPS bid request was 6 times the cost of a GeoIP bid request, it would be more economically effective to buy the latter.

To model this behaviour, we introduce the normalized cost related to each technology ($C^{*}_{IP}$ and $C^{*}_{GPS}$), computed as:

$$C^{*}_{IP} = \frac{C_{IP}}{\min(C_{IP}, C_{GPS})} \qquad C^{*}_{GPS} = \frac{C_{GPS}}{\min(C_{IP}, C_{GPS})} \tag{3}$$

Then, by using the accuracy obtained with the GeoIP and GPS technologies ($A_{IP}$ and $A_{GPS}$), we can calculate the Effective Cost ($\phi$), which is defined as the normalized cost of correctly delivering an ad to a user located in the targeted area, and is calculated as follows:

$$\phi_{IP} = \frac{C^{*}_{IP}}{A_{IP}} \qquad \phi_{GPS} = \frac{C^{*}_{GPS}}{A_{GPS}} \tag{4}$$

Hence, the best expenditure strategy is defined by $\min(\phi_{IP}, \phi_{GPS})$. For a given location-targeted campaign, by estimating the accuracy and cost for the two technologies, a DSP can steer its strategy according to this rule. Later in this section, we empirically evaluate A as well as $\phi$ for different real-world ad campaign scenarios.

Note that our methodology does not consider the potential economic side benefits/harms of showing ads to users outside the targeted area. For instance, a potential benefit might be expanding the knowledge of a new brand to neighboring areas of the specific location target. Instead, a potential harm might be bothering users with ads uninteresting to them, which in addition introduces a waste of resources (e.g., bandwidth [33] and battery).

6.1.2 Bid stream ground-truth dataset
In order to precisely measure $A_{IP}$, we have to select a set of bid-requests for which we know the end users’ ground-truth location. We do this by keeping exclusively the bid-requests that include a GPS location (see §2.2), hence creating a reliable association between the users’ IP addresses and their position posGT (we assume that $A_{GPS}$ = 100%). Then, we retrieve the location information from the geolocation databases using the IP address, obtaining posIP-A and posIP-B. Our ground-truth dataset includes the following information: <timestamp; IP address; posGT; posIP-A; posIP-B>.

Moreover, in order to retrieve the value of $C_{GPS}$ and $C_{IP}$, we rely on the bid floor information available in our bid stream dataset. Note that for estimating $\phi_{GPS}$ and $\phi_{IP}$ the relevant information is not the absolute price value for GPS and GeoIP ad impressions but the relative relation between them ($C^{*}_{GPS}$ and $C^{*}_{IP}$). Hence, our assumption here is that the ratio of GeoIP and GPS price values is well captured by the ratio of their corresponding bid floor prices.

6.1.3 Simulation set-up
Our goal is to create a simulation set-up that mimics real location-targeted ad campaigns. For this purpose, we follow the guidance from industry players, such as TAPTAP Digital, to set up realistic values for our simulation parameters as described next:

Campaign duration: We set up a campaign duration between 1 and 2 weeks, which is a very common time frame used by advertisers for their ad campaigns.

Win rate: This parameter defines the fraction of won bid-requests out of all the bids run by a DSP in an ad campaign. We configure a win rate range between 20% and 40% in our reference ad campaigns.

Ad impression cost: We use the bid floor as a proxy metric to estimate the cost of ad impressions. We have computed the $C^{*}_{IP}$ and $C^{*}_{GPS}$ defined above as the median value of
bid floor prices for GeoIP and GPS bid-requests collected for Spain, France, and Great Britain across 16 days. $C^{*}_{IP}$ is 1 for the three countries, whereas $C^{*}_{GPS}$ is 1.01, 2.34, and 2.08 for Spain, France, and Great Britain, respectively.

Geographical target: We consider campaigns targeting all administrative levels introduced in Table 1 but the country level. As discussed in §4, GeoIP services have perfect accuracy in providing the location at country level. Thus, country-level campaigns are expected to have A ≈ 100%. Note that the 4 levels used in our simulations (state, province, city, and zip code) are frequently used as targeted locations in online advertising campaigns.

Urbanization level: A major portion of location-targeted advertising campaigns focus on urban areas. Thus, our simulations focus on this type of area. Note that the urbanization level is only meaningful for administrative Levels 1 (zip code) and 2 (city), since we cannot select a province or a state which is entirely urban or rural.

For each of the considered countries (Spain, France, and Great Britain), we configure 4 campaign models based on the geographical target and urbanization level: a) Level 4, b) Level 3, c) Level 2-Urban, and d) Level 1-Urban. Overall, we have a total of 12 simulation scenarios. For each simulation scenario, we randomly select 5 different targets that meet its criteria, with the exception of Level 4 in Great Britain, for which we select only 3 targets, as it generally yields very high accuracy (see Fig. 2).

Overall, we have 58 different target locations in our simulation set. Finally, for each of the 58 campaigns, we run 3 repetitions where we set up a value of campaign duration and win rate randomly selected from the range defined above for these parameters.

6.1.4 Campaign execution
We execute the simulated campaigns on the bid stream coming from our ground-truth dataset. We keep only the bid-requests that include a posIP-A or posIP-B location matching the geographical target of the ad campaign in the selected time period. We then consider a random fraction of the bids from the obtained subset according to the win rate defined for the campaign. The final set of bid-requests resulting from this process represents the actual set of ad impressions delivered by the ad campaign.

6.1.5 Evaluation metrics
First, we compute the Accuracy ($A_{IP}$) metric to assess the impact of GeoIP location data in online advertising. We measure the accuracy on the delivered ad impressions as the fraction of them whose associated posGT falls within the specific geographical target of the ad campaign. For each of the 58 simulated campaigns, we compute the average $A_{IP}$ across the three performed repetitions. Note that, as indicated above, $A_{GPS}$ = 100%.

Second, we compute $\phi_{IP}$ vs. $\phi_{GPS}$ using the expressions defined in Eq. 4 to identify the technology (GeoIP or GPS) yielding the most economically efficient campaign.

Third, using the values of $\phi_{IP}$ and $\phi_{GPS}$, we define the Gain ($G_{IP}$) of an ad campaign, as follows:

$$G_{IP} = \log\left(\frac{\phi_{GPS}}{\phi_{IP}}\right) \tag{5}$$

$G_{IP}$ quantitatively compares the increase (decrease) in accuracy for GeoIP with the increase (decrease) in its cost. Thus, positive (negative) values of $G_{IP}$ provide a quantitative reference of the expected order of magnitude improvement (harm) of setting a strategy to buy GeoIP instead of GPS bid-requests.

Finally, note that we compute $A_{IP}$, $A_{GPS}$, $\phi_{IP}$, $\phi_{GPS}$, and $G_{IP}$ for both: i) the Actual mapping of the IP addresses’ location to the anchor points implemented in GeoIP-DB-A, and ii) the Optimal assignment of IP addresses to the closest anchor point, as discussed in §5.

6.2 Results
We note that the results presented in this section correspond to GeoIP-DB-A. For the sake of simplicity, we do not report the results associated with GeoIP-DB-B, which lead to the same conclusions.

6.2.1 Accuracy
Fig. 11 shows the accuracy from the ad campaign simulations for the four geographical targets introduced in §6.1.3 in Spain, France, and Great Britain when we consider the Actual (left side) or the Optimal (right side) mapping of IP addresses’ location to anchor points.

The results of the Actual allocation strategy follow the expected pattern for A: the larger the geographical target, the higher the accuracy. Using Spain to illustrate this observation: A grows from 5.25% for campaigns targeting zip codes in urban areas to 58.45% when the campaign resolution is at the state level.

In addition, it is interesting to notice that the accuracy varies considerably across countries in all the geographical targets, except for Level 3. Also, the accuracy reported with the GT-DB dataset (see Fig. 2 in §4) shows more evenly spread behavior across countries. This suggests that the users targeted by online advertising can be a rather skewed subset of the overall population that can be reached by the high-precision location providers discussed in §4.

When analyzing the Optimal assignment of IP addresses to the closest anchor point, we find that it largely outperforms the Actual allocation irrespective of the geographical target we consider, as expected. The worst case in the Optimal allocation (A = 73.18%), corresponding to the zip code level in urban areas in Spain, is only 20 percentage points below the best case in the Actual allocation (A = 93.16%), which comes from the state level in GB.

In conclusion, the average A for the Optimal allocation strategy yields advantages for all geographical resolutions. If the GeoIP services were capable of approximating this Optimal performance, advertisers using location-targeted campaigns would experience a significant improvement in their campaigns’ KPIs without requiring any further investment.

6.2.2 Optimal budget strategy
Tab. 3 shows the best buying strategy (i.e., buying GeoIP vs. GPS bid-requests) to be applied in each of the campaigns run for the four geographical targets introduced in §6.1.3 for the three countries as a result of comparing $\phi_{IP}$ and $\phi_{GPS}$ in the simulated campaigns. Results are grouped by target location. For each target location, the table shows the
[Figure 11: accuracy A (0% to 100%) for ES, FR, and GB at Level 1 Urban, Level 2 Urban, Level 3, and Level 4; left panel GeoIP-DB-A, right panel GeoIP-DB-Optimal.]
Fig. 11: Accuracy by administrative regions

Country   Target Location    Best Tech. (Actual)       Best Tech. (Optimal)
ES        Level 1 (Urban)    (GPS, 15)                 (GPS, 12), (GeoIP, 3)
ES        Level 2 (Urban)    (GPS, 15)                 (GPS, 15)
ES        Level 3            (GPS, 15)                 (GPS, 6), (GeoIP, 9)
ES        Level 4            (GPS, 15)                 (GeoIP, 15)
FR        Level 1 (Urban)    (GPS, 15)                 (GeoIP, 15)
FR        Level 2 (Urban)    (GPS, 15)                 (GeoIP, 15)
FR        Level 3            (GPS, 9), (GeoIP, 6)      (GeoIP, 15)
FR        Level 4            (GPS, 3), (GeoIP, 12)     (GeoIP, 15)
GB        Level 1 (Urban)    (GPS, 15)                 (GPS, 3), (GeoIP, 12)
GB        Level 2 (Urban)    (GPS, 3), (GeoIP, 12)     (GeoIP, 15)
GB        Level 3            (GPS, 6), (GeoIP, 9)      (GeoIP, 15)
GB        Level 4            (GeoIP, 9)                (GeoIP, 9)

TABLE 3: Best technology (GPS vs. GeoIP) to set the buying strategy based on the analysis of φ.

[Figure 12: G_IP (y-axis: -3 to 0) for ES, FR, and GB at Level 1 Urban, Level 2 Urban, Level 3, and Level 4; left panel GeoIP-DB-A, right panel GeoIP-DB-Optimal.]
Fig. 12: Relative economic gain or loss (GIP )
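
The decision rule built from Eq. 3, Eq. 4, and Eq. 5 can be illustrated with the toy example of §6.1.1. The sketch below is not the authors' implementation: the function names are hypothetical, the logarithm in the gain is assumed to be base 10 (the text speaks of orders of magnitude but does not state the base), and ties are arbitrarily resolved in favor of GPS.

```python
import math

def normalized_costs(c_ip, c_gps):
    """Eq. 3: divide both costs by the cheaper of the two."""
    base = min(c_ip, c_gps)
    return c_ip / base, c_gps / base

def effective_cost(norm_cost, accuracy):
    """Eq. 4: normalized cost per correctly delivered ad."""
    if accuracy <= 0:
        return math.inf  # no ad is ever delivered to the targeted area
    return norm_cost / accuracy

def gain_ip(phi_ip, phi_gps):
    """Eq. 5, assuming a base-10 logarithm."""
    if math.isinf(phi_ip):
        return -math.inf
    return math.log10(phi_gps / phi_ip)

def best_technology(c_ip, c_gps, a_ip, a_gps=1.0):
    """Return (recommended technology, G_IP) for one campaign."""
    norm_ip, norm_gps = normalized_costs(c_ip, c_gps)
    phi_ip = effective_cost(norm_ip, a_ip)
    phi_gps = effective_cost(norm_gps, a_gps)
    choice = "GeoIP" if phi_ip < phi_gps else "GPS"
    return choice, gain_ip(phi_ip, phi_gps)

# Toy example: GeoIP accuracy 20%, GPS accuracy 100%.
twice = best_technology(c_ip=1.0, c_gps=2.0, a_ip=0.2)      # GPS costs 2x GeoIP
six_times = best_technology(c_ip=1.0, c_gps=6.0, a_ip=0.2)  # GPS costs 6x GeoIP
print(twice, six_times)
```

With GPS at twice the GeoIP price the effective costs are 2 against 5, so GPS is preferred and the gain is negative; at six times the price they are 6 against 5, and GeoIP becomes the cheaper way to reach a correctly located user.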


number of experiments where the recommended strategy is buying GPS only or where GeoIP is an economically more profitable alternative.

First, if we consider the Actual allocation of IP prefixes to anchor points, GeoIP would be the recommended technology in 30% and 55% of the experiments in France and Great Britain, respectively, whereas GPS is the recommended technology in all ad campaigns from Spain. This result is a consequence of a higher cost of GPS bid-requests in France and Great Britain, which recommends buying GeoIP over GPS in certain campaigns, even when the accuracy of GeoIP is significantly worse. In the case of Spain, the cost of GPS and GeoIP bid-requests is very similar, so the superior precision of GPS leads us to recommend buying this type of bid-request.

Second, the number of campaigns where GeoIP becomes the best buying strategy when we use the Optimal allocation instead of the Actual one grows from 0 to 27 in Spain, from 18 to 60 in France, and from 30 to 51 in Great Britain. This confirms that, if GeoIP providers could further improve their technology to achieve a better approximation of users’ positions, as discussed in §5, this technology has a large potential for further improvements for this application.

To conclude our analysis, we study the budgetary improvement (harm) imposed by a strategy focused on buying only GeoIP bid-requests against the opposite one that only focuses on the ones that carry GPS information. To this end, the main bars in Fig. 12 show the average $G_{IP}$ for each considered targeted location in our ad campaigns, while the points are the $G_{IP}$ for each specific realization of the ad campaigns⁴. Again, we present results considering the Actual performance of the GeoIP and the Optimal allocation of prefixes to anchor points. Overall, this figure provides a more detailed picture of the findings derived from Tab. 3 by quantifying the advantage (disadvantage) of having an ad buying strategy that is entirely composed of GeoIP.

⁴ As A ≈ 0 for all experiments targeting Level 1-Urban GB in GeoIP-DB-A, the figure does not show this configuration.

First of all, the current price differences that favor the cheaper GeoIP against the more expensive GPS bid-requests limit the maximum $G_{IP}$ that GeoIP can offer compared to GPS to less than an order of magnitude, even in those cases where GeoIP offers a perfect accuracy. Instead, in the configurations that yield a very poor accuracy using GeoIP (e.g., Level 1-Urban in France for GeoIP-DB-A), there is a severe economic impact as measured by $G_{IP}$: the wrong strategy of buying only GeoIP bid-requests can worsen the monetary efficiency by a factor of 10.

The technologies become comparable only when the accuracy grows to very high values, where the increase in accuracy yielded by the GPS technology does not compensate for the higher cost of GPS bid-requests. Only for the Spanish case, where GPS ads are comparatively cheaper than in other countries in our dataset, is GeoIP still lagging behind from a monetary point of view.

Finally, Fig. 12 also shows $G_{IP}$ in the case of the Optimal assignment. Comparatively, with such an improvement, GeoIP would be on par with GPS in 3 cases, and would even yield a constant monetary gain in 9 of the 12 scenarios.

7 RELATED WORK
The utilization of IP addresses as a proxy for geolocating devices has attracted the interest of the research community for more than a decade now, showing the importance of the topic due to the widespread use of GeoIP solutions in online advertising, fraud detection, or anti-piracy solutions.

A first body of work, which does not directly study the performance of GeoIP, analyzed the geographical allocation of IP addresses and IP prefixes from both spatial and temporal angles. Gueye et al. [34] reported the difficulty of accurately geolocating an IP address due to the geographic span of IP address blocks. Almost 15 years later, and with a huge proliferation of the utilization of GeoIP location services, these findings seem to remain valid. Padmanabhan et al. [35] study the duration of IPv4 and IPv6 assignments to a device through large-scale measurements. The paper shows that 75% of mobile devices get IP addresses assigned for a duration lower than a day, whereas devices with a fixed connection typically keep the same IP address for dozens of days. An earlier work by Balakrishnan et al. [36] performs a similar measurement study on the stability of IP address assignment in early 3G mobile networks in the US. The paper reports that the IP addresses used by a mobile device could change in a few minutes and thus they do not embed geographical information with enough granularity to implement GeoIP solutions based on them. These results support
our finding that the error of GeoIP-based locations of IP addresses (or prefixes) using cellular access connections is significantly larger than those using fixed connections.

The closest literature to our study is formed by studies that analyze the accuracy of GeoIP databases. In one of the earliest studies on the topic, Poese et al. [13] use data from an ISP to analyze the performance and accuracy of 5 different GeoIP databases. In particular, they find that none of these databases make a good mapping of the actual IP prefixes used by the ISP. Furthermore, they also map the location of each IP prefix to the location of the Point-of-Presence (PoP) where the associated backbone router is located. Unfortunately, this location ground-truth might be significantly less accurate than GPS coordinates from a mobile device as we use in this paper. In an almost parallel study in time, Shavitt et al. [37] compare the performance of 6 GeoIP databases. They use two types of ground-truth datasets: the geographical location of PoPs and a ground-truth database, including the location of 25k IP addresses up to the level of city. The paper uses the precision as the studied performance metric. The authors also analyze the correlation between the error of different GeoIP databases.

There are very few papers in the literature using ground-truth data based on GPS location information. Triukose et al. [38] leverage the GPS location provided by a mobile app, and assess the error of GeoIP location services using the IP address of the device. Complementary to our study, this paper shows evidence that NATed IP addresses offer a worse location accuracy than public IP addresses. However, this study presents an important limitation since the dataset only includes information about devices connected through cellular (3G/GPRS) technology. In a similar study, Komosny et al. [39] use 700 mobile devices from which they recover the GPS location to construct a ground-truth dataset to evaluate the performance of 8 different GeoIP databases. While these studies rely on GPS ground-truth data, their dataset is formed by tens of thousands of location samples compared to more than 2B samples in our dataset.

Finally, there are some previous works complementary to ours, which analyze the performance of GeoIP databases in geolocating network infrastructure elements. Instead, we are interested in analyzing the performance in the geolocation of end users. Gharaibeh et al. [40] use a ground-truth dataset including the city level location of 16.5K router interface IP addresses, whereas Iordanou et al. [41] focus their analysis on the location of servers. Both works conclude that GeoIP databases are highly inefficient in geolocating network infrastructure elements.

Our study presents three major contributions in comparison with the previous literature: 1) we present the first benchmark analysis about the upper-bound performance that GeoIP could offer (see §5); 2) to the best of the authors’ knowledge, all existing works analyze the GeoIP databases performance in an isolated manner and just briefly mention which businesses might be affected by the reported inaccuracy of GeoIP. Instead, we present, for the first time, a detailed quantitative analysis of the potential impact of the extensive use of GeoIP in online advertising, which arguably represents the most important business where GeoIP is applied; 3) we present the most thorough study of GeoIP performance in terms of scale and resolution. In particular, our study leverages over 2B ground-truth samples with GPS precision. This allows us to present the most comprehensive study of the GeoIP performance, studying it up to a zip code resolution and covering the impact of several relevant factors such as the level of urbanization, the access technology or the specific ISP.

As a final remark, to the best of the authors’ knowledge, there is only one company, Location Sciences [42], offering location data auditing products in the online advertising ecosystem. Unfortunately, as with all other auditing solutions in online advertising [43], [44], [45], their products are proprietary and it is unknown how they operate or what their actual performance is.

8 CONCLUSION

To the best of the authors’ knowledge, our study is: 1) the one that provides the deepest understanding of the performance of GeoIP databases; 2) the first one providing an upper bound of the performance these systems may offer; and 3) the first one analyzing its impact on the online advertising business. These three elements constitute (in our humble opinion) an important contribution to researchers and practitioners and make our paper novel compared to any other previous study in the context of GeoIP databases.

In this paper, we present an analysis of two GeoIP databases, which are arguably among the most widespread technologies used to locate devices around the entire world, especially in the context of online advertising. To the best of our knowledge, our study is: i) the one that provides the deepest understanding of the performance of GeoIP databases; ii) the first one providing an upper bound of the performance these systems may offer; and iii) the first one analyzing its impact on the online advertising business.

Armed with a dataset of 2B samples that includes a ground-truth location associated with an IP address, we study the performance of GeoIP databases through several dimensions unexplored so far: urban vs. rural areas, access technologies, or ISPs. Our work revisits the quantitative findings of previous studies regarding the performance issues of this technology and extends them to understand their causes better.

Thanks to the extensiveness of our data, we can further dig into the performance of GeoIP databases, showing possible causes behind the lack of accuracy and discussing how, under ideal conditions, the overall precision could be improved by two orders of magnitude.

Finally, we prove that from a budgetary perspective, GeoIP may be, in some cases, a better technology for geographically targeted ad campaigns compared to more precise geolocation technologies (i.e., GPS) due to the expected higher cost of the latter. The most efficient technology in economic terms is the one that better balances accuracy and cost. This is initially a counter-intuitive result since most of the literature in the area focuses on reporting the poor location capacity of GeoIP databases.

ACKNOWLEDGEMENTS

This research received funding from the European Union’s Horizon 2020 innovation action programme under the PIMCITY project (Grant 871370) and the TESTABLE project
(Grant 101019206); the Agencia Estatal de Investigación (AEI) under the ACHILLES project (Grant PID2019-104207RB-I00/AEI/10.13039/501100011033); the Spanish Ministry of Economic Affairs and Digital Transformation and the European Union-NextGenerationEU through the UNICO 5G I+D 6G-RIEMANN-FR; the agreement between the Community of Madrid and the Universidad Carlos III de Madrid for the funding of research projects on SARS-CoV-2 and COVID-19 disease, project name “Multi-source and multi-method prediction to support COVID-19 policy decision making”, which was supported with REACT-EU funds from the European regional development fund “a way of making Europe”; and the TAPTAP-UC3M Chair in advanced AI and Data Science applied to advertising and marketing.

REFERENCES

[1] A. Daviel, F. Kaegi, and M. Kofahl, “Geographic extensions for HTTP transactions,” Working Draft, IETF Secretariat, Internet-Draft draft-daviel-http-geo-header-05, December 2007. [Online]. Available: http://www.ietf.org/internet-drafts/draft-daviel-http-geo-header-05.txt
[2] W3C. (2018) Geolocation API Specification 2nd Edition. [Online]. Available: https://www.w3.org/TR/geolocation-API/
[3] M. Fiore, P. Katsikouli, E. Zavou, M. Cunche, F. Fessant, D. Le Hello, U. Aivodji, B. Olivier, T. Quertier, and R. Stanica, “Privacy in trajectory micro-data publishing: a survey,” Transactions on Data Privacy, vol. 13, pp. 91–149, 2020.
[4] S. Rodriguez Garzon and B. Deva, “Geofencing 2.0: Taking location-based notifications to the next level,” in Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, ser. UbiComp ’14. New York, NY, USA: Association for Computing Machinery, 2014, pp. 921–932. [Online]. Available: https://doi.org/10.1145/2632048.2636093
[5] IAB, “Mobile Programmatic Playbook,” https://www.iab.com/wp-content/uploads/2015/05/MobileProgrammaticPlaybook.pdf, 2015.
[6] IAB, “Speaking the same language in location-based marketing,” https://www.iab.com/blog/location-based-marketing-glossary/, 2019.
[7] R. Gonzalez, C. Soriente, and N. Laoutaris, “User profiling in the time of HTTPS,” in Proceedings of the 2016 Internet Measurement Conference, ser. IMC ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 373–379. [Online]. Available: https://doi.org/10.1145/2987443.2987451
[8] T. Theodoridis, S. Papadopoulos, and Y. Kompatsiaris, “Assessing the reliability of Facebook user profiling,” in Proceedings of the 24th International Conference on World Wide Web, ser. WWW ’15 Companion. New York, NY, USA: Association for Computing Machinery, 2015, pp. 129–130. [Online]. Available: https://doi.org/10.1145/2740908.2742728
[9] J.-W. van Dam and M. van de Velden, “Online profiling and clustering of Facebook users,” Decision Support Systems, vol. 70, pp. 60–72, 2015. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167923614002796
[10] J. Estrada-Jiménez, J. Parra-Arnau, A. Rodríguez-Hoyos, and J. Forné, “Online advertising: Analysis of privacy threats and protection approaches,” Computer Communications, vol. 100, pp. 32–51, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0140366416307083
[11] J. M. Carrascosa, J. Mikians, R. Cuevas, V. Erramilli, and N. Laoutaris, “I always feel like somebody’s watching me: Measuring online behavioural advertising,” in Proceedings of the 11th ACM Conference on Emerging Networking Experiments and Technologies, ser. CoNEXT ’15. New York, NY, USA: Association for Computing Machinery, 2015. [Online]. Available: https://doi.org/10.1145/2716281.2836098
[12] V. N. Padmanabhan and L. Subramanian, “An investigation of geographic mapping techniques for internet hosts,” in Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ser. SIGCOMM ’01. New York, NY, USA: Association for Computing Machinery, 2001, pp. 173–185. [Online]. Available: https://doi.org/10.1145/383059.383073
[13] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye, “IP geolocation databases: Unreliable?” ACM SIGCOMM Computer Communication Review, vol. 41, no. 2, pp. 53–56, 2011.
[14] Maxmind. (2021) GeoIP2 Databases. [Online]. Available: https://www.maxmind.com/en/geoip2-databases
[15] Digital Element. (2021) Netacuity Industry-Leading IP Geolocation Data. [Online]. Available: https://www.digitalelement.com/solutions/
[16] IP2Location, “Identify geographical location and proxy by IP address,” https://www.ip2location.com/, 2021.
[17] Neustar, “IP intelligence,” https://www.home.neustar/security-intelligence/ip-geopoint, 2021.
[18] M. Gouel, K. Vermeulen, O. Fourmaux, T. Friedman, and R. Beverly, “IP geolocation database stability and implications for network research,” pp. 19–33, 2021.
[19] IAB Tech Lab. (2021) OpenRTB (Real-Time Bidding). [Online]. Available: https://iabtechlab.com/standards/openrtb
[20] Taptap Digital, “Omnichannel advertising and marketing intelligence powered by location,” https://www.taptapdigital.com/, 2021.
[21] Safegraph. (2021) The Source of Truth for Places Data. [Online]. Available: https://www.safegraph.com/
[22] Cuebiq. (2021) Mobility data that fuels growth. [Online]. Available: https://www.cuebiq.com/
[23] Foursquare. (2021) Foursquare location data platform. [Online]. Available: https://foursquare.com/
[24] Tamoco. (2021) The World’s Smartest Location and Geospatial Company. [Online]. Available: https://www.tamoco.com/
[25] TapTap. (2021) Sonata, Global Platform for Mobile-Centric Audience Engagement. [Online]. Available: https://www.sonataplatform.com/
[26] G. V. Brummelen, Heavenly Mathematics: The Forgotten Art of Spherical Trigonometry. Princeton University Press, 2013.
[27] United Kingdom. (2021) Open Data Portal. [Online]. Available: https://data.gov.uk/
[28] France. (2021) Open Data Portal. [Online]. Available: https://www.data.gouv.fr/
[29] Uber. (2021) H3: Hexagonal hierarchical geospatial indexing system. [Online]. Available: https://h3geo.org/
[30] Eurostat, “Degree of urbanisation (degurba),” https://ec.europa.eu/eurostat/web/degree-of-urbanisation/background, 2018.
[31] Maxmind. (2021) GeoIP2 City Accuracy. [Online]. Available: https://www.maxmind.com/en/geoip2-city-accuracy-comparison
[32] F. Aurenhammer, Voronoi diagrams and Delaunay triangulations. Hackensack, New Jersey: World Scientific, 2013.
[33] B. Pourghassemi, J. Bonecutter, Z. Li, and A. Chandramowlishwaran, “AdPerf: Characterizing the Performance of Third-Party Ads,” Proc. ACM Meas. Anal. Comput. Syst., vol. 5, no. 1, Feb. 2021. [Online]. Available: https://doi.org/10.1145/3447381
[34] B. Gueye, S. Uhlig, and S. Fdida, “Investigating the imprecision of IP block-based geolocation,” in Passive and Active Network Measurement, S. Uhlig, K. Papagiannaki, and O. Bonaventure, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007, pp. 237–240.
[35] R. Padmanabhan, J. P. Rula, P. Richter, S. D. Strowes, and A. Dainotti, “DynamIPs: Analyzing address assignment practices in IPv4 and IPv6,” in Proceedings of the 16th International Conference on Emerging Networking EXperiments and Technologies, ser. CoNEXT ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 55–70. [Online]. Available: https://doi.org/10.1145/3386367.3431314
[36] M. Balakrishnan, I. Mohomed, and V. Ramasubramanian, “Where’s that phone? Geolocating IP addresses on 3G networks,” in Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, ser. IMC ’09. New York, NY, USA: Association for Computing Machinery, 2009, pp. 294–300. [Online]. Available: https://doi.org/10.1145/1644893.1644928
[37] Y. Shavitt and N. Zilberman, “A geolocation databases study,” IEEE Journal on Selected Areas in Communications, vol. 29, no. 10, pp. 2044–2056, 2011.
[38] S. Triukose, S. Ardon, A. Mahanti, and A. Seth, “Geolocating IP addresses in cellular data networks,” in International Conference on Passive and Active Network Measurement. Springer, 2012, pp. 158–167.
[39] D. Komosny, M. Vozňák, and S. Rehman, “Location accuracy of commercial IP address geolocation databases,” Information Technology And Control, vol. 46, Sep. 2017.
[40] M. Gharaibeh, A. Shah, B. Huffaker, H. Zhang, R. Ensafi, and C. Papadopoulos, “A look at router geolocation in public and commercial databases,” in Proceedings of the 2017 Internet Measurement Conference, ser. IMC ’17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 463–469. [Online]. Available: https://doi.org/10.1145/3131365.3131380
[41] C. Iordanou, G. Smaragdakis, I. Poese, and N. Laoutaris, “Tracing cross border web tracking,” in Proceedings of the Internet Measurement Conference 2018, ser. IMC ’18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 329–342. [Online]. Available: https://doi.org/10.1145/3278532.3278561
[42] Location Sciences, “Location sciences,” https://www.locationsciences.ai/, 2021.
[43] Double Verify, “Double verify,” https://doubleverify.com/, 2021.
[44] Human, “Bot Mitigation - Know Who’s Real,” https://www.humansecurity.com/, 2021.
[45] Integral Ad Science, “Integral ad science,” https://integralads.com/, 2021.

Patricia Callejo is a post-doc researcher at UC3M-Santander Big Data Institute. She obtained her M.Sc. (2016) and Ph.D. (2020) at University Carlos III of Madrid in the field of Telematics Engineering. She received a RIPE Academic Cooperation Initiative (RACI) grant for RIPE 76, which took place in Marseille, France, in 2018. The same year, she did an internship at the International Computer Science Institute (ICSI) at UC Berkeley (USA), as part of her Ph.D. She has authored papers at conferences such as ACM HotNets, ACM CoNEXT, and WWW. She has participated in EU H2020 projects. Her areas of interest include Internet measurements, online advertising, and web transparency.

Marco Gramaglia received the M.Sc. and Ph.D. degrees in telematics engineering from the University Carlos III of Madrid (UC3M), in 2009 and 2012, respectively. He held post-doctoral research positions at ISMB, Italy, CNR-IEIIT, Italy, and IMDEA Networks, Spain. He is currently a PostDoctoral Researcher at UC3M. He has been involved in EU projects and has authored more than 50 papers in international conferences and journals.

Rubén Cuevas is an Associate Professor in the Telematic Engineering department at Universidad Carlos III de Madrid, Spain, and the Deputy Director of the UC3M-Santander Big Data Institute. He has coauthored over 70 papers in prestigious international journals and conferences such as ACM CoNEXT, WWW, Usenix Security, ACM HotNets, the IEEE Infocom, ACM CHI, IEEE/ACM TON, the IEEE TPDS, CACM, PNAS, Nature Scientific Reports, and PlosONE. He has been the PI of 10 research projects and participated in more than 25 projects. His research work has been featured in major international media such as The Financial Times, BBC, The Guardian, New Scientist, Wired, Corriere della Sera, Le Figaro, El Pais, etc. His main research interests include online advertising, web transparency and Internet measurements.

Ángel Cuevas received the M.Sc. (2007) and the Ph.D. (2011) degrees in Telematics Engineering from the University Carlos III of Madrid. He is currently an Associate Professor in the Department of Telematic Engineering, University Carlos III of Madrid. He is a co-author of more than 70 papers in prestigious international journals and conferences, such as the IEEE/ACM TRANSACTIONS ON NETWORKING, the ACM Transactions on Sensor Networks, Computer Networks (Elsevier), the IEEE NETWORK, the IEEE Communications Magazine, USENIX Security, WWW, ACM CoNEXT, and ACM CHI. His research interests focus on Internet measurements, web transparency, privacy, and P2P networks. He was a recipient of the Best Paper Award at ACM MSWiM 2010.
