Analysis of Geographic Queries in a Search Engine Log

Qingqing Gan, Polytechnic University, Brooklyn, NY 11201
Josh Attenberg, Polytechnic University, Brooklyn, NY 11201
Alexander Markowetz, University of Science & Technology, Hong Kong, S.A.R
Torsten Suel, Polytechnic University, Brooklyn, NY 11201

ABSTRACT
Geography is becoming increasingly important in web search. Search engines can often return better results to users by analyzing features such as user location or geographic terms in web pages and user queries. This is also of great commercial value as it enables location-specific advertising and improved search for local businesses. As a result, major search companies have invested significant resources into geographic search technologies, also often called local search.

This paper studies geographic search queries, i.e., text queries such as “hotel new york” that employ geographical terms in an attempt to restrict results to a particular region or location. Our main motivation is to identify opportunities for improving geographical search and related technologies, and we perform an analysis of 36 million queries of the recently released AOL query trace. First, we identify typical properties of geographic search (geo) queries based on a manual examination of several thousand queries. Based on these observations, we build a classifier that separates the trace into geo and non-geo queries. We then investigate the properties of geo queries in more detail, and relate them to web sites and users associated with such queries. We also propose a new taxonomy for geographic search queries.

Categories and Subject Descriptors
H.3.1 [Information Systems]: Content Analysis and Indexing - Indexing methods; H.3.3 [Information Systems]: Information Search and Retrieval - Search process

General Terms
Measurement, Human Factors

Keywords
web search, geographic search, local search, query log mining

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
LocWeb 2008, April 22, 2008, Beijing, China.
ACM 978-1-60558-160-6/08/04 ...$5.00.

1. INTRODUCTION
Over the last decade, search engines have become the primary means of locating information for many people. For this reason, researchers have started investigating available search query logs, in order to better understand what people are searching for, how they are searching, and how this process can be improved. A number of recent studies [30, 11, 29, 4, 25] have looked at query logs from various perspectives, including Computer Science, Library and Information Science, and Social Sciences. Our perspective is primarily from Computer Science, where researchers mine query logs and click-through behavior to optimize system performance or provide more accurate results.

While the Web has removed many geographical limitations in media, communications, and e-commerce, many geographical aspects of the physical world are nonetheless reflected in the Web’s content and structure. As a result, geography often provides a useful and intuitive constraint for Web search. This paper investigates geographic search queries, i.e., keyword queries that employ geographical terms in order to obtain search results related to a particular geographical location or area. Typical examples are “hotels new york”, “building codes in seattle”, “virgina historical sites”, or “unemployment long island”. Such queries frequently contain names of cities, states, or countries, often abbreviated, e.g., “CA”, “NYC”, or “SF”. Alternatively, they may contain street names, informal synonyms (e.g., “big apple”), or refer to landmarks and neighborhoods (e.g., “SoHo” in New York). In some cases, users include zip codes or phone numbers.

Because of geography’s important role in search requests, and the significant commercial potential of such queries (e.g., for hotels, real estate, or local businesses), search companies have recently invested significant resources into geographic (geo) search technologies (also called local search), i.e., methods aimed at giving improved answers to geographic search requests. Approaches range from integration of business directories (yellow pages) to answer fairly simple but lucrative queries (e.g., for hotels, shops, and restaurants), to a more detailed analysis of queries, page content, and site and link structure in order to facilitate more general queries. Geo search applications can use a standard keyword interface and extract geographic terms from queries, employ graphic interfaces such as interactive maps, or use the current location of a mobile user. In general, geo search engines combine knowledge regarding how people use geographic terms in queries, how such terms are used in pages, and how sites are organized and linked with respect to geography. They commonly also use external data sources, in particular gazetteers listing the names and locations of states, cities, or businesses. Geo search technology has recently been studied by a number of researchers, mainly focusing on the extraction of geographic information from page content and structure [22, 24, 2, 14, 20, 9], indexing and query processing [38, 7, 35, 21], and the automatic identification of geographic queries [10, 36].

Our main objective is to identify opportunities for improving geographic search engines. However, our observations should be of more general interest. We investigate real world queries of a large query log from a standard (non-geographic) search engine, namely 36 million queries from AOL. We study how people write geographic queries and how these should be processed by search engines. Our paper builds on work in [28] and [37] that analyzed geographic queries.

We are interested in what types of geographic queries (informational, navigational, transactional) users issue, what types of geographic terms they employ, and what they are looking for. We also study what sites users visited as a result of a geo query, how different geographic terms were used by the same user, and what non-geographic terms are associated with geographic terms.

The remainder of this paper is organized as follows. Section 2 provides a basic background and an overview of related work. Section 3 introduces the data set. Section 4 shows how geographic features can be used to classify queries into geo and non-geo queries. The next three sections investigate geographic properties of queries, users, and sites, respectively. The main focus lies on our taxonomy of geographic queries. Finally, we conclude in Section 8.

2. BACKGROUND AND RELATED WORK
There is significant literature on search engine logs, including studies of general search logs [30, 11, 29, 4, 25], and various papers focusing on special types of users and collections, e.g., multi-media search [12], intranet search [31], blog search [23], or search in other languages [19]. In particular, Kamvar and Baluja [15] studied the characteristics of mobile queries submitted to Google’s search services for
PDA and cellular phones. We note that while mobile and geographic (local) search are often thought of as being closely related technologies, they are certainly not the same. It can be argued that many mobile queries are in fact geographic in nature, and that for certain types of queries it may make sense to return results related to the current position of the user. Kamvar and Baluja [15] investigate various features of mobile queries, including query length and topics (but not geography), focusing on the user interface aspects of small screens and limited input capabilities. In contrast, we focus on queries issued by desktop and laptop users to a general search engine.

Search queries can be categorized according to several dimensions. Broder [5] first proposed three distinct categories of queries: (i) navigational, (ii) informational, and (iii) transactional. Of particular importance to our approach is the work by Rose and Levinson [27], who expanded Broder’s work into a more detailed taxonomy, also consisting of three categories but differentiated further into ten search goals:

A. Navigational: The user has a distinct Web site or page in mind that he knows or assumes to exist. Navigational queries often contain fragments of URLs or names of organizations. The user commonly clicks on only one result, taking him directly to the desired page.

B. Informational: These queries are similar to those traditionally studied in IR, i.e., the user wants information about a certain topic, either broad (e.g., “history us”) or narrow (e.g., “special nutrition for wound care”). Here, users often follow several of the resulting links.
  • Closed: queries seek a single, closed answer.
  • Open: queries seek open-ended answers or answers of unlimited depth.
  • Undirected: queries target anything or everything about a particular topic.
  • Advice: queries seek advice or instructions to complete a task.
  • Locate: queries attempt to detect where a real world good or service can be obtained.
  • List: queries search for lists of good pages on a topic, e.g., a Yahoo or ODP directory.

C. Resource: These queries target resources, not web documents.
  • Download: queries target a resource which must be downloaded to be useful.
  • Entertainment: queries search for pages which when viewed may provide entertainment.
  • Interact: queries look for pages which require further interaction, for instance map or weather services.
  • Obtain: queries seek documents which are useful on or off the computer, such as tax forms or government documents.

In [27, 17], researchers studied users’ navigational behavior (in particular, click-through behavior), since a user’s goal cannot always be inferred by just looking at a query. They find that over 60% of queries were informational, and a large fraction of the other nearly 40% seemed to seek a commercial transaction, rather than request product information. Distributions of search taxonomies are subject to changes in search technology and user behavior: somebody who a few years ago may have looked for the Web site of a company (navigational) for product information may now be willing and able to order the item directly from the site (locate). In this paper, we use the classification in [27], utilizing click-through data to identify the information need reflected by a query.

We are also interested in examining geo queries categorized according to the topic-based taxonomy of Spink et al. [30]. Here queries are assigned to one of eleven categories according to what topic most closely matches their intent. These categories are, in decreasing order according to the fraction of all queries in a general query log that fall into each category:
1. Entertainment
2. Pornography
3. Business, travel, employment
4. Computers
5. Science and medicine
6. People, places, things, odds and ends
7. Society and religion
8. Education, humanitarian interests
9. The arts
10. Government
11. Unknown and other

Even before the web, researchers studied how to exploit geographic information embedded in documents for better text search and analysis; see [16] for a good overview of early work. Initial work on geographic search on the web appears in [6, 9, 22], and in recent years a significant amount of research has addressed this new challenge.

Geographic queries were previously studied by Sanderson and Kohler [28] and by Zhang et al. [37]. The former provides a brief study of some of the properties of geographic queries, in particular frequency, topics, length, and spatial relationships. The latter study focuses on the issue of geo modification in consecutive queries, i.e., how users modify their choice of geographic terms when the previous query did not provide satisfactory results.

A user looking for a nearby yoga class might search for “yoga park slope” (a neighborhood in Brooklyn). When this search returns poor results, she might try “yoga new york” and be swamped by many irrelevant results. Finally, “yoga brooklyn” satisfies her information need. For a single search task, she had to rewrite the same query several times. One goal of geographic search technology is to avoid successive query modification through proper analysis of queries and collections. The automatic rewriting method in [37] provides one such approach (also related to the query expansion technique for geographic search in [8]). Our work here expands on [28] by providing a more in-depth analysis of the properties of geo queries. This paper also investigates the relationship between geography, page topic, and users, and is to our knowledge the first work in this direction.

Closely related to the analysis of geographic queries is the automatic detection of geo queries [10, 36, 37], and in general of geographic terms in text data [18, 2]. In particular, automatic detection is highly useful for measuring the statistical properties of geo queries in large logs. Such detection can be based either on individual queries, or can include past queries, past click-through behavior, or results returned by the engine. There have been many proposals on how to use knowledge mined from search query logs, such as click-through information, repeated identical or related queries by the same or different users, or co-occurrences of terms in queries, to deliver improved search results to users [3, 13, 33, 26, 34, 32, 1]. The study of geographic queries by the same or different users, or of click-through behavior on such queries, is also of interest in this context.

3. IDENTIFYING GEO QUERIES
This section lays the foundation for our study. We describe the underlying data, discuss basic geographic properties, and introduce a taxonomy of geographic queries. The relative frequency of geographical queries as well as their subtypes is evaluated on a manually geo-coded query set. Finally, we propose two classifiers to classify the entire query trace. These classifiers are highly accurate, as evaluated on the manually geo-coded samples. We then use these classifiers to aid in our subsequent statistical evaluation of the entire trace.

3.1 Underlying Data
We study a trace of the AOL search engine, recording queries of roughly 650,000 users over three months in early 2006. The trace consists of about 36 million lines of data, each containing five fields:
  • AnonID: an anonymous user-ID
  • Query: the actual query terms
  • QueryTime: when the query was issued
  • Item-Rank: the rank of the clicked result
  • ClickURL: the host-level result the user clicked on (if any)

In case the user clicked on multiple results to a single query, these events are recorded in the form of extra lines. For an in-depth description of the data, see [25].

Although real-life queries are often malformed and misspelled, the user’s intent is usually quite clear. For example, “www.footballcamps-atlanta.google” is clearly malformed, but it is apparent what the user was looking for. Similarly, “noweign cruise lines” is misspelled, but has a clear intention (Norwegian Cruise Line is a large cruise operator). When classifying queries by hand, we label according to the intent of the user, not according to any mistakes, when possible. This is done using the methodology of Rose and Levinson [27], utilizing click-through data for clarification when queries alone are insufficient for determining intent. The rationale is that query classification per se should be interested in a user’s intent, not her way of expressing this intent. Also, most advanced search engines recognize users’ mistakes and propose corrected versions of the query. Due to limited resources, we do not perform spell-checking when performing automatic classification on the entire query trace.

To detect geographic terms in queries, we use the US Census Bureau’s gazetteer, which contains names and locations of counties, their subdivisions (district, borough, barrio), places (town, city, village, etc.), and ZIP Codes for all 50 states.

3.2 Hand-Tagging Geo Queries
We begin by extracting an initial sample of 6000 random queries from the data set. After discarding all queries consisting exclusively of URLs and some badly misspelled or malformed queries, 4495 queries remain. These are examined manually, and assigned one of four labels, according to their geographic intent and their use of common geographic terms. Thus, for each query we decide if it has a geographic intent, and if it contains the name of a city, county, or state according to the gazetteer. Note that other geographic terms also appear frequently, such as street names or names of landmarks or places of interest (e.g., “statue of liberty” or “empire state building”). The four categories are: (i) Geographic queries that contain a city, county, or state name as a geographic term. (ii) Geographic queries that do not contain such terms. (iii) Non-geographic queries seemingly containing a geographic term, e.g., “whitney houston”. This category includes many entity names, such as “Kentucky Fried Chicken”, “New York Times” or “First Niagara Bank”. (iv) Non-geographic queries without geographic terms. The numerical results of this classification are presented in Table 3.1.

Types of Queries             Fraction of Queries
Geo with Geo terms           12.01%
Geo without Geo terms        0.93%
Non-Geo with Geo terms       24.44%
Non-Geo without Geo terms    62.62%

Table 3.1: Geo vs. non-geo queries.

Table 3.1 may give the impression that only 13% of the queries pursue a geographically focused task, but the real percentage should be somewhat higher. The AOL query trace is based on a standard search engine, with no explicit geo capabilities. Many users with a geographical search task in mind may only use such search engines to find a Web site that will allow them to restrict the geographic focus of their query in a second step. In our random sample, for example, we find about twenty-five requests for mapping services (e.g., mapquest.com). These users are most likely pursuing a geographic search task. Similarly, users searching for “craigslist” will have to specify a metropolitan area of interest as soon as they access www.craigslist.org. Many queries for retail chains, e.g., Radio Shack, Nordstrom, or Target, are likely geographic in nature as users often seek to locate a store using the company’s web site. We did not evaluate the number of such queries, as it would be difficult to guess if a user is interested in finding a local store or making an online purchase. In any case, 13% is probably an underestimate of the frequency of geographic search tasks.

In our experiments, we only consider geographic entities within the United States; thus, queries that refer to international locations or to the US as a whole are ignored. The rationale behind this decision is that any automatic query classifier needs to incorporate some understanding of the language issues, ambiguities and difficulties associated with the geographic query terms from a particular region. Such information is usually compiled for a single region or country at a time; for this reason, local search engines are commonly launched on a per-country basis. Since we are best able to manage these issues within the geographic and linguistic confines of the United States, we chose to focus our work on queries about that region.

After manual classification, we discovered 582 queries with geographic intent out of 4495 queries in the sample. We then looked at the query length (number of terms) of these queries; the results are shown in Table 3.2. Note that the columns titled “Non-Geo” and “Geo” indicate the distribution of non-geographic and geographic queries in terms of query length; thus, 14.48% of all geographic queries have 2 terms. The column titled “Geo of all” depicts the percentage of all queries with a given number of terms which have a geographic intent; thus, 18.78% of all queries with three terms are geographic queries.

Num. Query Terms    Non-Geo    Geo       Geo of all
1                   25.54%     1.03%     0.52%
2                   33.95%     14.48%    5.22%
3                   19.54%     35.04%    18.78%
4                   10.47%     26.21%    24.56%
5                   5.19%      17.93%    30.86%
> 5                 5.31%      5.31%     11.19%

Table 3.2: Number of terms in geo and non-geo queries.

This table confirms what was noticed in [28] and [37]: geo queries tend to have more terms than non-geo queries, and conversely the likelihood that a query is a geo query increases with the number of terms. However, one has to be very careful in interpreting these results. It should be expected that many classes of specialized queries, say geographic queries, people queries, or product queries, have more terms than average. If we imagine that each term in a query is chosen from some distribution, then the likelihood that a geo term (or people term, or product term) is present, and/or that a geographic or people or product intent is present, increases with the number of terms. Note also that classes such as geographic and health queries are not mutually exclusive, and that a longer query may be more likely to be in several classes. Thus, it is quite possible that most or even all such specialized classes of queries of interest have an above-average number of terms. Finally, a very short query is less likely to be recognized as a geographic query even if the underlying intent is geographic (e.g., a query “walmart” that tries to find the closest store on the company website). Related to this, [37] reports that 12.7% of query rewrites add a geo-specific term; thus, the original query probably had geographic intent. A good geographic search engine might use the user’s location and previous geographic queries to return likely results of interest without a rewrite by the user.

3.3 Taxonomies for Geo-Search Queries
Following Rose and Levinson [27], we classified about 500 geo queries and about 500 non-geo queries from our sample into eleven distinct categories according to the apparent goal of the user, as inferred from the query itself and the associated click-through data. The results, given in Figure 3.1, show significant differences between geo and non-geo queries. Geo queries are more frequently aimed at locating goods and services; non-geo queries are more likely aimed at entertainment, downloads, or lists of pages with further information.
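
The per-group normalization behind a chart such as Figure 3.1, where the bars of each group sum to 1.0, can be sketched as follows. This is a minimal Python illustration and not the authors' code: the function name `category_distribution` and the sample labels are hypothetical, and the sample does not reproduce the paper's data.

```python
from collections import Counter


def category_distribution(labels):
    """Map each group to {category: fraction}; fractions sum to 1.0 per group.

    `labels` is an iterable of (group, category) pairs, e.g. ("geo", "locate").
    """
    counts = {}
    for group, category in labels:
        counts.setdefault(group, Counter())[category] += 1
    return {
        group: {cat: n / sum(c.values()) for cat, n in c.items()}
        for group, c in counts.items()
    }


# Hypothetical hand labels, for illustration only (not taken from the paper).
labels = [
    ("geo", "locate"), ("geo", "locate"), ("geo", "navigational"), ("geo", "open"),
    ("non-geo", "entertainment"), ("non-geo", "download"),
    ("non-geo", "list"), ("non-geo", "navigational"),
]
dist = category_distribution(labels)
```

Normalizing within each group, rather than over all queries, makes the two groups comparable even when their sample sizes differ.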
Figure 3.1: Distribution of geo and non-geo queries according to the taxonomy of Rose-Levinson. Note that the bars in each color sum up to a total of 1.0.

Navigational queries of a geographic nature often point to regional sections of a nationwide corporation or service. We observe two typical cases: (1) Site-Wide. The geographic term is used to distinguish the desired Web site from other similar Web sites. For example, “DMV ny” targets www.nydmv.state.ny.us, while “DMV ca” targets www.dmv.ca.gov. Similarly, many different cities have bars or restaurants with identical names (e.g., Joe’s Pizza) that are not affiliated in any way. (2) Site-Internal. Here the non-geographic terms already determine the desired Web site, and the geographic term targets a particular page or item inside this site (e.g., “craigslist boston”).

The difference between “locate” queries in the context of geo vs. non-geo queries is pronounced. Most geo-query “locate” searches consist of the name of a particular store or a search for a service in an area, e.g., “florists phoenix” or “crobar nyc”, while a typical non-geographic counterpart may contain the name of a good to buy online, such as “ellsworth kelly prints”. Also, while there are many navigational queries among the geo queries, a majority of these are searches for local or state government agencies. Many “open” geographic queries are searches for local media, news, or people. Such topical differences are not conveyed by the taxonomy of Rose and Levinson.

Next, we turn to the topical classification scheme used by Spink et al. [30], which also consists of eleven categories, listed in Section 2. Labelling the same set of geo and non-geo queries, we get the results shown in Figure 3.2.

Figure 3.2: Distribution of geo and non-geo queries according to the topicality classification of Spink et al.

We again see some obvious differences in several categories. Categories 2 and 4 are exclusively non-geographic: there were no queries asking for local pornography or local information about computers. Category 6 is dominated by geographic queries. There are frequent requests for local news and events, local government services, and weather. On the other hand, many non-geo queries were about celebrities and national news. In category 5 (science and medicine), there were many queries for local medical services, but unsurprisingly very little local physics or other sciences. Category 8 shows that much information about schools and education is sought at the local level, for all levels of education. The same applies to category 10; there are frequent searches for branches of local government and official forms and information (e.g., about zoning laws and taxes). But like the taxonomy of Rose and Levinson, the taxonomy of Spink et al. also does not capture some important differences between geo and non-geo queries and users, which often lie within a category.

To address this, we propose a new query taxonomy for geographic queries that combines aspects of topicality and desired type of interaction. We came up with 23 categories as follows:
1. Tourism/Travel: hotels, maps, flights, transport, local attractions
2. Government: searches for government entities, info, and laws
3. Real Estate: houses, apartments, and commercial real estate
4. Education: requests for educational or school related information
5. Business: non-online business related searches, except when in another category
6. Night Life: including restaurants, entertainment, and casinos
7. Undirected: broad informational requests for a topic
8. Medical: hospitals, doctors, and general health and medical information
9. Media: news, radio, papers, magazines, and other media
10. Employment: searches seeking employment opportunities
11. Automotive: requests for automotive information and searches for automotive businesses
12. Civic: searches seeking civic, religious, and non-profit organizations
13. Closed: seeking an answer to a specific question
14. Obtain: seeking a specific document or resource that is useful on or off the computer
15. List: searches for a site which can provide further information. Seeking a hub rather than an authority
16. Advice: requests for advice or directions to complete a task
17. Downloads: requesting software or files to be downloaded to a user’s computer
18. Interactive: requesting pages which require further interaction in order to be useful
19. People: seeking individual people
20. Open: open ended questions or requests for information
21. e-Business: attempts to find an online retailer of a product or service
22. Entertainment: queries seeking to be entertained by the contents of a page. Including pornography and pictures
23. Navigational: requests clearly looking for a specific web site

We note here that this taxonomy is specifically designed to allow better understanding of geo queries, and in particular the first twelve classes capture common types of queries that we found in our trace. The distribution of geo and non-geo queries in this finer-grained, hybrid taxonomy is shown in Figure 3.3. As we see, geo queries focus on the first 13 categories, and are less frequent in the others (with the exception of category 20). While there is a significant number of commercial geo queries for hotels, restaurants, cafes, real estate, and local businesses, one interesting observation was the large number of local queries about government, civic organizations, education, and media that may not be well served by the current generation of geo search technology, which is heavily focused on the former cases.

Figure 3.3: Distribution of geo and non-geo queries according to our hybrid classification
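
The gazetteer lookup of Section 3.1 and the four labels of Section 3.2 can be sketched together as follows. This is a minimal Python illustration under stated assumptions: the tiny `GAZETTEER` set stands in for the US Census Bureau data, the longest-match scan and the names `find_geo_terms` and `label` are hypothetical, and geographic intent is passed in as a hand-assigned flag because the paper determines it manually.

```python
# Toy stand-in for the gazetteer (assumption); a real one would be loaded
# from the US Census Bureau files and contain far more names.
GAZETTEER = {"new york", "seattle", "houston", "brooklyn", "kentucky", "ny", "ca"}
MAX_NAME_WORDS = 3  # assumed upper bound on the number of words in a place name


def find_geo_terms(query):
    """Return gazetteer names found in the query, preferring the longest match."""
    words = query.lower().split()
    found = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_NAME_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in GAZETTEER:
                found.append(candidate)
                i += n  # skip past the matched name
                break
        else:
            i += 1  # no name starts at this word
    return found


def label(query, has_geo_intent):
    """Assign one of the four Table 3.1 labels; intent is judged by hand."""
    has_term = bool(find_geo_terms(query))
    if has_geo_intent:
        return "Geo with Geo terms" if has_term else "Geo without Geo terms"
    return "Non-Geo with Geo terms" if has_term else "Non-Geo without Geo terms"
```

For instance, `label("whitney houston", False)` falls into the "Non-Geo with Geo terms" bucket because "houston" matches a place name, which is exactly the kind of ambiguity a term lookup alone cannot resolve.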
4. QUERY CLASSIFICATION
The sample data set used in the previous section is of insufficient size for many tasks. For example, making statements about frequently appearing terms in geographic queries requires more information than our sample set allows. Categorizing the entire AOL trace by hand is, however, not feasible. Instead, we use the manually labeled sample to bootstrap two classifiers. The first differentiates geographical queries from those without geographic intent, while the second classifies geographic queries roughly according to informational versus navigational queries. As our experiments show, both classifiers are sufficiently accurate, and thus they are subsequently used to classify all 36 million queries.

The biggest challenge in geographic query classification comes from ambiguous geographic terms. It is obvious to readers of the yellow press that queries such as “Paris Hilton” do not commonly refer to hotels in the capital of France. Similarly, “Cadillac” commonly targets automobiles, not a city in Michigan. In order to disambiguate queries containing these terms, we have to inspect their other terms. Abbreviations of state names such as “CA” often indicate a geographic meaning. This rule of thumb, however, does not apply to certain states like “MD”, “LA”, or “OR”. Many such cases are hard to classify, even for humans.

4.1 Geo Non-Geo Classification
This first classifier detects geographical queries in two stages. First, a simple filter removes all queries without any geographic terms. In other words, queries with no locality terms are classified as non-geo queries; as shown earlier, this misclassifies about 1% of all queries, namely those that are geographic but have no city, county, or state name. After applying this filter, we are left with queries falling into categories “geo with geo terms” and “non-geo with geo terms”. These are then classified according to the following features:

Property & Tourism: Does the query contain terms about properties or hotels?²
State: Does the query contain a state name, or its abbreviation?
State-Pos: The position of the state name from the end of the query; e.g., 0 if it is the rightmost term in the query. We notice that when a state name is included in a query, the state name often
Place-Person: If a city, county or state name is present, could this term also be a person’s first or last name? First and last names were obtained from the US Census Bureau.
Name-Place: If a city, county or state name is present, does this term appear prior to a last name or after a first name?

As shown in Table 3.1, there are actually more non-geographic queries containing geo terms than there are geographic queries. In order to produce a good classifier, we used training data consisting of 50% geographic queries with geographic terms and 50% non-geographic queries with geographic terms. In total, the training set consisted of around 1,200 queries.

Utilizing the popular machine learning software Weka³, we evaluate our decision-tree based classifier using ten-fold cross validation. About 90.69% of all queries were correctly classified; see Table 4.1 for the results. Note that this accuracy is measured on the already filtered data, i.e., the classifier differentiates between geo and non-geo queries that both contain geographic terms. If used on all queries, its accuracy would be higher. Our classifier compares favorably to that of [10] in terms of accuracy. After applying the classifier to the entire AOL log, around 13.39% of all queries are identified as having geographic intent.

Class      Precision    Recall    F-Measure
Non-Geo    0.911        0.899     0.905
Geo        0.903        0.915     0.909

Table 4.1: Accuracy of the Geo-NonGeo Classifier

4.2 Informational vs. Navigational Queries
It is not feasible to automatically classify geographic queries according to any of the fine-grained taxonomies illustrated in Section 3.3. From a user’s point of view there is a clear distinction between navigational and resource queries. A user wants to either find a website, or find a resource, e.g., buy something. However, the resulting queries often look similar, and can even be identical. Assume a user investigating the latest sportswear. She might search for “adidas”, a navigational query to learn about available models. But a user intending to buy shoes online might also enter “adidas” and then proceed to the online store. This query now targets a resource; the query is the same, but
appears at the end of the query. the user’s intention is very different. Thus, it is clearly not possible to
Ambiguous State-Abbreviation Does one of the following state ab- infer user intent from queries alone, even for a human classifier. How-
breviations appear as the only locality information in the query: ever, we can resort to a cruder taxonomy which is still meaningful and
“OH”, “OR”, “MD”, “AS”(American Samoa) ? These abbrevi- that allows for automatic classification. We hence limit ourselves to
ations are often used in a non-geographic sense. two simple categories, navigational and informational. The first con-
City Does the query contain a city name? tains all queries that are navigational according to the definition of
County Does the query contain a county name? Rose and Levinson, or that request a download. The second category
contains all other queries.
County-follow If the answer is true for the previous questions, is the
This classifier differs from the previous in that it does not look at the
county name followed by word “county”, “village”, “co”, “bor-
query terms, but instead looks at users’ click-through data. The under-
ough” etc? People searching for a county or city often append
lying assumption is that for a navigational query, a user only clicks on
such indicative terms.
a single result, as suggested in [17]. For an informational query, she
State-follow If a city or county term appears in a query, does the term may instead follow several links. This hypothesis is captured by the
occur next or prior to a state name? The city or county must be following two features used by our classifier:4
inside that particular state.
Place-Size If a city or county term is found, how large is its popula- Avg. number of clicks per query This feature represents how many
tion? If it is a very popular city or county in US, it is most likely results a user clicks on after issuing a query. This number is
that the query searches for that city/county. On the other hand, averaged over all users who issued a particular query.
a small city is the target of few search queries. Click distribution This feature is based on the intuition that most
Geo-Web-Freq If a city, county or state name is present, what is the clicks resulting from a navigational query focus on a few popu-
frequency of this term in general Web documents? lar URLs. The click distribution of a query is defined according
to the number of clicked times for each different URL associ-
5
Geo-Query-Freq If a city, county or state name is present, what is
the frequency of this term in general search queries? ated with the same query. We look at measures of distribution:
2
average, mean, standard deviation, skew, and kurtosis.
In particular: apartment, balcony, bath, bathroom, bed and break-
fast, bedroom, building, condo, condominium, duplex, estate, flats, Additionally, we investigate:
garage, home, hotel, house, inn, kitchen, lawn, lease, lodge, lodging, 3
map, motel, property, real estate, realestate(sic.), rental, renting, sub- http://www.cs.waikato.ac.nz/ml/weka/
4
let, view, villa, waterfront, and their plural forms, e.g., apartments. For a detailed explanation of both features, see [17].
Geo-URL: Does the clicked URL contain the name of a city, county or state?

The resulting classifier is reasonably accurate. Given a training set of around 400 hand-labeled queries distributed evenly between informational and navigational, the classifier achieves an accuracy of 87.94%. Note that we only select queries with more than 10 clicks to evaluate our classifier. If a user issued an identical query several times and every time followed the same result, then we counted only a single click. Table 4.2 shows the accuracy numbers for this classifier.

Class          Precision  Recall  F-Measure
Informational  0.85       0.951   0.898
Navigational   0.928      0.789   0.853

Table 4.2: Accuracy of Info-Navi Classifier

Query Granularity  Top-5 terms
city level         “hotel”, “beach”, “city”, “news”, “auto”
county level       “county”, “real estate”, “house”, “property”, “home”
state level        “jobs”, “lottery”, “sale”, “park”, “department”

Table 5.2: Top-5 query terms
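As a rough illustration of the two click-based features described in Section 4.2, the sketch below computes, for each query, the average number of clicks per user and summary statistics of the per-URL click counts. The log schema (tuples of `user_id`, `query`, `clicked_url`), the function name `click_features`, and the use of population moments for skew and kurtosis are assumptions made for this example; they are not taken from the paper or from [17]. Users who issued a query without clicking anything are not represented in this toy schema.

```python
import math
from collections import Counter, defaultdict


def click_features(log):
    """Compute per-query click features from (user_id, query, clicked_url) tuples.

    The tuple schema is hypothetical. Repeated identical clicks by the same
    user on the same query are counted once, as described in the text.
    """
    unique_clicks = set(log)  # drop duplicate (user, query, url) triples

    users = defaultdict(set)           # query -> users who clicked
    url_counts = defaultdict(Counter)  # query -> clicks per distinct URL
    for user, query, url in unique_clicks:
        users[query].add(user)
        url_counts[query][url] += 1

    features = {}
    for query, counts in url_counts.items():
        values = list(counts.values())
        n = len(values)
        total = sum(values)
        mean = total / n
        # Population (biased) moments; an assumption of this sketch.
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        if std > 0:
            skew = sum((v - mean) ** 3 for v in values) / n / std ** 3
            kurt = sum((v - mean) ** 4 for v in values) / n / std ** 4
        else:
            skew = kurt = 0.0  # all URLs clicked equally often
        features[query] = {
            "avg_clicks_per_user": total / len(users[query]),
            "mean": mean,
            "std": std,
            "skew": skew,
            "kurtosis": kurt,
        }
    return features


if __name__ == "__main__":
    toy_log = [
        ("u1", "adidas", "adidas.com"),
        ("u2", "adidas", "adidas.com"),
        ("u1", "hotels nyc", "a.example"),
        ("u1", "hotels nyc", "b.example"),
        ("u2", "hotels nyc", "c.example"),
    ]
    for query, feats in click_features(toy_log).items():
        print(query, feats)
```

Under the paper's hypothesis, a navigational query such as “adidas” would tend to show close to one click per user concentrated on a single URL, while an informational query spreads clicks over several URLs. How these values feed into the actual classifier is not specified here.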
5. GEOGRAPHIC QUERY PROPERTIES
There are important differences between geo and non-geo queries; users look for different “things” when searching locally than globally. The classifiers presented in the previous section facilitate the study of properties of geo queries on a large scale. First, we classify the entire AOL trace into geo and non-geo queries. Then, we analyze term frequencies for both types of queries. Finally, we explore the distribution of geographic and non-geographic queries in different topical categories as well as their geographic distribution.

Term        Likelihood to appear in a geographic query
estate      81.61%
shores      81.59%
cemeteries  81.05%
appraiser   80.98%
lodging     80.79%

Table 5.3: Terms most likely to appear in geographic queries
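Section 5.3 defines the likelihood reported in Table 5.3 as the number of times a term appears in geographic queries divided by the number of times it appears in the whole query log, restricted to terms that appear more than 1000 times. A minimal sketch of that ratio follows. The input format (pairs of query text and a geo/non-geo label), the whitespace tokenization, and the names `geo_term_likelihood`, `min_count`, and `excluded` are assumptions of this example; in particular, multi-word terms such as “real estate” would need a different tokenizer.

```python
from collections import Counter


def geo_term_likelihood(queries, min_count=1000, excluded=frozenset()):
    """Return {term: fraction of its occurrences that fall in geo queries}.

    queries   -- iterable of (query_text, is_geo) pairs (hypothetical schema)
    min_count -- keep only terms appearing MORE than this many times overall
    excluded  -- terms to skip, e.g. place names and stop words
    """
    total = Counter()  # occurrences in the whole log
    geo = Counter()    # occurrences in queries labeled geographic
    for text, is_geo in queries:
        for term in text.lower().split():  # naive whitespace tokenization
            if term in excluded:
                continue
            total[term] += 1
            if is_geo:
                geo[term] += 1
    return {term: geo[term] / n for term, n in total.items() if n > min_count}


if __name__ == "__main__":
    toy = [
        ("real estate miami", True),
        ("estate sale", True),
        ("estate tax", False),
        ("free pictures", False),
        ("free estate", True),
    ]
    # A tiny threshold is used only so that the toy data produces output.
    print(geo_term_likelihood(toy, min_count=1, excluded={"miami"}))
```

The frequency threshold matters because a term seen only a handful of times can reach a ratio near 100% by chance, which is the noise the text refers to.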
5.1 Frequent Terms
Table 5.1 outlines the five most frequent terms for geographic and non-geographic queries, taken from the results of our automatic classifier. Note that no geo terms (city, county, or state names) or stop words are counted; this applies to all remaining sections. Unsurprisingly, the most frequent terms in non-geographic queries are unrelated to geography, while other terms are more likely to appear in geo queries than in non-geo ones.

Query Type          Top-5 terms
non-geographic      “free”, “google”, “new”, “yahoo”, “pictures”
general geographic  “hotel(s)”, “sale”, “real estate”, “beach”, “home(s)”

Table 5.1: Top-5 query terms

5.2 Frequent Terms at Varying Granularity
Do geographic queries at different granularity (e.g., county vs. city) address different information needs? This is indeed the case, as shown in Table 5.2, which outlines the most frequent terms at different granularities. (We note here that county vs. city is not just a different granularity, but also often an indication of more rural or suburban versus urban environments, complicating the picture a bit. City residents are often more likely to refer to their location by city name rather than the county the city is located in, which may have little relevance to them.)

5.3 Indicative Terms
Some terms, although not geographic themselves, are more likely to appear in geo queries than in non-geo queries, and vice versa. Table 5.3 displays the five terms that are most likely to appear in geo queries. This likelihood is computed as the number of times a term appears in geographic queries divided by the number of times the term appears in the general query log. This could be used to further improve the performance of our classifier. For example, the term “estate” is much more likely to appear in a geo query. Here, we only take into account query terms which appear more than 1000 times in the whole query log, reducing noise induced by infrequent terms.

5.4 Geo Queries and Topical Categories
In Section 3, we showed that geo and non-geo queries focus on different search topics. To explore this notion in the larger dataset, we relate our queries to web sites covered by the Open Directory Project (ODP). Thus, we assume that a query falls into some category iff the clicked URL (i.e., website, since click-through data is provided on a site level only) associated with this query is covered under that category. We limit ourselves to the ODP top-level categories. For each category, Figure 5.1 shows the number of geo and non-geo queries.

Figure 5.1: Query distribution over different topics

Note that we filter out duplicate query/click pairs from the same user. A small portion of sites are covered by more than one category. Of course, categories are not entirely exclusive. In particular, many sites (e.g., a local football club) are commonly classified by location (“regional”) as well as topic (“sports”). Obviously, the “regional” category applies to a larger number of geographic queries. In order to compare geo and non-geo queries in terms of their distribution over topics, we removed the regional category and plotted the results again, shown in Figure 5.2. We can see that geographical queries clearly tend towards a few categories in ODP, such as Society and Sports. This also includes a large number of clicks on pages of religious, civic, and governmental sites.

5.5 Geo Query Distribution over US States
This section investigates how geo queries are distributed among different states in the US. A geographic query includes at least one location term, i.e., a city, county, or state name. We assign a state to
each query according to this term. In the case that only a city name is found and is associated with more than one state, we associate this query with the city having the largest population. For example, there are more than five places named “Brooklyn” in the US, but we assign “New York” as the state for any such query.

Figure 5.2: Query distribution, without “regional”

In our experiments, we look at the popularity of different states in geographic queries. The five most popular states are: Florida, California, Texas, New York, and Ohio. Combined, queries about those five states account for 36.72% of all geographic queries in our data set. This is not surprising as these are also very populous states. Also, people show different interests for different states. For example, “Kids and teens” is the most popular topic in both Florida and New York, while the same topic is the least popular one in other states (possibly due to the importance of tourism for these states). Detailed results on this experiment are omitted for space reasons.

6. GEO PROPERTIES OF WEB SITES

6.1 Geo vs Non-Geo Sites
In the previous section, we investigated geo queries. In this section, we extend our study to sites that are commonly associated with such queries. In particular, we look at what sites are mostly visited by clicking through on geo queries, and how such sites are distributed over topics and associated with geo terms. Figure 6.1 divides all sites receiving more than 10 clicks into ten bins. Bins are assigned according to the fraction of the queries leading to a site that were geo queries. Thus, the first column on the left represents sites visited exclusively from geo queries, while the rightmost column represents sites visited only from non-geo queries. We can see that there is a strong bimodal behavior; many sites are either mostly geo or mostly non-geo in nature when characterized by the queries used to visit them. There is also a reasonable number of sites, shown in columns 2 to 4, that have mostly non-geo queries but also some geo queries; such sites may have a limited amount of geographic information on their site, such as a store location or company address.

Figure 6.1: Distribution of sites according to the queries that are used to find them (x-axis: % of geo queries in each site; y-axis: % of total sites)

Based on this, we define a geo site as a site where more than 80% of its associated queries are geo queries. Those sites where more than 80% of the associated queries are non-geographic in nature, we call non-geo sites. Next, we look at the differences between geo and non-geo sites.

6.2 Geo Sites and Top-Level Domains
In Figure 6.2, we look at how geo and non-geo sites are distributed among different top-level domains. We see that .gov and .org sites are more often visited via geo queries, as such sites are more often associated with local government and civil organizations.

Figure 6.2: Distribution of geo/non-geo queries for different top level domains

6.3 Geo Sites and Topical Categories
Now we investigate the topical distribution of geo and non-geo sites, again using the ODP hierarchy. Confirming our previous findings, we see that geo sites are more likely to be associated with the regional category. In fact, the vast majority of geo sites that were found in ODP were in the regional category. This indicates that our way of defining a geo site could in fact be used to identify good candidates for the regional category. More detailed results are again omitted due to space constraints.

Figure 6.3: Distribution of sites in different categories

6.4 Local vs National sites
Some sites seem to appear only in results for queries regarding a particular area (say, “www.brooklynyoga.com” for Brooklyn), while other sites are associated with geographic query terms from around the country. Examples of such sites include “www.realtor.com” and “travel.yahoo.com”. This tells us that some sites have a broad geographic relevance while others provide a service only to a particular area. In additional experiments, omitted for space reasons, we studied the properties of such local versus nationwide sites. In summary, as shown in this section, geo queries can be used to mine interesting facts about the sites that are visited via those queries.

7. GEOGRAPHIC USER PROPERTIES
This section studies user behavior in connection with geographic search tasks. Due to space constraints, we can only summarize some of our observations. We focused on users with at least 200 geographic queries, and then manually examined the users’ searching behavior, looking at the following questions:

Do users repeatedly conduct searches on the same geographic area? The answer is yes. Indeed, one could probably easily infer the hometowns of many of these users from the geo terms in their queries, as
users exhibit a tendency to conduct searches for local services. The non-geo terms associated with a user’s geo terms also reveal much of a user’s relationship with an area. Thus, if terms such as “school”, “yoga” or “real estate” tend to appear with geo terms, we have reason to believe that the user lives nearby. On the other hand, terms like “hotel” or “vacation” might indicate the user lives somewhere else.

Do people in a single session of querying reformulate their queries, trying different names for the same area? That is, how frequent is geo modification, as discussed in Section 2? Not too often. There are different ways to define search sessions. Manually checking the search history, we can identify instances when a person changes the topic of a search, and thus define a user search session as a series of queries on a similar topic over a continuous block of time. This period can vary from several minutes to several days, as long as a user stays focused on a topic. When people search for local information or services, they are often fairly confident about the appropriate geo terms. Thus, when users modify their queries, they more often modify the non-geo terms. Users occasionally change the geographic constraint present in the query while maintaining the non-geographic portion of the information request. We found that in most of these cases, the user is querying about a location away from their likely home. The geographic terms are sometimes adjusted to point to different parts of a city, since in some cases a tourist or traveler may be flexible about where to go for a temporary stay. We note that the state names show very strong consistency across a user’s search session.

How are user queries clustered locally? For a particular user, one can derive their main geographical focus as the state or area addressed by most of the geo queries of this user. This is likely the place of residence of the user. Similarly, one can define secondary and further clusters, potentially recent travel destinations of this user.

8. CONCLUSION
In this paper, we investigated geographic properties of search queries. Though our main objective was to derive new techniques for geographic search engines, we believe our observations are of general interest. Our main contributions here are a more detailed study of geographic search queries, a new taxonomy for such queries, and experiments that relate such queries to the sites that are visited and the users that pose them. We believe that with improved understanding of users’ query goals and websites’ informational content, search engines can take measures to improve response relevance. Due to space constraints, we had to omit many details of our results.

There are many intriguing open questions left by our work. In particular, we would like to explore additional properties of the web sites associated with geographic queries, and of geographic search sessions, and study how user behavior on geo queries (particularly click-through data) can be harvested for better geographic search.

9. REFERENCES
[1] E. Agichtein and Z. Zheng. Identifying “best bet” Web search results by mining past user behavior. In Proc. of the 12th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), pages 902–908, 2006.
[2] E. Amitay, N. Har’El, R. Sivan, and A. Soffer. Web-a-where: geotagging web content. In Proc. of the 27th Ann. Int. ACM SIGIR Conference on Research and Development in Information Retrieval, pages 273–280, 2004.
[3] D. Beeferman and A. Berger. Agglomerative clustering of a search engine query log. In Proc. of the Sixth ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), pages 407–416, 2000.
[4] S. M. Beitzel, E. C. Jensen, A. Chowdhury, D. Grossman, and O. Frieder. Hourly analysis of a very large topically categorized Web query log. In Proc. of the 27th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 321–328, 2004.
[5] A. Broder. A taxonomy of Web search. SIGIR Forum, 36(2):3–10, 2002.
[6] O. Buyukkokten, J. Cho, H. Garcia-Molina, L. Gravano, and N. Shivakumar. Exploiting Geographical Location Information of Web Pages. In 2nd Int. Workshop on the Web and Databases (WebDB), pages 91–96, 1999.
[7] Y. Chen, T. Suel, and A. Markowetz. Efficient query processing in geographic web search engines. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 277–288, 2006.
[8] T. M. Delboni, K. A. V. Borges, and A. H. F. Laender. Geographic Web search based on positioning expressions. In Proc. of the Workshop on Geographic Information Retrieval, pages 61–64, 2005.
[9] J. Ding, L. Gravano, and N. Shivakumar. Computing geographical scopes of web resources. In Proc. of the 26th Int. Conf. on Very Large Data Bases (VLDB), pages 545–556, 2000.
[10] L. Gravano, V. Hatzivassiloglou, and R. Lichtenstein. Categorizing Web queries according to geographical locality. In Proc. of the 12th Int. Conf. on Information and Knowledge Management, pages 325–333, 2003.
[11] B. J. Jansen and U. Pooch. A review of Web searching studies and a framework for future research. J. of the American Society for Information Science and Technology, 52(3):235–246, 2001.
[12] B. J. Jansen, A. Spink, and J. Pedersen. An analysis of multimedia searching on AltaVista. In Proc. of the 5th ACM SIGMM Int. Workshop on Multimedia Information Retrieval (MIR), pages 186–192, 2003.
[13] T. Joachims. Optimizing search engines using clickthrough data. In Proc. of the eighth ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), pages 133–142, 2002.
[14] C. B. Jones, A. I. Abdelmoty, D. Finch, G. Fu, and S. Vaid. The SPIRIT spatial search engine: Architecture, ontologies and spatial indexing. In Proc. of the 3rd Int. Conf. on Geographic Information Science, pages 125–139, 2004.
[15] M. Kamvar and S. Baluja. A large scale study of wireless search behavior: Google mobile search. In Proc. of the SIGCHI conference on Human Factors in Computing Systems, pages 701–709, 2006.
[16] R. R. Larson. Geographic information retrieval and spatial browsing. In L. Smith and M. Gluck, editors, GIS and Libraries: Patrons, Maps and Spatial Information, pages 81–124, 1996.
[17] U. Lee, Z. Liu, and J. Cho. Automatic identification of user goals in Web search. In Proc. of the 14th Int. Conf. on the World Wide Web, pages 391–400, 2005.
[18] J. L. Leidner. Toponym resolution in text: Which sheffield is it? In Proc. of the 27th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 602–602, 2004.
[19] D. Lewandowski. Query types and search topics of German Web search engine users. Information Services and Use, 26:261–1269, 2006.
[20] A. Markowetz, Y.-Y. Chen, T. Suel, X. Long, and B. Seeger. Design and implementation of a geographic search engine. In 8th Int. Workshop on the Web and Databases (WebDB), 2005.
[21] B. Martins, M. Silva, and L. Andrade. Indexing and ranking in GeoIR systems. In Proc. of the 2. Int. Workshop on Geo-IR, 2005.
[22] K. McCurley. Geospatial mapping and navigation of the web. In Proc. of the 10th Int. Conf. on the World Wide Web, pages 221–229, 2001.
[23] G. Mishne and M. de Rijke. A study of blog search. In Proc. of the European Conf. on Information Retrieval, pages 289–301, 2006.
[24] Y. Morimoto, M. Aono, M. Houle, and K. McCurley. Extracting spatial knowledge from the web. In Proc. of the Symp. on Applications and the Internet, pages 326–333, 2003.
[25] G. Pass, A. Chowdhury, and C. Torgeson. A picture of search. In Proc. of the 1st Int. Conf. on Scalable Information Systems, 2006.
[26] F. Radlinski and T. Joachims. Query chains: Learning to rank from implicit feedback. In Proc. of the Eleventh ACM SIGKDD Int. Conf. on Knowledge Discovery in Data Mining, pages 239–248, 2005.
[27] D. E. Rose and D. Levinson. Understanding user goals in Web search. In Proc. of the 13th Int. Conf. on the World Wide Web, pages 13–19, 2004.
[28] T. Sanderson and J. Kohler. Analyzing geographic queries. In Proc. of the Workshop on Geographic Information Retrieval, 2005.
[29] C. Silverstein, H. Marais, M. Henzinger, and M. Moricz. Analysis of a very large Web search engine query log. SIGIR Forum, 33(1):6–12, 1999.
[30] A. Spink, D. Wolfram, M. B. J. Jansen, and T. Saracevic. Searching the Web: the public and their queries. J. of the American Society for Information Science and Technology, 52(3):226–234, 2001.
[31] D. Stenmark. One week with a corporate search engine: A time-based analysis of intranet information seeking. In Proc. of the Americas’ Conf. on Information Systems, 2005.
[32] B. Tan, X. Shen, and C. Zhai. Mining long-term search history to improve search accuracy. In Proc. of the 12th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), pages 718–723, 2006.
[33] Q. Tan, X. Chai, W. Ng, and D. Lee. Applying co-training to clickthrough data for search engine adaptation. In Proc. of the 9th Int. Conf. on Database Systems for Advanced Applications (DASFAA), 2004.
[34] J. Teevan, E. Adar, R. Jones, and M. Potts. History repeats itself: Repeat queries in yahoo’s logs. In Proc. of the 29th Annual International ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 703–704, 2006.
[35] S. Vaid, C. B. Jones, H. Joho, and M. Sanderson. Spatio-textual indexing for geographical search on the web. In Proc. of 9th Int. Symp. on Spatial and Temporal Databases (SSTD), 2005.
[36] L. Wang, C. Wang, X. Xie, J. Forman, Y. Lu, W.-Y. Ma, and Y. Li. Detecting dominant locations from search queries. In Proc. of the 28th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, 2005.
[37] V. Zhang, B. Rey, E. Stipp, and R. Jones. Geomodification in query rewriting. In Proc. of the Workshop on Geographic Information Retrieval, 2006.
[38] Y. Zhou, X. Xie, C. Wang, Y. Gong, and W. Ma. Hybrid index structures for location-based web search. In Proc. of the 14th ACM Int. Conf. on Information and Knowledge Management, pages 155–162, 2005.

No ratings yet
SAM2CLIP2SAM: Vision Language Model For Segmentation of 3D CT Scans For Covid-19 Detection
16 pages
Government Information Quarterly: Judie Attard, Fabrizio Orlandi, Simon Scerri, Sören Auer
No ratings yet
Government Information Quarterly: Judie Attard, Fabrizio Orlandi, Simon Scerri, Sören Auer
20 pages
Real-Time Driver Drowsiness Detection System Using Eye Aspect Ratio and Eye Closure Ratio
No ratings yet
Real-Time Driver Drowsiness Detection System Using Eye Aspect Ratio and Eye Closure Ratio
7 pages
Robotic Assistant For Object Recognition Using Con
No ratings yet
Robotic Assistant For Object Recognition Using Con
13 pages
Abstract:: Keywords: Emotion Detection, Natural Language Processing, Adversarial Transfer Learning
No ratings yet
Abstract:: Keywords: Emotion Detection, Natural Language Processing, Adversarial Transfer Learning
17 pages
A Classification and Regression Tree Algorithm For Heart Disease Modeling and Prediction
No ratings yet
A Classification and Regression Tree Algorithm For Heart Disease Modeling and Prediction
9 pages
ML Process in Azure Cloud
No ratings yet
ML Process in Azure Cloud
17 pages
Multi Spectral Thresholding and Region Based Segmentation in Digital Image Processing
No ratings yet
Multi Spectral Thresholding and Region Based Segmentation in Digital Image Processing
8 pages
Enhancing Precision in Medical Imaging A 3D CNN Approach For Fiducial Point Detection in MRI Data
No ratings yet
Enhancing Precision in Medical Imaging A 3D CNN Approach For Fiducial Point Detection in MRI Data
11 pages
Class 10 Ai Sample Paper - 4 - MS
No ratings yet
Class 10 Ai Sample Paper - 4 - MS
3 pages
1.exploring Unsupervised Machine Learning
No ratings yet
1.exploring Unsupervised Machine Learning
12 pages
Analisis Gases Disueltos - Ingles
No ratings yet
Analisis Gases Disueltos - Ingles
5 pages
Acosta-Escalante Et Al. - 2018 - Meta-Classifiers in Huntington's Disease Patients Classification, Using Iphone's Movement Sensors Place-2
No ratings yet
Acosta-Escalante Et Al. - 2018 - Meta-Classifiers in Huntington's Disease Patients Classification, Using Iphone's Movement Sensors Place-2
5 pages
Google Search Revealed: Mastering the Algorithm for Search Dominance
From Everand
Google Search Revealed: Mastering the Algorithm for Search Dominance
Azhar ul Haque Sario
No ratings yet
Introduction to Cognitive Science: Cognitive Processing of Visual Design Elements In Virtual Environments
From Everand
Introduction to Cognitive Science: Cognitive Processing of Visual Design Elements In Virtual Environments
Ben Posetti
No ratings yet
Search Algorithms and Systems: Definitive Reference for Developers and Engineers
From Everand
Search Algorithms and Systems: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Technical Mapping Solutions: Definitive Reference for Developers and Engineers
From Everand
Technical Mapping Solutions: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Procurement and Supply in Projects: Misunderstood and Under Researched
From Everand
Procurement and Supply in Projects: Misunderstood and Under Researched
Douglas Macbeth
5/5 (1)
Exploring ArcMap 10.5
From Everand
Exploring ArcMap 10.5
Prof. Sham Tickoo
No ratings yet
Knowledge Reasoning: Fundamentals and Applications
From Everand
Knowledge Reasoning: Fundamentals and Applications
Fouad Sabry
No ratings yet
Activity Recognition: Fundamentals and Applications
From Everand
Activity Recognition: Fundamentals and Applications
Fouad Sabry
No ratings yet
Image Retrieval: Unlocking the Power of Visual Data
From Everand
Image Retrieval: Unlocking the Power of Visual Data
Fouad Sabry
No ratings yet
Automatic Image Annotation: Enhancing Visual Understanding through Automated Tagging
From Everand
Automatic Image Annotation: Enhancing Visual Understanding through Automated Tagging
Fouad Sabry
No ratings yet
Image Retrieval: Fundamentals and Applications
From Everand
Image Retrieval: Fundamentals and Applications
Fouad Sabry
No ratings yet
Automatic Image Annotation: Fundamentals and Applications
From Everand
Automatic Image Annotation: Fundamentals and Applications
Fouad Sabry
No ratings yet


graphic intent, and if it contains the name of a city, county, or state

We note here that this taxonomy is specifically designed to allow

Query Type Top-5 terms

Table 5.1: Top-5 query terms

5.2 Frequent Terms at Varying Granularity

Figure 5.2: Query distribution, without “regional”

6. GEO PROPERTIES OF WEB SITES

7. GEOGRAPHIC USER PROPERTIES

search tasks. Due to space constraints, we can only summarize some
