Analysis of Geographic Queries in A Search Engine Log
from the data set. After discarding all queries consisting exclusively of URLs and some badly misspelled or malformed queries, 4495 queries remain. These are examined manually, and assigned one of four labels, according to their geographic intent and their use of common geographic terms. Thus, for each query we decide if it has a geo-

2      33.95%   14.48%    5.22%
3      19.54%   35.04%   18.78%
4      10.47%   26.21%   24.56%
5       5.19%   17.93%   30.86%
> 5     5.31%    5.31%   11.19%
Figure 6.1: Distribution of sites according to the queries that are used to find them (x-axis: % of geo queries in each site).

Based on this, we define a geo site as a site where more than 80% of its associated queries are geo queries. Those sites where more than 80% of

shown in this section, geo queries can be used to mine interesting facts about the sites that are visited via those queries.

of our observations. We focused on users with at least geographic queries, and then manually examined the users’ searching behavior, looking at the following questions:

Do users repeatedly conduct searches on the same geographic area? The answer is yes. Indeed, one could probably easily infer the hometowns of many of these users from the geo terms in their queries, as users exhibit a tendency to conduct searches for local services. The non-geo terms associated with a user’s geo terms also reveal much of a user’s relationship with an area. Thus, if terms such as “school”, “yoga” or “real estate” tend to appear with geo terms, we have reason to believe that the user lives nearby. On the other hand, terms like “hotel” or “vacation” might indicate the user lives somewhere else.
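The co-occurrence heuristic above can be illustrated with a small sketch. It is not the procedure used in this study, which relied on manual examination. The cue lists contain only the example terms quoted above, and the function name `relation_to_area` and the input format (pairs of an area name and the non-geo part of a query) are assumptions made for illustration.

```python
from collections import Counter

# Cue terms taken from the examples in the text; a usable list would be far larger.
RESIDENT_CUES = {"school", "yoga", "real estate"}
VISITOR_CUES = {"hotel", "vacation"}


def relation_to_area(queries):
    """Guess a user's relation to each area from (area, non_geo_text) pairs.

    Returns a dict mapping area -> "resident", "visitor", or "unknown".
    Cues are matched as plain substrings of the lowercased non-geo text.
    """
    votes = {}
    for area, non_geo_text in queries:
        text = non_geo_text.lower()
        counts = votes.setdefault(area, Counter())
        counts["resident"] += sum(cue in text for cue in RESIDENT_CUES)
        counts["visitor"] += sum(cue in text for cue in VISITOR_CUES)

    result = {}
    for area, counts in votes.items():
        if counts["resident"] > counts["visitor"]:
            result[area] = "resident"
        elif counts["visitor"] > counts["resident"]:
            result[area] = "visitor"
        else:
            result[area] = "unknown"  # no cues, or a tie
    return result


history = [
    ("brooklyn", "yoga classes"),
    ("brooklyn", "elementary school"),
    ("orlando", "cheap hotel"),
    ("denver", "weather"),
]
print(relation_to_area(history))
```

Substring matching is deliberately crude; it would also fire on unrelated words that happen to contain a cue, so a real classifier would need tokenization and a curated vocabulary.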

Do people in a single session of querying reformulate their queries, trying different names for the same area? That is, how frequent is geo modification, as discussed in Section 2? Not too often. There are different ways to define search sessions. By manually checking the search history, we can identify instances when a person changes the topic of a search, and thus define a user search session as a series of queries on a similar topic over a continuous block of time. This period can vary from several minutes to several days, as long as a user stays focused on a topic. When people search for local information or services, they are often fairly confident about the appropriate geo terms. Thus, when users modify their queries, they more often modify the non-geo terms. Users occasionally change the geographic constraint present in the query while maintaining the non-geographic portion of the information request. We found that in most of these cases, the user is querying about a location away from their likely home. The geographic terms are sometimes adjusted to point to different parts of a city, since in some cases a tourist or traveler may be flexible about where to go for a temporary stay. We note that the state names show very strong consistency across a user’s search session.
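To make the distinction between geo and non-geo modification concrete, the following sketch labels each consecutive pair of queries in a session by which part changed. It assumes that sessions have already been segmented and that each query has already been split into geo terms and non-geo terms; both steps were done manually in our analysis. The function names and the tuple representation are hypothetical.

```python
from collections import Counter


def reformulation_type(prev, curr):
    """Label one reformulation step between two (geo_terms, non_geo_terms) queries."""
    prev_geo, prev_topic = prev
    curr_geo, curr_topic = curr
    geo_changed = set(prev_geo) != set(curr_geo)
    topic_changed = set(prev_topic) != set(curr_topic)
    if geo_changed and topic_changed:
        return "both"
    if geo_changed:
        return "geo"
    if topic_changed:
        return "non-geo"
    return "repeat"


def count_reformulations(session):
    """Count reformulation types over consecutive query pairs in one session."""
    return Counter(
        reformulation_type(a, b) for a, b in zip(session, session[1:])
    )


session = [
    (("seattle",), ("pizza",)),
    (("seattle",), ("pizza", "delivery")),
    (("seattle",), ("thai", "restaurant")),
    (("bellevue",), ("thai", "restaurant")),
]
print(count_reformulations(session))
```

Comparing term sets ignores word order, so a reordered query counts as a repeat; whether that is desirable depends on how sessions are defined.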

How are user queries clustered locally? For a particular user, one can derive their main geographical focus as the state or area addressed by most of the geo queries of this user. This is likely the place of residence of the user. Similarly, one can define secondary and further clusters, potentially recent travel destinations of this user.
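A minimal sketch of this ranking, assuming each geo query of a user has already been resolved to a state or area label (the resolution step itself is not shown, and the function name `geographic_focus` is hypothetical):

```python
from collections import Counter


def geographic_focus(areas):
    """Return (main_focus, secondary_areas) for one user's geo queries.

    `areas` holds one state or area label per geo query. The main focus is
    the most frequent label; the remaining labels follow in descending
    frequency as candidate secondary clusters.
    """
    ranked = [area for area, _ in Counter(areas).most_common()]
    main = ranked[0] if ranked else None
    return main, ranked[1:]


main, secondary = geographic_focus(["NY", "NY", "NJ", "NY", "FL", "NJ"])
print(main, secondary)
```

Frequency alone does not distinguish a place of residence from a frequently researched destination; combining it with the non-geo terms of the same queries would be one way to separate the two.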

8. CONCLUSION

In this paper, we investigated geographic properties of search queries. Though our main objective was to derive new techniques for geographic search engines, we believe our observations are of general interest. Our main contributions here are a more detailed study of geographic search queries, a new taxonomy for such queries, and experiments that relate such queries to the sites that are visited and the users that pose them. We believe that with improved understanding of users’ query goals and websites’ informational content, search engines can take measures to improve response relevance. Due to space constraints, we had to omit many details of our results.

There are many intriguing open questions left by our work. In particular, we would like to explore additional properties of the web sites associated with geographic queries, and of geographic search sessions, and study how user behavior on geo queries (particularly clickthrough data) can be harvested for better geographic search.