Privacy in Search Log Publishing

The document proposes implementing the ZEALOUS algorithm to publish search logs in a privacy-preserving manner. It summarizes that existing methods like k-anonymity have vulnerabilities, while ZEALOUS provides stronger privacy guarantees and comparable utility. The paper describes ZEALOUS's query substitution, index caching, and frequent item generation modules. An experimental study shows ZEALOUS outperforms k-anonymity in privacy and utility when publishing frequently searched keywords and queries from search logs. Future work could focus on releasing information about infrequent search terms while maintaining privacy.

Uploaded by

riya_gade

TO MAINTAIN PRIVACY IN PUBLISHING SEARCH LOGS BY IMPLEMENTING ZEALOUS ALGORITHM
ABSTRACT:
Many search engines keep server databases that record the history of their
users' search queries. These search logs are gold mines for researchers and
statistical analysts. Many companies use search log information for vital
purposes, and care must be taken to avoid disclosing the sensitive parts of
users' search queries. In this project the ZEALOUS algorithm is implemented,
and an experimental study shows how it can be used to publish frequently
searched queries and keywords. Traditional methods such as k-anonymity,
which are vulnerable to attacks, are compared with ZEALOUS, which provides
stronger privacy guarantees. The project concludes with a larger
experimental study on real applications comparing ZEALOUS with previous
work, showing that ZEALOUS yields comparable utility while achieving
stronger privacy.
INTRODUCTION:
Search engines play a crucial role in navigating the vastness of the web.
Today's search engines mine information about their users. They store the
queries, clicks, IP addresses, and other information about their
interactions with users in what is called a search log. Because search logs
contain valuable information, they can be used in the development and
testing of new algorithms to improve search performance and quality.
Scientists and marketing companies all around the world would like to tap
this gold mine for their own research; search engine companies, however, do
not release search logs because they contain sensitive information about
their users.
Search engines such as Bing, Google, and Yahoo log interactions with their
users. When a user submits a query and clicks on one or more results, a new
entry is added to the search log. Without loss of generality, we assume that
a search log has the schema {USER-ID, QUERY, TIME, CLICKS}, where USER-ID
identifies a user, QUERY is a set of keywords, TIME is the timestamp, and
CLICKS is a list of URLs that the user clicked on.
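The schema above can be written down as a small record type. This is only an illustrative sketch: the class name `SearchLogEntry`, the helper `make_entry`, and the sample values are assumptions made for this example, not part of the project.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SearchLogEntry:
    """One row of a search log with schema {USER-ID, QUERY, TIME, CLICKS}."""
    user_id: str          # identifies a user
    query: frozenset      # the query as a set of keywords
    time: datetime        # timestamp of the query
    clicks: tuple         # URLs the user clicked on, in order


def make_entry(user_id, query_text, time, clicks):
    """Build an entry, normalising the query text into a keyword set."""
    keywords = frozenset(query_text.lower().split())
    return SearchLogEntry(user_id, keywords, time, tuple(clicks))


# Hypothetical sample entry
entry = make_entry(
    "u42",
    "cheap flights Paris",
    datetime(2010, 1, 5, 9, 30),
    ["example.com/flights"],
)
```

Treating QUERY as a set means that keyword order and repetition are ignored, which matches the "set of keywords" wording of the schema.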
ARCHITECTURE :
EXISTING SYSTEM:
We show that existing proposals to achieve anonymity in search logs, such as
k-anonymity, are insufficient in the light of attackers who can actively
influence the search log. Turning to the stronger guarantee of differential
privacy, however, we show that it is impossible to achieve good utility.
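To make the "attackers who can actively influence the search log" point concrete, the sketch below uses a simplified k-anonymity rule: publish a query only if at least k distinct users issued it. The rule, the function name `publish_k_anonymous`, and the sample data are assumptions chosen for illustration; real proposals differ in detail. An attacker who guesses a victim's query and submits it from k-1 fake accounts pushes it over the threshold, and its appearance in the release then confirms that a real user issued it.

```python
from collections import defaultdict


def publish_k_anonymous(log, k):
    """Return the queries issued by at least k distinct users (simplified rule)."""
    users_per_query = defaultdict(set)
    for user_id, query in log:
        users_per_query[query].add(user_id)
    return {q for q, users in users_per_query.items() if len(users) >= k}


K = 3
SENSITIVE = "rare disease clinic near elm street"  # hypothetical victim query

honest_log = [
    ("alice", "weather"), ("bob", "weather"), ("carol", "weather"),
    ("alice", SENSITIVE),
]
safe_release = publish_k_anonymous(honest_log, K)

# The attacker adds the guessed query from K-1 fake accounts.
sybil_log = honest_log + [(f"fake{i}", SENSITIVE) for i in range(K - 1)]
attacked_release = publish_k_anonymous(sybil_log, K)
```

In `safe_release` the sensitive query is suppressed. In `attacked_release` it is published even though only one genuine user issued it.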
Disadvantage:
Existing work on publishing frequent item sets often only tries to achieve
anonymity or makes strong assumptions about the background knowledge of an
attacker.
PROPOSED SYSTEM:
Although the main focus of this paper is search logs, our results apply to
other scenarios as well. For example, consider a retailer who collects
customer transactions, each consisting of a basket of products together
with their prices and a time-stamp. In this case ZEALOUS can be applied to
publish frequently purchased products or sets of products. This information
can also be used in a recommender system or in a market basket analysis to
decide on the goods and promotions in a store.

Advantage:
Our results show that ZEALOUS yields comparable utility to k-
anonymity while at the same time achieving much stronger privacy
guarantees.
MODULE IMPLEMENTATION:

This project has three modules. They are as follows:

1. Query Substitution.
2. Index Caching.
3. Item Set Generation and Ranking.
QUERY SUBSTITUTION :
Query substitutions are suggestions to rephrase a user query so that it
matches documents or advertisements that do not contain the actual keywords
of the query. Query substitutions can be applied in query refinement,
sponsored search, and spelling error correction. We use query substitution
as a representative application for search quality. First, the query is
partitioned into subsets of keywords, called phrases, based on their mutual
information. Next, for each phrase, candidate query substitutions are
determined based on the distribution of queries.
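The partitioning step can be sketched as follows. The project does not specify how mutual information is computed, so this example assumes pointwise mutual information (PMI) between adjacent keywords, estimated from query counts, and joins neighbours whose PMI exceeds a threshold. The function name `segment_query`, the threshold, and the toy query log are hypothetical.

```python
import math
from collections import Counter


def segment_query(query, query_log, threshold=0.0):
    """Split a query into phrases by joining adjacent keywords with high PMI."""
    word_counts = Counter()
    pair_counts = Counter()
    for q in query_log:
        words = q.split()
        word_counts.update(set(words))                    # queries containing the word
        pair_counts.update(set(zip(words, words[1:])))    # queries containing the adjacent pair
    n = len(query_log)

    def pmi(a, b):
        joint = pair_counts[(a, b)]
        if joint == 0:
            return float("-inf")                          # never seen together
        return math.log((joint * n) / (word_counts[a] * word_counts[b]))

    words = query.split()
    if not words:
        return []
    phrases = [[words[0]]]
    for prev, cur in zip(words, words[1:]):
        if pmi(prev, cur) > threshold:
            phrases[-1].append(cur)                       # extend the current phrase
        else:
            phrases.append([cur])                         # start a new phrase
    return [" ".join(p) for p in phrases]


toy_log = ["new york hotels", "new york pizza", "cheap hotels",
           "new york", "cheap flights"]
phrases = segment_query("new york hotels", toy_log)
```

On this toy log "new" and "york" always occur together and are kept as one phrase, while "hotels" also appears with other words and is split off.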
INDEX CACHING
We use index caching as a representative application for search
performance. The index caching application does not require high coverage
because of its storage restriction. However, high precision of the top-j
most frequent items is necessary to determine which of them to keep in
memory. On the other hand, in order to generate many query substitutions, a
larger number of distinct queries and query pairs is required. Thus the
relevant algorithm parameter should be set to a large value for index
caching and to a small value for query substitution. In our experiments we
fixed the memory size to 1 GB. Our inverted index stores, for each keyword,
the document posting list sorted by relevance, which allows retrieving the
documents in the order of their relevance.
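A minimal sketch of the two ideas in this module: an inverted index whose posting lists are sorted by relevance, and a cache that keeps the posting lists of the most frequent keywords within a memory budget. The budget is counted in postings rather than bytes, the greedy selection policy is an assumption, and all names and sample values are hypothetical.

```python
def build_inverted_index(doc_scores):
    """Map each keyword to its document ids, sorted by descending relevance."""
    return {
        kw: [doc for doc, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
        for kw, scores in doc_scores.items()
    }


def choose_cached_keywords(keyword_counts, index, budget_postings):
    """Greedily cache posting lists of the most frequent keywords that fit the budget."""
    cached, used = [], 0
    by_frequency = sorted(keyword_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for kw, _ in by_frequency:
        size = len(index.get(kw, []))
        if size and used + size <= budget_postings:
            cached.append(kw)
            used += size
    return cached


# Hypothetical relevance scores per keyword and document
doc_scores = {
    "hotel": {"d1": 0.9, "d2": 0.4, "d3": 0.7},
    "pizza": {"d2": 0.8, "d4": 0.6},
    "flight": {"d5": 0.5},
}
index = build_inverted_index(doc_scores)

# Hypothetical published (noisy) keyword counts
published_counts = {"hotel": 120.3, "pizza": 80.1, "flight": 95.7}
cached = choose_cached_keywords(published_counts, index, budget_postings=4)
```

Because only the top of the frequency ranking decides what is cached, errors in the counts of rare keywords have little effect here, which is why precision matters more than coverage for this application.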
ITEM SET GENERATION AND RANKING
All of our results (positive as well as negative) apply to the more general
problem of publishing frequent items, item sets, and consecutive item sets.
We rank the items obtained from the published log and compare these
rankings with the rankings produced by the original search log, which serve
as ground truth. The measure of query substitution quality does not only
compare the ranks of a substitution in the two rankings; it also penalizes
substitutions that are highly relevant according to the ground-truth
ranking [q0, ..., qj−1] but have a very low rank in the ranking
[q'0, ..., q'j−1] obtained from the published log.
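One standard measure with this property is normalized discounted cumulative gain (NDCG). Whether the project used exactly this measure is an assumption, as is the relevance scale chosen here (j minus the ground-truth rank). The sketch shows how a ranking that pushes a highly relevant item to the bottom scores lower than one that only swaps two low-ranked items.

```python
import math


def ndcg(truth_ranking, published_ranking):
    """NDCG of a published ranking against a ground-truth ranking of the same items."""
    j = len(truth_ranking)
    # Assumed relevance scale: the top ground-truth item gets j, the last gets 1.
    relevance = {item: j - rank for rank, item in enumerate(truth_ranking)}

    def dcg(ranking):
        return sum(
            relevance.get(item, 0) / math.log2(pos + 2)   # discount grows with position
            for pos, item in enumerate(ranking[:j])
        )

    ideal = dcg(truth_ranking)
    return dcg(published_ranking) / ideal if ideal > 0 else 0.0


truth = ["a", "b", "c", "d"]      # hypothetical ranking from the original log
mild = ["a", "b", "d", "c"]       # two low-ranked items swapped
severe = ["d", "b", "c", "a"]     # most relevant item pushed to the bottom

score_mild = ndcg(truth, mild)
score_severe = ndcg(truth, severe)
```

Items that appear in the published ranking but not in the ground truth contribute nothing, so a ranking filled with irrelevant items is penalized as well.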
CONCLUSION

The experimental results show the importance of maintaining privacy when
publishing search logs. Search logs need privacy protection because they
contain users' information. In our project we implemented both the
k-anonymity and the ZEALOUS algorithms.
Because k-anonymity does not provide strong privacy guarantees, we
implemented the ZEALOUS algorithm, which provides comparable utility and
stronger privacy for publishing search logs.
We analyzed the ZEALOUS algorithm for publishing frequent keywords, queries,
and clicks of a search log. Our project concludes with a comparison between
the k-anonymity and ZEALOUS algorithms.
FUTURE ENHANCEMENT

The project can be used in the development of algorithms that release
useful information about infrequent keywords, queries, and clicks in a
search log while preserving user privacy. Thus, searched user queries
cannot be matched to a particular user.
By
S. Sanghee Sneha
R. Soundarya Bhargavi
R. Rajyalakshmi Srinija
P. Mounika
