To Maintain Privacy in Publishing Search Logs by Implementing Zealous Algorithm
Advantage:
Our results show that ZEALOUS yields utility comparable to k-anonymity
while at the same time achieving much stronger privacy guarantees.
MODULE IMPLEMENTATIONS:
1. Query Substitution.
2. Index Caching.
3. Item Set Generation and Ranking.
QUERY SUBSTITUTION
Query substitutions are suggestions to rephrase a user query so that it
matches documents or advertisements that do not contain the actual keywords
of the query. They can be applied in query refinement, sponsored search,
and spelling error correction. We use query substitution as a
representative application for search quality. First, the query is
partitioned into subsets of keywords, called phrases, based on their mutual
information. Next, for each phrase, candidate query substitutions are
determined based on the distribution of queries.
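The phrase-partitioning step can be illustrated with a minimal sketch. It assumes that "mutual information" is estimated as pointwise mutual information (PMI) between adjacent keywords, computed from counts over a query log, and that a query is split wherever the PMI falls to or below a threshold. The function names, the threshold, and the toy log are hypothetical and are not taken from the original system.

```python
import math
from collections import Counter

def build_counts(queries):
    """Count keywords and adjacent keyword pairs over a list of query strings."""
    unigrams, bigrams = Counter(), Counter()
    for q in queries:
        words = q.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def pmi(a, b, unigrams, bigrams):
    """Pointwise mutual information of the adjacent pair (a, b).

    Returns -inf when the pair never occurs, so unseen pairs are always split.
    """
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    if n_bi == 0 or bigrams[(a, b)] == 0:
        return float("-inf")
    p_ab = bigrams[(a, b)] / n_bi
    p_a = unigrams[a] / n_uni
    p_b = unigrams[b] / n_uni
    return math.log(p_ab / (p_a * p_b))

def partition_into_phrases(query, unigrams, bigrams, threshold=0.0):
    """Split a query into phrases: adjacent keywords stay together
    only if their PMI exceeds the (assumed) threshold."""
    words = query.split()
    if not words:
        return []
    phrases = [[words[0]]]
    for prev, cur in zip(words, words[1:]):
        if pmi(prev, cur, unigrams, bigrams) > threshold:
            phrases[-1].append(cur)
        else:
            phrases.append([cur])
    return [" ".join(p) for p in phrases]

# Toy query log (illustrative data only).
log = ["new york hotels", "new york pizza", "cheap hotels",
       "cheap pizza", "new york"]
unigrams, bigrams = build_counts(log)
print(partition_into_phrases("new york hotels", unigrams, bigrams, threshold=1.5))
```

In this toy log the pair "new york" co-occurs far more often than chance, so it is kept as one phrase, while "hotels" is split off. A real deployment would need a much larger log and a threshold tuned on held-out data.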
INDEX CACHING
We use index caching as a representative application for search
performance. The index caching application does not require high coverage
because of its storage restriction. However, high precision of the top-j
most frequent items is necessary to determine which of them to keep in
memory. On the other hand, in order to generate many query substitutions, a
larger number of distinct queries and query pairs is required. Thus the
governing parameter should be set to a large value for index caching and to
a small value for query substitution. In our experiments we fixed the
memory size to be 1 GB. Our inverted index stores, for each keyword, the
document posting list sorted by relevance, which allows retrieving the
documents in the order of their relevance.
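The following minimal sketch illustrates the two ideas in this module: measuring the precision of the top-j most frequent items, and keeping the posting lists of the most frequent keywords in a bounded cache. It is an illustration under stated assumptions, not the original implementation: the memory budget is counted in postings rather than bytes, the greedy caching policy is assumed, and all function names and data are hypothetical.

```python
def top_j(counts, j):
    """Return the j most frequent items; ties are broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:j]]

def top_j_precision(original_counts, published_counts, j):
    """Fraction of the published top-j that also appears in the original top-j."""
    published = top_j(published_counts, j)
    if not published:
        return 0.0
    original = set(top_j(original_counts, j))
    return sum(1 for item in published if item in original) / len(published)

def build_inverted_index(doc_scores):
    """doc_scores maps keyword -> {doc_id: relevance}.

    Each posting list is sorted by descending relevance, so documents can be
    read off in relevance order.
    """
    return {kw: sorted(scores, key=lambda d: (-scores[d], d))
            for kw, scores in doc_scores.items()}

def choose_cached_keywords(counts, index, budget):
    """Greedily cache posting lists of the most frequent keywords
    while they still fit into the budget (assumed unit: number of postings)."""
    cached, used = [], 0
    for kw in top_j(counts, len(counts)):
        if kw not in index:
            continue
        size = len(index[kw])
        if used + size <= budget:
            cached.append(kw)
            used += size
    return cached

# Illustrative data only.
original = {"a": 10, "b": 8, "c": 5, "d": 1}
published = {"a": 9, "c": 7, "d": 2}
index = build_inverted_index({"a": {"d1": 0.2, "d2": 0.9}, "c": {"d3": 0.5}})
print(top_j_precision(original, published, 2))
print(choose_cached_keywords(published, index, budget=3))
```

The sketch makes the trade-off in the text concrete: the cache only ever holds a few keywords, so what matters is that the published frequencies rank the truly frequent keywords at the top, not that they cover many distinct keywords.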
ITEM SET GENERATION AND RANKING
All of our results (positive as well as negative) apply to the more general
problem of publishing frequent items, item sets, or consecutive item sets.
We then compare the rankings obtained from the published search log with
the rankings produced by the original search log, which serve as ground
truth. The measure for the quality of the query substitutions does not only
compare the ranks of a substitution in the two rankings, but also penalizes
substitutions that are highly relevant according to one ranking
[q_0, . . . , q_j−1] but have a very low rank in the other.
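One measure with the property described above is a normalized discounted cumulative gain (NDCG) style score. The sketch below is an assumption chosen for illustration, not necessarily the measure used in the experiments: relevance is derived from an item's position in the ground-truth ranking, and the gain of each item is discounted by its position in the compared ranking. The function name and the relevance scheme are hypothetical, and rankings are assumed to contain no duplicates.

```python
import math

def ndcg(ground_truth, candidate):
    """NDCG-style agreement between a candidate ranking and a ground-truth ranking.

    Assumed relevance scheme: the item at position r (0-based) of a
    ground-truth list of length j has relevance j - r; items that are
    missing from the ground truth have relevance 0.
    """
    j = len(ground_truth)
    relevance = {item: j - rank for rank, item in enumerate(ground_truth)}

    def dcg(ranking):
        # Gain of each item is discounted logarithmically by its position.
        return sum(relevance.get(item, 0) / math.log2(pos + 2)
                   for pos, item in enumerate(ranking))

    ideal = dcg(ground_truth)
    return dcg(candidate) / ideal if ideal > 0 else 0.0

truth = ["a", "b", "c"]
print(ndcg(truth, ["a", "c", "b"]))  # mild disagreement near the bottom
print(ndcg(truth, ["c", "b", "a"]))  # most relevant item ranked last
```

Because of the position discount, placing the most relevant ground-truth item at the bottom of the compared ranking lowers the score more than swapping two less relevant items, which matches the penalty described in the text.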
CONCLUSION