Supporting Privacy Protection in Personalized

Uploaded by

sathish

SUPPORTING PRIVACY PROTECTION IN PERSONALIZED WEB

ABSTRACT

Personalized web search (PWS) has demonstrated its effectiveness in improving the
quality of various search services on the Internet. However, evidence shows that users’
reluctance to disclose their private information during search has become a major barrier to the
wide proliferation of PWS. We study privacy protection in PWS applications that model user
preferences as hierarchical user profiles. We propose a PWS framework called UPS that can
adaptively generalize profiles by queries while respecting user-specified privacy requirements.
Our runtime generalization aims at striking a balance between two predictive metrics that
evaluate the utility of personalization and the privacy risk of exposing the generalized profile.
We present two greedy algorithms, namely GreedyDP and GreedyIL, for runtime generalization.
We also provide an online prediction mechanism for deciding whether personalizing a query is
beneficial. Extensive experiments demonstrate the effectiveness of our framework. The
experimental results also reveal that GreedyIL significantly outperforms GreedyDP in terms of
efficiency.

INTRODUCTION

The web search engine has long been the most important portal for ordinary people
looking for useful information on the web. However, users might experience failure when search
engines return irrelevant results that do not meet their real intentions. Such irrelevance is largely
due to the enormous variety of users’ contexts and backgrounds, as well as the ambiguity of
texts. Personalized web search (PWS) is a general category of search techniques that aim to
provide better search results tailored to individual user needs. The cost is that user
information has to be collected and analyzed to infer the user intention behind the issued
query.

The solutions to PWS can generally be categorized into two types, namely click-log-based
methods and profile-based ones. The click-log-based methods are straightforward: they
simply impose a bias toward clicked pages in the user’s query history. Although this strategy has been
demonstrated to perform consistently and considerably well, it can only work on repeated queries
from the same user, which strongly limits its applicability. In contrast, profile-based
methods improve the search experience with complicated user-interest models generated
from user profiling techniques.

Profile-based methods can be potentially effective for almost all sorts of queries, but are
reported to be unstable under some circumstances. Although both types of PWS techniques have
pros and cons, profile-based PWS has recently proved more effective in
improving the quality of web search, with increasing use of personal and behavioral
information to profile its users. This information is usually gathered implicitly from query history,
browsing history, click-through data, bookmarks, user documents, and so forth. Unfortunately, such
implicitly collected personal data can easily reveal a wide range of details about the user’s private life.

Privacy issues arising from the lack of protection for such data, for instance the AOL
query logs scandal, not only raise panic among individual users, but also dampen data
publishers’ enthusiasm for offering personalized services. In fact, privacy concerns have become
the major barrier to the wide proliferation of PWS services.

MOTIVATIONS

To protect user privacy in profile-based PWS, researchers have to consider two
contradicting effects during the search process. On the one hand, they attempt to improve the
search quality with the personalization utility of the user profile. On the other hand, they need to
hide the private content in the user profile to keep the privacy risk under control. A
few previous studies suggest that people are willing to compromise privacy if supplying the
user profile to the search engine yields better search quality. In an ideal case,
significant gain can be obtained by personalization at the expense of only a small (and less
sensitive) portion of the user profile, namely a generalized profile. Thus, user privacy can be
protected without compromising the personalized search quality.

In general, there is a tradeoff between the search quality and the level of privacy
protection achieved from generalization. Unfortunately, previous work on privacy-preserving
PWS is far from optimal. The problems with the existing methods are explained in the
following observations. Existing methods do not support runtime profiling: a user profile is
typically generalized only once offline, and used to personalize all queries from the same user
indiscriminately. Such a “one profile fits all” strategy certainly has drawbacks given the variety
of queries. One piece of reported evidence is that profile-based personalization may not even help to
improve the search quality for some ad hoc queries, even though exposing the user profile to a
server has put the user’s privacy at risk.

A better approach is to make an online decision on

a. whether to personalize the query (by exposing the profile) and
b. what to expose in the user profile at runtime.

To the best of our knowledge, no previous work has supported such a feature. The existing
methods also do not take into account the customization of privacy requirements. This probably
leaves some user privacy overprotected and some insufficiently protected. For example, in one
existing approach all the sensitive topics are detected using an absolute metric called surprisal,
based on information theory, assuming that the interests with less user document support are
more sensitive. However, this assumption can be doubted with a simple counterexample: if a user
has a large number of documents about “sex,” the surprisal of this topic may lead to the
conclusion that “sex” is very general and not sensitive, although the opposite is true.

Unfortunately, little prior work can effectively address individual privacy needs during
generalization. In addition, many personalization techniques require iterative user interactions
when creating personalized search results. They usually refine the search results with metrics that
require multiple user interactions, such as rank scoring, average rank, and so on. This paradigm
is, however, infeasible for runtime profiling, as it would not only pose too much risk of privacy
breach, but also demand prohibitive processing time for profiling. Thus, we need predictive
metrics to measure the search quality and breach risk after personalization, without incurring
iterative user interaction.

CONTRIBUTIONS

The above problems are addressed in our UPS (literally, User customizable Privacy-preserving
Search) framework. The framework assumes that the queries do not contain any
sensitive information, and aims at protecting the privacy in individual user profiles while
retaining their usefulness for PWS. As illustrated in Fig. 1, UPS consists of an untrusted search
engine server and a number of clients. Each client (user) accessing the search service trusts no
one but himself/herself. The key component for privacy protection is an online profiler
implemented as a search proxy running on the client machine itself.

The proxy maintains both the complete user profile, in a hierarchy of nodes with semantics, and
the user-specified (customized) privacy requirements represented as a set of sensitive nodes. The
framework works in two phases, namely the offline and online phases, for each user. During the
offline phase, a hierarchical user profile is constructed and customized with the user-specified
privacy requirements. The online phase handles queries as follows:

Fig. 1. System architecture of UPS

1. When a user issues a query qi on the client, the proxy generates a user profile at runtime in
light of the query terms. The output of this step is a generalized user profile Gi satisfying the
privacy requirements. The generalization process is guided by considering two conflicting
metrics, namely the personalization utility and the privacy risk, both defined for user profiles.
2. Subsequently, the query and the generalized user profile are sent together to the PWS server
for personalized search.
3. The search results are personalized with the profile and delivered back to the query proxy.
4. Finally, the proxy either presents the raw results to the user, or reranks them with the
complete user profile.

UPS is distinguished from conventional PWS in that it

1) provides runtime profiling, which in effect optimizes the personalization utility while
respecting user’s privacy requirements;
2) allows for customization of privacy needs; and
3) does not require iterative user interaction.

Our main contributions are summarized as follows:

1. We propose a privacy-preserving personalized web search framework, UPS, which can generalize
profiles for each query according to user-specified privacy requirements.
2. Relying on the definition of two conflicting metrics, namely personalization utility and privacy
risk, for hierarchical user profiles, we formulate the problem of privacy-preserving personalized
search as δ-Risk Profile Generalization, with its NP-hardness proved.
3. We develop two simple but effective generalization algorithms, GreedyDP and GreedyIL, to
support runtime profiling. While the former tries to maximize the discriminating power (DP), the
latter attempts to minimize the information loss (IL). By exploiting a number of heuristics,
GreedyIL outperforms GreedyDP significantly.
4. We provide an inexpensive mechanism for the client to decide whether to personalize a query in
UPS. This decision can be made before each runtime profiling to enhance the stability of the
search results while avoiding unnecessary exposure of the profile.
5. Our extensive experiments demonstrate the efficiency and effectiveness of our UPS framework.

PROFILE-BASED PERSONALIZATION

Previous works on profile-based PWS mainly focus on improving the search utility. The
basic idea of these works is to tailor the search results by referring to, often implicitly, a user
profile that reveals an individual information goal. In the remainder of this section, we review
the previous solutions to PWS on two aspects, namely the representation of profiles, and the
measure of the effectiveness of personalization. Many profile representations are available in the
literature to facilitate different personalization strategies.

Earlier techniques utilize term lists/vectors or bags of words to represent their profiles.
However, most recent works build profiles in hierarchical structures due to their stronger
descriptive ability, better scalability, and higher access efficiency. The majority of the
hierarchical representations are constructed with an existing weighted topic hierarchy/graph, such as
ODP, Wikipedia, and so on. Another work builds the hierarchical profile automatically via
term-frequency analysis on the user data. In our proposed UPS framework, we do not focus on
the implementation of the user profiles.

Actually, our framework can potentially adopt any hierarchical representation based on a
taxonomy of knowledge. As for the performance measures of PWS in the literature, Normalized
Discounted Cumulative Gain (nDCG) is a common measure of the effectiveness of an
information retrieval system. It is based on a human-graded relevance scale of item positions in
the result list, and is, therefore, known for its high cost in explicit feedback collection. To reduce
the human involvement in performance measuring, researchers also propose other metrics of
personalized web search that rely on clicking decisions, including Average Precision (AP), Rank
Scoring, and Average Rank.
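As a rough illustration of two of the measures named above, the following sketch computes nDCG from graded relevance labels and Average Precision from click decisions. The function names and the use of a log2 rank discount are assumptions made for this example (the text does not fix a particular formulation), so it should be read as one common textbook variant rather than the exact metric used in the experiments.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: each grade is discounted by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances: Sequence[float]) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def average_precision(clicked: Sequence[bool]) -> float:
    """Mean of precision@k over the ranks k at which a click occurred."""
    hits = 0
    total = 0.0
    for rank, is_clicked in enumerate(clicked, start=1):
        if is_clicked:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical result lists: graded labels for nDCG, click flags for AP.
perfect_order = ndcg([3, 2, 1])
reversed_order = ndcg([1, 2, 3])
ap_example = average_precision([True, False, True])
```

nDCG needs a human-assigned grade per result, which is the feedback cost mentioned above, whereas Average Precision only needs to know which results were clicked.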

We use the Average Precision metric, proposed by Dou et al., to measure the
effectiveness of the personalization in UPS. Meanwhile, our work is distinguished from previous
studies as it also proposes two predictive metrics, namely personalization utility and privacy risk,
on a profile instance without requesting user feedback.

PRIVACY PROTECTION IN PWS SYSTEM

Generally, there are two classes of privacy protection problems for PWS. One class includes
those that treat privacy as the identification of an individual. The other includes those that
consider the sensitivity of the data, particularly the user profiles, exposed to the PWS server.
Typical works in the literature on protecting user identification (class one) try to solve the
privacy problem on different levels, including the pseudoidentity, the group identity, no identity,
and no personal information.

The solution at the first level has proved to be fragile. The third and fourth levels are impractical
due to the high cost in communication and cryptography. Therefore, the existing efforts focus on the
second level. Two such efforts provide online anonymity on user profiles by generating a group
profile of k users. Using this approach, the linkage between the query and a single user is broken.
In addition, the useless user profile (UUP) protocol has been proposed to shuffle queries among a
group of users who issue them. As a result, no entity can profile a particular individual. These
works assume the existence of a trustworthy third-party anonymizer, which is not readily available
over the Internet at large. Viejo and Castellà-Roca use legacy social networks instead of the third
party to provide a distorted user profile to the web search engine.

In this scheme, every user acts as a search agent for his or her neighbors. Each user can
decide to submit the query on behalf of whoever issued it, or forward it to other neighbors. The
shortcoming of current solutions in class one is the high cost introduced by the collaboration
and communication. The solutions in class two do not require third-party assistance or
collaboration between social network entries. In these solutions, users only trust themselves and
cannot tolerate the exposure of their complete profiles to an anonymity server. Krause and Horvitz
employ statistical techniques to learn a probabilistic model, and then use this model to generate
the near-optimal partial profile.

One main limitation of this work is that it builds the user profile as a finite set of
attributes, and the probabilistic model is trained through predefined frequent queries. These
assumptions are impractical in the context of PWS. Xu et al. proposed a privacy protection
solution for PWS based on hierarchical profiles. Using a user-specified threshold, a generalized
profile is obtained in effect as a rooted subtree of the complete profile. Unfortunately, this work
does not address the query utility, which is crucial for the service quality of PWS. By
comparison, our approach takes both the privacy requirement and the query utility into account.
A more important property that distinguishes our work from prior work is that we provide
personalized privacy protection in PWS. The concept of personalized privacy protection was first
introduced by Xiao and Tao in Privacy-Preserving Data Publishing (PPDP).

A person can specify the degree of privacy protection for her/his sensitive values by
specifying “guarding nodes” in the taxonomy of the sensitive attribute. Motivated by this, we
allow users to customize privacy needs in their hierarchical user profiles. Aside from the above
works, a couple of recent studies have raised an interesting question that concerns privacy
protection in PWS.

These works have found that personalization may have different effects on different queries.
Queries with smaller click entropies, namely distinct queries, are expected to benefit little from
personalization, while those with larger values (ambiguous ones) benefit more. Moreover,
personalizing the former may even cause privacy disclosure without improving the results.
Therefore, the need for personalization becomes
questionable for such queries. Teevan et al. collect a set of features of the query to classify
queries by their click entropy. While these works are instructive in questioning whether to
personalize or not, they assume the availability of massive user query logs (on the server side)
and user feedback. In our UPS framework, we differentiate distinct queries from ambiguous ones
based on a client-side solution using the predictive query utility metric.
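For readers unfamiliar with click entropy, the sketch below shows one commonly used definition: the Shannon entropy of the distribution of clicked pages recorded for a query. The text does not spell out the formula, so the definition, the function name, and the sample click logs are assumptions for illustration only. Note that computing it needs server-side click logs, which is exactly the dependency the client-side approach tries to avoid.

```python
import math
from collections import Counter
from typing import Iterable


def click_entropy(clicked_pages: Iterable[str]) -> float:
    """Shannon entropy (base 2) of the clicked-page distribution for one query."""
    counts = Counter(clicked_pages)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical click logs: every user clicks the same page (a distinct query)
# versus clicks spread evenly over two pages (a more ambiguous query).
distinct_query = click_entropy(["site-a", "site-a", "site-a", "site-a"])
ambiguous_query = click_entropy(["site-a", "site-b", "site-a", "site-b"])
```

Under this definition a query whose clicks always land on one page has entropy 0, and the value grows as clicks spread over more pages.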

This paper is an extension of our preliminary study. In the previous work, we
proposed the prototype of UPS, together with a greedy algorithm GreedyDP (named
GreedyUtility there) to support online profiling based on predictive metrics of personalization
utility and privacy risk. In this paper, we extend and detail the implementation of UPS. We extend
the metric of personalization utility to capture our three new observations. We also refine the
evaluation model of privacy risk to support user-customized sensitivities. Moreover, we propose
a new profile generalization algorithm called GreedyIL. Based on three heuristics newly added in
the extension, the new algorithm significantly outperforms the old one in efficiency and
stability.

PRELIMINARIES AND PROBLEM DEFINITION

In this section, we first introduce the structure of user profile in UPS. Then, we define the
customized privacy requirements on a user profile. Finally, we present the attack model and
formulate the problem of privacy-preserving profile generalization. For ease of presentation,
Table 1 summarizes all the symbols used in this paper.

USER PROFILE

Consistent with many previous works in personalized web services, each user profile in
UPS adopts a hierarchical structure. Moreover, our profile is constructed based on the
availability of a publicly accessible taxonomy, denoted as $R$, which satisfies the following
assumption.

Assumption 1. The repository $R$ is a huge topic hierarchy covering the entire topic
domain of human knowledge. That is, given any human-recognizable topic $t$, a corresponding
node can be found in $R$.

The repository is regarded as publicly available and can be used by anyone as
background knowledge. Such repositories do exist in the literature, for example, the ODP,
Wikipedia, WordNet, and so on. In addition, each topic $t \in R$ is associated with a repository
support, denoted by $\mathrm{sup}_R(t)$, which quantifies how often the respective topic is touched in
human knowledge.

If we consider each topic to be the result of a random walk from its parent topic in $R$, we
have the following recursive equation, where $C(t, R)$ denotes the children of $t$ in $R$:

$$\mathrm{sup}_R(t) = \sum_{t' \in C(t, R)} \mathrm{sup}_R(t'). \qquad (1)$$

Equation (1) can be used to calculate the repository support of all topics in $R$, relying on the
following assumption that the support values of all leaf topics in $R$ are available.

Assumption 2. Given a taxonomy repository $R$, the repository support is provided by $R$ itself
for each leaf topic.

In fact, Assumption 2 can be relaxed if the support values are not available. In such a case, it is
still possible to “simulate” these repository supports with the topological structure of $R$. That is,
$\mathrm{sup}_R(t)$ can be calculated as the count of leaves in $\mathrm{subtr}(t, R)$. Based on the taxonomy
repository, we define a probability model for the topic domain of human knowledge.
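The recursion in Equation (1) and the leaf-count fallback can be sketched as follows. The dictionary-based tree encoding, the function name `repository_support`, and the toy taxonomy are assumptions made for this example; a real repository such as the ODP would be far larger and stored differently.

```python
from typing import Dict, List, Optional


def repository_support(
    children: Dict[str, List[str]],
    root: str,
    leaf_support: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Compute the support of every topic as the sum of its children's support.

    When no leaf supports are given, every leaf counts as 1, so the support of
    a topic equals the number of leaves in its subtree.
    """
    support: Dict[str, float] = {}

    def visit(topic: str) -> float:
        kids = children.get(topic, [])
        if not kids:
            value = 1.0 if leaf_support is None else leaf_support[topic]
        else:
            value = sum(visit(kid) for kid in kids)
        support[topic] = value
        return value

    visit(root)
    return support


# Hypothetical toy taxonomy (parent -> children).
taxonomy = {
    "Top": ["Sports", "Arts"],
    "Sports": ["Ice Skating"],
    "Ice Skating": ["Figure", "Speed"],
    "Arts": ["Music"],
}
by_leaf_count = repository_support(taxonomy, "Top")
by_given_support = repository_support(
    taxonomy, "Top", {"Figure": 5.0, "Speed": 3.0, "Music": 2.0}
)
```

With the leaf-count fallback, "Ice Skating" gets support 2 and the root gets 3; with explicit leaf supports the same recursion simply sums the supplied values.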

In the model, the repository $R$ can be viewed as a hierarchical partitioning of the universe
(represented by the root topic) and every topic $t \in R$ stands for a random event. Now, we present
the formal definition of user profile. A diagram of a sample user profile is illustrated in Fig. 2a,
which is constructed based on the sample taxonomy repository in Fig. 2b. We can observe that
the owner of this profile is mainly interested in Computer Science and Music, because the major
portion of this profile is made up of fragments from taxonomies of these two topics in the sample
repository. Some other taxonomies also contribute to the profile, for example, Sports and
Adults.

Fig. 2. Taxonomy-based user profile.

CUSTOMIZED PRIVACY REQUIREMENTS

Customized privacy requirements can be specified with a number of sensitive nodes
(topics) in the user profile, whose disclosure (to the server) introduces privacy risk to the user.

Definition 3 (SENSITIVE NODES/S). Given a user profile $H$, the sensitive nodes are a set of
user-specified sensitive topics $S \subset H$, whose subtrees are nonoverlapping, i.e.,
$\forall s_1, s_2 \in S\ (s_1 \neq s_2),\ s_2 \notin \mathrm{subtr}(s_1, H)$.

In the sample profile shown in Fig. 2a, the sensitive nodes
$S = \{\text{Adults}, \text{Privacy}, \text{Harmonica}, \text{Figure (Skating)}\}$ are shaded in gray in $H$.
It must be noted that the user’s privacy concern differs from one sensitive topic to another.

In the above example, the user may hesitate to share her personal interests (e.g.,
Harmonica, Figure Skating) only to avoid various advertisements. Thus, the user might still
tolerate the exposure of such interests to trade for better personalization utility. However, the
user may never allow another interest in topic Adults to be disclosed. To address the difference
in privacy concerns, we allow the user to specify a sensitivity for each node $s \in S$.
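A client could validate a user's choice of sensitive nodes against the nonoverlapping-subtree condition before accepting it. The following is a minimal sketch under assumed names: the child-to-parent map, the function `subtrees_disjoint`, and the sensitivity values are all hypothetical and not taken from the paper.

```python
from typing import Dict, Iterable, Iterator, Optional


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> Iterator[str]:
    """Yield the ancestors of a node, from its parent up to the root."""
    current = parent.get(node)
    while current is not None:
        yield current
        current = parent.get(current)


def subtrees_disjoint(sensitive: Iterable[str],
                      parent: Dict[str, Optional[str]]) -> bool:
    """True if no sensitive node lies inside the subtree of another one."""
    chosen = set(sensitive)
    return all(anc not in chosen
               for node in chosen
               for anc in ancestors(node, parent))


# Hypothetical profile fragment (child -> parent) and per-node sensitivities.
parent_of = {
    "Top": None,
    "Sports": "Top",
    "Ice Skating": "Sports",
    "Figure": "Ice Skating",
    "Adults": "Top",
}
sensitivity = {"Adults": 1.0, "Figure": 0.3}

valid_choice = subtrees_disjoint(sensitivity, parent_of)
invalid_choice = subtrees_disjoint({"Ice Skating", "Figure"}, parent_of)
```

Here a high value for "Adults" and a low one for "Figure" mirror the example in the text, where some interests may be traded for utility and others never disclosed.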

ATTACK MODEL

Our work aims at providing protection against a typical model of privacy attack, namely
eavesdropping. As shown in Fig. 3, to corrupt Alice’s privacy, the eavesdropper Eve successfully
intercepts the communication between Alice and the PWS server via some measures, such as a
man-in-the-middle attack, invading the server, and so on. Consequently, whenever Alice issues a
query $q$, the entire copy of $q$ together with a runtime profile $G$ will be captured by Eve. Based on
$G$, Eve will attempt to touch the sensitive nodes of Alice by recovering the segments hidden from
the original $H$ and computing a confidence for each recovered topic, relying on the background
knowledge in the publicly available taxonomy repository $R$.

Fig. 3. Attack model of personalized web search

Note that in our attack model, Eve is regarded as an adversary satisfying
the following assumptions:

a. Knowledge bounded. The background knowledge of the adversary is
limited to the taxonomy repository $R$. Both the profile $H$ and privacy are defined based on $R$.
b. Session bounded. None of the previously captured information is available for tracing the same
victim over a long duration. In other words, the eavesdropping will be started and ended within a
single query session.

The above assumptions seem strong, but are reasonable in practice. This is because
the majority of privacy attacks on the web are undertaken by automatic programs
sending targeted (spam) advertisements to a large number of PWS users. These programs rarely
act as a real person who collects prolific information about a specific victim over a long time, as
the latter is much more costly. If we consider the sensitivity of each sensitive topic as the cost of
recovering it, the privacy risk can be defined as the total (probabilistic) sensitivity of the
sensitive nodes that the adversary can probably recover from $G$. For fairness among different
users, we can normalize the privacy risk with $\sum_{s \in S} \mathrm{sen}(s)$, which stands for the total
wealth of the user. Our approach to privacy protection of personalized web search has to keep this
privacy risk under control.

GENERALIZING USER PROFILE

Now, we exemplify the inadequacy of the forbidding operation. In the sample profile in Fig.
2a, Figure is specified as a sensitive node. Thus, $\mathrm{rsbtr}(S, H)$ only releases its parent Ice Skating.
Unfortunately, an adversary can recover the subtree of Ice Skating relying on the repository
shown in Fig. 2b, where Figure is a main branch of Ice Skating besides Speed. If the probability
of touching both branches is equal, the adversary can have 50 percent confidence in Figure. This
may lead to high privacy risk if $\mathrm{sen}(\text{Figure})$ is high. A safer solution would remove the node Ice
Skating in such a case for privacy protection. In contrast, it might be unnecessary to remove
sensitive nodes with low sensitivity. Therefore, simply forbidding the sensitive topics does not
protect the user’s privacy needs precisely.
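The 50 percent figure can be reproduced with a small calculation. The sketch below assumes that the adversary's confidence in a hidden child is that child's share of repository support among the children of the exposed parent, and that the resulting risk contribution is confidence times sensitivity, normalized by the user's total sensitivity. The function names and the sensitivity values are hypothetical; the paper's full risk metric may differ in detail.

```python
from typing import Dict, Iterable


def recovery_confidence(hidden: str,
                        siblings: Iterable[str],
                        support: Dict[str, float]) -> float:
    """Share of the exposed parent's support that falls on the hidden child."""
    total = sum(support[s] for s in siblings)
    return support[hidden] / total


def normalized_risk(confidence: float,
                    sensitivity: float,
                    total_sensitivity: float) -> float:
    """Expected sensitivity recovered, as a fraction of the user's total."""
    return confidence * sensitivity / total_sensitivity


# Equal support for both branches of Ice Skating, as in the example.
support = {"Figure": 1.0, "Speed": 1.0}
# Hypothetical user-specified sensitivities.
sen = {"Figure": 0.8, "Adults": 1.0}

confidence = recovery_confidence("Figure", ["Figure", "Speed"], support)
risk = normalized_risk(confidence, sen["Figure"], sum(sen.values()))
```

With equal branch supports the confidence is 0.5, so exposing Ice Skating still puts half of the sensitivity assigned to Figure at risk.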

To address the problem with forbidding, we propose a technique that detects and
removes a set of nodes $X$ from $H$, such that the privacy risk introduced by exposing
$G = \mathrm{rsbtr}(X, H)$ is always under control. Set $X$ is typically different from $S$. For clarity of
description, we assume that all the subtrees of $H$ rooted at the nodes in $X$ do not overlap each
other. This process is called generalization, and the output $G$ is a generalized profile. The
generalization technique can seemingly be conducted during offline processing without involving
user queries. However, offline generalization is impractical, notably because the output from
offline generalization may contain many topic branches that are irrelevant to a query. A more
flexible solution requires online generalization, which depends on the queries. Online
generalization not only avoids unnecessary privacy disclosure, but also removes noisy topics that
are irrelevant to the current query.
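The pruning operation itself is simple to sketch: drop every subtree rooted at a node in the removal set and keep the rest of the rooted tree. The encoding of the profile as a parent-to-children dictionary and the function below are assumptions for illustration; the hard part of generalization, choosing which nodes to remove, is not shown.

```python
from typing import Dict, Iterable, List


def rsbtr(removed: Iterable[str],
          children: Dict[str, List[str]],
          root: str) -> Dict[str, List[str]]:
    """Return the rooted subtree left after removing the subtrees rooted at `removed`."""
    removed_set = set(removed)
    if root in removed_set:
        return {}
    kept: Dict[str, List[str]] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        # Keep only children that are not removal roots; their subtrees vanish too
        # because they are never pushed onto the stack.
        kept[node] = [c for c in children.get(node, []) if c not in removed_set]
        stack.extend(kept[node])
    return kept


# Hypothetical fragment of a user profile (parent -> children).
profile = {
    "Top": ["Sports", "Computer Science"],
    "Sports": ["Ice Skating"],
    "Ice Skating": ["Figure"],
    "Computer Science": ["Privacy"],
}
forbid_only = rsbtr({"Figure"}, profile, "Top")    # Ice Skating is still exposed
safer = rsbtr({"Ice Skating"}, profile, "Top")     # Ice Skating is removed as well
```

Comparing the two results shows the difference between forbidding only the sensitive node and removing its parent as well.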

For example, given a query $q_a$ = “K-Anonymity,” which is a privacy protection
technique used in data publishing, a desirable result of online generalization might be $G_a$,
surrounded by the dashed ellipse.

UPS PROCEDURES

In this section, we present the procedures carried out for each user during two different
execution phases, namely the offline and online phases. Generally, the offline phase constructs
the original user profile and then performs privacy requirement customization according to
user-specified topic sensitivity. The subsequent online phase finds the Optimal δ-Risk Generalization
solution in the search space determined by the customized user profile. As mentioned in the
previous section, the online generalization procedure is guided by the global risk and utility
metrics. The computation of these metrics relies on two intermediate data structures, namely a
cost layer and a preference layer defined on the user profile.

The cost layer defines for each node $t \in H$ a cost value $\mathrm{cost}(t) \geq 0$, which indicates the
total sensitivity at risk caused by the disclosure of $t$. These cost values can be computed offline
from the user-specified sensitivity values of the sensitive nodes. The preference layer is
computed online when a query $q$ is issued. It contains for each node $t \in H$ a value indicating the
user’s query-related preference on topic $t$. These preference values are computed relying on a
procedure called query-topic mapping. Specifically, each user has to undertake the following
procedures in our solution:

1. offline profile construction,
2. offline privacy requirement customization,
3. online query-topic mapping, and
4. online generalization.

METRICS
METRIC OF UTILITY

The purpose of the utility metric is to predict the search quality (in revealing the user’s
intention) of the query $q$ on a generalized profile $G$. We do not measure the search
quality directly because it depends largely on the implementation of the PWS search
engine, which is hard to predict. In addition, it is too expensive to solicit user feedback on search
results. Alternatively, we transform the utility prediction problem into the estimation of the
discriminating power of a given query $q$ on a profile $G$ under the following assumption.

Assumption 3. When a PWS search engine is given, the search quality is only determined by the
discriminating power of the exposed query-profile pair $\langle q, G \rangle$.

Although the same assumption has been made in earlier work to model utility, the metric in that
work cannot be used in our problem setting, as our profile is a hierarchical structure rather than a
flat one.

To propose our model of utility, we introduce the notion of Information Content (IC),
which estimates how specific a given topic $t$ is. Formally, the IC of a topic $t$ is given by

$$\mathrm{IC}(t) = \log \frac{1}{\Pr(t)} = -\log \Pr(t), \qquad (9)$$

where $\Pr(t)$ is given in (3). The more often topic $t$ is mentioned, the smaller (less specific)
its IC. The root topic has an IC of 0, as it dominates the entire topic domain
and always occurs. Now, we develop the first component of the utility metric, called Profile
Granularity (PG), which is the KL-divergence between the probability distributions of the topic
domain with and without $\langle q, G \rangle$ exposed. The probability $\Pr(t \mid q, G)$ (referred to as the
normalized preference) can be computed with (7).
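The defining equation for PG is not reproduced in this copy. As a hedged reconstruction from the verbal description only, assuming the standard form of the KL-divergence and writing $T_G(q)$ for the set of topics over which the preference distribution is taken, it would read as follows, with the prior $\Pr(t)$ playing the role of the distribution without $\langle q, G \rangle$ exposed:

```latex
\[
\mathrm{PG}(q, G)
  = \sum_{t \in T_G(q)} \Pr(t \mid q, G)\,
    \log \frac{\Pr(t \mid q, G)}{\Pr(t)}
  = \underbrace{\sum_{t \in T_G(q)} \Pr(t \mid q, G)\,\mathrm{IC}(t)}_{\text{expected IC}}
    \;-\;
    \underbrace{\Bigl(-\sum_{t \in T_G(q)} \Pr(t \mid q, G)\,
    \log \Pr(t \mid q, G)\Bigr)}_{\text{entropy of the preference}}
\]
```

The second equality only uses $\log(a/b) = \log a - \log b$ together with $\mathrm{IC}(t) = -\log \Pr(t)$, which is why the divergence splits into an expected-IC term and an entropy penalty.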

We can justify that this component captures the first two observations we proposed
above by decomposing $\mathrm{PG}(q, G)$ into two terms that respect ob1 and ob2 separately. The
first term can be considered as the expected IC of topics in $T_G(q)$. The second one quantifies
the uncertainty of the distribution of the user preference on topics in $T_G(q)$. Such uncertainty is
modeled as a penalty to the utility. The second component of utility is called Topic Similarity
(TS), which measures the semantic similarity among the topics in $T_G(q)$, as observation ob3
suggests. This can be computed as the Information Content of the Least Common Ancestor of
$T_G(q)$.
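The two-term reading of PG can be checked numerically. The sketch below assumes natural logarithms, takes the normalized preferences and topic priors as given inputs (their defining equations are referenced in the text but not shown), and uses made-up probabilities; the function names are illustrative only.

```python
import math
from typing import Dict, Tuple


def information_content(prior: float) -> float:
    """IC(t) = -log Pr(t); a topic that always occurs (Pr = 1) has IC 0."""
    return -math.log(prior)


def profile_granularity(preference: Dict[str, float],
                        prior: Dict[str, float]) -> Tuple[float, float, float]:
    """Return (PG, expected IC, entropy) with PG = expected IC - entropy."""
    expected_ic = sum(p * information_content(prior[t])
                      for t, p in preference.items() if p > 0)
    entropy = -sum(p * math.log(p) for p in preference.values() if p > 0)
    return expected_ic - entropy, expected_ic, entropy


# Hypothetical normalized preferences Pr(t | q, G) and priors Pr(t).
preference = {"Privacy": 0.5, "Security": 0.5}
prior = {"Privacy": 0.01, "Security": 0.02}

pg, expected_ic, entropy = profile_granularity(preference, prior)

# The same quantity written directly as a KL-style sum.
kl_form = sum(p * math.log(p / prior[t]) for t, p in preference.items())
```

Specific topics (small priors) raise the first term, while a preference spread evenly over many topics raises the entropy penalty and lowers PG.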
METRIC OF PRIVACY

The privacy risk when exposing $G$ is defined as the total sensitivity contained in it, given
in normalized form. In the worst case, the original profile is exposed, and the risk of exposing all
sensitive nodes reaches its maximum, namely 1. However, if a sensitive node is pruned and its
ancestor nodes are retained during the generalization, we still have to account for the risk of
exposing those ancestors.

ONLINE DECISION: TO PERSONALIZE OR NOT

Reported results demonstrate that there exist a fair number of queries,
called distinct queries, to which profile-based personalization contributes little or even
reduces the search quality, while exposing the profile to a server would certainly risk the user’s
privacy. To address this problem, we develop an online mechanism to decide whether to
personalize a query.

The basic idea is straightforward: if a distinct query is identified during generalization,
the entire runtime profiling will be aborted and the query will be sent to the server without a
user profile. We identify distinct queries using the discriminating power. Specifically, recall
that the personalization utility is defined as the gain in DP when exposing the generalized profile
with the query. The benefits of making the above runtime decision are twofold:

1. It enhances the stability of the search quality;

2. It avoids the unnecessary exposure of the user profile.

LITERATURE SURVEY

Yabo Xu et al. describe personalized web search as a promising way to improve search quality
by customizing search results for people with individual information goals. However, users are
uncomfortable with exposing private preference information to search engines. On the other
hand, privacy is not absolute, and often can be compromised if there is a gain in service
or profitability to the user. Thus, a balance must be struck between search quality and privacy
protection. Their paper presents a scalable way for users to automatically build rich user profiles.
These profiles summarize a user’s interests into a hierarchical organization according to specific
interests. Two parameters for specifying privacy requirements are proposed to help the user
choose the content and degree of detail of the profile information that is exposed to the search
engine. Experiments showed that the user profile improved search quality when compared to
standard MSN rankings. More importantly, the results verified their hypothesis that a significant
improvement in search quality can be achieved by sharing only some higher-level user profile
information, which is potentially less sensitive than detailed personal information.

Xuehua Shen, Bin Tan, and ChengXiang Zhai describe personalized search as a promising
way to improve the accuracy of web search, one that has been attracting much attention recently.
However, effective personalized search requires collecting and aggregating user information,
which often raises serious concerns of privacy infringement for many users. Indeed, these
concerns have become one of the main barriers to deploying personalized search applications,
and how to do privacy-preserving personalization is a great challenge. In their paper, they
systematically examine the issue of privacy preservation in personalized search. They distinguish
and define four levels of privacy protection, and analyze various software architectures for
personalized search. They show that client-side personalization has advantages over the existing
server-side personalized search services in preserving privacy, and envision possible future
strategies to fully protect user privacy.

Ms. A. S. Patil, Prof. M. M. Ghonge, and Dr. M. V. Sarode describe searching as one of the
common tasks performed on the Internet. Search engines are the basic tool of the Internet, from
which one can collect related information according to the query or keyword given by the user,
and they are among the most frequently used sites. The information on the web is growing
dramatically, and users have to spend a lot of time on the web finding the information they are
interested in. Today, traditional search engines do not give users enough personalized help and
instead provide them with a lot of irrelevant information. In such cases, personalized web search
(PWS) has demonstrated its effectiveness in improving the quality of various search services on
the Internet. However, evidence shows that users’ unwillingness to disclose their private
information during search has become a major barrier to the wide use of PWS. The paper gives
information about privacy protection in PWS applications that model user preferences as
hierarchical user profiles. It proposes a PWS framework called UPS that can adaptively
generalize profiles by queries while respecting user-specified privacy requirements. It aims at
providing protection against a typical model of privacy attack.

Jordi Castellà-Roca, Alexandre Viejo, and Jordi Herrera-Joancomartí observe that web search
engines (e.g. Google, Yahoo, Microsoft Live Search, etc.) are widely used to find certain data
among a huge amount of information in a minimal amount of time. However, these useful tools
also pose a privacy threat to the users: web search engines profile their users by storing and
analyzing past searches submitted by them. To address this privacy threat, current solutions
propose new mechanisms that introduce a high cost in terms of computation and communication.
The paper presents a novel protocol specially designed to protect the users’ privacy against
web search profiling. The system provides a distorted user profile to the web search engine.
The authors offer implementation details and computational and communication results that
show that the proposed protocol improves on the existing solutions in terms of query delay. The
scheme provides an affordable overhead while offering privacy benefits to the users.

B. Lavanya and K. Rajani Dev investigate the problem of protecting privacy when publishing
search engine logs. Search engines play a crucial role in the navigation through the vastness of
the Web. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing
useful information while preserving data privacy. Recently, PPDP has received considerable
attention in research communities, and many approaches have been proposed for different data
publishing scenarios. The paper studies privacy preservation for the publication of search engine
query logs. The issue considered is that, even after removing all personal characteristics of the
searcher that can serve as links to his or her identity, the published data is still subject to
privacy attacks from adversaries who have partial knowledge about the set. The experimental
results show that the query log can be appropriately anonymized against the specific attack,
while retaining a significant volume of useful data. The paper also studies why search logs are
not secure and how to make them secure using data mining techniques such as generalization,
suppression, and quasi-identifiers.

SYSTEM ANALYSIS

FEASIBILITY STUDY

Feasibility study is a process which defines exactly what a project is and what strategic issues
need to be considered to assess its feasibility, or likelihood of succeeding. Feasibility studies are
useful both when starting a new business, and identifying a new opportunity for an existing
business. Ideally, the feasibility study process involves making rational decisions about a number
of enduring characteristics of a project, including:

● Technical feasibility: do we have the technology? If not, can we get it?
● Operational feasibility: do we have the resources to build the system? Will the system be
acceptable? Will people use it?
● Economic feasibility: are the benefits greater than the costs?

TECHNICAL FEASIBILITY

Technical feasibility is concerned with the existing computer system (hardware, software,
etc.) and to what extent it can support the proposed addition. For example, if particular
software will work only on a computer with a higher configuration, additional hardware is
required. This involves financial considerations, and if the budget is a serious constraint, then
the proposal will be considered not feasible.

OPERATIONAL FEASIBILITY

Operational feasibility is a measure of how well a proposed system solves the problems
and takes advantage of the opportunities identified during scope definition, and how it satisfies
the requirements identified in the requirements analysis phase of system development.

ECONOMIC FEASIBILITY

Economic analysis is the most frequently used method for evaluating the effectiveness of
a candidate system. More commonly known as cost/benefit analysis, the procedure is to
determine the benefits and savings that are expected from a candidate system and compare them
with costs. If benefits outweigh costs, then the decision is made to design and implement the
system.

EXISTING SYSTEM

⮚ Profile-based personalization may not even help to improve the search quality for
some ad hoc queries, even though exposing the user profile to a server puts the user’s
privacy at risk.
⮚ The existing methods do not take into account the customization of privacy requirements.
This probably leaves some users' privacy overprotected and others' insufficiently
protected.
⮚ Many personalization techniques require iterative user interactions when creating
personalized search results. They usually refine the search results with some metrics
which require multiple user interactions, such as rank scoring, average rank, and so on.
⮚ This paradigm is infeasible for runtime profiling, as it will not only pose too much risk of
privacy breach, but also demand prohibitive processing time for profiling. Thus, we need
predictive metrics to measure the search quality and breach risk after personalization,
without incurring iterative user interaction.
⮚ The existing profile-based Personalized Web Search does not support runtime profiling.
A user profile is typically generalized only once offline, and used to personalize all
queries from the same user indiscriminately. One evidence reported in is that
.
DISADVANTAGES:

1. All the sensitive topics are detected using an absolute metric called surprisal, based on
information theory.
2. The existing profile-based PWS does not support runtime profiling.
3. The existing methods do not take into account the customization of privacy
requirements.
4. Many personalization techniques require iterative user interactions when creating
personalized search results.

PROPOSED SYSTEM:

⮚ Relying on the definition of two conflicting metrics, namely personalization utility and
privacy risk, for hierarchical user profile, we formulate the problem of privacy-preserving
personalized search as Risk Profile Generalization, with its NP-hardness proved.
⮚ This project proposes a privacy-preserving personalized web search framework UPS,
which can generalize profiles for each query according to user-specified privacy
requirements.
⮚ It develops two simple but effective generalization algorithms, GreedyDP and GreedyIL,
to support runtime profiling. While the former tries to maximize the discriminating power
(DP), the latter attempts to minimize the information loss (IL). By exploiting a number of
heuristics, GreedyIL outperforms GreedyDP significantly.
⮚ It provides an inexpensive mechanism for the client to decide whether to personalize a
query in UPS.
⮚ This decision can be made before each runtime profiling to enhance the stability of the
search results while avoiding the unnecessary exposure of the profile.
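
The runtime generalization described in the list above can be illustrated with a small greedy pruning loop. The sketch below is a simplified stand-in and not the GreedyDP or GreedyIL algorithm itself: the `Topic` class, the use of a leaf's `support` as a proxy for information loss, the normalized-sensitivity `risk` function, and the threshold `delta` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass(eq=False)  # eq=False: nodes are compared by identity when removed
class Topic:
    name: str
    support: float             # assumed: share of the user's interest in the topic
    sensitivity: float = 0.0   # assumed: user-specified cost of exposing the topic
    children: List["Topic"] = field(default_factory=list)


def iter_nodes(root: Topic) -> Iterator[Topic]:
    """Yield every topic in the profile tree, parents before children."""
    yield root
    for child in root.children:
        yield from iter_nodes(child)


def risk(root: Topic, total_sensitivity: float) -> float:
    """Assumed risk metric: exposed sensitivity as a fraction of the original total."""
    if total_sensitivity == 0:
        return 0.0
    return sum(n.sensitivity for n in iter_nodes(root)) / total_sensitivity


def leaves(root: Topic) -> Iterator[Tuple[Topic, Topic]]:
    """Yield (parent, leaf) pairs; the root itself is never a pruning candidate."""
    for node in iter_nodes(root):
        for child in node.children:
            if not child.children:
                yield node, child


def generalize(root: Topic, delta: float) -> Topic:
    """Prune leaves in place, cheapest first, until risk(root) <= delta."""
    total = sum(n.sensitivity for n in iter_nodes(root))
    while risk(root, total) > delta:
        candidates = list(leaves(root))
        if not candidates:
            break  # only the root is left; nothing more can be pruned
        parent, leaf = min(candidates, key=lambda pair: pair[1].support)
        parent.children.remove(leaf)
    return root


def demo_profile() -> Topic:
    """Hypothetical profile used only for illustration."""
    return Topic("root", 1.0, children=[
        Topic("Health", 0.3, children=[Topic("Diabetes", 0.1, sensitivity=0.8)]),
        Topic("Sports", 0.7, children=[Topic("Tennis", 0.5), Topic("Golf", 0.2)]),
    ])
```

Calling `generalize(demo_profile(), delta=0.0)` removes the sensitive "Diabetes" leaf and keeps the more general "Health" topic, which is the kind of trade-off between utility and privacy that the two metrics are meant to balance.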

ADVANTAGES:

1. It enhances the stability of the search quality.
2. It avoids the unnecessary exposure of the user profile.
3. It makes increasing use of personal and behaviour information to profile users; this
information is usually gathered implicitly from query history, browsing history,
click-through data, bookmarks, user documents, and so forth.
4. The framework allows users to specify customized privacy requirements via the
hierarchical profiles. In addition, UPS performs online generalization on
user profiles to protect personal privacy without compromising the search
quality.

SYSTEM CONFIGURATION

HARDWARE SPECIFICATION:

● Processor : Intel Core i3 Processor
● RAM : 4 GB (or) Higher
● Hard disk : 1 TB

SOFTWARE SPECIFICATION:

● Web Server : Apache Tomcat Latest Version
● Server-side Technologies : Java, Java Server Pages
● Client-side Technologies : HyperText Markup Language, Cascading Style Sheets,
JavaScript, AJAX
● Database Server : MS SQL
● Operating System : Windows (or) Linux (or) Mac any version

SOFTWARE DESCRIPTION

ABOUT THE SOFTWARE

FEATURES OF VISUAL BASIC .NET

THE .NET FRAMEWORK

The .NET Framework is a new computing platform that simplifies application
development in the highly distributed environment of the Internet.

OBJECTIVES OF .NET FRAMEWORK:

1. To provide a consistent object-oriented programming environment whether object code is
stored and executed locally, executed locally but Internet-distributed, or executed remotely.

2. To provide a code-execution environment that minimizes software deployment conflicts and
guarantees safe execution of code.

3. To provide a code-execution environment that eliminates the performance problems of
scripted or interpreted environments.

The framework also aims to make the developer experience consistent across different types
of applications, such as Windows-based applications and Web-based applications, and to build
communication in distributed environments on standards so that code based on the .NET
Framework can integrate with any other code.

COMPONENTS OF .NET FRAMEWORK

1. THE COMMON LANGUAGE RUNTIME (CLR):

The common language runtime is the foundation of the .NET Framework. It manages
code at execution time, providing important services such as memory management, thread
management, and remoting, while also enforcing security and robustness. The concept of code
management is a fundamental principle of the runtime. Code that targets the runtime is known as
managed code, while code that does not target the runtime is known as unmanaged code.

2. THE .NET FRAMEWORK CLASS LIBRARY:

It is a comprehensive, object-oriented collection of reusable types used to develop
applications ranging from traditional command-line or graphical user interface (GUI)
applications to applications based on the latest innovations provided by ASP.NET, such as Web
Forms and XML Web services.

The .NET Framework can be hosted by unmanaged components that load the common
language runtime into their processes and initiate the execution of managed code, thereby
creating a software environment that can exploit both managed and unmanaged features.
The .NET Framework not only provides several runtime hosts, but also supports the
development of third-party runtime hosts.

Internet Explorer is an example of an unmanaged application that hosts the runtime (in the
form of a MIME type extension). Using Internet Explorer to host the runtime enables us to embed
managed components or Windows Forms controls in HTML documents.

FEATURES OF THE COMMON LANGUAGE RUNTIME:

The common language runtime manages memory, thread execution, code execution, code
safety verification, compilation, and other system services. Its main features are:

● Security
● Robustness
● Productivity
● Performance

SECURITY

The runtime enforces code access security. With regard to security, managed components
are awarded varying degrees of trust, depending on a number of factors that include their
origin. The degree of trust determines whether a managed component may perform file-access
operations, registry-access operations, or other sensitive functions. The security features of the
runtime thus enable legitimate Internet-deployed software to be exceptionally feature rich.

ROBUSTNESS:

The runtime also enforces code robustness by implementing a strict type- and
code-verification infrastructure called the common type system (CTS). The CTS ensures that all
managed code is self-describing. The managed environment of the runtime eliminates many
common software issues.

PRODUCTIVITY:

The runtime also accelerates developer productivity. For example, programmers can
write applications in their development language of choice, yet take full advantage of the
runtime, the class library, and components written in other languages by other developers.

PERFORMANCE:

The runtime is designed to enhance performance. Although the common language
runtime provides many standard runtime services, managed code is never interpreted. A feature
called just-in-time (JIT) compiling enables all managed code to run in the native machine
language of the system on which it is executing. Finally, the runtime can be hosted by
high-performance, server-side applications, such as Microsoft® SQL Server™ and Internet
Information Services (IIS).

ASP.NET

ASP.NET is the next version of Active Server Pages (ASP); it is a unified Web
development platform that provides the services necessary for developers to build
enterprise-class Web applications. While ASP.NET is largely syntax compatible with ASP, it
also provides a new programming model and infrastructure for more secure, scalable, and stable
applications.

Because ASP.NET is a compiled, .NET-based environment, we can author applications in
any .NET-compatible language, including Visual Basic .NET, C#, and JScript .NET.
Additionally, the entire .NET Framework is available to any ASP.NET application. Developers
can easily access the benefits of these technologies, which include the managed common
language runtime (CLR) environment, type safety, inheritance, and so on.

ASP.NET has been designed to work seamlessly with WYSIWYG HTML editors and
other programming tools, including Microsoft Visual Studio .NET. Not only does this make Web
development easier, but it also provides all the benefits that these tools have to offer, including a
GUI that developers can use to drop server controls onto a Web page and fully integrated
debugging support.

Developers can choose from the following two features when creating an ASP.NET
application: Web Forms and Web services. They can also combine these in any way they see fit.
Each is supported by the same infrastructure, which allows us to use authentication schemes,
cache frequently used data, or customize the application's configuration, to name only a few
possibilities.

Web Forms allows us to build powerful forms-based Web pages. When building these
pages, we can use ASP.NET server controls to create common UI elements, and program them
for common tasks. These controls allow us to rapidly build a Web Form out of reusable built-in
or custom components, simplifying the code of a page.

An XML Web service provides the means to access server functionality remotely. Using
Web services, businesses can expose programmatic interfaces to their data or business logic,
which in turn can be obtained and manipulated by client and server applications. XML Web
services enable the exchange of data in client-server or server-server scenarios, using standards
like HTTP and XML messaging to move data across firewalls. XML Web services are not tied to
a particular component technology or object-calling convention. As a result, programs written in
any language, using any component model, and running on any operating system can access
XML Web services.

Each of these models can take full advantage of all ASP.NET features, as well as the
power of the .NET Framework and .NET Framework common language runtime.

Accessing databases from ASP.NET applications is an often-used technique for
displaying data to Web site visitors. ASP.NET makes it easier than ever to access databases for
this purpose. It also allows us to manage the database from our code.

ASP.NET provides a simple model that enables Web developers to write logic that runs
at the application level. Developers can write this code in the Global.asax text file or in a
compiled class deployed as an assembly. This logic can include application-level events, and
developers can easily extend this model to suit the needs of their Web application.

ASP.NET provides easy-to-use application and session-state facilities that are familiar
to ASP developers and are readily compatible with all other .NET Framework APIs.

ASP.NET offers the IHttpHandler and IHttpModule interfaces. Implementing the
IHttpHandler interface gives you a means of interacting with the low-level request and response
services of the IIS Web server and provides functionality much like ISAPI extensions, but with a
simpler programming model. Implementing the IHttpModule interface allows you to include
custom events that participate in every request made to your application.

ASP.NET takes advantage of performance enhancements found in the .NET
Framework and common language runtime. Additionally, it has been designed to offer
significant performance improvements over ASP and other Web development platforms. All
ASP.NET code is compiled, rather than interpreted, which allows early binding, strong typing,
and just-in-time (JIT) compilation to native code, to name only a few of its benefits. ASP.NET is
also easily factorable, meaning that developers can remove modules (a session module, for
instance) that are not relevant to the application they are developing.

ASP.NET provides extensive caching services (both built-in services and caching APIs).
ASP.NET also ships with performance counters that developers and system administrators can
monitor to test new applications and gather metrics on existing applications.

Writing custom debug statements to your Web page can help immensely in
troubleshooting your application's code. However, it can cause embarrassment if it is not
removed. The problem is that removing the debug statements from your pages when your
application is ready to be ported to a production server can require significant effort.

ASP.NET offers the TraceContext class, which allows us to write custom debug
statements to our pages as we develop them. They appear only when we have enabled tracing
for a page or entire application. Enabling tracing also appends details about a request to the page,
or, if we so specify, to a custom trace viewer that is stored in the root directory of the
application.

The .NET Framework and ASP.NET provide default authorization and authentication
schemes for Web applications. We can easily remove, add to, or replace these schemes,
depending upon the needs of our application.

DATA ACCESS WITH ADO.NET

As you develop applications using ADO.NET, you will have different requirements for
working with data. You might never need to directly edit an XML file containing data - but it is
very useful to understand the data architecture in ADO.NET.

ADO.NET offers several advantages over previous versions of ADO:

● Interoperability
● Maintainability
● Programmability
● Performance
● Scalability

INTEROPERABILITY:

ADO.NET applications can take advantage of the flexibility and broad acceptance of
XML. Because XML is the format for transmitting datasets across the network, any component
that can read the XML format can process data. The receiving component need not be an
ADO.NET component.

The transmitting component can simply transmit the dataset to its destination without
regard to how the receiving component is implemented. The destination component might be a
Visual Studio application or any other application implemented with any tool whatsoever.

The only requirement is that the receiving component be able to read XML. XML
was designed with exactly this kind of interoperability in mind.
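
The idea of exchanging a dataset as XML between independently implemented components can be sketched as follows. This is a generic illustration written in Python, not the ADO.NET DataSet XML format: the `dataset` root element, the function names, and the example table are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def rows_to_xml(table: str, rows: List[Dict[str, object]]) -> str:
    """Sender side: serialize rows as <dataset><table><column>value</column>...</table></dataset>."""
    root = ET.Element("dataset")
    for row in rows:
        record = ET.SubElement(root, table)
        for column, value in row.items():
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text: str) -> List[Dict[str, str]]:
    """Receiver side: any component that can parse XML can rebuild the rows."""
    root = ET.fromstring(xml_text)
    return [{column.tag: column.text for column in record} for record in root]
```

The receiver only needs an XML parser; it does not need to know how the sender produced the document. Note that values come back as strings, so the receiver is responsible for any type conversion.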

MAINTAINABILITY:

In the life of a deployed system, modest changes are possible, but substantial
architectural changes are rarely attempted because they are so difficult. As the performance load
on a deployed application server grows, system resources can become scarce and response time
or throughput can suffer. Faced with this problem, software architects can choose to divide the
server's business-logic processing and user-interface processing onto separate tiers on separate
machines.

In effect, the application server tier is replaced with two tiers, alleviating the shortage of
system resources. If the original application is implemented in ADO.NET using datasets, this
transformation is made easier.

PROGRAMMABILITY:

ADO.NET data components in Visual Studio encapsulate data access functionality in
various ways that help you program more quickly and with fewer mistakes.

PERFORMANCE:

ADO.NET datasets offer performance advantages over ADO disconnected record sets. In
ADO.NET data-type conversion is not necessary.

SCALABILITY:

ADO.NET accommodates scalability by encouraging programmers to conserve limited
resources. Any ADO.NET application employs disconnected access to data; it does not retain
database locks or active database connections for long durations.
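
The disconnected access pattern can be illustrated outside ADO.NET as well. The sketch below uses Python's standard `sqlite3` module as a stand-in database; the function name and the example query are assumptions. The connection is held only while the rows are copied into memory, and all further work happens on the in-memory copy.

```python
import sqlite3
from contextlib import closing
from typing import Any, Dict, List, Sequence


def load_rows(db_path: str, sql: str, params: Sequence[Any] = ()) -> List[Dict[str, Any]]:
    """Open a connection, copy the result set into memory, and close the connection."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
        rows = [dict(row) for row in conn.execute(sql, params)]
    # The connection is closed here; callers work on the in-memory copy only.
    return rows
```

A call such as `load_rows("profiles.db", "SELECT name FROM topics")` (with a hypothetical database file) returns plain dictionaries that can be filtered or displayed without keeping any database lock or connection open.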

VISUAL STUDIO .NET

Visual Studio .NET is a complete set of development tools for building ASP Web
applications, XML Web services, desktop applications, and mobile applications. In addition to
building high-performing desktop applications, you can use Visual Studio's powerful
component-based development tools and other technologies to simplify team-based design,
development, and deployment of Enterprise solutions.

Visual Basic .NET, Visual C++ .NET, and Visual C# .NET all use the same integrated
development environment (IDE), which allows them to share tools and facilitates the creation
of mixed-language solutions. In addition, these languages leverage the functionality of the .NET
Framework and simplify the development of ASP Web applications and XML Web services.

Visual Studio supports the .NET Framework, which provides a common language
runtime and unified programming classes; ASP.NET uses these components to create ASP Web
applications and XML Web services. Also it includes MSDN Library, which contains all the
documentation for these development tools.

XML WEB SERVICES:

XML Web services are applications that can receive requests and return data using XML over
HTTP. XML Web services are not tied to a particular component technology or object-calling
convention, and they can be accessed by any language, component model, or operating system. In
Visual Studio .NET, you can quickly create and include XML Web services using Visual Basic,
Visual C#, JScript, Managed Extensions for C++, or ATL Server.

XML SUPPORT:

Extensible Markup Language (XML) provides a method for describing structured data.
XML is a subset of SGML that is optimized for delivery over the Web. The World Wide Web
Consortium (W3C) defines XML standards so that structured data will be uniform and
independent of applications. Visual Studio .NET fully supports XML, providing the XML
Designer to make it easier to edit XML and create XML schemas.

VISUAL BASIC .NET

Visual Basic.NET, the latest version of Visual Basic, includes many new features. Earlier
versions of Visual Basic supported interfaces but not implementation inheritance.

Visual Basic.NET supports implementation inheritance, interfaces, and overloading. In
addition, Visual Basic.NET supports multithreading.

COMMON LANGUAGE SPECIFICATION (CLS):

Visual Basic.NET is also compliant with CLS (Common Language Specification) and
supports structured exception handling. CLS is set of rules and constructs that are supported by
the CLR (Common Language Runtime). CLR is the runtime environment provided by the .NET
Framework; it manages the execution of the code and also makes the development process easier
by providing services.

Visual Basic.NET is a CLS-compliant language. Any objects, classes, or components that are
created in Visual Basic.NET can be used in any other CLS-compliant language. In addition, we
can use objects, classes, and components created in other CLS-compliant languages in Visual
Basic.NET. The use of CLS ensures complete interoperability among applications, regardless of
the languages used to create the application.

IMPLEMENTATION INHERITANCE:

Visual Basic.NET supports implementation inheritance. This means that, while creating
applications in Visual Basic.NET, we can derive a class from another class, which is known as
the base class. The derived class inherits all the methods and properties of the base class. In the
derived class, we can either use the existing code of the base class or override the existing code.
Therefore, with the help of implementation inheritance, code can be reused.
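
The same idea can be shown in a few lines of code. The sketch below is written in Python rather than Visual Basic.NET, and the `Account` and `SavingsAccount` classes are hypothetical examples: the derived class reuses `deposit` unchanged and overrides `describe`.

```python
class Account:
    """Base class: provides behaviour that derived classes inherit."""

    def __init__(self, owner: str, balance: float = 0.0) -> None:
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: float) -> None:
        self.balance += amount

    def describe(self) -> str:
        return f"{self.owner}: {self.balance:.2f}"


class SavingsAccount(Account):
    """Derived class: inherits deposit() and overrides describe()."""

    def __init__(self, owner: str, balance: float = 0.0, rate: float = 0.02) -> None:
        super().__init__(owner, balance)  # reuse the base-class initialization
        self.rate = rate

    def describe(self) -> str:
        # Override: extend the base-class result instead of rewriting it.
        return super().describe() + f" (rate {self.rate:.0%})"
```

A `SavingsAccount` can be used wherever an `Account` is expected, which is what makes the inherited code reusable.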

CONSTRUCTORS AND DESTRUCTORS:

Constructors are used to initialize objects, whereas destructors are used to destroy them.
In other words, destructors are used to release the resources allocated to the object. In Visual
Basic.NET, the Sub Finalize procedure is available. It is used to complete the tasks that must be
performed when an object is destroyed, and it is called automatically at that point. In addition,
the Sub Finalize procedure can be called only from the class it belongs to or from derived
classes.

GARBAGE COLLECTION:

Garbage Collection is another new feature in Visual Basic.NET. The .NET Framework
monitors allocated resources, such as objects and variables. In addition, the .NET Framework
automatically releases memory for reuse by destroying objects that are no longer in use. In
Visual Basic.NET, the garbage collector checks for objects that are not currently in use by
applications. When the garbage collector comes across an object that is marked for garbage
collection, it releases the memory occupied by the object.

OVERLOADING:

Overloading is another feature in Visual Basic.NET. Overloading enables us to define
multiple procedures with the same name, where each procedure has a different set of arguments.
Besides using overloading for procedures, we can use it for constructors and properties in a class.

MULTITHREADING:

Visual Basic.NET also supports multithreading. An application that supports
multithreading can handle multiple tasks simultaneously. We can use multithreading to decrease
the time taken by an application to respond to user interaction. To do so, we must ensure that a
separate thread in the application handles user interaction.
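
The principle of moving work onto a separate thread so that the main thread stays responsive can be sketched as follows. The example is in Python rather than Visual Basic.NET, and the helper name `run_in_background` is an assumption.

```python
import queue
import threading
from typing import Any, Callable, Tuple


def run_in_background(task: Callable[..., Any], *args: Any) -> Tuple[threading.Thread, "queue.Queue[Any]"]:
    """Start task(*args) on a worker thread and return the thread and a result queue."""
    result: "queue.Queue[Any]" = queue.Queue(maxsize=1)

    def worker() -> None:
        # Runs on the worker thread; the caller's thread is free meanwhile.
        result.put(task(*args))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, result
```

The caller can keep handling user interaction and collect the value later with `result.get()`, for example after `thread.join()`.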

STRUCTURED EXCEPTION HANDLING:

Visual Basic.NET supports structured exception handling, which enables us to detect and
handle errors at runtime. In Visual Basic.NET, we use Try…Catch…Finally statements to create
exception handlers. Using Try…Catch…Finally statements, we can create robust and effective
exception handlers to improve the reliability of our application.
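
The structure of such a handler is similar across languages. The sketch below uses Python's `try`/`except`/`finally`, which plays the same role as Try…Catch…Finally; the function `read_threshold` and its default value are hypothetical.

```python
from typing import List, Tuple


def read_threshold(text: str, default: float = 0.5) -> Tuple[float, List[str]]:
    """Parse a threshold in [0, 1]; fall back to a default when the input is invalid."""
    log: List[str] = []
    try:
        value = float(text)                 # may raise ValueError
        if not 0.0 <= value <= 1.0:
            raise ValueError("threshold out of range")
    except ValueError:
        value = default                     # the "Catch" part: recover from the error
        log.append("fallback")
    finally:
        log.append("done")                  # the "Finally" part: always runs
    return value, log
```

The `finally` block runs whether or not an error occurred, which makes it the natural place for cleanup such as closing files or connections.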

BACK END DESIGN

Features of SQL-SERVER

The OLAP Services feature available in SQL Server version 7.0 is now called SQL
Server 2000 Analysis Services. The term OLAP Services has been replaced with the term
Analysis Services. Analysis Services also includes a new data mining component. The
Repository component available in SQL Server version 7.0 is now called Microsoft SQL Server
2000 Meta Data Services. References to the component now use the term Meta Data Services.
The term repository is used only in reference to the repository engine within Meta Data Services.

A SQL-SERVER database consists of the following types of objects:

1. TABLE

2. QUERY

3. FORM

4. REPORT

5. MACRO

TABLE

A table is a collection of data about a specific topic.

VIEWS OF TABLE:

We can work with a table in two views:

1. Design View

2. Datasheet View

Design View

To build or modify the structure of a table, we work in the table design view. We can specify
what kind of data will be held.

Datasheet View

To add, edit, or analyze the data itself, we work in the table's datasheet view.

QUERY:

A query is a question that is asked of the data. Access gathers data that answers the
question from one or more tables. The data that makes up the answer is either a dynaset (if it
can be edited) or a snapshot (if it cannot be edited). Each time we run a query, we get the latest
information in the dynaset. Access either displays the dynaset or snapshot for us to view, or
performs an action on it, such as deleting or updating.

FORMS:

A form is used to view and edit information in the database record by record. A form
displays only the information we want to see in the way we want to see it. Forms use familiar
controls such as textboxes and checkboxes. This makes viewing and entering data easy.

Views of Form:

We can work with forms in several views. Primarily, there are two:

1. Design View

2. Form View

Design View

To build or modify the structure of a form, we work in the form's design view. We can add
controls to the form that are bound to fields in a table or query, including textboxes, option
buttons, graphs, and pictures.

Form View

The form view displays the whole design of the form.

REPORT:

A report is used to view and print information from the database. The report can group
records into many levels and compute totals and averages by checking values from many records
at once. The report is also attractive and distinctive because we have control over its size and
appearance.

MACRO:

A macro is a set of actions. Each action in a macro does something, such as opening a
form or printing a report. We write macros to automate common tasks, which makes the work
easier and saves time.

C# (pronounced C Sharp) is a multi-paradigm programming language that encompasses
functional, imperative, generic, object-oriented (class-based), and component-oriented
programming disciplines. It was developed by Microsoft as part of the .NET initiative and later
approved as a standard by ECMA (ECMA-334) and ISO (ISO/IEC 23270). C# is one of the 44
programming languages supported by the .NET Framework's Common Language Runtime.

C# is intended to be a simple, modern, general-purpose, object-oriented
programming language. Anders Hejlsberg, the designer of Delphi, leads the team which is
developing C#. It has an object-oriented syntax based on C++ and is heavily influenced by other
programming languages such as Delphi and Java. It was initially named Cool, which stood for
"C like Object Oriented Language". However, in July 2000, when Microsoft made the project
public, the name of the programming language was given as C#. The most recent version of the
language is C# 3.0 which was released in conjunction with the .NET Framework 3.5 in 2007.
The next proposed version, C# 4.0, is in development.

History:-

In 1996, Sun Microsystems released the Java programming language with Microsoft soon
purchasing a license to implement it in their operating system. Java was originally meant to be a
platform independent language, but Microsoft, in their implementation, broke their license
agreement and made a few changes that would essentially inhibit Java's platform-independent
capabilities. Sun filed a lawsuit and Microsoft settled, deciding to create their own version of a
partially compiled, partially interpreted object-oriented programming language with syntax
closely related to that of C++.

During the development of .NET, the class libraries were originally written in a
language/compiler called Simple Managed C (SMC). In January 1999, Anders Hejlsberg formed
a team to build a new language at the time called Cool, which stood for "C like Object Oriented
Language". Microsoft had considered keeping the name "Cool" as the final name of the language,
but chose not to do so for trademark reasons. By the time the .NET project was publicly
announced at the July 2000 Professional Developers Conference, the language had been renamed
C#, and the class libraries and ASP.NET runtime had been ported to C#.

C#'s principal designer and lead architect at Microsoft is Anders Hejlsberg, who was
previously involved with the design of Visual J++, Borland Delphi, and Turbo Pascal. In
interviews and technical papers he has stated that flaws in most major programming languages
(e.g. C++, Java, Delphi, and Smalltalk) drove the fundamentals of the Common Language
Runtime (CLR), which, in turn, drove the design of the C# programming language itself. Some
argue that C# shares roots in other languages.

Features of C#:-

By design, C# is the programming language that most directly reflects the underlying
Common Language Infrastructure (CLI). Most of C#'s intrinsic types correspond to value-types
implemented by the CLI framework. However, the C# language specification does not state the
code generation requirements of the compiler: that is, it does not state that a C# compiler must
target a Common Language Runtime (CLR), or generate Common Intermediate Language (CIL),
or generate any other specific format. Theoretically, a C# compiler could generate machine code
like traditional compilers of C++ or FORTRAN; in practice, all existing C# implementations
target CIL.

Some notable C# distinguishing features are:

● There are no global variables or functions. All methods and members must be declared
within classes. It is possible, however, to use static methods/variables within public
classes instead of global variables/functions.
● Local variables cannot shadow variables of the enclosing block, unlike C and C++.
Variable shadowing is often considered confusing by C++ texts.
● C# supports a strict Boolean data type, bool. Statements that take conditions, such as
while and if, require an expression of a Boolean type. While C++ also has a Boolean
type, it can be freely converted to and from integers, and expressions such as if(a) require
only that a is convertible to bool, allowing a to be an int, or a pointer. C# disallows this
"integer meaning true or false" approach on the grounds that forcing programmers to use
expressions that return exactly bool can prevent certain types of programming mistakes
such as if (a = b) (use of = instead of ==).
● In C#, memory address pointers can only be used within blocks specifically marked as
unsafe, and programs with unsafe code need appropriate permissions to run. Most object
access is done through safe object references, which are always either pointing to a valid,
existing object, or have the well-defined null value; a reference to a garbage-collected
object, or to a random block of memory, is impossible to obtain. An unsafe pointer can
point to an instance of a value-type, array, string, or a block of memory allocated on a
stack. Code that is not marked as unsafe can still store and manipulate pointers through
the System.IntPtr type, but cannot dereference them.
● Managed memory cannot be explicitly freed, but is automatically garbage collected.
Garbage collection addresses memory leaks. C# also provides direct support for
deterministic finalization with the using statement (supporting the Resource Acquisition
Is Initialization idiom).
● Multiple inheritance is not supported, although a class can implement any number of
interfaces. This was a design decision by the language's lead architect to avoid
complication, avoid dependency hell and simplify architectural requirements throughout
CLI.
● C# is more type safe than C++. The only implicit conversions by default are those which
are considered safe, such as widening of integers and conversion from a derived type to a
base type. This is enforced at compile-time, during JIT, and, in some cases, at runtime.
There are no implicit conversions between Booleans and integers, nor between
enumeration members and integers (except for literal 0, which can be implicitly
converted to any enumerated type). Any user-defined conversion must be explicitly
marked as explicit or implicit, unlike C++ copy constructors (which are implicit by
default) and conversion operators (which are always implicit).
● Enumeration members are placed in their own scope.
● C# provides syntactic sugar for a common pattern of a pair of methods, accessor (getter)
and mutator (setter) encapsulating operations on a single attribute of a class, in form of
properties.
● Full type reflection and discovery is available.
● C# currently (as of 3 June 2008) has 77 reserved words.

Common Type System (CTS)

C# has a unified type system. This unified type system is called Common Type System
(CTS).

A unified type system implies that all types, including primitives such as integers, are
subclasses of the System.Object class. For example, every type inherits a ToString() method.
For performance reasons, primitive types (and value types in general) are internally
allocated on the stack.

Categories of data types

The CTS separates data types into two categories:

● Value types
● Reference types

Value types are plain aggregations of data. Instances of value types do not have
referential identity or referential comparison semantics - equality and inequality comparisons for
value types compare the actual data values within the instances, unless the corresponding
operators are overloaded. Value types are derived from System.ValueType, always have a
default value, and can always be created and copied. Some other limitations on value types are
that they cannot derive from each other (but can implement interfaces) and cannot have a default
(parameterless) constructor. Examples of value types are some primitive types, such as int (a
signed 32-bit integer), float (a 32-bit IEEE floating-point number), char (a 16-bit Unicode code
point), and System.DateTime (identifies a specific point in time with millisecond precision).

In contrast, reference types have the notion of referential identity - each instance of a
reference type is inherently distinct from every other instance, even if the data within both
instances is the same. This is reflected in default equality and inequality comparisons for
reference types, which test for referential rather than structural equality, unless the corresponding
operators are overloaded (such as the case for System.String). In general, it is not
always possible to create an instance of a reference type, nor to copy an existing instance, or
perform a value comparison on two existing instances, though specific reference types can
provide such services by exposing a public constructor or implementing a corresponding
interface (such as ICloneable or IComparable). Examples of reference types are object (the
ultimate base class for all other C# classes), System.String (a string of Unicode characters), and
System.Array (a base class for all C# arrays).
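
The difference between comparing by data and comparing by identity can be sketched in a few lines. The snippet below is written in Python rather than C#, purely as an analogy: Python has no value types, so a frozen dataclass (PointValue, a hypothetical name) stands in for a type whose equality compares the stored data, and a plain class (PointRef, also hypothetical) shows the default comparison by identity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointValue:
    """Stands in for a value type: equality compares the stored data."""
    x: int
    y: int


class PointRef:
    """Stands in for a reference type: default equality compares identity."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


# Two separate instances holding the same data.
value_equal = PointValue(1, 2) == PointValue(1, 2)   # compared by data
ref_equal = PointRef(1, 2) == PointRef(1, 2)         # compared by identity

# One instance compared with itself.
shared = PointRef(1, 2)
same_ref_equal = shared == shared
```

Here value_equal is true while ref_equal is false, mirroring structural versus referential equality as described above; overriding the comparison on PointRef would play the role of overloading the operators.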

Both type categories are extensible with user-defined types.

Boxing and unboxing

Boxing is the operation of converting a value of a value type into a value of a
corresponding reference type.

ACCESS PRIVILEGES

IIS provides several new access levels. The following values can set the type of access
allowed to specific directories:

o Read
o Write
o Script
o Execute
o Log Access
o Directory Browsing

IIS WEBSITE ADMINISTRATION

Administering websites can be time consuming and costly, especially for people who
manage large Internet Service Provider (ISP) installations. To save time and money, ISPs support
only large company websites at the expense of personal websites. But is there a cost-effective
way to support both? The answer is yes, if you can automate administrative tasks and let users
administer their own sites from remote computers. This solution reduces the amount of time and
money it takes to manually administer a large installation, without reducing the number of
websites supported.

Microsoft Internet Information Server (IIS) version 4.0 offers two technologies to do this:

1. Windows Scripting Host (WSH)

2. IIS Admin Objects built on top of Active Directory Service Interfaces (ADSI)

With these technologies working together behind the scenes, a person can administer
sites from the command line of a central computer and can group frequently used commands in
batch files.

Then all users need to do is run batch files to add new accounts, change permissions, add a
virtual server to a site and perform many other tasks.

.NET FRAMEWORK

The .NET Framework is many things, but it is worthwhile listing its most important
aspects. In short, the .NET Framework is:

A Platform designed from the start for writing Internet-aware and Internet-enabled
applications that embrace and adopt open standards such as XML, HTTP, and SOAP.

A Platform that provides a number of very rich and powerful application development
technologies, such as Windows Forms, used to build classic GUI applications, and of course
ASP.NET, used to build web applications.

A platform with an extensive class library that provides extensive support for data access
(relational and XML), directory services, message queuing, and much more.

A platform that has a base class library that contains hundreds of classes for performing
common tasks such as file manipulation, registry access, security, threading, and searching of
text using regular expressions.

A platform that doesn’t forget its origins, and has great interoperability support for
existing components that you or third parties have written, using COM or standard DLLs.

A platform with an independent code execution and management environment called the
Common Language Runtime (CLR), which ensures code is safe to run, and provides an abstract
layer on top of the operating system, meaning that elements of the .NET framework can run on
many operating systems and devices.

ASP.NET

ASP.NET is part of the whole .NET Framework, built on top of the Common Language
Runtime (also known as the CLR) - a rich and flexible architecture, designed not just to cater for
the needs of developers today, but to allow for the long future we have ahead of us. What you
might not realize is that, unlike previous updates of ASP, ASP.NET is very much more than just
an upgrade of existing technology – it is the gateway to a whole new era of web development.

ASP.NET is available on the following web server releases:

o Microsoft IIS 5.0 on Windows 2000 Server

o Microsoft IIS 5.1 on Windows XP

ASP.NET has been designed to try and maintain syntax and run-time compatibility with
existing ASP pages wherever possible. The motivation behind this is to allow existing ASP
pages to be initially migrated to ASP.NET by simply renaming the file to have an extension
of .aspx. For the most part this goal has been achieved, although there are typically some basic
code changes that have to be made, since VBScript is no longer supported, and the VB language
itself has changed.

Some of the key goals of ASP.NET were to

o Remove the dependency on script engines, enabling pages to be type safe and
compiled.
o Reduce the amount of code required to develop web applications.
o Make ASP.NET well factored, allowing customers to add in their own custom
functionality, and extend/ replace built-in ASP.NET functionality.
o Make ASP.NET a logical evolution of ASP, where existing ASP investment and
therefore code can be reused with little, if any, change.
o Realize that bugs are a fact of life, so ASP.NET should be as fault tolerant as
possible.

Benefits of ASP.NET

The .NET Framework includes a new data access technology named ADO.NET, an
evolutionary improvement to ADO. Though the new data access technology is evolutionary, the
classes that make up ADO.NET bear little resemblance to the ADO objects with which you
might be familiar. Some fairly significant changes must be made to existing ADO applications to
convert them to ADO.NET. The changes don't have to be made immediately to existing ADO
applications to run under ASP.NET, however.

ADO will function under ASP.NET. However, the work necessary to convert ADO
applications to ADO.NET is worthwhile. For disconnected applications, ADO.NET should offer
performance advantages over ADO disconnected record sets. ADO requires that transmitting and
receiving components be COM objects. ADO.NET transmits data in a standard XML-format file
so that COM marshaling or data type conversions are not required.

ASP.NET has several advantages over ASP.

The following are some of the benefits of ASP.NET:

o Make code cleaner.

o Improve deployment, scalability, and reliability.
o Provide better support for different browsers and devices.
o Enable a new breed of web applications.

VBScript

VBScript, sometimes known as Visual Basic Scripting Edition, is Microsoft’s answer to
JavaScript. Just as JavaScript’s syntax is loosely based on Java, VBScript’s syntax is loosely
based on Microsoft Visual Basic, a popular programming language for Windows machines.

Like JavaScript, VBScript is a simple scripting language, and we can include VBScript statements
within an HTML document. To begin a VBScript, we use the <script LANGUAGE="VBScript">
tag.

VBScript can do many of the same things as JavaScript, and it even looks similar in
some cases.

It has two main advantages:

For those who already know Visual Basic, it may be easier to learn than JavaScript.

It is closely integrated with ActiveX, Microsoft’s standard for Web-embedded
applications.

VBScript’s main disadvantage is that only Microsoft Internet Explorer supports it. Both
Netscape and Internet Explorer, on the other hand, support JavaScript. JavaScript is also a much
more popular language, and we can see it in use all over the Web.

ActiveX

ActiveX is a specification developed by Microsoft that allows ordinary Windows programs
to be run within a Web page. ActiveX programs can be written in languages such as Visual Basic
and they are compiled before being placed on the Web server.

ActiveX applications, called controls, are downloaded and executed by the Web browser,
like Java applets. Unlike Java applets, controls can be installed permanently when they are
downloaded, eliminating the need to download them again. ActiveX’s main advantage is that it
can do just about anything.

This can also be a disadvantage:

Several enterprising programmers have already used ActiveX to bring exciting new
capabilities to Web pages, such as “the Web page that turns off your computer” and “the Web
page that formats your disk drive”.

Fortunately, ActiveX includes a signature feature that identifies the source of the control
and prevents controls from being modified. While this won’t prevent a control from damaging
the system, we can specify which sources of controls we trust.

ActiveX has two main disadvantages:

It isn’t as easy to program as a scripting language or Java.

ActiveX is proprietary: it works only in Microsoft Internet Explorer and only on Windows
platforms.

ADO.NET

ADO.NET provides consistent access to data sources such as Microsoft SQL Server, as
well as data sources exposed via OLE DB and XML. Data-sharing consumer applications can
use ADO.NET to connect to these data sources and retrieve, manipulate, and update data.

ADO.NET cleanly factors data access from data manipulation into discrete components
that can be used separately or in tandem. ADO.NET includes .NET data providers for connecting
to a database, executing commands, and retrieving results. Those results are either processed
directly, or placed in an ADO.NET DataSet object in order to be exposed to the user in an ad-hoc
manner, combined with data from multiple sources, or remoted between tiers. The ADO.NET
DataSet object can also be used independently of a .NET data provider to manage data local to
the application or sourced from XML.

Why ADO.NET?

As application development has evolved, new applications have become loosely coupled
based on the Web application model. More and more of today's applications use XML to encode
data to be passed over network connections. Web applications use HTTP as the fabric for
communication between tiers, and therefore must explicitly handle maintaining state between
requests. This new model is very different from the connected, tightly coupled style of
programming that characterized the client/server era, where a connection was held open for the
duration of the program's lifetime and no special handling of state was required.

In designing tools and technologies to meet the needs of today's developer, Microsoft
recognized that an entirely new programming model for data access was needed, one that is built
upon the .NET Framework. Building on the .NET Framework ensured that the data access
technology would be uniform—components would share a common type system, design
patterns, and naming conventions.

ADO.NET was designed to meet the needs of this new programming model:
disconnected data architecture, tight integration with XML, common data representation with the
ability to combine data from multiple and varied data sources, and optimized facilities for
interacting with a database, all native to the .NET Framework.

Leverage Current ADO Knowledge

Microsoft's design for ADO.NET addresses many of the requirements of today's
application development model. At the same time, the programming model stays as similar as
possible to ADO, so current ADO developers do not have to start from scratch in learning a
brand new data access technology. ADO.NET is an intrinsic part of the .NET Framework
without seeming completely foreign to the ADO programmer.

ADO.NET coexists with ADO. While most new .NET applications will be written using
ADO.NET, ADO remains available to the .NET programmer through .NET COM
interoperability services.

ADO.NET provides first-class support for the disconnected, n-tier
programming environment for which many new applications are written. The concept of working
with a disconnected set of data has become a focal point in the programming model. The
ADO.NET solution for n-tier programming is the DataSet.

XML Support

XML and data access are intimately tied—XML is all about encoding data, and data
access is increasingly becoming all about XML. The .NET Framework does not just support
Web standards—it is built entirely on top of them.

Architecture:

[Figure: enhanced system architecture]

UML Diagrams:

Use Case Diagram:

[Figure: use case diagram. Labels recoverable from the figure: Administrator; Updating Website; User Privacy Preference; List of User Access; Search the Query.]

Class Diagram:

Sequence Diagram:

[Figure: sequence diagram. Participants: Adam (User), Eve (Privacy), Raj (Administrator), Server. Messages: Search; Registration; Registration Successfully; Update User; Successfully Updated; Add Query; Optimization Access; Query; Show the List; User Status; Query Level.]

Activity Diagram:

Collaboration:

ER Diagram:

[Figure: ER diagram. Labels recoverable from the figure: User (Username, Password, Email, Mobile, Gender, City); Server (websiteURL, Search, privacy); Admin (username, password, adminprofile, Add).]

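DFD:

[Figure: top-level data flow diagram. Process: Supporting Privacy Protection in Personalized Web Search; external entities: User, Admin.]

Level 1:

[Figure: Level 1 data flow diagram for the normal user. Processes: User Registration, Search the URL query, View Profile, Edit Profile, Display the URL results; data stores: UserTable, ServerTable.]

Level 2:

[Figure: Level 2 data flow diagram for the admin. Process: Manage Privacy Users; data store: Server Table. The figure also carries a "Level 3" label.]
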
MODULES

1. Profile-Based Personalization.

2. Privacy Protection in PWS System.

3. Generalizing User Profile.

4. Online Decision.

MODULES DESCRIPTION

PROFILE-BASED PERSONALIZATION

This paper introduces an approach to personalize digital multimedia content based on
user profile information. For this, two main mechanisms were developed: a profile generator that
automatically creates user profiles representing the user preferences, and a content-based
recommendation algorithm that estimates the user's interest in unknown content by matching her
profile to metadata descriptions of the content. Both features are integrated into a personalization
system.

PRIVACY PROTECTION IN PWS SYSTEM

We propose a PWS framework called UPS that can generalize profiles for each query
according to user-specified privacy requirements. Two predictive metrics are proposed to
evaluate the privacy breach risk and the query utility of hierarchical user profiles. We develop
two simple but effective generalization algorithms for user profiles allowing for query-level
customization using our proposed metrics. We also provide an online prediction mechanism
based on query utility for deciding whether to personalize a query in UPS. Extensive
experiments demonstrate the efficiency and effectiveness of our framework.

GENERALIZING USER PROFILE

The generalization process has to meet specific prerequisites to handle the user profile.
This is achieved by preprocessing the user profile. At first, the process initializes the user profile
by taking the indicated parent user profile into account. The process adds the inherited properties
to the properties of the local user profile. Thereafter the process loads the data for the foreground
and the background of the map according to the described selection in the user profile.

Additionally, using references enables caching and is helpful when considering an
implementation in a production environment. The reference to the user profile can be used as an
identifier for already processed user profiles. It allows the customization process to be performed
once and its result to be reused multiple times. However, it must be ensured that an update of the
user profile is also propagated to the generalization process. This requires specific update
strategies, which check after a specific timeout or a specific event whether the user profile has
changed. Additionally, as the generalization process involves remote data services, which might
be updated frequently, the cached generalization results might become outdated. Thus selecting
a specific caching strategy requires careful analysis.
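
The caching idea described above can be sketched as follows. This is a minimal illustration in Python, not the project's implementation: the class name GeneralizationCache, the generalize callback, the string profile reference and the time-to-live value are all assumptions made for the example. It combines a timeout-based refresh with an event-based invalidation.

```python
import time
from typing import Callable, Dict, Tuple


class GeneralizationCache:
    """Caches generalization results keyed by a user-profile reference (hypothetical)."""

    def __init__(self, generalize: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._generalize = generalize      # the expensive generalization step
        self._ttl = ttl_seconds            # timeout after which a result is stale
        self._clock = clock                # injectable clock, useful for testing
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, profile_ref: str) -> str:
        now = self._clock()
        entry = self._entries.get(profile_ref)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                # still fresh: reuse the earlier result
        result = self._generalize(profile_ref)
        self._entries[profile_ref] = (now, result)
        return result

    def invalidate(self, profile_ref: str) -> None:
        """Event-based update: drop the cached result when the profile changes."""
        self._entries.pop(profile_ref, None)
```

A caller would invoke invalidate whenever the user edits the profile, while the timeout covers changes in remote data that the application cannot observe directly.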

ONLINE DECISION

For some queries, profile-based personalization contributes little to, or even reduces, the
search quality, while exposing the profile to a server would certainly put the user’s privacy at
risk. To address this problem, we develop an online mechanism to decide whether to personalize
a query. The basic idea is straightforward: if a distinct query is identified during generalization,
the entire runtime profiling will be aborted and the query will be sent to the server without a
user profile.
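
The decision rule can be illustrated with a short Python sketch. The function name build_request and the two callbacks are hypothetical, and the test for a distinct query is modeled here as a separate predicate, whereas the text says it is detected during generalization; the sketch only shows the outcome of the decision.

```python
from typing import Callable, Dict, Optional


def build_request(query: str,
                  is_distinct: Callable[[str], bool],
                  generalize: Callable[[str], Dict[str, float]]
                  ) -> Dict[str, Optional[object]]:
    """Decide online whether to attach a generalized profile to the query."""
    if is_distinct(query):
        # Distinct query: abort profiling and send the query without a profile.
        return {"query": query, "profile": None}
    # Otherwise personalize: attach the profile generalized for this query.
    return {"query": query, "profile": generalize(query)}
```

Passing the predicate and the generalizer as parameters keeps the sketch independent of how either is actually computed.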

THE GREEDYIL ALGORITHM

The GreedyIL algorithm improves the efficiency of the generalization using heuristics
based on several findings. One important finding is that any prune-leaf operation reduces the
discriminating power of the profile. In other words, DP displays monotonicity under prune-leaf.
Formally, we have the following theorem:

Theorem 2. If $G'$ is a profile obtained by applying a prune-leaf operation on $G$, then
$DP(q, G) \geq DP(q, G')$.

Considering the operation $G_i \xrightarrow{-t} G_{i+1}$ in the $i$th iteration, maximizing
$DP(q, G_{i+1})$ is equivalent to minimizing the incurred information loss, which is defined as
$DP(q, G_i) - DP(q, G_{i+1})$. The above finding motivates us to maintain a priority queue of
candidate prune-leaf operators in descending order of the information loss caused by the
operator. Specifically, each candidate operator in the queue is a tuple
$op = \langle t, IL(t, G_i) \rangle$, where $t$ is the leaf to be pruned by $op$ and
$IL(t, G_i)$ indicates the IL incurred by pruning $t$ from $G_i$. This queue, denoted by $Q$,
enables fast retrieval of the best-so-far candidate operator. Theorem 2 also leads to the
following heuristic, which reduces the total computational cost significantly.

Heuristic 1. The iterative process can terminate whenever $\delta$-risk is satisfied.

The second finding is that the computation of IL can be simplified to the evaluation of
$\Delta PG(q, G) = PG(q, G_i) - PG(q, G_{i+1})$. The reason is that, referring to (12), the
second term ($TS(q, G)$) remains unchanged for any pruning operation until a single leaf is left
(in which case the only choice for pruning is the single leaf itself). Furthermore, consider the
two possible cases illustrated in Fig. 4: (C1) $t$ is a node with no siblings, and (C2) $t$ is a
node with siblings. The case C1 is easy to handle.

However, the evaluation of IL in case C2 requires introducing a shadow sibling of $t$.
Each time we attempt to prune $t$, we actually merge $t$ into $shadow$ to obtain a new shadow
leaf $shadow'$, together with the preference of $t$, i.e.,
$\Pr(shadow' \mid q, G) = \Pr(shadow \mid q, G) + \Pr(t \mid q, G)$. Finally, we have the
following heuristic, which significantly eases the computation of $IL(t)$. It can be seen that
all terms in (16) can be computed efficiently.

Heuristic 2.

$$IL(t) = \begin{cases} \Pr(t \mid q, G)\,\bigl(IC(t) - IC(par(t, G))\bigr), & \text{case C1} \\ dp(t) + dp(shadow) - dp(shadow'), & \text{case C2} \end{cases} \qquad (16)$$

where $dp(t) = \Pr(t \mid q, G) \log \frac{\Pr(t \mid q, G)}{\Pr(t)}$.

The third finding is that, in case C1 described above, prune-leaf only operates on a single
topic $t$. Thus, it does not impact the IL of other candidate operators in $Q$. In case C2,
however, pruning $t$ incurs recomputation of the preference values of its sibling nodes.
Therefore, we have:

Heuristic 3. Once a leaf topic $t$ is pruned, only the candidate operators pruning $t$’s sibling
topics need to be updated in $Q$.

In other words, we only need to recompute the IL values for operators attempting to
prune $t$’s sibling topics. Algorithm 1 shows the pseudocode of the GreedyIL algorithm. In
general, GreedyIL traces the information loss instead of the discriminating power. This saves a
lot of computational cost. In the above findings, Heuristic 1 (line 5) avoids unnecessary
iterations. Heuristic 2 (lines 4, 10, 14) further simplifies the computation of IL. Finally,
Heuristic 3 (line 16) reduces the need for IL recomputation significantly. In the worst case,
where all topics in the seed profile have sibling nodes, GreedyIL has a computational complexity
of $O(|G_0| \cdot |T_{G_0}(q)|)$. However, this is extremely rare in practice. Therefore,
GreedyIL is expected to significantly outperform GreedyDP.
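
The information-loss computation of Heuristic 2 can be sketched in Python as below. This is an illustration under stated assumptions, not the paper's Algorithm 1: the function names are hypothetical, the logarithm is taken as the natural logarithm, the prior of the merged shadow leaf is assumed to be the sum of the two priors, and the candidate with the smallest IL is selected because the text equates maximizing DP with minimizing the information loss.

```python
import heapq
import math
from typing import Dict, List, Tuple


def dp(pr_given_q: float, pr_prior: float) -> float:
    """dp(t) = Pr(t|q,G) * log(Pr(t|q,G) / Pr(t)); taken as 0 when Pr(t|q,G) is 0."""
    if pr_given_q == 0.0:
        return 0.0
    return pr_given_q * math.log(pr_given_q / pr_prior)


def il_case_c1(pr_t: float, ic_t: float, ic_parent: float) -> float:
    """Case C1 (t has no siblings): Pr(t|q,G) * (IC(t) - IC(par(t,G)))."""
    return pr_t * (ic_t - ic_parent)


def il_case_c2(pr_t: float, prior_t: float,
               pr_shadow: float, prior_shadow: float) -> float:
    """Case C2 (t has siblings): dp(t) + dp(shadow) - dp(shadow')."""
    pr_merged = pr_shadow + pr_t            # Pr(shadow'|q,G) as given in the text
    prior_merged = prior_shadow + prior_t   # assumption: priors merge the same way
    return (dp(pr_t, prior_t) + dp(pr_shadow, prior_shadow)
            - dp(pr_merged, prior_merged))


def cheapest_operator(candidates: Dict[str, float]) -> Tuple[str, float]:
    """Return the candidate prune-leaf operator (topic, IL) with the smallest IL."""
    heap: List[Tuple[float, str]] = [(il, t) for t, il in candidates.items()]
    heapq.heapify(heap)
    il, topic = heapq.heappop(heap)
    return topic, il
```

In a full implementation the heap would persist across iterations, and after pruning a topic only the entries of its sibling topics would be recomputed, as Heuristic 3 states.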

NORMALIZATION

The basic objective of normalization is to reduce redundancy, which means that
information is to be stored only once. Storing information several times leads to wastage of
storage space and an increase in the total size of the data stored.

If a database is not properly designed, it can give rise to modification anomalies. Modification
anomalies arise when data is added to, changed in or deleted from a database table. Similarly, in
traditional databases as well as improperly designed relational databases, data redundancy can be
a problem. These problems can be eliminated by normalizing a database.

Normalization is the process of breaking down a table into smaller tables so that each
table deals with a single theme. There are three different kinds of modification anomalies, and
the first, second and third normal forms were formulated to address them. Third normal form
(3NF) is considered sufficient for most practical purposes. Normalization should be undertaken
only after a thorough analysis and a complete understanding of its implications.

FIRST NORMAL FORM (1NF):

This form is also called a “flat file”. Each column should contain data in respect of a
single attribute, and no two rows may be identical. To bring a table to First Normal Form,
repeating groups of fields should be identified and moved to another table.
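
Moving a repeating group to another table can be illustrated with a small Python sketch. The rows, the column names (user_id, name, phone1, phone2) and the function name are hypothetical and are not taken from the project's schema.

```python
from typing import Dict, List, Tuple

# Hypothetical un-normalized rows: phone1/phone2 form a repeating group.
users_flat: List[Dict[str, str]] = [
    {"user_id": "1", "name": "Asha", "phone1": "111", "phone2": "222"},
    {"user_id": "2", "name": "Ravi", "phone1": "333", "phone2": ""},
]


def to_first_normal_form(
    rows: List[Dict[str, str]]
) -> Tuple[List[Dict[str, str]], List[Tuple[str, str]]]:
    """Split the repeating phone fields into a separate (user_id, phone) table."""
    users = [{"user_id": r["user_id"], "name": r["name"]} for r in rows]
    phones = [
        (r["user_id"], r[key])
        for r in rows
        for key in ("phone1", "phone2")
        if r[key]                      # skip empty slots of the repeating group
    ]
    return users, phones


users, phones = to_first_normal_form(users_flat)
```

After the split, each user row holds single-valued attributes only, and a user may have any number of phone rows without changing the table layout.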

SECOND NORMAL FORM (2NF):

A relation is said to be in 2NF if it is in 1NF and its non-key attributes are fully functionally
dependent on the key attributes, that is, on the whole key rather than on part of it. A
‘Functional Dependency’ is a relationship among attributes.
One attribute is said to be functionally dependent on another if the value of the first attribute
depends on the value of the second attribute. In the given description, flight number and halt code
form the composite key.
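
A functional dependency can be checked mechanically on sample data. The Python sketch below is illustrative only: the function name holds, the rows and the extra columns (arrival, airline) are invented for the example; only the composite key of flight number and halt code comes from the text.

```python
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical sample rows keyed by (flight_no, halt_code).
rows = [
    {"flight_no": "AI101", "halt_code": "DEL", "arrival": "09:00", "airline": "Air India"},
    {"flight_no": "AI101", "halt_code": "BOM", "arrival": "11:30", "airline": "Air India"},
    {"flight_no": "6E202", "halt_code": "DEL", "arrival": "10:15", "airline": "IndiGo"},
]


def holds(data: Iterable[Dict[str, str]], lhs: Sequence[str], rhs: str) -> bool:
    """Return True if the attributes in lhs functionally determine rhs in data."""
    seen: Dict[Tuple[str, ...], str] = {}
    for row in data:
        key = tuple(row[a] for a in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False               # same determinant, different dependent value
        seen[key] = row[rhs]
    return True


full_key_determines_arrival = holds(rows, ["flight_no", "halt_code"], "arrival")
part_key_determines_airline = holds(rows, ["flight_no"], "airline")
part_key_determines_arrival = holds(rows, ["flight_no"], "arrival")
```

In this sample, airline depends on flight_no alone, which is a partial dependency on the composite key; moving it to a separate table keyed by flight_no would bring the design to 2NF. A check on sample rows can only refute a dependency, it cannot prove that one holds for all future data.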

THIRD NORMAL FORM (3NF) :

Third Normal Form normalization is needed where some attributes in a relation tuple are not
functionally dependent only on the key attribute. A transitive dependency is one in which one
attribute depends on a second, which in turn depends on a third, and so on.

SYSTEM TESTING

The goal of testing is to improve the program’s quality. Quality is assured primarily
through some form of software testing. The history of testing goes back to the beginning of the
computing field. Testing is done at two levels: testing of individual modules and testing of the
entire system. During system testing, the system is used experimentally to ensure that the
software will run according to the specifications and in the way the user expects. Testing is very
tedious and time consuming. Each test case is designed with the intent of finding errors in the
way the system will process it.

The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub assemblies, assemblies and/or a finished product. It is the
process of exercising software with the intent of ensuring that the Software system meets its
requirements and user expectations and does not fail in an unacceptable manner. There are
various types of test. Each test type addresses a specific testing requirement.

LEVELS OF TESTING

The software underwent the following tests by the system analyst.

WHITE BOX TESTING

By using this technique it was tested that all the individual logical paths were executed at
least once. All the logical decisions were tested on both their true and false sides. All the loops
were tested with data in between the ranges and especially at the boundary values.
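
Choosing test data at and around the boundaries can be sketched as follows. The helper boundary_values is a hypothetical name, and the particular selection (one step outside, on, and one step inside each bound, plus a mid-range value) is one common convention rather than the procedure actually used for this project.

```python
from typing import List


def boundary_values(low: int, high: int) -> List[int]:
    """Test inputs just outside, on, and just inside the bounds, plus a mid-range value."""
    if low > high:
        raise ValueError("low must not exceed high")
    candidates = [low - 1, low, low + 1, (low + high) // 2,
                  high - 1, high, high + 1]
    # Drop duplicates (which occur for narrow ranges) and return in ascending order.
    return sorted(set(candidates))
```

The values outside the range exercise the false side of the range check, while the values on and inside the bounds exercise the true side.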

White Box Testing is testing in which the software tester has knowledge of the
inner workings, structure and language of the software, or at least its purpose. It is
used to test areas that cannot be reached from a black box level.

BLACK BOX TESTING

By the use of this technique, the missing functions were identified and placed in their
positions. The errors in the interfaces were identified and corrected. This technique was also used
to identify the initialization and termination errors and correct them.

Black Box Testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, like most other kinds of tests,
must be written from a definitive source document, such as a specification or requirements
document. It is testing in which the software under test is treated as a black box: you cannot
“see” into it. The test provides inputs and responds to outputs without considering how the
software works.

UNIT TESTING

It is the verification of a single module, usually in an isolated environment. The System
Analyst tests each and every module individually by giving a set of known input data and
verifying the required output data. The System Analyst tested the software using the top-down
model, starting from the top of the model. The units in a system are the modules and routines
that are assembled to perform a specific function. The modules should be tested for correctness
of the logic applied, and the tests should detect errors in coding. This is the verification of the
system against its initial objective. It is a verification process when it is done in a simulated
environment, and it is a validation process when it is done in a live environment.

Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly, and that program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of individual software units of the
application. It is done after the completion of an individual unit, before integration. This is
structural testing that relies on knowledge of the unit’s construction and is invasive. Unit tests
perform basic tests at component level and test a specific business process, application, and/or
system configuration. Unit tests ensure that each unique path of a business process performs
accurately to the documented specifications and contains clearly defined inputs and expected
results.

Unit testing is usually conducted as part of a combined code and unit test phase of the
software lifecycle, although it is not uncommon for coding and unit testing to be conducted as
two distinct phases.

Test strategy and approach

Field testing will be performed manually and functional tests will be written in detail.

Test objectives

● All field entries must work properly.

● Pages must be activated from the identified link.
● The entry screen, messages and responses must not be delayed.

Features to be tested

● Verify that the entries are of the correct format

● No duplicate entries should be allowed
● All links should take the user to the correct page.
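
The first two features listed above (correct entry format and no duplicate entries) can be illustrated with a short Python sketch. The format rule, the function `add_entry` and the messages are assumptions made for this example; the report does not state the actual rules used by the project.

```python
import re

# Hypothetical format rule: 3 to 20 letters, digits or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def add_entry(entries, username):
    """Try to add username to the set of entries; return (ok, message)."""
    if not USERNAME_PATTERN.fullmatch(username):
        return False, "invalid format"
    if username in entries:
        return False, "duplicate entry"
    entries.add(username)
    return True, "accepted"


entries = set()
print(add_entry(entries, "alice_01"))   # first entry with a valid format
print(add_entry(entries, "alice_01"))   # same entry again is refused
print(add_entry(entries, "a!"))         # wrong format is refused
```

Returning a message together with the flag lets a manual tester see why a field entry was refused.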

INTEGRATION TESTING

The purpose of unit testing is to determine that each independent module is correctly
implemented. This gives little chance to determine that the interface between modules is also
correct, and for this reason integration testing must be performed. One specific target of
integration testing is the interface: whether parameters match on both sides as to type,
permissible ranges, meaning and utilization. Module testing assures us that the detailed design
was correctly implemented; now it is necessary to verify that the architectural design
specifications were met. Chosen portions of the structure tree of the software are put together.
Each subtree should have some logical reason for being tested. It may be a particularly difficult
or tricky part of the code, or it may be essential to the function of the rest of the product. As the
testing progresses, we find ourselves putting together larger and larger parts of the tree, until the
entire product has been integrated.

Integration tests are designed to test integrated software components to determine if they
actually run as one program. Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that although the components were
individually satisfactory, as shown by successful unit testing, the combination of components is
correct and consistent. Integration testing is specifically aimed at exposing the problems that
arise from the combination of components.

Software integration testing is the incremental integration testing of two or more
integrated software components on a single platform to produce failures caused by interface
defects.

The task of the integration test is to check that components or software applications, e.g.
components in a software system or – one step up – software applications at the company level –
interact without error.
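
A minimal Python sketch of an interface-focused integration check is given below. The two modules (`tokenize` and `build_index`) and the combined path `index_text` are hypothetical stand-ins, not the project's actual modules. The point is only that the output of one unit must match the input expected by the next in type and meaning.

```python
def tokenize(text):
    """Module A: split text into lowercase word tokens."""
    return text.lower().split()


def build_index(tokens):
    """Module B: map each token to the list of positions where it occurs."""
    if not all(isinstance(t, str) for t in tokens):
        raise TypeError("build_index expects a sequence of strings")
    index = {}
    for pos, tok in enumerate(tokens):
        index.setdefault(tok, []).append(pos)
    return index


def index_text(text):
    """Integrated path: the output of module A is the input of module B."""
    return build_index(tokenize(text))


# The combination is checked, not the units on their own.
assert index_text("web search Web") == {"web": [0, 2], "search": [1]}

# A type mismatch at the interface is reported instead of passing silently.
try:
    build_index([1, 2])
except TypeError as exc:
    print("interface defect detected:", exc)
```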

Functional test
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.

Functional testing is centered on the following items:

Valid Input : identified classes of valid input must be accepted.

Invalid Input : identified classes of invalid input must be rejected.

Functions : identified functions must be exercised.

Output : identified classes of application outputs must be exercised.

Systems/Procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key functions, or
special test cases. In addition, systematic coverage pertaining to identified business process
flows, data fields, predefined processes, and successive processes must be considered for testing.
Before functional testing is complete, additional tests are identified and the effective value of
current tests is determined.

VALIDATION TESTING

The main aim of this testing is to verify that the software system does what it was
designed for. The system was tested to ensure that the purpose of automating the system “Machine
Order” was met. Alpha testing was carried out to ensure the validity of the system.

OUTPUT TESTING

The outputs generated by the system under consideration were tested by asking the users
about the format they required. The output format on the screen was found to be correct, as the
format was designed in the system design phase. Output testing was done and it did not result in
any change or correction in the system.

USER ACCEPTANCE TESTING

User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional requirements.

The system under consideration is tested for user acceptance by constantly keeping in
touch with prospective system users at the time of development and making changes whenever
required. The following points are considered:

⮚ Input screen design

⮚ Output screen design

⮚ Online message to guide the user

⮚ Menu driven system

⮚ Format of ad hoc queries and reports

PERFORMANCE TESTING
Performance is taken as the last part of implementation. Performance is perceived as
response time for user queries, report generation and process related activities.
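
Response time as described above can be measured with a small timing helper. The sketch below is illustrative only: `timed` and `run_query` are hypothetical names, and `run_query` is a trivial stand-in for a real user query.

```python
import time


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def run_query(query, documents):
    """Hypothetical stand-in for a user query: documents containing the query."""
    return [d for d in documents if query in d]


docs = ["privacy in web search", "user profile", "search quality"]
hits, seconds = timed(run_query, "search", docs)
print(f"{len(hits)} hits in {seconds:.6f} s")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.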

Test Cases:

Name of Module: User Login

| Test Case ID | Test Scenario | Type of Test Case | Prerequisites, if any | Test steps | Result | Pass/Fail |
|---|---|---|---|---|---|---|
| QSW->CP0001 | To login to the Account the user will type his username and Password. | Functional | Login as authorized person | The System Accept the username and password and check it with the database. | Welcome to the account | Pass |
| QSW->CP0002 | To login to the Account the user will type his username and Password | Functional | Login as authorized person | The System Accept the username and password and check it with the database. | Invalid Login | Pass |
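
The two login test cases above (QSW->CP0001 and QSW->CP0002) could be automated roughly as in the following Python sketch. The function `login`, the in-memory `USERS` table and the sample credentials are assumptions made for illustration. Only the two result messages are taken from the test cases, and a real system would store password hashes in a database rather than plain text in a dictionary.

```python
# Hypothetical in-memory stand-in for the user database.
USERS = {"alice": "s3cret"}


def login(username, password, users=USERS):
    """Check the credentials against the stored users and return the screen message."""
    if username in users and users[username] == password:
        return "Welcome to the account"
    return "Invalid Login"


# QSW->CP0001: valid credentials lead to the welcome message.
assert login("alice", "s3cret") == "Welcome to the account"
# QSW->CP0002: wrong password or unknown user is rejected.
assert login("alice", "wrong") == "Invalid Login"
assert login("bob", "s3cret") == "Invalid Login"
```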

Name of Module: Parsing

| Test Case ID | Test Scenario | Type of Test Case | Prerequisites, if any | Test steps | Result | Pass/Fail |
|---|---|---|---|---|---|---|
| QSW->CP0003 | To download the Html page and extract information | Functional | Stop word removal, stemming and indexing | Checks if the input and output flows correctly | Pre-processed output | Pass |
| QSW->CP0004 | To download the Html page and extract information | Functional | Stop word removal, stemming and indexing | If any other format like xml, doc, etc. | Please enter your correct Input | Pass |
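
The parsing test cases above (QSW->CP0003 and QSW->CP0004) can be sketched in Python as follows. Everything here is an assumption made for illustration: the function `preprocess`, the stop word list and the naive tag stripping are not taken from the project, and stemming and indexing are left out to keep the example short. Only the error message comes from the test case.

```python
import re

# Hypothetical stop word list; the project's actual list is not given.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def preprocess(filename, html):
    """Reject non-HTML input; otherwise strip tags and remove stop words."""
    if not filename.lower().endswith((".html", ".htm")):
        raise ValueError("Please enter your correct Input")
    text = re.sub(r"<[^>]+>", " ", html)             # naive tag removal
    words = re.findall(r"[a-z0-9]+", text.lower())   # lowercase word tokens
    return [w for w in words if w not in STOP_WORDS]


# QSW->CP0003: an HTML page yields pre-processed output.
assert preprocess("page.html", "<p>The privacy of the user</p>") == ["privacy", "user"]

# QSW->CP0004: any other format is refused with the error message.
try:
    preprocess("page.xml", "<a/>")
except ValueError as exc:
    print(exc)
```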

Name of Module: Searching

| Test Case ID | Test Scenario | Type of Test Case | Prerequisites, if any | Test steps | Result | Pass/Fail |
|---|---|---|---|---|---|---|
| QSW->CP0005 | Extract the accurate information | Functional | To get search result | N-gram searching, gram extraction. | Accurate search result | Pass |
| QSW->CP0006 | Extract the accurate information | Functional | To get search result | N-gram searching, gram extraction. | Obsolete search result | Fail |
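
The report does not describe how gram extraction and N-gram searching are implemented. As a minimal sketch under that caveat, the following Python code extracts word bigrams and ranks documents by the number of bigrams they share with the query. The functions `word_ngrams` and `ngram_search` and the scoring rule are assumptions made for illustration.

```python
def word_ngrams(text, n=2):
    """Return the set of word n-grams (as tuples) contained in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_search(query, documents, n=2):
    """Return documents sharing at least one n-gram with the query, best match first."""
    q = word_ngrams(query, n)
    scored = [(len(q & word_ngrams(doc, n)), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: -pair[0])   # stable: ties keep input order
    return [doc for score, doc in ranked if score > 0]


docs = [
    "personalized web search engine",
    "web page ranking",
    "privacy in personalized web search",
]
print(ngram_search("personalized web search", docs))
```

Documents that share no n-gram with the query are dropped, so a result list with unrelated pages would indicate a failure of the kind recorded in QSW->CP0006.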

CONCLUSION AND FUTURE WORK

This paper presented a client-side privacy protection framework called UPS for
personalized web search. UPS could potentially be adopted by any PWS that captures user
profiles in a hierarchical taxonomy. The framework allowed users to specify customized privacy
requirements via the hierarchical profiles. In addition, UPS also performed online generalization
on user profiles to protect the personal privacy without compromising the search quality. We
proposed two greedy algorithms, namely GreedyDP and GreedyIL, for the online generalization.
Our experimental results revealed that UPS could achieve quality search results while preserving
users’ customized privacy requirements. The results also confirmed the effectiveness and
efficiency of our solution.

For future work, we will try to resist adversaries with broader background knowledge,
such as richer relationships among topics (e.g., exclusiveness, sequentiality, and so on), or the
capability to capture a series of queries (relaxing the second constraint of the adversary) from the
victim. We will also seek more sophisticated methods to build the user profile, and better metrics
to predict the performance (especially the utility) of UPS.

REFERENCES

[1] Z. Dou, R. Song, and J.-R. Wen, “A Large-Scale Evaluation and Analysis of Personalized
Search Strategies,” Proc. Int’l Conf. World Wide Web (WWW), pp. 581-590, 2007.
[2] J. Teevan, S.T. Dumais, and E. Horvitz, “Personalizing Search via Automated Analysis of
Interests and Activities,” Proc. 28th Ann. Int’l ACM SIGIR Conf. Research and Development in
Information Retrieval (SIGIR), pp. 449-456, 2005.

[3] M. Speretta and S. Gauch, “Personalizing Search Based on User Search Histories,” Proc.
IEEE/WIC/ACM Int’l Conf. Web Intelligence (WI), 2005.

[4] B. Tan, X. Shen, and C. Zhai, “Mining Long-Term Search History to Improve Search
Accuracy,” Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD),
2006.

[5] K. Sugiyama, K. Hatano, and M. Yoshikawa, “Adaptive Web Search Based on User Profile
Constructed without any Effort from Users,” Proc. 13th Int’l Conf. World Wide Web (WWW),
2004.

[6] X. Shen, B. Tan, and C. Zhai, “Implicit User Modeling for Personalized Search,” Proc. 14th
ACM Int’l Conf. Information and Knowledge Management (CIKM), 2005.

[7] X. Shen, B. Tan, and C. Zhai, “Context-Sensitive Information Retrieval Using Implicit
Feedback,” Proc. 28th Ann. Int’l ACM SIGIR Conf. Research and Development Information
Retrieval (SIGIR), 2005.

[8] F. Qiu and J. Cho, “Automatic Identification of User Interest for Personalized Search,” Proc.
15th Int’l Conf. World Wide Web (WWW), pp. 727-736, 2006.

[9] J. Pitkow, H. Schütze, T. Cass, R. Cooley, D. Turnbull, A. Edmonds, E. Adar, and T.
Breuel, “Personalized Search,” Comm. ACM, vol. 45, no. 9, pp. 50-55, 2002.

[10] Y. Xu, K. Wang, B. Zhang, and Z. Chen, “Privacy-Enhancing Personalized Web Search,”
Proc. 16th Int’l Conf. World Wide Web (WWW), pp. 591-600, 2007.

[11] K. Hafner, “Researchers Yearn to Use AOL Logs, but They Hesitate,” New York Times,
Aug. 2006.

[12] A. Krause and E. Horvitz, “A Utility-Theoretic Approach to Privacy in Online Services,” J.
Artificial Intelligence Research, vol. 39, pp. 633-662, 2010.

[13] J.S. Breese, D. Heckerman, and C.M. Kadie, “Empirical Analysis of Predictive Algorithms
for Collaborative Filtering,” Proc. 14th Conf. Uncertainty in Artificial Intelligence (UAI), pp.
43-52, 1998.

[14] P.A. Chirita, W. Nejdl, R. Paiu, and C. Kohlschütter, “Using ODP Metadata to Personalize
Search,” Proc. 28th Ann. Int’l ACM SIGIR Conf. Research and Development Information
Retrieval (SIGIR), 2005.

[15] A. Pretschner and S. Gauch, “Ontology-Based Personalized Search and Browsing,” Proc.
IEEE 11th Int’l Conf. Tools with Artificial Intelligence (ICTAI ’99), 1999.