Web Mining Unit-1
UNIT-I
World Wide Web
The World Wide Web (WWW), often called the Web, is a system of interconnected web pages and
information that can be accessed using the Internet. It was created to help people share and find
information easily, using links that connect different pages together. The Web allows us to browse
websites, watch videos, shop online, and connect with others around the world through our computers
and phones. All public websites and web pages that people may access on their local computers and
other devices through the Internet are collectively known as the World Wide Web, or W3. Users can
reach further information by navigating the links that interconnect these pages and documents. This data
may be presented in text, image, audio, or video formats on the Internet.
What is WWW?
WWW stands for World Wide Web and is commonly known as the Web. The WWW was proposed by
Tim Berners-Lee at CERN in 1989. The WWW is defined as the collection of different websites around
the world, containing different information shared via local servers (or computers). Web pages are linked
together using hyperlinks, which are HTML-formatted and also referred to as hypertext; these pages are
the fundamental units of the Web and are accessed through the Hypertext Transfer Protocol (HTTP).
System Architecture
From the user’s point of view, the Web consists of a vast, worldwide collection of documents or web
pages. Each page may contain links to other pages anywhere in the world. The pages can be retrieved
and viewed using browsers, of which Internet Explorer, Netscape Navigator, and Google Chrome are
among the most popular. The browser fetches the requested page, interprets the text and formatting
commands on it, and displays the page, properly formatted, on the screen.
The basic model of how the Web works is shown in the figure below. Here the browser is displaying
a web page on the client machine. When the user clicks on a line of text that is linked to a page on the
abd.com server, the browser follows the hyperlink by sending a message to the abd.com server asking
it for the page.
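At the protocol level, following a hyperlink amounts to sending an HTTP GET request for the target page and reading back the HTML. Below is a minimal Python sketch of that exchange; example.com is just a placeholder standing in for a server such as abd.com:

```python
# Minimal sketch of what a browser does when it follows a hyperlink:
# send an HTTP GET request for the page and read the HTML response.
from urllib.request import urlopen

# example.com stands in for a server such as abd.com in the text above
with urlopen("http://example.com/") as response:
    html = response.read().decode("utf-8")

print(html[:100])  # the browser would parse and render this HTML
```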
Working of WWW
A web browser is used to access web pages. Web browsers can be defined as programs that display
text, data, pictures, animation, and video from the Internet. Hyperlinked resources on the World Wide
Web can be accessed using the software interfaces provided by web browsers. Initially, web browsers
were used only for surfing the Web, but they have since become more universal.
The diagram below indicates how the Web operates on the client-server architecture of the Internet.
When a user requests a web page or other information, the browser on the user's system sends a
request to the web server; the web server returns the requested resource to the browser, and the
browser finally presents it to the user who made the request.
Web browsers can be used for several tasks including conducting searches, mailing, transferring files,
and much more. Some of the commonly used browsers are Internet Explorer, Opera Mini, and Google
Chrome.
WWW vs Internet
• Origin: The WWW originated in 1989; the Internet originated in the 1960s.
• Definition: The WWW is an interconnected network of websites and documents that can be
accessed via the Internet; the Internet is used to connect a computer with other computers.
• Protocols: The WWW uses protocols such as HTTP; the Internet uses protocols such as TCP/IP.
• Basis: The WWW is based on software; the Internet is based on hardware.
• Relationship: The WWW is a service contained inside an infrastructure; the Internet is that
entire underlying infrastructure.
Data Mining Vs Web Mining
Data mining is the process of extracting knowledge or insights from large amounts of data
using various statistical and computational techniques. The data can be structured, semi-structured,
or unstructured, and can be stored in various forms such as databases, data warehouses, and data
lakes.
The primary goal of data mining is to discover hidden patterns and relationships in the data
that can be used to make informed decisions or predictions. This involves exploring the data
using various techniques such as clustering, classification, regression analysis, association
rule mining, and anomaly detection.
Data mining has a wide range of applications across various industries, including marketing,
finance, healthcare, and telecommunications. For example, in marketing, data mining can be
used to identify customer segments and target marketing campaigns, while in healthcare, it
can be used to identify risk factors for diseases and develop personalized treatment plans.
Data mining architecture refers to the overall design and structure of a data mining system. A
data mining architecture typically includes several key components, which work together to
perform data mining tasks and extract useful insights and information from data. Some of the
key components of a typical data mining architecture include:
• Data Sources: Data sources are the sources of data that are used in data mining. These
can include structured and unstructured data from databases, files, sensors, and other
sources. Data sources provide the raw data that is used in data mining and can be
processed, cleaned, and transformed to create a usable data set for analysis.
• Data Preprocessing: Data pre-processing is the process of preparing data for analysis.
This typically involves cleaning and transforming the data to remove errors,
inconsistencies, and irrelevant information, and to make it suitable for analysis. Data
preprocessing is an important step in data mining, as it ensures that the data is of high
quality and is ready for analysis.
• Data Mining Algorithms: Data mining algorithms are the algorithms and models that
are used to perform data mining. These algorithms can include supervised and
unsupervised learning algorithms, such as regression, classification, and clustering, as
well as more specialized algorithms for specific tasks, such as association rule mining
and anomaly detection. Data mining algorithms are applied to the data to extract useful
insights and information from it.
• Data Visualization: Data visualization is the process of presenting data and insights in a
clear and effective manner, typically using charts, graphs, and other visualizations. Data
visualization is an important part of data mining, as it allows data miners to
communicate their findings and insights to others in a way that is easy to understand
and interpret.
There are many different types of data mining, but they can generally be grouped into three
broad categories: descriptive, predictive, and prescriptive.
• Descriptive data mining involves summarizing and characterizing the data that is already
present, for example through clustering, association analysis, and summary statistics, in
order to describe what has happened.
• Predictive data mining involves using data to build models that can make predictions
or forecasts about future events or outcomes. This type of data mining is often used to
identify and model relationships between different variables, and to make predictions
about future events or outcomes based on those relationships (a minimal sketch follows
this list).
• Prescriptive data mining involves using data and models to make recommendations or
suggestions about actions or decisions. This type of data mining is often used to
optimize processes, allocate resources, or make other decisions that can help
organizations achieve their goals.
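As an illustration of predictive data mining, here is a minimal sketch, assuming scikit-learn is installed; the customer features, labels, and query values are all hypothetical:

```python
# Predictive mining sketch: train a decision tree on toy customer data
# and predict whether a new customer will respond to a campaign.
from sklearn.tree import DecisionTreeClassifier

# Features: [age, monthly_spend]; label: 1 = responded to a past campaign
X = [[25, 120], [40, 300], [35, 80], [50, 450], [23, 60], [45, 380]]
y = [0, 1, 0, 1, 0, 1]

model = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(model.predict([[30, 350]]))  # predicted response for a new customer
```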
Data warehousing and mining software is a type of software that is used to store, manage, and
analyze large data sets. This software is commonly used in the field of data warehousing and
data mining, and it typically includes tools and features for pre-processing, storing, querying,
and analyzing data.
Some of the most common types of data warehousing and mining software include:
• Data mining tools – Data mining tools are software tools that are used to extract
information and insights from large data sets. These tools typically include algorithms
and methods for exploring, modeling, and analyzing data, and they are commonly used
in the field of data mining.
• Data visualization tools – Data visualization tools are software tools that are used to
visualize and display data in graphical or pictorial form. These tools are commonly
used in data mining to explore and understand the data, and to communicate the results
of the analysis.
• Data warehousing platforms – Data warehousing platforms are software systems that
are designed to support the creation and management of data warehouses. These
platforms typically include tools and features for loading, transforming, and managing
data, as well as tools for querying and analyzing the data.
The working of a data mining system can be summarized as follows:
1. It all starts when the user puts up certain data mining requests; these requests are then
sent to the data mining engine for pattern evaluation.
2. These applications try to find the solution to the query using the already present
database.
3. The metadata then extracted is sent for proper analysis to the data mining engine which
sometimes interacts with pattern evaluation modules to determine the result.
4. This result is then sent to the front end in an easily understandable manner using a
suitable interface.
The key components of this architecture are:
1. Data Sources: Databases, the World Wide Web (WWW), and data warehouses are parts of
data sources. The data in these sources may be in the form of plain text, spreadsheets, or
other forms of media like photos or videos. The WWW is one of the biggest sources of data.
2. Database Server: The database server contains the actual data ready to be processed. It
performs the task of handling data retrieval as per the request of the user.
3. Data Mining Engine: It is one of the core components of the data mining architecture
that performs all kinds of data mining techniques like association, classification,
characterization, clustering, prediction, etc.
4. Pattern Evaluation Modules: They are responsible for finding interesting patterns in
the data and sometimes they also interact with the database servers for producing the
result of the user requests.
5. Graphic User Interface: Since the user cannot fully understand the complexity of the
data mining process so graphical user interface helps the user to communicate
effectively with the data mining system.
6. Knowledge Base: Knowledge Base is an important part of the data mining engine that is
quite beneficial in guiding the search for the result patterns. Data mining engines may
also sometimes get inputs from the knowledge base. This knowledge base may contain
data from user experiences. The objective of the knowledge base is to make the result
more accurate and reliable.
Data mining architectures can be classified by how tightly the mining system is coupled
with a database or data warehouse system:
1. No Coupling: The no-coupling architecture retrieves data directly from particular data
sources rather than from a database, even though using the database would be a more
efficient and accurate way to do the same. The no-coupling architecture is poor and is
only used for performing very simple data mining processes.
2. Loose Coupling: In the loose-coupling architecture, the data mining system retrieves data
from the database and stores the results back in those systems. This architecture is suited
to memory-based data mining.
3. Semi-Tight Coupling: It tends to use various advantageous features of the data
warehouse systems. It includes sorting, indexing, and aggregation. In this architecture,
an intermediate result can be stored in the database for better performance.
4. Tight Coupling: In this architecture, a data warehouse is considered one of the most
important components, and its features are employed for performing data mining tasks.
This architecture provides scalability, performance, and integrated information.
Association rules
Association rule mining finds interesting associations and relationships among large sets of data
items. An association rule shows how frequently an itemset occurs in a transaction. Association rule
learning is a type of unsupervised learning technique that checks for the dependency of one
data item on another data item and maps them accordingly so that the result can be more profitable.
It tries to find interesting relations or associations among the variables of a dataset, based on
rules that discover the interesting relations between variables in the database.
Market basket analysis is a technique used by various big retailers to discover the
associations between items. We can understand it by taking the example of a supermarket, where
all products that are frequently purchased together are placed together.
For example, if a customer buys bread, he will most likely also buy butter, eggs, or milk, so these
products are stored on the same shelf or nearby. Consider the diagram below.
Here, the "if" element is called the antecedent, and the "then" statement is called the consequent.
Relationships in which we can find an association between exactly two items are known as single
cardinality. Association rule mining is all about creating rules, and as the number of items increases,
the cardinality increases accordingly. So, to measure the associations between thousands of data
items, several metrics are used. These metrics are given below:
• Support
• Confidence
• Lift
Support
Support is the frequency of A, or how frequently an itemset appears in the dataset. It is defined as
the fraction of the transactions T that contain the itemset X:
Support(X) = (Number of transactions containing X) / (Total number of transactions T)
Confidence
Confidence indicates how often the rule has been found to be true, i.e., how often items X and
Y occur together in the dataset given that X has already occurred. It is the ratio of the
transactions that contain both X and Y to the number of transactions that contain X:
Confidence(X → Y) = Support(X ∪ Y) / Support(X)
Lift
Lift is the ratio of the observed support to the expected support if X and Y were independent
of each other:
Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
It has three possible ranges of values:
• Lift = 1: The occurrences of the antecedent and the consequent are independent of
each other.
• Lift > 1: The two itemsets are positively dependent on each other; the higher the lift,
the stronger the dependence.
• Lift < 1: One item is a substitute for the other, which means one item has
a negative effect on the occurrence of the other.
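These three metrics can be computed directly from a transaction list. Below is a minimal Python sketch over a toy, hypothetical set of transactions, for the rule {bread} → {butter}:

```python
# Compute Support, Confidence and Lift for the rule {bread} -> {butter}.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"bread"}, {"butter"}
support_xy = support(X | Y)                    # Support(X U Y)
confidence = support_xy / support(X)           # Support(X U Y) / Support(X)
lift = support_xy / (support(X) * support(Y))  # observed / expected support

print(f"Support={support_xy:.2f}, Confidence={confidence:.2f}, Lift={lift:.2f}")
```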
Association rule learning can be implemented with the following well-known algorithms:
1. Apriori
2. Eclat
3. F-P Growth Algorithm
Apriori Algorithm
This algorithm uses frequent itemsets to generate association rules. It is designed to work on
databases that contain transactions. The algorithm uses a breadth-first search and a hash
tree to count itemsets efficiently.
It is mainly used for market basket analysis and helps to understand the products that can be
bought together. It can also be used in the healthcare field to find drug reactions for patients.
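A minimal sketch of Apriori's level-wise (breadth-first) search in Python, using hypothetical transactions and a made-up support threshold; real implementations add the hash-tree optimization mentioned above:

```python
# Level-wise frequent-itemset generation in the style of Apriori.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
    {"bread", "butter", "eggs"},
]
MIN_SUPPORT = 0.4  # itemset must appear in at least 40% of transactions

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

frequent = {}
k = 1
candidates = sorted({frozenset([i]) for t in transactions for i in t},
                    key=sorted)
while candidates:
    # Prune: keep only candidates that meet the minimum support
    survivors = [c for c in candidates if support(c) >= MIN_SUPPORT]
    frequent.update({c: support(c) for c in survivors})
    # Join: build (k+1)-itemsets from pairs of surviving k-itemsets
    k += 1
    candidates = sorted({a | b for a in survivors for b in survivors
                         if len(a | b) == k}, key=sorted)

for itemset, sup in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(sup, 2))
```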
Eclat Algorithm
The Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a depth-first
search technique to find frequent itemsets in a transaction database. It executes faster than the
Apriori algorithm.
F-P Growth Algorithm
The F-P Growth algorithm stands for Frequent Pattern growth, and it is an improved version of the
Apriori algorithm. It represents the database in the form of a tree structure known as a
frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent
patterns.
Association rule learning has various applications in machine learning and data mining, including
the market basket analysis and healthcare uses described above.
Sequential Pattern Mining
For retail data, sequential patterns are useful for shelf placement and promotions. This industry,
along with telecommunications and other businesses, can also use sequential patterns for targeted
marketing, user retention, and several other tasks. The key terms are defined below:
• Sequence: A sequence is formally defined as an ordered set of items {s1, s2, s3, …, sn}.
As the name suggests, it is a sequence of items occurring together; it can be thought of
as a transaction, i.e., the items purchased together in one basket.
• Subsequence: The subset of the sequence is called a subsequence. Suppose {a, b, g, q, y,
e, c} is a sequence. The subsequence of this can be {a, b, c} or {y, e}. Observe that the
subsequence is not necessarily consecutive items of the sequence. From the sequences
of databases, subsequences are found from which the generalized sequence patterns are
found at the end.
• Sequence pattern: A subsequence is called a sequence pattern when it is found in multiple
sequences. The goal of the GSP algorithm is to mine the sequence patterns from a
large database of sequences. A subsequence qualifies as a sequence pattern when its
frequency is equal to or greater than the "support" threshold. For example, the pattern
<a, b> is a sequence pattern mined from the sequences {a, b, q} and {a, u, b}.
Sequential pattern mining, commonly performed with the GSP (Generalized Sequential Pattern)
algorithm, is a technique used to identify patterns in sequential data. The goal of GSP mining is
to discover patterns in data that occur over time, such as customer buying habits, website
navigation patterns, or sensor data.
Applications of GSP mining include:
1. Market basket analysis: GSP mining can be used to analyze customer buying habits and
identify products that are frequently purchased together. This can help businesses to
optimize their product placement and marketing strategies.
2. Fraud detection: GSP mining can be used to identify patterns of behavior that are
indicative of fraud, such as unusual patterns of transactions or access to sensitive data.
3. Website navigation: GSP mining can be used to analyze website navigation patterns,
such as the sequence of pages visited by users, and identify areas of the website that are
frequently accessed or ignored.
4. Sensor data analysis: GSP mining can be used to analyze sensor data, such as data from
IoT devices, and identify patterns in the data that are indicative of certain conditions or
states.
5. Social media analysis: GSP mining can be used to analyze social media data, such as
posts and comments, and identify patterns in the data that indicate trends, sentiment, or
other insights.
6. Medical data analysis: GSP mining can be used to analyze medical data, such as patient
records, and identify patterns in the data that are indicative of certain health conditions
or trends.
The main algorithmic approaches to sequential pattern mining are:
• Apriori-based Approaches
o GSP
o SPADE
• Pattern-Growth-based Approaches
o FreeSpan
o PrefixSpan
An example sequence database (SID : sequence):
200 : <(ad)c(bcd)(abe)>
300 : <(ef)(ab)(def)cb>
400 : <eg(adf)cbc>
Transaction: The sequence consists of many elements which are called transactions.
k-length Sequence:
The number of items involved in a sequence is denoted by k. A sequence of 2 items is called a
2-length sequence. This term comes into use while finding the 2-length candidate sequences.
Examples of 2-length sequences are: {ab}, {(ab)}, {bc}, and {(bc)}.
• {bc} denotes a 2-length sequence where b and c are two different transactions. This can
also be written as {(b)(c)}
• {(bc)} denotes a 2-length sequence where b and c are the items belonging to the same
transaction, therefore enclosed in the same parenthesis. This can also be written as
{(cb)}, because the order of items in the same transaction does not matter.
Support:
Support means frequency: the number of occurrences of a given k-length sequence in the
sequence database. While computing support, the order of items is taken into account.
Illustration:
s1: <a(bc)b(cd)>
s2: <b(ab)abc(de)>
We need to find the support of {ab} and {(bc)}.
For {ab} = {(a)(b)}, item a must appear in one element and b in a later element:
s1: <a(bc)b(cd)> contains a followed by a later b, so it counts.
s2: <b(ab)abc(de)> contains a (in the element (ab)) followed by a later b, so it also counts.
Hence, the support of {ab} is 2.
For {(bc)}, b and c must be present in the same element; their order then does not matter.
s1: <a(bc)b(cd)> contains the element (bc), the first occurrence.
s2: <b(ab)abc(de)> seems to match, but does not: b and c are present in different elements
here, so we do not count it.
Hence, the support of {(bc)} is 1.
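The same support-counting logic can be expressed in a short Python sketch; sequences are represented as lists of item-sets, and all names are hypothetical:

```python
# Count the support of a sequential pattern in a tiny sequence database.
def contains(sequence, pattern):
    """True if the pattern (a list of item-sets) occurs in the sequence:
    elements must match in order, and all items of one pattern element
    must fall inside a single transaction of the sequence."""
    pos = 0
    for wanted in pattern:
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later transaction
    return True

s1 = [{"a"}, {"b", "c"}, {"b"}, {"c", "d"}]                # <a(bc)b(cd)>
s2 = [{"b"}, {"a", "b"}, {"a"}, {"b"}, {"c"}, {"d", "e"}]  # <b(ab)abc(de)>
database = [s1, s2]

for pattern in ([{"a"}, {"b"}], [{"b", "c"}]):             # {ab} and {(bc)}
    print(pattern, "support =", sum(contains(s, pattern) for s in database))
```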
Pruning Phase:
While building Ck (candidate set of k-length), we delete a candidate sequence that has a
contiguous (k-1) subsequence whose support count is less than the minimum support
(threshold). Also, delete a candidate sequence that has any subsequence without minimum
support.
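A minimal sketch of this pruning step, reusing the list-of-item-sets representation from the previous sketch (the helper names and example sequences are made up):

```python
# Apriori-style pruning for GSP: drop a candidate k-sequence if any of
# its (k-1)-subsequences is not in the frequent (k-1)-sequence set.
def subsequences(pattern):
    """All (k-1)-subsequences obtained by deleting one item."""
    subs = []
    for i, element in enumerate(pattern):
        for item in element:
            shrunk = element - {item}
            sub = pattern[:i] + ([shrunk] if shrunk else []) + pattern[i + 1:]
            subs.append(sub)
    return subs

def prune(candidates, frequent_prev):
    """Keep only candidates whose every (k-1)-subsequence is frequent."""
    return [c for c in candidates
            if all(s in frequent_prev for s in subsequences(c))]

frequent_2 = [[{"a"}, {"b"}], [{"b"}, {"c"}], [{"a"}, {"c"}]]
candidates_3 = [[{"a"}, {"b"}, {"c"}],  # kept: <ab>, <bc>, <ac> all frequent
                [{"a"}, {"c"}, {"b"}]]  # pruned: <cb> is not frequent
print(prune(candidates_3, frequent_2))
```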
Here are some scenarios where machine learning can help in tackling the challenges of
data mining.
1. The quality of the output of data mining tools depends on the quality of the input data, and
the tools themselves may not address data quality issues. This leads to wrong results, as the
tool analyzes faulty data, so it is important to clean the data before processing it.
In such situations, machine learning algorithms are recommended as they can be incorporated
with data mining tools to automate the data entry process and get quality data. This
combination can easily identify any duplicate data and eliminate it. After this, a random forest
algorithm can be used to classify the data.
2. Data mining tools can be used to identify process-related issues, but they cannot find the root
cause of the issues. Machine learning algorithms, on the contrary, can help in solving the
problem. We can also introduce software with root cause analysis and data mining tools that
can tackle these kinds of issues.
3. Real-time data can be structured or unstructured. Some traditional data mining tools can
process only structured data and, therefore, are not applicable to unstructured data. This can be
solved by using these two machine learning techniques: Optical Character Recognition (OCR)
and Natural Language Processing (NLP).
4. Sometimes, data mining tools provide less clarity when processing a large number of
variables. Additional data increases the complexity of the data mining outputs, which becomes
hard for humans to understand. Data mining tools integrated with machine learning algorithms
and computer vision help to overcome this, so the processed data can be captured and the
relevant output generated.
5. Data mining tools analyze the past performance of the process rather than analyzing the
ongoing process. They cannot guarantee predicting performance in the future. Using machine
learning applications with data mining can predict the final results and future events. They also
send an alert message to users if there are any shortcomings and if any improvements are
required.
Web Mining
Web mining is the application of data mining techniques to automatically discover and extract
information from web documents and services. The main purpose of web mining is to discover
useful information from the World Wide Web and its usage patterns.
Web mining is the process of discovering patterns, structures, and relationships in web data. It
involves using data mining techniques to analyze web data and extract valuable insights. The
applications of web mining are wide-ranging and include:
• Search engine optimization: Web mining can be used to analyze search engine queries
and search engine results pages (SERPs). This information can be used to improve the
visibility of websites in search engine results and increase traffic to the website.
• Fraud detection: Web mining can be used to detect fraudulent activity on websites.
This information can be used to prevent financial fraud, identity theft, and other types of
online fraud.
• Sentiment analysis: Web mining can be used to analyze social media data and extract
sentiment from posts, comments, and reviews. This information can be used to
understand customer sentiment towards products and services and make informed
business decisions.
• Web content analysis: Web mining can be used to analyze web content and extract
valuable information such as keywords, topics, and themes. This information can be
used to improve the relevance of web content and optimize search engine rankings.
• Customer service: Web mining can be used to analyze customer service interactions on
websites and social media platforms. This information can be used to improve the
quality of customer service and identify areas for improvement.
• Healthcare: Web mining can be used to analyze health-related websites and extract
valuable information about diseases, treatments, and medications. This information can
be used to improve the quality of healthcare and inform medical research.
Web mining can be broadly divided into three types of techniques: Web Content Mining, Web
Structure Mining, and Web Usage Mining. These are explained below.
• Web Content Mining: Web content mining is the application of extracting useful
information from the content of web documents. Web content consists of several
types of data: text, images, audio, video, etc. Content data are the facts that a web
page was designed to convey, and they can provide effective and interesting patterns
about user needs. Text documents relate to text mining, machine learning, and natural
language processing, so this mining is also known as text mining. This type of mining
performs scanning and mining of the text, images, and groups of web pages according
to the content of the input.
• Web Structure Mining: Web structure mining is the application of discovering structural
information from the Web, i.e., the hyperlink structure connecting web pages and the
internal (document) structure of individual pages. It is discussed in detail later in this unit.
• Web Usage Mining: Web usage mining is the application of identifying or discovering
interesting usage patterns from large data sets, patterns that help us understand user
behavior. In web usage mining, users' access data on the web is collected in the form of
logs, so web usage mining is also called log mining.
Challenges of Web Mining
• Dynamic data sources on the Internet: The required online data is updated in real time;
for instance, news, weather, fashion, finance, and sports cannot be indexed properly.
• Data relevancy: A particular person is typically concerned with only a small portion
of the internet, with the remaining portion containing data that is unfamiliar to the
user and may produce unexpected outcomes for the actual requirement.
• The sheer size of the web: The web is getting bigger very quickly, and it seems to be
too big for data mining and data warehousing as required.
Data Mining vs Web Mining, by parameter:
• Definition: Data mining attempts to discover patterns and hidden knowledge in large data
sets in any system; web mining applies data mining techniques to automatically discover
and extract information from web documents.
• Application: Data mining is very useful for web page analysis; web mining is very useful
for a particular website and e-services.
• Target Users: Data mining targets data scientists and data engineers; web mining targets
data scientists along with data analysts.
• Problem Type: Data mining covers clustering, classification, regression, prediction,
optimization, and control; web mining covers web content mining and web structure mining.
• Tools: Data mining includes tools such as machine learning algorithms; special tools for
web mining are Scrapy, PageRank, and Apache logs.
Depending upon the type of web structural data, Web Structure Mining can be categorized
into two types:
1. Extracting patterns from hyperlinks on the Web: The Web works through a
system of hyperlinks using the Hypertext Transfer Protocol (HTTP). A hyperlink is a structural
component that connects web pages in different locations. Any page can create a
hyperlink to any other page, and that page can in turn be linked to some other page. The
intertwined, self-referential nature of the web lends itself to some unique network-analytical
algorithms. The structure of web pages can also be analyzed to examine the pattern of
hyperlinks among pages.
2. Mining the document structure: This is the analysis of the tree-like structure of a web page
to describe HTML or XML tag usage. There are several terms associated with
Web Structure Mining, described below.
PageRank (PR) is an algorithm used by Google Search to rank websites in its search engine
results. PageRank was named after Larry Page, one of the founders of Google, and is a
way of measuring the importance of website pages. According to Google:
PageRank works by counting the number and quality of links to a page to determine a rough
estimate of how important the website is. The underlying assumption is that more important
websites are likely to receive more links from other websites.
Thus, the rank of a page depends on the number of pages and the quality of the links pointing to the
target node.
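The idea can be captured in a few lines with the power-iteration method; the link graph below is a toy, hypothetical example, and 0.85 is the damping factor commonly used with PageRank:

```python
# PageRank by power iteration on a tiny hypothetical link graph.
graph = {            # page -> list of pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
DAMPING = 0.85
pages = list(graph)
rank = {p: 1 / len(pages) for p in pages}

for _ in range(50):  # iterate until the ranks stabilize
    rank = {
        p: (1 - DAMPING) / len(pages)
           + DAMPING * sum(rank[q] / len(graph[q])
                           for q in pages if p in graph[q])
        for p in pages
    }

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))  # pages with more/better in-links rank higher
```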
So, we can say that Web Structure Mining can be performed either at the document level
(intra-page) or at the hyperlink level (inter-page). The research done at the hyperlink level is
called Hyperlink Analysis, and the hyperlink structure can be used to retrieve useful information
on the Web.
Web Structure Mining has two main approaches, or two basic strategic models for successful
websites:
• PageRank
• Hubs and Authorities
• Hubs: These are pages with a large number of interesting links. They serve as a hub, or
a gathering point, that people visit to access a variety of information. More focused
sites can aspire to become hubs for newly emerging areas. The pages on such a website
can themselves be analyzed for the quality of the content that attracts the most users.
• Authorities: People usually gravitate towards pages that provide the most complete
and authentic information on a particular subject. This could be factual information,
news, advice, etc. These websites have the largest number of inbound links from
other websites. A minimal sketch of how hub and authority scores are computed follows
this list.
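Below is a minimal sketch of the hubs-and-authorities (HITS-style) score updates on a hypothetical link graph; each round, authority scores flow in from in-links and hub scores from out-links, followed by normalization:

```python
# Hubs and authorities (HITS-style) scoring on a toy link graph.
graph = {            # page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["B", "C"],
}
pages = list(graph)
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}

for _ in range(20):
    # A good authority is linked to by good hubs
    auth = {p: sum(hub[q] for q in pages if p in graph[q]) for p in pages}
    # A good hub links to good authorities
    hub = {p: sum(auth[q] for q in graph[p]) for p in pages}
    # Normalize so the scores stay bounded
    a = sum(v * v for v in auth.values()) ** 0.5 or 1.0
    h = sum(v * v for v in hub.values()) ** 0.5 or 1.0
    auth = {p: v / a for p, v in auth.items()}
    hub = {p: v / h for p, v in hub.items()}

print("authorities:", {p: round(v, 2) for p, v in auth.items()})
print("hubs:", {p: round(v, 2) for p, v in hub.items()})
```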
Web data are generally semi-structured and/or unstructured, while data mining is primarily
concerned with structured data. Web content mining performs scanning and mining of the text,
images, and groups of web pages according to the content of the input, displaying the resulting
list in search engines.
For Example: if the user is searching for a particular song then the search engine will display
or provide suggestions relevant to it.
Web content mining deals with different kinds of data such as text, audio, video, image, etc.
1. Agent-Based Approaches:
• Intelligent Search: This type of search refers to a particular goal of the user and returns
results based on conclusions drawn about that goal.
• Information Filtering / Categorization: This type of search deals with the filtering of
data, i.e., the removal of unwanted or redundant information using certain AI-based
methods, such as HyPursuit and BO (Bookmark Organizer).
• Growth of sophisticated AI systems that replace users in an automated or semi-automated
manner. One such technique is deep learning, wherein the system is trained by feeding it
certain kinds of data.
2. Database Approaches:
Used for transforming unstructured data into a more structured and high-level collection of
resources, such as in relational databases, and using standard database querying mechanisms
and data mining techniques to access and analyze this information.
• Multilevel Databases:
o Lowest Level – semi-structured information is kept.
o High Level- generalization from lower levels organized into relations and
objects.
• Web Query Systems:
o Web query systems are developed using query languages such as SQL together
with Natural Language Processing for extracting data.
The web content mining process typically involves the following steps (a small sketch of the
clustering step follows this list):
1. Pre-processing
2. Clustering
3. Classifying
4. Identifying the associations
5. Topic identification, tracking, and drift analysis
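To make the clustering step concrete, here is a minimal sketch assuming scikit-learn is installed; the page texts are invented for illustration:

```python
# Cluster a few hypothetical web-page texts by TF-IDF similarity.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

pages = [
    "cheap flights and hotel deals for your next holiday",
    "book flights online with discounted hotel packages",
    "python tutorial covering lists, loops and functions",
    "learn python programming with simple code examples",
]

X = TfidfVectorizer(stop_words="english").fit_transform(pages)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # pages about the same topic share a cluster id
```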
Web usage mining, a subset of data mining, is basically the extraction of various types of
interesting data that is readily available and accessible in the ocean of huge web pages on the
Internet, formally known as the World Wide Web (WWW). As one of the applications of data
mining, it helps analyze user activities on different web pages and track them over a
period of time. Web usage mining draws on the following major categories of web data.
1. Web Content Data: The common forms of web content data are HTML web pages, images,
audio, video, etc., the main one being the HTML format. Though its rendering may differ from
browser to browser, the common basic layout/structure is the same everywhere, which makes it
the most popular form of web content data. XML and dynamic server pages such as JSP and
PHP are also forms of web content data.
2. Web Structure Data: On a web page, there is content arranged according to HTML tags
(which are known as intrapage structure information). The web pages usually have hyperlinks
that connect the main webpage to the sub-web pages. This is called Inter-page structure
information. So basically relationship/links describing the connection between webpages is
web structure data.
3. Web Usage Data: The main sources of data here are the web server and the application
server. Web usage data involves the log data collected by the two sources mentioned above.
Log files are created when a user/customer interacts with a web page. This data can be mainly
categorized into three types based on the source it comes from:
• Server-side
• Client-side
• Proxy side.
There are other additional data sources also which include cookies, demographics, etc.
1. Web Server Data: Web server data generally includes the IP address, browser logs,
proxy server logs, user profiles, etc. User logs are collected by the web server.
2. Application Server Data: An added feature on the commercial application servers is to build
applications on it. Tracking various business events and logging them into application server
logs is mainly what application server data consists of.
3. Application-level data: There are various new kinds of events that can be there in an
application. The logging feature enabled in them helps us get the past record of the events.
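A minimal sketch of the first step of web usage mining: parsing web server log lines (here in the Common Log Format, with invented sample entries) and grouping page requests by client IP:

```python
# Extract page-visit records from Common Log Format server logs.
import re

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

sample_logs = [
    '192.168.1.5 - - [10/Mar/2024:10:12:01 +0000] "GET /index.html HTTP/1.1" 200 512',
    '192.168.1.5 - - [10/Mar/2024:10:12:45 +0000] "GET /products.html HTTP/1.1" 200 1024',
    '10.0.0.7 - - [10/Mar/2024:10:13:02 +0000] "GET /missing.html HTTP/1.1" 404 128',
]

visits = {}
for line in sample_logs:
    m = LOG_PATTERN.match(line)
    if m and m.group("status") == "200":
        # Group successful page requests by client IP (a crude session key)
        visits.setdefault(m.group("ip"), []).append(m.group("url"))

for ip, pages in visits.items():
    print(ip, "->", pages)
```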
Issues in web usage mining include:
• Privacy stands out as a major issue. Analyzing data for the benefit of customers is good,
but using the same data for something else can be dangerous: using it without the
individual's knowledge can pose a big threat to the company.
• If a data mining company does not hold itself to high ethical standards, two or more
attributes can be combined to derive personal information about a user, which again is
not respectable.
1. Personalization of Web Content: The World Wide Web holds a lot of information and
is expanding very rapidly day by day. The big problem is that the specific needs of
people increase on an everyday basis, and they quite often do not get the query results
they want. A solution to this is web personalization. Web personalization may be defined
as catering to the user's needs based on tracking their navigational behavior and
interests. Web personalization includes recommender systems, check-box
customization, etc. Recommender systems are popular and are used by many
companies.