
International Journal of Computer Engineering In Research Trends
Volume 3, Issue 4, April-2016, pp. 204-209, ISSN (O): 2349-7084


Available online at: www.ijcert.org

A Survey on Web Page Recommendation and Data Preprocessing

1 Ms. Sonule Prashika Abasaheb, 2 Prof. Tanveer I. Bagban

1 (M.E. PART-II, Department of Computer Science and Engineering, D.K.T.E. Society's Textile and Engineering Institute, Ichalkaranji, Shivaji University, Kolhapur, Maharashtra, India. Email: [email protected])

2 (Associate Professor, Department of Information Technology, D.K.T.E. Society's Textile and Engineering Institute, Ichalkaranji, Shivaji University, Kolhapur, Maharashtra, India. Email: [email protected])

Abstract: - Internet technologies are growing rapidly, and Web page recommendation is improving along with them. The aim of a Web page recommender system is to predict the Web page or pages that will be visited next from a given Web page of a website. Data preprocessing is a basic and essential part of Web page recommendation. It consists of cleaning and structuring data so that they are ready for pattern extraction. In this paper, we discuss Web page recommendation and the role of data preprocessing in it, considering how the two are related.

Keywords - Recommender System, Web server logs, Web mining, Web usage mining, Data Preprocessing.

I. INTRODUCTION

The rapid and unpredictable growth of information on the World Wide Web, together with the progress of innovative electronic devices, has made Web information increasingly important in everyone's life. Internet technologies are growing rapidly, so the Web has become a large store of information, and this information grows and changes at a high rate without any editorial control; consequently, new websites are also introduced rapidly.

Web page recommender systems automatically recommend the Web pages that are most interesting to a particular user, based on that user's current Web navigation behaviour. Web users often have to struggle to find useful pages and are likely to leave the site if the index pages of a website are not well designed. Even though recommenders have been appreciated and valued in experimental research, they have not been commercially successful [1]. This is one of the problems users face when trying to reach the Web pages of most interest on a website.

The main reason behind these problems is the explosive growth of information, much of which is irrelevant and noisy. Considering these problems, there is a need to clean and structure that data. Constructing usable data from noisy and complex data is what data preprocessing does; we elaborate on this in Section IV.

The main difficulty lies in accessing the Web pages of most interest, a problem that relates to Web usage. Hence, there is a need to clean and structure Web log data, which is

2016, IJCERT All Rights Reserved Page | 204



the data preprocessing part of Web Usage Mining [3]. Data preprocessing plays a vital role because log data are redundant and irrelevant in nature [4]. Data preprocessing is therefore a basic and essential part of Web page recommendation. This paper is structured as follows: Section II reviews Web page recommendation. Section III presents the categorization of recommendation systems and Web mining, and discusses how data preprocessing relates to Web page recommendation. Section IV describes data preprocessing and its steps. Section V provides a comparative analysis of the data preprocessing techniques in use; finally, Section VI gives the conclusion.

II. WEB PAGE RECOMMENDATION

Along with the rapid growth of internet technologies, the number of innovative websites is increasing fast. This overwhelms Web users by offering too many choices, so they sometimes make poor decisions when they surf the Web. There is therefore a need for a system that recommends Web pages to Web users to make their work easier; such a system is a Web page recommender system.

The term "recommender system" was first coined at a workshop in 1996 and has since been used inconsistently and imprecisely in published work [1]. Web page recommender systems have become increasingly valuable for helping Web users find the most interesting and important Web pages on specific websites. Good Web page recommendations can improve both website usage and Web user satisfaction. An important characteristic of a recommendation system is that it learns from the current user's historical data and also from that of other users. The recommendation system determines the current user's domain from that user's historical data and then predicts pages according to that domain ([4], [8]).

III. TAXONOMY

The recommendation system is one of the applications that exploit the results of Web usage mining [4]. Web page recommendation is thus closely related to Web mining. This section therefore presents the categorization of both Web page recommendation techniques and Web mining, and shows how data preprocessing relates to Web page recommendation.

A. Classification of recommendation technique:

The recommenders are normally implemented by filtering algorithms, which are categorized by how the recommendations are produced. Recommendation methods can thus be classified as below, according to the source of knowledge they use for recommendations ([5], [6]).

Content based recommendations: Systems of this type use Content-Based (CB) algorithms, which clean, filter and recommend articles and items that are similar to others the user accessed in the past.

Collaborative Recommendations: This process uses Collaborative Filtering (CF) algorithms, which clean, filter and recommend articles and items based on the preferences of other users. For example, it recommends items that the user has not accessed in the past but that the mentors of that user (other users with similar preferences) liked and accessed often.

Knowledge-based Recommendation: To generate a recommendation, this method uses domain knowledge about users and items, and thus needs the users' log data to be cleaned.

Hybrid Recommendations: This method combines the techniques of two or more of the categories explained above in order to optimize the system further.

B. Classification of Web mining

Web mining is commonly classified into three types: Web content mining, Web structure mining and Web usage mining ([2], [3], [4]).

1) Web Content Mining: This is the process of extracting and integrating useful information, data and knowledge from Web page contents ([3], [10]). It discovers useful information from Web page contents such as video, audio, hyperlinks and metadata [2].

2) Web Structure Mining: In this process, graph theory is used to analyse the nodes and connections of a website structure. A Web graph contains Web pages as nodes and hyperlinks as edges connecting related pages ([2], [3], [4]).

3) Web Usage Mining: This is the application of data mining techniques to discover usage patterns from Web data. In our view, server-side access logs are the usage data that keep user navigation information [3]. We focus on Web Usage Mining in this section, because the recommendation system is one of the applications that exploit its results [4].
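Since server-side access logs are the usage data, a first practical step is turning each raw log line into a structured record. The following is a minimal sketch only: it assumes the logs are in Common Log Format, which the surveyed papers do not specify, and the names `LOG_PATTERN`, `parse_line` and the sample line are hypothetical.

```python
import re
from datetime import datetime

# Assumption: Common Log Format. The surveyed papers do not fix a log format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return a dict for one access-log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the timestamp and status code into typed values.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record

# Hypothetical log line, for illustration only.
sample = ('192.0.2.1 - - [10/Apr/2016:13:55:36 +0530] '
          '"GET /courses/index.html HTTP/1.0" 200 2326')
record = parse_line(sample)
print(record["host"], record["url"], record["status"])
```

Records in this form (host, time, URL, status) are what later steps such as cleaning and user or session identification would operate on; lines that do not match the assumed format are returned as `None` so the caller can decide how to handle them.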

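The collaborative recommendation idea from Section III-A, recommending pages that a user has not accessed but that similar users accessed often, can be illustrated with a small user-based sketch. This is not the method of any surveyed paper: the visit counts are made-up data, cosine similarity is one assumed choice of similarity measure, and the names `visits`, `cosine` and `recommend` are hypothetical.

```python
from math import sqrt

# Hypothetical page-visit counts per user (illustrative data, not from the paper).
visits = {
    "alice": {"/home": 3, "/courses": 2, "/exams": 1},
    "bob":   {"/home": 2, "/courses": 3, "/library": 4},
    "carol": {"/home": 1, "/sports": 5},
}

def cosine(a, b):
    """Cosine similarity between two sparse page-count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, data, top_n=2):
    """Rank unvisited pages by the similarity-weighted visits of other users."""
    scores = {}
    for other, pages in data.items():
        if other == user:
            continue
        sim = cosine(data[user], pages)
        for page, count in pages.items():
            if page not in data[user]:
                scores[page] = scores.get(page, 0.0) + sim * count
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_n]

print(recommend("alice", visits))  # ['/library', '/sports']
```

Here "bob" is more similar to "alice" than "carol" is, so the page that only "bob" visited ranks first. A real recommender would build the visit counts from preprocessed log data rather than from a hand-written dictionary.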

3.1) Web Usage Mining: Web Usage Mining is the part of Web mining that deals with extracting interesting knowledge from log files. Usage mining tools discover and predict user behavior in order to help the designer improve the Web site, so that regular users get a personalized and adaptive service, or to attract visitors. Web Usage Mining is also termed Web Usage Analysis, Web Log Mining or Analysis of Click Stream [3].

Web Usage Mining consists of three main phases: Data preprocessing, Pattern discovery and Pattern analysis ([2], [3]). Among these three phases, data preprocessing plays a vital role because of the redundant and noisy nature of log data. Therefore, the next section focuses on data preprocessing, as it is a basic and essential part of Web page recommendation. An overview of the three phases is given below.

Data preprocessing: It consists of cleaning and structuring data so that they are ready for pattern extraction. The data preprocessing phase includes several steps or techniques ([2], [3]). Several data preprocessing techniques have been used to improve the other phases of Web usage mining, namely Pattern discovery and Pattern analysis.

Pattern discovery: This phase deals with mining and extracting information from preprocessed data, i.e. the results of the data preprocessing phase. Techniques from different fields such as data mining, pattern recognition and machine learning are applied to Web usage data in order to discover users' Web access patterns [3].

Pattern analysis: This is the final stage of Web Usage Mining. Its goal is to eliminate the irrelevant patterns in order to extract the patterns of interest to the user from the results of the pattern discovery process explained above [3]. Figure 1 shows the basic structure of these Web usage mining phases [3].

[Figure 1. Basic structure Web Usage Mining: Data preprocessing -> Pattern Discovery -> Pattern analysis.]

IV. DATA PREPROCESSING

Each of the recommendation technique categories requires data to be cleaned and filtered. Data preprocessing converts data into a format that can be managed more efficiently and easily for the user's purpose. The main data preprocessing task is to select standardized data from the initial log files and organize them for the user navigation pattern discovery algorithm [7]. This section discusses the significance of data preprocessing methods and the steps involved in obtaining the required content effectively.

A] Data preprocessing steps:

Data preprocessing steps are methods or techniques that are applied according to the source file available. These preprocessing techniques differ for different sources of log files. Various authors have shown which log file source needs which preprocessing technique [2]; this is covered in Section V below.

The steps involved in the data preprocessing phase are shown in Figure 2, as given in [3]:

[Figure 2. Data preprocessing steps: Web Log File -> Data Preprocessing -> Data Cleaning, User Identification, Session Identification, Path Completion, Transaction.]

1) Data Cleaning: This is the process of removing irrelevant items such as gif, jpeg or sound files and references caused by spider navigation. Improving data quality in this way also improves the quality of the subsequent analysis ([3], [8]).

2) User Identification: This is an important step in Web usage mining: identifying each individual user that


accesses a Web site. This is an essential phase for determining who has accessed the website and which pages are accessed most ([3], [8]).

3) Session Identification: A sequence of Web pages that a user browses in a single access is called a session. The objective of session identification is to divide the page accesses of each user into separate individual sessions ([3], [8], [9]).

4) Path Completion: Path completion is used to obtain the complete user access path. This step is difficult but important, as it is the final step in which the user session file is completed [3].

5) Transaction Identification: The aim of transaction identification is to create meaningful reference clusters for each user. This is done by merging or dividing approaches. Both approaches take a transaction list and some parameters as input and produce as output a final transaction list, in the same format as the input, to be operated on by a function in the module [3].

V. COMPARATIVE ANALYSIS

As shown in Section III, Web page recommendation systems use different algorithms according to the type of recommender system. Content based recommendation systems use Content-Based (CB) algorithms. Collaborative recommendation systems use Collaborative Filtering (CF) algorithms, which clean, filter and recommend articles and items based on the preferences of other users ([5], [6]).

Data preprocessing techniques and algorithms are applied according to the source file available. The preprocessing techniques explained above differ for different sources of log files. Various authors have shown which log file source needs which preprocessing technique and which algorithms can be applied.

For example, a Server Log File (log file source) needs the preprocessing techniques Data Cleaning, Log File Filtering, Session Identification and User Identification; a source such as the English Study Web site Log File needs techniques such as Data Cleaning, User Identification, Session Identification, Path Completion and Transaction Identification, and algorithms like Reference Length and Maximal Forward Reference. This is illustrated by some examples in Table I below [2].

TABLE I. ANALYSIS FOR PREPROCESSING BASED ON SOURCE OF LOG FILE

| Source of log file | Preprocessing technique | Algorithms applied | Authors |
|---|---|---|---|
| English Study Web site Log File | Data Cleaning, User Identification, Session Identification, Path Completion, Transaction Identification | Maximal Forward References (MFR), Reference Length | Yan LI, Boqin FENG and Qinjiao MAO [10] |
| IIS Server Log File | Data Cleaning, User Identification, Session Identification, Path Completion | Based on referred web page and fixed priori threshold | Ling Zheng, Hui Gui and Feng Li [11] |
| Web server Log file | Data Preprocessing | Based on Collaborative Filtering | JIANG Chang-bin and Chen Li [12] |
| Chizhou College Website | Data Filtering, Session Identification | Frame page and Page Threshold | Fang Yuankang and Huang Zhiqiu [13] |

According to the table above, the details of the algorithms and the data preprocessing implementation for Web usage mining are presented in Yan Li's paper [10]. It explains the reference length algorithm, which modifies the reference length of pages in the complete path by considering


the average reference length of auxiliary pages, which is estimated in advance through the maximal forward references and reference length algorithms.

The FP-Growth algorithm was used by Huiping Peng [14] to process the Web log records and obtain a set of frequent access patterns. Then, by combining the site topology interestingness and the browse interestingness of association rules for Web mining, a new pattern was discovered that provides valuable data for site construction.

To solve some problems that exist in traditional data preprocessing technology for Web log mining, Ling Zheng [11] used an improved data preprocessing technology. Algorithms based on a fixed priori threshold and on the referred Web page were used. These were applied to an IIS Server Log file as the log file source. Data Cleaning, User Identification, Session Identification and Path Completion were the preprocessing techniques applied to this source.

A Web log data preprocessing algorithm based on collaborative filtering was proposed by JIANG Chang-bin and Chen Li [12]. Even when statistical data are insufficient and records of the user's visiting history are absent, it can perform user session identification quickly and flexibly.

The algorithms named Frame page and Page Threshold were applied to the Chizhou College Website. The data preprocessing techniques Data Filtering and Session Identification were used, as in [13].

VI. CONCLUSION

This paper described Web page recommendation and its types to explain what Web page recommendation is, and then focused on why data preprocessing is an important part of it. Data preprocessing plays a vital role in reducing and removing redundant, noisy and irrelevant log data, and it is a basic and essential phase of Web page recommendation.

REFERENCES

1. Ben Schafer, Joseph A. Konstan, and John T. Riedl, "Recommender Systems for the Web."

2. Vijayashri Losarwar et al., "Data Preprocessing in Web Usage Mining," International Conference on Artificial Intelligence and Embedded Systems, July 15-16, 2012, Singapore.

3. Naga Lakshmi et al., "An Overview of Preprocessing on Web Log Data for Web Usage Analysis," International Journal of Innovative Technology and Exploring Engineering, ISSN: 2278-3075, Volume-2, Issue-4, March 2013.

4. Mitali Srivastava, Rakhi Garg, "Preprocessing Techniques in Web Usage Mining: A Survey."

5. J. Manuel Adán-Coello, C. M. Tobar, Y. Yuming, "Improving the Performance of Web Service Recommenders Using Semantic Similarity," JCS&T, Vol. 14, No. 2, October 2014.

6. Sabanaz S. Peerzade, Vanita D. Jadhav, "A Review on Web Service Recommendation System Using Collaborative Filtering," Volume 3, Issue 3, March 2015.

7. Chaoyang Xiang, Shenghui He and Lei Chen, "A Studying System Based On Web Mining," IEEE International Symposium On Intelligent Ubiquitous Computing and Education, pp. 433-435, 2009.

8. "A Survey on Preprocessing Methods for Web Usage Data," (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7, No. 3, 2010.

9. R. Cooley, B. Mobasher, J. Srivastava (1999), "Data preparation for mining world wide web browsing pattern," Journal of Knowledge and Data Engineering Workshop, IEEE, Vol. 1.

10. Yan LI, Boqin FENG and Qinjiao MAO, "Research on Path Completion Technique in Web Usage Mining," IEEE International Symposium on Computer Science and Computational Technology, pp. 554-559, 2008.

11. Ling Zheng, Hui Gui and Feng Li, "Optimized Data Preprocessing Technology For Web Log Mining," IEEE International Conference on Computer Design and Applications, pp. VI-19-VI-21, 2010.

12. JIANG Chang-bin and Chen Li, "Web Log Data Preprocessing Based on Collaborative Filtering," IEEE 2nd International Workshop on Education Technology and Computer Science, pp. 118-121, 2010.

13. Fang Yuankang et al., "A Session Identification Algorithm Based on Frame Page and Page threshold," IEEE Conference, pp. 645-647, 2010.

14. Huiping Peng, "Discovery of Interesting Association Rules Based on Web Usage Mining," IEEE Conference, pp. 272-275, 2010.


AUTHOR PROFILE

1. Ms. Sonule Prashika Abasaheb is a student of M.E. PART-II, Department of Computer Science and Engineering, D.K.T.E. Society's Textile and Engineering Institute, Ichalkaranji, Shivaji University, Kolhapur, Maharashtra, India. Her area of interest is Web Mining.

2. Prof. Tanveer I. Bagban is working as an associate professor in the Department of Information Technology, D.K.T.E. Society's Textile and Engineering Institute, Ichalkaranji, Shivaji University, Kolhapur, Maharashtra, India. He has 14 years of teaching experience. His areas of interest are Web Mining and Information Extraction. He has published about 4 papers in international journals and 2 papers in national conferences.
