
User Web Usage Mining For Navigation Improvisation Using Semantic Related Frequent Patterns

This document proposes a method to improve user navigation on websites by mining user web usage logs and incorporating semantic information. It involves generating frequent patterns from the web usage data using semantic relations between pages. The quality of the patterns is evaluated using standard methods. Experimental results show that presenting navigation options using these semantic-enriched frequent patterns can improve measures of user navigation on a site. The proposed approach combines conventional web usage mining techniques with semantic information from ontologies to generate more precise navigation recommendations.

Uploaded by

Vaibhav Sawant

User Web Usage Mining for Navigation Improvisation Using Semantic Related Frequent Patterns


Mr. N. P. Jilhedar
M.E. Student, CSE Department,
D. Y. Patil College of Engineering & Technology,
Kolhapur, Maharashtra, India.
[email protected]

Dr. S. K. Shirgave
Associate Professor & Head, IT Department,
DKTES. T.E.I.,
Ichalkaranji, Maharashtra, India.
[email protected]

Abstract - Web sites accumulate abundant web usage logs, which provide useful information for improving user navigation. Traditional web sites do not use this rich usage data for any analysis. The data can be used to generate frequent patterns that support user navigation, and it can also help in reorganizing a web site for efficient navigation. In this paper we propose a frequent pattern generation approach that combines semantic relations with user web usage data. The quality of the generated usage patterns is measured with standard evaluation methods. Experimental results show that a more precise presentation based on the generated user patterns can improve user navigation measures.

Keywords: Web mining, Usage Mining, Navigation, Semantic, Frequent Pattern.

1. INTRODUCTION

The growth of the World Wide Web (WWW) has made it a place of tremendous interest for e-commerce, Web services and Web information systems. Much research is being done to maximize the advantage of using web sites for such web-based applications [13][14]. The ability of a site to engage visitors at a deeper level and to guide them successfully to useful information is seen as a key factor in the final success of the site. However, given the size, structure and complexity of the Web, accessing the relevant information efficiently is a challenging task. Web Usage Mining (WUM) is an approach for extracting knowledge by analyzing the web usage data of a particular website [5][7]. This usage data can be obtained from server logs and analyzed to reveal the behavioral patterns and profiles of the users who interact with the web site. The analyzed data is beneficial for different needs such as web personalization, recommender systems and the presentation of promotional content.

Web mining is a process that allows searching for and predicting users' interests and helps in personalizing the web. Web mining deals with the analysis of web user navigation, the structure of a website and the content of its web pages. Much research has been directed at improving the process of customizing the web, most of it simply using the navigation patterns of users [16][18]. However, as the number of web pages grows, personalization based on web usage mining alone has the defect of not taking the context of the website into account. Thus the semantic web, which elaborates the context of a web page, is equally important to consider. Although some research has explored this area, there is still room for improvement.

The Semantic Web focuses on making website content comprehensible not only to humans but also to computers. To accomplish this, it helps software agents look for the expected content. Hence increased efforts are observed in annotating Web pages and objects with semantic information using ontologies (such as product catalogs or concept hierarchies). Ontology instances can be built using the domain knowledge specific to a Web site [19][20]. Therefore, in this paper, the semantic information of a website is combined with the patterns generated by conventional web usage mining to generate frequent navigation patterns enriched with the semantic information of web pages.

2. RELATED WORKS

Various data mining techniques [20][21] can be used to model and understand Web user activity [4][8]. The WUM process can be divided into three inter-dependent stages: data collection and preprocessing, pattern discovery, and pattern analysis. The preprocessing stage consists of cleaning the click-stream data obtained from server logs and partitioning it into a set of user transactions that represent individual users' activity [3].

Different statistical and database operations are performed in the pattern discovery stage to obtain patterns reflecting the behavior of users. Some of the techniques for the discovery and analysis of common patterns are session and visitor analysis, cluster
analysis, correlation analysis, association and sequential pattern analysis, and navigation analysis [1][2].

2.1 Web Usage Mining

Web usage mining is the application of data mining methods to the analysis of recordings of website use, especially in the form of web server logs. A central problem is to identify, among the large number of patterns found, those that are the most interesting. With the growth of the information available on the World Wide Web, it has become necessary for organizations to examine the patterns found in order to discover ways to gain an edge over their competitors.

Jespersen et al. [9] proposed a hybrid approach for analyzing visitor click-stream sequences, which combines the hypertext probabilistic grammar approach with a table of click sequences for general-purpose web log mining. Mobasher et al. [6] presented a web personalization system in which the mining of usage data and the knowledge discovery tasks are performed offline, while the automatic customization of Web pages based on the discovered knowledge is performed online. In LumberJack, by Chi et al. [22], user profiles are constructed by combining the clustering of user sessions and statistical traffic analysis with the traditional k-means algorithm.

2.2 Pattern Discovery

Web usage mining is the application of data mining techniques to discover usage patterns from website data in order to understand and better serve the needs of Web-based applications [15][17]. Web usage mining consists of three phases, namely preprocessing, pattern discovery and pattern analysis.

R. Cooley et al. [12] propose that the process of web mining can be divided into two main parts. The first part is the transformation of the domain-dependent Web data into a suitable transaction form; this preprocessing covers transaction identification and the integration of data components. The second part applies data mining and pattern matching techniques, such as association rules and sequential patterns, to the transaction data.

With conventional techniques, the quality of the generated web usage patterns, and hence the accuracy of next-page prediction in recommendation, is limited. Therefore, the objective is to design a framework that corrects this problem by combining the conventional scheme with semantic information, and to evaluate the quality of the implementation with clearly defined metrics.

3. PROPOSED FREQUENT PATTERN GENERATION APPROACH

In web usage mining, the first task is to record the web resource requests made by visitors to a website; this information is mostly gathered by the web server [10][11]. The content and structure of the web pages of a particular website reflect the intentions of the page authors and designers and the underlying information architecture, whereas the usage data provides the actual behavior of the users of these resources.

We have developed a framework that integrates semantic information into the process of building a Web navigation model, in which frequent navigation patterns are formed from ontology instances instead of Web page addresses, as shown in Figure-1. The quality of the generated patterns is evaluated by a mechanism based on website recommendation.

Figure-1: Framework of Frequent Pattern Extraction

3.1 Pre-processing

Preprocessing involves the removal of noisy and irrelevant data; in addition, the semantic information of the web pages is integrated with the log data. In this step the server log files are pruned, transactions are extracted, and the ontology class of the corresponding individual is assigned to the address of each web page.

A. Pruning Process

In this step, unsuccessful requests and requests made by software agents (e.g., web crawlers) are discarded by
using the error codes and access information in the log records.

B. Extraction of Navigation History

The navigation (browsing) history is the set of Web objects requested by a user in his or her active session. This step is analogous to session identification. The implemented method extracts the log data over the Internet and stores it in a database. This data is used for pattern generation and result evaluation.

C. Mapping to Ontology Instances

The last step of preprocessing maps each web address requested in the Web server log to an ontology instance. For this, ontologies and ontology instances are constructed using the domain information of the Web site.

3.2 Rule and Pattern Generation

The pattern generation phase uses sequential association rule mining. With this kind of association rule mining, the generated frequent patterns maintain the sequence relation between the discovered items. Pattern generation and evaluation are carried out at different minimum support threshold values. The generated patterns are stored in a database so that frequent navigation patterns can be presented to the user.

3.3 Result Evaluation Measures

To evaluate the generated patterns, a recommendation engine is built on top of them. The underlying assumption is that pages visited early in a visit generally do not affect the next page, because users tend to click what is referred to by recently visited pages. For this purpose the concept of window count is used, which is the maximum number of previously visited pages taken into account for recommendation. A recommendation is generated by comparing the active user's navigation history with the sequential association rules.

4. EXPERIMENT EVALUATION

4.1 Data sets

Web servers collect a large volume of data from web sites and store it in Web access log files. Along with the access logs, other data can be used in web usage mining, such as the web site's information structure, user profiles and website content. The web usage log data was collected from the reachouthyderabad.com website, and we modified the user IP addresses as required for the evaluation. A web log is a transcript of the transactions made between a group of users and a group of servers. Figure-2 shows several lines of a typical log. For historical reasons, many web logs use the same format. The fields are separated by white space, typically a single space, although some fields are additionally quoted.

Figure-2: Web Usage Log Data

4.2 Generated Frequent Pattern

This process runs over the extracted log data and performs sequential pattern generation for different users at different support thresholds. The obtained results are shown in Figure-3.

Figure-3: Frequent Pattern Results of different Users at different Support Threshold
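
As an illustration of generating sequential patterns at different minimum support thresholds, the following minimal Python sketch counts contiguous page sequences across sessions and keeps those whose support reaches the threshold. It is not the implementation used in this work: the restriction to contiguous subsequences, the use of absolute session counts as support, the `max_len` limit, the function names and the sample sessions (with made-up ontology instance names) are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def contiguous_subsequences(session: Sequence[str], max_len: int) -> Set[Pattern]:
    """Return the distinct contiguous subsequences of length 1..max_len."""
    found: Set[Pattern] = set()
    for start in range(len(session)):
        for length in range(1, max_len + 1):
            if start + length > len(session):
                break
            found.add(tuple(session[start:start + length]))
    return found


def frequent_patterns(
    sessions: List[Sequence[str]], min_support: int, max_len: int = 3
) -> Dict[Pattern, int]:
    """Keep the patterns that occur in at least `min_support` sessions."""
    support: Counter = Counter()
    for session in sessions:
        # A session counts at most once towards the support of a pattern.
        support.update(contiguous_subsequences(session, max_len))
    return {p: n for p, n in support.items() if n >= min_support}


# Hypothetical sessions, assumed to be already mapped to ontology instances.
sessions = [
    ["Home", "Hotels", "Booking"],
    ["Home", "Hotels", "Restaurants"],
    ["Hotels", "Booking"],
    ["Home", "Hotels", "Booking"],
]

for threshold in (2, 3, 4):
    patterns = frequent_patterns(sessions, threshold)
    print(threshold, sorted(patterns.items()))
```

With these sample sessions, raising the threshold from 2 to 3 drops the three-page pattern (Home, Hotels, Booking), which occurs in only two of the four sessions, and at threshold 4 only the single page Hotels remains. Fewer and shorter patterns surviving as the threshold grows is the general trade-off to expect from a support-based filter.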
5. RESULT EVALUATIONS

5.1 Evaluation Measures

The obtained patterns are evaluated with the following measures. Once the recommendation set is ready, the rules are mapped back to web page locations and recommended to the users. The method is evaluated using tenfold cross-validation to determine its effectiveness in terms of precision, coverage, F1 and R measure. In the formulas below, R is the recommendation set, and t and w are read as the evaluated user transaction and the window of already visited pages (Section 3.3), so that t - w is the part of the transaction still to be visited.

A. Precision: It is calculated as the ratio of relevant recommendations to the total number of recommendations.

\[ \mathrm{precision}(R, t) = \frac{|R \cap (t - w)|}{|R|} \]

B. Coverage: It measures the effectiveness of the recommendation system in producing correct recommendations, given by

\[ \mathrm{coverage}(R, t) = \frac{|R \cap (t - w)|}{|t - w|} \]

C. F1-Measure: Generally, both high precision and high coverage are desirable. The measure that captures this is the F1-measure, defined as

\[ F_1(R, t) = \frac{2 \cdot \mathrm{precision}(R, t) \cdot \mathrm{coverage}(R, t)}{\mathrm{precision}(R, t) + \mathrm{coverage}(R, t)} \]

D. R-Measure: In this work, precision and coverage at each support threshold are used to determine the goodness of the recommendation. Another measure, called the R-measure, is calculated by dividing the coverage by the size of the recommendation set:

\[ R(R, t) = \frac{\mathrm{coverage}(R, t)}{|R|} \]

5.2 Result Analysis

We evaluated the proposed mechanism for different users at different support thresholds and computed the measures shown in Table-1. The obtained results show an improvement in precision as the support threshold increases, while coverage, F1-measure and R-measure decrease accordingly.

Table-1: Result Evaluation Measures of different Users at different Support Threshold

Log Users   Sup. Th.   Precision    Coverage     F1-Measure   R-Measure
User1       2          0.5          0.5          0.5          0.011363637
User1       4          0.5176471    0.48235294   0.4993772    0.010962567
User1       6          0.5641026    0.43589744   0.49178174   0.00990676
User1       8          0.5714286    0.42857143   0.48979595   0.0097402595
User1       10         0.60273975   0.39726028   0.47888914   0.009028642
User2       2          0.5          0.5          0.5          0.02631579
User2       4          0.5135135    0.4864865    0.49963477   0.025604552
User2       6          0.5135135    0.4864865    0.49963477   0.025604552
User2       8          0.5135135    0.4864865    0.49963477   0.025604552
User2       10         0.5277778    0.4722222    0.49845678   0.024853801
User3       2          0.5          0.5          0.5          0.011904762
User3       4          0.54545456   0.45454547   0.4958678    0.010822511
User3       6          0.64615387   0.35384616   0.45727813   0.008424909
User3       8          0.67741936   0.32258064   0.43704474   0.0076804915
User3       10         0.7          0.3          0.42000002   0.0071428576
User4       2          0.5          0.5          0.5          0.02631579
User4       4          0.5135135    0.4864865    0.49963477   0.025604552
User4       6          0.6551724    0.3448276    0.45184305   0.01814882
User4       8          0.76         0.24         0.36479998   0.012631578
User4       10         0.7916667    0.20833333   0.3298611    0.010964912
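
The evaluation measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes precision and coverage in their usual set-based form, where `remaining` stands for the pages of a session that follow the window (t - w), and the function names, the sample session and the recommendation set are hypothetical. The recommendation set size of 44 used in the last check is inferred from the User1 row of Table-1 (0.5 / 0.011363637) and is not stated in the text.

```python
from typing import Set


def precision(recommended: Set[str], remaining: Set[str]) -> float:
    """Share of recommended pages that the user actually visits later."""
    if not recommended:
        return 0.0
    return len(recommended & remaining) / len(recommended)


def coverage(recommended: Set[str], remaining: Set[str]) -> float:
    """Share of the later visited pages that were recommended."""
    if not remaining:
        return 0.0
    return len(recommended & remaining) / len(remaining)


def f1_measure(p: float, c: float) -> float:
    """Harmonic mean of precision and coverage."""
    return 2 * p * c / (p + c) if (p + c) else 0.0


def r_measure(c: float, recommendation_size: int) -> float:
    """Coverage divided by the size of the recommendation set."""
    return c / recommendation_size


# Hypothetical session: the first two pages form the window, the rest is t - w.
session = ["A", "B", "C", "D", "E"]
window_count = 2
remaining = set(session[window_count:])
recommended = {"C", "D", "X", "Y"}  # hypothetical recommendation set

p = precision(recommended, remaining)
c = coverage(recommended, remaining)
print(p, c, f1_measure(p, c), r_measure(c, len(recommended)))

# Consistency checks against Table-1 (values there are rounded single floats).
print(f1_measure(0.7, 0.3))   # User3, threshold 10
print(r_measure(0.5, 44))     # User1, threshold 2, with the inferred |R| = 44
```

The two checks at the end recompute derived columns of Table-1 from its precision and coverage columns, which is a quick way to confirm that the F1 and R definitions above match the reported numbers.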

6. CONCLUSION

Frequent navigation patterns generated by web usage mining techniques provide valuable information for more than one application, such as web site recommendation and restructuring. In conventional web usage mining, the semantic information about the content of the website does not take part in the pattern generation process. In this work, we investigated the effect of semantic information on web usage mining in the form of frequent sequence pattern mining.

The experimental results show that integrating semantic information yields a considerable improvement in pattern quality. In addition, the mechanism also handles the new-object problem. Both single-concept and combined association rules have higher precision and coverage values than traditional web usage mining. The improvement is greater for the combined association rules; it can therefore be deduced that pattern quality increases as the amount of contributing semantic information increases. Conceptual analysis of the individual patterns can be used to understand user intent: the pattern with the highest accuracy and coverage may reflect the user's browsing intention.

REFERENCES

[1]. P. Senkul, S. Salin, "Improving pattern quality in web usage mining by using semantic information", Springer-Verlag London Limited, 2011.
[2]. J Deane, Praveen P, "Ontological analysis of web surf history to maximize the clickthrough probability of web advertisements", Springer - Elsevier, 2009.
[3]. Ahu Sieg, B Mobasher, R Burke, "Learning ontology based user profiles: A Semantic Approach to Personalized Web Search", IEEE Int. Informatics Bulletin, 2007.
[4]. B. Berendt, B. Mobasher, M. Nakagawa, and M. Spiliopoulou, "The impact of site structure and user environment on session reconstruction in web usage analysis", Proceeding for Fourth WEBKDD Web Mining for Usage Patterns & User Profiles at KDD, 2002.
[5]. B. Berendt and M. Spiliopoulou, "Analyzing navigation behaviour in web sites integrating multiple information systems", The VLDB Journal, 9(1):56-75, 2000.
[6]. B. Mobasher, H. Dai, T. Luo, & M. Nakagawa, "Effective Personalization Based on Association Rule Discovery from Web Usage Data", Proceedings of the 3rd Int'l Workshop on Web Info. and Data Management, 2001.
[7]. S. Gunduz, M. Tamer O, "A Webpage Prediction Model Based on Clickstream Tree Representation of User Behavior", 9th ACM SIGKDD Int'l Conf. on KDM, Washington, DC, USA, August 24 - 27, 2003.
[8]. S. E. Jespersen, J. Thorhauge, T. B. Pedersen, "A Hybrid Approach to Web Usage Mining", Dept. of Comp. Science, Aalborg University, July 2002.
[9]. S. E. Jespersen, J. Thorhauge, and T. Bach, "A hybrid approach to WUM, Data Warehousing and Knowledge Discovery", Springer Verlag Germany, pp. 73-82, 2002.
[10]. Joshi K. P., Joshi A., Yesha Y., Krishnapuram R., "Warehousing and Mining Web Logs", Proceedings of the 2nd ACM CIKM Workshop on Web Information and Data Management, pp. 63-68, 1999.
[11]. J Srivastava, R Cooley, Mukund Deshpande, P Ning Tan, "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data", SIGKDD Explorations, Vol. 1, Issue 2, 2000.
[12]. R. Cooley, B. Mobasher, and Jaideep Srivastava, "Web Mining: Information and Pattern Discovery on the WWW", in Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence, 1997.
[13]. A G Buchner, M Baumgarten, S S Anand, M D Mulvenna, J G Hughes, "Navigation Pattern Discovery from Internet Data", in WEBKDD, San Diego, CA, 1999.
[14]. F Masseglia, P Poncelet and R Cicchetti, "WebTool: An Integrated framework for data mining", In proc. of the 9th International Conf. on Database and Expert System Application, Italy, August, 1999.
[15]. D. Pierrakos, G. Paliouras, C. Papatheodorou, and C. D. Spyropoulos, "WUM as a tool for Personalization: A Survey", User Modeling and User-Adapted Interaction, 13(4):311-372, 2003.
[16]. B. Mobasher, H. Dai, T. Luo, Y. Sun, and J. Zhu, "Integrating web usage and content mining for more effective personalization", In Proceedings of the International Conference on E-Commerce and Web Technologies, pages 165-176, Greenwich, UK, 2000.
[17]. R. Cooley, B. Mobasher, and J. Srivastava, "Web mining: Information and pattern discovery on the world wide web", In Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), November 1997.
[18]. J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan, "Web usage mining: discovery and application of usage patterns from web data", SIGKDD Explorations, 1(2):12-23, 2000.
[19]. W. Lin, S.A. Alvarez, and C. Ruiz, "Efficient adaptive support association rule mining for recommender systems", Data Mining and Knowledge Discovery, 6:83-105, 2002.
[20]. R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules", Proceedings of the Twentieth International Conference on Very Large Data Bases, VLDB, pages 487-499, Morgan Kaufmann, September 1994.
[21]. R. Agrawal and R. Srikant, "Mining Sequential Patterns", Proceedings of the Eleventh International Conference on Data Engineering, March 6-10, 1995, Taipei, Taiwan, pages 3-14, IEEE Computer Society, 1995.
[22]. E.H. Chi, A. Rosien, and J. Heer, "LumberJack: Intelligent Discovery and Analysis of Web User Traffic Composition", In Proceedings of ACM-SIGKDD Workshop on Web Mining for Usage Patterns and User Profiles, Canada, ACM Press, 2001.
