An Approach To Web Adaptation by Modelli
Abstract—User reviews provide a rich source of information regarding user interests. Many Web platforms allow or even encourage their visitors to leave feedback on the products and services they have consumed. The Term Frequency (TF) and the Inverse Document Frequency (IDF) are two factors that have been used extensively in capturing users’ preferences. This paper collects users’ reviews from e-tourism Web platforms, calculates the TF and the IDF for each user, and adopts a multi-criteria approach in order to quantify users’ preferences and dynamically adapt the website design accordingly. It utilizes the Analytic Hierarchy Process (AHP) and similarity methods to determine the relative importance of terms and Web pages, and then rearranges them into a new website structure.

Keywords-Web Adaptation; TF-IDF; AHP; Multi-Criteria Analysis.

I. INTRODUCTION

According to Internet Web Stats, there are approximately 7,634,758,428 Web users around the globe [1]. Data also show that more than 1.5 billion websites exist today, with more than 200 million being active [2]. A great number of Web users leave their feedback in the form of reviews, thus developing a very rich source of information regarding services and products, customers’ needs and suppliers’ quality. As a result, a huge amount of information becomes available to users on almost every topic. Although this is a very promising development, searching through this vast ocean of data in order to identify the required information is quite an endeavour.

Within the context of Web personalisation, content personalisation aims at providing the right information to the right user. Presentation personalisation, in contrast, focuses on presenting the content with the most suitable media combination, taking into account factors such as media limitations, users’ media preferences, etc. [3]. With respect to presentation personalisation, [3] suggests five groups of factors to be taken into consideration. User-specific features pertain to users’ media preferences; for example, a user may prefer a graphical presentation to a textual one. Information features refer to the representational differences and capabilities of media, since not all media are equally suitable for projecting the same piece of information. Contextual information refers to the user’s environmental conditions, such as noise, light, weather, speed, etc., that may affect the presentation quality for the user. Media constraints imply the need to effectively combine the characteristics and capabilities of different media in order to improve the quality of presentation. Limitations of technical resources relate to device limitations such as screen size, bandwidth, etc.

With respect to content personalisation, the analysis of User Generated Content (UGC) provides Web developers, as well as service and product designers, with valuable information. This information concerns users’ preferences as well as suppliers’ quality and potential [4]. The TF and the IDF are used extensively in capturing user preferences [5]. Several representational methodologies have been proposed for developing user profiles. The three most frequent formats are keywords, semantic networks and concept-based representations [5][6]. Keywords represent domains of users’ preferences and are associated with weights that indicate the strength of user interest in a particular topic. Polysemy and synonymy are problems associated with keywords. Semantic networks address these problems by representing keywords as nodes on graphs that are connected with each other, including co-occurrences. Concept-based representations resemble semantic networks in structure, but their nodes represent abstract topics rather than keywords [5][6]. Filtering and clustering techniques are very useful in reducing the number of concepts found on the Web when attempting to formulate user profiles. However, [6] argues that these techniques lack effectiveness, because they produce the same structure of user preferences for users with different needs. They thus fail to produce highly refined, accurate and personalised representations of individual users. Research shows that while many approaches have been used in order
to produce and use user profiles, e.g. in Web personalisation, recommender systems, etc., there exists no definite procedure for deriving user interests [6][7]. The AHP has been used in document ranking [8]. However, the use of multi-criteria methods in analysing the TF-IDF has been overlooked. This paper addresses the need to investigate alternative ways of developing user preference models, and suggests analysing the TF-IDF with the AHP. Thus, this research aims to propose a multi-criteria approach based on the AHP and the TF-IDF. The approach adapts website design according to the relative importance of users’ preferences.

The relative importance of users’ interests has not been considered in the literature. When it comes to personalisation, though, it is the relative importance of terms for each individual user that ranks and distinguishes users’ interests. This relative importance subsequently decides how to structure websites. Web adaptation is thus a decision-making process in which users would pairwise compare terms and decide which ones they would most like to know about. Their choices influence the Web design, which needs to adapt to users’ preferences.

The rest of the paper is structured as follows. Section II presents the proposed methodology and the methods used for data analysis. Section III discusses the empirical study and the data analysis. Finally, Section IV presents the conclusions.

II. METHODOLOGY AND METHODS

This paper aims to dynamically rearrange the structure of websites according to user interests. By capturing and modelling user preferences, it proposes an approach to reallocate Web pages based on their importance, which is calculated from user priorities. Data were collected from platforms such as TripAdvisor.com and Booking.com. User reviews regarding users’ stays in hotels in Greece and Italy were collected using the Scrapy Web crawler tool. The reviews were then analysed with the KNIME text mining tool. Next, the importance of each term was calculated and analysed using the Analytic Hierarchy Process (AHP), a multi-criteria analysis method. In recent years, many researchers have adopted Multi-Criteria Decision Making (MCDM) approaches to problem solving, for example in assessing alternative solutions, selection problems and strategic analysis [9]. The steps of the proposed methodology are as follows.

A. Methodology for evaluating business strategy based on Web analytics

Step 1: Collect documents published by users.

Step 2: Calculate the importance $W_{t_k}$ of each term $t_k$ and formulate the User-Interests Vector (UIV). The importance of each term $t_k$ is calculated using the following formula:

$$W_{t_k} = TF_{t_k} \times IDF_{t_k} \tag{1}$$

The symbols in formula (1) are defined as follows:
- $W_{t_k}$ represents the weight of term $t_k$.
- $TF_{t_k}$ is the term frequency of term $t_k$.
- $IDF_{t_k} = \log\left(N_u / d_{t_k}\right)$.
- $N_u$ is the total number of documents published by user $u$.
- $d_{t_k}$ represents the number of documents that contain term $t_k$.

The UIV shows the importance that each user perceives for each term, and takes the following form: $UIV_{u}^{n} = \{w_{(u,1)}, w_{(u,2)}, \dots, w_{(u,k)}\}$. Thus, $UIV = \{w_{(u,t)}\}$, where $w_{(u,t)}$ indicates the weight, i.e. the importance, of term $t = 1, \dots, k$ for user $u = 1, \dots, n$. By combining all users’ preferences, the UIM matrix is formed:

$$UIM_{u}^{n} = \begin{bmatrix} w_{(1,1)} & w_{(1,2)} & w_{(1,3)} & \cdots & w_{(1,k)} \\ w_{(2,1)} & w_{(2,2)} & w_{(2,3)} & \cdots & w_{(2,k)} \\ w_{(3,1)} & w_{(3,2)} & w_{(3,3)} & \cdots & w_{(3,k)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w_{(n,1)} & w_{(n,2)} & w_{(n,3)} & \cdots & w_{(n,k)} \end{bmatrix}$$

Each row of the UIM represents the preferences of the corresponding user. The UIM will later be used to calculate the similarities of interests among users and to recommend items for users to see.

Step 3: Evaluate the relative importance of each Web page for every user using the AHP.

Each Web page is assessed in terms of the importance of the terms it contains. Drawing on each user’s UIV, the AHP pairwise comparison matrix is calculated. The pairwise comparison matrix that shows the terms’ perceived importance for each user takes the following form:
$$a_u(i,j) = \frac{w_{(u,i)}}{w_{(u,j)}}$$

where $i, j \in \{1, \dots, k\}$ indicate the terms and $u \in \{1, \dots, n\}$ indicates the users. If $i = j$, then $a_u(i,j) = 1$.
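The following Python sketch illustrates Steps 2 and 3. It computes formula (1) for a single user and then builds the pairwise comparison matrix $a_u(i,j) = w_{(u,i)}/w_{(u,j)}$ from the resulting UIV. It is a minimal sketch under stated assumptions, not the authors’ implementation. The paper does not specify how $TF_{t_k}$ is counted or which logarithm base is used. The sketch therefore assumes the raw count of a term over all of the user’s documents, and the natural logarithm. The function names and the sample reviews are hypothetical, and the reviews are assumed to be already tokenised.

```python
import math
from collections import Counter


def user_interest_vector(docs: list[list[str]], terms: list[str]) -> dict[str, float]:
    """Formula (1) for one user u: W_tk = TF_tk * log(N_u / d_tk).

    Assumptions (not fixed by the paper): TF_tk is the raw count of the
    term over all of the user's documents; log is the natural logarithm.
    """
    n_u = len(docs)                                   # N_u: documents published by user u
    tf = Counter(tok for doc in docs for tok in doc)  # raw term counts over all documents
    uiv = {}
    for t in terms:
        d_t = sum(1 for doc in docs if t in doc)      # d_tk: documents containing term t
        uiv[t] = tf[t] * math.log(n_u / d_t) if d_t else 0.0
    return uiv


def pairwise_matrix(uiv: dict[str, float], terms: list[str]) -> list[list[float]]:
    """AHP pairwise comparison matrix with a_u(i, j) = w(u, i) / w(u, j)."""
    if any(uiv[t] <= 0 for t in terms):
        raise ValueError("every compared term needs a positive TF-IDF weight")
    return [[uiv[ti] / uiv[tj] for tj in terms] for ti in terms]


# Hypothetical, already tokenised reviews published by a single user.
reviews = [["sea", "food", "sea"], ["food", "restaurant"], ["sea", "terrace"]]
terms = ["sea", "food", "restaurant"]
uiv = user_interest_vector(reviews, terms)   # the user's UIV restricted to `terms`
A = pairwise_matrix(uiv, terms)              # reciprocal matrix: A[i][j] * A[j][i] == 1
```

A term that appears in every document of a user gets $IDF = \log(1) = 0$ and therefore zero weight, which makes the ratio undefined. The sketch raises an error in that case rather than guessing how such terms should be treated.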
In order to proceed with the AHP, assume the following hierarchy:

Figure 1: The AHP hierarchy for the evaluation of Web pages’ relative importance

The relative importance of terms for each user reflects the importance of Web pages for that user, since a Web page is a collection of terms. Thus, the AHP analysis returns the relative importance of each term and, therefore, the relative weight of each Web page for each user. A Web page is modelled as a vector whose elements are the relative importance $rw(t)$ of each term, as resulting from the AHP, weighted by the normalised frequency ($pt_{wp,t_k}$) that term …

Step 4: Website modelling. By considering the UPVs of all Web pages in a website, the User Site Matrix (USM) is formed. Thus, $USM = \{UPV_{(p)}\}$. The USM takes the following form:

$$USM_{u} = \begin{bmatrix} UPV_{u}^{1} \\ UPV_{u}^{2} \\ \vdots \\ UPV_{u}^{wp} \end{bmatrix}$$

where $wp = 1, \dots, p$ indicates the Web pages and $u \in \{1, \dots, n\}$ indicates the users.

The similarity between Web pages is calculated as follows:

$$s_{i,j} = \frac{\sum_{u=1}^{n} upv_{iu}\, upv_{ju}}{\sqrt{\sum_{u=1}^{n} upv_{iu}^{2}}\;\sqrt{\sum_{u=1}^{n} upv_{ju}^{2}}} \tag{3}$$

where $u = 1, \dots, n$ and $i, j = 1, 2, \dots, p$ represent users and Web pages, respectively. Web pages with high similarity values are re-grouped into website layers.

Step 6: Rearrange Web pages into website layers (L). The total number of Web pages (P) in a website is calculated using the following formula:

$$P = \left\lceil \frac{T}{tpp} \right\rceil \tag{4}$$

where $T$ is the total number of terms and $tpp$ is the allowed number of terms per page. Web pages of similar importance are grouped together into layers. Therefore, a layer $L_{WL} = \{WP_i, WP_j, \dots, WP_p\}$, where $L_{WL}$ is the WL-th website layer, consists of a group of Web pages.
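The similarity and sizing steps can be checked against the numbers reported in Section III. The Python sketch below is a minimal illustration, not the authors’ code:
- `cosine_similarity` implements the cosine form of formula (3) for any two equal-length vectors.
- `site_size`, a hypothetical helper, applies the rounding-up rule used in Section III: 13/2 = 6.5 is rounded up to 7 pages, and 7/2 is rounded up to 4 layers. It assumes the same ceiling rule for both counts.

The two vectors are the term-wise contributions of Web page-1 and Web page-2, transcribed from Table 6. Summing each vector reproduces the page importance in the last row of Table 6. Their cosine similarity comes out at about 0.6185, in line with Table 7. This suggests that Section III applies formula (3) across terms for a single user rather than across users, although the text does not say so explicitly.

```python
import math


def cosine_similarity(x: list[float], y: list[float]) -> float:
    """Cosine form of formula (3): sum(x*y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def site_size(total_terms: int, terms_per_page: int, pages_per_layer: int) -> tuple[int, int]:
    """Maximum number of Web pages ceil(T / tpp) and of layers ceil(P / wpl)."""
    pages = math.ceil(total_terms / terms_per_page)
    layers = math.ceil(pages / pages_per_layer)
    return pages, layers


# Term-wise weights of Web page-1 and Web page-2, transcribed from Table 6, in the order:
# furnish, restaurant, bathroom, timeliness, terrace, food, design,
# coffee, balcony, sea, fruits, sleep, understanding.
wp1 = [0.01104046, 0.007714537, 0.005052581, 0.03624731, 0.004095372, 0.009813742,
       0.002232974, 0.000414111, 0.001868893, 0.004019353, 0.002788345,
       0.000624256, 0.000624256]
wp2 = [0.016170371, 0.006779442, 0.007400245, 0.006370739, 0.007197926, 0.020123128,
       0.00392462, 0.001091746, 0.000547453, 0.003532158, 0.000816788,
       0.000548589, 0.000548589]

importance = [sum(wp1), sum(wp2)]    # "sum of terms' weights" (last row of Table 6)
s12 = cosine_similarity(wp1, wp2)    # about 0.6185, cf. Table 7
pages, layers = site_size(13, 2, 2)  # 7 pages and 4 layers, as in Section III
```

Because the ceiling is applied twice, the layer count depends on the rounded page count: 7 pages at two per layer need four layers, not 6.5 / 2 rounded.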
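[Table: … WEIGHTS. Columns: row ID; USER-1 terms; TF; term weight ($W_t$), formula (1)]

$$A\,w = \lambda_{\max}\,w \tag{6}$$

where $w$ represents the principal eigenvector of the matrix $A$ and $\lambda_{\max}$ is its maximum eigenvalue.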
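Formula (6) is usually solved numerically. The Python sketch below approximates the principal eigenvector with power iteration, a standard method for positive matrices. It also reports Saaty’s consistency index $CI = (\lambda_{\max} - k)/(k - 1)$. The surviving text does not state whether the authors computed the eigenvector this way, or which consistency checks they applied, so both are assumptions. The function names and the example weights are hypothetical.

```python
def principal_eigenvector(a: list[list[float]], max_iter: int = 1000,
                          tol: float = 1e-12) -> tuple[list[float], float]:
    """Approximate A w = lambda_max w (formula (6)) by power iteration.

    Returns the eigenvector normalised to sum 1 and the estimate of lambda_max.
    Assumes A is a positive matrix, e.g. a reciprocal pairwise comparison matrix.
    """
    k = len(a)
    w = [1.0 / k] * k                      # start from uniform weights (sum = 1)
    lam = float(k)
    for _ in range(max_iter):
        aw = [sum(a[i][j] * w[j] for j in range(k)) for i in range(k)]
        lam = sum(aw)                      # since sum(w) = 1, sum(A w) tends to lambda_max
        new_w = [x / lam for x in aw]      # renormalise so the weights sum to 1
        converged = max(abs(x - y) for x, y in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w, lam


def consistency_index(lam_max: float, k: int) -> float:
    """Saaty's consistency index CI = (lambda_max - k) / (k - 1)."""
    return (lam_max - k) / (k - 1)


# Hypothetical term weights, turned into a pairwise matrix exactly as in Step 3.
weights = [0.5, 0.3, 0.2]
A = [[wi / wj for wj in weights] for wi in weights]
w, lam = principal_eigenvector(A)
ci = consistency_index(lam, len(A))
```

A matrix built exactly as in Step 3, i.e. from ratios of TF-IDF weights, is perfectly consistent. In that case $A = w\,(1/w)^{\top}$ has rank one, so $\lambda_{\max} = k$ and $CI = 0$. The normalised eigenvector then simply returns the weights divided by their sum, as the example shows. For any positive reciprocal matrix, $\lambda_{\max} \ge k$, with equality only when the matrix is consistent.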
The application of the AHP returns the relative weight of each term. The AHP weights are shown in Table 3.

TABLE III. AHP TERMS’ RELATIVE WEIGHTS

| Terms | AHP weights |
|---|---|
| furnish | 0.106724446 |
| restaurant | 0.223721576 |
| bathroom | 0.04884162 |
| timeliness | 0.210234397 |
| terrace | 0.118765775 |
| food | 0.094866174 |
| design | 0.064756238 |
| coffee | 0.012009207 |
| balcony | 0.018065962 |
| sea | 0.038853743 |
| fruits | 0.026953998 |
| sleep | 0.018103432 |
| understanding | 0.018103432 |

Next, assume a subset of the terms’ frequencies for each Web page, as shown in Table 4. These data are collected by counting the number of times each term appears on each Web page. For example, the term “furnish” appears 3 times in Web page-1, 5 times in Web page-2, etc.

TABLE IV. THE FREQUENCIES FOR EACH TERM PER WEB PAGE

| Terms | Wp-1 | Wp-2 | Wp-3 | Wp-4 | Wp-5 |
|---|---|---|---|---|---|
| furnish | 3 | 5 | 1 | 0 | 3 |
| restaurant | 1 | 1 | 0 | 1 | 3 |
| bathroom | 3 | 5 | 2 | 1 | 7 |

Table 5 shows the normalised frequencies of each term per Web page. Each normalised frequency is obtained by dividing a term’s frequency of appearance on a Web page by the sum of the frequencies of all terms on that Web page.

TABLE V. TERMS’ RELATIVE NORMALISED FREQUENCIES PER WEB PAGE

| Terms | Wp-1 | Wp-2 | Wp-3 | Wp-4 | Wp-5 |
|---|---|---|---|---|---|
| furnish | 0.103448276 | 0.151515 | 0.034483 | 0 | 0.103448276 |
| restaurant | 0.034482759 | 0.030303 | 0 | 0.045455 | 0.103448276 |
| bathroom | 0.103448276 | 0.151515 | 0.068966 | 0.045455 | 0.24137931 |

Next, the importance of each Web page is calculated from the terms’ relative normalised frequencies and the terms’ AHP weights, using formulas (2) and (9). The results are shown in Table 6.

TABLE VI. THE WEB PAGES’ IMPORTANCE BASED ON TERMS’ RELATIVE IMPORTANCE

| Terms | Wp-1 | Wp-2 | Wp-3 | Wp-4 | Wp-5 |
|---|---|---|---|---|---|
| furnish | 0.01104046 | 0.016170371 | 0.00368 | 0 | 0.01104046 |
| restaurant | 0.007714537 | 0.006779442 | 0 | 0.010169 | 0.023143611 |
| bathroom | 0.005052581 | 0.007400245 | 0.003368 | 0.00222 | 0.011789357 |
| timeliness | 0.03624731 | 0.006370739 | 0.007249 | 0.009556 | 0.007249462 |
| terrace | 0.004095372 | 0.007197926 | 0.004095 | 0.005398 | 0.004095372 |
| food | 0.009813742 | 0.020123128 | 0.009814 | 0.008624 | 0.003271247 |
| design | 0.002232974 | 0.00392462 | 0.008932 | 0.014717 | 0.002232974 |
| coffee | 0.000414111 | 0.001091746 | 0.001242 | 0.001638 | 0.002070553 |
| balcony | 0.001868893 | 0.000547453 | 0.003115 | 0.000821 | 0.000622964 |
| sea | 0.004019353 | 0.003532158 | 0.006699 | 0.005298 | 0.004019353 |
| fruits | 0.002788345 | 0.000816788 | 0.000929 | 0.001225 | 0.000929448 |
| sleep | 0.000624256 | 0.000548589 | 0.001249 | 0.001646 | 0.000624256 |
| understanding | 0.000624256 | 0.000548589 | 0.000624 | 0.000823 | 0.000624256 |
| Web pages’ importance (formula (9), i.e. sum of terms’ weights) | 0.086536189 | 0.075051795 | 0.050997 | 0.062136 | 0.071713313 |

Drawing on the Web pages’ relative importance, their similarity is calculated using formula (3). The similarity degrees are shown in Table 7.

TABLE VII. WEB PAGES’ IMPORTANCE SIMILARITIES

| Similarity degrees | Wp-1 | Wp-2 | Wp-3 | Wp-4 | Wp-5 |
|---|---|---|---|---|---|
| Wp-1 | 1 | 0.618507331 | 0.662001567 | 0.627411733 | 0.568705 |
| Wp-2 | 0.618507331 | 1 | 0.777926514 | 0.624975126 | 0.659007 |
| Wp-3 | 0.662001567 | 0.777926514 | 1 | 0.568704978 | 0.431098 |
| Wp-4 | 0.627411733 | 0.624975126 | 0.568704978 | 1 | 0.623204 |
| Wp-5 | 0.568704978 | 0.659006777 | 0.43109794 | 0.623203947 | 1 |

Drawing on the importance and similarity degrees shown in Tables 6 and 7, respectively, Web pages are grouped into layers. Assume that the allowed number of terms per page is tpp = 2 and the total number of terms is T = 13. Applying formula (4), the maximum number of Web pages is 13/2 = 6.5, rounded up to 7. Applying formula (5) with wpl = 2 pages per layer, the maximum number of layers is 7/2 = 3.5, which rounded up gives a maximum of 4 layers in
the website. Results show that Web page-1 is the most important, followed by Web page-2 and Web page-5. However, Web page-5 is the least important of the three. Since the maximum number of pages per layer (wpl) is set at two (2), it is arranged at a hierarchical level below in the website. Web page-3 is grouped with Web page-2, since the two are more similar to each other than to the other pages (similarity 0.7779 in Table 7). Thus, Web page-3 is linked to Web page-2, but in a layer below, due to the wpl limitation. Similarly, Web page-4 is linked with Web page-1 and placed in the layer below. The resulting website structure is shown in Figure 2.

Figure 2: The resulting Web structure

In the same way, similarities among terms are calculated so that terms can be re-arranged accordingly, i.e. removed from one page and linked with another. By calculating the similarities between terms and among Web pages, terms can be grouped and re-grouped dynamically so that the content of Web pages changes. This manipulates each page’s importance in a flexible way and produces alternative website designs.

IV. CONCLUSIONS

UGC provides a rich source of information regarding user preferences. Content personalisation and presentation personalisation rely on understanding and modelling users’ interests. This paper suggests that multi-criteria approaches can be used in conjunction with similarity methods to analyse text indices such as the TF-IDF. The proposed approach utilises the AHP to calculate the relative importance of terms and, subsequently, of the associated Web pages. Based on their importance and similarities, terms and Web pages can be re-arranged, producing Web structures that dynamically adapt to user preferences. As soon as user interests change and these changes can be traced in UGC, the proposed approach recalculates the importance and similarity degrees and adapts the Web design. Future research can focus on calculating similarities among users and adopting recommender system technologies and methods in the Web design domain.

REFERENCES

[1] “Internet Web Stats Web Users (2018),” 2018. [Online]. Available: https://www.internetworldstats.com/stats.htm. Last viewed 10/5/2018.
[2] “Internet Web Stats websites (2018),” 2018. [Online]. Available: http://www.internetlivestats.com/total-number-of-websites/. Last viewed 10/5/2018.
[3] A. Bunt, G. Carenini, and C. Conati, “Adaptive content presentation for the web,” in The Adaptive Web, Springer, 2007, pp. 409–432.
[4] V. Baka, “The becoming of user-generated reviews: Looking at the past to understand the future of managing reputation in the travel sector,” Tour. Manag., vol. 53, pp. 148–162, 2016.
[5] A. Hawalah and M. Fasli, “Dynamic user profiles for web personalisation,” Expert Syst. Appl., vol. 42, no. 5, pp. 2547–2569, 2015.
[6] S. Saleheen and W. Lai, “UIWGViz: An architecture of user interest-based web graph vizualization,” J. Vis. Lang. Comput., vol. 44, pp. 39–57, 2018.
[7] B. Magnini and C. Strapparava, “Improving user modelling with content-based techniques,” in International Conference on User Modeling, 2001, pp. 74–83.
[8] A. I. El-Dsouky, H. A. Ali, and R. S. Rashed, “Ranking documents based on the semantic relations using analytical hierarchy process,” Int. J. Adv. Comput. Sci. Appl., vol. 7, no. 2, pp. 164–173, 2016.
[9] N. B. Moghaddam, M. Sahafzadeh, A. S. Alavijeh, H. Yousefdehi, and S. H. Hosseini, “Strategic environment analysis using DEMATEL method through systematic approach: Case study of an energy research institute in Iran,” Manag. Sci. Eng., vol. 4, no. 4, p. 95, 2010.
[10] T. L. Saaty, “On the measurement of intangibles. A principal eigenvector approach to relative measurement derived from paired comparisons,” Not. Am. Math. Soc., vol. 60, no. 2, pp. 192–208, 2013.
[11] T. L. Saaty, Decision Making for Leaders: The Analytic Hierarchy Process for Decisions in a Complex World. RWS Publications, 1990.
[12] T. L. Saaty, “Axiomatic foundation of the analytic hierarchy process,” Manage. Sci., vol. 32, no. 7, pp. 841–855, 1986.
[13] T. L. Saaty, The Analytical Hierarchy Process: Planning, Setting Priorities, Resource Allocation. New York: McGraw-Hill International Book Co., 1980.
[14] M.-K. Chen and S.-C. Wang, “The critical factors of success for information service industry in developing international market: Using analytic hierarchy process (AHP) approach,” Expert Syst. Appl., vol. 37, no. 1, pp. 694–704, 2010.
[15] O. S. Vaidya and S. Kumar, “Analytic hierarchy process: An overview of applications,” Eur. J. Oper. Res., vol. 169, no. 1, pp. 1–29, 2006.
[16] D.-H. Byun, “The AHP approach for selecting an automobile purchase model,” Inf. Manag., vol. 38, no. 5, pp. 289–297, 2001.
[17] E. W. T. Ngai, “Selection of websites for online advertising using the AHP,” Inf. Manag., vol. 40, no. 4, pp. 233–242, 2003.
[18] U. Cebeci, “Fuzzy AHP-based decision support system for selecting ERP systems in textile industry by using balanced scorecard,” Expert Syst. Appl., vol. 36, no. 5, pp. 8900–8909, 2009.
[19] U. S. Bititci, P. Suwignjo, and A. S. Carrie, “Strategy management through quantitative modelling of performance measurement systems,” Int. J. Prod. Econ., vol. 69, no. 1, pp. 15–22, 2001.