Expert Systems with Applications xxx (2007) xxx–xxx
www.elsevier.com/locate/eswa
Department of Computer and Communication Engineering, St. John’s University, Taipei, 499, Sec. 4, TamKing Road, Tamsui,
Taipei County 25135, Taiwan, ROC
Abstract
In this paper, we advocate the use of ontology-supported website models to provide a semantic level solution for a search engine so
that it can provide fast, precise and stable search results with a high degree of user satisfaction. A website model contains a website profile
along with a set of webpage profiles. The former remembers the basic information of a website, while the latter contains the basic infor-
mation, statistics information, and ontology information about each webpage stored in the website. Based on the concept, we have devel-
oped a Search Agent which manifests the following interesting features: (1) Ontology-supported construction of website models, by
which we can attribute correct domain semantics into the Web resources collected in the website models. One important technique used
here is ontology-supported classification (OntoClassifier). Our experiments show that the OntoClassifier performs very well in obtaining
accurate and stable webpage classification to support correct annotation of domain semantics. (2) Website models-supported Website
model expansion, by which we can collect Web resources based on both user interests and domain specificity. The core technique here is a
Focused Crawler which employs progressive strategies to do user query-driven webpage expansion, autonomous website expansion, and
query results exploitation to effectively expand the website models. (3) Website models-supported Webpage Retrieval, by which we can
leverage the power of ontology features as a fast index structure to locate most-needed webpages for the user.
© 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2007.09.024
Please cite this article in press as: Yang, S.-Y., An ontological website models-supported search agent for web services, Expert Systems
with Applications (2007), doi:10.1016/j.eswa.2007.09.024
ARTICLE IN PRESS
indexing jungle (see Fig. 2 for an example). Current domain-specific search engines do help users to narrow down the search scope by the techniques of query expansion, automatic classification and focused crawling; their weakness, however, is almost completely ignoring the user interests (Wang, 2003).

In general, current search engines face two fundamental problems. First, the index structures are usually very different from what the user conjectures about his problems. Second, the classification/clustering mechanisms for data hardly reflect the physical meanings of the domain concepts. These problems stem from a more fundamental problem: lack of semantic understanding of Web documents. New standards for representing website documents, including XML (Henry, David, Murray, & Noah, 2001), RDF (Brickley & Guha, 2004), DOM (Arnaud et al., 2004), Dublin metatag (Weibel, 1999), and WOM (Manola, 1998), can help cross-reference of Web documents; they alone, however, cannot help the user at any semantic level during the searching of website information. OIL (2000), DAML (2003), DAML+OIL (2001), and the concept of ontology stand for a possible rescue to the attribution of information semantics. In this paper, we advocate the use of ontology-supported website models (Yang,
2006) to provide a semantic level solution for a search engine so that it can provide fast, precise and stable search results with a high degree of user satisfaction.

Basically, a website model consists of a website profile for a website and a set of webpage profiles for the webpages contained in the website. Each webpage profile, reflecting a webpage, describes how the webpage is interpreted by the domain ontology, while a website profile describes how a website is interpreted by the semantics of the contained webpages. The website models are closely connected to the domain ontology, which supports the following functions used in website model construction and application: query expansion, webpage annotation, webpage/website classification (Yang, 2006), and focused collection of domain-related and user-interested Web resources (Yang, 2006). We have developed a Search Agent using website models as the core technique, which helps the agent successfully tackle the problems of search scope and user interests. Our experiments show that the Search Agent can locate, integrate and update both domain-related and user-interested Web resources in the website models for ready retrieval.

The personal computer (PC) domain is chosen as the target application of our Search Agent and will be used for explanation in the remaining sections. The rest of the paper is organized as follows. Section 2 develops the domain ontology. Section 3 describes website models and how they are constructed. Section 4 illustrates how website models can be used to do better Web search. Section 5 describes the design of our Search Agent and reports how it performs. Section 6 discusses related works, while Section 7 concludes the work.

2. Domain ontology as the first principles

Ontology is a method of conceptualization on a specific domain (Noy & Hafner, 1997). It plays diverse roles in developing intelligent systems, for example, knowledge sharing and reuse (Decker et al., 2000, 1998), semantic analysis of languages (Moldovan & Mihalcea, 2000), etc. Development of an ontology for a specific domain is not yet an engineering process, but it is clear that an ontology must include descriptions of explicit concepts and their relationships for a specific domain (Ashish & Knoblock, 1997). We have outlined a principled construction procedure in Yang and Ho (1999); following the procedure, we have developed an ontology for the PC domain. Fig. 3 shows part of the PC ontology taxonomy. The taxonomy represents relevant PC concepts as classes and their parent–child relationships as isa links, which allow inheritance of features from parent classes to child classes. We then carefully selected those properties of each concept that are most related to our application and used them as attributes to define the corresponding class. Fig. 4 exemplifies the definition of the ontology class "CPU". In the figure, the uppermost node uses various fields to define the semantics of the CPU class, each field representing an attribute of "CPU", e.g., interface, provider, synonym, etc. The nodes at the bottom level represent various CPU instances that capture real-world data. The arrow line labeled "io" denotes the instance-of relationship. Our ontology construction tool is Protégé 2000 (Noy & McGuinness, 2001) and the complete PC ontology can be referenced from the Protégé Ontology Library at the Stanford website (http://protege.stanford.edu/ontologies.html). Fig. 5 demonstrates what the ontology looks like in Protégé 2000, where the left column represents the taxonomy hierarchically and the right column contains the respective attributes of a selected class node. The example shows that the CPU ontology contains synonyms, along with a bunch of attributes and constraints on their values. Although the domain ontology was developed in Chinese (rendered in English here for easy explanation), the corresponding English names are treated as synonyms and can be processed by our system too.

In order to facilitate Web search, the domain ontology was carefully pre-analyzed with respect to how attributes are shared among different classes and then re-organized into Fig. 6. Each square node in the figure contains a set of representative ontology features for a specific class, while each oval node contains related ontology features between two classes. The latter represents a new node type called "related concept". We select representative ontology features for a specific class by first deriving a set of
[Fig. 3. Part of the PC ontology taxonomy: classes such as Hardware connected to their children by isa links.]
[Fig. 4. Definition of the ontology class "CPU": the class node carries fields such as Synonym (Central Processing Unit), D-Frequency (String), Interface (Instance* CPU Slot), L1 Cache (Instance Volume Spec.) and Abbr. (Instance CPU Spec.); io links connect it to instance nodes such as XEON, THUNDERBIRD 1.33G, DURON 1.2G, PENTIUM 4 2.0AGHZ, PENTIUM 4 1.8AGHZ, CELERON 1.0G and PENTIUM 4 2.53AGHZ, each with attribute values (Factory, Interface, L1 Cache, Clock, Synonym, Abbr., etc.).]
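The class–instance structure of Fig. 4 can be sketched as a simple data structure. The sketch below is illustrative only: the field names and instance values follow the figure, while the Python representation itself is our assumption, not the paper's implementation.

```python
# Illustrative sketch of the Fig. 4 structure: an ontology class with
# attribute fields and "io" (instance-of) links to instance nodes.
from dataclasses import dataclass, field

@dataclass
class OntoClass:
    name: str
    synonyms: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)   # attribute -> value spec
    instances: list = field(default_factory=list)    # "io" links to instances

cpu = OntoClass("CPU",
                synonyms=["Central Processing Unit"],
                attributes={"Interface": "CPU Slot", "L1 Cache": "Volume Spec."})
# An instance node captures real-world data, as in the figure:
cpu.instances.append({"name": "DURON 1.2G", "Factory": "AMD",
                      "Interface": "Socket A", "L1 Cache": "64KB"})
```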
[Fig. 6. Re-organized ontology: square nodes (e.g., PC Hardware, Modem, SCSI Card, Motherboard) hold the representative features of a class; oval "Related Concept" nodes hold features shared between two classes; links mark reference class and superclass relations.]
[Figure: structure of the website models — each website profile links to a series of webpage profiles (DocNo#11, DocNo#12, ..., DocNo#189, DocNo#190; DocNo#21, DocNo#22, ..., DocNo#290, DocNo#291).]
tion for both basic information and statistics information sections of a webpage profile. It also transforms the webpage into a list of words (pure text) for further processing by OntoAnnotator. Specifically, DocPool contains webpages retrieved from the Web. HTML Analyzer analyzes the HTML structure to extract the URL, title texts, anchor texts and heading texts, and to calculate tag-related statistics for the website models. HTML TAG Filter removes HTML tags from the webpage, deletes stop words using the 500 stop words we developed from McCallum (1996), and performs word stemming and standardization. Document Parser transforms the stemmed, tag-free webpage into a list of words for further processing by OntoAnnotator.

Fig. 10 illustrates the architecture of OntoAnnotator. Inside the architecture, OntoClassifier uses the ontology to classify a webpage, and Annotator uses the ontology to annotate ontology features with their term frequencies for each class according to how often they appear in the webpage. Domain Marker uses Table 2 to determine whether the webpage is relevant to the domain. The Condition column in the table means the number of classes appearing in the webpage, and the Limit column specifies a minimal threshold on the average number of features of
the class which must appear in the webpage. For example, Level Threshold = 1 …

[Fig. 10. Architecture of OntoAnnotator, showing the Ontology and the derived ontology information.]
This level-related weighting mechanism will give a higher weight to the representative features than to the related features. The second stage of classification is defined by Eq. (3). Inside the equation, OntoTFIDF(d, C) is defined by Eq. (4), which calculates a TFIDF score for each class C with respect to d according to the terms appearing in both d and C, where TF(x|y) means the number of appearances of word x in y. Eq. (3) is used to create a list of class:score pairs for d and finally selects the one with the highest TFIDF score as the class for webpage d.

H_OntoTFIDF(d) = argmax_{C' ∈ 𝒞} OntoTFIDF(d, C')    (3)

OntoTFIDF(d, C) = Σ_{w ∈ d} (1/L_w) · [TF(w|C) / Σ_{w' ∈ F_C} TF(w'|C)] · [TF(w|d) / Σ_{w' ∈ F_C} TF(w'|d)]    (4)

4. Website models application

The basic goal of the website models is to help Web search take into account both user interest and domain dependence. Section 4.1 explains how this can be achieved. The second goal is to help fast retrieval of webpages stored in the website models for the user. Section 4.2 explains how this is done.

4.1. Focused web crawling supported by website models

In order to effectively use the website models to narrow down the search scope, we propose a new focused crawler as shown in Fig. 12, which features a progressive crawling strategy in obtaining domain-relevant Web information. Inside the architecture, Web Crawler gathers data from the Web. DocPool was mentioned before; it stores all returned Web pages from Web Crawler for DocExtractor during the construction of webpage profiles. It also stores query results from search engines, which usually contain a list of URLs. URLExtractor is responsible for extracting URLs from the query results and dispatching those URLs that are domain-dependent but not yet in the website models to Distiller. User-Oriented Webpage Expander pinpoints interesting URLs in the website models for further webpage expansion according to the user query. Autonomous Website Evolver autonomously discovers URLs in the website models that are domain-dependent for further webpage expansion. Since these two types of URLs are both derived from website models, we call them website model URLs in the figure. User Priority Queue stores the user search strings and the website model URLs from User-Oriented Webpage Expander. Website Priority Queue stores the website model URLs from Autonomous Website Evolver and the URLs extracted by URLExtractor.

Distiller controls the Web search by associating a priority score with each URL (or search string) using Eq. (5) and placing it in a proper Priority Queue. Eq. (5) defines ULScore(U, F) as the priority score for each URL (or search string).

ULScore(U, F) = W_F · S_F(U)    (5)

where U represents a URL or search string, and F identifies the way U is obtained as shown in Table 3, which also assigns to each F a weight W_F and a score S_F(U). Thus, if F = 1, i.e., U is a search string, then W_1 = 3 and S_1(U) = 100, which implies all search strings are treated as top-priority requests. As for F = 3, if U is new to the website models, S_3(U) is set to 1 by URLExtractor; otherwise it is set to 0.5. Finally, for F = 2, the URLs may come from User-Oriented Webpage Expander or Autonomous Website Evolver. In the former case, we follow the algorithm in Fig. 13 (to be explained in Section 4.1.1) to calculate S_2(U) for each U. The assignment of
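The two-stage scoring of Eqs. (3) and (4) can be sketched in a few lines, assuming a class is represented by the term frequencies TF(w|C) of its ontology features and that L_w is the ontology level of term w (defaulting to 1 when unknown). The function and argument names below are ours, not the paper's.

```python
# Minimal sketch of OntoTFIDF classification (Eqs. 3 and 4), under the
# assumptions stated above.
from collections import Counter

def onto_tfidf(doc_words, class_tf, levels=None):
    """Eq. (4): sum over words w in d of
    (1/L_w) * [TF(w|C)/sum TF(w'|C)] * [TF(w|d)/sum TF(w'|d)],
    where both inner sums range over the class feature set F_C."""
    levels = levels or {}
    tf_d = Counter(doc_words)
    denom_c = sum(class_tf.values())            # sum of TF(w'|C) over F_C
    denom_d = sum(tf_d[w] for w in class_tf)    # sum of TF(w'|d) over F_C
    if not denom_c or not denom_d:
        return 0.0
    return sum((1.0 / levels.get(w, 1)) *
               (class_tf[w] / denom_c) *
               (tf_d[w] / denom_d)
               for w in tf_d if w in class_tf)

def onto_classify(doc_words, classes, levels=None):
    """Eq. (3): pick the class C' with the highest OntoTFIDF(d, C')."""
    return max(classes, key=lambda c: onto_tfidf(doc_words, classes[c], levels))
```

For instance, a word list dominated by CPU features would be assigned to a CPU class rather than a Modem class, because only the CPU class contributes nonzero normalized term-frequency products.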
[Fig. 12. Architecture of the focused crawler: Web Crawler feeds DocPool; DocExtractor and OntoAnnotator build webpage profiles; query results from Web search engines pass through URLExtractor; Distiller routes user search strings and website model URLs (from User-Oriented Webpage Expander and Autonomous Website Evolver) into the User Priority Queue and Website Priority Queue, backed by the Website Models.]
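The Distiller's bookkeeping around Eq. (5) can be sketched with two max-priority queues. The values for F = 1 and F = 3 follow the text; W_2, W_3 and the queue routing are illustrative assumptions, since Table 3 is not reproduced in this excerpt, and the URLs are hypothetical.

```python
# Sketch of Distiller's priority assignment (Eq. 5): ULScore(U,F) = W_F * S_F(U).
import heapq

W = {1: 3, 2: 1, 3: 1}  # W_1 = 3 per the text; W_2 and W_3 are assumed

def ul_score(f, s_u):
    """Eq. (5): combine the source-type weight W_F with the score S_F(U)."""
    return W[f] * s_u

user_queue, website_queue = [], []   # heapq is a min-heap, so push -score

def distill(queue, u, f, s_u):
    heapq.heappush(queue, (-ul_score(f, s_u), u))

# A search string (F=1) gets S_1 = 100, hence top priority (score 300):
distill(user_queue, "cpu socket 478", f=1, s_u=100)
# F=3 URLs from query results: S_3 = 1 if new to the models, else 0.5:
distill(website_queue, "http://example.com/new", f=3, s_u=1)
distill(website_queue, "http://example.com/seen", f=3, s_u=0.5)
top_url = heapq.heappop(website_queue)[1]  # the new URL comes out first
```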
AnE_Score(U, Q)
{
    For each Term T in AnTerm(U)
        If UserTL(Q) contains T
            AnE_Score = AnE_Score + 1
    Return AnE_Score
}

Expanding_URL(U, Score)
{
    Add U and Score to URL_List
}

Fig. 13. User-oriented webpage expansion strategy supported by the website models.

Table 4
Example of direct query expansion

Original user query: Mainboard CPU Socket KV133 ABIT

Expanded user query:
Mainboard → Motherboard
CPU → Central process unit, processor
Socket → Slot, connector
KV133 → (no expansion)
ABIT → (no expansion)
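Fig. 13's anchor-term count and the direct expansion of Table 4 can be made concrete as below. The synonym table is a tiny illustrative stand-in for the ontology's synonym fields, and the function names are ours.

```python
# Runnable sketch of Fig. 13's AnE_Score: count how many anchor-text terms
# of URL U also occur in the user's expanded term list for query Q.
def ane_score(anchor_terms, user_terms):
    return sum(1 for t in anchor_terms if t in user_terms)

# Direct query expansion in the spirit of Table 4: each query keyword is
# widened with its ontology synonyms; unknown keywords pass through unchanged.
SYNONYMS = {
    "mainboard": ["motherboard"],
    "cpu": ["central process unit", "processor"],
    "socket": ["slot", "connector"],
}

def expand_query(query):
    terms = []
    for kw in query.lower().split():
        terms.append(kw)
        terms.extend(SYNONYMS.get(kw, []))
    return terms

expanded = expand_query("Mainboard CPU Socket KV133 ABIT")
# "kv133" and "abit" have no synonyms, matching Table 4's empty rows
```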
Website Evolver employs a four-phase progressive strategy to autonomously expand the website models. The first phase uses Eq. (7) to calculate S_2(U) for each hyperlink U which is referred to by website S and recognized to be in S from its URL address, but whose hyperlinked webpage is not yet collected in S.

S_2(U) = Σ_{C ∈ D} R_{S,D} · (1 − P_{S,D}(C)),  U ∈ S and P_{S,D}(C) ≠ 0    (7)

where C is a concept of domain D, R_{S,D} was defined in Eq. (6), and P_{S,D} is defined by Eq. (8). P_{S,D}(C) measures the proportion of concept correlation of website S with respect to concept C of domain D. N_{S,C} refers to the number of webpages talking about domain concept C on website S. Fig. 14 shows the algorithm for calculating N_{S,C}. In short, P_{S,D}(C) measures how strongly a website is related to a specific domain concept.

P_{S,D}(C) = N_{S,C} / N_{S,D}    (8)

Literally, Eq. (7) assigns a higher score to U if U belongs to a website S which has a higher degree of domain correlation R_{S,D} but covers fewer domain concepts, i.e., lower P_{S,D}(C).

Calculating_NS,C(S, C)
{
    For each webpage P in OntoPage(S)
    {
        If OntoCon(THW, P) contains C
            NS,C = NS,C + 1
    }
    Return NS,C
}

Fig. 14. Algorithm for calculating N_{S,C}.

Fig. 15 illustrates how this strategy works. It shows that webpages I and J will become the first choices for priority score calculation by the first phase. In the figure, the upper nodes represent the webpages stored in the website models and the lower nodes represent the webpages whose URLs are hyperlinked in the upper nodes. Nodes in dotted circles, e.g., nodes I and J, represent webpages not yet stored in the website models that need to be collected. The figure shows that webpage J, referred to by webpage A in website 1, belongs to website 1 but is not yet collected into its website model. Similarly, webpage I, referred to by webpage E in website 2, has to be collected into its website model. Note that webpage I is also referred to by webpage C in website 1.

[Fig. 15. Basic operation of the first phase (Website1 URL range and Website2 URL range).]

In summary, the first phase prefers to expand the websites that are well profiled in the website models but have less coverage of domain concepts.

The first phase is good at collecting more webpages for well-profiled websites; it cannot help with unknown websites, however. Our second phase goes a step further by searching for webpages that can help define a new website profile. In this phase, we exploit URLs that are in the website models but belong to some unknown website profile. We use Eq. (9) to calculate S_2(U) for each outbound hyperlink U of a webpage that is stored in an indefinite website profile.

S_2(U) = Anchor(U, D)    (9)

where function Anchor(U, D) gives outbound link U a weight according to how many terms in the anchor text of U belong to domain D.

Fig. 16 illustrates how this strategy works. Phase 2 will choose hyperlink H for priority score calculation. The unknown website X represents a website profile which contains insufficient webpages to make clear its relationships to some domains. Thus, the second phase prefers to expand those webpages that can help bring in more information to complete the specification of indefinite website profiles.

[Fig. 16. Basic operation of the second phase (Website1 URL range with an outbound link to the Unknown Website X URL range).]

In the third phase, we relax one more constraint: the condition of unknown website profiles. We exploit any URLs as long as they are referred to by some webpages in the website models. We use Eq. (10) to calculate S_2(U) for each outbound hyperlink U that is referred to by any webpage in the website models. This equation heavily relies on the anchor texts to determine which URLs should receive higher priority scores.
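Under the assumption that a website's domain correlation R_{S,D} and the counts N_{S,C}, N_{S,D} have already been profiled, Eqs. (7)–(9) reduce to a few lines; the function and argument names below are ours.

```python
# Sketch of the phase-1 and phase-2 priority scores (Eqs. 7-9).
def p_sd(n_sc, n_sd):
    """Eq. (8): proportion of webpages of site S about concept C."""
    return n_sc / n_sd

def s2_phase1(r_sd, n_sc_by_concept, n_sd):
    """Eq. (7): favor sites with high domain correlation R_{S,D}
    but low coverage of each concept C (skipping P_{S,D}(C) = 0)."""
    return sum(r_sd * (1 - p_sd(n, n_sd))
               for n in n_sc_by_concept.values() if n)

def s2_phase2(anchor_terms, domain_terms):
    """Eq. (9): Anchor(U, D) read as a count of domain terms
    in U's anchor text (one possible weighting)."""
    return sum(1 for t in anchor_terms if t in domain_terms)

# Toy profile: R_{S,D} = 0.8, 4 domain pages, 2 about CPU and 1 about Modem:
score = s2_phase1(0.8, {"CPU": 2, "Modem": 1}, 4)
# contributions: 0.8*(1 - 0.5) + 0.8*(1 - 0.25) = 0.4 + 0.6
```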
Fig. 17 illustrates how the strategy works. It will select all …

[Figure: decision node "Need to invoke phase 1–3?" with Y/N branches.]
… + W_QO · M_QO(P) · P_TH(Q, P) · Σ_{T ∈ Onto(TH_U, Q)} P_{S,D}(T)    (12)

P_{S,D}(T) = N_{S,T} / N_{S,D}    (13)

P_TH(Q, P) = M_TH(Q, P) / M_TH(Q)    (14)

W_QU + W_QO = 1    (15)

[Fig. 21. Ontology-centric search agent architecture: the User Interface passes user search strings as a query to Webpage Retrieval, which draws on the Ontology, the Website Models and the model information to return an answer.]
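Reading M_TH(Q) as the number of threshold terms of query Q and M_TH(Q, P) as how many of them page P matches (our reading of the symbols; the leading part of Eq. (12) lies outside this excerpt), Eqs. (13)–(15) amount to:

```python
# Sketch of the retrieval-scoring components in Eqs. (13)-(15).
def p_sd_t(n_st, n_sd):
    """Eq. (13): P_{S,D}(T) = N_{S,T} / N_{S,D}."""
    return n_st / n_sd

def p_th(m_q_p, m_q):
    """Eq. (14): proportion of query threshold terms matched by page P."""
    return m_q_p / m_q

# Eq. (15): the two query weights are complementary (the 0.6 is assumed).
W_QU = 0.6
W_QO = 1 - W_QU
```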
[Fig. 25. Classification performance of OntoClassifier: accuracy versus number of features (10–100) for the classes CPU, Motherboard, Graphic Card, Sound Card, Network Card, SCSI Card, Optical Drive, Monitor, Hard Drive and Modem.]

(2003), we have reported a performance comparison between OntoClassifier and three other similar classifiers, namely, O-PrTFIDF (Joachims, 1997), T-PrTFIDF (Ting, 2000), and D-PrTFIDF (Wang, 2003). All three classifiers and their respective feature selection methods were re-implemented. We found that none of these three classifiers can match the performance of OntoClassifier with respect to either classification accuracy or classification stability.

To verify that the superior performance of OntoClassifier is not due to overfitting, we used 1/3, 1/2, and 2/3 of the collected webpages, respectively, for training in each class to select the ontology features, and used all webpages of the class for testing. Table 8 shows how OntoClassifier behaves with respect to different ratios of training samples. The number-of-features column gives the number of ontology features used in each class. It does show that the superior accuracy performance can be obtained even with 1/3 of the webpages for training.

5.4. Do ontology features work for any classifiers?

The second experiment is to learn whether the superior performance of OntoClassifier is purely due to ontology features. In other words, can the ontology features work for other classifiers too? For this purpose, this experiment uses the same set of ontology features derived for OntoClassifier to test the performance of O-PrTFIDF, D-PrTFIDF and T-PrTFIDF, the three classifiers mentioned before. This time we only used 1/3 of the collected Web pages for training the ontology features and used all Web pages for testing. To make each classifier work best, we allow each classifier to arbitrarily choose the first 40 features that make it work best. We limit the number to 40 because the class SCSI only has 38 features. Fig. 26 illustrates how the four classifiers work for the classes CPU, Motherboard, Graphic Card, and Sound Card.

We note that the O-PrTFIDF and D-PrTFIDF classifiers are the most unstable among the four with respect to different numbers of features. The T-PrTFIDF classifier works rather well except for larger numbers of features, because T-PrTFIDF was designed to work based on ontology (Ting, 2000). Its computation complexity is greater than OntoClassifier's, though. From this experiment, we learn that ontology features alone do not work for just any classifier; the ontology features work best for those classifiers that are crafted by taking into account how to leverage the power of ontology. OntoClassifier is such a classification mechanism.

5.5. User-satisfaction evaluation of system prototype

Table 9 shows the comparison of user satisfaction with our system prototype against other search engines. In the table, ST, for satisfaction of testers, represents the average of satisfaction responses from 10 ordinary users, while SE, for satisfaction of experts, represents the average of satisfaction responses from 10 experts. Basically, each search engine receives 100 queries and returns the first 100 webpages for evaluation of satisfaction by both experts and non-experts. The table shows that our system prototype with the techniques described above, the last row, enjoys the highest satisfaction in all classes. From the evaluation, we conclude that, unless the compared search engines are specifically tailored to this specific domain, such as HotBot and Excite, our system prototype, in general, retrieves more correct webpages in almost all classes.

6. Related works

We notice that ontology is mostly used in systems that work on information gathering or classification to improve their gathering processes or the search results
Table 8
Classification performance of OntoClassifier under different ratios of training samples

Class           1/3 training data       1/2 training data       2/3 training data
                # Feat.   Acc. (%)      # Feat.   Acc. (%)      # Feat.   Acc. (%)
CPU               69        97            78        100           82        100
Motherboard       81       100            89        100           89        100
Graphic card      61       100            73        100           77        100
Sound card        73        98            73         99           89         99
Network card      53        94            60         98           64        100
SCSI card         38        93            48         98           50         98
Optical drive     73        90            82         94           87         94
Monitor           69       100            74        100           75        100
Hard drive        39        99            44         98           50         99
Modem             64       100            66        100           66        100
[Fig. 26. Do ontology features work for any classifiers? Accuracy versus number of features (10–40) for O-PrTFIDF, D-PrTFIDF, T-PrTFIDF and OntoClassifier on (a) the CPU class, (b) the Motherboard class, (c) the Sound Card class and (d) the Graphic Card class.]

some aggregation operators for combining results from different queries. WebSifter II is a semantic taxonomy-based, personalizable meta-search agent (Kerschberg, Kim, & Scime, 2001) that tries to capture the semantics of a user's decision-oriented search intent, to transform the semantic query into target queries for existing search engines, and to rank the resulting page hits according to a user-specified weighted-rating scheme. Chen and Soo (2001) describe an ontology-based information gathering agent which utilizes the domain ontology and corresponding support (e.g., procedure attachments, parsers, wrappers and integration …). SALEM (Semantic Annotation for LEgal Management) (Bartolini, Lenci, Montemagni, Pirrelli, & Soria, 2004) is an incremental system developed for automated semantic annotation of (Italian) law texts for effective indexing and retrieval of legal documents. Chan and Lam (2005) propose an approach for facilitating functional annotation with the Gene Ontology by focusing on a subtask of annotation, that is, to determine which Gene Ontology term a piece of literature is associated with. Swoogle (Ding et al., 2004) is a crawler-based system that discovers, retrieves, analyzes and indexes knowledge encoded in semantic web documents on the Web; it can use either character N-grams or URIrefs as keywords to find relevant documents and to compute the similarity among a set of documents. Finally, Song, Lim, Park, Kang, and Lee (2005) suggest an automated method for document classification using an ontology, which expresses terminology information and vocabulary contained in Web documents by way of a hierarchical structure. In this paper, we not only proposed an ontology-directed classification mechanism, namely, Onto-
Table 9
User satisfaction evaluation

Search engine    CPU (SE/ST)    Motherboard (SE/ST)    Memory (SE/ST)    Average (SE/ST)
Yahoo            67%/61%        77%/78%                38%/17%           61%/52%
Lycos            64%/67%        77%/76%                36%/20%           59%/54%
InfoSeek         69%/70%        71%/70%                49%/28%           63%/56%
HotBot           69%/63%        78%/76%                62%/31%           70%/57%
Google           66%/64%        81%/80%                38%/21%           62%/55%
Excite           66%/62%        81%/81%                50%/24%           66%/56%
Alta Vista       63%/61%        77%/78%                30%/21%           57%/53%
Our prototype    78%/69%        84%/78%                45%/32%           69%/60%
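As a sanity check on Table 9, the Average column is the mean of the three class scores rounded to the nearest percent; e.g., Yahoo's expert (SE) row:

```python
# Reproduce one Average cell of Table 9 from its class scores.
yahoo_se = {"CPU": 67, "Motherboard": 77, "Memory": 38}
avg = round(sum(yahoo_se.values()) / len(yahoo_se))  # (67 + 77 + 38) / 3
```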
Classifier can make a decision on the class of a webpage or a website in the semantic decision process for Web services, but also advocated the use of ontology-supported website models to provide a semantic level solution for a search agent so that it can provide fast, precise and stable search results.

As to Web search, current general search engines use the concept of crawlers (spiders or soft-robots) to help users automatically retrieve useful Web information in terms of ad-hoc mechanisms. For example, Dominos (Hafri & Djeraba, 2004) can crawl several thousands of pages every second, includes a high-performance fault manager, is platform independent, and is able to adapt transparently to a wide range of configurations without incurring additional hardware expenditure. Ganesh, Jayaraj, Kalyan, and Aghila (2004) propose the association-metric to estimate the semantic content of a URL based on the domain-dependent ontology, which in turn strengthens the metric that is used for prioritizing the URL queue. UbiCrawler (Boldi, Codenotti, Samtini, & Vigna, 2004), a scalable distributed web crawler, offers platform independence, linear scalability, graceful degradation in the presence of faults, a very effective assignment function for partitioning the domain to crawl, and, more generally, the complete decentralization of every task. Chan (2008) proposes an intelligent spider that consists of a URL searching agent and an auction data agent to automatically collect related information by crawling over 1000 deals from Taiwan's eBay whenever users input the searched product. Finally, Google adopts a PageRank approach to rank a large amount of webpage link information and pre-record it for solving the problem (Brin & Page, 1998). A general Web crawler is, in general, a greedy tool that may make the URL list too large to handle. A focused crawler, instead, aims at locating domain knowledge and necessary meta-information for assisting the system to find related Web targets. The concept of Distiller is employed to rank URLs for the Web search (Barfourosh, Nezhad, Anderson, & Perlis, 2002; Diligenti, Coetzee, Lawrence, Giles, & Gori, 2000; Rennie & McCallum, 1999). IBM is an example, which adopts the HITS algorithm, similar to PageRank, for controlling web search (Kleinberg, 1999). These methods are ad-hoc and need an off-line, time-consuming pre-processing. In our system, we not only develop a focused crawler using website models as the core technique, which helps search agents successfully tackle the problems of search …

… tion of a webpage. The ontology information is an annotation of how the webpage is interpreted by the domain ontology. The website model also contains a website profile that remembers how a website is related to the webpages and how it is interpreted by the domain ontology.

We have developed a Search Agent, which employs the domain ontology-supported website models as the core technology to search for Web resources that are both user-interested and domain-oriented. Our preliminary experimentation demonstrates that the system prototype can retrieve more correct webpages with higher user satisfaction. The Agent features the following interesting characteristics. (1) Ontology-supported construction of website models. By this, we can attribute domain semantics to the Web resources collected and stored in the local database. One important technique used here is the ontology-supported OntoClassifier, which can do very accurate and stable classification of webpages to support more correct annotation of domain semantics. Our experiments show that OntoClassifier performs very well in obtaining accurate and stable webpage classification. (2) Website models-supported website model expansion. By this, we can take into account both user interests and domain specificity. The core technique here is the Focused Crawler, which employs progressive strategies to do user query-driven webpage expansion, autonomous website expansion, and query results exploitation to effectively expand the website models. (3) Website models-supported webpage retrieval. We leverage the power of ontology features as a fast index structure to locate most-wanted webpages for the user. (4) We mentioned that the User Interface works as a query expansion and answer personalization mechanism for the Search Agent. As a matter of fact, the module has been expanded into a User Interface Agent in our information integration system (Yang, 2006; Yang, 2006). The User Interface Agent can interact with the user in a more semantics-oriented way according to his degree of proficiency in the domain (Yang, 2007; Yang, 2007).

Most of our current experiments are on the performance test of OntoClassifier. We are unable to do experiments on or comparisons of how good the Search Agent is at expanding useful Web resources. Our difficulties are summarized below. (1) To our knowledge, none of the current Web search systems adopt a similar approach to ours, in the sense that none of them rely on ontology as heavily as our system to support Web search. It is thus
scope and user interests, but introduce the four-phase pro- rather hard for us to do a fair and convincing comparison.
gressive website expansion strategy for the focused crawler (2) Our ontology construction is based on a set of pre-col-
to control the Web search, which takes into account both lected webpages on a specific domain; it is hard to evaluate
user interests and domain specificity. how critical this pre-collection process is to the nature of
different domains. We are planning to employ the tech-
7. Conclusions and discussion nique of automatic ontology evolution, for example, coop-
erated with data mining technology for discovering useful
We have described how ontology-supported website information and generating desired knowledge that sup-
models can effectively support Web search. A website port ontology construction (Wang, Lu, & Zhang, 2007),
model contains webpage profiles, each recording basic to help study the robustness of our ontology. (3) In general,
information, statistics information, and ontology informa- a domain ontology-based technique cannot work as a gen-
Please cite this article in press as: Yang, S.-Y., An ontological website models-supported search agent for web services, Expert Systems
with Applications (2007), doi:10.1016/j.eswa.2007.09.024
ARTICLE IN PRESS
eral-purpose search engine. We are planning to create a DAML+OIL. (2001). Available at http://www.daml.org/2001/03/dam-
general-purpose search engine by employing a multiple l+oil-index/.
Decker, S., Melnik, S., van Harmelen, F., Fensel, D., Klein, M.,
number of our Search Agent, supported by a set of domain Broekstra, J., et al. (2000). The semantic web: the roles of XML and
ontologies, through a multi-agent architecture. RDF. IEEE Internet Computing, 4(5), 63–74.
Diligenti, M., Coetzee, F., Lawrence, S., Giles, C., & Gori, M. (2000).
Focused crawling using context graphs. In Proceedings of the 26th
Acknowledgements international conference on very large databases (pp. 527–534). Cairo,
Egypt.
The author would like to thank Jr-Chiang Liou, Yung Ding, L., Finin, T., Joshi, A., Pan, R., Cost, R. S., Peng, Y., Reddivari, P.,
Doshi, V., & Sachs, J. (2004). Swoogle: A search and metadata engine
Ting, Jr-Well Wu, Yu-Ming Chung, Zing-Tung Chou, for the semantic web. In Proceedings of the 13th ACM international
Ying-Hao Chiu, Ben-Chin Liao, Yi-Ching Chu, Shu-Ting conference on information and knowledge management (pp. 652–659).
Chang, Yai-Hui Chang, Chung-Min Wang, and Fang- Washington, DC, USA.
Chen Chuang for their assistance in system implementa- Eichmann, D. (1998). Automated categorization of web resources. Avail-
tion. This work was supported by the National Science able at http://www.iastate.edu/~CYBERSTACKS/Aristotle.htm.
Ganesh, S., Jayaraj, M., Kalyan, V., & Aghila, G. (2004). Ontology-based
Council, ROC, under Grants NSC-89-2213-E-011-059, web crawler. In Proceedings of the international conference on
NSC-89-2218-E-011-014, and NSC-95-2221-E-129-019. information technology: coding and computing (pp. 337–341). Las
Vegas, NV, USA.
Hafri, Y., & Djeraba, C. (2004). Dominos: a new web crawler’s design. In
References Proceedings of the 4th international web archiving workshop. Bath, UK.
Henry, S. T., David, B., Murray, M., & Noah, M. (2001). XML Base.
Abasolo, J. M., & Gómez, M. (2000). MELISA: An ontology-based agent Available at http://www.w3.org/TR/2001/REC-xmlbase-20010627/.
for information retrieval in medicine Available at http://cite- Joachims, T. (1997). A probabilistic analysis of the Rocchio algorithm
seer.nj.nec.com/442210.html. with TFIDF for text categorization. In Proceedings of the 14th
Al-Halami, R., & Berwick, R. (1998). WordNet: an electronic lexical international conference on machine learning (pp. 143-151). Nashville,
database. ISBN 0-262-06197-X. Tennessee, USA.
Arnaud, L. H., Philippe, L. H., Lauren, W., Gavin, N., Jonathan, R., Kerschberg, L., Kim, W., & Scime, A. (2001). WebSifter II: A person-
Mike, C., & Steve, B. (2004). Document object model (DOM) level 3 alizable meta-search agent based on weighted semantic taxonomy tree.
core specification. Available at http://www.w3.org/TR/2004/REC- In Proceedings of the international conference on internet computing
DOM-Level-3-Core-20040407/. (pp. 14–20). Las Vegas, USA.
Ashish, N., & Knoblock, C. A. (1997). Wrapper generation for semi- Kleinberg, J. (1999). Authoritative sources in a hyperlinked environment.
structured Internet sources. ACM SIGMOD Record, 26(4), 8–15. Journal of the ACM (JACM), 46(5), 604–632.
Barfourosh, A. A., Nezhad, H. M., Anderson, M. L., & Perlis, D. (2002). Lawewnce, S., & Giles, C. L. (1999). Accessibility and distribution of
Information retrieval on the world wide web and active logic: a survey information on the web. Nature, 400, 107–109.
and problem definition. Technical Report of CS-TR-4291, Department Lawewnce, S., & Giles, C. L. (2000). Accessibility of information on
of Computer Science, University of Maryland, Maryland, USA. the web. ACM Intelligence: New Visions of AI in Practice, 11(1),
Bartolini, R., Lenci, A., Montemagni, S., Pirrelli, V., & Soria, C. (2004). 32–39.
Automatic classification and analysis of provisions in Italian legal Manola, F. (1998). Towards a web object model. Available at http://
texts: a case study. In Proceedings of the 2nd workshop on regulatory op3.oceanpark.com/papers/wom.html.
ontologies (pp. 593-604). Larnaca, Cyprus. McCallum, A. K. (1996). Bow: A toolkit for statistical language modeling,
Boldi, P., Codenotti, B., Samtini, M., & Vigna, S. (2004). UbiCrawler: A text retrieval, classification and clustering. Available at http://
scalable fully distributed web crawler. Software: Practice and Experi- www.cs.cmu.edu/~mccallum/bow.
ence, 34(8), 711–726. Moldovan, D. I., & Mihalcea, R. (2000). Using WordNet and lexical
Brickley, D., & Guha, R. V. (2004). RDF Vocabulary description operators to improve internet searches. IEEE Internet Computing, 4(1),
language 1.0: RDF schema. Available at http://www.w3.org/TR/ 34–43.
2004/REC-rdf-schema-20040210/. Noy, N. F., & Hafner, C. D. (1997). The state of the art in ontology
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web design. AI Magazine, 18(3), 53–74.
search engine. In Proceedings of the 7th international world wide web Noy, N. F., & McGuinness, D. L. (2001). Ontology development 101: A
conference (pp. 107–117). Brisbane, Australia. guide to creating your first ontology. Stanford Knowledge Systems
Chan, C. C. (2008). Intelligent spider for information retrieval to support Laboratory Technical Report KSL-01-05 and Stanford Medical
mining-based price prediction for online auctioning. expert systems with Informatics Tech. Rep. SMI-2001-0880.
applications: an international journal, 34(1), 347–356. OIL. (2000). http://www.ontoknowledge.org/oil/downl/oil-whitepaper.
Chan, K., & Lam, W. (2005). Gene ontology classification of biomedical pdf.
literatures using context association. In Proceedings of the 2nd Asia Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank
information retrieval symposium (pp. 552–557). Jeju Island, Korea. citation ranking: Bringing order to the web. Stanford Digital Libraries
Chen, Y. J., & Soo, V. W. (2001). Ontology-based information gathering Working Paper of SIDL-WP-1999-0120, Department of Computer
agents. In Proceedings of the 2001 international conference on web Science, University of Stanford, CA, USA.
intelligence (pp. 423–427). Maebashi TERRSA, Japan. Park, S. B., & Zhang, B. T. (2003). Automatic webpage classification
Chiu, Y. H. (2003). An interface agent with ontology-supported user models. enhanced by unlabeled data. In Proceedings of the 4th international
Master thesis, Department of Electronic Engineering, National conference on intelligent data engineering and automated learning (pp.
Taiwan University of Science and Technology, Taipei, Taiwan. 821–825). Hong-Kong, China.
Cho, J., Garcia-Molina, H. (2000). The evolution of the web and Rennie, J., & McCallum, A. (1999). Using reinforcement learning to
implications for an incremental crawler. In Proceedings of the 26th spider the web efficiently. In Proceedings of the 16th international
international conference on very large databases (pp. 200–209). Cairo, conference on machine learning (pp. 335–343). Bled, Slovenia.
Egypt. Salton, G., & Buckley, C. (1988). Term weighting approaches in automatic
DAML. (2003). Available at http://www.daml.org/about.html. text retrieval. Information Processing and Management, 24(5), 513–523.
Please cite this article in press as: Yang, S.-Y., An ontological website models-supported search agent for web services, Expert Systems
with Applications (2007), doi:10.1016/j.eswa.2007.09.024
ARTICLE IN PRESS
Salton, G., & McGill, M. J. (1983). Introduction to modern information Yang, S. Y., & Ho, C. S. (1999). Ontology-supported user models for
retrieval. McGraw-Hill. interface agents. In Proceedings of the 4th conference on artificial
Selberg, E., & Etzioni, O. (1995). Multi-service search and comparison intelligence and applications (pp. 248–253). Chang-Hua, Taiwan.
using the MetaCrawler. In Proceedings of the 4th international world Yang, S. Y. (2006). An ontology-directed webpage classifier for web
wide web conference (pp. 169–173). Boston, USA. services. In Proceedings of joint 3rd international conference on soft
Song, M. H., Lim, S. Y., Park, S. B., Kang, D. J., & Lee, S. J. (2005). An computing and intelligent systems and 7th international symposium on
automatic approach to classify web documents using a domain advanced intelligent systems (pp. 720–724). Tokyo, Japan.
ontology. The first international conference on pattern recognition and Yang, S. Y. (2006). A website model-supported focused crawler for search
machine intelligence (pp. 666–671). Kolkata, India. agents. In Proceedings of the 9th joint conference on information
Ting, Y. (2000). A search agent with website models. Master Thesis, sciences (pp. 755–758). Kaohsiung, Taiwan.
Department of Electronic Engineering, National Taiwan University of Yang, S. Y. (2006). An ontology-supported website model for web search
Science and Technology, Taipei, Taiwan. agents. In Proceedings of the 2006 international computer symposium
Wang, C., Lu, J., & Zhang, G. Q. (2007). Mining key information of web (pp. 874-879). Taipei, Taiwan.
pages: a method and its applications. Expert Systems with Applica- Yang, S. Y. (2006). How does ontology help information management
tions: An International Journal, 33(2), 425–433. processing. WSEAS Transactions on Computers, 5(9), 1843–1850.
Wang, C. M. (2003). Web search with ontology-supported technology. Yang, S. Y. (2006). An ontology-supported information management agent
Master thesis, Department of Computer Science and Information with solution integration and proxy. In Proceedings of the 10th WSEAS
Engineering, National Taiwan University of Science and Technology, international conference on computers (pp. 974-979). Athens, Greece.
Taipei, Taiwan. Yang, S. Y. (2007). An ontology-supported user modeling technique with
Wang, Z. L., Yu, H., & Nishino, F. (2004). Automatic special type query templates for interface agents. In Proceedings of 2007 WSEAS
website detection based on webpage type classification. In Proceedings international conference on computer engineering and applications (pp.
of the first international workshop on web engineering. Santa Cruz, 556–561). Gold Coast, Queensland, Australia.
USA. Yang, S. Y. (2007). How does ontology help user query processing for
Weibel, S. (1999). The State of the Dublin core metadata initiative. D-Lib FAQ services. WSEAS Transactions on Information Science and
Magazine, 5(4). Applications, 4(5), 1121–1128.
Article info
Available online xxxx

Keywords:
Ontological interface agents
Template-based query processing
FAQ services

Abstract
This paper proposes an ontological interface agent that works as an assistant between users and FAQ systems. We integrated several interesting techniques, including domain ontology, user modeling, and template-based linguistic processing, to effectively tackle the problems associated with traditional FAQ retrieval systems. Specifically, we address the following issues. Firstly, how can an interface agent learn a user's specialty in order to build a proper user model for him/her? Secondly, how can domain ontology help in establishing user models, analyzing user queries, and assisting and guiding interface usage? Finally, how can the intention and focus of a user be correctly extracted? Our work features a template-based linguistic processing technique for developing ontological interface agents; a natural language query mode, along with an improved keyword-based query mode; and assistance and guidance for human–machine interaction. Our preliminary experimentation demonstrates that the intention and focus of up to eighty percent of user queries can be correctly understood by the system, which accordingly provides query solutions with higher user satisfaction.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction

With the increasing popularity of the Internet, people depend more and more on the Web to obtain their information, just like a huge knowledge treasury waiting for exploration. People used to be puzzled by such problems as "how to search for information on the Web treasury?" As the techniques of Information Retrieval (Salton, Wong, & Yang, 1975; Salton & McGill, 1983) matured, a variety of information retrieval systems have been developed, e.g., search engines, Web portals, etc., to help search on the Web. How to search is no longer a problem. The problem now comes from the results of these information retrieval systems, which contain so much information that they overwhelm the users. Now, people want the information retrieval systems to provide more help, for instance, by only retrieving the results which better meet the user's requirements. Other wanted capabilities include a better interface for the user to express his/her true intention, better-personalized services, and so on. In short, how to improve traditional information retrieval systems to provide search results which better meet the user requirements, so as to reduce his/her cognitive loading, is an important issue in current research (Chiu, 2003).

The websites which provide Frequently Asked Questions (FAQs) organize user questions and expert answers about a specific product or discipline in terms of question–answer pairs on the Web. Each FAQ is represented by one question along with one answer and is characterized as being domain-dependent, short and explicit, and frequently asked (Lee, 2000; OuYang, 2000). People usually go through the list of FAQs and read those FAQs that are related to their questions. This way of answering the user's questions saves experts the labor of answering similar questions repeatedly. The problem is that, after the fast accumulation of FAQs, it becomes harder for people to single out related FAQs. Traditional FAQ retrieval systems, however, provide only little help, because they fail to provide assistance and guidance for human–machine interaction, personalized information services, a flexible interaction interface, etc. (Chiu, 2003).

In order to capture the user's true intention and accordingly provide high-quality FAQ answers that meet the user's requests, we have proposed an Interface Agent that acquires user intention through an adaptive human–machine interaction interface with the help of ontology-directed and template-based user models (Yang et al., 1999; Yang, Chiu, & Ho, 2004; Yang, 2006). It also handles user feedback on the suitability of proposed responses. The agent features an ontology-based representation of domain knowledge, a flexible interaction interface, and personalized information filtering and display. Specifically, according to the user's behavior and mental state, we employed the technique of user modeling to construct a user model describing his/her characteristics, preferences, knowledge proficiency level, etc. We also used the technique of user stereotypes (Rich, 1979) to construct and initialize a new user model, which helps provide fast personalized services for new users.

* Corresponding author. Tel.: +886 2 28013131x6394; fax: +886 28013131x6391.
E-mail address: [email protected].
0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2008.03.011
Please cite this article in press as: Yang, S. -Y. , Developing of an ontological interface agent with template-based linguistic ..., Expert Sys-
tems with Applications (2008), doi:10.1016/j.eswa.2008.03.011
ARTICLE IN PRESS
We built a domain ontology (Noy & McGuinness, 2001) to help define the domain vocabulary and knowledge, and based on that we constructed the user models and the Interface Agent. We extended the concept of pattern matching (Hovy, Hermjakob, & Ravichandran, 2002) to query templates to construct natural language-based query models. This idea leads to the extraction of the true user intention and focus from his/her query posed in natural language, which helps the agent quickly find precise information for the user, much as Coyle and Smyth (2005) use interaction histories to enhance search results. Our preliminary experimentation demonstrates that the intention and focus of up to eighty percent of the users' queries can be correctly understood, and the system accordingly provides query solutions with higher user satisfaction.

The rest of the paper is organized as follows: Section 2 describes the fundamental techniques. Section 3 explains the Interface Agent architecture. Section 4 reports the system demonstrations and evaluations. Section 5 compares the work with related works, while Section 6 concludes the work. The Personal Computer (PC) domain is chosen as the target application of our Interface Agent and will be used for explanation in the remaining sections.

2. Fundamental techniques

2.1. Domain ontology

The concept of ontology in artificial intelligence refers to knowledge representation for domain-specific contents (Chandrasekaran, Josephson, & Benjamins, 1999). It has been advocated as an important tool to support knowledge sharing and reuse in developing intelligent systems. Although the development of an ontology for a specific domain is not yet an engineering process, we have outlined a procedure for this in Yang et al. (1999), based on how the process was conducted in existing systems. By following the procedure, we developed an ontology for the PC domain using Protégé 2000 (Noy & McGuinness, 2001) as the fundamental background knowledge for the system; it was originally developed in Chinese (Yang, Chuang, & Ho, 2007) but has been changed to English here for easy explanation. Fig. 1 shows part of the ontology taxonomy. The taxonomy represents relevant PC concepts as classes and their parent–child relationships as isa links, which allow inheritance of features from parent classes to child classes. We carefully selected those properties that are most related to our application from each concept and defined them as the detailed ontology for the corresponding class. Fig. 2 exemplifies the detailed ontology of the concept CPU. In the figure, the root node uses various fields to define the semantics of the CPU class, each field representing an attribute of "CPU", e.g., interface, provider, synonym, etc. The nodes at the lower level represent various CPU instances, which capture real-world data. The arrow line with the term "io" denotes the instance-of relationship. The complete PC ontology can be referenced from the Protégé Ontology Library at the Stanford website (<http://protege.stanford.edu/download/download.html>). We also developed a problem ontology to deal with query questions. Fig. 3 illustrates part of the Problem ontology, which contains query type and operation type. Together they imply the semantics of a question. Finally, we use Protégé's APIs to develop a set of ontology services, which provide primitive functions to support the application of the ontologies. The ontology services currently available include transforming query terms into canonical ontology terms, finding definitions of specific terms in the ontology, finding relationships among terms, finding compatible and/or conflicting terms against a specific term, etc.
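The ontology services listed above can be sketched in code. The following is an illustrative, dictionary-backed toy whose class and instance names echo Fig. 2; it is not the actual Protégé API used in the system.

```python
# Illustrative sketch of the ontology services (not the actual Protege API):
# a dictionary-backed ontology with canonical-term lookup and simple
# relationship finding. Class/instance names echo Fig. 2.

class OntologyService:
    def __init__(self):
        # term -> attribute fields; "synonym" lists alternative surface forms,
        # "isa" points to the parent class, "io" to the class of an instance
        self.terms = {
            "Hardware": {},
            "CPU": {"synonym": ["Central Processing Unit"], "isa": "Hardware"},
            "PENTIUM4 2.0AGHZ": {"synonym": ["P4 2.0GHZ"], "io": "CPU",
                                 "interface": "Socket 478"},
        }
        # reverse index: any surface form -> its canonical ontology term
        self.canonical = {}
        for term, fields in self.terms.items():
            self.canonical[term.lower()] = term
            for syn in fields.get("synonym", []):
                self.canonical[syn.lower()] = term

    def canonicalize(self, term):
        """Transform a query term into a canonical ontology term."""
        return self.canonical.get(term.lower())

    def definition(self, term):
        """Find the definition (attribute fields) of a specific term."""
        return self.terms.get(self.canonicalize(term), {})

    def relationship(self, t1, t2):
        """Find a direct isa / instance-of relationship between two terms."""
        fields, c2 = self.definition(t1), self.canonicalize(t2)
        for rel in ("isa", "io"):
            if fields.get(rel) == c2:
                return rel
        return None

onto = OntologyService()
print(onto.canonicalize("central processing unit"))  # -> CPU
print(onto.relationship("P4 2.0GHZ", "CPU"))         # -> io
```

A real implementation would read these facts from the Protégé knowledge base instead of a hard-coded dictionary.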
Fig. 1. Part of the PC ontology taxonomy: the Hardware class with isa links to its subclasses, among them CPU.

Fig. 2. The detailed ontology of CPU: the root CPU node (Synonym = Central Processing Unit; fields such as D-Frequency, Interface, L1 Cache, and Abbr.) with io (instance-of) links to instances such as XEON, THUNDERBIRD 1.33G, DURON 1.2G, PENTIUM4 2.0AGHZ, PENTIUM 4 1.8AGHZ, CELERON 1.0G, and PENTIUM 4 2.53AGHZ, each carrying slot values such as Factory, Synonym, Interface, L1 Cache, Abbr., and Clock.
Fig. 3. Part of the Problem ontology: the Query class with isa links to Operation Type (instances: Adjust, Use, Setup, Close, Open, Support, Provide) and Query Type (instances: How, What, Why, Where).
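Read as code, the two facets of the Problem ontology combine into the intention labels used later (e.g., WHY_SETUP); the sketch below is illustrative only, not the system's internal representation.

```python
# Illustrative sketch of the Problem ontology (Fig. 3): the semantics of a
# question is implied by its query type together with its operation type.
from enum import Enum

class QueryType(Enum):
    HOW = "How"
    WHAT = "What"
    WHY = "Why"
    WHERE = "Where"

class OperationType(Enum):
    ADJUST = "Adjust"
    USE = "Use"
    SETUP = "Setup"
    CLOSE = "Close"
    OPEN = "Open"
    SUPPORT = "Support"
    PROVIDE = "Provide"

def question_semantics(qt: QueryType, op: OperationType) -> str:
    """Combine the two facets into an intention label such as WHY_SETUP."""
    return f"{qt.name}_{op.name}"

print(question_semantics(QueryType.WHY, OperationType.SETUP))  # -> WHY_SETUP
```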
2.2. Query templates

To build the query templates, we collected in total 1215 FAQs from the FAQ websites of six famous motherboard factories in Taiwan and used them as the reference materials for query template construction (Hovy et al., 2002; Soubbotin et al., 2001). Currently, we only handle user queries with one intention word and at most three sentences. These FAQs were analyzed and categorized into six types of questions, as shown in Table 1. For each type of question, we further identified several intention types according to its operations. Table 2 illustrates some examples of intention types. Finally, we define a query pattern for each intention type. Table 3 illustrates the defined query patterns for the intention types of Table 2. Table 4 explains the syntactical constructs of the query patterns.

Now all information for constructing a query template is ready, and we can formally define a query template. Table 5 defines what a query template is. It contains a template number, number of sentences, intention words, intention type, question type, operation type, query patterns, and focus. Table 6 illustrates an example query template for the ANA_CAN_SUPPORT intention type. Note here that we collect similar query patterns in the field of "Query patterns," which are used in the detailed analysis of a given query.

Table 1
Question types

Question type  Intention
(A-NOT-A)  Asks about can or cannot, should or should not, have or have not
(HOW)  Asks about solving methods
(WHAT)  Enumerates related information
(WHEN)  Asks about time, year, or date
(WHERE)  Asks about place or position
(WHY)  Asks about reasons

Table 2
Examples of intention types

Intention type  Description
ANA_CAN_SUPPORT  Asks if some specifications or products are supported
HOW_SET  Asks the method of assignment
WHAT_IS  Asks the meaning of a terminology
WHEN_SUPPORT  Asks when something will be supported
WHERE_DOWNLOAD  Asks where something can be downloaded
WHY_SETUP  Asks reasons about setup

Table 3
Examples of query patterns

Table 4
Definition of pattern symbols and descriptions

Symbol  Description
⟨ ⟩  A single sentence; the sequence of intention words and keywords is considered
[ ]  At least one sentence; only the appearance of keywords is considered, not their sequence
Si  The variable part of a template, which is any string consisting of keywords
Intention word  The fixed part of a template, which helps the system distinguish between user intentions
Keyword  A concept in the domain ontology, usually a domain terminology
Focus  The variable part of a template that is the key point of the user query
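An order-sensitive ⟨…⟩ pattern with Si slots (Table 4) can be approximated with regular expressions. The sketch below is a hypothetical rendering of template matching, not the system's actual matcher; the template values follow the ANA_CAN_SUPPORT example.

```python
# Hypothetical sketch of query-template matching (Tables 4-6): an
# order-sensitive <...> pattern is turned into a regex, and the variable
# Si slots are bound to the substrings they cover.
import re
from dataclasses import dataclass, field

@dataclass
class QueryTemplate:
    intention_type: str          # e.g. ANA_CAN_SUPPORT
    question_type: str           # e.g. (A-NOT-A)
    operation_type: str          # e.g. Support
    patterns: list = field(default_factory=list)

def match(template, query):
    """Try each pattern in order; return slot bindings for the first hit."""
    for pat in template.patterns:
        regex = "^" + re.sub(r"S\d+", "(.+?)", pat) + "$"
        m = re.match(regex, query)
        if m:
            slots = re.findall(r"S\d+", pat)
            return dict(zip(slots, m.groups()))
    return None

t = QueryTemplate("ANA_CAN_SUPPORT", "(A-NOT-A)", "Support",
                  ["could S1 support S2", "could support S1"])
print(match(t, "could K7V support 1 GHz CPU"))  # {'S1': 'K7V', 'S2': '1 GHz CPU'}
print(match(t, "could support PIII"))           # {'S1': 'PIII'}
```

The non-greedy `(.+?)` groups keep each slot as short as the pattern allows, mirroring the idea that the fixed intention words anchor the match while the Si parts absorb the keywords.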
[Figure: user model facets — interaction preference, solution presentation, domain proficiency.]

Table 7
Statistics of query templates (Question type, #Intention type, #Template, #Pattern, #FAQ (%)).
[Figure: architecture of the Interface Agent — the user's actions and queries enter through the Interaction Agent and are handled by the Query Parser and Web Page Processor (which accesses the Internet); the Recommender, Manager, Personalizer, and Scorer modules support the process, drawing on the Template Base, Homophone Debug Base, User Model Base, and Ontology; internal queries and user feedback flow among the modules, with data flow and support flow distinguished.]
Fig. 10. Flow chart of the user query processing. [Flow chart: a user query in NLP mode passes through (1) Segmentation, (2) Query Pruning, (3) Query Standardization, (4) Pattern Match, and (5) User Confirmation; on a "NO" confirmation the user may modify the query slightly or switch to the keyword mode; the steps draw on the Homophone Debug Base, Ontology, User Model, Intention-Word Base, Template Base, and the user's preference terms, and the resulting internal query is passed to the Proxy Agent; data flow and support flow are distinguished.]

Fig. 12. Pattern match algorithm.

Table 8
Internal user query and keyword trimming

(a) Internal query form before keyword trimming
Query type: Could
Operation type: Support
Keywords: 1 GHz, K7V, motherboard, Asus, CPU

(b) Internal query form after keyword trimming
Query type: Could
Operation type: Support
Keywords: 1 GHz, K7V, CPU
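The trimming step from Table 8a to Table 8b can be sketched as follows. The two ontology facts are hard-coded here for illustration (in the system they would come from the PC ontology), and the rule numbering follows Table 9.

```python
# Illustrative keyword trimming (Table 9): Rule 2 drops a class when one of
# its instances is also present; Rule 3 drops a slot value already implied
# by a present instance. The ontology facts below are hard-coded toy data.
INSTANCE_OF = {"K7V": "motherboard"}   # K7V is an instance of motherboard
SLOT_VALUES = {"K7V": {"Asus"}}        # a slot of K7V refers to Asus

def trim(keywords):
    present = set(keywords)
    drop = set()
    for k in present:
        cls = INSTANCE_OF.get(k)
        if cls in present:                            # Rule 2
            drop.add(cls)
        drop |= SLOT_VALUES.get(k, set()) & present   # Rule 3
    return [k for k in keywords if k not in drop]

before = ["1 GHz", "K7V", "motherboard", "Asus", "CPU"]
print(trim(before))  # -> ['1 GHz', 'K7V', 'CPU'], as in Table 8b
```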
Fig. 11. Query template selection algorithm.

Table 9
Examples of trimming rules

Rule no.  Rule description  Example
1  A super-class can be replaced with its sub-class  "Interface card" => "Sound card"
2  A class can be replaced with its instance  "CPU" => "PIII"
3  A slot value referring to some instance of a class can be replaced with the instance  "Microsoft" => "Windows 2000"

over 1 GHz?" the Interface Agent first applies MMSEG to obtain the following list of keywords from the user query: ⟨could, support, 1 GHz, K7V, motherboard, Asus, CPU⟩. The "could" query type in the intention type hierarchy is then followed to retrieve the corresponding query templates. Table 6 illustrates the only corresponding query template, which contains two patterns, namely, ⟨could S1 support S2⟩ and ⟨could support S1⟩. We find that the second pattern matches the user query and can be selected to transform the query into an internal form by query formulation (step 6), as shown in Table 8a. Note that there may be more than two patterns to become candidates for a given query. In this case, the Query Parser will prompt the user to confirm his/her intent (step 5), as illustrated in Fig. 9c. If the user says "No", which means the pattern matching result is not the true intention of the user, he/she is allowed to modify the matched result or change to the keyword mode for placing the query.

The purpose of keyword trimming is to remove irrelevant keywords from the user query; irrelevant keywords sometimes cause adverse effects on FAQ retrieval. The Query Parser uses trimming rules, as shown in Table 9, to prune these keywords. For example, in Table 8a, "motherboard" is trimmed and replaced with "K7V", since the latter is an instance of the former and can subsume the
former according to Trimming Rule 2. Table 8b shows the result of the user query after keyword trimming, which now contains only three keywords, namely, 1 GHz, K7V, and CPU.

3.2.3. Web Page Processor
The Web Page Processor receives from the Proxy Agent a list of retrieved solutions, which contains one or more FAQs matched with the user query, each represented as in Table 10, and retrieves and caches the solution webpages according to the FAQ_URLs. It then pre-processes those webpages for the subsequent customization process, performing URL transformation, keyword standardization, and keyword marking. The URL transformation changes all hyperlinks to point toward the cache server. The keyword standardization transforms all terms in the webpage content into ontology vocabularies. The keyword marking boldfaces the keywords appearing in the webpages as ⟨B⟩Keyword⟨/B⟩ to facilitate subsequent keyword processing and webpage readability.

Table 10
Format of retrieved FAQ

Field — Description
FAQ_No. — FAQ's identification
FAQ_Question — Question part of the FAQ
FAQ_Answer — Answer part of the FAQ
FAQ_Similarity — Similarity degree between the FAQ and the user query
FAQ_URL — Source or related URL of the FAQ

3.2.4. Scorer
Each FAQ is a short document, so the concepts involved in an FAQ are in general more focused; in other words, its topic (or concept) is much clearer and more professional. The question part of an FAQ is even more pointed about what concepts are involved. Knowing this property, we can use the keywords appearing in the question part of an FAQ to represent its topic. Basically, we use the table of domain proficiency to calculate a proficiency degree for each FAQ from the proficient concepts appearing in the question part of the FAQ, as detailed in Fig. 14.

Fig. 14. Proficiency degree calculation algorithm.

3.2.5. Personalizer
The Personalizer replaces the terms used in the solution FAQs with the terms in the user's terminology table, collected by the Query Parser, to improve solution readability.

3.2.6. User Model Manager
The first task of the User Model Manager is to build an initial user model for a new user, by asking him/her YES/NO questions drawn from the ontology. The user answers either YES or NO to each question. The answers are collected, weighted according to the respective degrees, and passed to the Manager, which then calculates a proficiency score for the user according to the percentage of correct responses to the questions, and accordingly instantiates a proper user stereotype as the user model for the user.

The second task is to update user models. Here we use the interaction information and user feedback collected by the Interaction Agent in each interaction session or query session. An interaction session is defined as the time period from when the user logs in up to when he logs out, while a query session is defined as the time period from when the user gives a query up to when he gets the answers and completes the feedback. An interaction session may contain several query sessions. After a query session is completed, we immediately update the interaction preference and solution presentation of the user model. Specifically, the user's query mode and solution presentation mode in this query session are remembered in both time windows, and the statistics of the preference change for each mode are calculated accordingly, to be used to adapt the Interaction Agent in the next query session. Fig. 15 illustrates the algorithm to update the Show_Rate of the similarity mode. The algorithm uses the ratio of the number of user-selected FAQs to that of the displayed FAQs to update the show rate; the algorithm to update the Show_Rate of the proficiency mode is similar.

Fig. 15. Algorithm to update show rate in similarity mode.

In addition, each user will be asked to evaluate each solution FAQ in terms of the following five levels of understanding: very familiar, familiar, average, not familiar, very unfamiliar. This provides an explicit feedback, and we can use it to update his/her domain proficiency table. Fig. 16 shows the updating algorithm.

Fig. 16. Algorithm to update the domain proficiency table.

Finally, after each interaction session, we update the user's recommendation mode in this session in the respective time window. At the same time, we add the query and FAQ-selection records of the user into the query history and selection history of his/her user model.
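The Show_Rate update just described uses the ratio of user-selected to displayed FAQs within a time window. A minimal sketch of such an update is given below; the smoothing weight `alpha` and the function name are our own illustration, since Fig. 15 holds the paper's authoritative procedure.

```python
def update_show_rate(show_rate, num_selected, num_displayed, alpha=0.5):
    """Move Show_Rate toward the selection ratio observed in the latest
    query session. `alpha` is a hypothetical smoothing weight; the exact
    update rule is the one defined in Fig. 15 of the paper."""
    if num_displayed == 0:
        return show_rate  # nothing was shown, so nothing to learn from
    ratio = num_selected / num_displayed
    return (1 - alpha) * show_rate + alpha * ratio
```

Starting from the stereotype default of 0.9, a session in which the user opens only 3 of 10 displayed FAQs pulls the rate down toward 0.3, so fewer solutions are shown next time.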
The third task of the User Model Manager is to update user stereotypes. This happens when a sufficient number of user models in a stereotype have undergone changes. First, we reflect these changes to the stereotypes by re-clustering all affected user models, as shown in Fig. 17, and then re-calculate all parameters in each stereotype, as exemplified in Fig. 18.
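As a rough illustration of the re-calculation step, each stereotype parameter can be refreshed as the mean of the corresponding values in its member user models. The flat-dictionary layout of a user model below is our own assumption; Figs. 17 and 18 define the actual re-clustering and re-calculation.

```python
def recalculate_stereotype(user_models):
    """Refresh stereotype parameters as averages over the member user
    models. Each user model is assumed to be a dict of numeric
    parameters (an illustrative representation, not the paper's)."""
    params = {}
    for model in user_models:
        for key, value in model.items():
            params.setdefault(key, []).append(value)
    # Average each parameter over all member models of the stereotype
    return {key: sum(vals) / len(vals) for key, vals in params.items()}
```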
3.2.7. Recommender
The Recommender uses the following three policies to recommend information. (1) High-hit FAQs: it recommends the first N solution FAQs according to their selection counts from all users in the same group within a time window. (2) Hot-topic FAQs: it recommends the first N solution FAQs according to their popularity, calculated as statistics on the keywords appearing in the query histories of the same-group users within a time window; the hot degree calculation is shown in Fig. 19. (3) Collaborative recommendation: it refers to the selection histories of the users in the same group to provide solution recommendation. The basic idea is this: if user A and user B are in the same group and the first n interaction sessions of user A are the same as those of user B, then we can recommend the highest-rated FAQs in the (n + 1)th session of user A to user B; the detailed algorithm is shown in Fig. 20.

Fig. 19. Algorithm to calculate the hot degree.
Fig. 20. Algorithm to do the collaborative recommendation.
Fig. 21. System register interface.
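The collaborative policy can be sketched as follows; the session representation (a list of (faq_id, rating) pairs) and all names are our own assumptions, and Fig. 20 gives the paper's actual algorithm.

```python
def collaborative_recommend(target_sessions, peer_sessions, top_n=3):
    """If the peer's first n interaction sessions match the target
    user's n sessions, recommend the highest-rated FAQs from the
    peer's (n + 1)th session. A session is assumed to be a list of
    (faq_id, rating) pairs."""
    n = len(target_sessions)
    if len(peer_sessions) <= n:
        return []  # the peer has no (n + 1)th session to draw from
    # Compare only the sequence of selected FAQs, not the ratings
    for mine, theirs in zip(target_sessions, peer_sessions):
        if [f for f, _ in mine] != [f for f, _ in theirs]:
            return []
    # Rank the peer's next session by rating and return the top FAQs
    nxt = sorted(peer_sessions[n], key=lambda p: p[1], reverse=True)
    return [faq for faq, _ in nxt[:top_n]]
```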
Now the user can get into the main tableau of our system (Fig. 22), which consists of three major tab-frames: query interface, solution presentation, and logout. The query interface tab comprises four frames: the user interaction interface, the automatic keyword-scrolling list, the FAQ recommendation list, and the PC ontology tree. The user interaction interface supports both the keyword and NLP query modes shown in Fig. 9. The keyword query mode provides the lists of question types and operation types, which allow the users to express their precise intentions. The automatic keyword-scrolling list provides ranked-keyword guidance for user queries. A user can browse the PC ontology tree to learn domain knowledge. The FAQ recommendation list provides personalized information recommendations from the system in three modes: hit, hot topic, and collaboration. When the user clicks a mode, the corresponding popup window is produced by the system.
The solution presentation tab is illustrated in Fig. 23. It pre-selects the solution ranking method according to the user's preference and hides part of the solutions according to his/her Show_Rate to reduce the cognitive load on the user. The user can switch the solution ranking method between similarity ranking and proficiency ranking. The user can click the question part of an FAQ (Fig. 24) to display its content or to give feedback on it, covering both the satisfaction degree and the comprehension degree. Fig. 25 illustrates the window shown before system logout, which asks the user to fill in a questionnaire whose statistics help further system improvement.

Fig. 24. FAQ-selection and feedback enticing.
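The presentation step just described (rank by the preferred degree, then hide part of the list per Show_Rate) can be sketched as below; the dictionary layout of an FAQ entry is an assumed representation, not the system's actual data structure.

```python
import math

def present_solutions(faqs, mode="similarity", show_rate=0.9):
    """Rank solution FAQs by the user's preferred degree ('similarity'
    or 'proficiency') and display only the leading Show_Rate fraction.
    Each FAQ is assumed to be a dict with both degree keys."""
    ranked = sorted(faqs, key=lambda f: f[mode], reverse=True)
    keep = math.ceil(len(ranked) * show_rate)  # round up so >=1 shows
    return ranked[:keep]
```

Switching `mode` re-sorts the same list, matching the user's ability to toggle between similarity ranking and proficiency ranking.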
Table 11
Effectiveness of constructed query patterns
reaches 97.28%, which implies that the template base can be used as an effective knowledge base for natural language query processing.

Our second experiment is to learn how well this processing understands new queries. First, we collected in total 143 new FAQs, different from the FAQs collected for constructing the query templates, from four famous motherboard factories in Taiwan, including ASUS (http://www.asus.com/), SIS (http://www.sis.com/), MSI (http://www.msi.com.tw/), and GIGABYTE (http://www.giga-byte.com/). We then used the question parts of those FAQs as testing queries. Our experiments show that we can precisely extract the true query intentions and focuses from 112 FAQs. The remaining 31 FAQs contain three or more sentences per query, which explains why we failed to understand them. In summary, 78.3% (112/143) of the new queries can be successfully understood.

Finally, Table 12 shows the comparison of user satisfaction of our system prototype against other search engines. In the table, ST (Satisfaction of testers) represents the average of satisfaction responses from 10 ordinary users, while SE (Satisfaction of experts) represents the average of satisfaction responses from 10 experts. Basically, each search engine receives 100 queries and returns the first 100 webpages for evaluation of satisfaction by both experts and non-experts. The table shows that our system prototype supported by ontological user modeling with query templates, the last row, enjoys the highest satisfaction in all classes. From the evaluation, we concluded that unless the competing search engines are specifically tailored to this specific domain, such as HotBot and Excite, our techniques can in general retrieve more correct webpages in almost all classes, because the intention and focus of a user can be correctly extracted.

Table 12
User satisfaction evaluations (SE/ST, %)

METHOD — CPU — MOTHERBOARD — MEMORY — AVERAGE
Alta Vista — 63/61 — 77/78 — 30/21 — 57/53
Excite — 66/62 — 81/81 — 50/24 — 66/56
Google — 66/64 — 81/80 — 38/21 — 62/55
HotBot — 69/63 — 78/76 — 62/31 — 70/57
InfoSeek — 69/70 — 71/70 — 49/28 — 63/56
Lycos — 64/67 — 77/76 — 36/20 — 59/54
Yahoo — 67/61 — 77/78 — 38/17 — 61/52
Our approach — 78/69 — 84/78 — 45/32 — 69/60

5. Related works and comparisons

The work of Lee (2000) presents a user query system consisting of an intention part and a keywords part. With the help of syntax and part-of-speech (POS) analysis, he constructs a syntax grammar from collected FAQs, and accordingly offers the capability of intention extraction from queries. He also extracts the keywords from queries through a sifting process on POS and stop words. Finally, he employs the semantic comparison technique of downward recursion on a parse tree to calculate the similarity degree of the intention parts between the user query and the FAQs, and then uses the vector space model (Salton et al., 1975) to calculate the vector similarity degree of the keyword parts between the user query and the FAQs, in order to find the best-matched FAQs.

The work of OuYang (2000) classifies pre-collected FAQs according to the business types of ChungHwa Telecom (http://www.cht.com.tw/CHTFinalE/Web/). He employs the technique of TFIDF (term frequency and inverse document frequency) to calculate the weights of individual keywords and intention words, and accordingly selects representative keywords and intention words from them to work as the index to each individual class. Given a user query, he first determines the class of the user query according to the keywords and intention words, and then calculates the similarity degree between the user query and the FAQs in the related classes.

The natural language processing technique was not used in the work of Sneiders (1999) for analyzing user queries. Instead, it was applied to analyze the FAQs stored in the database long before any user queries are submitted, where each FAQ is associated with required, optional, irrelevant, or forbidden keywords to help subsequent prioritized keyword matching. In this way, the work of FAQ retrieval can be reduced to keyword matching without inference.

Razmerita, Angehrm, and Maedche (2003) present a generic ontology-based user modeling architecture (OntobUM), applied in the context of a Knowledge Management System (KMS). The proposed user modeling system relies on a user ontology, using Semantic Web technologies, based on the IMS LIP specifications, and is integrated in an ontology-based KMS called Ontologging.

Degemmis, Licchelli, Lops, and Semeraro (2004) present the Profile Extractor, a personalization component based on machine learning techniques, which allows for the discovery of the preferences and interests of users that have access to a website. Galassi, Giordana, Saitta, and Botta (2005) also present a method for automatically constructing a sophisticated user/process profile from traces of user/process behavior, encoded by means of a Hierarchical Hidden Markov Model (HHMM). Finally, Hsu and Ho (1999) propose an intelligent interface agent to acquire patient data with medicine-related common-sense reasoning.

In summary, the work of OuYang determines the user query intention according to the keywords and intention words appearing in the query, while the work of Sneiders uses similarity degrees on both intention words and keywords for solution searching and selection. Both approaches only consider the comparison between words and skip the problem of word ambiguity; e.g., two sentences with the same intention words may not have the same intention. The work of Lee uses the analysis of syntax and POS to extract query intention, which is a hard job for Chinese queries, because resolving the ambiguity of the explicit or implicit meanings of Chinese words, especially in the analysis of long sentences or sentences with complex syntax, is not at all a trivial task. In this paper, we integrated several interesting techniques, including user modeling, domain ontology, and template-based linguistic processing, to effectively tackle the above annoying problems, much like Razmerita et al. (2003), which associates ontology with user modeling in a different way, and especially Paraiso and Barthes (2006), which highlights the role of ontologies for semantic interpretation. In addition, both Degemmis et al. (2004) and Galassi et al. (2005) propose different learning techniques for processing usage patterns and user profiles. The automatic processing feature, supported by HHMM, unsupervised learning, and common-sense reasoning techniques, respectively, provides another level of automation in the interaction mechanism and deserves more attention.

6. Discussions and future work

We have developed an Interface Agent to work as an assistant between the users and FAQ systems, which differs in system architecture and implementation from our previous work (Yang et al., 2004). It is also used to retrieve FAQs in the PC domain. We integrated several interesting techniques, including domain ontology, user modeling, and template-based linguistic processing, to effectively tackle the problems associated with traditional FAQ retrieval systems. Specifically, we have solved the following issues. Firstly, our ontological interface agent can truly learn a user's specialty in order to build a proper user model for him/her. Secondly, the domain ontology can efficiently and effectively help in
establishing user models, analyzing user queries, and assisting and guiding interface usage. Finally, the intention and focus of a user can be correctly extracted by the agent. In short, our work features a template-based linguistic processing technique for developing ontological interface agents; a natural language query mode, along with an improved keyword-based query mode; and assistance and guidance for human–machine interaction. Our preliminary experimentation demonstrates that the user intention and focus of up to eighty percent of the user queries can be correctly understood by the system, which accordingly provides query solutions with higher user satisfaction.

Most of our current experiments concern the performance of the Query Parser. We are unable to do experiments on, or comparisons of, how good the Interface Agent is at capturing all the interaction information/intention of a user. Our difficulties are summarized below: (1) To our knowledge, none of the current interface systems adopts a similar approach to ours, in the sense that none of them relies on ontology as heavily as our system does to support user interaction. It is thus rather hard for us to make a fair and convincing comparison. (2) Our ontology construction is based on a set of pre-collected webpages on a specific domain; it is hard to evaluate how critical this pre-collection process is to the nature of different domains. We are planning to employ the technique of automatic ontology evolution, for example in cooperation with data mining technology for discovering useful information and generating desired knowledge to support ontology construction (Wang, Lu, & Zhang, 2007), to help study the robustness of our ontology. Finally, in the future we will not only employ the techniques of machine learning and data mining to automate the construction of the template base but, as to the overall system evaluation, we are also planning to apply the concept of usability evaluation from the domain of human factors engineering to evaluate the performance of the agent.

Acknowledgements

The author would like to thank Yai-Hui Chang and Ying-Hao Chiu for their assistance in system implementation. This work was supported by the National Science Council, ROC, under Grants NSC-89-2213-E-011-059, NSC-89-2218-E-011-014, and NSC-95-2221-E-129-019.

References

Chandrasekaran, B., Josephson, J. R., & Benjamins, V. R. (1999). What are ontologies, and why do we need them? IEEE Intelligent Systems, 14(1), 20–26.
Chiu, Y. H. (2003). An interface agent with ontology-supported user models. Master Thesis, Department of Electronic Engineering, National Taiwan University of Science and Technology, Taiwan, ROC.
Coyle, M., & Smyth, B. (2005). Enhancing web search result lists using interaction histories. In Proceedings of the 27th European conference on IR research on advances in information retrieval (pp. 543–545).
Degemmis, M., Licchelli, O., Lops, P., & Semeraro, G. (2004). Learning usage patterns for personalized information access in e-commerce. In Proceedings of the 8th ERCIM workshop on user interfaces for user-centered interaction paradigms for universal access in the information society (pp. 133–148).
Galassi, U., Giordana, A., Saitta, L., & Botta, M. (2005). Learning profiles based on hierarchical hidden Markov model. In Proceedings of the 15th international symposium on foundations of intelligent systems (pp. 47–55).
Hovy, E., Hermjakob, U., & Ravichandran, D. (2002). A question/answer typology with surface text patterns. In Proceedings of the DARPA human language technology conference (pp. 247–250).
Hsu, C. C., & Ho, C. S. (1999). Acquiring patient data by an intelligent interface agent with medicine-related common sense reasoning. Expert Systems with Applications: An International Journal, 17(4), 257–274.
Lee, C. L. (2000). Intention extraction and semantic matching for internet FAQ retrieval. Master Thesis, Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, ROC.
Noy, N. F., & McGuinness, D. L. (2001). Ontology development 101: A guide to creating your first ontology. Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880.
OuYang, Y. L. (2000). Study and implementation of a dialogue-based query system for telecommunication FAQ services. Master Thesis, Department of Computer and Information Science, National Chiao Tung University, Taiwan, ROC.
Paraiso, E. C., & Barthes, J. P. A. (2006). An intelligent speech interface for personal assistants in R&D projects. Expert Systems with Applications: An International Journal, 31(4), 673–683.
Razmerita, L., Angehrm, A., & Maedche, A. (2003). Ontology-based user modeling for knowledge management systems. In Proceedings of the 9th international conference on user modeling (pp. 213–217).
Rich, E. (1979). User modeling via stereotypes. Cognitive Science, 3, 329–354.
Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. New York, USA: McGraw-Hill Book Company.
Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.
Sneiders, E. (1999). Automated FAQ answering: Continued experience with shallow language understanding. In AAAI fall symposium on question answering systems (pp. 97–107). Technical Report FS-99-02. North Falmouth, Massachusetts, USA: AAAI Press.
Soubbotin, M. M., & Soubbotin, S. M. (2001). Patterns of potential answer expressions as clues to the right answer. In Proceedings of the TREC-10 conference (pp. 293–302).
Tsai, C. H. (2000). MMSEG: A word identification system for Mandarin Chinese text based on two variants of the maximum matching algorithm. <http://technology.chtsai.org/mmseg/>.
Wang, C., Lu, J., & Zhang, G. Q. (2007). Mining key information of web pages: A method and its applications. Expert Systems with Applications: An International Journal, 33(2), 425–433.
Yang, S. Y. (2006). An ontology-supported and query template-based user modeling technique for interface agents. In Symposium on application and development of management information system (pp. 168–173).
Yang, S. Y. (2006). How does ontology help information management processing. WSEAS Transactions on Computers, 5(9), 1843–1850.
Yang, S. Y. (2006). FAQ-master: A new intelligent web information aggregation system. In Proceedings of international academic conference 2006 special session on artificial intelligence theory and application (pp. 2–12).
Yang, S. Y. (2007). An ontological multi-agent system for web FAQ query. In Proceedings of the international conference on machine learning and cybernetics (pp. 2964–2969).
Yang, S. Y. (2007). An ontological proxy agent for web information processing. In Proceedings of the 10th international conference on computer science and informatics (pp. 671–677).
Yang, S. Y., Chuang, F. C., & Ho, C. S. (2007). Ontology-supported FAQ processing and ranking techniques. Journal of Intelligent Information Systems, 28(3), 233–251.
Yang, S. Y., & Ho, C. S. (1999). Ontology-supported user models for interface agents. In Proceedings of the 4th conference on artificial intelligence and applications (pp. 248–253).
Yang, S. Y., Chiu, Y. H., & Ho, C. S. (2004). Ontology-supported and query template-based user modeling techniques for interface agents. In The 12th national conference on fuzzy theory and its applications (pp. 181–186).
WSEAS TRANSACTIONS ON INFORMATION SCIENCE & APPLICATIONS Issue 11, Vol. 4, November 2007 ISSN: 1709-0832 1400
Abstract: In this paper, we describe an Interface Agent which works as an assistant between the users and FAQ systems to retrieve FAQs in the domain of the Personal Computer. It integrates several interesting techniques, including domain ontology, user modeling, and template-based linguistic processing, to effectively tackle the problems associated with traditional FAQ retrieval systems. Specifically, we address how ontology helps interface agents provide better FAQ services and describe the related algorithms in detail. Our work features an ontology-supported, template-based user modeling technique for developing interface agents. Our preliminary experimentation demonstrates that the user intention and focus of up to eighty percent of user queries can be correctly understood by the system, which accordingly provides query solutions with higher user satisfaction.
which capture real-world data. The arrow line with the term "io" means the instance-of relationship. The complete PC ontology can be referenced from the Protégé Ontology Library at the Stanford website (http://protege.stanford.edu/download/download.html).

Fig. 2 Part of PC ontology taxonomy (Hardware subsumes, via "isa" links, concepts such as Interface Card, Power Equipment, Memory, Case, and Storage Media, which in turn subsume Network Chip, Sound Card, Display Card, SCSI Card, Network Card, Power Supply, UPS, Main Memory, ROM, Optical, ZIP, and the CD, DVD, CDR/W, and CDR devices.)

Fig. 3 Ontology of the concept of CPU (The CPU concept carries slots such as Synonym = Central Processing Unit, D-Frequency, Interface, L1 Cache, and Abbr., with instances such as XEON, THUNDERBIRD 1.33G, DURON 1.2G, PENTIUM 4 2.0AGHZ, PENTIUM 4 1.8AGHZ, CELERON 1.0G, and PENTIUM 4 2.53AGHZ linked by "io" (instance-of) arrows.)

We also developed a Problem ontology to deal with query questions. Fig. 4 illustrates part of the Problem ontology, which contains the query type and the operation type; together they imply the semantics of a question. Finally, we use Protégé's APIs to develop a set of ontology services, which provide primitive functions to support the application of the ontologies. The ontology services currently available include transforming query terms into canonical ontology terms, …

Fig. 4 Part of problem ontology taxonomy (question types such as How, What, Why, and Where; operation types such as Adjust, Use, Setup, Close, Open, Support, and Provide.)

2.2 Ontological Query Templates

To build the query templates, we have collected in total 1215 FAQs from the FAQ websites of six famous motherboard factories in Taiwan and used them as the reference materials for query template construction. Currently, we only take care of user queries with one intention word and at most three sentences. These FAQs were analyzed and categorized into six types of questions. For each type of question, we further identified several intention types according to its operations. Finally, we define a query pattern for each intention type. Table 1 illustrates the defined query patterns for the intention types. Now all information for constructing a query template is ready, and we can formally define a query template [3,8]. Table 2 illustrates an example query template for the ANA_CAN_SUPPORT intention type. Note here that we collect similar query patterns in the field of "Query_Patterns," which are used in the detailed analysis of a given query. According to the generalization relationships among intention types, we can form a hierarchy of intention types to organize all FAQs. Currently, the hierarchy contains two levels, as shown in Fig. 5. Now, the system can employ the intention type hierarchy to reduce the search scope during the retrieval of FAQs after the intention of a user query is identified.

Table 1 Examples of query patterns

Question Type — Operation Type — Intention Type — Query Pattern / Example
是否 (If) — 支援 (Support) — ANA_CAN_SUPPORT — <S1 是否 支援 S2>
  GA-7VRX 這塊主機板是否支援 KINGMAX DDR-400? (Could the GA-7VRX motherboard support the KINGMAX DDR-400 memory type?)
如何 (How) — 安裝 (Setup) — HOW_SETUP — <如何 在 S1><安裝 S2>
  如何在 Windows 98SE 下，安裝 8RDA 的音效驅動程式? (How to set up the 8RDA sound driver on a Windows 98SE platform?)
什麼 (What) — 是 (Is) — WHAT_IS — <S1 是 什麼>
  AUX power connector 是什麼? (What is an AUX power connector?)
何時 (When) — 支援 (Support) — WHEN_SUPPORT — <S1 何時 支援 S2>
  P4T 何時才能支援 32-bit 512 MB RDRAM 記憶體規格? (When can the P4T support the 32-bit 512 MB RDRAM memory specification?)
哪裡 (Where) — 下載 (Download) — WHERE_DOWNLOAD — <S1><哪裡 可以 下載 S2>
  CUA 的 Driver CD 遺失，請問哪裡可以下載音效驅動程式? (Where can I download the sound driver of CUA whose Driver CD was lost?)
為什麼 (Why) — 列印 (Print) — WHY_PRINT — [S1]<S2 無法 列印>
  為什麼在 Win ME 底下，從休眠狀態中回復後，印表機無法列印? (Why can I not print after coming back from dormancy on a Win ME platform?)

Table 2 Query template for the ANA_CAN_SUPPORT intention type

Template_Number: 304
#Sentence: 3
Intention_Word: 是否 (If)、支援 (Support)
Intention_Type: ANA_CAN_SUPPORT
Question_Type: 是否 (If)
Operation_Type: 支援 (Support)
Query_Patterns: [S3]<S1 是否 支援 S2>; [S2]<是否 支援 S1>
Focus: S1

Fig. 5 The two-level intention type hierarchy (question types such as 是否 (IF), 如何 (HOW_SOLVE), and 什麼 (WHAT) at the first level; intention types such as ANA_CAN_SUPPLY, ANA_CAN_SET, HOW_SET, HOW_FIX, WHAT_SUPPORT, and WHAT_SETUP at the second level.)

3 System Architecture

3.1 User Modeling

Fig. 6 Our user model (interaction preference, solution presentation, domain proficiency, terminology table, interaction history, explicit feedback, and implicit feedback.)

A user model contains the interaction preference, solution presentation, domain proficiency, terminology table, query history, selection history, and user feedback, as shown in Fig. 6.

The interaction preference records the user's preferred interface, e.g., his favorite query mode and favorite recommendation mode. When the user logs on to the system, the system can select a proper user interface according to this preference. We provide two query modes, through either keywords or natural language input, and three recommendation modes, according to hit rates, hot topics, or collaborative learning. We record the user's recent preferences in a time window and accordingly determine the next interaction style.

The solution presentation records the solution ranking preferences of the user. We provide two types of ranking, either according to the degree of similarity
between the proposed solutions and the user query, or according to the user's proficiency about the solutions. In addition, we use a Show_Rate parameter to control how many solution items to display each time, in order to reduce the information overloading problem.

The domain proficiency factor describes how familiar the user is with the domain. By associating a proficiency degree with each ontology concept, we can construct a table containing a set of <concept, proficiency-degree> pairs as his domain proficiency. Thus, when deciding on the solution representation, we can calculate the user's proficiency degree on the solutions using the table, and accordingly show only the part of the solutions he is most familiar with, hiding the rest for advanced requests. To solve the problem of different terminologies being used by different users, we include a terminology table to record this terminology difference. We can use the table to replace the terms used in the proposed solutions with the user's favorite terms during solution representation, to help him better comprehend the solutions.

Finally, we record the user's query history as well as the FAQ-selection history and corresponding user feedback in each query session in the interaction history, in order to support collaborative recommendation. The user feedback is a complicated factor: we remember both the explicit user feedback in the selection history and the implicit user feedback, which includes the query time, time of FAQ click, sequence of FAQ clicks, sequence of clicked hyperlinks, etc.

… behavior and requires the same information. Fig. 7 illustrates an example user stereotype. When a new user enters the system, he is asked to complete a questionnaire, which is used by the system to determine his domain proficiency and accordingly select a user stereotype to generate an initial user model for him. However, the initial user model constructed from the stereotype may be too generic or imprecise. It will be refined to reflect the specific user's real intent after the system has experience with his query history, FAQ-selection history and feedback, and implicit feedback [2].

Fig. 7 An example user stereotype (Stereotype: Expert; interaction preference with query mode Keyword 0/5 and NLP 1/5 in a time window of size 5; recommendation mode Hit 0/7, Hot Topic 0/7, and Collaborative 1/7 in a time window of size 7; solution representation with show mode Query Similarity 1/5 and Solution Proficiency 0/5 in a time window of size 5, Show_Rate 0.9 in both the similarity and proficiency modes; and a domain proficiency table.)

The Query Parser pre-processes the user query by segmenting words, removing conflicting words, and standardizing terms, followed by recording the user's terminologies in the terminology table of the user model. It finally applies the template matching technique to select the best-matched query templates, and accordingly transforms the query into an internal query for the Proxy Agent to search for solutions and collect them into a list of FAQs, each containing a corresponding URL. The Web Page Processor pre-downloads the FAQ-relevant webpages and performs some pre-processing tasks, including labeling keywords for subsequent processing. The Scorer calculates the user's proficiency degree for each FAQ in the FAQ list according to the domain proficiency table in his user model. The Personalizer then produces personalized query solutions according to the terminology table. The User Model Manager is responsible for quickly building an initial user model for a new user using the technique of user stereotyping, as well as for updating the user models and stereotypes to dynamically reflect the changes of user behavior. The Recommender is responsible for recommending information for the user based on hit counts, hot topics, or the group's interests when a similar interaction history is detected.

Fig. 8 Interface agent architecture (the Interface Agent, comprising the Interaction Agent, Query Parser, Web Page Processor, Scorer, Personalizer, User Model Manager, and Recommender, mediates between the USER and the Search, Answerer, and Proxy Agents, exchanging internal queries, solutions and webpage links, and user actions and feedback, supported by the Template Base, Homophone Debug Base, User Model, and Ontology.)

3.2.1 Interaction Agent
The Interaction Agent consists of the following three components: the Adapter, the Observer, and the Assistant. First, the Adapter constructs the best interaction interface according to the user's favorite query and recommendation modes. It is also responsible for presenting to the user the list of FAQ solutions (from the Personalizer) or the recommendation information (from the Recommender). During solution representation, it arranges the solutions in terms of the user's preferred style (query similarity or solution proficiency) and displays the solutions according to the Show_Rate. Second, the Observer passes the user query to the Query Parser, and
Concept Proficiency
Terminology Table
Prefer Standard
simultaneously collects the interaction information and
Domain
Proficiency
主機板
中央處理器
...
0.9
0.9
0.9
Terminology
related feedback from the user. The interaction information
Table
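The user-model structures described in this section, a domain proficiency table of <concept, proficiency-degree> pairs, a terminology table of preferred terms, and an interaction history, can be sketched as follows. The class and method names are our own illustrative choices, not the system's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class UserModel:
    # <concept, proficiency-degree> pairs, degrees in 0.0 .. 1.0
    proficiency: dict = field(default_factory=dict)
    # canonical ontology term -> user's favorite term
    terminology: dict = field(default_factory=dict)
    # explicit/implicit feedback records, one entry per query session
    interaction_history: list = field(default_factory=list)

    def personalize(self, solution_terms):
        """Replace canonical terms with the user's favorite terms."""
        return [self.terminology.get(t, t) for t in solution_terms]

    def familiar(self, concepts, threshold=0.5):
        """Keep only the concepts the user is proficient enough in;
        the threshold value here is an assumption."""
        return [c for c in concepts if self.proficiency.get(c, 0.0) >= threshold]

user = UserModel(
    proficiency={"motherboard": 0.9, "CPU": 0.9, "BIOS": 0.2},
    terminology={"central processing unit": "CPU"},
)
print(user.personalize(["central processing unit", "motherboard"]))  # ['CPU', 'motherboard']
print(user.familiar(["motherboard", "BIOS"]))                        # ['motherboard']
```

The same table thus serves two purposes: filtering which parts of a solution to show, and rewording what is shown.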
tree and learn proper terms to enter their queries. We also rank all ontology concepts by their probabilities and display them in a keyword list. When the user enters a query in the input area, the Assistant automatically "scrolls" the content of the keyword list to the terms related to the input keywords. Fig. 9 illustrates an example of this automatic keyword scrolling mechanism. If the displayed terms of the list contain a concept that the user wants to enter, he can double-click the terms into the input area, e.g., "華碩" (ASUS) at step 2 of Fig. 9. In addition to the keyword-oriented query mode, the Assistant also provides lists of question types and operation types to support question type-oriented or operation type-oriented search. The user can use one, two, or all three of these mechanisms to form his query and convey his intention to the system.

Fig. 9 Example of the automatic keyword scrolling mechanism: as the input grows from "什麼華" to "什麼華碩" to "什麼華碩主" (steps 1-3), the keyword list scrolls to related terms such as 華邦, 華碩, and 主機板

3.2.2 Query Parser
The Query Parser pre-processes the user query by performing Chinese word segmentation, correction of the word segmentation, fault correction on homophonous or multiple words, and term standardization. It then employs template-based pattern matching to analyze the user query and extract the user's intention and focus. Finally, it transforms the user query into the internal query format and passes the query to the Proxy Agent for retrieving proper solutions [20]. Fig. 10 shows the flow chart of the Query Parser; a detailed explanation follows.

Fig. 10 Flow chart of the Query Parser: (1) segmentation, (2) query pruning, (3) query standardization, (4) pattern match, (5) user confirmation (on "NO," the user modifies the query slightly or supplies preferred terms), (6) query formulation into the internal query format for the Proxy Agent; the steps are supported by the homophone debug base, the ontology, the user model, the intention-word base, and the template base

Given a user query in Chinese, we first segment the query using MMSEG [9]. The results of segmentation alone were not good, for the predefined MMSEG word corpus contains insufficient terms of the PC domain. For example, it does not contain the keywords "華碩" or "AGP4X", and returns wrong word segmentations like "華", "碩", "AGP", and "4X". The query pruning step can easily fix this by using the ontology as a second word corpus to bring those mis-segmented words back. It also performs fault correction on homophonous or multiple words using the ontology and the homophone debug base [2].

The query standardization step replaces the terms used in the user query with the canonical terms of the ontology and the intention-word base. The original terms and the corresponding canonical terms are then stored in the terminology table for solution-presentation personalization. Finally, we label the recognized keywords with the symbol "K" and the intention words with the symbol "I"; the rest are regarded as stop words and removed from the query. Now, if the user is using the keyword mode, we jump directly to the query formulation step. Otherwise, we use template-based pattern matching to analyze the natural language input.

The pattern match step identifies the semantic pattern associated with the user query. Using the pre-constructed query templates in the template base, we compare the user query with the query templates and select the best-matched one to identify the user's intention and focus. Fig. 11 shows the algorithm for quickly selecting possibly matched templates, Fig. 12 describes the algorithm that finds all patterns matching the user query, and Fig. 13 removes the matched patterns that are generalizations of other matched patterns.

Template Selection:
  Q : user query.
  Q.Intention_Word = {I1, I2, ..., IN} : intention words in Q.
  Q.Sentence : number of sentences in Q.
  Template Base = {T1, T2, ..., TM}, M : number of templates.
  For each template Tj in Template Base
  {
    If Tj conforms to the following rules, then select Tj into C:
      1. Tj.Sentence = Q.Sentence.
      2. Tj.Intention_Word ⊆ Q.Intention_Word.
  }
  return C : candidate templates.
Fig. 11 Query template selection algorithm

Pattern Match:
  For each template Tj in C, the candidate templates
  {
    Tj.Pattern = {P1, P2, ...}, Pk : pattern k of template j.
    For each Pk in Tj.Pattern
    {
      If Pk matches Q, the user query, then
        Pk.Intention_Word = Tj.Intention_Word,
        Pk.Intention_Type = Tj.Intention_Type,
        Pk.Question_Type = Tj.Question_Type,
        Pk.Operation = Tj.Operation,
        Pk.Focus = Tj.Focus,
        put Pk in M, and break this inner loop.
    }
  }
  return M : patterns matching Q.
Fig. 12 Pattern match algorithm

Pattern Removal:
  For each pattern Pk in M, the matched patterns
  {
    If Pk conforms to the following rule, then remove Pk from M:
      ∃ Pi ∈ M, Pk.Intention_Type = Pi.Intention_Type and
      Pk.Intention_Word ⊂ Pi.Intention_Word.
  }
Fig. 13 Query pattern removal algorithm

Take the following query as an example: "華碩 K7V 主機板是否支援 1GHz 以上的中央處理器呢?", which means "Does the ASUS K7V motherboard support a CPU over 1 GHz?" Table 2 illustrates a query template example, which contains two patterns, namely <S1 是否 支援 S2> and <是否 支援 S1>. We find that the second pattern matches the user query and can be selected to transform the query into an internal form by query formulation (step 6), as shown in Table 3. Note that more than two patterns may become candidates for a given query.
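The three algorithms of Figs. 11-13 can be read as the following sketch. The Template fields mirror the figures; the slot-matching rule in matches() (literal tokens must occur in order, slot tokens S1, S2, ... absorb the words in between) is our simplifying assumption, not the paper's exact matcher.

```python
import re
from dataclasses import dataclass

@dataclass
class Template:
    sentences: int               # number of sentences the template expects
    intention_words: frozenset   # e.g. {"是否", "支援"}
    patterns: list               # token patterns; "S1"/"S2" are slots
    intention_type: str = ""
    question_type: str = ""
    operation: str = ""
    focus: str = ""

def select_templates(query_words, query_sentences, template_base):
    """Fig. 11: keep templates whose sentence count equals the query's and
    whose intention words are a subset of the query's words."""
    return [t for t in template_base
            if t.sentences == query_sentences
            and t.intention_words <= set(query_words)]

def matches(pattern, query_words):
    """A pattern matches if its literal tokens occur in order in the query;
    slot tokens (S1, S2, ...) absorb the words in between (assumption)."""
    literals = [tok for tok in pattern if not re.fullmatch(r"S\d+", tok)]
    pos = 0
    for lit in literals:
        try:
            pos = query_words.index(lit, pos) + 1
        except ValueError:
            return False
    return True

def match_patterns(query_words, candidates):
    """Fig. 12: collect the first matching pattern of each candidate template,
    keeping the template so its attributes stay attached to the pattern."""
    matched = []
    for t in candidates:
        for p in t.patterns:
            if matches(p, query_words):
                matched.append((p, t))
                break
    return matched

def remove_general(matched):
    """Fig. 13: drop a match whose intention words are a proper subset of
    another match with the same intention type."""
    keep = []
    for p, t in matched:
        if not any(t.intention_type == t2.intention_type
                   and t.intention_words < t2.intention_words
                   for _, t2 in matched):
            keep.append((p, t))
    return keep
```

The removal step implements the "prefer the most specific pattern" policy: a match that carries strictly fewer intention words than a rival of the same intention type is discarded as too general.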
WSEAS TRANSACTIONS ON INFORMATION SCIENCE & APPLICATIONS Issue 11, Vol. 4, November 2007 ISSN: 1709-0832 1404
In this case, the Query Parser prompts the user to confirm his intent (step 5). If the user says "No," meaning the pattern matching result does not capture his true intention, he is allowed to modify the matched result or switch to the keyword mode to place the query.

Table 3 Internal form
  User_Level      ...
  Query_Mode      ...
  Intention_Type  ...
  Question_Type   ...
  Operation       ...
  Keyword         ...
  Focus           ...

3.2.3 Web Page Processor
The Web Page Processor receives from the Proxy Agent a list of retrieved solutions, which contains one or more FAQs matched with the user query, each represented as in Table 4, and retrieves and caches the solution webpages according to their FAQ_URLs. It then pre-processes those webpages for the subsequent customization process, including URL transformation, keyword standardization, and keyword marking. URL transformation changes all hyperlinks to point toward the cache server. Keyword standardization transforms all terms in the webpage content into ontology vocabularies. Keyword marking boldfaces the keywords appearing in the webpages as <B>Keyword</B> to facilitate subsequent keyword processing and improve webpage readability.

Table 4 Format of retrieved FAQ
  Field           Description
  FAQ_No.         FAQ's identification
  FAQ_Question    Question part of the FAQ
  FAQ_Answer      Answer part of the FAQ
  FAQ_Similarity  Similarity degree between the FAQ and the user query
  FAQ_URL         Source or related URL of the FAQ

3.2.4 Scorer
Each FAQ is a short document, so the concepts involved in an FAQ are in general more focused; in other words, its topic (or concept) is clearer and more professional. The question part of an FAQ is even more pointed about which concepts are involved. Knowing this property, we can use the keywords appearing in the question part of an FAQ to represent its topic.

example, "Do you know a CPU contains a floating co-processor?", "Do you know the concept of 1GB = 1000MB in specifying the capacity of a hard disk?", etc. The difficulty degrees of the questions are proportional to the hierarchy depth of the corresponding concepts in the ontology. When a new user logs on to the system, the Manager randomly selects questions from the ontology. The user answers either YES or NO to each question. The answers are collected, weighted according to their respective difficulty degrees, and passed to the Manager, which then calculates a proficiency score for the user according to the percentage of correct responses and accordingly instantiates a proper user stereotype as the user model for the user.

The second task is to update the user models. Here we use the interaction information and user feedback collected by the Interaction Agent in each interaction session or query session. An interaction session is defined as the time period from when the user logs in to when he logs out, while a query session is defined as the time period from when the user issues a query to when he receives the answers and completes the feedback. An interaction session may contain several query sessions. After a query session completes, we immediately update the interaction preference and solution presentation parts of the user model. Specifically, the user's query mode and solution presentation mode in this query session are remembered in their respective time windows, and the statistics of the preference change for each mode are calculated accordingly; these statistics are used to adapt the Interaction Agent in the next query session. Fig. 15 illustrates the algorithm to update the Show_Rate of the similarity mode; it uses the ratio of the number of user-selected FAQs to the number of displayed FAQs to update the show rate. The algorithm to update the Show_Rate of the proficiency mode is similar.

NS : number of FAQs in the solution FAQ list.
N(Similarity Mode) : number of FAQs shown to the user in similarity mode,
  N(Similarity Mode) = ⌈NS * Show_Rate(Similarity Mode)_old⌉.
NHide = NS - N(Similarity Mode) : number of hidden FAQs.
NSelect : number of FAQs selected by the user in the query session.
Show_Rate(Similarity Mode) = Show_Rate(Similarity Mode)_old + Variation,
  Variation = (NSelect/NS - 0.7) * (1 - exp(-N(Similarity Mode)/α)),  if NSelect/NS ≥ 0.7
  Variation = 0,                                                      if 0.3 < NSelect/NS < 0.7
Fig. 15 Algorithm to update the Show_Rate of the similarity mode
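The Fig. 15 update rule can be sketched in a few lines. The value of the damping constant α is our assumption, and since the branch of the piecewise definition for selection ratios at or below 0.3 is not reproduced above, this sketch conservatively leaves the rate unchanged in that case.

```python
import math

def update_show_rate(show_rate_old, ns, n_select, alpha=10.0):
    """Update Show_Rate(Similarity Mode) from one query session.

    ns: number of FAQs in the solution list; n_select: FAQs the user selected.
    alpha is a damping constant whose value here is an assumption.
    """
    n_shown = math.ceil(ns * show_rate_old)   # N(Similarity Mode)
    ratio = n_select / ns
    if ratio >= 0.7:
        # Many selections: raise the rate, damped by how many FAQs were shown.
        variation = (ratio - 0.7) * (1 - math.exp(-n_shown / alpha))
    else:
        # 0.3 < ratio < 0.7 gives no change; the low-ratio branch is not
        # reproduced in the figure, so this sketch also leaves it unchanged.
        variation = 0.0
    return min(1.0, show_rate_old + variation)
```

With ns = 10, show_rate_old = 0.9, and n_select = 9, the ratio 0.9 exceeds 0.7, so the rate grows toward (and is capped at) 1.0; with n_select = 5 it stays at 0.9.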
same time, we add the query and FAQ-selection records of the user into the query history and selection history of his user model.

For each user_i in each user proficiency group
{
  Proficiency_avg(user_i) = ( Σ_j Proficiency(Concept_j) ) / (number of concepts in the Domain Proficiency Table),
    where Concept_j is the jth concept in user_i's Domain Proficiency Table.
  If 0.8 ≤ Proficiency_avg(user_i) ≤ 1.0, user_i is reassigned to the Expert group.
  If 0.6 ≤ Proficiency_avg(user_i) < 0.8, user_i is reassigned to the Senior group.
  If 0.4 ≤ Proficiency_avg(user_i) < 0.6, user_i is reassigned to the Junior group.
  If 0.2 ≤ Proficiency_avg(user_i) < 0.4, user_i is reassigned to the Novice group.
  If 0.0 ≤ Proficiency_avg(user_i) < 0.2, user_i is reassigned to the Amateur group.
}
Fig. 17 Algorithm to re-cluster all user groups

from all users in the same group within a time window. 2) Hot-topic FAQs: it recommends the first N solution FAQs according to their popularity, calculated as statistics on the keywords appearing in the query histories of the same-group users within a time window; the hot-degree calculation is shown in Fig. 19. 3) Collaborative recommendation: it refers to the selection histories of the users in the same group to provide solution recommendations. The basic idea is this: if user A and user B are in the same group and the first n interaction sessions of user A are the same as those of user B, then we can recommend the highest-rated FAQs of the (n+1)th session of user A to user B; the detailed algorithm is shown in Fig. 20.
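The re-clustering rule of Fig. 17 reduces to averaging a user's concept proficiencies and binning the mean into the five groups:

```python
def reassign_group(proficiency_table):
    """Fig. 17 sketch: average the user's concept proficiencies and map
    the mean onto the five fixed proficiency bands."""
    avg = sum(proficiency_table.values()) / len(proficiency_table)
    if avg >= 0.8:
        return "Expert"
    if avg >= 0.6:
        return "Senior"
    if avg >= 0.4:
        return "Junior"
    if avg >= 0.2:
        return "Novice"
    return "Amateur"

# A user whose motherboard/CPU proficiencies average 0.8 lands in Expert.
print(reassign_group({"主機板": 0.9, "中央處理器": 0.7}))  # Expert
```

Because the bands are half-open below 0.8 and closed at both ends for Expert, checking the thresholds from high to low reproduces the figure's case analysis exactly.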
SExpert : stereotype of the Expert group.
U = {U1, U2, ..., UN} : users in the Expert group.
SExpert.Show_Rate(Similarity Mode) = ( Σ_{i=1..N} Ui.Show_Rate(Similarity Mode) ) / N
SExpert.Show_Rate(Proficiency Mode) = ( Σ_{i=1..N} Ui.Show_Rate(Proficiency Mode) ) / N
For each Concept_j in the Domain Proficiency Table (DPT)
{
  SExpert.DPT.Proficiency(Concept_j) = ( Σ_{i=1..N} Ui.DPT.Proficiency(Concept_j) ) / N
}

4 System Demonstration and Experiments

4.1 System demonstration

(Screenshots: a new user enters his basic user information and completes the questionnaire.)
When the user clicks a mode, the corresponding popup window is produced by the system.

The solution presentation tab is illustrated in Fig. 23. It pre-selects the solution ranking method according to the user's preference and hides part of the solutions according to his Show_Rate to reduce the user's cognitive load. The user can switch the solution ranking method between similarity ranking and proficiency ranking. The user can click the question part of an FAQ (Fig. 24) to display its content or give it feedback, which contains a satisfaction degree and a comprehension degree. Fig. 25 illustrates the window shown before system logout, which asks the user to fill in a questionnaire; its statistics help further system improvement.

Fig. 23 Solution presentation (screenshot; the controls include ranking order, show all, return in e-mail, and user feedback)

first experiment, we use these same FAQs as testing queries, in order to verify whether any conflicts exist within the queries. Table 5 illustrates the experimental results: only 33 queries match more than one query pattern and result in confusion of the query intention, called "error" in the table. These errors may be corrected by the user. The experiment shows that the effectiveness rate of the constructed query templates reaches 97.28%, which implies the template base can be used as an effective knowledge base for natural language query processing.

Table 5 Effectiveness of constructed query patterns
  #Testing  #Correct  #Error  Precision Rate (%)
  1215      1182      33      97.28

Our second experiment is to learn how well the Parser understands new queries. First, we collected in total 143 new FAQs, different from the FAQs used to construct the query templates, from four famous motherboard manufacturers in Taiwan: ASUS, GIGABYTE, MSI, and SIS. We then used the question parts of those FAQs as testing queries, which test how well the Parser performs. Our experiments show that we can precisely extract the true query intentions and focuses from 112 FAQs. The remaining 31 FAQs contain three or more sentences per query, which explains why we failed to understand them. In summary, 78.3% (112/143) of the new queries can be successfully understood.

Table 6 User satisfaction evaluation (SE / ST per keyword category)
  Method        CPU        MOTHERBOARD  MEMORY     AVERAGE
  Alta Vista    63% / 61%  77% / 78%    30% / 21%  57% / 53%
  Excite        66% / 62%  81% / 81%    50% / 24%  66% / 56%
  Google        66% / 64%  81% / 80%    38% / 21%  62% / 55%
  HotBot        69% / 63%  78% / 76%    62% / 31%  70% / 57%
  InfoSeek      69% / 70%  71% / 70%    49% / 28%  63% / 56%
  Lycos         64% / 67%  77% / 76%    36% / 20%  59% / 54%
  Yahoo         67% / 61%  77% / 78%    38% / 17%  61% / 52%
  Our approach  78% / 69%  84% / 78%    45% / 32%  69% / 60%
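The headline figures in Tables 5 and 6 can be cross-checked with a few lines of arithmetic (this is only a sanity check, not part of the system):

```python
# Table 5: template effectiveness over the 1215 testing queries.
testing, correct = 1215, 1182
errors = testing - correct
precision = correct / testing
print(errors, f"{precision:.2%}")   # 33 97.28%

# Table 6: each AVERAGE column is the mean of the three keyword columns,
# e.g. the SE average of "Our approach" over CPU/MOTHERBOARD/MEMORY.
se_avg = round((78 + 84 + 45) / 3)
print(se_avg)                       # 69
```

Both reproduce the reported values, confirming that the tables' averages are plain per-row means.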
interaction. Our preliminary experimentation demonstrates that the user intention and focus of up to eighty percent of the user queries can be correctly understood by the system, which accordingly provides query solutions with higher user satisfaction. In the future, we plan to employ machine learning and data mining techniques to automate the construction of the template base. As to the overall system evaluation, we plan to apply the concept of usability evaluation from the domain of human factors engineering to evaluate the performance of the user interface.

Acknowledgements
The author would like to thank Yai-Hui Chang and Ying-Hao Chiu for their assistance in system implementation. This work was supported by the National Science Council, R.O.C., under Grant NSC-95-2221-E-129-019.

References:
[1] Chandrasekaran, B., Josephson, J.R., and Benjamins, V.R., What Are Ontologies, and Why Do We Need Them? IEEE Intelligent Systems, Vol. 14, No. 1, 1999, pp. 20-26.
[2] Chiu, Y.H., An Interface Agent with Ontology-Supported User Models, Master Thesis, Department of Electronic Engineering, National Taiwan University of Science and Technology, Taiwan, R.O.C., 2003.
[3] Hovy, E., Hermjakob, U., and Ravichandran, D., A Question/Answer Typology with Surface Text Patterns, Proc. of the DARPA Human Language Technology Conference, San Diego, CA, USA, 2002, pp. 247-250.
[4] Noy, N.F. and McGuinness, D.L., Ontology Development 101: A Guide to Creating Your First Ontology, Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880, 2001.
[5] Rich, E., User Modeling via Stereotypes, Cognitive Science, Vol. 3, 1979, pp. 329-354.
[6] Salton, G., Wong, A., and Yang, C.S., A Vector Space Model for Automatic Indexing, Communications of the ACM, Vol. 18, No. 11, 1975, pp. 613-620.
[7] Salton, G. and McGill, M.J., Introduction to Modern Information Retrieval, McGraw-Hill Book Company, New York, USA, 1983.
[8] Soubbotin, M.M. and Soubbotin, S.M., Patterns of Potential Answer Expressions as Clues to the Right Answer, Proc. of the TREC-10 Conference, NIST, Gaithersburg, MD, USA, 2001, pp. 293-302.
[9] Tsai, C.H., MMSEG: A Word Identification System for Mandarin Chinese Text Based on Two Variants of the Maximum Matching Algorithm, available at http://technology.chtsai.org/mmseg/, 2000.
[10] Winiwarter, W., Adaptive Natural Language Interfaces to FAQ Knowledge Bases, International Journal on Data and Knowledge Engineering, Vol. 35, 2000, pp. 181-199.
[11] Yang, S.Y. and Ho, C.S., Ontology-Supported User Models for Interface Agents, Proc. of the 4th Conference on Artificial Intelligence and Applications, Chang-Hwa, Taiwan, 1999, pp. 248-253.
[12] Yang, S.Y. and Ho, C.S., An Intelligent Web Information Aggregation System Based upon Intelligent Retrieval, Filtering and Integration, Proc. of the 2004 International Workshop on Distance Education Technologies, San Francisco Bay, CA, USA, 2004, pp. 451-456.
[13] Yang, S.Y., Chiu, Y.H., and Ho, C.S., Ontology-Supported and Query Template-Based User Modeling Techniques for Interface Agents, The 12th National Conference on Fuzzy Theory and Its Applications, I-Lan, Taiwan, 2004, pp. 181-186.
[14] Yang, S.Y., Chuang, F.C., and Ho, C.S., Ontology-Supported FAQ Processing and Ranking Techniques, accepted for publication in International Journal of Intelligent Information Systems, 2005.
[15] Yang, S.Y., Liao, P.C., and Ho, C.S., A User-Oriented Query Prediction and Cache Technique for FAQ Proxy Service, Proc. of the 2005 International Workshop on Distance Education Technologies, Banff, Canada, 2005, pp. 411-416.
[16] Yang, S.Y., Liao, P.C., and Ho, C.S., An Ontology-Supported Case-Based Reasoning Technique for FAQ Proxy Service, Proc. of the Seventeenth International Conference on Software Engineering and Knowledge Engineering, Taipei, Taiwan, 2005, pp. 639-644.
[17] Yang, S.Y., FAQ-master: A New Intelligent Web Information Aggregation System, International Academic Conference 2006 Special Session on Artificial Intelligence Theory and Application, Tao-Yuan, Taiwan, 2006, pp. 2-12.
[18] Yang, S.Y., An Ontology-Supported and Query Template-Based User Modeling Technique for Interface Agents, 2006 Symposium on Application and Development of Management Information Systems, Taipei, Taiwan, 2006, pp. 168-173.
[19] Yang, S.Y., An Ontology-Supported Website Model for Web Search Agents, accepted for presentation at the 2006 International Computer Symposium, Taipei, Taiwan, 2006.
[20] Yang, S.Y., How Does Ontology Help Information Management Processing, WSEAS Transactions on Computers, Vol. 5, No. 9, 2006, pp. 1843-1850.