Semantic Computing and Language Knowledge Bases


Semantic Computing and Language Knowledge Bases1

Lei Wang
Key Laboratory of Computational Linguistics of Ministry of Education
Department of English, Peking University
[email protected]

Shiwen Yu
Key Laboratory of Computational Linguistics of Ministry of Education,
Peking University
[email protected]

Abstract

With the proposal of the next-generation Web, the Semantic Web, semantic computing has been drawing more and more attention from the research community and industry. A lot of research has been conducted on the theory and methodology of the subject, and potential applications have also been investigated and proposed in many fields. The progress of semantic computing made so far cannot be detached from its supporting pivot: language resources, for instance, language knowledge bases. This paper proposes three perspectives of semantic computing from a macro view and describes the current state of the construction of language knowledge bases and the related research and applications that have been carried out on the basis of these resources, via a case study of the Institute of Computational Linguistics at Peking University.

1 Introduction

Semantic computing is a technology to compose information content (including software) based on meaning and vocabulary shared by people and computers and thereby to design and operate information systems (i.e., artificial computing systems). Its goal is to plug the semantic gap through this common ground, to let people and computers cooperate more closely, to ground information systems on people's life world, and thereby to enrich the meaning and value of the entire life world (Hasida, 2007). The task of semantic computing is to explain the meaning of various constituents of sentences (words or phrases) or of sentences themselves in a natural language. We believe that semantic computing is a field that addresses two core problems: first, to map the semantics of the user to that of the content for the purpose of content retrieval, management, creation, etc.; second, to understand the meanings (semantics) of computational content of various sorts, including, but not limited to, text, video, audio, network and software, and to express them in a form that can be processed by machine.

Figure 1. Human-computer interaction is handicapped without semantic computing.


1
This work is supported by the National Natural Science Foundation of China (No. 60970083) and Chiang Ching-kuo
Foundation for International Scholarly Exchange (2009).
But the road to success in semantic computing is not smooth, and it has taken researchers quite a long time to make some progress in this field. The difficulties of semantic computing involve many aspects: ambiguity, polysemy, scope of quantifiers, metaphor, etc. Different individuals will understand the same word or the same sentence differently. Research on the theory and methodology of semantic computing still has a long way to go.
Now we provide an example to show how difficult it is for a computer to understand the meaning of a word. We input two sentences into Google Translate and the following results were returned:

Example 1
I bought a table with three dollars. (20091016 Google: 本人买了 3 美元一表)
I bought a table with three legs. (20091016 Google: 本人买了 3 条腿的表)

We know that the word "table" has two common meanings in English (a piece of furniture and a structured data report). But in Chinese they correspond to two different words (表 biǎo and 桌子 zhuō zi²). From Example 1, we can see that the system cannot distinguish the two senses and translates them both as 表. Thus, without semantic analysis, queries in a search engine may result in very poor performance. Current search engines rest primarily on shallow Natural Language Processing (NLP) techniques, for instance, string matching, while the future direction of search engines should aim at content indexing and the understanding of the user's intention. Semantic computing becomes applicable only with the development of deep NLP techniques. Machine Translation (MT) is the first application of digital computers in the non-digital world, and semantic information is indispensable in MT research and applications. However, there has been no breakthrough in Natural Language Understanding (NLU), and semantic computing may serve as the key to some success in this field.

2 Related Work on Semantic Computing

Semantics is an interesting but controversial topic. Many theories have been proposed in an attempt to describe what meaning really means. But up until now there has not been a theory that describes the meaning of various language units (words, phrases and sentences) so perfectly that it is universally accepted, even though Fillmore's proposal of Frame semantics (1976) has been quite successful. Since Gildea et al. (2002) initiated the research on automatic semantic role labeling, many evaluations have been conducted internationally, such as Senseval-3 and SemEval 2007, as well as the CoNLL SRL Shared Tasks of 2004, 2005 and 2008. Word Sense Disambiguation (WSD) is also a very important research subject and a lot of work has been done in this regard, such as Lesk (1986) and Gale et al. (1998), with Jin et al. (2007) and Qu et al. (2007) as the Chinese counterparts. As to the research on computing word sense relatedness, Dagan et al. (1993) did some pilot work, and Lee (1997) and Resnik (1999) contributed to the research on semantic similarity.
In recent years, semantics-based analysis such as data and web mining, analysis of social networks, and semantic system design and synthesis has begun to draw more attention from researchers. Applications using semantics, such as search engines and question answering (Li et al., 2002), content-based multimedia retrieval and editing, and natural language interfaces (Yokoi et al., 2005), have also been attracting attention. Semantic computing has even been applied to areas like music description, medicine and biology, and GIS systems and architecture. The whole idea is how to realize human-centered computing.

2
Pinyin is currently the most commonly used Romanization system for standard Mandarin. The system is now used in mainland China, Hong Kong, Macau, parts of Taiwan, Malaysia and Singapore to teach Mandarin Chinese and internationally to teach Mandarin as a second language. It is also often used to spell Chinese names in foreign publications and can be used to enter Chinese characters on computers and cell phones.
3 The Theory and Methodology of Semantic Computing

3.1 Important Questions That Need to Be Asked about Semantic Computing

In the past few years there has been a growing interest in the field of semantics and semantic computing. But some questions have always lingered in researchers' minds. What on earth is semantics? What is the best way to describe the meaning of a language unit? How can natural languages be processed so that we are able to benefit from human-computer interaction, or even interpersonal communication? It seems that no one can give satisfactory answers to these questions. But it is now commonly agreed that the study of semantic computing or knowledge representation is a central issue in computational linguistics. The major contributions on this topic are collected in Computational Linguistics (1987-2010) and International Journal of Semantic Computing (2007-2010). Research in computing semantics is, however, rather heterogeneous in scope, methods, and results. The traditional "wh" and "how" questions need to be asked again to understand the consequences of conceptual and linguistic decisions in semantic computing:
What? What should be computed in terms of semantics? Each word is a world and its meaning can be interpreted differently. Despite the interest that semantics has received from scholars of different disciplines since the early history of humanity, a unifying theory of meaning does not exist, no matter whether we view a language from a lexical or a syntactic perspective. In practice, the quality and type of the expressed concepts again depend upon who uses them: any language speaker or writer, a linguist, a psychologist, a lexicographer, or a computer. In psycholinguistics and computational linguistics, semantic knowledge is modeled with very deep and formal expressions. Often semantic models focus on some very specific aspect of language communication, according to the scientific interest of a researcher. In natural language processing, lexical entries or semantic attributes typically express linguistic knowledge as commonsensically understood and used by humans. The entries or attributes are entirely formatted in some knowledge representation and can be manipulated by a computer.
Where? What are the sources of semantic knowledge? Traditionally, individual introspection is often a source for obtaining word senses. However, individual introspection brings about both theoretical and implementation problems. The theoretical problem is that "different researchers with different theories would observe different things about their internal thoughts..." (Anderson 1989). The implementation problem is that consistency becomes a major issue when the size of the lexicon or the syntactic tree bank exceeds a few thousand entries or annotation tags. Despite the scientific interest of such experiments, they cannot be extensively repeated for the purpose of acquiring word sense definitions on a large scale. On-line corpora and dictionaries are widely available today and provide experimental evidence of word uses and word definitions. The major advantage of on-line resources is that in principle they provide the basis for very large experiments, even though at present the methods of analysis and application are not fully developed and need further research to get satisfactory results.
How? Semantic computing can be realized at various levels. The hard work is to implement a system in a real domain, or the more conceptual task of defining an effective mathematical framework to manipulate the objects defined within a linguistic model. Quite obviously the "hows" in the literature about semantic computing are much more important than the "whats" and "wheres". The methodology that really works in semantic computing is deeply related to the ultimate objective of NLP research, which still cannot be defined adequately so far.

3.2 The Perspectives of Semantic Computing from a Macro View

Why has semantic computing (or NLU) posed so great a challenge? We may attribute this to two major reasons. First, it is based on knowledge of the human language mechanism. If fully-developed complicated brains are often seen as a crowning achievement of biological evolution, interpersonal communication is no simpler than the human biological mechanism. Language has to be a crucial part of the evolutionary process, which has not been fully understood by scientific research. Second, in NLP research language is both the target and the tool. Current NLP research focuses on either speech or written texts only. However, in real-world scenarios, reading and interaction between humans are multi-dimensional (through different forms of information such as text, speech, or images, and utilizing our different senses such as vision and hearing). It is necessary to rely on the advancements of brain science, cognitive science and other related fields and to work in collaboration to produce better results. Linguistics, especially computational linguistics, has made its own contribution, and semantic computing will play an important role in NLP.
There are complex many-to-many relations between the form and the meaning of a language. Semantic computing is not only the way but also the ultimate goal of natural language understanding. Although it is hard, we should not give up. Here we propose that the main contents of semantic computing include the following three aspects:
• semantic computing on the ontological perspective
• semantic computing on the cognitive perspective
• semantic computing on the pragmatic perspective
As for ontologies, much progress has been made worldwide. The remarkable achievements in English include WordNet by Princeton University, PropBank by the University of Pennsylvania, etc. There are also quite a number of efforts on building ontologies in Chinese, which will be elaborated in Section 5.
In the last few years, the main direction of semantic computing has been to disambiguate language units and constructions. In the following Example 2, the word 仪表 yí biǎo has two meanings in different contexts. In Chinese, word segmentation is also a problem that needs to be addressed. In Example 3, segmenting the string 白天鹅 bái tiān é as 白/天鹅 or 白天/鹅 results in different understandings of the sentences.

Example 2
她的仪表很端庄。 tā de yí biǎo hěn duān zhuāng (She has a graceful appearance.)
她的仪表很精确。 tā de yí biǎo hěn jīng què (Her meters are very accurate.)

Example 3
白天鹅飞过来了。bái tiān é fēi guò lái le (A white swan flies toward us.)
白天鹅可以看家。bái tiān é kě yǐ kān jiā (A goose can guard our house at daytime.)
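
The segmentation ambiguity in Example 3 can be illustrated with a minimal sketch that enumerates every way of splitting a string into dictionary words. The lexicon below is a hypothetical toy list chosen only for this example; it is not taken from any of the knowledge bases discussed in this paper.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


# Hypothetical toy lexicon: 白 (white), 白天 (daytime), 天鹅 (swan), 鹅 (goose).
TOY_LEXICON = {"白", "白天", "天鹅", "鹅"}

for candidate in segmentations("白天鹅", TOY_LEXICON):
    print("/".join(candidate))
```

Both 白/天鹅 and 白天/鹅 are valid with respect to the lexicon, so the lexicon alone cannot choose between them; the choice depends on the rest of the sentence (飞过来了 versus 可以看家).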

As to WSD tasks on the word level, some problems can be solved when an ontology is applied. But ambiguity can also appear on the syntactic level. Here ontologies usually cannot do much, so we may seek help from language knowledge bases (see Section 5). The following examples of syntactic and semantic analysis illustrate how different syntactic structures change the meaning of sentences:

Example 4
这样的电影不是垃圾是什么? --该电影是垃圾。
zhè yàng de diàn yǐng bú shì lā jī shì shén me? -- gāi diàn yǐng shì lā jī
If a movie as such is not rubbish, what is it? -- It is rubbish.
这样的电影怎么能说是垃圾呢? -- 该电影不是垃圾。
zhè yàng de diàn yǐng zěn me néng shuō shì lā jī ne? -- gāi diàn yǐng bú shì lā jī
How can a movie as such be rubbish? -- It is not rubbish.
Example 5
蚂蚱是蚂蚱, 蛐蛐是蛐蛐。 -- 蚂蚱不是蛐蛐。
mà zhà shì mà zhà, qū qū shì qū qū -- mà zhà bú shì qū qū
A grasshopper is a grasshopper, while a cricket is a cricket. -- A grasshopper is not a cricket.
Rule: A is A, while B is B. → A is not B.
丁是丁, 卯是卯。dīng shì dīng, mǎo shì mǎo
Ding is ding, while mao is mao. -- being conscientious

With respect to semantic computing on the cognitive level, we will use metaphor as an example. For a long time, NLP research has focused on ambiguity resolution. Can NLU be realized after ambiguity resolution? Metaphor, insinuation, pun, hyperbole (exaggeration), humor, personification, as well as deliberate word usage or sentence composition, pose a great challenge to NLU research. If the computer can deal with metaphors, the ability of natural language understanding will be greatly improved.
First, let's discuss the rhetorical function of metaphor. Metaphor is extensively and skillfully used in the Chinese classic "Book of Songs" to boost expressiveness.

Example 6
Simile: 自伯之东,首如飞蓬³;岂无膏沐?谁适为容。 --(卫风·伯兮)
zì bó zhī dōng, shǒu rú fēi péng; qǐ wú gào mù? shuí shì wéi róng。 --(wèi fēng · bó xī)
(Your hair is like disordered grass.)
Metaphor: 它山之石,可以攻玉。 --(小雅·鹤鸣)
tā shān zhī shí, kě yǐ gōng yù。 --(xiǎo yǎ · hè míng)
(Rocks from another mountain can be used to carve jade. Metaphorically this phrase means a
change of method may solve the current problem.)

Also, many Chinese idioms are metaphorical expressions: 同舟共济 tóng zhōu gòng jì (literally, to cross the river in the same boat; metaphorically, to work together with one heart while in difficulty), 铜墙铁壁 tóng qiáng tiě bì (literally, walls of brass and iron; metaphorically, impregnable). The Chinese language makes use of many idioms or idiomatic expressions that are derived from ancient Chinese stories and fables. These idioms and idiomatic expressions are often used metaphorically and reflect the historical and cultural background of the language. They are the most precious relics of the Chinese language and culture. Therefore the Chinese Idiom Knowledge Base (CIKB) was also built in 2009. CIKB consists of 38,117 entries and describes many attributes of Chinese idioms. Among the attributes, "literal translation", "free translation" and "English equivalent" are very valuable.
The linguistic function of metaphor is also important. Metaphor is the base of new word creation and polysemy production (sense evolution). For example, 垃圾箱 lā jī xiāng (recycle bin) and 病毒 bìng dú (virus) are used in a computer setting, and words like 高峰 gāo fēng (peak), 瓶颈 píng jǐng (bottleneck) and 线索 xiàn suǒ (clue) are endowed with new meanings which have not been included in traditional Chinese dictionaries. Besides, metaphor creates new meanings on the sentence level. For instance, in 地球是人类的母亲。dì qiú shì rén lèi de mǔ qīn (The earth is the mother of humanity.), the word 母亲 (mother) has a different meaning. So metaphor understanding is beyond the scope of ambiguity resolution. Metaphor, linguistics, and human cognitive mechanisms are inextricably interlinked. So metaphor becomes a fort that must be conquered in NLU research.
From an NLP perspective, metaphors can be summarized into the categories listed in Table 1. As for the NLP tasks of metaphor computing, we can conclude that there are three tasks to be accomplished. First, metaphor recognition. For instance, how can we distinguish 知识的海洋 from 海洋资源考察 hǎi yáng zī yuán kǎo chá (investigation of ocean resources)? Second, metaphor understanding and translation. For instance, 知识的海洋 actually means 知识像海洋一样丰富。zhī shí xiàng hǎi yáng yí yàng fēng fù (Knowledge is as rich as the ocean.). Third, metaphor generation. For instance, how can phrases such as 信息的海洋 xìn xī de hǎi yáng (ocean of information) and 鲜花的海洋 xiān huā de hǎi yáng (ocean of flowers) be generated successfully by computer?

3
For the purpose of conciseness, only the underlined parts that contain metaphors are translated.

Perspective of grammatical properties
  Nominal: 祖国的花朵 zǔ guó de huā duǒ (flower of the country), 生命的旅程 shēng mìng de lǚ chéng (life journey)
  Verb: 心潮澎湃 xīn cháo péng pài (heart wave), 放飞理想 fàng fēi lǐ xiǎng (let dreams fly)
  Adjective: 这篇文章写得干巴。zhè piān wén zhāng xiě de gān bā (This article is written drily), 这篇文章清汤寡水。zhè piān wén zhāng qīng tāng guǎ shuǐ (This article is like plain soup and water.)
  Adverb: 纯粹胡说 chún cuì hú shuō (absolute nonsense)

Perspective of language units of metaphorical expressions
  Word-formation level: 卵石 luǎn shí (egg-like stone), 杏仁眼 xìng rén yǎn (apricot-like eyes)
  Word level: 潮流 cháo liú (tide), 朝阳 zhāo yáng (morning sun)
  Phrase level: 知识的海洋 zhī shí de hǎi yáng (ocean of knowledge), 播种幸福的种子 bō zhǒng xìng fú de zhǒng zi (to sow the seeds of happiness)
  Sentence level: 汽车喝汽油。qì chē hē qì yóu (Cars drink gasoline.), 女人是水 nǚ rén shì shuǐ (A woman is water.)
  Discourse level: 打起黄莺儿,莫叫枝上啼。啼时惊妾梦,不得到辽西。dǎ qǐ huáng yīng ér, mò jiào zhī shàng tí。tí shí jīng qiè mèng, bù dé dào liáo xī。(To scare away the nightingales, for their noise has disturbed my dream in which I went to the west to meet my dear husband.)

Table 1. Categories of metaphors from NLP perspective.

Currently we focus on the recognition and understanding of metaphors on the phrase and sentence levels. The automatic processing methods for metaphors can be summarized as two. First, the rule (or logic)-based method, i.e., finding the conflicts between the target and the source, and searching for their common properties.

Example 7
这个人是一头狮子。zhè gè rén shì yī tóu shī zi (This man is a lion)
— only the target and the source
那个人是老狐狸。nà gè rén shì lǎo hú li (That man is an old fox.)
— only the target and the source
森林里既有勇猛的狮子,也有狡猾的狐狸。sēn lín lǐ jì yǒu yǒng měng de shī zi, yě yǒu jiǎo huá de hú li (In the forest, there are both brave lions and sly foxes.)
--- find out properties of the sources
这个人是勇猛的,那个人是狡猾的。zhè gè rén shì yǒng měng de, nà gè rén shì jiǎo huá de
(This man is brave, while that man is sly.)
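
The rule-based steps in Example 7 can be sketched as follows: detect a category conflict between the target and the source of an "X is Y" sentence, then look up the typical properties of the source. This is only an illustrative sketch; the two dictionaries are hypothetical toy data standing in for an ontology and for properties mined from sentences such as the third one in Example 7.

```python
# Hypothetical toy knowledge: a coarse category for each concept ...
CATEGORY = {"人": "human", "学生": "human", "狮子": "animal", "狐狸": "animal"}
# ... and typical properties of possible source concepts.
PROPERTIES = {"狮子": ["勇猛"], "狐狸": ["狡猾"]}


def interpret_x_is_y(target, source):
    """Return (is_metaphor, transferred_properties) for 'target is source'."""
    target_cat = CATEGORY.get(target)
    source_cat = CATEGORY.get(source)
    if target_cat is None or source_cat is None:
        return False, []  # unknown concept: this sketch makes no decision
    if target_cat == source_cat:
        return False, []  # no category conflict, so read the sentence literally
    # Conflict found: transfer the source's typical properties to the target.
    return True, PROPERTIES.get(source, [])


print(interpret_x_is_y("人", "狮子"))   # conflict between human and animal
print(interpret_x_is_y("人", "学生"))   # no conflict
```

A check of this kind relies entirely on the category conflict, so it cannot handle sentences that are literally consistent but metaphorical in context, or sentences that violate common sense without being metaphors.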

The utterance 河北有个老太太吃土块。hé běi yǒu gè lǎo tài tài chī tǔ kuài (An old lady in Hebei eats clay.) is not in conformity with common sense, but it is not a metaphor; whereas 男人都是动物。nán rén dōu shì dòng wù (All men are animals.) is logical, but it may be a metaphor in one context and not in another.
Second, the empirical (statistical) method, i.e., providing the machine with a large number of samples and training a model. Yu Shiwen presided over the national 973 project "Database for text content understanding" (2004-2009), which includes a subtask named "Analysis of Metaphorical Expressions and Their Pointed Contents in Chinese Texts". In this project, various machine learning methods have been applied to do semantic analyses from the token level. Among them, Wang Zhimin completed her doctoral thesis "Chinese Noun Phrase Metaphor Recognition" in 2006. Jia Yuxiang studied verb metaphor recognition and "X is Y" type metaphor understanding and generation. Qu Weiguang presided over the National Natural Science Fund Project "Research on Key Technologies in Chinese Metaphor Understanding" (2008-2010).
From a statistical point of view, metaphor recognition can be seen as the problem of computing the conditional probability p(m|c) to decide whether 海洋 is a metaphor in context c. Reversing the order of the two variables m and c does not change the value of their joint probability, so the relation between the joint probability and the conditional probabilities p(m|c) and p(c|m) can be written as:

$p(c)\,p(m \mid c) = p(m)\,p(c \mid m)$    (1)

Then,

$p(m \mid c) = p(m)\,p(c \mid m) / p(c)$    (2)

Given c, p(c) is a constant. Then,

$p(m \mid c) \propto p(m)\,p(c \mid m)$    (3)

Given a threshold δ, if $p(m)\,p(c \mid m) > \delta$, then we can deem that this 海洋 is a metaphor.
Then the problem becomes how to compute $p(m)\,p(c \mid m)$. We can compute it based on a large-scale annotated corpus and get

$p(m) = N_m / N$    (4)

where $N_m$ is the number of times 海洋 occurs as a metaphor in the corpus, and $N$ is the total number of times 海洋 occurs in the corpus.

Then we simplify 海洋 and its context c into $W_{-k} \ldots W_{-1}$ 海洋 $W_1 \ldots W_i$, where $W_{-k}, \ldots, W_{-1}, W_1, \ldots, W_i$ represent the n-gram of 海洋 and its syntactic and semantic attributes respectively.

$p(c \mid m) = p(W_{-k} \mid m) \cdots p(W_{-1} \mid m)\,p(W_1 \mid m) \cdots p(W_i \mid m)$    (5)

$p(W_s \mid m) = N(W_s) / N_w, \quad s = -k, \ldots, -1, 1, \ldots, i$    (6)

$N(W_s)$ stands for the number of co-occurrences of 海洋 as a metaphor and the word W with designated attributes at position s. Here an important hypothesis of independence is that words at different positions s are not correlated with the word 海洋.
Last, we will discuss semantic computing on the pragmatic perspective, which is more or less unique to the Chinese language. First, a change of construction in Chinese will affect the meaning of a sentence even though the words themselves are not changed. The emphasized meaning of the construction is not equal to the combination of the underlying meanings of the elements in the construction. The meaning reflects the distribution of the quantity of entities and the relative locations among entities. Although the underlying syntactic relationship among the main verb, the agent and the object(s) still exists, such a syntactic relationship is only secondary. For example, the sentence 这张床可以睡三个人。zhè zhāng chuáng kě yǐ shuì sān gè rén (This bed can sleep three people.) is different in meaning from the sentence 三个人可以睡这张床。(Three people can sleep on this bed.). Second, the semantic direction of the complement in verb-complement constructions and of the adverbial phrase in verb-adverbial constructions also changes the semantic roles of each constituent. For instance, (文章)写完了。(wén zhāng) xiě wán le ((The article) is completed.) or (老师)写累了。(lǎo shī) xiě lèi le ((The teacher) is tired from writing.) or 香喷喷地炸了一盘花生米。xiāng pēn pēn de zhà le yī pán huā shēng mǐ (aromatically fried a plate of peanuts). Here the ontology cannot provide enough information to reflect the process and result of change in semantic roles. Thus the Generalized Valence Mode (GVM) is proposed to describe not only the participants of the action, but also the change of the participants' states. Third, our ultimate goal will be to achieve "semantic harmony". For instance, in both English and Chinese we can say 拔出来 bá chū lái (pull out) or 插进去 chā jìn qù (thrust into), but we never say 插出来 (thrust out) or 拔进去 (pull into). It is alright to say 那个大苹果他都吃了。nà gè dà píng guǒ tā dōu chī le (That big apple he eats it all.), but it is awkward to say 那颗小核桃他都吃了。nà kē xiǎo hé táo tā dōu chī le (That small walnut he eats it all.). In fact we can say 那颗小核桃松鼠都吃了。nà kē xiǎo hé táo sōng shǔ dōu chī le (That small walnut the squirrel eats it all.).

Figure 2. Empirical (statistical) method of metaphor processing.
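
The statistical decision rule in Equations (3) to (6) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the system used in the projects cited above: the four training samples are invented for illustration, the denominator of Equation (6) is taken to be the metaphor count N_m, and a small smoothing constant (not part of the equations) keeps unseen context words from forcing the product to zero.

```python
from collections import Counter


def train(samples):
    """samples: list of (context, is_metaphor) pairs for the word 海洋.

    A context maps a relative position s (e.g. -1, 1) to the word found there.
    """
    n_metaphor = sum(1 for _, is_m in samples if is_m)
    cooccur = Counter()
    for context, is_m in samples:
        if is_m:  # N(W_s): co-occurrences with 海洋 used as a metaphor
            for position, word in context.items():
                cooccur[(position, word)] += 1
    return {"n_total": len(samples), "n_metaphor": n_metaphor, "cooccur": cooccur}


def score(model, context, smoothing=0.1):
    """Compute p(m) * p(c|m); `smoothing` is an assumption of this sketch."""
    p_m = model["n_metaphor"] / model["n_total"]            # Eq. (4)
    p_c_given_m = 1.0
    for position, word in context.items():                  # Eq. (5)
        count = model["cooccur"][(position, word)]
        # Eq. (6), smoothed, with N_m assumed as the denominator.
        p_c_given_m *= (count + smoothing) / (model["n_metaphor"] + smoothing)
    return p_m * p_c_given_m


def is_metaphor(model, context, delta):
    """Decision rule: metaphor if p(m) * p(c|m) exceeds the threshold delta."""
    return score(model, context) > delta


# Invented toy corpus: three metaphorical uses and one literal use of 海洋.
SAMPLES = [
    ({-2: "知识", -1: "的"}, True),    # 知识的海洋
    ({-2: "信息", -1: "的"}, True),    # 信息的海洋
    ({-2: "鲜花", -1: "的"}, True),    # 鲜花的海洋
    ({1: "资源", 2: "考察"}, False),   # 海洋资源考察
]
model = train(SAMPLES)
print(is_metaphor(model, {-2: "知识", -1: "的"}, delta=0.05))
print(is_metaphor(model, {1: "资源", 2: "考察"}, delta=0.05))
```

In practice the threshold δ and the context window would be tuned on a large annotated corpus; the values here are arbitrary.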

Professor Lu Jianming (2010) remarked on the realization of semantic harmony. The principle of semantic constraint of words essentially requires that the words in sentences should be harmonic in terms of meaning. Analysis of ill-formed sentences and automatic language generation will benefit from the research in semantic harmony. Semantic computing on the pragmatic level has unique characteristics with respect to the Chinese language. The solution of these problems poses a great challenge and will make a great contribution to the understanding of the essence and universality of languages.

4 Potential Applications of Semantic Computing: a Case Study on Automatic Metaphor Processing in Search Engines

Nowadays, search engines are developing very rapidly and some of them have won great economic success. In terms of semantic computing, Baidu.com takes the lead and has unveiled the search concept "Box computing", which introduces semantic analysis. The precision and recall of a search engine are always the essential issues that concern a user. Therefore we will find the value of semantic computing first in a search engine.
Certainly, if metaphor can be understood properly by a computer, the precision of search engines will be improved. Let's take the phrase 起飞 qǐ fēi (take off) as an example. Literally 起飞 means that an aircraft takes off, as in 航班起飞时间 háng bān qǐ fēi shí jiān (the time for the airplane to take off). Sometimes we also use it metaphorically in phrases like 经济起飞 jīng jì qǐ fēi (economic take-off) or 东方美女歌坛起飞 dōng fāng měi nǚ gē tán qǐ fēi (Oriental beauties take off in the music arena.). If the literal sense and the metaphorical sense can be distinguished successfully, we will find the exact information that we need. Meanwhile, we hope that through this the recall of search engines will also be improved. For example, in Chinese we often use the phrase 祖国的花朵 zǔ guó de huā duǒ (flowers of the country) metaphorically to refer to 儿童 ér tóng (children). So web pages describing 祖国的花朵 should also be related to the query word 儿童.
We also observe that the phrases 金融风暴 jīn róng fēng bào (financial storm) and 金融海啸 jīn róng hǎi xiào (financial tsunami) metaphorically refer to 金融危机 jīn róng wēi jī (financial crisis). But when we input the query 金融危机 into a search engine, the results were only web pages with 金融危机 or 金融//危机. And when we used the query 金融风暴 or 金融海啸, no web pages containing 金融危机 were returned. We know that the phrase 炒鱿鱼 chǎo yóu yú has a literal usage (to fry squids) and a metaphorical usage (to fire sb. from his/her job). When we input the phrase into the search engine, we find that results with the metaphorical usage take up 65% while the other usage accounts for only 35% (Wang, 2006). Therefore we may conclude that whether metaphor is understood will seriously affect precision and recall.
Another important application lies in machine translation and cross-lingual search. Correct metaphor recognition and understanding is the precondition of correct translation. Machine translation can be a framework to evaluate the performance of metaphor recognition and understanding, and it is also a tool to realize cross-lingual search. For instance, a well-known Chinese female volleyball player got the nickname 铁榔头 tiě láng tou. Shall we translate it literally as "iron hammer" or more metaphorically as "iron fist" in order to give a user of the search engine a better sense of what it actually means? Translation is culture-bound. When we see the sentence 该电影是鸡肋。gāi diàn yǐng shì jī lèi, how should we translate the word 鸡肋 (a chicken's rib) here? And how shall we distinguish its literal meaning from its metaphorical meaning (食之无味弃之可惜。shí zhī wú wèi qì zhī kě xī, tasteless to eat but a waste to cast away) in order to better understand the sentence "The movie is a chicken's rib"?
Therefore, having investigated the feasibility of applications of automatic metaphor recognition, we propose three solutions to the above-mentioned problems:
• To overcome the limitedness of source domain words
• To recognize metaphors in web pages and build metaphor indexes. Offline processing often makes good use of the advantages of a search engine.
• Before realizing query understanding, let users choose the metaphorical or literal meaning of the query through human-computer interaction.

5 Language Knowledge Bases as the Foundation of Semantic Computing

As the foundation of semantic computing, language knowledge bases are in great demand. The achievements on language knowledge bases for Chinese-centered multilingual information processing include: Chinese LDC, the Comprehensive Language Knowledge Base (CLKB) by ICL at Peking University, HowNet by Zhendong Dong, the Chinese Dependency Tree Bank by Harbin Institute of Technology, etc.
A language knowledge base is an indispensable component of an NLP system, and its quality and scale determine the failure or success of the system to a great extent. For the past two decades, a number of important language knowledge bases have been built through the effort of people in the Institute of Computational Linguistics (ICL) at Peking University. Among them, the Grammatical Knowledge Base of Contemporary Chinese (GKB) (Yu et al., 2000) is the most influential.
Based on GKB, various research projects have been initiated. For instance, a project on the quantitative analysis of the "numeral-noun" construction of Chinese was conducted by Wang (2009) to further analyze the attributes of Chinese words. A project aiming at the emotion prediction of entries in CIKB was completed by Wang (2010) to further understand how the compositional elements of a fossilized construct like an idiom function from the token level.
Offset: 07632177
Synset: teacher, instructor
Csynset: 教师, 教员, 老师, 先生, 导师, 老板, 孩子王, 臭老九, …
Hypernym: 07235322
Hyponym: 07086332, 07162304, 07209465, 07243767, 07279659, 07297622, 07341176, 07401098
Definition: a person whose occupation is teaching
Cdefinition: 以教学为职业的人

Offset: 07331418
Synset: husband, hubby, married_man
Csynset: 丈夫, 先生, 夫君, 夫婿, 爱人, 老公, 郎君, 驸马, 驸马爷
Hypernym: 07391044
Hyponym: 07109482, 07195968, 07255726, 07328008
Definition: a married man; a woman's partner in marriage
Cdefinition: 已婚男子;婚姻中女性一方的伴侣

Offset: 07414666
Synset: Mister, Mr.
Csynset: 先生, 师傅, 同志, 大哥, 老兄, 老弟
Hypernym: 07391044
Hyponym: (none listed)
Definition: a form of address for a man
Cdefinition: 对男子的一种称呼

Table 2. The Synset of the word 教师 jiào shī and its related Synsets.
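
The way Table 2 supports query expansion and sense selection can be sketched with a small lookup structure. This is a hypothetical illustration only: the dictionary holds a shortened copy of the three entries in Table 2, and the function names are invented here; they are not the interface of CCD.

```python
# Shortened copy of the three Table 2 entries (offset -> Chinese synset words).
TOY_SYNSETS = {
    "07632177": ["教师", "教员", "老师", "先生", "导师"],   # teacher
    "07331418": ["丈夫", "先生", "夫君", "老公"],           # husband
    "07414666": ["先生", "师傅", "同志"],                   # Mister
}


def synsets_of(word):
    """Return the offsets of all synsets that contain `word`, sorted."""
    return sorted(off for off, words in TOY_SYNSETS.items() if word in words)


def expand_query(word, offset=None):
    """Expand `word` with its synonyms.

    If `offset` is given, only that sense is used; otherwise every sense
    of the word contributes, which raises recall but may lower precision.
    """
    offsets = [offset] if offset is not None else synsets_of(word)
    expanded = []
    for off in offsets:
        for synonym in TOY_SYNSETS[off]:
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


print(synsets_of("先生"))                 # the word belongs to three synsets
print(expand_query("教师"))               # unambiguous: one synset
print(expand_query("先生", "07331418"))   # expansion restricted to one sense
```

The word 先生 belongs to all three synsets, so an expansion that does not fix the sense mixes teachers, husbands and forms of address; choosing the offset first is exactly the WSD step.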

Following GKB, language knowledge bases of large scale, high quality and various types (words and texts, syntax and semantics, multi-lingual) have been built, such as the Chinese Semantic Dictionary (CSD) for Chinese-English machine translation, the Chinese Concept Dictionary (CCD) for cross-language text processing, the multi-level Annotated Corpus of Contemporary Chinese, etc. The projects as a whole won the Science and Technology Progress Award issued by the Ministry of Education of China in 2007.
As mentioned in Section 3, the word 病毒 (virus) has two senses in both English and Chinese: one is in biology and the other is in computer science. When we want to do cross-lingual information retrieval, the two senses need to be distinguished. Hence, CCD can serve as a useful tool to complete the task, for it organizes semantic knowledge from a different angle. Concepts in CCD are represented by Synsets, i.e. sets of synonyms as in Table 2. For instance, the concept 教师 is in
a Synset {教师 教员 老师 先生 导师 老板 孩子王 臭老九 …}, and all the concepts form a network that encodes the semantic relations among them: hypernym-hyponym, part-whole, antonym, cause and entailment. Through these relations we can either broaden or narrow a query, and thereby improve the recall or the precision of a search engine. The network can also provide support for WSD tasks.

In 2009, the various knowledge bases built by ICL were integrated into the CLKB. The integration of heterogeneous knowledge bases is realized by adopting “a pivot of word sense”. Three basic and important knowledge bases, GKB, CSD and CCD, have been integrated into a unified system which includes a language processing module, a knowledge retrieval module and a knowledge exploration module.

Although there are some fundamental resources for semantic computing, they need further improvement, updating, integration and specification to form a collective platform for more complicated NLP tasks. To further improve the results of semantic computing, innovative projects for new tasks should also be launched, for instance:
• metaphor knowledge base
• ultra-ontology dynamic knowledge base (generalized valence mode)
• the integration of information based on multi-lingual translation

6 Concluding Remarks

Why is semantics so useful in the first place? Linguists and psychologists are interested in the study of word senses to shed light on important aspects of human communication, such as concept formation and language use. Lexicographers need computational aids to analyze word definitions in dictionaries in a more compact and extensive way. Computer scientists need semantics for the purpose of natural language processing and understanding. Therefore, the significance of semantic computing in NLP is obvious and more research needs to be done in this respect.

All in all, we may conclude that the methods of semantic computing can be summarized as follows:
• The research of applicable language models
• The research of effective algorithms
• To build language knowledge bases as its foundation

Semantic computing is a long-term research subject. We hope more progress can be made if a clearer view of the direction of its development can be provided and the groundwork for future research can be laid more solidly with more work done.

Acknowledgements

Our work is based on the long-term accumulation of the language resources that have been built by the colleagues of ICL, and it is their contributions that make our achievement possible today. Parts of the content in this paper were presented by Shiwen Yu at the conferences in Hangzhou (International Workshop on Connected Multimedia 2009) and Suzhou (the 11th Chinese Lexical Semantics Workshop 2010), and many thanks should be given to those who offered valuable thoughts and advice. The authors also want to extend their gratitude toward CIPS-Sighan for this valuable opportunity to demonstrate our viewpoints and work.

References

Anderson, J. R. 1989. A Theory of the Origins of Human Knowledge. Artificial Intelligence, 40(1-3): 313-351.
Carreras, X. and Màrquez, L. 2004. Introduction to the CoNLL-2004 Shared Task: Semantic Role Labeling. In Proceedings of CoNLL-2004: 89-97.
Dagan, I. et al. 1993. Contextual Word Similarity and Estimation from Sparse Data. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL): 164-171.
Fillmore, C. J. 1976. Frame Semantics and the Nature of Language. In Annals of the New York Academy of Sciences: Conference on the Origin and Development of Language and Speech: 20-32.
Gale, William A., Kenneth W. Church, and David Yarowsky. 1993. A Method for Disambiguating Word Senses in a Large Corpus. Computers and the Humanities, 26(5-6): 415-439.
Gildea, Daniel and Daniel Jurafsky. 2002. Automatic Labeling of Semantic Roles. Computational Linguistics, 28(3): 245-288.
Hasida, K. 2007. Semantic Authoring and Semantic Computing. In Sakurai, A. et al. (Eds.): JSAI 2003/2004, LNAI 3609, 137-149.
Ide, Nancy and Jean Véronis. 1998. Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art. Computational Linguistics, 24(1): 2-40.
Jin, Peng, Wu Yunfang, Yu Shiwen. 2007. SemEval-2007 Task 05: Multilingual Chinese-English Lexical Sample. In Proceedings of SemEval-2007: 19-23.
Johansson, Richard and Pierre Nugues. 2008. Dependency-based Syntactic-semantic Analysis with PropBank and NomBank. In Proceedings of the Twelfth Conference on Computational Natural Language Learning: 183-187.
Lee, Lillian. Similarity-Based Approaches to Natural Language Processing. Ph.D. thesis. Harvard University.
Lesk, Michael. 1986. Automatic Sense Disambiguation: How to Tell a Pine Cone from an Ice Cream Cone. In Proceedings of the 5th Annual International Conference on Systems Documentation: 24-26.
Li, Sujian, Zhang Jian, Huang Xiong and Bai Shuo. 2002. Semantic Computation in Chinese Question-Answering System. Journal of Computer Science and Technology, 17(6): 993-999.
Lu, Jianming. 2010. Foundations of Rhetoric -- The Law of Semantic Harmony. Rhetoric Learning, 2010(1): 13-20.
Qu, Weiguang, Sui Zhifang, et al. 2007. A Collocation-based WSD Model: RFR-SUM. In Proceedings of the 20th International Conference on Industrial, Engineering, and Other Applications of Applied Intelligent Systems: 23-32.
Resnik, Philip. 1999. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research, 11: 95-130.
Schütze, Hinrich. 1998. Automatic Word Sense Discrimination. Computational Linguistics, 24(1): 97-124.
Wang, Lei and Yu Shiwen. Forthcoming 2010. Construction of Chinese Idiom Knowledge Base and Its Applications. In Proceedings of Coling 2010 Multi-word Expressions Workshop.
Wang, Meng et al. 2009. Quantitative Research on Grammatical Characteristics of Noun in Contemporary Chinese. Journal of Chinese Information Processing, 22(5): 22-29.
Wang, Zhiming. 2006. Recent Developments in Computational Approach to Metaphor Research. Journal of Chinese Information Processing, 20(4): 16-24.
Xue, Nianwen and Martha Palmer. 2005. Automatic Semantic Role Labeling for Chinese Verbs. In Proceedings of the 19th International Joint Conference on Artificial Intelligence: 1160-1165.
Yu, Shiwen et al. 2003. Introduction to Grammatical Knowledge Base of Contemporary Chinese (Second Edition) (in Chinese). Tsinghua University Press, Beijing, China.
