Semantic Computing and Language Knowledge Bases
Abstract
1 Introduction
Example 1
I bought a table with three dollars. (20091016 Google: 本人买了 3 美元一表)
I bought a table with three legs. (20091016 Google: 本人买了 3 条腿的表)
We know that the word "table" has two common meanings in English (a wooden object and a structured data report). But in Chinese they correspond to two different words (表 biǎo and 桌子 zhuō zi²). From Example 1, we can see that the search engine cannot distinguish the two senses and translates them both as 表. Thus, without semantic analysis, queries in a search engine may yield very poor results. Search engines today rest on shallow Natural Language Processing (NLP) techniques, for instance string matching, whereas future search engines should aim at content indexing and at understanding the user's intention. Semantic computing becomes applicable only with the development of deep NLP techniques. Machine Translation (MT) was the first application of digital computers in the non-digital world, and semantic information is indispensable in MT research and applications. However, there has been no breakthrough in Natural Language Understanding (NLU) so far, and semantic computing may serve as the key to some success in this field.
² Pinyin is currently the most commonly used Romanization system for standard Mandarin. The system is now used in mainland China, Hong Kong, Macau, parts of Taiwan, Malaysia and Singapore to teach Mandarin Chinese and internationally to teach Mandarin as a second language. It is also often used to spell Chinese names in foreign publications and can be used to enter Chinese characters on computers and cell phones.
2 Related Work on Semantic Computing
Semantics is an interesting but controversial topic. Many theories have been proposed in an attempt to describe what meaning really means. But up to now no theory has described the meaning of the various language units (words, phrases and sentences) so well that it has been universally accepted, even though Fillmore's proposal of frame semantics (1976) has been quite successful. Since Gildea et al. (2002) initiated research on automatic semantic role labeling, many evaluations have been conducted internationally, such as Senseval-3 and SemEval 2007, as well as the CoNLL SRL Shared Tasks of 2004, 2005 and 2008. Word Sense Disambiguation (WSD) is also a very important research subject and a lot of work has been done in this regard, such as Lesk (1986), Gale et al. (1998), and Jin et al. (2007) and Qu et al. (2007) as the Chinese counterparts. As to research on computing word sense relatedness, Dagan et al. (1993) did some pilot work, and Lee (1997) and Resnik (1999) contributed to the research on semantic similarity.
In recent years, semantics-based analysis, such as data and web mining, analysis of social networks, and semantic system design and synthesis, has begun to draw more attention from researchers. Semantics-based applications such as search engines and question answering (Li et al., 2002), content-based multimedia retrieval and editing, and natural language interfaces (Yokoi et al., 2005) have also been attracting attention. Semantic computing has even been applied to areas like music description, medicine and biology, and GIS systems and architecture. The whole idea is how to realize human-centered computing.
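To make the word-sense problem of Example 1 concrete, the following is a minimal sketch of a simplified gloss-overlap disambiguator in the spirit of Lesk (1986). It is not the method of any system discussed here: the sense inventory `SENSES`, its English glosses, the stopword list and the function names are all assumptions made for illustration.

```python
# Hypothetical mini sense inventory for "table"; the glosses are illustrative only.
SENSES = {
    "table": [
        {"zh": "桌子", "gloss": "piece of furniture with a flat top and legs"},
        {"zh": "表", "gloss": "arrangement of data in rows and columns"},
    ]
}

# Small assumed stopword list, just enough for the examples below.
STOPWORDS = {"a", "the", "of", "in", "and", "with", "i"}


def tokens(text):
    """Lower-case the words of `text`, strip trailing punctuation, drop stopwords."""
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS


def simplified_lesk(word, sentence):
    """Pick the sense whose gloss shares the most words with the sentence context."""
    context = tokens(sentence) - {word}
    best = max(SENSES[word], key=lambda s: len(context & tokens(s["gloss"])))
    return best["zh"]


print(simplified_lesk("table", "I bought a table with three legs."))
print(simplified_lesk("table", "The table lists the data in three columns."))
```

For "I bought a table with three dollars." no gloss word overlaps with the context, so this sketch simply falls back to the first listed sense. Such gaps are exactly where shallow overlap methods stop and deeper semantic knowledge is needed.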
3 The Theory and Methodology of Semantic Computing
3.1 Important Questions That Need to Be Asked about Semantic Computing
In the past few years there has been a growing interest in the field of semantics and semantic computing. But some questions have always lingered in researchers' minds. What on earth is semantics? What is the best way to describe the meaning of a language unit? How can natural languages be processed so that we are able to benefit from human-computer interaction, or even interpersonal communication? It seems that no one can give satisfactory answers to these questions. But it is now commonly agreed that the study of semantic computing or knowledge representation is a central issue in computational linguistics. The major contributions on this topic are collected in Computational Linguistics (1987-2010) and the International Journal of Semantic Computing (2007-2010). Research in computing semantics is, however, rather heterogeneous in scope, methods, and results. The traditional "wh" and "how" questions need to be asked again to understand the consequences of conceptual and linguistic decisions in semantic computing:
What? What should be computed in terms of semantics? Each word is a world and its meaning can be interpreted differently. Despite the interest that semantics has received from scholars of different disciplines since the early history of humanity, a unifying theory of meaning does not exist, no matter whether we view a language from a lexical or a syntactic perspective. In practice, the quality and type of the expressed concepts again depend upon who uses them: any language speaker or writer, a linguist, a psychologist, a lexicographer, or a computer. In psycholinguistics and computational linguistics, semantic knowledge is modeled with very deep and formal expressions. Often semantic models focus on some very specific aspect of language communication, according to the scientific interest of a researcher. In natural language processing, lexical entries or semantic attributes typically express linguistic knowledge as commonsensically understood and used by humans. The entries or attributes are entirely formatted in some knowledge representation and can be manipulated by a computer.
Where? What are the sources of semantic knowledge? Traditionally, individual introspection is often a source of word senses. However, individual introspection brings about both theoretical and implementation problems. Theoretically, the problem is that "different researchers with different theories would observe different things about their internal thoughts..." (Anderson 1989). With regard to implementation, consistency becomes a major problem when the size of the lexicon or the syntactic treebank exceeds a few thousand entries or annotation tags. Despite the scientific interest of such experiments, they cannot be extensively repeated for the purpose of acquiring word sense definitions on a large scale. On-line corpora and dictionaries are widely available today and provide experimental evidence of word uses and word definitions. The major advantage of on-line resources is that in principle they provide the basis for very large experiments, even though at present the methods of analysis and application are not fully developed and need further research to give satisfactory results.
How? Semantic computing can be realized at various levels. The hard work lies either in implementing a system in a real domain, or in the more conceptual task of defining an effective mathematical framework to manipulate the objects defined within a linguistic model. Quite obviously the "hows" in the literature about semantic computing are much more important than the "whats" and "wheres". The methodology that really works in semantic computing is deeply related to the ultimate objective of NLP research, which still cannot be defined adequately.
3.2 The Perspectives of Semantic Computing from a Macro View
Why has semantic computing (or NLU) posed so great a challenge? We may attribute this to two major reasons. First, it is based on knowledge of the human language mechanism. If fully developed, complicated brains are often seen as a crowning achievement of biological evolution, then interpersonal communication is no simpler than the human biological mechanism. Language has to be a crucial part of the evolutionary process, which has not been fully understood by scientific research. Second, in NLP research language is both the target and the tool. Current NLP research focuses on either speech or written texts only. However, in real-world scenarios, reading and interaction between humans are multi-dimensional (through different forms of information such as text, speech, or images, and utilizing our different senses such as vision and hearing). It is necessary to rely on the advancements of brain science, cognitive science and other related fields, and to work in collaboration, to produce better results. Linguistics, especially computational linguistics, has made its own contribution, and semantic computing will play an important role in NLP.
There are complex many-to-many relations between the form and the meaning of a language. Semantic computing is not only the way to but also the ultimate goal of natural language understanding. Although it is hard, we should not give up. Here we propose that the main contents of semantic computing include the following three aspects:
semantic computing from the ontological perspective
semantic computing from the cognitive perspective
semantic computing from the pragmatic perspective
As for ontologies, much progress has been made worldwide. The remarkable achievements in English include WordNet by Princeton University, PropBank by the University of Pennsylvania, etc. There have also been quite a number of efforts to build ontologies in Chinese, which will be elaborated in Section 5. In the last few years, the main direction of semantic computing has been to disambiguate language units and constructions. In Example 2 below, the word 仪表 yí biǎo has two meanings in different contexts. In Chinese, word segmentation is also a problem that needs to be addressed. In Example 3, segmenting the string 白天鹅 bái tiān é as 白/天鹅 or as 白天/鹅 results in different understandings of the sentences.
Example 2
她的仪表很端庄。 tā de yí biǎo hěn duān zhuāng (She has a graceful appearance.)
她的仪表很精确。 tā de yí biǎo hěn jīng què (Her meters are very accurate.)
Example 3
白天鹅飞过来了。bái tiān é fēi guò lái le (A white swan flies toward us.)
白天鹅可以看家。bái tiān é kě yǐ kān jiā (A goose can guard our house at daytime.)
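The segmentation ambiguity in Example 3 can be reproduced with a small sketch that enumerates every dictionary-consistent split of a string. The toy lexicon `LEXICON` and the function name `segmentations` are assumptions for illustration; a real segmenter would use a full dictionary and then rank the candidates by context.

```python
# Hypothetical toy lexicon; a real system would load a full Chinese dictionary.
LEXICON = {"白", "白天", "天鹅", "鹅"}


def segmentations(text, lexicon=LEXICON):
    """Return every way to split `text` into consecutive words found in `lexicon`."""
    if not text:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Extend each segmentation of the remainder with the current word.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


for candidate in segmentations("白天鹅"):
    print("/".join(candidate))
```

Both 白/天鹅 (white swan) and 白天/鹅 (daytime, goose) come out as valid candidates, so the lexicon alone cannot decide between them. The choice has to come from the rest of the sentence, as in 飞过来了 versus 可以看家.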
As to WSD tasks on the word level, some problems can be solved when an ontology is applied. But ambiguity can also appear on the syntactic level. Here ontologies can usually do little, so we may seek help from language knowledge bases (see Section 5). The following examples of syntactic-semantic analysis illustrate how different syntactic structures change the meaning of sentences:
Example 4
这样的电影不是垃圾是什么? --该电影是垃圾。
zhè yàng de diàn yǐng bú shì lā jī shì shén me? -- gāi diàn yǐng shì lā jī
If a movie as such is not rubbish, what is it? -- It is rubbish.
这样的电影怎么能说是垃圾呢? -- 该电影不是垃圾。
zhè yàng de diàn yǐng zěn me néng shuō shì lā jī ne? -- gāi diàn yǐng bú shì lā jī
How can a movie as such be rubbish? -- It is not rubbish.
Example 5
蚂蚱是蚂蚱, 蛐蛐是蛐蛐。 -- 蚂蚱不是蛐蛐。
mà zhà shì mà zhà, qū qū shì qū qū -- mà zhà bú shì qū qū
A grasshopper is a grasshopper, while a cricket is a cricket. -- A grasshopper is not a cricket.
Rule: A is A, while B is B. --> A is not B.
丁是丁, 卯是卯。dīng shì dīng, mǎo shì mǎo
Ding is ding, while mao is mao. -- being conscientious
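The rule in Example 5, together with its idiomatic exception, can be sketched as a pattern match. This is only an illustration under stated assumptions: the regular expression, the one-entry idiom table `IDIOMS` and the function name `interpret` are hypothetical, and a knowledge base of fixed expressions would replace the hard-coded table.

```python
import re

# Hypothetical table of fixed expressions that fit the pattern but have an idiomatic reading.
IDIOMS = {"丁是丁,卯是卯": "being conscientious"}

# "A 是 A, B 是 B": each clause repeats its own subject (named-group backreferences).
PATTERN = re.compile(r"(?P<a>.+?)是(?P=a),(?P<b>.+?)是(?P=b)")


def interpret(sentence):
    """Return the inferred reading of an 'A 是 A, B 是 B' sentence, or None if it does not apply."""
    # Normalize: drop spaces, unify full-width commas, remove the final full stop.
    text = sentence.replace(" ", "").replace(",", ",").rstrip("。.")
    if text in IDIOMS:
        return IDIOMS[text]  # the idiom lookup takes priority over the general rule
    match = PATTERN.fullmatch(text)
    if match and match.group("a") != match.group("b"):
        return f"{match.group('a')}不是{match.group('b')}"  # A is not B
    return None


print(interpret("蚂蚱是蚂蚱, 蛐蛐是蛐蛐。"))
print(interpret("丁是丁, 卯是卯。"))
```

Checking the idiom table first reflects the point of the example: the same surface structure yields a compositional inference in one case and a conventional meaning in the other.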
With respect to semantic computing on the cognitive level, we will use metaphor as an example. For a long time, NLP research has focused on ambiguity resolution. Can NLU be realized once ambiguity is resolved? Metaphor, insinuation, pun, hyperbole (exaggeration), humor and personification, as well as intentional word usage or sentence composition, pose a great challenge to NLU research. If the computer can deal with metaphors, its ability to understand natural language will improve greatly.
First, let us discuss the rhetorical function of metaphor. Metaphor is extensively and skillfully used in the Chinese classic "Book of Songs" to boost expressiveness.
Example 6
Simile: 自伯之东,首如飞蓬³;岂无膏沐?谁适为容。 --(卫风·伯兮)
zì bó zhī dōng, shǒu rú fēi péng; qǐ wú gào mù? shuí shì wéi róng。 -- (wèi fēng · bó xī)
(Your hair is like disordered grass.)
Metaphor: 它山之石,可以攻玉。 --(小雅·鹤鸣)
tā shān zhī shí, kě yǐ gōng yù。 -- (xiǎo yǎ · hè míng)
(Rocks from another mountain can be used to carve jade. Metaphorically this phrase means a
change of method may solve the current problem.)
Also, many Chinese idioms are metaphorical expressions: 同舟共济 tóng zhōu gòng jì (literally, to cross the river in the same boat; metaphorically, to work together with one heart while in difficulty), 铜墙铁壁 tóng qiáng tiě bì (literally, walls of brass and iron; metaphorically, impregnable). The Chinese language makes use of many idioms and idiomatic expressions that are derived from ancient Chinese stories and fables. These idioms and idiomatic expressions are often used metaphorically and reflect the historical and cultural background of the language. They are the most precious relics of the Chinese language and culture. Therefore the Chinese Idiom Knowledge Base (CIKB) was also built in 2009. CIKB consists of 38,117 entries and describes many attributes of Chinese idioms. Among the attributes, "literal translation", "free translation" and "English equivalent" are very valuable.
The linguistic function of metaphor is also important. Metaphor is the basis of new word creation and polysemy production (sense evolution). For example, 垃圾箱 lā jī xiāng (recycle bin) and 病毒 bìng dú (virus) are used in a computer setting, and words like 高峰 gāo fēng (peak), 瓶颈 píng jǐng (bottleneck) and 线索 xiàn suǒ (clue) are endowed with new meanings which have not been included in traditional Chinese dictionaries. Besides, metaphor creates new meanings on the sentence level. For instance, in 地球是人类的母亲。dì qiú shì rén lèi de mǔ qīn (The earth is the mother of humanity.), the word 母亲 (mother) has a different meaning. So metaphor understanding is beyond the scope of ambiguity resolution. Metaphor, linguistics, and human cognitive mechanisms are inextricably interlinked. So metaphor becomes a fort that must be conquered in NLU research.
From an NLP perspective, metaphors can be summarized into the categories in Table 1. As for the NLP tasks of metaphor computing, we can conclude that there are three tasks to be accomplished. First, metaphor recognition: for instance, how can we distinguish 知识的海洋 from 海洋资源考察 hǎi yáng zī yuán kǎo chá (investigation of ocean resources)? Second, metaphor understanding and translation: for instance, 知识的海洋 actually means 知识像海洋一样丰富。zhī shí xiàng hǎi yáng yí yàng fēng fù (Knowledge is as rich as the ocean.). Third, metaphor generation: for instance, how can phrases such as 信息的海洋 xìn xī de hǎi yáng (ocean of information) and 鲜花的海洋 xiān huā de hǎi yáng (ocean of flowers) be generated successfully by computer?
³ For the purpose of conciseness, only the underlined parts that contain metaphors are translated.
Currently we focus on the recognition and understanding of metaphors on the phrase and sentence levels. The automatic processing methods for metaphors fall into two groups. The first is the rule (or logic)-based method, i.e., finding the conflicts between the target and the source and searching for their common properties.
Example 7
这个人是一头狮子。zhè gè rén shì yī tóu shī zi (This man is a lion.)
-- only the target and the source
那个人是老狐狸。nà gè rén shì lǎo hú li (That man is an old fox.)
-- only the target and the source
森林里既有勇猛的狮子,也有狡猾的狐狸。sēn lín lǐ jì yǒu yǒng měng de shī zi, yě yǒu jiǎo huá de hú li (In the forest, there are both brave lions and sly foxes.)
-- find out properties of the sources
这个人是勇猛的,那个人是狡猾的。zhè gè rén shì yǒng měng de, nà gè rén shì jiǎo huá de
(This man is brave, while that man is sly.)
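The steps of Example 7 can be sketched as follows: detect a category conflict between the target and the source of a "target 是 source" sentence, then carry the source's salient properties over to the target. This is a minimal sketch only. The tables `CATEGORY` and `PROPERTIES` are hypothetical and hard-coded here, whereas the example suggests that such properties would be collected from literal sentences like the third one above.

```python
# Hypothetical toy knowledge base: a semantic category per noun.
CATEGORY = {"人": "human", "老师": "human", "狮子": "animal", "狐狸": "animal"}

# Hypothetical salient properties, as might be gathered from literal usage
# such as 勇猛的狮子 (brave lions) and 狡猾的狐狸 (sly foxes).
PROPERTIES = {"狮子": ["勇猛"], "狐狸": ["狡猾"]}


def interpret_copula(target, source):
    """Classify 'target 是 source' and return the properties transferred to the target."""
    if target not in CATEGORY or source not in CATEGORY:
        return ("unknown", [])  # no basis for a decision
    if CATEGORY[target] == CATEGORY[source]:
        return ("literal", [])  # no category conflict, so read literally
    # Category conflict: treat as a metaphor and carry over the source's properties.
    return ("metaphor", PROPERTIES.get(source, []))


print(interpret_copula("人", "狮子"))
print(interpret_copula("人", "狐狸"))
print(interpret_copula("人", "老师"))
```

A flat category table is the simplest possible conflict test. An ontology with a real hierarchy would allow graded conflicts and would also cover nouns missing from this toy table.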
Table 2. The Synset of the word 教师 jiào shī and its related Synsets.