SRI KRISHNA INSTITUTE OF TECHNOLOGY
(Accredited by NAAC Approved by A.I.C.T.E. New Delhi, Recognized by Govt. of Karnataka & Affiliated to V.T U., Belagavi)
#57, Chimney Hills, Hesaraghatta Main Road, Chikkabanavara Post, Bengaluru- 560090
ARTIFICIAL AND MACHINE LEARNING DEPARTMENT
Module-4
Semantic Parsing
1. Introduction
• In other words, the reusability of the representation across domains is very limited.
• The problem with the second approach is that it is extremely difficult to construct a
general-purpose ontology and create symbols that are shallow enough to be learnable
but detailed enough to be useful for all possible applications.
• Ontology means
1. The branch of metaphysics dealing with the nature of being.
2. A set of concepts and categories in a subject area or domain that shows their properties
and the relations between them.
"what's new about our ontology is that it is created automatically from large datasets"
Resolve the ambiguities of words in context. For example, given "The bill is large but need not be paid", the
theory should be able to disambiguate the monetary meaning of bill.
Identify meaningless but syntactically well-formed sentences, such as "Colorless green ideas
sleep furiously."
Identify syntactically or transformationally unrelated paraphrases of a concept that have
the same semantic content.
2.1 Structural Ambiguity
2.2 Word Sense
In any given language, the same word type is used in different contexts and with
different morphological variants to represent different entities or concepts in the
world.
For example, we use the word nail to represent a part of the human anatomy and also
to represent the generally metallic object used to secure other objects.
Once we have the word senses, entities, and events identified, another level of
semantic structure comes into play: identifying the participants in
these events.
Resolving the argument structure of a predicate in a sentence is where we identify
which entities play what part in which event.
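A minimal sketch of how a resolved argument structure might be stored, assuming a simple role-labeled record. The class name `PredicateFrame`, the role labels (agent, patient, instrument), and the example sentence are illustrative assumptions, not taken from these notes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PredicateFrame:
    """One event: a predicate plus the entities that fill its roles."""
    predicate: str
    arguments: Dict[str, str] = field(default_factory=dict)  # role -> entity

    def role_of(self, entity: str) -> Optional[str]:
        """Return the role an entity plays in this event, or None."""
        for role, filler in self.arguments.items():
            if filler == entity:
                return role
        return None


# Hypothetical analysis of "John broke the window with a hammer".
frame = PredicateFrame(
    predicate="broke",
    arguments={
        "agent": "John",
        "patient": "the window",
        "instrument": "a hammer",
    },
)

print(frame.role_of("the window"))
```

Keeping the roles in a dictionary makes the question "which entity plays what part in which event" a direct lookup once the structure has been resolved.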
A word that functions as the verb is called a predicate, and words that function as
the nouns are called arguments. Here are some other predicates and arguments:
2.5 Meaning Representation
1. System Architectures
a. Knowledge based: These systems use a predefined set of rules or a knowledge base
to obtain a solution to a new problem.
b. Unsupervised: These systems tend to require minimal human intervention to be
functional by using existing resources that can be bootstrapped for a particular
application or problem domain.
c. Supervised: These systems involve the manual annotation of some
phenomena that appear in a sufficient quantity of data so that machine
learning algorithms can be applied.
d. Semi-Supervised: Manual annotation is usually very expensive and does not
yield enough data to completely capture a phenomenon. In such instances,
researchers can automatically expand the data set on which their models are
trained either by employing machine-generated output directly or by
bootstrapping off an existing model by having humans correct its output.
2. Scope:
a. Domain Dependent: These systems are specific to certain domains, such as air travel
reservations or simulated football coaching.
b. Domain Independent: These systems are general enough that the techniques can be
applied to multiple domains with little or no change.
3. Coverage
a. Shallow: These systems tend to produce an intermediate representation that can then be
converted to one that a machine can base its action on.
b. Deep: These systems usually create a terminal representation that is directly consumed by a
machine or application.
Resources:
As with any language understanding task, the availability of resources is a key factor in
the disambiguation of word senses in corpora.
Early work on word sense disambiguation used machine-readable dictionaries or
thesauri as knowledge sources.
Two prominent sources were the Longman Dictionary of Contemporary English
(LDOCE) and Roget’s Thesaurus.
The biggest sense-annotated corpus is OntoNotes, released through the Linguistic Data
Consortium (LDC).
For Chinese, the sense annotation resource is HowNet.
Systems:
Researchers have explored various system architectures to address the sense disambiguation
problem.
We can classify these systems into four main categories: i. rule-based or knowledge-based,
ii. supervised, iii. unsupervised, and iv. semi-supervised.
Rule Based:
The first generation of word sense disambiguation systems was primarily based on
dictionary sense definitions.
Much of this information is historical and cannot readily be translated and made
available for building systems today. But some of the techniques and algorithms are still
available.
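As an illustration of disambiguation driven by dictionary sense definitions, the sketch below uses a simplified Lesk-style overlap count: it picks the sense whose definition shares the most words with the sentence. This is one well-known technique of this kind, not necessarily the one these notes go on to describe. The two glosses for "nail", the stopword list, and all function names are made up for the example and do not come from a real dictionary.

```python
import re

# Hypothetical sense inventory; glosses are illustrative, not from a real dictionary.
SENSES = {
    "nail": {
        "nail_body": "hard covering on the tip of a finger or toe "
                     "of the human hand or foot",
        "nail_fastener": "thin pointed metal piece hammered into wood "
                         "to fasten objects together",
    }
}

# Small assumed stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "on", "or", "to", "into", "in",
             "and", "with", "is", "he", "she", "his", "her"}


def tokens(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence, inventory=SENSES):
    """Return the sense id whose gloss overlaps most with the sentence context."""
    context = tokens(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in inventory[word].items():
        overlap = len(context & tokens(gloss))
        if overlap > best_overlap:  # ties keep the earlier-listed sense
            best_sense, best_overlap = sense_id, overlap
    return best_sense


print(simplified_lesk("nail", "He hammered the nail into the wood"))
print(simplified_lesk("nail", "She painted the nail on her finger"))
```

The first sentence shares "hammered" and "wood" with the fastener gloss, and the second shares "finger" with the anatomical gloss, so the two calls select different senses. Exact-word overlap is brittle ("hammer" would not match "hammered"), which is one reason systems of this kind depend heavily on the quality of the dictionary.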
each category.
Supervised:
• The simpler form of word sense disambiguation system is the supervised approach,
which tends to transfer all the complexity to the machine learning machinery while
still requiring hand annotation. It tends to be superior to unsupervised approaches and performs best
Unsupervised:
Semi Supervised:
Self-training is a popular semi-supervised learning approach that can be adapted for WSD. In
self-training for WSD, you start with a small set of labeled examples and a larger set of
unlabeled examples. The process involves iterative steps:
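One common formulation of the self-training loop is: train a classifier on the labeled examples, label the unlabeled pool, move only the confident predictions into the labeled set, and repeat. The sketch below follows that formulation under stated assumptions: a tiny Naive Bayes model over context words with add-one smoothing, a confidence threshold of 0.6, and made-up contexts for the word "nail". All names and data are illustrative.

```python
import math
from collections import Counter, defaultdict


def train(labeled):
    """Count context words per sense (a tiny Naive Bayes model)."""
    word_counts = defaultdict(Counter)
    sense_counts = Counter()
    for words, sense in labeled:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, sense_counts, vocab


def predict(model, words):
    """Return (best_sense, posterior probability) using add-one smoothing."""
    word_counts, sense_counts, vocab = model
    total = sum(sense_counts.values())
    log_scores = {}
    for sense, n in sense_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(n / total)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][w] + 1) / denom)
        log_scores[sense] = score
    top = max(log_scores.values())
    exp_scores = {s: math.exp(v - top) for s, v in log_scores.items()}
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / sum(exp_scores.values())


def self_train(labeled, unlabeled, threshold=0.6, max_rounds=5):
    """Grow the labeled set with confident predictions, then retrain."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, remaining = [], []
        for words in pool:
            sense, prob = predict(model, words)
            if prob >= threshold:
                confident.append((words, sense))
            else:
                remaining.append(words)
        if not confident:  # nothing new was learned, so stop early
            break
        labeled.extend(confident)
        pool = remaining
    return train(labeled), pool


# Made-up seed data: context words around "nail", with a sense label.
seed = [(["hammer", "wood"], "fastener"), (["finger", "polish"], "body")]
unlabeled = [["hammer", "board"], ["board", "steel"], ["finger", "toe"]]

model, leftover = self_train(seed, unlabeled)
print(predict(model, ["steel"])[0], leftover)
```

In this toy run the context `["board", "steel"]` cannot be labeled in the first round because neither word has been seen, but it becomes labelable once "board" has been absorbed from a confident first-round prediction. The threshold trades coverage against the risk of reinforcing early mistakes.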