Natural Language Processing

Natural Language Processing in Artificial Intelligence

Uploaded by

editedvideoes
Copyright
© All Rights Reserved

Natural Language Processing
Introduction – Syntax – semantics – Introduction to Statistical NLP
NLP - Introduction
• Language is a method of communication with the help of which we can speak, read and write.
• Natural Language Processing (NLP) is a subfield of Computer Science and Artificial Intelligence (AI) that enables computers to understand and process human language.
• Developing NLP applications is challenging because computers need structured data, whereas human speech is unstructured and often ambiguous.
• Technically, the main task of NLP is to program computers to analyze and process large amounts of natural language data.
History of NLP
• The history of NLP is divided into four phases:
• First Phase (Machine Translation Phase) - Late 1940s to late 1960s
• Second Phase (AI Influenced Phase) – Late 1960s to late 1970s
• Third Phase (Grammatico-logical Phase) – Late 1970s to late 1980s
• Fourth Phase (Lexical & Corpus Phase) – The 1990s
First Phase (Machine Translation Phase) - Late 1940s to late 1960s
• The work done in this phase focused mainly on machine translation (MT).
Second Phase (AI Influenced Phase) – Late 1960s to late 1970s
• The work done in this phase was mainly concerned with world knowledge and its role in the construction and manipulation of meaning representations.
Third Phase (Grammatico-logical Phase) – Late 1970s to late 1980s
• After the failure of practical system building in the previous phase, researchers moved towards the use of logic for knowledge representation and reasoning in AI.
Fourth Phase (Lexical & Corpus Phase) – The 1990s
• This phase was marked by a lexicalized approach to grammar, which appeared in the late 1980s and became increasingly influential.
• There was a revolution in natural language processing in this decade with the introduction of machine learning algorithms for language processing.
Ambiguity and Uncertainty in Language
• Ambiguity is the capability of being understood in more than one way.
• NLP has the following types of ambiguities:
• Lexical Ambiguity
• Syntactic Ambiguity
• Semantic Ambiguity
• Anaphoric Ambiguity
• Pragmatic ambiguity
Lexical Ambiguity
• The ambiguity of a single word is called lexical ambiguity.
• For example, the word “silver” can be treated as a noun, an adjective, or a verb.
Syntactic Ambiguity
• This kind of ambiguity occurs when a sentence can be parsed in different ways.
• For example, in the sentence “The man saw the girl with the telescope”, it is ambiguous whether the man saw the girl carrying a telescope or saw her through his telescope.
Semantic Ambiguity
• This kind of ambiguity occurs when the meaning of the words themselves can be misinterpreted.
• In other words, semantic ambiguity happens when a sentence contains an ambiguous word or phrase.
• For example, the sentence “The car hit the pole while it was moving” has semantic ambiguity because it can be interpreted as “The car, while moving, hit the pole” or as “The car hit the pole while the pole was moving”.
Anaphoric Ambiguity
• This kind of ambiguity arises from the use of anaphoric expressions in discourse.
• For example: “The horse ran up the hill. It was very steep. It soon got tired.” Here, the anaphoric reference of “it” in the two sentences causes ambiguity.
Pragmatic Ambiguity
• This kind of ambiguity refers to the situation where the context of a phrase gives it multiple interpretations.
• For example, the sentence “I like you too” can have multiple interpretations: I like you (just as you like me) or I like you (just as someone else does).
NLP Phases
Syntactic Analysis - Introduction
• Syntactic analysis, or parsing, may be defined as the process of analyzing strings of symbols in natural language for conformance to the rules of a formal grammar.
• The word ‘parsing’ originates from the Latin word ‘pars’, which means ‘part’.
• The purpose of syntax analysis is to determine the grammatical structure of the text; drawing the exact (dictionary) meaning from the text is the task of semantic analysis.
• Syntax analysis checks the text for well-formedness against the rules of a formal grammar.
• For example, a phrase like “hot ice-cream” is grammatically well-formed and passes syntax analysis, but would be rejected by the semantic analyzer.
Syntactic Analysis
Concept of Parse Tree
• It may be defined as the graphical depiction of a derivation.
• The start symbol of derivation serves as the root of the parse tree.
• In every parse tree, the leaf nodes are terminals and interior nodes
are non-terminals.
• A property of a parse tree is that reading its leaf nodes from left to right produces the original input string.
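The properties listed above can be checked on a small hand-built tree. The following is a minimal sketch, not a parser: the `Node` class, the `leaves` helper, and the example sentence are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A parse-tree node: interior nodes carry a non-terminal, leaves a terminal."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> List[str]:
    """Collect the terminals by visiting the leaves from left to right."""
    if node.is_leaf():
        return [node.label]
    result: List[str] = []
    for child in node.children:
        result.extend(leaves(child))
    return result


# Hand-built derivation of "the man saw the girl" from the start symbol S.
tree = Node("S", [
    Node("NP", [Node("Det", [Node("the")]), Node("N", [Node("man")])]),
    Node("VP", [
        Node("V", [Node("saw")]),
        Node("NP", [Node("Det", [Node("the")]), Node("N", [Node("girl")])]),
    ]),
])

sentence = " ".join(leaves(tree))
print(tree.label)  # the root is the start symbol of the derivation
print(sentence)    # the leaves, read left to right, give back the input
```

Here the root carries the start symbol, every interior node is a non-terminal, and the words appear only at the leaves.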
Concept of Grammar
• A grammar is essential for describing the syntactic structure of well-formed programs.
• In the literary sense, grammars denote the syntactic rules for conversation in natural languages.
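To make the idea of grammar rules concrete, here is a minimal sketch of a toy context-free grammar together with a CYK-style chart that counts how many parse trees a sentence has. The grammar, the lexicon, and the function name `count_parses` are assumptions made for this example. It is applied to the sentence “The man saw the girl with the telescope” from the discussion of syntactic ambiguity.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # prepositional phrase attached to a noun phrase
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # prepositional phrase attached to a verb phrase
    ("PP", "P", "NP"),
]
# Hypothetical lexicon mapping each word to its possible categories.
LEXICON = {
    "the": ["Det"],
    "man": ["N"],
    "girl": ["N"],
    "telescope": ["N"],
    "saw": ["V"],
    "with": ["P"],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees of `words` rooted in `start`."""
    n = len(words)
    # chart[i][j][A] = number of ways non-terminal A derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[i][i + 1][tag] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between left and right child
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


ambiguous = count_parses("the man saw the girl with the telescope".split())
unambiguous = count_parses("the man saw the girl".split())
ungrammatical = count_parses("man the saw".split())
print(ambiguous, unambiguous, ungrammatical)
```

Under this toy grammar the ambiguous sentence receives two parses, one for each attachment of “with the telescope”, while a string that violates the rules receives none.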
NLP - Linguistic Resources
• Corpus - A corpus is a large and structured set of machine-readable texts that
have been produced in a natural communicative setting.
• Its plural is corpora.
• Corpora can be derived from different sources, such as text that was originally electronic, transcripts of spoken language, and optical character recognition (OCR) output.
TreeBank Corpus
• It may be defined as a linguistically parsed text corpus that annotates syntactic or semantic sentence structure.
• Geoffrey Leech coined the term ‘treebank’, which reflects the fact that the most common way of representing a grammatical analysis is by means of a tree structure.
• Generally, Treebanks are created on top of a corpus that has already been annotated with part-of-speech tags.
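Treebank annotations are often stored as bracketed strings. The following minimal sketch, assuming a Penn-style bracketed notation, reads one such string into a nested structure and recovers the part-of-speech layer it is built on. The helper names `parse_bracketed` and `tagged_words` and the sample annotation are hypothetical.

```python
import re


def parse_bracketed(text):
    """Parse a bracketed annotation into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())        # nested constituent
            else:
                children.append(tokens[pos])   # a word (terminal)
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    tree = read()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def tagged_words(tree):
    """Return the (word, part-of-speech tag) pairs found at the leaves."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


# Sample annotation written for this example.
annotation = "(S (NP (DT The) (NN man)) (VP (VBD saw) (NP (DT the) (NN girl))))"
tree = parse_bracketed(annotation)
pairs = tagged_words(tree)
print(tree[0])
print(pairs)
```

The lowest bracket level holds the part-of-speech tags, and the enclosing brackets add the syntactic structure on top of them.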
Types of TreeBank Corpus
• Semantic Treebanks
• These Treebanks use a formal representation of a sentence’s semantic structure.
• They vary in the depth of their semantic representation.
• Robot Commands Treebank, Geoquery, Groningen Meaning Bank and RoboCup Corpus are some examples of Semantic Treebanks.
• Syntactic Treebanks
• In contrast to the semantic Treebanks, the inputs to Syntactic Treebank systems are expressions of the formal language obtained from the conversion of parsed Treebank data.
• The outputs of such systems are predicate-logic-based meaning representations.
• Various syntactic Treebanks in different languages have been created so far:
• Penn Arabic Treebank and Columbia Arabic Treebank are syntactic Treebanks created for the Arabic language.
• Sinica syntactic Treebank was created for the Chinese language.
• Lucy, Susanne and BLLIP WSJ are syntactic corpora created for the English language.
PropBank Corpus
• PropBank, more specifically called “Proposition Bank”, is a corpus that is annotated with verbal propositions and their arguments.
• The corpus is a verb-oriented resource; the annotations here are more closely related to the syntactic level.
• It was developed by Martha Palmer et al., Department of Linguistics, University of Colorado Boulder.
• In Natural Language Processing (NLP), the PropBank project has played a very significant role. It helps in semantic role labeling.

VerbNet (VN)
• VerbNet (VN) is a hierarchical, domain-independent verb lexicon, the largest such lexical resource for English, that incorporates both semantic and syntactic information about its contents.
• VN is a broad-coverage verb lexicon with mappings to other lexical resources such as WordNet, XTAG and FrameNet.
• It is organized into verb classes that extend Levin classes by refinement and the addition of subclasses to achieve syntactic and semantic coherence among class members.
WordNet
• WordNet, created at Princeton University, is a lexical database for the English language.
• It is available as part of the NLTK corpora.
• In WordNet, nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms called synsets.
• All the synsets are linked by conceptual-semantic and lexical relations.
• Its structure makes it very useful for natural language processing (NLP).
• In information systems, WordNet is used for various purposes such as word-sense disambiguation, information retrieval, automatic text classification and machine translation.
• One of the most important uses of WordNet is to measure the similarity between words.
• For this task, various algorithms have been implemented in packages such as WordNet::Similarity in Perl, NLTK in Python and ADW in Java.
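As an illustration of how similarity can be read off a network of linked concepts, here is a minimal sketch of a path-based measure over a tiny hand-written hierarchy. The `HYPERNYM` table is a hypothetical stand-in and does not reproduce WordNet's actual data; the score used, one divided by the shortest path length plus one, is one common path-based choice among several.

```python
# Hypothetical toy "is-a" links (child -> parent); not WordNet's actual data.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "animal": "entity",
    "car": "vehicle",
    "vehicle": "entity",
}


def ancestors(concept):
    """Map the concept and each of its ancestors to its distance from the concept."""
    distances = {}
    depth = 0
    while concept is not None:
        distances[concept] = depth
        concept = HYPERNYM.get(concept)
        depth += 1
    return distances


def path_similarity(a, b):
    """Return 1 / (shortest path length + 1), or 0.0 if no common ancestor exists."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = set(up_a) & set(up_b)
    if not common:
        return 0.0
    distance = min(up_a[c] + up_b[c] for c in common)
    return 1.0 / (distance + 1)


print(path_similarity("dog", "wolf"))
print(path_similarity("dog", "cat"))
print(path_similarity("dog", "car"))
```

Concepts that meet low in the hierarchy score higher than concepts that only share a very general ancestor.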
Semantic Analysis
Homonymy
• Definition: Homonyms are words that have the same spelling or pronunciation but entirely unrelated meanings.
• Example: The word "bat" can mean a flying mammal or a piece of sports equipment used in baseball. These meanings have no connection to each other.
• Types:
• Homophones: Same pronunciation but different meanings and spellings (e.g., flower and flour).
• Homographs: Same spelling but different meanings and sometimes pronunciations (e.g., lead as in the metal and lead as in to guide).
• Homonymy: Unrelated meanings with the same form (e.g., bat as an animal vs. bat for sports).
• Polysemy: Related meanings with the same form (e.g., bank as a financial institution vs. bank as the building that houses it). By contrast, bank as the side of a river is unrelated to the financial sense, so that pair is an instance of homonymy.
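The two unrelated meanings of "bat" can often be told apart from the surrounding words. The following is a minimal sketch of a simplified Lesk-style heuristic, which picks the sense whose gloss shares the most words with the context. The sense inventory, the glosses, and the stop-word list are hypothetical and written only for this example.

```python
import re

# Hypothetical two-sense inventory for "bat"; glosses written for this example.
SENSES = {
    "bat (animal)": "a nocturnal flying mammal that lives in caves and eats insects",
    "bat (sports)": "a wooden club used to hit the ball in baseball or cricket",
}
# Small illustrative stop-word list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "in", "of", "to", "and", "or", "that", "at", "with", "is"}


def content_words(text):
    """Lower-case the text and keep the words that are not stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(context, senses=SENSES):
    """Pick the sense whose gloss overlaps most with the context words."""
    context_set = content_words(context)
    scores = {
        sense: len(context_set & content_words(gloss))
        for sense, gloss in senses.items()
    }
    return max(scores, key=scores.get)


print(disambiguate("He swung the bat and hit the ball out of the park"))
print(disambiguate("The bat flew out of the cave to hunt insects at night"))
```

Exact word overlap is a crude signal (for instance, "cave" does not match "caves" here), so this is only a sketch of the idea.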
Morphological Analyzer
