Parsing and Syntax
Syntax - refers to the way words are related to each other in a sentence.
Syntactic Analysis - examines:
how words are grouped together into phrases;
which words modify other words;
which words are of central importance to the sentence.
Syntactic analysis is used in many NLP applications, such as:
Grammar Checking
Question Answering
Information Extraction
Machine Translation
Noun Phrases
Student, the student, that student, two students, many students
Clever student
A student of computer science
AU students, long queues, the student with long hair, the city where I lived
Adjective Phrases
incredibly short
rather difficult
very happy
unbelievably quick
exceedingly sorry about the mistake
amazingly rich in minerals
Verb Phrases
turn, turn on, is turning on, have been working
threatened to throw himself into the window
was an understandable reaction by the visitors
is amazingly rich in minerals
Prepositional Phrases
on the table
across the world
over your head
in the hotel
to their house
Adverbial Phrases
immediately
unbelievably quickly
very carefully
Simple Sentences
The computer is on the table
He went home
They are always happy
Complex Sentences
He was driving the car that he bought from his father
We rented our house to friends while we were abroad
Examples: Simple Sentences
Concepts: Alphabet, String, and Language
Formal Language Theory - treats a language as a mathematical object defined by an
alphabet, strings and a grammar.
Alphabet - a finite set of symbols.
e.g. Binary Alphabet: {0, 1}
Decimal Alphabet: {0, 1, 2 , … , 9}
English Alphabet: {a, b, c, … , z, A, B, C, …, Z}
String - a finite sequence of symbols from an alphabet.
e.g. Binary String: 0100101, 01101, 00110
Decimal String: 176392, 12, 398702
English String: killed, Abebe, lion, the
Language - a (potentially infinite) set of strings over an alphabet.
e.g. Binary Language: {0100101, 01101, 00110, ….}
Decimal Language: {176392, 12, 398702, ….}
English Language: {killed, Abebe, lion, the, ….}
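The three definitions above can be made concrete with a minimal Python sketch. The helper names `is_string_over` and `strings_up_to`, and the sample language "binary strings ending in 0", are illustrative assumptions and not part of the slides.

```python
from itertools import product


def is_string_over(s, alphabet):
    """True if every symbol of s belongs to the alphabet."""
    return all(symbol in alphabet for symbol in s)


def strings_up_to(alphabet, max_len):
    """All strings over the alphabet of length 0..max_len, shortest first."""
    symbols = sorted(alphabet)
    return [
        "".join(p)
        for n in range(max_len + 1)
        for p in product(symbols, repeat=n)
    ]


BINARY = {"0", "1"}  # an alphabet: a finite set of symbols

# A language is any set of strings over the alphabet. This hypothetical
# example keeps only the binary strings that end in 0 (up to length 3).
ends_in_zero = [s for s in strings_up_to(BINARY, 3) if s.endswith("0")]
```

Only a finite slice of the language is enumerated here; the full language of binary strings ending in 0 is infinite.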
Grammar - a formalism for generating the strings of a language by repeatedly replacing symbols.
- is a 4-tuple G = (N, T, P, S), where
• N is a finite set of non-terminal symbols. In natural languages, these can be syntactic
categories, phrases or sentences.
• T is a finite set of terminal symbols (disjoint from N). It consists of the elements of the
target language, such as words or letters in a natural language.
• P is a finite set of production rules of the form ɑ→β, where ɑ and β are strings of symbols
from N ∪ T and ɑ contains at least one non-terminal.
• S is a member of N called the start symbol (a special non-terminal symbol). In natural
languages, the start symbol is a sentence.
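As a rough illustration of the 4-tuple, the sketch below encodes a toy grammar in Python and replays one derivation. The class `Grammar`, the function `leftmost_derivation`, and the toy rules (built from the example words killed, Abebe, lion, the) are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    productions: tuple       # P: pairs (lhs, rhs), each a tuple of symbols
    start: str               # S

    def is_well_formed(self):
        """Check the constraints stated in the definition of G."""
        symbols = self.nonterminals | self.terminals
        return (
            self.nonterminals.isdisjoint(self.terminals)
            and self.start in self.nonterminals
            and all(
                any(s in self.nonterminals for s in lhs)
                and all(s in symbols for s in lhs + rhs)
                for lhs, rhs in self.productions
            )
        )


def leftmost_derivation(grammar, rule_indices):
    """Apply the given rules, in order, at the leftmost non-terminal."""
    form = (grammar.start,)
    steps = [form]
    for i in rule_indices:
        lhs, rhs = grammar.productions[i]
        pos = next(
            (k for k, s in enumerate(form) if s in grammar.nonterminals), None
        )
        if pos is None or form[pos:pos + len(lhs)] != lhs:
            raise ValueError(f"rule {i} does not apply to {form}")
        form = form[:pos] + rhs + form[pos + len(lhs):]
        steps.append(form)
    return steps


# Hypothetical toy grammar, not taken from the slides.
TOY = Grammar(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "N", "V"}),
    terminals=frozenset({"Abebe", "killed", "the", "lion"}),
    productions=(
        (("S",), ("NP", "VP")),     # 0
        (("NP",), ("Abebe",)),      # 1
        (("NP",), ("Det", "N")),    # 2
        (("VP",), ("V", "NP")),     # 3
        (("Det",), ("the",)),       # 4
        (("N",), ("lion",)),        # 5
        (("V",), ("killed",)),      # 6
    ),
    start="S",
)

steps = leftmost_derivation(TOY, [0, 1, 3, 6, 2, 4, 5])
```

Each entry of `steps` is one sentential form, starting at `("S",)` and ending with the terminal string.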
Hierarchy of Grammars/Languages
Also known as the Chomsky classification (or Chomsky hierarchy), it orders grammars/languages
by their expressive power.
Different classes of grammars/languages are defined by placing different constraints on the
production rules, which results in different structural complexity in the sentences they can
generate.
The Chomsky classification consists of the following four levels of grammars/languages:
Type 0 (Unrestricted / Recursively Enumerable)
Type I (Context-Sensitive)
Type II (Context-Free)
Type III (Regular)
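The sketch below shows how rule constraints separate the four levels. It assumes the usual textbook rule forms (right-linear rules A→a or A→aB for Type III, a single non-terminal on the left for Type II, non-contracting rules for Type I), which may differ in detail from the definitions on the individual slides. The function name `chomsky_type` is hypothetical, and the special case S→ε is deliberately ignored.

```python
def chomsky_type(productions, nonterminals):
    """Return the most restrictive level (3, 2, 1 or 0) that every rule satisfies.

    productions: iterable of (lhs, rhs) pairs, each a tuple of symbols.
    Any symbol not listed in nonterminals is treated as a terminal.
    """
    rules = list(productions)

    def is_nt(symbol):
        return symbol in nonterminals

    def regular(lhs, rhs):
        # Right-linear form only: A -> a  or  A -> a B
        if len(lhs) != 1 or not is_nt(lhs[0]):
            return False
        if len(rhs) == 1:
            return not is_nt(rhs[0])
        if len(rhs) == 2:
            return not is_nt(rhs[0]) and is_nt(rhs[1])
        return False

    def context_free(lhs, rhs):
        # Exactly one non-terminal on the left-hand side
        return len(lhs) == 1 and is_nt(lhs[0])

    def context_sensitive(lhs, rhs):
        # Non-contracting: the right-hand side is never shorter
        return any(is_nt(s) for s in lhs) and len(lhs) <= len(rhs)

    for level, check in ((3, regular), (2, context_free), (1, context_sensitive)):
        if all(check(lhs, rhs) for lhs, rhs in rules):
            return level
    return 0


NT = {"S", "A", "B", "C"}
regular_rules = [(("S",), ("a", "S")), (("S",), ("b",))]
cf_rules = [(("S",), ("a", "S", "b")), (("S",), ("a", "b"))]
cs_rules = [(("S",), ("a", "B", "C")), (("C", "B"), ("B", "C"))]
unrestricted_rules = [(("A", "B"), ("A",))]
```

Calling `chomsky_type` on the four sample rule sets gives 3, 2, 1 and 0 respectively, because each set violates the constraint of the level above it.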
Hierarchy of Grammars/Languages: Type 0 (Unrestricted)
Hierarchy of Grammars/Languages: Type I (Context-Sensitive)
Hierarchy of Grammars/Languages: Type II (Context-Free)
Hierarchy of Grammars/Languages: Type III (Regular)
Parsing
is the process of recognizing the structure of sentences and assigning that structure to them.
is a derivation process that identifies the structure of sentences using a given grammar.
is considered a special case of a search problem.
two basic search strategies are used:
top-down strategy
bottom-up strategy
methods of improving efficiency:
storing lexical rules separately
chunking
Parsing Strategies: Top-down Parsing
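A top-down parser starts from the start symbol S and expands non-terminals until the expansion matches the input words. The following is a minimal backtracking sketch of that idea, not a reconstruction of the slide's own example. The toy grammar `GRAMMAR` and the function names are assumptions; the sentence is the simple-sentence example "The computer is on the table". Left-recursive rules would make this naive version loop forever.

```python
# Hypothetical toy grammar: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V", "PP"]],
    "PP": [["P", "NP"]],
    "Det": [["the"]],
    "N": [["computer"], ["table"]],
    "V": [["is"]],
    "P": [["on"]],
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way symbol derives tokens[pos:next_pos]."""
    if symbol not in GRAMMAR:  # terminal: must match the next word
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # predict: try each expansion in turn
        yield from expand(rhs, tokens, pos, (symbol,))


def expand(rhs, tokens, pos, partial):
    """Match the symbols of rhs left to right, growing the partial tree."""
    if not rhs:
        yield partial, pos
        return
    for subtree, nxt in parse(rhs[0], tokens, pos):
        yield from expand(rhs[1:], tokens, nxt, partial + (subtree,))


def top_down_parse(sentence):
    """Return the first parse tree that covers the whole sentence, or None."""
    tokens = sentence.lower().split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


tree = top_down_parse("The computer is on the table")
```

For this sentence the parser first tries VP → V NP, fails when "on" is not a determiner, and backtracks to VP → V PP.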
Parsing Strategies: Bottom-up Parsing
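A bottom-up parser starts from the input words and repeatedly replaces a matched right-hand side with its left-hand side until only S remains. The sketch below is one possible shift-reduce recognizer with backtracking, under the assumption of a small toy grammar (`RULES`) with no empty or cyclic rules. All names are illustrative.

```python
# Hypothetical toy grammar as (lhs, rhs) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "PP")),
    ("PP", ("P", "NP")),
    ("Det", ("the",)),
    ("N", ("computer",)),
    ("N", ("table",)),
    ("V", ("is",)),
    ("P", ("on",)),
]


def shift_reduce(stack, remaining):
    """Return a list of actions that reduces the input to S, or None."""
    if stack == ("S",) and not remaining:
        return []
    # Reduce: try every rule whose right-hand side is on top of the stack.
    for lhs, rhs in RULES:
        if stack[-len(rhs):] == rhs:
            rest = shift_reduce(stack[:-len(rhs)] + (lhs,), remaining)
            if rest is not None:
                return [f"reduce {lhs} -> {' '.join(rhs)}"] + rest
    # Shift: move the next input word onto the stack.
    if remaining:
        rest = shift_reduce(stack + (remaining[0],), remaining[1:])
        if rest is not None:
            return [f"shift {remaining[0]}"] + rest
    return None


def bottom_up_parse(sentence):
    return shift_reduce((), tuple(sentence.lower().split()))


actions = bottom_up_parse("The computer is on the table")
```

The returned action list records the order of shifts and reductions, so it can be read as a derivation in reverse.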
Towards Efficient Parsing: Separating Lexical Rules
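One common reading of this idea is to keep word-level rules (such as N → table) in a lexicon, apart from the phrase-structure rules, so that a word is matched by a single dictionary lookup instead of a search through the grammar. The sketch below assumes that reading. `LEXICON`, `PHRASE_RULES`, `spans` and `accepts` are hypothetical names.

```python
# Lexicon: word -> set of lexical categories (stored apart from the grammar).
LEXICON = {
    "the": {"Det"},
    "computer": {"N"},
    "table": {"N"},
    "is": {"V"},
    "on": {"P"},
}

# Phrase-structure rules mention categories only, never individual words.
PHRASE_RULES = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "PP")],
    "PP": [("P", "NP")],
}


def spans(symbol, tokens, pos):
    """Yield every end position such that symbol covers tokens[pos:end]."""
    if symbol not in PHRASE_RULES:  # lexical category: one lookup
        if pos < len(tokens) and symbol in LEXICON.get(tokens[pos], ()):
            yield pos + 1
        return
    for rhs in PHRASE_RULES[symbol]:
        ends = [pos]
        for child in rhs:
            ends = [e2 for e in ends for e2 in spans(child, tokens, e)]
        yield from ends


def accepts(sentence):
    """True if the whole sentence can be analysed as an S."""
    tokens = sentence.lower().split()
    return any(end == len(tokens) for end in spans("S", tokens, 0))
```

With this split, covering a new word is a lexicon-only change, for example `LEXICON["hotel"] = {"N"}`, and the phrase rules stay untouched.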
Towards Efficient Parsing: Chunking
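Chunking is usually understood as partial parsing: grouping words into flat, non-overlapping phrases without building a full tree. The sketch below is one simple way to do this for noun phrases, assuming the input is already tagged with parts of speech and using the hypothetical pattern "optional Det, any number of Adj, one or more N". The tags in the example are assigned by hand.

```python
def np_chunks(tagged):
    """Group Det? Adj* N+ runs of (word, tag) pairs into noun-phrase chunks."""
    chunks = []
    i = 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "Det":  # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "Adj":  # any adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "N":  # one or more nouns
            k += 1
        if k > j:  # at least one noun was found: emit the chunk
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:  # no noun phrase starts here: move on by one word
            i += 1
    return chunks


# Hand-tagged example phrase (tags are assumptions for illustration).
tagged_phrase = [
    ("the", "Det"), ("student", "N"), ("with", "P"),
    ("long", "Adj"), ("hair", "N"),
]
chunks = np_chunks(tagged_phrase)
```

The chunker finds "the student" and "long hair" but does not attach the prepositional phrase to the first noun, which is the kind of decision a full parser would have to make.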
Applications of parsing
Machine translation
[Figure: machine translation from English to Chinese via tree operations]