Natural Language Processing: Dr. Abdulfetah A.A


Natural Language Processing

Dr. Abdulfetah A.A

1
Course outline (tentative)
• Basics of Natural Language Processing and Language
modeling techniques.
• Syntactic and Semantic Parsing
• NLP Applications: Information Extraction & Machine
Translation
Units
• Unit 1 and 2: Basics and modeling techniques
• Unit 3 and 4: Syntactic and Semantic Parsing
• Unit 5 and 6: Information Extraction & Machine
Translation
2
References

• Jurafsky, D., and Martin, J.H., Speech and
Language Processing, 2nd Edn, Prentice Hall, 2008
– New draft: https://web.stanford.edu/~jurafsky/slp3/
– References to the online draft where possible: J&M3
• Bird, S., Klein, E., and Loper, E., Natural
Language Processing with Python, O'Reilly, 2009
• Web resources
3
Assessment types

• The course assessment will contain at least
the following types:
– Assignments: 40%
– Midterm Exam: 20%
– Final Exam: 40%

4
Natural Language Processing
• Topics for today
– General introduction to NLP
• Why study NLP?

5
Natural language and NLP
• “natural” language
– Amharic, English, Chinese, German, Japanese, etc.
• Ultimate goal
– To build computer systems that perform as well at using natural
language as humans do
• Immediate goal
– To build computer systems that can process text and speech more
intelligently

6
Why NLP
• Why do we need computers to understand (or
generate) human language?
– Huge amounts of data on the Internet
– We need applications for processing
(understanding, retrieving, translating,
summarizing, …) these large amounts of text.
– People expect interactive agents to communicate
in NL
• E.g. dialogue systems
7
Dialogue systems

• Require both understanding and generation


– Dave: Open the pod bay doors, HAL.
– HAL: I'm sorry Dave, I'm afraid I can't do that.
– Dave: What's the problem?
– HAL: I think you know what the problem is just as
well as I do.

8
Knowledge of Language (steps in NLU)
• Task: given a sentence, produce a
representation the computer can use for
question answering

9
Why NLP
NLP Applications:
• Classifiers: classify a set of documents into categories (e.g., spam filters)
• Information Retrieval: find relevant documents to a given query.
• Information Extraction: Extract useful information from resumes; discover
names of people and events they participate in, from a document.
• Machine Translation: translate text from one human language into another
• Question Answering: find answers to natural language questions in a text
collection or database…
• Summarization: Produce a readable summary, e.g., news about oil today.
• Sentiment Analysis: identify people's opinions on a subject.
• Speech Processing: book a hotel over the phone
• Spelling checkers, grammar checkers, auto-filling, ….. and more
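One of the applications above, classification, can be sketched in a few lines. The snippet below is a minimal naive Bayes spam filter with add-one smoothing; the training sentences and the 0.5 priors are invented for illustration, not from any real corpus.

```python
from collections import Counter
import math

# Toy training data (hypothetical examples, for illustration only).
spam = ["win money now", "free money offer", "win a free prize"]
ham = ["meeting at noon", "lunch money for the trip", "see you at the meeting"]

def train(docs):
    """Count word frequencies over a list of documents."""
    counts = Counter()
    for d in docs:
        counts.update(d.split())
    return counts

spam_counts, ham_counts = train(spam), train(ham)
vocab = set(spam_counts) | set(ham_counts)

def log_score(text, counts, prior):
    """Naive Bayes log-probability with add-one (Laplace) smoothing."""
    total = sum(counts.values())
    score = math.log(prior)
    for w in text.split():
        score += math.log((counts[w] + 1) / (total + len(vocab)))
    return score

def classify(text):
    s = log_score(text, spam_counts, 0.5)
    h = log_score(text, ham_counts, 0.5)
    return "spam" if s > h else "ham"

print(classify("free money"))       # -> spam (with this toy data)
print(classify("meeting at noon"))  # -> ham
```

Real spam filters differ mainly in scale, feature engineering, and smoothing, not in this basic structure.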

10
Why NLP is complex/hard
• Natural language is extremely rich in form and structure, constantly
changing and ambiguous.
• How to represent meaning,
• Which structures map to which meaning structures.
• One input can mean many different things. Ambiguity can be at
different levels.
– Lexical (word level) ambiguity -- different meanings of words
– Syntactic ambiguity -- different ways to parse the sentence
– Interpreting partial information -- how to interpret pronouns
– Contextual information -- context of the sentence may affect the meaning
of that sentence.
• Many inputs can mean the same thing.
• Interaction among components of the input is not clear.

11
Linguistic Levels of Ambiguity/Analysis

• Phonology: sounds / letters / pronunciation


– two, too.
• Morphology: the structure of words
• child – children, book - books;
• Syntax: grammar, how these sequences are structured
• I saw the man with the telescope
• Semantics: meaning of the strings
• table as data structure, table as furniture.
• Dealing with all of these levels of ambiguity makes NLP
difficult

12
Knowledge of Language
• Phonetics and Phonology
– Phonetics: computer processing concerned with the
physical sounds of language, performed using
signal-processing methods. It can be divided
into speech generation and speech analysis.
– Phonology: linguistic processing of the sounds of
spoken language; a higher level than phonetics,
mainly concerned with the elementary sound units of
a language, called phonemes.

13
Knowledge of Language
• Phonetics and phonology: speech sounds,
their production, and the rule systems that
govern their use
• Morphology: words and their composition
from more basic units
– Cat, cats (inflectional morphology)
– Child, children
– Friend, friendly (derivational morphology)
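The inflectional examples above (cat/cats, child/children) can be handled by a toy morphological analyzer: a few suffix-stripping rules plus a lookup table for irregular forms. This is only a sketch; a real analyzer (e.g. a finite-state transducer, covered later) handles far more phenomena.

```python
# Irregular plurals handled by direct lookup (small illustrative table).
IRREGULAR = {"children": "child", "geese": "goose", "mice": "mouse"}

def singularize(word):
    """Strip common English plural suffixes (toy rules only)."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ies"):
        return word[:-3] + "y"      # ladies -> lady
    if word.endswith("es") and word[-3] in "sxz":
        return word[:-2]            # boxes -> box
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]            # cats -> cat
    return word

print(singularize("cats"))      # cat
print(singularize("children"))  # child
print(singularize("boxes"))     # box
```

Rule ordering matters: the irregular lookup must come before the suffix rules, or "children" would wrongly become "childre".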

14
Knowledge of Language
• Syntax: concerned with sentence structure, i.e., the
rules for arranging words within a sentence.
– One of the main tasks is parsing: producing a parse
tree for a given input sentence.
– Grammar: a set of rules for deriving syntactic structure
• Semantics: interpreting the literal meaning of language up
to the sentence level.
– Lexical semantics: the semantics of words
– Building semantic representations of larger structures
– Methodology: neural networks, FOPC (first-order logic),
unification

15
Knowledge of Language
• Pragmatics: is concerned with intended, practical meaning
of language.
– Example: “Could you print this document?”
• Discourse: is concerned with language structure beyond
sentence level; such as inter-sentence relations,
references, and document structure.
– Examples: turn taking, speech acts
– Sue took the trip to New York. She had a great time there.
• Sue/she;
• New York/there;
• took/had (time)

16
Ambiguity
• Some interpretations of : I made her duck.
1. I cooked duck for her.
2. I cooked duck belonging to her.
3. I created a toy duck which she owns.
4. I caused her to quickly lower her head or body.
5. I used magic and turned her into a duck.
• duck – morphologically and syntactically ambiguous: noun
or verb.
• her – syntactically ambiguous: dative or possessive.
• make – semantically ambiguous: cook or create.
• make – syntactically ambiguous

17
Resolve Ambiguities
• We will introduce models and algorithms to resolve
ambiguities at different levels.
• part-of-speech tagging -- Deciding whether duck is verb or
noun.
• word-sense disambiguation -- Deciding whether make is
create or cook.
• lexical disambiguation -- Resolution of part-of-speech and
word-sense ambiguities are two important kinds of lexical
disambiguation.
• syntactic ambiguity -- her duck is an example of syntactic
ambiguity, and can be addressed by probabilistic parsing.
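A lexical disambiguation step like the part-of-speech tagging above can be sketched as a unigram tagger: assign each word its most frequent tag in a tagged corpus. The tiny "corpus" and its counts below are invented for illustration; a real tagger would train on something like the Penn Treebank and use context (e.g. an HMM).

```python
from collections import Counter, defaultdict

# Hypothetical tagged corpus: "duck" appears twice as a noun (NN)
# and once as a verb (VB), so the noun reading wins.
tagged_corpus = [
    ("I", "PRP"), ("made", "VBD"), ("her", "PRP$"), ("duck", "NN"),
    ("ducks", "NNS"), ("duck", "VB"), ("duck", "NN"), ("the", "DT"),
]

counts = defaultdict(Counter)
for word, tag_ in tagged_corpus:
    counts[word][tag_] += 1

def tag(word):
    """Return the most frequent tag seen for a word (None if unseen)."""
    return counts[word].most_common(1)[0][0] if word in counts else None

print(tag("duck"))  # NN -- the noun reading outnumbers the verb here
```

A unigram tagger always gives "duck" the same tag regardless of context; resolving it correctly in "I made her duck" needs the sequence models and parsers introduced in later units.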

18
Resolve Ambiguities (cont.)
I made her duck

Parse 1 ("her" and "duck" as two separate NPs):
  [S [NP I] [VP [V made] [NP her] [NP duck]]]

Parse 2 ("her duck" as a single NP):
  [S [NP I] [VP [V made] [NP [DET her] [N duck]]]]
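Probabilistic parsing is built on chart algorithms such as CYK. The sketch below counts parses of "I made her duck" under a toy grammar in Chomsky normal form (both the grammar and lexicon are invented for this example), recovering exactly the two analyses shown above.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (binary rules only).
rules = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),    # made [her duck]
    ("X", "V", "NP"),     # [made her] ...
    ("VP", "X", "NP"),    # [made her] [duck]
    ("NP", "DET", "N"),   # her duck
]
lexicon = {  # word -> possible preterminals
    "I": ["NP"], "made": ["V"], "her": ["NP", "DET"], "duck": ["N", "NP"],
}

def count_parses(words, start="S"):
    """CYK-style chart: chart[i][j][A] = #derivations of A over words[i..j]."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        for pre in lexicon[w]:
            chart[i][i][pre] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):            # split point
                for lhs, b, c in rules:
                    chart[i][j][lhs] += chart[i][k][b] * chart[k + 1][j][c]
    return chart[0][n - 1][start]

print(count_parses("I made her duck".split()))  # -> 2
```

A probabilistic CFG replaces the derivation counts with rule probabilities and takes a max instead of a sum, yielding the single most likely tree.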

19
Models to Represent Linguistic Knowledge

• We will use certain formalisms (models) to


represent the required linguistic knowledge.
• State Machines -- FSAs, FSTs, HMMs, ATNs,
RTNs
• Formal Rule Systems -- Context Free Grammars,
Unification Grammars, Probabilistic CFGs.
• Models of Uncertainty -- Bayesian probability
theory.

20
Algorithms to Manipulate Linguistic Knowledge

• Used to manipulate the representations and produce the desired


behavior
– choosing among possibilities and combining pieces
• Many of the algorithms that we’ll study will turn out to be
transducers; algorithms that take one kind of structure as input and
output another.
• In particular:
– State-space search
• To manage the problem of making choices during processing when we lack the
information needed to make the right choice
– Dynamic programming
• To avoid having to redo work during the course of a state-space search
– Machine Learning (classifiers, EM, etc)

21
State Space Search

• States represent pairings of partially processed


inputs with partially constructed answers
– E.g. sentence + partial parse tree
• Goal is to arrive at the right/best structure after
having processed all the input.
– E.g. the best parse tree spanning the sentence
• As with most interesting AI problems, the spaces
are too large and the criteria for “bestness” are
difficult to encode (hence heuristics and probabilities)

22
Dynamic Programming

• Don’t do the same work over and over.


• Avoid this by building and making use of
solutions to sub-problems that must be
invariant across all parts of the space.
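The classic NLP instance of this idea is minimum edit distance: the distance between two prefixes is a sub-problem whose solution is stored in a table and reused, never recomputed.

```python
def edit_distance(s, t):
    """Levenshtein distance with unit insert/delete/substitute costs."""
    m, n = len(s), len(t)
    # d[i][j] = edit distance between s[:i] and t[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete
                          d[i][j - 1] + 1,        # insert
                          d[i - 1][j - 1] + sub)  # substitute / match
    return d[m][n]

print(edit_distance("intention", "execution"))  # -> 5
```

Without the table, the naive recursion recomputes the same prefix pairs exponentially many times; the table brings the cost down to O(mn).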

23
Reading
• SLP2, Chapter 1:
– 1.4 Language, Thought, and Understanding
– 1.5 The State of the Art
– 1.6 History
Next lecture: Morphological Analysis: FSTs
Unit 2: Language modeling

24