0% found this document useful (0 votes)
29 views

Lecture 01

Uploaded by

1162407364
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
29 views

Lecture 01

Uploaded by

1162407364
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 44

Natural Language

Processing
Lecture 1: Course Overview and Introduction.

10/26/2020

N L P
COMS W4705 (2) – Fall B 2020
Yassine Benajiba
The 4705 Team
• Instructor: Yassine Benajiba <[email protected]>
Office Hours: Friday after class
On Zoom or Room 7LW1A (when in person)

• Assistants:

• Let’s look at Courseworks

• IA office hours / recitations start next week.


Time/Location TBA by email.
Lectures & Recitation
Sessions
• Lectures:

Mon10:10am- 12:40pm, Zoom or 309 Havemeyer Hall

Fri 1:10pm-3:40pm, Zoom 417 International Affairs Building

Recitation Sessions:

• Optional recitation sessions, led by the IAs


(schedule TBA)
Course Resources
• Gradescope/Courseworks 2 (a.k.a Canvas):

• Courseworks: All course materials: Lecture notes, code,


announcements, assignments, reading materials

• Homework submission, grade book on both

• Piazza used for Q & A. (COMSW4705_002_2020_3 - NATURAL LANGUAGE PROCESSING)

Do not email the instructor or IAs with questions about the


course content.
Textbook / Reading
• There is NO official textbook for this course.

• Recommended textbook (somewhat outdated, we won’t


follow too closely):

Dan Jurafsky & James Martin


Speech and Language Processing
2nd Ed. Prentice Hall (2009).

• Draft of most 3rd edition chapters:


https://web.stanford.edu/~jurafsky/slp3/

• We will also read a number of research papers.


Textbook / Reading
• Recommended textbook (mostly relevant later in the
course):

Yoav Goldberg
Neural Network Methods for
Natural Language Processing
Morgan & Claypool. 2017

• Available as an ebook through the CU library


https://clio.columbia.edu/catalog/13420294
Prerequisites
• Data Structures (COMS W3134 or COMS W3137)

• Discrete Math (COMS W3202, recommended)

• Some previous or concurrent exposure to AI and machine


learning is beneficial, but not required.

• Some experience with basic probability/statistics.

• Some experience with Python is helpful.


Grading
• Midterm 20%

• Final 30%

• 4 Homework assignments, each contains an analytical


and a programming part, 10% each

• Regrade requests should be submitted on Gradescope


within 3 days!

• Class is Pass/Fail this semester


Homework

• Homework uploaded through Courseworks AND


Gradescope. Do not email!

• PDF: analytical part + copy/paste code programming


part (only for comment and discussion on Gradescope)

• Python 3: programming part on Courseworks only


Homework Late Policy

• Written homework and programming problems may be


submitted up to 2 days late for a 20 point penalty.

• No homework will be accepted more than 2 days after the


deadline.

• Other extensions will only be granted in exceptional


circumstances.
Academic Honesty

• Submit your own answers and code.

• Review academic honesty policy on the syllabus


(Courseworks).

• When in doubt, ask.

• When in trouble, ask for help (and early).


NLP in the Movies
Natural Language
Processing
• Important and active research area within AI.

• Timely: Most of our activities online are text based


(web-pages, email, social media, blogs, news, product
descriptions and reviews, medical reports, course content,
…)

• NLP leverages more and more available training data and


modern Machine Learning techniques.

• Communicating with computers is the “holy grail” of AI.


Turing Test
(Alan Turing, 1950)

• A computer passes the test of intelligence if it can fool a


human interrogator into believing it is human.

• What skills are needed to build such a system?

• Language processing, knowledge representation,


reasoning, learning.
Image source: Russel & Norvig, Artificial Intelligence - A Modern Approach
Natural Language
Processing

AI NLP Linguistics

“Every time I fire a linguist, my performance goes up” (Fred Jelinek)


Natural Language Processing
vs. Computational Linguistics

• NLP: Build systems that can understand and generate


natural language. Focus on applications.

• Computational Linguistics: Study human language


using computational approaches.

• Many overlapping techniques.


Applications: Information
Retrieval

query

indexed document
corpus

ranked results
Applications: Text
Classification
• Spam filtering.

• Detecting topics / genre.

• Sentiment analysis, author recognition, forensic


linguistics, …
Applications: Sentiment
Analysis
Fantastic... truly a wonderful family movie

I have a mixed feeling about this movie.

Well it is fun for sure but definitely not appropriate


for kids 10 and below

My kids loved it!!

The movie is very funny and entertaining. Big A+

I got so boooored...

Disappointed. They showed all fun details in the trailer

Cute but not for adults


Applications: News
Summarization
Application: Question
Answering
“Where was George Washington born?”

Unstructured
Text

QA system

Knowledge
Base

“Westmoreland County, Virginia”


Applications: Playing
Jeopardy! IBM Watson [2011]

William Wilkinson’s “An Account of the Principalities of Wallachia and Moldavia“ in

Combines information extraction & natural language understanding.


Applications: Machine
Translation
Machine Translation
• One of the main research areas in NLP, and one of the
oldest. Historical motivation: Translate Russian to English.

• MT is really difficult:

• “Out of sight, out of mind” → “Invisible, imbecile”

• “The spirit is willing, but the flesh is weak”


English → Russian → English
“The vodka is good, but the meat is rotten”

• Challenges: Word order, multiple translations for a word


(need context), want to preserve meaning.
Machine Translation

• Until recently phrase-based translation was the


predominant framework.

• Today neural network sequence-to-sequence models are


used.

• Google Translate supports > 100 languages.


Applications: Virtual
Assistants
• Siri (Apple), Google Now, Cortana (Microsoft), Alexa
(Amazon).

• Subtasks: Speech recognition, language understanding (in


context?), speech generation, …
Applications:Image
Captioning

“Man in black t-shirt is playing guitar.”

• Neural Networks for Object Detection and


Language Generation.
• “Multi-modal” embeddings.
• Microsoft COCO data set.

A. Karpathy, L. Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions. CVPR 201
What You Will Learn In This
Course
• How can machines understand and generate natural
language?

• Theories about language (linguistics).

• Algorithms.

• Statistical / Machine Learning Methods.

• Applications.
Course Overview
• Part I: Core NLP techniques.

• Language modeling, part-of-speech tagging, syntactic parsing,


word-sense disambiguation, semantic parsing, text similarity.

• Part II: Applications.

• text classification, information retrieval, question answering, text


generation, summarization, machine translation, image captioning,
dialog systems.

• Machine Learning Techniques:


Supervised machine learning, bayesian models, sequence models (n-
gram models, HMMs), neural networks, recurrent neural networks,...
Levels of Linguistic
Representation
phonetics sounds and sound
phonology patterns of language

morphology formation of words in- + validate + -ed

DT | NN | VBZ | DT | NN | TO | VB | PRP |.
the | boy | want+s | the | girl | to | like | him |.

syntax word order

word and sentence


semantics
meaning

influence of context and


pragmatics
situation
Natural Language
Processing as Translation
• Most NLP techniques can be understood as translation
tasks from one structure into another.

• For each translation step:

• Construct search space of possible translations.

• Find best paths through this space (decoding) according


to some performance measure.

• Modern NLP relies on Machine Learning to figure out


these translation steps.
NLP is hard: Ambiguity
• Unlike artificial languages, natural language is full of ambiguity.

• This can happen on all levels of representation.

• “Wreck a Nice Beach” , “Recognize Speech”

• “inflammable” = in + -flammable

• “Enraged Cow Injures Farmer with Axe”

• “Stolen Painting Found by Tree”

• “Red Tape Holds Up New Bridges”

• “Mouse”
More Real Headlines
• Ban on nude dancing on Governor’s desk

• Kids Make Nutritious Snacks

• Drunk gets nine months in violin case

• Government head seeks arms

• Patient at death’s door – doctors pull him through

• In America a woman has a baby every 15 minutes


Syntactic Structure
• What is the part-of-speech of each word? (noun, verb, adjective, adverb,
determiner, …)

• What are the constituents:

• Noun phrase: “Enraged cow”, “The cat with the hat”,


“Columbia University”

• What are the subjects and objects:

• “Dog bites man” vs. “Man bites dog”

• Modification:

• “John saw the man in the park with a telescope”


Structural Ambiguity
• Interplay between constituent structure and modification.

• Prepositional Phrase (PP) attachment:


Enraged cow injures farmer with axe.

[Enraged cow] injures [farmer with axe]


NP NP

[Enraged cow] injures farmer [with axe]


NP NP PP
Representing Modification
with Brackets

[Enraged cow] [injures [farmer [with axe]]]


NP NP PP

[Enraged cow] injures [farmer] [with axe]]


NP NP PP
More PP attachment
[Ban]
Ban on
on nude
[nudedancing
dancing]
onon
governor’s
[governor’s
desk
desk]
NP NP NP

• What are the possible modifications? Which one is correct?

[[Ban] on [nude dancing]] [on governor’s desk]


NP PP

[Ban] on [[nude dancing] [on governor’s desk]]


NP PP

NP
Noun-Noun Modification
• What is the semantic relationship between nouns in a noun
compound?

• Water fountain: A fountain that supplies water.

• Water ballet: A ballet that takes place in water.

• Water meter: A device that measures water.

• Water barometer: A barometer that uses water (instead


of mercury) to measure air pressure.

• Water glass: A glass that is meant to hold water.


Other tricky phenomena
• Need for semantic representation.

There was once a Wolf who saw a Lamb drinking at a river and wanted an excuse to eat it.

For that purpose, even though he himself was upstream, he accused the Lamb of stirring
up the water and keeping him from drinking. . .

Minsky 1975
Other tricky issues:
Language Variety
• Problem: Most NLP techniques were developed on
English (specifically financial news written in American
English in the 1980s), or other languages with many
resources.

• Languages use different mechanisms to express meaning


(morphology vs. word-order).
Other tricky issues: Domains
and Language Change

• Non-standard English

• Idioms: throw in the towel, get cold feet, kick the bucket

• Neologisms (fixed lexicon doesn’t work)

• noob, crowdsource, unfriend, retweet, bromance, …


Morphology
• Structure and formation of words.

• Derivational morphology: Create new words from old words (can


also change the part-of-speech).
anti- + dis- + establish + -ment + -arian + -ism

• Inflectional morphology:

• Convey information about number, person, tense, aspect,


mood, voice, and the role a word plays in the sentence (case).

• English has few morphological categories, but many


languages are morphologically rich.
Morphology
• Morphological categories in English

• Number (“dog”, “dog +s”)

• Person (“I run”, “She runs”)

• Tense (“He waited”)

• Voice (“The issue was decided”)

• Other examples from other languages?


Acknowledgments

• Some slides and examples from Kathy McKeown, Dan


Jurafsky, Dragomir Radev.

You might also like