lecture1-intro
Natural Language Processing
Instructors: Jackie CK Cheung & David I. Adelani
COMP-550
Fall 2024
J&M Chapter 1
About Jackie
Associate Professor at McGill 2021 – present
• Associate Scientific Co-Director at Mila
Assistant Professor at McGill 2015 – 2021
PhD in Computer Science (Toronto) 2014
4
Textbook
Jurafsky and Martin. Speech and Language Processing
(2nd edition)
5
Assignments
Two programming assignments (10% each x 2 = 20%)
Hand in online through myCourses
Programming to be done in Python 3.
Four reading assignments (5% each x 4 = 20%)
Covers advanced material and applications
6
Midterm
Worth 25% of your final grade
To be completed online as a myCourses quiz
7
Final Project
Worth 35%.
Experiment on some language data set
Summarize and review relevant papers
Report on experiments
Must be done in teams of three
8
Project Steps
Paper or project proposal
Progress update
Final submission
9
General Policies
Lateness policy for assignments:
• Grace period of 24 hours
• More than 24 hours late: accepted only at our discretion
Plagiarism: just don’t do it—I regularly catch and
submit cases.
Language policy: In accord with McGill policy, you
have the right to write essays and examinations in
English or in French.
10
Generative AI Usage
Fine to use in an assistive manner
• Help understand course content
• Search for information
• Brainstorm ideas
• Edit writing
Must acknowledge use of this technology.
Not okay to use as primary means to complete tasks
• Feed in assignment questions to generate solutions
• Generate project report from scratch on a topic
Platforms
Ed
Being adopted by many CS courses this term
You’ll be added this week
Most releases will be done via this platform
myCourses
Assignment and project submissions
Midterm
Grade release
12
Computational Linguistics and Natural Language Processing
13
LLMs – Impressive Impact!
• Question answering, code generation, essay writing, summarization
• Commercial uses: customer service, personal assistants, healthcare
• Many informal uses: entertainment, settling disputes
18
Languages Are Diverse
6000+ languages in the world
language
langue
ਭਾਸਾ
語言
idioma
Sprache
lingua
(Source: WashingtonPost)
19
What is Language?
Some properties:
• Form of communication
• Arbitrary pairing between form and meaning
• Primarily vocal (exception: sign languages)
• Highly expressive and productive
• Nearly universal (barring developmental disorders)
20
Computational Linguistics (CL)
Modelling natural language with computational models
and techniques
Goals
Language technology applications
Scientific understanding of how language works
22
Natural Language Processing
Computational linguistics and natural language
processing (NLP) are sometimes used interchangeably.
Slight difference in emphasis:
• NLP: the goal is practical technologies (engineering)
• CL: the goal is understanding how language actually works (science)
24
Understanding and Generation
Natural language understanding (NLU)
Converts language into a form usable by machines or humans
Natural language generation (NLG)
Produces language from such a form
25
Personal Assistant App
Understanding
Call a taxi to take me to the airport in 30 minutes.
Generation
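As a rough illustration of the understanding and generation steps for the request above, here is a minimal sketch. It is not how any real assistant works: the regular expression, the `parse_request` and `generate_confirmation` helpers, and the field names are all hypothetical.

```python
import re

# Hypothetical pattern for requests of the form
# "Call a <service> to take me to <destination> in <N> minutes."
REQUEST_PATTERN = re.compile(
    r"call a (?P<service>\w+) to take me to (?P<destination>.+?) "
    r"in (?P<minutes>\d+) minutes",
    re.IGNORECASE,
)


def parse_request(utterance):
    """Understanding: map an utterance to a dict, or None if it does not match."""
    match = REQUEST_PATTERN.search(utterance)
    if match is None:
        return None
    return {
        "intent": "book_ride",  # hypothetical intent label
        "service": match.group("service").lower(),
        "destination": match.group("destination"),
        "delay_minutes": int(match.group("minutes")),
    }


def generate_confirmation(frame):
    """Generation: turn the structured form back into language."""
    return (
        f"OK, a {frame['service']} to {frame['destination']} "
        f"will arrive in {frame['delay_minutes']} minutes."
    )


frame = parse_request("Call a taxi to take me to the airport in 30 minutes.")
print(frame)
print(generate_confirmation(frame))
```

A single hand-written pattern like this breaks on almost any rephrasing, which is one reason understanding is hard in practice.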
26
Machine Translation
I like natural language processing.
Generation
27
Computational Linguistics
Besides new language technologies, there are other
reasons to study CL and NLP as well.
28
The Nature of Language
First language acquisition
Chomsky proposed a universal grammar
Is language an “instinct”?
29
The Nature of Language
Language processing
Some sentences are grammatically correct but difficult to process.
Formal mathematical models to account for this.
30
Mathematical Foundations of CL
We describe language with various formal systems.
31
Mathematical Foundations of CL
Mathematical properties of formal systems and
algorithms
Can they be efficiently learned from data?
Efficiently recovered from a sentence?
Complexity analysis
Implications for algorithm design
32
Types of Language
Text
In some sense, an idealization of spoken language.
Much of traditional NLP work has been on news text.
Clean, formal, standard English, but very limited!
More recent work on diversifying into multiple domains
Political texts, text messages, Twitter
Speech
Messier: disfluencies, non-standard language
Automatic speech recognition (ASR)
Text-to-speech generation
33
Domains of Language
The grammar of a language has traditionally been
divided into multiple levels.
Phonetics
Phonology
Morphology
Syntax
Semantics
Pragmatics
Discourse
35
Phonetics
Study of the speech sounds that make up language
Articulation, transmission, perception
peach [pʰiːtʃ]
36
Phonology
Study of the rules that govern sound patterns and how
they are organized
37
Morphology
Word formation and meaning
antidisestablishmentarianism
anti- dis- establish -ment -arian -ism
establish
establishment
establishmentarian
establishmentarianism
disestablishmentarianism
antidisestablishmentarianism
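The build-up above can be sketched in code by starting from the root and attaching one affix at a time. This is a minimal sketch: the `derive` helper and the step encoding are hypothetical, the attachment order is copied from the slide, and real morphology often involves spelling changes that plain concatenation ignores.

```python
# Derivation steps copied from the slide: each step attaches one affix.
# ("suffix", s) appends s; ("prefix", p) prepends p.
ROOT = "establish"
STEPS = [
    ("suffix", "ment"),
    ("suffix", "arian"),
    ("suffix", "ism"),
    ("prefix", "dis"),
    ("prefix", "anti"),
]


def derive(root, steps):
    """Return every intermediate word formed by applying the affixes in order."""
    words = [root]
    for kind, affix in steps:
        current = words[-1]
        words.append(affix + current if kind == "prefix" else current + affix)
    return words


for word in derive(ROOT, STEPS):
    print(word)
```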
38
Syntax
Study of the structure of language
*I a woman saw park in the.
I saw a woman in the park.
39
Syntax
http://explosm.net/comics/1682/
There are two meanings for the first sentence in the comic!
What are they? This is called ambiguity.
40
Semantics
Study of the meaning of language
bank
Ambiguity in the sense of the word
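To make the ambiguity of "bank" concrete, here is a minimal sketch of guessing a sense from the surrounding words. The two sense labels, the cue-word sets, and the `guess_sense` helper are all invented for illustration; this is a crude cue-word overlap heuristic, not a method from the lecture.

```python
# Hypothetical, hand-written cue words for two senses of "bank".
SENSE_CUES = {
    "financial institution": {"money", "deposit", "loan", "account", "cash"},
    "river edge": {"river", "water", "fishing", "shore", "boat"},
}


def guess_sense(sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    tokens = set(sentence.lower().replace(".", "").split())
    scores = {sense: len(cues & tokens) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # With no overlapping cue words the sentence stays ambiguous.
    return best if scores[best] > 0 else None


print(guess_sense("She went to the bank to deposit money."))
print(guess_sense("They sat on the bank of the river fishing."))
print(guess_sense("I walked past the bank."))
```

The last sentence returns `None`: without context, the word alone does not determine the sense.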
41
Semantics
Ross wants to marry a Swedish woman.
42
Pragmatics
Study of the meaning of language in context.
→ Literal meaning (semantics) vs. meaning in context:
http://www.smbc-comics.com/index.php?id=3730
43
Pragmatics – Deixis
Interpretation of expressions can depend on
extralinguistic context
e.g., pronouns
I think cilantro tastes great!
47
Discourse
Study of the structure of larger spans of language (i.e.,
beyond individual clauses or sentences)
I am angry at her.
She lost my cell phone.
I am angry at her.
The rabbit jumped and ate two carrots.
48
NLP – the Technological Perspective
A combination of pre-specified knowledge and
machine learning from data
49
NLP Tools and Techniques
Major paradigms for NLP, not mutually exclusive:
Rule-based systems
• Often hand-engineered knowledge about language
• E.g., heureux -> happy
Machine learning
• Model learns about language through examples
• Classification: e.g., is this e-mail spam?
• Sequence models: make series of decisions
• Many other paradigms
Knowledge representation
• Formal structure to encode what model knows
• Logic? A large set of continuous-valued numbers?
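The contrast between the first two paradigms can be sketched in a few lines. This is a minimal sketch under stated assumptions: the one-entry lexicon uses the slide's example, while the training messages, the "ham" label for non-spam, and the `translate_word`, `train`, and `classify` helpers are invented for illustration. The classifier is a tiny Naive Bayes model with add-one smoothing, chosen here only because it is short.

```python
import math
from collections import Counter

# Rule-based: hand-written knowledge (the slide's example entry).
LEXICON = {"heureux": "happy"}


def translate_word(word):
    """Look the word up in the hand-built lexicon; unknown words pass through."""
    return LEXICON.get(word, word)


# Machine learning: the model learns from labelled examples.
# These training messages are invented for illustration.
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting notes for the lecture", "ham"),
    ("assignment due next week", "ham"),
]


def train(examples):
    """Count words per label and messages per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the higher log-probability (add-one smoothing)."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    scores = {}
    for label in label_counts:
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, label_counts = train(TRAIN)
print(translate_word("heureux"))
print(classify("claim your free money", word_counts, label_counts))
print(classify("lecture notes next week", word_counts, label_counts))
```

The lexicon only knows what was written into it, while the classifier generalizes from examples to messages it has never seen; both depend heavily on the quality of what they were given.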
50
Topics in COMP-550
Organized roughly by level of linguistic analysis and a
corresponding technical approach (ML or otherwise)
NLP Topic    Linguistic layer    Techniques
52
Course Objectives
Understand the broad topics, applications and
common terminology in the field
Prepare you for research or employment in CL/NLP
Learn some basic linguistics
Learn the basic algorithms
Be able to read an NLP paper
Understand the challenges in CL/NLP
Answer questions like “Is it easy or hard to…”
53
Next Lecture
The next lecture is Wednesday, Sept 4
54