
Natural Language Processing
Lecture 1: Course Overview and Introduction
11/06/2024

COMS W4705
Daniel Bauer

The 4705 Team
• Instructor: Dr. Daniel Bauer (he/him/his)
Office hours: Fri 1:15-2:45pm (after class, 704 CEPSR and on Zoom, starting 9/13).

• Course Assistants: See Courseworks for contact info and office hour schedule.
Important Dates
• Lectures: Fri 10:10-12:40pm (incl. 20 min break)

• Location: 417 IAB (or on Zoom for CVN or by permission). All sessions will be recorded.

• Exam 1: Friday Oct 18

• Exam 2: Friday Dec 6

There is no additional final exam.
Course Resources
• Courseworks

• All course materials: videos, lecture notes, code, announcements, assignments, reading materials

• Homework submission, grade book.

• Ed used for Q & A (shared between sections)

Textbook / Reading
• There is NO official textbook for this course.

• Recommended textbook 1 (somewhat outdated, we won’t follow this too closely, but references will be provided):

Dan Jurafsky & James Martin
Speech and Language Processing
2nd Ed. Prentice Hall (2009).

• Draft of most 3rd edition chapters:
https://web.stanford.edu/~jurafsky/slp3/
Textbook / Reading
• Recommended textbook 2:

Yoav Goldberg
Neural Network Methods for Natural Language Processing
Morgan & Claypool, 2017.

• Available as an ebook through the CU library:
https://clio.columbia.edu/catalog/13676351
Prerequisites
• Data Structures (COMS W3134 or COMS W3137)

• Discrete Math (COMS W3202, recommended)

• Some experience with basic probability/statistics.

• Some previous or concurrent exposure to AI and machine learning is beneficial, but not required.

• Some experience with Python is helpful.
Grading

• 40%: 5 programming assignments (lowest score dropped, 10% each)

• 50%: exams (25% each)

• 10%: participation (in-class attendance and on Ed)

Homework
• Homework is uploaded through Courseworks. Do not email solutions. Check your submission!

• ~2-week turnaround.

• 72h limit on regrade requests.

• Only programming is graded. Theory is done in class and in ungraded online exercises.

• Use Python 3!

• There will typically be some scaffolding code to start you off.

• Some assignments will require a GPU (we will use Google Colab).
Academic Honesty
• Submit your own answers and code.

• Review the academic honesty policy on the syllabus.

• When in doubt, ask.

• When in trouble, ask for help (and ask early).

• We will talk _about_ ChatGPT et al., but please do not use large language models for your homework.
NLP in the Movies
“I am fluent in over six million forms of communication.”

“Open the pod bay doors, HAL!”

“I’m sorry Dave, I’m afraid I can’t do that!”
Natural Language Processing
• Important and active research area within AI.

• Timely: Most of our activities online are text-based (web pages, email, social media, blogs, news, product descriptions and reviews, medical reports, course content, …)

• NLP leverages more and more available training data and modern machine learning techniques, such as neural networks (RNNs, transformers, etc.)

• Communicating with computers is the “holy grail” of AI.

• NLP may be "AI complete".

Turing Test
(Alan Turing, 1950)

• A computer passes the test of intelligence if it can fool a human interrogator into believing it is human.

• What skills are needed to build such a system?

• Language processing, knowledge representation, reasoning, learning.

Image source: Russell & Norvig, Artificial Intelligence: A Modern Approach
Natural Language Processing

[Diagram: NLP shown at the overlap of AI and Linguistics]

“Every time I fire a linguist, my performance goes up” (Fred Jelinek)
Natural Language Processing vs. Computational Linguistics

• NLP: Build systems that can understand and generate natural language. Focus on applications.

• Computational Linguistics: Study human language using computational approaches.

• Many overlapping techniques.
Applications: Information Retrieval

[Diagram: a query is matched against an indexed document corpus to produce ranked results]
Applications: Text Classification

• Spam filtering.

• Detecting topics / genre.

• Sentiment analysis, author recognition, forensic linguistics, …

• Detecting hate speech.
Applications: Sentiment Analysis

“Fantastic... truly a wonderful family movie”

“I have a mixed feeling about this movie.”

“Well it is fun for sure but definitely not appropriate for kids 10 and below”

“My kids loved it!!”

“The movie is very funny and entertaining. Big A+”

“I got so boooored...”

“Disappointed. They showed all fun details in the trailer”

“Cute but not for adults”
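To make the task concrete, here is a minimal sketch of a bag-of-words sentiment classifier using scikit-learn. It is not course material; the library choice and the positive/negative labels below are assumptions for illustration.

# A toy sentiment classifier: bag-of-words features + logistic regression.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

reviews = [
    "Fantastic... truly a wonderful family movie",
    "The movie is very funny and entertaining. Big A+",
    "My kids loved it!!",
    "I got so boooored...",
    "Disappointed. They showed all fun details in the trailer",
]
labels = [1, 1, 1, 0, 0]  # 1 = positive, 0 = negative (hypothetical labels)

# Turn each review into a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)

# Logistic regression is one of the discriminative models covered later.
clf = LogisticRegression()
clf.fit(X, labels)

# Predict the label of an unseen review (output depends on the tiny training set).
test = vectorizer.transform(["Cute but not for adults"])
print(clf.predict(test))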
Applications: Question Answering

“Where was George Washington born?”

[Diagram: the question goes to a QA system, which draws on unstructured text and a knowledge base]

“Westmoreland County, Virginia”
Applications: Playing Jeopardy! IBM Watson [2011]

William Wilkinson’s “An Account of the Principalities of Wallachia and Moldavia” inspired this author’s most famous novel.

Combines information extraction & natural language understanding.
Applications: Summarization

Credit: Prof. Kathleen McKeown
Applications: Machine Translation

• One of the main research areas in NLP, and one of the oldest. Historical motivation: translate Russian to English.

• MT is really difficult:

• “Out of sight, out of mind” → “Invisible, imbecile”

• “The spirit is willing, but the flesh is weak”
English → Russian → English:
“The vodka is good, but the meat is rotten”

• Challenges: word order, multiple translations for a word (need context), preserving meaning.
Machine Translation

• Until recently, phrase-based translation was the predominant framework.

• Today neural network models are used (see the sketch below).

• Google Translate supports > 100 languages. Near-human translation quality for some language pairs.
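As a small illustration, here is a sketch of neural MT with pretrained models via the Hugging Face transformers library. This is an assumption about tooling, not course code, and the Helsinki-NLP model names are assumptions about what is available.

# Round-trip the classic example through a neural en->ru->en pipeline.
from transformers import pipeline

en_ru = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ru")
ru_en = pipeline("translation", model="Helsinki-NLP/opus-mt-ru-en")

# Early systems famously returned "The vodka is good, but the meat is rotten";
# modern neural models usually preserve the idiom's meaning much better.
sentence = "The spirit is willing, but the flesh is weak."
russian = en_ru(sentence)[0]["translation_text"]
print(russian)
print(ru_en(russian)[0]["translation_text"])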
Applications: Virtual Assistants

• Siri (Apple), Google Now, Cortana (Microsoft), Alexa (Amazon).

• Subtasks: speech recognition, language understanding (in context?), speech generation, …
Applications: Large Language Models

• Predictive text, content generation (“Generative AI”).

• See ChatGPT example.

Applications: Multi-modal NLP

• Image Captioning: e.g. “Man in black t-shirt is playing guitar.”

• Visual Question Answering.

• RoboNLP (instruction giving / following, summarization, etc.)

Evolution of ML Techniques in NLP
• Rules and heuristics, pattern matching, formal grammars.

• Statistical NLP, generative probabilistic models.

• Discriminative models: support vector machines, logistic regression.

• Back to large generative models.

• Neural networks, phase 1 (RNNs including LSTMs, CNNs).

• Neural networks, phase 2: transformer models, large language models, pretraining.

• Few-/zero-shot learning. Prompting.

NLP History

• 1950s: rules and heuristics, pattern matching, formal grammars.

• 1980s and 90s: statistical NLP, generative probabilistic models, corpus-based NLP.

• early 2000s: discriminative models (e.g. logistic regression, support vector machines).

• mid/late 2000s: neural nets, neural sequence models (RNN, LSTM).

• 2010s: pre-training, embeddings.

• late 2010s, 2020s: transformers, large pre-trained LMs.

GPT-2 Examples
GPT is a transformer-based language model created by OpenAI.

• GPT-2 example (Feb 2019, 1.5b parameters, trained on 8m web pages):
https://openai.com/blog/better-language-models/#sample1

GPT "prompting" examples

• GPT-3 (Jun 2020, 175b parameters, trained on 45TB of text).

• Fine-tuned model InstructGPT trained using Reinforcement Learning from Human Feedback.

November 2022: ChatGPT

• Fine-tuned GPT-3.5 using dialog data, then optimized using Reinforcement Learning from Human Feedback.

• Output may be indistinguishable from human output in many cases.
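For a hands-on feel, here is a minimal sketch of sampling from the public GPT-2 weights via the Hugging Face transformers library. The slides only link to OpenAI's demo; the library and prompt here are assumptions for illustration.

# Sample a continuation from GPT-2: the model predicts one token at a time.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Natural language processing is"
out = generator(prompt, max_new_tokens=30, do_sample=True)
print(out[0]["generated_text"])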


What You Will Learn in This Course
• How can machines understand and generate natural language?

• Theories about language (linguistics).

• Algorithms.

• Statistical / machine learning methods, incl. neural networks.

• Applications.
Course Overview
• Core NLP techniques:

• Language modeling (a toy sketch follows below), part-of-speech tagging, syntactic parsing, word-sense disambiguation, semantic parsing, text similarity.

• Applications:

• Text classification, machine translation, generation, image captioning, ...

• Machine Learning Techniques:
Supervised machine learning, Bayesian models, sequence models (n-gram models, HMMs), deep learning techniques (RNNs, transformers, ...)

• Critical assessment of NLP methods and data sets (ethics in NLP).
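Since language modeling opens the core-techniques list, here is a toy bigram language model as a preview. It is a sketch for illustration, not course-provided code; the two-sentence corpus is an assumption.

# A toy bigram language model: estimate P(word | previous word) by counting.
from collections import Counter, defaultdict

corpus = ["the spirit is willing", "the flesh is weak"]

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[prev][word] += 1

# Maximum-likelihood estimate: relative frequency of the bigram.
def bigram_prob(prev, word):
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][word] / total if total else 0.0

print(bigram_prob("is", "willing"))  # 0.5: "is" precedes "willing" once out of twice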
