Lecture 1 Course Overview

The document outlines a course on Natural Language Processing (NLP) led by Danish Pruthi, covering topics such as text classification, machine translation, and question answering. It emphasizes the challenges of understanding human language and the importance of language comprehension in building intelligent systems. The course includes practical assignments, evaluations, and a focus on computational models, with no formal prerequisites but a recommendation for familiarity with Python and basic probability.

DS 207: Introduction to Natural Language Processing

Danish Pruthi
What is Natural Language Processing?
The science and engineering of building computational models to comprehend language
Text Classification
Input: "Lots of epic shows feel a little underpopulated towards the end but there's really no excuse for something as mythic, huge and mesmerizing to end as disappointingly as this."
Label: Negative

Machine Translation
English: "India recorded their first Test victory, in their 24th match, against England at Madras in 1952. Later in the same year, they won their first Test series, which was against Pakistan."
Hindi: "भारत ने 1952 में मद्रास में इंग्लैंड के खिलाफ अपने 24वें मैच में अपनी पहली टेस्ट जीत दर्ज की। बाद में उसी वर्ष, उन्होंने अपनी पहली टेस्ट श्रृंखला जीती, जो पाकिस्तान के खिलाफ थी।"

Question answering
Question: "When did India win their first test match?"
Answer: 1952
You use NLP every day … (maybe without even noticing)

You have probably used ChatGPT
• Use cases abound:
• Summarizing (or simplifying) content
• Writing content (emails, documents, etc.)
• Creative content (e.g., advertisements, titles, names)
• Question answering
• Problem solving (to some degree)
• … and many more

Understanding language is critical
• Language is a means for people to communicate …
• The majority of available data is in textual format

• Language understanding is a core requirement of intelligence

• The same holds for building "intelligent machines"

Why is it hard to handle human languages?
• The same word can have different meanings in different contexts (and cultures!)

• Understanding language often requires some common sense

• Olive oil is made of olives,
• palm oil is made of palm fruit,
• peanut oil is made of peanuts
• This does not mean that baby oil is made of babies

• Languages are highly compositional: you can create infinitely many novel sentences

Why is it hard to handle human languages?
• Word frequencies follow a power law (Zipf's law)

• Ambiguity:
• Semantic: "The trophy did not fit in the suitcase because it was too small" ("it" could refer to the trophy or the suitcase)
• Syntactic: "A computer that understands you like your mother"

• Meanings of words change, new words are introduced, and old ones fall out of use

• master & mistress, buddy & sissy, bachelor & spinster, doctor & doctress
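
The power-law claim about word frequencies can be checked empirically by ranking words by count. The snippet below is a minimal sketch, not course material: the function name `rank_frequency` and the toy corpus are made up for illustration, and a corpus this small is far too short to show the law convincingly. On a large corpus, Zipf's law predicts that rank × count stays roughly constant.

```python
from collections import Counter
import re


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending count; break ties alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


# Toy corpus, invented for this illustration only.
corpus = (
    "the cat sat on the mat and the dog sat on the log "
    "the cat and the dog saw the bird"
)

table = rank_frequency(corpus)
for rank, word, count in table[:5]:
    # The last column is rank * count, the quantity Zipf's law says is
    # approximately constant on large corpora.
    print(rank, word, count, rank * count)
```

To try it on real text, replace `corpus` with the contents of any large plain-text file and plot log(rank) against log(count); a roughly straight line is the signature of a power law.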

Why computationally study languages?
• Our (spoken/written) language is a window into our life

• What do our (function) words say about us?

• State of mind, i.e., well-being
• Economy
• Propensity to lead
• Many other aspects

Course content
• Tasks: classification, sequence-to-sequence, tagging, language modeling
• Architectures: RNNs, LSTMs, GRUs, Transformers
• Models: n-gram models, encoder-only, decoder-only (e.g., GPTs), and encoder-decoder models
• Algorithms for learning: largely gradient descent, maximum likelihood estimation (MLE) of probabilistic models
• Algorithms for decoding: greedy, top-k and top-p sampling, Viterbi decoding
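
Several items on this list can be previewed together in a few lines. The sketch below is an illustration under stated assumptions, not the course's implementation: it estimates a bigram (n-gram with n = 2) model by MLE on a made-up four-sentence corpus, then applies greedy, top-k, and top-p decoding to the resulting next-word distribution. All function names and the corpus are hypothetical; Viterbi decoding is not covered here.

```python
import random
from collections import Counter, defaultdict


def train_bigram_mle(sentences):
    """MLE bigram model: P(word | prev) = count(prev, word) / count(prev)."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            pair_counts[prev][word] += 1
    model = {}
    for prev, counter in pair_counts.items():
        total = sum(counter.values())
        model[prev] = {word: c / total for word, c in counter.items()}
    return model


def _by_probability(dist):
    # Most probable first; ties broken alphabetically for determinism.
    return sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))


def greedy(dist):
    """Pick the single most probable next word."""
    return _by_probability(dist)[0][0]


def top_k(dist, k):
    """Keep the k most probable words and renormalize."""
    kept = _by_probability(dist)[:k]
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}


def top_p(dist, p):
    """Keep the smallest set of top words whose mass reaches p; renormalize."""
    kept, mass = [], 0.0
    for word, prob in _by_probability(dist):
        kept.append((word, prob))
        mass += prob
        if mass >= p:
            break
    return {word: prob / mass for word, prob in kept}


def sample(dist, rng):
    """Draw one word from a (renormalized) distribution."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]


# Toy corpus, invented for this illustration only.
sentences = ["the cat sat", "the cat ran", "the dog sat", "the cat sat down"]
model = train_bigram_mle(sentences)

rng = random.Random(0)
print("greedy after 'the':", greedy(model["the"]))
print("top-k (k=2) sample after 'cat':", sample(top_k(model["cat"], 2), rng))
print("top-p (p=0.9) sample after 'the':", sample(top_p(model["the"], 0.9), rng))
```

Greedy decoding is deterministic, while top-k and top-p first truncate the distribution and then sample from what remains, trading some likelihood for diversity.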

Course content: what is this course not?
• Given the nature of the subject (and what currently works in practice):
• There will be no mathematical derivations, proofs, bounds or guarantees

• Our theoretical understanding of current NLP systems is quite limited

• But empirically, we know a fair bit about what works in practice

Course logistics: prerequisites
• No formal prerequisites

• We expect you to be familiar with

• Basic probability (e.g., joint distributions, Bayes' rule, expectations)
• Linear algebra (matrix manipulation)
• Python programming: assignments require you to write a fair bit of code!
• Good to have: familiarity with PyTorch and a deep learning background
• I'll try to introduce topics you might not be aware of …
• But it takes time for concepts to settle in

Course logistics: important links
• Course website: https://danishpruthi.com/teaching/ds-207-jan-2025/

• Anonymous (continuous) feedback: http://tinyurl.com/feedback-for-danish

• Open to criticism, but please be civil and polite
• Treat others how you want to be treated
