Lecture 1 Course Overview
Machine Translation
"India recorded their first Test victory, in their 24th match, against England at Madras in 1952. Later in the same year, they won their first Test series, which was against Pakistan."
→ "भारत ने 1952 में मद्रास में इंग्लैंड के खिलाफ अपने 24वें मैच में अपनी पहली टेस्ट जीत दर्ज की। बाद में उसी वर्ष, उन्होंने अपनी पहली टेस्ट श्रृंखला जीती, जो पाकिस्तान के खिलाफ थी।"
Question answering
"When did India win their first test match?" → 1952
You use NLP every day … (maybe without even noticing)
You have probably used ChatGPT
• Use cases abound:
  • Summarizing (or simplifying) content
  • Writing content (emails, documents, etc.)
  • Creative content (e.g., advertisements, titles, names, etc.)
  • Question answering
  • Problem solving (to some degree)
  • … and many more
Understanding language is critical
• Language is a means for people to communicate …
• The majority of available data is in textual form
Why is it hard to handle human languages?
• The same word can have different meanings in different contexts (and cultures!)
Why is it hard to handle human languages?
• Word frequencies follow a power law (Zipf's law)
• Ambiguity:
  • Semantic: "The trophy did not fit in the suitcase because it was too small"
  • Syntactic: "A computer that understands you like your mother"
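As a rough illustration of the power-law claim, the sketch below ranks words by frequency and compares each observed count with the idealized Zipf prediction count(r) ≈ count(1) / r^s, with s ≈ 1. The function names `rank_frequency` and `zipf_prediction` are hypothetical, and the naive whitespace tokenization is an assumption for illustration only; a toy sentence is far too small to actually exhibit Zipf's law, which shows up clearly only on large corpora.

```python
from collections import Counter


def rank_frequency(text: str) -> list[tuple[int, str, int]]:
    """Return (rank, word, count) triples, most frequent word first.

    Assumption: naive lower-casing + whitespace splitting stands in for a
    real tokenizer. Ties keep first-occurrence order (Counter.most_common).
    """
    counts = Counter(text.lower().split())
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]


def zipf_prediction(top_count: int, rank: int, s: float = 1.0) -> float:
    """Idealized Zipf's law: the count at `rank` is about top_count / rank**s."""
    return top_count / rank ** s


if __name__ == "__main__":
    # Hypothetical toy corpus; real Zipf curves need millions of tokens.
    corpus = "the cat and the dog and the bird"
    table = rank_frequency(corpus)
    top = table[0][2]
    for rank, word, count in table:
        print(f"{rank:>2} {word:<6} observed={count} zipf~{zipf_prediction(top, rank):.2f}")
```

On a large corpus, plotting log(count) against log(rank) gives a roughly straight line with slope near -s; a handful of very frequent words dominates, while most words are rare, which is one reason rare words are hard for data-driven models.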
Why computationally study languages?
• Our (spoken/written) language is a window into our lives
Course content
• Tasks: classification, sequence-to-sequence, tagging, language modeling
• Architectures: RNNs, LSTMs, GRUs, Transformers
• Models: n-gram models, encoder-only, decoder-only (e.g., GPTs), and encoder-decoder models
• Algorithms for learning: largely gradient descent; maximum likelihood estimation (MLE) of probabilistic models
• Algorithms for decoding: greedy decoding, top-k and top-p sampling, Viterbi decoding
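To make three of the decoding strategies listed above concrete, here is a minimal single-step sketch of greedy decoding, top-k sampling, and top-p (nucleus) sampling over a next-token distribution. The function names, the toy probabilities, and the use of Python's `random` module are illustrative assumptions, not the course's reference implementation; in a real language model the distribution would come from a softmax over the whole vocabulary, and the step would repeat until an end-of-sequence token is produced.

```python
import random


def greedy(probs: dict[str, float]) -> str:
    """Greedy decoding step: pick the single most probable next token."""
    return max(probs, key=probs.get)


def top_k_sample(probs: dict[str, float], k: int, rng: random.Random) -> str:
    """Top-k sampling step: keep the k most probable tokens, then sample among them."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens, weights = zip(*kept)
    # random.choices treats weights as relative, so this renormalizes implicitly.
    return rng.choices(tokens, weights=weights, k=1)[0]


def top_p_sample(probs: dict[str, float], p: float, rng: random.Random) -> str:
    """Top-p (nucleus) sampling step: keep the smallest set of most probable
    tokens whose cumulative probability reaches p, then sample among them."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    tokens, weights = zip(*nucleus)
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-token distribution after the prefix "India won the".
    next_token_probs = {"match": 0.5, "series": 0.3, "toss": 0.15, "banana": 0.05}
    rng = random.Random(0)  # fixed seed so runs are reproducible
    print(greedy(next_token_probs))
    print(top_k_sample(next_token_probs, k=2, rng=rng))
    print(top_p_sample(next_token_probs, p=0.9, rng=rng))
```

Greedy decoding is deterministic, while top-k and top-p trade a little likelihood for diversity by sampling from a truncated, renormalized distribution. Top-k always keeps exactly k candidates; top-p adapts the number of candidates to how peaked the distribution is (with p = 0.9 above, the unlikely "banana" is never chosen).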
Course content: what this course is not
• Given the nature of the subject (and what currently works in practice):
  • There will be no mathematical derivations, proofs, bounds, or guarantees
Course logistics: prerequisites
• No formal prerequisites
Course logistics: important links
• Course website: https://danishpruthi.com/teaching/ds-207-jan-2025/
Course evaluation: assignments
• Four assignments to be solved individually
• Text classification & representation learning; language modeling; translation; TBD
About the use of AI models for assignments (e.g., GPT-4o)
• Each assignment will clearly state what is and is not allowed …
• You will be asked to declare and specify your use of AI models
• We might conduct an in-class quiz to verify that students did their homework themselves
• If you lift content from language models, your code might end up similar to other students' code
• Plagiarism cases: there were several last time! (Let's keep it clean this time)
Course evaluation: quizzes
• Two exams
  • Mid-term (15%)
  • Final (25%)
Course logistics: TAs
• Tarun Gupta
• Yash Patel
• Shivashish Naithani
• Karan Raj (primarily available in March/April)
Questions?
Thank you