NLP Week 01
to NLP
Faizad Ullah
About Me
❑ Faizad Ullah
❑ Ph.D. Student at LUMS
❑ Specialization
▪ Natural Language Processing (NLP)
▪ Machine Learning
▪ Data Science
❑ Contributions
▪ Text Analytics of Low-Resourced Languages
▪ Medical Image Analysis
▪ Graph Analysis
Grading
Quizzes 20%
Assignments 10%
Midterm 25%
Project 10%
Programming Tasks
❑ *3-5 Assignments
▪ Programming Assignments
❑ *One Project
❑ Programming Environment
▪ Python (PyTorch, TensorFlow, Colab)
❑ Sharing
▪ Copying is not allowed for assignments. Discussions are encouraged; however, you must submit your
own work.
▪ Violators will be reported to the Disciplinary Committee or face mark-reduction penalties.
❑ Plagiarism
▪ Do NOT pass off someone else’s work as your own!
▪ Write in your own words and cite the reference if you use someone else’s material.
Policies (2)
❑ Submission Policy
▪ Submissions are due on the day and at the time specified.
▪ Late submissions incur a deduction of 10% of the marks obtained per day.
❑ Attendance Policy
▪ You are advised to attend all lectures.
▪ It’s the students’ responsibility to recover any information or announcements posted during a
lecture from which they were absent.
❑ Classroom behavior
▪ Maintain classroom sanctity by remaining attentive
▪ Asking questions is encouraged.
▪ You are not allowed to use a laptop, mobile phone, etc., during class.
Policies (3)
❑ Retakes
▪ No retakes for quizzes, assignments, exams, or projects
▪ In case of a medical emergency or unavoidable circumstances, inform the instructor beforehand and seek formal
approval. You need to share medical reports for the departmental record.
▪ Do not wait for the final exam to seek approval for retakes
Contact
❑ How to contact me?
▪ E-mail: [email protected]
▪ Office: 426-G
▪ Office Hours: Mentioned on office door
Most Important
Key Areas of NLP
• Text Processing & Understanding
  • Tokenization (splitting text into words or sentences)
  • Part-of-Speech Tagging (identifying nouns, verbs, etc.)
  • Named Entity Recognition (extracting names, locations, organizations)
• Machine Translation
  • Google Translate, DeepL, and other language translation models
• Speech Recognition
  • Voice assistants like Siri, Alexa, and Google Assistant
• Sentiment Analysis
  • Detecting emotions in text (positive, negative, neutral)
• Chatbots & Conversational AI
  • AI-powered assistants (e.g., ChatGPT, customer support bots)
• Text Generation
  • Automated writing tools, AI-generated content
• Information Retrieval & Search
  • Search engines like Google understanding user queries
• Summarization
  • Extracting key points from long texts (news, reports, articles)
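Tokenization, as described in the list above, can be sketched with two regular expressions. This is a minimal illustration under simplifying assumptions, not a production tokenizer: the function names `split_sentences` and `tokenize_words` are hypothetical, and the sentence rule ignores abbreviations such as "Dr.".

```python
import re

def split_sentences(text):
    # Naive rule (an assumption): a sentence ends at . ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize_words(sentence):
    # A token is a word (optionally with an internal apostrophe part)
    # or a single punctuation character.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

text = "Einstein met with UN officials in Princeton. Let's go to Agra!"
sentences = split_sentences(text)
tokens = [tokenize_words(s) for s in sentences]

for sentence, toks in zip(sentences, tokens):
    print(sentence, "->", toks)
```

Real tokenizers handle many more cases (URLs, numbers, clitics, scripts without spaces), so this only shows the basic idea of splitting text into sentences and words.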
Natural Language Processing
• Other names:
• Computational Linguistics (CL)
WILLIAM WILKINSON’S
“AN ACCOUNT OF THE PRINCIPALITIES OF
WALLACHIA AND MOLDOVIA”
INSPIRED THIS AUTHOR’S
MOST FAMOUS NOVEL
Answer: Bram Stoker
Information Extraction
Subject: FYP Part-A Meeting
Date: February 10, 2025
To: Faizad Ullah
Hi Sir, we would like to meet with you to discuss our FYP Part-A
presentations. We’ve scheduled a meeting for tomorrow at S-125 from
10:00 AM to 11:30 AM. Looking forward to your guidance!
Best regards,
Extracted event
Event: FYP Part-A meeting
Date: Feb-10-2025
Start: 10:00am
End: 11:30am
Where: S-125
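The information-extraction example above (pulling start time, end time, and room out of the email body) can be approximated with hand-written patterns. The sketch below is only an illustration under stated assumptions: the function name `extract_event` is hypothetical, times are assumed to look like "10:00 AM", and rooms are assumed to look like a capital letter, a hyphen, and digits. Practical systems usually rely on trained models instead of fixed patterns.

```python
import re

email_body = (
    "Hi Sir, we would like to meet with you to discuss our FYP Part-A "
    "presentations. We've scheduled a meeting for tomorrow at S-125 from "
    "10:00 AM to 11:30 AM. Looking forward to your guidance!"
)

def extract_event(text):
    # Assumed time format: H:MM or HH:MM followed by AM/PM.
    times = re.findall(r"\b\d{1,2}:\d{2}\s?[AP]M\b", text)
    # Assumed room format: one capital letter, a hyphen, then digits (e.g. S-125).
    room = re.search(r"\b[A-Z]-\d+\b", text)
    return {
        "start": times[0] if times else None,
        "end": times[1] if len(times) > 1 else None,
        "where": room.group() if room else None,
    }

event = extract_event(email_body)
print(event)
```

Resolving "tomorrow" to a calendar date needs the email's send date as well, which this sketch leaves out.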
Language Technology
mostly solved
• Spam detection
  Let’s go to Agra! ✓
  You won $100,000 … ✗
• Part-of-speech (POS) tagging
  Colorless/ADJ green/ADJ ideas/NOUN sleep/VERB furiously/ADV.
• Named entity recognition (NER)
  Einstein (PERSON) met with UN (ORG) officials in Princeton (LOC)
making good progress
• Sentiment analysis
  Best roast chicken in San Francisco!
  The waiter ignored us for 20 minutes.
• Coreference resolution
  Carter told Mubarak he shouldn’t run again.
• Word sense disambiguation (WSD)
  I need new batteries for my mouse.
• Parsing
  I can see Alcatraz from the window!
• Machine translation (MT)
  第13届上海国际电影节开幕…
  The 13th Shanghai International Film Festival…
• Information extraction (IE)
  You’re invited to our dinner party, Friday May 27 at 8:30
  → Party, May 27 (add)
still really hard
• Question answering (QA)
  Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?
• Paraphrase
  XYZ acquired ABC yesterday
  ABC has been taken over by XYZ
• Summarization
  The Dow Jones is up
  The S&P500 jumped
  Housing prices rose
  → Economy is good
• Dialog
  Where is Citizen Kane playing in SF?
  Castro Theatre at 7:30. Do you want a ticket?
Linguistics
• Linguistics is the study of language with respect to its form or
structure, meaning, and context.
• Linguistics also deals with the social, cultural, historical, and political
factors that influence languages, including their origins and evolution.
• The words "running", "runs", and "ran" share the root word "run".
• Stemming strips affixes with simple rules to reach a base form, which need not be a real word:
• "running" → "run"
• "happily" → "happi"
• Lemmatization maps a word to its dictionary form (lemma) using vocabulary and meaning:
• "ran" → "run"
• "better" → "good"
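The contrast between stemming and lemmatization above can be made concrete with a toy sketch. Everything here is an assumption for illustration: `toy_stem` applies a handful of made-up suffix rules (it is not the Porter stemmer), and `toy_lemma` is a lookup in a tiny hand-written dictionary standing in for a real lexicon.

```python
def toy_stem(word):
    """Rule-based suffix stripping; the result need not be a real word."""
    w = word.lower()
    for suffix in ("ing", "ly", "ed", "s"):
        # Strip at most one suffix, and keep at least three characters.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break
    # Undouble a final consonant: "runn" -> "run".
    if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiouls":
        w = w[:-1]
    # Final "y" becomes "i", as in many classic stemmers.
    if w.endswith("y"):
        w = w[:-1] + "i"
    return w

# Hypothetical mini-lexicon; a real lemmatizer uses a full dictionary
# and part-of-speech information.
LEMMAS = {"running": "run", "runs": "run", "ran": "run", "better": "good"}

def toy_lemma(word):
    w = word.lower()
    return LEMMAS.get(w, w)

for word in ["running", "runs", "ran", "happily", "better"]:
    print(word, "stem:", toy_stem(word), "lemma:", toy_lemma(word))
```

The suffix rules cannot relate "ran" to "run" or "better" to "good", while the dictionary lookup can; that difference is the point of the two bullet groups above.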
Syntax
• Examines the structure of sentences and grammar rules (e.g., parsing
sentences for grammatical correctness).
• Syntax alone does not resolve references: “He” may refer to “Ali”, but an NLP model
must infer that from the discourse context.
Real-World Example: Google Search
• When you search: “Why is she eating an apple quickly?”, NLP techniques
help improve search results by applying linguistic concepts:
• Morphology – Google recognizes that "eating", "eat", and "eats" are related.
• Syntax – "she" is the subject, "eating" is the action, and "an apple" is the object.
• Semantics – It understands the intent: You are likely looking for reasons why someone
eats fast (e.g., hunger, habits).
• Pragmatics – If you meant "Why do people eat apples quickly?", Google may show
articles on health benefits of apples.
• Discourse Analysis – If you searched "Why is she eating an apple?" after searching
"Hunger and eating speed," Google considers previous searches to refine results.
Sub-fields of Linguistics
• Historical linguistics
• Cultural linguistics
• Political linguistics
• Social linguistics
• Psycho-linguistics
• Bio-linguistics
• Neuro-linguistics
• Computational linguistics
Grammar
• Rules guiding the composition of clauses, phrases, and words in a
language
• Clause: part of a sentence that contains a subject and a verb.
• Phrase: a group of words that plays a specific role in a sentence but does not
typically form a complete sentence.
Text
• Text is a sequence of characters arranged in a particular order.
• apples
• Apple
• Apples
Disjunctions
• Letters inside square brackets []
Pattern         Matches
[aA]pple        apple, Apple
[1234567890]    Any digit
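The bracket patterns above can be tried with Python's standard `re` module. This is a small sketch; the sample sentence is made up for illustration.

```python
import re

text = "An apple a day. Apple pie needs 3 apples and 12 Apples."

# [aA] matches exactly one character: either "a" or "A".
apple_hits = re.findall(r"[aA]pple", text)

# Listing all ten digits inside [] matches any single digit.
digit_hits = re.findall(r"[1234567890]", text)

print(apple_hits)
print(digit_hits)
```

Note that the pattern also finds "apple" inside "apples", because nothing in it requires the word to end there.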
• Ranges [A-Z]
Pattern   Matches                 Example
[A-Z]     An upper case letter    Drenched Blossoms
[a-z]     A lower case letter     my beans were impatient
[0-9]     A single digit          Chapter 1: Down the Rabbit Hole
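The range patterns above can be checked against the same example strings. In this sketch, `first_match` is a hypothetical helper that returns the first character the pattern matches.

```python
import re

def first_match(pattern, text):
    # re.search scans left to right and stops at the first match.
    m = re.search(pattern, text)
    return m.group() if m else None

examples = [
    (r"[A-Z]", "Drenched Blossoms"),
    (r"[a-z]", "my beans were impatient"),
    (r"[0-9]", "Chapter 1: Down the Rabbit Hole"),
]

results = [first_match(pattern, text) for pattern, text in examples]
print(results)
```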
Negation in Disjunction
• Negations [^Ss]
• Caret means negation only when first in []
Pattern   Matches                     Example
[^A-Z]    Not an upper case letter    How are you?
[^Ss]     Neither ‘S’ nor ‘s’         I have no exquisite reason
[^e^]     Neither e nor ^             Look here
\^        Looking for a caret ^       Look up a^b now
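A short sketch of the negation rules above, using `re.findall` on a few made-up strings. It also shows that a caret that is not first inside the brackets is treated as a literal character.

```python
import re

# Caret first inside []: match any character NOT listed.
not_upper = re.findall(r"[^A-Z]", "How")        # everything except upper case
not_s = re.findall(r"[^Ss]", "Seas")            # everything except S or s
not_e_or_caret = re.findall(r"[^e^]", "e^x")    # second ^ is a literal caret

# Caret not first inside []: it is just a literal character.
e_or_caret = re.findall(r"[e^]", "e^x")

# Outside [], a backslash makes the caret literal.
literal_caret = re.findall(r"\^", "Look up a^b now")

print(not_upper, not_s, not_e_or_caret, e_or_caret, literal_caret)
```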
The Pipe “|” Symbol: More Disjunction
• Woodchuck is another name for groundhog!
• The pipe | for disjunction
Pattern                      Matches
groundhog|woodchuck          groundhog, woodchuck
yours|mine                   yours, mine
a|b|c                        = [abc]
[gG]roundhog|[Ww]oodchuck    groundhog, Groundhog, woodchuck, Woodchuck
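The pipe patterns above behave as follows in Python's `re` module. The sample strings are made up for illustration.

```python
import re

# The pipe matches either alternative as a whole string.
animals = re.findall(r"groundhog|woodchuck", "A woodchuck is a groundhog.")

# For single characters, a|b|c and [abc] match the same things.
with_pipe = re.findall(r"a|b|c", "cabbage")
with_brackets = re.findall(r"[abc]", "cabbage")

# Brackets and pipe combined cover both capitalizations of both names.
capitalized = re.findall(r"[gG]roundhog|[Ww]oodchuck",
                         "Groundhog Day stars a Woodchuck")

print(animals, with_pipe == with_brackets, capitalized)
```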
Regular Expressions: ? * + .
Kleene *, Kleene +
Pattern   Matches                       Examples
colou?r   Optional previous char        color colour
oo*h!     0 or more of previous char    oh! ooh! oooh! ooooh!
o+h!      1 or more of previous char    oh! ooh! oooh! ooooh!
baa+                                    baa baaa baaaa baaaaa
beg.n                                   begin begun beg3n
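The quantifier and wildcard patterns above can be verified word by word. In this sketch, `matches` is a hypothetical helper built on `re.fullmatch`, which succeeds only if the whole word fits the pattern.

```python
import re

def matches(pattern, word):
    # fullmatch requires the pattern to cover the entire word.
    return re.fullmatch(pattern, word) is not None

should_match = {
    r"colou?r": ["color", "colour"],              # ? : optional previous char
    r"oo*h!": ["oh!", "ooh!", "oooh!", "ooooh!"],  # * : 0 or more of previous char
    r"o+h!": ["oh!", "ooh!", "oooh!", "ooooh!"],   # + : 1 or more of previous char
    r"baa+": ["baa", "baaa", "baaaa", "baaaaa"],
    r"beg.n": ["begin", "begun", "beg3n"],         # . : any single character
}

all_match = all(matches(p, w) for p, words in should_match.items() for w in words)

# A few words that the patterns reject.
rejected = [
    matches(r"colou?r", "colouur"),  # ? allows at most one "u"
    matches(r"o+h!", "h!"),          # + needs at least one "o"
    matches(r"baa+", "ba"),          # needs "ba" plus at least one more "a"
    matches(r"beg.n", "begn"),       # . must consume exactly one character
]

print(all_match, rejected)
```

Since `oo*` means one "o" followed by zero or more further "o"s, `oo*h!` and `o+h!` accept exactly the same strings.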
Anchors ^ $
Pattern Matches
^[^A-Za-z] 1 “Hello”