Introduction to Natural Language Processing
INSTRUCTOR: DR. GULSHAN SALEEM
COURSE CODE: CS AL4253
Research Profile
1. Google Scholar Profile
2. ResearchGate Profile
3. MPVIR Group (https://sites.google.com/view/mpvir)
4. Projects:
o Leaf classification
o Disease classification
o Anomaly detection
o Object Detection and Tracking
o Security Surveillance
Learning Objectives
By the end of this lecture, students will:
•Understand what Natural Language Processing (NLP) is.
•Learn about the history of NLP.
•Discover modern-day applications of NLP (chatbots, translation, etc.).
What is NLP
Natural Language Processing (NLP) is a field at the intersection of
computer science, artificial intelligence, and linguistics.
It enables computers to understand, interpret, and generate human
language.
Goal: To bridge the gap between human communication and
computer understanding.
Why is NLP Important?
•Automates the processing and understanding of large volumes of
natural language data.
•Improves human-computer interaction through voice assistants,
search engines, and chatbots.
•Used in diverse industries: healthcare, finance, customer service, etc.
Brief History of NLP
1950s: Alan Turing’s "Turing Test" for machine intelligence.
1960s: Early work on machine translation (MT) and rule-based
systems.
1980s: Statistical models began to emerge (n-grams, hidden Markov
models).
1990s: Rise of machine learning in NLP.
2010s: Deep learning and neural networks revolutionized NLP (e.g.,
Word2Vec, BERT, GPT).
Communication with Machines
Conversational Agents
Core Components of NLP
•Text Preprocessing: Tokenization, lemmatization, stemming.
•Syntax and Parsing: POS tagging, dependency parsing.
•Semantics: Word meaning, embeddings.
•Applications: Text classification, sentiment analysis, machine
translation.
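The preprocessing step can be illustrated with a minimal sketch. It lowercases text, splits it into tokens, and applies a deliberately crude suffix-stripping stemmer. The suffix list and the function names are assumptions made for illustration; practical stemmers (for example the Porter stemmer) use many more rules.

```python
import re

# Hypothetical, deliberately tiny suffix list; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize the text and stem every token."""
    return [naive_stem(tok) for tok in tokenize(text)]


print(preprocess("The parsers parsed translated texts"))
```

Note that stems such as "pars" and "translat" are not dictionary words. Stemming only aims to map related word forms to a shared key.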
NLP vs. AI vs. ML

•Artificial Intelligence (AI): General field focused on making machines
"intelligent."
•Machine Learning (ML): Subset of AI focused on learning from data.
•NLP: Subfield of AI focused on language understanding and
generation.
Modern-Day Applications of
NLP
•Chatbots and Virtual Assistants (e.g., Siri, Alexa): Enable
human-computer conversations.
•Machine Translation (e.g., Google Translate): Converts text from one
language to another.
•Sentiment Analysis: Analyzes emotions in social media, reviews, etc.
•Text Summarization: Summarizes large documents automatically.
•Speech Recognition (e.g., Speech-to-Text): Converts spoken words
into text.
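As a rough illustration of sentiment analysis, the sketch below counts words from two small word lists and compares the totals. The word lists and function names are invented for this example; production systems rely on much larger lexicons or on trained models.

```python
import re

# Hypothetical mini-lexicons, invented for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "enjoyable"}
NEGATIVE = {"bad", "boring", "terrible", "hate", "poor"}


def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(tok in POSITIVE for tok in tokens)
    negatives = sum(tok in NEGATIVE for tok in tokens)
    return positives - negatives


def classify(text):
    """Map the score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("A great film with an excellent cast"))
print(classify("Boring plot and terrible acting"))
print(classify("The film was not good"))
```

The last call is labeled positive because the sketch ignores negation, which is one reason word counting alone is not enough.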
Chatbots Example
•Definition: Chatbots simulate human conversation using text or
voice.
•Applications: Customer service, healthcare, educational tools.
•Example: Conversational AI like ChatGPT or Alexa.
Machine Translation Example
•Definition: Automatically translates text or speech from one
language to another.
•Applications: Breaking language barriers, real-time translation
services.
•Example: Google Translate and DeepL.
Machine Translation
Key Challenges in NLP
•Ambiguity: Words or sentences can have multiple meanings.
•Context: Understanding the context is difficult for machines (e.g.,
sarcasm, idioms).
•Data and Ethics: Bias in language models, lack of labeled data, and
privacy concerns.
Why Is NLP Hard?
NLP Datasets
•IMDb Reviews: A dataset for sentiment analysis containing movie reviews labeled as
positive or negative.
•20 Newsgroups: A collection of approximately 20,000 newsgroup documents, useful for
text classification and topic modeling.
•SMS Spam Collection: A set of SMS messages labeled as spam or not spam, ideal for
binary classification tasks.
•Sentiment140: A dataset of 1.6 million tweets labeled as positive or negative,
widely used for sentiment analysis.
•Common Crawl: A massive web archive that can be used for various NLP tasks like
language modeling or text generation.
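Labeled datasets like these are usually read into (label, text) pairs before any modeling. The sketch below assumes a tab-separated "label<TAB>message" layout, as used for spam/ham message collections; the layout and the sample rows are assumptions for illustration, not data taken from any real dataset.

```python
import csv
import io
from collections import Counter

# Invented sample rows in an assumed tab-separated "label<TAB>message" layout.
SAMPLE = (
    "ham\tSee you at the lecture tomorrow\n"
    "spam\tWIN a free prize now, reply YES\n"
    "ham\tCan you send me the slides?\n"
)


def load_labeled_messages(handle):
    """Read (label, message) pairs, skipping rows that do not have two fields."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[0], row[1]) for row in reader if len(row) == 2]


messages = load_labeled_messages(io.StringIO(SAMPLE))
label_counts = Counter(label for label, _ in messages)
print(label_counts)
```

With a real file, the same function could be called on an opened file handle instead of the in-memory sample. Checking the label counts first is a quick way to spot class imbalance.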
NLP Datasets
•Wikipedia Dump: A raw dump of Wikipedia articles, great for unsupervised learning
tasks or building language models.
•Quora Question Pairs: A dataset of questions from Quora, labeled as duplicate or not,
useful for semantic similarity and paraphrase detection.
•Amazon Reviews: A collection of product reviews across various categories, useful for
sentiment analysis and recommendation systems.
•Enron Email Dataset: A dataset of emails from the Enron corporation, useful for tasks
like text classification and named entity recognition.
•TREC Question Classification: A dataset of questions categorized into various
classes, great for question classification tasks.
Sentence Segmentation Example
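A minimal sketch of rule-based sentence segmentation is shown here. It splits after ".", "!" or "?" when the next word starts with an uppercase letter. The rule and the function name are assumptions for illustration; library segmenters handle abbreviations and other exceptions.

```python
import re


def split_sentences(text):
    """Split after ., ! or ? when followed by whitespace and an uppercase letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]


print(split_sentences("NLP is fun. It is also hard! Why is that?"))
print(split_sentences("Dr. Saleem teaches NLP. It is hard."))
```

The second call wrongly splits after "Dr.", which shows why abbreviations make even this basic step non-trivial.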
Word Tokenization Example
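Word tokenization can be sketched with a single regular expression that keeps words (including simple contractions) and separates punctuation marks into their own tokens. The pattern is an assumption chosen for illustration, not the rule set of any particular toolkit.

```python
import re

# A word, optionally followed by an apostrophe part (e.g. "Don't"),
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Return word and punctuation tokens in their original order."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Don't panic, it's only NLP!"))
```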
Part of Speech
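Part-of-speech tagging assigns a grammatical category to each token. The toy tagger below combines a small hand-written lexicon with suffix and capitalization rules. The lexicon and rules are hypothetical; the tag names loosely follow the Penn Treebank style (DT determiner, VBZ verb, IN preposition, NNP proper noun, NN noun). Real taggers are trained on annotated corpora.

```python
# Hypothetical hand-written lexicon, invented for illustration.
LEXICON = {"the": "DT", "a": "DT", "is": "VBZ", "on": "IN", "in": "IN", "it": "PRP"}


def tag(tokens):
    """Tag each token by lexicon lookup first, then by simple fallback rules."""
    tagged = []
    for tok in tokens:
        lower = tok.lower()
        if lower in LEXICON:
            label = LEXICON[lower]
        elif lower.endswith("ing"):
            label = "VBG"  # guess: gerund or present participle
        elif lower.endswith("ly"):
            label = "RB"   # guess: adverb
        elif tok[0].isupper():
            label = "NNP"  # guess: proper noun
        else:
            label = "NN"   # default: common noun
        tagged.append((tok, label))
    return tagged


print(tag("San Pedro is a town".split()))
```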
Lemmatization
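Lemmatization maps a word form to its dictionary form (lemma), so the result is always a real word. The sketch below uses a tiny lookup table for irregular forms plus two plural rules. Both the table and the rules are assumptions for illustration; real lemmatizers use full dictionaries and part-of-speech information.

```python
# Hypothetical lookup table for a few irregular forms.
IRREGULAR = {
    "was": "be", "were": "be", "is": "be",
    "better": "good", "mice": "mouse", "went": "go",
}


def lemmatize(token):
    """Return a lemma using the lookup table, then simple plural rules."""
    lower = token.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    if lower.endswith("ies") and len(lower) > 4:
        return lower[:-3] + "y"   # studies -> study
    if lower.endswith("s") and not lower.endswith("ss") and len(lower) > 3:
        return lower[:-1]         # towns -> town
    return lower


print([lemmatize(w) for w in ["Mice", "studies", "towns", "was", "class"]])
```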
Named Entity Recognition
People’s names.
Company names.
Geographical locations
Product names.
Date and time.
Amount of money.
Events
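Some of these entity types, such as dates and amounts of money, follow regular surface patterns and can be sketched with regular expressions. The two patterns below are assumptions covering only one date format and dollar amounts; names of people, companies, and places generally need trained models.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hypothetical patterns: dollar amounts and "day Month year" dates only.
PATTERNS = {
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
    "DATE": re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b"),
}


def find_entities(text):
    """Return (entity text, label) pairs ordered by position in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    return [(span, label) for _, span, label in sorted(found)]


print(find_entities("The company paid $1,500.50 on 3 March 2021."))
```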
Coreference Resolution
San Pedro is a town on the southern part of the island of Ambergris Caye in
the Belize District of the nation of Belize, in Central America. According to
2015 mid-year estimates, the town has a population of about 16,444. It is
the second-largest town in the Belize District and largest in the Belize Rural
South constituency.
Here, we know that ‘It’ in the third sentence stands for San Pedro, but a
computer cannot easily tell that the two tokens refer to the same entity,
because it processes each sentence separately. Pronouns are very frequent
in English text, so working out what each one refers to is a recurring
difficulty.
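To make the problem concrete, the sketch below applies a deliberately naive baseline to a shortened version of the passage: it treats the first run of capitalized words as the only candidate antecedent and substitutes it for every standalone "it". This heuristic is an assumption made for illustration, not an established coreference algorithm, and it fails as soon as a text mentions more than one entity or begins with a capitalized non-name word.

```python
import re

# Shortened version of the passage, kept to two sentences.
PASSAGE = (
    "San Pedro is a town on the southern part of the island of Ambergris Caye. "
    "It is the second-largest town in the Belize District."
)


def first_proper_noun_phrase(text):
    """Return the first run of capitalized words, or None if there is none."""
    match = re.search(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)
    return match.group() if match else None


def resolve_it(text):
    """Replace each standalone 'It'/'it' with the first proper-noun phrase."""
    antecedent = first_proper_noun_phrase(text)
    if antecedent is None:
        return text
    return re.sub(r"\b[Ii]t\b", antecedent, text)


print(resolve_it(PASSAGE))
```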
Conclusion
•NLP is at the heart of modern applications like chatbots, machine
translation, and text analysis.
•The field has evolved from rule-based systems to machine learning
and deep learning approaches.
Text Book
Daniel Jurafsky and James H. Martin. 2008. Speech and Language Processing: An Introduction
to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice
Hall, 2nd/3rd Edition.
http://www.cs.colorado.edu/~martin/slp.html
Great overview of the field, with explanations of techniques, algorithms, etc.
Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural
Language Processing. MIT Press
Natural Language Processing with Python
By Steven Bird, Ewan Klein, and Edward Loper
http://www.nltk.org/book
Downloadable open source programs to try out various
Useful Readings
Look at projects at Stanford:
http://web.stanford.edu/class/cs224n/
Other useful links
http://aclweb.org/
http://www.cs.vassar.edu/sigann/
Programming Language
Why Python is well suited:
•Easy to learn, clean syntax, powerful features.
•Increasingly popular in computational linguistics.
•Extensive tutorials, computational linguistics support, toolkits, data, etc.
References
Chowdhary, K., & Chowdhary, K. R. (2020). Natural language
processing. Fundamentals of artificial intelligence, 603-649.
Daniel Jurafsky and James H. Martin. 2018. Speech and Language
Processing: An Introduction to Natural Language Processing. Third
Edition. Prentice Hall
Thank You 