Module 05 - Learners Guide
NATURAL LANGUAGE PROCESSING
Let’s start!
WHAT IS NLP?
Natural language processing, or NLP, combines computational
linguistics—rule-based modeling of human language—with statistical and
machine learning models to enable computers and digital devices to
recognize, understand and generate text and speech.
HISTORY OF NLP
• 1950s: Alan Turing introduces the Turing Test, questioning if machines can think, and the first automated
translation experiment translates Russian to English.
• 1960s: Development of ELIZA, a chatbot simulating a psychotherapist, showcasing early NLP capabilities.
• 1970s: Introduction of SHRDLU, an NLP program for understanding natural language in a blocks world
context.
• 1980s: Transition from rule-based to statistical methods in NLP, acknowledging the complexity of human
language.
• 1990s: The rise of the internet provides vast text data, boosting statistical NLP methods and machine
learning models.
• 2000s: Machine learning becomes dominant in NLP, with algorithms learning from large corpora of text for
language processing.
• 2010s: Deep learning and neural networks revolutionize NLP, significantly improving language
understanding and generation.
• Late 2010s: Introduction of Transformer-based models like BERT and GPT, setting new standards for NLP
tasks.
• Present: NLP technologies achieve near-human levels of language understanding and generation across various applications.
• Ongoing: Continuous advancements in deep learning models drive NLP forward, expanding its capabilities
and applications in everyday technology.
FOUNDATIONS OF NLP
TYPES OF DATA
1. Qualitative Data: Categories without numerical values.
2. Quantitative Data: Numerical values, including countable items (discrete) or measurements (continuous).
3. Structured Data: Organized data in databases, easily searchable, like customer databases.
4. Unstructured Data: Not organized in a predefined way, including texts, images, and videos.
5. Semi-structured Data: Not in databases but has some organization, like JSON or XML files.
6. Time Series Data: Data points collected over time intervals.
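These categories can be made concrete in a few lines of Python. This is an illustrative sketch; the sample values are invented, and only the standard-library json module is used:

```python
import json

# Structured: rows with a fixed schema, like a customer database table
structured = [
    {"id": 1, "name": "Ada", "country": "UK"},
    {"id": 2, "name": "Lin", "country": "SG"},
]

# Semi-structured: JSON text has some organization (keys, nesting)
# but no rigid, table-like schema
semi_structured = '{"user": "Ada", "tags": ["nlp", "ml"], "profile": {"age": 36}}'
parsed = json.loads(semi_structured)

# Unstructured: free-form text with no predefined organization
unstructured = "I loved the course, but module five moved a bit fast."

print(parsed["tags"])    # nested fields can still be addressed by key
print(len(structured))   # structured rows are easy to count and query
```

Most NLP work starts from the unstructured kind and turns it into something structured enough to compute with, which is what the techniques below do.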
TOKENIZATION
Tokenization breaks text into individual elements or tokens, making text analysis manageable and efficient.
LEMMATIZATION
Lemmatization also reduces words to their base or dictionary form but considers the context to ensure the root word's correct meaning.
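Both steps can be sketched in pure Python. The regex tokenizer and the hand-made LEMMAS lookup below are illustrative stand-ins for real tools (NLTK or spaCy use full vocabularies and part-of-speech context):

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens (simple regex sketch)."""
    return re.findall(r"\w+|[^\w\s]", text)

# Toy lemma dictionary; a real lemmatizer derives these from a vocabulary.
LEMMAS = {"better": "good", "best": "good", "boys": "boy",
          "playing": "play", "are": "be"}

def lemmatize(token):
    """Look up the dictionary form, falling back to the lowercased token."""
    return LEMMAS.get(token.lower(), token.lower())

tokens = tokenize("Hello, world!")
print(tokens)  # ['Hello', ',', 'world', '!']
print([lemmatize(t) for t in tokenize("The boys are playing")])
```

Note how punctuation becomes its own token, and how "are" maps to the dictionary form "be" rather than being crudely truncated.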
A) Tokenization
B) Lemmatization
C) Stemming
Question 2: If "better" becomes "good" and "best" becomes "good", which NLP process is being used?
A) Tokenization
B) Lemmatization
C) Stemming
Question 3: Splitting "Hello, world!" into ["Hello", ",", "world", "!"] is an example of:
A) Tokenization
B) Lemmatization
C) Stemming
CHOOSE THE RIGHT PROCESS
Question 4: Converting the sentence "The boys are playing" into tokens like "The", "boys", "are", "playing" is known as:
A) Tokenization
B) Lemmatization
C) Stemming
Question 5: Which process might incorrectly reduce "university" and "universal" to a common root such as "univers"?
A) Tokenization
B) Lemmatization
C) Stemming
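The pitfall in Question 5 — "university" and "universal" collapsing to "univers" — can be reproduced with a deliberately crude suffix-stripping stemmer. This is only a sketch; real stemmers such as Porter's apply ordered rule sets with conditions:

```python
def naive_stem(word, suffixes=("ity", "al", "ing", "ed", "s")):
    """Strip the first matching suffix, keeping at least a 3-letter stem.
    Crude on purpose: it has no notion of meaning, only spelling."""
    for s in suffixes:
        if word.endswith(s) and len(word) - len(s) >= 3:
            return word[: -len(s)]
    return word

print(naive_stem("university"))  # univers
print(naive_stem("universal"))   # univers
print(naive_stem("playing"))     # play
```

Two unrelated words end up with the same root — exactly the over-stemming error the quiz asks about, and the reason lemmatization is preferred when correct meaning matters.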
BAG OF WORDS (BoW)
Real-World Scenario
Email spam detection. BoW can be used to
identify spam by analyzing the frequency of
words typically found in spam emails versus
legitimate ones, allowing email systems to
filter out unwanted messages efficiently.
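The spam-filtering idea can be sketched with a bag of words built from collections.Counter. The spam_words set and the scoring rule are invented for illustration; real filters learn weights from labeled data:

```python
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; word order is deliberately discarded."""
    return Counter(text.lower().split())

# Hypothetical lexicon of words typical of spam emails
spam_words = {"winner", "free", "prize"}

def spam_score(text):
    """Fraction of tokens that are typical spam words."""
    bow = bag_of_words(text)
    total = sum(bow.values())
    return sum(bow[w] for w in spam_words) / total

print(spam_score("You are a winner claim your free prize now"))  # 3 of 9 tokens
```

The key property (and limitation) of BoW is visible here: "free prize winner" and "winner free prize" get identical representations, because only counts survive.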
N-GRAMS
Why It's Important?
N-grams capture the sequence of 'N' items
(words or letters) from text, preserving
some order and context, which is lost in the
BoW model, improving the model's
understanding of language structure.
Real-World Scenario
Auto-complete features in search engines or
texting apps use n-grams to predict the next
word or phrase you're likely to type based on
the probability of word sequences,
enhancing user experience by speeding up
typing and reducing effort.
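The auto-complete scenario can be sketched with bigrams (n = 2). The tiny corpus below is invented; a real keyboard app would train on billions of word sequences:

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """All length-n windows over a token list, preserving order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Count which word follows which in a toy corpus
corpus = "i love data i love nlp i love data".split()
nexts = defaultdict(Counter)
for w1, w2 in ngrams(corpus, 2):
    nexts[w1][w2] += 1

def predict(word):
    """Most frequent word observed after `word` — simple auto-complete."""
    return nexts[word].most_common(1)[0][0]

print(predict("i"))     # love
print(predict("love"))  # data (seen twice, vs nlp once)
```

Unlike BoW, the bigram counts retain local order: "love data" and "data love" are different events, which is exactly the context the slide says BoW loses.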
WORD EMBEDDINGS
Why It's Important?
Word embeddings map words into dense
vectors of real numbers in a way that captures
semantic meaning, relationships, and context,
greatly enhancing the performance of NLP
models on complex tasks.
Real-World Scenario:
Recommendation systems in streaming
services. By using embeddings to understand
the content and context of user reviews and
interactions, these platforms can recommend
movies, shows, or songs that are more aligned
with individual user preferences, improving
personalization and satisfaction.
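The "semantic closeness" that embeddings capture is usually measured with cosine similarity. The 3-dimensional vectors below are hand-written toys; real embeddings (word2vec, GloVe) have hundreds of dimensions and are learned from data:

```python
import math

# Toy "embeddings": similar words get similar vectors by construction here
emb = {
    "movie":  [0.9, 0.1, 0.0],
    "film":   [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine of the angle between two vectors: 1 = same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine(emb["movie"], emb["film"]))    # high: near-synonyms
print(cosine(emb["movie"], emb["banana"]))  # low: unrelated
```

A recommender can use exactly this comparison: items whose embedding is close to the vectors of things you liked are proposed first.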
Term Frequency-Inverse Document
Frequency (TF-IDF)
Why It's Important?
TF-IDF is a statistical measure used to evaluate how
important a word is to a document in a collection or
corpus. It increases proportionally to the number of
times a word appears in the document but is offset
by the frequency of the word across the corpus.
Real-world Implication:
When you search for "data scientist remote jobs" on
a job site, TF-IDF helps find the best matches. It
checks how often "data scientist" appears in a job
post and how unique the term is across all posts.
Posts with higher scores appear first, making your job
hunt quicker and more relevant.
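The definition above translates directly into code: tf is the in-document frequency, idf penalizes terms common across the corpus, and the score is their product. The three mini job posts are invented for illustration:

```python
import math

docs = [
    "data scientist remote jobs".split(),
    "marketing manager jobs".split(),
    "remote data engineer".split(),
]

def tf(term, doc):
    """Term frequency: share of the document's tokens that are `term`."""
    return doc.count(term) / len(doc)

def idf(term, corpus):
    """Inverse document frequency: log(N / number of docs containing term)."""
    df = sum(1 for d in corpus if term in d)
    return math.log(len(corpus) / df)

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

# "scientist" appears in 1 of 3 posts, "jobs" in 2 of 3,
# so "scientist" scores higher in the first post
print(tf_idf("scientist", docs[0], docs))
print(tf_idf("jobs", docs[0], docs))
```

Note that a term appearing in every document gets idf = log(1) = 0: being everywhere makes it useless for ranking, which is the whole point of the offset.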
ONE-HOT ENCODING
Why It's Important?
Converts categorical data into binary vectors,
enabling computational operations on text.
Essential for NLP tasks in machine learning
models.
Real-World Example:
In customer service chatbots, one-hot encoding
transforms queries into a format the algorithm
understands, aiding in accurately responding to
customer needs like processing returns or
exchanges, thereby enhancing efficiency and
satisfaction.
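A minimal sketch of the encoding itself; the customer-service vocabulary below is a made-up example:

```python
def one_hot(word, vocab):
    """Binary vector with a single 1 in the word's vocabulary slot."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# Hypothetical intent vocabulary for a support chatbot
vocab = ["refund", "exchange", "shipping", "return"]
print(one_hot("return", vocab))  # [0, 0, 0, 1]
```

The vector length equals the vocabulary size and every pair of words is equally distant, which is why large vocabularies usually move on to the dense embeddings described earlier.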
NAMED ENTITY RECOGNITION (NER)
Why It's Important?
NER is vital for extracting information from texts,
such as names of people, organizations, locations,
and more. This technique is fundamental in
information retrieval, content categorization, and
question-answering systems, making it easier to
organize and search large datasets.
Real-world Implication:
When customers send messages to their banks via
chat or email, NER can identify personal names,
account numbers, transaction IDs, and other crucial
entities in the text. This enables the automated
system to quickly pull up relevant account
information or transaction history, streamlining
customer service and reducing response times.
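A deliberately naive sketch of the banking scenario using regular expressions. Real NER relies on trained statistical or neural models, not patterns; the example message is invented, and the false positive it produces ("Please", a sentence-initial capital) shows why:

```python
import re

def naive_entities(text):
    """Rough heuristic: capitalized word runs as candidate names,
    long digit runs as candidate account/transaction IDs."""
    names = [m.strip() for m in re.findall(r"(?:[A-Z][a-z]+ ?)+", text)]
    ids = re.findall(r"\b\d{6,}\b", text)
    return names, ids

names, ids = naive_entities(
    "Please send the statement for account 12345678 to Jane Smith"
)
print(names)  # includes 'Jane Smith' but also the false positive 'Please'
print(ids)    # ['12345678']
```

The ID pattern works here, but the name heuristic cannot tell a sentence-opening word from a person — the kind of ambiguity that makes trained NER models necessary.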
Question 1. A document classification system analyzes job descriptions to categorize them into IT, Marketing, and Finance. Which technique is likely used to convert the job descriptions into a format suitable for machine learning models?
A) N-grams
B) BoW
C) One-hot encoding
Correct Answer: B
Question 2. An online library wants to improve its search feature so that rare books on specific topics are more easily
found. Which technique could help prioritize these rare books when relevant search terms are used?
A) BoW
B) TF-IDF
C) Encoding
Question 3. A smartphone keyboard app predicts the next word as a user types to speed up input. What technique underlies this predictive text feature?
A) BoW
B) N-grams
C) One-hot encoding
Correct Answer: B
Question 4. An email filtering system needs to categorize incoming messages into "Urgent", "Important", "Regular", and "Spam". Before applying machine learning, each email's text is transformed into a binary vector. This initial step is called:
A) NER
B) TF-IDF
C) One-hot encoding
Question 5. A news aggregator automatically tags articles with names of mentioned countries, companies, and persons for easier browsing. Which NLP technique is being employed for tagging?
A) NER
B) BoW
C) N-grams
Question 6. To recommend articles based on their content similarity, a website analyzes how unique keywords are across articles compared to how often they appear in each article. This analysis likely uses:
A) Encoding
B) TF-IDF
C) BoW
Question 8. A data analyst wants to examine Twitter feeds to see how often certain policy topics are mentioned over time. Before analysis, tweets are broken down into 2-word phrases to capture context better than single words. This technique is called:
A) N-grams
B) TF-IDF
C) Encoding
Question 9. A customer feedback tool highlights keywords that frequently appear in negative reviews to help a business understand common complaints. This feature is most likely powered by:
A) BoW
B) NER
C) TF-IDF
Question 10. A language learning app highlights named entities in sentences to teach users proper nouns in context. This functionality relies on:
A) NER
B) Encoding
C) N-grams
For example: if you instruct your phone to "set an alarm for 8 in the morning", speech recognition converts it into text. Natural language understanding processes the text, extracts the meaning, and triggers an action to set the alarm at 8 am. Natural language generation then produces a response confirming that the alarm is set.
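The understand-then-respond loop in the alarm example can be sketched end to end. Everything here is a toy: the regex stands in for a trained understanding model, and the template reply stands in for real generation:

```python
import re

def understand(text):
    """Toy NLU: extract the intent and the hour from an alarm command."""
    m = re.search(r"set an alarm for (\d{1,2}) in the morning", text.lower())
    if m:
        return {"intent": "set_alarm", "hour": int(m.group(1))}
    return {"intent": "unknown"}

def generate_response(parsed):
    """Toy NLG: confirm the action in natural language."""
    if parsed["intent"] == "set_alarm":
        return f"Your alarm is set for {parsed['hour']} am."
    return "Sorry, I didn't understand that."

parsed = understand("Set an alarm for 8 in the morning")
print(generate_response(parsed))  # Your alarm is set for 8 am.
```

The structured dictionary in the middle is the important part: understanding turns free text into data an action can consume, and generation turns data back into text.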
NATURAL LANGUAGE GENERATION
Natural language generation (NLG) is the process of generating text that appears to be written by a
human, without the need for a human.
NLG usually works by pulling from large bodies of text, taking sentences that represent the main points, and identifying key concepts, which are then rephrased and summarized in a grammatically accurate manner.
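The "take the sentences that represent the main points" step can be sketched as extractive summarization: score each sentence by how frequent its words are in the whole text and keep the top ones. This sketch only extracts; unlike real NLG, it does not rephrase:

```python
from collections import Counter
import re

def summarize(text, n=1):
    """Keep the n sentences whose words are most frequent overall."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / len(words)

    return sorted(sentences, key=score, reverse=True)[:n]

text = "NLP models process text. NLP models generate text. Bananas are yellow."
print(summarize(text))  # keeps an on-topic sentence, drops the outlier
```

The off-topic sentence loses because its words appear only once, while the repeated vocabulary of the main theme pushes those sentences up — a frequency signal, the same raw ingredient TF-IDF refines.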
What is the difference between syntax and semantics in natural language processing?
a) Syntax refers to the meaning of language, while semantics refers to the structure
b) Syntax refers to the structure of language, while semantics refers to the meaning
c) Syntax and semantics are the same thing
d) Syntax and semantics are not relevant to natural language processing
Answer: b) Syntax refers to the structure of language, while semantics refers to the meaning.
Which of the following is an example of a sequence labeling task in natural language processing?
a) Sentiment analysis
b) Named entity recognition
c) Text classification
d) Language modeling
Answer: b) Named entity recognition.
REAL WORLD APPLICATIONS OF NLP
1 Search Engine Results
Search engine functionality is an example of natural language processing in action. Search engines utilize NLP to propose appropriate results based on previous search history and user intent. Google, for example, anticipates what you'll enter next based on popular queries, while also taking into account the context and detecting the meaning behind what you want to say.
4 Email Filters
Email filters are common NLP examples you can find online across most servers.
For example, spam filters uncover patterns of words or phrases that are linked to spam messages.
5 Smart Assistants
Smart assistants such as Amazon's Alexa use voice recognition to understand everyday phrases and inquiries.
They then use a subfield of NLP called natural language generation to respond to queries.
As NLP evolves, smart assistants are now being trained to provide more than just one-way answers. Examples include
Apple's Siri, Amazon's Alexa, and Google Assistant.
6 Chatbots
Chatbots are an NLP customer service application example.
They can be used to:
• Respond to pre-determined FAQs
• Schedule meetings and appointments
• Book tickets
• Process and track orders
• Cross-sell and upsell
• Onboard new users or members
SENTIMENT ANALYSIS
Sentiment analysis is a technique in natural language processing (NLP)
that determines the emotional tone behind a body of text. This process
involves identifying whether the sentiment is positive, negative, or
neutral, often by analyzing word choice and context. It allows computers
to understand opinions, emotions, and attitudes expressed in written
language.
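A minimal lexicon-based sketch of the idea. The word lists are invented and tiny; real systems learn from labeled data and also model context (negation, sarcasm), which simple word counting ignores:

```python
# Hypothetical sentiment lexicons
POSITIVE = {"love", "great", "excellent", "good"}
NEGATIVE = {"hate", "bad", "terrible", "slow"}

def sentiment(text):
    """Count positive vs negative words and label the overall tone."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great course"))      # positive
print(sentiment("The app is slow and terrible"))  # negative
```

Even this crude version shows the output a business wants from review mining — but a sentence like "not bad at all" would fool it, which is why context-aware models took over.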