NLP Assignment Solution

Natural Language Processing (NLP) is an AI field that enables computers to analyze and generate human language, with applications including machine translation and question answering systems. Levels of language analysis in NLP include phonology, morphology, syntax, semantics, and pragmatics, with detailed discussions on morphology and pragmatics. The document also covers text classification using supervised machine learning and Naïve Bayes sentiment classification, illustrating key concepts with examples and mathematical expressions.

Uploaded by

Maymoon Irfan


Question 1: Define Natural Language Processing. What are the significant application areas of Natural Language Processing? Discuss any two application areas in detail.
Definition:
Natural Language Processing (NLP) is a field of artificial intelligence that focuses on enabling
computers to analyze, understand, and generate human language. It draws from computational
linguistics, cognitive science, psycholinguistics, and statistics to process language at various
levels (phonology, morphology, syntax, semantics, and pragmatics).

Significant Application Areas of NLP:

• Machine Translation
• Question Answering Systems
• Information Retrieval and Extraction
• Text Categorization and Classification
• Speech Recognition and Text-to-Speech
• Sentiment Analysis
• Spelling and Grammar Checking
• Plagiarism Detection
• Dialogue Systems
• Language Learning and Teaching Tools

Detailed Discussion of Two Applications:

1. Machine Translation:
o Converts text from one language to another automatically.
o Example: Google Translate translating a daily newspaper article from Japanese to
English.
o Challenges: Ambiguity, syntactic differences, cultural expressions.
2. Question Answering:
o Systems designed to retrieve specific answers to user queries.
o Example: A system answering "Who is the first Taiwanese president?" from a
large document corpus.
o Involves NLP tasks such as named entity recognition, parsing, and semantic
matching.

Question 2: Define the different levels of language analysis and discuss two of them in
detail with real-life examples.

Levels of Language Analysis in NLP:

1. Phonology
2. Morphology
3. Syntax
4. Semantics
5. Pragmatics
Detailed Explanation:

1. Morphology:
o Deals with the structure and formation of words from morphemes.
o Morpheme: Smallest meaningful unit (e.g., "dog" or the plural suffix "-s").
o Example:
 dogs = dog (free morpheme) + -s (bound morpheme)
 unhappiness = un- + happy + -ness
2. Pragmatics:
o Concerned with how context influences the interpretation of meaning.
o Examples:
 “Do you know the time?” is often a request, not a yes/no question.
 “We gave the monkeys the bananas because they were hungry” → 'they' refers to the monkeys;
“...because they were overripe” → 'they' refers to the bananas.

Question 3: Define the three types of representation in descriptive statistics.

Frequency Distributions:

• Tabular display showing how often each value appears in a dataset.

Graphical Representations:

• Visual tools such as bar charts, pie charts, and histograms.


• Help in quick understanding of patterns and distributions.

Summary Statistics:

• Condense data using key values:


o Mean: Arithmetic average.
o Median: Middle value.
o Mode: Most frequent value.
o Variance/Standard Deviation: Measure of spread.
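
The summary statistics listed above can be illustrated with Python's standard `statistics` module; the data values below are a hypothetical example, not from the document:

```python
import statistics

# Hypothetical example dataset
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)          # arithmetic average
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
variance = statistics.pvariance(data) # population variance (spread)
stdev = statistics.pstdev(data)       # population standard deviation

print(mean, median, mode, variance, stdev)
```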

Question 4: Differentiate between semantics and syntax with reference to the levels of
language analysis.

Feature          | Syntax                                  | Semantics
Focus            | Sentence structure                      | Meaning of words and sentences
Concerned with   | Grammar rules, sentence parsing         | Interpretation of phrases and sentences
Type of Analysis | Syntactic parsing (tree structures)     | Meaning representation
Example          | “Colorless green ideas sleep furiously” is syntactically valid | The same sentence is semantically meaningless

Question 5: Draw the systematic diagram where NLP fits in CS Taxonomy.

Question 6: Discuss the text classification method using supervised machine learning with
the help of mathematical expression.

Text Classification using Supervised Learning involves:

• Training Set: labeled documents (d₁, c₁), …, (dₘ, cₘ)
• Input: an unlabeled document d
• Output: the predicted class ĉ ∈ C

Naïve Bayes Classifier:

ĉ_MAP = argmax_{c ∈ C} P(c | d) = argmax_{c ∈ C} P(d | c) P(c)

Under the Bag-of-Words model and conditional independence:


P(d | c) = ∏_{i=1}^{n} P(fᵢ | c)  ⇒  ĉ_NB = argmax_{c ∈ C} P(c) ∏_{i=1}^{n} P(fᵢ | c)
To avoid underflow, compute in log space:
ĉ_NB = argmax_{c ∈ C} [ log P(c) + Σ_{i=1}^{n} log P(fᵢ | c) ]
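
The log-space decision rule can be sketched in a few lines of Python. The priors and likelihoods below are hypothetical, pre-estimated parameters; in practice they would be estimated from counts on the labeled training set:

```python
import math

# Hypothetical pre-estimated model parameters (not from the document)
priors = {"pos": 0.5, "neg": 0.5}
likelihoods = {
    "pos": {"good": 0.4, "bad": 0.1},
    "neg": {"good": 0.1, "bad": 0.4},
}

def classify(features):
    # c_hat = argmax_c [ log P(c) + sum_i log P(f_i | c) ]
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for f in features:
            score += math.log(likelihoods[c][f])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(classify(["good", "good", "bad"]))
```

Summing logs instead of multiplying probabilities avoids the floating-point underflow that long documents would otherwise cause.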

Question 7: A fair coin is tossed 3 times. What is the likelihood of 2 heads?

• Sample Space:
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} → |Ω| = 8

• Event A: outcomes with exactly 2 heads
A = {HHT, HTH, THH} → |A| = 3

• Uniform Distribution:
P(A) = |A| / |Ω| = 3/8 = 0.375
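
The computation above can be verified by enumerating the sample space directly:

```python
from itertools import product

omega = list(product("HT", repeat=3))              # all 2^3 = 8 outcomes
event_a = [o for o in omega if o.count("H") == 2]  # exactly 2 heads

p = len(event_a) / len(omega)
print(len(omega), len(event_a), p)  # 8 3 0.375
```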

Question #8: Naïve Bayes Sentiment Classification

• Positive (+): "Excellent product", "Affordable and Reliable", "Very satisfied with the purchase", "Highly recommended"
• Negative (−): "Very disappointed", "Not worthy"

Step 1: Preprocess and Build the Vocabulary

1. Normalization: Convert all text to lowercase and tokenize the words.


o Positive Documents (after normalization):
 Doc1: "excellent product" → excellent, product
 Doc2: "affordable and reliable" → affordable, and, reliable
 Doc3: "very satisfied with the purchase" → very, satisfied, with,
the, purchase
 Doc4: "highly recommended" → highly, recommended
o Negative Documents:
 Doc5: "very disappointed" → very, disappointed
 Doc6: "not worthy" → not, worthy
Vocabulary V: The union of unique words from all documents:

V = {excellent, product, affordable, and, reliable, very, satisfied, with, the, purchase, highly, recommended, disappointed, not, worthy}

• Vocabulary size: |V| = 15
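
Step 1 can be sketched directly from the six training documents listed above:

```python
# Normalize (lowercase), tokenize on whitespace, and build the vocabulary
docs = [
    "Excellent product",
    "Affordable and Reliable",
    "Very satisfied with the purchase",
    "Highly recommended",
    "Very disappointed",
    "Not worthy",
]

tokens = [w for d in docs for w in d.lower().split()]
vocab = sorted(set(tokens))
print(len(vocab))  # 15
```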

Step 2: Compute Class Priors

• Total documents = 6
• Positive Prior: P(+) = 4/6 ≈ 0.667
• Negative Prior: P(−) = 2/6 ≈ 0.333

Step 3: Preprocess the Test Document

• Test Document: “Disappointed quality, not recommended”


• Normalized (lowercase, tokenized):
Test Words={disappointed, quality, not, recommended}

Step 4: Calculate Likelihoods with Laplace (Add-One) Smoothing


For any word w given class c:

P(w | c) = (count(w, c) + 1) / (total words in c + |V|)

Step 5: Compute the Posterior for Each Class


The Naïve Bayes classification rule in product space is:

ĉ = argmax_{c ∈ {+, −}} P(c) ∏_{w ∈ Test Document} P(w | c)

For the Positive Class:

Score(+) = P(+) × P(disappointed | +) × P(quality | +) × P(not | +) × P(recommended | +)
= 0.667 × (1/27) × (1/27) × (1/27) × (2/27)
= 0.667 × 2/27⁴
= 0.667 × 2/531441
≈ 0.667 × 3.76 × 10⁻⁶
≈ 2.51 × 10⁻⁶

For the Negative Class:

Score(−) = P(−) × P(disappointed | −) × P(quality | −) × P(not | −) × P(recommended | −)
= 0.333 × (2/19) × (1/19) × (2/19) × (1/19)
= 0.333 × 4/19⁴
= 0.333 × 4/130321
≈ 0.333 × 3.07 × 10⁻⁵
≈ 1.02 × 10⁻⁵

Step 6: Compare the Scores

• Positive Class Score: ≈ 2.51 × 10⁻⁶
• Negative Class Score: ≈ 1.02 × 10⁻⁵

Since 1.02 × 10⁻⁵ > 2.51 × 10⁻⁶, the Naïve Bayes model predicts that the test document has
negative sentiment.
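
The six steps above can be reproduced end to end in a short script. This is a sketch of the worked example, using the training documents from the table and add-one smoothing as defined in Step 4:

```python
from collections import Counter

# Training documents by class, from the table in Question 8
train = {
    "+": ["Excellent product",
          "Affordable and Reliable",
          "Very satisfied with the purchase",
          "Highly recommended"],
    "-": ["Very disappointed",
          "Not worthy"],
}

# Step 1: normalize, tokenize, count words per class; build vocabulary
counts = {c: Counter(w for doc in docs for w in doc.lower().split())
          for c, docs in train.items()}
vocab = set().union(*counts.values())
total_docs = sum(len(docs) for docs in train.values())

def score(test_words, c):
    # Step 2: class prior P(c)
    s = len(train[c]) / total_docs
    # Steps 4-5: multiply add-one-smoothed likelihoods P(w | c)
    n_c = sum(counts[c].values())  # total words in class c
    for w in test_words:
        s *= (counts[c][w] + 1) / (n_c + len(vocab))
    return s

# Step 3: preprocess the test document
test = "disappointed quality not recommended".split()

# Step 6: compare the class scores
pos, neg = score(test, "+"), score(test, "-")
print(pos, neg, "+" if pos > neg else "-")
```

Running this recovers the hand-computed values: roughly 2.51 × 10⁻⁶ for the positive class, 1.02 × 10⁻⁵ for the negative class, so the prediction is negative.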
