NLP Assignment Solution
• Machine Translation
• Question Answering Systems
• Information Retrieval and Extraction
• Text Categorization and Classification
• Speech Recognition and Text-to-Speech
• Sentiment Analysis
• Spelling and Grammar Checking
• Plagiarism Detection
• Dialogue Systems
• Language Learning and Teaching Tools
1. Machine Translation:
o Converts text from one language to another automatically.
o Example: Google Translate attempts to translate a daily newspaper from Japanese
to English.
o Challenges: Ambiguity, syntactic differences, cultural expressions.
2. Question Answering:
o Systems designed to retrieve specific answers to user queries.
o Example: A system answering "Who is the first Taiwanese president?" from a
large document corpus.
o Involves NLP tasks such as named entity recognition, parsing, and semantic
matching.
Question 2: Define the different levels of language analysis and discuss two of them in
detail with real-life examples.
1. Phonology
2. Morphology
3. Syntax
4. Semantics
5. Pragmatics
Detailed Explanation:
1. Morphology:
o Deals with the structure and formation of words from morphemes.
o Morpheme: Smallest meaningful unit (e.g., "dog" or the plural suffix "-s").
o Example:
dogs = dog (free morpheme) + -s (bound morpheme)
unhappiness = un- + happy + -ness
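The decomposition above can be sketched as a toy affix-stripping segmenter. This is a minimal illustration under stated assumptions, not a real morphological analyser: the lexicon, the prefix and suffix lists, and the single "y is written i" spelling rule are hypothetical, chosen only to reproduce the two examples.

```python
from typing import List, Optional

# Hypothetical toy resources, just large enough for the examples above
LEXICON = {"dog", "happy"}      # free morphemes (stems)
PREFIXES = ["un"]               # bound morphemes attached before the stem
SUFFIXES = ["ness", "s"]        # bound morphemes attached after the stem


def segment(word: str) -> Optional[List[str]]:
    """Split a word into morphemes by recursively stripping known affixes.

    Returns None when no analysis reaches a stem in LEXICON.
    """
    if word in LEXICON:
        return [word]
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            rest = segment(word[len(prefix):])
            if rest is not None:
                return [prefix + "-"] + rest
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            candidates = [stem]
            # Assumed spelling rule: stem-final "y" is written "i" before a suffix
            if stem.endswith("i"):
                candidates.append(stem[:-1] + "y")
            for candidate in candidates:
                rest = segment(candidate)
                if rest is not None:
                    return rest + ["-" + suffix]
    return None


for w in ["dogs", "unhappiness"]:
    print(w, "=", " + ".join(segment(w)))
# dogs = dog + -s
# unhappiness = un- + happy + -ness
```

Real analysers use finite-state transducers or learned models with far larger lexicons; the recursion here only shows how free and bound morphemes combine.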
2. Pragmatics:
o Concerned with how context influences the interpretation of meaning.
o Examples:
“Do you know the time?” is often a request, not a yes/no question.
“We gave the monkeys the bananas because they were hungry” → 'they'
refers to monkeys;
“...because they were overripe” → 'they' refers to bananas.
Frequency Distributions:
Graphical Representations:
Summary Statistics:
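The three headings above can be illustrated on a tokenised text. This is a minimal sketch assuming a hypothetical toy sentence and naive whitespace tokenisation; a text histogram stands in for a graphical plot so that no plotting library is needed.

```python
import statistics
from collections import Counter

# Hypothetical toy text; any tokenised corpus could be used instead
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.split()  # naive whitespace tokenisation


def describe(tokens):
    """Return the frequency distribution and some summary statistics."""
    freq = Counter(tokens)            # frequency distribution: word -> count
    counts = list(freq.values())
    stats = {
        "tokens": len(tokens),        # total running words
        "types": len(freq),           # distinct words
        "type_token_ratio": len(freq) / len(tokens),
        "mean_freq": statistics.mean(counts),
        "median_freq": statistics.median(counts),
        "max_freq": max(counts),
    }
    return freq, stats


def text_histogram(freq, top=5):
    """Crude horizontal bar chart: one '#' per occurrence."""
    return "\n".join(f"{word:<6}{'#' * count}" for word, count in freq.most_common(top))


freq, stats = describe(tokens)
print(freq.most_common(3))  # [('the', 4), ('sat', 2), ('on', 2)]
print(text_histogram(freq, top=3))
print(stats)
```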
Question 4: Differentiate between semantics and syntax with reference to the levels of
language analysis.
Question 6: Discuss the text classification method using supervised machine learning with
the help of mathematical expression.
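Under the naive Bayes assumption that the features $f_1, \dots, f_n$ of a document $d$ are conditionally independent given the class $c$, the likelihood factorizes, and the predicted class is the one with the highest prior times likelihood:
$$P(d \mid c) = \prod_{i=1}^{n} P(f_i \mid c) \quad\Rightarrow\quad \hat{c}_{NB} = \arg\max_{c \in C} P(c) \prod_{i=1}^{n} P(f_i \mid c)$$
To avoid underflow, compute in log space; because log is monotonic, the argmax is unchanged:
$$\hat{c}_{NB} = \arg\max_{c \in C} \left[ \log P(c) + \sum_{i=1}^{n} \log P(f_i \mid c) \right]$$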
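The underflow problem that motivates log space is easy to reproduce. In this sketch the 400-word document, the equal priors, and the per-feature likelihoods of 0.002 and 0.001 are hypothetical values: both direct products underflow to 0.0, so the argmax cannot tell the classes apart, while the log-space scores stay finite and still rank them correctly.

```python
import math


def direct_score(prior, likelihoods):
    """P(c) * prod_i P(f_i | c), computed naively with floats."""
    score = prior
    for p in likelihoods:
        score *= p
    return score


def log_score(prior, likelihoods):
    """log P(c) + sum_i log P(f_i | c)."""
    return math.log(prior) + sum(math.log(p) for p in likelihoods)


# Hypothetical setting: a 400-word document and two equally likely classes
classes = {
    "A": (0.5, [0.002] * 400),
    "B": (0.5, [0.001] * 400),
}

direct = {c: direct_score(*v) for c, v in classes.items()}
logs = {c: log_score(*v) for c, v in classes.items()}

print(direct)                   # {'A': 0.0, 'B': 0.0}  (both underflow)
print(max(logs, key=logs.get))  # A
```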
• Sample Space (three coin tosses):
$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\} \;\Rightarrow\; |\Omega| = 8$$
• Uniform Distribution (for an event A containing 3 outcomes):
$$P(A) = \frac{|A|}{|\Omega|} = \frac{3}{8} = 0.375$$
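The sample space and the uniform probability can be enumerated directly. The text does not say which event A is; this sketch assumes, purely for illustration, that A is "exactly two heads", which has |A| = 3 outcomes.

```python
from fractions import Fraction
from itertools import product

# Sample space for three coin tosses: every length-3 sequence over {H, T}
omega = ["".join(seq) for seq in product("HT", repeat=3)]


def uniform_prob(event, sample_space):
    """P(A) = |A| / |Omega| when every outcome is equally likely."""
    return Fraction(len(event), len(sample_space))


# Assumed event for illustration only: exactly two heads
A = [outcome for outcome in omega if outcome.count("H") == 2]

print(omega)  # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(A)      # ['HHT', 'HTH', 'THH']
print(float(uniform_prob(A, omega)))  # 0.375
```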
Category       Documents
Positive (+)   Excellent product
               Affordable and reliable
               Very satisfied with the purchase
               Highly recommended
Negative (−)   Very disappointed
               Not worthy
$$V = \{\text{excellent, product, affordable, and, reliable, very, satisfied, with, the, purchase, highly, recommended, disappointed, not, worthy}\}, \quad |V| = 15$$
• Total documents = 6
• Positive prior: $P(+) = \frac{4}{6} \approx 0.667$
• Negative prior: $P(-) = \frac{2}{6} \approx 0.333$
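For the Positive Class:
Likelihoods use add-one (Laplace) smoothing, $P(w \mid c) = \frac{\text{count}(w, c) + 1}{N_c + |V|}$, where $N_c$ is the number of tokens in class $c$. The positive documents contain 12 tokens, so the denominator is $12 + |V| = 12 + 15 = 27$. The test word "quality" never occurs in the training data, so it receives a count of 0.
$$\text{Score}(+) = P(+) \times P(\text{disappointed} \mid +) \times P(\text{quality} \mid +) \times P(\text{not} \mid +) \times P(\text{recommended} \mid +)$$
$$= 0.667 \times \frac{1}{27} \times \frac{1}{27} \times \frac{1}{27} \times \frac{2}{27} = 0.667 \times \frac{2}{27^4} = 0.667 \times \frac{2}{531441} \approx 0.667 \times 3.76 \times 10^{-6} \approx 2.51 \times 10^{-6}$$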
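For the Negative Class:
The negative documents contain 4 tokens, so the smoothed denominator is $4 + |V| = 4 + 15 = 19$.
$$\text{Score}(-) = P(-) \times P(\text{disappointed} \mid -) \times P(\text{quality} \mid -) \times P(\text{not} \mid -) \times P(\text{recommended} \mid -)$$
$$= 0.333 \times \frac{2}{19} \times \frac{1}{19} \times \frac{2}{19} \times \frac{1}{19} = 0.333 \times \frac{4}{19^4} = 0.333 \times \frac{4}{130321} \approx 0.333 \times 3.07 \times 10^{-5} \approx 1.02 \times 10^{-5}$$
Since $\text{Score}(-) \approx 1.02 \times 10^{-5} > \text{Score}(+) \approx 2.51 \times 10^{-6}$, the classifier assigns the test document to the Negative (−) class.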
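The worked example can be checked with exact fractions. This sketch assumes lower-casing and whitespace tokenisation of the training table and uses the four test words from the scores above; it keeps the exact priors 4/6 and 2/6 instead of the rounded 0.667 and 0.333, which does not change the rounded results.

```python
from collections import Counter
from fractions import Fraction

# Training data from the table above (lower-cased, whitespace tokenised)
TRAIN = [
    ("excellent product", "+"),
    ("affordable and reliable", "+"),
    ("very satisfied with the purchase", "+"),
    ("highly recommended", "+"),
    ("very disappointed", "-"),
    ("not worthy", "-"),
]
# Test-document words used in the scores above
TEST_WORDS = ["disappointed", "quality", "not", "recommended"]


def train_nb(train):
    """Return the vocabulary, class priors and per-class word counts."""
    vocab = {w for text, _ in train for w in text.split()}
    priors, counts = {}, {}
    for c in sorted({label for _, label in train}):
        docs = [text for text, label in train if label == c]
        priors[c] = Fraction(len(docs), len(train))
        counts[c] = Counter(w for d in docs for w in d.split())
    return vocab, priors, counts


def score(c, words, vocab, priors, counts):
    """P(c) * prod_i P(w_i | c) with add-one (Laplace) smoothing."""
    n_c = sum(counts[c].values())  # number of tokens in class c
    s = priors[c]
    for w in words:
        # Counter returns 0 for unseen words such as "quality"
        s *= Fraction(counts[c][w] + 1, n_c + len(vocab))
    return s


vocab, priors, counts = train_nb(TRAIN)
scores = {c: score(c, TEST_WORDS, vocab, priors, counts) for c in priors}
for c, s in scores.items():
    print(c, f"{float(s):.3g}")  # "+ 2.51e-06", then "- 1.02e-05"
print("prediction:", max(scores, key=scores.get))  # prediction: -
```

Exact arithmetic avoids underflow for this tiny example; for realistic vocabularies the log-space form of the decision rule is the practical choice.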