Module 1.1

Lecture Notes for Natural Language Processing. Topics: a brief history of natural language processing; language challenges; applications; classical vs. statistical vs. deep learning-based approaches; basic concepts in linguistic data structure (morphology, syntax, semantics, pragmatics); tokenized text and pattern matching; recognizing names; stemming; tagging.


Brief History of NLP

o 1950s: Alan Turing proposed the Turing Test to evaluate a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.
o 1960s: Development of early NLP systems such as ELIZA, a computer program by Joseph Weizenbaum that simulated conversation.
o 1970s-1980s: Introduction of rule-based systems like SHRDLU, drawing on formal grammar theory such as the Chomsky hierarchy.
o 1990s: Statistical approaches began to dominate NLP, utilizing probabilistic models to handle large corpora of text data.
o 2000s: The rise of machine learning techniques led to significant advancements in NLP, such as statistical machine translation and large-scale text classification.
o 2010s-Present: Development of powerful deep learning models like Word2Vec, GloVe, BERT, and GPT, which have revolutionized NLP tasks such as language translation, sentiment analysis, and text generation.

Language Challenges in NLP

o Ambiguity: Words and sentences can have multiple meanings.
 Example: "The farmer went to the bank." (Is "bank" referring to the side of a river or a financial institution?)
o Context: Understanding the context is crucial for accurate interpretation.
 Example: "He banked the plane" vs. "He went to the bank."
o Sarcasm and Irony: Detecting sarcasm and irony can be challenging.
 Example: "Oh, great! Another homework assignment."
o Diverse Syntax and Grammar: Different languages have different syntax and grammar rules.
 Example: English follows Subject-Verb-Object (SVO) order ("She eats an apple"), while Japanese follows Subject-Object-Verb (SOV) order ("Kanojo wa ringo o taberu", literally "She an apple eats").
o Idioms and Phrases: Recognizing and interpreting idiomatic expressions.
 Example: "Kick the bucket" meaning "to die."

Applications of NLP

o Machine Translation: Translating text from one language to another.
 Example: Google Translate translating "Hello, world!" into Spanish as "¡Hola, mundo!"
o Sentiment Analysis: Determining the sentiment (positive, negative, neutral) of a text.
 Example: Analyzing product reviews to determine customer satisfaction.
o Chatbots: Automated systems that interact with users via text or speech.
 Example: Customer support chatbots like those used by banks or online retailers.
o Information Retrieval: Extracting relevant information from large datasets.
 Example: Search engines like Google retrieving relevant web pages based on user queries.
o Speech Recognition: Converting spoken language into text.
 Example: Voice assistants like Siri, Alexa, and Google Assistant.

Classical vs. Statistical vs. Deep Learning-based NLP

 Classical NLP:
o Rule-based Approaches: Utilize hand-crafted rules to process language.
 Example: Parsing sentences using grammar rules.
o Manual Feature Engineering: Involves defining specific linguistic features for analysis.
 Example: Identifying parts of speech (POS) using predefined rules.

 Statistical NLP:
o Probabilistic Models: Use statistical methods to model and predict language patterns.
 Example: Hidden Markov Models (HMMs) for POS tagging.
o Large Amounts of Data: Relies on extensive corpora to learn patterns.
 Example: Using n-grams to predict the next word in a sentence.
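The n-gram idea above can be sketched in plain Python: count which word follows which, then predict the most frequent follower. The tiny corpus below is invented purely for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count word pairs so we can predict the most likely next word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent word seen after `word`, or None if unseen."""
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

# toy corpus (made up for this sketch)
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

Real statistical language models use far larger corpora, longer n-grams, and smoothing for unseen word pairs, but the counting principle is the same.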

 Deep Learning-based NLP:
o Neural Networks: Employ deep neural networks to learn from raw text data.
 Example: Recurrent Neural Networks (RNNs) for sequence prediction.
o End-to-End Learning: Models can learn to perform tasks directly from data without explicit feature engineering.
 Example: Transformers like BERT and GPT for various NLP tasks.
Basic Concepts in Linguistic Data Structure

 Morphology:
o Study of word structure and formation.
o Example: Analyzing the root, prefix, and suffix of words like "unhappiness" (un- + happy + -ness).

 Syntax:
o Rules that govern sentence structure.
o Example: English follows Subject-Verb-Object (SVO) order: "She (S) loves (V) music (O)."

 Semantics:
o Meaning of words and sentences.
o Example: Understanding that "bark" can refer to the sound a dog makes or the outer covering of a tree.

 Pragmatics:
o Contextual use of language.
o Example: Interpreting "Can you pass the salt?" as a request rather than a question about ability.
Tokenized Text and Pattern Matching

o Tokenization: Splitting text into individual tokens (words or sentences).
o Example:
 Input Text: "Natural Language Processing is fascinating."
 Tokenized Text: ['Natural', 'Language', 'Processing', 'is', 'fascinating', '.']
 Explanation: The sentence is divided into individual words and punctuation marks.
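A minimal word tokenizer can be sketched with Python's standard re module (libraries such as NLTK and spaCy provide more robust tokenizers that handle contractions, abbreviations, and so on):

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    # \w+ matches runs of letters/digits; [^\w\s] matches one punctuation char
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Natural Language Processing is fascinating."))
# ['Natural', 'Language', 'Processing', 'is', 'fascinating', '.']
```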

o Pattern Matching: Identifying patterns within tokenized text using regular expressions.
o Example:
 Input Text: "The quick brown fox jumps over the lazy dog."
 Pattern: Words with exactly 4 letters.
 Matched Words: ['over', 'lazy']
 Explanation: The pattern identifies words that are exactly four letters long within the sentence ("quick" has five letters, so it does not match).
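The four-letter-word pattern corresponds directly to a regular expression, where \b marks a word boundary:

```python
import re

text = "The quick brown fox jumps over the lazy dog."
# \b...\b ensures we match whole words; \w{4} requires exactly four word characters
four_letter_words = re.findall(r"\b\w{4}\b", text)
print(four_letter_words)  # ['over', 'lazy']
```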

Recognizing Names

o Named Entity Recognition (NER): Identifies proper nouns and classifies them as people, organizations, etc.
o Example:
 Input Text: "Barack Obama was the 44th President of the United States."
 Recognized Entities:
 'Barack Obama' as PERSON
 '44th President' as TITLE
 'United States' as GPE (Geopolitical Entity)
 Explanation: The NER system identifies and categorizes names and titles within the text.
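Production NER systems (for example, spaCy's trained pipelines) use statistical models to find and classify entities. As a toy illustration of the candidate-finding step only, a capitalization heuristic can pick out runs of capitalized words; note that it cannot classify entities and mishandles many cases (sentence-initial words, lowercase names, and here it splits "44th President" and picks up the bare word "President"):

```python
import re

def candidate_names(text):
    """Crude sketch: runs of capitalized words are treated as name candidates.
    Real NER uses trained models, not this heuristic."""
    return re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)

print(candidate_names("Barack Obama was the 44th President of the United States."))
# ['Barack Obama', 'President', 'United States']
```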

Stemming and Lemmatization

 Stemming:
o Reduces words to their base form by removing prefixes or suffixes.
o Example:
 Input Words: ['running', 'jumps', 'easily', 'fairly']
 Stemmed Words: ['run', 'jump', 'easili', 'fairli']
 Explanation: The words are reduced to their root forms, which may not always be meaningful.
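The stems above are what the Porter stemmer (available as nltk.stem.PorterStemmer) produces. Its core idea, stripping suffixes by rule, can be sketched in a few lines of plain Python; this toy version has only a handful of rules, so its outputs differ slightly from Porter's:

```python
def simple_stem(word):
    """Toy suffix-stripping stemmer (a sketch, not the Porter algorithm)."""
    for suffix in ("ing", "ly", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # undo consonant doubling left by -ing removal, e.g. "runn" -> "run"
    if len(word) > 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

print([simple_stem(w) for w in ["running", "jumps", "easily", "fairly"]])
# ['run', 'jump', 'easi', 'fair']
```

Like real stemmers, it can produce non-words ("easi"): stemming trades linguistic correctness for speed and simplicity.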

 Lemmatization:
o Reduces words to their meaningful base form using vocabulary and morphological analysis.
o Example:
 Input Words: ['running', 'jumps', 'easily', 'fairly']
 Lemmatized Words: ['run', 'jump', 'easy', 'fair']
 Explanation: The words are reduced to their base or dictionary forms, ensuring they remain meaningful.
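Because lemmatization relies on a vocabulary (NLTK's WordNetLemmatizer, for instance, consults WordNet), its simplest form is a dictionary lookup. The tiny table below is hand-made for this one example; a real lemmatizer combines a full dictionary with morphological rules:

```python
# tiny hand-made lemma table (illustration only; a real lemmatizer
# uses a full dictionary such as WordNet plus morphological analysis)
LEMMAS = {
    "running": "run",
    "jumps": "jump",
    "easily": "easy",
    "fairly": "fair",
}

def lemmatize(word):
    """Look the word up; fall back to the word itself if unknown."""
    return LEMMAS.get(word.lower(), word)

print([lemmatize(w) for w in ["running", "jumps", "easily", "fairly"]])
# ['run', 'jump', 'easy', 'fair']
```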

Tagging Parts of Speech

o POS Tagging: Assigns part-of-speech tags to each word in a sentence.
o Example:
 Input Text: "The quick brown fox jumps over the lazy dog."
 POS Tags: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
 Explanation: Each word is tagged with its corresponding part of speech, such as determiner (DT), adjective (JJ), noun (NN), verb (VBZ), and preposition (IN).

Constituent Structure

o Constituent Structure Analysis: Breaks down sentences into their sub-parts (constituents).
o Example:
 Input Text: "The quick brown fox jumped over the lazy dog."
 Constituent Structure:
  Sentence (S)
   Noun Phrase (NP): "The quick brown fox"
    Determiner (DT): "The"
    Adjectives (JJ): "quick", "brown"
    Noun (NN): "fox"
   Verb Phrase (VP): "jumped over the lazy dog"
    Verb (VBD): "jumped"
    Prepositional Phrase (PP): "over the lazy dog"
     Preposition (IN): "over"
     Noun Phrase (NP): "the lazy dog"
      Determiner (DT): "the"
      Adjective (JJ): "lazy"
      Noun (NN): "dog"
 Explanation: The sentence is parsed into a hierarchical structure, showing the relationships between words and phrases.
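The parse tree above maps naturally onto nested tuples in Python (NLTK's nltk.Tree class offers the same idea with parsing and drawing support). Reading the leaves left to right recovers the original sentence:

```python
# (label, child, child, ...) nested tuples; leaves are (tag, word) pairs
TREE = ("S",
        ("NP", ("DT", "The"), ("JJ", "quick"), ("JJ", "brown"), ("NN", "fox")),
        ("VP", ("VBD", "jumped"),
               ("PP", ("IN", "over"),
                      ("NP", ("DT", "the"), ("JJ", "lazy"), ("NN", "dog")))))

def leaves(node):
    """Collect the words at the fringe of the tree, left to right."""
    if len(node) == 2 and isinstance(node[1], str):  # (tag, word) leaf
        return [node[1]]
    words = []
    for child in node[1:]:  # node[0] is the label
        words.extend(leaves(child))
    return words

print(" ".join(leaves(TREE)))
# The quick brown fox jumped over the lazy dog
```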
