NLP Chapter - 1 Sheet
• Email platforms:
o Spam classification, priority inbox, calendar event extraction, auto-complete (e.g., Gmail, Outlook).
• Voice-based assistants:
o Query understanding, query expansion, question answering, information retrieval (e.g., Google, Bing).
• Machine translation:
Other applications:
• E-commerce platforms:
• IBM Watson: An AI system built using NLP techniques that competed on the quiz show "Jeopardy!" and won the
$1 million prize, defeating human champions.
• Educational tools:
2. Explain the following NLP tasks
• Language modeling:
• Text classification:
o Categorizing text into predefined classes (e.g., spam detection, sentiment analysis).
• Information extraction:
o Extracting structured information from unstructured text (e.g., extracting names or events from emails).
• Information retrieval:
• Conversational agent:
o Building systems that can converse with humans (e.g., chatbots, voice assistants).
• Text summarization:
• Question answering:
• Machine translation:
• Topic modeling:
3. What are the building blocks of language and their applications?
• Phonemes:
• Syntax:
• Context:
• Ambiguity: Words and sentences can have multiple meanings depending on context.
• Creativity: Language includes creative elements like poetry, metaphors, and idioms.
• Complexity of human language: Syntax, semantics, and pragmatics make language processing difficult for
machines.
5. How are NLP, ML, and DL related?
• AI: Broad field aiming to build systems that perform tasks requiring human intelligence.
• ML: Subfield of AI in which systems learn patterns from data without being explicitly programmed.
• DL: Subfield of ML that uses multi-layer neural networks to model complex patterns.
• NLP: Subfield of AI focused on language processing, often using ML and DL techniques.
• Rule-Based Systems: Early NLP systems relied on handcrafted rules and resources like dictionaries and
thesauruses.
• Lexicon-Based Sentiment Analysis: Uses counts of positive and negative words to deduce sentiment.
• Knowledge Bases: WordNet (synonyms, hyponyms, meronyms), Open Mind Common Sense.
• Regex and CFG: Regular expressions and context-free grammars for text analysis.
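The lexicon-based and regex bullets above can be illustrated with a minimal sketch. The word lists POSITIVE and NEGATIVE and the function names are hypothetical toy choices for illustration, not a real sentiment lexicon; a practical system would use a curated resource and handle negation.

```python
import re

# Hypothetical toy lexicons (assumption); real systems use curated word lists.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}

def tokenize(text):
    """Lowercase the text and pull out word tokens with a regular expression."""
    return re.findall(r"[a-z']+", text.lower())

def lexicon_sentiment(text):
    """Label text by comparing counts of positive and negative lexicon words."""
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(lexicon_sentiment("I love this great phone"))
print(lexicon_sentiment("Terrible battery, bad screen"))
```

Because the method only counts words, a sentence such as "not good" would still be counted as positive here, which is one reason such rule-based systems were later supplemented by ML approaches.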
7. Briefly explain the Naive Bayes, Support Vector Machine, Hidden Markov Model, and Conditional Random
Fields approaches
• Naive Bayes: A probabilistic classifier based on Bayes' theorem. Assumes the features are conditionally
independent given the class. Used in text classification.
• Support Vector Machine (SVM): A classifier that finds the decision boundary (hyperplane) with the maximum
margin between classes. Used in text classification.
• Hidden Markov Model (HMM): A statistical model for sequential data that assumes the observations are generated
by unobserved (hidden) states. Used in part-of-speech tagging.
• Conditional Random Fields (CRF): A discriminative sequence classifier that labels each element while taking
neighboring context into account. Used in named entity recognition and part-of-speech tagging.
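As an illustration of the first approach, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. The function names and the four-document training set TRAIN are made up for illustration; this is not a substitute for a library implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs is a list of (tokens, label) pairs; returns the counts needed for prediction."""
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(tokens, model):
    """Return the class with the highest log posterior under the independence assumption."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for t in tokens:
            if t not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities for unseen (word, class) pairs.
            score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data (assumption).
TRAIN = [
    ("win free prize now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting schedule for monday".split(), "ham"),
    ("project report attached".split(), "ham"),
]
model = train_naive_bayes(TRAIN)
print(predict("free prize".split(), model))
print(predict("project meeting".split(), model))
```

Summing log probabilities instead of multiplying raw probabilities is a common design choice that avoids numerical underflow on longer documents.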
8. What is the difference between an RNN and an LSTM network?
• RNN: Processes sequential data but struggles with long-term dependencies due to the vanishing gradient
problem.
• LSTM: A variant of RNN that uses memory cells to retain long-term context, making it more effective for longer
sequences.
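The vanishing gradient problem can be made concrete with a small numeric sketch. It assumes a one-unit RNN with the update h_t = tanh(w * h_(t-1) + u * x_t); the weights and inputs below are arbitrary illustrative values. By the chain rule, the gradient of the last hidden state with respect to the first is a product of one factor per time step, and each factor here has magnitude below 1.

```python
import math

def gradient_through_time(w, u, inputs, h0=0.0):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w*h_{t-1} + u*x_t)."""
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + u * x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - tanh(z)^2) = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return abs(grad)

# Arbitrary illustrative values (assumption): w = 0.5, u = 1.0, constant input 0.1.
short_seq = gradient_through_time(0.5, 1.0, [0.1] * 5)
long_seq = gradient_through_time(0.5, 1.0, [0.1] * 50)
print(short_seq, long_seq)
```

The gradient for the 50-step sequence is many orders of magnitude smaller than for the 5-step one, so early inputs barely influence learning. The LSTM's memory cell is updated additively through gates, which is what lets it carry information across longer spans.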
• CNNs process text by converting each word into a word vector of size d and stacking the vectors into a 2D
matrix of dimension n × d.
o n: number of words in the sentence, d: size of the word vectors.
• This matrix can be treated like an image and modeled by a CNN.
• Convolution filters capture local patterns (e.g., n-grams).
• Pooling layers condense the features, and fully connected layers classify the text.
• This makes CNNs effective for tasks like sentiment analysis and text classification.
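The convolution and pooling steps above can be sketched in plain Python. The 4 × 2 sentence matrix and the single bigram filter are made-up toy values; a real model learns many filters and uses a deep learning library rather than nested loops.

```python
def conv_over_words(sentence_matrix, kernel):
    """Slide a k x d kernel over an n x d sentence matrix.

    Returns n - k + 1 ReLU activations, one per window of k consecutive words.
    """
    n, k = len(sentence_matrix), len(kernel)
    features = []
    for start in range(n - k + 1):
        total = 0.0
        for i in range(k):
            # Dot product between one word vector and the matching kernel row.
            for a, b in zip(sentence_matrix[start + i], kernel[i]):
                total += a * b
        features.append(max(0.0, total))  # ReLU non-linearity
    return features

def max_pool(features):
    """Max-over-time pooling: keep the strongest activation of the filter."""
    return max(features)

# Toy example (assumption): n = 4 words, d = 2 dimensions, one filter spanning k = 2 words.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]

features = conv_over_words(sentence, bigram_filter)
print(features)            # [2.0, 1.0, 1.0]
print(max_pool(features))  # 2.0
```

Each filter spans the full vector width d and slides only along the word axis, which is why a filter of height k behaves like a detector for k-grams. The pooled values from all filters would then feed the fully connected classification layer.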
• Applying knowledge learned from one task to a different but related task.
• Example: Pre-training a large model (e.g., BERT) on a massive dataset, then fine-tuning it for specific NLP tasks
like text classification or question answering.
• Hidden Layer (Encoder): Compresses the input into a dense vector representation.
• Output Layer (Decoder): Reconstructs the input from the compressed representation.
12. List the key reasons that make DL not suitable for all NLP tasks
• Natural language understanding: Analyzes text for sentiment, entities, and intent.
• Dialog management: Determines the user’s intent and decides the next action.