The document provides a comprehensive overview of Natural Language Processing (NLP), covering fundamental concepts, intermediate techniques, and advanced methodologies. Key topics include tokenization, machine translation, language models like BERT and GPT, and various learning approaches such as transfer learning and self-supervised learning. It serves as a foundational guide for understanding the components and applications of NLP.

Basics of NLP (Questions 1-30)

1. What is NLP?

Answer: NLP (Natural Language Processing) is a subfield of artificial intelligence that focuses on the
interaction between computers and humans using natural language. It involves tasks like text
analysis, language generation, and understanding.

2. What are the key components of NLP?

Answer: Key components include:

• Tokenization

• Part-of-Speech (POS) Tagging

• Named Entity Recognition (NER)

• Syntax and Parsing

• Sentiment Analysis

• Machine Translation

3. What is tokenization?

Answer: Tokenization is the process of splitting text into smaller units called tokens, which can be
words, phrases, or sentences.
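A minimal sketch of word-level tokenization using Python's re module; real tokenizers such as those in NLTK or spaCy handle many more edge cases:

```python
import re

def tokenize(text):
    # Grab runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("NLP is fun, isn't it?"))
# ['NLP', 'is', 'fun', ',', 'isn', "'", 't', 'it', '?']
```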

4. What is stemming?

Answer: Stemming is the process of reducing words to their base or root form (e.g., "running" →
"run").

5. What is lemmatization?

Answer: Lemmatization reduces words to their base or dictionary form (lemma), considering the
context (e.g., "better" → "good").
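A small sketch with NLTK's WordNet lemmatizer; it assumes nltk is installed, the WordNet corpus is downloaded on first use, and some NLTK versions also require the "omw-1.4" resource:

```python
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # one-time corpus download
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("better", pos="a"))   # 'good' (as an adjective)
print(lemmatizer.lemmatize("running", pos="v"))  # 'run' (as a verb)
```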

6. What is stop word removal?

Answer: Stop words are common words (e.g., "the", "is", "and") that are often removed to focus on
meaningful words in text analysis.
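A minimal sketch using a hand-rolled stop list; in practice, libraries such as NLTK ship curated lists (nltk.corpus.stopwords):

```python
# Tiny illustrative stop list; real lists are much longer.
STOP_WORDS = {"the", "is", "and", "a", "an", "of", "to", "in"}

tokens = ["the", "cat", "is", "on", "the", "mat"]
content = [t for t in tokens if t not in STOP_WORDS]
print(content)  # ['cat', 'on', 'mat']
```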

7. What is Part-of-Speech (POS) tagging?

Answer: POS tagging assigns grammatical labels (e.g., noun, verb, adjective) to each word in a
sentence.
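For example, with NLTK's built-in tagger (resource names can vary slightly across NLTK versions):

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("The quick brown fox jumps")
print(nltk.pos_tag(tokens))
# Penn Treebank tags, e.g. [('The', 'DT'), ('quick', 'JJ'), ..., ('jumps', 'VBZ')]
```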

8. What is Named Entity Recognition (NER)?

Answer: NER identifies and classifies entities in text into categories like names, dates, organizations,
etc.
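A short sketch with spaCy, assuming the small English model has been installed via `python -m spacy download en_core_web_sm`:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple was founded by Steve Jobs in California in 1976.")
for ent in doc.ents:
    print(ent.text, ent.label_)
# e.g. Apple ORG, Steve Jobs PERSON, California GPE, 1976 DATE
```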

9. What is a corpus?

Answer: A corpus is a large and structured collection of texts used for linguistic analysis and training
NLP models.
10. What is the difference between syntax and semantics?

Answer: Syntax refers to the structure of sentences, while semantics deals with the meaning of
words and sentences.

11. What is a bag-of-words model?

Answer: A bag-of-words model represents text as an unordered collection of words, ignoring grammar and word order but keeping track of word frequency.
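For example, scikit-learn's CountVectorizer builds exactly this representation:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the cat sat on the mat"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)         # sparse document-term matrix
print(vectorizer.get_feature_names_out())  # ['cat' 'mat' 'on' 'sat' 'the']
print(X.toarray())                         # [[1 0 0 1 1], [1 1 1 1 2]]
```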

12. What is TF-IDF?

Answer: TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure used to evaluate the importance of a word in a document relative to a corpus.
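A sketch of the textbook formulation, tf-idf = tf × log(N/df); libraries such as scikit-learn apply smoothing, so their exact values differ:

```python
import math

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]

def tf_idf(term, doc, corpus):
    tf = doc.count(term) / len(doc)           # term frequency in this document
    df = sum(1 for d in corpus if term in d)  # number of documents containing term
    idf = math.log(len(corpus) / df)          # inverse document frequency
    return tf * idf

print(tf_idf("cat", docs[0], docs))  # ~0.135: "cat" is somewhat distinctive
print(tf_idf("the", docs[0], docs))  # 0.0: "the" appears in every document
```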

13. What is word embedding?

Answer: Word embedding is a dense vector representation of words that captures semantic
relationships (e.g., Word2Vec, GloVe).

14. What is Word2Vec?

Answer: Word2Vec is a neural network-based model that learns word embeddings either by predicting a word from its surrounding context (CBOW) or by predicting the context from a word (Skip-gram).
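A minimal Gensim sketch; the corpus here is tiny, so the learned neighbours are noisy. `sg=1` selects Skip-gram, `sg=0` (the default) selects CBOW:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
print(model.wv["cat"].shape)         # (50,) dense vector
print(model.wv.most_similar("cat"))  # nearest neighbours in embedding space
```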

15. What is GloVe?

Answer: GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm for
obtaining word embeddings by factorizing a word co-occurrence matrix.

16. What is a language model?

Answer: A language model predicts the probability of a sequence of words, often used in text
generation and speech recognition.

17. What is an n-gram?

Answer: An n-gram is a contiguous sequence of n items (words or characters) from a given text.
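A one-function sketch that slides a window of length n over a token list:

```python
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))  # bigrams:  ('the', 'cat'), ('cat', 'sat'), ...
print(ngrams(tokens, 3))  # trigrams: ('the', 'cat', 'sat'), ...
```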

18. What is sentiment analysis?

Answer: Sentiment analysis determines the emotional tone or opinion expressed in text (e.g.,
positive, negative, neutral).

19. What is text normalization?

Answer: Text normalization is the process of transforming text into a consistent format (e.g.,
lowercasing, removing punctuation).
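A typical minimal pipeline; the exact steps are task-dependent:

```python
import re
import string

def normalize(text):
    text = text.lower()                                               # case folding
    text = text.translate(str.maketrans("", "", string.punctuation))  # strip punctuation
    return re.sub(r"\s+", " ", text).strip()                          # collapse whitespace

print(normalize("  Hello, World!!  NLP   rocks. "))  # 'hello world nlp rocks'
```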

20. What is the difference between rule-based and statistical NLP?

Answer: Rule-based NLP uses handcrafted linguistic rules, while statistical NLP relies on machine
learning and data-driven approaches.

21. What is a confusion matrix in NLP?

Answer: A confusion matrix is used to evaluate classification models by showing true positives, false
positives, true negatives, and false negatives.
22. What are precision and recall?

Answer: Precision measures the accuracy of positive predictions, while recall measures the fraction
of actual positives that were correctly identified.

23. What is F1-score?

Answer: F1-score is the harmonic mean of precision and recall, providing a balance between the
two.
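The three metrics from questions 21-23 can be computed directly from confusion-matrix counts; the numbers below are hypothetical:

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)  # how many predicted positives were right
    recall = tp / (tp + fn)     # how many actual positives were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
print(precision_recall_f1(40, 10, 20))  # (0.8, ~0.667, ~0.727)
```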

24. What is overfitting in NLP models?

Answer: Overfitting occurs when a model performs well on training data but poorly on unseen data
due to excessive complexity.

25. What is underfitting in NLP models?

Answer: Underfitting occurs when a model is too simple to capture patterns in the data, resulting in
poor performance on both training and test data.

26. What is cross-validation?

Answer: Cross-validation is a technique to evaluate model performance by partitioning data into multiple subsets and training/testing on different combinations.
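A sketch with scikit-learn on a toy sentiment dataset; the texts and labels are made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = ["great movie", "awful film", "loved it", "hated it",
         "brilliant acting", "terrible plot", "wonderful", "boring"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
scores = cross_val_score(model, texts, labels, cv=4)  # 4 train/test splits
print(scores, scores.mean())
```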

27. What is the difference between supervised and unsupervised learning in NLP?

Answer: Supervised learning uses labeled data, while unsupervised learning works with unlabeled
data to find patterns.

28. What is a chatbot?

Answer: A chatbot is an NLP application that simulates human conversation using text or voice.

29. What is machine translation?

Answer: Machine translation automatically translates text from one language to another (e.g.,
Google Translate).

30. What is text summarization?

Answer: Text summarization generates a concise summary of a longer text while retaining key
information.

Intermediate NLP (Questions 31-70)

31. What is sequence-to-sequence (Seq2Seq) modeling?

Answer: Seq2Seq is a framework for tasks like machine translation, where an input sequence is
mapped to an output sequence using encoder-decoder architectures.

32. What is the attention mechanism?

Answer: The attention mechanism allows models to focus on the most relevant parts of the input
sequence, improving performance in tasks like translation.
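The core computation is scaled dot-product attention, softmax(Q·K^T / sqrt(d_k))·V; a NumPy sketch:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights                     # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))  # 3 query positions, dimension 4
K = rng.standard_normal((5, 4))  # 5 key/value positions
V = rng.standard_normal((5, 4))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (3, 4) (3, 5); each row of w sums to 1
```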
33. What is the Transformer architecture?

Answer: The Transformer is a neural network architecture based on self-attention mechanisms, enabling parallel processing and state-of-the-art performance in NLP.

34. What is BERT?

Answer: BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model that uses bidirectional context for tasks like question answering and sentiment analysis.

35. What is GPT?

Answer: GPT (Generative Pre-trained Transformer) is a language model that uses autoregressive
transformers for text generation.

36. What is the difference between BERT and GPT?

Answer: BERT is bidirectional and focuses on understanding context, while GPT is unidirectional and
focuses on text generation.

37. What is transfer learning in NLP?

Answer: Transfer learning involves using pre-trained models (e.g., BERT, GPT) and fine-tuning them
for specific tasks.

38. What is fine-tuning in NLP?

Answer: Fine-tuning is the process of adapting a pre-trained model to a specific task by training it on
a smaller, task-specific dataset.
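With the Hugging Face Transformers library, the usual first step is to load a pre-trained checkpoint with a fresh task head; the new head is then trained on the task dataset (training loop omitted here):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Loads pre-trained BERT weights and attaches a randomly initialised
# 2-class classification head on top of them.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
```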

39. What is zero-shot learning in NLP?

Answer: Zero-shot learning is when a model performs a task it has seen no labeled examples of,
relying on knowledge that generalizes from its pre-training.

40. What is few-shot learning in NLP?

Answer: Few-shot learning adapts a model to a task using only a handful of labeled examples.

41. What is a pre-trained language model?

Answer: A pre-trained language model is trained on a large corpus and can be fine-tuned for specific
NLP tasks.

42. What is perplexity in NLP?

Answer: Perplexity measures how well a language model predicts a sample, with lower values
indicating better performance.
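Perplexity is the exponentiated average negative log-probability of the tokens; the probabilities below are hypothetical model outputs:

```python
import math

# Per-token probabilities a (hypothetical) language model assigned
# to the four words of one sentence.
probs = [0.2, 0.1, 0.4, 0.25]

perplexity = math.exp(-sum(math.log(p) for p in probs) / len(probs))
print(perplexity)  # ~4.73; a uniform model over V words would score V
```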

43. What is beam search?

Answer: Beam search is a decoding algorithm for text generation that keeps the k most probable
partial sequences (the beam) at each step, approximating the most likely output sequence.

44. What is greedy search?

Answer: Greedy search selects the most likely word at each step in text generation, without
considering future steps.
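A toy contrast of the two decoders (questions 43 and 44) over a hypothetical next-token table; beam search finds "a cat" (p = 0.36), which greedy search misses by committing to "the" (p = 0.6) first:

```python
import heapq
import math

# Hypothetical next-token distributions, keyed by the previous token.
MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def greedy_decode(start="<s>"):
    seq, logp = [start], 0.0
    while seq[-1] != "</s>":
        word, p = max(MODEL[seq[-1]].items(), key=lambda kv: kv[1])
        seq.append(word)
        logp += math.log(p)
    return seq, logp

def beam_decode(start="<s>", beam_width=2):
    beams = [(0.0, [start])]  # (log-probability, sequence)
    while any(seq[-1] != "</s>" for _, seq in beams):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == "</s>":  # finished beams carry over unchanged
                candidates.append((logp, seq))
                continue
            for word, p in MODEL[seq[-1]].items():
                candidates.append((logp + math.log(p), seq + [word]))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beams

print(greedy_decode())  # ['<s>', 'the', 'cat', '</s>'], log p = log 0.30
print(beam_decode())    # best beam: ['<s>', 'a', 'cat', '</s>'], log p = log 0.36
```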
45. What is a dependency tree?

Answer: A dependency tree represents the grammatical structure of a sentence by showing relationships between words.
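For example, spaCy exposes the dependency parse on each token (assumes en_core_web_sm is installed):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
for token in nlp("The cat chased the mouse"):
    print(token.text, "--" + token.dep_ + "-->", token.head.text)
# e.g. cat --nsubj--> chased, mouse --dobj--> chased
```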

46. What is coreference resolution?

Answer: Coreference resolution identifies expressions that refer to the same entity in a text.

47. What is semantic role labeling?

Answer: Semantic role labeling identifies the roles of words in a sentence (e.g., agent, patient).

48. What is topic modeling?

Answer: Topic modeling is an unsupervised technique to identify topics in a collection of documents (e.g., LDA).

49. What is Latent Dirichlet Allocation (LDA)?

Answer: LDA is a probabilistic model used for topic modeling, representing documents as mixtures
of topics.
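A small scikit-learn sketch on made-up documents; with so little data the topics are only suggestive:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["cats and dogs are pets", "dogs chase cats",
        "stocks and bonds are investments", "markets move stocks"]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.transform(counts))  # per-document topic mixtures; rows sum to 1
```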

50. What is word sense disambiguation?

Answer: Word sense disambiguation determines the correct meaning of a word based on context.

Advanced NLP (Questions 71-100)

71. What is self-supervised learning in NLP?

Answer: Self-supervised learning uses unlabeled data to create supervised tasks, such as predicting
masked words in BERT.

72. What is masked language modeling?

Answer: Masked language modeling involves predicting masked words in a sentence, used in models
like BERT.
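Hugging Face's fill-mask pipeline demonstrates this directly (it downloads the pre-trained model on first use):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("Paris is the [MASK] of France.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
# 'capital' should dominate the predictions
```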

73. What is contrastive learning in NLP?

Answer: Contrastive learning trains models to distinguish between similar and dissimilar pairs of
data points.

74. What is adversarial training in NLP?

Answer: Adversarial training improves model robustness by exposing it to adversarial examples.

75. What is multi-task learning in NLP?

Answer: Multi-task learning trains a model on multiple related tasks simultaneously to improve
generalization.
