AI Quiz ch3
**Probabilistic Models**
- **N-gram Models**: Estimate word probabilities from the preceding words, but are
limited in capturing semantic meaning and long-range dependencies (a minimal bigram sketch follows this list).
- **Hidden Markov Models**: Probabilistic models that infer hidden states from observable
events, used in tasks like speech recognition.
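To make the n-gram idea concrete, here is a minimal bigram sketch in Python (the toy corpus and function name are illustrative only): it estimates P(word | previous word) from raw counts, which is exactly the kind of local statistic that cannot capture long-range dependencies.

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

# Bigram model: P(w_i | w_{i-1}) ~= count(w_{i-1}, w_i) / count(w_{i-1})
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("the", "cat"))  # 2/3: "the" is followed by "cat" twice, "mat" once
```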
### Conclusion
NLP has transitioned from simple statistical models to complex neural networks and large
language models, revolutionizing human-computer communication. The field continues to
evolve, with ongoing advancements promising more intuitive and effective language
processing capabilities.
### Summary
- **RNN**: Effective for short sequences but limited in capturing long-term dependencies.
- **LSTM**: More complex; addresses long-term dependencies with multiple gates.
- **GRU**: A simplified alternative to LSTM, balancing performance and complexity (see the parameter-count sketch after this list).
- **Encoder-Decoder with Attention**: Enhances sequence processing by dynamically
focusing on relevant parts of the input.
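As a rough illustration of why the GRU is the "simplified alternative" (the sketch referenced in the GRU bullet above), the following PyTorch snippet compares parameter counts for same-sized layers; the sizes are arbitrary and torch is assumed to be installed. The LSTM uses four gate blocks where the GRU uses three, so the GRU comes out roughly 25% smaller.

```python
import torch.nn as nn

input_size, hidden_size = 64, 128
lstm = nn.LSTM(input_size, hidden_size)
gru = nn.GRU(input_size, hidden_size)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(lstm))  # 99,328 parameters (4 gate blocks)
print(count(gru))   # 74,496 parameters (3 gate blocks)
```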
The remaining material covers training sequence-to-sequence (Seq2Seq) models, transformers, and large language models (LLMs).
### Conclusion
- The journey from early statistical models to advanced transformers and LLMs has
transformed NLP, addressing previous limitations and expanding applications across
industries, while also raising ethical considerations and challenges.
Below are multiple-choice questions (MCQs) on this material, along with their answers:
16. **Which of the following models is best for capturing long-term dependencies in
sequences?**
- A) N-grams
- B) RNNs
- C) Decision Trees
- D) Linear Regression
**Answer:** B
41. **Which of the following models is specifically designed for dialogue applications?**
- A) BERT
- B) GPT-3
- C) FastText
- D) Word2Vec
**Answer:** B
43. **Which model uses unsupervised learning to create word embeddings based on context?**
- A) Decision Trees
- B) Word2Vec
- C) Logistic Regression
- D) Random Forest
**Answer:** B
44. **What is the purpose of using a softmax function in classification tasks?**
- A) To optimize performance
- B) To normalize output probabilities
- C) To initialize weights
- D) To enhance input data
**Answer:** B
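As a quick check on the answer above, here is a minimal NumPy softmax sketch: it exponentiates the logits and rescales them so the outputs are non-negative and sum to 1, i.e. normalized class probabilities.

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Raw scores for three classes become a probability distribution.
print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.66, 0.24, 0.10], sums to 1
```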
45. **Which of the following techniques can improve the performance of a neural network?**
- A) Increasing data noise
- B) Hyperparameter tuning
- C) Reducing data size
- D) Ignoring validation data
**Answer:** B
48. **Which type of model uses layers of neurons to learn from data?**
- A) Linear regression
- B) Neural networks
- C) Decision trees
- D) Clustering algorithms
**Answer:** B
5. **What component of LSTM controls how much of the previous cell state is retained?**
- A) Input gate
- B) Forget gate
- C) Output gate
- D) Cell state
**Answer:** B) Forget gate
18. **What type of neural network architecture is typically used in Encoder-Decoder models?**
- A) Feedforward Neural Network
- B) Convolutional Neural Network
- C) Recurrent Neural Network
- D) Generative Adversarial Network
**Answer:** C) Recurrent Neural Network
19. **What does the output of the decoder in a Seq2Seq model depend on?**
- A) Previous outputs only
- B) Current input only
- C) Context vector and previous outputs
- D) Random values
**Answer:** C) Context vector and previous outputs
22. **In attention mechanisms, what does the dynamic context vector represent?**
- A) Fixed-length input
- B) Summary of past outputs
- C) Relevance of input words
- D) Hidden state of the encoder
**Answer:** C) Relevance of input words
23. **Which type of data is best suited for LSTM and GRU?**
- A) Tabular data
- B) Sequential data
- C) Image data
- D) Static data
**Answer:** B) Sequential data
24. **What is the primary advantage of using attention mechanisms in neural networks?**
- A) Decreases training time
- B) Enhances interpretability
- C) Improves handling of long sequences
- D) Reduces model size
**Answer:** C) Improves handling of long sequences
25. **In LSTM, which function is used to normalize values between -1 and 1?**
- A) Sigmoid
- B) Softmax
- C) Tanh
- D) ReLU
**Answer:** C) Tanh
31. **Which layer typically generates the output predictions in Encoder-Decoder networks?**
- A) Input layer
- B) Softmax layer
- C) Hidden layer
- D) Embedding layer
**Answer:** B) Softmax layer
32. **In a Seq2Seq model, what does the decoder primarily rely on to generate its output?**
- A) Random noise
- B) The context vector
- C) The training dataset
- D) Predefined rules
**Answer:** B) The context vector
40. **Which network architecture is best for tasks requiring sequence prediction?**
- A) Convolutional Neural Network
- B) Recurrent Neural Network
- C) Radial Basis Function Network
- D) Fully Connected Network
**Answer:** B) Recurrent Neural Network
42. **In the context of attention mechanisms, what are the "scores"?**
- A) Measures of loss
- B) Relevance indicators between encoder and decoder states
- C) Training performance metrics
- D) Randomly generated values
**Answer:** B) Relevance indicators between encoder and decoder states
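A small NumPy sketch of what those scores are (the dimensions and variable names are illustrative): the current decoder state is compared against every encoder hidden state, the scores are normalized with softmax, and the weighted sum becomes the dynamic context vector mentioned in the earlier questions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

encoder_states = np.random.randn(6, 32)  # one hidden state per input word
decoder_state = np.random.randn(32)      # current decoder hidden state

scores = encoder_states @ decoder_state  # relevance of each input word, shape (6,)
weights = softmax(scores)                # attention weights, sum to 1
context = weights @ encoder_states       # dynamic context vector, shape (32,)
```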
43. **Which of the following describes a key benefit of using LSTMs over traditional RNNs?**
- A) Simplicity
- B) Ability to handle longer sequences
- C) Lower computational cost
- D) Faster convergence
**Answer:** B) Ability to handle longer sequences
44. **What type of model would use an encoder-decoder architecture with attention?**
- A) Regression model
- B) Classification model
- C) Neural machine translation model
- D) Clustering model
**Answer:** C) Neural machine translation model
45. **In a Seq2Seq architecture, the encoder’s hidden states are used to create what?**
- A) Output sequence
- B) Context vector
- C) Input layer
- D) Loss function
**Answer:** B) Context vector
50. **Which of the following best describes the function of the attention mechanism in a
neural network?**
- A) It reduces the number of parameters.
- B) It focuses on relevant parts of the input sequence at each decoding step.
- C) It eliminates the need for the encoder.
- D) It simplifies the architecture.
**Answer:** B) It focuses on relevant parts of the input sequence at each decoding step.
### MCQs: Training Seq2Seq Models, Transformers, and LLMs
2. **Which technique is commonly used to minimize the difference between generated output
and ground truth in Seq2Seq models?**
- A) Data augmentation
- B) Teacher forcing
- C) Dropout
- D) Batch normalization
**Answer: B**
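A minimal PyTorch sketch of teacher forcing (the sizes and names here are toy assumptions, not from the chapter): at every step the decoder is fed the ground-truth previous token rather than its own prediction, and the cross-entropy loss measures the gap between the generated distribution and the ground truth.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 20, 16, 32
embedding = nn.Embedding(vocab_size, emb_dim)
decoder_cell = nn.GRUCell(emb_dim, hidden_dim)
out_proj = nn.Linear(hidden_dim, vocab_size)
loss_fn = nn.CrossEntropyLoss()

targets = torch.randint(0, vocab_size, (4, 6))  # toy batch of target sequences
hidden = torch.zeros(4, hidden_dim)             # would normally come from the encoder

loss = 0.0
for t in range(targets.size(1) - 1):
    # Teacher forcing: condition step t+1 on the ground-truth token at step t,
    # not on the decoder's own previous prediction.
    inp = embedding(targets[:, t])
    hidden = decoder_cell(inp, hidden)
    logits = out_proj(hidden)
    loss = loss + loss_fn(logits, targets[:, t + 1])

(loss / (targets.size(1) - 1)).backward()
```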
3. **What mechanism allows Seq2Seq models to handle long input sequences more
effectively?**
- A) Dropout
- B) Attention mechanisms
- C) Convolutional layers
- D) Recurrent connections
**Answer: B**
8. **In a Transformer, what does the "Add & Norm" step do?**
- A) Combines different models
- B) Adds residual connections and normalizes the output
- C) Reduces the dimensionality of inputs
- D) Optimizes the loss function
**Answer: B**
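A short PyTorch illustration of "Add & Norm" (the linear layer below just stands in for an attention or feed-forward sublayer): the sublayer's output is added back to its input as a residual connection, and the sum is layer-normalized.

```python
import torch
import torch.nn as nn

d_model = 8
sublayer = nn.Linear(d_model, d_model)   # stand-in for attention / feed-forward
layer_norm = nn.LayerNorm(d_model)

x = torch.randn(2, 5, d_model)           # (batch, sequence, features)
out = layer_norm(x + sublayer(x))        # "Add & Norm": residual + layer normalization
```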
9. **What type of attention is used in the decoder to prevent future tokens from being seen?**
- A) Multi-head attention
- B) Self-attention
- C) Masked multi-head attention
- D) Global attention
**Answer: C**
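A short PyTorch sketch of the causal mask behind masked multi-head attention (shapes are illustrative): score entries above the diagonal are set to negative infinity before the softmax, so every position receives zero attention weight on future tokens.

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)  # raw attention scores (query x key)

# Mask out positions j > i so token i cannot attend to future tokens.
mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
masked_scores = scores.masked_fill(mask, float("-inf"))
weights = torch.softmax(masked_scores, dim=-1)  # future positions get weight 0
```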
10. **Which function is used to capture both the meaning and position of each word in
Transformers?**
- A) Activation function
- B) Positional encoding
- C) Softmax function
- D) Loss function
**Answer: B**
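For reference, this is the sinusoidal positional encoding from the original Transformer paper, sketched in NumPy; the resulting matrix is added to the token embeddings so each word carries both its meaning and its position.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(positional_encoding(4, 8).shape)  # (4, 8): one encoding vector per position
```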
11. **What does the Query (Q) represent in the self-attention mechanism?**
- A) The importance of the output
- B) The current token's focus
- C) The model's predictions
- D) The sequence length
**Answer: B**
12. **Which type of model is specifically designed for dialogue applications?**
- A) BERT
- B) GPT-3
- C) LaMDA
- D) LLaMA
**Answer: C**
15. **What does the softmax function do in the output layer of a Transformer?**
- A) Generates random predictions
- B) Converts logits to probabilities
- C) Normalizes input sequences
- D) Reduces model complexity
**Answer: B**
20. **Which model introduced the "text-to-text" framework for NLP tasks?**
- A) BERT
- B) GPT-3
- C) T5
- D) LaMDA
**Answer: C**
27. **What does the "Nx repetitions" refer to in the Transformer architecture?**
- A) The number of hidden layers
- B) The number of attention heads
- C) The number of times the encoder components are repeated
- D) The number of output tokens
**Answer: C**
31. **Which component of the Transformer architecture attends to both encoded input and
the decoder's state?**
- A) Input embedding
- B) Multi-head attention
- C) Feed-forward layer
- D) Positional encoding
**Answer: B**
32. **What is the main focus of Codex?**
- A) Text summarization
- B) Language translation
- C) Code generation
- D) Sentiment analysis
**Answer: C**
33. **Which of the following models is known for handling multimodal inputs?**
- A) BERT
- B) GPT-4
- C) LaMDA
- D) LLaMA
**Answer: B**
34. **What does the term "autoregressive" refer to in the context of GPT models?**
- A) Processing input in parallel
- B) Generating text sequentially
- C) Analyzing images
- D) Training without supervision
**Answer: B**
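A toy Python loop illustrating what "autoregressive" means (the random logits below stand in for a trained model's forward pass): each new token is sampled from the next-token distribution given everything generated so far, appended, and fed back in.

```python
import torch

vocab_size = 10
torch.manual_seed(0)

def next_token_logits(prefix):
    # Stand-in for a trained language model conditioned on `prefix`.
    return torch.randn(vocab_size)

tokens = [1]  # start token
for _ in range(5):
    probs = torch.softmax(next_token_logits(tokens), dim=-1)
    tokens.append(int(torch.multinomial(probs, 1)))  # sample, then feed back in
print(tokens)
```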
36. **Which model is designed to improve text understanding through bidirectional context?**
- A) GPT-3
- B) BERT
- C) T5
- D) Codex
**Answer: B**
41. **Which model was specifically designed for open-ended dialogue?**
- A) T5
- B) LaMDA
- C) BERT
- D) Codex
**Answer: B**
48. **What does the "softmax" function output in the context of the Transformer model?**
- A) Probabilities of the next token
- B) Numerical embeddings
- C) Hidden state representations
- D) Attention scores
**Answer: A**
49. **Which model is known for its focus on generating human-like text?**
- A) BERT
- B) T5
- C) GPT-3
- D) LLaMA
**Answer: C**