2mrk Answers
PART-A
1. What is Deep Learning?
Deep learning is a subset of machine learning that uses neural networks with many layers (hence the term "deep") to
model and understand complex patterns in data. Deep learning algorithms automatically learn feature
representations from raw data such as images, text, and speech, and are particularly useful in tasks like image
recognition, natural language processing, and autonomous driving.
2. What are the main differences between AI, Machine Learning, and Deep Learning?
Artificial Intelligence (AI): The broader field that involves creating machines or systems that can perform
tasks requiring human intelligence, such as decision-making, problem-solving, and understanding natural
language.
Machine Learning (ML): A subset of AI focused on building algorithms that allow computers to learn from
and make predictions or decisions based on data, without being explicitly programmed.
Deep Learning (DL): A further subset of ML that uses artificial neural networks with many layers (deep neural
networks) to model complex patterns in large datasets. It is particularly useful for tasks like image
recognition, speech recognition, and natural language processing.
Speech Recognition: Virtual assistants like Siri, Google Assistant, and speech-to-text systems.
Autonomous Vehicles: Self-driving cars use deep learning for object detection and decision-making.
Robotics: Deep learning helps robots understand and interact with their environment.
Gaming: AI models used in game development to create more realistic non-playable characters (NPCs) and
adversarial agents.
Vector: A one-dimensional array or list of numbers, which represents a point in space or a set of values (e.g.,
position in 2D space as [x, y]).
Matrix: A 2D array of numbers arranged in rows and columns, used to represent data or transformations
(e.g., image data in grayscale).
Tensor: A generalization of matrices to higher dimensions. A 3D tensor could represent a colored image with
width, height, and RGB color channels. In deep learning, tensors are used to represent multidimensional
data.
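For illustration, here is a minimal NumPy sketch (the shapes are arbitrary examples) showing how a scalar, vector, matrix, and 3-D tensor differ only in their number of dimensions:

```python
import numpy as np

scalar = np.float32(3.5)                      # a single number (0-D)
vector = np.array([1.0, 2.0])                 # 1-D array, e.g. a point [x, y]
matrix = np.zeros((28, 28))                   # 2-D array, e.g. a grayscale image
tensor = np.zeros((28, 28, 3))                # 3-D array, e.g. an RGB image (height, width, channels)

print(vector.ndim, matrix.ndim, tensor.ndim)  # 1 2 3
```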
Bayesian Methods: Many deep learning models (such as Bayesian Neural Networks) use probability
distributions to make inferences and predictions.
Learning from Data: Deep learning models typically learn to predict the probabilities of outcomes, rather
than making deterministic predictions. This helps in tasks like classification, where the goal is to predict the
likelihood of an event.
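As a small illustration of probabilistic outputs, the sketch below (plain NumPy, with made-up class scores) converts raw scores into a probability distribution using the softmax function, one common way classifiers express likelihoods rather than hard decisions:

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability, then normalize to probabilities.
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

logits = np.array([2.0, 0.5, -1.0])   # hypothetical raw scores for 3 classes
probs = softmax(logits)               # roughly [0.79, 0.18, 0.04], sums to 1
print(probs, probs.sum())
```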
A random variable is a variable whose value is subject to random variation or uncertainty. It can take different values,
each with an associated probability. Random variables are classified as either discrete or continuous.
Discrete Random Variables take specific values (e.g., the number of heads in 10 coin tosses).
Continuous Random Variables can take any value within a range (e.g., the height of a person or the
temperature).
Overfitting: Occurs when the model is too complex and learns not only the underlying patterns in the data
but also the noise, leading to poor generalization to new data.
Underfitting: Happens when the model is too simple and cannot capture the underlying patterns in the data,
leading to poor performance even on the training data.
The capacity of a model refers to its ability to learn complex patterns and functions. A model with high capacity can
learn more intricate patterns but may also be more prone to overfitting. A model with low capacity may underfit and
fail to capture important patterns in the data.
Choosing simpler models (e.g., linear models instead of deep neural networks).
Limiting the number of parameters in the model (e.g., fewer layers or nodes in neural networks).
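A rough sketch of capacity using polynomial curve fitting (the data and degrees below are illustrative): a straight line has too little capacity for a sine-shaped signal and underfits, while a high-degree polynomial has enough capacity to fit the noise as well.

```python
import numpy as np

# Noisy samples from a simple underlying function.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)

low_capacity = np.polyfit(x, y, deg=1)    # a line: too simple, likely underfits
high_capacity = np.polyfit(x, y, deg=15)  # degree-15 polynomial: can also fit the noise

print(len(low_capacity), len(high_capacity))  # 2 vs 16 parameters
```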
Bayes error is the lowest possible error that can be achieved by any classifier on a given problem, assuming the true
underlying probability distribution is known. It represents the irreducible error in classification tasks due to the
inherent randomness in the data.
Hyperparameters in machine learning are settings that control the training process and model architecture, such as
learning rate, batch size, and number of epochs. Unlike model parameters, hyperparameters are not learned from
data but are set before training.
They are important because they directly affect model performance, learning efficiency, and generalization. Proper
tuning of hyperparameters can improve accuracy, prevent overfitting or underfitting, and help the model converge
more quickly. Hyperparameter tuning is typically done through methods like grid search or random search.
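A minimal sketch of how hyperparameters are fixed before training and tuned with a simple grid search; train_and_validate below is a hypothetical stand-in for a real training run:

```python
def train_and_validate(learning_rate, batch_size):
    # Stand-in for a real training run; returns a fake validation score so the
    # grid-search loop below is runnable. Replace with actual model training.
    return 1.0 / (1.0 + abs(learning_rate - 1e-3)) - 0.001 * batch_size

# Hyperparameters are set before training begins (values here are illustrative).
best = None
for lr in [1e-2, 1e-3, 1e-4]:
    for batch_size in [32, 64, 128]:
        score = train_and_validate(learning_rate=lr, batch_size=batch_size)
        if best is None or score > best[0]:
            best = (score, lr, batch_size)

print(best)   # the best validation score and the hyperparameters that produced it
```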
20. How do you solve the overfitting problem caused by learning hyperparameters on the training dataset?
Applying early stopping to halt training when performance on the validation set starts to degrade.
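A minimal early-stopping sketch (the train_one_epoch and evaluate callables are hypothetical stand-ins for a real training and validation loop):

```python
def fit_with_early_stopping(train_one_epoch, evaluate, max_epochs=100, patience=5):
    # Stop when the validation loss has not improved for `patience` consecutive epochs.
    best_val_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_loss = evaluate()
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0   # improvement: reset the counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                        # halt before overfitting worsens
    return best_val_loss
```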
Point estimators are statistics that estimate the value of a parameter (e.g., mean, variance) based on sample data.
They provide a single value (point) as an estimate of the true parameter.
Efficiency: The estimator has the smallest variance among all unbiased estimators.
Consistency: As the sample size increases, the estimator converges to the true value of the parameter.
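A quick worked example of point estimation with NumPy (the sample values are made up): the sample mean and sample variance are single-number estimates of the corresponding population parameters.

```python
import numpy as np

# A sample drawn from some population (illustrative values).
sample = np.array([4.8, 5.1, 5.0, 4.9, 5.3, 5.2])

mean_hat = sample.mean()       # point estimate of the population mean
var_hat = sample.var(ddof=1)   # unbiased point estimate of the population variance
print(mean_hat, var_hat)
```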
A deep feedforward network is a type of artificial neural network where information moves only in one direction—
from input to output—through multiple layers of neurons. It is used in supervised learning tasks such as classification
and regression.
In a feedforward neural network, the input is passed through a series of hidden layers where each neuron processes
the input through an activation function and passes it to the next layer. The output layer produces the final prediction
or classification.
Hidden layers: Layers that perform computations and learn features from the input data.
Output layer: The layer that produces the final output, which could be a classification or a regression value.
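A minimal feedforward network sketch, assuming PyTorch is available (the layer sizes are illustrative); data flows strictly forward from input to output:

```python
import torch
import torch.nn as nn

# A small feedforward (fully connected) network: input -> two hidden layers -> output.
model = nn.Sequential(
    nn.Linear(784, 128),   # input features -> first hidden layer
    nn.ReLU(),             # activation function
    nn.Linear(128, 64),    # second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),     # output layer, e.g. scores for 10 classes
)

x = torch.randn(32, 784)   # a batch of 32 flattened 28x28 inputs
logits = model(x)          # information moves only forward through the layers
print(logits.shape)        # torch.Size([32, 10])
```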
Regularization is a technique used to prevent overfitting by adding a penalty term to the loss function, which
discourages overly complex models. Common regularization methods include L2 regularization (Ridge) and L1
regularization (Lasso).
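A short sketch of both penalties, assuming PyTorch: L2 regularization applied through the optimizer's weight decay, and an explicit L1 penalty added to the loss (the coefficients are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(20, 1)
criterion = nn.MSELoss()

# L2 regularization (Ridge) via weight decay in the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# L1 regularization (Lasso) as an explicit penalty added to the loss.
x, y = torch.randn(8, 20), torch.randn(8, 1)
l1_lambda = 1e-4
loss = criterion(model(x), y)
loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
loss.backward()
```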
Dropout is a regularization technique where, during training, random units (neurons) are "dropped" or ignored in
each iteration. This helps prevent overfitting by ensuring that the network does not rely too heavily on any particular
neuron.
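A minimal dropout sketch, assuming PyTorch: during training random units are zeroed out, while at evaluation time dropout is switched off:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # each unit is dropped with probability 0.5 during training
x = torch.ones(1, 10)

drop.train()               # training mode: random units are zeroed, the rest rescaled
print(drop(x))

drop.eval()                # evaluation mode: dropout is disabled, inputs pass unchanged
print(drop(x))
```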
Regularization:
Regularization refers to techniques used to prevent a model from overfitting to the training data by
introducing additional constraints or penalties. The goal is to create a simpler model that generalizes well to
unseen data. Common regularization methods include L1 (Lasso) and L2 (Ridge) regularization, where the
model's complexity is penalized based on the size of the coefficients. Regularization discourages the model
from assigning too much importance to any particular feature, which helps to prevent overfitting.
Optimization:
Optimization refers to the process of finding the best parameters (weights) for a machine learning model to
minimize or maximize an objective function (such as a loss or cost function). Optimization methods, such as
Gradient Descent, adjust the model's parameters iteratively to minimize the loss function. In contrast to
regularization, which specifically controls the model's complexity, optimization focuses on finding the best fit
for the data.
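A bare-bones gradient descent sketch on a one-parameter least-squares problem (the data and learning rate are illustrative), showing how optimization repeatedly moves the parameter against the gradient of the loss:

```python
import numpy as np

# Minimize the mean squared error of (w * x - y); the true w here is 2.
x, y = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])
w, lr = 0.0, 0.1

for step in range(100):
    grad = 2 * np.mean((w * x - y) * x)   # gradient of the loss with respect to w
    w -= lr * grad                        # step against the gradient
print(w)                                  # converges towards 2.0
```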
30. How Splitting a Dataset into Train, Dev, and Test Sets Helps Identify Overfitting
When you split a dataset into three parts—training, development (validation), and test sets—it allows you to
identify if your model is overfitting, as follows:
Training Set: Used to train the model and adjust its parameters.
Development (Validation) Set: Used to tune hyperparameters (such as learning rate, model complexity, etc.)
and evaluate the model's performance during training.
Test Set: Used only after the model is trained and hyperparameters are finalized, providing an unbiased
estimate of the model's generalization performance.
Overfitting occurs when the model performs very well on the training data but poorly on unseen data (validation or
test set). By using a validation set, you can monitor if the model's performance is significantly better on the training
set than on the validation set. If the model has high training accuracy but low validation accuracy, it's likely
overfitting. If this discrepancy is large, techniques like regularization, pruning, or cross-validation might be used to
address the overfitting.
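A common way to produce the three splits, assuming scikit-learn is available (the 60/20/20 ratio is just one typical choice and the data below are synthetic):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.randn(1000, 10), np.random.randint(0, 2, 1000)

# First carve out the test set, then split the remainder into train and dev.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X_temp, y_temp, test_size=0.25, random_state=0)

print(len(X_train), len(X_dev), len(X_test))   # 600 / 200 / 200
```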
Definition:
SGD is an optimization algorithm where model parameters are updated based on the gradient of the loss function for
a single data point (or mini-batch), rather than the whole dataset.
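A per-sample SGD sketch on a synthetic linear-regression problem (the data, learning rate, and epoch count are illustrative): each update uses the gradient from a single example rather than the full dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.standard_normal(500)

w, lr = np.zeros(3), 0.01
for epoch in range(5):
    for i in rng.permutation(len(X)):          # visit samples in random order
        grad = 2 * (X[i] @ w - y[i]) * X[i]    # gradient from a single data point
        w -= lr * grad                         # noisy but cheap parameter update
print(w)                                        # close to [1.0, -2.0, 0.5]
```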
Merits:
1. Faster updates: Each update uses only a single data point (or a small mini-batch), so it is cheap to compute.
2. Memory efficient: Requires less memory, making it scalable for large datasets.
3. Helps avoid local minima: The noisy updates can help escape local minima.
Demerits:
1. Noisy updates: The model may oscillate around the optimal solution.