AIDS2 QB Unit 2
The confusion matrix displays the number of correct and incorrect predictions the model makes on the test data, broken down into four categories:
• True positives (TP): the model correctly predicts the positive class for a positive data point.
• True negatives (TN): the model correctly predicts the negative class for a negative data point.
• False positives (FP): the model incorrectly predicts positive for a data point that is actually negative.
• False negatives (FN): the model incorrectly predicts negative for a data point that is actually positive.
Example: https://youtu.be/_CGTbkHwUHQ?si=Crqjkbo6sKiwZRLj
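These four counts can be read directly off the matrix. A minimal sketch with scikit-learn (the label arrays below are made-up examples):

from sklearn.metrics import confusion_matrix

# Made-up ground-truth and predicted labels for a binary classifier
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# ravel() flattens the 2x2 matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")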
e. ROC curves (Chapter 5)
Ans: A ROC (Receiver Operating Characteristic) curve is a graphical representation used to
evaluate the performance of a binary classification model. It visualizes the trade-off between
the true positive rate (TPR) and the false positive rate (FPR) at various threshold settings.
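A minimal sketch of computing the points on a ROC curve with scikit-learn (the scores below are made-up predicted probabilities for the positive class):

from sklearn.metrics import roc_curve, roc_auc_score

# Made-up true labels and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5]

# roc_curve evaluates TPR and FPR at each candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC =", roc_auc_score(y_true, y_score))

The area under this curve (AUC) summarizes performance across all thresholds: 1.0 is a perfect classifier and 0.5 is random guessing.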
Cross-validation is a robust technique used to assess the performance of machine learning models by
dividing the dataset into multiple subsets and evaluating the model on each subset. It helps reduce
variability in performance estimates and ensures that the model generalizes well to unseen data. The
most common form of cross-validation is k-fold cross-validation. Here's an explanation of how it
works and its key concepts:
How Cross-Validation Works
1. Data Splitting:
o The dataset is divided into k equal-sized subsets (also called "folds").
2. Training and Testing:
o For each fold, the model is trained on k-1 folds (combined as the training set) and
tested on the remaining fold (the test set).
o This process is repeated k times, each time using a different fold as the test set and
the remaining folds as the training set.
3. Averaging the Results:
o After training and testing across all k folds, the evaluation metric (e.g., accuracy,
precision, recall) is averaged across all k iterations.
o This gives a more reliable estimate of the model’s performance compared to the
holdout method because the model is trained and tested on different parts of the
dataset multiple times.
K-Fold Cross-Validation
K-fold cross-validation is a type of cross-validation that involves splitting a dataset into k folds. Each
fold is used as a test set, and the remaining folds are used as training sets. The model is trained and
evaluated on each fold, and the average performance across all folds is calculated.
How it works:
1. Divide the dataset into k subsets (folds).
2. For each of the k iterations:
o One fold is used as the validation set (test set), and the remaining k-1 folds are used
for training.
3. After all iterations are done, the results are averaged to produce a single performance
estimate.
Example:
In 5-Fold Cross-Validation, the dataset is split into 5 equal parts:
• In iteration 1, the first part is used for testing, and the rest for training.
• In iteration 2, the second part is used for testing, and the rest for training, and so on until all
parts have been used as a test set.
Advantages:
• Less Bias: Each data point gets to be in the test set exactly once and in the training set k-1
times, making the performance estimate more robust.
• Efficient Use of Data: Especially useful when the dataset is small, as it allows the model to
train on multiple data configurations.
Disadvantages:
• Computational Cost: Training the model k times increases the computation time
significantly, especially with large datasets or complex models.
• Choice of k: The value of k can affect performance. Common values are k=5 or k=10.
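A minimal 5-fold cross-validation sketch with scikit-learn (the Iris dataset and logistic-regression model are placeholder choices; any estimator with fit/predict works the same way):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=5 splits the data into 5 folds; each fold serves once as the test set
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())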
4. Compare single-layer perceptrons and multi-layer perceptrons.
Architecture of an Autoencoder
1. Input Layer:
o The layer where the original data is fed into the network. The input can be any form
of data like images, text, or numerical values, typically represented as a vector or
matrix.
2. Encoder:
o The encoder compresses the input into a smaller, more compact form. This
compression process reduces the dimensionality of the input, retaining only essential
features. The encoder is made up of hidden layers that apply transformations (such as
linear combinations followed by non-linear activation functions like ReLU or
sigmoid).
3. Code (Latent Space):
o The code represents the compressed, low-dimensional version of the input. It is the
"bottleneck" of the network, where the network captures the most critical features of
the input in the most compact form possible.
4. Decoder:
o The decoder reconstructs the original input from the compressed code. It tries to
reverse the encoding process and retrieve the input as closely as possible to its
original form. The decoder is also composed of hidden layers that progressively
transform the compressed code back into the original input format.
5. Output Layer:
o The output layer generates the final reconstructed version of the input. Ideally, the
output should be as close as possible to the original input. A loss function (e.g., mean
squared error) compares the input and output, and the network learns by minimizing
this loss during training.
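A minimal Keras sketch of this input-encoder-code-decoder-output layout (the 784-dimensional input, e.g. a flattened 28x28 image, and the 32-dimensional code are assumed sizes):

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))                           # input layer
encoded = layers.Dense(128, activation="relu")(inputs)       # encoder
code = layers.Dense(32, activation="relu")(encoded)          # code (latent space)
decoded = layers.Dense(128, activation="relu")(code)         # decoder
outputs = layers.Dense(784, activation="sigmoid")(decoded)   # output layer

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")  # reconstruction loss
# Training uses the input as its own target:
# autoencoder.fit(x_train, x_train, epochs=10)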
Applications of Autoencoders:
1. Data Compression: Reducing data size for efficient storage.
2. Denoising: Removing noise from corrupted data, like noisy images.
3. Dimensionality Reduction: Lowering feature dimensions while retaining essential info.
4. Anomaly Detection: Identifying outliers in datasets (e.g., fraud detection).
5. Feature Learning: Extracting key features for use in other models.
6. Image Generation: Creating new data, such as images, using variational autoencoders.
7. Recommender Systems: Learning user preferences for personalized recommendations.
8. Pre-training: Pre-training deep networks in unsupervised learning tasks.
9. Medical Imaging: Enhancing or compressing medical images for diagnosis.
10. Video/Image Compression: Efficiently compressing media files for storage or streaming.
Types of Autoencoders
1. Denoising Autoencoder
• Purpose: Used to remove noise from data.
• Architecture: Trained by introducing noise into the input data and then reconstructing the
original, clean data.
• How it works: The autoencoder learns to extract useful features while ignoring the noise.
During training, a noisy version of the input is passed, but the target output is the original
noise-free input.
• Applications: Image denoising, noise reduction in data.
2. Sparse Autoencoder
• Purpose: Forces the autoencoder to learn a sparse representation of the input, meaning only a
few neurons in the encoding layer are active at once.
• Architecture: It adds a sparsity constraint (regularization term) to the loss function to ensure
that most of the activations in the hidden layers are close to zero.
• How it works: The sparsity constraint encourages the network to learn a compact,
informative encoding.
• Applications: Feature extraction, unsupervised pre-training for deep networks.
3. Deep Autoencoder
• Purpose: A more complex form of autoencoder with multiple hidden layers.
• Architecture: It has a deep network, typically made up of multiple layers of encoders and
decoders.
• How it works: The deep structure allows it to learn more complex, hierarchical features.
• Applications: Dimensionality reduction, data compression, complex feature extraction.
4. Contractive Autoencoder
• Purpose: Encourages robustness to small changes in the input.
• Architecture: It adds a penalty term to the loss function based on the Frobenius norm of the
Jacobian matrix of the encoder activations with respect to the input.
• How it works: The penalty term ensures that the learned representations do not change much
when the input is slightly perturbed.
• Applications: Learning robust features, manifold learning.
5. Undercomplete Autoencoder
• Purpose: Compresses the data into a smaller latent representation, with no explicit
regularization.
• Architecture: The size of the hidden layer is smaller than the input layer (hence,
"undercomplete").
• How it works: It forces the autoencoder to learn the most important features by constraining
the size of the hidden layer.
• Applications: Dimensionality reduction, data compression.
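As a concrete illustration of the denoising variant (type 1 above), the only change from a plain autoencoder is that a corrupted input is paired with the clean target during training. A sketch, assuming an autoencoder model like the one earlier and data scaled to [0, 1] (the noise level 0.1 is an arbitrary choice):

import numpy as np

x_train = np.random.rand(1000, 784)   # placeholder data in [0, 1]
noise_factor = 0.1                    # arbitrary corruption strength
x_noisy = x_train + noise_factor * np.random.normal(size=x_train.shape)
x_noisy = np.clip(x_noisy, 0.0, 1.0)  # keep values in [0, 1]

# Noisy input, clean target: the network learns to undo the corruption
# autoencoder.fit(x_noisy, x_train, epochs=10)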
CNN architecture
A Convolutional Neural Network consists of multiple layers: an input layer, convolutional
layers, pooling layers, and fully connected layers.
• Input Layer: This layer takes in the raw image data, which is typically a 3D tensor of
size (height, width, channels).
• Convolutional Layer: This layer applies filters to small regions of the image to detect
local patterns. The output of this layer is a feature map, which represents the presence of
certain features in the image.
• Pooling Layer: Also known as downsampling, this layer reduces the spatial dimensions
of the feature map to reduce the number of parameters and the number of computations
required. Common pooling techniques include max pooling and average pooling.
• Fully Connected Layers: These layers, also known as dense layers, are used for
classification. They take the output from the convolutional and pooling layers and
produce a probability distribution over the possible classes.
• Output Layer: The output layer provides the final predictions. For a classification task,
this often uses a softmax function, which produces probabilities for each class.
CNN architecture may also include other layers such as:
• Activation Functions: These introduce non-linearity into the model, allowing it to learn more
complex relationships between the inputs and outputs. Common activation functions include
ReLU, Sigmoid, and Tanh.
• Batch Normalization: This layer normalizes the input data for each layer, which can improve
the stability and speed of training.
• Dropout: This layer randomly sets a fraction of the neurons to zero during training, which
can help prevent overfitting.
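A minimal Keras sketch wiring these layers together (the 32x32x3 input size and 10 output classes are assumptions):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),                  # input layer: height x width x channels
    layers.Conv2D(32, (3, 3), activation="relu"),    # convolutional layer + ReLU
    layers.MaxPooling2D((2, 2)),                     # pooling (downsampling)
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),                             # dropout against overfitting
    layers.Dense(64, activation="relu"),             # fully connected layer
    layers.Dense(10, activation="softmax"),          # output: class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])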
[Figure: artificial neural network architecture showing the input layer, hidden layers, and output layer]
Input Layer: As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer: The hidden layer lies between the input and output layers. It performs all the
calculations to find hidden features and patterns.
Output Layer: The input goes through a series of transformations in the hidden layers, and the
final result is conveyed through this layer.
The learning process in ANNs involves adjusting the model's parameters to minimize the
difference between the predicted output and the actual output. This process is typically done
through an optimization algorithm, such as Stochastic Gradient Descent (SGD), Adam, or
RMSProp.
A step-by-step explanation of the learning process:
1. Data Preparation: The first step is to prepare the data for training. This includes collecting and
preprocessing the data, splitting it into training, validation, and testing sets, and normalizing the
input features.
2. Forward Propagation: The next step is to feed the input data through the network, layer by
layer, to produce an output. This process is called forward propagation. The output of each layer
is calculated using the weights, biases, and activation functions.
3. Error Calculation: The error between the predicted output and the actual output is calculated
using a loss function, such as Mean Squared Error (MSE) or Cross-Entropy.
4. Backpropagation: The error is then propagated backwards through the network, layer by layer,
to calculate the gradients of the loss function with respect to each parameter. This process is
called backpropagation.
5. Weight Update: The gradients are then used to update the weights and biases of the network
using an optimization algorithm. The goal is to minimize the loss function and improve the
accuracy of the model.
6. Repeat: Steps 2-5 are repeated for multiple iterations, with the network adjusting its parameters
to better fit the training data.
7. Model Evaluation: The model is evaluated on the validation set to monitor its performance
and prevent overfitting.
8. Model Deployment: Once the model has converged and achieved satisfactory performance, it
can be deployed to make predictions on new, unseen data.
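To make steps 2-5 concrete, here is a tiny NumPy sketch of one gradient-descent step for a single linear layer with mean-squared-error loss (all data and sizes are made up):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))        # 8 samples, 3 features (made-up data)
y = rng.normal(size=(8, 1))        # targets
W = rng.normal(size=(3, 1)) * 0.1  # weights
b = np.zeros((1, 1))               # bias
lr = 0.1                           # learning rate

y_hat = X @ W + b                            # 2. forward propagation
loss = np.mean((y_hat - y) ** 2)             # 3. error calculation (MSE)
grad_out = 2 * (y_hat - y) / len(X)          # 4. backpropagation of the error
grad_W = X.T @ grad_out
grad_b = grad_out.sum(axis=0, keepdims=True)
W -= lr * grad_W                             # 5. weight update (one SGD step)
b -= lr * grad_b
print("loss:", loss)

Repeating this step over many batches is exactly step 6; deep networks apply the same chain-rule logic layer by layer.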
10. Justify with reasons why Recurrent Neural Networks are used for Named Entity Recognition.
Ans: Named Entity Recognition (NER) is a fundamental task in Natural Language Processing
(NLP) that involves identifying and categorizing named entities in unstructured text into
predefined categories such as person, organization, location, and time. Recurrent Neural
Networks (RNNs) are widely used for NER tasks due to their ability to model sequential data and
capture long-range dependencies.
Why RNNs are Suitable for NER
RNNs are suitable for NER tasks because they can:
• Model Sequential Data: RNNs are designed to handle sequential data, such as text, where
each word or character is dependent on the previous ones.
• Capture Long-Range Dependencies: RNNs can capture long-range dependencies in text,
which is essential for NER tasks where entities can be mentioned multiple times in a
document.
• Handle Variable-Length Sequences: RNNs can handle variable-length sequences, which is
important for NER tasks where sentences or documents can have varying lengths.
• Learn Contextual Representations: RNNs can learn contextual representations of words,
which is essential for NER tasks where the meaning of a word depends on its context.
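A minimal Keras sketch of an RNN sequence tagger of the kind used for NER (the vocabulary size, sequence length, and tag count are made-up; production NER models usually add bidirectionality and often a CRF output layer):

from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len, num_tags = 10000, 50, 9  # made-up sizes

model = keras.Sequential([
    keras.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 64),              # word IDs -> dense vectors
    layers.SimpleRNN(64, return_sequences=True),   # one hidden state per token
    layers.Dense(num_tags, activation="softmax"),  # an entity tag per token
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

return_sequences=True is what makes this a tagger: the network emits a prediction for every token, conditioned on the words seen so far.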
Some common image processing tasks where CNNs are used include:
• Image Classification: CNNs can be used to classify images into different categories (e.g.,
objects, scenes, actions).
• Object Detection: CNNs can be used to detect objects within images and locate them with
bounding boxes.
• Image Segmentation: CNNs can be used to segment images into different regions or objects.
• Image Generation: CNNs can be used to generate new images or modify existing ones.
• Image Denoising: CNNs can be used to remove noise from images.
12. For processing long sentences, LSTM networks are used instead of RNNs. Justify.
Ans:
Recurrent Neural Networks (RNN) are a type of neural network designed to handle sequential
data, such as text or speech. However, RNNs have limitations when it comes to processing long
sentences or sequences. This is where Long Short-Term Memory (LSTM) networks come in.
Limitations of RNNs: RNNs have two main limitations:
1. Vanishing Gradient Problem: As the sequence length increases, the gradients used to update
the weights during backpropagation tend to vanish, making it difficult for the network to learn
long-term dependencies.
2. Exploding Gradient Problem: Conversely, the gradients can also explode, causing the
weights to be updated excessively, leading to unstable training.
Vanishing Gradient Problem: LSTMs are designed specifically to mitigate this issue. They have
a more complex architecture that includes gates (input, output, and forget gates) that control the
flow of information, allowing them to maintain and adjust the cell state over long sequences. This
helps preserve important information over long time spans.
Ability to Capture Context: By maintaining an internal memory, LSTMs can better capture the
context and relationships among words, which is essential for tasks like language translation,
sentiment analysis, and text generation.
Flexible Input and Output: LSTMs can handle varying input and output lengths more
effectively due to their architecture. They can take in sequences of different lengths and still
produce meaningful outputs, which is particularly useful in natural language processing tasks.
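In code, moving from a plain RNN to an LSTM is a one-layer swap; the input, forget, and output gates live inside the layer (a Keras sketch with an arbitrary hidden size):

from tensorflow.keras import layers

# A plain recurrent layer, prone to vanishing gradients on long sequences:
rnn_layer = layers.SimpleRNN(64, return_sequences=True)

# The LSTM replacement: the gated cell state preserves information over long spans
lstm_layer = layers.LSTM(64, return_sequences=True)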
13. Explain Data science for multimodal, Image, Audio and video
Ans:
1. Multimodal Data Science
Multimodal data science refers to the study and analysis of data that comes from various sources
or modalities. This approach aims to leverage the strengths of different data types to enhance
machine learning models and provide more comprehensive insights. By integrating multimodal
data, data scientists can improve prediction accuracy, discover hidden patterns, and develop more
robust applications in fields like healthcare, social media, security, and entertainment.
2. Image Data Science
Image data science involves analyzing and extracting information from image data using various
techniques and algorithms. Key components include:
• Computer Vision: This field focuses on enabling machines to interpret and understand visual
information from the world. Techniques such as object detection, image segmentation, and
facial recognition are common.
• Image Processing: Image data often requires preprocessing steps, such as resizing,
normalization, filtering, and augmentation, to improve the quality of the analysis.
• Deep Learning: Convolutional Neural Networks (CNNs) are widely used for image
classification, recognition, and feature extraction tasks, as they can automatically learn
hierarchical features from images.
• Applications: Image data science is used in various applications, including medical imaging
(e.g., diagnosing diseases), autonomous vehicles (e.g., obstacle detection), and social media
(e.g., image tagging and recommendations).
d. Entertainment Industry:
Case Study: Recommending Movies using Collaborative Filtering
A movie streaming company wanted to improve its movie recommendation system to increase user
engagement. The company had a large dataset of user ratings and movie attributes.
Data Science Approach:
• Data preprocessing: Cleaned and processed the dataset to remove missing values and outliers.
• Feature engineering: Extracted relevant features such as user ratings, movie genres, and
director information.
• Modeling: Built a collaborative filtering model to recommend movies to users based on their
past ratings and preferences (a matrix-factorization sketch follows this list).
• Evaluation: Used metrics such as precision, recall, and F1 score to evaluate the model's
performance.
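A minimal sketch of the collaborative-filtering idea via matrix factorization (the rating matrix is made up and this illustrates the technique, not the company's actual system):

import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 3, 0, 1],   # made-up user x movie ratings (0 = unrated)
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0                  # only observed ratings contribute to the loss
k, lr, reg = 2, 0.01, 0.1     # latent factors, learning rate, regularization
U = rng.normal(scale=0.1, size=(R.shape[0], k))  # user factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))  # movie factors

for _ in range(2000):
    err = (R - U @ V.T) * mask       # error on observed entries only
    U += lr * (err @ V - reg * U)    # gradient step on the squared error
    V += lr * (err.T @ U - reg * V)

print(np.round(U @ V.T, 1))  # predictions fill in the unrated cells

Each zero cell gets a predicted rating from the learned user and movie factors; recommending the highest-scoring unseen movies per user is the final step.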
Results:
• The data science approach resulted in a 25% increase in user engagement due to personalized
movie recommendations.
• The company was able to improve its recommendation system and increase user satisfaction.
• The model also helped the company to identify opportunities to improve its content offerings
and increase revenue.