ML Solved Endsem
Q. Linear Regression: Definition: The process of finding a straight line that best approximates a set of points on a graph is called linear regression. The word linear signifies that the type of relationship you try to establish between the variables tends to be a straight line. Linear regression is one of the simplest and most popular techniques of regression analysis. There are 2 types of linear regression: 1. Simple Linear Regression (SLR): This has only one independent (or input) variable. For example, number of litres of petrol and kilometres driven. 2. Multiple Linear Regression (MLR): This has more than one independent (or input) variable. For example, number of litres of petrol, age of the vehicle, speed, and kilometres driven.
The general formula (or model) for linear regression analysis is Y = B0 + B1X1 + B2X2 + … + BnXn + e, where Y is the outcome (dependent) variable and Xi are the values of the independent variables. The objective is to find a linear model where the sum of squares of the distances (residuals) is minimal. Applications/Use cases: 1. Healthcare: As you understand by now, the healthcare industry is evolving and a great deal of research is ongoing. Linear regression could be used to establish the relationship between a treatment and its effects, or to understand complex operations of the human body and derive certain insights. 2. Other predictions: There are several other areas where predictions can be made using the established linear relationship between the variables. It could be sports outcomes, crop output, machinery performance, fitness, and other similar areas.
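A minimal sketch of SLR and MLR with scikit-learn (the petrol/kilometre figures below are invented purely for illustration, not taken from these notes):

```python
# Minimal sketch of simple and multiple linear regression with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: litres of petrol used vs kilometres driven.
km = np.array([[10], [25], [50], [80], [120]])        # independent variable X
litres = np.array([1.1, 2.4, 4.8, 7.9, 11.5])         # dependent variable Y

slr = LinearRegression().fit(km, litres)
print("SLR coefficients:", slr.intercept_, slr.coef_)  # B0 and B1
print("Predicted litres for 60 km:", slr.predict([[60]]))

# Multiple linear regression: add vehicle age and average speed as extra inputs.
X = np.array([[10, 2, 40], [25, 5, 55], [50, 3, 60], [80, 8, 70], [120, 1, 80]])
mlr = LinearRegression().fit(X, litres)
print("MLR coefficients:", mlr.intercept_, mlr.coef_)  # B0, B1 ... Bn
```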
Q. Ridge Regression: Ridge Regression is a type of linear regression used to prevent overfitting by adding a
penalty term to the traditional linear regression model. 1. It's a regression technique that incorporates
regularization by adding a penalty term (L2 regularization) to the ordinary least squares regression. 2. Ridge
regression aims to minimize the sum of squared coefficients along with the traditional linear regression loss
function. 3. Ridge regression helps in reducing the impact of highly correlated predictors stabilizing and
improving the model's prediction performance. 4. Ridge regression is effective in cases where there are many
predictors or when predictors are highly correlated, ensuring stable and more reliable predictions. 5. It's used
in various fields like finance, biology, and social sciences, where handling multicollinearity and preventing
overfitting are crucial. 6. Evaluation of Ridge regression models involves assessing their performance using
metrics like Mean Squared Error (MSE) or R-squared to understand predictive accuracy. Example: Suppose
you're predicting a person's salary based on factors like education level, years of experience, and age. Since experience and age are likely to be highly correlated, ridge regression shrinks their coefficients so that neither dominates, giving more stable predictions.
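A small sketch of how ridge regression behaves on correlated predictors, assuming scikit-learn and a synthetic salary dataset (alpha is the L2 penalty strength):

```python
# Ridge regression sketch: correlated predictors, L2 penalty controlled by alpha.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 100
experience = rng.uniform(0, 30, n)
age = experience + 22 + rng.normal(0, 1, n)            # highly correlated with experience
education = rng.integers(10, 21, n)
X = np.column_stack([education, experience, age])
salary = 2000 * education + 1500 * experience + rng.normal(0, 5000, n)

ols = LinearRegression().fit(X, salary)
ridge = Ridge(alpha=10.0).fit(X, salary)               # larger alpha = stronger shrinkage
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)              # smaller, more stable values
```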
Q. Lasso Regression: Lasso Regression (short for Least Absolute Shrinkage and Selection Operator) is a type
of linear regression used for feature selection and regularization to prevent overfitting. 1. It's a regression
technique that combines ordinary least squares regression with regularization to improve the model's
prediction accuracy and prevent overfitting. 2. Lasso regression adds a penalty term (L1 regularization) to the
traditional linear regression, which penalizes large coefficients by adding their absolute values to the loss
function. 3. The penalty term encourages the model to minimize the sum of the absolute values of the
coefficients, forcing some coefficients to become exactly zero. 4. By reducing some coefficients to zero, Lasso
regression performs feature selection, indicating which features are more important or influential in
predicting the outcome. 5. It helps in handling datasets with a large number of features by automatically
selecting the most relevant ones and excluding irrelevant or less important features. 6. Lasso regression can
be useful when dealing with multicollinearity, a situation where predictor variables are highly correlated with
each other. Example: Consider predicting house prices based on features like size, number of rooms, location,
etc. Lasso regression helps not only in predicting house prices but also in selecting the most important features.
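A brief sketch of Lasso's feature selection on synthetic data (scikit-learn assumed; the feature meanings are hypothetical):

```python
# Lasso regression sketch: the L1 penalty drives some coefficients exactly to zero,
# effectively selecting features. The data below is synthetic.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))                  # e.g. size, rooms, location score, plus 3 noise features
price = 50 * X[:, 0] + 30 * X[:, 1] + 20 * X[:, 2] + rng.normal(0, 5, 200)

lasso = Lasso(alpha=1.0).fit(X, price)
print("Coefficients:", lasso.coef_)            # irrelevant features typically get coefficient 0
print("Selected features:", np.flatnonzero(lasso.coef_))
```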
Ridge Regression vs Lasso Regression: 1. Ridge shrinks the coefficients towards zero; Lasso encourages some coefficients to be exactly zero. 2. Ridge does not eliminate any feature; Lasso can eliminate features. 3. Ridge is suitable when all features are important; Lasso is suitable when some features are irrelevant or redundant. 4. Ridge is more computationally efficient; Lasso is less computationally efficient. 5. Both require setting a hyperparameter (the regularization strength). 6. Ridge performs better when there are many small to medium sized coefficients; Lasso performs better when there are a few large coefficients. 7. Ridge adds a penalty term proportional to the sum of squared coefficients; Lasso adds a penalty term proportional to the sum of the absolute values of the coefficients.
Q. Overfitting: Overfitting occurs when a machine learning model learns too much from the training data,
including noise or random fluctuations, and performs poorly on new, unseen data. 1. Overfitting happens
when a model is excessively complex, capturing not only patterns but also noise or random fluctuations
present in the training data. 2. The model performs exceptionally well on the training data but fails to
generalize to new, unseen data, showing poor performance on the test dataset. 3. It occurs when the model
learns too much from specific examples in the training data, making it too specialized and unable to generalize
well. 4. Complex models, such as those with too many features or parameters, are more prone to overfitting.
5. Overfitting often arises when a model tries to fit the training data too closely, capturing even the
peculiarities or outliers that might not represent the overall pattern. 6. Regularization techniques, such as
Lasso or Ridge regression, can help prevent overfitting by penalizing complex models. Example of Overfitting:
Consider training a model to recognize handwritten digits. An overfitted model might perfectly memorize
specific examples in the training set (even accounting for noise), but when presented with new, unseen
handwritten digits, it fails to accurately identify or generalize to different styles of handwriting.
Q. Underfitting: 1. Underfitting happens when a model is too basic or lacks the ability to learn patterns from
the data. 2. It occurs when the model is too generalized and fails to capture the nuances or details present in
the dataset. 3. An underfit model shows low accuracy on both the training and test datasets. 4. It can occur
due to using a model that's too straightforward or has insufficient parameters to represent the data
complexity. 5. Underfitting often happens when trying to fit a linear model to nonlinear data. 6. The model
might overlook important features or characteristics of the data, leading to poor predictions. 7. Solutions to
underfitting include using more complex models or increasing the model's capacity by adding more layers or
parameters. Example: Suppose you're trying to create a model to predict housing prices based on features
like size and number of rooms, but you use an overly simplistic linear model. This model may not capture the
complexity of how other factors (like location, amenities, etc.) affect the price, resulting in underfitting. As a
result, the model fails to accurately predict house prices even with more data points.
Q. Techniques to Reduce Underfitting: 1. Increase Model Complexity: Use more complex models that can
capture the underlying patterns in the data. For example, use deeper neural networks, more sophisticated
algorithms, or models with more parameters. 2. Feature Engineering: Enhance the quality of input features
by adding new features, transforming existing ones, or selecting more relevant features. This helps the model
to better represent the relationships in the data. 3. Reduce Regularization: Decrease the strength of
regularization techniques or avoid using them altogether. Regularization can prevent a model from fitting the
data too closely, but excessive regularization might lead to underfitting.
Q. Techniques to Reduce Overfitting: 1. Cross-Validation: Use techniques like k-fold cross-validation to assess
the model's performance on multiple subsets of the data. This helps to detect overfitting by evaluating the
model's performance on different data partitions. 2. Data Augmentation: Increase the size and diversity of
the training data by augmenting it with modified or synthetic examples. This helps the model generalize better
by exposing it to a wider range of scenarios. 3. Regularization: Introduce regularization techniques like L1 or
L2 regularization, dropout (in neural networks), or early stopping to prevent the model from fitting the noise
in the data. Regularization penalizes complex models, encouraging simpler and more generalized solutions.
4. Ensemble Methods: Use ensemble methods like bagging (e.g., Random Forests) or boosting (e.g., Gradient
Boosting Machines) that combine multiple models to reduce variance and improve generalization.
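A short sketch combining k-fold cross-validation with varying L2 regularization strength as overfitting checks, assuming scikit-learn and a synthetic dataset:

```python
# Detecting/reducing overfitting: k-fold cross-validation plus a regularized model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

# Smaller C = stronger L2 regularization in LogisticRegression.
for C in [100.0, 1.0, 0.01]:
    model = LogisticRegression(C=C, max_iter=1000)
    scores = cross_val_score(model, X, y, cv=5)        # 5-fold cross-validation
    print(f"C={C}: mean CV accuracy = {scores.mean():.3f}")
```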
Overfitting vs Underfitting: 1. Overfitting: the model is too complex; Underfitting: the model is not complex enough. 2. Overfitting: accurate on the training set; Underfitting: not accurate on the training set. 3. Overfitting: not accurate on the validation set; Underfitting: not accurate on the validation set either. 4. Overfitting: need to reduce model complexity; Underfitting: need to increase model complexity. 5. Overfitting: reduce the number of features; Underfitting: increase the number of features. 6. Overfitting: apply regularisation; Underfitting: reduce regularisation. 7. Overfitting: reduce training (e.g., stop earlier); Underfitting: increase training. 8. Adding more training examples helps in both cases.
Q. Bias: 1. Bias is a tendency towards or against something or someone. It can be conscious or unconscious,
and it can affect our thoughts, feelings, and behaviors. 2. Biases can be positive or negative. Positive biases
can lead us to see the good in things, while negative biases can lead us to see the bad. 3. Biases can be
formed in a variety of ways. We can develop biases from our own experiences, from the people we interact
with, and from the information we consume. 4. Biases can have a significant impact on our lives. They can
affect our decisions, our relationships, and our overall well-being. 5. It is important to be aware of our biases.
Once we are aware of them, we can take steps to manage them and reduce their impact on our lives. 6. There
are a number of ways to manage bias. We can educate ourselves about different biases, challenge our own
assumptions, and seek out diverse perspectives. 7. Reducing bias is an ongoing process. It takes time and
effort to become more aware of our biases and to develop strategies for managing them. 8. Everyone has
biases. It is important to remember that we are all human and that we are all susceptible to bias.
Q. Variance: 1. Variance measures how spread out or dispersed a set of numbers is around their mean or
average. 2. It calculates the average of squared differences from the mean, showing how much each number
differs from the mean squared. 3. High variance indicates that the data points are widely spread out from the
average, while low variance means the data points are closer to the average. 4. It helps understand the
consistency or variability within a dataset. 5. Variance is sensitive to outliers—extreme values that can
significantly affect the spread of the data. 6. Mathematically, variance is calculated by taking the average of
the squared differences between each data point and the mean. 7. It's used in statistics and probability to
analyze the distribution and dispersion of data points in a dataset. 8. Variance is the square of the standard
deviation, another measure of how spread out the numbers in a dataset are. 9. A larger variance implies a
wider range of values, indicating more diversity or inconsistency within the data. Variance Example: Imagine
you have two classes of students. In Class A, most students consistently score around 80% on their tests. In
Class B, the scores vary widely, ranging from 50% to 100%. Class A has low variance because the scores are
close to each other and to the average (80%), while Class B has high variance due to the wide range of scores.
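A quick worked computation of the two-class example using NumPy (the individual scores are invented to match the description):

```python
# Variance of the two classes' test scores from the example above (illustrative numbers).
import numpy as np

class_a = np.array([79, 80, 81, 80, 80, 79, 81])        # scores clustered around 80%
class_b = np.array([50, 95, 70, 100, 55, 85, 60])       # widely spread scores

# Population variance: mean of squared deviations from the mean.
print("Class A variance:", np.var(class_a))             # small -> low spread
print("Class B variance:", np.var(class_b))             # large -> high spread
print("Std dev is the square root of variance:", np.sqrt(np.var(class_b)), np.std(class_b))
```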
Q. Mean Absolute Error (MAE): Mean Absolute Error (MAE) is a metric used to evaluate the accuracy of a
machine learning model's predictions by measuring the average of the absolute differences between
predicted and actual values. 1. MAE calculates the average absolute differences between predicted values
and actual values in a dataset. 2. It measures the average magnitude of errors without considering their
direction. 3. MAE provides an understanding of how far off, on average, the predictions of a model are from the actual values in the dataset. 4. A lower MAE indicates that the model has smaller errors & is better at predicting the target variable. 5. MAE is commonly used for regression problems to evaluate models that predict continuous numerical values. Example: Suppose you
have a model predicting house prices. To measure its performance, you calculate the MAE by taking the
absolute differences between the predicted prices and the actual prices of several houses in your dataset.
Q. Root Mean Squared Error (RMSE): Root Mean Squared Error (RMSE) is a metric used to evaluate the
accuracy of a machine learning model's predictions by measuring the average of squared differences between
predicted and actual values, then taking the square root of this average. 1. RMSE calculates the square root
of the average of squared differences between predicted values and actual values in a dataset. 2. It measures
the average magnitude of errors while considering their direction, penalizing larger errors more heavily due
to squaring. 3. The formula for RMSE involves squaring the differences between predicted and actual values,
computing the average of these squared differences, and then taking the square root of this average. 4. RMSE
gives an understanding of the typical size of the errors made by a model when predicting the target variable.
5. It's commonly used for regression problems to evaluate the performance of models predicting continuous
numerical values. 6. RMSE is sensitive to outliers since squaring can significantly amplify the impact of larger
errors on the overall metric. Example: Suppose you have a model predicting the heights of individuals. To
measure its performance, you calculate the RMSE by squaring the differences between the predicted heights
and the actual heights of several people in your dataset, computing their average, and then taking the square
root. If the RMSE is, let's say, 5 inches, it means, on average, the model's predictions are off by approximately
5 inches from the actual heights.
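A small sketch computing MAE and RMSE both from their definitions and via scikit-learn (the height values are made up):

```python
# MAE and RMSE computed by hand (from their definitions) and with scikit-learn.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

actual = np.array([66, 70, 72, 68, 74])        # e.g. true heights in inches (illustrative)
predicted = np.array([64, 71, 69, 70, 78])

mae = np.mean(np.abs(actual - predicted))                 # average absolute error
rmse = np.sqrt(np.mean((actual - predicted) ** 2))        # square, average, then square root
print("MAE :", mae, mean_absolute_error(actual, predicted))
print("RMSE:", rmse, np.sqrt(mean_squared_error(actual, predicted)))
```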
Q. R-squared (R2): R-squared (R2) is a statistical metric used to evaluate the goodness of fit of a regression
model by measuring the proportion of variance in the dependent variable explained by the independent
variables. 1. R-squared measures the goodness of fit of a regression model by determining how well the model
fits the observed data. 2. It represents the proportion of variance in the dependent variable (target) that is
explained by the independent variables (features) in the model. 3. R2 ranges from 0 to 1, where: - An R2 value
of 1 indicates that the model perfectly predicts the dependent variable based on the independent variables.
- An R2 value of 0 indicates that the model does not explain any variability in the dependent variable. 4. The
formula for R-squared involves comparing the sum of squared differences between predicted values and the
mean of the dependent variable to the sum of squared differences between actual values and the mean. 5.
R2 can be interpreted as the proportion of variance in the dependent variable that is captured by the model,
expressed as a percentage. 6. It's a commonly used metric to assess how well a regression model fits the data
and whether it provides better predictions compared to a baseline model (a model with no predictors). 7.
A higher R2 value generally indicates a better fit, suggesting that the model explains a larger portion of the
variance in the target variable. Example: Suppose you have a model predicting student exam scores based on
study hours and attendance. A high R-squared value, such as 0.8, indicates that 80% of the variability in exam
scores is explained by study hours and attendance in your model, demonstrating a good fit between the model
and the observed data.
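A short sketch of R-squared computed from its definition (1 minus the ratio of residual to total sum of squares) and checked against scikit-learn's r2_score; the scores are invented for illustration:

```python
# R-squared from its definition (1 - SS_res / SS_tot) and via sklearn.metrics.r2_score.
import numpy as np
from sklearn.metrics import r2_score

actual = np.array([55, 60, 72, 80, 90])        # e.g. exam scores (illustrative)
predicted = np.array([58, 62, 70, 78, 88])

ss_res = np.sum((actual - predicted) ** 2)               # unexplained variation
ss_tot = np.sum((actual - actual.mean()) ** 2)           # total variation around the mean
print("R2 (manual) :", 1 - ss_res / ss_tot)
print("R2 (sklearn):", r2_score(actual, predicted))
```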
Q. 1. Bagging: Bagging involves training multiple instances of the same learning algorithm on different
subsets of the training data, often using bootstrapping (sampling with replacement). The final prediction is
made by averaging (for regression) or voting (for classification) the predictions of these models. Random
Forest is an example of a bagging ensemble method that employs decision trees
Q. 2. Boosting: Boosting is a technique that sequentially trains models and focuses on the misclassified
instances in each iteration. Models are weighted based on their performance, with more weight given to
incorrectly predicted instances, allowing subsequent models to focus more on these cases.
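A minimal sketch contrasting a bagging ensemble (Random Forest) with a boosting ensemble (Gradient Boosting) on the same synthetic task, assuming scikit-learn:

```python
# Bagging (Random Forest) and boosting (Gradient Boosting) on the same synthetic task.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("Bagging (Random Forest) accuracy   :", bagging.score(X_test, y_test))
print("Boosting (Gradient Boosting) accuracy:", boosting.score(X_test, y_test))
```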
Q. K-Nearest Neighbors (KNN): (KNN) is a simple and intuitive machine learning algorithm used for
classification & regression tasks based on similarity measures. 1. KNN is a non parametric, instance-based
algorithm that makes predictions based on the similarity of the input data points to other points in the
dataset. 2. It's a type of lazy learning algorithm, meaning it doesn't explicitly learn a model during the training
phase but rather memorizes the training data for predictions. 3. For classification tasks, KNN assigns a class
to a new data point based on the majority class of its K nearest neighbors. 4. For regression tasks, KNN predicts the
value of a new data point by averaging or taking the mean of the values of its K nearest neighbors. 5. The 'K'
in KNN represents the number of nearest neighbors (data points) used to make predictions. Choosing an
appropriate K value is crucial. 6. KNN measures the similarity between data points using distance metrics such
as Euclidean distance, Manhattan distance, or other distance functions. 7. KNN doesn't involve model
training, making it computationally inexpensive during the training phase, but can be slow during prediction
for large datasets. Example: Suppose you have a dataset of different types of fruits based on their weight and
sweetness level. To classify a new fruit, KNN finds the K nearest fruits (based on weight and sweetness) from
the dataset and assigns the class (e.g., apple, orange) by majority vote among the K neighbors. If K=5 and 3
out of 5 nearest fruits are apples, the algorithm predicts the new fruit as an apple.
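A small sketch of the fruit example with scikit-learn's KNeighborsClassifier (the weight and sweetness values are invented):

```python
# KNN sketch for the fruit example: classify by majority vote of the K nearest points.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[150, 7], [160, 8], [140, 6], [120, 4], [110, 3], [115, 5]])   # weight (g), sweetness (1-10)
y = np.array(["apple", "apple", "apple", "orange", "orange", "orange"])

knn = KNeighborsClassifier(n_neighbors=5)      # K = 5, Euclidean distance by default
knn.fit(X, y)                                  # "lazy" learning: the data is simply stored
print(knn.predict([[145, 6]]))                 # majority class among the 5 nearest fruits
```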
Q. Support Vector Machines Algorithm (SVM): Support Vector Machine or SVM is one of the most popular supervised learning algorithms; it is used for classification as well as regression problems. However, it is primarily used for classification problems in Machine Learning. The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that a new data point can easily be placed in the correct category in the future. This best decision boundary is known as a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Picture two different categories separated by a decision boundary or hyperplane. Example: Imagine a dataset with two classes
of flowers based on petal length and width. SVM finds the optimal hyperplane that best separates these
classes, aiming to maximize the margin between the flower types. It then classifies new flowers based on
their features by determining which side of the hyperplane they fall on
Q. Types of SVM :-
1.Linear Support Vector Machine (SVM): Linear Support Vector Machine (SVM) is a supervised machine
learning algorithm used for classification tasks that aim to find the best linear boundary between different
classes. 1. Linear SVM is a variant of the SVM algorithm that works for linearly separable datasets, seeking to
find a hyperplane that maximizes the margin between different classes. 2. It operates by finding the optimal
linear boundary (hyperplane) that best separates classes by maximizing the margin, which is the distance
between the hyperplane and the closest data points (support vectors). 3. The algorithm's objective is to
identify a hyperplane that minimizes misclassifications while maximizing the margin, enhancing the model's
ability to generalize to unseen data. 4. Linear SVM is effective for binary classification problems, where it
classifies data points into two distinct classes. 5. It doesn’t perform well on nonlinear or complex datasets
without transformations, as it seeks a linear decision boundary. Example: Consider a scenario where you're
classifying whether bank transactions are fraudulent or not based on transaction details. Linear SVM can be
employed to create a linear boundary between fraudulent and non-fraudulent transactions. It finds the
optimal hyperplane that maximizes the margin between the two classes, effectively separating genuine from
fraudulent transactions based on their features.
2. Non-linear Support Vector Machine: Non-linear Support Vector Machine (SVM) is an extension of the SVM
algorithm that handles datasets that are not linearly separable by mapping data to a higher dimensional
space. 1. Non-linear SVM is used when the classes in the dataset cannot be effectively separated by a linear
boundary in the original feature space. 2. It employs a technique called the "kernel trick" to map the original
features into a higher-dimensional space where the data might become linearly separable. 3. Kernels are
functions (like polynomial, radial basis function - RBF, sigmoid) used to transform the input features into a
higher-dimensional space, allowing for more complex decision boundaries. 4. In the higher-dimensional
space, Non-linear SVM seeks to find a hyperplane that effectively separates the classes, even if they are not
separable in the original feature space. 5. The choice of the kernel function is crucial as it determines the
nature and complexity of the decision boundary created by the SVM. Example: Suppose you're working on
handwritten digit recognition. Non-linear SVM with an RBF kernel can be employed to classify digits that are
not linearly separable in the pixel space. The RBF kernel transforms the data into a higher-dimensional space
where complex patterns in the digits become linearly separable, enabling accurate classification of different
digits.
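A brief sketch comparing a linear SVM with an RBF-kernel SVM on data that is not linearly separable, using scikit-learn's SVC on the synthetic make_moons dataset:

```python
# Linear vs non-linear (RBF kernel) SVM on data that is not linearly separable.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)   # kernel trick
print("Linear SVM accuracy:", linear_svm.score(X_test, y_test))
print("RBF SVM accuracy   :", rbf_svm.score(X_test, y_test))
print("Support vectors per class (RBF):", rbf_svm.n_support_)
```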
Q. Hyperplane in SVM: SVM is a supervised machine learning algorithm used for classification tasks. The
primary objective of SVM is to find the optimal hyperplane that best separates different classes in the feature
space. - Hyperplane: In SVM, a hyperplane is a decision boundary that divides the feature space into two
classes. For binary classification (classifying data into two categories), the hyperplane is a (d - 1)- dimensional
subspace in an d-dimensional feature space. If the data is linearly separable, SVM aims to find the hyperplane
that maximizes the margin. 2. Margin in SVM: - Margin: The margin in SVM refers to the distance between
the separating hyperplane and the closest data points (support vectors) from each class. The goal of SVM is
to maximize this margin. - Maximizing Margin: The optimal separating hyperplane is the one that maximizes
this margin. The distance between the hyperplane and the support vectors of both classes is maximized to
ensure a robust separation between the classes.
Q. Ensemble learning methods: Ensemble learning methods are techniques in machine learning that
combine the predictions of multiple individual models to produce a more accurate and robust final prediction.
The core idea behind ensemble methods is that by combining multiple models, each with its strengths and
weaknesses, it's possible to achieve better predictive performance than any single model alone.eg. bagging ,
boosting, stacking. Advantages: - Improved Accuracy: Ensembles often outperform individual models by
reducing overfitting and capturing more complex patterns in the data. - Robustness: They are less sensitive
to noise and outliers in the data due to their ability to generalize well. - Versatility: Ensemble methods can be
applied to various machine learning algorithms and tasks.
Q. Random Forest: Random Forest is a versatile and robust machine learning algorithm used for both
classification and regression tasks, known for its accuracy and resilience to overfitting. 1. Random Forest is an
ensemble learning method that operates by constructing multiple decision trees during the training phase
and combines their outputs for predictions. 2. It belongs to the bagging family of algorithms, utilizing
bootstrap sampling (randomly selecting subsets of data with replacement) and aggregating the results from
multiple decision trees. 3. Each decision tree in a Random Forest is trained independently on a random subset
of the training data and a random subset of features. 4. The randomness introduced while building each tree
helps in reducing the variance and decorrelating the trees from one another. 5. During prediction, each
individual tree in the Random Forest independently makes a prediction, and the final prediction is determined
by aggregating the results (e.g., using majority voting for classification or averaging for regression). 6. Random
Forest is effective in handling overfitting due to its ensemble nature, where the combination of multiple trees
tends to generalize better to unseen data. 7. It is robust to noise and outliers in the data, as the averaging
effect of multiple trees reduces the impact of individual noisy data points. 8. Random Forest can handle high-
dimensional data and works well with both numerical and categorical features without extensive
preprocessing. 9. The algorithm provides feature importance scores, indicating the relative importance of
each feature in making predictions. 10. Random Forest finds applications in various domains such as finance,
healthcare, image recognition, and more due to its high accuracy and ability to handle complex datasets.
Example: Suppose you aim to predict whether an email is spam or not based on various attributes like the
sender, subject, and content. Using Random Forest, multiple decision trees can be trained on different subsets
of email data, and their predictions are combined to classify new emails as spam or non-spam.
Applications of Random Forest: 1. Banking: The banking sector mainly uses this algorithm for identifying loan risk. 2. Medicine: With the help of this algorithm, disease trends and the risks of a disease can be identified. 3. Land use: We can identify areas of similar land use with the help of this algorithm. 4. Marketing: Marketing trends can be identified using this algorithm.
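A minimal Random Forest sketch showing bootstrap-trained trees, test accuracy, and the feature importance scores mentioned above (scikit-learn, with synthetic data standing in for email features):

```python
# Random Forest sketch: bootstrap-sampled trees, majority-vote prediction, feature importances.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, n_informative=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)
print("Test accuracy      :", forest.score(X_test, y_test))
print("Feature importances:", forest.feature_importances_.round(3))
```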
Q. 1. Binary Classification: Objective: In binary classification, the task involves predicting between two
mutually exclusive classes or outcomes. - Example: Determining whether an email is spam or not spam,
predicting if a patient has a disease or not, classifying whether a customer will buy a product (yes/no). -
Algorithm Usage: Algorithms like logistic regression, support vector machines, and decision trees are
commonly employed for binary classification tasks. - Output: The output is either one of the two classes,
typically represented as 0/1, True/False, or any other binary representation.
Q. 2. Multiclass Classification: 1. One-vs-One (OvO): - Approach: In OvO, multiple binary classifiers are
trained, each focusing on distinguishing between a pair of classes. - Binary Classifiers: For N classes, N(N-1)/2
classifiers are trained, each handling a unique pair of classes. - Decision Making: During prediction, each
classifier "votes" for a class, and the class with the most votes is chosen as the final prediction. - Advantages:
Works well for algorithms that scale better with smaller datasets or when training multiple classifiers is
computationally feasible. 2. One-vs-All (OvA or One-vs-Rest - OvR): - Approach: In OvA, a single classifier is
trained for each class against the rest of the classes combined. - Binary Classifiers: For N classes, N binary
classifiers are trained, each distinguishing one class from all other classes. - Decision Making: During
prediction, the classifier with the highest confidence or probability is chosen as the predicted class. -
Advantages: More efficient in scenarios where the number of classes is large, as it requires training only N
classifiers, making it computationally efficient.
Q. Key Differences between OvO and OvA: - Number of Classifiers: OvO trains a classifier for every pair of classes, while OvA trains one classifier per class. - Training Complexity: OvO can be computationally expensive for a large number of classes due to the number of classifiers required; OvA is more scalable. Multiclass classification in general: - Objective: Multiclass classification involves categorizing data points into three or more classes or categories. - Example: Identifying
the type of fruit (apple, orange, banana), classifying different breeds of dogs, recognizing handwritten digits
(0-9). - Algorithm Usage: Algorithms like decision trees, random forests, neural networks, and algorithms specifically
designed for multiclass problems such as multinomial logistic regression or one-vs all classification.
- Output: The output involves assigning each data point to one of the multiple classes available.
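A short sketch of the OvO and OvR wrappers around a binary classifier, assuming scikit-learn and the 3-class iris dataset (so OvO trains N(N-1)/2 = 3 pairwise classifiers and OvR trains N = 3 classifiers):

```python
# One-vs-One vs One-vs-Rest wrappers around a binary classifier (linear SVM).
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X, y)
ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)
print("OvO classifiers trained:", len(ovo.estimators_))   # 3 class pairs for 3 classes
print("OvR classifiers trained:", len(ovr.estimators_))   # one per class
print("OvO accuracy:", ovo.score(X, y), " OvR accuracy:", ovr.score(X, y))
```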
Q. Balanced and Imbalanced Multiclass Classification Problems:
1. Balanced Multiclass Classification: Scenario: A balanced multiclass classification problem occurs when the
dataset has roughly an equal number of instances for each class. - Characteristics: Each class has a substantial
and comparable representation within the dataset. - Impact: Models trained on balanced datasets tend to
perform well across all classes since they have sufficient data to learn patterns for each class equally.
2. Imbalanced Multiclass Classification: Scenario: An imbalanced multiclass classification problem arises
when there's a significant disparity in the number of instances among different classes. - Characteristics: One
or more classes may have significantly fewer instances compared to others, resulting in a skewed distribution.
- Impact: Models trained on imbalanced datasets might exhibit bias toward the majority class(es), leading to
poor performance in predicting minority classes. It can also affect the model's ability to generalize well.
Challenges: Bias: Imbalanced datasets can cause the model to be biased toward the majority class, leading
to misclassification of minority classes. - Performance Evaluation: Accuracy may not be an adequate metric
for evaluating models in imbalanced scenarios since a high accuracy might be achieved by just predicting the
majority class. - Feature Importance: Imbalanced datasets might misrepresent the importance of features for
minority classes, affecting the overall predictive power.
Q. Micro-Average Precision and Recall: 1. Micro-average computes precision and recall across all classes. 2.
Treats all instances equally and aggregates class-wise results. 3. Calculated by summing individual true
positives, false positives, and false negatives. 4. Suitable for imbalanced datasets. 5. Reflects overall model
performance across all classes.
Q. MicroAverage F-score: 1. F-score computed using micro-averaged precision and recall. 2. Aggregates
individual class results, emphasizing overall performance. 3. Treats each instance equally, regardless of class
distribution. 4. Suitable for imbalanced datasets. 5. Represents the overall balance between precision and
recall across all classes.
Q. Accuracy: 1. Measures the overall correctness of predictions made by a model. 2. Calculated as the ratio
of correctly predicted instances to the total instances. 3. Simple to understand but may not be suitable for
imbalanced datasets. 4. Accuracy = (TP + TN) / (TP + TN + FP + FN). 5. Commonly used as an initial evaluation
metric for classification models.
Q. Precision: 1. Indicates the proportion of correctly predicted positive instances out of all predicted
positives. 2. Focuses on the relevancy of positive predictions. 3. Precision = TP / (TP + FP). 4. High precision
implies fewer false positives. 5. Useful when false positives are critical or costly.
Q. Recall: 1. Measures the proportion of correctly predicted positive instances out of all actual positives. 2.
Emphasizes the model's ability to identify all relevant instances. 3. Recall = TP / (TP + FN). 4. High recall means
fewer false negatives. 5. Valuable when false negatives are more problematic.
Q. F-score: 1. Harmonic mean of precision and recall. 2. Balances between precision and recall. 3. F-score = 2 × (Precision × Recall) / (Precision + Recall). 4. Provides a single metric considering both precision and recall. 5. Useful for assessing a model's overall performance. F1-Score or F-measure is an evaluation metric for classification, defined as the harmonic mean of precision and recall. It is a statistical measure of the accuracy of a test or model.
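A small sketch computing accuracy and micro-averaged precision/recall/F1 (with macro F1 for contrast) on invented multiclass labels, assuming scikit-learn:

```python
# Accuracy, precision, recall and F1 (including micro-averaging) for a toy 3-class problem.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [0, 0, 1, 1, 2, 2, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2, 1, 0, 2]

print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("Accuracy       :", accuracy_score(y_true, y_pred))
# Micro-averaging sums TP, FP and FN over all classes before computing the scores.
print("Micro precision:", precision_score(y_true, y_pred, average="micro"))
print("Micro recall   :", recall_score(y_true, y_pred, average="micro"))
print("Micro F1       :", f1_score(y_true, y_pred, average="micro"))
print("Macro F1       :", f1_score(y_true, y_pred, average="macro"))   # per-class average, for contrast
```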
Q. Types of Clustering: Hierarchical Clustering: Forms a tree-like structure of clusters, where clusters at the
same level are similar to each other. K-Means Clustering: Divides data into K clusters based on centroids,
aiming to minimize the within-cluster variance. DBSCAN (Density-Based Spatial Clustering of Applications
with Noise): Identifies clusters based on the density of data points. Agglomerative Clustering: Starts with
individual data points and merges them into clusters based on certain criteria. Mean Shift: Shifts cluster
centroids to areas of higher point density. Distance Metrics: Clustering algorithms often rely on distance
metrics to measure the similarity or dissimilarity between data points. Evaluation Metrics : Determining the
effectiveness of a clustering algorithm can be challenging. Metrics like silhouette score, Davies-Bouldin index,
and purity are used to assess the quality of clusters.
Q. Types of cluster: Clusters, in the context of data clustering, can be broadly categorized into different types based on their characteristics and formation. Here are some common types of clusters: Exclusive Clusters: Each data point belongs to exactly one cluster, and there is no overlap between clusters. This is typical in algorithms like K-Means. Overlapping or Fuzzy Clusters: Data points can belong to multiple clusters with varying degrees of membership. Fuzzy clustering allows for a softer assignment of data points to clusters, indicating the degree of membership. Hierarchical Clusters: Clusters are organized in a hierarchical structure, forming a tree or dendrogram. Agglomerative hierarchical clustering is an example where clusters are successively merged. Complete Linkage Clusters: In hierarchical clustering, complete linkage clusters have the property that the maximum distance between any two points in the cluster is minimized. Single Linkage Clusters: In hierarchical clustering, single linkage clusters have the property that the minimum distance between any two points in the cluster is minimized.
Q. Spectral Clustering: Spectral Clustering is a powerful technique for clustering data based on the eigenvalues of the similarity matrix. Unlike traditional clustering algorithms that operate in the original feature space, spectral clustering transforms the data into a new space where clusters are more apparent. This algorithm is particularly useful for identifying non-convex and complex-shaped clusters. Key Steps in Spectral Clustering: Similarity Graph Construction: Build a similarity graph using the data points. Each node represents a data point, and edges represent pairwise similarities between points. Affinity Matrix: Convert the similarity graph into an affinity matrix, where each entry represents the strength of the connection between two data points. Graph Laplacian: Compute the normalized Laplacian matrix from the affinity matrix. The Laplacian matrix helps capture the underlying structure of the data. Eigenvalue Decomposition: Perform eigenvalue decomposition on the Laplacian matrix to obtain its eigenvectors.
Q. K-Means Clustering: K-Means is a popular unsupervised machine learning algorithm used for partitioning
a dataset into K distinct, non-overlapping subsets (clusters). The goal of K-Means is to group similar data
points together and discover underlying patterns in the data. Here's an overview of the K-Means algorithm steps: Initialization: Choose the number of clusters (K) that you want to identify in the data. Randomly
initialize K cluster centroids. These centroids represent the initial estimated positions of the cluster centers.
Assignment: For each data point in the dataset, calculate the distance to each centroid. The distance is often
measured using Euclidean distance. Assign each data point to the cluster whose centroid is the closest.
Update Centroids : Recalculate the centroid of each cluster by taking the mean of all data points assigned to
that cluster. The new centroids represent the updated estimates of the cluster centers. Convergence Check:
Check for convergence by comparing the new centroids with the previous centroids. If the centroids do not
change significantly or a predefined number of iterations is reached, the algorithm converges. If convergence
is not achieved, repeat steps 2 and 3. Repeat Assignment and Update: Iterate between the assignment and
centroid update steps until convergence. Each iteration improves the accuracy of cluster assignments &
centroid positions. Final Result: Once convergence is reached, the algorithm has identified K clusters, and
each data point is assigned to a specific cluster.
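A minimal K-Means sketch on synthetic blobs, assuming scikit-learn (K is chosen as 3 to match the generated clusters):

```python
# K-Means sketch: choose K, fit, and inspect the learned centroids and assignments.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)   # K = 3
labels = kmeans.fit_predict(X)                             # assignment + centroid updates until convergence
print("Cluster centroids:\n", kmeans.cluster_centers_)
print("Inertia (within-cluster sum of squares):", kmeans.inertia_)
print("First 10 cluster assignments:", labels[:10])
```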
Q. Hierarchical Clustering: Hierarchical Clustering is a clustering algorithm that organizes data points into a
tree-like structure or hierarchy. It builds a hierarchy of clusters by successively merging or splitting existing
clusters based on the similarity or dissimilarity between data points. There are two main types of hierarchical
clustering: agglomerative and divisive. Agglomerative Hierarchical Clustering: Agglomerative hierarchical
clustering starts with individual data points as separate clusters and successively merges them into larger
clusters. The process continues until all data points belong to a single cluster or until a stopping criterion is
met. Here are the algorithm steps: Initialization: Treat each data point as a singleton cluster. Pairwise Similarity:
Calculate the similarity or dissimilarity between all pairs of clusters. Common metrics include Euclidean
distance, Manhattan distance, or other distance measures. Merge: Merge the two most similar clusters into
a new, larger cluster. The similarity between clusters is determined by the chosen distance metric. Update
Similarity Matrix: Recalculate the pairwise similarities or dissimilarities between the new cluster and the
remaining clusters. Repeat: Repeat steps 3 and 4 until only a single cluster remains or until a stopping criterion
is met. Dendrogram: Represent the hierarchy of clusters using a dendrogram, a tree-like diagram that
illustrates the sequence of merges. Divisive method: The divisive hierarchical clustering method starts with all
data points in a single cluster and then recursively divides the clusters into smaller ones until each data point
forms its own singleton cluster. This process creates a tree-like structure known as a dendrogram, which
illustrates the order and sequence of cluster splits.
Q. Density-Based Clustering is a type of clustering algorithm that identifies dense regions of data points and separates them from sparser regions. Unlike centroid-based algorithms (such as K-Means) or hierarchical algorithms, density-based methods don't require specifying the number of clusters beforehand. One popular density-based clustering algorithm is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Here's an overview: DBSCAN is a density-based clustering algorithm that divides the dataset into clusters based on the density of data points. It has the ability to discover clusters of arbitrary shapes and is robust to noise. Here are the main concepts of DBSCAN: Core Points: A data point is a core point if it has at least a specified number of data points (MinPts) within a given radius (Epsilon, ε) around it. Border Points: A data point is a border point if it is within ε distance of a core point but itself does not have enough neighbors to be considered a core point. Noise Points: A data point that is neither a core point nor a border point is considered a noise point. Algorithm Steps: 1. Initialization: Select an arbitrary data point that has not been visited. 2. Expand Cluster: If the selected point is a core point, a new cluster is created and expanded by adding all reachable points within ε distance to the cluster. This process continues until no more points can be added. 3. Repeat: Steps 1 and 2 are repeated until all points have been visited. 4. Assign Border Points: Border points are assigned to the cluster of their corresponding core point. 5. Noise Points: Remaining unvisited points are considered noise points. Advantages: Can find clusters of arbitrary shapes. Robust to noise and outliers. Does not require specifying the number of clusters in advance.
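A short DBSCAN sketch, assuming scikit-learn; eps plays the role of ε and min_samples the role of MinPts:

```python
# DBSCAN sketch: points labelled -1 are treated as noise.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_                               # -1 marks noise points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Clusters found:", n_clusters)
print("Noise points  :", list(labels).count(-1))
```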
Q. K-Medoids Clustering: K-Medoids is a variation of the K-Means clustering algorithm that addresses some of its limitations, particularly its sensitivity to outliers. In K-Medoids, instead of using the mean (centroid) of data points within a cluster, it uses the actual data point that is most centrally located as the representative or medoid of the cluster. This makes K-Medoids more robust to outliers and noise in the data. Here's an overview of the K-Medoids algorithm: Algorithm Steps: Initialization: Choose the number of clusters (K) that you want to identify in the data. Randomly initialize K data points as the initial medoids. Assignment: For each data point in the dataset, calculate the dissimilarity or distance to each medoid. Common distance metrics include Euclidean distance or other dissimilarity measures. Assign each data point to the cluster with the closest medoid. Update Medoids: For each cluster, select the data point that minimizes the total dissimilarity to other points in the same cluster. This data point becomes the new medoid for that cluster. The updated medoids represent the new central points of the clusters. Convergence Check: Check for convergence by comparing the new medoids with the previous medoids. If the medoids do not change significantly or a predefined number of iterations is reached, the algorithm converges. If convergence is not achieved, repeat steps 2 and 3. Repeat Assignment and Update: Iterate between the assignment and medoid update steps until convergence. Final Result: Once convergence is reached, the algorithm has identified K clusters, and each data point is assigned to a specific cluster. The final medoids represent the central points of the identified clusters.
Q. Outliers Analysis: Outliers are unusual data points significantly different from the majority, impacting
statistical analyses and models. Detection methods include z-scores, IQR, and machine learning algorithms
like Isolation Forest. Outliers can result from errors, extreme events, or natural variations. Treatment varies
based on context, and understanding domain-specific considerations is crucial. Visualization tools like box
plots and scatter plots aid in outlier identification. Continuous monitoring is important for dynamic datasets.
Outliers may carry valuable insights and should be analyzed thoughtfully. Types of Outliers: Global Outliers:
Affect the entire dataset and are noticeable across the entire data range. Contextual Outliers: Depend on
specific subsets or contexts within the data, not necessarily apparent in the overall dataset. Spatial Outliers:
Relate to geographical data, where certain locations exhibit unusual characteristics compared to the rest.
Q. The Local Outlier Factor: The Local Outlier Factor (LOF) is an unsupervised anomaly detection algorithm used to identify local anomalies or outliers within a dataset. It measures the local density deviation of a data point with respect to its neighbors. LOF assigns an anomaly score to each data point, where higher scores indicate a higher likelihood of being an outlier. Key Concepts: Local Density: LOF calculates the local density of a data point based on the density of its neighbors. A point in a sparse region will have a lower local density compared to points in denser regions. Reachability Distance: The reachability distance of a point measures the distance at which a point can be reached from its neighbors while considering the local density. It is used to identify regions of varying density. Local Outlier Factor (LOF): LOF is computed by comparing the local density of a data point with the local densities of its neighbors. A point with a significantly lower density than its neighbors is likely to have a higher LOF and is considered an outlier.
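A brief LOF sketch with scikit-learn's LocalOutlierFactor on synthetic data containing two isolated points:

```python
# Local Outlier Factor sketch: points in sparse regions get flagged as outliers (-1).
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
inliers = rng.normal(0, 0.5, size=(100, 2))              # dense cluster
outliers = np.array([[4.0, 4.0], [-5.0, 3.5]])           # isolated points
X = np.vstack([inliers, outliers])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                              # 1 = inlier, -1 = outlier
print("Predicted outlier indices:", np.flatnonzero(labels == -1))
print("LOF scores of the last two points (more negative = more anomalous):",
      lof.negative_outlier_factor_[-2:])
```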
Q. Evaluation Metrics and Scores: 1. Homogeneity: Homogeneity, in the context of clustering evaluation
metrics, refers to the similarity of data points within the same cluster. It is a measure that assesses how well-
defined and internally consistent the clusters are. The homogeneity score ranges from 0 to 1, with 1 indicating
perfect homogeneity. Usage: Homogeneity is often used in conjunction with other clustering metrics like
completeness and V-measure to provide a comprehensive evaluation of clustering performance. 2. Completeness: Completeness, in the context of clustering evaluation metrics, assesses whether all data points
that belong to the same true class are assigned to the same cluster. It measures the extent to which all
members of a given class are grouped together in a single cluster. Like homogeneity, completeness is a
measure that ranges from 0 to 1, with 1 indicating perfect completeness. Usage : Completeness is often used
in conjunction with other clustering metrics like homogeneity and V-measure to provide a comprehensive
evaluation of clustering performance. 3. Adjusted Rand Index:The Adjusted Rand Index (ARI) is a clustering
evaluation metric that measures the similarity between the true class assignments and the predicted cluster
assignments, adjusted for chance. It considers all pairs of samples and counts the number of pairs that are
either in the same cluster and the same true class or in different clusters and different true classes. Usage:ARI
is commonly used when the ground truth is available and is particularly useful for evaluating clustering
algorithms that aim to reproduce the true class labels. 4. Silhouette: The Silhouette Score is a clustering
evaluation metric that measures how similar an object is to its own cluster (cohesion) compared to other
clusters. It quantifies the compactness and separation of clusters, providing an indication of how well-defined
and distinct the clusters are. The Silhouette Score ranges from -1 to 1, where a high value indicates well-
separated clusters, a score around zero indicates overlapping clusters, and negative values suggest that data
points might have been assigned to the wrong cluster.
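A small sketch computing homogeneity, completeness, ARI and silhouette for a K-Means clustering of labelled synthetic blobs, assuming scikit-learn:

```python
# Homogeneity, completeness, ARI and silhouette for a K-Means clustering of labelled blobs.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (homogeneity_score, completeness_score,
                             adjusted_rand_score, silhouette_score)

X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("Homogeneity :", homogeneity_score(y_true, labels))
print("Completeness:", completeness_score(y_true, labels))
print("Adjusted Rand Index:", adjusted_rand_score(y_true, labels))
print("Silhouette  :", silhouette_score(X, labels))      # uses only X and labels, no ground truth
```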
Q. Extrinsic and intrinsic method: Extrinsic and intrinsic evaluation methods are two broad categories used
to assess the performance of machine learning models or algorithms. 1. Extrinsic Evaluation : Definition:
Extrinsic evaluation assesses the performance of a model based on its contribution to solving a specific real-
world task or application. Examples: In a document classification task, extrinsic evaluation would involve
measuring the impact of the model on improving document organization or search. In a speech recognition
system, extrinsic evaluation would assess the effectiveness of the system in facilitating accurate transcription
for users. Advantages: Directly measures the model's utility in the intended application. Aligns with the end
goals and objectives of the task. Disadvantages: May be task-specific, limiting the generalizability of the
evaluation. Requires access to real-world scenarios for evaluation, which may not always be feasible.
2. Intrinsic Evaluation : Definition: Intrinsic evaluation focuses on assessing specific aspects or components
of a model in isolation, often without direct consideration of the model's impact on a larger task or application.
Examples: In language models, intrinsic evaluation might involve assessing the model's performance on tasks
like language modeling perplexity. In image processing, intrinsic evaluation could involve measuring the
accuracy of object detection or segmentation. Advantages: Provides insights into the strengths and
weaknesses of individual model components. Can be more easily controlled and conducted in a controlled
environment. Disadvantages: May not directly correlate with the model's performance in real-world
applications. Might not capture the holistic effectiveness of the model.
Choosing between extrinsic & intrinsic evaluation methods depends on the specific goals of the evaluation: If the
primary concern is real-world impact and utility: Prioritize extrinsic evaluation to measure the model's
performance in the context of the intended application. In many cases, a combination of both extrinsic &
intrinsic evaluation methods is employed to provide a comprehensive understanding of a model's capabilities
& limitations. Q. Elbow Method: The Elbow Method is a technique used to determine the
optimal number of clusters (K) in a dataset for clustering algorithms, such as K-Means. It involves plotting the
explained variation as a function of the number of clusters and looking for an "elbow" point where the rate of
improvement sharply changes. The idea is to find a balance between increasing the number of clusters and
avoiding overfitting. Here are the steps involved in the Elbow Method: Run K-Means for Different Values of K:
Apply the K-Means algorithm to the dataset for a range of values of K. For each K, compute the sum of squared
distances (inertia) of samples to their assigned centers. Plot the Elbow Curve: Create a line plot or a scree plot
with the number of clusters (K) on the x-axis and the corresponding inertia values on the y-axis. The elbow
point is considered the optimal number of clusters. Select the Optimal K: Choose the value of K at the elbow
point as the optimal number of clusters for your dataset.
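A minimal Elbow Method sketch, assuming scikit-learn and matplotlib are available; the data are synthetic blobs with four true clusters:

```python
# Elbow Method sketch: plot inertia against K and look for the "elbow".
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=4, random_state=0)

ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("Inertia (within-cluster sum of squared distances)")
plt.title("Elbow Method")
plt.show()                    # the bend (elbow) suggests the optimal K (here around 4)
```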
UNIT : 6
Q. Perceptron Model: Pattern classification into two or more categories can be done using perceptron networks. The perceptron learning rule is used to train the perceptron. Before moving on to broader multiclass classification, we first consider classification into two categories. All that is required for classification into just two categories is a single output neuron. Here, bipolar neurons will be used. The simplest architecture that could do the task is one input layer, one output layer with neurons, and no hidden layers. Algorithm of Perceptron: Step 1: Initialize the weights and bias. Set the weights and bias to 0 for simplicity, and set the learning rate to a value between 0 and 1. Step 2: Perform steps 3-6 while the stopping condition is false. Step 3: Set the activation of each input unit to xi = ai. Step 4: Calculate the net input to the output unit, net = Σ xi·wi + b (equivalently Σ ai·wi − θ when a threshold θ is used instead of a bias). Step 5: Determine the output unit's response y by applying the activation function to the net input. Step 6: Update the weights and bias if an error was made for this pattern (i.e., if y is not equal to the target t).
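A minimal NumPy sketch of the perceptron training loop above, using bipolar inputs/targets and the AND function as a toy task; the update rule w ← w + α·t·x applied only on mistakes is the standard perceptron learning rule assumed here:

```python
# Minimal bipolar perceptron training loop (weights and bias start at 0, targets are -1/+1).
import numpy as np

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])   # bipolar inputs
t = np.array([1, -1, -1, -1])                        # bipolar AND targets

w = np.zeros(2)        # Step 1: weights = 0
b = 0.0                # bias = 0
alpha = 1.0            # learning rate

for epoch in range(10):                  # Step 2: repeat until no weight changes
    changed = False
    for x_i, t_i in zip(X, t):           # Steps 3-5: activations, net input, output
        net = np.dot(w, x_i) + b
        y = 1 if net >= 0 else -1        # bipolar step activation
        if y != t_i:                     # Step 6: update only when a mistake is made
            w = w + alpha * t_i * x_i
            b = b + alpha * t_i
            changed = True
    if not changed:
        break

print("Learned weights:", w, "bias:", b)
print("Predictions:", [1 if np.dot(w, x) + b >= 0 else -1 for x in X])
```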
Q. Single Layer Neural Network / single layer perceptron : 1. A Single Layer Neural Network, also known as
a Single-Layer Perceptron, consists of input nodes directly connected to an output layer without any hidden
layers. 2. It's the simplest form of a neural network architecture, with inputs passing directly to the output
layer through weighted connections. 3. Each input node represents a feature, and these nodes are fully
connected to the output layer. 4. The network computes a weighted sum of the inputs (using weights and
biases) and passes it through an activation function. 5. An activation function (e.g., step function, sigmoid, or
ReLU) introduces nonlinearity, allowing the network to learn and model complex relationships.
Example: Imagine a Single Layer Neural Network used to classify emails as spam or not spam based on two
features: word count and sender's domain. The network takes these features as inputs, assigns weights to
them, computes a weighted sum, and passes it through an activation function (e.g., sigmoid). If the output is
above a certain threshold, the email is classified as spam; otherwise, it's categorized as not spam. However,
Single Layer Neural Networks may struggle with more complex email classification tasks involving nuanced
patterns beyond linear separation.
Q. Multilayer Neural Network / Multilayer Perceptron: Input Layer: Neurons in the input layer represent the features of the input data. Each neuron corresponds to a feature, and the input layer's size is determined by the number of features. Hidden Layers: MLPs have one or more hidden layers situated between the input and output layers. Neurons in each hidden layer apply weighted sums of inputs and pass the result through an activation function. The number of hidden layers and the number of neurons in each layer are configurable parameters. Weights and Biases: Connections between neurons in different layers have associated weights. Each neuron has a bias term that contributes to the overall input before applying the activation function. During training, weights and biases are adjusted to minimize the difference between the predicted output and the true target values. Algorithm Steps: Initialization: Initialize the weights and biases for each connection in the network. Random small values are commonly used. Define Parameters: Set the learning rate (α), a small positive constant that determines the step size during weight updates. Choose an activation function for the hidden layers. Select the activation function in the output layer based on the task (e.g., sigmoid for binary classification, softmax for multiclass classification, linear for regression). Forward Propagation: For each training example, do the following: Input the features into the input layer neurons. Compute the weighted sum of inputs for each neuron in the hidden layers and apply the chosen activation function. Propagate the outputs through the network to compute the final predictions in the output layer.
Q. Activation Functions: 1. Activation functions are mathematical functions applied to the output of neurons
in artificial neural networks. 2. They introduce nonlinearity to the network, allowing it to learn and model
complex patterns in data. 3. Common activation functions include: - Sigmoid: S-shaped curve squashing input
values between 0 and 1. - ReLU (Rectified Linear Unit): Returns 0 for negative inputs and the input value for
positive inputs. - Tanh (Hyperbolic Tangent): S-shaped curve squashing input values between -1 and 1, similar
to sigmoid but centered at 0. - Softmax: Used in the output layer for multiclass classification, converting raw
scores into probabilities. - Leaky ReLU: Similar to ReLU but allows a small gradient for negative inputs to
mitigate dead neurons. 4. The choice of activation function impacts the network's learning speed,
convergence, and ability to capture complex relationships. 5. Sigmoid and Tanh functions, while effective,
may suffer from the vanishing gradient problem, slowing down learning in deep networks.
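A small NumPy sketch of the activation functions listed above:

```python
# Common activation functions written out with NumPy (vectorized).
import numpy as np

def sigmoid(x):                    # squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                       # squashes values into (-1, 1), centred at 0
    return np.tanh(x)

def relu(x):                       # 0 for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):     # small gradient for negative inputs
    return np.where(x > 0, x, slope * x)

def softmax(x):                    # converts raw scores into probabilities that sum to 1
    e = np.exp(x - np.max(x))      # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print("sigmoid  :", sigmoid(z).round(3))
print("tanh     :", tanh(z).round(3))
print("ReLU     :", relu(z))
print("LeakyReLU:", leaky_relu(z))
print("softmax  :", softmax(z).round(3), "sum =", softmax(z).sum())
```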
Q. Backpropagation is a supervised learning algorithm widely employed to train artificial neural networks,
including Multilayer Perceptrons (MLPs). It involves an iterative process where a set of training data is input
into the network, and the predicted output is compared with the actual target values using a chosen loss
function. The algorithm then computes the gradient of the loss with respect to the weights and biases through
the chain rule of calculus. The calculated gradients are propagated backward through the network, allowing
for the systematic adjustment of weights and biases to minimize the overall error. This iterative forward and
backward propagation process is repeated for multiple epochs or until the network converges to an optimal
state. Backpropagation plays a pivotal role in training deep networks, enabling them to learn complex
representations and patterns from data. Various enhancements, such as stochastic gradient descent and
adaptive learning rate methods, contribute to the efficiency of the backpropagation algorithm in optimizing
neural network parameters.
Backpropagation Algorithm: The backpropagation algorithm is a supervised learning technique used to train
artificial neural networks, particularly Multilayer Perceptrons (MLPs). It involves an iterative process to
minimize the difference between the predicted and actual outputs by adjusting the network's weights and
biases. The algorithm starts with the forward propagation of input data through the network, calculating the
predicted output and measuring the error using a specified loss function. The error is then propagated
backward through the network, and the gradients of the loss with respect to each weight and bias are
computed using the chain rule of calculus. The weights and biases are updated in the opposite direction of
the gradient, facilitating the reduction of the error during subsequent iterations. This iterative forward and
backward process is repeated for multiple epochs until the network learns to make accurate predictions.
Variants of gradient descent, such as stochastic gradient descent, are often employed to optimize the
convergence and efficiency of the backpropagation algorithm.
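Illustrative code sketch: backpropagation on a one-hidden-layer network, trained with plain gradient descent on the XOR problem. The dataset, sigmoid activations, mean-squared-error loss, layer sizes, learning rate, and epoch count are all assumptions for demonstration, not values from the text.
```python
import numpy as np

# Tiny XOR dataset (illustrative choice).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)   # hidden -> output
lr = 0.5                                          # learning rate (alpha)

for epoch in range(5000):
    # Forward propagation
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    loss = np.mean((y_hat - y) ** 2)              # mean squared error

    # Backward propagation (chain rule)
    d_yhat = 2 * (y_hat - y) / len(X)
    d_z2 = d_yhat * y_hat * (1 - y_hat)           # sigmoid derivative at the output
    d_W2, d_b2 = h.T @ d_z2, d_z2.sum(axis=0)
    d_h = d_z2 @ W2.T
    d_z1 = d_h * h * (1 - h)                      # sigmoid derivative at the hidden layer
    d_W1, d_b1 = X.T @ d_z1, d_z1.sum(axis=0)

    # Gradient descent update: move opposite to the gradient
    W1 -= lr * d_W1; b1 -= lr * d_b1
    W2 -= lr * d_W2; b2 -= lr * d_b2

    if epoch % 1000 == 0:
        print(epoch, round(float(loss), 4))       # error should shrink over epochs

print(np.round(y_hat, 2))                         # should approach [[0], [1], [1], [0]]
```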
Q. Radial Basis Function Network (RBFN): 1. Radial Basis Function Network (RBFN) is a type of artificial
neural network primarily used for function approximation and pattern recognition. 2. RBFN consists of three
layers: an input layer, a hidden layer with radial basis function neurons, and an output layer. 3. The network
uses radial basis functions, typically Gaussian, as activation functions in its hidden layer. 4. Each neuron in the
hidden layer computes its output based on the similarity (measured by distance) between the input data and
a center point associated with the neuron. 5. Neurons closer to the input pattern have higher activation, while
those farther away have lower activation due to the Gaussian activation function. Example: Imagine using a
Radial Basis Function Network for stock price prediction. RBFN could learn patterns from historical stock data
by modeling non-linear relationships between various factors affecting stock prices. The network employs
radial basis functions to transform input features such as previous prices, trading volume, and market
sentiment into a higher-dimensional space, enabling it to make predictions based on learned patterns in the
data. RBFN's ability to capture local variations and non-linearities in the data contributes to its usefulness in
such prediction tasks.
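Illustrative code sketch: an RBFN with Gaussian basis functions, using fixed, evenly spaced centers and output weights fitted by least squares on a toy 1-D regression task (not the stock-prediction example above; the center count and width are assumed values).
```python
import numpy as np

# Toy 1-D regression task (illustrative).
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = np.sin(x).ravel()

centers = np.linspace(-3, 3, 10).reshape(-1, 1)   # fixed RBF centers (assumed)
width = 0.8                                        # Gaussian width (assumed)

def rbf_features(x):
    # Each column is one hidden neuron: activation decays with distance to its center.
    d2 = (x - centers.T) ** 2
    return np.exp(-d2 / (2 * width ** 2))

Phi = rbf_features(x)                              # hidden-layer activations
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)        # linear output-layer weights

y_pred = rbf_features(x) @ w
print("max abs error:", np.abs(y_pred - y).max())
```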
Q. Functional Link Artificial Neural Network (FLANN): 1. Functional Link Artificial Neural Network (FLANN)
is an extension of the traditional Artificial Neural Network (ANN) architecture. 2. FLANN incorporates
additional nonlinear transformations of input data using functional expansion. 3. Instead of solely relying on
the direct inputs, FLANN employs nonlinear functions (like polynomial or trigonometric functions) applied to
the input data. 4. These nonlinear transformations generate new features or inputs, which are then used to
train the network. 5. FLANN extends the input space, allowing it to learn more complex patterns and
relationships present in the data. Example: Consider a problem of predicting the energy output of a solar
panel based on weather conditions. A Functional Link Artificial Neural Network (FLANN) could be used to
model the relationship between various weather parameters (such as temperature, humidity, sunlight
intensity) and the energy output. FLANN, by applying nonlinear transformations to the input weather data,
can capture complex interactions among these parameters, allowing for more accurate predictions compared
to traditional linear models or simple ANN architectures.
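Illustrative code sketch of the FLANN idea: a trigonometric functional expansion of a single input followed by one linear layer fitted by least squares. The toy data is invented for demonstration, not real solar-panel measurements, and the choice of sine/cosine terms is one common expansion, not the only option.
```python
import numpy as np

# Toy 1-D target with a non-linear relationship (illustrative).
x = np.linspace(0, 1, 40)
y = 0.5 * x + np.sin(2 * np.pi * x)

def functional_expansion(x):
    # Trigonometric expansion: each input is mapped to several non-linear features.
    return np.column_stack([
        x,
        np.sin(np.pi * x), np.cos(np.pi * x),
        np.sin(2 * np.pi * x), np.cos(2 * np.pi * x),
        np.ones_like(x),                          # bias term
    ])

Phi = functional_expansion(x)                     # expanded input space
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)       # single linear layer on expanded features

print("max abs error:", np.abs(Phi @ w - y).max())
```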
Q. Neural Networks: A neural network is a computational model inspired by the way biological neural
networks in the human brain work. It is used in machine learning and artificial intelligence to perform tasks
such as pattern recognition, classification, regression, and more. Neural networks consist of interconnected
nodes, also known as neurons or units, organized into layers. Input Layer: This layer receives input data and passes it to the next layer. Each node in the input layer represents a feature or attribute of the input data. Hidden Layers:
These are intermediate layers between the input and output layers. Each node in a hidden layer applies a
weighted sum of its inputs and passes the result through an activation function. The number of hidden layers
and the number of nodes in each layer are parameters that can be adjusted based on the complexity of the
problem.
Q. Artificial Neural Networks (ANN): 1. Artificial Neural Networks (ANN) are a class of machine learning models inspired by the structure and functioning of the human brain. 2. ANNs consist of interconnected nodes
organized in layers: an input layer, one or more hidden layers, and an output layer. 3. Each node (neuron) in
a layer receives input, processes it using weights and biases, and generates an output that becomes the input
for the next layer. 4. The connections between neurons have associated weights that are adjusted during the
learning process to improve the network's performance. 5. Activation functions (like sigmoid, ReLU, tanh)
introduce non-linearities in neuron outputs, allowing the network to learn complex patterns. Example:
Consider an ANN designed for image classification. The network receives pixel values of an image as input
and learns to classify the image into various classes (e.g., cat, dog, car) based on features it extracts during
training. Through iterations, the network adjusts its weights to improve accuracy, ultimately becoming
proficient in recognizing and classifying unseen images. Working Of ANN: An Artificial Neural Network (ANN)
consists of interconnected layers of nodes, with an input layer receiving raw data and passing it through
hidden layers, each applying weighted connections and activation functions to transform the input. The
network's architecture, including the number of layers and neurons, is a key design aspect. The output layer
produces final predictions or values. During training, the model adjusts weights and biases to minimize a loss
function, measuring the difference between predictions and true targets. This iterative learning process
employs optimization algorithms like gradient descent. The choice of activation functions introduces non-
linearity, enabling the network to learn complex patterns. Applications of ANN: Image and Pattern
Recognition: ANNs are extensively used in image classification, object detection, and facial recognition.
Convolutional Neural Networks (CNNs), a specialized type of ANN, excel in these tasks. Natural Language
Processing (NLP):ANNs are employed in tasks such as sentiment analysis, language translation, and chatbot
development. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks are
commonly used for sequential data processing. Speech Recognition: ANNs play a crucial role in converting
spoken language into text, powering applications like voice assistants, transcription services, and voice-
controlled devices.
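Illustrative code sketch: a fully connected ANN classifier along the lines of the image example above, written with tf.keras as one possible framework. The 32x32x3 input shape, layer widths, and three output classes (e.g., cat/dog/car) are assumptions for demonstration.
```python
import tensorflow as tf   # assumes TensorFlow/Keras is available

# Small fully connected classifier for 32x32 RGB images and 3 classes (illustrative sizes).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),                        # pixel grid -> input vector
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer 2
    tf.keras.layers.Dense(3, activation="softmax"),   # class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer class labels
              metrics=["accuracy"])
model.summary()
# Training would then look like: model.fit(x_train, y_train, epochs=10, validation_split=0.1)
```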
Q. Convolutional Neural Networks (CNNs): 1. Convolutional Neural Networks (CNNs) are specialized deep
learning models designed for processing and analyzing visual data like images and videos. 2. CNNs are
composed of three main layers: convolutional layers, pooling layers, and fully connected layers. 3.
Convolutional Layers: - These layers perform feature extraction by applying convolution operations to input
images using learnable filters or kernels. - Filters slide across the input, extracting local features like edges,
textures, and patterns. - Each filter learns to detect specific features, and multiple filters generate feature
maps capturing different aspects of the input. Working Of CNN: Convolutional Neural Networks (CNNs)
operate by systematically applying convolutional layers to input images, using filters to detect local features
like edges and textures. Activation functions, typically ReLU, introduce non-linearity, and pooling layers
downsample the feature maps, capturing salient information. Flattening converts the feature maps into a
one-dimensional vector, which is then processed by fully connected layers to learn global patterns. The final
output layer, often using softmax activation for classification, produces predictions. During training, CNNs
iteratively adjust their parameters through backpropagation to minimize the difference between predicted
and actual outputs. Renowned for their success in computer vision tasks, CNNs can leverage pre-trained
models via transfer learning to enhance performance on specific tasks with smaller datasets.
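Illustrative code sketch: the CNN pipeline described above (convolution, ReLU, pooling, flatten, fully connected, softmax) expressed in tf.keras. The 28x28x1 input shape, filter counts, and ten output classes are assumed values, chosen to resemble a digit-recognition setup.
```python
import tensorflow as tf   # assumes TensorFlow/Keras is available

# Illustrative CNN for 28x28 grayscale images and 10 classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),  # learnable filters extract local features
    tf.keras.layers.MaxPooling2D(pool_size=2),                     # downsample feature maps
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),                                     # feature maps -> one-dimensional vector
    tf.keras.layers.Dense(64, activation="relu"),                  # fully connected layer learns global patterns
    tf.keras.layers.Dense(10, activation="softmax"),               # class probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```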
Q. Recurrent Neural Networks (RNNs): 1. Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to process sequential data by maintaining a memory of previous inputs. 2. Unlike feedforward neural networks, RNNs have
connections that form directed cycles, allowing them to exhibit temporal dynamic behavior. 3. RNNs process
input sequences of variable lengths, making them suitable for tasks involving sequences like time series, text,
speech, and video data. 4. Each neuron in an RNN receives an input and a hidden state from the previous
time step, allowing the network to retain information over time. 5. The hidden state serves as a memory that
encodes information about previous inputs, influencing the network's predictions at the current step. 6. RNNs
use the same set of weights across all time steps, enabling them to share learned features across different
parts of the sequence. Example: Consider a language model using an RNN that predicts the next word in a
sentence based on the preceding words. Given a sequence "The cat sat on the," the RNN processes this input
sequentially, leveraging its memory to predict the next word (e.g., "mat"). The RNN's ability to retain
information about preceding words allows it to generate contextually relevant predictions, making it useful
for tasks like language generation, text completion, or speech generation.
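Illustrative code sketch of the vanilla RNN recurrence: one shared set of weights updates a hidden state at every time step, so the final state summarizes the whole sequence. The vector sizes and random "word" vectors are assumptions; no real language model is trained here.
```python
import numpy as np

# One step of a vanilla RNN: the hidden state carries memory of earlier inputs.
# Illustrative sizes: 8-dimensional input vectors, 16-dimensional hidden state.
rng = np.random.default_rng(2)
W_xh = rng.normal(0, 0.1, (8, 16))    # input -> hidden weights (shared across time steps)
W_hh = rng.normal(0, 0.1, (16, 16))   # hidden -> hidden (recurrent) weights
b_h  = np.zeros(16)

def rnn_step(x_t, h_prev):
    # New hidden state combines the current input with the previous hidden state.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Process a toy sequence of 5 "word" vectors, e.g. standing in for "The cat sat on the".
sequence = rng.normal(size=(5, 8))
h = np.zeros(16)
for x_t in sequence:
    h = rnn_step(x_t, h)

print(h.shape)   # final hidden state summarizing the whole sequence
```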