Machine Learning: Units I to III

UNIT I

Supervised and Unsupervised Learning

8 Marks (Long Answer):


Compare and contrast Supervised and Unsupervised learning with suitable examples.
Explain their respective algorithms and use cases in real-world applications.

Answer:
Supervised and unsupervised learning are two primary types of machine learning
techniques. Both aim to find patterns in data but differ in the nature of the data they work
with and their goals.

• Supervised Learning:
o Definition: In supervised learning, the algorithm is trained using a labeled
dataset. The dataset contains input-output pairs, where the input is the
feature data and the output is the target variable (label).
o Goal: The aim is to learn a mapping function from inputs to outputs so that
the model can predict the output for unseen inputs.
o Examples of Algorithms:
1. Linear Regression: Predicts a continuous value.
2. Decision Trees: A tree-like model of decisions that can be
used for classification or regression.
3. Support Vector Machines (SVM): Separates classes by
finding the optimal hyperplane.
o Use Cases: Spam email detection (binary classification), predicting house
prices (regression), image recognition (multiclass classification).
o Process: The model is trained on a dataset where the correct answers are
provided. After training, the model can predict outcomes for new, unseen
data.
• Unsupervised Learning:
o Definition: In unsupervised learning, the dataset does not contain any
labeled outputs. The algorithm tries to learn the structure of the data by
identifying patterns, relationships, or clusters.
o Goal: The objective is to find hidden structures or groupings in the data.
o Examples of Algorithms:
1. K-Means Clustering: Divides data into k clusters based on
similarity.
2. Principal Component Analysis (PCA): Reduces the
dimensionality of the data while preserving most of the variance.
3. Hierarchical Clustering: Builds a hierarchy of clusters either
by merging smaller clusters or splitting a larger cluster.
o Use Cases: Market segmentation, anomaly detection, and gene sequence
analysis.
o Process: Since the data is unlabeled, the model tries to learn inherent
patterns in the data, such as clusters, without guidance.
• Comparison:
a. Nature of Data: Supervised learning works with labeled data, while
unsupervised learning works with unlabeled data.
b. Goal: The goal of supervised learning is prediction (classification or
regression), while unsupervised learning aims to find hidden structures
(clustering or association).
c. Performance Measurement: In supervised learning, performance is
measured using accuracy, precision, recall, etc., whereas in unsupervised
learning, metrics such as cluster purity or silhouette score are used.
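
The short sketch below (scikit-learn, with synthetic house-price and customer data invented only for illustration) contrasts the two settings: the supervised model is fitted on labeled pairs and then predicts, while the unsupervised model receives only features and discovers clusters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Supervised: labeled pairs (X, y) -> learn a mapping, then predict for new inputs.
X = rng.uniform(500, 3000, size=(100, 1))            # house size in square feet (synthetic)
y = 50 * X[:, 0] + rng.normal(0, 5000, size=100)     # price with noise
model = LinearRegression().fit(X, y)
print("Predicted price for 1200 sq. ft.:", model.predict([[1200]])[0])

# Unsupervised: features only -> discover structure (here, two customer groups).
group_a = rng.normal([20, 200], 10, size=(50, 2))    # [age, yearly spend]
group_b = rng.normal([60, 800], 10, size=(50, 2))
customers = np.vstack([group_a, group_b])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print("First ten cluster labels:", kmeans.labels_[:10])
```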

5 Marks (Short Answer):


What are the differences between Supervised and Unsupervised Learning? Explain
with examples.

Answer:

• Supervised Learning:
o In supervised learning, the model is trained on a dataset containing both
input data and corresponding labeled outputs.
o Example: Predicting house prices based on features like size, location, and
number of rooms (using linear regression).
o Goal: To predict a label (or value) for new, unseen data.
• Unsupervised Learning:
o In unsupervised learning, the model works with data that does not have
labeled outputs. The algorithm tries to find hidden patterns or clusters in the
data.
o Example: Grouping customers based on their purchasing behavior (using K-
Means clustering).
o Goal: To uncover hidden structures, such as groups or associations, in the
data.
• Key Differences:
a. Data: Supervised learning uses labeled data; unsupervised learning uses
unlabeled data.
b. Outcome: Supervised learning predicts an output (classification or
regression), while unsupervised learning finds patterns (clustering or
association).
c. Examples: Supervised learning examples include email spam detection,
while unsupervised learning examples include customer segmentation.

3 Marks (Very Short Answer):


Define supervised learning and give an example.

Answer:
Supervised learning is a type of machine learning where the model is trained on a labeled
dataset, meaning the inputs are paired with corresponding outputs. The goal is to predict
the output for new data based on this training.

Example: Predicting house prices based on features like the size of the house, the number
of rooms, and the location.

Getting and Cleaning Data

8 Marks (Long Answer):


Explain the process of getting and cleaning data from various sources like web APIs
and databases. Discuss the importance of data cleaning in machine learning.

Answer:
The success of a machine learning model heavily depends on the quality of data.
Therefore, the process of obtaining and cleaning data is crucial.

• Getting Data:
o Data can be obtained from several sources, including:
1. Web APIs: Application Programming Interfaces provide a way
to interact with web services to request and receive data. Examples
include the Twitter API, which allows fetching tweets, or the Google
Maps API for geolocation data.
2. Databases: Structured data is stored in relational databases
(e.g., MySQL, PostgreSQL). SQL queries are used to fetch data from
these databases.
3. Web Scraping: When data is not provided via an API, web
scraping tools like BeautifulSoup or Scrapy are used to extract data
from HTML pages.
4. Flat Files: Data might be available in formats like CSV, Excel,
or JSON, which can be imported using libraries like pandas in Python.
• Cleaning Data:
o Once the data is obtained, it often contains inconsistencies, missing values,
or errors. Data cleaning ensures that the data is reliable and suitable for
modeling.
o Steps in Data Cleaning:
1. Handling Missing Data: Data often has missing entries.
Techniques include:
• Deleting rows/columns with missing values.
• Imputing missing values with the mean, median, or mode.
2. Removing Duplicates: Duplicate entries can distort results,
so identifying and removing duplicates is essential.
3. Outlier Detection: Extreme values, or outliers, can skew the
analysis. Techniques such as Z-scores or IQR (Interquartile Range)
are used to detect outliers.
4. Data Type Conversion: Converting data types to ensure
consistency (e.g., converting text strings to numerical categories).
5. Normalization/Scaling: Ensuring that features are on a similar
scale is important for many algorithms (e.g., scaling features between
0 and 1).
• Importance of Data Cleaning:
o Improves Model Accuracy: Clean data allows for more accurate and
meaningful model training.
o Avoids Bias: Removing errors and inconsistencies prevents the model from
learning biased patterns.
o Ensures Consistency: Cleaning helps to standardize the data, ensuring
consistency in the inputs provided to the model.
Without cleaning, the model can produce inaccurate results, overfit noisy data, or
completely fail to train effectively.
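
As a rough illustration of the cleaning steps above, the sketch below uses pandas; the file name houses.csv and the column names are hypothetical, chosen only to show each operation.

```python
import pandas as pd

df = pd.read_csv("houses.csv")                              # flat-file source (assumed)

df = df.drop_duplicates()                                   # remove duplicate rows
df["price"] = df["price"].fillna(df["price"].median())      # impute missing values
df["rooms"] = pd.to_numeric(df["rooms"], errors="coerce")   # data type conversion

# Outlier detection with the IQR rule
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[(df["price"] >= q1 - 1.5 * iqr) & (df["price"] <= q3 + 1.5 * iqr)]

# Min-max scaling to the [0, 1] range
df["size_scaled"] = (df["size"] - df["size"].min()) / (df["size"].max() - df["size"].min())
```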

5 Marks (Short Answer):


What are the common techniques used for cleaning data? Explain the challenges
faced during the data cleaning process.

Answer:

• Techniques for Cleaning Data:


a. Handling Missing Data: Missing data can be filled using methods like mean,
median, mode, or more complex algorithms like k-nearest neighbors.
b. Removing Duplicates: Duplicate data entries can skew results, so they
must be identified and removed.
c. Outlier Detection: Outliers are detected and either removed or capped to
avoid distortion in analysis.
d. Data Type Conversion: Data must be converted to appropriate types (e.g.,
categorical to numerical, string to date format).
e. Normalization: Ensuring that features are on the same scale, especially
important for algorithms like SVM or K-means.
• Challenges:
a. Handling a Large Volume of Data: In big data scenarios, processing speed
and memory limitations can be a challenge.
b. Dealing with Inconsistent Formats: Data collected from multiple sources
may have inconsistencies in structure, format, or type.
c. Handling Missing Data: Determining the appropriate technique for handling
missing data is often difficult. Simple deletion can lead to loss of important
information, while imputation can introduce bias.
d. Identifying and Removing Outliers: Outliers can be difficult to identify, and
removing them may lead to loss of significant data.

3 Marks (Very Short Answer):


What is data cleaning? Why is it necessary?

Answer:
Data cleaning is the process of correcting or removing inaccurate, incomplete, or
irrelevant data from a dataset. It ensures that the data is accurate, consistent, and suitable
for analysis.

It is necessary because raw data often contains errors, missing values, and
inconsistencies, which can negatively impact the performance of machine learning
models.

Data Preprocessing

8 Marks (Long Answer)


Discuss the major steps involved in data preprocessing, including descriptive
summarization, data reduction, and data discretization. Provide examples for each
step.

Answer:
Data preprocessing is a crucial step in the machine learning pipeline, transforming raw
data into a suitable format before feeding it to algorithms. The process includes several
steps, such as descriptive summarization, data reduction, data discretization, and more.

1. Descriptive Summarization:
o Definition: Descriptive summarization involves generating summary
statistics about the data, such as mean, median, mode, variance, and
standard deviation, which help understand the distribution and spread of the
data.
o Example: For a dataset of house prices, descriptive statistics might show
the average price, the most common price range, and the variation in prices.
o Use: These summaries give a quick overview of data properties and help
identify any obvious issues such as skewness, outliers, or missing values.
2. Data Reduction:
o Definition: Data reduction techniques reduce the size of the dataset without
losing important information. This is essential for improving computational
efficiency and reducing overfitting in models.
o Methods:
▪ Feature Selection: Selecting a subset of relevant features from the
original dataset (e.g., using correlation analysis, or forward selection).
▪ Principal Component Analysis (PCA): A dimensionality reduction
technique that transforms the original features into a smaller set of
uncorrelated variables, called principal components.
o Example: Reducing a dataset from 100 features to 20 important features
using PCA.
o Use: Data reduction helps to make the model simpler, faster to train, and
often more accurate by focusing on the most relevant information.
3. Data Discretization:
o Definition: Data discretization is the process of converting continuous data
into discrete buckets or intervals. This is often used to simplify models or to
prepare continuous variables for certain algorithms that require categorical
inputs.
o Methods:
▪ Binning: Grouping continuous values into bins (e.g., age ranges like 0-
10, 11-20, etc.).
▪ Clustering: Grouping values based on similarity, often using
algorithms like K-means.
o Example: Converting a continuous variable like income (which can take any
value) into categories such as "Low Income," "Medium Income," and "High
Income."
o Use: Discretization helps in reducing the complexity of the model and is
particularly useful for decision trees and rule-based models.

Together, these preprocessing steps ensure the data is clean, consistent, and in the right
format, improving the efficiency and accuracy of machine learning models.
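
The sketch below illustrates the three steps on a small, made-up DataFrame using pandas and scikit-learn; the column names and values are invented for illustration.

```python
import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame({"price": [120, 250, 180, 90, 400],
                   "size":  [60, 120, 95, 45, 200],
                   "age":   [30, 5, 12, 40, 2]})

# 1. Descriptive summarization: mean, std, quartiles, min/max per column.
print(df.describe())

# 2. Data reduction: project the three features onto two principal components.
pca = PCA(n_components=2)
reduced = pca.fit_transform(df[["price", "size", "age"]])
print("Explained variance ratio:", pca.explained_variance_ratio_)

# 3. Data discretization: bin a continuous variable into labeled intervals.
df["price_band"] = pd.cut(df["price"], bins=[0, 150, 300, float("inf")],
                          labels=["Low", "Medium", "High"])
print(df[["price", "price_band"]])
```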

5 Marks (Short Answer)


What is data discretization and why is it important in machine learning? Provide an
example.

Answer:
Data Discretization is the process of converting continuous data into discrete categories
or intervals. This technique is important because certain machine learning algorithms,
such as decision trees or Naive Bayes, perform better with categorical data.

Importance:

• It helps reduce the complexity of the model by transforming continuous variables
into a more interpretable form.
• It is often used when we want to simplify the model and make it easier to
understand.
• Discretization can improve the performance of machine learning algorithms that
work better with categorical inputs.

Example:
Suppose we have a continuous variable like "age" in a dataset. Instead of keeping "age" as
a continuous value, we can create bins, such as:

• 0-18: Child
• 19-35: Young Adult
• 36-60: Adult
• 60+: Senior
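
A quick sketch of this binning with pandas.cut; the sample ages are made up for illustration.

```python
import pandas as pd

ages = pd.Series([4, 22, 37, 70, 15, 58])
groups = pd.cut(ages, bins=[0, 18, 35, 60, 120],
                labels=["Child", "Young Adult", "Adult", "Senior"])
print(groups.tolist())   # ['Child', 'Young Adult', 'Adult', 'Senior', 'Child', 'Adult']
```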

3 Marks (Very Short Answer)


What is data preprocessing? Name two techniques used for data preprocessing.

Answer:
Data preprocessing is the process of transforming raw data into a format that is clean,
structured, and ready for machine learning algorithms. It involves handling missing data,
encoding categorical variables, scaling, and more.

Two Techniques:

1. Normalization: Scaling features to ensure they are on a similar scale.


2. Data Cleaning: Removing or imputing missing data.

UNIT II

Classification

8 Marks (Long Answer)


Explain the process of classification in machine learning. Discuss decision tree
induction and Bayesian classification, along with their advantages and disadvantages.

Answer:
Classification is a supervised learning task where the goal is to predict the category or
class label of a new observation based on training data.
1. Process of Classification:
o A dataset with labeled examples is used to train the model.
o Each instance in the dataset is represented by features and a class label.
o The algorithm learns a mapping between the features and the class label.
o Once trained, the model can predict the class label for new, unseen data
points.
2. Decision Tree Induction:
o Definition: A decision tree is a tree-like structure where each internal node
represents a decision based on a feature, and each leaf node represents a
class label.
o Process: The decision tree is built by recursively splitting the dataset based
on the most important feature, using measures like Gini impurity or
information gain.
o Advantages:
▪ Easy to interpret and visualize.
▪ Can handle both categorical and continuous data.
o Disadvantages:
▪ Prone to overfitting, especially with deep trees.
▪ Sensitive to noise in the data.
3. Bayesian Classification:
o Definition: Bayesian classification is based on Bayes’ Theorem, which
calculates the posterior probability of a class given some features.
o Process: The algorithm calculates the probability of each class given the
input features and assigns the class with the highest probability to the new
instance.
o Advantages:
▪ Works well with small datasets.
▪ Robust to noise and irrelevant features.
o Disadvantages:
▪ Assumes independence between features, which may not always
hold true in real-world datasets.

Classification is widely used in spam detection, medical diagnosis, and image recognition,
among many other applications.
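
A minimal sketch of both classifiers with scikit-learn, using the built-in Iris dataset purely as a convenient example; the split ratio and tree depth are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Decision tree induction: recursive splits chosen by Gini impurity.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X_train, y_train)
# Bayesian classification: class posteriors from Bayes' theorem (Gaussian likelihoods).
nb = GaussianNB().fit(X_train, y_train)

print("Decision tree accuracy:", accuracy_score(y_test, tree.predict(X_test)))
print("Naive Bayes accuracy:  ", accuracy_score(y_test, nb.predict(X_test)))
```
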
5 Marks (Short Answer)
What is rule-based classification? Explain with an example how rule-based systems
classify data.

Answer:
Rule-based classification is a machine learning technique that uses a set of if-then rules
for classifying data. Each rule consists of an antecedent (if-part) and a consequent (then-
part). The model classifies an instance by checking which rules apply to the input data.

• Process:
o The system learns a set of rules from the training data.
o When new data is presented, the rules are evaluated to determine which
rule(s) best classify the new instance.
o The instance is assigned the class label specified in the rule's consequent.
• Example:
A rule-based classifier for determining if an email is spam might have the following
rules:
o Rule 1: IF the subject contains the word "lottery" AND the sender is
unknown, THEN classify as "Spam."
o Rule 2: IF the sender is a known contact AND the subject contains no
suspicious keywords, THEN classify as "Not Spam."

The email is classified based on which rule is triggered by its content.
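
The two rules above can be written directly as if-then logic. The toy sketch below does this; the email fields, keyword list, and default label are assumptions made only for illustration.

```python
def classify_email(subject: str, sender_known: bool) -> str:
    suspicious = ["lottery", "winner", "free money"]   # assumed keyword list
    subject = subject.lower()
    # Rule 1: suspicious keyword AND unknown sender -> Spam
    if any(word in subject for word in suspicious) and not sender_known:
        return "Spam"
    # Rule 2: known sender AND no suspicious keywords -> Not Spam
    if sender_known and not any(word in subject for word in suspicious):
        return "Not Spam"
    return "Unclassified"   # default when no rule fires

print(classify_email("You won the lottery!", sender_known=False))  # Spam
print(classify_email("Meeting at 10am", sender_known=True))        # Not Spam
```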

3 Marks (Very Short Answer)


Define classification. Name any two classification algorithms.

Answer:
Classification is a supervised machine learning task where the goal is to predict the
categorical class label of a given data point based on its features.

Two Classification Algorithms:

1. Decision Trees.


2. Naive Bayes.

Linear Regression

8 Marks (Long Answer)


Explain linear regression in detail, including its assumptions, the process of fitting a
model, and how it can be used for prediction. Discuss the difference between simple
and multiple linear regression.

Answer:
Linear regression is a supervised learning algorithm used for predicting a continuous
output variable based on one or more input features. The goal is to model the relationship
between the independent variables (features) and the dependent variable (target) by fitting
a linear equation to the observed data.

1. Simple Linear Regression:


o Definition: Simple linear regression models the relationship between one
independent variable (feature) and one dependent variable (target).
o Equation:
y = β0 + β1x + ϵ
where y is the predicted value, β0 is the intercept, β1 is the slope of the line,
x is the independent variable, and ϵ is the error term.
o Process:
▪ Collect data with input-output pairs.
▪ Estimate the parameters β0 and β1 by minimizing the residual sum
of squares (RSS), typically using the least squares method.
▪ The model is then used to predict the output y for new input values
of x.
o Assumptions:
1. Linearity: The relationship between the independent variable
and the dependent variable is linear.
2. Independence: The residuals (errors) are independent of each
other.
3. Homoscedasticity: The residuals have constant variance.
4. Normality: The residuals are normally distributed.
2. Multiple Linear Regression:
o Definition: Multiple linear regression models the relationship between two
or more independent variables and a dependent variable.
o Equation:
y = β0 + β1x1 + β2x2 + ⋯ + βnxn + ϵ
where x1, x2, …, xn are the independent variables and ϵ is the error term.
o Process: The procedure is similar to simple linear regression but involves
more than one feature. The algorithm fits a plane (or hyperplane) in a higher-
dimensional space.
o Assumptions: Same as those for simple linear regression, but also requires
that the independent variables are not highly correlated (i.e., no
multicollinearity).
3. Use Cases:
o Simple Linear Regression: Predicting house prices based on square
footage.
o Multiple Linear Regression: Predicting house prices based on square
footage, location, number of rooms, and year built.
4. Differences:
o Simple Linear Regression deals with one independent variable, while
Multiple Linear Regression involves two or more independent variables.
o The complexity and dimensionality increase with multiple variables in
multiple linear regression.
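
A brief sketch of both cases with scikit-learn, using small synthetic feature values invented only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: price ~ square footage
sqft = np.array([[800], [1000], [1500], [2000], [2500]])
price = np.array([120, 150, 210, 265, 330])          # prices in thousands (synthetic)
simple = LinearRegression().fit(sqft, price)
print("Intercept β0:", simple.intercept_, "Slope β1:", simple.coef_[0])

# Multiple linear regression: price ~ square footage, rooms, age of house
X = np.array([[800, 2, 30], [1000, 3, 20], [1500, 3, 15],
              [2000, 4, 10], [2500, 4, 5]])
multi = LinearRegression().fit(X, price)
print("Predicted price for [1800 sqft, 3 rooms, 12 yrs]:",
      multi.predict([[1800, 3, 12]])[0])
```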

5 Marks (Short Answer)


What is gradient descent in linear regression, and how does it work?

Answer:
Gradient descent is an optimization algorithm used to minimize the cost function in linear
regression by iteratively adjusting the model’s parameters (weights).

• How it Works:
a. Initialization: Start with random values for the parameters (weights and
biases).
b. Compute the Cost Function: The cost function (mean squared error in
linear regression) is calculated to measure how far off the model’s
predictions are from the actual target values.
c. Update Parameters: The algorithm calculates the gradient (derivative) of the
cost function with respect to each parameter. It then updates the
parameters in the opposite direction of the gradient to reduce the cost
function.
d. Learning Rate: The size of the step taken in each iteration is determined by
the learning rate. A small learning rate leads to slow convergence, while a
large one may cause overshooting.

The process repeats until the cost function converges to a minimum, indicating that the
model’s parameters are optimal.
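
The from-scratch sketch below shows these steps for simple linear regression on a tiny synthetic dataset; the learning rate and iteration count are arbitrary choices for illustration.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])      # roughly y = 2x + 1 with noise

b0, b1 = 0.0, 0.0          # initialization: start with arbitrary parameters
lr = 0.02                  # learning rate
n = len(X)

for _ in range(5000):
    y_pred = b0 + b1 * X
    error = y_pred - y
    # Gradients of the mean squared error with respect to b0 and b1
    grad_b0 = (2 / n) * error.sum()
    grad_b1 = (2 / n) * (error * X).sum()
    # Update: step in the opposite direction of the gradient
    b0 -= lr * grad_b0
    b1 -= lr * grad_b1

print(f"Fitted line: y = {b0:.2f} + {b1:.2f}x")   # approximately y = 1 + 2x
```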

3 Marks (Very Short Answer)


Define simple linear regression and provide its equation.

Answer:
Simple linear regression is a statistical technique that models the relationship between a
dependent variable and one independent variable by fitting a straight line to the data.

Equation:
y = β0 + β1x + ϵ
where y is the predicted value, β0 is the intercept, β1 is the slope, x is the independent
variable, and ϵ is the error term.

Logistic Regression

8 Marks (Long Answer)


Explain logistic regression and its significance in classification tasks. Discuss how it
differs from linear regression and provide a real-world application.

Answer:
Logistic regression is a supervised learning algorithm used for binary classification tasks,
where the target variable is categorical and typically represents two classes (e.g., 0 and 1,
or True and False).

1. Logistic Function (Sigmoid):


o Logistic regression models the probability of a particular class by applying
the sigmoid function to a linear combination of the input features.
o Sigmoid Function:
P(y=1|x) = 1 / (1 + e^−(β0 + β1x1 + β2x2 + ⋯ + βnxn))
where P(y=1|x) is the probability of the positive class.
o The sigmoid function outputs a value between 0 and 1, which represents the
probability of the instance belonging to the positive class (class 1). Based on
this probability, a decision boundary (usually 0.5) is used to classify the
instance as either class 0 or class 1.
2. Differences from Linear Regression:
o Linear Regression is used for continuous output, whereas Logistic
Regression is used for categorical (binary) output.
o In linear regression, the output can take any real value, but in logistic
regression, the output is a probability between 0 and 1.
o Linear regression models the relationship using a straight line, while
logistic regression uses the sigmoid curve to model probabilities.
3. Significance in Classification:
o Logistic regression is widely used in binary classification problems where we
need to predict one of two possible outcomes (e.g., spam vs. non-spam
emails, disease vs. no disease).
o It is interpretable, as the coefficients represent the weight of each feature’s
contribution to the decision.
o Logistic regression assumes a linear relationship between the features and
the log-odds of the target, making it suitable for linearly separable data.
4. Real-World Application:
o Medical Diagnosis: Logistic regression is commonly used in the medical
field to predict the probability of a patient having a disease based on factors
like age, blood pressure, and cholesterol levels.
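
A compact sketch of logistic regression on synthetic, medical-style data; the feature values and labels are invented, not real patient records.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features: [age, cholesterol]; label: 1 = disease, 0 = no disease (synthetic)
X = np.array([[35, 180], [42, 210], [55, 260], [61, 280],
              [28, 170], [67, 300], [45, 220], [50, 240]])
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])

clf = LogisticRegression(max_iter=1000).fit(X, y)

new_patient = [[58, 270]]
prob = clf.predict_proba(new_patient)[0, 1]      # sigmoid output P(y=1 | x)
label = clf.predict(new_patient)[0]              # 0.5 decision boundary by default
print(f"P(disease) = {prob:.2f}, predicted class = {label}")
```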

5 Marks (Short Answer)


How is logistic regression used for classification, and what is the decision boundary?

Answer:
Logistic regression is used to predict the probability that a given input belongs to a
particular class. It applies the sigmoid function to map the output of a linear equation to a
value between 0 and 1, which represents the probability of the input being in the positive
class (class 1).

• Decision Boundary:
The decision boundary is a threshold used to classify the predicted probability. If
the probability is above the threshold (typically 0.5), the instance is classified as
class 1 (positive); otherwise, it is classified as class 0 (negative).
• Example:
If the probability of a patient having a disease (class 1) is 0.7, and the decision
boundary is 0.5, the logistic regression model would classify the patient as
"diseased."

3 Marks (Very Short Answer)


What is the sigmoid function in logistic regression? Write its equation.

Answer:
The sigmoid function in logistic regression is used to map a real-valued number to a
probability between 0 and 1, representing the likelihood of the positive class.

Equation:
P(y=1|x) = 1 / (1 + e^−(β0 + β1x1 + ⋯ + βnxn))

UNIT III

Clustering

8 Marks (Long Answer)


Explain clustering in detail. Discuss the difference between agglomerative and
divisive hierarchical clustering, with examples.

Answer:
Clustering is an unsupervised learning technique that groups data points into clusters
based on their similarity. It is often used for tasks such as market segmentation, image
compression, and anomaly detection.

1. Clustering:
o The goal is to divide a dataset into distinct clusters such that data points
within the same cluster are more similar to each other than to those in other
clusters.
o Common clustering algorithms include K-Means, DBSCAN, and hierarchical
clustering.
2. Hierarchical Clustering:
o Hierarchical clustering is a clustering method that builds a hierarchy of
clusters, represented as a tree called a dendrogram. It has two main types:
Agglomerative (bottom-up) and Divisive (top-down) clustering.
3. Agglomerative Clustering:
o Process: This is a bottom-up approach where each data point starts as its
own cluster. At each step, the two closest clusters are merged based on a
distance metric (e.g., Euclidean distance). The process continues until all
points are merged into a single cluster or a stopping criterion (such as the
desired number of clusters) is reached.
o Example: If we have five data points (A, B, C, D, E), agglomerative clustering
might first merge the two closest points, A and B, into one cluster. Then it
might merge C and D, followed by merging E into one of these clusters,
continuing until all points are in a single cluster.
o Advantages:
▪ Simple and easy to implement.
▪ Does not require the number of clusters to be specified in advance.
o Disadvantages:
▪ Computationally expensive for large datasets.
▪ Sensitive to noise and outliers.
4. Divisive Clustering:
o Process: This is a top-down approach where all data points start in one large
cluster. At each step, the cluster is split into two based on dissimilarity. The
process continues until all data points are isolated into individual clusters or
a desired number of clusters is achieved.
o Example: Starting with the same five data points (A, B, C, D, E), divisive
clustering first considers all points as one cluster. Then it divides them into
two clusters based on the largest dissimilarities, continuing the process until
the desired clusters are achieved.
o Advantages:
▪ Captures global structure better than agglomerative clustering.
o Disadvantages:
▪ More computationally expensive than agglomerative clustering.
5. Difference Between Agglomerative and Divisive Clustering:
o Agglomerative: Starts with individual points and merges them into larger
clusters.
o Divisive: Starts with one large cluster and splits it into smaller clusters.
o Complexity: Agglomerative is more commonly used because it is
computationally simpler.
6. Use Cases:
o Agglomerative Clustering: Often used for gene expression analysis, where
individual genes are grouped based on their expression patterns.
o Divisive Clustering: Useful in document clustering, where documents are
divided into different topics.
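
A small sketch of agglomerative clustering on five illustrative points labeled A to E (coordinates invented); scikit-learn does not ship a divisive variant, so only the bottom-up case is shown, with SciPy used to expose the merge hierarchy behind the dendrogram.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import linkage

points = np.array([[1.0, 1.0],   # A
                   [1.2, 1.1],   # B
                   [5.0, 5.0],   # C
                   [5.1, 4.8],   # D
                   [9.0, 1.0]])  # E

# Bottom-up (agglomerative): repeatedly merge the closest clusters until 2 remain.
agg = AgglomerativeClustering(n_clusters=2, linkage="average").fit(points)
print("Cluster labels for A-E:", agg.labels_)

# Full merge hierarchy (the data behind a dendrogram):
Z = linkage(points, method="average")
print(Z)   # each row records the two clusters merged and their distance
```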

5 Marks (Short Answer)


What is the difference between agglomerative and divisive clustering in hierarchical
clustering?

Answer:
Agglomerative clustering is a bottom-up approach where each data point starts as its
own cluster, and at each step, the closest clusters are merged until a single cluster is
formed or a stopping criterion is reached.

Divisive clustering is a top-down approach, where all data points start in one large
cluster, and at each step, the cluster is split into smaller clusters based on dissimilarity
until each data point forms its own cluster or a stopping criterion is met.

• Key Difference: Agglomerative clustering merges small clusters into larger ones,
while divisive clustering splits large clusters into smaller ones.

3 Marks (Very Short Answer)


Define clustering and name two clustering algorithms.

Answer:
Clustering is an unsupervised learning technique that groups data points into clusters
based on their similarity. The goal is to ensure that points within the same cluster are more
similar to each other than to those in other clusters.

Two Clustering Algorithms:

1. K-Means Clustering.


2. Agglomerative Hierarchical Clustering.

Regularization

8 Marks (Long Answer)


Explain regularization in machine learning. Discuss its role in preventing overfitting,
and explain the difference between L1 (Lasso) and L2 (Ridge) regularization.

Answer:
Regularization is a technique used in machine learning to prevent overfitting by adding a
penalty to the loss function during the training of a model. Overfitting occurs when the
model learns the noise and details in the training data, making it less effective in
generalizing to new, unseen data. Regularization discourages the model from fitting too
closely to the training data by penalizing large weights.

1. Overfitting:
o Overfitting occurs when a model has too many parameters relative to the
amount of data available. The model becomes overly complex and starts to
"memorize" the training data, performing poorly on new data.
o Regularization helps by constraining or shrinking the coefficients in the
model, leading to a simpler, more general model.
2. Types of Regularization:
o L1 Regularization (Lasso):
▪ Adds a penalty equal to the absolute value of the coefficients to the
loss function.
▪ Lasso Penalty Term:
L1 Penalty = λ Σ (i = 1 to n) |βi|
▪ Effect: Tends to push some coefficients to exactly zero, effectively
performing feature selection by shrinking irrelevant feature weights to
zero.
▪ Use Case: Lasso is used when we believe that many of the features
are irrelevant, and we want to select a subset of important features.
o L2 Regularization (Ridge):
▪ Adds a penalty equal to the square of the magnitude of the
coefficients to the loss function.
▪ Ridge Penalty Term:
L2 Penalty = λ Σ (i = 1 to n) βi^2
▪ Effect: Shrinks the coefficients but does not set them to zero,
meaning that all features are retained, but their influence is reduced.
▪ Use Case: Ridge is used when we believe that all features have some
relevance but need to control their individual contribution to prevent
overfitting.
3. Difference Between L1 and L2 Regularization:
o L1 Regularization tends to produce sparse models where some feature
coefficients are exactly zero, which can be helpful for feature selection.
o L2 Regularization produces models where all features are retained but with
reduced coefficients, leading to a more general model without the sparsity of
L1.
4. Role in Preventing Overfitting:
o Regularization controls the complexity of the model by penalizing large
weights. As a result, the model is less likely to overfit the training data and
more likely to generalize to unseen data.
o Both L1 and L2 regularization shrink the coefficients, but L1 tends to
completely eliminate some features, while L2 only reduces their influence.
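
A brief sketch comparing Lasso (L1) and Ridge (L2) with scikit-learn on synthetic data in which only two of ten features are truly relevant; scikit-learn's alpha parameter plays the role of λ, and its values here are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)  # 8 irrelevant features

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

print("Lasso coefficients:", np.round(lasso.coef_, 2))  # irrelevant ones typically exactly 0
print("Ridge coefficients:", np.round(ridge.coef_, 2))  # all shrunk, but none exactly 0
```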

5 Marks (Short Answer)


What is L2 regularization? How does it help in preventing overfitting?

Answer:
L2 regularization, also known as Ridge regularization, is a technique that adds a penalty
proportional to the square of the coefficients' magnitudes to the loss function. This penalty
discourages large coefficients, preventing the model from becoming too complex.

• Equation:
Loss=MSE+λ∑ i=1 to n βi^2
• Preventing Overfitting: By shrinking the coefficients, L2 regularization reduces the
model's complexity, ensuring that it does not fit the noise in the training data. This
results in better generalization to unseen data.

3 Marks (Very Short Answer)


What is regularization? Name two types of regularization.

Answer:
Regularization is a technique used in machine learning to prevent overfitting by adding a
penalty to the loss function during training, discouraging large coefficients in the model.
Two Types of Regularization:

1. L1 Regularization (Lasso).


2. L2 Regularization (Ridge).

Prediction and Accuracy

8 Marks (Long Answer)


Explain the role of prediction in machine learning. Discuss how accuracy is evaluated
for classifiers and predictors. Provide examples of common evaluation metrics.

Answer:
Prediction is the process in machine learning where a trained model is used to infer the
outcome for new, unseen data. The goal is to generalize from the training data to make
accurate predictions on test or real-world data. In supervised learning, prediction involves
estimating the target variable based on input features.

1. Role of Prediction:


o The model learns patterns in the training data, then applies these patterns to
predict outcomes for new data points.
o For regression tasks, the predicted output is a continuous variable, while for
classification tasks, the predicted output is a category or class label.
2. Evaluating Accuracy:
o Accuracy is one of the most common metrics used to evaluate classification
models. It is defined as the percentage of correct predictions made by the
model.
o Formula:
Accuracy = Number of Correct Predictions / Total Number of Predictions
o Example: If a model correctly predicts 80 out of 100 test samples, its
accuracy is 80%.
3. Common Evaluation Metrics:
o Precision: The ratio of correctly predicted positive observations to the total
predicted positives. It measures the accuracy of the positive predictions
made by the model.
▪ Formula: Precision = True Positives / (True Positives + False Positives)
▪ Example: If a model predicts 10 positive cases, and 8 of them are actually
positive, then the precision is 80%.
o Recall (Sensitivity or True Positive Rate): The ratio of correctly predicted
positive observations to all actual positives. It measures the model's ability
to identify positive cases.
▪ Formula: Recall = True Positives / (True Positives + False Negatives)
▪ Example: If a model identifies 8 out of 10 actual positive cases, then the
recall is 80%.
o F1 Score: The harmonic mean of precision and recall, providing a balanced
measure when there is an uneven class distribution.
▪ Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)
▪ Example: If precision and recall are both 80%, the F1 score is also 80%.
o Confusion Matrix: A matrix that displays the number of true positives, true
negatives, false positives, and false negatives. It helps in understanding the
performance of the classification model in more detail.
o Specificity: The ratio of correctly predicted negative observations to all
actual negatives.
▪ Formula: Specificity = True Negatives / (True Negatives + False Positives)
▪ Example: If a model correctly identifies 90 out of 100 negative cases, its
specificity is 90%.
o Mean Squared Error (MSE): For regression tasks, MSE is used to evaluate
how close the predicted values are to the actual values. It measures the
average squared difference between the predicted and actual values.
▪ Formula: MSE = (1/n) Σ (i = 1 to n) (yi − ŷi)^2
▪ Example: If the actual values are close to the predicted values, the MSE
will be small.
4. ROC Curve and AUC:
o ROC Curve: The Receiver Operating Characteristic (ROC) curve plots the
True Positive Rate (Recall) against the False Positive Rate (1 - Specificity) at
various threshold levels.
o AUC (Area Under the Curve): The area under the ROC curve. A model with
an AUC closer to 1 is considered to have good performance.
5. Real-World Application:
o In medical diagnosis, precision is crucial when predicting a rare disease to
ensure that positive predictions are likely true.
o In spam detection, recall is more important to ensure that all spam emails
are caught, even if a few non-spam emails are flagged incorrectly.
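
A minimal sketch computing these metrics with scikit-learn on a small, made-up set of true and predicted labels (the probability scores for AUC are also invented).

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3, 0.95, 0.2]  # scores for the ROC/AUC

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_prob))
```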

5 Marks (Short Answer)


What is precision, and how is it different from recall?

Answer:
Precision is the ratio of correctly predicted positive observations to the total predicted
positives. It measures the accuracy of the model’s positive predictions.

• Formula:
Precision = True Positives / (True Positives + False Positives)
• Recall, on the other hand, is the ratio of correctly predicted positive observations to
all actual positives. It measures the model’s ability to identify all positive cases.
• Formula:
Recall = True Positives / (True Positives + False Negatives)
• Difference: Precision focuses on the quality of positive predictions, while recall
focuses on the model's ability to capture all positive cases.

3 Marks (Very Short Answer)


What is the F1 score in machine learning, and why is it useful?

Answer:
The F1 score is the harmonic mean of precision and recall. It provides a single metric that
balances both precision and recall, making it useful when there is an uneven class
distribution or when both precision and recall are equally important.

• Formula:
F1 = 2 × (Precision × Recall) / (Precision + Recall)

It is useful because it combines both precision and recall into one metric, especially in
cases where a high precision or recall alone might not be sufficient for evaluating model
performance.
