Unit-3 Data Analytics Material

Unit-III
Q.no-1 What are association rules? Explain
Association Analysis: Association analysis discovers the probability of co-occurrence of items in a collection. It helps in finding interesting relationships in large datasets.
A dataset contains data objects, each described by a set of attributes. An attribute, also called a dimension, feature, or variable, represents a characteristic feature of a data object.
For example: height, qualification, color, etc.
Association rule mining: The general logic behind association rules is as follows. Given a large collection of transactions, in which each transaction consists of one or more items, association rule mining goes through the items being purchased to see which items are frequently bought together and to discover a list of rules that describe the purchasing behavior. The goal of association rules is to discover interesting relationships among the items; what counts as interesting depends on the business context and the nature of the algorithm being used for the discovery.

Each of the uncovered rules is of the form X→Y, meaning that when item X is observed, item Y is also observed. In this case, the left-hand side (L.H.S.) of the rule is X and the right-hand side (R.H.S.) of the rule is Y.
Using association rules, patterns can be discovered from the data that allow the association rule algorithms to disclose rules of related product purchases. For example, the first three rules uncovered for a grocery store might suggest that when bread is purchased, 90% of the time milk is also purchased; when eggs are purchased, 40% of the time bread is also purchased; and when milk is purchased, 23% of the time eggs are also purchased.
In the example of a retail store, association rules are used over transactions that consist of one or more items. In fact, because of their popularity in mining customer transactions, association rules are sometimes referred to as market basket analysis. Each transaction can be viewed as the shopping basket of a customer that contains one or more items, also known as an itemset. The term itemset refers to a collection of items or individual entities that contain some kind of relationship. This could be a set of hyperlinks clicked on by one user in a single session, or a set of tasks done in one day. An itemset containing k items is called a k-itemset, denoted {item 1, item 2, ..., item k}.
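As a quick illustration (a hedged sketch added here, not part of the original notes), the strength of a rule such as {bread} → {milk} is usually summarized by its support and confidence, which can be computed directly from raw transactions; the transaction list below is invented for demonstration:

# Minimal sketch: computing support and confidence for the rule {bread} -> {milk}.
# The transactions below are hypothetical example data.
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

n = len(transactions)
count_bread = sum(1 for t in transactions if "bread" in t)
count_bread_milk = sum(1 for t in transactions if {"bread", "milk"} <= t)

support = count_bread_milk / n               # fraction of all transactions containing both items
confidence = count_bread_milk / count_bread  # of the transactions with bread, how many also have milk

print(f"support({{bread, milk}}) = {support:.2f}")      # 0.60
print(f"confidence(bread -> milk) = {confidence:.2f}")  # 0.75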
Q. No-2 What is the Apriori algorithm? Explain.
Ans: The Apriori algorithm takes a bottom-up iterative approach to uncover the frequent itemsets by first determining all the possible items (or 1-itemsets, for example {bread}, {eggs}, {milk}) and then identifying which among them are frequent.
Assuming the minimum support threshold is set to 0.5, the algorithm identifies and retains those itemsets that appear in at least 50% of all transactions and discards the itemsets that appear in fewer than 50% of the transactions (support less than 0.5).
In the next iteration of the Apriori algorithm, the identified frequent 1-itemsets are paired into 2-itemsets (for example, {bread, eggs}, {milk, bread}, ...) and again evaluated to identify the frequent 2-itemsets among them.
At each iteration, the algorithm checks whether the support criterion can be met; if it can, the algorithm grows the itemsets, repeating the process until it runs out of support or until the itemsets reach a predefined length. Let Ck be the set of candidate k-itemsets and Lk the set of k-itemsets that satisfy the minimum support. Given a transaction database D, a minimum support threshold, and an optional parameter N indicating the maximum length an itemset may reach, Apriori iteratively computes the frequent itemsets Lk+1 based on Lk.
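The candidate-generation and pruning loop described above can be sketched in plain Python. This is an illustrative simplification rather than the textbook's own code; the transaction database and the 0.5 support threshold are assumed example values:

# Hypothetical transaction database D
D = [
    {"bread", "milk"},
    {"bread", "eggs"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
]
min_support = 0.5  # itemsets must appear in at least 50% of transactions

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# L1: frequent 1-itemsets
items = sorted({item for t in D for item in t})
Lk = [frozenset([i]) for i in items if support(frozenset([i]), D) >= min_support]
frequent = list(Lk)

k = 1
while Lk:
    # Ck+1: candidate (k+1)-itemsets built by joining frequent k-itemsets
    candidates = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
    # Keep only candidates that meet the minimum support (the pruning step)
    Lk = [c for c in candidates if support(c, D) >= min_support]
    frequent.extend(Lk)
    k += 1

for itemset in frequent:
    print(set(itemset), round(support(itemset, D), 2))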

Q. No 3 What are the applications of association rules?


The term market basket analysis refers to a specific implementation of
association rules mining that many companies use for a variety of purposes, including
these:
1. Broad-scale approaches to better merchandising: what products should be included in or excluded from the monthly inventory.
2. Cross-merchandising between products and high-margin or high-ticket items.
3. Physical or logical placement of products within related categories of products.
4. Promotional programs: multiple-product purchase incentives managed through a loyalty card program.

Besides market basket analysis, association rules are commonly used for recommender systems and clickstream analysis.
Many online service providers such as Amazon and Netflix use recommender systems. Recommender systems can use association rules to discover related products or identify customers who have similar interests. For example, association rules may suggest that customers who have bought products A and B are also likely to buy product C. These findings provide opportunities for retailers to cross-sell their products.
Clickstream analysis refers to analytics on data related to web browsing and user clicks, which are stored on the client or the server side. Web usage log files generated on web servers contain huge amounts of information, and association rules can potentially give useful knowledge to web usage data analysis. For example, association rules may suggest that website visitors who land on page X click on links A, B, and C much more often than links D, E, and F. This observation provides valuable insight into how to better personalize and recommend content to site visitors.
Q. No 4 What is clustering? Give an overview
Ans: In general, clustering is the use of unsupervised techniques for grouping similar objects. In machine learning, unsupervised refers to the problem of finding hidden structure within unlabeled data. Clustering techniques are unsupervised in the sense that the data scientist does not determine, in advance, the labels to apply to the clusters. The structure of the data describes the objects of interest and determines how best to group the objects.

Clustering is a method often used for exploratory analysis of the data. In clustering,
there are no predictions made. Rather, clustering methods find the similarities
between objects according to the object attributes and group the similar objects into
clusters. Clustering techniques are utilized in marketing, economics, and various
branches of science. A popular clustering method is K-means.
Q. No-5 What is the K-means algorithm? How does the K-means cluster
work? Explain.
K-Means Clustering is an unsupervised Machine Learning algorithm, which groups
the unlabeled dataset into different clusters
Unsupervised Machine Learning;
is the process of teaching a computer to use unlabelled, unclassified data and
enabling the algorithm to operate on that data without supervision. Without any
previous data training, the machine’s job in this case is to organize unsorted data
according to parallels, patterns, and variations.
K-means clustering works by assigning data points to one of K clusters depending on their distance from the centre of the clusters. It starts by randomly assigning the cluster centroids in the space. Then each data point is assigned to one of the clusters based on its distance from the centroid of the cluster. After assigning each point to one of the clusters, new cluster centroids are computed. This process runs iteratively until it finds good clusters. In the analysis, we assume that the number of clusters is given in advance and we have to put the points into one of the groups.
In some cases, K is not clearly defined, and we have to think about the optimal number of clusters. K-means clustering performs best when the data is well separated; when data points overlap, this clustering is not suitable. K-means is faster compared to other clustering techniques and provides strong coupling between the data points. However, K-means clusters do not provide clear information regarding the quality of the clusters. Different initial assignments of cluster centroids may lead to different clusters. Also, the K-means algorithm is sensitive to noise and may get stuck in local minima.
How does K-means clustering work?

We are given a data set of items, with certain features, and values for these
features (like a vector). The task is to categorize those items into groups. To achieve
this, we will use the K-means algorithm, an unsupervised learning algorithm. ‘K’ in
the name of the algorithm represents the number of groups/clusters we want to
classify our items into.
(It will help if you think of items as points in an n-dimensional space). The algorithm
will categorize the items into k groups or clusters of similarity. To calculate that
similarity, we will use the Euclidean distance as a measurement.
The algorithm works as follows:
1. First, we randomly initialize k points, called means or cluster centroids.
2. We categorize each item to its closest mean, and we update the mean’s
coordinates, which are the averages of the items categorized in that
cluster so far.
3. We repeat the process for a given number of iterations and at the end,
we have our clusters.
The “points” mentioned above are called means because they are the mean values
of the items categorized in them. To initialize these means, we have a lot of
options. An intuitive method is to initialize the means at random items in the data
set. Another method is to initialize the means at random values between the
boundaries of the data set (if for a feature x, the items have values in [0,3], we will
initialize the means with values for x at [0,3]).

The above algorithm in pseudocode is as follows:

Initialize k means with random values
--> For a given number of iterations:
    --> Iterate through items:
        --> Find the mean closest to the item by calculating
            the Euclidean distance of the item with each of the means
        --> Assign item to mean
        --> Update mean by shifting it to the average of the items in that cluster
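A minimal NumPy translation of this pseudocode is sketched below; it is illustrative only (not a library-grade implementation), and the sample 2-D data and the choice of k = 2 are assumptions:

import numpy as np

def kmeans(items, k, iterations=100, seed=0):
    """Basic K-means: assign items to the nearest mean, then update each mean."""
    rng = np.random.default_rng(seed)
    # Initialize k means at randomly chosen items from the data set
    means = items[rng.choice(len(items), size=k, replace=False)]
    for _ in range(iterations):
        # Euclidean distance from every item to every mean
        distances = np.linalg.norm(items[:, None, :] - means[None, :, :], axis=2)
        labels = distances.argmin(axis=1)  # assign each item to its closest mean
        # Update each mean to the average of the items assigned to it
        for j in range(k):
            if np.any(labels == j):
                means[j] = items[labels == j].mean(axis=0)
    return means, labels

# Hypothetical 2-D data with two visible groups
items = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
means, labels = kmeans(items, k=2)
print("cluster centres:\n", means)
print("assignments:", labels)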

Q. No-6 What is regression analysis?


Regression, a statistical approach, dissects the relationship between dependent
and independent variables, enabling predictions through various regression
models.
This section covers regression in machine learning, including its models, terminology, types, and practical applications.
What is Regression?

Regression is a statistical approach used to analyze the relationship between a dependent variable (target variable) and one or more independent variables (predictor variables). The objective is to determine the most suitable function that characterizes the connection between these variables.
It seeks to find the best-fitting model, which can be utilized to make predictions or
draw conclusions.
Regression in Machine Learning
It is a supervised machine learning technique, used to predict the value of the
dependent variable for new, unseen data. It models the relationship between the
input features and the target variable, allowing for the estimation or prediction of
numerical values.
A regression problem is one in which the output variable is a real or continuous value, such as “salary” or “weight”. Many different models can be used; the simplest is linear regression, which tries to fit the data with the best hyperplane that goes through the points.
Terminologies Related to Regression Analysis in Machine Learning:
 Response Variable: The primary factor to predict or understand in
regression, also known as the dependent variable or target variable.
 Predictor Variable: Factors influencing the response variable, used to
predict its values; also called independent variables.
 Outliers: Observations with significantly low or high values compared to
others, potentially impacting results and best avoided.
 Multicollinearity: High correlation among independent variables, which
can complicate the ranking of influential variables.
 Underfitting and Overfitting : Overfitting occurs when an algorithm
performs well on training but poorly on testing, while underfitting
indicates poor performance on both datasets.

Q. No: 7 What are the various types of regression


The main types of regression are:
 Simple Regression
o Used to predict a continuous dependent variable based
on a single independent variable.
o Simple linear regression should be used when there is
only a single independent variable.
 Multiple Regression
o Used to predict a continuous dependent variable based
on multiple independent variables.
o Multiple linear regression should be used when there
are multiple independent variables.
 Polynomial Regression
o Used when the relationship between the dependent variable and independent variable(s) follows a nonlinear pattern.
o Provides flexibility in modelling a wide range of functional forms.
Q. No 7 Logistic Regression
Logistic regression is used for binary classification where we use the sigmoid
function, which takes input as independent variables and produces a probability
value between 0 and 1.
For example, suppose we have two classes, Class 0 and Class 1. If the value of the logistic function for an input is greater than 0.5 (the threshold value), it belongs to Class 1; otherwise, it belongs to Class 0. It is referred to as regression because it is an extension of linear regression, but it is mainly used for classification problems.

Key Points:

 Logistic regression predicts the output of a categorical dependent variable. Therefore, the outcome must be a categorical or discrete value.
 It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
 In Logistic regression, instead of fitting a regression line, we fit an “S”
shaped logistic function, which predicts two maximum values (0 or 1).
Logistic Function – Sigmoid Function
 The sigmoid function is a mathematical function used to map the
predicted values to probabilities.
 It maps any real value into another value within a range of 0 and 1. The
value of the logistic regression must be between 0 and 1, which cannot
go beyond this limit, so it forms a curve like the “S” form.
 The S-form curve is called the Sigmoid function or the logistic function.
 In logistic regression, we use the concept of a threshold value, which defines the probability of either 0 or 1. Values above the threshold tend to be classified as 1, and values below the threshold tend to be classified as 0.
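To make the sigmoid mapping and the 0.5 threshold concrete, here is a small hedged sketch; the linear scores used as input are invented example values:

import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores (e.g. w·x + b) for three inputs
scores = np.array([-2.0, 0.3, 4.0])
probabilities = sigmoid(scores)
classes = (probabilities > 0.5).astype(int)  # threshold of 0.5 separates Class 0 from Class 1

print(probabilities)  # approximately [0.12 0.57 0.98]
print(classes)        # [0 1 1]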

Q. No-8 What is Support Vector Regression? What are support vector machines?
Support Vector Machine (SVM) is a supervised learning algorithm that can be used for regression as well as classification problems. If we use it for regression problems, it is termed Support Vector Regression (SVR).
Support vector regression is a regression algorithm that works for continuous variables. Below are some keywords that are used in support vector regression:
1. Kernel: A function used to map lower-dimensional data into higher-dimensional data.
2. Hyperplane: In general SVM, it is a separation line between two classes, but in SVR it is the line that helps to predict the continuous variable and covers most of the data points.
3. Boundary lines: The two lines drawn on either side of the hyperplane, which create a margin for the data points.
4. Support vectors: The data points which are nearest to the hyperplane and of the opposite class.
In SVR, we always try to determine a hyperplane with a maximum margin, so that the maximum number of data points is covered within that margin. The main goal of SVR is to include as many data points as possible within the boundary lines, and the hyperplane must contain the maximum number of data points.
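A brief, hedged sketch of support vector regression using scikit-learn is shown below; the synthetic sine-curve data, the RBF kernel, and the hyperparameter values are assumptions chosen only for illustration:

import numpy as np
from sklearn.svm import SVR

# Synthetic continuous data: y roughly follows a sine curve with noise (illustrative only)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# The RBF kernel maps the data into a higher-dimensional space;
# epsilon sets the width of the margin (the "boundary lines") around the hyperplane
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

print("number of support vectors:", len(model.support_vectors_))
print("prediction at x=2.5:", model.predict([[2.5]]))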

Q. No-9 What is decision tree classification? What is decision tree regression?

1. A Decision Tree is a supervised learning algorithm that can be used for solving
both classification and regression problems
2. It can solve problems for both categorical and numerical data
3. Decision Tree regression builds a tree-like structure in which each internal
node represents the “test” for an attribute, each branch represents the result
of the test, and each leaf node represents the final decision or result
4. A decision tree is constructed starting from the root node (parent node), which splits into left and right child nodes. These child nodes are further divided into their own child nodes, and themselves become the parent nodes of those nodes.

In an example of decision tree regression, the model might try to predict a person's choice between sports cars and luxury cars.
1. Random forest is one of the most powerful supervised learning algorithms, capable of performing regression as well as classification tasks.
2. Random forest regression is an ensemble learning method that combines multiple decision trees and predicts the final output based on the average of each tree's output. The individual decision trees are called the base models.
3. Random forest uses the bootstrap aggregation (bagging) technique of ensemble learning, in which the aggregated decision trees run in parallel and do not interact with each other.
4. With the help of random forest regression, we can prevent overfitting in the model by creating random subsets of the dataset.
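As a hedged sketch of the two ideas above, the snippet below fits a single decision tree regressor and a random forest regressor with scikit-learn; the synthetic data and parameter choices are assumptions for illustration:

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Hypothetical regression data: one noisy feature
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + rng.normal(scale=2.0, size=200)

# A single decision tree regressor (one "base model")
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)

# A random forest: many trees trained on bootstrap samples, predictions averaged
forest = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)

print("tree prediction at x=5:  ", tree.predict([[5.0]]))
print("forest prediction at x=5:", forest.predict([[5.0]]))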

Q.no. 10 What is Naïve Bayes classification? Explain.


The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which can be described as:
1. Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying that it is an apple, without depending on the others.
2. Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.

Bayes’ theorem:
1. Bayes theorem is also known as Bayes’ Rule or Bayes’ law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on
the conditional probability.
2. The formula for Bayes’ theorem is given as:

P(A|B) = [P(B|A) × P(A)] / P(B)
Where:
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood: the probability of the evidence B given that hypothesis A is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.

Working of Naïve Bayes’ Classifier:


The working of Naïve Bayes’ classifier can be understood with the help of the
below example:
Suppose we have a dataset of Weather conditions and the corresponding target
variable “Play”. So, using this dataset we need to decide whether we should play or
not on a particular day according to the weather conditions. So, to solve this
problem, we need to follow the following steps:
1. Convert the given dataset into frequency tables.
2. Generate a Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability

Problem: If the weather is sunny, should the player play or not?

Solution: To solve this, first consider the below dataset:


Outlook Play
0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes

Frequency table for the Weather conditions:

Weather Yes No
Overcast 5 0
Rainy 2 2
Sunny 3 2
Total 10 4

Likelihood table of weather conditions:

Weather    No            Yes           P(Weather)
Overcast   0             5             5/14 = 0.35
Rainy      2             2             4/14 = 0.29
Sunny      2             3             5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71  1

P(Yes|Sunny) = P(Sunny|Yes) × P(Yes) / P(Sunny)

P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 0.35, P(Yes) = 0.71
So, P(Yes|Sunny) = 0.3 × 0.71 / 0.35 = 0.60

P(No|Sunny) = P(Sunny|No) × P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So, P(No|Sunny) = 0.5 × 0.29 / 0.35 = 0.41

Since P(Yes|Sunny) > P(No|Sunny), on a sunny day the player should play.
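The hand calculation above can be reproduced in a few lines of Python; this hedged sketch simply recomputes the counts and posteriors from the same 14-row dataset (the exact value 0.40 for P(No|Sunny) differs slightly from the rounded 0.41 above because the manual calculation rounds intermediate values):

# Recomputing P(Yes|Sunny) and P(No|Sunny) from the weather dataset above
data = [("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
        ("Sunny", "No"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Rainy", "No"), ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
        ("Overcast", "Yes"), ("Overcast", "Yes")]

n = len(data)
p_yes = sum(1 for _, play in data if play == "Yes") / n            # P(Yes) = 10/14
p_no = 1 - p_yes                                                    # P(No)  = 4/14
p_sunny = sum(1 for outlook, _ in data if outlook == "Sunny") / n   # P(Sunny) = 5/14
p_sunny_given_yes = (sum(1 for o, p in data if o == "Sunny" and p == "Yes")
                     / sum(1 for _, p in data if p == "Yes"))       # 3/10
p_sunny_given_no = (sum(1 for o, p in data if o == "Sunny" and p == "No")
                    / sum(1 for _, p in data if p == "No"))         # 2/4

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny  # = 0.60
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny     # = 0.40

print(round(p_yes_given_sunny, 2), round(p_no_given_sunny, 2))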

Advantages of Naïve Bayes classifier:


1. Naïve Bayes is one of the fastest and easiest Machine Learning algorithms for predicting the class of a dataset.
2. It can be used for binary as well as multi-class classification.
3. It performs well in multi-class predictions as compared to other algorithms.
4. It is a popular choice for text classification problems.

Disadvantages of Naïve Bayes classifier


1. Naïve Bayes assumes that all features are independent or unrelated, so it
cannot learn the relationship between features

Applications of Naïve Bayes classification:


1. It is used for credit scoring
2. It is used in medical data classification
3. It can be used in real-time predictions because the Naïve Bayes classifier is an
eager learner
4. It is used in text classification such as Spam filtering and sentiment analysis

Q.No.11. What is attribute selection? What are various methods used for this?
(OR)
How to choose the best attribute at each node in a decision tree?
Ans: An Attribute Selection Measure (ASM) is a technique used in the data mining process for data reduction. Data reduction is necessary for better analysis and prediction of the target variable.

The two main ASM techniques are:

1. Gini Index
2. Information Gain (ID3)

1. Gini index:
The measure of the degree of probability of a particular variable being wrongly
classified when it is randomly chosen is called the Gini index or Gini impurity. The
data is equally distributed based on the Gini Index.
Mathematical formula:
Gini Index = 1 − Σ [P(i)]²
where P(i) is the probability of an object being classified into a particular class.

When the Gini index is used as the criterion for the algorithm to select the feature for the root node, the feature with the least Gini index is selected.
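A small hedged sketch of the Gini calculation is given below; the class labels and the candidate split are invented example values, and the feature that yields the lowest weighted Gini would be chosen for the node:

def gini(labels):
    """Gini impurity: 1 - sum of squared class probabilities."""
    n = len(labels)
    classes = set(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in classes)

# Hypothetical class labels at a node before and after a candidate split
parent = ["yes"] * 6 + ["no"] * 4
left = ["yes"] * 5 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 3

# Weighted Gini of the split; a lower value means a purer split
split_gini = (len(left) / len(parent)) * gini(left) + (len(right) / len(parent)) * gini(right)
print(round(gini(parent), 3), round(split_gini, 3))  # 0.48 vs roughly 0.317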

Attribute Selection Measures

• It is a heuristic approach to select the best splitting criterion that separates a given data partition, D, of class-labelled training tuples into individual classes.

• Splitting criterion is called the best when after splitting, each partition will be
pure.

• A partition is called pure when all the tuples that fall into the partition
belong to the same class.

• Attribute selection measures are also known as splitting rules because they
determine how the tuples at a given node are to be split.

• First, a rank is provided for each attribute that describes the training tuples.
The attribute having the best score for the measure is chosen as the splitting
attribute for the given tuples.

• If the splitting attribute is continuous-valued or if we are restricted to binary trees, then respectively either a split point or a splitting subset must also be determined as part of the splitting criterion. Examples of partitioning scenarios:

1. A is discrete-valued: In this case, the outcomes of the test at node N correspond directly to the known values of A. A branch is created for each known value aj of A and labelled with that value. Partition Dj is the subset of class-labelled tuples in D having value aj of A.

2. A is continuous-valued: In this case, the test at node N has two possible outcomes, corresponding to the conditions A ≤ split point and A > split point, respectively, where the split point is the value returned by the attribute selection method as part of the splitting criterion.

3. A is discrete-valued and a binary tree must be produced: In this case, the test is of the form A ∈ SA, where SA is the splitting subset for A.

• According to the algorithm, the tree node created for partition D is labelled with the splitting criterion, and the tuples are partitioned accordingly.

• There are three popular attribute selection measures: Information Gain, Gain Ratio, and Gini Index.

• Information gain:
The attribute with the highest information gain is chosen as the splitting attribute.
This attribute minimizes the information needed to classify the tuples in the
resulting partitions.
Let D, the data partition, be a training set of class-labelled tuples. Suppose the class label attribute has m distinct values defining m distinct classes Ci (for i = 1, ..., m). Let Ci,D be the set of tuples of class Ci in D, and let |D| and |Ci,D| denote the number of tuples in D and Ci,D, respectively.

• Then the expected information needed to classify a tuple in D is given by

Info(D) = − Σ (i = 1 to m) pi log2(pi)

• where pi is the nonzero probability that an arbitrary tuple in D belongs to class Ci and is estimated by |Ci,D| / |D|. Info(D) is the average amount of information needed to identify the class label of a tuple in D. Info(D) is also known as the entropy of D.

• Now, suppose we have to partition the tuples in D on some attribute A having v distinct values {a1, a2, ..., av}. Then the expected information required to classify a tuple from D based on attribute A is:

InfoA(D) = Σ (j = 1 to v) (|Dj| / |D|) × Info(Dj)

• The term |Dj| / |D| acts as the weight of the jth partition. InfoA(D) is the expected information required to classify a tuple from D based on the partitioning by A.

• Information gain is defined as the difference between the original information requirement and the new requirement (i.e., that obtained after partitioning on A):

Gain(A) = Info(D) − InfoA(D)

The attribute A with the highest information gain is chosen as the splitting attribute.
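The entropy and information-gain formulas above can be checked with a short hedged sketch; here the weather dataset from the Naïve Bayes example is reused, with Outlook as attribute A and Play as the class label:

import math
from collections import Counter

def info(labels):
    """Entropy Info(D) = -sum(p_i * log2(p_i)) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Reusing the weather data: attribute A = Outlook, class label = Play
rows = [("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
        ("Sunny", "No"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Rainy", "No"), ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
        ("Overcast", "Yes"), ("Overcast", "Yes")]

labels = [play for _, play in rows]
info_D = info(labels)

# Info_A(D): weighted entropy of the partitions D_j induced by each value a_j of Outlook
values = {outlook for outlook, _ in rows}
info_A = sum(
    (len([p for o, p in rows if o == v]) / len(rows)) * info([p for o, p in rows if o == v])
    for v in values
)

gain_A = info_D - info_A  # Gain(A) = Info(D) - Info_A(D)
print(round(info_D, 3), round(info_A, 3), round(gain_A, 3))  # roughly 0.863, 0.632, 0.231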

Short Answer Questions

Q. No-1 Describe the main steps in the Apriori algorithm


The main steps of the Apriori algorithm are:
1. Generate candidate itemsets of length 1.
2. Calculate the support of these itemsets and prune those that do not meet the minimum support threshold.
3. Use the frequent itemsets of the previous length to generate candidate itemsets of the next length.
4. Repeat the process until no more frequent itemsets are found.

Q. No-2 Explain the K-means clustering algorithm and its steps.

The k-means clustering algorithm is an unsupervised machine learning algorithm used to partition a dataset into k clusters. The main steps of the k-means algorithm are:
1. Initialization: Select k initial centroids, either randomly or using some
heuristic.
2. Assignment: Assign each data point to the nearest centroid, forming k
clusters
3. Update: Calculate the new centroids as the mean of all data points assigned
to each cluster
4. Repeat: Repeat the assignment and update steps until the centroids no longer change significantly or a maximum number of iterations is reached.
The goal is to minimize the within-cluster sum of squares (WCSS), which
measures the variance within each cluster.

Q. No-3 Discuss the importance of the selection of K in the k-means algorithm and
describe the method to choose an appropriate value of k.
Selecting the appropriate number of clusters k is crucial for the performance of the k-means algorithm. If k is too small, distinct groups may be merged; if k is too large, the algorithm may find meaningless clusters. Methods to choose k include:
1. Elbow Method: Plot the WCSS against the number of clusters and look for an
“elbow” point where the rate of decrease sharply slows
2. Silhouette Analysis: Measures how similar an object is to its own cluster
compared to another cluster. The average silhouette score can help
determine the best K
3. Cross-Validation: Use techniques like cross-validation to evaluate the
performance of different k values based on predefined criteria
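A hedged sketch of the elbow method with scikit-learn is shown below; the synthetic three-group data is an assumption used only to make the elbow visible:

import numpy as np
from sklearn.cluster import KMeans

# Synthetic data with three natural groups (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])

# Fit K-means for a range of k and record the WCSS (inertia_)
wcss = {}
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss[k] = km.inertia_

for k, value in wcss.items():
    print(f"k={k}: WCSS={value:.1f}")
# The "elbow" - the k after which WCSS stops dropping sharply - should appear at k=3 here.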
Q.N-4 Describe how you would evaluate the quality of clusters produced by the k-
means algorithm

The quality of clusters produced by the k-means algorithm can be evaluated using
several methods
1. Within-Cluster Sum of Squares (WCSS): Measures the compactness of the clusters. Lower WCSS indicates tighter clusters.
2. Between-Cluster Sum of Squares (BCSS): Measures the separation of the clusters. Higher BCSS indicates well-separated clusters.
3. Silhouette Score: Measures the cohesion and separation of clusters. Values
range from -1 to +1 with higher values indicating better-defined clusters.
4. Davies-Bouldin Index: The ratio of within-cluster distances to between-cluster distances. Lower values indicate better clustering.
5. Visual inspection: For 2D or 3D data, visually inspect the clusters using
scatter plots to assess their shape and separation
6. External Validation: If ground truth labels are available, use metrics like the Adjusted Rand Index (ARI) or Normalized Mutual Information (NMI) to compare the clustering results with the true labels.

Q.No-5 How can linear regression be applied to a real-world data analytics problem? Provide an example with a brief explanation.

Linear regression is widely used in real-world data analytics for predicting and understanding relationships between variables. An example application is in the housing market:
1. Problem: Predicting house prices based on various features like square footage, number of bedrooms, location, and age of the house.
2. Application: A linear regression model can be built where the dependent variable is the house price and the independent variables are the features mentioned.
Steps:
1. Data collection: Gather data on house prices and features.
2. Data preprocessing: Clean the data, handle missing values, and encode categorical variables.
3. Model building: Fit a linear regression model to the data.
4. Evaluation: Use R-squared, residual analysis, and cross-validation to evaluate the model.
5. Prediction: Use the model to predict prices of new houses based on their features.
6. Outcome: The model helps real estate agents and buyers estimate house prices based on various factors, aiding better decision-making and pricing strategies.
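The steps above can be sketched with scikit-learn as follows; the synthetic housing data, the coefficients used to generate it, and the example house are assumptions for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic housing data: price driven by square footage, bedrooms, and age (illustrative only)
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 3000, 300)
bedrooms = rng.integers(1, 6, 300)
age = rng.uniform(0, 50, 300)
price = 150 * sqft + 10000 * bedrooms - 800 * age + rng.normal(0, 20000, 300)

X = np.column_stack([sqft, bedrooms, age])
X_train, X_test, y_train, y_test = train_test_split(X, price, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)          # model building
print("R-squared on test data:", round(r2_score(y_test, model.predict(X_test)), 3))  # evaluation
print("predicted price for a 1500 sqft, 3-bed, 10-year-old house:",
      round(model.predict([[1500, 3, 10]])[0], 0))        # prediction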

Q. No-6 What are the advantages and disadvantages of using decision tree classification in data analytics?
Advantages:
1. Interpretability: Decision trees are easy to understand and interpret, as they mimic human decision-making processes.
2. No Need for Feature Scaling: Decision trees do not require normalization of the data.
3. Handles Non-linear Relationships: They can capture non-linear relationships between features and the target variable.
4. Versatility: They can be used for both classification and regression tasks.
Disadvantages
1. Overfitting: Decision trees can easily overfit the training data, especially if they are deep and complex.
2. Instability: Small changes in the data can lead to significantly different trees.
3. Bias: Decision trees can be biased towards features with more levels or categories.
4. Noise sensitivity: They are sensitive to noisy data, which can lead to incorrect splits.

Q. No-7 How can the performance of a Naïve Bayes classifier be evaluated, and
what metrics are commonly used for this purpose?

The performance of a Naïve Bayes classifier can be evaluated using several metrics,
commonly used in classification tasks.
1. Accuracy: The proportion of correctly classified instances among the total
instances
2. Precision: The proportion of true positive predictions among the total predicted
positives, indicating the accuracy of positive predictions.
3. Recall: The proportion of true positive predictions among the actual positives,
indicating the model’s ability to identify positive instances.
4. F1 Score: The harmonic mean of precision and recall, providing a balance between them.
5. Confusion Matrix: A table showing the true vs. predicted classifications, useful for calculating other metrics.
6. ROC-AUC Score: The area under the Receiver Operating Characteristic curve,
measuring the model’s ability to distinguish between classes.

Q. No-8 Provide an example of a real-world application of Naïve Bayes classification, explaining the problem and how Naïve Bayes helps solve it.

Ans: Example: Email Spam Detection


1. Problem: Identifying whether an incoming email is spam or not based on its
content and metadata.
2. Application: Naïve Bayes classification can be used to build a spam filter by
analysing features such as the frequency of certain words, presence of hyperlinks,
sender information, and email metadata.

Steps:
1. Data collection: Gather a dataset of emails labelled as spam or not spam.
2. Feature extraction: Extract features like word frequencies, presence of specific words, and metadata.
3. Model training: Train a Naïve Bayes classifier on the labelled data and evaluate it using metrics like accuracy, precision, recall, and F1 score.
4. Prediction: Use the trained model to classify new incoming emails as spam or not spam.
5. Outcome: The Naïve Bayes classifier helps in accurately identifying spam emails, reducing the chances of spam reaching the user's inbox and improving email management efficiency.
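A hedged sketch of these steps with scikit-learn is shown below; the tiny email corpus and its labels are invented, and word counts stand in for the fuller feature set described above:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented corpus of labelled emails (1 = spam, 0 = not spam)
emails = [
    "win a free prize now, click the link",
    "limited offer, claim your free money",
    "meeting agenda for tomorrow attached",
    "can we reschedule the project review",
    "free gift card winner, click here",
    "lunch on friday with the team",
]
labels = [1, 1, 0, 0, 1, 0]

# Feature extraction: word-frequency counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Model training: multinomial Naive Bayes suits word-count features
model = MultinomialNB().fit(X, labels)

# Prediction on new, unseen emails
new_emails = ["click to claim your free prize", "notes from the project meeting"]
print(model.predict(vectorizer.transform(new_emails)))  # expected: [1 0]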
