Module 6
Supervised Learning
Application - 1
• A credit card company receives thousands of applications for new cards.
Each application contains information about an applicant, such as:
• Age
• Marital status
• Annual salary
• Outstanding debts
• Credit rating
• etc.
• Data: Loan application data
• Task: Predict whether a loan should be approved or not.
• Performance measure: Accuracy
An example: loan application data, where each record carries the class label "Approved" or "Not approved".
Application - 2
• An emergency room in a hospital measures 17 variables (e.g., blood
pressure, age, etc.) of newly admitted patients.
• A decision is needed: whether to put a new patient in an intensive-
care unit.
• Due to the high cost of ICU, those patients who may survive less
than a month are given higher priority.
• Problem: to predict high-risk patients and discriminate them from
low-risk patients.
Types of Supervised Learning Algorithms
• A decision tree (also called prediction tree) uses a tree structure to specify
sequences of decisions and consequences.
• Given an input X = {x1, x2, …, xn}, the goal is to predict a response or output
variable Y. Each member of the set {x1, x2, …, xn} is called an input
variable.
• The prediction can be achieved by constructing a decision tree with test
points and branches.
• At each test point, a decision is made to pick a specific branch and
traverse down the tree. Eventually, a final point is reached, and a
prediction can be made.
• Each test point in a decision tree involves testing a particular input
variable (or attribute), and each branch represents the decision being
made.
• Due to their flexibility and ease of visualization, decision trees are commonly
deployed in data mining applications for classification purposes.
Decision Trees
• The input values of a decision tree can be categorical or continuous.
• A decision tree employs a structure of test points (called nodes) and branches,
which represent the decision being made.
• A node without further branches is called a leaf node. Leaf nodes return
class labels and, in some implementations, probability scores as well.
• A decision tree can be converted into a set of decision rules.
• For example, suppose income and mortgage_amount are input variables, and the
response is the output variable default, reported with a probability score. A
corresponding decision rule might read (see the sketch below):
IF income < $50,000 AND mortgage_amount > $100K THEN default =
True WITH PROBABILITY 75%
• Decision trees can be easily represented in a visual way, and the corresponding
decision rules are quite straightforward.
• Additionally, because the result is a series of logical if-then statements, there is
no underlying assumption of a linear (or nonlinear) relationship between the
input variables and the response variable.
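The decision rule above maps directly onto an if-then statement. A minimal Python sketch (the thresholds and the 75% score are taken from the example above; the behaviour of the remaining branches is not specified in the example, so a placeholder is returned):

def default_rule(income, mortgage_amount):
    # IF income < $50,000 AND mortgage_amount > $100K THEN default = True (probability 0.75)
    if income < 50_000 and mortgage_amount > 100_000:
        return True, 0.75
    # The other branches of the tree are not given in the example.
    return False, None

print(default_rule(income=42_000, mortgage_amount=150_000))  # (True, 0.75)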
Decision Trees
• Decision trees have two varieties:
• Classification trees
• Regression trees
• Classification trees are usually applied to output variables that are
categorical—often binary—in nature, such as yes or no, purchase or
not purchase, and so on.
• Regression trees are applied to output variables that are numeric or
continuous, such as the predicted price of a consumer good or the
likelihood that a subscription will be purchased.
Overview of a Decision Tree
• The following diagram shows an example of using a decision tree to
predict whether customers will buy a product.
• The term branch refers to the outcome of a decision and is
visualized as a line connecting two nodes. If a decision is numerical,
the “greater than” branch is usually placed on the right, and the
“less than” branch is placed on the left.
• Depending on the nature of the variable, one of the branches may
need to include an “equal to” component.
Example of a Decision Tree – Customer Buys a Product
In the example, the root node splits into two branches with a Gender test.
The right branch contains all those records with the variable Gender equal to Male, and the left
branch contains all those records with the variable Gender equal to Female to create the depth 1
internal nodes.
Each internal node effectively acts as the root of a sub-tree, and a test for each node is determined
independently of the other internal nodes.
The left-hand side (LHS) internal node splits on a question based on the Income variable to create
leaf nodes at depth 2, whereas the right-hand side (RHS) splits on a question on the Age variable.
The decision tree shows that females with income less than or equal to $45,000 and males 40 years
old or younger are classified as people who would purchase the product. In traversing this tree, age
does not matter for females, and income does not matter for males.
Decision Tree
• Internal nodes are the decision or test points. Each internal node refers to an
input variable or an attribute. The top internal node is called the root.
• The decision tree in the above example is a binary tree in that each internal
node has no more than two branches. The branching of a node is referred to as
a split.
• Sometimes decision trees may have more than two branches stemming from a
node. For example, if an input variable Weather is categorical and has three
choices— Sunny, Rainy, and Snowy— the corresponding node Weather in the
decision tree may have three branches labeled as Sunny, Rainy, and Snowy,
respectively.
• The depth of a node is the minimum number of steps required to reach the
node from the root. In the above example, nodes Income and Age have a depth
of one, and the four nodes on the bottom of the tree have a depth of two.
• Leaf nodes are at the end of the last branches on the tree. They represent class
labels—the outcome of all the prior decisions. The path from the root to a leaf
node contains a series of decisions made at various internal nodes.
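As an illustration, the example tree above (Gender at the root, Income on the left subtree, Age on the right) can be encoded and traversed with a few lines of Python. The structure and split values ($45,000 and 40 years) come from the example; the dictionary layout is just one possible representation:

# Example tree: root tests Gender, then Income (Female branch) or Age (Male branch).
tree = {
    "attribute": "Gender",
    "branches": {
        "Female": {"attribute": "Income",
                   "test": lambda income: "Buy" if income <= 45_000 else "Not buy"},
        "Male":   {"attribute": "Age",
                   "test": lambda age: "Buy" if age <= 40 else "Not buy"},
    },
}

def predict(record):
    """Traverse from the root to a leaf node and return its class label."""
    node = tree["branches"][record["Gender"]]
    return node["test"](record[node["attribute"]])

print(predict({"Gender": "Female", "Income": 40_000, "Age": 55}))  # Buy
print(predict({"Gender": "Male",   "Income": 90_000, "Age": 52}))  # Not buy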
Applications
• Decision trees are used to classify animals (e.g., cold-blooded or warm-blooded,
mammal or not mammal).
• Another example is a checklist of symptoms during a doctor’s evaluation of a
patient.
• The artificial intelligence engine of a video game commonly uses decision trees
to control the autonomous actions of a character in response to various
scenarios.
• Retailers can use decision trees to segment customers or predict response rates
to marketing and promotions.
• Financial institutions can use decision trees to help decide if a loan application
should be approved or denied. In the case of loan approval, computers can use
the logical if-then statements to predict whether the customer will default on
the loan.
• For customers with a clear (strong) outcome, no human interaction is required;
for observations that may not generate a clear response, a human is needed for
the decision.
Construction – General Algorithm
• The objective of a decision tree algorithm is to construct a tree T from a training set S.
• If all the records in S belong to some class C, or if S is sufficiently pure (greater than a preset threshold), then
that node is considered a leaf node and assigned the label C.
• The purity of a node is defined as the proportion (probability) of its records that belong to the corresponding class.
• If not all the records in S belong to class C, or if S is not sufficiently pure, the algorithm selects the next most
informative attribute A and partitions S according to A's values.
• The algorithm constructs sub-trees T1,T2….. for the subsets of S recursively until one of the following criteria
is met:
• All the leaf nodes in the tree satisfy the minimum purity threshold.
• The tree cannot be further split with the preset minimum purity threshold.
• Any other stopping criterion is satisfied (such as the maximum depth of the tree).
• The first step in constructing a decision tree is to choose the most informative attribute. A common way to
identify the most informative attribute is to use entropy-based methods, which are used by decision tree
learning algorithms such as ID3 (or Iterative Dichotomiser 3).
• The entropy methods select the most informative attribute based on two basic measures:
• Entropy, which measures the impurity of an attribute
• Information gain, which measures the purity of an attribute
• At each split, the decision tree algorithm picks the most informative attribute out of the remaining attributes.
• The extent to which an attribute is informative is determined by measures such as entropy and information
gain.
Construction – General Algorithm
• Given a class X and its label x ∈ X, let p(x) be the probability of x and H(X) the
entropy of X. H(X) is defined as
H(X) = − Σ_{x ∈ X} p(x) log2 p(x)
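A short Python sketch of these two measures, computed from lists of class labels (the 9 "yes" / 5 "no" toy split below is illustrative, not taken from the module):

import math
from collections import Counter

def entropy(labels):
    """H(X) = -sum over x of p(x) * log2 p(x), over the class labels in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, split_groups):
    """Entropy of the parent node minus the weighted entropy of the child nodes."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in split_groups)
    return entropy(labels) - remainder

# Toy example: 9 'yes' / 5 'no' labels split by a hypothetical attribute.
parent = ["yes"] * 9 + ["no"] * 5
split  = [["yes"] * 6 + ["no"] * 1, ["yes"] * 3 + ["no"] * 4]
print(round(entropy(parent), 3))                  # ≈ 0.940
print(round(information_gain(parent, split), 3))  # ≈ 0.152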
Naïve Bayes Classifier
Conditional probabilities:
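In standard notation, with C a class label and a1, …, an the attribute values of an instance, the conditional probability (Bayes' theorem) and the resulting naïve Bayes decision rule are:

P(C | a1, …, an) = P(a1, …, an | C) · P(C) / P(a1, …, an)

Under the "naïve" conditional-independence assumption, the classifier predicts the class C that maximizes

P(C) · Π_{i=1..n} P(ai | C)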
Linear Regression
• The formula used for linear regression is y = a + bx (equivalently ŷ = a0 + a1·x, with intercept a0 = a and slope a1 = b).
• The line is fitted by minimizing the mean squared error (MSE) between the actual and predicted values:
MSE = (1/N) Σ_{i=1..N} (Yi − (a1·xi + a0))²
where
N = total number of observations
Yi = actual value
(a1·xi + a0) = predicted value
Linear Regression
• For example, suppose we have the following dataset with the weight
and height of seven individuals:
Using linear regression: for a person who weighs 170 pounds, how
tall would we expect them to be?
Linear Regression
Y = 32.7830 + 0.2001x
For a person who weighs 170 pounds:
Y = 32.7830 + 0.2001 × 170
Y = 32.7830 + 34.017
Y ≈ 66.8 inches
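A minimal sketch of this fit in Python, assuming NumPy is available. The seven weight/height pairs below are hypothetical stand-ins (not the module's original table), chosen so that the least-squares fit approximately reproduces the equation above:

import numpy as np

# Hypothetical weights (lb) and heights (in) standing in for the seven-person table.
weight = np.array([140, 155, 159, 179, 192, 200, 212], dtype=float)
height = np.array([60, 62, 67, 70, 71, 72, 75], dtype=float)

# Least-squares fit of height = a + b * weight; polyfit returns [slope, intercept].
b, a = np.polyfit(weight, height, deg=1)
print(f"height ≈ {a:.4f} + {b:.4f} * weight")

# Prediction for a 170-pound person (compare with the slide's ≈ 66.8 inches).
print(a + b * 170)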
Overview of Clustering
• Clustering is one of the unsupervised learning algorithms for grouping
similar objects.
• In machine learning, unsupervised learning refers to the problem of
finding hidden structure (making inferences) within unlabeled data.
• Clustering groups data instances that are similar to (near) each other into one
cluster, and data instances that are very different from (far away from) each
other into different clusters.
• Clustering techniques are unsupervised in the sense that the data
scientist does not determine, in advance, the labels to apply to the
clusters.
• The structure of the data describes the objects of interest and
determines how best to group the objects.
Overview of Clustering
• Clustering is a method used for exploratory analysis of the data.
• In clustering, there are no predictions made. Rather, clustering
methods find the similarities between objects according to the object
attributes and group the similar objects into clusters.
• Clustering techniques are utilized in marketing, economics, and various
branches of science. A popular clustering method is k-means.
Use Cases
• The data points in each cluster are as similar as possible according to a similarity
measure such as Euclidean-based distance or correlation-based distance.
• The less variation within the clusters, the more homogeneous (similar) the data
points are within the same cluster.
K-Means Clustering
• Given a collection of objects each with n measurable attributes, k-means is an
analytical technique that, for a chosen value of k, identifies k clusters of objects based
on the objects’ proximity to the center of the k groups.
• The center is determined as the arithmetic average (mean) of each cluster’s n-
dimensional vector of attributes.
• The following diagram illustrates three clusters of objects with two attributes.
• Each object in the dataset is represented by a small dot color-coded to the closest large
dot, the mean of the cluster.
K-means Algorithm – Working Principle
3. Compute the centroids for the clusters by taking the average of the all data points that belong to each cluster. The
centroid (xc,yc) of m points in a k-means cluster is calculated as follows
where (xc,yc) is the ordered pair of the arithmetic means of the coordinates of the m points in the cluster. In this
step, a centroid is computed for each of the k clusters.
4. Repeat Steps 2 and 3 until the algorithm converges to an answer.
1. Assign each point to the closest centroid computed in Step 3.
2. Compute the centroid of newly defined clusters.
3. Repeat until the algorithm converges to the final answer
Convergence
• Convergence is reached when the computed centroids do not change or
the centroids and the assigned points oscillate back and forth from one
iteration to the next. The latter case can occur when there are one or
more points that are equal distances from the computed centroid.
• To generalize the prior algorithm to n dimensions, suppose there are M
objects, where each object is described by n attributes or property values
(p1,p2,…pn) . Then object ‘i’ is described by (pi1,pi2,…pin) for i = 1,2,…,
M. In other words, there is a matrix with M rows corresponding to the M
objects and n columns to store the attribute values.
• For a given point pi located at (pi1, pi2, …, pin) and a centroid q located at (q1, q2,
…, qn), the distance d between pi and q is expressed as
d(pi, q) = sqrt( Σ_{j=1..n} (pij − qj)² )
From the chart above, we expect that there are two visible clusters/segments,
and we want the k-means algorithm to identify them (see the sketch below).
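A minimal sketch of the full loop in Python (NumPy assumed); the six 2-D points below form two obvious groups and are hypothetical stand-ins for the chart's data:

import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Assign points to the nearest centroid, recompute each centroid as the
    mean of its assigned points, and repeat until the assignments stop changing."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    assignment = None
    for _ in range(n_iter):
        # Euclidean distance d(pi, q) from every point to every centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_assignment = distances.argmin(axis=1)
        if assignment is not None and np.array_equal(new_assignment, assignment):
            break                               # converged: assignments unchanged
        assignment = new_assignment
        for j in range(k):                      # centroid = mean of cluster j's points
            members = points[assignment == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, assignment

pts = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                [8.0, 8.0], [8.2, 7.9], [7.8, 8.3]])
centers, labels = kmeans(pts, k=2)
print(centers)   # one centroid near (1, 1), the other near (8, 8)
print(labels)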
K-Means Clustering Example
Model Ensembles
Rather than creating a single model, ensemble methods generate a set of models and then make
predictions by aggregating the outputs of these models.
A prediction model that is composed of a set of models is called a model ensemble.
In the context of ensemble models, each model should make predictions independently
of the other models in the ensemble.
Given a large population of independent models, an ensemble can be very accurate
even if the individual models in the ensemble perform only marginally better than
random guessing.
Model Ensembles
There are two standard approaches to creating ensembles: boosting and bagging.
When we use boosting, each new model added to an ensemble is biased to pay
more attention to instances that previous models misclassified. This is done by
incrementally adapting the dataset used to train the models.
To do this, we use a weighted dataset where each instance has an associated
weight wi ≥ 0, initially set to 1/n, where n is the number of instances in the dataset.
These weights are used as a distribution over which the dataset is sampled to
create a replicated training set, in which the number of times an instance is
replicated is proportional to its weight.
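A small Python illustration of this weighted sampling (NumPy assumed; n and the index array are hypothetical):

import numpy as np

n = 10                                   # hypothetical number of training instances
weights = np.full(n, 1.0 / n)            # each weight wi starts at 1/n
rng = np.random.default_rng(42)

# Draw n instance indices with probability proportional to the weights; instances
# with larger weights tend to be replicated several times in the training set.
replicated_indices = rng.choice(n, size=n, replace=True, p=weights)
print(replicated_indices)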
Boosting
Boosting works by iteratively creating models and adding them to the ensemble.
The iteration stops when a predefined number of models have been added.
During each iteration the algorithm does the following:
1. Induces a model using the weighted dataset and calculates the total error, ε,
in the set of predictions made by the model for the instances in the training
dataset. The ε value is calculated by summing the weights of the training
instances for which the predictions made by the model are incorrect.
2. Increases the weights for the instances misclassified by the model, using
w[i] ← w[i] × (1 / (2ε)),
and decreases the weights for the instances correctly classified by the model, using
w[i] ← w[i] × (1 / (2(1 − ε))).
Once the set of models has been created, the ensemble makes predictions using a
weighted aggregate of the predictions made by the individual models.
The weights used in this aggregation are the confidence factors associated with
each model.
For categorical target features, the ensemble returns the majority target level
using a weighted vote, and for continuous target features, the ensemble returns
the weighted mean.
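A sketch of one boosting weight update and a weighted vote in Python. The 1/(2ε) and 1/(2(1−ε)) factors follow the update rule above; the ½·ln((1−ε)/ε) confidence factor is one common choice and is an assumption here, as are all the toy arrays:

import numpy as np

def boosting_iteration(weights, is_correct):
    """One weight update: is_correct says whether the current model classified
    each instance correctly. Returns new weights and the model's confidence."""
    epsilon = weights[~is_correct].sum()            # total error = sum of misclassified weights
    confidence = 0.5 * np.log((1 - epsilon) / epsilon)
    new_w = np.where(is_correct,
                     weights * (1 / (2 * (1 - epsilon))),   # decrease correct instances
                     weights * (1 / (2 * epsilon)))         # increase misclassified instances
    return new_w, confidence

def weighted_vote(predictions, confidences):
    """Weighted majority vote for a categorical target.
    predictions: (n_models, n_instances) array of class labels."""
    labels = np.unique(predictions)
    scores = np.array([np.where(predictions == label, confidences[:, None], 0).sum(axis=0)
                       for label in labels])
    return labels[scores.argmax(axis=0)]

w = np.full(5, 0.2)
correct = np.array([True, True, False, True, True])
w, conf = boosting_iteration(w, correct)
print(w, conf)   # the misclassified instance now carries weight 0.5

preds = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1]])
confs = np.array([0.9, 0.4, 0.4])
print(weighted_vote(preds, confs))   # [1 0 1]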
Bagging
When we use bagging (or bootstrap aggregating), each model in the
ensemble is trained on a random sample of the dataset where,
importantly, each random sample is the same size as the dataset and
sampling with replacement is used. These random samples are known as
bootstrap samples, and one model is induced from each bootstrap
sample.
The reason that we sample with replacement is that this will result in
duplicates within each of the bootstrap samples, and consequently,
every bootstrap sample will be missing some of the instances from the
dataset.
As a result, each bootstrap sample will be different, and this means that
models trained on different bootstrap samples will also be different.
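A small Python illustration of drawing one bootstrap sample (NumPy assumed; the dataset is just a hypothetical array of instance ids):

import numpy as np

rng = np.random.default_rng(7)
dataset = np.arange(10)                      # hypothetical dataset of 10 instance ids

# A bootstrap sample: same size as the dataset, drawn with replacement, so some
# instances appear more than once and others are missing entirely.
bootstrap_sample = rng.choice(dataset, size=len(dataset), replace=True)
print(np.sort(bootstrap_sample))             # duplicates are visible here
print(np.setdiff1d(dataset, bootstrap_sample))  # instances left out of this sample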
Bagging
Decision tree induction algorithms are particularly well suited to use with
bagging.
This is because decision trees are very sensitive to changes in the dataset: a
small change in the dataset can result in a different feature being selected to
split the dataset at the root, or high up in the tree, and this can have a ripple
effect throughout the subtrees under this node.
Frequently, when bagging is used with decision trees, the sampling process is
extended so that each bootstrap sample only uses a randomly selected subset
of the descriptive features in the dataset. This sampling of the feature set is
known as subspace sampling.
Subspace sampling further encourages the diversity of the trees within the
ensemble and has the advantage of reducing the training time for each tree.
Bagging
Figure 4.20 illustrates the process of creating a model ensemble using bagging and subspace sampling.
The combination of bagging, subspace sampling, and decision trees
is known as a random forest model.
Once the individual models have been induced, the ensemble makes
predictions by returning the majority vote or the median depending
on the type of prediction required.
For continuous target features, the median is preferred to the mean
because the mean is more heavily affected by outliers.
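A minimal sketch using scikit-learn's RandomForestClassifier, which combines bagging with subspace sampling (scikit-learn samples the feature subset at each split rather than once per bootstrap sample, but the idea is the same); the toy data below is hypothetical:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # 200 instances, 5 descriptive features
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # a simple synthetic target

# Bagging + subspace sampling: each tree is trained on a bootstrap sample and
# considers a random subset of the features ('sqrt' of 5 ≈ 2) at every split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", bootstrap=True)
forest.fit(X, y)
print(forest.predict(X[:5]))                  # majority vote over the 100 trees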