
UNIT - III

Machine learning (ML) is defined as a discipline of artificial intelligence (AI) that provides
machines the ability to automatically learn from data and past experiences to identify patterns
and make predictions with minimal human intervention.
Supervised Learning
• Supervised learning is a type of machine learning that uses labeled data to train machine
learning models. In labeled data, the output is already known. The model just needs to
map the inputs to the respective outputs.
• The algorithm uses this knowledge to generalize to new examples that it has never
seen before.
• Using labeled inputs and outputs, the model can measure its accuracy and improve
over time.
• An example of supervised learning is to train a system that identifies the image of an
animal.
• Supervised learning is classified into two types:
• Classification:
• The output is a discrete class label, e.g., spam vs. not spam.
• Common algorithms: linear classifiers, support vector machines, decision trees, random forests.
• Regression:
• The output is a continuous value, such as a price or a probability.
• Linear regression is the most common regression algorithm. (Despite its name, logistic regression is a classification algorithm; see below.)
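As a minimal sketch of both task types, the snippet below (assuming scikit-learn and NumPy, which the notes do not prescribe) trains a decision-tree classifier on discrete labels and a linear regressor on a continuous target:

from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import numpy as np

# Classification: discrete class labels (iris species).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Regression: a continuous target with a known linear relationship.
rng = np.random.default_rng(0)
X_r = rng.uniform(0, 10, size=(100, 1))
y_r = 3.0 * X_r.ravel() + 2.0 + rng.normal(0, 1, 100)
reg = LinearRegression().fit(X_r, y_r)
print("learned slope, intercept:", reg.coef_[0], reg.intercept_)  # near 3.0 and 2.0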
Unsupervised Learning
• The algorithm is not given any labels; it discovers hidden patterns in the data and
groups similar items together without the need for human intervention.
• These algorithms don't make predictions; they only group the data.
• Clustering: the algorithm groups similar examples together, e.g., a business might
group customers based on similarities such as age, location, or spending habits.
• Association: the algorithm looks for relationships between variables in the data,
e.g., a business might want to know which items are often bought together.
• Dimensionality Reduction: the algorithm reduces the number of variables in the data
while preserving as much of the information as possible. This technique is normally
used in the data pre-processing stage, e.g., to remove noise from images.
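A minimal clustering sketch, assuming scikit-learn's KMeans (an illustrative choice; the notes do not name an algorithm), grouping synthetic "customers" by age and spending:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic customer groups: (age, monthly spend).
younger = rng.normal([25, 200], [3, 30], size=(50, 2))
older = rng.normal([55, 800], [5, 60], size=(50, 2))
X = np.vstack([younger, older])

# Group the unlabeled points into 2 clusters; no labels are ever provided.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster centers (age, spend):")
print(kmeans.cluster_centers_)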
Semi-Supervised Learning
• The training data set contains both labeled and unlabeled data.
• Semi-supervised learning is a branch of machine learning that
combines supervised and unsupervised learning by using both labeled and unlabeled
data to train models for classification and regression tasks.
• A common approach (self-training) is to train an initial model on the few labeled
samples and then iteratively apply it to the much larger pool of unlabeled data, as
sketched below.
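A short self-training sketch, assuming scikit-learn's SelfTrainingClassifier (one of several possible semi-supervised approaches):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Pretend 80% of the labels are unknown: scikit-learn marks unlabeled
# samples with -1.
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.8] = -1

# Train on the few labeled samples, then iteratively pseudo-label the rest.
base = SVC(probability=True, gamma="auto")
model = SelfTrainingClassifier(base).fit(X, y_partial)
print("accuracy against all true labels:", model.score(X, y))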
Reinforcement Learning
• Reinforcement learning is a type of machine learning algorithm that learns to solve a
multi-step decision problem by trial and error.
• The machine is trained on realistic scenarios to make a sequence of decisions. It
receives rewards or penalties for the actions it performs, and its goal is to maximize
the total reward.
• Example applications: natural language processing, image processing.
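A toy sketch of trial-and-error learning: tabular Q-learning on a five-cell corridor world of our own invention (the environment, rewards, and hyperparameters are illustrative assumptions, not from the notes):

import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))   # learned value of each (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.3
rng = np.random.default_rng(0)

for episode in range(200):
    s = 0
    while s != n_states - 1:          # the episode ends at the goal cell
        # Epsilon-greedy: mostly exploit, sometimes explore (trial and error).
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward only at the goal
        # Q-learning update: nudge Q(s, a) toward reward + discounted future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("greedy action per state (1 = right):", Q.argmax(axis=1))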

Linear Regression
• Linear regression analysis is used to predict the value of a variable based
on the value of another variable.
• The variable you want to predict is called the dependent variable.
• The variable you are using to predict the other variable's value is called the
independent variable.
• Linear regression fits a straight line or surface that minimizes the
discrepancies between predicted and actual output values.
• There are simple linear regression calculators that use a “least squares”
method to discover the best-fit line for a set of paired data.
• Estimate the value of Y (the dependent variable) from X (the independent
variable).
Least Squares Method

For paired data (xᵢ, yᵢ), the least squares method finds the best-fit line y = a + bx by minimizing the sum of squared residuals Σ(yᵢ - a - bxᵢ)². The slope and intercept are:

b = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)²
a = ȳ - b·x̄

where x̄ and ȳ are the means of the x and y values.

Example
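A small worked example of the least squares formulas above, with synthetic numbers of our own choosing:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

x_bar, y_bar = x.mean(), y.mean()
b = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()  # slope
a = y_bar - b * x_bar                                             # intercept
print(f"best-fit line: y = {a:.2f} + {b:.2f} x")                  # y = 0.15 + 1.95 x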

Multiple Linear Regression

Multiple linear regression extends simple linear regression to two or more independent variables. The model is:

y = b₀ + b₁x₁ + b₂x₂ + … + bₖxₖ

where y is the dependent variable, x₁ … xₖ are the independent variables, b₀ is the intercept, and b₁ … bₖ are the coefficients estimated by least squares.

Example
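A minimal multiple regression sketch, fitted with scikit-learn on synthetic data (our own illustrative numbers):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))       # two predictors x1, x2
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 0.5, 200)

model = LinearRegression().fit(X, y)
print("intercept b0:", model.intercept_)    # near 1.0
print("coefficients b1, b2:", model.coef_)  # near [2.0, -3.0]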
Logistic Regression
• Supervised learning algorithms can be grouped under two main categories:
• Regression: Predicting continuous target variables. For example,
predicting the price of a house is a regression task.
• Classification: Predicting discrete target variables. For example, predicting
whether an email is spam is a classification task.
• Logistic regression is a supervised learning algorithm that is mostly used
to solve binary classification tasks, although it contains the word
"regression".
• "Logistic" refers to the logistic (sigmoid) function σ(z) = 1 / (1 + e⁻ᶻ), which
maps any real value into (0, 1) and performs the actual classification in the
algorithm.
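A minimal logistic regression sketch, assuming scikit-learn and a synthetic one-feature data set of our own choosing:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# One feature; class 1 becomes more likely as x grows.
x = rng.uniform(-4, 4, size=(200, 1))
p = 1.0 / (1.0 + np.exp(-2.0 * x.ravel()))  # true sigmoid probabilities
y = (rng.random(200) < p).astype(int)       # sampled binary labels

clf = LogisticRegression().fit(x, y)
print("P(class 1 | x = 1.5):", clf.predict_proba([[1.5]])[0, 1])
print("predicted class at x = 1.5:", clf.predict([[1.5]])[0])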

Refer to the problem on the last page.


Bayesian Linear Regression
Bayesian Regression
• Bayesian regression is a type of linear regression that uses Bayesian statistics
to estimate the unknown parameters of a model.
• It uses Bayes’ theorem to estimate the posterior probability of a set of parameters
given observed data.
• The goal of Bayesian regression is to find the best estimate of the parameters
of a linear model that describes the relationship between the independent and
the dependent variables.
Bayesian Linear Regression
• Bayesian linear regression considers various plausible explanations for how the data
were generated.
• It makes predictions using all possible regression weights, weighted by their posterior
probability.
• In the Bayesian viewpoint, we formulate linear regression using probability distributions
rather than point estimates.
• The response, y, is not estimated as a single value but is assumed to be drawn from a
probability distribution. The model for Bayesian linear regression, with the response
sampled from a normal distribution, is:

y ~ N(βᵀX, σ²)

• The output y is generated from a normal (Gaussian) distribution characterized by a
mean and a variance.
• The mean for linear regression is the transpose of the weight matrix multiplied by the
predictor matrix: μ = βᵀX.
• The variance is the square of the standard deviation, σ².
• The aim of Bayesian Linear Regression is not to find the single “best” value of the
model parameters, but rather to determine the posterior distribution for the model
parameters.
• The posterior probability of the model parameters is conditional upon the training inputs
and outputs:

P(β | y, X) = P(y | β, X) · P(β | X) / P(y | X)

that is, posterior = (likelihood × prior) / normalization.
The two primary benefits of Bayesian linear regression are:


1. Priors: If we have domain knowledge, or a guess for what the model parameters
should be, we can include them in our model. If we don’t have any estimates ahead
of time, we can use non-informative priors for the parameters such as a normal
distribution.
2. Posterior: The result of performing Bayesian Linear Regression is a distribution of
possible model parameters based on the data and the prior. This allows us to
quantify our uncertainty about the model: if we have fewer data points, the posterior
distribution will be more spread out.

As the amount of data points increases, the likelihood washes out the prior, and in the
case of infinite data, the outputs for the parameters converge to the values obtained from
Ordinary Least Squares.
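A sketch of the ideas above under simplifying assumptions of our own: a conjugate Gaussian prior on the weights and a known noise variance, for which the posterior has a closed form:

import numpy as np

rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])  # design matrix [1, x]
true_w = np.array([1.0, 2.0])
sigma2 = 0.25                                            # known noise variance
y = X @ true_w + rng.normal(0, np.sqrt(sigma2), n)

tau2 = 10.0                                              # prior: w ~ N(0, tau2 * I)
# Closed-form posterior: N(mu, S) with
#   S  = (X^T X / sigma2 + I / tau2)^(-1)
#   mu = S X^T y / sigma2
S = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
mu = S @ X.T @ y / sigma2

print("posterior mean of weights:", mu)                  # near [1.0, 2.0]
print("posterior std of weights:", np.sqrt(np.diag(S)))  # shrinks as n grows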

Discriminative model:
Linear Discriminant Analysis (LDA) is a dimensionality reduction and classification technique
commonly used in machine learning and pattern recognition.
In the context of classification, it aims to find a linear combination of features that best separates
different classes or categories of data.
It seeks to reduce the dimensionality of the feature space while preserving as much of the class-
separability information as possible.

Steps:
1. Data Preparation: Let’s say we have 150 iris samples with four features each,
and the samples are evenly distributed among the three species.
2. Compute Class Statistics: Calculate the mean vector and covariance matrix for
each class. This gives us three mean vectors and three covariance matrices
(one for each class).
3. Compute Between-Class and Within-Class Scatter Matrices: Calculate the
between-class scatter matrix by computing the differences between the mean
vectors of each class and the overall mean, and then summing these outer
products. Calculate the within-class scatter matrix by summing the covariance
matrices of each class, weighted by the number of samples in each class.
4. Compute Eigenvectors and Eigenvalues: Solve the generalized eigenvalue
problem using the between-class scatter matrix and the within-class scatter matrix.
This gives us a set of eigenvectors and their corresponding eigenvalues.
5. Select Discriminant Directions: Sort the eigenvectors by their eigenvalues in
descending order. Let’s say we want to reduce the dimensionality to 2, so we
select the top two eigenvectors.
6. Transform Data: Project the original iris data onto the two selected eigenvectors.
This gives us a new two-dimensional representation of the data.
7. Classification: In the reduced-dimensional space, we can use a classifier (e.g., k-
nearest neighbors) to classify the iris flowers into one of the three species based
on their positions in the reduced space.
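A short sketch of this pipeline, assuming scikit-learn's LDA implementation on the iris data: reduce the four features to two discriminant directions, then classify with k-nearest neighbors (step 7):

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 2-6: fit LDA and project the 4 features onto 2 discriminant directions.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X_train, y_train)
X_train_2d = lda.transform(X_train)
X_test_2d = lda.transform(X_test)

# Step 7: classify in the reduced space with k-nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train_2d, y_train)
print("accuracy in the reduced space:", knn.score(X_test_2d, y_test))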

Probabilistic discriminative model

Naïve Bayes classifiers


• The Naïve Bayes classifier is a supervised machine learning algorithm that is
used for classification tasks, such as text classification.
• Naive Bayes is a classification technique based on Bayes’ theorem, with the
assumption that all the features that predict the target value are independent of
each other.
• It calculates each class’s probability and then picks the one with the highest
probability.
• It is called "naive" because of this assumption of independence among the predictors.
• It scales to large data sets and is mostly used for text data.
• Examples: email classification, Twitter sentiment analysis, etc.

Bayes’ Theorem:
• Bayes’ theorem is an indispensable law of probability, allowing you to
quantify unknown probabilities from known ones.
• Bayes’ Theorem allows you to update the predicted probabilities of an event by
incorporating new information.
• Bayes’ Theorem was named after 18th-century mathematician Thomas Bayes.
• It often is employed in finance in calculating or updating risk evaluation.
• The theorem has become a useful element in the implementation of machine
learning.
• Bayes’ theorem is stated mathematically as the following equation:

P(A | B) = P(B | A) · P(A) / P(B)

where P(A | B) is the posterior probability of A given B, P(B | A) is the likelihood,
P(A) is the prior probability, and P(B) is the evidence.
How the Naive Bayes algorithm works:


• We have a training data set of weather conditions and the corresponding target
variable ‘Play’ (indicating whether the game is played).
• We need to classify whether players will play or not based on the weather
conditions.
Steps:
1. Convert the data set into a frequency table.
2. Create a Likelihood table by finding the probabilities like Overcast probability = 0.29 and
probability of playing is 0.64.
3. Now, use the Naive Bayesian equation to calculate the posterior probability for each class.
4. The class with the highest posterior probability is the outcome of the prediction.
Problem Statement:
Players will play if the weather is sunny. Is this statement correct?
P(Yes | Sunny) = P(Sunny | Yes) · P(Yes) / P(Sunny)
Here we have:
P(Sunny | Yes) = 3/9 = 0.33
P(Sunny) = 5/14 = 0.36
P(Yes) = 9/14 = 0.64
Now, P(Yes | Sunny) = 0.33 · 0.64 / 0.36 = 0.60 (in exact fractions, (3/9 · 9/14) / (5/14) = 3/5 = 0.60), the higher probability.

P(No | Sunny) = P(Sunny | No) · P(No) / P(Sunny)

Here we have:
P(Sunny | No) = 2/5 = 0.40
P(Sunny) = 5/14 = 0.36
P(No) = 5/14 = 0.36
Now, P(No | Sunny) = 0.40 · 0.36 / 0.36 = 0.40, the lower probability.
Since P(Yes | Sunny) > P(No | Sunny), the statement "Players will play if the weather is sunny" is correct.
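The same worked example as code; the frequency counts (9 "yes" days, 5 "no" days, 3 sunny-yes, 2 sunny-no out of 14) are those assumed by the calculation above:

n_total = 14
n_yes, n_no = 9, 5
n_sunny_yes, n_sunny_no = 3, 2

p_yes = n_yes / n_total                          # P(Yes) = 0.64
p_no = n_no / n_total                            # P(No) = 0.36
p_sunny = (n_sunny_yes + n_sunny_no) / n_total   # P(Sunny) = 0.36

# Bayes' theorem for each class, then pick the larger posterior.
p_yes_given_sunny = (n_sunny_yes / n_yes) * p_yes / p_sunny
p_no_given_sunny = (n_sunny_no / n_no) * p_no / p_sunny

print(f"P(Yes | Sunny) = {p_yes_given_sunny:.2f}")  # 0.60
print(f"P(No | Sunny)  = {p_no_given_sunny:.2f}")   # 0.40
print("prediction:", "Play" if p_yes_given_sunny > p_no_given_sunny else "Don't play")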

Support Vector Machine (SVM) Algorithm


➢ SVM is a supervised machine learning algorithm used to find the hyperplane that best
separates two classes.
➢ The support vector machine is based on statistical learning approaches.
➢ An infinite number of hyperplanes can separate the two classes perfectly; SVM
chooses the one with the maximum margin, i.e., the maximum distance between the
hyperplane and the two classes.
• Support Vectors: These are the points that are closest to the hyperplane. A
separating line will be defined with the help of these data points.
• Margin: the distance between the hyperplane and the observations closest to
the hyperplane (the support vectors).

Step 1: SVM algorithm predicts the classes. One of the classes is identified as 1 while the other
is identified as -1.
Step 2: Optimization problems aim at maximizing or minimizing an objective while
tweaking the unknowns; in the case of the SVM classifier, a loss function known as the
hinge loss, L(y, f(x)) = max(0, 1 - y·f(x)), is minimized to find the maximum margin.
Step 3: This loss function can also be viewed as a cost function whose cost is 0 when
no class is incorrectly predicted. To balance margin maximization against
misclassification, a regularization parameter is added.

Step 4: As with most optimization problems, the weights are optimized by computing
gradients using partial derivatives.
Step 5: When a point is classified correctly, the gradients are updated using only the
regularization parameter; when misclassification happens, the loss function also
contributes to the update.

Hard Margin
Hard Margin refers to that kind of decision boundary that makes sure that all the data points
are classified correctly. While this leads to the SVM classifier not causing any error, it can also
cause the margins to shrink thus making the whole purpose of running an SVM algorithm futile.
Soft Margin
To allow some misclassification, a regularization parameter is added to the loss function
in the SVM classification algorithm. This combination of the loss function with the
regularization parameter allows the user to maximize the margins at the cost of some
misclassification, as sketched below.
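A from-scratch sketch of the steps above: a linear soft-margin SVM trained by sub-gradient descent on the hinge loss with L2 regularization (the data and hyperparameters are illustrative choices of ours):

import numpy as np

rng = np.random.default_rng(0)
# Two separable blobs; classes are labeled +1 and -1 as in Step 1.
X = np.vstack([rng.normal([2, 2], 0.8, (50, 2)),
               rng.normal([-2, -2], 0.8, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

w, b = np.zeros(2), 0.0
lam, lr = 0.01, 0.1        # regularization strength and learning rate

for epoch in range(100):
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) >= 1:
            # Correctly classified outside the margin: regularizer only (Step 5).
            w -= lr * (2 * lam * w)
        else:
            # Inside the margin or misclassified: hinge-loss term contributes too.
            w -= lr * (2 * lam * w - yi * xi)
            b += lr * yi

pred = np.sign(X @ w + b)
print("training accuracy:", (pred == y).mean())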

Decision Tree
Decision Tree is a supervised (labeled data) machine learning algorithm that can be used
for both classification and regression problems.
A decision tree is a tree-like structure that is used as a model for classifying data.

Step by Step Procedure

• Step 1: Determine the root of the tree.
• Step 2: Calculate the entropy of the classes: Entropy(S) = -Σᵢ pᵢ log₂ pᵢ, where pᵢ is
the proportion of class i in the set S.
• Step 3: Calculate the entropy after the split for each attribute.
• Step 4: Calculate the information gain for each split:
Gain(S, A) = Entropy(S) - Σᵥ (|Sᵥ| / |S|) · Entropy(Sᵥ), where Sᵥ is the subset of S
taking value v on attribute A. The attribute with the highest gain is chosen for the split.
• Step 5: Perform the split.
• Step 6: Perform further splits.
• Step 7: Complete the decision tree.
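A small sketch of Steps 2-4, computing the entropy of the classes and the information gain of one candidate attribute; the counts reuse the 14-day weather/play data assumed in the Naive Bayes example:

import numpy as np

def entropy(counts):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the non-empty classes."""
    p = np.array(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

# Whole data set: 9 "play" days vs 5 "don't play" days (Step 2).
H_S = entropy([9, 5])

# Split on Outlook: sunny -> (3 yes, 2 no), overcast -> (4, 0), rain -> (2, 3).
subsets = [(5, [3, 2]), (4, [4, 0]), (5, [2, 3])]
H_after = sum(n / 14 * entropy(c) for n, c in subsets)   # Step 3

print(f"Entropy(S)       = {H_S:.3f}")            # about 0.940
print(f"Gain(S, Outlook) = {H_S - H_after:.3f}")  # about 0.247 (Step 4)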

Refer to the problem.
Random forest
Random forest, a popular machine learning algorithm developed by Leo Breiman and
Adele Cutler, merges the outputs of numerous decision trees to produce a single
outcome.
One of the most important features of the random forest algorithm is that it can handle
data sets containing continuous variables (as in regression) as well as categorical
variables (as in classification).
The random forest classifier works on the bagging principle.
Bagging, also known as Bootstrap Aggregation, serves as the ensemble technique in the
Random Forest algorithm. Here are the steps involved in Bagging:
1. Selection of Subset: Bagging starts by choosing a random sample, or subset, from
the entire dataset.
2. Bootstrap Sampling: Each model is then created from these samples, called
Bootstrap Samples, which are taken from the original data with replacement. This
process is known as row sampling.
3. Bootstrapping: The step of row sampling with replacement is referred to as
bootstrapping.
4. Independent Model Training: Each model is trained independently on its
corresponding Bootstrap Sample. This training process generates results for each
model.
5. Majority Voting: The final output is determined by combining the results of all
models through majority voting. The most commonly predicted outcome among
the models is selected.
6. Aggregation: This step, which involves combining all the results and generating
the final output based on majority voting, is known as aggregation.
Steps Involved in Random Forest Algorithm
• Step 1: In the Random forest model, a subset of data points and a subset of
features is selected for constructing each decision tree. Simply put, n random
records and m features are taken from the data set having k number of records.
• Step 2: Individual decision trees are constructed for each sample.
• Step 3: Each decision tree will generate an output.
• Step 4: Final output is considered based on Majority Voting or Averaging for
Classification and regression, respectively.
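A minimal random forest sketch, assuming scikit-learn (the data set and hyperparameters are illustrative choices, not from the notes):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each grown on a bootstrap sample (bagging) with a random
# subset of features; the final class is chosen by majority vote.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))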
