AI & ML Unit 3 Notes
Classification:
• Machine learning implementations are classified into four major categories,
depending on the nature of the learning “signal” or “response”.
✓ Supervised Learning
✓ Unsupervised Learning
✓ Reinforcement Learning
✓ Semi-supervised Learning
Supervised Learning:
• Supervised Learning is the machine learning task of learning a function that
maps an input to an output based on example input-output pairs.
• The given data is labeled.
• Both Classification and Regression problems are supervised learning problems.
• For example, the inputs could be camera images, each one accompanied by an
output saying “bus” or “pedestrian,” etc.
• An output like this is called a label.
• Classification: Classification algorithms are used to solve classification
problems, in which the output variable is categorical, such as “Yes” or “No”. Some
popular classification algorithms are the Random Forest Algorithm, the Decision
Tree Algorithm and the Logistic Regression Algorithm.
• Regression: Regression algorithms are used to solve regression problems, in
which the output variable is a continuous value related to the input variables.
Some popular regression algorithms are the Simple Linear Regression Algorithm
and the Decision Tree Algorithm.
Advantages of supervised learning:
• Works with labeled datasets.
• Helpful in predicting the output.
Disadvantages of supervised learning:
• Not suited to very complex tasks.
• May predict the wrong output when the test data differs from the training data.
Applications of supervised learning:
• Image segmentation
• Medical Diagnosis
• Fraud Detection
Unsupervised Learning:
• Unsupervised learning is a type of machine learning algorithm used to draw
inferences from datasets consisting of input data without labeled responses.
• The machine is trained using an unlabeled dataset.
• Both Clustering and Association problems are unsupervised learning problems.
• For example, when shown millions of images taken from the Internet, a
computer vision system can identify a large cluster of similar images which an
English speaker would call “cats.”
• Clustering: It is an unsupervised method of grouping objects into clusters such
that objects with the most similarities remain in one group and have few or no
similarities with the objects of another group.
• Association: Association rule learning is an unsupervised learning method which
finds the relationships between variables in a large database.
Advantages of unsupervised learning:
• Used for complicated tasks.
Disadvantages of unsupervised learning:
• Output may be less accurate, since there are no labels to learn from
• More difficult to work with than supervised learning
Applications of unsupervised learning:
• Network analysis
Reinforcement Learning:
• Reinforcement learning is the problem of getting an agent to act in the world
so as to maximize its rewards.
• In reinforcement learning the agent learns from a series of reinforcements:
rewards and punishments.
• For example, consider teaching a dog a new trick: we cannot tell it
what to do or what not to do, but we can reward or punish it when it does the
right or wrong thing.
Semi-supervised Learning:
• Semi-Supervised learning is a type of Machine Learning algorithm that
represents the intermediate ground between Supervised and Unsupervised
learning algorithms.
• It uses a combination of labeled and unlabeled datasets during the training
period.
Linear Regression Models:
• Regression is used in machine learning problems where the value to be predicted
is continuous.
• Linear Regression is a linear approach to modeling the relationship between a
dependent variable and one or more independent variables.
• It is one of the easiest and most popular machine learning algorithms.
• It makes predictions for continuous/real or numeric variables such as sales,
salary, age.
• The linear regression algorithm shows a linear relationship between a dependent
(y) and one or more independent (x) variables, hence the name linear regression.
• The linear regression model provides a sloped straight line representing the
relationship between the variables.
• Let X be the independent variable and Y be the dependent variable.
• The linear relationship between these two variables is as follows: Y = mX + c
Where m: slope, c: y-intercept
Types of linear regression:
Simple Linear Regression: If a single independent variable is used to predict
the value of a numerical dependent variable, then such a Linear Regression
algorithm is called Simple Linear Regression.
Equation: Y = b0 + b1x
Multiple Linear Regression: If more than one independent variable is used to
predict the value of a numerical dependent variable, then such a Linear
Regression algorithm is called Multiple Linear Regression.
Equation: Y = b0 + b1x1 + b2x2 + … + bnxn
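As a small illustration of the two equations above, the following Python sketch evaluates Y = b0 + b1x1 + ... + bnxn for one sample. The function name predict and all coefficient values are made up for this example; with a single coefficient it reduces to simple linear regression.

```python
def predict(b0, coefficients, features):
    """Return b0 + b1*x1 + ... + bn*xn for one sample."""
    if len(coefficients) != len(features):
        raise ValueError("need one coefficient per feature")
    return b0 + sum(b * x for b, x in zip(coefficients, features))


# Simple linear regression: one independent variable (illustrative values).
simple = predict(1.0, [2.0], [3.0])                          # 1 + 2*3

# Multiple linear regression: three independent variables (illustrative values).
multiple = predict(1.0, [2.0, 0.5, -1.0], [3.0, 4.0, 2.0])   # 1 + 6 + 2 - 2
```

Both calls use the same function; only the number of independent variables changes.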
Least Squares Regression:
• Least squares is a commonly used method in regression analysis.
• The Least squares method is a form of mathematical regression analysis used to
determine the line of best fit for a set of data.
• The Least Squares Regression line is calculated as Y’ = bX + a
Where Y’ represents the predicted value,
X represents the known value,
b represents the slope and a represents the y-intercept of the line
• Steps to implement Least Squares Regression in Python are listed below:
Step 1: Import the required Python libraries such as numpy, pandas
Step 2: Read and load the dataset
Step 3: Create a scatter plot to check the relationship between the two variables
Step 4: Assign X and Y as the independent and dependent variables
Step 5: Compute the means of X and Y, which are needed to determine the slope
Step 6: Calculate the slope (m) and y-intercept (c) using the formulas
m = Σ(x - mean(X))(y - mean(Y)) / Σ(x - mean(X))^2
c = mean(Y) - m × mean(X)
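Steps 4 to 6 can be sketched in Python as follows. This is a minimal sketch, not a full pipeline: it uses plain Python lists instead of numpy/pandas so that it is self-contained, it skips loading and plotting, and the function name least_squares_fit and the sample data are assumptions made for the example. It uses the standard least squares formulas: slope = Σ(x - mean(X))(y - mean(Y)) / Σ(x - mean(X))^2 and intercept = mean(Y) - slope × mean(X).

```python
def least_squares_fit(xs, ys):
    """Return (m, c) for the line y = m*x + c that minimizes squared error."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    # Step 5: means of X and Y.
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Step 6: slope from the sums of deviations from the means.
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = numerator / denominator
    # The fitted line passes through (x_mean, y_mean), which gives the intercept.
    c = y_mean - m * x_mean
    return m, c


# Step 4: illustrative data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
m, c = least_squares_fit(xs, ys)
```

Because the sample points lie exactly on a line, the fit recovers a slope of 2 and an intercept of 1.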
Bayesian Linear Regression:
• Bayesian Regression is used when the data is insufficient in the dataset or the
data is poorly distributed.
• The output of a Bayesian Regression model is obtained from a probability
distribution.
• The aim of Bayesian Linear Regression is to find the 'posterior' distribution for
the model parameters.
• The expression for the posterior is:
Posterior = (Likelihood × Prior) / Normalization
i.e., P(H | E) = P(E | H) × P(H) / P(E)
Where,
Posterior: It is the probability that an event H occurs given that another
event E has already occurred, i.e., P(H | E).
Prior: It is the probability that event H occurs before the event E is taken into
account, i.e., P(H).
Likelihood: It is the probability of observing E given that H holds, i.e.,
P(E | H).
Normalization: It is the overall probability of E, i.e., P(E).
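To make the idea of a posterior over model parameters concrete, here is a minimal Python sketch for the simplest possible case: a single weight w in y = w × x with no intercept. It assumes a Gaussian prior w ~ N(0, 1/alpha) and Gaussian noise with known precision beta; the names alpha, beta and posterior_for_slope are assumptions introduced for this example and do not come from the notes. Under these assumptions the posterior is also Gaussian, with precision alpha + beta × Σx^2 and mean beta × Σxy / (alpha + beta × Σx^2).

```python
def posterior_for_slope(xs, ys, alpha=1.0, beta=1.0):
    """Posterior mean and variance of w in y = w*x + noise.

    Assumes prior w ~ N(0, 1/alpha) and Gaussian noise with precision beta.
    """
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    precision = alpha + beta * sum_xx      # prior precision + data precision
    mean = beta * sum_xy / precision       # posterior mean of w
    return mean, 1.0 / precision           # variance is the inverse precision


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                       # illustrative data, exactly y = 2x
mean, variance = posterior_for_slope(xs, ys)

# Ordinary least squares slope (through the origin) for comparison.
least_squares_slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

With only three data points, the prior pulls the posterior mean slightly below the least squares slope; with more data the two get closer and the posterior variance shrinks.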
• The Bayesian Ridge Regression formula is as follows:
Gradient Descent:
• Gradient descent is an iterative optimization algorithm that finds the best-fit
line for a given training dataset by repeatedly adjusting m and c to reduce the cost.
• If the MSE is plotted against m and c, the surface has a bowl shape, so it has a
single lowest point.
• Cost Function: The cost is the error in the predicted value. It is calculated using
the Mean Squared Error (MSE) function.
• Learning Rate: The learning rate L is a scalar factor that controls the size of
each update. The coefficients are updated in the direction that minimizes the
error.
• The steps are listed below:
Step 1: Initially, let m = 0, c = 0.
Step 2: Calculate the partial derivative of the loss function with respect to m to
get the derivative Dm.
Step 3: Similarly, find the partial derivative with respect to c, Dc.
Step 4: Update the current values of m and c using the following equations,
where L is the learning rate.
m = m - L × Dm
c = c - L × Dc
Step 5: Repeat this process until the cost function is very small.
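The five steps can be sketched in Python as follows. This is a minimal sketch under some assumptions: the cost is the mean squared error, so the derivatives are Dm = (-2/n) Σ x(y - y') and Dc = (-2/n) Σ (y - y'); the loop runs for a fixed number of iterations instead of checking the cost; and the function name, learning rate, iteration count and data are illustrative choices.

```python
def gradient_descent(xs, ys, learning_rate=0.05, iterations=5000):
    """Fit y = m*x + c by gradient descent on the mean squared error."""
    n = len(xs)
    m, c = 0.0, 0.0                                    # Step 1
    for _ in range(iterations):                        # Step 5 (fixed count)
        predictions = [m * x + c for x in xs]
        # Step 2: partial derivative of the MSE with respect to m.
        d_m = (-2.0 / n) * sum(x * (y - p)
                               for x, y, p in zip(xs, ys, predictions))
        # Step 3: partial derivative of the MSE with respect to c.
        d_c = (-2.0 / n) * sum(y - p for y, p in zip(ys, predictions))
        # Step 4: move against the gradient, scaled by the learning rate.
        m -= learning_rate * d_m
        c -= learning_rate * d_c
    return m, c


# Illustrative data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
m, c = gradient_descent(xs, ys)
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small needs many more iterations; the value used here was chosen to suit this small dataset.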
Discriminant Function:
• A discriminant function is a function of a set of variables that is evaluated for
samples of events or objects and used as an aid in discriminating between or
classifying them.
• A discriminant function (DF) maps independent (discriminating) variables into
a latent variable D.
• DF is usually postulated to be a linear function: D = a0 + a1x1 + a2x2 + ... + aNxN
• The goal of discriminant analysis is to find the values of the coefficients that
best separate the classes.
• Whenever there is a requirement to separate two or more classes having multiple
features efficiently, the Linear Discriminant Analysis model is considered the
most common technique to solve such classification problems.
• For example, suppose there are two classes with multiple features that need to be
separated efficiently. If they are classified using only a single feature, the
classes may overlap.
• To overcome the overlapping issue in the classification process, the number of
features must be increased.
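• A discriminant function that is a linear combination of the components of x can
be written as
g(x) = w^T x + w0
where w is the weight vector and w0 is the bias (threshold weight).
• Written out component by component, the linear discriminant function g(x) is
g(x) = w0 + w1x1 + w2x2 + ... + wdxd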
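A linear discriminant function of the form g(x) = w^T x + w0 can be evaluated and used for two-class classification as in the Python sketch below. The weights, the bias and the decision rule (class 1 when g(x) > 0, class 2 otherwise) are illustrative assumptions; in practice the coefficients are found by discriminant analysis rather than chosen by hand.

```python
def discriminant(weights, bias, x):
    """Evaluate g(x) = w^T x + w0 for one sample."""
    if len(weights) != len(x):
        raise ValueError("need one weight per feature")
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(weights, bias, x):
    """Assign class 1 when g(x) > 0 and class 2 otherwise."""
    return 1 if discriminant(weights, bias, x) > 0 else 2


weights = [1.0, 1.0]      # hypothetical weight vector w
bias = -5.0               # hypothetical bias w0

g_a = discriminant(weights, bias, [4.0, 3.0])   # 4 + 3 - 5
g_b = discriminant(weights, bias, [1.0, 2.0])   # 1 + 2 - 5
class_a = classify(weights, bias, [4.0, 3.0])
class_b = classify(weights, bias, [1.0, 2.0])
```

The line g(x) = 0 is the decision boundary; the sign of g(x) tells which side of it a sample falls on.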
Logistic Regression:
• Logistic regression is a machine learning algorithm that falls under the
classification algorithms of the supervised learning technique.
• Logistic regression is used to describe data and the relationship between one
dependent variable and one or more independent variables.
• The independent variables can be nominal, ordinal, or of interval type.
• Logistic regression predicts the output of a categorical dependent variable.
• The outcome can be either Yes or No, 0 or 1, True or False, etc., but instead of
an exact 0 or 1 the model gives probabilistic values which lie between 0 and 1.
• Logistic regression is used for solving the classification problems.
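• In logistic regression, y can only lie between 0 and 1. Dividing y by (1 - y)
gives the odds y / (1 - y), which range from 0 to infinity, and taking the
logarithm gives the logistic regression equation:
log[y / (1 - y)] = b0 + b1x1 + b2x2 + … + bnxn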
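The way logistic regression turns a linear combination of the inputs into a probability between 0 and 1 can be sketched in Python with the sigmoid function 1 / (1 + e^(-z)), where z = b0 + b1x1 + ... + bnxn. The function names, the coefficients and the 0.5 decision threshold below are assumptions made for this example.

```python
import math


def sigmoid(z):
    """Map a real number z to a value between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(b0, coefficients, features):
    """Probability of the positive class for one sample."""
    z = b0 + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)


def predict_class(b0, coefficients, features, threshold=0.5):
    """Return 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_probability(b0, coefficients, features) >= threshold else 0


# Illustrative coefficients: z = -4 + 1*1 + 2*1.5 = 0, so the probability is 0.5.
p = predict_probability(-4.0, [1.0, 2.0], [1.0, 1.5])
label = predict_class(-4.0, [1.0, 2.0], [1.0, 1.5])
```

Large positive values of z give probabilities close to 1 and large negative values give probabilities close to 0.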
Support Vector Machine (SVM):
• Let’s consider two independent variables x1, x2 and one dependent variable,
which is either a blue circle or a red box.
• The SVM algorithm tries to maximize the margin between the data points and the
hyperplane. The loss function that helps to maximize the margin is called hinge
loss.
Hinge Loss:
• The cost is 0 if the predicted value and the actual value have the same sign and
the point lies outside the margin, i.e., y × f(x) ≥ 1. If not, the loss value is
1 - y × f(x).
• The objective of the regularization parameter is to balance the margin
maximization and loss.
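Hinge loss and its trade-off with the margin can be sketched in Python as follows. The sketch assumes labels of +1 and -1, a linear score f(x) = w · x + b, and the commonly used soft-margin objective 0.5 × ||w||^2 + C × (sum of hinge losses); the parameter name c and all numeric values are illustrative assumptions.

```python
def hinge_loss(y_true, score):
    """Hinge loss for one sample; y_true is +1 or -1, score is w.x + b."""
    return max(0.0, 1.0 - y_true * score)


def svm_objective(weights, bias, samples, labels, c=1.0):
    """0.5 * ||w||^2 (margin term) plus c times the total hinge loss."""
    margin_term = 0.5 * sum(w * w for w in weights)
    total_loss = 0.0
    for x, y in zip(samples, labels):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        total_loss += hinge_loss(y, score)
    return margin_term + c * total_loss


correct_and_outside = hinge_loss(+1, 2.5)   # right side, outside the margin
correct_but_inside = hinge_loss(+1, 0.4)    # right side, inside the margin
wrong_side = hinge_loss(-1, 0.4)            # wrong side of the hyperplane

objective = svm_objective([1.0, 0.0], 0.0,
                          [[2.0, 0.0], [-0.5, 1.0]], [1, -1])
```

A larger c puts more weight on reducing the hinge loss, while a smaller c favors a wider margin.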
SVM Kernel:
• The SVM kernel is a function that takes low dimensional input space and
transforms it into high dimensional space.
• It converts a non-separable problem into a separable problem.
• It is mostly useful in non-linear separation problems.
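The effect of moving to a higher-dimensional space can be illustrated with a small Python sketch. It assumes a hand-picked feature map x -> (x, x^2) and the matching kernel k(a, b) = ab + a^2 b^2; both are chosen only for this example and are not a kernel named in these notes. The one-dimensional points below cannot be split by a single threshold, but after the mapping the second coordinate separates them.

```python
def feature_map(x):
    """Map a 1-D input x to the 2-D point (x, x^2)."""
    return (x, x * x)


def kernel(a, b):
    """Dot product of feature_map(a) and feature_map(b), computed directly."""
    return a * b + (a * a) * (b * b)


inner = [-1.0, 0.0, 1.0]     # class A
outer = [-3.0, 3.0]          # class B, lying on both sides of class A

# In the mapped space, the x^2 coordinate alone separates the two classes.
separable = (max(feature_map(x)[1] for x in inner)
             < min(feature_map(x)[1] for x in outer))

# The kernel gives the same value as mapping first and then taking a dot product.
explicit = sum(p * q for p, q in zip(feature_map(2.0), feature_map(3.0)))
implicit = kernel(2.0, 3.0)
```

Computing the kernel directly avoids building the higher-dimensional vectors, which matters when the mapped space is very large.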
Types of SVM:
• Simple SVM: Typically used for linear regression and classification problems.
• Kernel SVM: More flexibility for non-linear data.
Advantages:
✓ Effective on datasets with multiple features.
✓ Memory Efficient
✓ Different kernel functions can be specified for decision functions
Disadvantages:
✓ Works best on small sample sets; training becomes slow on large datasets
✓ Regularization is crucial.
Applications:
✓ Used to solve various real-world problems
✓ Helpful in text and hypertext categorization
✓ Classification of images
✓ Classification of satellite data
Decision Tree:
• Decision Tree is a supervised learning technique that can be used for both
classification and Regression problems.
• It is a tree-structured classifier, where internal nodes represent the features of a
dataset, branches represent the decision rules and each leaf node represents the
outcome.
• In a Decision tree, there are two types of nodes, the Decision Node and the Leaf
Node.
• Decision nodes are used to make a decision and have multiple branches, whereas
leaf nodes are the outputs of those decisions and have no further branches.
• The goal of using a Decision Tree is to create a training model that can be used
to predict the class or value of the target variable by learning simple decision
rules inferred from prior data.
• In order to build a tree, use the CART algorithm, which stands for Classification
and Regression Tree algorithm.
Information Gain:
• Information gain or IG is a statistical property that measures how well a
given attribute separates the training examples according to their target
classification.
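Information gain is commonly computed as the entropy of the parent node minus the weighted entropy of the child nodes produced by a split. The Python sketch below assumes that definition, with entropy measured in bits; the function names and the small "yes"/"no" dataset are made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]   # children look like the parent

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

The attribute whose split gives the highest information gain separates the training examples best.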
Gini Index:
• The Gini index is a cost function used to evaluate splits in the dataset.
• It can be calculated using the formula: Gini = 1 - Σ (pj)^2, where pj is the
proportion of samples that belong to class j.
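A minimal Python sketch of the Gini index, assuming the common definition Gini = 1 - Σ (pj)^2 over the class proportions pj, and a size-weighted average to score a whole split. The function names and the toy labels are illustrative.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of one group: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2
                     for count in Counter(labels).values())


def gini_of_split(groups):
    """Weighted Gini index of the groups produced by a candidate split."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * gini_index(group) for group in groups)


mixed = gini_index(["a", "a", "b", "b"])    # two classes in equal proportion
pure = gini_index(["a", "a", "a"])          # a single class

good_split = gini_of_split([["a", "a"], ["b", "b"]])
poor_split = gini_of_split([["a", "b"], ["a", "b"]])
```

A lower weighted Gini index means purer groups, so the split with the lowest value is preferred.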
Gain Ratio:
• It is defined as the information gain divided by SplitInfo:
Gain Ratio = Information Gain / SplitInfo
Reduction in Variance:
• Reduction in variance is an algorithm that uses the standard formula of
variance to choose the best split.
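For a numeric target, reduction in variance can be sketched in Python as the variance of the parent node minus the size-weighted variance of the child nodes. The sketch assumes the population variance formula; the function names and target values are made up for the example.

```python
def variance(values):
    """Population variance of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(parent, children):
    """Variance of the parent minus the weighted variance of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * variance(child) for child in children)
    return variance(parent) - weighted


parent = [1.0, 2.0, 9.0, 10.0]
good_split = [[1.0, 2.0], [9.0, 10.0]]    # groups similar target values
poor_split = [[1.0, 9.0], [2.0, 10.0]]    # mixes low and high target values

good = variance_reduction(parent, good_split)
poor = variance_reduction(parent, poor_split)
```

The split with the largest reduction in variance is chosen, since it leaves the most homogeneous child nodes.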
Chi – Square:
• The acronym CHAID stands for Chi-squared Automatic Interaction
Detector.
• It measures the statistical significance of the differences between the
sub-nodes and the parent node.
Advantages:
✓ Simple to understand
✓ Useful for solving decision-related problems.
Disadvantages:
✓ Can become complex when the tree has many layers
✓ Overfitting issue
Random Forest:
• Random Forest is a classifier that contains a number of decision trees on
various subsets of the given dataset and takes the average to improve the
predictive accuracy of that dataset.
• Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique.
• It can be used for both Classification and Regression problems in ML.
• It is based on the concept of ensemble learning.
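The ensemble idea can be sketched in Python with a deliberately simplified toy version. Each "tree" here is only a one-split decision stump on a single numeric feature, each stump is trained on a bootstrap sample (rows drawn with replacement), and the forest combines the stumps by majority vote, which plays the role of the average for classification. A real random forest grows full decision trees and also samples features at each split; all names, the dataset and the number of trees are assumptions made for this example.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(samples):
    """Fit a one-split tree on (x, label) pairs with one numeric feature."""
    xs = sorted({x for x, _ in samples})
    labels = [label for _, label in samples]
    best = (None, majority(labels), majority(labels))    # no split by default
    best_errors = sum(label != best[1] for label in labels)
    for low, high in zip(xs, xs[1:]):
        threshold = (low + high) / 2
        left = [label for x, label in samples if x < threshold]
        right = [label for x, label in samples if x >= threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(label != left_label for label in left)
                  + sum(label != right_label for label in right))
        if errors < best_errors:
            best, best_errors = (threshold, left_label, right_label), errors
    return best


def stump_predict(stump, x):
    threshold, left_label, right_label = stump
    if threshold is None:
        return left_label
    return left_label if x < threshold else right_label


def train_forest(samples, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(samples) rows with replacement.
        bootstrap = [rng.choice(samples) for _ in samples]
        forest.append(train_stump(bootstrap))
    return forest


def forest_predict(forest, x):
    """Majority vote over the predictions of all stumps."""
    return majority([stump_predict(stump, x) for stump in forest])


data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
forest = train_forest(data)
low_prediction = forest_predict(forest, 2.0)
high_prediction = forest_predict(forest, 8.0)
```

Because every stump sees a slightly different sample, individual stumps can differ, but the vote over many of them is more stable than any single one. For regression, the vote would be replaced by the average of the trees' numeric predictions.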