ML Unit-2 Final
Linear separability:
• Two sets of data points in a two-dimensional space are said to be linearly separable when they can be completely separated by a single straight line.
• For example, consider the case of selling a house based on area and price.
• We have a number of data points for this, each labelled with the class: house Sold/Not Sold.
What is linearly separable and linearly non-separable data?
If you can draw a line or hyperplane that separates the points into two classes, then the data is linearly separable. If not, the data is termed linearly non-separable.
Boolean AND and OR are linearly separable problems, while XOR is not linearly separable.
• The most classic example of a linearly inseparable pattern is the logical exclusive-OR (XOR) function.
• Shown in the figure is an illustration of the XOR function: the two classes, 0 (red dots) and 1 (blue dots), cannot be separated by a single line.
• The patterns can, however, be classified correctly using two lines, L1 and L2; the code sketch below also demonstrates this separability difference.
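As a minimal sketch (assuming NumPy and scikit-learn are available, which the notes do not require), a single-layer perceptron reaches perfect training accuracy on the linearly separable AND and OR truth tables but cannot do so on XOR:

import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "AND": np.array([0, 0, 0, 1]),
    "OR":  np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

for name, y in targets.items():
    clf = Perceptron(max_iter=1000, tol=None, random_state=0).fit(X, y)
    # AND and OR reach accuracy 1.0; XOR cannot, no matter how long we train,
    # because no single line separates its two classes.
    print(name, "training accuracy:", clf.score(X, y))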
Decision Regions:
• When performing pattern recognition, a set of patterns can be represented in a pattern space, in which
each pattern is represented as a point at a particular set of coordinates.
• The decision regions are separated by surfaces called the decision boundaries
• A decision region is an area or volume, marked by cuts in the pattern space.
• All of the patterns within a usable decision region belong to the same class.
All feature vectors in a decision region are assigned to the same category.
The decision regions are often simply connected, but they can be multiply connected as well.
These separating surfaces represent points where there are ties between two or more categories.
Linear Discriminants:
Linear discriminant analysis (LDA), or Fisher's linear discriminant, is a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events.
It is mainly used to express one dependent variable as a linear combination of other features or
measurements.
LDA projects data from a D-dimensional feature space down to a D'-dimensional space (D > D') in a way that maximizes the variability between the classes while reducing the variability within the classes.
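As a brief illustrative sketch (assuming scikit-learn is installed; the Iris dataset is used here only as an example and is not part of the notes), LDA can project D = 4 features down to D' = 2 discriminant axes:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)            # 150 samples, 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda.fit_transform(X, y)             # projection maximizing class separation
print(X.shape, "->", X_proj.shape)           # (150, 4) -> (150, 2)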
Regression:
Regression is a supervised learning technique that supports finding the correlation among variables. A regression
problem is when the output variable is a real or continuous value.
In regression, we plot a graph between the variables that best fits the given data points. Using this, the machine learning model can deliver predictions regarding the data.
Linear Regression
Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical method that
is used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such
as sales, salary, age, product price, etc.
The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Because the relationship is linear, the model describes how the value of the dependent variable changes with the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship between the variables.
Consider the below image:
Mathematically, we can represent a linear regression as:
y = a0 + a1x + ε
Here, y is the dependent (target) variable, x is the independent (predictor) variable, a0 is the intercept, a1 is the linear regression coefficient (slope), and ε is the random error. The values for the x and y variables are the training datasets used for the linear regression model representation.
A linear line showing the relationship between the dependent and independent variables is called a regression
line. A regression line can show two types of relationship:
o Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, such a relationship is termed a positive linear relationship.
o Negative Linear Relationship:
If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, such a relationship is called a negative linear relationship.
When working with linear regression, our main goal is to find the best-fit line, which means the error between the predicted values and the actual values should be minimized. The best-fit line will have the least error.
Different values for the weights or coefficients of the line (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best-fit line; to do this, we use a cost function.
Cost function:
o Different values for the weights or coefficients of the line (a0, a1) give different regression lines, and the cost function is used to estimate the values of the coefficients for the best-fit line.
o The cost function optimizes the regression coefficients or weights. It measures how well a linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping function, which maps the input variable
to the output variable. This mapping function is also known as Hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. It can be written as:
MSE = (1/N) ∑ᵢ (yᵢ − (a0 + a1xᵢ))²
Where N is the total number of observations, yᵢ is the actual value of the i-th observation, and a0 + a1xᵢ is the corresponding predicted value.
Some applications of linear regression are:
o Sales Forecasting
o Risk Analysis
o Housing Applications To Predict the prices and other factors
o Finance Applications To Predict Stock prices, investment evaluation, etc.
• With a straight-line model of the form y = a + bx, the variance of y is assumed to be constant, and a and b are regression coefficients specifying the Y-intercept and slope of the line, respectively.
• These coefficients can be solved for by the method of least squares, which estimates the best-fitting straight
line as the one that minimizes the error between the actual data and the estimate of the line.
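A minimal NumPy sketch of this least-squares fit (the (x, y) values below are made-up illustrative data, not from the notes), together with the MSE cost defined earlier:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # independent variable
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])     # dependent variable

# least-squares estimates of slope (b) and intercept (a)
x_mean, y_mean = x.mean(), y.mean()
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
a = y_mean - b * x_mean

y_pred = a + b * x
mse = np.mean((y - y_pred) ** 2)             # Mean Squared Error cost
print("a =", round(a, 3), "b =", round(b, 3), "MSE =", round(mse, 4))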
Nonlinear Regression:
“How can we model data that does not show a linear dependence? For example, what if a given response
variable and predictor variable have a relationship that may be modeled by a polynomial function?”
• Polynomial regression is often of interest when there is just one predictor variable.
• By applying transformations to the variables, we can convert the nonlinear model into a linear one that can
then be solved by the method of least squares.
• That is, the addition of high-order terms like x², x³, and so on, which are simple functions of the single variable x, can be considered equivalent to adding new independent variables.
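A short sketch of this transformation trick (the data points are invented for illustration; NumPy is assumed): treating x, x², and x³ as separate independent variables turns the polynomial model into an ordinary linear least-squares problem:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.9, 9.1, 28.5, 65.0, 126.3])     # roughly cubic in x

# design matrix with columns [1, x, x^2, x^3]; the higher-order terms act as
# new independent variables in a linear least-squares fit
X = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print("fitted coefficients:", np.round(coeffs, 3))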
Example-2
• Table shows a set of paired data where x is the number of years of work experience of a college graduate
and y is the corresponding salary of the graduate.
• Predict the salary of a college graduate with, say, 10 years of experience: ___
Logistic Regression:
Logistic regression is a statistical method that is used for building machine learning models where the dependent
variable is dichotomous: i.e. binary. Logistic regression is used to describe data and the relationship between one
dependent variable and one or more independent variables. The independent variables can be nominal, ordinal, or
of interval type.
The name “logistic regression” is derived from the concept of the logistic function that it uses. The logistic function
is also known as the sigmoid function. The value of this logistic function lies between zero and one.
The following is an example of a logistic function we can use to find the probability of a vehicle breaking down,
depending on how many years it has been since it was serviced last.
Here is how you can interpret the results from the graph to decide whether the vehicle will break down or not.
Using the logistic regression algorithm, banks can predict whether a customer would default on loans or not
To predict the weather conditions of a certain place (sunny, windy, rainy, humid, etc.)
E-commerce companies can identify buyers who are likely to purchase a certain product
Companies can predict whether they will gain or lose money in the next quarter, year, or month based on their
current performance
To classify objects based on their features and attributes
Consider the following example: An organization wants to determine an employee’s salary increase based on their
performance.
For this purpose, a linear regression algorithm will help them decide. Plotting a regression line by considering the
employee’s performance as the independent variable, and the salary increase as the dependent variable will make
their task easier.
Now, what if the organization wants to know whether an employee would get a promotion or not based on their
performance? The above linear graph won’t be suitable in this case. As such, we clip the line at zero and one, and
convert it into a sigmoid curve (S curve)
Based on the threshold values, the organization can decide whether an employee will get a salary increase or not.
The equation of the sigmoid function is:
f(x) = 1 / (1 + e^(−x))
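As a hedged sketch (scikit-learn is assumed, and the dataset of "years since last service" versus breakdown below is invented, not from the notes), logistic regression turns the linear score into a probability between zero and one via the sigmoid:

import numpy as np
from sklearn.linear_model import LogisticRegression

years = np.array([[0.5], [1], [1.5], [2], [3], [4], [5], [6], [7], [8]])
broke_down = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])   # made-up labels

model = LogisticRegression().fit(years, broke_down)
p = model.predict_proba([[10]])[0, 1]          # probability of breakdown at 10 years
print("P(breakdown | 10 years since service) =", round(p, 2))

# the same probability obtained by applying the sigmoid to the linear score
z = model.intercept_[0] + model.coef_[0, 0] * 10
print("sigmoid check:", round(1 / (1 + np.exp(-z)), 2))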
Differences between Linear and Logistic Regression:
o Linear Regression is used to predict a continuous dependent variable using a given set of independent variables, whereas Logistic Regression is used to predict a categorical dependent variable using a given set of independent variables.
o Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
o In Linear Regression, we find the best-fit line, by which we can easily predict the output; in Logistic Regression, we find the S-curve by which we can classify the samples.
o The least squares estimation method is used to estimate the coefficients in Linear Regression, whereas the maximum likelihood estimation method is used in Logistic Regression.
o The output of Linear Regression must be a continuous value, such as price, age, etc., whereas the output of Logistic Regression must be a categorical value such as 0 or 1, Yes or No, etc.
o In Linear Regression, the relationship between the dependent and independent variables must be linear; in Logistic Regression, a linear relationship is not required.
Decision Tree Algorithm:
Decision Tree is a Supervised learning technique that can be used for both classification and Regression
problems, but mostly it is preferred for solving Classification problems.
o It is a tree-structured classifier
The tree has three types of nodes:
– Root node that has no incoming edges and zero or more outgoing edges.
– Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges.
– Leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges.
In a decision tree, each leaf node is assigned a class label.
The non-terminal nodes, which include the root and other internal nodes, contain attribute test conditions
to separate records that have different characteristics.
o The decisions or the test are performed on the basis of features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a problem/decision based on given
conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which expands on further
branches and constructs a tree-like structure.
o A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree into subtrees.
o The below diagram explains the general structure of a decision tree.
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; the final node is then called a leaf node.
Example:
A decision tree can be used to classify whether a person is Fit or Unfit. The decision nodes here are questions like 'Is the person less than 30 years of age?', 'Does the person eat junk food?', etc., and the leaves are one of the two possible outcomes, viz. Fit and Unfit. Looking at the decision tree, we can make decisions such as: if a person is less than 30 years of age and doesn't eat junk food, then he is Fit; if a person is less than 30 years of age and eats junk food, then he is Unfit; and so on. An illustrative code sketch of such a tree is shown below.
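This sketch is only illustrative (scikit-learn is assumed, and the age/junk-food records are made up, so the learned tree need not match the one in the notes):

from sklearn.tree import DecisionTreeClassifier, export_text

# columns: [age, eats_junk]; labels: 1 = Fit, 0 = Unfit (hypothetical data)
X = [[25, 0], [28, 1], [22, 0], [35, 0], [40, 1], [45, 0], [50, 1], [29, 1]]
y = [1, 0, 1, 1, 0, 1, 0, 0]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "eats_junk"]))
print(tree.predict([[26, 1]]))    # classify a 26-year-old who eats junk food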
i) ID3 Algorithm :
ID3 stands for Iterative Dichotomiser 3 and is named such because the algorithm iteratively (repeatedly)
dichotomizes(divides) features into two or more groups at each step.
It is a classification algorithm that follows a greedy approach by selecting the attribute that yields the maximum Information Gain (IG), i.e., the minimum Entropy (H).
ID3 only works with discrete or nominal data, while C4.5 works with both discrete and continuous data.
The ID3 algorithm selects the best attribute based on the concepts of entropy and information gain for developing the tree. The C4.5 algorithm acts similarly to ID3 but improves on some of ID3's behaviours. The metric (or heuristic) used in C4.5 to measure impurity is the Gain Ratio.
Entropy(S) = - ∑ pᵢ * log₂(pᵢ) ; i = 1 to n
where,
n is the total number of classes in the target column (in our case n = 2 i.e YES and NO)
pᵢ is the probability of class ‘i’ or the ratio of “number of rows with class i in the target column” to the “total number
of rows” in the dataset.
Example:
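As a minimal sketch of how the Entropy(S) and Information Gain defined above can be computed (the toy (Outlook, Play) records below are invented and are not the table from the original worked example):

import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum(p_i * log2(p_i)) over the classes in the target column
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

data = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"),
        ("Rain", "Yes"), ("Rain", "Yes"), ("Rain", "No"),
        ("Overcast", "Yes"), ("Sunny", "Yes")]

labels = [play for _, play in data]
base = entropy(labels)

# information gain of splitting on the attribute "Outlook"
values = {outlook for outlook, _ in data}
remainder = sum(
    (len(subset) / len(data)) * entropy(subset)
    for subset in ([play for o, play in data if o == v] for v in values))
print("Entropy(S) =", round(base, 3), " Gain(Outlook) =", round(base - remainder, 3))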
Advantages of using ID3 :
ID3 works only with discrete or nominal data; C4.5, a successor of ID3, works with both discrete and continuous data.
The ID3 algorithm selects the best attribute based on the concepts of entropy and information gain for developing the tree.
The information gain measure is biased toward tests with many outcomes. That is, it prefers to select attributes
having a large number of values. For example, consider an attribute that acts as a unique identifier such as
product_ID. A split on product_ID would result in a large number of partitions (as many as there are values), each
one containing just one tuple. Because each partition is pure, the information required to classify data set D based
on this partitioning would be Info_product_ID(D) = 0. Therefore, the information gained by partitioning on this attribute
is maximal. Clearly, such a partitioning is useless for classification.
C4.5, a successor of ID3, uses an extension to information gain known as gain ratio, which attempts to overcome
this bias. It applies a kind of normalization to information gain using a “split information” value defined
analogously with Info(D) as:
SplitInfo_A(D) = − ∑ⱼ (|Dⱼ| / |D|) × log₂(|Dⱼ| / |D|) ; j = 1 to v
This value represents the potential information generated by splitting the training data set, D, into v partitions, corresponding to the v outcomes of a test on attribute A. Note that, for each outcome, it considers the number of tuples having that outcome with respect to the total number of tuples in D. It differs from information gain, which measures the information with respect to classification that is acquired based on the same partitioning. The gain ratio is defined as:
GainRatio(A) = Gain(A) / SplitInfo_A(D)
The attribute with the maximum gain ratio is selected as the splitting attribute.
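A small self-contained sketch of SplitInfo and the gain ratio (the partition sizes and the information gain value are hypothetical, chosen to match the toy ID3 sketch above):

import math

def split_info(sizes):
    # SplitInfo_A(D) = -sum(|Dj|/|D| * log2(|Dj|/|D|)) over the v partitions
    total = sum(sizes)
    return -sum((s / total) * math.log2(s / total) for s in sizes)

info_gain = 0.266            # Gain(Outlook) from the toy ID3 sketch above
sizes = [3, 2, 3]            # |D1|, |D2|, |D3| for the v = 3 outcomes of Outlook
ratio = info_gain / split_info(sizes)    # GainRatio(A) = Gain(A) / SplitInfo_A(D)
print("SplitInfo =", round(split_info(sizes), 3), " GainRatio =", round(ratio, 3))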
In general terms, the C4.5 algorithm proceeds as follows:
1. Check for the base cases (listed below).
2. For each attribute A, find the normalized information gain ratio from splitting on A.
3. Let A_best be the attribute with the highest normalized information gain ratio.
4. Create a decision node that splits on A_best.
5. Recurse on the sublists obtained by splitting on A_best, and add those nodes as children of the current node.
The base cases are the following:
All the examples from the training set belong to the same class ( a tree leaf labeled with that class is
returned ).
The training set is empty ( returns a tree leaf called failure ).
The attribute list is empty ( returns a leaf labeled with the most frequent class or the disjunction of all the
classes).
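As a simplified, hedged sketch of this recursive procedure (using plain information gain rather than the full gain ratio, and an invented list-of-dicts dataset format with a 'label' key; none of this is prescribed by the notes):

import math
from collections import Counter

def entropy(rows):
    counts = Counter(r["label"] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def build_tree(rows, attributes):
    if not rows:
        return "failure"                             # base case: empty training set
    labels = [r["label"] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]                             # base case: single class -> leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # base case: most frequent class

    def gain(a):                                     # information gain of attribute a
        vals = {r[a] for r in rows}
        remainder = sum((len(sub) / len(rows)) * entropy(sub)
                        for sub in ([r for r in rows if r[a] == v] for v in vals))
        return entropy(rows) - remainder

    best = max(attributes, key=gain)                 # attribute with highest gain
    node = {}
    for v in {r[best] for r in rows}:                # one child per value of best
        subset = [r for r in rows if r[best] == v]
        node[(best, v)] = build_tree(subset, [a for a in attributes if a != best])
    return node

rows = [
    {"Outlook": "Sunny", "Windy": "No", "label": "No"},
    {"Outlook": "Sunny", "Windy": "Yes", "label": "No"},
    {"Outlook": "Overcast", "Windy": "No", "label": "Yes"},
    {"Outlook": "Rain", "Windy": "No", "label": "Yes"},
    {"Outlook": "Rain", "Windy": "Yes", "label": "No"},
]
print(build_tree(rows, ["Outlook", "Windy"]))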
Advantages of C4.5:
Can use both categorical and continuous values
The algorithm inherently employs a single-pass pruning process to mitigate overfitting.
It can work with both Discrete and Continuous Data
C4.5 can handle the issue of incomplete data very well.
Builds models that can be easily interpreted
Easy to implement
Deals with noise
Disadvantages of C4.5:
Small variation in data can lead to different decision trees (especially when the variables are close to each
other in value)
Does not work very well on a small training set
Comparison between ID3 and C4.5 algorithms:
o ID3 selects the splitting attribute using information gain, whereas C4.5 uses the gain ratio, which normalizes information gain and reduces its bias toward attributes with many values.
o ID3 works only with discrete or nominal attributes, whereas C4.5 works with both discrete and continuous attributes.
o C4.5, the successor of ID3, additionally handles missing (incomplete) data and employs pruning to mitigate overfitting.

K-Nearest Neighbour (K-NN) Algorithm:
K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on Supervised Learning
technique.
The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category by using the K-NN algorithm.
K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for the
Classification problems.
How does K-NN work:
The working of K-NN can be explained on the basis of the below algorithm:
Step-1: Select the number K of neighbours.
Step-2: Calculate the Euclidean distance between the new data point and each of the training data points.
Step-3: Take the K nearest neighbours as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbours is maximum.
Advantages of K-NN:
It is simple to implement.
It is robust to the noisy training data
It can be more effective if the training data is large; a minimal classification sketch is shown below.
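A minimal sketch (scikit-learn is assumed; the 2-D points and category labels are made up for illustration) of assigning a new point to the category most common among its K nearest neighbours by Euclidean distance:

from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1.5, 2], [2, 1.8], [6, 6], [6.5, 7], [7, 6.2]]   # training points
y = ["A", "A", "A", "B", "B", "B"]                              # their categories

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X, y)
print(knn.predict([[2, 2], [6.8, 6.5]]))    # new points -> ['A' 'B']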