Ds Module 4

The document discusses machine learning techniques including supervised learning, unsupervised learning and reinforcement learning. It also discusses classification and regression problems in machine learning and how to evaluate classifiers using accuracy, confusion matrix and other metrics. Key machine learning algorithms like linear regression, logistic regression are also explained.

Uploaded by

Prathik Srinivas

2023-2024

21IS5C05
Data Science

Module 4
Rampur Srinath
NIE, Mysuru
[email protected]
Machine Learning

Machine learning involves coding programs that
automatically adjust their performance in
accordance with their exposure to information in
data.

Machine learning can be considered a subfield of
artificial intelligence (AI). We can roughly divide
the field into the following three major classes:
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised learning: Algorithms which learn from a training set of
labeled examples to generalize to the set of all possible inputs.
Examples of techniques in supervised learning: logistic regression,
support vector machines, decision trees, random forests, etc.

Unsupervised learning: Algorithms that learn from a training set of
unlabeled examples. Used to explore data according to some
statistical, geometric or similarity criterion. Examples of unsupervised
learning include k-means clustering and kernel density estimation.

Reinforcement learning: Algorithms that learn via reinforcement from
criticism that provides information on the quality of a solution, but not
on how to improve it. Improved solutions are achieved by iteratively
exploring the solution space.
• As a data scientist, the first step when facing a given
problem is to identify the question to be answered. The type
of answer we are seeking points us directly to a certain set
of techniques.
• If our question is answered by YES/NO, we are facing a
classification problem. Classifiers are also the tools to use if
our question admits only a discrete set of answers, i.e., we
want to select from a finite number of choices.
– Given the results of a clinical test, does this patient
suffer from diabetes?
– Given a magnetic resonance image, does the image show a
tumor?
– Given the past activity associated with a credit card, is the
current operation fraudulent?
• If our question asks for a prediction of a real-valued quantity, we
are faced with a regression problem.
– Given the description of an apartment, what is the expected
market value of the flat? What will the value be if the
apartment has an elevator?
– Given the past records of user activity on Apps, how long
will a certain client be connected to our App?
– Given my skills and marks in computer science and maths,
what mark will I achieve in a data science course?
• Classification is the natural choice of machine learning tools
for prediction with discrete known outcomes. According to
the cardinality of the target set, one usually distinguishes
between binary classifiers, where the target output takes only
two values (the classifier answers questions with a yes or
a no), and multiclass classifiers, for a larger number of classes.
• In the binary case we can encode both target states in a
numerical variable, e.g., the target takes the value +1 for a
successful loan and −1 otherwise.
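The ±1 encoding of a binary target can be sketched with NumPy. The loan outcomes and the labels "paid"/"default" below are made-up illustration data, not part of the course material.

```python
import numpy as np

# Hypothetical loan outcomes (illustrative data only).
outcomes = np.array(["paid", "default", "paid", "paid", "default"])

# Encode the two target states numerically:
# +1 for a successful loan, -1 otherwise.
y = np.where(outcomes == "paid", 1, -1)
print(y)
```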
A problem in Scikit-learn is modeled as
follows:
Input data is structured in NumPy arrays. The shape of the array
is expected to be [n_samples, n_features]:
– n_samples: The number of samples (n). Each sample is an
item to process (e.g., classify). A sample can be a document,
a picture, an audio file, a video, an astronomical object, a
row in a database or whatever you can describe with a fixed
set of quantitative traits.
– n_features: The number of features (d) or distinct traits
that can be used to describe each item in a quantitative
manner. Features are generally real-valued, but may be
Boolean, discrete-valued or even categorical.
Considering data arranged in such a matrix, we refer to:
• the columns as features, attributes, dimensions, regressors,
covariates, predictors, or independent variables;
• the rows as instances, examples, or samples;
• the target as the label, outcome, response, or dependent variable.
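As a minimal sketch of this layout, the toy array below has 4 samples (rows) and 3 features (columns), with one target value per sample. The numbers are invented for illustration.

```python
import numpy as np

# Toy data: each row is a sample, each column is a feature (values are illustrative).
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 3.4, 5.4],
    [5.9, 3.0, 5.1],
])

# One target label per sample (row).
y = np.array([0, 0, 1, 1])

n_samples, n_features = X.shape
print(n_samples, n_features)
```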

All objects in Scikit-learn share a uniform and limited API
consisting of three complementary interfaces:
• an estimator interface for building and fitting models (fit());
• a predictor interface for making predictions (predict());
• a transformer interface for converting data (transform()).
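The three interfaces can be seen side by side in a short sketch. StandardScaler and LogisticRegression are just two convenient Scikit-learn classes chosen for illustration, and the data is made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Illustrative data: 6 samples, 2 features on very different scales.
X = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 220.0],
              [4.0, 210.0], [5.0, 190.0], [6.0, 230.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Transformer interface: fit() learns the column means and variances,
# transform() applies the scaling.
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# Estimator interface: fit() learns the model parameters from the data.
clf = LogisticRegression()
clf.fit(X_scaled, y)

# Predictor interface: predict() returns one label per sample.
y_hat = clf.predict(X_scaled)
print(y_hat)
```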
The basic measure of performance of a classifier is its
accuracy. This is defined as the number of correctly
predicted examples divided by the total number of examples.
Accuracy is related to the error as follows: acc = 1 − err.

Each estimator has a score() method that invokes the default
scoring metric. For classifiers, this default is the accuracy.
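A small sketch of the accuracy computation, assuming a made-up one-feature data set and LogisticRegression as an arbitrary example classifier. It evaluates on the training data only to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative one-feature data set (made up for this sketch).
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
y_pred = clf.predict(X)

# Accuracy: correctly predicted examples divided by the total number of examples.
acc = np.mean(y_pred == y)
err = 1.0 - acc

# For classifiers, score() returns the mean accuracy on the given data.
print(acc, err, clf.score(X, y))
```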
• Although accuracy is the most common metric for evaluating
classifiers, there are cases when the business value of
correctly predicting elements from one class is different from
the value for the prediction of elements of another class.
• In those cases, accuracy is not a good performance metric
and more detailed analysis is needed. The confusion matrix
enables us to define different metrics considering such
scenarios.
The confusion matrix relates the classifier
outcome to the actual ground truth or gold standard. In a
binary problem, there are four possible cases:
• True positives (TP): When the classifier predicts a sample
as positive and it really is positive.
• False positives (FP): When the classifier predicts a sample
as positive but in fact it is negative.
• True negatives (TN): When the classifier predicts a sample
as negative and it really is negative.
• False negatives (FN): When the classifier predicts a
sample as negative but in fact it is positive.
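The four cases can be counted with Scikit-learn's confusion_matrix. The label vectors below are invented for illustration. Precision and recall are shown as two common examples of metrics derived from the matrix; they are not named in the slides.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative ground truth and classifier output (1 = positive, 0 = negative).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# With labels=[0, 1], scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
print(tp, fp, tn, fn, accuracy, precision, recall)
```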
Training, Validation and Test

• Test data is used exclusively for assessing performance at
the end of the process and will never be used in the learning
process.
• Validation data is used explicitly to select the
parameters/models with the best performance according to an
estimation of the generalization error. This is a form of
learning.
• Training data is used to learn the instance of the model
from a model class.
1. Split the original dataset into training and test data. For
example, use 30% of the original dataset for testing purposes.
This data is held back and will only be used to assess the
performance of the method.
2. Use the remaining training data to select the
hyperparameters by means of cross-validation.
3. Train the model with the selected hyperparameters and assess the
performance using the test dataset.
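The three steps can be sketched as follows. The synthetic data set, the choice of LogisticRegression, and the candidate values for its hyperparameter C are all assumptions made for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for the original dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Step 1: hold back 30% of the data as test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Step 2: select the hyperparameter C by 5-fold cross-validation
# on the training data only.
grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid=grid, cv=5)
search.fit(X_train, y_train)

# Step 3: by default GridSearchCV refits the best model on all training data;
# the test data is used once, to assess performance.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```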
Regression Analysis

• Regression is concerned with making predictions about real-world
quantities such as, for instance, the predictions alluded to in the
following questions.
• How does sales volume change with changes in price?
• How is sales volume affected by the weather?
• How does the title of a book affect its sales?
• How does the amount of a drug absorbed vary with the patient’s
body weight, and does this relationship depend on blood pressure?
• How many customers can I expect today?
• At what time should I go home to avoid traffic jams?
• What is the chance of rain on the next two Mondays, and
what is the expected temperature?
All these questions have a common structure:
they ask for a response that can be expressed
as a combination of one or more
(independent) variables (also called covariates
or predictors).

The role of regression is to build a model to
predict the response from the variables. This
process involves the transition from data to
model.
More specifically, the model can be useful in different
tasks, such as the following:
(1) analyzing the behavior of data (the relation between the
response and the variables),
(2) predicting data values (whether continuous or discrete),
(3) finding important variables for the model.
• In order to understand how a regression model can be
suitable for tackling these tasks, we will introduce three
practical cases for which we use three real datasets and solve
different questions. These practical cases will motivate:
• simple linear regression,
• multiple linear regression,
• logistic regression.
Linear Regression

• The objective of performing a regression is to build a model
to express the relation between the response and a
combination of one or more (independent) variables.
• The model allows us to predict the response y from the
variables.
• Two quantities are correlated if there is a relationship
between them.
• The simplest model which can be considered is a linear
model, where the response y depends linearly on the d
variables xi:

y = Xw,

where X is the data matrix (one row per sample, one column
per variable) and w is the vector of model weights.
Correlation Coefficient

• The correlation coefficient measures the degree of linear
relationship between variables.
• In a correlation analysis we estimate a value bounded
between −1 and 1 and we call it the correlation coefficient.
This coefficient tells us the strength of the linear association
between the two variables.
• If the two quantities vary in tandem (if one
increases/decreases, the other one does too), the correlation
coefficient is positive.
• It is negative when the two quantities vary in opposite
directions (if one decreases, the other one increases).
• It is important to remember that the correlation coefficient
measures only the strength of the linear relationship between the
variables.
• A value of zero does not mean that there is no relationship at
all. It simply indicates that there is no linear relation
between the variables in question.

Correlation does not imply causation: the fact that people use
umbrellas when it rains does not mean that umbrellas cause rain to fall.
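A short numerical sketch of these properties, using NumPy's corrcoef on made-up data: a perfectly increasing line, a perfectly decreasing line, and a quadratic relation that is strong but not linear.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is the coefficient.
r_positive = np.corrcoef(x, 2 * x + 1)[0, 1]   # quantities vary in tandem
r_negative = np.corrcoef(x, -3 * x)[0, 1]      # one increases, the other decreases
r_quadratic = np.corrcoef(x, x ** 2)[0, 1]     # exact relation, but not linear

print(r_positive, r_negative, r_quadratic)
```

Here y = x² is fully determined by x, yet its correlation coefficient with x is zero because the relation is not linear.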
Types of Linear Regression

Linear regression can be further divided into two types of
algorithm:
• Simple Linear Regression:
If a single independent variable is used to predict the value of
a numerical dependent variable, the Linear
Regression algorithm is called Simple Linear Regression.
• Multiple Linear Regression:
If more than one independent variable is used to predict the
value of a numerical dependent variable, the Linear
Regression algorithm is called Multiple Linear Regression.
Linear Regression Line

• A straight line showing the relationship between the dependent
and independent variables is called a regression line. A
regression line can show two types of relationship:
Positive Linear Relationship:

• If the dependent variable (Y-axis) increases as the
independent variable (X-axis) increases, the
relationship is termed a positive linear relationship.
Negative Linear Relationship:

• If the dependent variable (Y-axis) decreases as the
independent variable (X-axis) increases, the
relationship is called a negative linear relationship.
Finding the best fit line:

• When working with linear regression, our main goal is to
find the best fit line, which means that the error between the predicted
values and the actual values should be minimized. The best fit
line will have the least error.
• Different values for the weights or coefficients of the line
(a0, a1) give different regression lines, so we need to
calculate the best values for a0 and a1 to find the best fit line.
To do this we use a cost function.
Cost function

• Different values for the weights or coefficients of the line (a0, a1)
give different regression lines, and the cost function is
used to estimate the values of the coefficients for the best fit
line.
• Minimizing the cost function optimizes the regression coefficients or
weights. The cost function measures how well a linear regression model is
performing.
• We can use the cost function to measure how well
the mapping function, which maps the input variable to the
output variable, fits the data. This mapping function is also known
as the hypothesis function.
MSE

• For Linear Regression, we use the Mean Squared Error
(MSE) cost function, which is the average of the squared errors
between the predicted values and the actual values.
• For the line y = a0 + a1 x, the MSE over N observations can be written as:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (a_1 x_i + a_0) \right)^2$$

where y_i is the actual value and a1 x_i + a0 is the predicted value.
• Residuals: The distance between the actual value and the
predicted value is called the residual. If the observed points are
far from the regression line, the residuals are large,
and so is the cost function. If the points are close
to the regression line, the residuals are small, and
so is the cost function.
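The relation between residuals and cost can be sketched in a few lines. The data points and the two candidate lines are made up, and the helper name mse is chosen for this example only.

```python
import numpy as np

# Illustrative observations.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 10.0])

def mse(a0, a1, x, y):
    """Mean squared error of the line y = a0 + a1 * x on the data."""
    residuals = y - (a0 + a1 * x)   # actual value minus predicted value
    return np.mean(residuals ** 2)

mse_close = mse(1.0, 2.0, x, y)   # line passing close to the points
mse_far = mse(0.0, 1.0, x, y)     # line far from the points
print(mse_close, mse_far)
```

The line with the smaller residuals has the smaller cost, so it is the better candidate for the best fit line.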
Simple Linear Regression

• Simple linear regression considers n samples of a single
variable x and describes the relationship between the
variable and the response with the model:

$$y = a_0 + a_1 x$$

where the parameter a0 is called the intercept or the constant
term, and a1 is the slope of the line.
• Simple Linear Regression is a type of regression algorithm
that models the relationship between a dependent variable
and a single independent variable. The relationship shown by
a Simple Linear Regression model is linear, i.e., a sloped
straight line, hence the name Simple Linear Regression.
• The key point in Simple Linear Regression is that
the dependent variable must be a continuous/real value.
However, the independent variable can be measured on
continuous or categorical values.
The Simple Linear Regression algorithm has two main objectives:
• Model the relationship between the two variables, such as
the relationship between income and expenditure, or experience
and salary.
• Forecast new observations, such as the weather
according to temperature, or the revenue of a company according
to its investments in a year.
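Both objectives can be sketched with the standard least-squares formulas for the intercept a0 and the slope a1. The experience and salary figures are made up, and np.polyfit is used only as a cross-check.

```python
import numpy as np

# Illustrative data: years of experience and salary (in thousands).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 41.0, 44.0, 50.0])

# Objective 1: model the relationship y = a0 + a1 * x by least squares.
x_mean, y_mean = x.mean(), y.mean()
a1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
a0 = y_mean - a1 * x_mean

# Objective 2: forecast a new observation, e.g. 6 years of experience.
y_new = a0 + a1 * 6.0

# Cross-check: np.polyfit with degree 1 returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
print(a0, a1, y_new)
```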
Implementation

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the data set.
data_set = pd.read_csv('Salary_Data.csv')

# Features: every column except the last; target: column 1 (salary).
x = data_set.iloc[:, :-1].values
y = data_set.iloc[:, 1].values

# Split the data set into training and test sets (one third held out).
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=1/3, random_state=0)

# Fit the Simple Linear Regression model to the training set.
regressor = LinearRegression()
regressor.fit(x_train, y_train)

# Predict the test and training set results.
y_pred = regressor.predict(x_test)
y_train_pred = regressor.predict(x_train)

# Visualize the training set results.
plt.scatter(x_train, y_train, color="green")
plt.plot(x_train, y_train_pred, color="red")
plt.title("Salary vs Experience (Training Dataset)")
plt.xlabel("Years of Experience")
plt.ylabel("Salary (In Rupees)")
plt.show()

# Visualize the test set results (same fitted line, test points).
plt.scatter(x_test, y_test, color="blue")
plt.plot(x_train, y_train_pred, color="red")
plt.title("Salary vs Experience (Test Dataset)")
plt.xlabel("Years of Experience")
plt.ylabel("Salary (In Rupees)")
plt.show()
Multiple Linear Regression

• There may be various cases in which the response variable is
affected by more than one predictor variable; for such cases,
the Multiple Linear Regression (MLR) algorithm is used.
• Multiple Linear Regression is an important regression
algorithm which models the linear relationship between a
single continuous dependent variable and more than one
independent variable.
• For MLR, the dependent or target variable (Y) must be
continuous/real, but the predictor or independent variables
may be of continuous or categorical form.
• Each feature variable should have a linear relationship with
the dependent variable.
• MLR tries to fit a regression hyperplane (the multidimensional
counterpart of a regression line) through the data points.
Assumptions for Multiple Linear Regression:
• A linear relationship should exist between the target and
predictor variables.
• The regression residuals must be normally distributed.
• MLR assumes little or no multicollinearity (correlation
between the independent variables) in the data.
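One simple way to look for multicollinearity is to inspect the pairwise correlations between the predictors. The sketch below uses synthetic predictors, where x2 is deliberately built as almost a multiple of x1; this is an illustrative check, not a procedure taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors (assumed data for illustration).
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + 0.05 * rng.normal(size=n)   # nearly a multiple of x1
x3 = rng.normal(size=n)                     # generated independently of x1
X = np.column_stack([x1, x2, x3])

# Pairwise correlation between predictors (rowvar=False: columns are variables).
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
```

A coefficient close to ±1 between two predictors, as for x1 and x2 here, signals that one of them carries almost no extra information for the model.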
Implementation

# Import libraries.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Import the data set.
data_set = pd.read_csv('50_CompList.csv')

# Extract the independent variables (x) and the dependent variable (y).
x = data_set.iloc[:, :-1].values
y = data_set.iloc[:, 4].values

# Encode the categorical column (index 3) as one-hot columns.
# OneHotEncoder(categorical_features=...) was removed from scikit-learn,
# so the column is selected with a ColumnTransformer instead.
# drop='first' omits one dummy column, which avoids the dummy variable trap.
encoder = ColumnTransformer(
    [('onehot', OneHotEncoder(drop='first'), [3])],
    remainder='passthrough',
    sparse_threshold=0,  # always return a dense array
)
x = encoder.fit_transform(x).astype(float)

# Split the data set into training and test sets.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0)

# Fit the MLR model to the training set.
regressor = LinearRegression()
regressor.fit(x_train, y_train)

# Predict the test set result.
y_pred = regressor.predict(x_test)

print('Train Score: ', regressor.score(x_train, y_train))

print('Test Score: ', regressor.score(x_test, y_test))

Train Score: 0.9501847627493607

Test Score: 0.9347068473282446
ML Polynomial Regression

• Polynomial Regression is a regression algorithm that models
the relationship between a dependent variable (y) and an independent
variable (x) as an nth-degree polynomial. The Polynomial
Regression equation is given below:

$$y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$$

• It is also called a special case of Multiple Linear
Regression in ML, because we add polynomial terms
to the Multiple Linear Regression equation to convert it into
Polynomial Regression.
• "In Polynomial regression, the original features are
converted into Polynomial features of required degree
(2,3,..,n) and then modeled using a linear model."
Implementation

# Import libraries.
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Import the data set.
data_set = pd.read_csv('Position_Salaries.csv')

# Extract the independent variable (kept 2-D) and the dependent variable.
x = data_set.iloc[:, 1:2].values
y = data_set.iloc[:, 2].values

# Fit Linear Regression to the data set.
lin_regs = LinearRegression()
lin_regs.fit(x, y)

# Fit Polynomial Regression: expand x into polynomial features,
# then fit a linear model on the expanded features.
poly_regs = PolynomialFeatures(degree=2)
x_poly = poly_regs.fit_transform(x)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(x_poly, y)

# Visualize the result for the Linear Regression model.
plt.scatter(x, y, color="blue")
plt.plot(x, lin_regs.predict(x), color="red")
plt.title("Bluff detection model (Linear Regression)")
plt.xlabel("Position Levels")
plt.ylabel("Salary")
plt.show()

# Visualize the result for Polynomial Regression
# (transform, not fit_transform: the expansion is already fitted).
plt.scatter(x, y, color="blue")
plt.plot(x, lin_reg_2.predict(poly_regs.transform(x)), color="red")
plt.title("Bluff detection model (Polynomial Regression)")
plt.xlabel("Position Levels")
plt.ylabel("Salary")
plt.show()
For degree = 3:
If we change the degree to 3, the fitted curve follows the data
points more closely.
Degree = 4: If we change the degree to 4, the curve follows the
data points more closely still. Increasing the degree of the
polynomial therefore gives a closer fit to the training data,
although a very high degree can overfit, i.e., follow noise in the
data and generalize poorly to new inputs.
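The effect of the degree on the fit can be sketched on a small made-up data set (the values below only imitate a salary curve and are not the contents of Position_Salaries.csv). The score is the R² on the training data, so it says nothing about how well each model generalizes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data: position level and salary (in thousands).
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float)

# Fit one model per degree and record the R^2 score on the training data.
train_r2 = {}
for degree in (1, 2, 3, 4):
    x_poly = PolynomialFeatures(degree=degree).fit_transform(x)
    model = LinearRegression().fit(x_poly, y)
    train_r2[degree] = model.score(x_poly, y)

for degree, r2 in train_r2.items():
    print(degree, round(r2, 4))
```

Because each higher-degree model contains the lower-degree ones as special cases, the training score cannot decrease as the degree grows. Judging whether the extra degree is worthwhile needs held-out data.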
Logistic Regression

• Logistic regression is one of the most popular machine
learning algorithms and belongs to the supervised
learning techniques. It is used for predicting a categorical
dependent variable from a given set of independent
variables.
• Logistic regression predicts the output of a categorical
dependent variable. Therefore the outcome must be a
categorical or discrete value, such as Yes or No, 0 or
1, True or False. However, instead of giving the exact values 0
and 1, it gives probabilistic values which lie between 0
and 1.
• Logistic Regression is very similar to Linear Regression
except in how the two are used. Linear Regression is used for
solving regression problems, whereas Logistic Regression is
used for solving classification problems.
• In Logistic Regression, instead of fitting a regression line, we
fit an "S"-shaped logistic function, whose output is bounded by
the two extreme values 0 and 1.
• Logistic Regression is a significant machine learning
algorithm because it can provide probabilities
and classify new data using both continuous and discrete features.
Logistic Function (Sigmoid Function):

• The sigmoid function is a mathematical function used to map
the predicted values to probabilities.
• It maps any real value into another value within the range 0
to 1.
• The output of logistic regression must lie between 0 and 1
and cannot go beyond these limits, so it forms an
"S"-shaped curve. The S-shaped curve is called the sigmoid
function or the logistic function.
• In logistic regression, we use the concept of a threshold
value to decide between the classes 0 and 1: predicted
probabilities above the threshold are assigned to class 1, and
those below the threshold are assigned to class 0.
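A minimal sketch of the sigmoid function and the thresholding step, using the standard definition 1 / (1 + e^(−z)). The input values and the threshold of 0.5 are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative real-valued model outputs.
z = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
p = sigmoid(z)

# Assumed threshold: probabilities at or above 0.5 are assigned to class 1.
threshold = 0.5
labels = (p >= threshold).astype(int)
print(np.round(p, 3), labels)
```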
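The mathematical steps to get the Logistic
Regression equation are given below:
• We know the equation of the straight line can be written as:

$$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n$$

• In Logistic Regression y can be between 0 and 1 only, so
instead of y we consider the ratio of y to (1 − y):

$$\frac{y}{1-y}, \quad \text{which is } 0 \text{ for } y = 0 \text{ and tends to } \infty \text{ as } y \to 1$$

• But we need a range from −∞ to +∞, so we
take the logarithm, and the equation becomes:

$$\log\left[\frac{y}{1-y}\right] = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n$$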
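The effect of these steps can be checked numerically. In the sketch below, the function names logit and sigmoid and the sample probabilities are chosen for illustration: the ratio y / (1 − y) is positive, its logarithm covers negative and positive values, and applying the sigmoid to the result recovers y.

```python
import numpy as np

def logit(y):
    """Logarithm of y / (1 - y): maps a value in (0, 1) to the real line."""
    return np.log(y / (1.0 - y))

def sigmoid(z):
    """Inverse of logit: maps a real value back into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative probabilities.
y = np.array([0.01, 0.25, 0.5, 0.75, 0.99])

ratio = y / (1.0 - y)   # ranges over (0, +infinity)
z = logit(y)            # ranges over (-infinity, +infinity)
y_back = sigmoid(z)     # recovers the original probabilities
print(np.round(ratio, 3), np.round(z, 3))
```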
Implementation

# Data pre-processing step

# Import libraries.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Import the data set.
data_set = pd.read_csv('user_data.csv')

# Extract the independent variables (columns 2 and 3) and the dependent variable.
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# Split the data set into training and test sets.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)

# Feature scaling: fit the scaler on the training set only,
# then apply the same scaling to the test set.
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

# Fit Logistic Regression to the training set.
classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)

# Predict the test set result.
y_pred = classifier.predict(x_test)

# Create the confusion matrix from the true and predicted test labels.
cm = confusion_matrix(y_test, y_pred)
print(cm)
Thank You
