Simple Linear Regression - Assignn5

This document outlines the steps to perform simple linear regression using scikit-learn on a dataset relating SAT scores to GPA. It includes: 1) Importing necessary packages like NumPy and the LinearRegression model from scikit-learn. 2) Providing the SAT-GPA dataset and performing exploratory data analysis. 3) Creating a linear regression model using LinearRegression and fitting it to the data. 4) Transforming the data using log, exponential and polynomial transformations to reduce errors and obtain the best fit model. 5) Checking the results of each model by comparing RMSE values and selecting the best model.

Uploaded by

Sravani Adapa


Simple Linear Regression With scikit-learn

There are five basic steps when you’re implementing linear regression:

1. Import the packages and classes you need.
2. Provide data to work with and, where needed, apply appropriate transformations.
3. Create a regression model and fit it with existing data.
4. Check the results of model fitting to know whether the model is satisfactory.
5. Apply the model for predictions.

These steps are more or less general for most of the regression approaches and implementations.

Problem Statement:

A student from a certain university was asked to prepare a dataset and build a
model for predicting SAT scores from the exam taker's GPA.

Approach: Build a regression model with the target variable 'SAT_Scores', and record
the RMSE and correlation coefficient values for each transformation model.

Step 1: Import packages and classes

The first step is to import the package numpy and the class LinearRegression from sklearn.linear_model:

import numpy as np
from sklearn.linear_model import LinearRegression

Now you have all the functionality you need to implement linear regression.

The fundamental data type of NumPy is the array type called numpy.ndarray. The rest of this article
uses the term array to refer to instances of the type numpy.ndarray.

The class sklearn.linear_model.LinearRegression will be used to perform linear and polynomial
regression and make predictions accordingly.

Step 2: Provide data

The second step is defining the data to work with: the input (regressor, 𝑥) and the output (response, 𝑦). Here the input is GPA and the output is the SAT score.

SAT_GPA.csv is imported, and exploratory data analysis is performed on the data.
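
The exploratory step could look roughly like the sketch below. The original file is not available here, so a small synthetic DataFrame stands in for SAT_GPA.csv; the column names GPA and SAT_Scores, the sample size, and the generated values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for SAT_GPA.csv (assumed column names, made-up values).
rng = np.random.default_rng(0)
gpa = rng.uniform(2.0, 4.0, size=50)
sat = 200 + 100 * gpa + rng.normal(0, 30, size=50)
data = pd.DataFrame({"GPA": gpa, "SAT_Scores": sat})
# With the real file this would be: data = pd.read_csv("SAT_GPA.csv")

summary = data.describe()      # count, mean, std, min, quartiles, max per column
missing = data.isna().sum()    # number of missing values per column
r = np.corrcoef(data["GPA"], data["SAT_Scores"])[0, 1]  # Pearson correlation

print(summary)
print(missing)
print(f"correlation between GPA and SAT_Scores: {r:.3f}")
```

A scatter plot of the two columns is the usual next check before fitting a line.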

Step 3: Create a model and fit it

The next step is to create a linear regression model and fit it using the existing data.

Simple linear regression

Let’s create an instance of the class LinearRegression, which will represent the regression model:

model = LinearRegression()

This statement creates the variable model as an instance of LinearRegression. You can provide several
optional parameters to LinearRegression.
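
As a minimal sketch of creating and fitting the model, the example below uses five hypothetical (GPA, SAT score) pairs rather than the assignment's data. fit_intercept is one of the optional parameters of LinearRegression; it is set explicitly here only to show where such parameters go.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical values, not taken from SAT_GPA.csv.
x = np.array([2.0, 2.5, 3.0, 3.5, 4.0]).reshape(-1, 1)  # sklearn expects a 2D input
y = np.array([410.0, 455.0, 500.0, 560.0, 590.0])

model = LinearRegression(fit_intercept=True)  # True is already the default
model.fit(x, y)

print("intercept:", model.intercept_)
print("slope:", model.coef_[0])
print("R^2 on the fitting data:", model.score(x, y))
```

The reshape(-1, 1) call matters: passing a one-dimensional array as the input raises an error in scikit-learn.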

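statsmodels.formula.api is imported (as smf) to build an ordinary least squares (OLS) model from the data:

import statsmodels.formula.api as smf
model1=smf.ols('calories ~ weight',data=cal_data).fit()

The formula takes the form 'response ~ predictor'. For the SAT-GPA data the response is SAT_Scores and the predictor is the GPA column, so the column names and the DataFrame name in this line need to match the imported data.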

The regression line is plotted after obtaining the predicted values.

After the scatter plot is drawn, the root mean squared error (RMSE) is calculated.
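
The RMSE is the square root of the mean of the squared differences between actual and predicted values. A small helper for it might look like this; the function name rmse and the sample numbers are illustrative only.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Errors are 2 and -4, so the mean squared error is (4 + 16) / 2 = 10.
print(rmse([3.0, 5.0], [1.0, 9.0]))
```

Because RMSE is in the units of the response, values are only comparable across models when they are computed on the same scale.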

To reduce the errors and obtain the best-fit line, transformations are applied to the data.

Log transformation

In the log transformation, the transformation is applied to the x data.

#x=log(gpa),y=score

A scatter plot is drawn.

The correlation coefficient between the transformed input and the output is then obtained.

model2 is built on the transformed data.

The new regression line is plotted.

The new RMSE is calculated.
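
The log-transformation steps above can be sketched as follows. This uses scikit-learn rather than the statsmodels call shown earlier, and the data is synthetic; the variable names gpa and score follow the comment above, while the generated values and the names r_log, pred2, and rmse2 are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data standing in for the GPA and SAT score columns.
rng = np.random.default_rng(1)
gpa = rng.uniform(2.0, 4.0, size=40)
score = 300 + 250 * np.log(gpa) + rng.normal(0, 15, size=40)

# x = log(gpa), y = score
log_gpa = np.log(gpa).reshape(-1, 1)

# Correlation between the transformed input and the output.
r_log = np.corrcoef(log_gpa.ravel(), score)[0, 1]

# Fit on the transformed input and compute the RMSE of the fit.
model2 = LinearRegression().fit(log_gpa, score)
pred2 = model2.predict(log_gpa)
rmse2 = float(np.sqrt(np.mean((score - pred2) ** 2)))

print(f"r = {r_log:.3f}, RMSE = {rmse2:.2f}")
```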

Exponential transformation

In the exponential transformation, the transformation is applied to the y data.

#x=(gpa),y=log(score)

A scatter plot is drawn.

The correlation coefficient between the input and the transformed output is then obtained.

model3 is built on the transformed data.

The new regression line is plotted.

The new RMSE is calculated.
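
A sketch of the exponential model, again on synthetic data with scikit-learn. One detail the steps above leave open is the scale on which the RMSE is computed: this example assumes the predictions are converted back with exp so that the RMSE is in the original score units and comparable with the other models.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data in which log(score) is roughly linear in gpa.
rng = np.random.default_rng(2)
gpa = rng.uniform(2.0, 4.0, size=40)
score = 250 * np.exp(0.2 * gpa) * np.exp(rng.normal(0, 0.03, size=40))

# x = gpa, y = log(score)
x = gpa.reshape(-1, 1)
log_score = np.log(score)
r_exp = np.corrcoef(gpa, log_score)[0, 1]

model3 = LinearRegression().fit(x, log_score)

# Predictions come out on the log scale; exp() maps them back to score units.
pred3 = np.exp(model3.predict(x))
rmse3 = float(np.sqrt(np.mean((score - pred3) ** 2)))

print(f"r = {r_exp:.3f}, slope = {model3.coef_[0]:.3f}, RMSE = {rmse3:.2f}")
```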

Polynomial transformation

x=gpa, x^2=gpa*gpa, y=log(score)

PolynomialFeatures is imported from sklearn.preprocessing to build the polynomial regression.

The new regression line is plotted.

The RMSE is obtained from this regression model.
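
One way to build the degree-2 model with PolynomialFeatures is sketched below. The data is synthetic, the names x_poly, model4, pred4, and rmse4 are illustrative, and the back-transformation with exp before computing the RMSE is an assumption, as in the exponential case.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data standing in for the GPA and SAT score columns.
rng = np.random.default_rng(3)
gpa = rng.uniform(2.0, 4.0, size=40)
score = 250 * np.exp(0.2 * gpa) * np.exp(rng.normal(0, 0.03, size=40))

x = gpa.reshape(-1, 1)

# Expand x into the columns [gpa, gpa^2]; the intercept is left to LinearRegression.
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)

# y = log(score), fitted on the expanded features.
model4 = LinearRegression().fit(x_poly, np.log(score))
pred4 = np.exp(model4.predict(x_poly))  # back to score units
rmse4 = float(np.sqrt(np.mean((score - pred4) ** 2)))

print(f"coefficients = {model4.coef_}, RMSE = {rmse4:.2f}")
```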

The best model is chosen by comparing the RMSE values of all the transformations above.

The models and their respective RMSE values are tabulated.

From these observations, the exponential model is taken as the best.
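
The comparison can be done programmatically by picking the model with the smallest RMSE. The numbers below are placeholders chosen only to make the example run; they are not the values obtained in this assignment.

```python
# Placeholder RMSE values, NOT results from the assignment.
rmse_by_model = {
    "linear": 4.0,
    "log": 3.0,
    "exponential": 1.0,
    "polynomial": 2.0,
}

def best_model(rmse_values):
    """Return the name of the model with the lowest RMSE."""
    return min(rmse_values, key=rmse_values.get)

# Print a small table sorted from lowest to highest RMSE.
for name, value in sorted(rmse_by_model.items(), key=lambda item: item[1]):
    print(f"{name:<12} {value:.2f}")

print("best:", best_model(rmse_by_model))
```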

Step 4: Get results

Once you have your model fitted, you can get the results to check whether the model works
satisfactorily and interpret it.

The summary of the final model is printed.

The final model is fitted on train and test split data, and its predictions are examined.

The final RMSE value is then calculated.
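
A sketch of the final train/test evaluation for the exponential model follows. The data is synthetic, and the 80/20 split and random_state value are assumptions, since the split used in the assignment is not stated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the GPA and SAT score columns.
rng = np.random.default_rng(4)
gpa = rng.uniform(2.0, 4.0, size=100)
score = 250 * np.exp(0.2 * gpa) * np.exp(rng.normal(0, 0.03, size=100))
x = gpa.reshape(-1, 1)

# Hold out 20% of the rows for testing (assumed split).
x_train, x_test, y_train, y_test = train_test_split(
    x, score, test_size=0.2, random_state=0
)

# Fit the exponential model (y = log(score)) on the training rows only.
final_model = LinearRegression().fit(x_train, np.log(y_train))

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Predictions are mapped back to score units before computing the errors.
train_rmse = rmse(y_train, np.exp(final_model.predict(x_train)))
test_rmse = rmse(y_test, np.exp(final_model.predict(x_test)))

print(f"train RMSE = {train_rmse:.2f}, test RMSE = {test_rmse:.2f}")
```

A test RMSE close to the train RMSE suggests the model generalizes; a much larger test RMSE points to overfitting.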
