UNIT-5
What is Linear Regression?
It is a statistical method used for predictive analysis. Linear regression
models a linear relationship between a dependent variable (y) and one or more
independent variables (x), hence the name linear regression. Because the
relationship is linear, the model describes how the value of the dependent
variable changes as the value of the independent variable changes. It is
mathematically denoted as y = ax + b, where a is the slope and b is the
intercept.
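The equation y = ax + b can be sketched directly in R; the slope and intercept values below are illustrative, not taken from the notes:

```r
# Illustrative line y = a*x + b with assumed slope a = 2 and intercept b = 5
a <- 2
b <- 5
x <- c(1, 2, 3, 4)  # independent variable
y <- a * x + b      # dependent variable predicted by the line
y                   # 7 9 11 13
```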
Linear Regression Line: A straight line showing the relationship between the
dependent and independent variables is called a regression line. A regression line
can show two types of relationship:
Positive Linear Relationship: If the dependent variable on the Y-axis increases
as the independent variable on the X-axis increases, the relationship is termed
a positive linear relationship.
Negative Linear Relationship: If the dependent variable on the Y-axis decreases
as the independent variable on the X-axis increases, the relationship is termed
a negative linear relationship.
There are two types of linear regression.
Simple Linear Regression
Multiple Linear Regression
Simple Linear Regression
Simple linear regression is used to estimate the relationship between two
quantitative variables. Simple linear regression uses only one independent
variable. You can use simple linear regression when you want to know:
How strong the relationship is between two variables (e.g., the relationship
between rainfall and soil erosion).
The value of the dependent variable at a certain value of the independent
variable (e.g., the amount of soil erosion at a certain level of rainfall).
Assumptions of simple linear regression
Simple linear regression is a parametric test, meaning that it makes certain
assumptions about the data. These assumptions are:
Homogeneity of variance (homoscedasticity): the size of the error in our
prediction doesn’t change significantly across the values of the independent
variable.
Independence of observations: the observations were collected using
statistically valid sampling methods, and there are no hidden relationships
among them.
Normality: the data follows a normal distribution.
Linearity: the line of best fit through the data points is a straight line,
rather than a curve.
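One quick way to eyeball the homoscedasticity assumption is to plot residuals against fitted values; the sketch below uses R's built-in cars dataset as an assumed example:

```r
# Visual check of homoscedasticity: the spread of residuals should stay
# roughly constant across the range of fitted values
fit <- lm(dist ~ speed, data = cars)
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)  # residuals should scatter evenly around this line
```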
INNAHAI ANUGRAHAM
BCA V SEM R PROGRAMMING RAJADHANI DEGREE COLLEGE
Implementation in R
In R programming, the lm() function is used to create a linear regression model.
Syntax: lm(formula, data)
formula: a symbolic description of the model to be fitted, written in the
form response ~ predictor1 + predictor2 + … (the response is the dependent
variable and the predictors are independent variables; simple linear
regression has only one predictor, while multiple linear regression has
more than one).
data: the data frame containing the variables in the formula.
Example:
# Create the data frame
data <- data.frame(
  Years_Exp = c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7),
  Salary = c(39343.00, 46205.00, 37731.00, 43525.00,
             39891.00, 56642.00, 60150.00, 54445.00, 64445.00,
             57189.00)
)

# Fit the model and print its summary
model <- lm(Salary ~ Years_Exp, data = data)
summary(model)
Output:
Call:
lm(formula = Salary ~ Years_Exp, data = data)
Residuals:
1 2 3 5 6 8 10
463.1 5879.1 -4041.0 -6942.0 4748.0 381.9 -489.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30927 4877 6.341 0.00144 **
Years_Exp 7230 1983 3.645 0.01482 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4944 on 5 degrees of freedom
Multiple R-squared: 0.7266, Adjusted R-squared: 0.6719
F-statistic: 13.29 on 1 and 5 DF, p-value: 0.01482
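The quantities in this summary can also be pulled out programmatically with standard accessor functions; a small sketch (note that the fit here uses all ten rows of the data frame, so the numbers may differ slightly from the printout above):

```r
# Refit the model on the full data frame
data <- data.frame(
  Years_Exp = c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7),
  Salary = c(39343, 46205, 37731, 43525, 39891,
             56642, 60150, 54445, 64445, 57189)
)
model <- lm(Salary ~ Years_Exp, data = data)

coef(model)               # named vector: intercept and slope
summary(model)$r.squared  # proportion of variance explained
residuals(model)          # one residual per observation
```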
predict() function in R
predict() is a built-in R function used to obtain predicted values from a fitted
model; it works with linear models as well as many other model types used by
analysts.
Syntax:
predict(object, newdata, interval)
object: a fitted model object (for example, the result of lm())
newdata: a data frame of input values at which to predict
interval: the type of interval calculation
Example:
# Load the built-in cars dataset and view the first rows
df <- datasets::cars
head(df, 10)
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
7 10 18
8 10 26
9 10 34
10 11 17
# Create a linear model
my_linear_model <- lm(dist ~ speed, data = df)

# Predict stopping distances for new speed values
# (these speeds are reconstructed to match the output shown below)
new_df <- data.frame(speed = c(11, 11, 12, 12, 12, 12, 13, 13, 13, 13))
predict(my_linear_model, newdata = new_df)
1 2 3 4 5
25.67740 25.67740 29.60981 29.60981 29.60981
6 7 8 9 10
29.60981 33.54222 33.54222 33.54222 33.54222
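The interval argument can also be used to attach confidence bounds to each prediction; a minimal sketch on the same cars data:

```r
# Point predictions plus 95% confidence bounds for the mean response
my_linear_model <- lm(dist ~ speed, data = datasets::cars)
new_speeds <- data.frame(speed = c(11, 12, 13))
predict(my_linear_model, newdata = new_speeds, interval = "confidence")
# Returns a matrix with columns fit (prediction), lwr and upr (bounds)
```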
LINEAR MODEL SELECTION
It is often the case that some or many of the variables used in a multiple regression
model are in fact not associated with the response variable. Including such
irrelevant variables leads to unnecessary complexity in the resulting model.
Unfortunately, manually filtering through and comparing regression models can
be tedious. Luckily, several approaches exist for automatically performing feature
selection or variable selection — that is, for identifying those variables that result
in superior regression results. This leads to the concept of model selection.
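One common automatic approach in R is stepwise selection with the built-in step() function, which adds or drops predictors to minimise AIC; the sketch below uses the built-in mtcars dataset as an assumed example, since the notes do not name one:

```r
# Start from a model with several candidate predictors of fuel efficiency
full_model <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)

# Let step() search for a lower-AIC subset of predictors
selected <- step(full_model, direction = "both", trace = 0)
formula(selected)  # the formula of the model step() settled on
```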
Diagnostic procedures in R check whether a fitted linear model satisfies the
regression assumptions. The key diagnostic checks are produced by calling
plot() on a fitted model:
# Create diagnostic plots (residuals vs. fitted values, normal Q-Q,
# scale-location, and residuals vs. leverage)
model <- lm(dist ~ speed, data = cars)  # example fit; any lm model works
par(mfrow = c(2, 2))
plot(model)
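Alongside the plots, normality of the residuals can be tested numerically; the sketch below refits the cars model so it is self-contained, and uses the base R shapiro.test() function:

```r
# Shapiro-Wilk test on the residuals; a small p-value suggests
# the residuals depart from normality
model <- lm(dist ~ speed, data = datasets::cars)
shapiro.test(residuals(model))
```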