
BCA V SEM R PROGRAMMING RAJADHANI DEGREE COLLEGE

UNIT-5
What is Linear Regression?
• It is a statistical method used for predictive analysis.
• The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Because the relationship is linear, the model describes how the value of the dependent variable changes as the value of the independent variable changes. Mathematically, it is denoted as y = ax + b, where a is the slope and b is the intercept. For example, with a = 2 and b = 3, an input of x = 5 predicts y = 2(5) + 3 = 13.
Linear Regression Line: A straight line showing the relationship between the dependent and independent variables is called a regression line. A regression line can show two types of relationship:
Positive linear relationship: if the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, the relationship is termed a positive linear relationship.
Negative linear relationship: if the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, the relationship is called a negative linear relationship.
There are two types of linear regression:
• Simple Linear Regression
• Multiple Linear Regression
Simple Linear Regression
Simple linear regression is used to estimate the relationship between two
quantitative variables. Simple linear regression uses only one independent
variable. You can use simple linear regression when you want to know:
• How strong the relationship is between two variables (e.g., the relationship between rainfall and soil erosion).
• The value of the dependent variable at a certain value of the independent variable (e.g., the amount of soil erosion at a certain level of rainfall).
Assumptions of simple linear regression
Simple linear regression is a parametric test, meaning that it makes certain
assumptions about the data. These assumptions are:
Homogeneity of variance (homoscedasticity): the size of the error in our
prediction doesn’t change significantly across the values of the independent
variable.


Independence of observations: the observations in the dataset were collected using statistically valid sampling methods, and there are no hidden relationships among observations.
Normality: The data follows a normal distribution.
Linear regression makes one additional assumption:
The relationship between the independent and dependent variable is linear: the
line of best fit through the data points is a straight line (rather than a curve or
some sort of grouping factor).

Multiple linear regression: Multiple linear regression is used to estimate the relationship between two or more independent variables and one dependent variable. You can use multiple linear regression when you want to know:
• How strong the relationship is between two or more independent variables and one dependent variable (e.g., how rainfall, temperature, and amount of fertilizer added affect crop growth).
• The value of the dependent variable at certain values of the independent variables (e.g., the expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).
Assumptions of multiple linear regression
Multiple linear regression makes all of the same assumptions as simple linear regression, namely homogeneity of variance, independence of observations, normality, and linearity.
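
As an illustration, here is a minimal sketch of a multiple linear regression fit in R. The crops data frame and all of its values are hypothetical, invented only for this example:

# Hypothetical crop data (invented values, for illustration only)
crops <- data.frame(
  rainfall = c(110, 95, 130, 120, 100, 140, 105, 125),
  temperature = c(22, 24, 21, 23, 25, 20, 24, 22),
  fertilizer = c(50, 40, 60, 55, 45, 65, 48, 58),
  yield = c(3.1, 2.6, 3.8, 3.4, 2.8, 4.0, 2.9, 3.6)
)

# Fit a model with three predictors and one response
multi.lm <- lm(yield ~ rainfall + temperature + fertilizer, data = crops)
summary(multi.lm)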

Implementation in R
In R programming, the lm() function is used to create a linear regression model.
Syntax: lm(formula, data)
• formula: a symbolic description of the model to be fitted, written in the form response ~ predictor1 + predictor2 + ... (the response is the dependent variable and the predictors are the independent variables; simple linear regression uses a single predictor, multiple linear regression uses two or more).
• data: specifies the data frame containing the variables in the formula.
Example:
# Create the data frame
data <- data.frame(
  Years_Exp = c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7),
  Salary = c(39343.00, 46205.00, 37731.00, 43525.00, 39891.00,
             56642.00, 60150.00, 54445.00, 64445.00, 57189.00)
)


# Fit simple linear regression
lm.r <- lm(formula = Salary ~ Years_Exp, data = data)

# Summary of the model
summary(lm.r)
Output:

Call:
lm(formula = Salary ~ Years_Exp, data = data)
Residuals:
1 2 3 5 6 8 10
463.1 5879.1 -4041.0 -6942.0 4748.0 381.9 -489.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30927 4877 6.341 0.00144 **
Years_Exp 7230 1983 3.645 0.01482 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4944 on 5 degrees of freedom
Multiple R-squared: 0.7266, Adjusted R-squared: 0.6719
F-statistic: 13.29 on 1 and 5 DF, p-value: 0.01482
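
The fitted intercept and slope can also be extracted directly from the model object with the coef() function; a brief sketch using the lm.r model above:

# Named vector of estimates: (Intercept) and Years_Exp
coef(lm.r)
# Estimated change in Salary for each additional year of experience
coef(lm.r)["Years_Exp"]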

predict() function in R
predict() is a built-in R function used to obtain predicted values from a fitted model, such as one produced by lm().
Syntax:
predict(object, newdata, interval)
• object: the fitted model object (e.g., the result of lm())
• newdata: a data frame of new input values at which to predict
• interval: the type of interval calculation (e.g., "confidence" or "prediction")
Example:
# Use the built-in cars dataset (stopping distance vs. speed)
df <- datasets::cars
head(df, 10)
   speed dist
1      4    2
2      4   10
3      7    4
4      7   22


5      8   16
6      9   10
7     10   18
8     10   26
9     10   34
10    11   17
# Create a linear model
my_linear_model <- lm(dist ~ speed, data = df)

# Print the model results
my_linear_model

Output:
Call:
lm(formula = dist ~ speed, data = df)

Coefficients:
(Intercept)        speed
    -17.579        3.932
# Create a data frame of new speed values
variable_speed <- data.frame(speed = c(11, 11, 12, 12, 12, 12, 13, 13, 13, 13))

# Fit the linear model
linear_model <- lm(dist ~ speed, data = df)

# Predict dist at the new speed values
predict(linear_model, newdata = variable_speed)

Output:

1 2 3 4 5
25.67740 25.67740 29.60981 29.60981 29.60981
6 7 8 9 10
29.60981 33.54222 33.54222 33.54222 33.54222
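
The interval argument listed in the syntax above can be demonstrated as well; a minimal sketch, reusing the linear_model and variable_speed objects from this example, that requests 95% confidence intervals around each predicted value:

# Returns a matrix with columns fit, lwr and upr
predict(linear_model, newdata = variable_speed,
        interval = "confidence", level = 0.95)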
LINEAR MODEL SELECTION
It is often the case that some or many of the variables used in a multiple regression model are in fact not associated with the response variable. Including such irrelevant variables leads to unnecessary complexity in the resulting model. Unfortunately, manually filtering through and comparing regression models can be tedious. Luckily, several approaches exist for automatically performing feature selection or variable selection, that is, for identifying those variables that result in superior regression results. This leads to the concept of model selection.
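
One widely used approach in base R is stepwise selection with the step() function, which repeatedly adds or drops predictors to minimise the AIC. A minimal sketch, assuming the hypothetical crops data frame from the multiple regression example above:

# Full model with all candidate predictors
full.lm <- lm(yield ~ rainfall + temperature + fertilizer, data = crops)

# Stepwise search in both directions, guided by AIC
best.lm <- step(full.lm, direction = "both")
summary(best.lm)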


Linear Regression Diagnostics in R
Linear regression diagnostics in R are essential for assessing the validity and reliability of the linear regression model's assumptions and for detecting potential issues that may affect the model's performance. The key diagnostic checks are:

• Residual analysis: check for patterns in the residuals.
• Outlier detection: identify potential outliers.
• Influence and Cook's distance: identify influential observations.
• Multicollinearity: check for high correlation between predictors.
Example:
# Residual analysis: fit a model, e.g. the salary model from earlier
model <- lm(Salary ~ Years_Exp, data = data)

# Create the four standard diagnostic plots (residuals vs. fitted values,
# normal Q-Q plot, scale-location plot, and residuals vs. leverage)
par(mfrow = c(2, 2))
plot(model)
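
The remaining checks in the list above can be sketched as follows. cooks.distance() is part of base R; vif() comes from the car package, which is assumed to be installed for this sketch, and the model in the last line reuses the hypothetical crops data frame introduced earlier:

# Influence: Cook's distance for each observation
cd <- cooks.distance(model)
# A common rule of thumb flags observations with distance > 4/n
which(cd > 4 / length(cd))

# Multicollinearity: variance inflation factors (needs two or more predictors)
library(car)   # assumed installed; provides vif()
vif(lm(yield ~ rainfall + temperature + fertilizer, data = crops))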
