Linear Regression
Linear regression is a regression model that uses a straight line to describe the relationship
between variables. It finds the line of best fit through your data by searching for the values of
the regression coefficient(s) that minimize the sum of squared errors (residuals) of the model.
Simple linear regression models describe the effect that a particular variable, called the
explanatory variable, might have on the value of a continuous outcome variable, called the
response variable.
General Concepts
The purpose of a linear regression model is to come up with a function that estimates the
mean of one variable given a particular value of another variable. These variables are known
as the response variable (the “outcome” variable whose mean you are attempting to find) and
the explanatory variable (the “predictor” variable whose value you already have).
Residual Assumptions
In the simple linear model y = β0 + β1x + eps, the error term eps is assumed to be normally
distributed: eps ∼ N(0, σ2). In particular:
• eps is centered at zero (that is, it has a mean of zero).
• The variance of eps, σ2, is constant for all values of the predictor.
Parameters
The value denoted by β0 is called the intercept, and that of β1 is called the slope. Together,
they are also referred to as the regression coefficients and are interpreted as follows:
• The intercept, β0, is interpreted as the expected value of the response variable when the
predictor is zero.
• Generally, the slope, β1, is the focus of interest. This is interpreted as the change in the
mean response for each one-unit increase in the predictor.
Example:
The first argument to lm is the now-familiar response ~ predictor formula, which specifies the
desired model. The data argument (data=datasource) tells lm which object contains the
variables named in the formula.
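As a minimal sketch with the blood-pressure data used below (assuming bprep is a data frame
containing the columns sbp, age, and weight):
model_fit <- lm(sbp ~ age + weight, data = bprep)   # fit sbp on age and weight and store the "lm" object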
If you simply enter the name of the "lm" object at the prompt, it will provide the most basic
output: a repeat of your call and the estimates of the intercept (β̂0) and slope(s) (β̂1).
Call:
lm(formula = sbp ~ age + weight, data = bprep)
Coefficients:
(Intercept) age weight
43.8642 0.2734 0.4398
The fitted model is built from these coefficients: predicted sbp = 43.8642 + 0.2734 × age + 0.4398 × weight.
Std. Error is the standard error of the coefficient estimate; it measures how much the estimate
would typically vary from sample to sample.
t value = Estimate / Std. Error.
Pr(>|t|) gives the p-value for the t-test of whether the coefficient differs significantly from zero.
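As a quick check against the coefficient table printed further below, the t value is just the ratio
of the first two columns, and the whole table can be extracted from the summary object:
43.8642 / 40.6649                  # intercept Estimate / Std. Error ≈ 1.079, the printed t value
summary(model_fit)$coefficients    # matrix of Estimate, Std. Error, t value, and Pr(>|t|)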
R-squared measures the percentage of the variance in the response variable that is explained
by the regression.
Multiple R-squared typically increases each time you add a predictor (x) variable.
Adjusted R-squared penalizes each additional predictor (to discourage overfitting), so it may
not increase as you add more variables.
If your Multiple R-squared is much higher than your Adjusted R-squared, your model might
be overfitting.
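Both quantities are stored in the summary of the fitted object; for the model_fit object used in
this section:
summary(model_fit)$r.squared       # Multiple R-squared
summary(model_fit)$adj.r.squared   # Adjusted R-squared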
The fitted regression line can be added to a plot of the data with the function abline, as in:
abline(survfit, lwd=2)
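Note that abline only draws a single straight line, so this is meaningful for a simple
(one-predictor) regression. A minimal sketch, assuming survfit was fitted as lm(y ~ x, data = dat),
where dat, x, and y are placeholder names:
plot(y ~ x, data = dat)    # scatterplot of the raw data
abline(survfit, lwd = 2)   # overlay the fitted regression line at double line width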
confint() function:
You can use the confint() function in R to calculate a confidence interval for one or more
parameters in a fitted regression model.
For example:
confint(model_fit)
2.5 % 97.5 %
(Intercept) -131.1025620 218.830951
age -1.6770934 2.223863
weight -0.2633077 1.142859
coef() function:
To extract the coefficients of an "lm" object, the “direct-access” function to use is coef().
Ex:
Here, the regression coefficients are extracted from the object and then separately assigned to
the objects beta0.hat and beta1.hat.
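A sketch of that extraction, assuming survfit is a fitted simple regression ("lm") object like the
one used with abline above:
beta0.hat <- coef(survfit)[1]   # estimated intercept
beta1.hat <- coef(survfit)[2]   # estimated slope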
predict() function:
Given a fitted model, predict() produces predicted values of the response for a new set of data.
Syntax: predict(object, newdata, interval, level), where object is the fitted "lm" object,
newdata is a data frame containing values of the predictors, interval specifies the type of
interval ("confidence" or "prediction"), and level is the confidence level.
#summary of model_fit
summary(model_fit)
Call:
lm(formula = sbp ~ age + weight, data = bprep)
Residuals:
1 2 3 4 5
-0.6432 -0.4364 8.6594 -3.9865 -3.5933
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.8642 40.6649 1.079 0.394
age 0.2734 0.4533 0.603 0.608
weight 0.4398 0.1634 2.691 0.115
confint(model_fit, level=0.95)
#predict sbp for the given age=40 and weight=200 with a confidence interval
newdata <- data.frame(age = 40, weight = 200)
ci <- predict(model_fit, newdata, interval = "confidence", level = 0.95)
ci
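The "confidence" interval above is for the mean sbp at the given age and weight. Setting
interval = "prediction" instead gives the wider interval for a single new individual; a small
sketch under the same assumptions:
pred.int <- predict(model_fit, newdata, interval = "prediction", level = 0.95)
pred.int   # columns fit, lwr, and upr for an individual new observation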
Hypothesis testing is performed with the objective of obtaining a p-value in order to quantify
the evidence against the null hypothesis H0. H0 is rejected in favour of the alternative, Ha, if
the p-value is less than a predefined significance level α, which is conventionally 0.05 or 0.01.
To be able to test the validity of your rejection or retention of the null hypothesis, you must
be able to identify two kinds of errors:
Type I Errors
• A Type I error occurs when you incorrectly reject a true H0. In any given hypothesis test,
the probability of a Type I error is equivalent to the significance level α.
If your p-value is less than α, you reject the null hypothesis. If the null is really true, though,
α directly defines the probability that you incorrectly reject it. This is referred to as a Type
I error.
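A small simulation sketch (not from the text) illustrates this: when H0 is really true, roughly a
proportion α of tests end up rejecting it.
set.seed(1)
# run 10,000 one-sample t-tests of H0: mu = 0 on samples that truly have mean 0
pvals <- replicate(10000, t.test(rnorm(30, mean = 0), mu = 0)$p.value)
mean(pvals < 0.05)   # close to 0.05, i.e. the Type I error rate equals alpha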
Type II Errors
A Type II error refers to incorrect retention of the null hypothesis—in other words, obtaining
a p-value greater than the significance level when it’s the alternative hypothesis that’s
actually true. For the same scenario you’ve been looking at so far (an upper-tailed test for a
single sample mean), the probability of a Type II error is denoted β.
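The probability of a Type II error depends on the true effect size, the sample size, and α. As a
rough sketch (the values of n, delta, and sd below are assumed, not from the text),
power.t.test returns the power of such a one-sample, upper-tailed test, and β is one minus
that power:
pw <- power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05,
                   type = "one.sample", alternative = "one.sided")
1 - pw$power   # probability of a Type II error under these assumed settings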
ANOVA (Analysis of Variance):
ANOVA is a statistical test used to determine if there are significant differences between
the means of two or more groups. It analyzes the variance within and between groups to
assess whether the differences observed are due to random chance or actual group differences.
ANOVA is commonly used when you have a continuous dependent variable and one or
more categorical independent variables with multiple levels. The test compares the means
across the groups and calculates an F-statistic and p-value to determine if the differences are
statistically significant.
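In R, a one-way ANOVA of this kind is usually fitted with aov(); a minimal sketch with
placeholder names (a continuous response y and a categorical factor group in a data frame dat,
all assumed):
fit <- aov(y ~ group, data = dat)   # one-way ANOVA: do the group means of y differ?
summary(fit)                        # ANOVA table with the F-statistic and p-value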
Chi-Square Test:
The Chi-Square test is a statistical test used to examine the association or independence
between two categorical variables. It compares the observed frequencies of each category
with the expected frequencies under the assumption of independence. The test determines
whether there is a significant relationship between the variables based on the discrepancies
between the observed and expected frequencies.
Chi-Square tests are often used when you have categorical data and want to determine if
there is a relationship between two variables. They are commonly used in fields such as the
social sciences.
Thus, ANOVA is used to compare means across multiple groups with continuous
dependent variables and categorical independent variables. On the other hand, Chi-
Square tests assess the association or independence between categorical variables. The
choice between ANOVA and Chi-Square depends on the nature of the variables you are
analyzing and the research question you want to answer.
Types of Chi-Square Tests
Single Categorical Variable:
Like the Z-test, the one-dimensional chi-squared test is also concerned with
comparing proportions but in a setting where there are more than two
proportions. A chi-squared test is used when you have k levels (or categories) of
a categorical variable and want to hypothesize about their relative frequencies to
find out what proportion of n observations fall into each defined category.
The test statistic is
χ2 = Σi (Oi − Ei)2 / Ei,
where Oi is the observed count and Ei is the expected count in the ith category, i = 1, ..., k.
The Oi are obtained directly from the raw data, and the expected counts, Ei = nπ0(i), are
merely the product of the overall sample size n with the respective null proportion for each
category.
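A sketch with made-up counts and null proportions (k = 3 categories, all values assumed for
illustration):
observed   <- c(20, 30, 50)                 # Oi, hypothetical observed counts (n = 100)
null.props <- c(0.25, 0.25, 0.5)            # π0(i), hypothesized proportions (must sum to 1)
chisq.test(x = observed, p = null.props)    # one-dimensional (goodness-of-fit) chi-squared test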
Two Categorical Variables:
The chi-squared test can also apply to the situation in which you have two
mutually exclusive and exhaustive categorical variables at hand—call them
variable A and variable B. It is used to detect whether there might be some
influential relationship (in other words, dependence) between A and B by
looking at the way in which the distributions of frequencies change together across their
categories. If there is no relationship, the distribution of frequencies in variable A will have
nothing to do with the distribution of frequencies in variable B. As such, this particular
variant of the chi-squared test is called a test of independence and is always performed with
the following hypotheses:
H0: Variables A and B are independent. (There is no relationship between A and B.)
HA: Variables A and B are not independent. (There is a relationship between A and B.)
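A sketch using a hypothetical 2 × 2 table of counts for variables A and B:
tab <- matrix(c(20, 30, 25, 25), nrow = 2)   # hypothetical cross-tabulation of A (rows) by B (columns)
chisq.test(tab)                              # test of independence; a small p-value favours HA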
Types of ANOVA
1. One-way ANOVA
2. Two-way ANOVA
A one-way ANOVA only involves one factor or independent variable. A two-way ANOVA
involves two independent variables and one dependent variable. The number of
observations (sample size) need not be the same in each group.
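A sketch of both forms with placeholder names (y a continuous response; f1 and f2 categorical
factors in a data frame dat, all assumed):
one.way <- aov(y ~ f1, data = dat)        # one factor
two.way <- aov(y ~ f1 + f2, data = dat)   # two factors (use f1 * f2 to also include their interaction)
summary(one.way)
summary(two.way)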