Linear Model
Statistical modelling can be defined as the method of using different statistical techniques to describe, analyse, and make predictions about the relationships within data. It mainly involves creating mathematical representations, or models, that capture the underlying patterns, structures, and associations in the data.
Statistical modelling techniques are methods used to analyse data and uncover relationships,
patterns, and insights within it. These techniques involve the application of statistical principles to
create models that represent the underlying structure of the data.
1. Linear Models
At the core of statistical modelling, linear models form a cornerstone. They establish relationships
between a dependent variable and one or more independent variables, assuming a linear
connection. These models offer simplicity, interpretability, and a strong theoretical basis, making
them invaluable for understanding data patterns and making predictions.
1. Real Estate Pricing: Predicting the selling price of houses based on features like square
footage, number of bedrooms, location, and age of the property. In this scenario, the
response variable (house price) is continuous, and a linear relationship is typically assumed
between the house prices and the features.
Linear Regression is employed to predict a continuous numerical outcome based on one or more
predictors. Its simplicity and interpretability make it a popular choice.
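In R, such a model can be fit with the lm() function. A minimal sketch, assuming the mtcars data and the predictors hp, drat, and wt that are interpreted later in this section (the exact formula used to produce the output below was not shown):

model <- lm(mpg ~ hp + drat + wt, data = mtcars)   # multiple linear regression of mpg on three predictors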
summary(model)
Part 1
Residual
In linear regression, a residual is the difference between the actual value and the value predicted
by the model. It is calculated as the observed value minus the predicted value. A least-squares
regression model minimizes the sum of the squared residuals. If the observed value is larger than
the predicted value, the residual is positive, and if the predicted value is larger than the observed
value, the residual is negative.
Ex: In the Residuals section of the summary output, the minimum residual was -3.3598, the median residual was -0.5099, and the maximum residual was 5.7078.
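These residuals can also be inspected directly from the fitted model; a brief sketch, assuming the model object above:

residuals(model)            # observed minus fitted values for each car
summary(residuals(model))   # min, quartiles, median, and max of the residuals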
Part 2
Coefficient
The linear regression coefficients describe the mathematical relationship between each
independent variable and the dependent variable. The p values for the coefficients indicate
whether these relationships are statistically significant.
Coefficients:
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
We can use these coefficients to form the estimated regression equation, predicted mpg = b0 + b1(hp) + b2(drat) + b3(wt), where the b values are the estimates shown in the Coefficients table.
Variables of interest in an experiment (those that are measured or observed) are called response
or dependent variables. Other variables in the experiment that affect the response and can be set
or measured by the experimenter are called predictor, explanatory, or independent variables.
Estimate: The estimated coefficient. This tells us the average increase in the response variable
associated with a one unit increase in the predictor variable, assuming all other predictor variables
are held constant.
Std. Error: This is the standard error of the coefficient. This is a measure of the uncertainty in our
estimate of the coefficient.
t value: This is the t-statistic for the predictor variable, calculated as (Estimate) / (Std. Error).
Pr(>|t|): This is the p-value that corresponds to the t-statistic. If this value is less than some alpha level (e.g. 0.05), then the predictor variable is said to be statistically significant.
If we used an alpha level of α = .05 to determine which predictors were significant in this
regression model, we’d say that hp and wt are statistically significant predictors while drat is not.
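A brief sketch of how these coefficient statistics can be pulled out of the fitted model programmatically, continuing the assumed model above:

coef(summary(model))                  # Estimate, Std. Error, t value, Pr(>|t|) for each term
coef(summary(model))[, "Pr(>|t|)"]    # just the p-values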
Part 3
Residual standard error: This tells us the average distance that the observed values fall from the
regression line. The smaller the value, the better the regression model is able to fit the data.
The degrees of freedom are calculated as n - k - 1, where n = total observations and k = number of predictors. In this example, mtcars has 32 observations and we used 3 predictors in the regression model, so the degrees of freedom are 32 - 3 - 1 = 28.
Multiple R-Squared: This is known as the coefficient of determination. It tells us the proportion of
the variance in the response variable that can be explained by the predictor variables.
This value ranges from 0 to 1. The closer it is to 1, the better the predictor variables are able to
predict the value of the response variable.
Adjusted R-squared: This is a modified version of R-squared that has been adjusted for the number of predictors in the model. Because it penalizes the addition of predictors, it is never higher than the R-squared.
The adjusted R-squared can be useful for comparing the fit of different regression models that use
different numbers of predictor variables.
F-statistic: This indicates whether the regression model provides a better fit to the data than a
model that contains no independent variables. In essence, it tests if the regression model as a
whole is useful.
p-value: This is the p-value that corresponds to the F-statistic. If this value is less than some
significance level (e.g. 0.05), then the regression model fits the data better than a model with no
predictors.
“When building regression models, we hope that this p-value is less than some significance level
because it indicates that the predictor variables are actually useful for predicting the value of the
response variable.”
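These overall fit statistics can also be extracted programmatically; a minimal sketch, again assuming the model above:

s <- summary(model)
s$sigma            # residual standard error
s$r.squared        # multiple R-squared
s$adj.r.squared    # adjusted R-squared
s$fstatistic       # F-statistic with its numerator and denominator degrees of freedom
pf(s$fstatistic[1], s$fstatistic[2], s$fstatistic[3], lower.tail = FALSE)   # p-value for the F-statistic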
II. ANOVA (Analysis of Variance) compares means across different groups, which is particularly useful for experimental designs.
In R, an ANOVA can be run with the built-in aov function or the anova_test function from the rstatix package, which provides a user-friendly framework for performing ANOVA tests; a brief fitting sketch appears before the output below.
One-way ANOVA: When there is a single categorical independent variable (also known as a factor) and a single continuous dependent variable, a one-way ANOVA is employed. It seeks to ascertain whether there are any notable variations in the dependent variable’s means across the levels of the independent variable.
[A one-way ANOVA is used to determine whether or not there is a statistically significant
difference between the means of three or more independent groups.]
Two-way ANOVA: When there are two categorical independent variables (factors) and one continuous dependent variable, a two-way ANOVA is used as an extension of the one-way ANOVA. It lets you evaluate both the main effect of each independent variable and their interaction effect on the dependent variable.
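The output below comes from a one-way ANOVA comparing weight loss across three workout programs. A minimal sketch of the fit, where the data frame name df and the column name weight_loss are assumptions (the factor program appears in the output):

model <- aov(weight_loss ~ program, data = df)   # one-way ANOVA (assumed variable names)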
summary(model)
             Df Sum Sq Mean Sq F value   Pr(>F)
program       2  98.93   49.46   30.83 7.55e-11 ***
Residuals    87 139.57    1.60
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Df program: The degrees of freedom for the variable program. This is calculated as #groups - 1. In this case, there were 3 different workout programs, so this value is: 3 - 1 = 2.
Df Residuals: The degrees of freedom for the residuals. This is calculated as #total observations - #groups. In this case, there were 90 observations and 3 groups, so this value is: 90 - 3 = 87.
Sum Sq program: The sum of squares associated with the variable program. This value is 98.93.
Sum Sq Residuals: The sum of squares associated with the residuals or “errors.” This value is 139.57.
Mean Sq. Program: The mean sum of squares associated with program. This is calculated as Sum Sq.
program / Df program. In this case, this is calculated as: 98.93 / 2 = 49.46.
Mean Sq. Residuals: The mean sum of squares associated with the residuals. This is calculated as
Sum Sq. residuals / Df residuals. In this case, this is calculated as: 139.57 / 87 = 1.60.
F Value: The overall F-statistic of the ANOVA model. This is calculated as Mean Sq. program / Mean
sq. Residuals. In this case, it is calculated as: 49.46 / 1.60 = 30.83.
Pr(>F): The p-value associated with the F-statistic with numerator df = 2 and denominator df = 87. In this case, the p-value is 7.552e-11, which is extremely small.
The most important value in the entire output is the p-value because this tells us whether there is a
significant difference in the mean values between the three groups.
Since the p-value in our ANOVA table (7.552e-11) is less than .05, we have sufficient evidence to reject the null hypothesis.
Following up with pairwise post-hoc comparisons (such as Tukey’s HSD, sketched below), since each of the adjusted p-values is less than .05, we can conclude that there is a significant difference in mean weight loss between each pair of groups.
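Those adjusted p-values typically come from a post-hoc pairwise comparison; a minimal sketch using Tukey’s HSD on the aov fit above:

TukeyHSD(model)   # pairwise differences in mean weight loss with adjusted p-values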
2. Generalised Linear Models (GLMs)
Generalised Linear Models (GLMs) expand the capabilities of linear models by accommodating a
wider range of response variable types. Traditional linear regression assumes a normal distribution
for the outcome, whereas GLMs can handle response variables that follow different probability
distributions.
1. Disease Diagnosis: Predicting the probability of a patient having a particular disease (say,
diabetes) based on various factors like age, body mass index, family history, and blood
pressure. This is a binary outcome (disease: yes/no), making logistic regression, a type of
GLM, appropriate.
2. Traffic Accident Count Analysis: Modeling the number of traffic accidents occurring at an
intersection based on factors like traffic volume, day of the week, and weather conditions.
Since the response variable (number of accidents) is a count, a Poisson regression, which is a
GLM suitable for count data, would be appropriate.
Logistic Regression is tailored for predicting binary outcomes, making it invaluable for classification
tasks.
Poisson Regression is suitable for count data, modelling phenomena like the number of occurrences within a specific time period.
In R, a GLM can be fit with the glm() function, which takes the form glm(formula, family, data), where:
formula: A symbolic description of the model to be fitted.
family: The statistical family to use to fit the model. The default is gaussian, but other options include binomial, Gamma, and poisson, among others.
data: The name of the data frame that contains the data.
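A brief sketch of how the family argument selects the model type; the data frame df and the variable names here are hypothetical:

logit_model <- glm(outcome ~ x1 + x2, data = df, family = binomial)   # logistic regression for a binary outcome
pois_model  <- glm(counts ~ x1 + x2, data = df, family = poisson)     # Poisson regression for a count outcome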
head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
We will use the variables disp and hp to predict the probability that a given car takes on a value of 1
for the am variable.
data(mtcars)
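# Fit the logistic regression model described above. The original fit call was
# not preserved, so this line is a sketch based on that description.
model <- glm(am ~ disp + hp, data = mtcars, family = binomial)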
summary(model)
Call:
Part 1
Deviance Residuals:
Part 2
Coefficients:
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
For example, a one unit increase in the predictor variable disp is associated with an average change
of -0.09518 in the log odds of the response variable am taking on a value of 1. This means that higher
values of disp are associated with a lower likelihood of the am variable taking on a value of 1.
The standard error gives us an idea of the variability associated with the coefficient estimate. We
then divide the coefficient estimate by the standard error to obtain a z value.
For example, the z value for the predictor variable disp is calculated as -.09518 / .048 = -1.983.
The p-value Pr(>|z|) tells us the probability associated with a particular z value. This essentially tells
us how well each predictor variable is able to predict the value of the response variable in the model.
For example, the p-value associated with the z value for the disp variable is .0474. Since this value is
less than .05, we would say that disp is a statistically significant predictor variable in the model.
The null deviance in the output tells us how well the response variable can be predicted by a model
with only an intercept term.
Part 3
AIC: 22.713
The residual deviance tells us how well the response variable can be predicted by the specific model
that we fit with p predictor variables. The lower the value, the better the model is able to predict the
value of the response variable.
To judge overall fit, we can compute a Chi-Square statistic as the null deviance minus the residual deviance, with degrees of freedom equal to the number of predictors, and then find the p-value associated with this Chi-Square statistic. The lower the p-value, the better the model is able to fit the dataset compared to a model with just an intercept term.
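A minimal sketch of that comparison, assuming the glm fit above (the degrees of freedom equal the number of predictors, here 2):

chisq_stat <- model$null.deviance - model$deviance   # improvement over the intercept-only model
pchisq(chisq_stat, df = 2, lower.tail = FALSE)       # p-value for the overall model fit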
3. Nonlinear Models
Nonlinear models represent complex relationships between variables that straight lines cannot adequately capture. These models offer greater flexibility to fit data exhibiting curves, peaks, or other non-linear patterns.
By accommodating a wider range of functional forms, nonlinear models often provide more accurate
and informative insights in comparison to their linear counterparts. We employ Nonlinear Least
Squares to fit models with complex, non-linear patterns in the data.
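A minimal sketch of nonlinear least squares using R’s nls() function; the data, model form, and starting values below are hypothetical:

set.seed(1)
df <- data.frame(x = 1:25)
df$y <- 5 * exp(0.15 * df$x) + rnorm(25)     # simulated exponential growth with noise
fit <- nls(y ~ a * exp(b * x), data = df,
           start = list(a = 1, b = 0.1))     # starting values for the parameters a and b
summary(fit)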
A likelihood ratio test compares the goodness of fit of two nested regression models.
A nested model is simply one that contains a subset of the predictor variables in the overall
regression model.
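The specific models behind the output below were not shown; purely as an illustration, a pair of nested linear models on mtcars might be defined like this before calling lrtest() (the formulas are assumptions, so the log-likelihoods will not necessarily match the values below):

model_full    <- lm(mpg ~ disp + hp + drat + wt, data = mtcars)   # full model
model_reduced <- lm(mpg ~ disp + wt, data = mtcars)               # nested model with a subset of the predictors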
library(lmtest)
lrtest(model_full, model_reduced)
  #Df  LogLik Df  Chisq Pr(>Chisq)
1   6 -77.558

From the output we can see that the Chi-Squared test statistic is 2.0902 and the corresponding p-value is 0.3517. Since this p-value is not less than .05, we fail to reject the null hypothesis; the reduced model fits the data about as well as the full model.
A second likelihood ratio test, comparing a different pair of nested models (not shown), produced the following output:

lrtest(model_full, model_reduced)

  #Df  LogLik Df  Chisq Pr(>Chisq)
1   4 -78.603
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

From the output we can see that the p-value of the likelihood ratio test is 0.008136. Since this is less than .05, we would reject the null hypothesis and conclude that the full model offers a significantly better fit than the reduced model.