ST201 Project Report 2023 Mark72

The document describes a study that uses linear regression to analyze factors related to anxiety levels during the COVID-19 pandemic. Survey responses are used as data to build models identifying key predictors of variance in anxiety. Stepwise selection identified that older age relates to lower anxiety and higher COVID risk to higher anxiety, though whether the effect of COVID risk varies with age is ambiguous. Models with 10 features minimized the BIC used for model selection.

Uploaded by

cweqing

ST201 Project Coversheet

Project Title: Studying Generalized Anxiety Levels During COVID-19 Using Linear
Regression Analysis

Word count: 2483

Candidate IDs: 45448 46201 44320
Date: 4 May 2023

Permission to use as an example

We give permission for our assignment to be used as an example in ST201 (in an electronic
form).
Abstract
Many concerns have recently been raised regarding COVID-19-related mental health
issues. This study uses responses to a European survey to build a linear
regression model that identifies driving factors of variance in anxiety levels during the
pandemic. Stepwise feature selection was used to identify key features from the many
available predictors. Despite increased vulnerability to the disease, older age is found to
be related to lower anxiety levels during the pandemic. The health and life risk of
COVID-19 is positively related to anxiety level. Whether this effect varies between age
groups is ambiguous.

Introduction

The COVID-19 pandemic and its related large-scale mandates and restriction policies took
a heavy toll on the minds of many individuals. Saeed et al. (2022) found that pandemic-related
anxiety has become especially prevalent in healthcare workers, students, parents, and teachers.
A 2020 study showed that high COVID-risk groups and professions are more prone to anxiety and
even mental illness due to prolonged exposure and high perceived risks (Walton et al.). Studies
in 2021 also showed that older people face much greater pandemic-related difficulties, not only
due to an increased risk of severe symptoms of the virus, but also because they are less adaptive
in their lifestyle and can face increased difficulties when travel restrictions are imposed
(Lebrasseur et al.).

This study aims at identifying 1) the extent to which age is associated with pandemic-
related anxiety, 2) the extent to which Covid risk is related to people’s anxiety, and whether
the relationship varies across age groups, and 3) key factors that influence people’s anxiety
during the pandemic in general. This will be done by building linear regression models and
using model building techniques such as feature selection to identify key predictors of
generalized anxiety scores. Inference will be made from the linear regression model regarding
whether the data show that the predictors of interest are statistically significant.
Data and Exploratory Analysis

Dataset
The dataset used contains 1115 observations of participants in a European survey on the
COVID-19 pandemic. The study records demographic information such as the sex, age, and education
of the participants, and their survey responses on the extent to which the COVID-19 pandemic has
posed difficulties to different aspects of their lives. (See Appendix A for the difficulties
corresponding to the pandemic difficulties variables.)

Data type         Variable
Numeric           Age, Pandemic_Difficulties_* (all), Covid_risk, Social_support
Factor            Sex, Education, IncomeContinuity, HealthStatus, Unemployed, Student
Outcome variable  Gad_score (numeric)

Descriptive statistics

Table 1 | Descriptive statistics for numeric variables. The youngest participant is 18 years old and
the oldest is 85 years old. Pandemic_Difficulties_* have been renamed to PD*. There are no missing
data. The Gad_score variable is standardized, with mean 0 and standard deviation around 1.
Table 2 | Descriptive statistics for categorical/dummy variables. Sex0 = Male, Sex1 = Female.
Dummy variables are generated for each category of Education and HealthStatus, both originally stored as
integers. Education2 = Vocational education, Education3 = Secondary education, Education4 = Post-secondary
education, Education5 = University education and above. The omitted group for Education is primary
education. HealthStatus3 = No pre-existing health condition, HealthStatus2 = Don’t know; the omitted
group for HealthStatus is "Has pre-existing health condition". IncomeContinuity has 391 missing values;
these NAs are treated as a separate category.

Exploratory Data Analysis

A correlation matrix was constructed between variables with numeric values (Appendix
A) to address concerns with perfect or near-perfect collinearity in the data. Namely, high levels
of correlation are expected between values for different types of pandemic difficulty, as an
individual who faces more difficulties in one aspect of the pandemic is expected to be more
prone to other pandemic difficulties. The correlation matrix yielded positive pair-wise
correlations between all pandemic difficulty levels as expected, but no pair-wise correlation
between the numeric variables exceeded 0.50. As a result, multicollinearity was not of particular
concern in the regression analysis.
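
The collinearity check described above can be sketched as follows. This is an illustrative Python/NumPy sketch and not the code used for the report (which worked in R); the data are synthetic stand-ins, and the helper name max_offdiag_corr is an assumption made for this example. Only the 0.50 threshold comes from the text.

```python
import numpy as np

def max_offdiag_corr(X):
    """Return the largest absolute pairwise correlation between columns of X."""
    corr = np.corrcoef(X, rowvar=False)
    # Mask out the diagonal, which is always 1.
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.max(np.abs(off_diag)))

# Synthetic stand-in for numeric survey columns (NOT the real dataset):
# four columns sharing a common component, so they are positively correlated.
rng = np.random.default_rng(0)
shared = rng.normal(size=1115)
X = np.column_stack([0.5 * shared + rng.normal(size=1115) for _ in range(4)])

max_corr = max_offdiag_corr(X)
flagged = max_corr > 0.50  # threshold quoted in the text
```

A pairwise check of this kind only detects collinearity between two variables at a time; a linear dependence involving three or more predictors would need a different diagnostic.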

One other concern was that the pandemic difficulty variables record discrete responses, and
the true functional form might not be linear in the pandemic difficulty value. In that case,
generating dummies for each level of difficulty (e.g., PD1==1, PD1==2, etc.) might better capture
the true functional form. Plotting Gad_score against different pandemic difficulties (Appendix B)
showed that the increment by which Gad_score increased with each discrete jump (from 1 to 2,
from 2 to 3, etc.) in pandemic difficulties is mostly constant throughout.

Continuous variables were plotted against Gad_score (Figure 1) to check functional form
assumptions. For covid risk, age, and social support (Appendix C), there seem to be linear
relationships with Gad_score. The signs (positive for covid risk, negative for age) were as expected.
There might be outliers among participants of very high age (Figure 1, b).
Nonetheless, the high variance there may be induced by the lack of observations for individuals
beyond 70 years of age, paired with the high variance of anxiety scores in any age group.

Figure 1 | Relationship between Covid_risk/Age and anxiety score (panels a and b).

a, Mean anxiety score (Gad_score) plotted against Covid_risk. For easier visualization, the mean
of all Gad_scores in each Covid_risk group is plotted; the group means were computed in R with a
grouped mean (group_by followed by mean).
b, Mean anxiety score plotted against Age. Similarly, the mean Gad_score is computed for each
age and plotted.
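
As an illustration of the grouped means plotted in Figure 1, here is a minimal Python/NumPy sketch. The report computed these in R; the function name group_means and the six toy observations below are assumptions made purely for the example.

```python
import numpy as np

def group_means(groups, values):
    """Mean of `values` within each distinct level of `groups`."""
    groups = np.asarray(groups)
    values = np.asarray(values, dtype=float)
    levels = np.unique(groups)  # sorted distinct group labels
    means = np.array([values[groups == g].mean() for g in levels])
    return levels, means

# Toy data (hypothetical): a risk score and a standardized anxiety score.
covid_risk = np.array([1, 1, 2, 2, 2, 3])
gad_score = np.array([-0.5, -0.1, 0.0, 0.2, 0.4, 0.9])

levels, means = group_means(covid_risk, gad_score)
```

Plotting means against levels gives one point per group, which is easier to read than a raw scatterplot but hides the within-group variance.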
There is not enough reason to believe that functional forms with higher-degree polynomials
exist in the population. Therefore, only linear terms of those variables will be included in the first
stage of model building. The potential for the true function to contain higher-degree terms is not
dismissed, as the pair-wise plots in Figure 1 do not control for other features. If residual plots
during model building suggest otherwise, then adding polynomial terms will be
considered.
Regression Analysis and Results

Initializing the model

The model building process began with an initial linear regression model (Appendix E)
containing all available features in the dataset. All variables were included because, for each
independent variable in the dataset, there are arguments that it should affect the generalized
anxiety level. The initial linear regression model yielded an R² value of 0.36, so 36% of the
variation in generalized anxiety level was explained by the model. However, the R² value is
likely inflated, as 30 regressors were included in this initial model. With the OLS approach,
adding more regressors to a linear regression never lowers R². The adjusted R² value,
0.336, is more indicative of the model’s performance.
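
The adjustment can be illustrated with the standard adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below is in Python and uses the rounded figures quoted in the text (R² = 0.36, n = 1115, p = 30); because the inputs are rounded, and the exact count of estimated coefficients is an assumption here, it gives roughly 0.34 and is not expected to reproduce the reported 0.336 exactly.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p regressors (plus intercept) and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Rounded inputs taken from the text; p = 30 is assumed to exclude the intercept.
adj = adjusted_r2(0.36, 1115, 30)
```

The penalty grows with p, so a regressor that barely raises R² can lower the adjusted value.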
Most coefficients in the initial regression were not statistically significant at the 5% level.
It is however unwise to simply remove all variables that are not statistically significant, as 1) a
set of variables can appear not statistically significant due to multicollinearity, and 2) some
variables may appear statistically significant by chance. Therefore, stepwise feature selection was
used to reduce the number of features and add interpretability to the model. Feature selection can
also prevent overfitting and allow better predictions for new data.
Feature Selection
Forward and backward selection were used to reduce the number of features (Appendix
E). Both are stepwise selection methods that aim to produce the
model that best fits the data with a reduced number of features.
Figure 2 | Adjusted R² and BIC of models with different numbers of features used in stepwise selection
(panels a and b).
a, Forward stepwise selection. The red line represents the BIC score, and the blue line represents adjusted R² at
different numbers of features. As shown by the respective dotted lines, n=10 minimizes BIC and n=18 maximizes
adjusted R².
b, Backward stepwise selection. As with forward selection, n=10 minimizes the BIC score. n=17 now
maximizes adjusted R².

For stepwise feature selection, an approximation to the best set of features (best subset
selection is required for the strictly best set of features) is given for every number of
features. The BIC (Bayesian Information Criterion; see footnote 1) score is used to find the best
model among the 30 different models returned by each of forward and backward feature selection.
The BIC penalizes both model inaccuracy and a high number of features in the model. For both
forward and backward selection, the BIC was calculated for each number of features (Figure 2).
n = 10 minimizes the BIC for both forward and backward selection. Using adjusted R² as the
criterion for choosing the model would result in a model with more parameters and reduced
interpretability, as shown in Figure 2. When n = 10, forward and backward stepwise selection give
the same linear regression model, which also saves the effort of choosing between the two models.
(See Figure 3 for the model at this stage.)
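
A minimal sketch of forward stepwise selection scored by BIC is given below, in Python with NumPy on synthetic data. It is not the procedure used for the report, which was run in R; the function names, the synthetic data, and the choice to estimate the error variance from the full model are assumptions of this example. The BIC expression follows the footnote in this report: (RSS + log(n) d σ̂²) / n.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_stepwise_bic(X, y):
    """Greedy forward selection; returns (features, BIC) of the BIC-minimizing model."""
    n, p = X.shape
    sigma2 = rss(X, y) / (n - p - 1)  # assumed: error variance from the full model
    selected, remaining, path = [], list(range(p)), []
    while remaining:
        # Add the feature that lowers the RSS the most at this step.
        best = min(remaining, key=lambda j: rss(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
        d = len(selected)
        bic = (rss(X[:, selected], y) + np.log(n) * d * sigma2) / n
        path.append((list(selected), bic))
    return min(path, key=lambda item: item[1])

# Synthetic data (hypothetical): only features 0 and 3 truly matter.
rng = np.random.default_rng(1)
n, p = 500, 8
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)

best_features, best_bic = forward_stepwise_bic(X, y)
```

Backward selection would start from all features and drop one at a time; the two paths need not agree at every model size, which is why agreement at n = 10 is worth noting.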

Figure 3 | Residual plots of linear regression model after feature selection.

The residual plots show that the linear functional form is suitable for most variables except for
Pandemic_Difficulties_10 and Pandemic_Difficulties_11. Second-degree polynomial terms can be used
to better capture the functional form for these two variables.

Examining effects of covid risk on different age groups: adding interactions

By imposing a functional form with interactions on the linear regression model, the
association of a regressor with generalized anxiety levels is allowed to depend on other
regressors. Namely, whether the association between covid risk and generalized anxiety levels
depends on a person’s age is of particular interest. Nonetheless, interactions of age with all other
variables will be added to begin with. After this, feature selection will be performed again with
the same methods to filter out interaction terms that do not significantly improve model fit.

1 $\mathrm{BIC} = \frac{1}{n}\left(\mathrm{RSS} + \log(n)\, d\, \hat{\sigma}^2\right)$, where $\hat{\sigma}^2$ is the estimate of the variance of the error term, $n$ is the number of
observations, and $d$ is the number of features.
Checking residual plots and adding second degree polynomial terms
Residual plots were constructed to ensure that the most suitable functional form is used in the
linear regression model (Appendix F). The linear functional form seemed suitable for all features
with the exception of Pandemic_difficulties_10 and Pandemic_difficulties_11, for which a
quadratic functional form seemed more suitable (Figure 3). Residual plots for the continuous
variables (Age, Social_support, and Covid_risk) support the previous assumption that a linear
functional form for those variables is sufficient.
Since both Pandemic_difficulties_10 and Pandemic_difficulties_11 take discrete values
(survey responses of 1, 2, 3, or 4), arguments can be made for transforming those
variables into categorical/dummy variables. However, after a dummy variable is created for each
response, some of the less statistically significant dummies will be eliminated during
stepwise feature selection, which leaves an incomplete set of dummy variables in the final
model. Such a model may be better at generating predictions but will be less intuitive to
interpret.
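
The two encodings weighed above can be compared with a small Python/NumPy sketch. The function names and the six toy responses are assumptions for illustration only; the 1 to 4 response scale is taken from the text.

```python
import numpy as np

def quadratic_design(x):
    """Columns [x, x^2]: two coefficients, a smooth curve through the levels."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x, x ** 2])

def dummy_design(x, levels=(1, 2, 3, 4)):
    """One indicator column per level, omitting the first level as the reference."""
    x = np.asarray(x)
    return np.column_stack([(x == lv).astype(float) for lv in levels[1:]])

# Toy survey responses on the 1-4 scale (hypothetical).
responses = np.array([1, 2, 3, 4, 2, 4])

Q = quadratic_design(responses)  # shape (6, 2)
D = dummy_design(responses)      # shape (6, 3)
```

The dummy encoding is more flexible (three free coefficients instead of two), but each indicator is a separate feature that stepwise selection can drop on its own, which is the interpretability problem the text describes.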
Final model presentation

Table 3 | Models reached after stepwise selection at different stages.

The coefficients (standard error in parenthesis) suggest the direction and strength of the association
between the variables and Gad_score, the dependent variable. Asterisks indicate statistical significance,
with more asterisks indicating greater significance. For models with all features see Appendix G.

In Table 3, model (1) is the model obtained after stepwise feature selection from OLS
with linear terms for all variables and no interaction terms. After adding
interaction terms of Age with all other variables, models (2) and (3) were obtained through forward
and backward feature selection respectively, and they yielded very distinct choices of features.
Model (2) has relatively more intuitive coefficient interpretations, so second-degree terms of
Pandemic_Difficulties_10 and Pandemic_Difficulties_11 were further added to it to arrive at model
(4).
Assessing model accuracy with cross validation

*Final linear model for inference

Table 4 | Cross-validation mean squared errors for tested linear models.

Cross-validation tests were carried out to examine the prediction accuracy of the linear
models in the model building process. The mean squared errors on the cross-validation test sets
were computed at k = 5, k = 10, and k = 1115 (LOOCV) respectively (Table 4). The initial linear
model with all predictors (OLS) has marginally higher cross-validation MSEs across the board
when compared to the feature selection model based on the same set of variables. When new
interaction terms with the Age predictor were added (‘OLS+interaction terms’, Table 4), the
cross-validation MSE increased, suggesting overfitting to the training data. Once feature
selection was applied again, there was a larger drop in MSE, which suggests that some of the
interactions are worth including and that there may be collinearity among the interaction terms.
Figure 4 provides a visual comparison of the cross-validation performance of the linear
models. Models with fewer predictors generally outperformed those with more predictors. The
feature-selected model with interaction and polynomial terms (model (4) in Table 3) has the best
test performance.
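
The cross-validation procedure can be sketched as below, in Python with NumPy on synthetic data. This is an illustration and not the code behind Table 4; the function name kfold_cv_mse, the random fold assignment, and the data are assumptions of this example. Setting k equal to the number of observations gives LOOCV, as in the text.

```python
import numpy as np

def kfold_cv_mse(X, y, k, seed=0):
    """Mean squared prediction error of OLS over k held-out folds."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    sq_err_sum = 0.0
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Fit on the training folds only (intercept included).
        A_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        beta, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        # Score on the held-out fold.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        sq_err_sum += float(np.sum((y[test_idx] - A_test @ beta) ** 2))
    return sq_err_sum / n

# Synthetic data (hypothetical): one informative feature, two noise features.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X[:, 0] + rng.normal(size=200)

mse_5 = kfold_cv_mse(X, y, k=5)
mse_loocv = kfold_cv_mse(X, y, k=len(y))  # leave-one-out
```

Because each fold's model never sees the held-out rows, adding uninformative regressors tends to raise this error even though it can only lower the in-sample RSS.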

Figure 4 | Cross-validation mean squared errors for tested linear models at k=5, k=10, and k=1115.
From left to right are the CV MSEs of successive iterations of the linear regression model. The vertical
axis begins at 0.6 rather than zero to make the comparison between models easier to see. The true
differences in MSE between the models are smaller than the figure may suggest.
Discussion and Limitations

Potential non-included confounding variables

The goal of the linear model in this paper is to explain the relationship between
generalized anxiety during COVID-19 and the available predictors. One sign that there might exist
some non-included confounding variables is that all pandemic difficulty levels show positive
pair-wise correlation (see Appendix A). Some non-included confounding variable could be causing
variation in the different aspects of pandemic difficulties at the same time. For instance, a person
with low income could be subject to increased difficulties in both 1) Pandemic_Difficulties_3:
increased number of daily duties and 2) Pandemic_Difficulties_10: Feeling of uncertainty,
unpredictability of the situation.

Missing data / data structure limitations

The only missing data in the raw dataset were 391 missing values for IncomeContinuity.
For the regression, a separate category of IncomeContinuity was created to account for the
missing values. Note that IncomeContinuity as a predictor was filtered out by feature selection in
the final models, which could be due to its missing values.
The dataset offers a range of numeric predictors, yet most of those predictors are discrete
integers. In a linear model, for each of those discrete predictors (16 of which are
Pandemic_Difficulties responses), a strong assumption had to be made that each discrete jump has
the same marginal association with anxiety level, which likely does not hold. For this study it is
assumed that the relationship is linear, apart from quadratic associations with two of said
predictors.

Trade-off between prediction and interpretation

The linear regression model in this paper is built primarily for inference and for giving policy
suggestions. Therefore, a large degree of predictive performance was sacrificed in
favor of better interpretability. A model with better test performance would most likely feature
dummy variables for each discrete response to Pandemic_Difficulties. A better prediction model
would also have more, if not all, available interaction terms. Adding those extra variables was
considered for this paper; however, using this approach along with feature selection would leave
in the model some predictors (whether arbitrary interaction terms or one isolated dummy such as
Pandemic_Difficulties_1==4) that have no good interpretation. Nonetheless, the model in this
paper achieves better test performance than the initial OLS model while having a reduced number
of features and relatively easy inference.
Interpretation & Conclusions

Interpretation of the results

Association between people’s generalized anxiety and their age:

Models (3) and (4) from Table 3 can be used to explain the association between age and
people’s generalized anxiety. Both models identified a negative relationship, with coefficients of
-0.028 for model (3) and -0.018 for model (4). The coefficients are statistically significant at the
1% level, meaning that, assuming no association between age and generalized anxiety, the results
would be within the 1% most surprising outcomes. A coefficient of -0.018 for age in model (4)
suggests that each additional year of age is associated with a 0.018 standard
deviation decrease in predicted anxiety scores. Additionally, the association between age and
anxiety levels depends strongly on how much the person experiences different pandemic
difficulties. For instance, for people who face very high Pandemic_Difficulties_11 (Feeling of
danger, anxiety associated with the spread of the virus), an additional year of age will not be
associated with as large a decrease in generalized anxiety scores. Linear model (3) features even
more interaction terms between age and pandemic difficulties. It can nevertheless be said from
model (3) that higher covid risk and more pandemic difficulty seem to weaken the degree to
which additional age is associated with lower anxiety scores.

Relationship between perceived health and life risks of COVID-19 and people’s mental health
status

Observing the coefficient of Age:Covid_risk in model (3) and of Covid_risk in model (4) in
Table 3, the linear regression models suggest that higher perceived health and life risks of
COVID-19 are generally associated with higher anxiety, and thus worse mental health status on
average.

Comparing model (3) to model (4), it is ambiguous whether age plays a role in how covid
risk is associated with anxiety scores. In model (3), the coefficient of the interaction term
between covid risk and age (Age:Covid_risk) is 0.003 and statistically significant. Thus, the
association between covid risk and anxiety scores seems to depend on age, with each additional
year of age contributing a 0.003 standard deviation increase in how the anxiety score responds to
one additional point of covid risk. For instance, one additional point of covid risk would be
associated with an increase of 0.003 × 48 = 0.144 standard deviations for a 48-year-old, while
an 18-year-old would see only a 0.003 × 18 = 0.054 standard deviation increase. (The base
Covid_risk term is not present in model (3).) Unfortunately, it cannot be said for certain whether
age plays a role in the relationship between covid risk and people’s anxiety, as
the linear regression models support either reading depending on which one is assumed.
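
The age-dependent slope in model (3) can be written out as a short Python sketch. The coefficient 0.003 and the ages 18 and 48 come from the text; the function name is an assumption for illustration, and the calculation relies on the stated fact that model (3) has no base Covid_risk term.

```python
def covid_risk_slope(age, interaction_coef=0.003):
    """Change in predicted Gad_score (in SDs) per extra point of Covid_risk at a given age.

    With no base Covid_risk term, the slope is just interaction_coef * age.
    """
    return interaction_coef * age

slope_18 = covid_risk_slope(18)  # about 0.054 SD per point
slope_48 = covid_risk_slope(48)  # about 0.144 SD per point
```

Under this model the slope is proportional to age, so it can never be negative or zero for adults, which is a restriction imposed by dropping the base term and not something the data were allowed to contradict.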

Policy suggestions

Model (4) in Table 3 shows that among the most significant pandemic difficulties are
Pandemic_Difficulties_5 (Feeling that freedom as a human is restricted),
Pandemic_Difficulties_10 (Feeling of uncertainty, unpredictability of the situation),
Pandemic_Difficulties_11 (Feeling of danger, anxiety associated with the spread of the virus),
Pandemic_Difficulties_13 (Boredom, monotony), and Pandemic_Difficulties_16 (Feeling lost in
the recommendations and restrictions), of which Pandemic_Difficulties_10 and
Pandemic_Difficulties_11 are negatively associated with anxiety levels. One possible explanation is
that, holding covid risk constant (ceteris paribus), higher attention to the spread of the virus
might be associated with increased vigilance, which correlates with lower anxiety levels
overall.
If policies were to be implemented in an effort to reduce anxiety levels during the pandemic,
they should aim at 1) decreasing restrictions on freedom, 2) subsidizing entertainment to reduce
boredom, and 3) giving clearer and more accessible recommendations, restrictions, and instructions.
The age groups that a policy affects should also be considered, as the model shows that the
effect of Pandemic_Difficulties_13 (Boredom, monotony), for instance, is more pronounced at
younger ages. Therefore, policy makers can choose to subsidize a form of entertainment that
particularly reaches younger people, for example, the video game industry.
While Covid-related mandates (mask mandates, travel restrictions) might correlate with higher
pandemic difficulties as well, policy makers can aim at enforcing penalties on spreading
misinformation and at funding public infographics to reduce unnecessary fear about the virus, so
that perceived covid risks are lower.
References

Lebrasseur, Audrey, et al. “Impact of the COVID-19 Pandemic on Older Adults: Rapid Review.”
JMIR Aging, vol. 4, no. 2, 2021, doi:10.2196/26474.

Saeed, Hafsah, et al. “Anxiety Linked to Covid-19: A Systematic Review Comparing Anxiety
Rates in Different Populations.” International Journal of Environmental Research and
Public Health, vol. 19, no. 4, 2022, p. 2189, doi:10.3390/ijerph19042189.

Walton, Matthew, et al. “Mental Health Care for Medical Staff and Affiliated Healthcare
Workers during the COVID-19 Pandemic.” European Heart Journal: Acute
Cardiovascular Care, vol. 9, no. 3, 2020, pp. 241–247, doi:10.1177/2048872620922795.
Appendix A. Correlation matrix of continuous variables in the raw dataset
Appendix B. Distribution histogram of Gad_score
Appendix C. Mean anxiety score level plotted against each level of social support received.
Appendix D. Mean anxiety score at different levels of pandemic difficulties
Appendix E. Linear regression model building with models featuring all predictors included.
