
Psychometrics Lab: Data Analysis

Export data and clean it


Install necessary packages
Start a script, load the packages
Read in your data
Recode the negatively keyed items
Calculate Cronbach’s alpha for each scale
Calculate the aggregate scores
Calculating descriptives
Regression analysis

Export data and clean it


You should do your analysis on cleaned-up data. In the following video, we explain what clean data should look
like, and how to export raw data from Qualtrics and clean it.

Install necessary packages


For this analysis, you need to have the following packages installed.

tidyverse
psyntur
car
lm.beta

Check your package listing to see if these are all installed. If not, install these packages using the “Install” button in
the “Packages” tab in RStudio. Alternatively, you can do the following:

install.packages(c('tidyverse', 'psyntur', 'car', 'lm.beta'))


Make sure that you have up-to-date versions of these packages. In particular, make sure that the version of psyntur
is at least 0.1.0. Once installed, you can check your version number in the package listing in the “Packages” tab
(see column labelled “Version”). Alternatively, in R, you can use the command packageVersion to check package
versions. For example, to check the version of psyntur, do the following:

packageVersion("psyntur")

## [1] '0.1.0'
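If you would like to check all of the required packages in one go, the following sketch (just one way of doing it) applies packageVersion to each package name and returns the version strings as a named vector:

sapply(c('tidyverse', 'psyntur', 'car', 'lm.beta'),
       function(pkg) as.character(packageVersion(pkg)))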

Start a script, load the packages

You should put all your analysis code in one R script. This script should contain all and only the code for the
analysis. In other words, everything you need to do every step of the analysis, including the reading of the data,
should be there, and there should be no unnecessary code. Keep your code clean and well organized. Use
“sections” in the script to organize your code into regions of related code. Sections can be inserted using the “Insert
Section” item in RStudio’s “Code” menu.
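In an R script, a section is simply a comment line that ends in four or more dashes. For example, a script for this analysis could be organized into sections such as the following (the section names here are only suggestions):

# Load packages ----------------------------------------------------------

# Read in data ------------------------------------------------------------

# Recode negatively keyed items --------------------------------------------

# Reliability, scores, and descriptives ------------------------------------

# Regression analysis ------------------------------------------------------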

Once installed, you must then load the required packages using the library function as follows:

library(tidyverse)
library(psyntur)
library(car)
library(lm.beta)

Read in your data


Your data, which should be a .csv file, can be read into R using the “Import Dataset” button and choosing
“From Text (readr)”. However, this is not recommended. You should instead write a read_csv command
in your R script that reads the data in from the file. This is a much better option because you can always come back to
your script at a later point and re-run it, and your data will be read in from the file without any manual intervention. To
use read_csv simply and easily, move the data .csv file into your working directory, where your script should also be,
and then call read_csv with the filename. For example, if
your .csv file is called psychometrics_lab_data.csv, first copy or move this file into R’s working directory.
You can find your working directory by typing the getwd() command. On my system, getwd() tells me that my
working directory is a folder called psychometrics in my home directory.

> getwd()
[1] "/home/andrews/psychometrics"

If I had a file named psychometrics_lab_data.csv and moved it into this folder, then in my R script, I can do the
following:

lab_data <- read_csv("psychometrics_lab_data.csv")

In general, whenever we read a csv file into R using the read_csv command, the data is returned as an R data
frame. In the above example, I named this data frame lab_data. If I type lab_data, I will then see the data.
Alternatively, if I were to type glimpse(lab_data), I would get a more useful view of it (see next example).

For the purposes of this guide, I will read in an example .csv file from a URL web address rather than a file on my
local computer, and I will give the data frame that is returned the name psymetr_df.

psymetr_df <- read_csv("http://data.ntupsychology.net/psychometrics_demo_data.csv")

Look at your data


Let us take a look at psymetr_df with glimpse.

glimpse(psymetr_df)

## Rows: 44
## Columns: 52
## $ gender <dbl> 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 2…
## $ age <dbl> 19, 22, 20, 20, 19, 21, 19, 18, 22, 23, 19, 18, 19, 21,…
## $ anxiety_1 <dbl> 1, 2, 2, 2, 0, 1, 2, 2, 1, 1, 2, 2, 3, 3, 2, 2, 1, 2, 1…
## $ anxiety_2 <dbl> 3, 3, 1, 2, 1, 2, 2, 2, 2, 2, 3, 1, 2, 2, 2, 1, 2, 2, 3…
## $ anxiety_3 <dbl> 1, 3, 2, 3, 1, 1, 3, 1, 1, 2, 2, 1, 2, 2, 1, 2, 3, 2, 2…
## $ anxiety_4 <dbl> 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 1, 2, 3, 4…
## $ anxiety_5 <dbl> 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 3, 1, 3, 2, 2, 1, 3, 2, 1…
## $ anxiety_6 <dbl> 2, 3, 2, 2, 3, 3, 1, 2, 2, 3, 2, 2, 1, 2, 3, 3, 4, 2, 2…
## $ anxiety_7 <dbl> 2, 2, 2, 2, 3, 3, 0, 2, 2, 3, 2, 3, 1, 3, 2, 2, 2, 1, 2…
## $ anxiety_8 <dbl> 2, 3, 1, 2, 2, 2, 1, 2, 3, 3, 2, 2, 1, 3, 1, 1, 1, 1, 2…
## $ anxiety_9 <dbl> 2, 3, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 3, 2, 2, 1…
## $ anxiety_10 <dbl> 1, 2, 1, 3, 3, 2, 2, 2, 3, 2, 2, 2, 1, 2, 1, 1, 2, 1, 1…
## $ depression_1 <dbl> 2, 2, 2, 3, 2, 1, 3, 4, 1, 2, 2, 1, 3, 3, 4, 2, 2, 4, 3…
## $ depression_2 <dbl> 1, 2, 2, 2, 2, 1, 3, 2, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2…
## $ depression_3 <dbl> 3, 3, 2, 3, 2, 2, 3, 2, 3, 3, 2, 1, 1, 2, 3, 2, 3, 2, 2…
## $ depression_4 <dbl> 1, 1, 4, 2, 1, 3, 3, 2, 2, 1, 1, 2, 3, 2, 1, 1, 2, 2, 3…
## $ depression_5 <dbl> 1, 3, 2, 3, 2, 1, 4, 2, 2, 2, 2, 1, 2, 2, 2, 2, 4, 2, 3…
## $ depression_6 <dbl> 1, 2, 3, 2, 1, 2, 2, 2, 1, 3, 3, 2, 1, 2, 3, 2, 3, 4, 3…
## $ depression_7 <dbl> 1, 1, 3, 2, 2, 3, 5, 2, 1, 1, 4, 1, 3, 2, 2, 4, 2, 4, 3…
## $ depression_8 <dbl> 3, 5, 2, 4, 4, 2, 2, 4, 3, 4, 2, 4, 2, 4, 4, 4, 3, 3, 4…
## $ depression_9 <dbl> 3, 4, 4, 4, 5, 4, 3, 4, 3, 4, 4, 4, 3, 3, 4, 2, 4, 1, 5…
## $ depression_10 <dbl> 2, 5, 2, 4, 5, 2, 4, 3, 2, 5, 3, 4, 2, 4, 4, 4, 1, 1, 5…
## $ efficacy_1 <dbl> 2, 1, 1, 2, 4, 3, 2, 3, 2, 3, 1, 2, 2, 2, 3, 3, 2, 3, 2…
## $ efficacy_2 <dbl> 2, 1, 3, 3, 4, 2, 2, 1, 3, 3, 2, 1, 2, 2, 2, 2, 2, 2, 3…
## $ efficacy_3 <dbl> 3, 1, 2, 3, 4, 2, 1, 1, 3, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1…
## $ efficacy_4 <dbl> 3, 2, 2, 5, 3, 3, 1, 1, 3, 2, 1, 2, 2, 2, 3, 2, 2, 3, 2…
## $ efficacy_5 <dbl> 1, 1, 3, 4, 3, 3, 2, 1, 3, 3, 3, 2, 2, 1, 2, 2, 4, 3, 2…
## $ efficacy_6 <dbl> 2, 2, 2, 4, 3, 2, 2, 2, 3, 1, 1, 3, 3, 2, 2, 2, 2, 2, 1…
## $ efficacy_7 <dbl> 3, 4, 4, 4, 3, 2, 5, 5, 2, 4, 5, 4, 2, 5, 3, 3, 3, 5, 4…
## $ efficacy_8 <dbl> 3, 4, 5, 4, 1, 2, 3, 5, 4, 4, 5, 4, 2, 4, 3, 3, 3, 4, 3…
## $ efficacy_9 <dbl> 5, 2, 4, 4, 3, 3, 4, 4, 2, 3, 4, 4, 2, 3, 4, 4, 4, 2, 5…
## $ efficacy_10 <dbl> 4, 5, 4, 3, 5, 5, 3, 2, 3, 3, 5, 3, 4, 4, 5, 3, 4, 4, 4…
## $ sociability_1 <dbl> 1, 2, 4, 4, 4, 1, 2, 4, 5, 5, 4, 3, 3, 4, 2, 3, 3, 2, 5…
## $ sociability_2 <dbl> 1, 2, 1, 2, 4, 1, 2, 4, 4, 5, 5, 1, 3, 2, 5, 3, 3, 1, 5…
## $ sociability_3 <dbl> 4, 3, 1, 2, 3, 5, 3, 2, 2, 2, 3, 4, 4, 4, 2, 3, 1, 3, 4…
## $ sociability_4 <dbl> 1, 3, 2, 1, 3, 2, 3, 3, 3, 1, 4, 3, 2, 3, 1, 3, 2, 4, 3…
## $ sociability_5 <dbl> 3, 5, 5, 2, 2, 5, 3, 4, 2, 1, 4, 2, 5, 2, 2, 4, 3, 2, 4…
## $ sociability_6 <dbl> 2, 3, 4, 3, 2, 2, 4, 3, 1, 1, 1, 2, 2, 2, 1, 1, 3, 5, 3…
## $ sociability_7 <dbl> 4, 5, 5, 4, 3, 1, 5, 3, 4, 1, 4, 4, 5, 2, 2, 1, 2, 4, 1…
## $ sociability_8 <dbl> 1, 3, 4, 1, 1, 2, 2, 2, 4, 4, 1, 5, 5, 1, 2, 1, 2, 5, 3…
## $ sociability_9 <dbl> 2, 5, 5, 1, 3, 5, 5, 2, 3, 3, 1, 4, 4, 2, 4, 2, 2, 5, 4…
## $ sociability_10 <dbl> 3, 2, 4, 3, 3, 2, 4, 4, 1, 2, 3, 2, 4, 1, 3, 1, 4, 3, 1…
## $ stress_1 <dbl> 1, 4, 2, 1, 1, 0, 3, 2, 2, 2, 4, 2, 3, 1, 1, 0, 3, 4, 1…
## $ stress_2 <dbl> 2, 4, 1, 0, 1, 2, 3, 3, 2, 1, 3, 3, 2, 3, 3, 1, 1, 2, 4…
## $ stress_3 <dbl> 1, 2, 3, 1, 2, 2, 3, 1, 3, 2, 4, 3, 2, 1, 2, 0, 0, 4, 3…
## $ stress_4 <dbl> 4, 1, 0, 4, 4, 1, 2, 1, 4, 2, 1, 1, 0, 2, 3, 3, 3, 2, 0…
## $ stress_5 <dbl> 0, 2, 0, 3, 3, 4, 1, 0, 3, 3, 1, 0, 0, 0, 1, 4, 0, 0, 3…
## $ stress_6 <dbl> 0, 4, 3, 3, 1, 1, 3, 1, 2, 0, 4, 2, 3, 1, 3, 2, 1, 2, 2…
## $ stress_7 <dbl> 1, 0, 3, 2, 4, 2, 0, 0, 4, 2, 0, 4, 0, 1, 1, 2, 1, 1, 2…
## $ stress_8 <dbl> 0, 2, 3, 1, 2, 4, 2, 3, 3, 2, 0, 1, 1, 0, 1, 3, 0, 0, 3…
## $ stress_9 <dbl> 1, 2, 2, 1, 2, 3, 3, 1, 3, 0, 4, 0, 4, 3, 0, 1, 2, 3, 4…
## $ stress_10 <dbl> 0, 2, 2, 0, 0, 1, 4, 4, 1, 1, 3, 4, 3, 3, 3, 1, 1, 4, 2…

Note: As we can see, in this data, there is a consistent naming pattern for each item on each scale. For example, the
“anxiety” scale items are anxiety_1, anxiety_2, and so on, and the “depression” scale items are depression_1,
depression_2, and so on. You must use a consistent naming scheme like this for your own data.
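As a quick check that your naming scheme is consistent, you can, for example, select the items of one scale and confirm that all of them appear (a small sketch using the same starts_with helper that is used later for the reliability and scoring steps):

select(psymetr_df, starts_with('anxiety_'))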

Recode the negatively keyed items


In the case of my data, I have the following information about the range of values of the items and which items are
negatively keyed.

anxiety: values 0-4; neg keyed = 6, 7, 8, 9, 10
depression: values 1-5; neg keyed = 8, 9, 10
efficacy: values 1-5; neg keyed = 7, 8, 9, 10
sociability: values 1-5; neg keyed = 4, 5, 6, 7, 8, 9, 10
stress: values 0-4; neg keyed = 4, 5, 7, 8

For your scales, you must get this information too.

In the following code, for each item that needs to be reverse coded, we have one line of code that names the item
and gives its original values and the new values to which they are mapped.

psymetr_df_fix <- mutate(psymetr_df,
anxiety_6 = re_code(anxiety_6, 0:4, 4:0),
anxiety_7 = re_code(anxiety_7, 0:4, 4:0),
anxiety_8 = re_code(anxiety_8, 0:4, 4:0),
anxiety_9 = re_code(anxiety_9, 0:4, 4:0),
anxiety_10 = re_code(anxiety_10, 0:4, 4:0),
depression_8 = re_code(depression_8, 1:5, 5:1),
depression_9 = re_code(depression_9, 1:5, 5:1),
depression_10 = re_code(depression_10, 1:5, 5:1),
efficacy_7 = re_code(efficacy_7, 1:5, 5:1),
efficacy_8 = re_code(efficacy_8, 1:5, 5:1),
efficacy_9 = re_code(efficacy_9, 1:5, 5:1),
efficacy_10 = re_code(efficacy_10, 1:5, 5:1),
sociability_4 = re_code(sociability_4, 1:5, 5:1),
sociability_5 = re_code(sociability_5, 1:5, 5:1),
sociability_6 = re_code(sociability_6, 1:5, 5:1),
sociability_7 = re_code(sociability_7, 1:5, 5:1),
sociability_8 = re_code(sociability_8, 1:5, 5:1),
sociability_9 = re_code(sociability_9, 1:5, 5:1),
sociability_10 = re_code(sociability_10, 1:5, 5:1),
stress_4 = re_code(stress_4, 0:4, 4:0),
stress_5 = re_code(stress_5, 0:4, 4:0),
stress_7 = re_code(stress_7, 0:4, 4:0),
stress_8 = re_code(stress_8, 0:4, 4:0)
)

Be careful with this code. Check every item to make sure the item name is correct and that the original and new values
are correct.

Remember to assign the result to a new data frame. In the code above, after the recoding is done, the new data frame
produced is named psymetr_df_fix. We use this data frame from now on.
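One way to check that a recoding has worked (a sketch for a single item; the same check can be repeated for any item you are unsure about) is to cross-tabulate the original and recoded values. For anxiety_6, every original 0 should map to 4, every 1 to 3, and so on:

table(original = psymetr_df$anxiety_6, recoded = psymetr_df_fix$anxiety_6)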

Calculate Cronbach’s alpha for each scale

For each scale, we want to calculate Cronbach’s alpha measure of internal consistency. We do this using the
cronbach function in the psyntur package.

Remember to use the data frame where the items have been recoded. For example, in my case, this is
psymetr_df_fix.

In the following calculations, we select the items for each scale using the starts_with function. This assumes that
all items for each scale begin with a common prefix, which they do in my case, as mentioned above. For example,
all the items on the stress scale begin with stress_, and all the items on the depression scale begin with
depression_, and so on. For each set of items that is selected, the cronbach function will return the estimate of the
\(\alpha\) coefficient and its 95% confidence interval.

cronbach(psymetr_df_fix,
anxiety = starts_with('anxiety_'),
depression = starts_with('depression_'),
efficacy = starts_with('efficacy_'),
sociability = starts_with('sociability_'),
stress = starts_with('stress_')
)

## # A tibble: 5 × 4
## scale alpha ci_lo ci_hi
## <chr> <dbl> <dbl> <dbl>
## 1 anxiety 0.620 0.452 0.788
## 2 depression 0.734 0.620 0.848
## 3 efficacy 0.706 0.577 0.835
## 4 sociability 0.634 0.473 0.794
## 5 stress 0.834 0.761 0.907

Calculate the aggregate scores


For each scale, we must calculate the mean over all the items (again, we must use the data frame with the reverse-coded
items). For this, we use the total_scores command from the psyntur package. Its
syntax is very similar to that used above in cronbach. In particular, for each scale, we select all items to be
averaged over using starts_with. For example, using the following code, we get back a new data frame, which
we name psymetr_df_total, that has the average anxiety, depression, efficacy, sociability, and
stress score for each participant.

psymetr_df_total <- total_scores(psymetr_df_fix,
anxiety = starts_with('anxiety_'),
depression = starts_with('depression_'),
efficacy = starts_with('efficacy_'),
sociability = starts_with('sociability_'),
stress = starts_with('stress_')
)
psymetr_df_total

## # A tibble: 44 × 5
## anxiety depression efficacy sociability stress
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 2 2.2 3.2 1.6
## 2 1.8 1.8 1.7 2.3 2.9
## 3 2.1 2.8 2 1.9 2.3
## 4 1.8 2.3 3 3.5 1.2
## 5 1.1 1.6 3.3 3.6 1
## 6 1.6 2.3 2.7 3 1.4
## 7 2.5 3.2 1.9 2.3 3
## 8 1.8 2.3 1.7 3.1 2.4
## 9 1.5 2.2 3 3.5 1.5
## 10 1.5 1.8 2.3 4.1 1.3
## # … with 34 more rows
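Note that the data frame returned by total_scores contains only the five scale scores. If you also want other variables from the original data, such as gender and age, in the same data frame, one option (a sketch, assuming the rows are returned in their original order) is to bind those columns back on with bind_cols:

bind_cols(select(psymetr_df_fix, gender, age), psymetr_df_total)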

Should we calculate the mean or the sum of all the items’ values? If there are missing values, the sum can be
misleading. For example, if we have 10 items on a 5 point scale, the maximum possible total score is 50. If a
participant skips some items but gives the maximum response of 5 to every item they do answer, their mean of 5.0
shows that they have, on average, the maximum score, but their sum falls short of 50, so this is not apparent from the
sum. However, sometimes people want to report the sum, though without it being affected by any missing values. A way
of doing this is to multiply the mean, calculated after the missing values have been removed, by the number of items.
In the example just mentioned, we would multiply the mean of 5.0 by 10 to get 50. This can be done in the
total_scores function by setting .method = 'sum_like' as follows.

total_scores(psymetr_df_fix,
anxiety = starts_with('anxiety_'),
depression = starts_with('depression_'),
efficacy = starts_with('efficacy_'),
sociability = starts_with('sociability_'),
stress = starts_with('stress_'),
.method = 'sum_like'
)

## # A tibble: 44 × 5
## anxiety depression efficacy sociability stress
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 20 20 22 32 16
## 2 18 18 17 23 29
## 3 21 28 20 19 23
## 4 18 23 30 35 12
## 5 11 16 33 36 10
## 6 16 23 27 30 14
## 7 25 32 19 23 30
## 8 18 23 17 31 24
## 9 15 22 30 35 15
## 10 15 18 23 41 13
## # … with 34 more rows

The total_scores function uses the same aggregation method for all the variables. Sometimes, however, you
might like to calculate, for example, the mean for some variables and the sum (or sum_like) for other variables. To
do this, you must use the total_scores function twice, once for one set of variables, and then a second time for
the other set of variables. The resulting two data frames can be bound together using bind_cols. In the following
example, we calculate the mean for the anxiety and depression scores, and the sum (sum_like) for the remaining
three variables, and then we bind them together with bind_cols.

bind_cols(
total_scores(psymetr_df_fix,
anxiety = starts_with('anxiety_'),
depression = starts_with('depression_'),
.method = 'mean'),
total_scores(psymetr_df_fix,
efficacy = starts_with('efficacy_'),
sociability = starts_with('sociability_'),
stress = starts_with('stress_'),
.method = 'sum_like')
)

## # A tibble: 44 × 5
## anxiety depression efficacy sociability stress
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 2 22 32 16
## 2 1.8 1.8 17 23 29
## 3 2.1 2.8 20 19 23
## 4 1.8 2.3 30 35 12
## 5 1.1 1.6 33 36 10
## 6 1.6 2.3 27 30 14
## 7 2.5 3.2 19 23 30
## 8 1.8 2.3 17 31 24
## 9 1.5 2.2 30 35 15
## 10 1.5 1.8 23 41 13
## # … with 34 more rows

Calculating descriptives
For each variable, we can get back the mean, standard deviation, or any other descriptive statistic, as follows:

describe_across(psymetr_df_total,
variables = c(stress, anxiety, depression, efficacy, sociability),
functions = list(avg = mean, stdev = sd),
pivot = TRUE)

## # A tibble: 5 × 3
## variable avg stdev
## <chr> <dbl> <dbl>
## 1 stress 2.16 0.826
## 2 anxiety 1.85 0.458
## 3 depression 2.21 0.506
## 4 efficacy 2.25 0.488
## 5 sociability 3.08 0.615

We can make the above code a little simpler by using the everything() function as the value of the variables
argument. Above, we selected each variable in the data set individually. If we instead use everything(), all the
variables are selected automatically.

describe_across(psymetr_df_total,
variables = everything(),
functions = list(avg = mean, stdev = sd),
pivot = TRUE)

## # A tibble: 5 × 3
## variable avg stdev
## <chr> <dbl> <dbl>
## 1 anxiety 1.85 0.458
## 2 depression 2.21 0.506
## 3 efficacy 2.25 0.488
## 4 sociability 3.08 0.615
## 5 stress 2.16 0.826

If there are any missing values in psymetr_df_total, we will get NA values in the table of results from
describe_across. To avoid this, we can use counterparts of mean and sd that remove missing values before they
calculate the results. These are mean_xna and sd_xna, respectively. The following code uses these, but in this case,
because there were no missing values in the data, nothing changes in the table.

describe_across(psymetr_df_total,
variables = everything(),
functions = list(avg = mean_xna, stdev = sd_xna),
pivot = TRUE)

## # A tibble: 5 × 3
## variable avg stdev
## <chr> <dbl> <dbl>
## 1 anxiety 1.85 0.458
## 2 depression 2.21 0.506
## 3 efficacy 2.25 0.488
## 4 sociability 3.08 0.615
## 5 stress 2.16 0.826

Inter-correlation matrix


We can also get the pairwise inter-correlation matrix as follows.

cor(psymetr_df_total)

## anxiety depression efficacy sociability stress
## anxiety 1.000000000 0.83274894 -0.56651310 0.004535846 0.8105351
## depression 0.832748939 1.00000000 -0.36339327 -0.085415608 0.6830987
## efficacy -0.566513097 -0.36339327 1.00000000 -0.042575472 -0.6259372
## sociability 0.004535846 -0.08541561 -0.04257547 1.000000000 -0.1448385
## stress 0.810535074 0.68309866 -0.62593715 -0.144838515 1.0000000

Note. If we had NA values in the psymetr_df_total, we would have to remove these first before we calculate the
correlation matrix. We would do this with the following version of the cor command.

cor(psymetr_df_total, use = 'complete.obs')

## anxiety depression efficacy sociability stress
## anxiety 1.000000000 0.83274894 -0.56651310 0.004535846 0.8105351
## depression 0.832748939 1.00000000 -0.36339327 -0.085415608 0.6830987
## efficacy -0.566513097 -0.36339327 1.00000000 -0.042575472 -0.6259372
## sociability 0.004535846 -0.08541561 -0.04257547 1.000000000 -0.1448385
## stress 0.810535074 0.68309866 -0.62593715 -0.144838515 1.0000000

We can make a scatterplot matrix using the command scatterplot_matrix from psyntur.

scatterplot_matrix(psymetr_df_total,
anxiety,
depression,
efficacy,
sociability,
stress)

Regression analysis

Regression
We do the multiple regression by indicating the outcome variable, which in this case is stress, and the predictor
variables, which are anxiety, depression, efficacy, and sociability.

Remember to use the data frame with the total scores.

model <- lm(stress ~ anxiety + depression + efficacy + sociability, data = psymetr_df_total)

The summary will give you the following:

the coefficients table
the \(R^2\)
the adjusted \(R^2\)
the F statistic for the overall model null hypothesis

All of these are expected to be reported in your analysis report.

summary(model)

##
## Call:
## lm(formula = stress ~ anxiety + depression + efficacy + sociability,
## data = psymetr_df_total)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8980 -0.3043 0.0488 0.2591 0.9700
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5410 0.7505 2.053 0.04678 *
## anxiety 1.0748 0.3203 3.355 0.00178 **
## depression 0.1250 0.2583 0.484 0.63129
## efficacy -0.4512 0.1776 -2.541 0.01513 *
## sociability -0.2045 0.1143 -1.790 0.08127 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.455 on 39 degrees of freedom
## Multiple R-squared: 0.7247, Adjusted R-squared: 0.6965
## F-statistic: 25.67 on 4 and 39 DF, p-value: 1.803e-10

As we can see, the \(R^2\) value is 0.725, the adjusted \(R^2\) value is 0.696, the F statistic is \(F(4, 39) = 25.67\).
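If you prefer to extract these values programmatically rather than read them off the printed summary, they are stored in the summary object (a sketch; the element names below are those of a standard lm summary):

model_summary <- summary(model)
model_summary$r.squared # R squared
model_summary$adj.r.squared # adjusted R squared
model_summary$fstatistic # F value with its numerator and denominator degrees of freedom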

Confidence intervals
We can get the confidence intervals for the coefficients as follows.

confint(model)

## 2.5 % 97.5 %
## (Intercept) 0.02304224 3.05905333
## anxiety 0.42688263 1.72274893
## depression -0.39752733 0.64742803
## efficacy -0.81037120 -0.09208644
## sociability -0.43558962 0.02661865
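For reporting, it can be convenient to see each estimate next to its confidence interval. A minimal sketch that binds the coefficients and their intervals into a single table:

cbind(estimate = coef(model), confint(model))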

Multicollinearity
We can assess multicollinearity using the variance inflation factor (VIF), which we calculate with the vif function from the car package.

vif(model)

## anxiety depression efficacy sociability
## 4.476008 3.541564 1.560851 1.026784

Standardized coefficients
The standardized coefficients can be obtained using the lm.beta function from the lm.beta package. We pass the
model to lm.beta to get a new standardized model, and then we can use summary and so on with this model.

model_standardized <- lm.beta(model)

summary(model_standardized)

##
## Call:
## lm(formula = stress ~ anxiety + depression + efficacy + sociability,
## data = psymetr_df_total)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8980 -0.3043 0.0488 0.2591 0.9700
##
## Coefficients:
## Estimate Standardized Std. Error t value Pr(>|t|)
## (Intercept) 1.54105 0.00000 0.75049 2.053 0.04678 *
## anxiety 1.07482 0.59642 0.32033 3.355 0.00178 **
## depression 0.12495 0.07648 0.25831 0.484 0.63129
## efficacy -0.45123 -0.26675 0.17756 -2.541 0.01513 *
## sociability -0.20449 -0.15237 0.11426 -1.790 0.08127 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.455 on 39 degrees of freedom
## Multiple R-squared: 0.7247, Adjusted R-squared: 0.6965
## F-statistic: 25.67 on 4 and 39 DF, p-value: 1.803e-10
