Help Statistiek Intro To Long Data Analysis 2023

This document provides an introduction to longitudinal data analysis and summarizes a lunchtime lecture on the topic. Longitudinal data involves measuring multiple subjects at several points in time. Linear regression is not suitable for longitudinal data due to dependency between observations from the same subject over time. Two common approaches for analyzing longitudinal data are using summary measures, which reduces the data but allows standard analyses, or multilevel modeling, which better utilizes all data by accounting for the clustering of observations within subjects. The lecture will introduce multilevel models for change to model trajectories over time while accounting for the nested structure of longitudinal data.

Help! Statistics!

Introduction to Longitudinal Data Analysis

Sacha la Bastide-van Gemert

Medical Statistics and Decision Making
Epidemiology, UMCG

Help! Statistics! lunchtime lectures

What? Frequently used statistical methods and questions in a manageable timeframe for all researchers at the UMCG. No knowledge of advanced statistics is required.

When? Lectures take place on the 1st Tuesday of every second month, 12.00-13.00hrs.

Who? Unit for Medical Statistics and Decision Making and colleagues

When?           Where?    What?                                        Who?
Feb 7, 2023     Room 16   Introduction to longitudinal data analysis   Sacha la Bastide
April 4, 2023   Room 16   Machine Learning                             Hylke Donker
Jun 6, 2023     Room 16   …                                            …

Slides from each presentation can be downloaded from:
https://www.rug.nl/research/epidemiology/download-area

Introduction to longitudinal data analysis: overview

What is longitudinal data?

Why does it need a special approach?
• revisiting the linear regression model

Longitudinal data analysis:
• using summary measures
• introduction of the multilevel model for change (a.k.a. mixed effects model)

And yes, there will be some mathematical notation…

• To denote the value of a variable $Y$ for subject number $i$ (at time-point $j$) in our dataset, we use subscripts like this: $Y_i$ ($Y_{ij}$)

• To denote a variable $\varepsilon$ to be Normally distributed, with mean 0 and standard deviation $\sigma$, we write: $\varepsilon \sim N(0, \sigma^2)$

• A linear relationship (= “a straight line”) between variables $X$ and $Y$ with a certain intercept and a certain slope is described by a formula such as:
  $Y = 2.5 + 0.5 \cdot X$ (intercept 2.5, slope 0.5)

What is longitudinal data? (1)
Clustered data

Clustered (or nested/multilevel/hierarchical/...) data
Example: several classrooms, within each classroom students

• Observations from students from the same classroom are more alike than observations from students from different classrooms: students are nested in classrooms
• Variables at student level: gender, SES, ...
• Variables at classroom level: teacher effect, ...
→ multilevel data

(For many more examples, see the previous Help! Statistics! lecture.)

What is longitudinal data? (2)

• Longitudinal data: several subjects, each measured at several (different) points in time t1, t2, t3, t4, t5

[Figure: several subjects, each with its own set of measurement times between t1 and t5]

• Measurements (at different time points) from one subject are more alike than measurements from different subjects: measurements are nested within subjects
• Variables at each time point: lengths, grades, ...
• Variables for each subject: gender, SES, ...
→ multilevel data

Today: focus on continuous outcome variables

Example: adolescent alcohol use (Curran et al, 1997)*

• Sample of 82 adolescents:
37 are Children Of an Alcoholic parent (COAs), 45 are non-COAs

• Research design:
- each child assessed 3 times
(at ages 14, 15, 16)
- outcome: alcuse (continuous,
“alcohol use”-questionnaire)
- covariate: coa (dichotomous, 0=no, 1=yes)

• Research question:
Do trajectories of adolescent alcohol use differ by parental alcoholism?

* Example from: Singer & Willett: Applied longitudinal data analysis. Modeling change and event occurrence (Oxford, 2003)

Longitudinal data
The data-set: person-period format

Data in long format: for each person, each repeated measurement is stored as a new case.

Here: 3 rows per person, containing
- a time variable: age
- an outcome variable: alcuse
- a (time-independent) covariate: coa

[Data table: rows for person 1, person 2, ...]

→ clustered data!
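
As a minimal sketch of the person-period layout, the Python snippet below reshapes a "wide" table (one row per person) into long format (one row per person per measurement). The two records and their values are invented for illustration and are not taken from the study data; the function name `to_person_period` and the column names `alcuse_14`, `alcuse_15`, `alcuse_16` are hypothetical.

```python
# Hypothetical wide-format records: one row per adolescent.
# All values are made up for illustration only.
wide = [
    {"id": 1, "coa": 1, "alcuse_14": 1.7, "alcuse_15": 2.0, "alcuse_16": 2.0},
    {"id": 2, "coa": 0, "alcuse_14": 0.0, "alcuse_15": 0.0, "alcuse_16": 1.0},
]

AGES = (14, 15, 16)


def to_person_period(rows, ages=AGES):
    """Return one record per person per measurement occasion (long format)."""
    long_rows = []
    for row in rows:
        for age in ages:
            long_rows.append(
                {
                    "id": row["id"],
                    "age": age,                      # time variable
                    "alcuse": row[f"alcuse_{age}"],  # outcome at that age
                    "coa": row["coa"],               # time-independent covariate
                }
            )
    return long_rows


long_data = to_person_period(wide)
for record in long_data:
    print(record)
```

Each person now contributes three rows, and the covariate `coa` is simply repeated on every row of that person.
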
Investigating change over time
Scatterplot age-alcuse for the whole data-set:

[Figure: scatterplot of alcuse against age (years)]

Continuous outcome variable alcuse, covariates age and coa… what about linear regression of alcuse on age?

Let’s revisit (simple) linear regression analysis...
Intermezzo
The linear regression model revisited (1)
Cross-sectional data: for each adolescent i one observation (alcuse, age, coa)

Investigating the linear relation between age and outcome alcuse:
• what is the best fitted straight line?
  = find the line “closest” to the data points in the scatter plot

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$

Here, $\beta_0$ and $\beta_1$ are estimated to be -3.0 and 0.26; the $\varepsilon_i$ ($\varepsilon_1$, $\varepsilon_2$, etc.) are the residuals.

Note: cross-sectional data
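
To illustrate what "finding the closest line" means, here is a small sketch of ordinary least squares using the closed-form solution for one predictor. The six data points are invented (they are not the study data), and the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariation of x and y divided by the variation in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Invented cross-sectional data: one (age, alcuse) pair per adolescent.
ages = [14, 14, 15, 15, 16, 16]
alcuse = [0.0, 1.0, 0.5, 1.5, 1.0, 2.0]

b0, b1 = fit_line(ages, alcuse)
# Residuals: observed value minus the value predicted by the fitted line.
residuals = [y - (b0 + b1 * x) for x, y in zip(ages, alcuse)]
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any fit of this kind.
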
Intermezzo
The linear regression model revisited (2)
Linear regression:
- we assume the mean alcohol use values for fixed age values lie on a straight line
- individual observations are assumed to be normally distributed around these means (random residual)

Formally: we assume an underlying true population linear relationship

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$

Residual $\varepsilon_i$: random variable, normally distributed with constant variance σ², independent of the value of X

Assumptions made in order for the model to be valid:
• independent observations
• linear relation between Y and X
• normally distributed residuals
• homogeneity of the residuals’ variance across values of X
Back to our longitudinal data-example...
Longitudinal data
Plot of whole group

Remember the research question:
Do trajectories of adolescent alcohol use differ by parental alcoholism?

Different measurements from one adolescent are related: dependency between observations!

Linear regression is no longer an option...
Analysis of longitudinal data
Using summary measures (1)
Solution:
Choose ONE suitable summary measure Y which reflects a relevant feature of the curve:
- mean over time
- maximum value
- time of reaching the maximum
- maximal velocity/increase
- ...
Now there is just one outcome variable (the summary measure Y)
per adolescent ⟶ independent observations ⟶ multiple regression analysis!

Advantages:
- simple and easy (can be done using standard techniques)
- provides nice summaries of the data
Disadvantages:
- inefficient use of the whole data
- not all types of research questions can be addressed
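
A minimal sketch of the summary-measure idea: collapse each person's repeated measurements into a single number per person. The long-format data below are invented, and the choice of three summaries (mean, maximum, age at maximum) simply mirrors the list above.

```python
from collections import defaultdict

# Invented long-format data: (id, age, alcuse). Not the study data.
long_data = [
    (1, 14, 1.0), (1, 15, 2.0), (1, 16, 3.0),
    (2, 14, 0.0), (2, 15, 1.0), (2, 16, 0.5),
]

# Group the repeated measurements by person.
by_person = defaultdict(list)
for person_id, age, alcuse in long_data:
    by_person[person_id].append((age, alcuse))

# One summary record per person: the observations are now independent.
summaries = {}
for person_id, points in by_person.items():
    values = [alcuse for _, alcuse in points]
    age_at_max, max_value = max(points, key=lambda p: p[1])
    summaries[person_id] = {
        "mean": sum(values) / len(values),
        "max": max_value,
        "age_at_max": age_at_max,
    }

for person_id, summary in summaries.items():
    print(person_id, summary)
```

Whichever summary is chosen, the three rows per person shrink to one value, which is exactly where the information loss mentioned under "Disadvantages" comes from.
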
Analysis of longitudinal data
Using summary measures (2)

Example: for each adolescent we take the maximum value of alcohol use, alcuse_max, over the three years:

[Figure: alcuse_max by coa group]

• Higher median alcuse_max for the COA=1 group than for the COA=0 group
• Different distributions in the two groups: alcuse_max is much more skewed in COA=0 than in COA=1

Does COA affect maximum alcohol use? (Mann-Whitney test for independent groups)

Let’s see if we can make better use of all our data!
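
For readers who want to see what the Mann-Whitney test is built on, the sketch below computes only the U statistic by counting pairs, assuming invented alcuse_max values for the two groups. It deliberately stops short of a p-value; in practice a statistics package would be used for the full test. The function name `mann_whitney_u` is hypothetical.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: each pair with a > b counts 1, each tie 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Invented alcuse_max values, one per adolescent (not the study data).
alcuse_max_coa1 = [1.0, 2.0, 3.0, 3.5]
alcuse_max_coa0 = [0.0, 0.0, 1.0, 2.5]

u_coa1 = mann_whitney_u(alcuse_max_coa1, alcuse_max_coa0)
u_coa0 = mann_whitney_u(alcuse_max_coa0, alcuse_max_coa1)
print(f"U (COA=1) = {u_coa1}, U (COA=0) = {u_coa0}")
```

The two U values always add up to the number of pairs (here 4 × 4 = 16); a U far from half that number suggests that one group tends to have higher values than the other.
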
Analysis of longitudinal data
Summarizing so far...

• Investigating change over time requires longitudinal data: multiple measurements over time per subject (ideally ≥ 3 waves)
• Using summary measures is an option, but means throwing away information and is limited in answering research questions on change/trajectories
• The linear regression model is not applicable, due to violations of the model assumptions (dependency in longitudinal data!)

… so time to tackle the clustering!

Analysis of longitudinal data
Introducing the multilevel model for change
We want to expand the linear regression model with several random effects: mixed effects or multilevel model

“random effects & fixed effects”, “individual level & group level”

This model answers:

- Level 1: within-person questions (intra-individual)
  How does each person’s alcohol use change over time? (trajectories)

- Level 2: between-person questions (inter-individual)
  How does having an alcoholic parent affect these trajectories?

Multilevel model = linked pair of statistical models
Introducing the multilevel model
Exploring individuals’ growth plots to come up with a level-1 submodel

Plotting regression models for a group of subjects i to help answer the question:
What population individual growth model might have generated these sample data?
- elevation?
- tilt?
- (non-)linear?

Note: “simpler is better”. Here we choose a linear model.
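Introducing the multilevel model
The level-1 submodel for individual change
Assumption: in the population, $alcuse_{ij}$ is a linear function of child i’s age on occasion j

$alcuse_{ij} = \pi_{0i} + \pi_{1i} \, age_{ij} + \varepsilon_{ij}$

i = 1, ..., 82 (children)
j = 1, 2, 3 (measurements)

• $\pi_{0i}$ is the intercept of i’s true trajectory (= “alcuse at age 0”)
• $\pi_{1i}$ is the slope of i’s true trajectory (= “rate of alcuse change”)
• $\varepsilon_{i1}$, $\varepsilon_{i2}$ and $\varepsilon_{i3}$ are deviations of i’s true trajectory from linearity on each occasion (random errors)

Assumption: $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$

[Figure: individual i’s (hypothesized) true trajectory, alcuse against age (14, 15, 16), with the deviations $\varepsilon_{i1}$ and $\varepsilon_{i2}$ marked]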
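
To make the index notation concrete, the level-1 submodel can be written out for the three measurement occasions of a single child i, assuming (as in this example) measurements at ages 14, 15 and 16. This is only a spelled-out instance of the model, with $\pi_{0i}$ the child's intercept, $\pi_{1i}$ the child's slope and $\varepsilon_{ij}$ the random error on occasion j:

```latex
\begin{aligned}
alcuse_{i1} &= \pi_{0i} + \pi_{1i} \cdot 14 + \varepsilon_{i1} \\
alcuse_{i2} &= \pi_{0i} + \pi_{1i} \cdot 15 + \varepsilon_{i2} \\
alcuse_{i3} &= \pi_{0i} + \pi_{1i} \cdot 16 + \varepsilon_{i3}
\end{aligned}
```

The same intercept and slope appear in all three lines; only the age and the random error change from one occasion to the next.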
Introducing the multilevel model
What do we want from our level-2 submodels?
Demands:
1. We need two level-2 submodels:
   - one for intercept $\pi_{0i}$
   - one for slope $\pi_{1i}$
These models should:
2. specify the relationship between $\pi_{0i}$ and $\pi_{1i}$ and the covariate of interest (COA)
3. allow adolescents with common COA-values to have different individual trajectories

[Figure: two panels, COA=0 and COA=1, each showing alcuse against age]
Introducing the multilevel model
The level-2 submodels for inter-individual differences in change

$\pi_{0i} = \gamma_{00} + \gamma_{01} COA_i + \zeta_{0i}$ (random intercept)
$\pi_{1i} = \gamma_{10} + \gamma_{11} COA_i + \zeta_{1i}$ (random slope)

Level-2 intercepts $\gamma_{00}$ and $\gamma_{10}$:
Population average intercept and slope for COA=0

Level-2 slopes $\gamma_{01}$ and $\gamma_{11}$:
Effect of COA on intercept and on slope

Level-2 residuals $\zeta_{0i}$ and $\zeta_{1i}$:
Deviations of each individual’s trajectory around the predicted average intercept and slope, allowing for “scattering” of the individual trajectories around the population mean growth trajectories

Extra model assumptions: >>> beyond the scope of today’s lecture <<<
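
One way to get a feel for the two linked submodels is to generate data from them. The sketch below draws each child's own intercept and slope at level 2 and then scatters observations around that child's line at level 1. Every number in it (fixed effects and standard deviations) is an assumption chosen for illustration, and the level-2 residuals are drawn independently of each other, which is a simplification. The function name `simulate_child` is hypothetical.

```python
import random


def simulate_child(coa, ages, rng, gamma, sd_zeta0, sd_zeta1, sd_eps):
    """Draw one child's alcuse values from the two-level model."""
    g00, g01, g10, g11 = gamma
    # Level 2: this child's own intercept and slope.
    pi0 = g00 + g01 * coa + rng.gauss(0.0, sd_zeta0)
    pi1 = g10 + g11 * coa + rng.gauss(0.0, sd_zeta1)
    # Level 1: observations scatter around the child's own line.
    return [pi0 + pi1 * age + rng.gauss(0.0, sd_eps) for age in ages]


rng = random.Random(2023)         # fixed seed for reproducibility
gamma = (-3.0, 1.0, 0.3, -0.05)   # illustrative fixed effects (assumed)
ages = (14, 15, 16)

data = []
for child_id in range(1, 7):
    coa = child_id % 2            # alternate COA = 1, 0, 1, ...
    ys = simulate_child(coa, ages, rng, gamma, 0.5, 0.1, 0.3)
    for age, y in zip(ages, ys):
        data.append({"id": child_id, "age": age, "coa": coa, "alcuse": y})

for row in data[:3]:
    print(row)
```

Because the three values of one child share the same drawn intercept and slope, they resemble each other more than values from different children do, which is the clustering the model is meant to capture.
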
Introducing the multilevel model
Estimating the fixed effects (the $\gamma$ coefficients)
Summarizing the total model:

$alcuse_{ij} = \pi_{0i} + \pi_{1i} \, age_{ij} + \varepsilon_{ij}$ (level 1)
$\pi_{0i} = \gamma_{00} + \gamma_{01} COA_i + \zeta_{0i}$, $\pi_{1i} = \gamma_{10} + \gamma_{11} COA_i + \zeta_{1i}$ (level 2)

Fitted model for intercept: $\hat{\pi}_{0i} = -3.8 + 1.4 \cdot COA_i$
- Initial alcuse (“alcuse at age 0”) for the average non-COA adolescent is -3.8
- For the average COA-adolescent, it is 1.4 higher (at age 0) (difference in initial alcuse between COA-groups)

Fitted model for slope: $\hat{\pi}_{1i} = 0.29 - 0.05 \cdot COA_i$
- Annual rate of change for the average non-COA adolescent is 0.29
- For the average COA-adolescent, it is 0.05 lower (non-significant) (difference in slope between COA-groups)
Introducing the multilevel model
Visualizing the results: constructing fitted growth trajectories

$\hat{\pi}_{0i} = -3.8 + 1.4 \cdot COA_i$
$\hat{\pi}_{1i} = 0.29 - 0.05 \cdot COA_i$

For COA=0 we get:
$\hat{\pi}_{0i} = -3.8$
$\hat{\pi}_{1i} = 0.29$

For COA=1 we get:
$\hat{\pi}_{0i} = -3.8 + 1.4 \cdot 1 = -2.4$
$\hat{\pi}_{1i} = 0.29 - 0.05 \cdot 1 = 0.24$

Substitute the estimates into the level-1 model to get fitted growth trajectories:
when $COA_i = 0$: $\hat{Y}_{ij} = -3.8 + 0.29 \cdot age$
when $COA_i = 1$: $\hat{Y}_{ij} = -2.4 + 0.24 \cdot age$

[Figure: fitted ALCUSE (0 to 2) against AGE (13 to 17) for COA = 0 and COA = 1]
- dotted line: individual estimated trajectory for one child i (deviates randomly from the bold green curve due to $\zeta_{0i}$ and $\zeta_{1i}$)
- green dots: actual observed values of alcuse for child i (randomly scattered around the dotted green line due to $\varepsilon_{ij}$)
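
The substitution step can be checked with a few lines of Python. The sketch below plugs the four fixed-effect estimates quoted in the lecture into the model and evaluates the population-average trajectory at the three measured ages; the function name `fitted_alcuse` is hypothetical.

```python
# Fixed-effect estimates as quoted in the lecture.
G00, G01 = -3.8, 1.4     # intercept for COA=0; change in intercept for COA=1
G10, G11 = 0.29, -0.05   # slope for COA=0; change in slope for COA=1


def fitted_alcuse(age, coa):
    """Population-average fitted alcuse for a given age and COA group."""
    intercept = G00 + G01 * coa
    slope = G10 + G11 * coa
    return intercept + slope * age


for coa in (0, 1):
    values = [round(fitted_alcuse(age, coa), 2) for age in (14, 15, 16)]
    print(f"COA={coa}: {values}")
```

At age 14 this gives about 0.26 for COA=0 and 0.96 for COA=1. The COA=1 line starts higher and rises slightly more slowly, so the gap between the two groups narrows a little over the observed age range.
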
The multilevel model
Combining the levels: rewriting the model

Level 2:
$\pi_{0i} = \gamma_{00} + \gamma_{01} COA_i + \zeta_{0i}$
$\pi_{1i} = \gamma_{10} + \gamma_{11} COA_i + \zeta_{1i}$

Level 1:
$alcuse_{ij} = \pi_{0i} + \pi_{1i} \, age_{ij} + \varepsilon_{ij}$

Same model, now in one equation:
$alcuse_{ij} = (\gamma_{00} + \gamma_{01} COA_i + \zeta_{0i}) + (\gamma_{10} + \gamma_{11} COA_i + \zeta_{1i}) \cdot age_{ij} + \varepsilon_{ij}$

…Toto, I’ve got a feeling this is not regular linear regression anymore…

Fixed part of the model shows clearly how alcuse depends on:
– covariates age and COA
– interaction term, COA × age, allowing the effect of age to differ for levels of COA
JUST LIKE LINEAR REGRESSION!

Complex residuals!
They change with age now and are autocorrelated (dependent) (… very much unlike linear regression...)
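
Multiplying out the brackets in the one-equation form separates a fixed part from a composite residual. The variance line below is a sketch under assumptions the lecture leaves out of scope: the level-2 residuals have variances $\sigma_0^2$ and $\sigma_1^2$ and covariance $\sigma_{01}$, and they are independent of $\varepsilon_{ij}$, which has variance $\sigma_\varepsilon^2$. These symbols are introduced here purely for illustration.

```latex
\begin{aligned}
alcuse_{ij} &= \underbrace{\gamma_{00} + \gamma_{01}\, COA_i + \gamma_{10}\, age_{ij}
               + \gamma_{11}\, COA_i \cdot age_{ij}}_{\text{fixed part}}
             + \underbrace{\zeta_{0i} + \zeta_{1i}\, age_{ij} + \varepsilon_{ij}}_{\text{composite residual}} \\
\operatorname{Var}\left(\zeta_{0i} + \zeta_{1i}\, age_{ij} + \varepsilon_{ij}\right)
            &= \sigma_0^2 + 2\,\sigma_{01}\, age_{ij} + \sigma_1^2\, age_{ij}^2 + \sigma_\varepsilon^2
\end{aligned}
```

The variance depends on age, and two measurements of the same child share $\zeta_{0i}$ and $\zeta_{1i}$, which is why the composite residuals are neither homogeneous nor independent.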
Some final remarks on mixed effects/multilevel models

• A lot more can be considered, such as:

- unbalanced/missing data
- time-dependent covariates
- different correlation structures/model designs/estimation methods
- models for different types of outcome variables
- …
• Mixed effects models are complex and applying them correctly is a challenge

… so why bother?
- to make efficient use of all data (even for subjects with missing measurements!)
- to properly account for correlation structures within your data (and avoiding
estimation bias in your confidence intervals/standard errors)
- in general: estimating fewer parameters, reducing number of tests, …
Books, courses and an e-module

• Snijders & Bosker: Multilevel Analysis. An introduction to basic and advanced multilevel modeling (London, 1999, 2011)
• Verbeke & Molenberghs: Linear mixed models for longitudinal data (New York, 2000)
• Singer & Willett: Applied longitudinal data analysis. Modeling change and event occurrence (Oxford, 2003)
• Pinheiro & Bates: Mixed effects models in S and S-plus (New York, 2000)

Courses offered from our unit:
• Generalized and Linear Mixed Effects Models (SPSS, R)
• Applied Longitudinal Data Analysis (SPSS, R)
• Beyond Regression (CPE-students)
https://edubox.nl/portal.aspx#opleiding=opleiding_gnk

Next Help! Statistics! lunchtime lecture
Hylke Donker
Machine learning
April 4th, 2023
Room 16