Session 2

The document discusses hypothesis testing, emphasizing the roles of the null hypothesis (H0) and alternative hypothesis (H1) in statistical analysis, along with the significance of p-values in determining statistical significance. It also covers the importance of visual data representation through graphs, including histograms and boxplots, for exploring data distributions and identifying outliers. Additionally, it explains the creation and interpretation of bar charts for comparing means across different groups.
Advanced Data Analysis

Session 2
HYPOTHESIS TESTING
Hypothesis testing
Introduction
• Most hypotheses can be expressed in terms of 2 variables: an independent variable (IV) and a dependent variable (DV)

• A research hypothesis states that an effect is present

• Hypothesis testing involves 2 types of hypotheses:
  o The alternative hypothesis (H1): the effect is present
  o The null hypothesis (H0): the effect is absent (= opposite of H1)
Example
Some researchers found that people ate less of a food if they had previously
imagined eating it. We want to replicate this study.
  o H1: If one imagines eating chocolate, one will eat less of it than if one does not imagine eating it
  o H0: If one imagines eating chocolate, one will eat the same amount as if one does not imagine eating it

• Why do we need the null hypothesis H0?
  o Statistically, we cannot prove the alternative hypothesis (H1)
  o What we can do is collect evidence to reject the null hypothesis (H0)
Hypothesis testing
The logic
• Null hypothesis significance testing
= a system designed to tell us whether the alternative hypothesis H1 is likely to be true, which helps us decide whether our predictions are supported
1. We assume that the null hypothesis (H0) is true
= there is no effect, or any effect that we observe happens by chance
2. We fit a statistical model to our data that represents the alternative hypothesis (H1), summarized by a test statistic, and see how well it fits
→ How much of the variation in scores does the model explain, versus how much is due to chance?
3. To determine how well the model fits the data, we calculate the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis H0 were true (called the p-value)
= how likely a result this extreme would be if only chance were at work
4. If the p-value is very small (usually .05 or less), we conclude that the model fits the data well and gain confidence in the alternative hypothesis
→ H1 is likely to be true: the effect we observe is unlikely to be due to chance alone
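The four steps can be made concrete with a small simulation. The sketch below uses a permutation test, which is a swapped-in technique chosen for transparency and not necessarily the test SPSS would run on these data: it assumes H0 (group labels are interchangeable), reshuffles the labels many times, and reports how often a difference at least as extreme as the observed one appears by chance. The function name and the chocolate counts are hypothetical, invented for illustration only.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=1):
    """One-sided permutation p-value for H1: mean(group_a) < mean(group_b).

    Step 1: assume H0 is true, so the group labels are interchangeable.
    Step 3: shuffle the labels many times and count how often a difference
    at least as extreme as the observed one appears by chance alone.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff <= observed:  # at least as extreme in the direction H1 predicts
            as_extreme += 1
    # +1 in numerator and denominator counts the observed labelling itself,
    # so the estimate is never exactly 0
    return (as_extreme + 1) / (n_permutations + 1)


# Hypothetical numbers of chocolates eaten (invented, not real data)
imagined = [8, 10, 7, 9, 6, 8, 7, 9]
control = [12, 11, 13, 10, 14, 12, 11, 13]

p = permutation_p_value(imagined, control)
print(f"p = {p:.4f}")
```

A small p here means that, if imagining had no effect, a gap this large between the groups would rarely arise from random allocation alone.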
Hypothesis testing
Statistical significance
• When to reject the null hypothesis H0?
Usual criterion: p-value of .05

• If the p-value (i.e., the probability of obtaining a result at least this extreme if H0 were true) is p > .05: we cannot reject the null hypothesis H0, and therefore we find no support for the alternative hypothesis H1
→ The test statistic is said to be non-significant
→ Conclusion: there is no statistically significant effect

• If the p-value is p < .05: we can reject the null hypothesis H0, and therefore the alternative hypothesis H1 is supported
→ The test statistic is said to be significant
→ Conclusion: there is a statistically significant effect
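The decision rule above reduces to a single comparison. This is a minimal sketch; the function name `interpret` is hypothetical, and treating p exactly equal to .05 as non-significant is an assumption, since conventions for that boundary case differ.

```python
def interpret(p_value, alpha=0.05):
    """Apply the usual .05 criterion to a p-value (a sketch, not SPSS output)."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "significant: reject H0, so H1 is supported"
    # p == alpha falls here; conventions for this boundary case differ
    return "non-significant: cannot reject H0, so no support for H1"


for p in (0.003, 0.04, 0.21):
    print(p, "->", interpret(p))
```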
Hypothesis testing
Confidence intervals and statistical significance
Reminder
Confidence interval (CI)
= A range of scores constructed such that, in 95% (or 99%) of samples, the interval will contain the population mean

• Two 95% confidence intervals that overlap substantially → not statistically different
• Two 95% confidence intervals that don't overlap → statistically different
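The "95% of samples" part of the definition can be checked by simulation. This sketch assumes a hypothetical normal population (mean 100, SD 15), draws many samples of size 30, and counts how often the 95% interval captures the true mean. For simplicity it treats the population SD as known (a z-interval); SPSS instead uses the sample SD and the t-distribution.

```python
import random
from math import fsum, sqrt
from statistics import NormalDist


def ci_coverage(pop_mean=100.0, pop_sd=15.0, n=30, n_samples=10_000, seed=42):
    """Share of simulated 95% CIs that contain the population mean.

    Uses a z-interval with the population SD treated as known, so the
    nominal coverage is exactly 95%.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    margin = z * pop_sd / sqrt(n)
    hits = 0
    for _ in range(n_samples):
        sample_mean = fsum(rng.gauss(pop_mean, pop_sd) for _ in range(n)) / n
        if sample_mean - margin <= pop_mean <= sample_mean + margin:
            hits += 1
    return hits / n_samples


print(f"coverage = {ci_coverage():.3f}")
```

Any single interval either contains the population mean or not; the 95% describes the long-run proportion across samples.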
Hypothesis testing
Confidence intervals and statistical significance
Rules of thumb for comparing two independent means:
• Left: 95% confidence intervals that just touch end-to-end represent a p-value ≈ .01
• Right: if there is a gap between the two 95% confidence intervals, then p < .01
• A p-value of .05 is represented by moderate overlap:
  o Left: 95% CIs of the same length → moderate overlap = about a quarter of the CI length
  o Right: 95% CIs of different lengths → interpretation is more difficult
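The overlap rules above can be expressed as a rough calculation. This sketch measures overlap as a fraction of the two intervals' average length and maps it onto the thresholds on the slide. The function names are hypothetical, and the mapping is only a visual rule of thumb for two independent means with CIs of similar length; it does not replace the actual test.

```python
def overlap_fraction(ci_a, ci_b):
    """Overlap of two CIs as a fraction of their average length.

    > 0: the CIs overlap; 0: they just touch; < 0: there is a gap.
    """
    (lo_a, hi_a), (lo_b, hi_b) = ci_a, ci_b
    overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
    average_length = ((hi_a - lo_a) + (hi_b - lo_b)) / 2
    return overlap / average_length


def rough_p(ci_a, ci_b):
    """Rough reading of p from two 95% CIs, following the rules of thumb above."""
    f = overlap_fraction(ci_a, ci_b)
    if f < 0:
        return "p < .01 (gap between the CIs)"
    if f <= 0.25:
        return "p between about .01 (touching) and .05 (quarter overlap)"
    return "p > .05 (more than moderate overlap)"


print(rough_p((0, 2), (3, 5)))
print(rough_p((0, 2), (1.5, 3.5)))
print(rough_p((0, 2), (0.5, 2.5)))
```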
EXPLORING DATA WITH GRAPHS
Introduction
The SPSS chart builder
You can build a graph by using the gallery of predefined graphs

Or you can build a graph on an element-by-element basis
Introduction
When and why are graphs important?

At the data exploration stage
• A single graph can summarize a (large) set of data
• Graphs let you visualize the distribution of the data (shape, general trends)
• Graphs let you screen the data for potential problems, such as outliers
Outlier = an observation/score very different from most others

Why is it important to identify outliers?
→ Outliers bias statistics (e.g., the mean) as well as their standard errors and confidence intervals

→ It is important to draw graphs (histograms and boxplots) for all the dependent variables (DVs) before performing statistical analyses
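A quick way to see why outliers matter is to compare summary statistics with and without one unusual score. The ratings below are hypothetical; the point is that a single extreme value pulls the mean and inflates its standard error, while the median barely moves.

```python
from math import sqrt
from statistics import mean, median, stdev

scores = [3, 4, 5, 4, 3]     # hypothetical ratings
with_outlier = scores + [40]  # the same ratings plus one unusual score

for label, data in (("without outlier", scores), ("with outlier", with_outlier)):
    se = stdev(data) / sqrt(len(data))  # standard error of the mean
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}, SE = {se:.2f}")
```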

At the analysis stage
• Graphs represent the results of statistical analyses clearly and visually, as is often done in reports (bar charts)
Introduction
What makes a good graph?
• Show the data and their underlying message
• Present many numbers/information with minimum ink
• Encourage the reader to compare different pieces of data
• Free of “chartjunk”: patterns, 3-D effects, shadows, pictures…
• Do not distort the data / create false impressions / hide effects
HISTOGRAMS
Histograms
Introduction
• Histogram = frequency distribution = visual display of how many times
each score of a metric variable occurs

[Figure: example histogram of length, x-axis from 145 cm to 205 cm]

• The values of observations are plotted on the horizontal axis (x-axis), and the
frequency with which each value occurs in the data set is plotted on the
vertical axis (y-axis)
→ each bar represents a different score, and the height of a bar is proportional to the frequency of that score
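The frequency distribution behind a histogram is just a count of how often each score occurs. This sketch uses hypothetical lengths in cm and prints a text "histogram" with one `#` per occurrence, so each bar's length is proportional to the frequency of that score.

```python
from collections import Counter

# Hypothetical body lengths in cm, invented for illustration
heights_cm = [165, 170, 170, 175, 175, 175, 180, 180, 185]

frequencies = Counter(heights_cm)  # score -> how many times it occurs
for score in sorted(frequencies):
    # one '#' per occurrence, so bar length is proportional to the frequency
    print(f"{score} cm | {'#' * frequencies[score]} ({frequencies[score]})")
```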
Histograms
Introduction
• Scores can also be arranged in ranges of scores
Example: length → ranges of 20 cm (e.g., 145-164; 165-184; 185-204)
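Grouping scores into ranges works the same way once each score is mapped to the start of its range. This sketch assumes whole-centimetre scores, a first range starting at 145 and a width of 20, matching the example ranges 145-164, 165-184, 185-204; the function name is hypothetical, and empty ranges are simply not listed.

```python
def bin_scores(scores, start=145, width=20):
    """Count integer scores in ranges start..start+width-1, and so on."""
    counts = {}
    for s in scores:
        if s < start:
            raise ValueError(f"score {s} is below the first range start {start}")
        lower = start + (s - start) // width * width  # start of this score's range
        counts[lower] = counts.get(lower, 0) + 1
    return {f"{lo}-{lo + width - 1}": counts[lo] for lo in sorted(counts)}


print(bin_scores([150, 164, 165, 190, 200]))
```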

• This lets you summarize a (large) set of data in 1 graph, visualize the distribution of the data (shape, general trends), and screen the data for potential problems (such as outliers)
Histograms
Different types

• Simple histogram: used to visualize the frequencies of each score for a single variable

• Population pyramid: used to visualize the relative frequencies of scores/differences in distributions in 2 populations
Histograms
A simple histogram
Step 1: Open the Chart builder dialog box

Step 2: Double-click on the icon for a simple histogram


Histograms
A simple histogram
Step 3: Drag the variable « Success_Post » into the x-axis box
Histograms
A simple histogram
Step 4: View the resulting graph in the SPSS output window

However, we had 2 groups of people: those who wished upon a star for their wish to come true and those who worked hard for their wish to come true
→ It is useful to plot the histogram separately for the 2 groups to compare the effect of the manipulation
→ Use a « population pyramid »


Histograms
A population pyramid
Step 1: Open the Chart builder dialog box
Step 2: Double-click on the icon for a population pyramid
Step 3: Drag the variable that you want to plot (« Success_Post ») into the Distribution
variable box
Step 4: Drag the variable for which you want to plot different distributions (« Strategy »)
into the Split variable box
Histograms
A population pyramid
Step 5: View the resulting graph in the SPSS output window

→ A population pyramid is a good way to visualize differences in distributions in different groups (or populations)
→ Histograms & population pyramids can also help you spot unusual cases (outliers)
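Conceptually, a population pyramid is one frequency distribution per level of the split variable, drawn back to back. This sketch computes those per-group distributions; the strategy labels and success scores are hypothetical stand-ins for « Strategy » and « Success_Post », not the real Jiminy_Cricket values.

```python
from collections import Counter, defaultdict

# Hypothetical (Strategy, Success_Post) pairs; labels and scores are invented
records = [
    ("Wish upon a star", 2), ("Wish upon a star", 3), ("Wish upon a star", 3),
    ("Wish upon a star", 4), ("Work hard", 5), ("Work hard", 6),
    ("Work hard", 6), ("Work hard", 7),
]

# Split variable -> frequency distribution of the distribution variable
by_group = defaultdict(Counter)
for strategy, success in records:
    by_group[strategy][success] += 1

for strategy, freqs in by_group.items():
    print(strategy)
    for score in sorted(freqs):
        print(f"  {score:>2} | {'#' * freqs[score]}")
```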
Histograms
Element properties

Instead of expressing values as frequencies (the default option), you could select « Histogram Percent » to display values as percentages
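Switching from frequencies to percentages only rescales each count by the total number of scores. A minimal sketch, with a hypothetical function name:

```python
from collections import Counter


def percent_distribution(scores):
    """Percentage of all scores taken by each distinct score."""
    counts = Counter(scores)
    total = len(scores)
    return {score: 100 * counts[score] / total for score in sorted(counts)}


print(percent_distribution([1, 1, 2, 3]))
```

The shape of the histogram does not change; only the y-axis scale does.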
Histograms
Element properties: creating ranges

• Set the value at which you want the ranges to begin

• Decide how the ranges should be created
BOXPLOTS
Boxplots

• A boxplot is a graphical representation of important characteristics of a set of observations for a metric variable

• It displays the median, the quartiles, the interquartile range (IQR), extreme scores…

• Boxplots are useful for identifying outliers

• Boxplots are also useful for comparing distributions

Boxplots
Different types

• Simple boxplot: used to plot a boxplot of a single variable but to display different boxplots for different categories in the data
• 1-D boxplot: used to see a boxplot for a single variable
• Clustered boxplot: used when you have a second categorical variable on which to split the data (the boxplots for this 2nd variable are produced in different colors)
Boxplots
Example (file: Jiminy_Cricket.sav)
Based on the data from the previous example, let’s plot the information for people’s
success scores after they either wished upon a star or worked hard for their dreams to
come true (variable “Success_Post”).
Step 1: Open the Chart builder dialog box
Step 2: Double-click on the icon for a simple boxplot
Step 3: Drag the variable that you want to plot (« Success_Post ») into the y-axis box
Step 4: Drag the categorical variable for which you want to draw different boxplots
(« Strategy ») into the x-axis box
Boxplots
Interpretation
• The label next to an unusual score = the row number of that case in the SPSS data editor (≠ the score itself)
→ Useful for identifying these scores in the data, checking that they were entered correctly, and looking for reasons why a score appears unusual

• SPSS rule for unusual scores:
  o Outliers = any score > (upper quartile + 1.5 × IQR)
  o Extreme scores = any score > (upper quartile + 3 × IQR)
  o The same rule applies to cases below the lower quartile (scores < lower quartile − 1.5 × IQR, or < lower quartile − 3 × IQR)
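The SPSS rule can be applied by hand to flag unusual cases. This sketch labels each case by its 1-based row number, as the boxplot labels do, and gives a score beyond 3 × IQR the more severe "extreme" label. The quartile method (`statistics.quantiles`, inclusive) is an assumption and may differ slightly from SPSS, and the scores are hypothetical.

```python
from statistics import quantiles


def classify_unusual(scores):
    """Map 1-based row numbers of unusual scores to 'outlier' or 'extreme'."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    flagged = {}
    for row, s in enumerate(scores, start=1):  # row numbers, like the SPSS labels
        if s > q3 + 3 * iqr or s < q1 - 3 * iqr:
            flagged[row] = "extreme"
        elif s > q3 + 1.5 * iqr or s < q1 - 1.5 * iqr:
            flagged[row] = "outlier"
    return flagged


# Hypothetical scores; rows 7 and 8 hold the unusual values
print(classify_unusual([10, 12, 11, 13, 12, 11, 30, 60]))
```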
GRAPHING MEANS
BAR CHARTS & ERROR BARS
Bar charts
Introduction
• Bar charts are used when you have different groups!
Simple bar chart:
• used when you want to see the means of scores for 1 metric variable across different groups of cases (i.e., according to 1 categorical variable)

Clustered bar chart:
• used when you want to see the means of scores for 1 metric variable according to 2 categorical variables
• used when you want to see the means of scores for 2 metric variables according to 1 categorical variable

Error bar chart:
Same as a simple or clustered bar chart, but instead of bars, the mean is represented by a dot and a line represents the 95% CI of the mean
HOWEVER, these error bars can also be added to a bar chart
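Each error bar is the group mean plus or minus a critical value times the standard error of the mean. This sketch defaults to the large-sample normal critical value (about 1.96); for small groups SPSS uses the t-distribution, so you would pass the appropriate t value through the `crit` parameter. The function name and group scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_ci(scores, crit=None):
    """Return (mean, lower, upper) for an approximate 95% CI of the mean.

    By default uses the normal critical value (about 1.96); pass a
    t critical value via `crit` for small samples.
    """
    if crit is None:
        crit = NormalDist().inv_cdf(0.975)
    m = mean(scores)
    se = stdev(scores) / sqrt(len(scores))  # standard error of the mean
    return m, m - crit * se, m + crit * se


# Hypothetical scores; each result gives the dot and the two ends of its error bar
groups = {"Group 1": [2, 4, 4, 4, 5, 5, 7, 9], "Group 2": [6, 7, 8, 8, 9, 10]}
for group, scores in groups.items():
    m, lo, hi = mean_with_ci(scores)
    print(f"{group}: mean = {m:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```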
Bar charts
Important consideration
• The way bar charts are created depends largely on the way the data were collected → do the means come from independent or dependent samples?
Independent samples: there is no connection between the members of the different groups; they are different people
e.g., one group of people saw a picture of a big hairy tarantula and another group of people played with a real, big hairy tarantula. In both groups, level of anxiety was measured afterwards.
e.g., gender: male and female

Dependent samples: there is a connection between the members of the different groups; they are the same (or related) people
e.g., people saw a picture of a big hairy tarantula in period 1 and then were asked to play with a real, big hairy tarantula in period 2. Level of anxiety was measured after each task.
e.g., husband and wife
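The distinction shows up in how the data are laid out, consistent with the hiccups example later on, where each condition has its own variable. In this sketch (all values hypothetical), independent samples need a grouping variable, while dependent samples store one column per condition, which is what makes it possible to pair each person's scores.

```python
# Hypothetical anxiety scores, invented for illustration

# Independent samples: different people in each group, so each person has
# one row and a grouping variable says which condition they were in
independent = [
    {"id": 1, "condition": "picture", "anxiety": 30},
    {"id": 2, "condition": "picture", "anxiety": 35},
    {"id": 3, "condition": "real", "anxiety": 40},
    {"id": 4, "condition": "real", "anxiety": 45},
]

# Dependent samples: the same people in both conditions, so each person has
# one row with a separate column per condition
dependent = [
    {"id": 1, "picture": 30, "real": 40},
    {"id": 2, "picture": 35, "real": 38},
]

# Only dependent samples allow pairing each person's two scores
differences = [row["real"] - row["picture"] for row in dependent]
print(differences)
```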
Bar charts for independent means
Simple bar chart
Step 1: Open the Chart builder dialog box
Step 2: Double-click on the icon for a simple bar chart

• y-axis: always the dependent variable, for which you want to display the mean
• x-axis: always the independent variable, by which you want to split the data of the DV

Step 3: Drag the DV (« arousal ») into the y-axis box and drag the IV (« Film ») into the x-axis box
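What a simple bar chart displays is the mean of the DV for each level of the IV. This sketch computes those bar heights; the lowercase keys `film` and `arousal` mirror the SPSS variables, but the levels and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_means(rows, iv, dv):
    """Mean of the DV for each level of the IV (one bar per level)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[iv]].append(row[dv])
    return {level: mean(scores) for level, scores in groups.items()}


# Hypothetical data, invented for illustration
rows = [
    {"film": "Film A", "arousal": 10},
    {"film": "Film A", "arousal": 14},
    {"film": "Film B", "arousal": 20},
    {"film": "Film B", "arousal": 24},
]
print(group_means(rows, iv="film", dv="arousal"))
```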
Bar charts for independent means
Simple bar chart
Step 4: In « Element Properties », ask to display the error bars
Bar charts for independent means
Clustered bar chart
Step 1: Open the Chart builder dialog box
Step 2: Double-click on the icon for a clustered bar chart

Step 3: Drag the DV (« arousal ») into the y-axis box, drag the IV (« Film »)
into the x-axis box, and drag the second grouping variable (« Gender ») into
the Cluster on X box
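With a second grouping variable, each bar shows the mean of one combination of the two categorical variables. This sketch groups on a tuple of factors, e.g. (film, gender); the function name, levels and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cell_means(rows, factors, dv):
    """Mean of the DV for every combination of the grouping variables."""
    cells = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in factors)  # e.g. (film, gender)
        cells[key].append(row[dv])
    return {key: mean(scores) for key, scores in sorted(cells.items())}


# Hypothetical data, invented for illustration
rows = [
    {"film": "Film A", "gender": "Female", "arousal": 12},
    {"film": "Film A", "gender": "Female", "arousal": 14},
    {"film": "Film A", "gender": "Male", "arousal": 8},
    {"film": "Film A", "gender": "Male", "arousal": 10},
    {"film": "Film B", "gender": "Female", "arousal": 18},
    {"film": "Film B", "gender": "Female", "arousal": 20},
    {"film": "Film B", "gender": "Male", "arousal": 22},
    {"film": "Film B", "gender": "Male", "arousal": 24},
]
for (film, gender), m in cell_means(rows, ("film", "gender"), "arousal").items():
    print(f"{film:<7} {gender:<7} {m}")
```

Each film is one cluster on the x-axis, and the gender bars within it are drawn side by side.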
Step 4: In « Element Properties », ask to display the error bars
Bar charts for dependent means
Simple bar chart
Step 1: Open the Chart builder dialog box
Step 2: Double-click on the icon for a simple bar chart

This time, however, there is no grouping variable: the data for the DV (number of hiccups) are split across 4 variables (baseline, tongue pulling, carotid artery massage, feet massage) → how do we plot them?
Bar charts for dependent means
Simple bar chart
Step 3: Drag all the variables that together form your DV into the y-axis box
You have to drag them simultaneously
→ Hold down the Ctrl key and click on the 4 variables « baseline », « tongue pulling », « carotid artery massage », and « feet massage »

• SUMMARY = outcome variable (DV)
• INDEX = manipulation (IV)
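Conceptually, SUMMARY and INDEX turn the four condition columns into one outcome and one condition label, and each bar is the mean per condition. This sketch reshapes hypothetical hiccup counts (one row per participant, one column per condition) into (condition, score) pairs and averages them; the variable names are simplified, hypothetical versions of the SPSS ones, and this is an illustration of the idea, not how SPSS implements it.

```python
from statistics import mean

conditions = ["baseline", "tongue_pulling", "carotid_massage", "feet_massage"]

# Hypothetical numbers of hiccups; one row per participant (wide format)
wide = [
    {"baseline": 15, "tongue_pulling": 9, "carotid_massage": 7, "feet_massage": 2},
    {"baseline": 13, "tongue_pulling": 18, "carotid_massage": 7, "feet_massage": 4},
    {"baseline": 9, "tongue_pulling": 17, "carotid_massage": 5, "feet_massage": 4},
]

# Long format: each (INDEX, SUMMARY) pair is one person's score in one condition
long_rows = [(cond, row[cond]) for row in wide for cond in conditions]

# One bar per INDEX level: the mean SUMMARY score for that condition
means = {cond: mean(v for c, v in long_rows if c == cond) for cond in conditions}
for cond, m in means.items():
    print(f"{cond:<16} {m:.2f}")
```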
Bar charts for dependent means
Simple bar chart
Step 4: Edit the properties of the graph
EDITING GRAPHS
Editing graphs
Once you have created a graph, double-click on it in the SPSS output window

By clicking on the different parts of the graph, you can change the bar colors, the background colors, etc.
Editing graphs
By clicking on the scale of the graph, you can change the minimum and maximum values displayed, the decimal places, the text style, etc.

But remember to:
• Present information with minimum ink
• Keep the graph free of "chartjunk": patterns, 3-D effects, shadows, pictures…
• Avoid distorting the data, creating false impressions, or hiding effects
Editing graphs
Options in the chart editor window
Exploring data with graphs
Summary
