
Session 2

The document discusses hypothesis testing, emphasizing the roles of the null hypothesis (H0) and alternative hypothesis (H1) in statistical analysis, along with the significance of p-values in determining statistical significance. It also covers the importance of visual data representation through graphs, including histograms and boxplots, for exploring data distributions and identifying outliers. Additionally, it explains the creation and interpretation of bar charts for comparing means across different groups.

Uploaded by

Rajasekharan Kuntheti Gopalakrishna


Advanced Data Analysis

Session 2
HYPOTHESIS TESTING
Hypothesis testing
Introduction
• Most hypotheses can be expressed in terms of 2 variables: an independent variable (IV) and a dependent variable (DV)

• A research hypothesis states that an effect is present

• Hypothesis testing involves 2 types of hypotheses:

  - The alternative hypothesis (H1): the effect is present
  - The null hypothesis (H0): the effect is absent (= the opposite of H1)
Example
Some researchers found that people ate less of a food if they had previously imagined eating it. We want to replicate this study.
  - H1: If one imagines eating chocolate, one will eat less of it than if one does not imagine eating it
  - H0: If one imagines eating chocolate, one will eat the same amount as if one does not imagine eating it

• Why do we need the null hypothesis H0?

  - Statistically, we cannot prove the alternative hypothesis (H1)
  - What we can do is collect evidence to reject the null hypothesis (H0)
Hypothesis testing
The logic
• Null hypothesis significance testing (NHST)
= a system designed to help us decide whether the data provide enough evidence against the null hypothesis H0 to reject it, and therefore whether they support our prediction (H1)
1. We assume that the null hypothesis (H0) is true
= there is no effect, and any effect that you observe in the sample happens by chance
2. We fit a statistical model to our data (a test statistic) that represents the alternative hypothesis (H1) and see how well it fits
→ How much variation in scores does the test statistic explain versus how much is due to chance?
3. To determine how well the model fits the data, we calculate the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis H0 were true (called the p-value)
≠ the probability that H0 is true, or that the effect is due to chance
4. If the p-value is very small (.05 or less), we reject H0 and gain confidence in the alternative hypothesis
→ The observed effect would be unlikely if there were no effect in the population, so H1 is supported (but not proven)
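The four steps above can be illustrated with a minimal Python sketch for the chocolate example. It swaps in an exact permutation test (the slides do not prescribe a particular test), the consumption figures are made up, and the function name `permutation_p_value` is hypothetical. Under H0 the group labels are interchangeable, so the p-value is the proportion of all possible relabelings of the pooled scores that produce a difference at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(imagined, control):
    """One-sided exact permutation test (hypothetical helper).

    H1: people who imagined eating chocolate eat less (mean(imagined) < mean(control)).
    H0: the labels do not matter, so every split of the pooled scores is equally likely.
    """
    observed = mean(control) - mean(imagined)  # step 2: the test statistic
    pooled = imagined + control
    at_least_as_extreme = 0
    n_splits = 0
    # Steps 1 and 3: enumerate every labelling that H0 considers equally likely
    for chosen in combinations(range(len(pooled)), len(imagined)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen_set]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        if mean(group_b) - mean(group_a) >= observed - 1e-12:  # tolerance for float rounding
            at_least_as_extreme += 1
        n_splits += 1
    return at_least_as_extreme / n_splits


# Hypothetical number of chocolates eaten by each participant
imagined = [2, 3, 3, 4, 2]
control = [5, 6, 4, 7, 5]

p = permutation_p_value(imagined, control)
alpha = 0.05  # usual criterion (step 4)
print(f"p = {p:.4f}")
print("Reject H0" if p <= alpha else "Cannot reject H0")
```

With these made-up scores, only 2 of the 252 possible splits are as extreme as the observed one, so p ≈ .008 and H0 is rejected at the .05 level. Exhaustive enumeration is only practical for small groups; larger samples would use a random subset of relabelings or a parametric test.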
Hypothesis testing
Statistical significance
• When to reject the null hypothesis H0?
Usual criterion: p-value of .05

  - If the p-value (i.e., the probability of a result at least this extreme if H0 were true) is p > .05: we cannot reject the null hypothesis H0 and therefore we do not find support for the alternative hypothesis H1
→ The test statistic is said to be non-significant
→ Conclusion: there is no statistically significant effect

  - If the p-value is p < .05: we can reject the null hypothesis H0 and therefore the alternative hypothesis H1 is supported
→ The test statistic is said to be significant
→ Conclusion: there is a statistically significant effect
Hypothesis testing
Confidence intervals and statistical significance
Reminder
Confidence interval (CI)
= A range of values constructed such that, across repeated samples, 95% (or 99%) of such intervals contain the population mean

• Two 95% confidence intervals that overlap substantially ⇒ not statistically different
• Two 95% confidence intervals that don't overlap ⇒ statistically different
• Left: two 95% confidence intervals that just touch end-to-end represent a p-value ≈ .01
• Right: if there is a gap between the two 95% confidence intervals, then p < .01
• A p-value of .05 is represented by moderate overlap:
  - Left: 95% CIs of the same length → moderate overlap = about a quarter of the length of a CI
  - Right: 95% CIs of different lengths → interpretation is more difficult
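The overlap rule of thumb above can be sketched in Python for two independent groups. This is a rough illustration under stated assumptions: the CIs use a normal approximation (z ≈ 1.96) rather than the t distribution that SPSS uses, so they are slightly too narrow for small samples; the function names `ci95` and `overlap_rule` are hypothetical; and the rule is only meaningful when the two CIs have similar lengths.

```python
from statistics import NormalDist, mean, stdev


def ci95(scores):
    """Approximate 95% CI of the mean (normal approximation, not the t distribution)."""
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    margin = z * stdev(scores) / len(scores) ** 0.5
    m = mean(scores)
    return m - margin, m + margin


def overlap_rule(scores_a, scores_b):
    """Classify two independent-group 95% CIs using the rule of thumb from the slides."""
    (lo_a, hi_a), (lo_b, hi_b) = ci95(scores_a), ci95(scores_b)
    overlap = min(hi_a, hi_b) - max(lo_a, lo_b)  # a negative value means a gap
    avg_ci_length = ((hi_a - lo_a) + (hi_b - lo_b)) / 2
    if overlap <= 0:
        return "no overlap: p < .01 (about .01 if the CIs just touch)"
    if overlap <= avg_ci_length / 4:
        return "moderate overlap: p <= .05 (roughly)"
    return "large overlap: cannot conclude p < .05 from the CIs alone"


group_a = [1, 2, 3, 4, 5]  # hypothetical scores
print(overlap_rule(group_a, [11, 12, 13, 14, 15]))
print(overlap_rule(group_a, [3.5, 4.5, 5.5, 6.5, 7.5]))
print(overlap_rule(group_a, [2, 3, 4, 5, 6]))
```

The visual rule is a reading aid, not a replacement for the test itself: when CI lengths differ, the actual p-value should be computed.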
EXPLORING DATA WITH GRAPHS
Introduction
The SPSS chart builder
You can build a graph by using the gallery of predefined graphs, or you can build a graph on an element-by-element basis
Introduction
When and why are graphs important?

At the data exploration stage

• It allows you to summarize a (large) set of data in 1 graph
• It allows you to visualize the distribution of the data (shape, general trends)
• It allows you to screen the data for potential problems, such as outliers
Outlier = an observation/score very different from most others

Why is it important to identify outliers?

→ Outliers bias statistics (e.g., the mean) as well as their standard errors and confidence intervals

⇒ It is important to draw some graphs (histograms and boxplots) for all the dependent variables (DVs) before performing statistical analyses
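A minimal Python sketch of why outliers matter: adding a single extreme score to a small, made-up data set pulls the mean far away from the bulk of the data and inflates the standard error, while the median barely moves.

```python
from statistics import mean, median, stdev

# Hypothetical scores; the second list adds one outlier (40)
scores = [5, 6, 6, 7, 7, 8]
with_outlier = scores + [40]

for label, data in [("without outlier", scores), ("with outlier", with_outlier)]:
    se = stdev(data) / len(data) ** 0.5  # standard error of the mean
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data):.2f}, SE = {se:.2f}")
```

Here the mean jumps from 6.50 to about 11.29, whereas the median only moves from 6.5 to 7, which is why the data should be screened before computing means, standard errors and confidence intervals.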

At the analysis stage

• It allows you to represent the results of statistical analyses visually and clearly (often used in reports), e.g., with bar charts
Introduction
What makes a good graph?
• Show the data and their underlying message
• Present many numbers/information with minimum ink
• Encourage the reader to compare different pieces of data
• Free of “chartjunk”: patterns, 3-D effects, shadows, pictures…
• Do not distort the data / create false impressions / hide effects
HISTOGRAMS
Histograms
Introduction
• Histogram = frequency distribution = visual display of how many times each score of a metric variable occurs

[Figure: example histogram of length, x-axis ticks at 145 cm, 175 cm and 205 cm]

• The values of observations are plotted on the horizontal axis (x-axis), and the frequency with which each value occurs in the data set is plotted on the vertical axis (y-axis)
→ each bar represents a different score, and the height of a bar is proportional to the frequency of that score
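The counting behind a simple histogram can be sketched in a few lines of Python. The lengths below are made up; the sketch counts how often each score occurs and prints one text "bar" per score, whose length is proportional to that score's frequency.

```python
from collections import Counter

# Hypothetical lengths (cm) of 10 people
lengths = [165, 170, 170, 175, 175, 175, 180, 180, 185, 205]

frequencies = Counter(lengths)  # score -> number of times it occurs
for score in sorted(frequencies):
    # one text bar per score; its length equals the frequency of that score
    print(f"{score} cm | {'#' * frequencies[score]}")
```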
Histograms
Introduction
• Scores can also be grouped into ranges
Example: length → ranges of 20 cm (e.g., 145-164, 165-184, 185-204)

• Histograms allow you to summarize a (large) set of data in 1 graph, to visualize the distribution of the data (shape, general trends), and to screen the data for potential problems (such as outliers)
Histograms
Different types

Simple histogram: used to visualize the frequencies of each score for a single variable

Population pyramid: used to visualize the relative frequencies of scores, or differences in distributions, in 2 populations
Histograms
A simple histogram
Step 1: Open the Chart builder dialog box

Step 2: Double-click on the icon for a simple histogram

Step 3: Drag the variable « Success_Post » into the x-axis box
Step 4: View the resulting graph in the SPSS output window

However, we had 2 groups of people: those who wished upon a star for their wish to come true and those who worked hard for their wish to come true
→ It is useful to plot the histogram separately for the 2 groups to compare the effect of the manipulation

⇒ Use a « population pyramid »

Histograms
A population pyramid
Step 1: Open the Chart builder dialog box
Step 2: Double click on the icon for a population pyramid
Step 3: Drag the variable that you want to plot (« Success_Post ») into the Distribution
variable box
Step 4: Drag the variable for which you want to plot different distributions (« Strategy »)
into the Split variable box
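What the population pyramid does with the split variable can be sketched in Python: it builds one frequency distribution of « Success_Post » per level of « Strategy ». The group labels and scores below are hypothetical, not the values in Jiminy_Cricket.sav.

```python
from collections import Counter, defaultdict

# Hypothetical (Strategy, Success_Post) pairs; values are made up
data = [
    ("Wish upon a star", 3), ("Wish upon a star", 4), ("Wish upon a star", 4),
    ("Wish upon a star", 5), ("Hard work", 6), ("Hard work", 7),
    ("Hard work", 7), ("Hard work", 8),
]

# Split variable (Strategy) -> frequency distribution of the distribution variable
by_group = defaultdict(Counter)
for strategy, success in data:
    by_group[strategy][success] += 1

for strategy, freqs in by_group.items():
    # each group becomes one side of the pyramid
    print(strategy, dict(sorted(freqs.items())))
```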
Step 5: View the resulting graph in the SPSS output window

⇒ A population pyramid is a good way to visualize differences in distributions in different groups (or populations)

⇒ Histograms and population pyramids can also help you spot unusual cases (outliers)
Histograms
Element properties

Instead of expressing values as frequencies (the default option), you can select « Histogram Percent » to display values as percentages
Histograms
Element properties: creating ranges

• Set the value at which you want the ranges to begin
• Decide how the ranges should be created
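The two range settings above (a start value and a range width) and the « Histogram Percent » option can be mimicked with a short Python sketch. The helper `binned_table` is hypothetical, it assumes integer scores that are all greater than or equal to the start value, and the lengths are made up; with a start of 145 and a width of 20 it reproduces the ranges from the earlier length example.

```python
from collections import Counter


def binned_table(scores, start, width):
    """Return (range label, frequency, percent) rows, including empty ranges.

    Assumes integer scores that are all >= start (hypothetical helper).
    """
    counts = Counter((s - start) // width for s in scores)  # range index per score
    total = len(scores)
    table = []
    for k in range(max(counts) + 1):  # missing indices count as 0
        low = start + k * width
        table.append((f"{low}-{low + width - 1}", counts[k], 100 * counts[k] / total))
    return table


lengths = [150, 158, 165, 170, 178, 184, 190, 204]  # hypothetical lengths (cm)
table = binned_table(lengths, start=145, width=20)
for label, freq, pct in table:
    print(f"{label}: n = {freq}, {pct:.1f}%")
```

Reporting percentages instead of frequencies makes groups of different sizes easier to compare.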
BOXPLOTS
Boxplots

• A boxplot is a graphical representation of important characteristics of a set of observations for a metric variable

• It allows you to visualize the median, quartiles, interquartile range, extreme scores…

• Boxplots are useful for identifying outliers

• Boxplots are also useful for comparing distributions

Boxplots
Different types

Simple boxplot: used to plot a boxplot of a single variable but to display different boxplots for different categories in the data

Clustered boxplot: used when you have a second categorical variable on which to split the data (the boxplots for this 2nd variable are produced in different colors)

1-D boxplot: used to see a boxplot for a single variable
Boxplots
File: Jiminy_Cricket.sav
Example
Based on the data from the previous example, let’s plot the information for people’s
success scores after they either wished upon a star or worked hard for their dreams to
come true (variable “Success_Post”).
Step 1: Open the Chart builder dialog box
Step 2: Double click on the icon for a simple boxplot
Step 3: Drag the variable that you want to plot (« Success_Post ») into the y-axis box
Step 4: Drag the categorical variable for which you want to draw different boxplots
(« Strategy ») into the x-axis box
Boxplots
Interpretation
• Label = row number from the SPSS data editor (≠ score)
⇒ Useful for identifying these scores in the data, checking that they were entered correctly, and looking for reasons why a score appears unusual

Rule for unusual scores in SPSS:
• Outliers = any score > (upper quartile + 1.5*IQR)
• Extreme scores = any score > (upper quartile + 3*IQR)

The same rule applies to cases below the lower quartile (scores < lower quartile - 1.5*IQR or < lower quartile - 3*IQR)
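The outlier and extreme-score rule above can be sketched in Python, returning 1-based row numbers as the boxplot labels do. The helper `unusual_scores` and the scores are hypothetical, and the quartiles come from Python's `statistics.quantiles` with the "inclusive" method; SPSS may compute quartiles slightly differently, so borderline cases can be classified differently.

```python
from statistics import quantiles


def unusual_scores(scores):
    """Return {row number: label} for outliers and extreme scores (1-based rows)."""
    # Quartiles via Python's "inclusive" method (an assumption; SPSS may differ slightly)
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    flagged = {}
    for row, x in enumerate(scores, start=1):
        if x > q3 + 3 * iqr or x < q1 - 3 * iqr:
            flagged[row] = "extreme"
        elif x > q3 + 1.5 * iqr or x < q1 - 1.5 * iqr:
            flagged[row] = "outlier"
    return flagged


# Hypothetical scores in data-editor order (row 1 first)
scores = [14, 12, 50, 13, 15, 10, 16, 25, 12, 17, 15]
print(unusual_scores(scores))
```

Here Q1 = 12.5, Q3 = 16.5 and IQR = 4, so scores above 22.5 are outliers and scores above 28.5 are extreme: row 8 (25) is flagged as an outlier and row 3 (50) as extreme. The dictionary keys are row numbers, not scores, mirroring the labels in the SPSS boxplot.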
GRAPHING MEANS
BAR CHARTS & ERROR BARS
Bar charts
Introduction
• Bar charts are used when you have different groups!

Simple bar chart:
• used when you want to see the means of scores for 1 metric variable across different groups of cases (i.e., according to 1 categorical variable)

Clustered bar chart:
• used when you want to see the means of scores for 1 metric variable according to 2 categorical variables
• used when you want to see the means of scores for 2 metric variables according to 1 categorical variable

Error bar chart:
• same as a simple or clustered bar chart, but instead of bars, the mean is represented by a dot and a line represents the 95% CI of the mean
• HOWEVER, these error bars can also be added to a bar chart
Bar charts
Important consideration
• The way bar charts are created depends largely on the way the data were collected ⇒ do the means come from independent or dependent samples?
Independent samples: there is no connection between the members of the different groups; they are different people
e.g., one group of people saw a picture of a big hairy tarantula and another group of people played with a real, big hairy tarantula. In both groups, level of anxiety was measured afterwards.
e.g., gender: male and female

Dependent samples: there is a connection between the members of the different groups; they are the same people (or related people, such as couples)
e.g., people saw a picture of a big hairy tarantula in period 1 and then were asked to play with a real, big hairy tarantula in period 2. Level of anxiety was measured after each task.
e.g., husband and wife
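The distinction above is also a difference in data layout, sketched here in Python with made-up anxiety scores for the tarantula example. Independent samples are typically stored with one row per person and a grouping variable (the "long" layout); dependent samples are stored with one row per person and one column per condition (the "wide" layout), which is why the hiccups example has no grouping variable. In both cases the bar chart shows one mean per condition.

```python
from statistics import mean

# Independent samples ("long" layout): one row per person plus a grouping variable
independent = [
    ("picture", 30), ("picture", 35), ("picture", 40),
    ("real tarantula", 40), ("real tarantula", 50), ("real tarantula", 60),
]
groups = {}
for group, anxiety in independent:
    groups.setdefault(group, []).append(anxiety)
independent_means = {g: mean(v) for g, v in groups.items()}

# Dependent samples ("wide" layout): one row per person, one column per condition
dependent = [
    {"picture": 30, "real tarantula": 40},
    {"picture": 35, "real tarantula": 50},
    {"picture": 40, "real tarantula": 60},
]
conditions = ("picture", "real tarantula")
dependent_means = {c: mean(row[c] for row in dependent) for c in conditions}

print(independent_means)
print(dependent_means)
```

The means are identical here only because the same made-up numbers were reused; what differs is that, in the wide layout, each row links both scores to the same person.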
Bar charts for independent means
Simple bar chart
Step 1: Open the Chart builder dialog box
Step 2: Double click on the icon for a simple bar chart

• y-axis: always the dependent variable, for which you want to display the mean
• x-axis: always the independent variable, by which you want to split the data of the DV

Step 3: Drag the DV (« arousal ») into the y-axis box and drag the IV (« Film ») into the x-axis box
Step 4: Ask to display the error bars
Bar charts for independent means
Clustered bar chart
Step 1: Open the Chart builder dialog box
Step 2: Double click on the icon for a clustered bar chart

Step 3: Drag the DV (« arousal ») into the y-axis box, drag the IV (« Film »)
into the x-axis box, and drag the second grouping variable (« Gender ») into
the Cluster on X box
Step 4: In « Element Properties », ask to display the error bars
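The numbers behind a clustered bar chart with error bars can be sketched in Python: one mean per combination of « Film » (x-axis) and « Gender » (cluster), each with an approximate 95% CI. The film labels and arousal scores are hypothetical, and the CI uses a normal approximation (z ≈ 1.96) rather than the t distribution SPSS uses, so the error bars here are somewhat narrower than SPSS's for samples this small.

```python
from collections import defaultdict
from statistics import NormalDist, mean, stdev

# Hypothetical (Film, Gender, arousal) records
records = [
    ("Film A", "Female", 20), ("Film A", "Female", 24), ("Film A", "Female", 22),
    ("Film A", "Male", 15), ("Film A", "Male", 19), ("Film A", "Male", 17),
    ("Film B", "Female", 30), ("Film B", "Female", 34), ("Film B", "Female", 32),
    ("Film B", "Male", 25), ("Film B", "Male", 29), ("Film B", "Male", 27),
]

cells = defaultdict(list)
for film, gender, arousal in records:
    cells[(film, gender)].append(arousal)  # key = (x-axis category, cluster)

z = NormalDist().inv_cdf(0.975)  # about 1.96 (normal approximation)
bars = {}
for key, values in sorted(cells.items()):
    m = mean(values)
    margin = z * stdev(values) / len(values) ** 0.5
    bars[key] = (m, m - margin, m + margin)  # bar height and error-bar ends
    print(key, f"mean = {m:.1f}, 95% CI ≈ [{m - margin:.1f}, {m + margin:.1f}]")
```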
Bar charts for dependent means
Simple bar chart
Step 1: Open the Chart builder dialog box
Step 2: Double click on the icon for a simple bar chart

This time, however, you do not have a grouping variable: the data for the DV (number of hiccups) are split across 4 variables (baseline, tongue pulling, carotid artery massage, feet massage) → how can you build the chart?
Step 3: Drag all the variables that together form your DV into the y-axis box
You have to drag them simultaneously
→ Hold down the Ctrl key and click on the 4 variables « baseline », « tongue pulling », « carotid artery massage », and « feet massage »

SUMMARY = outcome variable (DV)
INDEX = manipulation (IV)
Step 4: Edit the properties of the graph
EDITING GRAPHS
Editing graphs
When you have created a graph, double-click on the graph in the SPSS Output Window

By clicking on the different parts of the graph, you can change the bar colors, the background colors, etc.
Editing graphs
By clicking on the scale of the graph, you can change the minimum and maximum values displayed, the decimal places, the text style, etc.

But remember to:

• Present information with minimum ink
• Keep the graph free of “chartjunk”: patterns, 3-D effects, shadows, pictures…
• Avoid distorting the data, creating false impressions, or hiding effects
Editing graphs
Options in the chart editor window
Exploring data with graphs
Summary
