Data Analysis and Statistical Treatment
Statistical Treatment
Objectives
›Utilize appropriate statistical tools in
inferential statistics
›Interpret data/statistical results
›Test hypotheses
›Draw conclusions
Data
Data - a collection of facts such as
numbers, words, measurements,
observations, or descriptions of
things.
Qualitative data – describes qualities
or characteristics collected through
questionnaires, interviews, or
observation.
Quantitative data – data that can be
counted or measured and expressed as
numerical values.
Qualitative data
Nominal data - a type of data that is
used to label variables without providing
any quantitative value.
e.g.
Names – Alexa, Michael
Colors – Orange, Yellow
Texture – Rough, Smooth
Odor – Pleasant, Unpleasant
Ordinal Data - can be classified into
categories that are ranked in a natural
order.
-Ranking – 1st, 2nd, 3rd
-Socioeconomic status – poor, middle
class, rich
-Likert scales – strongly agree to
strongly disagree
Quantitative data
Interval data - data measured along a
scale on which adjacent points are
equally spaced, but zero is arbitrary
(there is no true zero).
Temperature (Fahrenheit and Celsius)
pH
Ratio data - a form of quantitative
(numeric) data. It measures variables
on a continuous scale, with an equal
distance between adjacent values. A
distinguishing property of ratio data is
that it has a 'true zero' (zero means a
complete absence of the quantity).
Length, mass, height, temperature on the
Kelvin scale
Exercise no. 1
(Identify the type of data.)
1. 10 seconds
2. Citrus x microcarpa
3. 1.5 microgram
4. Female, Male
5. 20 °C
6. Teacher I, Teacher II, Teacher III
7. 35 K
8. pH 5.5
9. 9.8 m/s²
10. Exam score
Statistics
- the science concerned with developing and
studying methods for collecting, analyzing,
interpreting, and presenting empirical data.
Two types of Statistics
1. Descriptive Statistics
2. Inferential Statistics
Descriptive Statistics
-statistics that summarize or describe features
of a data set, such as its central tendency or
dispersion.
-Descriptive statistics are broken down into
measures of central tendency, measures of
variability (spread), and measures of frequency
distribution.
Purpose of Descriptive Statistics
-The main purpose of descriptive statistics is
to provide information about a data set.
- Descriptive statistics summarize a large
amount of data into a few useful pieces of
information.
Can Descriptive Statistics be used to
make predictions or inferences?
-No. Descriptive statistics describe the
attributes of the data at hand. Making
predictions or inferences about a population
from a sample, or testing how variables
interact with one another, requires
inferential statistics, a separate branch of
statistics.
Descriptive Statistics
-Measures of Central Tendency – describe the
center of the data (mean, median, mode)
-Measures of Variability – describe the
dispersion of the data set (variance, standard
deviation, range)
-Measures of Frequency Distribution – describe
how often each value occurs within the data set.
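The three groups of measures above can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up scores; it uses the sample variance and standard deviation (dividing by n - 1), which is an assumption about which version the course intends.

```python
import statistics as st
from collections import Counter

# Hypothetical exam scores, used only for illustration
scores = [12, 15, 15, 18, 20, 22, 15, 18]

# Measures of central tendency
mean_value = st.mean(scores)      # sum of values / number of values
median_value = st.median(scores)  # middle value of the sorted data
mode_value = st.mode(scores)      # most frequent value

# Measures of variability (sample versions divide by n - 1)
variance_value = st.variance(scores)
std_dev_value = st.stdev(scores)
range_value = max(scores) - min(scores)

# Frequency distribution: how often each value occurs
frequency = Counter(scores)

print(f"mean={mean_value}, median={median_value}, mode={mode_value}")
print(f"variance={variance_value:.3f}, sd={std_dev_value:.3f}, range={range_value}")
print("frequency:", sorted(frequency.items()))
```

For population data (every member measured), `st.pvariance` and `st.pstdev` divide by n instead.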
Measures of Central Tendency
-Measures of central tendency describe the
center position of a distribution for a data set.
The analyst examines how the data points are
distributed and summarizes the center using
the mean, median, or mode, each of which
represents a typical value of the analyzed
data set.
-Mean (Average) - the arithmetic average.
It is computed by adding all the values in the
data set and dividing the sum by the number of
observations.
Advantages of mean
› The mean uses every value in the data and is therefore
a good representative of the data. Ironically, the mean
often does not appear as a value in the raw data itself.
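Both points about the mean can be checked directly. In this minimal sketch (hypothetical data; `arithmetic_mean` is an illustrative name), the mean is computed as sum divided by count, it turns out not to be one of the observed values, and changing a single observation changes the mean while the median stays put, because the mean uses every value.

```python
from statistics import median

def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

data = [2, 3, 5, 6]  # hypothetical raw data
m = arithmetic_mean(data)
print(m, m in data)  # 4.0 False: the mean is not an observed value

# Every value contributes: changing one observation changes the mean,
# while the median (middle value) can stay the same.
changed = [2, 3, 5, 26]
print(arithmetic_mean(changed), median(data), median(changed))  # 9.0 4.0 4.0
```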
Grade 7 817 93
Grade 8 816 93
Grade 9 737 84
Grade 10 735 84
Inferential Data Analysis – Test of difference
Are the assumptions met?
YES → PARAMETRIC TESTS
NO → NON-PARAMETRIC TESTS
Critical values of z by level of significance
Test          α = 0.01    α = 0.05
One-tailed    ±2.33       ±1.645
Two-tailed    ±2.576      ±1.96
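The critical values in the table come from the standard normal distribution: a one-tailed test puts all of α in one tail, a two-tailed test splits it as α/2 per tail. A minimal sketch using Python's built-in `statistics.NormalDist` (the helper name `critical_z` is illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def critical_z(alpha, tails):
    """Critical z for significance level alpha; tails is 1 or 2."""
    return std_normal.inv_cdf(1 - alpha / tails)

for alpha in (0.01, 0.05):
    print(f"alpha={alpha}: one-tailed ±{critical_z(alpha, 1):.3f}, "
          f"two-tailed ±{critical_z(alpha, 2):.3f}")
```

The one-tailed value at α = 0.01 is 2.326 to three decimals, which the table rounds to 2.33.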
ASSUMPTIONS
3. Homogeneity of variance - The variances of
the dependent variable should be approximately
equal in each group. This can be tested using
Levene's Test of Equality of Variances.
If Levene's Test is statistically significant,
indicating that the group variances are unequal,
we can correct for this violation by using an
adjusted t-statistic based on the Welch method.
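A minimal sketch of the two t statistics involved, using only Python's standard library on hypothetical data: `pooled_t` is the ordinary equal-variance statistic (df = n1 + n2 - 2) and `welch_t` is the Welch-adjusted statistic with Welch–Satterthwaite degrees of freedom. Both function names are illustrative. Turning t into a p-value needs the t distribution, which the standard library lacks; software such as `scipy.stats.ttest_ind(a, b, equal_var=False)` handles that step.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's t assuming equal variances; returns (t, df)."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(a, b):
    """Welch's t (unequal variances) with Welch-Satterthwaite df."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

group_1 = [1, 2, 3, 4, 5]   # hypothetical data
group_2 = [2, 4, 6, 8, 10]  # hypothetical data, larger spread
print(pooled_t(group_1, group_2))
print(welch_t(group_1, group_2))
```

With equal group sizes the two t values coincide; the Welch correction shows up in the smaller (non-integer) degrees of freedom, which makes the test more conservative.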
Reporting the results
An independent t-test showed
that females lost significantly
more weight over 10 weeks
dieting than males t(85)=6.16,
p<0.001.
PAIRED SAMPLES T-TEST
comparing two related groups
›The parametric paired samples t-test is
also known as the dependent samples
t-test or repeated measures t-test.
›It compares the means between two
related groups on the same
continuous dependent variable.
$H_0: \bar{x}_1 = \bar{x}_2$
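The paired test works on the per-participant differences: t is the mean difference divided by its standard error, with n - 1 degrees of freedom. A minimal standard-library sketch on hypothetical pre/post body masses (not the course data file); `paired_t` is an illustrative name, and the p-value step is left to statistical software (for example `scipy.stats.ttest_rel`).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Return (mean difference, SE, t, df) for a paired-samples t-test."""
    if len(pre) != len(post):
        raise ValueError("each participant needs both a pre and a post value")
    diffs = [a - b for a, b in zip(pre, post)]  # e.g. body mass lost per person
    n = len(diffs)
    mean_diff = mean(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean_diff, se, mean_diff / se, n - 1

pre = [80, 75, 90, 68, 85]   # hypothetical body mass (kg) before dieting
post = [77, 71, 86, 66, 81]  # hypothetical body mass (kg) after dieting
print(paired_t(pre, post))
```

Because only the differences enter the calculation, the normality assumption applies to the differences, not to the raw scores.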
ASSUMPTIONS of the Parametric
Paired-Samples T-test
➢The dependent variable should be measured
on a continuous scale.
➢The independent variable should consist of 2
categorical, related (matched) groups, i.e.
each participant is measured in, or matched
across, both groups.
➢The differences between the matched pairs
should be approximately normally
distributed.
➢There should be no significant outliers.
PAIRED SAMPLES T-TEST
comparing two related groups
›Open 06 – Dieting A.csv
›This contains two columns of paired
data: body mass before the diet and
body mass after 4 weeks of dieting.
›Go to T-tests → Paired samples t-
test → move both variables into the
analysis box on the right.
Check the following:
Reporting the results
On average, participants lost
3.78 kg (SE: 0.29 kg) body mass
following a 4-week diet plan. A
paired samples t-test showed this
decrease to be significant
(t(77)=13.04, p<0.001).
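As a quick consistency check, assuming the standard paired t formula, the reported figures fit together; the small gap from 13.04 comes from the SE being rounded to 0.29:

```latex
t = \frac{\bar{d}}{SE_{\bar{d}}} = \frac{3.78}{0.29} \approx 13.03,
\qquad df = n - 1 = 77 \;\Rightarrow\; n = 78
```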
One-way ANOVA (Analysis of Variance)
comparing three or more groups
›One-way analysis of variance
(ANOVA) compares the means of
three or more groups.
›ANOVA has been described as an
"omnibus test" that produces an F-
statistic: the ratio of between-group
variance to within-group variance. A
large F indicates that at least one group
mean differs significantly from the others.
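The F-statistic described above can be computed by hand: split the total variation into a between-group part and a within-group part, divide each by its degrees of freedom, and take the ratio. A minimal standard-library sketch on hypothetical groups (`one_way_anova_f` is an illustrative name); the p-value requires the F distribution, available in software such as `scipy.stats.f_oneway`.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```

An F reported as F(2, 6) uses exactly these two degrees of freedom.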
ASSUMPTIONS
1. The independent variable must be categorical
and the dependent variable must be
continuous.
2. The groups should be independent of each
other.
3. The dependent variable should be
approximately normally distributed in each group.
4. There should be no significant outliers.
5. There should be homogeneity of variance
between the groups; otherwise, the p-value for
the F-statistic may not be reliable.
Assumption checks:
Levene's test - tests the equality of variances
(homoscedasticity)
p < 0.05 - variances are unequal (use the Welch
correction or the Brown-Forsythe
correction)
p ≥ 0.05 - variances can be assumed equal
(no correction needed; select None)
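Levene's test is itself a one-way ANOVA run on the absolute deviations of each value from its group center. A minimal standard-library sketch of the W statistic on hypothetical data (`levene_w` is an illustrative name); passing `center=median` gives the Brown-Forsythe variant, which some software uses by default, so check which one your package reports. The p-value comes from the F distribution with (k - 1, N - k) degrees of freedom.

```python
from statistics import mean, median

def levene_w(*groups, center=mean):
    """Levene's W statistic; center=median gives the Brown-Forsythe variant."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every value from its own group's center
    z = []
    for g in groups:
        c = center(g)
        z.append([abs(x - c) for x in g])
    z_means = [mean(zg) for zg in z]
    z_grand = mean(v for zg in z for v in zg)
    # W is a one-way ANOVA F computed on the absolute deviations
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within

print(levene_w([1, 2, 3], [0, 5, 10]))                 # different spreads
print(levene_w([1, 2, 3], [4, 5, 6], [7, 8, 9]))       # identical spreads
print(levene_w([1, 2, 3], [0, 5, 10], center=median))  # Brown-Forsythe
```

Groups that differ only in location (same spread) give W = 0, which is why the test detects unequal variances rather than unequal means.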
Assumption checks:
Q–Q Plot - a visual check of whether the
data are approximately normally distributed;
the points should lie close to the diagonal
reference line.
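A Q–Q plot pairs each sorted, standardized observation with the quantile a normal distribution would predict at the same rank. A minimal standard-library sketch that computes those pairs for plotting (`qq_points` is an illustrative name; the (i - 0.5)/n plotting position is one common convention and an assumption here, since packages differ slightly).

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """(theoretical normal quantile, standardized sample value) pairs."""
    n = len(data)
    m, s = mean(data), stdev(data)
    std_normal = NormalDist()
    ordered = sorted(data)
    # Plotting position (i - 0.5) / n for the i-th smallest value (assumption)
    return [(std_normal.inv_cdf((i - 0.5) / n), (x - m) / s)
            for i, x in enumerate(ordered, start=1)]

for theoretical, observed in qq_points([4.1, 5.0, 5.2, 5.9, 6.3, 7.4]):
    print(f"{theoretical:6.3f}  {observed:6.3f}")
```

If the data are roughly normal, each observed value is close to its theoretical quantile, so the plotted points fall near the line y = x.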
Hypothesis testing:
p < 0.05 - there is a significant difference
between the means of the groups. (Reject H0
and proceed to post hoc testing to identify
which groups differ significantly.)
p ≥ 0.05 - there is no significant
difference between the means of the groups.
(Fail to reject the null, stop the analysis, and
report the findings.)