3.5 Descriptive Statistics

The document discusses descriptive statistics, focusing on measures of shape, central tendency, and variability. It explains concepts like skewness, kurtosis, and modality, as well as the importance of central tendency measures such as mean, median, and mode. Additionally, it covers variability measures including range, variance, and standard deviation, highlighting their significance in understanding data distributions.
Ep 3, Part 2:

Descriptive Statistics

Bianca Patricia Reyes, RPm


Measures of Shape
Learning Outcomes
❑ To know what descriptive indices frequency distributions can
provide us
❑ To know what information such indices convey
The Shape of Frequency Graphs
▪ As the number of individuals measured and the accuracy of the measurement increase, frequency histograms and polygons begin to appear fairly smooth.
Measures of Shape

1. Skewness
2. Kurtosis
3. Modality
SKEWNESS
▪ The piling up of scores at one end of the distribution, tapering off gradually toward the other end
▪ The disproportionate concentration of lower or higher scores is indicated by:
- The tendency of scores to pile up at one end
- The tendency of the tail to point toward the opposite end
▪ It reflects the proportion of higher vs. lower scores


SKEWNESS
Types of Skewness
1. Positive Skew
- There is a greater proportion of lower scores
- The scores pile up on the left and the tail points towards the right
Types of Skewness
2. Negative Skew
- There is a greater proportion of higher scores
- The scores pile up on the right and the tail points towards the left
How to Check Skewness?
You can:
▪ Draw a vertical line down the middle of the graph and see on which side the scores pile up.
▪ Compare the positions of the mean, median, and mode. In a positively skewed distribution the mean is usually pulled toward the tail (mean > median > mode); in a negatively skewed distribution the order is usually reversed.
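The checks above can be tried in code. The following is a minimal Python sketch using hypothetical scores (not data from this lesson): it compares the mean, median, and mode, and computes the moment coefficient of skewness, g1 = m3 / m2^1.5. The helper name `moment_skewness` is illustrative; statistical software such as Jamovi may report a sample-adjusted skewness, so its value can differ slightly from this one.

```python
from statistics import mean, median, multimode

def moment_skewness(scores):
    """Moment coefficient of skewness: g1 = m3 / m2**1.5 (population moments).

    Positive values suggest positive skew, negative values negative skew.
    Assumes the scores are not all identical (otherwise m2 is 0).
    """
    n = len(scores)
    mu = mean(scores)
    m2 = sum((x - mu) ** 2 for x in scores) / n  # average squared deviation
    m3 = sum((x - mu) ** 3 for x in scores) / n  # average cubed deviation
    return m3 / m2 ** 1.5

# Hypothetical scores: most are low, with a few high scores forming a right tail.
scores = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]

print("mean:", mean(scores))          # 4.3, pulled toward the tail
print("median:", median(scores))      # 3.0
print("mode(s):", multimode(scores))  # [2]
g1 = moment_skewness(scores)
if g1 > 0:
    print("positively skewed, g1 =", round(g1, 2))
elif g1 < 0:
    print("negatively skewed, g1 =", round(g1, 2))
else:
    print("symmetrical, g1 = 0")
```

Here mean > median > mode, matching the usual pattern for a positively skewed distribution.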
TRUE OR FALSE
The distribution of the scores of BulSU psych students in love shyness is positively skewed; therefore, they scored very low in love shyness.
Correct: … therefore, there are more psych students in BulSU with lower than with higher levels of love shyness.
KURTOSIS
▪ It is the volume of scores in the tails relative to the shoulders.
[Figure: distribution curve with the shoulder and the tail labelled]
KURTOSIS
▪ It tells us about the abundance or scarcity of outliers (extreme scores).
▪ It is indicated by how heavy or how light the tails are.
▪ Remember: kurtosis reflects the weight of the tails more than peakedness or variability.
▪ Distributions with different kurtosis may or may not differ in variability, and distributions with identical variability can still differ in kurtosis (Weunch, 2014).
Types of Kurtosis
1. Leptokurtic
- Tails are heavy, shoulders are light
- Has more extreme values than a normal distribution
Types of Kurtosis
2. Mesokurtic
- Tails and shoulders are neither too thick nor too thin
- Same volume of extreme values as a normal
distribution.
Types of Kurtosis
3. Platykurtic
- Tails are light/thin, whereas the shoulders are
thick/heavy
- Has fewer outliers than a normal distribution
To Evaluate Kurtosis:
• Overlay a normal curve (with the same mean and standard deviation as your data) onto your data set. Upon doing so, if your data set has:
More extreme scores in the tails = leptokurtic
A similar number of extreme scores in the tails = mesokurtic
Fewer extreme scores in the tails = platykurtic
• If the picture is ambiguous, complement the visual assessment with formal kurtosis calculations.
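For the formal calculation, a minimal Python sketch of moment-based excess kurtosis, g2 = m4 / m2² − 3 (which is 0 for a normal curve), is shown below. The two data sets and the ±0.5 cutoff in `label_kurtosis` are hypothetical and only for illustration, not a standard rule; statistical software may report a sample-adjusted kurtosis that differs slightly.

```python
from statistics import mean

def excess_kurtosis(scores):
    """Moment-based excess kurtosis: g2 = m4 / m2**2 - 3 (0 for a normal curve)."""
    n = len(scores)
    mu = mean(scores)
    m2 = sum((x - mu) ** 2 for x in scores) / n  # average squared deviation
    m4 = sum((x - mu) ** 4 for x in scores) / n  # average 4th-power deviation
    return m4 / m2 ** 2 - 3

def label_kurtosis(g2, tolerance=0.5):
    """Classify g2; the tolerance is an arbitrary, illustrative cutoff."""
    if g2 > tolerance:
        return "leptokurtic"
    if g2 < -tolerance:
        return "platykurtic"
    return "mesokurtic"

# Hypothetical data: scores bunched at 5 with two extreme values vs. evenly spread scores.
heavy_tails = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]
flat = list(range(1, 11))

for name, data in [("heavy_tails", heavy_tails), ("flat", flat)]:
    g2 = excess_kurtosis(data)
    print(name, round(g2, 2), label_kurtosis(g2))
```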
MODALITY
▪ It refers to the number of frequently occurring scores
in a distribution.
▪ It is indicated by the number of distinct peaks.
▪ It informs us about the presence or absence of distinct
subgroups.
Modality
1. Unimodal – one frequently occurring score; one peak in a single curve
2. Bimodal – two frequently occurring scores; two peaks in a single curve
3. Multimodal – many frequently occurring scores; many peaks in a single curve
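Counting distinct peaks can be sketched in Python as well. This minimal version, with a hypothetical helper `count_peaks`, assumes whole-number scores and counts a value as a peak only when its frequency is strictly higher than both neighbouring values; flat-topped or noisy distributions still need visual judgement.

```python
from collections import Counter

def count_peaks(scores):
    """Count local peaks in the frequency distribution of whole-number scores.

    A peak is a score value whose frequency is strictly higher than the
    frequencies of the values just below and just above it (missing values count as 0).
    """
    freq = Counter(scores)
    lo, hi = min(freq), max(freq)
    # Frequency of every whole-number value from lo to hi, padded with a 0 on each side.
    counts = [0] + [freq.get(v, 0) for v in range(lo, hi + 1)] + [0]
    peaks = 0
    for i in range(1, len(counts) - 1):
        if counts[i] > counts[i - 1] and counts[i] > counts[i + 1]:
            peaks += 1
    return peaks

# Hypothetical data sets.
unimodal_scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]
bimodal_scores = [1, 2, 2, 2, 3, 4, 5, 5, 5, 6]
print(count_peaks(unimodal_scores))  # 1
print(count_peaks(bimodal_scores))   # 2
```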
Remember:
▪ Measures of shape do not tell us how high or low members of a group are on a construct.
▪ Measures of shape ONLY describe the inclinations of the group; they do not provide any information on the qualitative levels of the construct.
▪ Again, when we describe distributions based on shape, we should use comparative language (e.g., higher or lower).
Interpreting Measures of Shape
For Example:
Skewness = Positively Skewed
Interpretation: There are more students with lower than with higher levels of love shyness.
For Example:
Skewness = Positively Skewed
Kurtosis = Mesokurtic
Interpretation: A moderate number of students have more extreme levels of love shyness relative to the group’s average.
For Example:
Skewness = Positively Skewed
Kurtosis = Mesokurtic
Modality = Unimodal
Interpretation: There is only one popular level of love shyness within the group. There are no subgroups with distinctly different levels of love shyness among the students.
Measures of Central
Tendency
Learning Outcomes
❑ To know what exactly central tendency is
❑ To know why it is necessary to quantify it
❑ To know what information each measure of central tendency provides
❑ To know when to use each measure of central tendency
Why We Need to Measure CT
In order for us to describe a group of individuals or a group of scores effectively and concisely, we need a single value that:
• Represents the entire group
• Provides a simple and efficient description of the group
• Describes an average member of the group
Central Tendency
▪ Is a statistical measure used to determine a single
score that defines the center of the distribution
▪ Goal: find the single score that is most typical or most
representative of the entire group.
▪ It attempts to identify the “average” or “typical”
individual.
▪ Aside from describing, it is also useful for making
comparisons between individuals or sets of data.
Central Tendency
▪ However, there is no single, standard procedure for determining central tendency.
▪ No single measure produces a central, representative value in every situation.
Central Tendency
Measures of Central Tendency

1. Mean
2. Median
3. Mode
MEAN
▪ Also called the “arithmetic mean” or “average”
▪ The most commonly used average, and the most convenient and versatile measure of central tendency.
▪ Computed by adding all the scores in the distribution and dividing the sum by the number of scores.
▪ Symbols:
population – μ (read as ‘mew’)
sample – M or x̄ (read as x-bar)
MEAN
▪ Formula:
$\mu = \dfrac{\sum X}{N}$
Example: 2, 5, 3, 4, 9, 10, 7, 1
$\mu = \dfrac{2 + 5 + 3 + 4 + 9 + 10 + 7 + 1}{8} = \dfrac{41}{8} = 5.125$
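The mean formula translates directly into code. A minimal Python sketch using the example scores above (`arithmetic_mean` is an illustrative name; the standard library's `statistics.mean` gives the same result):

```python
def arithmetic_mean(scores):
    """Mean = sum of all scores divided by the number of scores (ΣX / N)."""
    return sum(scores) / len(scores)

scores = [2, 5, 3, 4, 9, 10, 7, 1]
print(sum(scores), len(scores), arithmetic_mean(scores))  # 41 8 5.125
```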
MEAN
▪ The mean can also be defined as:
- The amount each individual receives when the
total (∑𝑋) is divided equally among all the
individuals (N) in the distribution.
- The balance point, or center of gravity (fulcrum), of the distribution.
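MEAN
▪ Consider this:
1, 2, 6, 6, 10
μ = 5
The distances below the mean (−4, −3) and above it (+1, +1, +5) cancel out (−7 + 7 = 0), so the distribution balances at 5.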
MEAN
▪ It considers every score in the distribution. That is, every score adds to the total (∑X) and every score contributes to the number of scores (N or n).
▪ Changing the value of any score will change the mean.
▪ Adding or removing a score will also change the mean (unless the score added or removed has the same value as the mean).
MEAN
▪ It is sensitive to outliers or extreme values.
▪ The presence of outliers pulls the mean towards
the tail or towards an outlying value.
▪ Keep in mind: it is used only when the data are measured on an interval or ratio scale.
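MEAN
▪ Weighted Mean
- When it is necessary to get the overall mean of two or more groups, we can compute the weighted mean.
Formula:
$\mu_T = \dfrac{\sum (\bar{X} \cdot N)}{N_T}$
where $\bar{X}$ is each group's mean, $N$ is that group's number of scores, and $N_T$ is the total number of scores across all groups.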
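A minimal Python sketch of the weighted mean, using hypothetical groups (the function name and the numbers are illustrative only). It also shows why simply averaging the group means can mislead when the groups differ in size:

```python
def weighted_mean(group_means, group_sizes):
    """Overall mean of several groups: sum of (group mean × group size) / total N."""
    total_n = sum(group_sizes)
    return sum(m * n for m, n in zip(group_means, group_sizes)) / total_n

# Hypothetical groups: 10 students with M = 80 and 30 students with M = 60.
print(weighted_mean([80, 60], [10, 30]))  # 65.0
print((80 + 60) / 2)                      # 70.0, unweighted; ignores that the second group is larger
```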
MEDIAN
▪ Is the value in the exact midpoint (middle) of a
distribution.
▪ The point that divides a distribution into two
equal halves: 50% of the scores above it and
50% below it.
▪ Cannot be found in nominal scales.
MEDIAN
▪ Appropriate to use when the distribution is skewed and the scores tend to concentrate on one side of the distribution.
▪ It is relatively unaffected by extreme scores and skewness.
MEDIAN
▪ Obtaining the median when n is an odd number:
1. Arrange the scores from lowest to highest.
2. The median is the score in the middle of the distribution.
Example: 26, 9, 3, 2, 13, 16, 5
Arranged: 2, 3, 5, 9, 13, 16, 26
Median = 9
MEDIAN
▪ Obtaining the median when n is an even number:
1. Arrange the scores from lowest to highest.
2. Add the two middle scores and divide the sum by 2.
Example: 1, 4, 8, 7, 1, 5
Arranged: 1, 1, 4, 5, 7, 8
Median = (4 + 5) / 2 = 9/2 = 4.5
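Both median procedures fit in one small function. This is a minimal Python sketch (`median_of` is an illustrative name); the standard library's `statistics.median` follows the same rule:

```python
def median_of(scores):
    """Median: the middle score of the ordered data (odd n),
    or the mean of the two middle scores (even n)."""
    ordered = sorted(scores)      # step 1: arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # odd n: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle scores

print(median_of([26, 9, 3, 2, 13, 16, 5]))  # 9
print(median_of([1, 4, 8, 7, 1, 5]))        # 4.5
```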
MODE
▪ It is the score or category that is most common or typical, that is, the one with the greatest frequency.
▪ It is the only measure of central tendency that is applicable to all four types of measurement scales.
▪ It is the only measure of CT that can be used for nominal data.
MODE
▪ It is the only measure of central tendency that can take on more than one value.
▪ Remember: the mode does not necessarily represent the majority of scores or observations.
▪ It is simply the most common score in the group, which may or may not represent most of the individuals.
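Python's standard library can list every mode with `statistics.multimode` (Python 3.8 or later). The nominal data below are hypothetical and show two modes that each cover only a third of the group, so neither represents most of the individuals:

```python
from statistics import multimode

# Hypothetical nominal data: each entry is a student's program.
programs = ["Psych", "Nursing", "Psych", "Educ", "Nursing", "Crim"]

modes = multimode(programs)
print(modes)  # ['Psych', 'Nursing']
for m in modes:
    print(m, programs.count(m), "of", len(programs))  # 2 of 6 for each mode
```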
Measures of CT: Proper Utility of Each Measure
There are 4 things that we must consider:
1. Shape of the distribution
2. Presence or absence of open-ended scores
3. Type of measurement scale used
4. Amount of data considered
MEAN
▪ Advantages: widely used; usually preferred, as it considers every score
▪ Disadvantages: sensitive to outliers; cannot be used with open-ended scores; not appropriate for skewed distributions
▪ Scales: interval, ratio (sometimes ordinal)
▪ Use when: the distribution is normal/symmetrical

MEDIAN
▪ Advantages: not influenced by outliers; can be used in skewed distributions; not influenced by open-ended scores
▪ Disadvantages: does not consider all scores; not informative when there are many tied scores
▪ Scales: ordinal, interval, ratio
▪ Use when: the distribution is skewed; scores are open-ended

MODE
▪ Advantages: can be used with any type of scale; capable of identifying any distinct subgroup(s)
▪ Disadvantages: considers only the most frequent scores and neglects the others
▪ Scales: nominal, ordinal, interval, ratio
▪ Use when: there are distinct subgroups
Interpreting Measures of Central Tendency
Remember. . .
▪ In interpreting measures of central tendency, we have to make sure that readers understand that we are referring to the entire group as a whole and not to its individual members.
▪ So, we use terms such as:
- “In general…”
- “Overall...”
- “Generally speaking…”
Sample Interpretation:
1. In general, the BSP students have moderate levels of extraversion (M = 14.5).
2. Overall, the students in BS Psych have moderate extraversion (Md = 15). Half of the students have low to moderate levels of extraversion (Min = 5 to Md = 15), while the other half have moderate to very high levels (Md = 15 to Max = 30).
3. Among the BSP students, the most prevalent levels of extraversion are moderate and high (Mo = 28, 36).
[You have defeated the
boss! Central Tendency
Retsam Skill: obtained!]
Measures of Variability
Learning Outcomes
❑ To be acquainted with what variability is and why it is important
to quantify it
❑ To know the different measures of variability and how they are
computed
❑ To know what information each measure of variability provides
❑ To know when to use each measure of variability
What is “Variability”?
▪ Variability provides a quantitative measure of the differences between scores in a distribution and describes the degree to which the scores are spread out or clustered together.
▪ It represents the amount of error to expect when we use the mean of a sample to represent the entire population.
What is “Variability”?
▪ A good measure of variability serves two purposes:
1. It describes the population.
2. It tells us how well an individual score (or group of scores) represents the entire distribution.
Note: If the measure of variability is large, the mean would not be a fair representation of the group; it is less accurate.
What is “Variability”?
Note:
If the individuals we are measuring do not differ greatly from each other on the construct we are measuring, the resulting variability will be small.
Conversely, if they differ vastly, the variability will be large.
Measures of Variability
1. Range
2. Variance
3. Standard Deviation
4. Quartile Deviation
RANGE
▪ Refers to the distance covered by the scores, from the smallest score to the largest score.
▪ In other words, it gives us the total extent or the width of the distribution.
▪ Important indices related to the range: minimum & maximum.
RANGE
1. Most commonly used
$\text{range} = X_{\max} - X_{\min}$
• Works well for variables with precisely defined upper and lower boundaries.
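RANGE
2. When the scores are measurements of a continuous variable
$\text{range} = \text{URL}(X_{\max}) - \text{LRL}(X_{\min})$
where URL is the upper real limit of the largest score and LRL is the lower real limit of the smallest score.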
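RANGE
3. When the scores are whole numbers
$\text{range} = X_{\max} - X_{\min} + 1$
(For whole numbers, the real limits lie 0.5 above and below each score, so this gives the same result as method 2.)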
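The three range formulas can be compared side by side in a minimal Python sketch. The scores are hypothetical, and the `unit` parameter (the size of one measurement unit, used to place the real limits half a unit above and below each score) is an assumption added for illustration:

```python
def range_simple(scores):
    """Method 1: largest score minus smallest score."""
    return max(scores) - min(scores)

def range_real_limits(scores, unit=1):
    """Method 2: URL of the largest score minus LRL of the smallest score.

    Assumption: real limits sit half a measurement unit above/below each score.
    """
    upper_real_limit = max(scores) + unit / 2
    lower_real_limit = min(scores) - unit / 2
    return upper_real_limit - lower_real_limit

def range_whole_numbers(scores):
    """Method 3: for whole-number scores, Xmax - Xmin + 1."""
    return max(scores) - min(scores) + 1

scores = [3, 7, 12, 5, 9]  # hypothetical whole-number scores
print(range_simple(scores))         # 9
print(range_real_limits(scores))    # 10.0
print(range_whole_numbers(scores))  # 10
```

For whole numbers, methods 2 and 3 agree, which is the point of the "+ 1".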


RANGE
Problem with the range:
It is determined by the two extreme values only. It ignores the other scores in the distribution.
MEAN DEVIATION
▪ Uses the mean as a reference point and measures the average (mean) distance of each score from the mean.
▪ In short, we get the average (the mean) of the deviation scores.
MEAN DEVIATION
Average of deviation scores:
$\dfrac{\sum (X_i - \mu)}{N}$

Distance (km), X    Deviation score, X − μ
5                   −1
6                    0
7                    1
4                   −2
8                    2
6                    0
∑ = 36              ∑ = 0

(μ = 36/6 = 6 km)

HOWEVER!!! Averaging the deviation scores always results in 0!! What now?!


MEAN DEVIATION
Distance (km), X    Deviation score, X − μ    Absolute deviation, |X − μ|
5                   −1                         1
6                    0                         0
7                    1                         1
4                   −2                         2
8                    2                         2
6                    0                         0
∑ = 36              ∑ = 0                      ∑ = 6

Mean Deviation:
$MD = \dfrac{\sum |X_i - \mu|}{N} = \dfrac{6}{6} = 1$
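The same distance data can be checked in Python. This minimal sketch confirms that the raw deviations cancel out and that averaging their absolute values gives a mean deviation of 1 km:

```python
distances_km = [5, 6, 7, 4, 8, 6]

mu = sum(distances_km) / len(distances_km)   # 6.0
deviations = [x - mu for x in distances_km]  # [-1.0, 0.0, 1.0, -2.0, 2.0, 0.0]
print(sum(deviations))                       # 0.0, plain deviations always cancel out

# Mean deviation: average of the absolute deviations.
mean_deviation = sum(abs(d) for d in deviations) / len(distances_km)
print(mean_deviation)                        # 1.0
```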
MEAN DEVIATION
▪ HOWEVER!!! Absolute value symbols are considerably challenging to manipulate algebraically (Gorard, 2004). Such complexity makes the development of advanced analytical methods more difficult.
VARIANCE
(Mean Squared Deviation or Mean Square)
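VARIANCE (Mean Square)
▪ It is the average squared distance from the mean.
▪ It is the mean of the squared deviations: the sum of squares (SS) divided by N.
Variance:
$\sigma^2 = \dfrac{\sum (X_i - \mu)^2}{N}$
The numerator, $\sum (X_i - \mu)^2$, is the Sum of Squares!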
VARIANCE (Mean Square)
Distance (km), X    Deviation, X − μ    Squared deviation, (X − μ)²
5                   −1                   1
6                    0                   0
7                    1                   1
4                   −2                   4
8                    2                   4
6                    0                   0
∑ = 36              ∑ = 0                ∑ = 10

Variance:
$\sigma^2 = \dfrac{\sum (X_i - \mu)^2}{N} = \dfrac{10}{6} = 1.67$
VARIANCE (Mean Square)
▪ HOWEVER!!! The variance is difficult to interpret, because:
1. It can produce a value that is bigger than the scores in the data set (this did not happen in our example, but it is possible!!!).
2. The variance is expressed in the square of the unit of the measured construct.
Variance:
$\sigma^2 = \dfrac{\sum (X_i - \mu)^2}{N} = \dfrac{10}{6} = 1.67\ \text{km}^2$
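A minimal Python sketch of the variance computation for the same distance data, cross-checked against the standard library's `statistics.pvariance`, which also divides by N:

```python
from statistics import pvariance

distances_km = [5, 6, 7, 4, 8, 6]
n = len(distances_km)
mu = sum(distances_km) / n

ss = sum((x - mu) ** 2 for x in distances_km)  # sum of squares: 10.0
variance = ss / n                              # population variance σ², in km²
print(ss, round(variance, 2))                  # 10.0 1.67
print(round(pvariance(distances_km), 2))       # 1.67
```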
PROBLEMS WITH VARIABILITY
Oh my God, oh my God. Are the problems with variability still not over?!
Break: What is the statisticians’ favorite doughnut?
Well, obviously…
STANDARD DEVIATION
STANDARD DEVIATION
▪ To address the problem of the variance, we bring the variance back to the original unit of measurement by taking its square root.
Standard Deviation:
$\sigma = \sqrt{\dfrac{\sum (X_i - \mu)^2}{N}}$
STANDARD DEVIATION
Distance (km), X    Deviation, X − μ    Squared deviation, (X − μ)²
5                   −1                   1
6                    0                   0
7                    1                   1
4                   −2                   4
8                    2                   4
6                    0                   0
∑ = 36              ∑ = 0                ∑ = 10

Standard Deviation:
$\sigma = \sqrt{\dfrac{\sum (X_i - \mu)^2}{N}} = \sqrt{\dfrac{10}{6}} = \sqrt{1.67} = 1.29\ \text{km}$
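Taking the square root can be sketched the same way in Python; `statistics.pstdev` is the standard library's population standard deviation and is used here only as a cross-check:

```python
import math
from statistics import pstdev

distances_km = [5, 6, 7, 4, 8, 6]
n = len(distances_km)
mu = sum(distances_km) / n

variance = sum((x - mu) ** 2 for x in distances_km) / n  # 10 / 6, in km²
sd = math.sqrt(variance)                                 # back in km
print(round(sd, 2), round(pstdev(distances_km), 2))      # 1.29 1.29
```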
STANDARD DEVIATION
▪ It refers to the measure of distance/dispersion around the mean.
▪ It provides us a measure of the standard, or typical, distance from the mean.
▪ It is the most important measure of variability.
Why???
STANDARD DEVIATION
▪ Because…
1. It tells us how near or far individuals are from the mean.
2. It is easier to understand because it is expressed in the same unit as the raw scores (not squared).
3. It is used in partnership with the mean and with the normal curve to determine how many individuals differ from the mean in the population and by how much.
STANDARD DEVIATION
68% of dist. ≈ μ ± 1σ
95% of dist. ≈ μ ± 2σ (more precisely, μ ± 1.96σ)
99% of dist. ≈ μ ± 2.58σ (and about 99.7% falls within μ ± 3σ)
For example:
The BSP class of 2028 has a mean conscientiousness score of M = 24.5 with SD = 5.8, so the middle 68% of the class scores within about 5.8 points of the mean:
68% of dist. = μ ± 1σ = 24.5 ± 5.8 = 18.7 to 30.3
The conscientiousness of this middle 68% ranges from moderate (18.7) to very high (30.3). This shows that the members of BSP Class 2028 differ from each other when it comes to their conscientiousness.
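The percentages above come from the normal curve. A minimal Python sketch using `statistics.NormalDist` (Python 3.8 or later) shows how much of a normal distribution falls within ±k standard deviations, and reproduces the 68% interval of the conscientiousness example:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within ±k standard deviations of the mean.
for k in (1, 2, 2.5, 2.58, 3):
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within ±{k}σ: {share:.1%}")

# Conscientiousness example: M = 24.5, SD = 5.8.
m, sd = 24.5, 5.8
print(f"middle ~68%: {m - sd:.1f} to {m + sd:.1f}")  # middle ~68%: 18.7 to 30.3
```

The printed shares are roughly 68.3%, 95.4%, 98.8%, 99.0%, and 99.7%, which is why ±2.58σ rather than ±2.5σ is used for 99%.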
VARIANCE & STANDARD DEVIATION
Variance: $\sigma^2 = \dfrac{\sum (X_i - \mu)^2}{N}$
Standard Deviation: $\sigma = \sqrt{\dfrac{\sum (X_i - \mu)^2}{N}}$
VARIANCE & STANDARD DEVIATION for SAMPLES
Variance: $s^2 = \dfrac{\sum (X_i - \bar{x})^2}{n - 1}$
Standard Deviation: $s = \sqrt{\dfrac{\sum (X_i - \bar{x})^2}{n - 1}}$

“Ma’am, why did it become n − 1 instead of just N? Why does it need to be adjusted?”


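Answer:
➢ The adjustment is necessary to correct for the bias in sample variability: a sample's scores tend to cluster closer to their own mean than to the population mean, so dividing by n tends to underestimate the population variance.
➢ Dividing by a smaller number (n − 1) produces a larger result and makes the sample variance an unbiased estimate of the population variance, that is, correct on average.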
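The effect of dividing by n − 1 can be seen in a minimal Python sketch. It first applies both formulas to the km data, then runs a small simulation on a hypothetical, randomly generated population (seeded so it is repeatable): averaged over many samples of 5, the n-divided variance falls short of the population variance, while the n − 1 version lands close to it.

```python
import random
from statistics import mean, pvariance, variance

# The km example: SS = 10 and n = 6.
distances_km = [5, 6, 7, 4, 8, 6]
print(pvariance(distances_km))  # divides SS by N:     10 / 6, about 1.67
print(variance(distances_km))   # divides SS by n - 1: 10 / 5 = 2

# Simulation with a hypothetical population (mean 50, SD 10), seeded for repeatability.
random.seed(2028)
population = [random.gauss(50, 10) for _ in range(10_000)]
true_var = pvariance(population)

divide_by_n, divide_by_n_minus_1 = [], []
for _ in range(2_000):
    sample = random.sample(population, 5)
    divide_by_n.append(pvariance(sample))         # biased: tends to be too small
    divide_by_n_minus_1.append(variance(sample))  # corrected with n - 1

print(round(true_var, 1), round(mean(divide_by_n), 1), round(mean(divide_by_n_minus_1), 1))
```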
QUARTILE DEVIATION
▪ Used when the distribution is skewed or has open-ended scores
▪ Measures the dispersion only within the middle 50% of the distribution and ignores the two ends
▪ Thus, the quartile deviation (Q) measures the typical distance from the median to the boundaries (Q1 and Q3) that define the middle 50% of the distribution.
$Q = \dfrac{Q_3 - Q_1}{2}$
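A minimal Python sketch of the quartile deviation, reusing the median example scores. It relies on `statistics.quantiles` (Python 3.8 or later) with its default "exclusive" method; textbooks and software use several slightly different quartile conventions, so Q1 and Q3 from another method may differ a little:

```python
from statistics import quantiles

def quartile_deviation(scores):
    """Q = (Q3 - Q1) / 2, half the width of the middle 50% of the scores."""
    q1, md, q3 = quantiles(scores, n=4)  # three cut points: Q1, median, Q3
    return (q3 - q1) / 2, (q1, md, q3)

qd, (q1, md, q3) = quartile_deviation([26, 9, 3, 2, 13, 16, 5])
print(q1, md, q3, qd)  # 3.0 9.0 16.0 6.5
```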
QUARTILE DEVIATION
[Figure: the middle 50% of the distribution lies between Q1 and Q3, with the median (Md) in between]
Measures of Variability: Proper Utility of Each Measure

SD
▪ Advantages: provides a relatively accurate description of symmetrical distributions
▪ Disadvantages: inaccurate for skewed distributions
▪ Scales: interval, ratio
▪ Use when: the distribution is normal/symmetrical

QD
▪ Advantages: not influenced by outliers; can be used in skewed distributions; not influenced by open-ended scores
▪ Disadvantages: only considers the middle 50% and disregards the rest
▪ Scales: ordinal, interval, ratio
▪ Use when: the distribution is skewed; scores are open-ended

RANGE
▪ Advantages: easily understood; obtained easily
▪ Disadvantages: easily influenced by extreme scores
▪ Scales: nominal (special cases), ordinal, interval, ratio
▪ Use when: a rough estimate of variability is enough/acceptable
Going back: Pets Got Talent in Psych 203
Which animal will you train for your project?
DOGS: Mean SQ = 96; SQ Range = 85–100; SD = 1.2
CATS: Mean SQ = 110; SQ Range = 20–130; SD = 24.3
68% of dist. ≈ μ ± 1σ
95% of dist. ≈ μ ± 2σ
99% of dist. ≈ μ ± 2.58σ
68% of dogs = μ ± 1σ = 96 ± 1.2 = 94.8 to 97.2
68% of cats = μ ± 1σ = 110 ± 24.3 = 85.7 to 134.3
Which pets will you choose?


Measures of Variability:
Jamovi
Writing the Results of
Descriptive Statistics
Commonly Reported Descriptive Statistics

▪ Mean
▪ Standard Deviation
▪ Verbal Interpretation
Sample 1

▪ A team of experimental psychologists wanted to identify which type of competition (group or individual) catalyzes better performance in accomplishing logic quizzes.
▪ They decided that the population to be asked to participate would be 3rd-year philosophy majors studying at a college located in Pangasinan. They requested the administrator of the college to lend them 20 students from each of the four sections of the program.
Steps in Writing the Results
1. Identify the details of the results.

Type of Competition   N    Mean Logic Score   SD    Verbal Interpretation
Group                 40   23.45              3.2   Passed
Individual            40   37.02              2.9   High Passed
[Figure: bar chart of mean logic scores by type of competition (Group = 23.45; Individual = 37.02)]
Steps in Writing the Results
2. Identify the initial sentence that will start your written presentation of results.
Sample introduction stems:
a. Based on the results of the descriptive statistics….
b. After subjecting the data to descriptive analysis, the results show that…
c. Results of the descriptive analysis on the data gathered show that…
Steps in Writing the Results
3. Finalize the write-up by presenting the details in connection with the initial sentence.

Sample Output
Based on the results of the descriptive statistics, the philosophy students who were subjected to individual competition generally had higher scores on the logic quiz administered. They had a mean score of 37.02 with a standard deviation of 2.9. This average score can be verbally interpreted as “High Passed”. On the other hand, the students subjected to group competition garnered an average score of only 23.45 (SD = 3.2), which has a verbal interpretation of “Passed”.
Another Sample
A personality psychologist is wondering whether the number of selfies a male young adult posts on SNS per month (FB, Twitter, Instagram) is related to his level of heterosexuality. He surveyed various male university students within the U-belt area to test his hypothesis. By the end of data gathering, a total of 356 respondents had been garnered.
Another Sample

Variables          N     Mean   SD     Verbal Interpretation
Selfie Posting     356   2.41   2.67   Low
Heterosexuality    356   3.97   .36    High
Another Sample
Sample Output:
Based on the descriptive statistics computed on the data gathered among 356 male university students, the average frequency of selfie posting on social networking sites within a month is only 2.41 (SD = 2.67). This average is verbally interpreted as “Low”. This indicates that selfie posting is not a common practice among male U-Belt students.

In terms of heterosexuality, the data show a mean level of 3.97 (SD = .36). This degree of sexual inclination is interpreted as “High”. This indicates that the respondents are generally males with sexual desires for the opposite gender.
End of Ep 3 (Part II)

[Skill: Stat Savvy Lvl +1]


[ You have entered an S gate
dungeon! Check the Practice
Sets on your GClassroom.
Prepare yourself to slay the
final boss…es to get out! ]
