
Descriptive Statistics W25

The document provides an overview of descriptive statistics, covering measures of central tendency, variability, and how to interpret data through various statistical methods. It explains concepts such as mean, median, mode, standard deviation, and the creation of box and whisker plots. Additionally, it discusses the impact of outliers on statistical measures and includes practice problems for better understanding.

Uploaded by

tx4n775hkx
Copyright
© © All Rights Reserved

Data Analysis

Descriptive Statistics


Lesson Plan

• What descriptive statistics are and how they are used
• Understanding the different measures of central tendency
• Understanding measures of variability and how to use them
• Discussing and using the coefficient of variation
• Measures of position
• Reading and creating box-and-whisker plots


Flowchart of Descriptive Statistics

Descriptive statistics
• Summarize and describe the features (characteristics) of a dataset drawn from either an entire population or a sample
• The formula we use determines whether a result describes a population or a sample

Descriptive statistics branch into:
• Measures of central tendency: the typical value within your dataset
• Measures of variability: how spread out your data is from the average


Measures of Central Tendency

A measure of central tendency indicates a typical value in a dataset.

• Mode: the most frequent value in the dataset
• Median: the middle value in the dataset
• Mean: the average of the dataset; heavily affected by extreme values
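A minimal sketch of the three measures using Python's standard-library `statistics` module. The sample values are hypothetical, chosen only for illustration: the mode is 3 (it appears twice), the median is 4 (the average of the two middle values, 3 and 5), and the mean is 30 / 6 = 5.

```python
import statistics

# Hypothetical sample, made up to illustrate the three measures
data = [2, 3, 3, 5, 7, 10]

mode = statistics.mode(data)      # most frequent value: 3 appears twice
median = statistics.median(data)  # n is even, so the average of the two middle values (3 and 5)
mean = statistics.mean(data)      # sum of the values (30) divided by their count (6)

print("mode:", mode, "median:", median, "mean:", mean)
```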
Measures of Central Tendency

• Mode
Advantages:
- Not affected by extreme values (outliers)
- Can be used with any level of measurement
Disadvantages:
- It may not represent the centre of the dataset
• Bimodal: two different modes; the graph has two equal bumps
• Multimodal: three or more modes
• No mode

What are the advantages and disadvantages?
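The mode cases above can be told apart by counting how often each value occurs. This is a sketch, not a library function: `describe_modes` is a hypothetical helper, and it assumes the convention that a dataset has no mode only when every value occurs exactly once (some textbooks also say there is no mode when all values occur equally often). Because it only counts values, it also works on categorical data, matching the advantage listed above.

```python
from collections import Counter

def describe_modes(data):
    """Classify a non-empty dataset by its modes.

    Assumed convention: "no mode" only when every value occurs exactly once.
    """
    counts = Counter(data)            # value -> how many times it occurs
    top = max(counts.values())        # highest frequency
    if top == 1:
        return "no mode", []
    # Counter keeps first-occurrence order, so modes are listed in that order
    modes = [value for value, count in counts.items() if count == top]
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return label, modes

print(describe_modes([1, 2, 2, 3]))            # ('unimodal', [2])
print(describe_modes([1, 1, 2, 3, 3]))         # ('bimodal', [1, 3])
print(describe_modes([1, 1, 2, 2, 3, 3, 4]))   # ('multimodal', [1, 2, 3])
print(describe_modes([5, 6, 7]))               # ('no mode', [])
print(describe_modes(["red", "blue", "red"]))  # ('unimodal', ['red'])
```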
Measures of Central Tendency

• Steps to finding the median:
1. Sort the data from the lowest to the highest value.
2. If the number of values is odd, find the middle number: that's your median!
3. If the number of values is even, take the average of the two middle numbers: that's your median!

Advantages:
- Resistant to extreme values
- Better measure of central tendency for skewed data
Disadvantages:
- Cannot be used with all levels of measurement (not with nominal categorical data)
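The three median steps translate directly into code. This is a hand-written sketch (`median` here is a hypothetical helper, not a library call); Python's `statistics.median` gives the same results.

```python
def median(values):
    """Median following the three steps: sort, then pick or average the middle."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)                        # step 1: lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                  # step 2: odd count, take the middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # step 3: even count, average the two middle values

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```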
Measures of Central Tendency

• Steps in calculating the mean:
1. Compute ∑X.
2. Divide ∑X by the total number of data values.

Advantages:
- It uses the sum of all x values (more representative)
- Used for continuous and discrete data
Disadvantages:
- Affected by extreme values (outliers)

Sample mean (n = sample size):
$\bar{x} = \frac{\sum x}{n}$

Population mean (N = population size):
$\mu = \frac{\sum x}{N}$
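Measures of Central Tendency

• What is the mean of: 1, 2, 3, 4, 100?
Mean = (1 + 2 + 3 + 4 + 100) / 5 = 110 / 5 = 22

QUESTION: Does the mean represent the dataset?
Not well: the mean of 22 is larger than four of the five values.

What would 100 be in this dataset?
An outlier.

What would be a better measure of central tendency?
The median (3), because it is not affected by outliers, and there is no mode.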
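A quick check of this example in Python: the single extreme value pulls the mean far away from the other four values, while the median stays at 3.

```python
import statistics

data = [1, 2, 3, 4, 100]

mean = sum(data) / len(data)      # ΣX / n = 110 / 5
median = statistics.median(data)  # middle value of the sorted data

print(mean)    # 22.0
print(median)  # 3
```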
The influence of outliers on the mean

• Outliers are usually the result of data entry/experimental error, sampling problems, or natural variation.
• They have a big impact on the calculation of the mean.
• A statistic is resistant to extreme values if it is not affected much by extreme observations (outliers). The mode and the median are resistant.
Resistant Measures of Central Tendency

• A resistant measure is NOT affected by extreme values in the dataset.
• Examples of resistant measures:
- Median
- Trimmed mean
Resistant Measures: Trimmed Mean

Also called the truncated mean.

• Trimmed mean steps:
1. Put the data in order from smallest to biggest.
2. Remove k% of the values from the top and from the bottom.
3. Calculate the mean of the remaining data.


Practice!

• Calculate a 5% trimmed mean for the following sample:

14 24 20 20 25 23 30
43 40 70 30 22 35 23
21 32 35 33 31 27 44

Sorted (n = 21): 14, 20, 20, 21, 22, 23, 23, 24, 25, 27, 30, 30, 31, 32, 33, 35, 35, 40, 43, 44, 70
5% × n = 0.05 × 21 = 1.05, which rounds to 1, so remove 1 value from each end (14 and 70).
Add the remaining 19 values to get 558, then divide by the new n = 19: trimmed mean = 558 / 19 ≈ 29.37
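A sketch of the trimmed-mean steps applied to this practice sample. `trimmed_mean` is a hypothetical helper; it assumes the number of values removed from each end is k% of n rounded to the nearest whole number (as in 1.05 → 1 above), using Python's `round`, which rounds exact halves to the nearest even number. Other conventions, such as always rounding down, can give different results for other sample sizes.

```python
def trimmed_mean(values, k_percent):
    """Mean after removing k% of the values from each end of the sorted data."""
    ordered = sorted(values)                 # step 1: smallest to biggest
    n = len(ordered)
    k = round(n * k_percent / 100)           # values to drop from EACH end (assumed rounding rule)
    kept = ordered[k:n - k]                  # step 2: drop k from the bottom and k from the top
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)             # step 3: mean of what is left

# The 21-value practice sample
sample = [14, 24, 20, 20, 25, 23, 30,
          43, 40, 70, 30, 22, 35, 23,
          21, 32, 35, 33, 31, 27, 44]

print(round(trimmed_mean(sample, 5), 2))     # 29.37
```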
Weighted Mean

• Used to assign more importance to certain values

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

where x is a data value and w is the weight assigned to that value.
Practice!

• Suppose you had a midterm score of 83% and your final exam score is 95%. Your midterm was worth 40% and your final exam is worth 60%. Calculate your final grade.

$\bar{x} = \frac{(83 \times 40) + (95 \times 60)}{40 + 60} = \frac{3320 + 5700}{100} = \frac{9020}{100} = 90.2\%$
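The weighted-mean formula as a short Python sketch (`weighted_mean` is a hypothetical helper), checked against the midterm/final practice. The percent signs are dropped; the weights 40 and 60 play the role of w.

```python
def weighted_mean(values, weights):
    """Sum of (w * x) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Midterm 83 (weight 40) and final exam 95 (weight 60)
print(weighted_mean([83, 95], [40, 60]))  # 90.2
```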
Central Tendency and Levels of Measurement

• Mode: can be used with all levels of data: nominal, ordinal, interval, and ratio.
• Median: may be used with ordinal, interval, or ratio levels.
• Mean: may be used with interval or ratio levels.


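Mean, median, mode and skewness

• Right-skewed (positively skewed) distribution: the mode is less than the mean and the median.
• Left-skewed (negatively skewed) distribution: the mode is greater than the mean and the median.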
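To illustrate the two skewness patterns, here is a sketch with a small made-up, right-skewed dataset and its mirror image. The data are hypothetical: on the right-skewed set, mode < median < mean (2 < 3 < about 3.56), and on the left-skewed set the order reverses (9 > 8 > about 7.44).

```python
import statistics

# Hypothetical data, invented for illustration only
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 10]   # long tail to the right (the 10)
left_skewed = [11 - x for x in right_skewed]  # mirror image: long tail to the left

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    mode = statistics.mode(data)
    median = statistics.median(data)
    mean = statistics.mean(data)
    print(f"{name}: mode={mode}, median={median}, mean={mean:.2f}")
```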


Variance in Data

Practice!

• For each of these datasets, calculate the mean.

Set 1: -10, 0, 10, 20, 30 (mean x̄ = 10)
Set 2: 8, 9, 10, 11, 12 (mean x̄ = 10)

Does the mean tell us everything we need to know about our dataset?
No: both sets have the same mean, but the values in Set 1 are much more spread out.
Measures of Variation

Measures of variation show how close the values in your dataset are to the measures of central tendency.

• Range
- A measure of dispersion based on the positions of the extreme values in the data
- Affected by outliers
- Range = (maximum value) − (minimum value)
• Variance
- Dispersion of the data around the mean
- Variance = (standard deviation)²
• Standard deviation (the one we use the most)
- Dispersion of the data from the mean
- The square root of the variance
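Applying the three measures of variation to Set 1 and Set 2 from the earlier practice shows what the mean alone hides. This sketch assumes both sets are samples, so it uses the n − 1 versions (`statistics.variance` and `statistics.stdev`). Both sets have a mean of 10, but Set 1 has a range of 40 and a variance of 250, while Set 2 has a range of 4 and a variance of 2.5.

```python
import statistics

set1 = [-10, 0, 10, 20, 30]
set2 = [8, 9, 10, 11, 12]

for name, data in [("Set 1", set1), ("Set 2", set2)]:
    data_range = max(data) - min(data)   # (maximum value) - (minimum value)
    s2 = statistics.variance(data)       # sample variance, divides by n - 1
    s = statistics.stdev(data)           # sample standard deviation, square root of s2
    print(f"{name}: mean={statistics.mean(data)}, range={data_range}, "
          f"variance={s2}, sd={s:.2f}")
```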
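Range: Example

Average weight of a carton of blueberries, in ounces (two datasets)

• First dataset: range of 10, mode of 22
• Second dataset: range of 10, mode of 27

Both datasets have the same range, but the second one has more variance: the range alone does not show how the values are spread between the two extremes.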


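Standard Deviation and Variance

• Sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

• Sample standard deviation:
$s = \sqrt{s^2}$

• The n − 1 denominator is used only for sample statistics; it helps account for bias (dividing by n would tend to underestimate the population variance).
• Squaring the deviations keeps positive and negative deviations from cancelling out, which is why the standard deviation is never negative.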
Standard Deviation and Variance

• Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

• Population standard deviation:
$\sigma = \sqrt{\sigma^2}$

where μ is the population mean and N is the population size.
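Python's `statistics` module provides both versions of the formula: `pvariance` and `pstdev` divide by N, while `variance` and `stdev` divide by n − 1. The sketch below uses Set 2 (8, 9, 10, 11, 12), whose squared deviations sum to 10, so the population variance is 10 / 5 = 2 and the sample variance is 10 / 4 = 2.5.

```python
import statistics

data = [8, 9, 10, 11, 12]   # Set 2 from the earlier practice

sigma2 = statistics.pvariance(data)   # population variance: squared deviations summed, / N
s2 = statistics.variance(data)        # sample variance: squared deviations summed, / (n - 1)
sigma = statistics.pstdev(data)       # population standard deviation
s = statistics.stdev(data)            # sample standard deviation

print(sigma2, s2)
print(round(sigma, 3), round(s, 3))
```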
Computation Formula (sample statistics)

• Sample variance
• Sample standard deviation

Do NOT forget the order of operations (BEDMAS).
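The formulas on this slide did not survive extraction. One common shortcut (computational) form of the sample variance is $s^2 = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1}$; the sketch below assumes that form and checks that it agrees with the definitional formula on Set 1. The brackets show where BEDMAS matters. With very large values the shortcut can lose floating-point precision, so the definitional form is numerically safer.

```python
def variance_definitional(xs):
    """Sample variance from the definition: sum of squared deviations / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_computational(xs):
    """Sample variance from the assumed shortcut: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    # Brackets first, then exponents, division, subtraction (BEDMAS)
    return (sum_x2 - (sum_x ** 2) / n) / (n - 1)

set1 = [-10, 0, 10, 20, 30]
print(variance_definitional(set1), variance_computational(set1))  # 250.0 250.0
```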
Calculating Variance and Standard Deviation

1. Find the mean.
2. Calculate the deviation scores (each value minus the mean).
3. Square the deviation scores.
4. Add all the squared deviation scores.
5. Divide by n − 1 (sample) or by N (population). This is the variance!
6. Take the square root of the variance to find the standard deviation.
Practice!

• Compute s² and s for: 2, 3, 3, 8, 10, 10
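Application of Standard Deviation

• Standard deviations are never negative (a standard deviation of 0 means every value is the same).
• The standard deviation is affected by outliers, because it is based on the mean, which is affected by outliers, and because every single value in the dataset is included in the calculation.
• Units of standard deviation are the same as those of the original data values.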
Practice!

• Compute the standard deviation for: 2, 3, 3, 8, 10, 20
Comparing Standard Deviations

• Coefficient of variation (CV)
- Used to compare the variation of two datasets with different scales or units

For samples: $CV = \frac{s}{\bar{x}} \times 100$
For populations: $CV = \frac{\sigma}{\mu} \times 100$
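Practice!

• Below are the mean heights and weights for a sample of Marianopolis first-year students registered in gym. Compare the variation.

                      Heights      Weights
Mean                  68.34 in.    172.55 lb.
Standard deviation    3.02 in.     26.33 lb.
CV                    4.42%        15.26%

Heights: CV = (3.02 / 68.34) × 100 ≈ 4.42%
Weights: CV = (26.33 / 172.55) × 100 ≈ 15.26%
Relative to their means, the weights vary much more than the heights.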
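The two CVs can be checked with a short sketch; `coefficient_of_variation` is a hypothetical helper implementing CV = s / x̄ × 100. Because the units cancel, the percentages can be compared directly even though one variable is in inches and the other in pounds.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean * 100

heights_cv = coefficient_of_variation(3.02, 68.34)    # inches / inches: unit-free
weights_cv = coefficient_of_variation(26.33, 172.55)  # pounds / pounds: unit-free

print(f"Heights CV: {heights_cv:.2f}%")  # Heights CV: 4.42%
print(f"Weights CV: {weights_cv:.2f}%")  # Weights CV: 15.26%
```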


Measures of Position


Remember!

• Why is understanding variability in data important?
It helps us see how close together our data is and how dispersed it is.

• What is a shortcoming associated with the mean and standard deviation?
They do not tell us the shape of the data's distribution.

The mean and standard deviation are heavily affected by outliers!
Practice!

• Below is a sample of 10 salaries (in thousands of dollars) of recent graduates from Concordia's John Molson School of Business:

42 30 35 105 65
40 35 58 55 75

Sorted: 30, 35, 35, 40, 42, 55, 58, 65, 75, 105
Range = 105 − 30 = 75
Median = (42 + 55) / 2 = 48.5
- This is the middle point of your dataset.
- The right side is more spread out.
Q1 = 35
Q3 = 65
Percentiles

The Pth percentile is a value such that P% of the data falls at or below it and (100 − P)% of the data falls at or above it.

Quartiles:
• A type of percentile
• Divide the data into fourths
Computing Quartiles

1. Order the data from smallest to largest.
2. Find the median (this is Q2).
3. Find the first quartile, Q1, by finding the median of the data falling below the Q2 position (not including Q2).
4. Find the third quartile, Q3, by finding the median of the data falling above the Q2 position (not including Q2).
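A sketch of the quartile steps, checked on the salary data (`median` and `quartiles` are hypothetical helpers). It follows this lesson's method of taking medians of the lower and upper halves while excluding Q2; other tools, such as `statistics.quantiles` or `numpy.percentile`, interpolate between values by default and can return different Q1 and Q3 for the same data.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves, excluding Q2."""
    ordered = sorted(values)                  # step 1: order the data
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    q2 = median(ordered)                      # step 2: the median is Q2
    lower_half = ordered[: n // 2]            # values below the Q2 position
    upper_half = ordered[(n + 1) // 2 :]      # values above the Q2 position
    return median(lower_half), q2, median(upper_half)  # steps 3 and 4

salaries = [42, 30, 35, 105, 65, 40, 35, 58, 55, 75]  # thousands of dollars
print(quartiles(salaries))  # (35, 48.5, 65)
```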
Interquartile Range

The interquartile range (IQR) is obtained by subtracting the first quartile from the third: IQR = Q3 − Q1.
Five-Number Summary

The five-number summary includes:
• the minimum value
• the first quartile, Q1
• the median (or second quartile, Q2)
• the third quartile, Q3
• the maximum value
Box and Whisker (Boxplot) Diagram

A visual representation of the five-number summary.

Steps:
• Draw a scale that includes the lowest and highest values.
• Draw the box from Q1 to Q3.
• Draw a solid line at the median.
• Draw the whiskers.

How to find outliers (fences):
• Upper fence = Q3 + (1.5 × IQR)
• Lower fence = Q1 − (1.5 × IQR)
Values above the upper fence or below the lower fence are considered outliers.
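The fence rule as a sketch (`fences` and `find_outliers` are hypothetical helpers). It assumes a value lying exactly on a fence is not an outlier, and it takes Q1 and Q3 as inputs, for example from a five-number summary. The data are made up; by the Computing Quartiles steps, Q1 = 20 and Q3 = 30 for this set, so IQR = 10 and the fences are 5 and 45.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

# Hypothetical data with Q1 = 20 and Q3 = 30
data = [2, 20, 21, 25, 29, 30, 60]
print(fences(20, 30))               # (5.0, 45.0)
print(find_outliers(data, 20, 30))  # [2, 60]
```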
Interpreting the Boxplot
Practice!

• Below is a sample of 10 salaries (in thousands of dollars) of recent graduates from Concordia's John Molson School of Business:

42 30 35 105 65
40 35 58 55 75

1. Determine the IQR.
2. Calculate the five-number summary and draw a box-and-whisker diagram.
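A worked check of this practice, assuming Python and the same quartile method as the Computing Quartiles steps (medians of the halves, excluding Q2); the helpers are hypothetical. The IQR is 65 − 35 = 30, and the fences, −10 and 110, show that even 105 is not an outlier, although it stretches the right whisker.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """(min, Q1, median, Q3, max), with Q1/Q3 as medians of the halves excluding Q2."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    q1 = median(ordered[: n // 2])          # lower half
    q2 = median(ordered)
    q3 = median(ordered[(n + 1) // 2 :])    # upper half
    return ordered[0], q1, q2, q3, ordered[-1]

salaries = [42, 30, 35, 105, 65, 40, 35, 58, 55, 75]  # thousands of dollars
summary = five_number_summary(salaries)
iqr = summary[3] - summary[1]
lower_fence = summary[1] - 1.5 * iqr
upper_fence = summary[3] + 1.5 * iqr

print("five-number summary:", summary)       # (30, 35, 48.5, 65, 105)
print("IQR:", iqr)                           # 30
print("fences:", lower_fence, upper_fence)   # -10.0 110.0
```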
