Unit 2 Notes

Here are the key points about describing spread in a distribution: - Range is the difference between the maximum and minimum values. It provides the span of the data but does not account for how the data is distributed within that span. - Standard deviation is a measure of how far the values are spread out from the mean. It provides information about the variability or dispersion of the data. A low standard deviation indicates data points are close to the mean, while a high standard deviation indicates values are spread out over a wider range. - Interquartile range (IQR) is the difference between the first (Q1) and third (Q3) quartiles. It describes the spread of the middle 50% of the data by

2.1 Frequency Distributions and Graphs


Frequency distribution: a table that shows class intervals of data entries with counts of each entry
Count = frequency (how often)
Lower class limit: least number that belongs to the class (the starting value for the class)
Upper class limit: greatest number that belongs to the class (the ending value for the class)
Class width: the same amount for all classes

How to create:
Decide how many classes you want
you pick, 5-20
Find the class width
distance between lower (or upper) limits of consecutive classes
find the range = max - min
width = range / number of classes
round up to the next convenient whole number

ex: test grades
range = 97 - 60 = 37
width = 37 / 3 ≈ 12.3, round up to 13
[Handwritten frequency table with columns: class, tally, frequency, relative frequency, cumulative frequency]
Find class limits
find the min value (lower limit of the first class)
add the width to the lower limit to get the next lower limit
Tally the data entries, then count the tallies (frequency)
Other features
Midpoint: middle of the class = (lower class limit + upper class limit) / 2
Relative frequency: percent or proportion of the data that falls in the class = class frequency / sample size
Cumulative frequency: sum of the frequency of the class and every frequency before it
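The steps above can be sketched in Python. This is a minimal illustration, not the method from class: the function name `frequency_distribution` and the grade list are made up for the example, and the data are assumed to be whole numbers.

```python
import math

def frequency_distribution(data, num_classes):
    """Build rows of (lower, upper, midpoint, frequency, relative, cumulative).

    Assumes integer data. The width is range / number of classes,
    rounded up to the next whole number.
    """
    low, high = min(data), max(data)
    width = math.floor((high - low) / num_classes) + 1
    rows, cumulative = [], 0
    for k in range(num_classes):
        lower = low + k * width          # add width to the previous lower limit
        upper = lower + width - 1        # greatest value that belongs to the class
        freq = sum(lower <= x <= upper for x in data)
        cumulative += freq               # this frequency plus every one before it
        rows.append((lower, upper, (lower + upper) / 2,
                     freq, freq / len(data), cumulative))
    return rows

# Hypothetical test grades (not the class data set)
grades = [60, 70, 75, 85, 97, 88, 66, 91, 79, 83]
rows = frequency_distribution(grades, 3)
for lower, upper, mid, freq, rel, cum in rows:
    print(f"{lower}-{upper}  midpoint={mid}  f={freq}  rel={rel:.2f}  cum={cum}")
```

With these made-up grades the range is 37, so three classes give a width of 13 and the classes 60-72, 73-85, and 86-98.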
Frequency histogram:
bars TOUCH unless there's a gap in the data
quantitative data ONLY
bars start and stop at the class boundaries: 0.5 below the lower limit to 0.5 above the upper limit

Relative frequency histogram: same bars, with relative frequency on the y-axis
quantitative data only
2.2 More Graphs and Displays
Pie chart: circle divided into sectors that represent categories; the area of
each sector is proportional to the frequency of each category
[Sketch: pie chart with sectors of 50%, 25%, and 25% (100% total)]
Frequency Histogram:
class width = 5
Construct a frequency distribution and a frequency histogram for the data set using the
indicated number of classes

[Handwritten frequency table; legible classes: 47-51 and 52-56]
Displaying qualitative data:

labels or categories

can be numerical if the number is a label:
zip codes
student ID
jersey numbers
Pie Chart:
● A circle that is divided into sectors that represent categories

● The area of each sector is proportional to the frequency of each category

Pie Chart: our example
The data represent the results of an online survey that asked adults how they will invest their
money.
Response                        Frequency   Relative frequency
Invest more in stocks           50          50%
Hold on to more cash            25          25%
Invest more in bonds            15          15%
Invest the same as last year    10          10%
Total                           100         100%
(the relative frequencies set the size of each sector in the pie chart)
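As a rough sketch of how the sectors could be sized, the Python below turns each frequency into a relative frequency and a central angle (relative frequency × 360°). The central-angle step is an assumption about how the chart is drawn; the variable names are illustrative.

```python
# Survey frequencies from the example above
survey = {
    "Invest more in stocks": 50,
    "Hold on to more cash": 25,
    "Invest more in bonds": 15,
    "Invest the same as last year": 10,
}

total = sum(survey.values())

# For each category: (relative frequency, central angle in degrees)
sectors = {
    category: (freq / total, 360 * freq / total)
    for category, freq in survey.items()
}

for category, (rel, angle) in sectors.items():
    print(f"{category}: {rel:.0%} of the circle, {angle:.0f} degrees")
```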
Pareto Chart:
● Vertical bar graph

● Height of each bar represents frequency or relative frequency

● The bars are positioned in order of decreasing height, with the tallest bar
positioned at the left.

Quantitative data:

numerical measurements or counts

ex: histogram
Stem and Leaf:
● Each number is separated into a stem and a leaf
stem: first digit(s), e.g. the 10's place
leaf: last digit
● As many leaves as entries in data set

● Leaves are single digits

Key: 1 | 2 = 12
Stem and Leaf: our example
Use a stem-and-leaf plot to display the data. The data represent the ages of the top
15 highest-paid CEOs
53 72 55 67 59 57 55 59 61 60 59 56 63 58 58

5 | 3 5 5 6 7 8 8 9 9 9
6 | 0 1 3 7
7 | 2
Key: 5 | 3 = 53
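A stem-and-leaf plot for two-digit data can be sketched in a few lines of Python. This is only an illustration: the function name `stem_and_leaf` is made up, and it assumes the stem is the tens digit and the leaf is the ones digit.

```python
from collections import defaultdict

# Ages of the 15 highest-paid CEOs from the example
ages = [53, 72, 55, 67, 59, 57, 55, 59, 61, 60, 59, 56, 63, 58, 58]

def stem_and_leaf(values):
    """Group each value's ones digit (leaf) under its tens digit (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(ages)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

There is one leaf per data entry, so the leaves add up to 15.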
Dot plot:
● Each data entry is plotted, using a point, above a horizontal axis

Dot plot: our example
Use a dot plot to display the data. The data represent the life spans (in days) of 30
houseflies.
9 9 4 11 10 5 13 9 7 11 6 8 14 10 6
10 10 7 14 11 7 8 6 13 10 14 14 8 13 10
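One way to check a hand-drawn dot plot is to count how many times each value occurs. The sketch below prints a sideways text version with one dot per data entry; the layout is just an illustration.

```python
from collections import Counter

# Life spans (in days) of the 30 houseflies from the example
life_spans = [9, 9, 4, 11, 10, 5, 13, 9, 7, 11, 6, 8, 14, 10, 6,
              10, 10, 7, 14, 11, 7, 8, 6, 13, 10, 14, 14, 8, 13, 10]

counts = Counter(life_spans)

# One row per value on the axis, including values with no dots (gaps)
for value in range(min(life_spans), max(life_spans) + 1):
    print(f"{value:>2} | {'•' * counts[value]}")
```

The row for 12 is empty, which shows up as a gap in the plot.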

Scatter plot:
● Ordered pairs are graphed as points in a coordinate plane

● Used to show the relationship between two quantitative variables

[Scatter plot sketch; horizontal axis: Time (hrs)]
Describing a distribution: center
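Mean:
average

population mean: $\mu = \frac{\sum x}{N}$, where $N$ is the population size

sample mean: $\bar{x} = \frac{\sum x}{n}$, where $n$ is the sample size

Median:

middle value when data is in order

ex: 1, 2, 3, 4 (the middle falls between 2 and 3)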
Describing a distribution:
Mode: most occurring value

ex: 1, 2, 3, 3, 4 (mode = 3)

Outlier: data entry far removed from the other entries
1 2,3 4 4
Welcome!
Agenda:
● 2.3: Measures of Central Tendency
● How many pairs of shoes do you own?
● Work time
Due dates / upcoming:
● You'll need your calculators starting next class
Describing distributions - SCUFS
● Shape: modes, skew
● Center: mean, median
● Unusual features: outliers, gaps, clusters
● Spread: range, standard deviation
(describe in context)

Displays:
qual: pie chart, Pareto chart
quant: dot plot, scatter plot, stem-and-leaf, histogram
Describing distributions - Shape
● Modes: peaks or mounds (how many?)

unimodal (1), bimodal (2), multimodal (3 or more)
Describing distributions - Shape

● Skew: direction of the tail

symmetric
negative skew: tail to the left
positive skew: tail to the right
Describing distributions - center
Mean: average
pop mean: $\mu$
sample mean: $\bar{x} = \frac{\sum x}{n}$

Median: middle value when data is in order


The mean is affected by outliers that do not influence the median.
● When the distribution of data is skewed to the left, the mean is often less than the
median
● When the distribution is skewed to the right, the mean is often greater than the median
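A quick way to see this is to change one value into an outlier and compare the two measures. The data below are made up for illustration; `statistics.mean` and `statistics.median` are from the Python standard library.

```python
from statistics import mean, median

# Hypothetical data: the second list swaps the largest value for an outlier
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]

mean_before, median_before = mean(without_outlier), median(without_outlier)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")      # pulled toward the outlier
print(f"median: {median_before} -> {median_after}")  # unchanged
```

The large value drags the mean to the right of the median, which matches the right-skew bullet above.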

Skewed: use the median

Symmetric: use the mean
Unusual features

Gaps, Outliers, Clusters

gap: empty space in the data
cluster: grouped values
outlier: value far removed from the rest
Spread (variability)

Range:

range = max - min
Describing distributions - SCUFS
● Shape: skew, modes

● Center: mean, median

● Unusual features: outliers, gaps, clusters

● Spread: IQR, st. dev., range
Describing distributions - spread
The deviation of an entry x in a population data set is the difference between the
entry and the mean of the data set

deviation = $x - \mu$ (value - mean)

squared deviation = $(x - \mu)^2$
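Describing distributions - spread
Variance: average squared distance between the values and the mean, in square units

Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Describing distributions - spread
Standard deviation: typical distance between the values and the mean

Population st. dev: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Sample st. dev: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$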
Spread: Standard deviation
Measures how far each value is from the mean
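Describing distributions - spread
Standard deviation by hand: calculate the standard deviation of the data set:

6, 2, 3, 1

Step 1: find the mean: (6 + 2 + 3 + 1) / 4 = 3
Step 2: sum of squared deviations: (6 - 3)^2 + (2 - 3)^2 + (3 - 3)^2 + (1 - 3)^2 = 9 + 1 + 0 + 4 = 14
Step 3: divide by n (population) or n - 1 (sample): 14 / 4 = 3.5 or 14 / 3 ≈ 4.67
Step 4: square root: √3.5 ≈ 1.87 or √4.67 ≈ 2.16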
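The four steps can be followed line by line in Python for the data set 6, 2, 3, 1. This is a sketch with illustrative variable names. It computes both the population version (divide by n) and the sample version (divide by n - 1), since the formulas differ only in that step.

```python
import math

data = [6, 2, 3, 1]
n = len(data)

# Step 1: find the mean
mean = sum(data) / n

# Step 2: sum of the squared deviations from the mean
sum_of_squares = sum((x - mean) ** 2 for x in data)

# Step 3: divide by n (population) or n - 1 (sample) to get the variance
population_variance = sum_of_squares / n
sample_variance = sum_of_squares / (n - 1)

# Step 4: take the square root to get the standard deviation
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(f"mean = {mean}, sum of squares = {sum_of_squares}")
print(f"population st. dev = {population_sd:.2f}, sample st. dev = {sample_sd:.2f}")
```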
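Describing distributions - spread
Standard deviation by hand: calculate the standard deviation of the data set:

4, 3, 5, 2

mean = (4 + 3 + 5 + 2) / 4 = 3.5
(4 - 3.5)^2 + (3 - 3.5)^2 + (5 - 3.5)^2 + (2 - 3.5)^2
= 0.25 + 0.25 + 2.25 + 2.25 = 5
population: √(5 / 4) ≈ 1.12; sample: √(5 / 3) ≈ 1.29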
Describing distributions - spread
Standard deviation on a calculator:
Describing distributions - spread
In a study of high school football players that suffered concussions, researchers
placed the players in two groups. Players that recovered from their concussions
in 14 days or less were placed in Group 1. Those that took more than 14 days
were placed in Group 2. The recovery times (in days) for Group 1 are listed
below.

Find the sample variance and standard deviation of the recovery times.

4 7 6 7 9 5 8 10 9 8 7 10
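As a check on a calculator answer, the sample variance and standard deviation of these recovery times can be computed with Python's standard `statistics` module. Its `variance` and `stdev` functions use the sample formulas (divide by n - 1).

```python
from statistics import mean, stdev, variance

# Recovery times (in days) for Group 1
recovery_times = [4, 7, 6, 7, 9, 5, 8, 10, 9, 8, 7, 10]

sample_mean = mean(recovery_times)
sample_variance = variance(recovery_times)  # divides by n - 1
sample_sd = stdev(recovery_times)           # square root of the sample variance

print(f"mean = {sample_mean}")
print(f"sample variance = {sample_variance:.2f}")
print(f"sample st. dev = {sample_sd:.2f}")
```

By hand, the sum of squared deviations is 39, so the sample variance is 39 / 11.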
Describing distributions - spread
Standard deviation in your name:
1. Write down the letters in your preferred first name and convert them to
numbers.
[Handwritten example: a name converted to numbers]
2. Using the values, calculate the st. dev.
of your name

3. Interpret the results

[Handwritten results for the example name: mean, sample st. dev. (4.8), population st. dev.]
Describing distributions - spread
Outliers:

if a value is 2 or more st. dev. above or below the mean, it's an outlier
Describing distributions - spread
Why the standard deviation as a measure of variation is valuable:

[Sketch: bell curve centered at μ, marked from -3σ to 3σ]
Empirical rule:
Within 1 standard deviation of the mean: about 68% of the data
Within 2 standard deviations of the mean: about 95% of the data
Within 3 standard deviations of the mean: about 99.7% of the data
Describing distributions - spread
The monthly utility bills for eight more households are listed. Are any of the data
entries very unusual? Explain your reasoning.

$65, $52, $63, $83, $77, $98, $84, $70


Practice

Start working on:


● Section 2.4 page 93: #’s: 13, 15, 18, 21-24, 29, 30 and 33
find: mean, st. dev., range

Homework: If you do not finish (due next class)


Summary statistics

The first quartile, Q1:


● The median of the half of the ordered data set from the minimum to the
position of the median

4, 7, 8, 8, 11, 13, 15, 19, 21 (median = 11)

Q1 = (7 + 8) / 2 = 7.5
Summary statistics

The third quartile, Q3:


● The median of the half of the ordered data set from the position of the
median to the maximum

4, 7, 8, 8, 11, 13, 15, 19, 21

Q3 = (15 + 19) / 2 = 17
Summary statistics

Find Q1, median (Q2), and Q3 from the data set: (note it’s in order)

1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27

Q1 = 5, median = 9, Q3 = 18
Summary statistics
Interquartile range:

IQR = Q3 - Q1
tells us the spread of the middle half of the data

example: 18 - 5 = 13
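The quartile definitions used in these notes (median of the lower half, median of the upper half, with the middle value left out when the count is odd) can be sketched as below. The function name `quartiles` is illustrative. Calculators and library routines may use slightly different conventions, so answers can differ on some data sets.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, the middle value belongs to neither half
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27])
iqr = q3 - q1
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```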
Summary statistics

Outliers:
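rule: a value is an outlier if it is below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)

ex: 1, 4, 7, 9, 11, 12, 13, 17, 22, 30
Q1 = 7, Q3 = 17
IQR = 17 - 7 = 10
7 - 1.5(10) = 7 - 15 = -8
17 + 1.5(10) = 17 + 15 = 32
No outliers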
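The 1.5 × IQR rule can be checked in Python for the data set 1, 4, 7, 9, 11, 12, 13, 17, 22, 30. This is a minimal sketch with illustrative names; Q1 and Q3 are taken as the medians of the lower and upper halves, which works directly here because the count is even.

```python
from statistics import median

data = sorted([1, 4, 7, 9, 11, 12, 13, 17, 22, 30])
half = len(data) // 2          # even count, so the halves split cleanly

q1 = median(data[:half])       # median of the lower half
q3 = median(data[half:])       # median of the upper half
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences counts as an outlier
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print(f"outliers: {outliers}")
```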
Summary statistics:

Five Number Summary:

Min
Q1
med
Q3
Max
Summary statistics display:
Boxplot:

drawn from the five-number summary: min, Q1, med, Q3, max
outliers are marked on their own; the whisker then stops at the next biggest value
Summary statistics display:
Comparing boxplots: SCUFS
center: compare the medians
shape: left skew or right skew
Pre calc O 6
Stats O 5
Fractiles: used to measure position

Quartile: divide data into 4 equal parts

Decile: divide data into 10 equal parts

Percentile: divide data into 100 equal parts

Percentile of x: percent of values less than x

Percentile of x = (number of values less than x / total number of values) × 100

80th percentile: 80% of people fell below you

50th percentile: 50%
Standardized scores

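Standardized z-score: measures how many standard deviations a data value is from the mean
Standardized scores
positive: above the mean; negative: below the mean
z = (value - mean) / st. dev.
pop: $z = \frac{x - \mu}{\sigma}$
sample: $z = \frac{x - \bar{x}}{s}$
outliers: 2 or more st. dev. from the mean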
Using z-scores to compare data
The scores for a pre-calc test are normally distributed with a mean of 81.5 and
standard deviation of 4.7. Stats test scores are normally distributed with a mean of 79.9
and standard deviation of 9.3.

You score an 84.5 in pre-calc and your friend in stats scores an 85. Who did better?

pre-calc: z = (84.5 - 81.5) / 4.7 ≈ 0.64
stats: z = (85 - 79.9) / 9.3 ≈ 0.55
The pre-calc score is higher relative to its class.
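The comparison can be reproduced with a small helper. The function name `z_score` is made up for this sketch; the means and standard deviations are the ones given in the problem.

```python
def z_score(value, mean, st_dev):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / st_dev

z_precalc = z_score(84.5, mean=81.5, st_dev=4.7)
z_stats = z_score(85, mean=79.9, st_dev=9.3)

print(f"pre-calc z = {z_precalc:.2f}")
print(f"stats z    = {z_stats:.2f}")

# The larger z-score is the better result relative to its own class
better = "pre-calc" if z_precalc > z_stats else "stats"
print(f"better relative score: {better}")
```

The raw score of 85 is higher, but the stats scores are more spread out, so it sits fewer standard deviations above its mean.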
Normal distribution: symmetric
area under the curve: total area = 100%

[Sketch: bell curve marked -3σ, -2σ, -σ, μ, σ, 2σ, 3σ]
Label the following normal distribution given the mean and standard
deviation:

80 90 100 110 120 130 140


Empirical rule (total area = 100%):
Within 1 standard deviation of the mean: about 68% of the data
Within 2 standard deviations of the mean: about 95% of the data
Within 3 standard deviations of the mean: about 99.7% of the data
Normal distribution - use
(using the curve labeled 80 to 140: mean 110, st. dev. 10)

What percent of adults have a systolic blood pressure below 100 mmHg?

(100% - 68%) / 2 = 16%

What percent of adults have a systolic blood pressure above 120 mmHg?

100% - 84% = 16%

What percent of adults have a systolic blood pressure between 90 and 120
mmHg?

100% - 16% - 2.5% = 81.5%
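The empirical-rule arithmetic can be written out and compared with exact normal-curve areas. This sketch assumes a mean of 110 and a standard deviation of 10, read off the labeled axis (80 to 140); those values are an inference, not stated in the problem text. `statistics.NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist

# Assumed from the labeled curve: mean 110 mmHg, st. dev. 10 mmHg
bp = NormalDist(mu=110, sigma=10)

# Empirical rule: about 68% within 1 st. dev., about 95% within 2
below_100 = (100 - 68) / 2          # left tail beyond 1 st. dev.
above_120 = (100 - 68) / 2          # right tail beyond 1 st. dev.
between_90_120 = 95 / 2 + 68 / 2    # 2 st. dev. below up to 1 st. dev. above

# Exact areas under the normal curve, as percentages
exact_below_100 = 100 * bp.cdf(100)
exact_above_120 = 100 * (1 - bp.cdf(120))
exact_between = 100 * (bp.cdf(120) - bp.cdf(90))

print(f"below 100: rule {below_100}%, exact {exact_below_100:.1f}%")
print(f"above 120: rule {above_120}%, exact {exact_above_120:.1f}%")
print(f"90 to 120: rule {between_90_120}%, exact {exact_between:.1f}%")
```

The rule-of-thumb answers are close to the exact areas because 68% and 95% are themselves rounded.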
