Statistics Notes
To follow these notes, please refer to the notes provided earlier and self-study the concepts
listed below.
If anyone doesn’t understand the concepts after studying, please discuss them with me!
Self-Study:
Symmetric Distribution
Skewed Distribution
Positively Skewed Data (Distribution)
Negatively Skewed Data (Distribution)
We have now studied how to calculate the middle of the data, or one value that can represent the
whole set (or distribution). These averages are called measures of location. Next we need to see
how variable the data are, that is, how spread out they are. To study that, we focus on measures of
spread (dispersion of data).
Measure of spread
Let’s take an example:
You have enrolled in an off-road race in which your jeep needs to wade through several rivers.
The average depth of the water is quoted at 3 feet. Your jeep has a wading height quoted
at 3.5 feet. Is it safe to enter the water?
You probably would not cross the water with only this information.
You would need to know the maximum and minimum depth of the water to make sure that the
maximum does not exceed the 3.5 feet your jeep can wade through.
There can be two conditions:
1. The depth of the water ranges from 1 foot to 5 feet. In this case the average depth can still be 3
feet, but your jeep cannot wade through the deepest parts, and the attempt will probably result in your
engine being hydro-locked.
2. The depth of the water ranges from 2.5 feet to 3.5 feet. In this case, your jeep can
wade through the water and take part in the competition.
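The two conditions can be sketched in a few lines of Python. The depth readings below are invented purely for illustration (the example only gives the ranges and the 3 feet average), and the helper name is_safe is hypothetical.

```python
from statistics import mean

WADING_HEIGHT_FT = 3.5  # wading height of the jeep, from the example

# Hypothetical depth readings in feet (made up for illustration only).
# Both sets average 3 feet but cover the two ranges described above.
river_a = [1.0, 2.0, 3.0, 4.0, 5.0]      # depth ranges from 1 ft to 5 ft
river_b = [2.5, 2.75, 3.0, 3.25, 3.5]    # depth ranges from 2.5 ft to 3.5 ft


def is_safe(depths, limit=WADING_HEIGHT_FT):
    """The crossing is safe only if the deepest point is within the limit."""
    return max(depths) <= limit


for name, depths in [("A", river_a), ("B", river_b)]:
    print(name, "mean:", mean(depths), "min:", min(depths),
          "max:", max(depths), "safe:", is_safe(depths))
```

Both rivers report the same mean, so only the minimum and maximum separate the safe crossing from the unsafe one.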
This is why we need other information apart from the measure of location.
The dispersion, or measure of spread, combined with the measure of location helps us
understand the data fully. In our course we will study the following measures of spread:
1. The range
2. The mean absolute deviation (MAD)
3. Variance
4. Standard Deviation
Some properties of dispersion:
Reasons to study measures of dispersion:
1. A small value of a measure of dispersion indicates that the data are clustered closely
around the arithmetic mean. The mean is therefore considered representative of the data.
Conversely, a large measure of dispersion indicates that the mean is not a reliable representative.
2. A measure of spread helps us compare two or more distributions. Suppose
that an LCD manufacturer has two plants with similar mean hourly outputs.
Concluding that the two plants perform equally may not be correct: one plant might run close to
its average hourly output all the time, while in the second plant the hourly output of the first shift is poor
and the second shift works well ahead of the mean. We need to know the
range to understand which mean is representative of its plant and which plant is working
better.
The Range:
The measure of spread most often associated with the mode is the range, since both are
relatively easy and quick to calculate. They are well suited to initial exploration of the data.
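The mean absolute deviation (MAD):
If the mean is the average being used, then one very good way of measuring the amount of
variability in the data is to calculate the extent to which the values differ from the mean.
$$\mathrm{MAD} = \frac{\sum |X - \bar{X}|}{n}$$
where $\bar{X}$ is the arithmetic mean and $n$ is the number of observations.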
Examples:
1. Calculate the mean absolute deviation for “shop A” mentioned in the example above.
The arithmetic mean of the sample is £1,050.
2. The chart below shows the number of cappuccinos sold at Starbucks in the Orange
County airport and the Ontario, California, airport between 4 and 5 pm for a sample of
five days last month.
Determine the mean, median, range and MAD for each location.
California Airports
Orange County    Ontario
20               20
40               49
50               50
60               51
80               80
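One way to check the hand calculation for this example is the short Python sketch below. It uses only the figures from the table; the function names value_range and mean_absolute_deviation are illustrative choices, not part of any library.

```python
from statistics import mean, median

# Cappuccinos sold between 4 and 5 pm, taken from the table above.
sales = {
    "Orange County": [20, 40, 50, 60, 80],
    "Ontario": [20, 49, 50, 51, 80],
}


def value_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)


def mean_absolute_deviation(data):
    """MAD = sum of |x - mean| divided by the number of observations."""
    centre = mean(data)
    return sum(abs(x - centre) for x in data) / len(data)


for airport, data in sales.items():
    print(airport,
          "mean:", mean(data),
          "median:", median(data),
          "range:", value_range(data),
          "MAD:", mean_absolute_deviation(data))
```

Both locations share a mean of 50, a median of 50 and a range of 60, yet the MAD is 16 for Orange County and 12.4 for Ontario. The range looks only at the two extreme values, while the MAD uses every observation.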
Variance
$$S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2$$
Steps to calculate Variance:
1. Calculate arithmetic mean
2. Calculate difference of every observation and mean
3. Square the differences
4. Sum the squares
5. Divide the sum of squares by the total number of observations.
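The five steps above can be followed literally in Python. This is a minimal sketch, using the Orange County figures from the cappuccino example as input; the function name population_variance is an illustrative choice, and the standard library function statistics.pvariance is printed alongside only as a cross-check.

```python
from statistics import pvariance


def population_variance(data):
    n = len(data)
    # Step 1: calculate the arithmetic mean
    mean = sum(data) / n
    # Step 2: difference between every observation and the mean
    deviations = [x - mean for x in data]
    # Step 3: square the differences
    squared = [d ** 2 for d in deviations]
    # Step 4: sum the squares
    total = sum(squared)
    # Step 5: divide by the total number of observations
    return total / n


orange_county = [20, 40, 50, 60, 80]
print(population_variance(orange_county))  # 400.0
print(pvariance(orange_county))            # library cross-check
```

Note that the division in step 5 is by n, which gives the population variance, matching the formula in these notes.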
Standard deviation
The square root of the variance.
Denoted by “S”
$$S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
Steps to calculate Standard Deviation:
1. Follow all the steps for the variance.
2. Take the square root of the variance.
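As a sketch of these two steps, the function below computes the variance and then takes its square root. The input figures are the two airport samples from the cappuccino example; the name population_std_dev is illustrative, and statistics.pstdev from the standard library serves as a cross-check.

```python
import math
from statistics import pstdev


def population_std_dev(data):
    n = len(data)
    mean = sum(data) / n
    # Step 1: all the steps of the variance
    variance = sum((x - mean) ** 2 for x in data) / n
    # Step 2: take the square root of the variance
    return math.sqrt(variance)


orange_county = [20, 40, 50, 60, 80]
ontario = [20, 49, 50, 51, 80]

print(population_std_dev(orange_county))      # 20.0
print(round(population_std_dev(ontario), 2))  # 18.98
print(pstdev(ontario))                        # library cross-check
```

Unlike the variance, the standard deviation is expressed in the same units as the data (here, cappuccinos sold).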
Example:
The number of traffic citations issued last year, by month, in Beaufort County, South Carolina, is
reported below:
Month      Jan  Feb  Mar  Apr  May  June  July  Aug  Sep  Oct  Nov  Dec
Citations  19   18   22   18   28   34    45    39   38   44   34   10
Determine the population variance.
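A hand calculation for this example can be checked with the sketch below. It computes the population variance twice: once from the squared deviations, and once with the shortcut form (mean of the squares minus the square of the mean), which is the frequency formula given earlier with every frequency equal to 1. The variable names are illustrative.

```python
# Monthly traffic citations, January to December, from the table above.
citations = [19, 18, 22, 18, 28, 34, 45, 39, 38, 44, 34, 10]

n = len(citations)
mean = sum(citations) / n

# Definition: mean of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in citations) / n

# Shortcut: mean of the squares minus the square of the mean.
shortcut = sum(x * x for x in citations) / n - mean ** 2

print("n =", n)
print("mean =", round(mean, 2))          # 29.08
print("variance =", round(variance, 2))  # 122.08
print("shortcut =", round(shortcut, 2))  # 122.08
```

Because the twelve months are treated as the whole population, the divisor is n = 12 rather than n - 1.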
ASSIGNMENT
1. Calculate the variance and standard deviation of Shop A and Shop B from previous
example.
2. An analyst is considering two categories of company, X and Y, for possible investment.
One of her assistants has compiled the following information on the price-earnings ratio
of the share of companies in the two categories over the past year.
Price-Earnings Ratio   Number of Category X companies   Number of Category Y companies
4.95 – 8.95            3                                4
8.95 – 12.95           5                                8
12.95 – 16.95          7                                8
16.95 – 20.95          6                                3
20.95 – 24.95          3                                3
24.95 – 28.95          1                                4
Compute the standard deviations of these two distributions and comment.
The means of the two distributions are 15.59 and 15.62 respectively.
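For grouped data like this, one common approach is to let each class be represented by its midpoint and apply the frequency form of the variance formula. The sketch below assumes that midpoint convention; the function name grouped_mean_and_sd is illustrative. It can be used to check a hand calculation, and its means can be compared with the 15.59 and 15.62 quoted above.

```python
import math

# Class limits and frequencies from the table above.
classes = [(4.95, 8.95), (8.95, 12.95), (12.95, 16.95),
           (16.95, 20.95), (20.95, 24.95), (24.95, 28.95)]
freq_x = [3, 5, 7, 6, 3, 1]
freq_y = [4, 8, 8, 3, 3, 4]


def grouped_mean_and_sd(classes, freqs):
    """Mean and standard deviation of a grouped frequency distribution.

    Assumes every observation in a class sits at the class midpoint.
    """
    mids = [(low + high) / 2 for low, high in classes]
    total_f = sum(freqs)                                         # sum of f
    mean = sum(f * m for f, m in zip(freqs, mids)) / total_f     # sum(fx) / sum(f)
    mean_sq = sum(f * m * m for f, m in zip(freqs, mids)) / total_f
    variance = mean_sq - mean ** 2                               # sum(fx^2)/sum(f) - mean^2
    return mean, math.sqrt(variance)


mean_x, sd_x = grouped_mean_and_sd(classes, freq_x)
mean_y, sd_y = grouped_mean_and_sd(classes, freq_y)
print("Category X:", round(mean_x, 2), round(sd_x, 2))
print("Category Y:", round(mean_y, 2), round(sd_y, 2))
```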
3. Find the arithmetic mean for the following distribution, which shows the number of
employees absent per day.
No. of employees absent   No. of days (frequency)
2                         2
3                         4
4                         3
5                         4
6                         3
7                         3
8                         3
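For an ungrouped frequency distribution like this one, the mean is the sum of (value times frequency) divided by the sum of the frequencies. A minimal sketch for checking a hand calculation, with illustrative variable names:

```python
# (employees absent, number of days) pairs from the table above.
absences = [(2, 2), (3, 4), (4, 3), (5, 4), (6, 3), (7, 3), (8, 3)]

total_days = sum(f for _, f in absences)        # sum of f
total_absent = sum(x * f for x, f in absences)  # sum of f * x
mean_absent = total_absent / total_days

print(total_days, total_absent, round(mean_absent, 2))  # 22 111 5.05
```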
4. Compute the mean profit per vehicle from the following data of Applewood Auto
Group.
Profit                  Frequency
$200 up to $600         8
$600 up to $1,000       11
$1,000 up to $1,400     23
$1,400 up to $1,800     38
$1,800 up to $2,200     45
$2,200 up to $2,600     32
$2,600 up to $3,000     19
$3,000 up to $3,400     4
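Because every class here has the same width of $400, the midpoints can be generated from the first lower limit instead of being typed in. The sketch below assumes, as is usual for grouped data, that each class is represented by its midpoint; the variable names are illustrative.

```python
# Lower limit of the first class and the common class width, from the table above.
first_lower = 200
width = 400
frequencies = [8, 11, 23, 38, 45, 32, 19, 4]

# Midpoint of class i = lower limit of class i + half the class width.
midpoints = [first_lower + i * width + width / 2 for i in range(len(frequencies))]

total_f = sum(frequencies)
mean_profit = sum(f * m for f, m in zip(frequencies, midpoints)) / total_f

print(midpoints)
print(total_f, round(mean_profit, 2))  # 180 1851.11
```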
6. The enrollments of the 13 public universities in the state of Ohio are listed below:
College                          Enrollment
University of Akron              25,942
Bowling Green State University   18,989
Central State University         1,820
University of Cincinnati         36,415
Cleveland State University       15,664
Kent State University            34,056
Miami University                 17,161
Ohio State University            59,091
Shawnee State University         4,300
University of Toledo             20,775
Wright State University          18,786
Youngstown State University      14,682