Lecture 2 PDF

The document covers key concepts in probability and statistics, focusing on measures of central tendency, including mean, median, and mode, as well as measures of dispersion like variance, standard deviation, and coefficient of variation. It provides formulas for calculating these measures for both ungrouped and grouped data, along with examples for better understanding. Additionally, it discusses the importance of these measures in analyzing data distributions.

Uploaded by

mehcav
MAT3026 - Probability and Statistics

Today's topics are:


• Measures of Location

• Measures of Variability

Lecture 2 1
MEASURES OF CENTRAL TENDENCY (Measures of Location)

Central tendency refers to the location of a distribution. The most important
measures of central tendency are:

• the mean
• the median
• the mode.

We will be measuring these for samples drawn from populations, as well as for
grouped and ungrouped data.

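The arithmetic mean or average of a population is represented by μ (the Greek
letter mu); and for a sample, by \(\bar{X}\) (read “X bar”).

For ungrouped data, \(\bar{X}\) is calculated by the following formula:

\[ \bar{X} = \frac{\sum X}{n} \]

where \(\sum X\) is the sum of all the observations and n is the number of observations.

And for grouped data, \(\bar{X}\) is calculated by

\[ \bar{X} = \frac{\sum fX}{n} \]

where f is the frequency of each class, X is the class mark (class midpoint) and \(n = \sum f\).
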
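The median \(\tilde{X}\) (read “X tilde”) for ungrouped data is the value of the middle item
when all the items are arranged in either ascending or descending order in terms of
values:

\[ \tilde{X} = \text{the } \left(\frac{n+1}{2}\right)\text{th item in the ordered data} \]

where n refers to the number of items in the sample. When n is even, (n + 1)/2 falls
halfway between two positions, and the median is the average of the two middle items.

The median for grouped data is given by the formula:

\[ \tilde{X} = L + \frac{n/2 - F}{f_m}\, c \]

where
\(L\) = the lower class boundary of the class containing the median
\(n\) = the total number of data
\(F\) = the cumulative frequency of the classes before the median class
\(f_m\) = the frequency of the median class
\(c\) = class interval size
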
The mode \(\hat{X}\) (read “X hat”) for ungrouped data is the value that occurs most
frequently in the data set.

The mode for grouped data is calculated by the following formula:

\[ \hat{X} = L + \frac{d_1}{d_1 + d_2}\, c \]

where
\(L\) = lower class boundary of the modal class
\(d_1\) = frequency of the modal class minus the frequency of the previous class
\(d_2\) = frequency of the modal class minus the frequency of the following class
\(c\) = class interval size

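EXAMPLE: Find the mean, median and mode of the given set of numbers:

5, 4, 6, 8, 7, 2, 9, 4, 12

The mean is calculated by the following formula:

\[ \bar{X} = \frac{\sum X}{n} = \frac{5 + 4 + 6 + 8 + 7 + 2 + 9 + 4 + 12}{9} = \frac{57}{9} \approx 6.33 \]

The median for ungrouped data is the value of the middle item when all the items
are arranged in order. We arrange the data in ascending order:

2, 4, 4, 5, 6, 7, 8, 9, 12

With n = 9, the median is the (9 + 1)/2 = 5th item. Therefore the median is \(\tilde{X} = 6\).

The mode for ungrouped data is the value that occurs most frequently in the data
set. The value 4 occurs twice and every other value occurs once, so the mode is \(\hat{X} = 4\).
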
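The three definitions for ungrouped data can be checked with a few lines of Python. This is only an illustrative sketch: the function names `mean`, `median` and `modes` are chosen here and do not come from the lecture, and the median function assumes the usual convention of averaging the two middle items when n is even.

```python
from collections import Counter

def mean(values):
    """Arithmetic mean: sum of the items divided by their count."""
    return sum(values) / len(values)

def median(values):
    """Middle item of the sorted data; average of the two middle items if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """All values that share the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [5, 4, 6, 8, 7, 2, 9, 4, 12]
print(round(mean(data), 2), median(data), modes(data))  # 6.33 6 [4]
```

Returning a list from `modes` is a deliberate choice: a data set can have more than one most frequent value.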
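EXAMPLE: Find the mean, median and mode of the given set of numbers:

12, 5, 3, 6, 4, 9, 7, 6, 9, 11

The mean is calculated by the following formula:

\[ \bar{X} = \frac{\sum X}{n} = \frac{12 + 5 + 3 + 6 + 4 + 9 + 7 + 6 + 9 + 11}{10} = \frac{72}{10} = 7.2 \]

The median for ungrouped data is the value of the middle item when all the items
are arranged in order. We arrange the data in ascending order:

3, 4, 5, 6, 6, 7, 9, 9, 11, 12

With n = 10, the position (10 + 1)/2 = 5.5 falls between the 5th and 6th items, so the
median is their average. Therefore the median is \(\tilde{X} = (6 + 7)/2 = 6.5\).

The mode for ungrouped data is the value that occurs most frequently in the data
set. The values 6 and 9 each occur twice, so the data set has two modes: \(\hat{X} = 6\) and \(\hat{X} = 9\).
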
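For comparison, the same quantities can be obtained from Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The variable names are illustrative only; the point of the sketch is that an even-sized, bimodal data set is handled without special cases.

```python
import statistics

data = [12, 5, 3, 6, 4, 9, 7, 6, 9, 11]

sample_mean = statistics.mean(data)        # sum of the items divided by n
sample_median = statistics.median(data)    # averages the two middle items when n is even
sample_modes = statistics.multimode(data)  # every value tied for the highest frequency

print(sample_mean, sample_median, sample_modes)
```

Here `multimode` returns both 6 and 9, whereas `statistics.mode` would report only one of them.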
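EXAMPLE: The following table shows the distribution of the marks of a course.
A) Determine the mean, mode and median of the marks.
B) Determine the number of students whose marks are
I) less than 57.
II) between 58 and 72.

| Marks | No. of students (f) | Class Boundary | Class Mark (X) | fX |
|---|---|---|---|---|
| 30 – 39 | 2 | 29.5 – 39.5 | 34.5 | 69 |
| 40 – 49 | 10 | 39.5 – 49.5 | 44.5 | 445 |
| 50 – 59 | 9 | 49.5 – 59.5 | 54.5 | 490.5 |
| 60 – 69 | 22 | 59.5 – 69.5 | 64.5 | 1419 |
| 70 – 79 | 14 | 69.5 – 79.5 | 74.5 | 1043 |
| 80 – 89 | 32 | 79.5 – 89.5 | 84.5 | 2704 |
| 90 – 99 | 3 | 89.5 – 99.5 | 94.5 | 283.5 |
| Total | 92 | | | 6454 |

Median class: 70 – 79 (the cumulative frequency is 43 before this class and 57 after it, so it contains the position n/2 = 46).
Modal class: 80 – 89 (the class with the highest frequency, f = 32).
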
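Part A can be worked through numerically from the class boundaries and frequencies of the marks table. The sketch below assumes the standard grouped-data formulas (mean as the frequency-weighted average of the class marks; median and mode by interpolation inside the median and modal classes); the variable names are hypothetical.

```python
# Lower class boundaries and frequencies copied from the marks table.
lower = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freq = [2, 10, 9, 22, 14, 32, 3]
width = 10.0
marks = [lo + width / 2 for lo in lower]   # class marks: 34.5, 44.5, ...
n = sum(freq)                              # 92 students

# Grouped mean: sum of f * X divided by n.
grouped_mean = sum(f * x for f, x in zip(freq, marks)) / n

# Median class: first class whose cumulative frequency reaches n / 2.
cum = 0
for i, f in enumerate(freq):
    if cum + f >= n / 2:
        grouped_median = lower[i] + (n / 2 - cum) / f * width
        break
    cum += f

# Modal class: class with the highest frequency
# (a missing neighbour at either end is treated as frequency 0).
m = freq.index(max(freq))
d1 = freq[m] - (freq[m - 1] if m > 0 else 0)
d2 = freq[m] - (freq[m + 1] if m < len(freq) - 1 else 0)
grouped_mode = lower[m] + d1 / (d1 + d2) * width

print(round(grouped_mean, 2), round(grouped_median, 2), round(grouped_mode, 2))
# 70.15 71.64 83.33
```

The loop selects 70 – 79 as the median class and `freq.index(max(freq))` selects 80 – 89 as the modal class.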
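I) Less than 57:

Class boundaries: 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5
Frequencies: 2, 10, 9, 22, 14, 32, 3

The value 57 lies in the class 49.5 – 59.5, so the required number is 2 + 10 + X, where X is
the number of students in that class with marks below 57. Assuming the marks are spread
evenly within the class:

59.5 – 49.5 = 10 marks correspond to 9 students
57 – 49.5 = 7.5 marks correspond to X students

so X = 9 × 7.5 / 10 = 6.75, and 2 + 10 + 6.75 = 18.75 ≈ 19.

19 students have got less than 57.

II) Between 58 and 72:

The value 58 lies in the class 49.5 – 59.5 and the value 72 lies in the class 69.5 – 79.5, so
the required number is X + 22 + Y, where X is the number of students in the first of these
classes with marks above 58 and Y is the number of students in the second with marks below 72.

59.5 – 49.5 = 10 marks correspond to 9 students
59.5 – 58 = 1.5 marks correspond to X students, so X = 9 × 1.5 / 10 = 1.35

79.5 – 69.5 = 10 marks correspond to 14 students
72 – 69.5 = 2.5 marks correspond to Y students, so Y = 14 × 2.5 / 10 = 3.5

Then 1.35 + 22 + 3.5 = 26.85 ≈ 27.

27 students have got between 58 and 72.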
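The proportional reasoning used in part B can be written as one small helper. This is a sketch under the same assumption as the worked solution, namely that the marks are spread evenly inside each class; `count_below` is a name invented here for illustration.

```python
def count_below(value, boundaries, freq):
    """Estimated number of observations below `value`, assuming the
    observations are spread evenly within each class."""
    total = 0.0
    for lo, hi, f in zip(boundaries, boundaries[1:], freq):
        if value >= hi:
            total += f                             # the whole class lies below
        elif value > lo:
            total += f * (value - lo) / (hi - lo)  # proportional share of the class
    return total

boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
freq = [2, 10, 9, 22, 14, 32, 3]

below_57 = count_below(57, boundaries, freq)
between_58_72 = count_below(72, boundaries, freq) - count_below(58, boundaries, freq)
print(round(below_57), round(between_58_72))
```

The estimates are about 18.75 and 26.85, which round to the 19 and 27 students quoted in the solution.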
After arranging the data in ascending order:

• The median divides the data into two equal parts.
• Quartiles divide the data into four equal parts.
• Deciles divide the data into ten equal parts.
• Percentiles divide the data into one hundred equal parts.

The 2nd quartile, the 5th decile and the 50th percentile correspond to the median.

The 25th and 75th percentiles correspond to the 1st and 3rd quartiles, respectively.

EXAMPLE: A sample of 25 workers in a plant receive the following hourly wages (in dollars):

3.65 3.78 3.85 3.95 4.00
4.10 4.25 3.50 3.85 3.96
3.60 3.90 4.28 3.75 3.95
4.05 4.08 4.15 3.80 4.05
3.88 3.95 4.06 4.18 4.05

a) Construct a frequency distribution table having a $0.10 class interval size.
b) Find the 1st and 2nd quartiles, the 3rd decile and the 60th percentile for both ungrouped
and grouped data.

Highest wage = 4.28
Lowest wage = 3.50
Range = 4.28 – 3.50 = 0.78
Class interval size (c) = 0.10
Number of class intervals = 0.78 / 0.10 = 7.8 ≈ 8

| Class Interval | Frequency | Class Boundary |
|---|---|---|
| 3.50 – 3.59 | 1 | 3.495 – 3.595 |
| 3.60 – 3.69 | 2 | 3.595 – 3.695 |
| 3.70 – 3.79 | 2 | 3.695 – 3.795 |
| 3.80 – 3.89 | 4 | 3.795 – 3.895 |
| 3.90 – 3.99 | 5 | 3.895 – 3.995 |
| 4.00 – 4.09 | 6 | 3.995 – 4.095 |
| 4.10 – 4.19 | 3 | 4.095 – 4.195 |
| 4.20 – 4.29 | 2 | 4.195 – 4.295 |
| Total | n = 25 | |

For ungrouped data, we arrange the data in ascending order:
3.50 3.60 3.65 3.75 3.78 3.80 3.85 3.85 3.88 3.90
3.95 3.95 3.95 3.96 4.00 4.05 4.05 4.05 4.06 4.08
4.10 4.15 4.18 4.25 4.28

• 1st quartile: one quarter of the observations (n/4) lie to its left and three quarters (3n/4) to its right.
• 2nd quartile: two quarters of the observations (2n/4) lie to its left and two quarters (2n/4) to its right.
• 3rd decile: three tenths of the observations (3n/10) lie to its left and seven tenths (7n/10) to its right.
• 60th percentile: 60 hundredths of the observations (60n/100) lie to its left and 40 hundredths (40n/100) to its right.

For grouped data:

| Class Boundary | Frequency |
|---|---|
| 3.495 – 3.595 | 1 |
| 3.595 – 3.695 | 2 |
| 3.695 – 3.795 | 2 |
| 3.795 – 3.895 | 4 |
| 3.895 – 3.995 | 5 |
| 3.995 – 4.095 | 6 |
| 4.095 – 4.195 | 3 |
| 4.195 – 4.295 | 2 |
| Total | 25 |

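One common way to get quartiles, deciles and percentiles from grouped data is to reuse the grouped-median interpolation, replacing n/2 by the target cumulative count (n/4 for the 1st quartile, 3n/10 for the 3rd decile, 60n/100 for the 60th percentile). The sketch below assumes that convention; the slides may use a different one, and `grouped_quantile` is a hypothetical helper name.

```python
def grouped_quantile(p, lower, freq, width):
    """Value below which a fraction p of the data lies, found by linear
    interpolation inside the class that contains the target count p * n."""
    n = sum(freq)
    target = p * n
    cum = 0                      # cumulative frequency before the current class
    for lo, f in zip(lower, freq):
        if cum + f >= target:
            return lo + (target - cum) / f * width
        cum += f
    return lower[-1] + width     # fallback: upper boundary of the last class

# Lower class boundaries and frequencies from the wage table.
lower = [3.495, 3.595, 3.695, 3.795, 3.895, 3.995, 4.095, 4.195]
freq = [1, 2, 2, 4, 5, 6, 3, 2]

q1 = grouped_quantile(0.25, lower, freq, 0.10)   # 1st quartile
q2 = grouped_quantile(0.50, lower, freq, 0.10)   # 2nd quartile (median)
d3 = grouped_quantile(0.30, lower, freq, 0.10)   # 3rd decile
p60 = grouped_quantile(0.60, lower, freq, 0.10)  # 60th percentile
print(q1, q2, d3, p60)
```

Under this convention the grouped estimates come out at approximately 3.826, 3.965, 3.858 and 4.012 dollars.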
MEASURES OF DISPERSION (Measures of Variability)

Dispersion refers to the variability or spread in the data. The most important
measures of dispersion are:
• the variance
• the standard deviation
• the coefficient of variation.
We will measure these for grouped and ungrouped data.

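The variance of a population is represented by \(\sigma^2\) (the Greek letter sigma, squared);
and for a sample, by \(s^2\).

For ungrouped data, \(s^2\) is calculated by the following formula:

\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]

And for grouped data, \(s^2\) is calculated by

\[ s^2 = \frac{\sum f (X - \bar{X})^2}{n - 1} \]

where f is the frequency of each class, X is the class mark and \(n = \sum f\).

NOTE: The quantity n − 1 is often called the degrees of freedom associated with
the variance estimate.
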
The standard deviation of a population is represented by \(\sigma\); and for a sample, by s.
They are the positive square roots of their respective variances.

For ungrouped data, s is calculated by the following formula:

\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]

And for grouped data, s is calculated by

\[ s = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}} \]

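The coefficient of variation (CV) is defined as the ratio of the standard deviation
to the mean:

\[ CV = \frac{s}{\bar{X}} \quad \text{for a sample, or} \quad CV = \frac{\sigma}{\mu} \quad \text{for a population.} \]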

The standard deviations of two variables, while both measure dispersion in their
respective variables, cannot be compared to each other in a meaningful way to
determine which variable has greater dispersion because they may vary greatly in
their units and the means about which they occur. The standard deviation and
mean of a variable are expressed in the same units, so taking the ratio of these two
allows the units to cancel. This ratio can then be compared to other such ratios in a
meaningful way: between two variables, the variable with the smaller CV is less
dispersed than the variable with the larger CV.

EXAMPLE: An engineer is interested in testing the “bias” in a pH meter. Data are
collected on the meter by measuring the pH of a neutral substance (pH = 7.0).
A sample of size 10 is taken, with results given by
7.07 7.00 7.10 6.97 7.00 7.03 7.01 7.01 6.98 7.08
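
Assuming the example asks for the sample mean, variance and standard deviation of these ten readings, they can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n − 1. The variable names are illustrative.

```python
import statistics

readings = [7.07, 7.00, 7.10, 6.97, 7.00, 7.03, 7.01, 7.01, 6.98, 7.08]

sample_mean = statistics.mean(readings)
sample_var = statistics.variance(readings)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(readings)      # positive square root of the sample variance

print(round(sample_mean, 3), round(sample_var, 6), round(sample_sd, 3))
```

The sample mean is about 7.025, with a sample variance of about 0.00194 and a standard deviation of about 0.044, so the readings sit roughly 0.025 pH units above the neutral value of 7.0 on average.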

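EXAMPLE: Find an estimate of the variance and standard deviation of the
following data for the marks obtained in a test (out of 50) by 88 students.

| Marks | No. of students (f) | Class mark (X) | fX | X − X̄ | (X − X̄)² | f(X − X̄)² |
|---|---|---|---|---|---|---|
| 0 – 10 | 6 | 5 | 30 | −23.5 | 552.25 | 3313.5 |
| 10 – 20 | 16 | 15 | 240 | −13.5 | 182.25 | 2916 |
| 20 – 30 | 24 | 25 | 600 | −3.5 | 12.25 | 294 |
| 30 – 40 | 25 | 35 | 875 | 6.5 | 42.25 | 1056.25 |
| 40 – 50 | 17 | 45 | 765 | 16.5 | 272.25 | 4628.25 |
| Total | 88 | | 2510 | | | 12208 |

The deviations in the table are taken about the mean \(\bar{X} = 2510 / 88 \approx 28.5\).
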
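The totals of the marks table can be carried through to the variance and standard deviation with the grouped-data formula, dividing the sum of f(X − X̄)² by n − 1. The sketch below keeps the mean unrounded, so its intermediate sum differs slightly from the 12208 obtained with the rounded mean of 28.5; the variable names are illustrative.

```python
import math

marks = [5, 15, 25, 35, 45]   # class marks X
freq = [6, 16, 24, 25, 17]    # frequencies f
n = sum(freq)                 # 88 students

mean = sum(f * x for f, x in zip(freq, marks)) / n           # 2510 / 88
ss = sum(f * (x - mean) ** 2 for f, x in zip(freq, marks))   # sum of f (X - mean)^2
variance = ss / (n - 1)       # sample variance with n - 1 degrees of freedom
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))  # 28.52 140.32 11.85
```

Using the table total directly gives 12208 / 87 ≈ 140.32 and a standard deviation of about 11.85, which agrees to two decimal places.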
EXAMPLE: A company has two sections, A and B, with 40 and 65 employees respectively.
Their average weekly wages are $550 and $350, and the standard deviations are $10
and $9. Which section has larger variability in wages?

Coefficient of variation for Section A = 10/550 = 0.0182

Coefficient of variation for Section B = 9/350 = 0.0257

Section B has larger variability in the wages, since the CV of section B is greater than
the CV of section A.
