Statistics and Probability
STATISTICS
is an art and science that deals with the collection, organization, creative
presentation, analysis, and interpretation of quantitative data.
FIELDS OF STATISTICS
DESCRIPTIVE STATISTICS
is concerned with the methods of collecting, organizing, and presenting
data appropriately and creatively to describe or assess group
characteristics.
INFERENTIAL STATISTICS
is concerned with inferring or drawing conclusions about the population
based on preselected elements (a sample) of that population.
JOHN PAUL D. GUNNAWA
[Diagram: forms in which DATA/INFORMATION can be presented: textual, tabular, map graph/cartograph, scatter point diagram, pie/picture graph]
FORMS OF PRESENTATION OF DATA
A. TEXTUAL
This form of presentation combines text and numerical facts in a statistical report.
B. TABULAR
This form of presentation is better than the textual form because it provides numerical
facts in a more concise and systematic manner.
Advantages of Tabular Presentation
1. It is brief; it reduces the matter to the minimum.
2. It gives the reader a good grasp of the meaning of the quantitative relationship
indicated in the report.
3. The columns and rows make comparison easier.
C. GRAPHICAL PRESENTATION
This form is the most effective means of organizing and presenting statistical data
because the important relationships are brought out more clearly and creatively in
virtually solid and colorful figures.
Consider the data below, which show the scores of 60 students in a statistics
test.
5 13 8 6 13 10 5 13 15 16
8 12 15 10 12 16 12 9 3 7
11 15 11 7 15 2 13 5 9 12
13 9 12 9 9 14 12 11 19 13
16 18 3 13 18 10 15 14 18 11
10 12 6 9 5 17 9 6 9 18
4. CLASS MARK
is the midpoint of the lower and upper class limits.
C.I        CLASS MARK (X)
16 - 20    18
21 - 25    23
26 - 30    28
The construction of this distribution is a very simple activity that requires the
following steps.
1. Get the value of the range. The range, denoted by R, refers to the difference
between the highest (H) and the lowest (L) value in the distribution.
R = H – L
2. The number of classes can be approximated by using the relationship
k = 1 + 3.3 log n
Where: k is the number of classes
n is the sample size
3. Determine the size of the class interval. The value of c can be obtained by
dividing the range by the desired number of classes.
c = R / k
46 70 49 45 75 81 33 65 38 59
94 59 62 36 58 69 45 55 58 65
30 49 73 29 41 53 37 35 61 48
22 51 56 55 60 37 56 59 57 36
12 36 50 63 68 30 56 70 53 28
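The three steps above can be sketched in a few lines of Python. This is only an illustration: it assumes the 50 values listed above are the data set for this example, that "log" means the base-10 logarithm, and that k is rounded to the nearest whole number while c is rounded up. The text does not state any of these conventions.

```python
import math

# The 50 values listed above (assumed to be the data for this example).
data = [
    46, 70, 49, 45, 75, 81, 33, 65, 38, 59,
    94, 59, 62, 36, 58, 69, 45, 55, 58, 65,
    30, 49, 73, 29, 41, 53, 37, 35, 61, 48,
    22, 51, 56, 55, 60, 37, 56, 59, 57, 36,
    12, 36, 50, 63, 68, 30, 56, 70, 53, 28,
]

n = len(data)                       # sample size
R = max(data) - min(data)           # step 1: range R = H - L
k_exact = 1 + 3.3 * math.log10(n)   # step 2: k = 1 + 3.3 log n (base 10 assumed)
k = round(k_exact)                  # rounding to the nearest whole number is an assumption
c = math.ceil(R / k)                # step 3: c = R / k, rounded up (assumption)

print(f"n = {n}, R = {R}, k = {k_exact:.2f} (use {k}), c = {c}")
```

With these conventions the sketch gives a range of 82, about 7 classes, and a class size of 12.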
Steps 5 and 6. Determine the classes and the frequency of each class.
Classes      f     x        Class boundaries
83 - 86      2     84.5     82.5 – 86.5
87 - 90      5     88.5     86.5 – 90.5
91 - 94      8     92.5     90.5 – 94.5
95 - 98      11    96.5     94.5 – 98.5
99 - 102     15    100.5    98.5 – 102.5
103 - 106    26    104.5    102.5 – 106.5
107 - 110    15    108.5    106.5 – 110.5
111 - 114    9     112.5    110.5 – 114.5
115 - 118    5     116.5    114.5 – 118.5
119 - 122    4     120.5    118.5 – 122.5
             n = 100
Derived Frequency Distribution
From a frequency distribution, we can construct other frequency distributions like the relative frequency
distribution and the cumulative frequency distribution.
Relative Frequency Distribution
For a given set of data, it shows the proportion, in percent, of the frequency of each
class to the total frequency. It is denoted by %f.
%f = (f / n) x 100
where %f – the relative frequency for each class interval
f – the frequency of each class
n – the sample size
The relative frequency of the first interval (11 - 22) can be obtained as follows:
%f = (3 / 60) x 100 = 5%
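As a rough sketch, both derived distributions can be computed at once. The class frequencies used here are those of the 60-student examination distribution (classes 11 - 22 through 83 - 94) that this document uses in its grouped-data examples; the variable names are illustrative only.

```python
from itertools import accumulate

classes = ["11 - 22", "23 - 34", "35 - 46", "47 - 58", "59 - 70", "71 - 82", "83 - 94"]
freqs = [3, 5, 11, 19, 14, 6, 2]

n = sum(freqs)                             # total frequency (sample size)
rel_freqs = [f * 100 / n for f in freqs]   # %f = (f / n) x 100
cum_freqs = list(accumulate(freqs))        # less than cumulative frequency (<cumf)

for cls, f, rf, cf in zip(classes, freqs, rel_freqs, cum_freqs):
    print(f"{cls:>8}  f = {f:>2}  %f = {rf:6.2f}  <cumf = {cf:>2}")
```

The first row reproduces the 5% computed above, and the last cumulative frequency equals n.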
Example: Suppose we are interested in computing the weighted mean grade of the
student in our previous example, as shown below.
Student    No. of units (w)    Grade (x)
1          3                   2.0
2          3                   3.0
3          5                   1.25
4          1                   3.0
5          2                   2.5
6          3                   2.5
           ∑w = 17             ∑wx = 36.75
x̄ = ∑wx / ∑w
= 36.75 / 17
x̄ = 2.16
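The same weighted mean can be checked with a short Python sketch. The list names `units` and `grades` are illustrative; the values are taken from the table above.

```python
units = [3, 3, 5, 1, 2, 3]                  # weights w (number of units)
grades = [2.0, 3.0, 1.25, 3.0, 2.5, 2.5]    # values x (grades)

sum_w = sum(units)                                    # sum of the weights
sum_wx = sum(w * x for w, x in zip(units, grades))    # sum of the products wx
weighted_mean = sum_wx / sum_w                        # weighted mean = sum(wx) / sum(w)

print(sum_w, sum_wx, round(weighted_mean, 2))
```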
MEAN FOR GROUPED DATA
To compute the value of the mean of data presented in a frequency
distribution, we shall consider two methods:
1. Midpoint method
2. Unit deviation method
The formula for the midpoint method is:
x̄ = ∑fx / n
Where: f – represents the frequency of each class
x – the midpoint of each class
n – the total number of frequencies or sample size
Steps: 1. Get the midpoint of each class.
2. Multiply each midpoint by its corresponding frequency.
3. Get the sum of the products in step 2.
4. Divide the sum obtained in step 3 by the total number of frequencies. The
result shall be rounded off to two decimal places.
Example: Consider the frequency distribution of the examination scores of the sixty
students in a statistics class. (MIDPOINT METHOD)
Solution: To compute the value of the mean, follow the steps below.
Step 1. Get the midpoint of each class. The midpoints are shown in the third
column.
Classes    f     x
11 - 22    3     16.5
23 - 34    5     28.5
35 - 46    11    40.5
47 - 58    19    52.5
59 - 70    14    64.5
71 - 82    6     76.5
83 - 94    2     88.5
Step 2. Multiply each midpoint by its corresponding frequency. The products are
shown in the fourth column.
Classes    f     x       fx
11 - 22    3     16.5    49.5
23 - 34    5     28.5    142.5
35 - 46    11    40.5    445.5
47 - 58    19    52.5    997.5
59 - 70    14    64.5    903
71 - 82    6     76.5    459
83 - 94    2     88.5    177
           n = 60        ∑fx = 3,174
Steps 3 and 4. Dividing the sum of the products by the total number of frequencies gives
x̄ = ∑fx / n = 3,174 / 60 = 52.90
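A minimal Python sketch of the midpoint method for this distribution follows. It takes the class limits and frequencies from the table above and assumes each midpoint is the average of the lower and upper class limits.

```python
class_limits = [(11, 22), (23, 34), (35, 46), (47, 58), (59, 70), (71, 82), (83, 94)]
freqs = [3, 5, 11, 19, 14, 6, 2]

# Step 1: midpoint of each class
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# Step 2: multiply each midpoint by its frequency
fx = [f * x for f, x in zip(freqs, midpoints)]

# Step 3: sum of the products; Step 4: divide by the total frequency
n = sum(freqs)
sum_fx = sum(fx)
mean = round(sum_fx / n, 2)

print(n, sum_fx, mean)
```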
n = 75        ∑fx = 3,496.5
x̄ = ∑fx / n
= 3,496.5 / 75
x̄ = 46.62
UNIT DEVIATION METHOD
The formula is: x̄ = xₐ + (∑fd / n) c
Where: xₐ – represents the assumed mean
f – the frequency of each class
d – the unit deviation
c – the size of the class interval
n – the sample size
Follow these steps:
1. Choose an assumed mean by getting the midpoint of any interval.
2. Construct the unit deviation column.
3. Multiply the frequencies by their corresponding unit deviations. Add the
products.
4. Divide the sum in step 3 by the sample size.
5. Multiply the result in step 4 by the size of the class interval.
6. Add the value obtained in step 5 to the assumed mean. The obtained result,
which is the mean, should be rounded off to two decimal places.
Example 1. Compute the value of the mean of the data using the unit deviation
method.
Solution:
Step 1. Choose an assumed mean. Here the assumed mean is xₐ = 52.5, the midpoint of the class 47 - 58.
Classes    f
11 - 22    3
23 - 34    5
35 - 46    11
47 - 58    19
59 - 70    14
71 - 82    6
83 - 94    2
           n = 60
Step 2. Construct the unit deviation column.
Classes f d
11 - 22 3 -3
23 - 34 5 -2
35 - 46 11 -1
47 - 58 19 0
59 - 70 14 1
71 - 82 6 2
83 - 94 2 3
Step 3. Multiply the frequencies by their corresponding unit deviation. Add the products.
Classes f d fd
11 - 22 3 -3 -9
23 - 34 5 -2 -10
35 - 46 11 -1 -11
47 - 58 19 0 0
59 - 70 14 1 14
71 - 82 6 2 12
83 - 94 2 3 6
∑fd = 2
Steps 4, 5 and 6.
x̄ = xₐ + (∑fd / n) c
= 52.5 + (2 / 60)(12)
= 52.5 + 24 / 60
= 52.5 + 0.4
x̄ = 52.9
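The unit deviation method can be sketched as a small Python function. The function name and its `assumed_index` parameter (the position of the class chosen for the assumed mean) are illustrative, and the sketch assumes equal class sizes so that the unit deviations are consecutive integers.

```python
def unit_deviation_mean(freqs, midpoints, c, assumed_index):
    """Mean of grouped data: x_a + (sum(fd) / n) * c."""
    x_a = midpoints[assumed_index]                        # step 1: assumed mean
    d = [i - assumed_index for i in range(len(freqs))]    # step 2: unit deviations
    sum_fd = sum(f * di for f, di in zip(freqs, d))       # step 3: sum of f*d
    n = sum(freqs)
    return x_a + (sum_fd / n) * c                         # steps 4 to 6

freqs = [3, 5, 11, 19, 14, 6, 2]
midpoints = [16.5, 28.5, 40.5, 52.5, 64.5, 76.5, 88.5]

# Assumed mean taken from the class 47 - 58 (index 3), as in the example.
mean = round(unit_deviation_mean(freqs, midpoints, c=12, assumed_index=3), 2)
print(mean)
```

Choosing a different class for the assumed mean changes d and ∑fd but not the final mean, which is why any interval may be used in step 1.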
MEDIAN
is a positional measure defined as the middlemost value in the
distribution.
MEDIAN FOR UNGROUPED DATA
It is always a must that the values be arranged in terms of
magnitude, either from lowest to highest or vice versa.
Let x̃ be the median.
x̃ = x_((n + 1)/2), if n is odd
x̃ = [x_(n/2) + x_(n/2 + 1)] / 2, if n is even
Example 1. Find the median of the following values.
21, 10, 36, 42, 39, 52, 30, 25, 26
Solution: Before identifying the value of the median, it is
necessary that the values be arranged in terms of magnitude.
10, 21, 25, 26, 30, 36, 39, 42, 52
Since n = 9 and is odd,
x̃ = x_((n + 1)/2)
= x_((9 + 1)/2)
= x₅ (refers to the fifth value)
x̃ = 30
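A short Python sketch of the same rule, covering both the odd and the even case. The function name `median_ungrouped` is illustrative, and positions are converted to Python's 0-based indexing.

```python
def median_ungrouped(values):
    """Middlemost value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)          # arrange the values by magnitude first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((n + 1)/2)-th value, 0-based index n // 2
    # n even: average of the (n/2)-th and (n/2 + 1)-th values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_ungrouped([21, 10, 36, 42, 39, 52, 30, 25, 26]))
```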
Where:
x₁ – refers to the lower boundary of the median class
cumfₐ – the cumulative frequency before the median class
f – the frequency of the median class
To be able to apply the formula, we shall follow the steps below.
1. Get ½ of the total number of values.
2. Determine the value of cumf.
3. Determine the median class.
4. Determine the lower boundary and the frequency of the median class and the size of the class
interval.
5. Substitute the values obtained in steps 1-4. Round off the final result to two decimal places.
n = 200
Determine the median of the monthly income of the 200 respondents.
Solution: By using the same procedure.
Classes            f     <cumf
3,500 – 4,999      6     6
5,000 – 6,499      23    29
6,500 – 7,999      36    65
8,000 – 9,499      40    105    median class (first <cumf to reach n/2 = 100)
9,500 – 10,999     59    164
11,000 – 12,499    20    184
12,500 – 13,999    8     192
14,000 – 15,499    6     198
15,500 – 16,999    2     200
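The median formula itself did not survive in the text above, so the sketch below assumes the usual interpolation form x̃ = x₁ + ((n/2 – cumfₐ) / f) c, which matches the symbols defined earlier. It also assumes that the lower boundary is the lower class limit minus 0.5 and that the median class is the first class whose cumulative frequency reaches n/2. The function and parameter names are illustrative.

```python
def grouped_median(lower_limits, freqs, c):
    """Median of grouped data by interpolation within the median class."""
    n = sum(freqs)
    half = n / 2                              # step 1: half of the total number of values
    cum_before = 0                            # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= half:            # step 3: this is the median class
            lower_boundary = lower - 0.5      # assumption: boundary = limit - 0.5
            return lower_boundary + (half - cum_before) / f * c
        cum_before += f
    raise ValueError("frequency distribution is empty")

lower_limits = [3500, 5000, 6500, 8000, 9500, 11000, 12500, 14000, 15500]
freqs = [6, 23, 36, 40, 59, 20, 8, 6, 2]

median_income = grouped_median(lower_limits, freqs, c=1500)
print(round(median_income, 2))
```

For this table n/2 = 100, the first cumulative frequency to reach it is 105 (class 8,000 – 9,499), and under these assumptions the sketch returns 9,312.00.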
Steps:
[Diagram: nested spans over the ordered distribution marking the first 1/4, the first 1/2, and the first 3/4 of the values]
Example 1. For purposes of illustration, let us again reproduce the less than cumulative
frequency distribution of the examination results of the 60 students and compute
the values of the first quartile and the third quartile.
Solution: The frequency distribution is reproduced below.
Classes    f     <cumf
11 - 22    3     3
23 - 34    5     8
35 - 46    11    19
47 - 58    19    38
59 - 70    14    52
71 - 82    6     58
83 - 94    2     60
To compute the value of Q₁, we shall follow the procedure used in computing the
value of the median.
Example: Determine the value of the 43rd percentile using the same frequency
distribution used for the median, quartiles, and deciles.
Solution: The frequency distribution is reproduced below.
Classes    f     <cumf
11 - 22    3     3
23 - 34    5     8
35 - 46    11    19
47 - 58    19    38    (class containing the 43rd percentile; f = 19)
59 - 70    14    52
71 - 82    6     58
83 - 94    2     60
           n = 60
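Quartiles and percentiles follow the median procedure with n/2 replaced by the required fraction of n. The sketch below is a generalization under that assumption; the function name, the `fraction` parameter, and the "limit minus 0.5" rule for the lower boundary are illustrative choices, not taken from the text.

```python
def grouped_quantile(lower_limits, freqs, c, fraction):
    """Value below which `fraction` of the grouped data falls (0 < fraction < 1)."""
    n = sum(freqs)
    target = fraction * n                     # e.g. n/4 for Q1, 43n/100 for P43
    cum_before = 0                            # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= target:          # class containing the target position
            lower_boundary = lower - 0.5      # assumption: boundary = limit - 0.5
            return lower_boundary + (target - cum_before) / f * c
        cum_before += f
    raise ValueError("frequency distribution is empty")

lower_limits = [11, 23, 35, 47, 59, 71, 83]
freqs = [3, 5, 11, 19, 14, 6, 2]

q1 = round(grouped_quantile(lower_limits, freqs, 12, 0.25), 2)    # first quartile
q3 = round(grouped_quantile(lower_limits, freqs, 12, 0.75), 2)    # third quartile
p43 = round(grouped_quantile(lower_limits, freqs, 12, 0.43), 2)   # 43rd percentile
print(q1, q3, p43)
```

Under these assumptions the first quartile falls in the class 35 - 46, the third quartile in 59 - 70, and the 43rd percentile in 47 - 58.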
AVERAGE DEVIATION
refers to the arithmetic mean of the absolute deviations of the values from the
mean of the distribution.
AVERAGE DEVIATION FOR UNGROUPED DATA
AD = ∑│x – x̄│ / n
We shall follow the steps below:
1. Arrange the values in a column according to magnitude.
2. Compute the value of the mean (x̄).
3. Determine the deviations (x – x̄).
4. Convert the deviations in step 3 into positive deviations. Use the absolute value
sign │x – x̄│.
5. Get the sum of the absolute deviations in step 4.
6. Divide the sum in step 5 by n.
Notice that some of the deviations from the mean are negative. Hence, we treat
all deviations as positive by introducing the absolute
value sign, and then add all these absolute deviations.
Dividing the sum of the absolute deviations by n gives
the value of the average deviation.
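The six steps can be sketched in Python as follows. The sample values passed to the function are hypothetical and chosen only to keep the arithmetic easy to follow; they do not come from the original example.

```python
def average_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    ordered = sorted(values)                      # step 1: arrange by magnitude
    n = len(ordered)
    mean = sum(ordered) / n                       # step 2: mean
    abs_devs = [abs(x - mean) for x in ordered]   # steps 3 and 4: |x - mean|
    return sum(abs_devs) / n                      # steps 5 and 6

# Hypothetical data: mean is 6, absolute deviations are 4, 2, 0, 2, 4.
ad = average_deviation([2, 4, 6, 8, 10])
print(ad)
```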
AD = ∑f│x – x̄│ / n
= 751.4 / 60
AD = 12.52
s² = [∑fd² / n – (∑fd / n)²] c²
s² = [114 / 60 – (2 / 60)²] (12)²