Handout Electro1
Data measured or collected by the investigator or the user directly from the source.
Secondary sources are individuals or agencies that supply data originally collected for other
purposes by them or by others. Usually these are published or unpublished materials, records,
reports, documents, etc.
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital status: M,
S, D, and W. These types will be used as the classes for the distribution. We follow the procedure below
to construct the frequency distribution.
Step 1: Make a table as shown.
Class (1) Tally (2) Frequency (3) Percent (4)
M
S
D
W
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class using $\% = \frac{f}{n} \times 100$, where $f$ = frequency of
the class and $n$ = total number of values.
Step 5: Find the totals for columns (3) and (4).
Combining all the steps, one can construct the following frequency distribution.
Class (1) Tally (2) Frequency (3) Percent (4)
M //// 5 20
S //// // 7 28
D //// // 7 28
W //// 6 24
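Steps 2 to 5 can be sketched in a few lines of Python. This is only an illustration: the raw data listing is not reproduced in this handout, so the list `data` below is rebuilt from the frequencies in the table (an assumption), and the function name `frequency_distribution` is hypothetical.

```python
from collections import Counter

# Marital-status codes rebuilt from the frequencies in the table above.
# The original raw listing is not shown, so the order here is arbitrary.
data = ["M"] * 5 + ["S"] * 7 + ["D"] * 7 + ["W"] * 6

def frequency_distribution(values):
    """Return {class: (frequency, percent)} for categorical data."""
    counts = Counter(values)          # Steps 2 and 3: tally and count
    n = len(values)                   # total number of values
    return {cls: (f, 100 * f / n) for cls, f in counts.items()}  # Step 4

table = frequency_distribution(data)

# Step 5: print the table with totals for the frequency and percent columns.
print(f"{'Class':6}{'Frequency':>10}{'Percent':>9}")
for cls, (f, pct) in table.items():
    print(f"{cls:6}{f:>10d}{pct:>9.1f}")
total_f = sum(f for f, _ in table.values())
total_pct = sum(pct for _, pct in table.values())
print(f"{'Total':6}{total_f:>10d}{total_pct:>9.1f}")
```

The totals row should come out as 25 and 100, which is a quick check that no observation was missed in the tally.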
CLASS
Boys Men
Girls Women
[Figure: simple bar graph. Vertical axis: Sales in $ (0 to 30). Horizontal axis: product (A, B, C).]
Component Bar-graphs
- When there is a desire to show how a total (or aggregate) is divided into its component parts, we use a
component bar-graph.
- The bars represent the total value of a variable, with each total broken into its component parts; different
colours or designs are used for identification.
Example: Draw a component bar graph to represent the sales by product from 1957 to 1959.
Solutions:
[Figure: component bar graph. Vertical axis: Sales in $ (0 to 100). Horizontal axis: Year of production (1957, 1958, 1959). Each bar is divided into Product A, Product B, and Product C.]
Multiple Bar-graphs
- It is used to display data on more than one variable, so that different variables can be compared at the same time.
Example: Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
Solutions:
[Figure: multiple bar graph. Vertical axis: Sales in $ (0 to 60). Horizontal axis: Year of production (1957, 1958, 1959). Separate bars for Product A, Product B, and Product C in each year.]
1.2.6 Graphical presentation of data: Histogram, Frequency polygon, and Ogive curve
- The histogram, frequency polygon, and cumulative frequency graph (ogive) are the most commonly
used graphical representations for continuous data.
Procedures for constructing statistical graphs:
Draw and label the X and Y axes.
Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.
Represent the class boundaries for the histogram or ogive, or the midpoints for the frequency
polygon, on the X axis.
Plot the points.
Draw the bars or lines to connect the points.
Histogram
A graph which displays the data by using vertical bars of various heights to represent frequencies. Class
boundaries are placed along the horizontal axis. Class marks and class limits are sometimes used as the
quantities on the X axis.
Example: Construct a histogram to represent the previous data (example *).
Frequency Polygon: It is a line graph. The frequency of the data is placed along the vertical axis and
the class midpoints are placed along the horizontal axis. It is customary to add the next higher and the next
lower class interval, each with a frequency of zero; this makes it a complete polygon.
Example: Draw a frequency polygon for the above data (example *).
Solutions:
[Figure: frequency polygon. Vertical axis: Frequency. Horizontal axis: Value, marked at 2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5.]
Objectives:
To comprehend (understand) the data easily.
To facilitate comparison.
To make further statistical analysis.
The Summation Notation:
Let $X_1, X_2, X_3, \ldots, X_N$ be a number of measurements, where $N$ is the total number of
observations and $X_i$ is the $i$-th observation.
Very often in statistics an algebraic expression of the form X1+X2+X3+...+XN is used in a
formula to compute a statistic. It is tedious to write an expression like this very often, so
mathematicians have developed a shorthand notation to represent a sum of scores, called
the summation notation.
The symbol $\sum_{i=1}^{N} X_i$ is a mathematical shorthand for $X_1 + X_2 + X_3 + \cdots + X_N$.
Properties of Summation
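1. $\sum_{i=1}^{n} k = nk$, where $k$ is any constant
2. $\sum_{i=1}^{n} k X_i = k \sum_{i=1}^{n} X_i$, where $k$ is any constant
3. $\sum_{i=1}^{n} (a + b X_i) = na + b \sum_{i=1}^{n} X_i$, where $a$ and $b$ are any constants
4. $\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$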
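The four summation properties can be checked numerically. The sketch below is not a proof, only a sanity check on one set of numbers; the sample values and the function name `check_summation_properties` are illustrative assumptions.

```python
import math

def check_summation_properties(x, y, a, b, k):
    """Return four booleans, one per summation property, for the given data."""
    n = len(x)
    # Property 1: summing a constant k over n terms gives n*k.
    p1 = math.isclose(sum(k for _ in range(n)), n * k)
    # Property 2: a constant factor can be taken outside the sum.
    p2 = math.isclose(sum(k * xi for xi in x), k * sum(x))
    # Property 3: sum of (a + b*X_i) equals n*a + b * sum of X_i.
    p3 = math.isclose(sum(a + b * xi for xi in x), n * a + b * sum(x))
    # Property 4: the sum of (X_i + Y_i) splits into two separate sums.
    p4 = math.isclose(sum(xi + yi for xi, yi in zip(x, y)), sum(x) + sum(y))
    return p1, p2, p3, p4

# Hypothetical sample values, chosen only for illustration.
results = check_summation_properties(x=[2, 4, 6], y=[1, 3, 5], a=3, b=2, k=5)
print(results)
```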
Then the mean will be
\[ \bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} \]
where $k$ is the number of classes and $\sum_{i=1}^{k} f_i = n$.
\[ \bar{X} = \frac{\sum_{i=1}^{4} f_i X_i}{\sum_{i=1}^{4} f_i} = \frac{36}{7} \approx 5.14 \]
Arithmetic Mean for Grouped Data: If data are given in the form of a continuous frequency
distribution, then the mean is obtained as:
\[ \bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} \]
where $X_i$ is the class mark of the $i$-th class.
\[ \bar{X} = \frac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75 \]
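As a rough illustration of the grouped-data formula, the sketch below computes class marks and the mean for the 40-44, ..., 70-74 distribution that appears in the median example of this handout. The function names are hypothetical, and the class mark is taken as the midpoint of the class limits.

```python
def class_mark(lower, upper):
    """Class mark = midpoint of the lower and upper class limits."""
    return (lower + upper) / 2

def grouped_mean(classes, freqs):
    """Mean of a frequency distribution: sum(f_i * X_i) / sum(f_i)."""
    marks = [class_mark(lo, hi) for lo, hi in classes]
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)

# Distribution used in the median example of this handout.
classes = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69), (70, 74)]
freqs = [7, 10, 22, 15, 12, 6, 3]

mean = grouped_mean(classes, freqs)   # 4125 / 75
print(mean)
```

For this distribution the sum of $f_i X_i$ is 4125 and the total frequency is 75, so the mean is 55, the value used later in the variance example.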
Exercises: Life times (in months) of 75 bulbs are summarized in the following frequency
distribution:
Life time (months) No. of bulbs
40-44 7
45-49 10
50-54 22
55-59 f4
60-64 f5
65-69 6
70-74 3
If 20% of the bulbs have life times between 55 and 59 months,
i. Find the missing frequencies f4 and f5.
ii. Find the mean.
Special properties of Arithmetic mean
1. The sum of the deviations of a set of items from their mean is always zero, i.e.
\[ \sum_{i=1}^{n} (X_i - \bar{X}) = 0 \]
2. The sum of the squared deviations of a set of items from their mean is the minimum, i.e.
\[ \sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - A)^2, \quad A \neq \bar{X} \]
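Both properties can be checked on a small data set. The sketch below uses the values 1, 2, 3, 5, 8 (the same values as in one of the median examples of this handout); the helper names and the trial values of $A$ are assumptions made for illustration.

```python
def sum_of_deviations(x, a):
    """Sum of (X_i - a) over the data."""
    return sum(xi - a for xi in x)

def sum_of_squared_deviations(x, a):
    """Sum of (X_i - a)^2 over the data."""
    return sum((xi - a) ** 2 for xi in x)

data = [1, 2, 3, 5, 8]
mean = sum(data) / len(data)          # 3.8

# Property 1: deviations from the mean add up to zero (up to rounding).
dev_sum = sum_of_deviations(data, mean)

# Property 2: squared deviations are smallest when taken about the mean.
ss_mean = sum_of_squared_deviations(data, mean)
ss_other = [sum_of_squared_deviations(data, a) for a in (3, 4, 5)]
print(dev_sum, ss_mean, ss_other)
```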
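by:
\[ \bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2 + \cdots + \bar{X}_k n_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} \bar{X}_i n_i}{\sum_{i=1}^{k} n_i} \]
For two groups ($k = 2$):
\[ \bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2}{n_1 + n_2} = \frac{\sum_{i=1}^{2} \bar{X}_i n_i}{\sum_{i=1}^{2} n_i} \]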
mean
Weighted Mean
\[ \bar{X}_w = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i} \]
Example: A student obtained the following percentages in an examination: English 60, Biology
75, Mathematics 63, Physics 59, and Chemistry 55. Find the student's weighted arithmetic mean if
weights 1, 2, 1, 3, 3 respectively are allotted to the subjects.
Solutions:
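\[ \bar{X}_w = \frac{\sum_{i=1}^{5} X_i W_i}{\sum_{i=1}^{5} W_i} = \frac{60 \times 1 + 75 \times 2 + 63 \times 1 + 59 \times 3 + 55 \times 3}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5 \]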
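The weighted mean calculation can be reproduced with a short function. This is a minimal sketch; the function name `weighted_mean` is hypothetical, and the scores and weights are those of the example.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(X_i * W_i) / sum(W_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# English, Biology, Mathematics, Physics, Chemistry
scores = [60, 75, 63, 59, 55]
weights = [1, 2, 1, 3, 3]

xw = weighted_mean(scores, weights)   # 615 / 10
print(xw)
```

For comparison, the unweighted mean of the same five scores is 62.4; the weighted mean is lower because the two lowest scores carry the largest weights.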
If observations $X_1, X_2, \ldots, X_n$ have weights $W_1, W_2, \ldots, W_n$ respectively, then their harmonic
mean is given by
\[ H.M = \frac{\sum_{i=1}^{n} W_i}{\sum_{i=1}^{n} \frac{W_i}{X_i}} \]
This is called the Weighted Harmonic Mean.
Remark: The Harmonic Mean is useful and appropriate in finding average speeds and average
rates.
Example: A cyclist pedals from his house to his college at speed of 10 km/hr and back from the
college to his house at 15 km/hr. Find the average speed.
Solution: Here the distance is constant, so the simple H.M is appropriate for this problem.
$X_1 = 10$ km/hr, $X_2 = 15$ km/hr
\[ H.M = \frac{2}{\frac{1}{10} + \frac{1}{15}} = 12 \text{ km/hr} \]
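A small sketch of the simple and weighted harmonic means, applied to the cyclist example. The function names are hypothetical; equal weights are assumed for the two legs because the distance is the same in each direction.

```python
def harmonic_mean(values):
    """Simple harmonic mean: n / sum(1 / X_i)."""
    return len(values) / sum(1 / x for x in values)

def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean: sum(W_i) / sum(W_i / X_i)."""
    return sum(weights) / sum(w / x for w, x in zip(weights, values))

speeds = [10, 15]                      # km/hr, outward and return legs
hm = harmonic_mean(speeds)
whm = weighted_harmonic_mean(speeds, [1, 1])   # equal distances
am = sum(speeds) / len(speeds)         # arithmetic mean, for comparison
print(hm, whm, am)
```

The arithmetic mean (12.5 km/hr) overstates the average speed here, because more time is spent travelling at the slower speed.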
2.2.2 The mode
The mode is the value which occurs most frequently in a set of values.
The mode may not exist, and even if it does exist, it may not be unique.
In the case of a discrete distribution, the value having the maximum frequency is the modal value.
Examples:
1. Find the mode of 5, 3, 5, 8, 9. Mode = 5.
2. Find the mode of 8, 9, 9, 7, 8, 2, and 5. The data are bimodal: the modes are 8 and 9.
3. Find the mode of 4, 12, 3, 6, and 7. There is no mode for these data.
- The mode of a set of numbers X1, X2, … Xn is usually denoted by X̂ .
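The three examples can be reproduced with a short function. This is a sketch under one stated convention: when every value occurs exactly once, the data are treated as having no mode (as in example 3). The function name `modes` is hypothetical and the input is assumed to be non-empty.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of modal values; an empty list means no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # Every value occurs once: treated here as "no mode" (assumption
        # matching example 3 above).
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 3, 5, 8, 9]))          # one mode
print(modes([8, 9, 9, 7, 8, 2, 5]))    # bimodal
print(modes([4, 12, 3, 6, 7]))         # no mode
```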
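\[ \hat{X} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w \]
Where:
$\hat{X}$ = the mode of the distribution
$L_{mo}$ = the lower class boundary of the modal class
$w$ = the size of the modal class
$\Delta_1 = f_{mo} - f_1$
$\Delta_2 = f_{mo} - f_2$
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class following the modal class
\[ \hat{X} = 45 + 10\left(\frac{2}{2 + 26}\right) = 45.71 \]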
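The grouped-data mode formula can be sketched as follows. The first call reproduces the worked value 45.71 from the differences 2 and 26; the second applies the formula to the 40-44, ..., 70-74 distribution from the median example of this handout, which is an extra illustration rather than part of the original solution. The function names are hypothetical.

```python
def mode_from_differences(l_mo, w, d1, d2):
    """Mode = L_mo + w * d1 / (d1 + d2), with d1 and d2 the frequency differences."""
    return l_mo + w * d1 / (d1 + d2)

def grouped_mode(l_mo, w, f_mo, f_prev, f_next):
    """Same formula, starting from the three frequencies."""
    return mode_from_differences(l_mo, w, f_mo - f_prev, f_mo - f_next)

# Worked example: L_mo = 45, w = 10, differences 2 and 26.
worked = mode_from_differences(45, 10, 2, 26)

# Extra illustration: modal class 50-54 (f = 22), preceded by f = 10,
# followed by f = 15, lower boundary 49.5, class size 5.
dist_mode = grouped_mode(49.5, 5, 22, 10, 15)
print(round(worked, 2), round(dist_mode, 2))
```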
\[ \tilde{X} = \frac{1}{2}\left(X_{[3]} + X_{[4]}\right) = \frac{1}{2}(5 + 6) = 5.5 \]
b) Order the data: 1, 2, 3, 5, 8. Here $n = 5$.
\[ \tilde{X} = X_{\left[\frac{n+1}{2}\right]} = X_{[3]} = 3 \]
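The rule for ungrouped data (middle value for odd $n$, average of the two middle values for even $n$) can be sketched as below. The odd-sized list is the one from part b); the even-sized list is hypothetical, chosen so that its third and fourth ordered values are 5 and 6.

```python
def median(values):
    """Median of raw data: order the values, then pick or average the middle."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # X_[(n+1)/2]
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of X_[n/2] and X_[n/2 + 1]

odd_case = median([8, 1, 5, 3, 2])        # ordered: 1, 2, 3, 5, 8
even_case = median([9, 1, 6, 3, 5, 8])    # hypothetical data; ordered: 1, 3, 5, 6, 8, 9
print(odd_case, even_case)
```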
than or equal to $\frac{n}{2}$.
Example: Find the median of the following distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Solutions:
First find the less than cumulative frequency.
Identify the median class.
Find median using formula.
Class Frequency Cumu. Freq. (less than type)
40-44 7 7
45-49 10 17
50-54 22 39
55-59 15 54
60-64 12 66
65-69 6 72
70-74 3 75
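\[ \frac{n}{2} = \frac{75}{2} = 37.5 \]
39 is the first cumulative frequency to be greater than or equal to 37.5,
so 50 - 54 is the median class.
\[ L_{med} = 49.5, \quad w = 5, \quad n = 75, \quad c = 17, \quad f_{med} = 22 \]
\[ \tilde{X} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right) = 49.5 + \frac{5}{22}(37.5 - 17) = 54.16 \]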
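The three steps of the solution (cumulate, locate the median class, apply the formula) can be sketched in code. The function name is hypothetical, and the lower class boundary is assumed to be the lower class limit minus 0.5, which fits integer class limits such as 50-54.

```python
from itertools import accumulate

def grouped_median(lower_limits, width, freqs):
    """Median of a frequency distribution with equal class widths."""
    n = sum(freqs)
    cum = list(accumulate(freqs))                 # less-than cumulative frequencies
    # Median class: first class whose cumulative frequency is >= n/2.
    idx = next(i for i, cf in enumerate(cum) if cf >= n / 2)
    l_med = lower_limits[idx] - 0.5               # lower class boundary (assumption)
    c = cum[idx - 1] if idx > 0 else 0            # cumulative frequency before the class
    return l_med + (width / freqs[idx]) * (n / 2 - c)

lower_limits = [40, 45, 50, 55, 60, 65, 70]
freqs = [7, 10, 22, 15, 12, 6, 3]

med = grouped_median(lower_limits, 5, freqs)
print(round(med, 2))
```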
Remark: The decile class (class containing $D_i$) is the class with the smallest cumulative frequency (less
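\[ Q_1 = L_{Q_1} + \frac{w}{f_{Q_1}}\left(\frac{N}{4} - c\right) \]
\[ L_{Q_1} = 170, \quad w = 10, \quad N = 493, \quad c = 88, \quad f_{Q_1} = 72 \]
\[ Q_1 = 170 + \frac{10}{72}(123.25 - 88) = 174.90 \]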
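ii. Q2
- Determine the class containing the second quartile.
\[ \frac{2N}{4} = 246.5 \]
190 - 200 is the class containing the second quartile.
\[ L_{Q_2} = 190, \quad w = 10, \quad N = 493, \quad c = 244, \quad f_{Q_2} = 107 \]
\[ Q_2 = L_{Q_2} + \frac{w}{f_{Q_2}}\left(\frac{2N}{4} - c\right) = 190 + \frac{10}{107}(246.5 - 244) = 190.23 \]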
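iii. Q3
- Determine the class containing the third quartile.
\[ \frac{3N}{4} = 369.75 \]
200 - 210 is the class containing the third quartile.
\[ L_{Q_3} = 200, \quad w = 10, \quad N = 493, \quad c = 351, \quad f_{Q_3} = 49 \]
\[ Q_3 = L_{Q_3} + \frac{w}{f_{Q_3}}\left(\frac{3N}{4} - c\right) = 200 + \frac{10}{49}(369.75 - 351) = 203.83 \]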
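b) D7
- Determine the class containing the 7th decile.
\[ \frac{7N}{10} = 345.1 \]
190 - 200 is the class containing the seventh decile.
\[ L_{D_7} = 190, \quad w = 10, \quad N = 493, \quad c = 244, \quad f_{D_7} = 107 \]
\[ D_7 = L_{D_7} + \frac{w}{f_{D_7}}\left(\frac{7N}{10} - c\right) = 190 + \frac{10}{107}(345.1 - 244) = 199.45 \]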
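c) P90
- Determine the class containing the 90th percentile.
\[ \frac{90N}{100} = 443.7 \]
220 - 230 is the class containing the 90th percentile.
\[ L_{P_{90}} = 220, \quad w = 10, \quad N = 493, \quad c = 434, \quad f_{P_{90}} = 31 \]
\[ P_{90} = L_{P_{90}} + \frac{w}{f_{P_{90}}}\left(\frac{90N}{100} - c\right) = 220 + \frac{10}{31}(443.7 - 434) = 223.13 \]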
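Quartiles, deciles, and percentiles all use the same interpolation formula; only the position ($iN/4$, $iN/10$, or $iN/100$) changes. The sketch below makes that explicit with one hypothetical helper, `grouped_quantile`, and the values of $L$, $w$, $c$, and $f$ quoted in the solutions above. The full frequency table is not reproduced here, so the class for each measure is taken as given.

```python
def grouped_quantile(position, lower, w, c, f):
    """Interpolate inside a class: lower + (w / f) * (position - c).

    position: i*N/4, i*N/10 or i*N/100
    lower:    lower boundary of the class containing the quantile
    w:        class width
    c:        cumulative frequency of the class before it
    f:        frequency of the class itself
    """
    return lower + (w / f) * (position - c)

N = 493
q1 = grouped_quantile(1 * N / 4, lower=170, w=10, c=88, f=72)
q2 = grouped_quantile(2 * N / 4, lower=190, w=10, c=244, f=107)
q3 = grouped_quantile(3 * N / 4, lower=200, w=10, c=351, f=49)
d7 = grouped_quantile(7 * N / 10, lower=190, w=10, c=244, f=107)
p90 = grouped_quantile(90 * N / 100, lower=220, w=10, c=434, f=31)

for name, value in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("D7", d7), ("P90", p90)]:
    print(f"{name} = {value:.2f}")
```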
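\[ \text{Population Variance} = \sigma^2 = \frac{1}{N}\sum (X_i - \mu)^2, \quad i = 1, 2, \ldots, N \]
\[ \text{Population Variance} = \sigma^2 = \frac{1}{N}\sum f_i (X_i - \mu)^2, \quad i = 1, 2, \ldots, k \]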
Sample Variance: One would expect the sample variance to be simply the population variance
with the population mean replaced by the sample mean. However, one of the major uses of a
sample statistic is to estimate the corresponding population parameter, and a formula that divides
by the sample size gives an estimate that is, on average, smaller than the population variance. To
counteract this, the sum of the squares of the deviations is divided by one less than the sample size.
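\[ \text{Sample Variance} = S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2, \quad i = 1, 2, \ldots, n \]
\[ \text{Sample Variance} = S^2 = \frac{1}{n-1}\sum f_i (X_i - \bar{X})^2, \quad i = 1, 2, \ldots, k \]
\[ S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}, \quad \text{for raw data.} \]
\[ S^2 = \frac{\sum_{i=1}^{k} f_i X_i^2 - n\bar{X}^2}{n-1}, \quad \text{for frequency distribution.} \]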
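The definitional and shortcut formulas for raw data should give the same value. The sketch below checks this on the values 1, 2, 3, 5, 8 (reused from a median example in this handout, an arbitrary choice) and compares both against Python's `statistics.variance`, which also divides by $n - 1$. The two function names are hypothetical.

```python
import math
import statistics

def variance_definition(x):
    """S^2 = sum((X_i - mean)^2) / (n - 1)."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / (n - 1)

def variance_shortcut(x):
    """S^2 = (sum(X_i^2) - n * mean^2) / (n - 1)."""
    n = len(x)
    mean = sum(x) / n
    return (sum(xi ** 2 for xi in x) - n * mean ** 2) / (n - 1)

data = [1, 2, 3, 5, 8]
v_def = variance_definition(data)
v_short = variance_shortcut(data)
v_lib = statistics.variance(data)     # sample variance from the standard library
print(v_def, v_short, v_lib, math.sqrt(v_def))
```

The shortcut form needs only the running totals of $X_i$ and $X_i^2$, which is why it is convenient for hand calculation.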
Standard Deviation
There is a problem with variances. Recall that the deviations were squared. That means that the
units were also squared. To get the units back the same as the original data values, the square
root must be taken.
$(X_i - \bar{X})^2$: 36, 1, 1, 36; Total = 74
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{74}{3} = 24.67 \]
\[ S = \sqrt{S^2} = \sqrt{24.67} = 4.97 \]
2. $\bar{X} = 55$
X_i (C.M)               42    47    52    57    62    67    72    Total
f_i (X_i - \bar{X})^2   1183  640   198   60    588   864   867   4400
\[ S^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n-1} = \frac{4400}{74} = 59.46 \]
\[ S = \sqrt{S^2} = \sqrt{59.46} = 7.71 \]
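The grouped calculation can be reproduced from the class marks and the frequencies of the 40-44, ..., 70-74 distribution given earlier in this handout. This is a minimal sketch; the function name is hypothetical.

```python
import math

def grouped_sample_variance(marks, freqs):
    """Return (mean, S^2) for a frequency distribution, dividing by n - 1."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    # Weighted squared deviations f_i * (X_i - mean)^2, one per class.
    terms = [f * (x - mean) ** 2 for f, x in zip(freqs, marks)]
    return mean, sum(terms) / (n - 1)

marks = [42, 47, 52, 57, 62, 67, 72]   # class marks
freqs = [7, 10, 22, 15, 12, 6, 3]      # frequencies from the median example

mean, s2 = grouped_sample_variance(marks, freqs)
s = math.sqrt(s2)
print(mean, round(s2, 2), round(s, 2))
```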
Special properties of Standard deviations
1. $\sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}} < \sqrt{\frac{\sum (X_i - A)^2}{n-1}}, \quad A \neq \bar{X}$
2. For a normal (symmetric) distribution the following hold.
Approximately 68.27% of the data values fall within one standard deviation of the mean, i.e.
within $(\bar{X} - S, \bar{X} + S)$.
Approximately 95.45% of the data values fall within two standard deviations of the mean, i.e.
within $(\bar{X} - 2S, \bar{X} + 2S)$.
Approximately 99.73% of the data values fall within three standard deviations of the mean,
i.e. within $(\bar{X} - 3S, \bar{X} + 3S)$.
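The three percentages can be recovered from the standard normal distribution: the probability of falling within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. The sketch below uses `math.erf` from the Python standard library; the function name `normal_coverage` is hypothetical, and the result applies to an exactly normal distribution, not to arbitrary data.

```python
import math

def normal_coverage(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * normal_coverage(k):.2f}%")
```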
3. Chebyshev's Theorem
For any data set, no matter what the pattern of variation, the proportion of the values that fall within $k$ standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where $k > 1$.
Applying the above theorem, at least $\left(1 - \frac{1}{k^2}\right) \times 100\% = 75\%$ of the numbers lie between 38 and 62.
b) Similarly done.
c) It is just the complement of a), i.e. at most $\frac{1}{k^2} \times 100\% = 25\%$ of the numbers are less
than 38 or more than 62.
d) Similarly done.
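A Chebyshev-type calculation can be sketched as follows: solve $1 - 1/k^2 = p$ for $k$, then form the interval mean $\pm\ k \times$ SD. The statement of the example is not shown above, so the mean 50 and standard deviation 6 used here are assumptions, chosen because they are consistent with the interval 38 to 62 at $k = 2$. The function names are hypothetical.

```python
import math

def chebyshev_k(proportion):
    """Solve 1 - 1/k**2 = proportion for k (requires 0 < proportion < 1)."""
    return 1 / math.sqrt(1 - proportion)

def chebyshev_interval(mean, sd, proportion):
    """Interval that contains at least the given proportion of the values."""
    k = chebyshev_k(proportion)
    return mean - k * sd, mean + k * sd

k = chebyshev_k(0.75)
# Assumed values: mean = 50, SD = 6 (consistent with 38 to 62 for k = 2).
low, high = chebyshev_interval(50, 6, 0.75)
print(k, low, high)
```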
Example 2: The scores on a special test of knowledge of wood refinishing have a mean of
53 and a standard deviation of 6. Find the range of values in which at least 75% of the scores will lie.
(Exercise)
4. If the standard deviation of $X_1, X_2, \ldots, X_n$ is $S$, then the standard deviation of
\[ C.V_B = \frac{S_B}{\bar{X}_B} \times 100 = \frac{11}{47.5} \times 100 = 23.16\% \]
Since C.VA < C.VB, in firm B there is greater variability in individual wages.
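The coefficient of variation is the standard deviation expressed as a percentage of the mean. A minimal sketch, using the firm B figures quoted above (the function name is hypothetical):

```python
def coefficient_of_variation(sd, mean):
    """C.V = (S / mean) * 100, expressed in percent."""
    return sd / mean * 100

cv_b = coefficient_of_variation(sd=11, mean=47.5)
print(f"C.V_B = {cv_b:.2f}%")
```

Because the C.V is unit-free, it allows groups with different means or different units to be compared on the same scale.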
2. A meteorologist interested in the consistency of temperatures in three cities during a given week
collected the following data. The temperatures for five days of the week in the three cities were:
City 1 25 24 23 26 17
City 2 22 21 24 22 20
City 3 32 27 35 24 28
Exercise: Which city has the most consistent temperatures, based on these data?
2.4.2 Standard Scores (Z-scores)
\[ C.V_2 = \frac{S_2}{\bar{X}_2} \times 100 = \frac{1.3}{11.9} \times 100 = 10.92\% \]
Since C.V2 < C.V1, group 2 is more consistent.
b) Calculate the standard score of A and B
\[ Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{9.2 - 10.4}{1.2} = -1 \]
\[ Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{9.3 - 11.9}{1.3} = -2 \]
Child B is faster relative to his or her group, because the time taken by child B is two standard
deviations shorter than the average time taken by group 2, while the time taken by child A is only
one standard deviation shorter than the average time taken by group 1.
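The two standard scores can be computed with a one-line function. This sketch uses the times, means, and standard deviations quoted in the solution above; the function name `z_score` is hypothetical.

```python
def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

z_a = z_score(9.2, mean=10.4, sd=1.2)   # child A, group 1
z_b = z_score(9.3, mean=11.9, sd=1.3)   # child B, group 2
print(round(z_a, 2), round(z_b, 2))
```

A more negative score here means a shorter time relative to the child's own group, which is why child B is judged the faster of the two in relative terms even though child A's raw time is lower.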