Chapter1-Introduction To Statistics
Introduction to Statistics
Introduction:
Limitations of statistics:
1. Statistics does not deal with individual items.
2. Statistics deals only with quantitatively
expressed items; it does not study qualitative
phenomena.
3. Statistical results are not universally true.
• Statistical laws are only approximations, not exact laws;
they hold in terms of probability and chance.
• E.g., it has been found that 20% of a certain type of surgical operation
performed by a particular doctor are successful.
The following are some examples of descriptive
statistics:
• The daily average temperature of AA was 25 °C
last week.
• The maximum amount of coffee exported by Eth. (as
observed over the last 20 years) was in the year
2004.
• The average age of the athletes who participated in the London
Marathon was 25 years.
• 75% of the instructors in AAU are male.
• The scores of 50 students in a Mathematics exam were
found to range from 20 to 90.
2. Inferential statistics (Inductive Statistics):
1. Collection of Data:
Data collection is the process of gathering information
or data about the variable of interest. Data are the inputs
for a statistical investigation. Data may be obtained
from either a primary source or a secondary source.
2. Organization of Data
Organization of data includes three major steps.
1. Editing: checking the data and omitting inconsistencies
and irrelevancies.
2. Classification: grouping the collected and
edited data.
3. Tabulation: putting the classified data in the form of
a table.
3. Presentation of Data
The purpose of presentation in the statistical analysis is to
display what is contained in the data in the form of Charts,
Pictures, Diagrams and Graphs for an easy and better
understanding of the data.
4. Analysis of Data
In a statistical investigation, the process of analyzing
data includes finding the various statistical constants
from the collected mass of data, such as measures of
central tendency (averages), measures of dispersion
and so on.
It mainly involves mathematical operations: different
measures of central tendency (averages), measures of
variation, regression analysis, etc. In its extreme case,
analysis requires knowledge of advanced
mathematics.
5. Interpretation of Data
Interpretation involves drawing valid conclusions and inferences
from the statistical constants computed in the analysis stage.
It is the most difficult stage and requires the most skill.
It is also the stage at which Statistics is most
open to misuse.
Correct interpretation of the results leads to valid
conclusions and hence aids in taking
correct decisions.
Improper (incorrect) interpretation may lead to
wrong conclusions and defeat the whole objective of
the study.
THE ENGINEERING METHOD AND
STATISTICAL THINKING
Methods of data presentation
1. Classification
2. Tabular method.
3. Graphical/Diagrammatic method
A. Ungrouped (Discrete) Frequency
Distribution
It is a tabular arrangement of numerical
data in order of magnitude showing the
distinct values with the corresponding
frequencies.
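As a minimal sketch of the definition above, an ungrouped frequency distribution can be built by counting each distinct value and listing the values in order of magnitude. The data set below is hypothetical and is used only for illustration.

```python
from collections import Counter

# Hypothetical raw data (illustrative only): number of children in 12 households.
values = [2, 0, 3, 1, 2, 2, 4, 1, 0, 2, 3, 1]

# Count how often each distinct value occurs.
frequency = Counter(values)

# Arrange the distinct values in order of magnitude with their frequencies.
table = sorted(frequency.items())

print("Value  Frequency")
for value, count in table:
    print(f"{value:>5}  {count:>9}")
```

The frequencies add up to the number of observations, which is a quick check that no value was missed.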
Components of grouped frequency distribution
1. Lower class limit:
is the smallest number that can actually belong to
the respective classes.
2. Upper class limit:
is the largest number that can actually belong to
the respective classes.
3. Class boundaries:
are numbers used to separate adjoining classes
which should not coincide with the actual
observations.
4. Class mark:
is the midpoint of the class.
6. Unit of measure
is the smallest possible positive difference
between any two measurements in the given data
set that shows the degree of precision.
Class boundaries:
can be obtained by taking the averages of the
upper class limit of one class and the lower class
limit of the next class.
Lower class boundaries:
can be obtained by subtracting half a unit of
measure from the lower class limits.
Upper class boundaries:
can be obtained by adding half the unit of measure
to the upper class limits.
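The rules above can be sketched as follows. The class limits and the unit of measure (1) are assumptions chosen for illustration; they are not taken from a data set in this chapter.

```python
# Hypothetical class limits, assuming measurements recorded to the
# nearest whole number, so the unit of measure is 1.
class_limits = [(38, 47), (48, 57), (58, 67), (68, 77), (78, 87), (88, 97)]
unit = 1

# Lower boundary = lower limit - half a unit; upper boundary = upper limit + half a unit.
boundaries = [(lower - unit / 2, upper + unit / 2) for lower, upper in class_limits]

# Class mark = midpoint of the class.
class_marks = [(lower + upper) / 2 for lower, upper in class_limits]

# Equivalent rule: the shared boundary is the average of the upper limit
# of one class and the lower limit of the next class.
averaged = [
    (this_class[1] + next_class[0]) / 2
    for this_class, next_class in zip(class_limits, class_limits[1:])
]

for limits, bounds, mark in zip(class_limits, boundaries, class_marks):
    print(limits, bounds, mark)
```

Both rules give the same boundaries, so either can be used as a check on the other.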
96 89 58 61 46 59 75 54
41 56 77 49 58 60 63 82
66 64 69 67 62 55 67 70
78 65 52 76 69 86 44 76
57 68 64 52 53 74 68 39
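• No. of observations = 40
• Range = 96 - 39 = 57
• No. of classes = 1 + 3.32 log(40) = 6.3, taken as 6
• Class width = 57/6 = 9.5; a width of 9 is used in the table below
Classes        Frequency
39.0 - 48.0    4
48.0 - 57.0    7
57.0 - 66.0    11
66.0 - 75.0    9
75.0 - 84.0    6
84.0 - 93.0    3
Each class includes its lower limit and excludes its upper limit. The last
class also counts the maximum value, 96, which lies above 93.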
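The calculation above can be checked with a short sketch. It assumes the class width of 9 and the six classes used in the table, and it places any value beyond the last class (here only 96) into the last class so that the counts match the table; that handling is an assumption made to reproduce the table, not a general rule.

```python
import math

# The 40 observations listed above.
scores = [
    96, 89, 58, 61, 46, 59, 75, 54,
    41, 56, 77, 49, 58, 60, 63, 82,
    66, 64, 69, 67, 62, 55, 67, 70,
    78, 65, 52, 76, 69, 86, 44, 76,
    57, 68, 64, 52, 53, 74, 68, 39,
]

n = len(scores)
data_range = max(scores) - min(scores)

# Rule used in the text: number of classes = 1 + 3.32 * log10(n).
k = 1 + 3.32 * math.log10(n)
num_classes = int(k)   # 6.3 is taken as 6 classes
width = 9              # width used in the table (57 / 6 = 9.5)
start = min(scores)

# Each class includes its lower limit and excludes its upper limit.
# Assumption: values beyond the last class are counted in the last class.
counts = [0] * num_classes
for x in scores:
    index = min((x - start) // width, num_classes - 1)
    counts[index] += 1

for i, count in enumerate(counts):
    lower = start + i * width
    print(f"{lower} - {lower + width}: {count}")
```

Rounding the width up to 10 instead would cover the whole range without the special handling of the maximum value, at the cost of different class limits and frequencies.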
Exercise
• Range = 4.2 - 2.0 = 2.2
• No. of classes = 1 + 3.32 log(30) = 5.9
• Class interval = 2.2 / (1 + 3.32 log(30)) = 0.4
Exercise
Types of Grouped Frequency Distribution
2. Cumulative Frequency Distribution (CFD):
3. Relative Cumulative Frequency Distribution (RCFD)
It is used to determine the ratio or the percentage of
observations that lie below or above a certain value/class
boundary, relative to the total frequency of all the classes. There
are two types: the LRCFD and the MRCFD.
Less than Relative Cumulative Frequency Distribution
(LRCFD): A table presenting the ratio of the cumulative
frequency less than the upper class boundary of each class to
the total frequency of all the classes.
More than Relative Cumulative Frequency Distribution
(MRCFD): A table presenting the ratio of the cumulative
frequency more than the lower class boundary of each class to
the total frequency of all the classes.
Test score        MCF   MRCF             MPCF
More than 37.5    40    40/40 = 1        100%
More than 47.5    36    36/40 = 0.9      90%
More than 57.5    28    28/40 = 0.7      70%
More than 67.5    15    15/40 = 0.375    37.5%
More than 77.5    5     5/40 = 0.125     12.5%
More than 87.5    2     2/40 = 0.05      5%
More than 97.5    0     0/40 = 0         0%
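The table above can be reproduced with a short sketch. The class frequencies used here are inferred from successive differences of the more-than cumulative frequencies (40 - 36 = 4, 36 - 28 = 8, and so on); they are not stated directly in the text.

```python
# Lower class boundaries from the table, plus the upper boundary of the last class.
boundaries = [37.5, 47.5, 57.5, 67.5, 77.5, 87.5, 97.5]

# Class frequencies inferred from the table (an assumption, see above).
frequencies = [4, 8, 13, 10, 3, 2]
total = sum(frequencies)

# More-than cumulative frequency at a boundary: the sum of the frequencies
# of all classes that start at or above that boundary.
mcf = [sum(frequencies[i:]) for i in range(len(frequencies) + 1)]

# Relative form: divide each cumulative frequency by the total frequency.
mrcf = [c / total for c in mcf]

for boundary, c, r in zip(boundaries, mcf, mrcf):
    print(f"More than {boundary}: {c:>2}  {r:.3f}  {r:.1%}")
```

The less-than version follows the same pattern, accumulating from the first class upward and pairing each total with the upper class boundary.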
1. Histogram
2. Frequency Polygon (Line graph)
Line and Bar Graph
Suitable for Discrete variables
Bar Graph
Component Bar Diagram
• Shows breakup of each part
• Helpful for comparison of parts and
aggregates
The budgets of two families can be compared
by _____________.
a) All of these options
b) Bar chart
c) Cluster bar chart
d) Sub-divided rectangles
Histogram:
[Figure: histogram; y-axis: frequency (0 to 20); x-axis: classes 20-30, 30-40, 40-50, 50-60, 60-70, 70-80]
a)1 only
b)2 only
c)Both 1 and 2
d)Neither 1 nor 2
MCQ
a) Symmetrical
b) Skewed left
c) Skewed right
d) Rotational
MCQ
In constructing a histogram, if the class interval of one class is double
that of the others, then the width of that bar should be
a) Doubled
b) Half
c) One
d) Quarter
2. Frequency Polygon:
It is a line graph of a grouped frequency
distribution in which each class frequency is
plotted against its class mark. The points are
then connected by line segments, and classes
with zero frequency are added at both ends of the
distribution so that the graph closes into a polygon.
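A minimal sketch of the points that make up a frequency polygon is given below. The class marks and frequencies are hypothetical, and equal class widths are assumed so that the two zero-frequency classes can be placed one class width beyond each end.

```python
# Hypothetical grouped data (illustrative only), with equal class widths.
class_marks = [42.5, 52.5, 62.5, 72.5, 82.5, 92.5]
frequencies = [4, 8, 13, 10, 3, 2]

width = class_marks[1] - class_marks[0]

# Add a zero-frequency class mark one class width before the first class
# and one after the last class, so the line returns to the x-axis.
points = (
    [(class_marks[0] - width, 0)]
    + list(zip(class_marks, frequencies))
    + [(class_marks[-1] + width, 0)]
)

for mark, freq in points:
    print(mark, freq)
```

Joining these points in order with straight line segments gives the polygon; any plotting tool can be used for the drawing itself.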
Frequency Polygon:
[Figure: frequency polygon; y-axis: frequency (0 to 20); x-axis: classes 20-30, 30-40, 40-50, 50-60, 60-70, 70-80]
Frequency Polygon
Steps to draw Frequency polygon
Ogive: E.g.
[Figure: ogive; y-axis: cumulative frequency (0 to 45); x-axis: 20 to 80]
Steps to draw ogives
i. Mark the class boundaries on the x-axis and mark non-overlapping
intervals of equal length on the y-axis to represent the
cumulative frequencies.
ii. For each class boundary marked on the x-axis, plot a point with
height equal to the corresponding cumulative frequency.
iii. Connect the plotted points by a series of line segments. The
less than ogive is drawn by plotting the less than cumulative
frequency against the upper class boundaries.
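The steps above can be sketched for the less than ogive as follows. The class boundaries and frequencies are hypothetical, and starting the curve at the lowest boundary with a cumulative frequency of zero is a common convention assumed here rather than something stated in the steps.

```python
from itertools import accumulate

# Hypothetical grouped data (illustrative only).
class_boundaries = [37.5, 47.5, 57.5, 67.5, 77.5, 87.5, 97.5]
frequencies = [4, 8, 13, 10, 3, 2]

# Less-than cumulative frequency at each upper class boundary.
less_than_cf = list(accumulate(frequencies))

# Points of the less than ogive: (upper class boundary, cumulative frequency).
# Assumed convention: begin at the lowest boundary with cumulative frequency 0.
points = [(class_boundaries[0], 0)] + list(zip(class_boundaries[1:], less_than_cf))

for boundary, cf in points:
    print(boundary, cf)
```

The last point always has a height equal to the total frequency, which is a useful check before drawing the line segments.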
• Bar charts
• Pie chart
• Pictograph and
• Pareto diagram
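Arithmetic mean
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Grouped data: $\bar{x} = \frac{\sum_{i} f_i m_i}{\sum_{i} f_i}$, where $m_i$ is the class mark and $f_i$ the frequency of class $i$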
Advantages
• It is the most commonly used measure of location or
central tendency for continuous variables.
• The arithmetic mean uses all observations in the data
set.
• All observations are given equal weight.
Disadvantage
• The mean is affected by extreme values that may not be
representative of the sample.
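The sensitivity of the mean to extreme values can be illustrated with a small sketch. The numbers are hypothetical and chosen only to make the effect visible.

```python
from statistics import mean

# Hypothetical sample (illustrative only).
values = [52, 55, 57, 58, 60]

# The same sample with one extreme value added.
with_outlier = values + [150]

mean_without = mean(values)
mean_with = mean(with_outlier)

print("Mean without the extreme value:", mean_without)
print("Mean with the extreme value:   ", mean_with)
```

A single extreme observation moves the mean well above every other value in the sample, so the mean no longer describes a typical observation.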
Harmonic mean = $\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
Sample variance (grouped data): $s^2 = \frac{\sum_{i} f_i (m_i - \bar{x})^2}{n - 1}$
Advantages
• The variance is an efficient estimator
• Variances can be added and averaged
Disadvantage
• The calculation of the variance can be tedious
without the aid of a calculator or computer
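As the disadvantage above suggests, the calculation is easier with a computer. A minimal sketch of the grouped-data sample variance, s^2 = sum of f_i (m_i - mean)^2 divided by (n - 1), is shown below; the class marks and frequencies are hypothetical.

```python
# Hypothetical grouped data (illustrative only).
class_marks = [42.5, 52.5, 62.5, 72.5, 82.5, 92.5]
frequencies = [4, 8, 13, 10, 3, 2]

n = sum(frequencies)

# Grouped mean: sum of (frequency * class mark) divided by n.
grouped_mean = sum(f * m for f, m in zip(frequencies, class_marks)) / n

# Grouped sample variance: weighted squared deviations divided by (n - 1).
variance = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, class_marks)
) / (n - 1)

print("n =", n)
print("mean =", grouped_mean)
print("variance =", round(variance, 2))
```

Dividing by n - 1 rather than n gives the sample variance; for a full population the divisor would be N.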
Standard deviation: $s = \sqrt{s^2}$
Coefficient of variation: $CV = (s / \bar{x}) \times 100$
Advantages
• The coefficient of variation can be used for comparing
the variation in different populations of data that are
measured in two different units. (because the CV is
unitless)
Disadvantages
• The coefficient of variation fails to be useful when the mean is
close to zero.
• The coefficient of variation is often misunderstood and
misused.
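A minimal sketch of how the coefficient of variation, CV = (s / mean) * 100, allows a comparison across different units is given below. Both data sets are hypothetical, and `coefficient_of_variation` is an illustrative helper name, not a library function.

```python
from statistics import mean, stdev

# Hypothetical measurements in two different units (illustrative only).
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 60, 70, 80, 85]


def coefficient_of_variation(data):
    """Return the sample standard deviation as a percentage of the mean."""
    return stdev(data) / mean(data) * 100


cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV of heights: {cv_heights:.2f}%")
print(f"CV of weights: {cv_weights:.2f}%")
```

Because the CV has no unit, the two percentages can be compared directly even though one data set is in centimetres and the other in kilograms. The helper divides by the mean, so it should not be used when the mean is zero or close to zero.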
The Interquartile range
• It is the difference between the third quartile (the 75th
percentile) and the first quartile (the 25th percentile).
Interquartile range = Q3 - Q1
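A minimal sketch of the interquartile range is given below, using `statistics.quantiles` from the Python standard library with its default "exclusive" method. The data are hypothetical, and other quartile conventions can give slightly different values of Q1 and Q3.

```python
from statistics import quantiles

# Hypothetical ordered data (illustrative only).
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 13]

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1

print("Q1 =", q1)
print("Q3 =", q3)
print("Interquartile range =", iqr)
```

The interquartile range covers the middle half of the observations, so it is not affected by a few extreme values at either end.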