SQT I
Introduction to Statistics
Dr. Vishal Thelkar
Learning Objectives
• Introduction to Statistics
• Data Representation and Frequency Distribution
• Graphs and diagrams:
o Histogram
o Frequency Polygon
o Ogive
o Bar Chart
o Pie Chart
o Pareto Diagram
• Using Microsoft Excel for the analysis of frequency distributions and graphs
Introduction
• In the modern world of computers and information technology, the importance of
statistics is widely recognized across all disciplines.
• Statistics originated as the science of statecraft and has gradually found
applications in
o Agriculture
o Economics
o Commerce
o Biology
o Medicine
o Industry
o Planning, education and so on.
• Today there is hardly any walk of human life where statistics cannot be applied.
Origin of Statistics
• The words ‘Statistics’ and ‘Statistical’ are derived from the Latin
word status, meaning a political state.
• Class Limits: The end values of a class are called its class limits.
• Upper Class Limit: The larger end value of a class is called the
upper class limit (U).
• Lower Class Limit: The smaller end value of a class is called the
lower class limit (L).
Class Interval
• The class interval (C.I.) is the difference between the upper and
lower limits of a class, C.I. = U − L. It is usually kept constant throughout the classes.
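A minimal sketch of the definitions above, assuming class labels are written as "L-U" strings (as in the Marks table later in these notes). The helper `class_limits` is hypothetical, introduced only for illustration; it returns the lower limit, the upper limit and the class interval U − L.

```python
def class_limits(label):
    """Split a class label such as "5-10" into (L, U, C.I.).

    Hypothetical helper: assumes the label has the form "L-U".
    """
    lower_text, upper_text = label.split("-")
    lower, upper = float(lower_text), float(upper_text)
    return lower, upper, upper - lower  # C.I. = U - L

classes = ["5-10", "10-15", "15-20", "20-25", "25-30"]
for label in classes:
    L, U, ci = class_limits(label)
    print(f"{label:>6}: L = {L}, U = {U}, C.I. = {ci}")

# Collect the distinct widths; a single value means the C.I. is constant.
widths = {class_limits(label)[2] for label in classes}
print("constant class interval:", len(widths) == 1)
```

For these five classes every width is 5, so the class interval is constant.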
Methods of Frequency distribution
6.0 5.9 3.5 2.9 8.7 7.9 7.1 5.0 5.2 3.9
3.7 6.1 5.8 4.1 5.8 6.4 3.8 4.9 5.7 5.5
6.9 4.0 4.8 5.1 4.3 5.4 6.8 5.9 6.9 5.4
2.4 4.9 7.2 4.2 6.2 5.8 3.8 6.2 5.7 6.8
3.4 5.0 5.2 5.3 3.0 3.6 3.8 5.8 4.9 3.7
Arrange these data as a frequency distribution (forming about 7 classes).
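The smallest observation is 2.4 and the largest is 8.7, so seven classes of width 1.0
starting at 2.0 cover all the data. The classes are 2.0 – 3.0, 3.0 – 4.0, 4.0 – 5.0,
5.0 – 6.0, 6.0 – 7.0, 7.0 – 8.0 and 8.0 – 9.0.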
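Exclusive Series
• In the exclusive method, the upper limit of one class is the lower limit of the next,
and a value equal to a class's upper limit is counted in the next class (for example,
3.0 falls in 3.0 – 4.0, not in 2.0 – 3.0).

Class        Frequency (f)
2.0 – 3.0    2
3.0 – 4.0    10
4.0 – 5.0    8
5.0 – 6.0    17
6.0 – 7.0    9
7.0 – 8.0    3
8.0 – 9.0    1
Total        50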
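The tallying step can be checked with a short sketch. It counts the 50 observations into the seven classes using the exclusive rule lower ≤ x < upper, with start 2.0 and width 1.0 taken from the example. The function name `exclusive_frequency_table` is hypothetical; in Excel the same counts could be obtained with the FREQUENCY function or a COUNTIFS formula, but that is not shown here.

```python
# The 50 observations from the example above.
data = [
    6.0, 5.9, 3.5, 2.9, 8.7, 7.9, 7.1, 5.0, 5.2, 3.9,
    3.7, 6.1, 5.8, 4.1, 5.8, 6.4, 3.8, 4.9, 5.7, 5.5,
    6.9, 4.0, 4.8, 5.1, 4.3, 5.4, 6.8, 5.9, 6.9, 5.4,
    2.4, 4.9, 7.2, 4.2, 6.2, 5.8, 3.8, 6.2, 5.7, 6.8,
    3.4, 5.0, 5.2, 5.3, 3.0, 3.6, 3.8, 5.8, 4.9, 3.7,
]

def exclusive_frequency_table(values, start, width, n_classes):
    """Count values into classes [lower, upper) by the exclusive method.

    A value equal to a class's upper limit is counted in the next class.
    """
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for x in values if lower <= x < upper)
        table.append(((lower, upper), freq))
    return table

table = exclusive_frequency_table(data, start=2.0, width=1.0, n_classes=7)
for (lower, upper), freq in table:
    print(f"{lower:.1f} - {upper:.1f}: {freq}")
print("Total:", sum(freq for _, freq in table))
```

The printed frequencies are 2, 10, 8, 17, 9, 3 and 1, which add up to 50.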
• Less than c.f.: The less than cumulative frequency for any value (or class) of
the variable is obtained by successively adding the frequencies of all the previous
values (or classes), including the frequency of the value against which the total
is written, provided the values (classes) are arranged in ascending order of
magnitude.
• More than c.f.: The more than cumulative frequency is obtained similarly,
by taking the cumulative total of frequencies starting from the highest value
(class) of the variable down to the lowest value.
• It is also called the de-cumulative frequency, denoted de-c.f.
• Equivalently, it is obtained by taking the total frequency as the de-c.f. of the
first class and then, for each following class, subtracting the frequency of the
previous class from the de-c.f. of the previous class.
Less than c.f.
Marks    No. of Students (f)    Less than c.f.
5-10     5                      5
10-15    10                     5 + 10 = 15
15-20    15                     15 + 15 = 30
20-25    32                     30 + 32 = 62
25-30    21                     62 + 21 = 83
Total    83
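A minimal sketch of the less than c.f. column, using the Marks table above. Since the classes are already in ascending order, the running total from `itertools.accumulate` is exactly the successive addition described in the definition.

```python
from itertools import accumulate

marks = ["5-10", "10-15", "15-20", "20-25", "25-30"]
students = [5, 10, 15, 32, 21]

# Running total of frequencies from the lowest class upward.
less_than_cf = list(accumulate(students))

for label, f, cf in zip(marks, students, less_than_cf):
    print(f"{label:>6} {f:>4} {cf:>5}")
```

The last less than c.f. always equals the total frequency N (here 83).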
More than c.f.
Marks    No. of Students (f)    More than c.f.
5-10     5                      78 + 5 = 83
10-15    10                     68 + 10 = 78
15-20    15                     53 + 15 = 68
20-25    32                     21 + 32 = 53
25-30    21                     21
Total    83
More than c.f. by de-c.f.
Marks    No. of Students (f)    More than c.f. (de-c.f.)
5-10     5                      83 (total frequency)
10-15    10                     83 - 5 = 78
15-20    15                     78 - 10 = 68
20-25    32                     68 - 15 = 53
25-30    21                     53 - 32 = 21
Total    83
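Both routes to the more than c.f. can be sketched side by side for the Marks table: cumulating from the highest class downward, and the de-c.f. method of starting from the total and subtracting the previous class frequency. The variable names are illustrative only.

```python
from itertools import accumulate

marks = ["5-10", "10-15", "15-20", "20-25", "25-30"]
students = [5, 10, 15, 32, 21]

# Method 1: accumulate from the highest class down, then restore class order.
more_than_cf = list(accumulate(reversed(students)))[::-1]

# Method 2 (de-c.f.): the first class gets the total frequency; each later
# class gets the previous de-c.f. minus the previous class frequency.
de_cf = [sum(students)]
for f in students[:-1]:
    de_cf.append(de_cf[-1] - f)

for label, a, b in zip(marks, more_than_cf, de_cf):
    print(f"{label:>6} {a:>5} {b:>5}")
print("methods agree:", more_than_cf == de_cf)
```

Both columns come out as 83, 78, 68, 53 and 21.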
Percentage frequencies:
$$\text{Percentage frequency} = \frac{\text{Class frequency}}{\text{Total frequency}} \times 100 = \frac{f}{N} \times 100$$
where f is the class frequency and N is the total frequency.
Percentage frequency
Class      Tally    Freq. (f)    Less than c.f.    % c.f. = (c.f. / N) × 100
50-200
200-350
350-500
500-650
650-800
800-950
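The frequencies for the table above are not given here, so this sketch reuses the Marks data from the cumulative frequency example as a stand-in. It computes the percentage frequency (f / N) × 100 and the percentage cumulative frequency (c.f. / N) × 100 for each class.

```python
from itertools import accumulate

marks = ["5-10", "10-15", "15-20", "20-25", "25-30"]
students = [5, 10, 15, 32, 21]  # stand-in data from the Marks example

N = sum(students)
pct_freq = [f / N * 100 for f in students]          # (f / N) x 100
less_than_cf = list(accumulate(students))
pct_cf = [cf / N * 100 for cf in less_than_cf]      # (c.f. / N) x 100

for label, f, pf, cf, pcf in zip(marks, students, pct_freq, less_than_cf, pct_cf):
    print(f"{label:>6} {f:>4} {pf:7.2f} {cf:>5} {pcf:7.2f}")
```

The percentage frequencies add up to 100, and the last % c.f. is always 100.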
Diagrams
• One-dimensional diagrams: bar diagrams
o Sub-divided bar diagram
o Percentage bar diagram
o Multiple bar diagram
• Two-dimensional diagrams: pie chart
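A pie chart is commonly drawn by giving each component a sector whose angle is (f / N) × 360°. The sketch below computes those angles, using the Marks data from earlier in these notes as assumed example data.

```python
marks = ["5-10", "10-15", "15-20", "20-25", "25-30"]
students = [5, 10, 15, 32, 21]  # example data borrowed from the Marks table

def sector_angles(freqs):
    """Return the pie-chart sector angle (in degrees) for each frequency."""
    total = sum(freqs)
    return [f / total * 360 for f in freqs]

angles = sector_angles(students)
for label, angle in zip(marks, angles):
    print(f"{label:>6}: {angle:6.2f} degrees")
```

The angles always sum to 360°, so the sectors fill the whole circle.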
Graphs
• A graph helps to study the mathematical relationship between two variables.
• Graphs are clearer, more precise and more accurate than diagrams, and help
statisticians in further study.
• A graph is easier to construct than a diagram.
The different types of graphs are:
• Frequency polygon
• Frequency curve
• Histogram
• Ogive curves:
a) ‘Less than’ type
b) ‘More than’ type
Frequency polygon
Frequency curve
Histogram
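A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data. Adjacent rectangles, with no gaps between them, are drawn over the class intervals; when all classes have equal width, the heights are proportional to the class frequencies.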
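A rough text-only sketch of a histogram for the Marks data, with the bars drawn horizontally so it can be printed without a plotting library. The helper `text_histogram` is hypothetical, and letting bar length equal frequency is only valid because the classes have equal width.

```python
classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
students = [5, 10, 15, 32, 21]

def text_histogram(classes, freqs, symbol="#"):
    """Return one text row per class; bar length equals the frequency.

    Valid as a histogram only for equal class widths; with unequal
    widths the bar height should be frequency / width instead.
    """
    rows = []
    for (lower, upper), f in zip(classes, freqs):
        rows.append(f"{lower:>3}-{upper:<3}| {symbol * f} ({f})")
    return rows

rows = text_histogram(classes, students)
for row in rows:
    print(row)
```

The 20-25 class gives the longest bar, which shows where the distribution peaks.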
Ogive Curves / Cumulative Frequency Curves
• Less than ogive
• More than ogive
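A sketch of the points usually plotted for the two ogives, using the Marks data. It assumes the common convention that the less than c.f. is plotted against each class's upper limit and the more than c.f. against each class's lower limit; joining the points gives the two curves.

```python
from itertools import accumulate

classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
students = [5, 10, 15, 32, 21]

less_than_cf = list(accumulate(students))
more_than_cf = list(accumulate(reversed(students)))[::-1]

# Less than ogive: (upper class limit, less than c.f.)
less_than_points = [(upper, cf) for (_, upper), cf in zip(classes, less_than_cf)]
# More than ogive: (lower class limit, more than c.f.)
more_than_points = [(lower, cf) for (lower, _), cf in zip(classes, more_than_cf)]

print("Less than ogive:", less_than_points)
print("More than ogive:", more_than_points)
```

The less than ogive rises to N = 83, while the more than ogive falls from 83.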
Pareto chart/diagram
• A Pareto chart is a type of chart that contains both bars and a line
graph, where individual values are represented in descending order
by bars, and the cumulative total is represented by the line.
• The purpose of the Pareto chart is to highlight the most important
among a (typically large) set of factors. In quality control, it often
represents the most common sources of defects, the highest
occurring type of defect, or the most frequent reasons for customer
complaints, and so on.
• It is basically a bar chart showing how much each cause contributes to
an outcome or effect.
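A minimal sketch of the data behind a Pareto chart: the causes are sorted in descending order of count (the bars) and the cumulative percentage is computed for the line. The defect categories and counts are hypothetical, chosen only for illustration, and `pareto_table` is a hypothetical helper name.

```python
# Hypothetical defect counts, for illustration only.
defects = {"Dents": 30, "Other": 2, "Scratches": 50, "Cracks": 5,
           "Misalignment": 10, "Discoloration": 3}

def pareto_table(counts):
    """Return (cause, count, cumulative %) rows in descending order of count."""
    total = sum(counts.values())
    rows, cumulative = [], 0
    for cause, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += count
        rows.append((cause, count, 100 * cumulative / total))
    return rows

rows = pareto_table(defects)
for cause, count, cum_pct in rows:
    print(f"{cause:<14} {count:>3} {cum_pct:6.1f}%")
```

The counts give the bar heights in descending order, and the cumulative percentages give the points of the line.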
Example of Pareto Diagram
Interpretation
• In applying the 80/20 rule, draw a line from the 80% mark on the
cumulative percentage scale, parallel to the x-axis, stopping where it
meets the cumulative percentage curve. The causes to the left of this
point (the “vital few”) account for about 80% of the problems, while
the causes to the right are less important. This helps you focus
improvement efforts on the causes that have the most impact on the
problems.
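The 80/20 reading can be sketched in code under one stated assumption: the "vital few" are taken as the causes, in descending order, up to and including the first one whose cumulative percentage reaches the 80% threshold. The defect data and the helper `vital_few` are hypothetical. The comparison uses integers so that the threshold test is exact.

```python
# Hypothetical defect counts, for illustration only.
defects = {"Dents": 30, "Other": 2, "Scratches": 50, "Cracks": 5,
           "Misalignment": 10, "Discoloration": 3}

def vital_few(counts, threshold_pct=80):
    """Causes up to and including the first whose cumulative % reaches the threshold.

    Assumed reading of the 80/20 rule; compares cumulative * 100 with
    threshold_pct * total in integers to avoid floating-point error.
    """
    total = sum(counts.values())
    selected, cumulative = [], 0
    for cause, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(cause)
        cumulative += count
        if cumulative * 100 >= threshold_pct * total:
            break
    return selected

print(vital_few(defects))
```

With these counts, Scratches and Dents together account for exactly 80% of the defects, so they form the vital few.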
Advantages
• To analyze the frequency of problems or defects in a process
• To analyze broad causes by examining their individual components
• To help focus efforts on the most significant problems or causes
when there are many