
Chapter1-Introduction To Statistics

This document provides an introduction to statistics including definitions, importance, limitations, branches and application areas. It discusses descriptive and inferential statistics, and provides examples. It also outlines the steps in a statistical investigation including data collection, organization, presentation, analysis and interpretation.

Uploaded by

Tanya Hinduja

Chapter One

Introduction to Statistics

Chapter 1- Introduction to statistics
Introduction:

• The word 'Statistics' is derived from the Latin word 'status', the Italian word 'statista', or the German word 'Statistik', each of which means a 'political state'.
• Kautilya's Arthashastra describes the registration of births and deaths.
• The Ain-i-Akbari contains good records of land and agricultural statistics.
Introduction:
• The theoretical development of so-called modern statistics came during the mid-seventeenth century with the introduction of the 'Theory of Probability' and the 'Theory of Games and Chance'.
• Sir Ronald A. Fisher, known as the 'Father of Statistics', placed Statistics on a very sound footing by applying it to various diversified fields, such as genetics, biometry, education, and agriculture.
Definitions:
It is a science which helps us to collect, analyze and
present data systematically.
It is the process of collecting, processing, summarizing, presenting, analysing and interpreting data in order to study and describe a given problem.
Statistics is the art of learning from data.

Statistics may be regarded as (i) the study of populations, (ii) the study of variation, and (iii) the study of methods of the reduction of data.
Definitions:
• The science of Statistics is the method of judging collective, natural or social phenomena from the results obtained from the analysis or enumeration or collection of estimates.
• Statistics is the science which deals with the collection, classification and tabulation of numerical facts as the basis for explanation, description and comparison of phenomena.
Importance of Statistics:
• It simplifies a mass of data (condensation);
• It helps to get concrete information about any problem;
• It helps in making reliable and objective decisions;
• It presents facts in a precise and definite form;
• It facilitates comparison (measures of central tendency and measures of dispersion);
• It facilitates prediction (time series and regression analysis are the most commonly used methods);
• It helps in the formulation of suitable policies.
Limitations of Statistics:
1. Statistics does not deal with individual items;
2. Statistics deals only with quantitatively expressed items; it does not study qualitative phenomena;
3. Statistical results are not universally true;
• Statistical laws are only approximations, not exact laws; they hold in terms of probability and chance.
• E.g., "It has been found that 20% of a certain type of surgical operation performed by a particular doctor are successful." This says nothing certain about any single operation.
4. Statistics is liable to be misused.
• Statistical results can be moulded and manipulated in any manner to support one's way of argument and reasoning.


Application areas of statistics
• Engineering:
Improving product design, testing product performance, determining reliability and maintainability, working out safer systems of flight control for airports, etc.
• Business:
Estimating the volume of retail sales, designing optimum inventory control systems, producing auditing and accounting procedures, improving working conditions in industrial plants, assessing the market for new products, etc.


• Quality Control:
Determining techniques for the evaluation of quality through adequate sampling, in-process control, consumer surveys, experimental design in product development, etc.
Realizing its importance, large organizations maintain their own Statistical Quality Control departments.
• Economics:
Measuring indicators such as volume of trade, size of labor force, and standard of living; analyzing consumer behavior; computing national income accounts; formulating economic laws, etc.
In particular, regression analysis is used extensively in the field of Economics.
• Health and Medicine:
Developing and testing new drugs, delivering improved medical care, preventing, diagnosing, and treating disease, etc.
Specifically, inferential Statistics has tremendous application in the fields of health and medicine.
• Biology:
Exploring the interactions of species with their environment, creating theoretical models of the nervous system, studying genetic evolution, etc.
• Psychology:
Measuring learning ability, intelligence, and personality characteristics, creating psychological scales, studying abnormal behavior, etc.
• Sociology:
Testing theories about social systems, designing and conducting sample surveys to study social attitudes, exploring cross-cultural differences, studying the growth of human population, etc.
There are two main branches of statistics:
1. Descriptive statistics
2. Inferential statistics
1. Descriptive statistics:
• It is the first phase of Statistics;
• It involves any kind of data processing designed to collect, organize, present, and analyze the important features of the data without attempting to infer anything that goes beyond the known data.
• In short, descriptive Statistics describes the nature or characteristics of the observed data (usually a sample) without drawing conclusions or making generalizations.
The following are some examples of descriptive Statistics:
• The daily average temperature range of AA was 25 °C last week.
• The maximum amount of coffee exported by Ethiopia (as observed over the last 20 years) was in the year 2004.
• The average age of the athletes who participated in the London Marathon was 25 years.
• 75% of the instructors at AAU are male.
• The scores of 50 students in a Mathematics exam are found to range from 20 to 90.
2. Inferential statistics (Inductive Statistics):
• It is the second phase of Statistics, which deals with techniques for making generalizations that lie outside the scope of Descriptive Statistics;
• It is concerned with the process of drawing conclusions (inferences) about specific characteristics of a population based on information obtained from samples;
• It is a process of performing hypothesis tests, determining relationships among variables, and making predictions.
• Inferential statistics as a whole aims to give reasonable estimates of unknown population parameters.


The following are some examples of inferential Statistics:
• The result obtained from the analysis of the income of 1000 randomly selected citizens in Ethiopia suggests that the average monthly income of a citizen is estimated to be 600 Birr. Here we are trying to represent the income of the entire population of Ethiopia by a sample of 1000 citizens; hence we are making an inference or generalization.
• Based on trend analysis of past observations, the average exchange rate for a dollar is expected to be 18 Birr in the coming month.
• The national statistical bureau of Ethiopia declares the outcome of its survey as "The population of Ethiopia in the year 2020 will likely be 120,000,000."
• From a survey of 15 randomly selected towns of Ethiopia, it is estimated that 0.1% of all urban dwellers are victims of the AIDS virus.


Exercise: Descriptive Statistics or Inferential Statistics
1. The manager of quality control declares the outcome of a survey as "the average life span of the imported light bulbs is 3000 hrs."
2. Of all the patients taking the drug at a local health center, 80% suffer from side effects.
3. The average score of all the students taking the exam is found to be 72.
4. The national statistical bureau of Ethiopia declares the outcome of its survey for the last 30 years as "the average annual growth of the population of Ethiopia is 2.8%."
5. The national statistical bureau of Ethiopia declares the outcome of its survey for the last 30 years as "the population of Ethiopia in the year 2015 will likely be 100,000,000."
6. Based on the survey made for the last 10 years, 30,000 tourists are expected to visit Ethiopia.
7. Based on the survey made for the last 20 years, the maximum number of tourists who visited Ethiopia was in the year 1993.
8. The Ethiopian tourism commission has announced that (as observed for the last 20 years) the average number of tourists arriving in Ethiopia per year is 3000.
9. The maximum difference between the salaries of the workers of the company until the end of last year was Birr 5000.


STEPS/STAGES IN STATISTICAL INVESTIGATION

1. Collection of Data:
Data collection is the process of gathering information or data about the variable of interest. Data are the inputs for a statistical investigation. Data may be obtained from either a primary source or a secondary source.
2. Organization of Data
Organization of data includes three major steps.
1. Editing: checking the data and omitting inconsistencies and irrelevancies.
2. Classification: grouping the collected and edited data.
3. Tabulation: putting the classified data in the form of a table.
3. Presentation of Data
The purpose of presentation in the statistical analysis is to
display what is contained in the data in the form of Charts,
Pictures, Diagrams and Graphs for an easy and better
understanding of the data.

4. Analysis of Data
• In a statistical investigation, analyzing data includes finding the various statistical constants from the collected mass of data, such as measures of central tendency (averages), measures of dispersion, and so on.
• It merely involves mathematical operations: measures of central tendency (averages), measures of variation, regression analysis, etc. In its extreme case, analysis requires knowledge of advanced mathematics.
5. Interpretation of Data
• It involves interpreting the statistical constants computed in the analysis stage in order to form valid conclusions and inferences.
• It is the most difficult stage and requires the most skill.
• It is at this stage that Statistics is most liable to be misused.
• Correct interpretation of results will lead to a valid conclusion of the study and hence can aid in taking correct decisions.
• Improper (incorrect) interpretation may lead to wrong conclusions and defeat the whole objective of the study.
THE ENGINEERING METHOD AND STATISTICAL THINKING

• An engineer is someone who solves problems of interest to society by the efficient application of scientific principles.
• Engineers accomplish this either by refining an existing product or process or by designing a new product or process that meets customers’ expectations and needs.


Cont’d
• The engineering method features a strong interplay between the problem, the factors that may influence its solution, a model of the phenomenon, and experimentation to verify the adequacy of the model and the proposed solution to the problem.
• Specifically, statistical techniques can be a powerful aid in designing new products and systems, improving existing designs, and designing, developing, and improving production processes.
• Therefore, engineers must know how to efficiently plan experiments, collect data, analyze and interpret the data, and understand how the observed data are related to the model that has been proposed for the problem under study.


Method of data presentation
The purpose of organizing data is to see quickly
some of the characteristics of the data that
have been collected.
Raw data is collected numerical data which has
not been arranged in order of magnitude.
An array is numerical data arranged in order of magnitude.
Method of data presentation

• Mechanisms for reducing and summarizing data are:

1. Classification
2. Tabular method
3. Graphical/Diagrammatic method


Classification
The placement of data in different homogeneous
groups formed on the basis of some characteristics or
criteria.
E.g.
• People may be divided according to age groups such as 0-10, 10-20, 20-30, 30-40, etc.
• People may be grouped by salary, such as <10,000, 10,000-20,000, >20,000.
• Classified data can further be presented in well-defined tables.
Classification
Classification is the process of arranging things in groups or classes according to their resemblances and affinities, and gives expression to the unity of attributes that may subsist among a diversity of individuals.
Classification Guidelines
1. Classes should be complete and non-overlapping.
2. Classes should be clearly defined.
3. Use standardized classes, and the units of the classes should be the same.
Types of Classification
(i) Geographical classification
(ii) Chronological classification
(iii) Qualitative classification
(iv) Quantitative classification
Geographical Classification
• Data classified on the basis of area or place
• Areal or spatial classification.
• The areas may be in terms of countries, states, districts, or
zones
• For ready reference, the classes should be arranged in alphabetical order; for ranking, they should be arranged by size of frequency.
• This type of classification is suitable for data that are distributed geographically, e.g. population, mineral resources, production, sales, students of universities, etc.
Chronological Classification
• Data classified on the basis of time of their occurrence.
• Time series
• This type of classification is suitable for those data which occur over the course of time, viz. population, production, sales, results, etc.
• Classes are arranged in order of the time which may
begin either with the earliest, or the latest period.
• Weekly, monthly, annually
Qualitative Classification
• Data classified on the basis of certain descriptive
character or qualitative aspect of a phenomenon viz.
gender, beauty, literacy, honesty, intelligence, religion,
eye-sight etc.
• descriptive classification
• Dichotomous or multiclass
Quantitative Classification
• Data classified on the basis of certain variable
that can be measured viz. mark, income,
expenditure, profit, loss, height, weight, age,
price, production etc.
• Classification by variables
Tabular Representation
Arrangement of data in rows and columns.
A table has the following parts:
• Title
• Body of the table
• Footnote
• Source note
1. Tabular presentation of data:

• The collected raw data should be put into an ordered array, in either ascending or descending order, so that it can be organized into a Frequency Distribution (FD).
• Numerical data arranged in order of magnitude along with the corresponding frequencies is called a frequency distribution (FD).
• An FD is of two kinds: ungrouped and grouped frequency distribution.
A. Ungrouped (Discrete) Frequency Distribution
• It is a tabular arrangement of numerical data in order of magnitude showing the distinct values with their corresponding frequencies.


Example:
Suppose the following are the test scores of 16 students in a class. Write the ungrouped frequency distribution.
"14, 17, 10, 19, 14, 10, 14, 8, 10, 17, 19, 8, 10, 14, 17, 14"
Sol:
Array: 8, 8, 10, 10, 10, 10, 14, 14, 14, 14, 14, 17, 17, 17, 19, 19.
The ungrouped frequency distribution is then:
Test score   8   10   14   17   19
Frequency    2    4    5    3    2

• The difference between the highest and the lowest value in a given set of observations is called the range (R).
R = L - S, where L is the largest and S the smallest value; here R = 19 - 8 = 11.
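The example above can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library; the variable names are illustrative and not part of the original slides.

```python
from collections import Counter

# Raw data: test scores of the 16 students from the example.
scores = [14, 17, 10, 19, 14, 10, 14, 8, 10, 17, 19, 8, 10, 14, 17, 14]

# Array: the raw data arranged in order of magnitude.
array = sorted(scores)

# Ungrouped frequency distribution: each distinct value with its frequency.
frequency = dict(sorted(Counter(scores).items()))

# Range: largest value minus smallest value (R = L - S).
value_range = max(scores) - min(scores)

print(array)
print(frequency)    # {8: 2, 10: 4, 14: 5, 17: 3, 19: 2}
print(value_range)  # 11
```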


B. Grouped (Continuous) Frequency Distribution (GFD)

• It is a tabular arrangement of data in order of magnitude by classes, together with the corresponding class frequencies.
• To estimate the number of classes, the following formula is used:
Number of classes = 1 + 3.322(log N), where N is the number of observations and log is the base-10 logarithm.
• The class size (class width) is then:
Class size = Range / (1 + 3.322(log N)), rounded up.
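As a rough illustration of the two formulas, the sketch below computes the estimated number of classes and the class width. The function names are hypothetical, the values N = 40 and Range = 57 are taken from the aptitude-score exercise, and rounding up to a whole number assumes a unit of measure of 1.

```python
import math

def number_of_classes(n):
    """Estimate the number of classes as 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

def class_width(data_range, n):
    """Range divided by the estimated number of classes, rounded up
    to a whole number (assumes the unit of measure is 1)."""
    return math.ceil(data_range / number_of_classes(n))

k = number_of_classes(40)
width = class_width(57, 40)
print(round(k, 1))  # 6.3
print(width)        # 10
```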


Example:
A grouped (continuous) frequency distribution, where several values are grouped into one class:

Student age   Frequency
18-25         5
26-32         15
33-39         10
Components of grouped frequency distribution
1. Lower class limit:
is the smallest number that can actually belong to
the respective classes.
2. Upper class limit:
is the largest number that can actually belong to
the respective classes.
3. Class boundaries:
are numbers used to separate adjoining classes
which should not coincide with the actual
observations.
4. Class mark:
is the midpoint of the class.



5. Class width / Class interval
is the difference between two consecutive lower class limits or two consecutive upper class limits. (OR)
It can be obtained by taking the difference of two adjoining class marks or two adjoining lower class boundaries.
Class width = Range / Number of classes desired,
where Number of classes = 1 + 3.322(log N) and N is the number of observations.

6. Unit of measure
is the smallest possible positive difference
between any two measurements in the given data
set that shows the degree of precision.
• Class boundaries:
can be obtained by taking the average of the upper class limit of one class and the lower class limit of the next class.
• Lower class boundaries:
can be obtained by subtracting half a unit of measure from the lower class limits.
• Upper class boundaries:
can be obtained by adding half a unit of measure to the upper class limits.


Example 1:

Suppose the table below is the frequency distribution of the test scores of 50 students. The frequency table has 6 classes (class intervals).

Test score   Frequency
11-15        7
16-20        8
21-25        10
26-30        12
31-35        9
36-40        4

What are the unit of measure, lower class limits (LCLs), upper class limits (UCLs), lower class boundaries (LCBs), upper class boundaries (UCBs), class width (CW), and class marks (CMs)?

• The unit of measure is 1.
• The lower class limits are: 11, 16, 21, 26, 31, 36.
• The upper class limits are: 15, 20, 25, 30, 35, 40.
• The class marks are: 13 ((11+15)/2), 18, 23, 28, 33, 38.
• The lower class boundaries are: 10.5 (11-0.5), 15.5, ..., 35.5.
• The upper class boundaries are: 15.5 (15+0.5), ..., 35.5, 40.5.
• The class width (size) is 5.
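The components of this example can be derived mechanically from the class limits. The following is a minimal sketch under the stated unit of measure of 1; the variable names are illustrative only.

```python
# Class limits (lower, upper) from the test-score table.
limits = [(11, 15), (16, 20), (21, 25), (26, 30), (31, 35), (36, 40)]
unit = 1  # unit of measure

# Boundaries: half a unit below each lower limit / above each upper limit.
lower_boundaries = [lo - unit / 2 for lo, _ in limits]
upper_boundaries = [hi + unit / 2 for _, hi in limits]

# Class mark: midpoint of each class.
class_marks = [(lo + hi) / 2 for lo, hi in limits]

# Class width: difference between two consecutive lower class limits.
class_width = limits[1][0] - limits[0][0]

print(lower_boundaries)  # [10.5, 15.5, 20.5, 25.5, 30.5, 35.5]
print(upper_boundaries)  # [15.5, 20.5, 25.5, 30.5, 35.5, 40.5]
print(class_marks)       # [13.0, 18.0, 23.0, 28.0, 33.0, 38.0]
print(class_width)       # 5
```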


Rules to construct a Grouped Frequency Distribution (GFD):

i. Find the unit of measure of the given data;
ii. Find the range;
iii. Determine the number of classes required;
iv. Find the class width (size);
v. Determine the lowest class limit and then find the successive lower and upper class limits, forming non-overlapping intervals such that each observation falls into exactly one of the class intervals;
vi. Find the number of observations falling into each class interval; this is taken as the frequency of the class and is best done using a tally.


Exercise:

Construct a GFD of the following aptitude test scores of 40 applicants for accountancy positions in a company with
a. 6 classes b. 8 classes

96 89 58 61 46 59 75 54
41 56 77 49 58 60 63 82
66 64 69 67 62 55 67 70
78 65 52 76 69 86 44 76
57 68 64 52 53 74 68 39


Exercise: solution with 6 classes

• No. of observations = 40
• Range = 96 - 39 = 57
• No. of classes = 1 + 3.322 × log(40) = 6.3, i.e. about 6
• Class width = 57/6 = 9.5, rounded up to 10

Classes   Frequency
38-47     4
48-57     8
58-67     13
68-77     10
78-87     3
88-97     2
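The tally for the aptitude scores can be checked with a short script. This is a sketch only: it assumes a class width of 10 (57/6 = 9.5, rounded up) and a lowest class limit of 38, chosen here so that the classes match the 37.5-47.5, ..., 87.5-97.5 boundaries used in the relative frequency table; other starting limits are possible.

```python
# Aptitude test scores of the 40 applicants.
scores = [
    96, 89, 58, 61, 46, 59, 75, 54,
    41, 56, 77, 49, 58, 60, 63, 82,
    66, 64, 69, 67, 62, 55, 67, 70,
    78, 65, 52, 76, 69, 86, 44, 76,
    57, 68, 64, 52, 53, 74, 68, 39,
]

width = 10        # class width: 57 / 6 = 9.5, rounded up
first_lower = 38  # assumed lowest class limit (just below the minimum, 39)
num_classes = 6

# Class limits 38-47, 48-57, ..., 88-97 (unit of measure 1).
classes = [(first_lower + i * width, first_lower + (i + 1) * width - 1)
           for i in range(num_classes)]

# Tally: each score falls into exactly one class.
frequencies = [sum(lo <= s <= hi for s in scores) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}: {f}")
print(sum(frequencies))  # 40
```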
Exercise

• Range=4.2-2.0=2.2
• Class interval=2.2/(1+3.32log(30))=0.4
• No of classes=1+3.32(log(30))=5.9
Types of Grouped Frequency Distribution

1. Relative frequency distribution (RFD)

2. Cumulative Frequency Distribution (CFD)

3. Relative Cumulative Frequency Distribution (RCFD)


Types of Grouped Frequency Distribution
1. Relative frequency distribution (RFD):
• A table presenting the ratio of the frequency of each class to the total frequency of all the classes.
• Relative frequency is generally expressed as a percentage, showing the percent of the total number of observations in each class.


For example

Test score   Frequency (F)   Relative frequency (RF)   Percentage frequency (PF)
37.5-47.5    4               4/40 = 0.1                10%
47.5-57.5    8               8/40 = 0.2                20%
57.5-67.5    13              13/40 = 0.325             32.5%
67.5-77.5    10              10/40 = 0.25              25%
77.5-87.5    3               3/40 = 0.075              7.5%
87.5-97.5    2               2/40 = 0.05               5%
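The relative and percentage frequencies in the table follow directly from the class frequencies. A minimal sketch, with illustrative variable names:

```python
# Class frequencies from the table (classes 37.5-47.5 through 87.5-97.5).
frequencies = [4, 8, 13, 10, 3, 2]
total = sum(frequencies)  # total frequency of all the classes

# Relative frequency: class frequency divided by total frequency.
relative = [f / total for f in frequencies]

# Percentage frequency: relative frequency expressed as a percent.
percentage = [100 * f / total for f in frequencies]

print(total)       # 40
print(relative)    # [0.1, 0.2, 0.325, 0.25, 0.075, 0.05]
print(percentage)  # [10.0, 20.0, 32.5, 25.0, 7.5, 5.0]
```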
2. Cumulative Frequency Distribution (CFD):

• It is applicable when we want to know how many observations lie below or above a certain value/class boundary.
• CFD is of two types, LCFD and MCFD:
• Less than Cumulative Frequency Distribution (LCFD): shows the number of cases lying below the upper class boundary of each class.
• More than Cumulative Frequency Distribution (MCFD): shows the number of cases lying above the lower class boundary of each class.
3. Relative Cumulative Frequency Distribution (RCFD)
It is used to determine the ratio or percentage of observations that lie below or above a certain value/class boundary, relative to the total frequency of all the classes. It is of two types: LRCFD and MRCFD.
• Less than Relative Cumulative Frequency Distribution (LRCFD): a table presenting the ratio of the cumulative frequency less than the upper class boundary of each class to the total frequency of all the classes.
• More than Relative Cumulative Frequency Distribution (MRCFD): a table presenting the ratio of the cumulative frequency more than the lower class boundary of each class to the total frequency of all the classes.


LRCFD
(LCF: less than cumulative frequency; LRCF: less than relative cumulative frequency; LPCF: less than percentage cumulative frequency)

Test score       LCF   LRCF            LPCF
Less than 37.5   0     0/40 = 0        0%
Less than 47.5   4     4/40 = 0.1      10%
Less than 57.5   12    12/40 = 0.3     30%
Less than 67.5   25    25/40 = 0.625   62.5%
Less than 77.5   35    35/40 = 0.875   87.5%
Less than 87.5   38    38/40 = 0.95    95%
Less than 97.5   40    40/40 = 1       100%


MRCFD
(MCF: more than cumulative frequency; MRCF: more than relative cumulative frequency; MPCF: more than percentage cumulative frequency)

Test score       MCF   MRCF            MPCF
More than 37.5   40    40/40 = 1       100%
More than 47.5   36    36/40 = 0.9     90%
More than 57.5   28    28/40 = 0.7     70%
More than 67.5   15    15/40 = 0.375   37.5%
More than 77.5   5     5/40 = 0.125    12.5%
More than 87.5   2     2/40 = 0.05     5%
More than 97.5   0     0/40 = 0        0%
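Both cumulative tables can be generated from the class frequencies with running totals. The sketch below is one possible way to do it, using the class boundaries and frequencies of the test-score example; the names are illustrative.

```python
from itertools import accumulate

boundaries = [37.5, 47.5, 57.5, 67.5, 77.5, 87.5, 97.5]
frequencies = [4, 8, 13, 10, 3, 2]
total = sum(frequencies)

# Less than cumulative frequency at each boundary: 0, then running totals.
less_than = [0] + list(accumulate(frequencies))

# More than cumulative frequency at each boundary: what remains above it.
more_than = [total - c for c in less_than]

for b, lcf, mcf in zip(boundaries, less_than, more_than):
    print(f"{b}: less than {lcf} ({lcf / total:.1%}), "
          f"more than {mcf} ({mcf / total:.1%})")
```

Dividing each cumulative frequency by the total gives the relative (LRCF, MRCF) columns.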


• 6,7
• 10,12,1215,19
• 21,23,25
Graphic Methods of Data presentation

1. Histogram
2. Frequency Polygon (Line graph)

3. Cumulative frequency curve (ogive)
Line and Bar Graph
Suitable for Discrete variables
Bar Graph
Component Bar Diagram
• Shows the breakup of each part
• Helpful for comparison of parts and aggregates
The budgets of two families can be compared by _____________.
a)All of these options
b)Bar Chart
c)Cluster Bar chart
d)Sub-divided rectangles
Histogram:

• A graphical presentation of a grouped frequency distribution consisting of a series of adjacent rectangles. The bases of the rectangles are the class intervals, specified in terms of class boundaries and shown on the x-axis (so each base equals the class width); the heights are proportional to the corresponding class frequencies, shown on the y-axis.
• Suitable for continuous classes.


Histogram: E.g.
[Figure: example histogram; x-axis: classes 20-30, 30-40, 40-50, 50-60, 60-70, 70-80; y-axis: frequency (0 to 20).]


Steps to draw a Histogram

i. Mark the class boundaries on the horizontal axis (x-axis) and the class frequencies along the vertical axis (y-axis) according to a suitable scale.
ii. With each interval as a base, draw a rectangle whose height equals the frequency of the corresponding class interval. The resulting histogram describes the shape of the data.


Histogram
MCQ
What does a histogram show?
a)A histogram is a graph in which values of observations are plotted on the
horizontal axis, and their density is plotted on the vertical axis.
b)A histogram is a graph in which levels of the independent variable are
plotted on the horizontal axis, and the mean of observations is plotted on the
vertical axis.
c)A histogram is a graph in which values of observations are plotted on the
horizontal axis, and the frequency with which each value occurs in the data
set is plotted on the vertical axis.
d)A histogram is a graph in which values of one variable are plotted against
values of a different variable.
MCQ
Consider the following statements in respect of histogram:
1. Histogram is an equivalent graphical representation of the frequency
distribution.
2. Histogram is suitable for continuous random variables, where the total
frequency of an interval is evenly distributed over the interval.
Which of the statements given above is/are correct?

a)1 only
b)2 only
c)Both 1 and 2
d)Neither 1 nor 2
MCQ

What is the shape of this histogram?

a) Symmetrical

b) Skewed left

c) Skewed right

d) Rotational
MCQ
In constructing a histogram, if the class interval of one class is double that of the others, then the width of that bar should be
a)Doubled
b)Half
c)One
d)Quarter
2. Frequency Polygon:
It is a line graph of a grouped frequency distribution in which the class frequencies are plotted against the class marks. The points are connected by a series of line segments, including classes with zero frequency at both ends of the distribution, to form a polygon.
Frequency Polygon: E.g.
[Figure: example frequency polygon; x-axis: classes 20-30, 30-40, 40-50, 50-60, 60-70, 70-80; y-axis: frequency (0 to 20).]
Steps to draw a Frequency Polygon

i. Mark the class midpoints on the x-axis and the frequencies on the y-axis.
ii. Mark dots at the frequencies corresponding to the marked class midpoints.
iii. Join successive dots by a series of line segments to form a line graph, including classes with zero frequency at both ends of the distribution, to form a polygon.


3. Ogive (Cumulative Frequency Curve / Percentage Cumulative Frequency Curve)
• It is a line graph presenting the cumulative frequency distribution.
• Ogives are of two types: the less than ogive and the more than ogive.
• The less than ogive shows the cumulative frequency less than the upper class boundary of each class; and
• The more than ogive shows the cumulative frequency more than the lower class boundary of each class.
Ogive: E.g.
[Figure: example ogive; x-axis: class boundaries 20 to 80; y-axis: cumulative frequency (0 to 45).]
Steps to draw ogives
i. Mark the class boundaries on the x-axis and mark non-overlapping intervals of equal length on the y-axis to represent the cumulative frequencies.
ii. For each class boundary marked on the x-axis, plot a point with height equal to the corresponding cumulative frequency.
iii. Connect the marked points by a series of line segments. The less than ogive is drawn by plotting the less than cumulative frequencies against the upper class boundaries; the more than ogive is drawn by plotting the more than cumulative frequencies against the lower class boundaries.


MCQ
Cumulative Frequency Curve is also called
a)Ogive
b)Frequency Curve
c)Histogram
d)Frequency Polygon
Pie Chart
• Circle divided into component sectors
according to the break-up of components given
in percentage.
• Different colors are given to different sectors.
Diagrammatic Presentation Of Data

• Bar charts
• Pie chart
• Pictograph and
• Pareto diagram



Outline

1.3 Measures of Central Tendency
1.4 Measures of Dispersion
1.5 Measures of Skewness and Kurtosis


1.3 Measures of Central Tendency
• Central value refers to the location of the centre of the distribution of data.
• Measures of central value:
The mean
The mode
The median
Percentiles/Quantiles and
Midrange


The Mean:
• It is the most commonly used measure of central value.
• Types of Mean:
1. Arithmetic Mean
2. Geometric Mean
3. Harmonic Mean
4. Quadratic Mean
5. Trimmed Mean
6. Weighted Mean
7. Combination mean
1. Arithmetic Mean (simply Mean)
• The mean is defined as the arithmetic average of all the values.
• It is represented by $\bar{x}$ (read as x-bar) for a sample and by $\mu$ for a population:

$$\bar{x} = \frac{\sum_{i=1}^{fc} m_i}{fc}, \qquad \mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

where $fc$ and $N$ are the numbers of values in the sample and in the population, respectively.
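As a small illustration, the arithmetic mean of the 16 test scores from the ungrouped frequency distribution example can be computed either from the raw data or from the frequency table. This is a sketch using Python's standard `statistics` module; the reuse of that data set here is a choice made for illustration.

```python
from statistics import mean

# Raw test scores of the 16 students.
scores = [14, 17, 10, 19, 14, 10, 14, 8, 10, 17, 19, 8, 10, 14, 17, 14]
raw_mean = mean(scores)

# The same mean from the ungrouped frequency table:
# sum of (value * frequency) divided by the total frequency.
frequency = {8: 2, 10: 4, 14: 5, 17: 3, 19: 2}
table_mean = sum(v * f for v, f in frequency.items()) / sum(frequency.values())

print(raw_mean)    # 13.4375
print(table_mean)  # 13.4375
```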
Advantages
• It is the most commonly used measure of location or
central tendency for continuous variables.
• The arithmetic mean uses all observations in the data
set.
• All observations are given equal weight.
Disadvantage
• The mean is affected by extreme values that may not be
representative of the sample.



2. Geometric mean
• It is the nth root of the product of the data elements.
• It is used in business to find average rates of growth.
Geometric mean = $\sqrt[n]{x_1 \cdot x_2 \cdots x_n}$, for all n >= 2.


• Example: Suppose you have an IRA (Individual Retirement Account) which earned annual interest rates of 5%, 10%, and 25%.
Solution: The proper average is the geometric mean, i.e. the cube root of (1.05 × 1.10 × 1.25), or about 1.13, meaning an average rate of about 13%.
• Note that the data elements must be positive. Negative growth is represented by positive values less than 1. Thus, if one of the accounts lost 5%, the proper multiplier would be 0.95.
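The IRA example can be verified numerically. The helper below is a hypothetical sketch of the definition (nth root of the product), not a library routine; it rejects non-positive inputs, as the note above requires.

```python
import math

def geometric_mean(values):
    """Return the nth root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.prod(values) ** (1 / len(values))

# Growth multipliers for annual rates of 5%, 10% and 25%.
multipliers = [1.05, 1.10, 1.25]
g = geometric_mean(multipliers)
print(round(g, 2))  # 1.13
```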
3. Harmonic mean
• It is used to calculate average rates.
• It is found by dividing the number of data elements by the sum of the reciprocals of the data elements.

Harmonic mean = $\dfrac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$


• Example: Suppose a boy rode a bicycle three miles. Due to
the topography, he rode the first mile at 2 mph, the
second mile at 3 mph, and the final mile at 4 mph.
What was the average speed for the three miles?
Solution: The arithmetic mean (2+3+4)/3 = 3.0 mph is incorrect,
because it would imply that the ride took 1 hour. Breaking the ride
into its separate components, it takes 30 minutes (1st) + 20 minutes
(2nd) + 15 minutes (3rd) to ride each mile, or 65 minutes in total.
His actual average speed was thus 3/1.083, or 2.77 mph.
• Another way to show our work is the harmonic mean of the three speeds:
$$\frac{3}{1/2 + 1/3 + 1/4} = \frac{3}{13/12} = \frac{36}{13} \approx 2.77 \text{ mph}$$
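The bicycle example can be verified in two ways: total distance over total time, and the harmonic mean of the three speeds. This is a small Python sketch; the variable names are illustrative assumptions.

```python
from statistics import harmonic_mean

speeds_mph = [2, 3, 4]   # speed over each one-mile leg
miles_per_leg = 1

# Direct computation: total distance divided by total time.
total_time_hours = sum(miles_per_leg / s for s in speeds_mph)
average_speed = len(speeds_mph) * miles_per_leg / total_time_hours

# The harmonic mean of the speeds gives the same value.
hm = harmonic_mean(speeds_mph)

print(f"{average_speed:.2f} mph, harmonic mean {hm:.2f} mph")
```

The two values agree only because every leg has the same length; with unequal distances a weighted form would be needed.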

4. Quadratic mean
• It is another name for Root Mean Square or
RMS.
• RMS is typically used for data whose
arithmetic mean is zero.

Example
• Suppose measurements of 120, -150, and 75
volts were obtained.
Solution: The corresponding quadratic mean is
$\sqrt{(120^2 + (-150)^2 + 75^2)/3}$, or about 119 volts RMS.
• The quadratic mean gives a physical measure of
the average distance from zero.
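The voltage example can be reproduced with a few lines of Python. This is only a sketch of the root-mean-square calculation described above; the variable names are illustrative.

```python
import math

volts = [120, -150, 75]

# Root mean square: square each value, average the squares, take the square root.
rms = math.sqrt(sum(v ** 2 for v in volts) / len(volts))

print(f"{rms:.1f} volts RMS")
```

Squaring removes the sign, which is why the negative reading contributes as much as a positive one of the same size.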

5. Trimmed mean
• It usually refers to the arithmetic mean
without the top 10% and bottom 10% of the
ordered scores.
• It removes extreme scores on both the high and
low ends of the data.
• It is also called a truncated mean.
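A 10% trimmed mean can be sketched as follows, using the ten scores from this chapter's mode and median examples. The function name `trimmed_mean` and its `proportion` parameter are hypothetical, and rounding the number of trimmed scores down is an assumption; other conventions exist.

```python
def trimmed_mean(values, proportion=0.10):
    """Arithmetic mean after dropping `proportion` of the ordered values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # number of scores cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


scores = [23, 22, 12, 14, 22, 18, 20, 22, 18, 18]

# With ten scores and 10% trimming, the lowest (12) and highest (23) are dropped.
print(trimmed_mean(scores))
```

The sketch assumes `proportion` is below 0.5, so that at least one score remains after trimming.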

6. Weighted mean
• It is the average of differently weighted scores.
• It takes into account some measure of weight
attached to different scores.
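One common form of the weighted mean is the sum of weight × score divided by the sum of the weights. The sketch below assumes that form; the function name and the example scores and weights are hypothetical and are not taken from the slides.

```python
def weighted_mean(scores, weights):
    """Sum of weight * score divided by the sum of the weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * x for x, w in zip(scores, weights)) / sum(weights)


# Hypothetical example: a test score of 80 with weight 1
# and an exam score of 90 with weight 3.
print(weighted_mean([80, 90], [1, 3]))
```

When all weights are equal, the result reduces to the ordinary arithmetic mean.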

The Mode
• The mode is the most frequent or most typical
value.
• The mode will not always be the central value;
in fact it may sometimes be an extreme value.
• A sample may have more than one mode, in which case it is
called bimodal (two modes) or multimodal (several modes).

Example
‘23, 22, 12, 14, 22, 18, 20, 22, 18, 18’

The modes are 18 and 22 (each occurs three times), so the data are bimodal.
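The same result can be obtained in Python. This is a brief sketch, assuming Python 3.8 or later for `statistics.multimode`.

```python
from statistics import multimode

data = [23, 22, 12, 14, 22, 18, 20, 22, 18, 18]

# multimode returns every value tied for the highest frequency.
modes = sorted(multimode(data))

print(modes)
```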

Advantages
• Requires no calculations.
• Represents the value that occurs most often.
Disadvantages
• The mode for continuous measurements is
dependent on the grouping of the intervals.
• A data set may have no mode.

The Median
• The median is the middle value of a group of an
odd number of observations when the data are
arranged in increasing or decreasing order of
magnitude.
• If the number of values is even, the median is
the average of the two middle values.

Example
‘23, 22, 12, 14, 22, 18, 20, 22, 18, 18’
Array: 12, 14, 18, 18, 18, 20, 22, 22, 22, 23
• The median is (18 + 20)/2 = 19.
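The odd/even rule for the median can be written out step by step and compared with the standard library. This is an illustrative sketch; the variable names are assumptions.

```python
from statistics import median

data = [23, 22, 12, 14, 22, 18, 20, 22, 18, 18]

ordered = sorted(data)
n = len(ordered)

# Even n: average the two middle values; odd n: take the single middle value.
if n % 2 == 0:
    manual = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
else:
    manual = ordered[n // 2]

print(manual, median(data))
```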

Advantages
• The median always exists and is unique.
• The median is not affected by extreme values.
Disadvantages
• The values must be sorted in order of magnitude.
• The median uses only one (or two) observations.

Percentiles / Quartiles
• Percentiles are values that divide a
distribution into two groups where the Pth
percentile is larger than P% of the values.
• Some specific percentiles have special names:
First Quartile : Q1 = the 25th percentile
Median : Q2 = the 50th percentile
Third Quartile : Q3 = the 75th percentile

Interpretation
• Q1 = a. This means that 25% of the data values are
smaller than a.
• Q2 = b. This means that 50% of the data values are
smaller than b.
• Q3 = c. This means that 75% of the data values are
smaller than c.
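Quartiles can be estimated in Python with `statistics.quantiles` (Python 3.8 or later). The sketch below uses the ten scores from the median example. Several interpolation conventions are in use, so Q1 and Q3 from other software or from a hand calculation may differ slightly from the values this function returns.

```python
from statistics import quantiles

data = [23, 22, 12, 14, 22, 18, 20, 22, 18, 18]

# n=4 asks for the three cut points that split the ordered data into quarters.
q1, q2, q3 = quantiles(data, n=4)

print(q1, q2, q3)
```

The middle cut point, Q2, is the median of the data.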

Midrange
• The midrange is the average of the largest and
smallest observations.
Midrange = (Largest + Smallest)/2
• The percentile estimate (P25 + P75)/2 is
sometimes used when there are a large
number of observations.

1.4. Measures of Dispersion (Variability)

• A measure of dispersion indicates how the
observations are spread about the central
value.
• Measures of dispersion are:
The range
The variance
The standard deviation and
The coefficient of variation
The range
• The range is the difference between the
largest and the smallest value in the sample.
• The range is the easiest of all measures of
dispersion to calculate.
R = Maximum Value - Minimum Value

Advantages
• The range is easily understood and gives a quick
estimate of dispersion.
• The range is easy to calculate.
Disadvantage
• The range is inefficient because it only uses the
extreme values and ignores all other available data.
The larger the sample size, the more inefficient the
range becomes.

The Variance
• The variance is the mean square deviation of
the observations from the mean. To calculate
the variance, the following equation is used:

$$s^2 = \frac{\sum f_i (m_i - \bar{x})^2}{n - 1}$$
Advantages
• The variance is an efficient estimator
• Variances can be added and averaged
Disadvantage
• The calculation of the variance can be tedious
without the aid of a calculator or computer
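With a computer the calculation is short. The sketch below applies the frequency form of the sample variance to the ten scores used earlier in the chapter, treating each distinct value as a midpoint m_i with frequency f_i (an assumption made for illustration, since the scores are not grouped into classes), and compares the result with `statistics.variance`.

```python
from collections import Counter
from statistics import variance

data = [23, 22, 12, 14, 22, 18, 20, 22, 18, 18]
n = len(data)

# Treat each distinct value as m_i with frequency f_i.
freq = Counter(data)
x_bar = sum(f * m for m, f in freq.items()) / n

# Sum of f_i * (m_i - x_bar)^2, divided by n - 1.
s2 = sum(f * (m - x_bar) ** 2 for m, f in freq.items()) / (n - 1)

print(s2, variance(data))
```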

The Standard Deviation
• The square root of the variance is known as
the standard deviation. The symbol for the
standard deviation is s.

$$s = \sqrt{s^2}$$

Advantages
• The standard deviation is in the same dimension as
the observed values.
• The standard deviation is an efficient estimator.
Disadvantages
• The calculations can be tedious without the aid of a
good calculator
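In software the standard deviation is a one-line calculation. This sketch takes the square root of the sample variance for the ten scores used earlier in the chapter and checks it against `statistics.stdev`.

```python
import math
from statistics import stdev, variance

data = [23, 22, 12, 14, 22, 18, 20, 22, 18, 18]

# Standard deviation: the square root of the sample variance.
s = math.sqrt(variance(data))

print(f"{s:.2f}", f"{stdev(data):.2f}")
```

The result is in the same units as the scores themselves, unlike the variance, which is in squared units.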

The coefficient of variation
• The coefficient of variation is a measure of
relative dispersion, that is, a measure which
expresses the magnitude of the variation relative to
the size of the quantity that is being measured
and is expressed as a percent.

$$CV = (s / \bar{x}) \times 100$$
Advantages
• The coefficient of variation can be used for comparing
the variation in different populations of data that are
measured in two different units. (because the CV is
unitless)
Disadvantages
• The coefficient of variation fails to be useful when the
mean x̄ is close to zero.
• The coefficient of variation is often misunderstood and
misused.
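The unit-free property can be illustrated with a short sketch. The function name `coefficient_of_variation` is hypothetical, and the conversion factor 2.54 is used only to mimic a change of measurement units.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100


data = [23, 22, 12, 14, 22, 18, 20, 22, 18, 18]
rescaled = [x * 2.54 for x in data]   # same data in different units

print(f"{coefficient_of_variation(data):.1f}%")
print(f"{coefficient_of_variation(rescaled):.1f}%")
```

Both lines print the same percentage, because rescaling multiplies the standard deviation and the mean by the same factor.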
The Interquartile range
• It is the difference between the 75th and the
25th percentiles, that is, between the third and first quartiles.

Interquartile range = Q3 – Q1

Measure of skewness and kurtosis
Skewness
• Skewness is a measure of the tendency of the deviations to be
larger in one direction than in the other.
• Skewness is the degree of asymmetry or departure from symmetry
of a distribution.
• If the frequency curve of a distribution has a longer tail to the
right of the central value than to the left, the distribution is said to
be skewed to the right or to have positive skewness.
• If the reverse is true, it is said that the distribution is skewed to
the left or has negative skewness.

• Population skewness is defined as
$$\frac{E(x - \mu)^3}{\sigma^3}$$
where E stands for expected value.

• A symmetric, bell-shaped distribution has no skewness, i.e., mean =
median = mode; the normal distribution is an example.
• If mean > median > mode, the distribution is positively
skewed, or it is said to be skewed to the right.
• If mean < median < mode, the distribution is negatively
skewed, or it is said to be skewed to the left.
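The population skewness formula can be evaluated directly for a finite data set by replacing the expected value with an average. This is a minimal sketch that treats the ten scores from earlier in the chapter as a whole population; the function name `population_skewness` is hypothetical.

```python
from statistics import mean, pstdev


def population_skewness(values):
    """E[(x - mu)^3] / sigma^3, treating `values` as the whole population."""
    mu = mean(values)
    sigma = pstdev(values)
    third_moment = sum((x - mu) ** 3 for x in values) / len(values)
    return third_moment / sigma ** 3


data = [23, 22, 12, 14, 22, 18, 20, 22, 18, 18]

# A negative result indicates a longer tail to the left.
print(f"{population_skewness(data):.2f}")
```

For these scores the result is negative, which agrees with the mean (18.9) lying slightly below the median (19).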
Measure of kurtosis
• Kurtosis characterises the relative peakedness or
flatness of a distribution compared with the normal
distribution.
• Positive kurtosis indicates a relatively peaked
distribution.
• Negative kurtosis indicates a relatively flat
distribution.
• The population kurtosis is usually defined as:
$$\frac{E(x - \mu)^4}{\sigma^4} - 3$$