
Statistics

This document discusses key concepts in statistics including: - Defining statistics and classifying it into descriptive and inferential categories - Outlining the main stages of statistical investigation including data collection, organization, presentation, analysis, and interpretation - Describing common statistical terms like population, sample, variable, and parameter - Explaining applications of statistics in fields like engineering, economics, and research - Identifying uses of statistics such as summarizing data, comparing data sets, and influencing government policy - Noting limitations of statistics including not dealing with individual values or qualitative characteristics directly

Uploaded by

firanreg

Learning objectives:

At the end of this section students will be able to:

 Define statistics and understand the classification of statistics

 List and describe the stages in a statistical investigation

 Become familiar with some frequently used concepts, terms and definitions in statistics
 Identify the different types of variables and scales of measurement
 List the main applications, uses and limitations of statistics
 Present the collected data using tables, graphs or diagrams.

11/13/2023 Introduction to Statistics 1


Introduction
 Statistics is used in almost all fields of human endeavor
 In recent years statistics has become part of:
• The natural sciences
• The social sciences
• Research
• Business
• Management
• Planning
• Economics
• Industry
• Behavioral sciences
• Agriculture and many other experimental sciences.
Definition of statistics
Statistics can be defined in two senses:
 In the plural sense: statistics are the raw data themselves, such as statistics of births, deaths, students, imports and exports, etc.
 In the singular sense: statistics is the subject that deals with the collection, organization, presentation, analysis and interpretation of numerical data.



Classification of statistics
 Depending on how the data are used, statistics is divided into two main branches:
• Descriptive
• Inferential
 Descriptive statistics: deals only with describing some characteristics of the data without going beyond the data.
E.g.
 20% of the students in the class are married.
 5% of DMU software engineering students had a PC in 2011 E.C.
Classification of statistics……..
 Inferential statistics: the methods used to generalize from a sample to a population.
• It is concerned with drawing statistically valid conclusions about the characteristics of a population based on information from a sample.
• It is the part of statistics that generalizes from a sample to a population using probability.
Classification of statistics……..
• It includes:
 hypothesis testing
 determining relationships between variables
 making predictions.
E.g.
• The average age of university students in Ethiopia is 19.1 years.
• There is a relationship between smoking tobacco and an increased risk of developing cancer.
• There is a relationship between downloading videos and an increased risk of a PC being infected by a virus.
Stages in statistical investigation

 The field of statistics incorporates the following five stages:
• Proper data collection
• Data organization
• Data presentation
• Analysis
• Interpretation of the analysis results



Stages in statistical investigation…….

 Proper data collection: this is the first step in a statistical investigation.
• Data must be collected, accumulated and recorded carefully and accurately.
• Faulty data or faulty data collection techniques lead to wrong conclusions.
 Data can be collected in a variety of ways; the most common methods are:
• Telephone survey
• Mailed questionnaire
• Personal interview
Stages in statistical investigation…….

 Data organization: the collected data may contain irrelevant figures, incorrect facts, omissions and mistakes.
• Errors introduced during data collection have to be edited out.
• After editing, the data are classified (arranged) according to their common characteristics; this is called organizing.



Stages in statistical investigation…….

 Data presentation: at this stage the organized data are presented in the form of tables, diagrams and graphs, to give them meaning and make the presentation attractive.
 Data analysis: the stage where we critically study the data in order to draw conclusions about them. Analysis can involve highly complex and sophisticated mathematical techniques.
 Interpretation: the stage where we draw valid conclusions from the results obtained through data analysis.
Definitions of some terms
• Population: a collection of items that have something in common and about which we wish to draw conclusions at a particular time.
• Sample: a subset (part) of the population about which information is actually obtained.



Definitions of some terms……….
• Census: the process of collecting data from all the units in the population.
• Parameter: a numerical characteristic of the population, defined for each variable of interest; a descriptive measure computed from the data of the whole population.
• Sample survey: a statistical study based on samples.
• Statistic: a measure obtained from sample data, used to make statements about an unknown parameter.



Definitions of some terms……….
• Sampling: The process or method of sample
selection from the population.
• Sample size: the number of elements or observations to be included in the sample.
• Frame (sampling frame): a list of the elements covering the survey population, which serves as the basis for sample selection.
• Variable: a characteristic or attribute associated with each unit in the population that can assume different values.
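The link between frame, sample, parameter and statistic can be sketched in Python. This is a minimal illustration under loudly stated assumptions: the frame of 500 student IDs and their marks are hypothetical values generated at random (not data from this document), and the sample is drawn by simple random sampling without replacement. The mean of all 500 marks plays the role of a parameter; the mean of the 50 sampled marks is a statistic.

```python
import random
import statistics

# HYPOTHETICAL population: 500 students with randomly generated marks (not real data).
rng = random.Random(2023)                  # seeded so the run is reproducible
frame = list(range(1, 501))                # sampling frame: one ID per population unit
marks = {sid: rng.randint(40, 100) for sid in frame}

# Parameter: computed from every unit in the population (as in a census).
population_mean = statistics.mean(marks.values())

# Sampling: simple random sampling of n = 50 units without replacement.
n = 50
sample_ids = rng.sample(frame, n)

# Statistic: the same measure computed from the sample only.
sample_mean = statistics.mean(marks[sid] for sid in sample_ids)

print(f"parameter (population mean) = {population_mean:.2f}")
print(f"statistic (sample mean, n = {n}) = {sample_mean:.2f}")
```

Re-running with a different seed gives a different sample and usually a different statistic, while the parameter stays fixed; this gap is what inferential statistics reasons about.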



Discussion
1. To assess the satisfaction of Debre Markos University students with the cafeteria, an Amhara television reporter interviews 50 students.
2. To assess the academic performance of female students at Debre Markos University, the director of the university's Female and AHIV Directorate selects 500 female students proportionally from each college.



Discussion…………
• What is the population?
• What is the sample here?
• What is the sample size?
• What is the sampling frame?



Applications of Statistics

 Statistics can be applied in almost any field of study, for instance engineering, economics and the natural sciences.
 Engineering: statistics has wide application in engineering, for example:
• To compare the breaking strength of two types of materials
• To determine the reliability of a product
• To control the quality of products in a given production process
• To compare the improvement in yield due to certain additives (fertilizers, herbicides, etc.)
Applications of Statistics………

 Economics: statistics is widely used in economic study and research:
• To measure and forecast Gross National Product (GNP)
• Statistical analyses of population growth, unemployment figures, rural and urban population shifts and so on influence much of economic policy making.
• Financial statistics are necessary in the fields of
money and banking.
Applications of Statistics………

 Statistics and research: hardly any advanced research is carried out without the use of statistics in one form or another. Statistics is used extensively in medical, pharmaceutical and agricultural research.



Uses of Statistics

 Today statistics is recognized as a highly useful tool in the decision-making process, with many functions in everyday activities. The following are some uses of statistics:
 Statistics summarizes complex data: the original (raw) data are normally voluminous and disorganized until they are summarized and expressed in a few numerical values.



Uses of Statistics

 Statistics is used to compare data: measures obtained from different data sets can be compared to draw conclusions about those sets.
 Statistics helps in predicting future trends: statistics is extremely useful for analyzing past and present data and predicting future trends.
 Statistics influences government policies: statistical results in areas such as taxation and unemployment rates may convince a government to review its policies and plans with a view to meeting national needs and aspirations.
 Statistical methods are very helpful in formulating and testing hypotheses and in developing new theories.



Limitation of statistics
 As a science, statistics has its own limitations. Some of these are:
 It does not deal with individual values: statistics deals with aggregates of values.
 It does not deal with qualitative characteristics directly: statistics cannot be applied directly to qualitative characteristics such as beauty, honesty, poverty, standard of living and so on, since these cannot be directly expressed in quantitative terms.
Limitation of statistics…………
 Statistical conclusions are not universally true:
• Statistical conclusions are true only under certain assumptions.
• Statistics relies extensively on the laws of probability, which describe what is likely rather than what is certain.
• For example, if we toss a coin 10 times, where the chances of a head or a tail are 1:1, we cannot say with certainty that there will be 5 heads and 5 tails. Thus statistical laws are only approximations.
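A short simulation makes the coin example concrete. This Python sketch repeats the 10-toss experiment eight times with a seeded pseudo-random generator (the seed and the number of repetitions are arbitrary choices) and computes the exact probability of getting exactly 5 heads from the binomial formula C(10, 5) / 2^10, which is only about 0.25.

```python
import random
from math import comb

rng = random.Random(1)  # fixed seed so the run is reproducible

# Repeat "toss a fair coin 10 times" 8 times and record the number of heads in each run.
heads_per_run = [sum(rng.random() < 0.5 for _ in range(10)) for _ in range(8)]
print("heads in each run of 10 tosses:", heads_per_run)

# Exact probability of exactly 5 heads in 10 fair tosses: C(10, 5) / 2**10 = 252 / 1024
p_five = comb(10, 5) / 2**10
print(f"P(exactly 5 heads) = {p_five:.4f}")   # prints 0.2461
```

Most runs will not show exactly 5 heads, which is the point of the limitation: probability describes long-run behavior, not a guaranteed outcome.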
Types of Variables

 Quantitative variables: variables that can be quantified, i.e. that take numerical values.
Examples: height, area, income, temperature
Quantitative variables can be classified as discrete or continuous.
 Discrete variables are variables whose values are counts; fractional values such as 2.5 students are meaningless.
E.g. number of students in the class



Types of Variables
 Continuous variables are variables that can
have any value within an interval.
E.g. height, weight, etc.
 Qualitative variables: variables that cannot be quantified directly.
Examples: color, sex, location, political
affiliation, and so on



Scale of Measurement
The scale of measurement determines which statistical calculations are meaningful.

The four scales of measurement, from lowest to highest, are: nominal, ordinal, interval and ratio.



Nominal Scale of Measurement
Data at the nominal scale of measurement are qualitative only.
Nominal: data are classified using names, labels or qualities. No mathematical computations can be made at this level.
Examples: colors in the Ethiopian flag, blood type, religion



Ordinal Scale of Measurement
Data at the ordinal scale of measurement are qualitative or quantitative.
Ordinal: data can be arranged in order, but differences between data entries are not meaningful.
Examples: class standings (freshman, sophomore, junior, senior); severity of disease (healthy, mild, moderate, severe); top 10 songs played by Ethiopian Idol



Interval Scale of Measurement
Data at the interval scale of measurement are quantitative. A zero
entry simply represents a position on a scale; the entry is not an
inherent zero.

Interval: data can be arranged in order, and the differences between data entries can be calculated.
Examples: temperatures, years on a timeline, IQ level



Ratio Scale of Measurement
Data at the ratio scale of measurement are similar to the interval
level, but a zero entry is meaningful.

Ratio: a ratio of two data values can be formed, so one data value can be expressed as a multiple of another.
Examples: grade point averages, ages, weights



Summary of Scale of Measurement

Level of      Put data in   Arrange data   Subtract      Determine if one data value
measurement   categories    in order       data values   is a multiple of another
Nominal       Yes           No             No            No
Ordinal       Yes           Yes            No            No
Interval      Yes           Yes            Yes           No
Ratio         Yes           Yes            Yes           Yes
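Because each level keeps all the operations of the levels below it, the summary table can be encoded as an ordered type. The following Python sketch is one way to do this; the operation names (`categorize`, `order`, `subtract`, `ratio`) and the helper `allowed` are hypothetical labels chosen for illustration, not standard terminology.

```python
from enum import IntEnum

class Scale(IntEnum):
    """Levels of measurement, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Lowest level at which each operation in the summary table becomes meaningful.
MIN_LEVEL = {
    "categorize": Scale.NOMINAL,
    "order": Scale.ORDINAL,
    "subtract": Scale.INTERVAL,
    "ratio": Scale.RATIO,
}

def allowed(scale: Scale, operation: str) -> bool:
    """True if `operation` is meaningful for data measured on `scale`."""
    return scale >= MIN_LEVEL[operation]

# Rebuild the Yes/No table row by row.
for s in Scale:
    row = ["Yes" if allowed(s, op) else "No" for op in MIN_LEVEL]
    print(f"{s.name.title():<9}", *row)
```

The ordering of `IntEnum` members is what makes the "lowest to highest" idea explicit: an operation allowed at one level is automatically allowed at every higher level.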
Discussion
 Give the correct variable type and scale of measurement for each variable:
1. Blood group ...............................
2. Whether a PC is infected by a virus ................................
3. Job satisfaction index (1-5).........
4. Number of heart attacks .............
5. Calendar year ...........................
6. RAM of a computer..........
7. Number of traffic accidents in a 3-day period ....
8. Number of cases of each reportable disease reported by a health worker.......
9. Ethnic group..........................
Method of data collection and presentation
Objectives

 At the end of this session students will be able to:


 Identify appropriate data collection methods for a given problem

 Organize data using frequency distributions

 Present data using suitable graphs or diagrams.

11/13/2023 Method of data collection and presentation 33


Method of data collection and presentation
Types of data based on source

 Source of data: data may be obtained from two sources, primary and secondary.
 Primary sources: sources that supply first-hand information for immediate use.
• Example: observing signs, measuring characteristics, recording symptoms, interviewing respondents, etc.
 Secondary sources: sources in which data are obtained from existing records that were collected by persons other than the investigator for other purposes.
• Example: hospital records, vital statistics and registers, etc.



Method of data collection………
There are three major methods of data
collection
 Observation or measurement

 Interviews and questionnaires

 The use of documentary sources



Method of data collection………
I. Observation or measurement: data are obtained through direct observation or measurement.
 It provides accurate information but can be expensive and inconvenient.
• Example: physical examination, clinical
measurements, laboratory tests etc.



Method of data collection………
II. Interviews and questionnaires
 Direct personal interview (in-depth interview)
 Face-to-face interviews (with a structured questionnaire)
 Telephone interviews
 Self-administered questionnaires returned by mail (mailed questionnaires)



Method of data collection………
III. The use of documentary sources
 Extracting information from existing sources (e.g. hospital records)
• It is much less expensive than the other two methods and can be an important source of data.
• Limitation: it is difficult to get the information needed when records are compiled in a non-standardized manner.



Method of data collection………
Source of data: primary or secondary
Method of data collection: observation or measurement; interviews and questionnaires; the use of documentary sources



Method of data presentation
The presentation of data is broadly classified into the following two categories:

 Tabular presentation
 Diagrammatic and Graphic presentation.



 Tabular presentation: a systematic arrangement of statistical data in columns and rows.
Definitions of some terms
• Raw data: recorded information in its original, collected form.
• Frequency: the number of values in a specific class of the distribution.
• Frequency distribution: the organization of raw data in table form using classes and frequencies.
There are three basic types of frequency
distributions
• Categorical frequency distribution
• Ungrouped frequency distribution
• Grouped frequency distribution



Categorical frequency Distribution:
 Used for data that can be placed in specific categories, such as nominal or ordinal data.
 Components of a categorical frequency distribution (CFD):
• Class
• Tally
• Frequency
• Percent



• Example: A social worker collected the following data on marital status for 25 administrative workers at Debre Markos University (M = married, S = single, W = widowed, D = divorced):
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Table 1: Marital status of administrative workers at Debre Markos University, 2009 E.C.

Class      Tally      Frequency   Percent
Married    ///// /    6           24%
Single     ///// //   7           28%
Widowed    /////      5           20%
Divorced   ///// //   7           28%
Total                 25          100%
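The counting behind Table 1 can be checked with a short Python sketch using `collections.Counter` from the standard library. The data string is the 25 codes listed above; writing a complete group of five tally marks as `/////` is a notational assumption, and `tally` is a hypothetical helper.

```python
from collections import Counter

# Marital status codes of the 25 administrative workers, as listed in the example.
data = """M S D W D
S S M M M
W D S M M
W D D S S
S W W D D""".split()

labels = {"M": "Married", "S": "Single", "W": "Widowed", "D": "Divorced"}
counts = Counter(data)
n = len(data)

def tally(f):
    """Tally marks in groups of five, e.g. 7 -> '///// //'."""
    groups = ["/////"] * (f // 5)
    if f % 5:
        groups.append("/" * (f % 5))
    return " ".join(groups)

print(f"{'Class':<10}{'Tally':<10}{'Frequency':>10}{'Percent':>9}")
for code, name in labels.items():
    f = counts[code]
    print(f"{name:<10}{tally(f):<10}{f:>10}{100 * f / n:>8.0f}%")
print(f"{'Total':<10}{'':<10}{n:>10}{100:>8.0f}%")
```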



Ungrouped Frequency Distribution:
• An ungrouped frequency distribution is often constructed for a small data set or a discrete variable.
Constructing an ungrouped frequency distribution:
• First find the smallest and largest raw scores in the collected data.
• Arrange the data in order of magnitude and count the frequency of each value.



• Example:
• The following data represent the marks of 20 students.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85



Mark   Tally   Frequency
60     //      2
62     /       1
63     /       1
65     /       1
70     ////    4
74     /       1
75     /       1
76     //      2
80     ///     3
85     ///     3
90     /       1
Total          20
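The same approach produces the ungrouped frequency distribution: find the smallest and largest marks, sort the distinct values and count each one. This Python sketch uses the 20 marks listed above and confirms the frequencies in the table (for example, 75 occurs once and 76 twice).

```python
from collections import Counter

# Marks of the 20 students, copied from the example.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

freq = Counter(marks)
print(f"smallest = {min(marks)}, largest = {max(marks)}")
print("Mark  Tally  Frequency")
for mark in sorted(freq):            # arrange in order of magnitude
    print(f"{mark:<5} {'/' * freq[mark]:<6} {freq[mark]}")
print("Total", sum(freq.values()))
```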
Grouped frequency Distribution:
 When the range of the data is large, the data must be grouped into classes that are more than one unit wide.
Definitions:
• Class limits: values that could actually appear in the data; there is a gap between the upper limit of one class and the lower limit of the next.
• Unit of measurement (U): the distance between two consecutive possible measurements. It is usually 1, 0.1, 0.01, 0.001, etc.
• Class boundaries: boundaries have one more decimal place than the raw data and therefore do not appear in the data. The lower class boundary is found by subtracting U/2 from the corresponding lower class limit, and the upper class boundary is found by adding U/2 to the corresponding upper class limit.
• Class width: the difference between the upper and lower class boundaries of any class. It is also the difference between the lower limits of any two consecutive classes.
• Class mark (midpoint): the average of the lower and upper class limits, or equivalently the average of the lower and upper class boundaries.
• Cumulative frequency: the number of observations less than (or more than) or equal to a specific value.
• Cumulative frequency above: the total frequency of all values greater than or equal to the lower class boundary of a given class.
• Cumulative frequency below: the total frequency of all values less than or equal to the upper class boundary of a given class.
• Relative frequency (rf): the frequency divided by the total frequency.
• Relative cumulative frequency (rcf): the cumulative frequency divided by the total frequency.



Guidelines for classes

1. The classes must be mutually exclusive: no data value can fall into two different classes.
2. The classes must be all-inclusive: all data values must be included.
3. The classes must be continuous: there are no gaps in a frequency distribution.
4. The classes must be equal in width. The exception is the first or last class: it is possible to have a "below ..." or "... and above" class, which is often used with ages.



 Steps for constructing a grouped frequency distribution
1. Find the largest and smallest values.
2. Compute the range: R = maximum - minimum.
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule, K = 1 + 3.32 log10(n), where K is the number of classes and n is the total number of observations.
4. Find the class width by dividing the range by the number of classes and rounding up, not off: W = R/K.



5. Pick a starting point (usually the minimum value); this is the lower limit of the first class. Keep adding the class width to this lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the second class. Then keep adding the class width to this upper limit to find the rest of the upper limits.
7. The boundaries lie halfway between the upper limit of one class and the lower limit of the next class. Find them by subtracting U/2 from the lower limits and adding U/2 to the upper limits.



8. Find the frequencies.
9. Find the cumulative frequencies. Depending on what you are trying to accomplish, it may not be necessary to find them.
10. If necessary, find the relative frequencies and/or relative cumulative frequencies.



• The number of hours 40 employees spent on their jobs over the last 7 working days is given below.
62 50 35 36 31 43 43 43
41 31 65 30 41 58 49 41
37 62 27 47 65 50 45 48
27 53 40 29 63 34 44 32
58 61 38 41 26 50 47 37
• Construct a suitable frequency distribution for
these data
Solution:
• Step 1: smallest value = 26, largest value = 65
• Step 2: range R = 65 - 26 = 39
• Step 3: number of classes by Sturges' rule: K = 1 + 3.32 log10(40) = 6.32, so K = 7 (rounded up)
• Step 4: class width W = R/K = 39/7 = 5.57, so W = 6 (rounded up)
• Step 5: lower class limits: the starting point is 26, so the lower class limits are 26, 32, 38, 44, 50, 56, 62
• Step 6: upper class limits: the upper limit of the first class is the lower limit of the second class minus the unit of measurement (U = 1), i.e. 32 - 1 = 31; adding the class width repeatedly gives the upper class limits 31, 37, 43, 49, 55, 61, 67



• Step 7: class boundaries: lower class boundary = lower class limit - U/2 and upper class boundary = upper class limit + U/2, where U = 1
• lower class boundaries: 25.5, 31.5, 37.5, 43.5, 49.5, 55.5, 61.5
• upper class boundaries: 31.5, 37.5, 43.5, 49.5, 55.5, 61.5, 67.5
• Step 8: frequencies: count the number of observations in each class
• Step 9: cumulative frequencies: less than and more than
• Step 10: relative frequencies and relative cumulative frequencies



Class limits   Class boundaries   f (rf %)    LCF (LRCF %)   MCF (MRCF %)
26-31          25.5-31.5          7 (17.5)    7 (17.5)       40 (100)
32-37          31.5-37.5          6 (15)      13 (32.5)      33 (82.5)
38-43          37.5-43.5          9 (22.5)    22 (55)        27 (67.5)
44-49          43.5-49.5          6 (15)      28 (70)        18 (45)
50-55          49.5-55.5          4 (10)      32 (80)        12 (30)
56-61          55.5-61.5          3 (7.5)     35 (87.5)      8 (20)
62-67          61.5-67.5          5 (12.5)    40 (100)       5 (12.5)
f = frequency, rf = relative frequency, LCF/MCF = less-than/more-than cumulative frequency, LRCF/MRCF = the corresponding relative cumulative frequencies.
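Steps 1 to 10 can be followed in a short Python sketch using the 40 observations above. It applies Sturges' rule with log base 10 and rounds K and W up, as in the solution. The guard that widens the classes when K × W does not exceed the range is an added assumption for other data sets (it assumes U = 1) and does not change anything here.

```python
import math

# Hours worked by the 40 employees, copied row by row from the example.
hours = [62, 50, 35, 36, 31, 43, 43, 43,
         41, 31, 65, 30, 41, 58, 49, 41,
         37, 62, 27, 47, 65, 50, 45, 48,
         27, 53, 40, 29, 63, 34, 44, 32,
         58, 61, 38, 41, 26, 50, 47, 37]

U = 1                                    # unit of measurement (whole hours)
n = len(hours)
low, high = min(hours), max(hours)       # Step 1
R = high - low                           # Step 2: range
K = math.ceil(1 + 3.32 * math.log10(n))  # Step 3: Sturges' rule, rounded up
W = math.ceil(R / K)                     # Step 4: class width, rounded up
if K * W <= R:                           # guard (assumes U = 1): classes must cover the range
    W += 1

rows = []
less_cf = 0
for i in range(K):
    lower = low + i * W                  # Step 5: lower class limits
    upper = lower + W - U                # Step 6: upper class limits
    f = sum(lower <= x <= upper for x in hours)   # Step 8: frequency
    less_cf += f                         # Step 9: less-than cumulative frequency
    more_cf = n - (less_cf - f)          #         more-than cumulative frequency
    # Step 7: boundaries are the limits moved outward by U/2
    rows.append((lower, upper, lower - U / 2, upper + U / 2, f, less_cf, more_cf))

print(f"R = {R}, K = {K}, W = {W}")
for lo, up, lb, ub, f, lcf, mcf in rows:
    # Step 10: relative frequencies as percentages of n
    print(f"{lo}-{up}  {lb}-{ub}  {f} ({100 * f / n:g})  "
          f"{lcf} ({100 * lcf / n:g})  {mcf} ({100 * mcf / n:g})")
```

The printed rows should match the table above line for line.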



Class work
• Convert the following class limits into class boundaries:
a) 5-9, 10-14, 15-19
b) 44.5-49.4, 49.5-54.4, 54.5-59.4
c) 78.25-80.24, 80.25-82.24, 82.25-84.24
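To check the class work, a small Python helper can move each class limit outward by U/2, where U is 1, 0.1 and 0.01 for parts a), b) and c). It uses `decimal.Decimal` (an implementation choice, to avoid binary floating-point rounding in values such as 44.45); `class_boundaries` is a hypothetical helper name.

```python
from decimal import Decimal

def class_boundaries(lower, upper, unit):
    """Lower boundary = lower limit - U/2, upper boundary = upper limit + U/2."""
    half = Decimal(unit) / 2
    return Decimal(lower) - half, Decimal(upper) + half

# (class limits, unit of measurement U) for parts a), b) and c) of the class work
exercises = {
    "a": ([("5", "9"), ("10", "14"), ("15", "19")], "1"),
    "b": ([("44.5", "49.4"), ("49.5", "54.4"), ("54.5", "59.4")], "0.1"),
    "c": ([("78.25", "80.24"), ("80.25", "82.24"), ("82.25", "84.24")], "0.01"),
}

for part, (classes, unit) in exercises.items():
    for lo, hi in classes:
        lb, ub = class_boundaries(lo, hi, unit)
        print(f"{part}) {lo}-{hi}  ->  {lb}-{ub}")
```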



Diagrammatic and graphical presentation of data:
These are techniques for presenting data visually.
• Importance:
• They are more attractive.
• They facilitate comparison.
• They are easily understood.



 Diagrammatic presentation of data: used for discrete as well as qualitative data
• Bar chart
• Pie chart
 Bar charts are commonly used to show the number or proportion of nominal or ordinal data that possess a particular attribute.
• Bar charts most often represent the number of observations in a given category.



 There are different types of bar charts. The most common are:
• Simple bar chart
• Component (subdivided) bar chart
• Multiple bar chart
o Simple bar charts are used to display data on one variable.
• They are thick lines (narrow rectangles) of the same breadth. The magnitude of a quantity is represented by the height/length of the bar.
Example: Suppose that the following were the gross revenues (in units of $100,000) of company XYZ for the years 1989, 1990 and 1991. Draw a bar graph for these data.
Year Revenue
1989 110
1990 95
1991 65
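Without a plotting library, a simple bar chart can be sketched as text, with bar length proportional to the value. The helper `text_bar_chart` and the scale of one `#` per 5 units (that is, per $500,000) are hypothetical choices for illustration; a plotting library would draw the same proportions as vertical rectangles of equal breadth.

```python
def text_bar_chart(data, scale):
    """Return one text line per category; bar length is proportional to the value."""
    width = max(len(str(k)) for k in data)
    return [f"{str(k):>{width}} | {'#' * round(v / scale)} {v}" for k, v in data.items()]

# Gross revenue of company XYZ (in units of $100,000), from the example table.
revenue = {1989: 110, 1990: 95, 1991: 65}

chart = text_bar_chart(revenue, scale=5)   # one '#' per 5 units
for line in chart:
    print(line)
```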

Component bar chart:
o A component bar chart is used when we want to show how a total (or aggregate) is divided into its component parts.
• Each bar represents the total value of a variable, broken into its component parts, and different colors or designs are used to identify the parts.



Example: Construct a component bar chart for the number of children vaccinated with DPT, polio and BCG antigens at Debre Markos Hospital in 2008 E.C.
Antigen Male Female
DPT 250 300
Polio 300 320
BCG 200 210

Multiple bar charts:
These are used to display data on more than one
variable.
• They are used for comparing different variables at the
same time.
Example: Construct a multiple bar chart for the 3 types of expenditure (in dollars) of a family over four years.
Year Food Education Other Total
1 3000 2000 3000 8000
2 3500 3000 4000 10500
3 4000 3500 5000 12500
4 5000 5000 6000 16000
Pie chart:
A pie chart is a circle that is divided into sections according to the percentage of frequencies in each category of the distribution.
Steps in drawing a pie chart
• Convert the frequency distribution into a percentage frequency distribution.
• Draw a circle of any radius; note that the whole circle corresponds to an angle of 360°.
• Convert the percentages into degree measures. Since the whole circle (360°) represents 100% of the observations, 3.6° represents 1%.
Angle of sector = (value of the part / whole quantity) × 360°
• Example:

• The following data are the blood types of 50 volunteers at a blood plasma donation clinic:
O A O AB A A O O B A O A AB B O O O A B A A O A A
O B A O AB A O O A B A A A O B O O A O A B O AB A O B
• Present the data using both a pie and a bar chart.

• Find the percentage of donors for each blood type. To find the angle of the sector for each blood type, multiply the corresponding percentage by 360° and divide by 100.
Blood type   Frequency   Percent   Angle (degrees)
A            19          38.0      136.8
B            8           16.0      57.6
O            19          38.0      136.8
AB           4           8.0       28.8
Total        50          100.0     360
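The percentages and sector angles in the table can be reproduced with a short Python sketch, applying angle of sector = (part / whole) × 360° to the 50 blood types listed above. The same `counts` could also feed a bar chart, since both charts start from the frequency distribution.

```python
from collections import Counter

# Blood types of the 50 volunteers, copied from the example.
blood = ("O A O AB A A O O B A O A AB B O O O A B A A O A A "
         "O B A O AB A O O A B A A A O B O O A O A B O AB A O B").split()

counts = Counter(blood)
n = len(blood)
angles = {}

print(f"{'Type':<5}{'Freq':>5}{'Percent':>9}{'Angle':>8}")
for bt in ["A", "B", "O", "AB"]:
    pct = 100 * counts[bt] / n
    angles[bt] = counts[bt] / n * 360     # angle of sector = (part / whole) * 360
    print(f"{bt:<5}{counts[bt]:>5}{pct:>9.1f}{angles[bt]:>8.1f}")
print(f"{'Total':<5}{n:>5}{100:>9.1f}{sum(angles.values()):>8.1f}")
```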



Graphical presentation of data:
The histogram, the frequency polygon and the cumulative frequency graph (ogive) are the most commonly used graphical representations of continuous data.
Procedures for constructing graphs:
• Draw and label the X and Y axes.
• Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.
• Represent the class boundaries (for the histogram or ogive) or the midpoints (for the frequency polygon) on the X axis.
• Plot the points.
• Draw the bars or lines to connect the points.
 Histogram:
A histogram presents a grouped frequency distribution of a continuous variable.
Method of constructing a histogram
• Obtain a frequency distribution with class boundaries and class midpoints.
• Construct bars on the horizontal axis, each centered at the class midpoint with width equal to the class width.
• The height of each bar should correspond to the respective class frequency.
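The construction steps can be sketched in Python for the hours-worked distribution built earlier in this chapter (boundaries 25.5 to 67.5, frequencies 7, 6, 9, 6, 4, 3, 5). The helper `histogram_bars` is hypothetical; it returns each bar's left edge, width, midpoint and height, and the text rendering is rotated 90 degrees so that each printed row is one bar.

```python
# Grouped frequency distribution of hours worked (from the 40-employee example).
boundaries = [25.5, 31.5, 37.5, 43.5, 49.5, 55.5, 61.5, 67.5]
freqs = [7, 6, 9, 6, 4, 3, 5]

def histogram_bars(boundaries, freqs):
    """Return (left edge, width, midpoint, height) for each bar.

    Adjacent bars share a class boundary, so there are no gaps between them."""
    bars = []
    for i, f in enumerate(freqs):
        left, right = boundaries[i], boundaries[i + 1]
        bars.append((left, right - left, (left + right) / 2, f))
    return bars

bars = histogram_bars(boundaries, freqs)
# Text rendering, rotated 90 degrees: one row per bar, '#' repeated f times.
for left, width, mid, height in bars:
    print(f"{left:>5}-{left + width:<5} (mid {mid:>4}) {'#' * height}")
```

With matplotlib installed, the same values could be drawn with `matplotlib.pyplot.bar(x, height, width=...)`, passing the midpoints as `x`, the frequencies as `height` and the class width as `width`; bars are centered on `x` by default.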
 Frequency polygon: a frequency polygon is a multi-sided figure in which the frequency is plotted against the class midpoint.
Steps
• Construct a histogram.
• Mark the midpoint of the top of each bar.
• Join these marks with straight lines.
• Extend the lines at both ends until they reach the horizontal axis at the midpoints of the (zero-frequency) classes just before the first class and just after the last class. This encloses the total area.
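For the same hours-worked distribution, the vertices of the frequency polygon are the (midpoint, frequency) pairs, closed with a zero-frequency point one class width beyond each end. A minimal Python sketch, assuming equal class widths (`polygon_points` is a hypothetical helper):

```python
# Class midpoints and frequencies of hours worked (from the 40-employee example).
midpoints = [28.5, 34.5, 40.5, 46.5, 52.5, 58.5, 64.5]
freqs = [7, 6, 9, 6, 4, 3, 5]

def polygon_points(midpoints, freqs):
    """Vertices of a frequency polygon, closed with zero-frequency points one class
    width beyond each end (assumes equal class widths)."""
    width = midpoints[1] - midpoints[0]
    xs = [midpoints[0] - width] + list(midpoints) + [midpoints[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

pts = polygon_points(midpoints, freqs)
for x, y in pts:
    print(x, y)
```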
E.g. Frequency distribution of age

 Cumulative frequency polygon (ogive): a graph obtained by plotting the cumulative frequencies of a distribution against the class boundaries used to form them (upper boundaries for a less-than ogive, lower boundaries for a more-than ogive).
E.g. Less-than and more-than cumulative frequency distributions of the time in minutes spent by automobile workers traveling from home to work
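The travel-time data for the example above are not reproduced in the text, so this Python sketch uses the hours-worked distribution from earlier in the chapter instead. It produces the points of the less-than ogive (plotted at upper boundaries) and the more-than ogive (plotted at lower boundaries); `less_than_ogive` and `more_than_ogive` are hypothetical helper names.

```python
# Class boundaries and frequencies of hours worked (from the 40-employee example).
boundaries = [25.5, 31.5, 37.5, 43.5, 49.5, 55.5, 61.5, 67.5]
freqs = [7, 6, 9, 6, 4, 3, 5]

def less_than_ogive(boundaries, freqs):
    """Points (upper boundary, cumulative frequency), starting at 0 on the first lower boundary."""
    points, total = [(boundaries[0], 0)], 0
    for upper, f in zip(boundaries[1:], freqs):
        total += f
        points.append((upper, total))
    return points

def more_than_ogive(boundaries, freqs):
    """Points (lower boundary, frequency of that class and all classes above it), ending at 0."""
    n, below = sum(freqs), 0
    points = []
    for lower, f in zip(boundaries[:-1], freqs):
        points.append((lower, n - below))
        below += f
    points.append((boundaries[-1], 0))
    return points

print("less than:", less_than_ogive(boundaries, freqs))
print("more than:", more_than_ogive(boundaries, freqs))
```

Joining each list of points with straight lines gives the two ogives; their values agree with the LCF and MCF columns of the grouped frequency table.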

Summary
Types of variable and tabular presentation:
• Qualitative: categorical frequency distribution
• Quantitative: ungrouped frequency distribution, grouped frequency distribution

11/13/2023 Method of data collection and presentation 83


Summary
Types of variable

Qualitative Quantitative

Diagrammatic and graphical presentation

Pei chart, bar


Discrete Continuous
chart

Histogram,
Pei chart, bar
frequency
chart
polygon, ogive
11/13/2023 Method of data collection and presentation 84
Thank you !!!

Debre Markos University
College of natural and computational Sciences
Department of Statistics

2. SUMMARIZING DATA


Objectives
At the end of this chapter students will be able to:
• Define measures of central tendency
• Explain the properties of measures of central tendency
• Summarize an aggregate of statistical data by a single measure
• Memorize the computational formulas for different measures of central tendency
• Know different positional measures such as quartiles, deciles and percentiles, and interpret them
• Determine the scatter (dispersion) of the data about the measures of central tendency


2.1. Measure of Central Tendency
Introduction
• One of the most important objectives of statistical analysis is to determine a single value that represents the entire mass of data.
• Such a value describes the overall level of a group of observations and can represent the whole data set.
• There are several such measures; here we discuss the most commonly used measures of central tendency: the mean, the median and the mode.
Introduction………
• Objectives of measures of central tendency
– To get a single value that represents (describes) the characteristics of the entire data
– To summarize (reduce the volume of) the data
– To facilitate comparison within one group or between groups of data
– To enable further statistical analysis


Introduction………
The Summation Notation (Σ)
Statistical symbols: let a data set consist of n observations, represented by x1, x2, ..., xn, where n (the last subscript) denotes the number of observations in the data and xi is the ith observation. Then their sum and their mean are
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n, \qquad \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Introduction………
• X – the whole set of numbers
• xi – a specific score in the set
• n – the number of observations in the subset (group)
• For instance, a data set consisting of the six measurements 21, 13, 54, 46, 32 and 37 is represented by x1, x2, x3, x4, x5 and x6, where x1 = 21, x2 = 13, x3 = 54, x4 = 46, x5 = 32 and x6 = 37.
• Their sum is
$$\sum_{i=1}^{6} x_i = 21 + 13 + 54 + 46 + 32 + 37 = 203.$$
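The summation and the mean for the six measurements can be checked directly; this small sketch just mirrors the Σ notation with Python's built-in `sum`.

```python
# The six measurements x1..x6 from the example above
x = [21, 13, 54, 46, 32, 37]

total = sum(x)          # sum_{i=1}^{6} x_i
n = len(x)
mean = total / n        # x-bar = (sum of x_i) / n

print(total, n, round(mean, 2))
```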
Introduction………
Some Properties of the Summation Notation
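The slides' own list of properties is not recoverable from this copy. For reference, the standard properties of the summation notation are stated below, where c is any constant and x1, …, xn and y1, …, yn are two sets of n observations:

$$\sum_{i=1}^{n} c = nc, \qquad \sum_{i=1}^{n} c\,x_i = c\sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$$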



Introduction………
Important Characteristics of a Good Average
A typical average should possess the following properties:
• It should be rigidly defined.
• It should be based on all the observations under investigation.
• It should be as little affected as possible by extreme observations.
• It should be capable of further algebraic treatment.
• It should be as little affected as possible by fluctuations of sampling.
• It should be easy to calculate and simple to understand.
Mean
• Among the types of means, we discuss four, each suitable for a particular type of data:
– arithmetic mean
– weighted mean
– geometric mean
– harmonic mean


Arithmetic mean

The arithmetic mean, usually abbreviated to ‘mean’ is the sum of the observations divided by the number of observations.

If x1, x2, ..., xn are the n observed values of a sample, then
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$


A.M…….
• Suppose the data are given as an ungrouped frequency distribution, with frequencies f1, f2, f3, …, fn associated with the values x1, x2, x3, …, xn of the variable, respectively.
• The sum of all values equals f1x1 + f2x2 + f3x3 + … + fnxn, and the total number of items is f1 + f2 + f3 + … + fn. Hence
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
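A minimal sketch showing that the frequency formula gives the same result as averaging the raw list in which each value is repeated fi times. The distribution used is hypothetical.

```python
def mean(xs):
    """Arithmetic mean of ungrouped data: sum(x_i) / n."""
    return sum(xs) / len(xs)

def freq_mean(values, freqs):
    """Mean of an ungrouped frequency distribution: sum(f_i x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

# Hypothetical distribution: value 2 occurs 4 times, 3 once, 5 five times
values, freqs = [2, 3, 5], [4, 1, 5]
raw = [x for x, f in zip(values, freqs) for _ in range(f)]

print(freq_mean(values, freqs), mean(raw))   # both formulas agree
```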


A.M
• Arithmetic mean of grouped data
– In the ungrouped case, the exact value of each item is known.
– However, when the data are grouped into a frequency distribution with finite class intervals, we do not know the value of every item.
– Instead, the observations in each class are represented by the class mark (midpoint) of the class interval.
A.M
• For a grouped frequency distribution with k classes, in which the ith class has class mark xi and frequency fi (i = 1, 2, 3, …, k), the mean is
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
• where
k = the number of class intervals
xi = the midpoint of the ith class interval
fi = the frequency of the ith class interval
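A sketch of the grouped-data mean: each class is replaced by its class mark, and the frequency formula is applied to the marks. The age classes and frequencies are hypothetical, not the table from the example that follows.

```python
def grouped_mean(lower_limits, upper_limits, freqs):
    """Mean of grouped data: each class is represented by its class mark."""
    marks = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]
    return sum(f * m for m, f in zip(marks, freqs)) / sum(freqs)

# Hypothetical age classes 10-19, 20-29, 30-39, 40-49
lower = [10, 20, 30, 40]
upper = [19, 29, 39, 49]
freq = [5, 12, 8, 5]
print(round(grouped_mean(lower, upper, freq), 2))
```

Because the marks stand in for the unknown individual values, the result is an approximation of the true mean of the raw data.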


Example:
Calculate the mean for the following age distribution.

Solution


A.M ……..
Combined mean: if we have arithmetic means x̄1, x̄2, …, x̄k of k groups measured in the same unit and based on n1, n2, …, nk observations respectively, we can compute the combined mean of all the values taken together from the individual means by the formula
$$\bar{x}_{com} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$
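A quick check of the combined-mean formula: computing it from the group sizes and group means gives the same value as the mean of the pooled raw data. The two groups are hypothetical.

```python
def combined_mean(sizes, means):
    """Combined mean of k groups: sum(n_i * xbar_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Hypothetical groups: check against the mean of the pooled raw data
g1 = [48, 50, 52]          # n1 = 3, mean 50
g2 = [55, 60, 65, 60]      # n2 = 4, mean 60
pooled = g1 + g2
cm = combined_mean([len(g1), len(g2)], [sum(g1) / len(g1), sum(g2) / len(g2)])
print(cm, sum(pooled) / len(pooled))
```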


Class exercise
1. The net weights (in grams) of the contents of five PC screens selected at random from the production line are 85.4, 85.3, 84.9, 85.4 and 85.0. What is the arithmetic mean weight of the sample observations?
2. Calculate the mean of the marks of the 57 students given below.


Properties of A.M

• The mean can be used as a summary measure for both discrete and continuous data; in general, however, it is not appropriate for nominal or ordinal data.
• For a given set of data there is one and only one arithmetic mean.
• The algebraic sum of the deviations of the given values from their arithmetic mean is always zero.
• The arithmetic mean is greatly affected by extreme values.
• In grouped data, if any class interval is open-ended, the arithmetic mean cannot be calculated.
• If we transform the original observations x1, x2, x3, …, xn to y1 = ax1 ± b, y2 = ax2 ± b, …, yn = axn ± b, then the mean of the transformed values is ȳ = ax̄ ± b.
• It is easy to calculate and understand.
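Two of the properties above can be checked numerically on the six measurements used earlier: the deviations from the mean sum to zero (up to floating-point rounding), and a linear transformation shifts and scales the mean in the same way. The constants a = 3 and b = 4 are arbitrary choices for the illustration.

```python
x = [21, 13, 54, 46, 32, 37]
xbar = sum(x) / len(x)

# Property: the deviations from the mean sum to zero (up to rounding error)
dev_sum = sum(xi - xbar for xi in x)

# Property: for y_i = a*x_i + b the mean is a*xbar + b (a, b chosen arbitrarily here)
a, b = 3, 4
y = [a * xi + b for xi in x]
ybar = sum(y) / len(y)

print(round(dev_sum, 10), ybar, a * xbar + b)
```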


Properties A.M….
• If a wrong figure has been used when calculating the mean, the correct mean can be obtained without repeating the whole process, using
$$\text{Correct mean} = \frac{n\bar{x} - (\text{wrong value}) + (\text{correct value})}{n}$$
• Example: the average weight of 10 students was calculated to be 65 kg. Later it was discovered that one weight was misread as 40 kg instead of 80 kg. Calculate the correct average weight.
• Ans. (10 × 65 − 40 + 80)/10 = 690/10 = 69 kg
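The correction rule amounts to rebuilding the total that was actually used, swapping the wrong value for the correct one, and dividing by n again. A minimal sketch (the function name is illustrative):

```python
def corrected_mean(n, wrong_mean, wrong_value, correct_value):
    """Recover the correct mean after one value was recorded wrongly."""
    wrong_total = n * wrong_mean                    # sum actually used
    return (wrong_total - wrong_value + correct_value) / n

print(corrected_mean(10, 65, 40, 80))   # the 10-student example
```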
Classwork
1. The mean weight of n tetracycline capsules X1, X2, …, Xn is known to be 12 g. A new set of capsules of another drug is obtained by the linear transformation Yi = 2Xi − 0.5 (i = 1, 2, …, n). What will be the mean of the new set of capsules?
2. In a class there are 30 females and 70 males. If the females averaged 60 in an examination and the males averaged 72, find the mean for the entire class.


Weighted mean
• In computing the arithmetic mean we gave equal importance to each observation.
• Sometimes the individual values in the data are not equally important.
• When this is the case, we assign to each value a weight proportional to its relative importance and calculate the weighted mean.
• The weighted mean of a set of values x1, x2, x3, …, xn with corresponding weights w1, w2, …, wn is denoted by x̄w and computed by
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$


• The calculation of the cumulative grade point average (CGPA) is a good example of a weighted mean.
E.g. A student scores "A" in a 3-credit-hour course, "B" in a 4-credit-hour course, "C" in another 4-credit-hour course and "D" in a 2-credit-hour course. If the numerical values of the letter grades are A = 4, B = 3, C = 2 and D = 1, compute his/her GPA for the semester.
Solution
GPA = (4×3 + 3×4 + 2×4 + 1×2)/(3 + 4 + 4 + 2) = 34/13 ≈ 2.62
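The GPA example as a weighted mean, with grade points as the values and credit hours as the weights. The grade-point mapping is the one given in the example.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = {"A": 4, "B": 3, "C": 2, "D": 1}
courses = [("A", 3), ("B", 4), ("C", 4), ("D", 2)]   # (letter grade, credit hours)

gpa = weighted_mean([grade_points[g] for g, _ in courses],
                    [ch for _, ch in courses])
print(round(gpa, 2))
```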


Geometric mean (G.M)
• If the observed values are measured as ratios, proportions or percentages, the geometric mean gives a better measure of central tendency than the other means.
• The geometric mean of n positive values is defined as the nth root of their product.
• That is, if all the given observations x1, x2, x3, …, xn are positive, then
$$G.M = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$$
• If the observed values x1, x2, x3, …, xk have corresponding frequencies f1, f2, f3, …, fk, then
$$G.M = \left(x_1^{f_1} \cdot x_2^{f_2} \cdot x_3^{f_3} \cdots x_k^{f_k}\right)^{1/n}, \quad n = f_1 + f_2 + \cdots + f_k$$


• For grouped data, the class marks m1, m2, …, mk of the class intervals are used as the xi:
$$G.M = \left(m_1^{f_1} \cdot m_2^{f_2} \cdots m_k^{f_k}\right)^{1/n}$$
where n is the sum of the frequencies.

Example 1: A man gets three annual raises in his salary. At the end of the first year he gets an increase of 3%, at the end of the second year an increase of 6%, and at the end of the third year an increase of 9%. What is the average percentage increase over the three periods?
• Solution: G.M = (1.03 × 1.06 × 1.09)^(1/3) ≈ 1.0597, so 1.0597 − 1 = 0.0597.
Therefore, the average percentage increase is about 5.97%.


• Example 2: Compute the geometric mean of the following values: 2, 8, 6, 4, 10, 6, 8, 4
Solution: The total number of observations is 8. The values 2, 4, 6, 8 and 10 occur with frequencies 1, 2, 2, 2 and 1, so we use the frequency-distribution formula:
G.M = (2^1 · 4^2 · 6^2 · 8^2 · 10^1)^(1/8) = (737280)^(1/8) ≈ 5.41
• The geometric mean is usually computed through logarithms.
• We express the above formula in terms of logarithms (base ten).
• Taking logarithms of the formula gives
$$\log(G.M) = \log(x_1 x_2 x_3 \cdots x_n)^{1/n} = \frac{1}{n}\log(x_1 x_2 x_3 \cdots x_n) = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n)$$
So G.M = antilog[(1/n)(log x1 + log x2 + ⋯ + log xn)]


$$G.M = \text{antilog}\left(\frac{\sum_{i=1}^{n} \log x_i}{n}\right)$$
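A sketch of the logarithmic form, using base-10 logs as on the slides, applied to both examples. Python's standard `statistics.geometric_mean` (3.8+) returns the same values and can be used as a cross-check.

```python
import math

def geometric_mean(xs):
    """G.M = antilog( sum(log10 x_i) / n ); all x_i must be positive."""
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs positive values")
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))

# Example 1: growth factors for raises of 3%, 6% and 9%
g1 = geometric_mean([1.03, 1.06, 1.09])
print(round(g1, 4), f"{(g1 - 1) * 100:.2f}%")

# Example 2: the eight values
g2 = geometric_mean([2, 8, 6, 4, 10, 6, 8, 4])
print(round(g2, 2))
```

Note that the geometric mean of the growth factors is slightly below the arithmetic mean of the rates (6%), as it always is unless all rates are equal.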


G.M………
• Geometric mean formula for a frequency distribution (n = f1 + f2 + ⋯ + fk):
$$\log(G.M) = \log\left(x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}\right)^{1/n} = \frac{1}{n}\left(\log x_1^{f_1} + \log x_2^{f_2} + \cdots + \log x_k^{f_k}\right) = \frac{1}{n}\left(f_1\log x_1 + f_2\log x_2 + \cdots + f_k\log x_k\right)$$
So
$$G.M = \text{antilog}\left(\frac{\sum_{i=1}^{k} f_i \log x_i}{n}\right)$$
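The frequency form can be sketched the same way; here Example 2 is written as a frequency distribution, and the log-based result is compared with the direct 8th root of the product.

```python
import math

def geometric_mean_freq(values, freqs):
    """G.M = antilog( sum(f_i * log10 x_i) / n ), with n = sum(f_i)."""
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)

# Example 2 written as a frequency distribution
g = geometric_mean_freq([2, 4, 6, 8, 10], [1, 2, 2, 2, 1])
print(round(g, 2), round(737280 ** (1 / 8), 2))   # log form vs direct root
```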


Harmonic mean (H.M)

Another important mean is the harmonic mean, which is a suitable measure of central tendency when the data pertain to speeds, rates and times.
Let x1, x2, x3, …, xn be n values in a set of observations; then the harmonic mean is given by
$$H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$


H.M………
The following is a good example in which the harmonic mean is appropriate.
• Example: A motorist travels 480 km per day for three days. On the first day he travels for 10 hours at 48 km/h, on the second day for 12 hours at 40 km/h, and on the third day for 15 hours at 32 km/h. What is his average speed?
• Because the same distance is covered at each rate, the average speed is the harmonic mean of the three rates:
H.M = 3/((1/48) + (1/40) + (1/32)) ≈ 38.92, so the required average speed is 38.92 km/h.
(Check: total distance/total time = 1440 km/37 h ≈ 38.92 km/h.)
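A sketch confirming that, for equal distances, the harmonic mean of the speeds equals total distance divided by total time. Speeds and hours are those of the motorist example.

```python
def harmonic_mean(xs):
    """H.M = n / sum(1 / x_i); all x_i must be non-zero."""
    return len(xs) / sum(1 / x for x in xs)

speeds = [48, 40, 32]          # km/h on each day
hours = [10, 12, 15]           # hours driven on each day

hm = harmonic_mean(speeds)
overall = sum(s * h for s, h in zip(speeds, hours)) / sum(hours)   # total km / total h
print(round(hm, 2), round(overall, 2))
```

The arithmetic mean of the speeds (40 km/h) would overstate the average, because the slower days take more hours.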
H.M………

If the data are arranged as a frequency distribution in which observation xi has frequency fi (i = 1, 2, 3, …, k), the harmonic mean is given by
$$H.M = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k} \frac{f_i}{x_i}}$$
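The frequency form of the harmonic mean, sketched with hypothetical data (two equal-length legs at 60 km/h and one at 30 km/h); the standard `statistics.harmonic_mean` applied to the expanded list gives the same value.

```python
import statistics

def harmonic_mean_freq(values, freqs):
    """H.M = sum(f_i) / sum(f_i / x_i)."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

# Hypothetical: two equal-length legs driven at 60 km/h and one at 30 km/h
hm = harmonic_mean_freq([60, 30], [2, 1])
print(hm, statistics.harmonic_mean([60, 60, 30]))
```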


Mode
• Any value of a variable at which the distribution reaches a peak is called a mode.
• Most distributions encountered in practice have one peak and are described as unimodal.
• E.g. Consider the following ten numbers:
19 21 20 20 34 22 24 27 27 27
In this data set the mode is 27, because it occurs most often (three times).
• For grouped data, we first locate the modal class (the class interval with the highest frequency).
• Having located the modal class, the next problem is to interpolate the value of the mode within this modal class.
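A sketch of both cases. For ungrouped data it returns every most-frequent value, since the mode need not be unique. The slides' own grouped-mode formula is not recoverable from this copy, so the grouped function assumes the common interpolation formula Mode = L + Δ1/(Δ1 + Δ2) × w, where Δ1 and Δ2 are the differences between the modal frequency and the frequencies of the preceding and following classes; the grouped table is hypothetical.

```python
from collections import Counter

def modes(xs):
    """All values with the highest frequency (may be more than one)."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def grouped_mode(lower_boundaries, width, freqs):
    """Assumed interpolation formula: L + d1 / (d1 + d2) * w."""
    m = freqs.index(max(freqs))                       # modal class (first if tied)
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1, d2 = freqs[m] - before, freqs[m] - after
    return lower_boundaries[m] + d1 / (d1 + d2) * width

print(modes([19, 21, 20, 20, 34, 22, 24, 27, 27, 27]))
print(grouped_mode([9.5, 19.5, 29.5, 39.5], 10, [5, 12, 20, 8]))   # hypothetical table
```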
Mode………
• Properties of mode
– The mode can be used as a summary measure for nominal, ordinal, discrete and continuous data; in general, however, it is more appropriate for nominal and ordinal data.
– It is not affected by extreme values.
– It can be calculated for distributions with open-ended classes.
– Sometimes its value is not unique.
– The main drawback of the mode is that it may not exist.
Class exercise

1. The gender composition of the class is as follows:
Gender      Male   Female
Frequency   38     30
2. The students' quiz scores out of 5 are as follows:
3 5 4 4 5 5 3 4 5 3 4 3 3
Find the mode for questions 1 and 2.
Median
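The median is the middle value in the sorted list.

Median of ungrouped data: the median is found by arranging the data in order of magnitude; it is then the value of the middle term.
Let x(1) ≤ x(2) ≤ … ≤ x(n) be the n ordered observations. Then the median is
$$\text{Median} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right) & \text{if } n \text{ is even} \end{cases}$$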

Median………
Example 1: Suppose the sales commissions of 15 representatives were as follows:
23, 16, 31, 77, 21, 14, 32, 6, 155, 9, 36, 24, 5, 27, 19
Solution: Placing the data in order of magnitude, we have
5, 6, 9, 14, 16, 19, 21, 23, 24, 27, 31, 32, 36, 77, 155
• The median is the value of the middle term, the ((15 + 1)/2) = 8th value, which is 23.
Example 2: There are six items with values 25, 29, 30, 32, 35, 65. What is the median value?
The median is the arithmetic mean of the two central observations, 30 and 32:
Median = (30 + 32)/2 = 31
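A minimal sketch of the odd/even rule above, checked against both examples. Python lists are 0-based, so the ((n+1)/2)th value sits at index n // 2.

```python
def median(xs):
    """Median of ungrouped data: middle term (n odd) or mean of the two middle terms (n even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # the ((n+1)/2)th value
    return (s[mid - 1] + s[mid]) / 2     # mean of the (n/2)th and (n/2 + 1)th values

commissions = [23, 16, 31, 77, 21, 14, 32, 6, 155, 9, 36, 24, 5, 27, 19]
print(median(commissions))                   # Example 1
print(median([25, 29, 30, 32, 35, 65]))      # Example 2
```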


Median………
• In some cases ungrouped data may be given in the form of a frequency distribution.
• Suppose the ordered values x1, x2, x3, …, xk have corresponding frequencies f1, f2, …, fk, with n = f1 + f2 + ⋯ + fk.
• Construct the less-than cumulative frequencies (LCF).
• The less-than cumulative frequency of a value tells us how many observations are less than or equal to that value.
• The median is the value whose less-than cumulative frequency is the first to reach or exceed the median position (n + 1)/2.
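A sketch of the LCF search: the kth ordered observation is the first value whose LCF is at least k, which `bisect_left` finds on the cumulative list. For even n it averages the two middle positions. The table is the one used in the quartile example of this chapter.

```python
from bisect import bisect_left
from itertools import accumulate

def value_at(values, freqs, k):
    """k-th smallest observation (1-based) of a frequency table, via the LCF."""
    lcf = list(accumulate(freqs))
    return values[bisect_left(lcf, k)]      # first value whose LCF >= k

def median_freq(values, freqs):
    n = sum(freqs)
    if n % 2 == 1:
        return value_at(values, freqs, (n + 1) // 2)
    return (value_at(values, freqs, n // 2) + value_at(values, freqs, n // 2 + 1)) / 2

x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
f = [1, 9, 26, 59, 72, 52, 29, 6, 1]   # table from the quartile example
print(median_freq(x, f))
```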


Median………
• Median of grouped data: in grouped data, the raw data have been organized into a frequency distribution with finite class intervals.
• Some of the information is lost; that is, we do not know the value of every item.
• But we can locate the middle observation by dividing the total number of observations (n) by 2.
• The class corresponding to the smallest LCF that is ≥ n/2 is called the median class.


Median………
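• Steps for calculating the median of grouped data
– Construct the LCF table.
– To determine the median class, divide the total number of observations by 2 and then search for the smallest LCF that is ≥ n/2.
– Apply the formula for the median of grouped data:
$$\text{Median} = L_{cb} + \frac{\frac{n}{2} - lcf_p}{f_m} \times w$$
where Lcb is the lower class boundary of the median class, w is the class width, fm is the frequency of the median class, and lcfp is the LCF of the class immediately preceding the median class.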




Median………
• To obtain the median class we divide 50 by 2.
• Thus n/2 = 50/2 = 25. The smallest LCF that is ≥ 25 is 34. Thus, the median class is 100-109, the class corresponding to LCF 34, so the 25th bulb is in the 100-109 class.
• Now, to use the above formula we need Lcb, w, fm and lcfp (the LCF of the class immediately preceding the median class): Lcb = 99.5, w = 10, fm = 20 and lcfp = 14.
• Median = Lcb + ((n/2 − lcfp)/fm) × w = 99.5 + ((50/2 − 14)/20) × 10 = 99.5 + 5.5 = 105
• Therefore, the median daily production is 105.
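A sketch of the grouped-median steps. The full frequency table of the example is not reproduced on these slides, so the table below is hypothetical, chosen only to match the quantities quoted above (n = 50, lcfp = 14, fm = 20, Lcb = 99.5, w = 10).

```python
from itertools import accumulate

def grouped_median(lower_boundaries, width, freqs):
    n = sum(freqs)
    lcf = list(accumulate(freqs))
    m = next(i for i, c in enumerate(lcf) if c >= n / 2)   # median class
    lcf_prev = lcf[m - 1] if m > 0 else 0                    # LCF of the preceding class
    return lower_boundaries[m] + (n / 2 - lcf_prev) * width / freqs[m]

# Hypothetical table, chosen only to match n = 50, lcf_p = 14, f_m = 20 above
lower = [79.5, 89.5, 99.5, 109.5, 119.5]
freq = [4, 10, 20, 12, 4]
print(grouped_median(lower, 10, freq))
```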


Median………
Properties of median
• The median can be used as a summary measure for ordinal, discrete and continuous data; in general, however, it is not appropriate for nominal data.
• There is only one median for a given set of data.
• The median is a positional average and hence is not affected by extreme values (it is robust, or resistant, to extreme values).
• The median can be calculated even when there are open-ended intervals.
• It is not a good representative of the data if the number of items is small.


Class exercise: compute the median
• The following data give the number of PCs infected by a virus, by category of severity, among those treated at Debre Markos University.
Category    Moderate   Mild   Severe
Frequency   20         10     12
• The students' test scores out of 10 are arranged in the following ungrouped frequency distribution table:
Mark        8   4   5   7    9    10
Frequency   6   7   5   11   11   6


Measures of Non-central Location
• These measures divide a given ordered set of data into equal subdivisions.
• They are averages of position (measures of non-central tendency).
• Some of these are quartiles, deciles and percentiles.


Quartiles
• Quartiles are values that divide the data set into four equal parts.
• The first quartile is also called the lower quartile and the third quartile the upper quartile; the second quartile is the median. Qi is the (i(n+1)/4)th observation of the ordered data, i = 1, 2, 3.
• If i(n+1)/4 is not a whole number, write it as w + d, where w is its whole-number part and d its decimal part; then
$$Q_i = x_{(w)} + d\left(x_{(w+1)} - x_{(w)}\right)$$
[Figure: Q1 has 25% of the observations below it and 75% above; Q2 has 50% below and 50% above; Q3 has 75% below and 25% above.]
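A sketch of the (i(n+1)/4)th-position rule with interpolation for the decimal part, applied to the two median examples. Clamping the position to [1, n] is an added assumption for very small samples; other software (for example `numpy.percentile` with its default method) uses different position rules and can give slightly different quartiles.

```python
import math

def quartile(xs, i):
    """Q_i as the (i(n+1)/4)th ordered observation, interpolating when the
    position has a decimal part; positions are clamped to [1, n]."""
    s = sorted(xs)
    n = len(s)
    pos = min(max(i * (n + 1) / 4, 1), n)
    w = math.floor(pos)            # whole-number part
    d = pos - w                    # decimal part
    if d == 0:
        return s[w - 1]
    return s[w - 1] + d * (s[w] - s[w - 1])

commissions = [23, 16, 31, 77, 21, 14, 32, 6, 155, 9, 36, 24, 5, 27, 19]
print([quartile(commissions, i) for i in (1, 2, 3)])
print([quartile([25, 29, 30, 32, 35, 65], i) for i in (1, 2, 3)])
```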


Quartiles………
• Calculation of quartiles from grouped data
• For grouped data, the three quartiles can be computed as follows:
– Calculate (i × n)/4 and search for the minimum LCF that is ≥ (i × n)/4, i = 1, 2, 3.
– The class corresponding to this LCF is called the ith quartile class; this is the class where Qi lies.
– The value of the ith quartile is then calculated by the formula
$$Q_i = L_{cb} + \frac{\frac{i\,n}{4} - lcf_p}{f_q} \times w$$
where Lcb is the lower class boundary of the ith quartile class, fq is its frequency, lcfp is the LCF of the class immediately preceding it, and w is the class width.
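A sketch of the grouped-quartile steps, applied to a hypothetical table (class boundaries 79.5 to 129.5, width 10). With i = 2 the function reduces to the grouped median formula.

```python
from itertools import accumulate

def grouped_quartile(lower_boundaries, width, freqs, i):
    """Q_i = Lcb + (i*n/4 - lcf_p) * w / f_q for grouped data."""
    n = sum(freqs)
    target = i * n / 4
    lcf = list(accumulate(freqs))
    q = next(j for j, c in enumerate(lcf) if c >= target)   # ith quartile class
    lcf_prev = lcf[q - 1] if q > 0 else 0
    return lower_boundaries[q] + (target - lcf_prev) * width / freqs[q]

lower = [79.5, 89.5, 99.5, 109.5, 119.5]   # hypothetical class boundaries, width 10
freq = [4, 10, 20, 12, 4]
print([round(grouped_quartile(lower, 10, freq, i), 2) for i in (1, 2, 3)])
```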


Deciles

Deciles are the nine points that divide the ordered data into 10 equal parts.

Percentiles

Percentiles are the values that divide the ordered data into 100 equal parts.


Example
• Calculate Q1, Q2, Q3, D5, D7, P50 and P80 for the following table.
Value (x)        0   1   2    3    4    5    6    7   8
Frequency (f)    1   9   26   59   72   52   29   6   1
• Solution
The data are already arranged in increasing order, so we need only construct the cumulative frequency table before calculating the required values.
Value (x)                    0   1    2    3    4     5     6     7     8
Frequency (f)                1   9    26   59   72    52    29    6     1
Cumulative frequency (LCF)   1   10   36   95   167   219   248   254   255
• The total number of observations is 255. Clearly, then, the median is 4, because the median is the ((255 + 1)/2) = 128th observation and the first LCF that is ≥ 128 is 167, which corresponds to x = 4.
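The slides' worked solution for this example is not recoverable from this copy. As a sketch, the values below apply the (i(n+1)/4)th-position rule stated for quartiles, and assume the analogous (i(n+1)/10)th and (i(n+1)/100)th positions for deciles and percentiles; the kth ordered observation is found by searching the LCF row.

```python
import math
from bisect import bisect_left
from itertools import accumulate

x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
f = [1, 9, 26, 59, 72, 52, 29, 6, 1]
lcf = list(accumulate(f))
n = lcf[-1]                                   # 255 observations

def obs(k):
    """k-th ordered observation (1-based): first value whose LCF >= k."""
    return x[bisect_left(lcf, k)]

def position_value(i, parts):
    """Value at position i(n+1)/parts: parts=4 quartiles, 10 deciles, 100 percentiles
    (the decile/percentile positions are an assumed extension of the quartile rule)."""
    pos = min(max(i * (n + 1) / parts, 1), n)
    w, d = math.floor(pos), pos - math.floor(pos)
    return obs(w) if d == 0 else obs(w) + d * (obs(w + 1) - obs(w))

results = {
    "Q1": position_value(1, 4), "Q2": position_value(2, 4), "Q3": position_value(3, 4),
    "D5": position_value(5, 10), "D7": position_value(7, 10),
    "P50": position_value(50, 100), "P80": position_value(80, 100),
}
print(results)
```

As expected, Q2, D5 and P50 all coincide with the median.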
Class exercise


Measure of Variation

Objectives
After studying this lesson, you should be able to:
• Explain the meaning of measures of dispersion.
• State and explain the objectives of measures of dispersion.
• Explain the difference between measures of central tendency and measures of variation.
• Interpret measures of dispersion.
• Compare the variation of two distributions using the coefficient of variation.
• Apply the Z-score to find the relative standing of values.
Introduction
• Measures of central tendency locate the center of the distribution.
• But they do not tell how the individual observations are scattered on either side of the center.
• The spread of the observations around the center is known as dispersion or variability.


Objectives of measures of variation
• To measure the reliability of the average being used.
• To control variation in a product.
• To compare variability among two or more groups.


Absolute and Relative measures

• Measures of dispersion can be classified into absolute and relative forms.
• Absolute measures of dispersion are expressed in concrete units, that is, the units in which the data have been expressed.
• A relative measure of dispersion is a pure number and is usually expressed in percentage form.
– Relative measures are used for making comparisons between two or more distributions.
Types of measure of variation
• Range: the difference between the highest and the smallest observation.
• For ungrouped data
R = XL − XS
where XL is the largest observation and XS is the smallest observation.
• For grouped data
R = UCBL − LCBF
where UCBL is the upper class boundary of the last class and LCBF is the lower class boundary of the first class.
Range………
• It is the simplest measure to compute; it takes into account only the two extreme values of the data.
• The other values play no role. Because of this, the range is considered a rough measure of variation.


Relative range

• The range is an absolute measure of dispersion and cannot be used to compare the variability of two distributions expressed in different units.
• Dispersion measured in kg is not comparable with dispersion measured in centimeters.
• The solution is to use the relative range or another relative measure of variation.
• Relative range = Range/(XL + XS)
Relative range………
Example: consider the weekly earnings of workers in two laboratories of the same type.

Laboratory A (in birr)    Laboratory B (in dollars)


Relative range………
• We cannot use the range to compare the variability because the units are different.
• We have to use the relative range.
• R.R. for laboratory A = Range/(XL + XS) = 9/(30 + 21) = 9/51 ≈ 0.176
• R.R. for laboratory B = Range/(XL + XS) = 15/(32 + 17) = 15/49 ≈ 0.306
• Therefore, the variation is higher in laboratory B.
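A sketch of the range and relative range. Only the extreme earnings of each laboratory are given in the example, so each list below holds just those two values; any values in between would not change either measure.

```python
def value_range(xs):
    return max(xs) - min(xs)

def relative_range(xs):
    """Relative range: (XL - XS) / (XL + XS), a unit-free number."""
    return value_range(xs) / (max(xs) + min(xs))

lab_a = [21, 30]    # birr: smallest and largest weekly earnings
lab_b = [17, 32]    # dollars: smallest and largest weekly earnings
print(round(relative_range(lab_a), 3), round(relative_range(lab_b), 3))
```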
Range………
• Advantages of the range
– It is easy to calculate.
– It is easy to understand.
• Limitations of the range
– It is affected by extreme values.
– It cannot be computed when the distribution has open-ended classes.
– It does not take the entire data set into account.
– It does not tell anything about the distribution of the values relative to a measure of central tendency.
Variance
• The variance is a measure of dispersion.
• The variance uses the distances of the values from their mean.
• It tells us about the scatter of the scores around the mean.
• If the values are grouped close to the mean, the variance will be small.


Variance………
• Steps used to calculate the variance:
1. Find the arithmetic mean.
2. Find the difference of each observation from the mean.
3. Square these differences.
4. Sum the squared differences.
5. Divide the number from step 4 by the number of observations.
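The five steps translate directly into code. The data set is illustrative only. Dividing by n, as in step 5, gives the population variance and matches `statistics.pvariance`; the sample variance `statistics.variance` divides by n − 1 instead.

```python
import statistics

def variance(xs):
    """Variance following the five steps: divide the sum of squared deviations by n."""
    n = len(xs)
    xbar = sum(xs) / n                     # step 1: arithmetic mean
    deviations = [x - xbar for x in xs]    # step 2: difference from the mean
    squares = [d * d for d in deviations]  # step 3: square them
    total = sum(squares)                   # step 4: sum them
    return total / n                       # step 5: divide by n

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values, mean 5
print(variance(data), statistics.pvariance(data))
```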


Standard Deviation
• There is a problem with the variance: recall that the deviations were squared.
• That means the units were also squared.
• To express the result in the same units as the original data values, we take the square root; the result is the standard deviation.
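A sketch of the standard deviation as the square root of the divide-by-n variance, so the result is back in the data's own units. The lengths are hypothetical; `statistics.pstdev` is the standard-library equivalent.

```python
import math
import statistics

def std_dev(xs):
    """Standard deviation: square root of the (divide-by-n) variance, in the data's own units."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)

lengths_cm = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical; variance is in cm^2, SD in cm
print(std_dev(lengths_cm), statistics.pstdev(lengths_cm))
```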




Coefficient of variation

• The coefficient of variation is used in problems where we want to compare the variability of two or more different series.
• The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, usually expressed as a percentage:
$$CV = \frac{\text{standard deviation}}{\text{arithmetic mean}} \times 100\%$$
• A distribution with a smaller coefficient of variation is said to be less variable, or more consistent, uniform or homogeneous.
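A sketch comparing two hypothetical series by their coefficients of variation. Using the population standard deviation here is an assumption; with the sample standard deviation both CVs change slightly, but the comparison works the same way.

```python
import statistics

def coefficient_of_variation(xs):
    """CV = standard deviation / mean * 100 (population SD used here, an assumption)."""
    return statistics.pstdev(xs) / statistics.mean(xs) * 100

series_a = [2, 4, 4, 4, 5, 5, 7, 9]     # mean 5, SD 2, so CV = 40%
series_b = [98, 100, 100, 102]          # hypothetical, mean 100
cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(round(cv_a, 2), round(cv_b, 2))
print("B is more consistent" if cv_b < cv_a else "A is more consistent")
```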


Standard Scores (Z-Scores)
• A standard score describes the relative position of a single score in the entire distribution of scores in terms of the mean and standard deviation.
• It gives the number of standard deviations by which a particular observation lies above or below the mean:
$$Z = \frac{x - \text{mean}}{\text{standard deviation}}$$


Example
1. Two sections were given an exam in a course. The average score was 72 with a standard deviation of 6 for section 1, and 85 with a standard deviation of 5 for section 2. Student A from section 1 scored 84 and student B from section 2 scored 90. Who performed better relative to his/her group?
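The comparison can be sketched by converting each raw score to a Z-score within its own section; the larger Z-score marks the better relative performance.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_a = z_score(84, 72, 6)   # student A, section 1
z_b = z_score(90, 85, 5)   # student B, section 2
print(z_a, z_b)
print("A" if z_a > z_b else "B", "performed better relative to his/her group")
```

Student B has the higher raw score, but student A lies two standard deviations above the section mean against one for B.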


Thank You for Your Attention!



Chapter-5

Elementary probability

Objectives

• After studying this chapter, you should be able to:
– Define some basic terms of probability.
– Apply the principles of counting techniques to solve real problems.
– Explain the basic concepts of probability.
Introduction

• Probability (P) is a numerical description of the chance of occurrence of a given phenomenon under certain conditions.
• It is used to measure the degree of certainty.
• Probability theory is part of our everyday life.
– We may hear a doctor say that a patient has a 50% chance of survival.
– A meteorologist may predict heavy rain with an 80% chance.
• Probability theory is concerned with the study of random (or chance) phenomena.
Definition of some probability terms

• Random experiment: a process that leads to well-defined results called outcomes.
– Example: tossing a coin two times and observing the number of heads appearing on top.
• An outcome is the result of a single trial of a random experiment.
– Example: when a coin is tossed, there are two outcomes, i.e. H and T.
• Sample space (S): the set of all possible outcomes of a random experiment.
– Example: rolling a die, S = {1, 2, 3, 4, 5, 6}; the number of outcomes is n(S) = 6.
Definition of some probability terms...

• Event: a subset of the sample space; it consists of one or more outcomes of a random experiment.
– Example: getting an odd number in rolling a die.
– Solution: let A be the event of getting an odd number; A = {1, 3, 5}.
• Complement of an event: the set of outcomes in the sample space that are not included in the event.
– The complement of E is denoted by E′.
– Example: a) Find the complement of the event of getting a 4 in rolling a die.
b) Find the complement of the event of getting all heads when tossing two coins.
Definition of some probability terms...
• Mutually exclusive events: two events are mutually exclusive if they cannot occur at the same time (i.e. they have no outcome in common).
– Example: the events of getting a 4 and getting a 6 when a single card is drawn from a deck are mutually exclusive, since a single card cannot be both a 4 and a 6.
• Equally likely events: events that have the same probability of occurring.
– Example: when a single die is rolled, each outcome has the same probability, 1/6.
Definition of some probability terms...
• Independent events: two events A and B are independent if the occurrence of A does not affect the probability of the occurrence of B.
– Example: rolling a die and getting a 6, and then rolling a second die and getting a 3.
• Dependent events: the occurrence of the first event affects the probability of the occurrence of the second event.
– Example: drawing a card from a deck, not replacing it, and then drawing a second card.
Counting rule
• In order to calculate probabilities, we have to know
– the number of elements of an event, and
– the number of elements of the sample space.
• That is, in order to judge what is probable, we have to know what is possible.
• To determine the number of outcomes, one can use several rules of counting:
Counting: Addition Principle
• Addition principle: if a task can be accomplished by k distinct procedures, where the ith procedure has ni alternatives, the total number of ways of accomplishing the task equals n1 + n2 + ⋯ + nk.
• Example 1: there are two means of transportation from city A to city B, bus or train. There are 3 buses and 2 trains. How many ways of traveling from city A to city B are there?
Counting: Multiplication Principle
• Rule: if there is a sequence of n events in which the first event has k1 possibilities, the second event has k2, the third event has k3, and so on, the total number of possibilities is k1 · k2 ⋯ kn.
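A sketch of the two principles: alternatives of distinct procedures add, choices made in sequence multiply. The bus/train numbers come from Example 1; the outfit counts are hypothetical, and the last line cross-checks the product by listing every sequence with `itertools.product`.

```python
import math
from itertools import product

# Addition principle: alternatives of distinct procedures add up
buses, trains = 3, 2
ways_a_to_b = buses + trains

# Multiplication principle: choices made in sequence multiply.
# Hypothetical example: 3 shirts, 2 trousers, 4 pairs of shoes.
choices = [3, 2, 4]
outfits = math.prod(choices)

# Cross-check by listing every sequence of choices explicitly
listed = len(list(product(*(range(k) for k in choices))))
print(ways_a_to_b, outfits, listed)
```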
Counting: Permutations

Solution
6! = 6 × 5 × 4 × 3 × 2 × 1 = 720

Counting: Permutations
Permutation rule-3
Counting: Combination
• A combination is a selection of distinct objects without regard to order.
• Combinations are used when the order of arrangement is not important, as in a selection process.
• The number of combinations of r objects selected from n objects is denoted by nCr, where
$$_nC_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$
Counting: Combination
Example: given the letters A, B, C and D, list the permutations and combinations for selecting two letters.
• Permutations: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC; 4P2 = 12
• Combinations: AB, AC, AD, BC, BD, CD; 4C2 = 6
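The letter example can be reproduced with the standard `itertools.permutations` and `itertools.combinations`, and the counts checked against `math.perm` and `math.comb` (Python 3.8+).

```python
import math
from itertools import combinations, permutations

letters = "ABCD"
perms = ["".join(p) for p in permutations(letters, 2)]   # order matters
combs = ["".join(c) for c in combinations(letters, 2)]   # order ignored

print(perms, len(perms), math.perm(4, 2))
print(combs, len(combs), math.comb(4, 2))
print(math.factorial(6))
```

Each combination corresponds to 2! = 2 permutations, which is why 4P2 = 2 × 4C2.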
Approaches to probability
• There are objective and subjective approaches to probability.
• Subjective probability:
– This probability is based on personal judgment (feeling) and may not even be scientific.
For example: the probability that a cure for cancer will be discovered within the next 10 years.
Objective probability

• Classical
• Relative frequency
• Axiomatic
Classical probability
• This approach is based on the assumption that the outcomes of an experiment are equally likely and that the total number of outcomes is finite. It uses the sample space to determine the numerical probability that an event will happen.
• Definition: if there are n equally likely outcomes of an experiment, and event A occurs in k of them, the probability of event A, denoted P(A), is defined as
$$P(A) = \frac{k}{n}$$
For example:
In rolling a die, each of the six sides is equally likely to be observed.
So the probability that a 4 will be observed is 1/6.
Example
1. When an ordinary die is thrown, each of the faces numbered 1, 2, …, 6 has an equal chance of falling uppermost.
There is one chance in six of throwing a 3, so we write the probability statement:
Pr(3), or simply P(3) = 1/6
i.e. Pr(not throwing a 3) = 5/6
i.e. Total probability = Pr(3) + Pr(not 3) = 1/6 + 5/6 = 1
2. When a fair coin is tossed, it may fall either as a head (H) or as a tail (T),
i.e. we write P(H) = P(T) = 1/2.
3. If we select a card at random from a pack of 52 cards,
Pr(Ace) = 4/52 = 1/13
Pr(No Ace) = 48/52 = 12/13
Total probability = Pr(event happens) + Pr(event does not happen) = 4/52 + 48/52 = 1
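Classical Approach…

• Experiment: roll two dice and sum the two numbers on top.
• Sample space: S = {2, 3, …, 12}. What are the underlying, unstated assumptions? (The dice are fair, so each of the 36 ordered pairs of faces is equally likely; the 11 sums themselves are not equally likely.)
• Table of sums (first die down, second die across):

    + | 1  2  3  4  5  6
    --+------------------
    1 | 2  3  4  5  6  7
    2 | 3  4  5  6  7  8
    3 | 4  5  6  7  8  9
    4 | 5  6  7  8  9 10
    5 | 6  7  8  9 10 11
    6 | 7  8  9 10 11 12

• Probability examples, counting cells in the table:
P(2) = 1/36
P(7) = 6/36
P(10) = 3/36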
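In the classical approach, you count the favourable outcomes and divide by the total. This small Python sketch assumes two fair dice, so all 36 ordered pairs are equally likely, and recovers the probabilities read from the table:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Classical approach: all 36 ordered (die1, die2) pairs are assumed equally likely.
outcomes = list(product(range(1, 7), repeat=2))
sum_counts = Counter(a + b for a, b in outcomes)

def p_sum(s):
    """P(sum = s) = (number of favourable pairs) / (total number of pairs)."""
    return Fraction(sum_counts[s], len(outcomes))

for s in (2, 7, 10):
    print(s, sum_counts[s], "/ 36 =", p_sum(s))
# 2 1 / 36 = 1/36
# 7 6 / 36 = 1/6
# 10 3 / 36 = 1/12
```

`Fraction` reduces automatically, so 6/36 prints as 1/6 and 3/36 as 1/12.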
Relative frequency probability:
• This approach can be used when a random phenomenon is observed repeatedly under identical conditions.
• If an experiment is repeated a large number of times, N, and some resulting event E occurs A times, then the relative frequency of E, A/N, is approximately equal to the probability of E:
Pr(E) ≈ A/N
The approximation improves as N grows; formally, Pr(E) is the limit of A/N as N → ∞.
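A simulation illustrates the relative frequency idea. The sketch below assumes a fair coin with P(head) = 0.5 and uses Python's pseudo-random generator with a fixed seed. The function name `relative_frequency_of_heads` is hypothetical. As N grows, A/N typically settles close to 0.5, although any single run fluctuates:

```python
import random

def relative_frequency_of_heads(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times and return A/N for E = 'head'."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```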
Frequency Probabilities
(Figure: relative frequency of heads in repeated coin flipping.)
Ex 1: 11 cards containing the letters of the word PROBABILITY are put in a box. A card is taken out at random. Find the probability that the card chosen is
(a) the letter B (b) a vowel (c) a consonant

n(S) = 11

(a) n(B) = 2, so P(B) = 2/11
(b) n(V) = 4 (O, A, I, I), so P(V) = 4/11
(c) n(C) = 7, so P(C) = 7/11
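You can check Ex 1 by counting letters. This sketch treats Y as a consonant, which is the assumption that gives n(V) = 4 and n(C) = 7 above:

```python
from fractions import Fraction

word = "PROBABILITY"   # one card per letter
vowels = set("AEIOU")  # assumption: Y counts as a consonant here

n_S = len(word)                                       # n(S) = 11
n_B = word.count("B")                                 # n(B) = 2
n_V = sum(1 for letter in word if letter in vowels)   # n(V) = 4
n_C = n_S - n_V                                       # n(C) = 7

print(Fraction(n_B, n_S), Fraction(n_V, n_S), Fraction(n_C, n_S))  # 2/11 4/11 7/11
```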
Ex 2: There are x red balls and 8 yellow balls in a bag. A ball is taken at random from the bag. The probability of getting a red ball is 3/7.
(a) Find the value of x.
(b) If y red balls are then added to the bag, the probability of getting a yellow ball becomes 1/2. Find the value of y.

(a) Total number of balls = x + 8
x/(x + 8) = 3/7
7x = 3x + 24
4x = 24
x = 6

(b) Total number of balls = 6 + 8 + y = y + 14
8/(y + 14) = 1/2
y + 14 = 16
y = 2
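A brute-force search over whole numbers can double-check the algebra in Ex 2. It is only a sanity check, not a replacement for solving the equations, and the search bound of 100 is an arbitrary assumption:

```python
from fractions import Fraction

# (a) Search small whole numbers x for x / (x + 8) == 3/7.
x = next(x for x in range(1, 100) if Fraction(x, x + 8) == Fraction(3, 7))

# (b) With x red and 8 yellow balls, add y red balls until P(yellow) == 1/2.
y = next(y for y in range(0, 100) if Fraction(8, x + 8 + y) == Fraction(1, 2))

print(x, y)  # 6 2
```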
Axiomatic approach:
Defines probability through a set of axioms; the properties below are either axioms or follow from them:

1. 0 ≤ Pr(E) ≤ 1

If E is certain never to occur, Pr(E) = 0.

If E is certain to occur, Pr(E) = 1.

2. P(S) = 1, where S is the sample space (the sure event).

3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the sum of the two probabilities, i.e. P(A ∪ B) = P(A) + P(B).

4. P(A') = 1 - P(A), where A' is the complement of A.

5. P(ϕ) = 0, where ϕ is the impossible event.

6. If E1, E2, …, En are mutually exclusive and together make up S, then P(E1) + P(E2) + … + P(En) = 1.
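The properties can be checked on a concrete finite model. This minimal sketch assumes one roll of a fair die with each outcome weighted 1/6. The two events are illustrative and not from the slides:

```python
from fractions import Fraction

# A finite probability model: one roll of a fair die, each outcome weighted 1/6.
S = frozenset(range(1, 7))
weight = {outcome: Fraction(1, 6) for outcome in S}

def P(event):
    """Probability of an event (a subset of S) = sum of its outcome weights."""
    return sum((weight[o] for o in event), Fraction(0))

evens = {2, 4, 6}       # "even number"
one_or_three = {1, 3}   # shares no outcome with evens: mutually exclusive

assert all(0 <= P({o}) <= 1 for o in S)                   # property 1
assert P(S) == 1                                          # property 2
assert P(evens | one_or_three) == P(evens) + P(one_or_three)  # property 3
assert P(S - evens) == 1 - P(evens)                       # property 4, complement
assert P(set()) == 0                                      # property 5, impossible event
assert sum(P({o}) for o in S) == 1                        # property 6, outcomes partition S
```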
Conditional probability
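Conditional probability is commonly defined as P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0. This minimal sketch computes it by counting equally likely outcomes for two fair dice. The events A ("the sum is 8") and B ("the first die shows an even number") are illustrative choices, not taken from the slides:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # 36 equally likely (die1, die2) pairs

def P(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

def P_given(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = P(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return P(lambda o: a(o) and b(o)) / p_b

def sum_is_8(o):        # illustrative event A
    return o[0] + o[1] == 8

def first_die_even(o):  # illustrative event B
    return o[0] % 2 == 0

print(P(sum_is_8), P_given(sum_is_8, first_die_even))  # 5/36 1/6
```

Knowing that the first die is even raises the probability of a sum of 8 from 5/36 to 6/36.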

Probability of independent events

Two events are independent if the occurrence of one of the events does not affect the probability of the other event.

That is, A and B are independent if:

P(B | A) = P(B) or P(A | B) = P(A).

Example:
Let event A stand for “the first child of a mother is female” and event B stand for “the second child of the same mother is female”.
Are A and B independent?

Solution
P(B | A) = P(B) = 0.5
The occurrence of A does not affect the probability of B, so the events are independent.
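The value P(B) = 0.5 implies a model in which the four birth orders FF, FM, MF and MM are equally likely. Under that assumed model, this sketch checks independence by computing P(B | A) directly:

```python
from fractions import Fraction
from itertools import product

# Assumed model: the four birth orders FF, FM, MF, MM are equally likely.
families = list(product("FM", repeat=2))

def P(event):
    """Probability of an event = favourable families / all families."""
    return Fraction(sum(1 for f in families if event(f)), len(families))

def first_female(f):   # event A
    return f[0] == "F"

def second_female(f):  # event B
    return f[1] == "F"

p_B = P(second_female)
p_B_given_A = P(lambda f: first_female(f) and second_female(f)) / P(first_female)
print(p_B, p_B_given_A, p_B_given_A == p_B)  # 1/2 1/2 True
```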
Class exercise
• Fifteen Ethiopian athletes entered a race. In how many different ways could prizes for first, second and third place be awarded?
• A club has 7 members, and a committee of 3 people is to be formed. In how many ways can the committee be formed?
• In the experiment of tossing a coin and a die together, find the probability of the event E that the coin shows a head and the die shows an even number.
Debremarkos University
College of Natural and Computational Sciences
Department of Statistics

Probability Distributions

11/13/2023 Probability Distributions 204


Objectives
After studying this chapter, you should be able to:
• Distinguish discrete random variables from continuous random variables.
• Find the expected value and standard deviation of a discrete probability distribution.
• Solve probability problems by applying the definition of probability.
• Apply expected value in decision making.


Introduction to basic terms

• Variable: any characteristic or attribute that can assume different values.
• A random variable is a variable whose values are determined by chance.
• Formally, it is a function that assigns a real number to each possible outcome of an experiment.


Introduction to basic terms………
E.g. 1: Suppose a coin is tossed three times. Let X be the number of heads.
• Solution: If we toss a coin three times, the experiment has a total of eight possible outcomes:
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.
• X denotes the number of heads out of the three tosses.
• X is a function defined on the elements of S, and the possible values of X are {0, 1, 2, 3}.
• Specifically, X(HHH) = 3, X(HHT) = X(HTH) = X(THH) = 2, X(HTT) = X(THT) = X(TTH) = 1, X(TTT) = 0.
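The mapping from outcomes to values of X can be written out directly. This short sketch assumes a fair coin, so the 8 outcomes are equally likely. It applies X to each outcome and tabulates the resulting probability distribution:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three tosses: 8 equally likely sequences (fair coin assumed).
S = ["".join(seq) for seq in product("HT", repeat=3)]

def X(outcome):
    """The random variable X maps each outcome to its number of heads."""
    return outcome.count("H")

counts = Counter(X(s) for s in S)
distribution = {x: Fraction(counts[x], len(S)) for x in sorted(counts)}
print(distribution)
# {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```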
Introduction to basic terms………
• Discrete random variable: let X be a random variable. If the number of possible values of X is finite or countably infinite, we call X a discrete random variable.
• The possible values of X can be listed as x1, x2, x3, …
• With each possible value xi of a discrete random variable X we associate a number P(xi) = P(X = xi), called the probability of xi. These probabilities satisfy P(xi) ≥ 0 and sum to 1 over all possible values.


Expectation, Mean and Variance of a random variable
• Expectation: the averaging process, when applied to a random variable, is called expectation. It is denoted by E(X) and is read as “the expected value of X” or “the mean value of X”.


Expectation………
• Properties of expectation
• If X and Y are random variables and a, b are constants, then:
1. E(a) = a
2. E(aX) = aE(X)
3. E(aX + b) = aE(X) + b
4. E(X + a) = E(X) + a
5. E(X + Y) = E(X) + E(Y)
6. E(XY) = E(X)E(Y), if X and Y are independent random variables
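For a discrete random variable, the expected value is the probability-weighted average E(X) = Σ x·P(X = x). This minimal sketch uses the distribution of X = number of heads in three tosses of a fair coin as an assumed example. It also checks properties 1 and 3 for the illustrative constants a = 2 and b = 5:

```python
from fractions import Fraction

# Distribution of X = number of heads in three tosses of a fair coin.
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expectation(dist, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x) over all possible values x."""
    return sum(g(x) * p for x, p in dist.items())

mean = expectation(dist)
print(mean)  # 3/2

a, b = 2, 5  # illustrative constants
assert expectation(dist, lambda x: b) == b                      # E(b) = b
assert expectation(dist, lambda x: a * x + b) == a * mean + b  # E(aX + b) = aE(X) + b
```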
Variance of a random variable
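The variance measures spread about the mean. It is Var(X) = E[(X - μ)²], which can also be computed as E(X²) - μ². The standard deviation is √Var(X). This sketch computes both forms for the same assumed three-toss distribution:

```python
import math
from fractions import Fraction

dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = sum(x * p for x, p in dist.items())                          # E(X)
var_definition = sum((x - mu) ** 2 * p for x, p in dist.items())  # E[(X - mu)^2]
var_shortcut = sum(x * x * p for x, p in dist.items()) - mu ** 2  # E(X^2) - mu^2
sd = math.sqrt(var_definition)                                    # standard deviation

print(mu, var_definition, var_shortcut, round(sd, 4))  # 3/2 3/4 3/4 0.866
```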

Common discrete distributions



Probability distribution

• Every random variable has a corresponding probability distribution.
• A probability distribution, or just a distribution, describes the way data are distributed, and it is used to draw conclusions about a set of data.
• It also refers to the underlying, usually unknown, distribution of the population or random variable.
• The probability distribution of a categorical variable tells us with what probability the variable takes on each of its possible values (outcomes).


Binomial distribution
A binomial experiment is a probability experiment that satisfies the following four requirements:
1. There must be a fixed number of trials, called n.
2. Each trial can have only two outcomes, or outcomes that can be reduced to two outcomes. These outcomes can be considered either success or failure.
3. The outcomes of the trials must be independent of each other.
4. The probability of success must remain the same for each trial.
The outcomes of a binomial experiment and the corresponding probabilities of these outcomes are called a binomial distribution.
For a single trial:
Pr(X = success) = Pr(X = 1) = p
Pr(X = failure) = Pr(X = 0) = 1 - p


Binomial distribution, generally

If there are only two possible outcomes (call them 1/0, yes/no or success/failure) in each of n independent trials, then the probability of exactly x “successes” is:

$$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$

where
n = number of trials
x = number of successes out of n trials
p = probability of success
1 - p = probability of failure


Binomial distribution….

Example 1: Suppose that in a certain population, 52% of all recorded births are males. If we randomly select 10 birth records, what is the probability that

(a) exactly 5 will be males?

Given n = 10, x = 5 and p = 0.52:

$$\Pr(X = x) = \frac{n!}{x!\,(n - x)!}\, p^{x} (1 - p)^{n - x}$$

Therefore,

$$\Pr(X = 5) = \frac{10!}{5!\,(10 - 5)!} \times 0.52^{5} \times (1 - 0.52)^{10 - 5} \approx 0.24$$

(b) 3 or more will be females?
Now let X be the number of females, so p = 1 - 0.52 = 0.48:
Pr(X ≥ 3) = 1 - Pr(X < 3) = 1 - [Pr(X = 0) + Pr(X = 1) + Pr(X = 2)]
= 1 - [0.00145 + 0.01334 + 0.05543] = 1 - 0.07022 ≈ 0.930
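Both calculations follow directly from the binomial formula. This minimal Python sketch uses `binomial_pmf`, a hypothetical helper name. Part (b) comes to about 0.9298, which is 0.930 to three decimals:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# (a) Exactly 5 males in 10 records, P(male) = 0.52.
p_5_males = binomial_pmf(5, 10, 0.52)

# (b) 3 or more females, P(female) = 0.48, via the complement rule.
p_3_plus_females = 1 - sum(binomial_pmf(x, 10, 0.48) for x in range(3))

print(round(p_5_males, 4), round(p_3_plus_females, 4))  # 0.2441 0.9298
```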


Binomial distribution….

• Example 2: An exam has five questions, and each question has four multiple-choice options, exactly one of which is correct. A student answers all the questions by guessing.

1. What is the probability that the student answers exactly 3 of the 5 questions correctly?

2. What is the probability that the student answers more than 3 questions correctly?


Poisson Distribution
• A discrete probability distribution that applies to the number of occurrences of some event over a specified interval.
• Examples:
– Number of telephone calls made to a switchboard in a given minute
– Number of bacteria per slide
– Number of road accidents on a particular motorway in one day
– Number of natural hazards per year
Counts like these often follow a Poisson distribution.


Poisson Distribution
• Suppose events happen randomly and independently in time at a constant rate. If events happen at a rate of λ events per unit time, the probability of x events happening in unit time is

$$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$

where e ≈ 2.71828.


Example
• The daily number of new registrations of HIV is 2.2 on average (λ = 2.2).

What is the probability of

a) getting no new cases

b) getting 1 case

c) getting 2 cases

d) getting 3 cases

e) getting 4 cases?

Solution
a) P(X = 0) = e^(-2.2) × 2.2^0 / 0! = 0.111
b) P(X = 1) = 0.244
c) P(X = 2) = 0.268
d) P(X = 3) = 0.197
e) P(X = 4) = 0.108
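This sketch applies the Poisson formula to the example with λ = 2.2. The name `poisson_pmf` is a hypothetical helper:

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x! for x = 0, 1, 2, ..."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 2.2  # average number of new registrations per day
for x in range(5):
    print(x, round(poisson_pmf(x, lam), 3))
# 0 0.111
# 1 0.244
# 2 0.268
# 3 0.197
# 4 0.108
```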


Characteristics of the Poisson distribution
• The Poisson distribution is very asymmetric (skewed to the right) when its mean is small.
• With large means it becomes nearly symmetric.
• It has no theoretical maximum value, but the probabilities tail off towards zero very quickly.
• λ is the parameter of the Poisson distribution.
• The mean is λ and the variance is also λ.


How a Poisson distribution differs from a binomial distribution
1. The binomial distribution is determined by the number of trials n and the probability of success p, whereas the Poisson distribution is determined by the mean λ.
2. In a binomial distribution, the possible values of the random variable x are 0, 1, …, n, but a Poisson distribution has possible x values 0, 1, 2, … with no upper limit.
• The Poisson distribution is sometimes used to approximate the binomial distribution when n is large and p is small, taking λ = np. One rule of thumb is to use this approximation when both of the following conditions are satisfied:
1. n > 50 and p < 0.1
2. np < 5
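This sketch shows the approximation at work. It compares binomial and Poisson probabilities for the hypothetical values n = 100 and p = 0.03, which satisfy both conditions (np = 3), using λ = np:

```python
import math

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

n, p = 100, 0.03  # hypothetical values: n > 50, p < 0.1, np = 3 < 5
lam = n * p       # Poisson parameter used for the approximation

for x in range(7):
    b, q = binomial_pmf(x, n, p), poisson_pmf(x, lam)
    print(f"x={x}  binomial={b:.4f}  poisson={q:.4f}  diff={abs(b - q):.4f}")

# Largest gap between the two distributions over x = 0, ..., n.
max_diff = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(n + 1))
```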
Continuous Probability Distributions

Continuous probability Distributions

A continuous random variable has an infinite number of possible values that can be represented by an interval on the number line.

Example: the proportion of patients with a positive HIV test result per day can be any number between 0% and 100% inclusive.

The probability distribution of a continuous random variable is called a continuous probability distribution.
Continuous Probability Distributions

(Figure: density curves f(x) of normal, uniform and skewed distributions.)

• There are infinitely many continuous distributions.
• We try to pick a model that:
– fits the data well, and
– allows us to make the best possible inferences using the data.


Properties of Normal Distributions
The most important probability distribution in statistics is the normal
distribution.

Normal curve

A normal distribution is a continuous probability distribution for a random variable, x.

The graph of a normal distribution is called the normal curve.



The Normal Distribution
• The formula that generates the normal probability distribution is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$$

where σ = population standard deviation, μ = population mean, e = 2.718… and π = 3.14….
This is a bell-shaped curve with different centers and spreads depending on μ and σ.
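Here is a direct transcription of the density formula, cross-checked against `statistics.NormalDist` from Python's standard library (3.8+). The (μ, σ) pairs are illustrative choices:

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-0.5 * ((x - mu) / sigma) ** 2)"""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Cross-check against the standard library at a few points per curve.
for mu, sigma in [(0, 1), (78, 8), (10, 5)]:
    for x in (mu - sigma, mu, mu + 2 * sigma):
        assert math.isclose(normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))

print(round(normal_pdf(0, 0, 1), 4))  # 0.3989, the peak height of the standard normal curve
```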


Properties of Normal Distributions
Properties of a Normal Distribution
1. The mean, median, and mode are equal.
2. The normal curve is bell-shaped and symmetric about the mean.
3. The total area under the curve is equal to one.
4. The normal curve approaches, but never touches, the x-axis as it extends farther and farther away from the mean.
5. Between μ - σ and μ + σ (in the center of the curve), the graph curves downward. The graph curves upward to the left of μ - σ and to the right of μ + σ. The points at which the curve changes from curving upward to curving downward are called the inflection points.


Properties of Normal Distributions

(Figure: normal curve with total area = 1 and its inflection points marked; the x-axis is labelled μ - 3σ, μ - 2σ, μ - σ, μ, μ + σ, μ + 2σ, μ + 3σ.)


The Family of Normal Distributions

A normal distribution can have any mean and any positive standard deviation.

The line of symmetry of the curve indicates the mean of the distribution, and the spread shows the magnitude of the standard deviation.
The area under the curve
• The area under a curve can be obtained:
a. By integrating over an interval (a, b):

$$\text{Area under the curve between } a \text{ and } b = \int_{a}^{b} f(x)\,dx, \quad \text{where } f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$$

b. By preparing tables containing areas for each curve.

However, neither of these is a good solution because:
i. the first requires some knowledge of calculus, and
ii. preparing tables for the infinite family of normal curves is impossible.
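You can also carry out option (a) numerically instead of by hand. This sketch uses composite Simpson's rule, a technique chosen for illustration rather than one from the slides. It uses μ = 78 and σ = 8, the values of the weight example later in this chapter, and compares the result with the library's cumulative distribution function:

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_simpson(a, b, mu, sigma, n=1000):
    """Approximate the integral of f(x) from a to b with composite Simpson's rule (n even)."""
    h = (b - a) / n
    total = normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # Simpson weights 1, 4, 2, 4, ..., 2, 4, 1
        total += weight * normal_pdf(a + i * h, mu, sigma)
    return total * h / 3

mu, sigma = 78, 8
numeric = area_simpson(60, 80, mu, sigma)
exact = NormalDist(mu, sigma).cdf(80) - NormalDist(mu, sigma).cdf(60)
print(round(numeric, 4), round(exact, 4))  # 0.5865 0.5865
```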


The Standard Normal Distribution
Standardization solves the above two problems.
Each data value of a normally distributed random variable x can be transformed into a z-score by using the formula:

$$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$

The z-score is the number of σ-units a value lies above (positive z) or below (negative z) the distribution mean μ.
The resulting distribution is the standard normal distribution, with a mean of 0 and a standard deviation of 1.

(Figure: standard normal curve; the horizontal scale corresponds to z-scores from -3 to 3.)
The Standard Normal Distribution

The area that falls in an interval under the unstandardized normal curve (the x-values) is the same as the area under the standard normal curve within the corresponding z-boundaries. That means standardization preserves area.

(Figure: a normal curve on the x-scale, centered at μ, aligned with the standard normal curve on the z-scale from -3 to 3.)

After the formula is used to transform an x-value into a z-score, a Standard Normal Table can be used to find the area under the curve.
The Standard Normal Table
Properties of the Standard Normal Distribution
1. The cumulative area is close to 0 for z-scores close to z = -3.49.
2. The cumulative area increases as the z-scores increase.
3. The cumulative area for z = 0 is 0.5000.
4. The cumulative area is close to 1 for z-scores close to z = 3.49.
The Standard Normal Table
Example:
Find the cumulative area that corresponds to a z-score of 2.71.

Standard Normal Table

z    .00   .01   .02   .03   .04   .05   .06   .07   .08   .09
0.0  .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
0.1  .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
0.2  .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
…
2.6  .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964
2.7  .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974
2.8  .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981

Find the area by locating 2.7 in the left-hand column and then moving across the row to the column under .01.
The area to the left of z = 2.71 is 0.9966.
The Standard Normal Table
Example:
Find the cumulative area that corresponds to a z-score of -0.25.

Standard Normal Table

z    .09   .08   .07   .06   .05   .04   .03   .02   .01   .00
-3.4 .0002 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003
-3.3 .0003 .0004 .0004 .0004 .0004 .0004 .0004 .0005 .0005 .0005
…
-0.3 .3483 .3520 .3557 .3594 .3632 .3669 .3707 .3745 .3783 .3821
-0.2 .3859 .3897 .3936 .3974 .4013 .4052 .4090 .4129 .4168 .4207
-0.1 .4247 .4286 .4325 .4364 .4404 .4443 .4483 .4522 .4562 .4602
-0.0 .4641 .4681 .4721 .4761 .4801 .4840 .4880 .4920 .4960 .5000

Find the area by locating -0.2 in the left-hand column and then moving across the row to the column under .05.
The area to the left of z = -0.25 is 0.4013.
Guidelines for Finding Areas
Finding Areas Under the Standard Normal Curve
1. Sketch the standard normal curve and shade the appropriate area under the curve.
2. Find the area by following the directions for each case shown.
a. To find the area to the left of z, find the area that corresponds to z in the Standard Normal Table.

Example: use the table to find the area for the z-score z = 1.23. The area to the left of z = 1.23 is 0.8907.


Guidelines for Finding Areas
Finding Areas Under the Standard Normal Curve
b. To find the area to the right of z, use the Standard Normal Table to find the area that corresponds to z. Then subtract the area from 1.

Example: the table gives an area of 0.8907 to the left of z = 1.23, so the area to the right of z = 1.23 is 1 - 0.8907 = 0.1093.


Guidelines for Finding Areas
Finding Areas Under the Standard Normal Curve
c. To find the area between two z-scores, find the area corresponding to each z-score in the Standard Normal Table. Then subtract the smaller area from the larger area.

Example: the area to the left of z = 1.23 is 0.8907 and the area to the left of z = -0.75 is 0.2266, so the area of the region between the two z-scores is 0.8907 - 0.2266 = 0.6641.
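The three cases map onto the cumulative distribution function Φ(z) of the standard normal. This sketch uses `statistics.NormalDist`. The library works with unrounded areas, so case (c) comes out as 0.6640 instead of the 0.6641 obtained from rounded table entries:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

left = Z.cdf(1.23)                    # a. area to the left of z
right = 1 - Z.cdf(1.23)               # b. area to the right of z
between = Z.cdf(1.23) - Z.cdf(-0.75)  # c. larger cumulative area minus smaller

print(round(left, 4), round(right, 4), round(between, 4))  # 0.8907 0.1093 0.664
```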


Normal Distributions and
Probabilities



Probability and Normal Distributions
The total area under any normal curve is 1.

Therefore, we can link these areas with probability: if a random variable x is normally distributed, the probability that x will fall in a given interval is the area under the normal curve for that interval, i.e.

P(a ≤ x ≤ b) = area under the curve between a and b.

No probability is attached to any single value of x. That is, P(x = a) = 0.


Probability and Normal Distributions
(Figure: a normal distribution with μ = 10, σ = 5 beside the standard normal distribution with μ = 0, σ = 1; the shaded areas P(x < 15) and P(z < 1) are the same.)

Since z = (15 - 10)/5 = 1,
P(x < 15) = P(z < 1) = shaded area under the curve = 0.8413
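You can check numerically that standardization preserves area in this example. This sketch uses `statistics.NormalDist`:

```python
from statistics import NormalDist

mu, sigma = 10, 5
x = 15
z = (x - mu) / sigma  # z-score: number of standard deviations above the mean

p_x = NormalDist(mu, sigma).cdf(x)  # area to the left of x under N(10, 5^2)
p_z = NormalDist().cdf(z)           # area to the left of z under N(0, 1)

print(z, round(p_x, 4), round(p_z, 4))  # 1.0 0.8413 0.8413
```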
Probability and Normal Distributions
Example: The average weight of pregnant women attending prenatal care at a clinic was 78 kg, with a standard deviation of 8 kg. If the weights are normally distributed:

a) Find the probability that a randomly selected pregnant woman weighs less than 90 kg.

z = (x - μ)/σ = (90 - 78)/8 = 1.5

P(x < 90) = P(z < 1.5) = 0.9332

The probability that a randomly selected pregnant woman weighs less than 90 kg is 0.9332.
Probability and Normal Distributions
Example:
b) Based on the above example, find the probability that a pregnant woman weighs more than 85 kg.

z = (x - μ)/σ = (85 - 78)/8 = 0.875 ≈ 0.88

P(x > 85) = P(z > 0.88) = 1 - P(z < 0.88) = 1 - 0.8106 = 0.1894

The probability that a randomly selected pregnant woman weighs more than 85 kg is 0.1894.


Probability and Normal Distributions
Example:
c) From the above example, find the probability that a randomly selected pregnant woman weighs between 60 and 80 kg.

z1 = (x - μ)/σ = (60 - 78)/8 = -2.25
z2 = (x - μ)/σ = (80 - 78)/8 = 0.25

P(60 < x < 80) = P(-2.25 < z < 0.25) = P(z < 0.25) - P(z < -2.25)
= 0.5987 - 0.0122 = 0.5865

The probability that a randomly selected pregnant woman weighs between 60 and 80 kg is 0.5865.
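Here are the three parts of the weight example computed with `statistics.NormalDist`. The library does not round z to two decimals, so part (b) gives about 0.1908 rather than the table-based 0.1894. Parts (a) and (c) agree with the table to four decimals:

```python
from statistics import NormalDist

weight = NormalDist(mu=78, sigma=8)  # weights in kg, assumed normal as in the example

p_less_90 = weight.cdf(90)                    # a) P(x < 90)
p_more_85 = 1 - weight.cdf(85)                # b) P(x > 85), z = 0.875 unrounded
p_60_to_80 = weight.cdf(80) - weight.cdf(60)  # c) P(60 < x < 80)

print(round(p_less_90, 4), round(p_more_85, 4), round(p_60_to_80, 4))  # 0.9332 0.1908 0.5865
```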
Finding z-Scores
Example:
In a certain population, the proportion of individuals with a uric acid level below a certain limit is 36.7%.
• Find the z-score that corresponds to this cut-off point.

z    .09   .08   .07   .06   .05   .04   .03   .02   .01   .00
-0.3 .3483 .3520 .3557 .3594 .3632 .3669 .3707 .3745 .3783 .3821
-0.2 .3859 .3897 .3936 .3974 .4013 .4052 .4090 .4129 .4168 .4207
-0.1 .4247 .4286 .4325 .4364 .4404 .4443 .4483 .4522 .4562 .4602
-0.0 .4641 .4681 .4721 .4761 .4801 .4840 .4880 .4920 .4960 .5000

Find the z-score by locating 0.367 in the body of the Standard Normal Table and using the value closest to 0.367 (.3669, in row -0.3 and column .04).
The z-score is -0.34.
Finding a z-Score Given a Percentile
Example:
Find the z-score that corresponds to P75.

(Figure: standard normal curve with an area of 0.75 shaded to the left of the unknown z-score.)

The z-score that corresponds to P75 is the same as the z-score that corresponds to a cumulative area of 0.75.
The z-score is 0.67.
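Reading the table backwards corresponds to the inverse cumulative distribution function. This sketch uses `NormalDist.inv_cdf` for both examples:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

z_uric = Z.inv_cdf(0.367)  # z with 36.7% of the area to its left
z_p75 = Z.inv_cdf(0.75)    # z with 75% of the area to its left (75th percentile)

print(round(z_uric, 2), round(z_p75, 2))  # -0.34 0.67
```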


Class work
• For a standard normal variable Z, find
a) P(-2.2 < Z < 1.2)
b) P(0 < Z < 0.96)
c) P(Z > 1.05)
d) P(-1.45 < Z < 0)
• If X is normally distributed with μ = 12 and σ = 4:
• Find the probability of
i. X > 20
ii. X < 20
iii. 0 < X < 12
iv. Find a when P(X > a) = 0.24.
v. Find b and c when P(b < X < c) = 0.5 and P(X > c) = 0.25.
Transforming a z-Score to an x-Score
To transform a standard z-score to a data value x in a given population, use the formula

x = μ + zσ.
Example:
The monthly expenses for cigarettes by smokers in a city are normally distributed with a mean of 120 Birr and a standard deviation of 16 Birr.
Find the monthly expense corresponding to a z-score of 1.60.

x = μ + zσ
= 120 + 1.60(16)
= 145.6
We can conclude that an expense of 145.60 Birr for cigarettes is 1.6 standard deviations above the mean.
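The formula and its inverse, z = (x - μ)/σ, can be written as two small functions. The function names are hypothetical:

```python
def z_to_x(z, mu, sigma):
    """Invert the z-score formula: x = mu + z * sigma."""
    return mu + z * sigma

def x_to_z(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma

x = z_to_x(1.60, mu=120, sigma=16)            # monthly cigarette expense in Birr
print(round(x, 2))                            # 145.6
print(round(x_to_z(x, mu=120, sigma=16), 2))  # 1.6 (round trip back to the z-score)
```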
Exercise
A population of sandwiches has a mean weight of 250 grams with a standard deviation of 20 grams. Based on this information, give a short answer to the following questions.

1. What proportion of sandwiches will weigh more than 289.2 grams?
2. What is the probability that a randomly selected sandwich will weigh between 250 and 289.2 grams?


Table : Normal distribution
Area between 0 and z
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
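The table entries are Φ(z) - 0.5, where Φ is the standard normal cumulative distribution function. This sketch regenerates selected rows, chosen arbitrarily. Entries should match the printed table up to rounding in the last digit:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mean 0 and standard deviation 1

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z (z >= 0): Phi(z) - 0.5."""
    return Z.cdf(z) - 0.5

# Each row covers z = row + 0.00, ..., row + 0.09, as in the printed table.
for row in (0.0, 1.0, 1.9):
    cells = " ".join(f"{area_0_to_z(row + col / 100):.4f}" for col in range(10))
    print(f"{row:.1f} {cells}")
```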
