Ch.1 All & Ch.2 Introduction

The document discusses statistics, defining it as the science of collecting, organizing, analyzing, and interpreting data to assist with decision making. It covers topics like descriptive versus inferential statistics, ethical reporting considerations, and different types of data classification including univariate, bivariate, multivariate, nominal, ordinal, interval, and ratio levels.

Uploaded by

Binyam Ayele
Copyright
© All Rights Reserved

Topic 1: Introduction: The Nature of Statistics

Topic Learning Objectives:

By the end of this session students are expected to:


• Define statistics and explain its application in business
• List and describe the stages of activities in statistics
• Discuss the classification of data and statistics
• Explain ethical and reporting considerations

Topic Outline

1. Definition of statistics
2. Application of statistics in business
3. Stages of activities in statistics
4. Classification of data
5. Types of statistics
6. Ethical and reporting considerations in statistics
7. Synopsis
8. Wrap up discussion questions
9. Next session’s assignment

Reading Assignment Discussion:

• What is statistics?
• Give examples of the use of statistics in different functions of business.
• List and explain the types of data.
• Define the two types of statistics.
• What ethical and reporting considerations must be taken into account in a statistical investigation?
Reading Text:

Statistics is the science of data: the process of collecting, organizing, presenting, analyzing and interpreting data to assist in making more effective decisions.
Application of statistics in business: Statistics can be applied in the different
functional areas of business: accounting and finance, production, marketing, human
resource management, economic analysis etc. It is generally used in the process of
identifying business problems and finding solutions for improving business decisions
and hence practices.
Stages of activities in statistics
1. Planning (for data collection) and data Collection: Data collection is the
process of gathering information or data about the variable of interest. Data
may be obtained either from primary source or secondary source. Primary
data is collected through different methods including interview, questionnaire,
and direct or physical observation. Data collection must be preceded by
proper planning that addresses issues of scope, purpose, time, cost,
adequacy (volume), quality, reliability and relevancy of the data being
collected.
2. Organization of Data: It includes editing (checking for and correcting errors),
coding (assigning meaning for data items), classification (grouping the
collected and edited data into different similar categories based on some
criterion) and tabulation.
3. Presentation of Data: It involves displaying what is contained in data in the
form of tables, and pictures (diagrams and graphs). It facilitates
understanding and analysis.
4. Analysis of Data: The collected and organized data is manipulated so as to
generate or find different quantitative results. It involves conducting
mathematical operations, computations and measures (like average,
variation, etc.).
5. Interpretation: It attaches meaning to the results obtained in the analysis
stage. It is by far the most difficult stage and requires the most skill. It must be
done with great caution and integrity so as not to distort the final results.
Classification of Data
Data collected during statistical study may be classified into different types based on
various criteria:
1. Univariate, Bivariate, and Multivariate, based on the number of variables measured for
each subject studied (one, two, or more than two). E.g. for each employee, data may be
collected on salary only, on salary and age, or on salary, age and total hours worked.
2. Time series vs. cross-sectional data, based on the time involved in the study. The
former is data collected at different points in time and the latter is data
collected at one point in time (in different settings/places). E.g. the price of teff
for consecutive months is time series data; the price of teff in different markets of
Addis Ababa today is cross-sectional data.
3. Primary and Secondary data, based on source: if data is collected from the original
source it is primary data, but if it is obtained from intermediaries it is
secondary data.
4. Data can be classified according to levels of measurement. The level of
measurement determines how data should be summarized and presented. It also
indicates the type of statistical analysis that can be performed.
• Nominal level data are sorted into categories (labels or names) with no particular order to the categories. They can only be classified and counted. E.g. types of cars in a small city, gender.
• Ordinal level data are such that one classification is ranked higher than another, and the data can be counted. E.g. performance rating of an employee (Superior, Good, Average, Poor, Inferior).
• Interval level data have the ranking characteristic of ordinal level data, plus the characteristic that the distance between values is a constant size; the zero point is arbitrary, so differences (addition and subtraction) are meaningful but ratios are not. E.g. temperature on the Fahrenheit scale, time of day (o'clock), clothing size.
• Ratio level data have all the characteristics of interval level data, plus a fixed (true) zero point, so the ratio of two values is meaningful. E.g. salary, weight.
5. Qualitative and Quantitative, based on the numeric nature of the data.
• Qualitative data is non-numeric in nature and is classified into nominal (categorical) and ordinal data.
o Nominal data can only be put into groups, e.g. gender (male and female); number codes can be applied to each category/response for labeling only.
o Ordinal data can be ranked, e.g. preferences (strongly agree, agree, neutral, disagree, strongly disagree); numbers can be assigned to each response to indicate order/rank.
• Quantitative data is numeric in nature and is classified as discrete and continuous data.
o Discrete data: usually whole numbers (the result of counting); it assumes only specific values, with gaps between successive values. E.g. number of households, score of a football game, size of clothing, etc.
o Continuous data: can take fractional and decimal values; there are no gaps, and a value can be any number within an interval. E.g. time to run a marathon, distance, height, etc.

Types of Statistics

Statistics is subdivided into descriptive and inferential. Descriptive statistics is concerned with summarizing and describing a body of data. Inferential statistics is the process of reaching generalizations about the whole (called the population) by examining a portion (called the sample). For this to be valid, the sample must be representative of the population, and the probability of error must also be specified.
For instance, suppose that we have data on the income of 2000 families in Addis Ababa. The body of data can be summarized by finding the average family income and the variation or spread of these family incomes above and below the average. The data can also be described by constructing a table, chart, or graph of the number or proportion of families in each income class. This is descriptive statistics. If these 2000 families are a representative sample of all families in Addis Ababa, we can estimate the average family income of the whole population of families in Addis Ababa. Since the conclusions are subject to error, we would also have to indicate the probability of error. This is statistical inference.
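
The family income illustration can be sketched in a few lines of Python. The incomes below are simulated, not real Addis Ababa figures, and the 1.96 multiplier assumes an approximate 95% confidence level for a large random sample; both are assumptions made only for illustration.

```python
import math
import random
import statistics

# Hypothetical data: 2000 simulated family incomes (not real figures).
random.seed(42)
incomes = [random.lognormvariate(9.0, 0.5) for _ in range(2000)]

# Descriptive statistics: summarize the 2000 families actually observed.
sample_mean = statistics.mean(incomes)
sample_sd = statistics.stdev(incomes)

# Inferential statistics: estimate the population mean and state the uncertainty.
standard_error = sample_sd / math.sqrt(len(incomes))
margin = 1.96 * standard_error  # assumed approximate 95% confidence level
interval = (sample_mean - margin, sample_mean + margin)

print(f"sample mean = {sample_mean:.2f}, standard deviation = {sample_sd:.2f}")
print(f"estimated population mean: {interval[0]:.2f} to {interval[1]:.2f}")
```

The first two numbers only describe the sample; the interval is the inferential step, because it makes a statement about all families together with a stated chance of error.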

Ethical and reporting considerations

Statistics must be practiced with integrity and honesty; an independent and principled
point-of-view should be held when analyzing and reporting findings and results. In
statistics, both good and bad results should be presented in a fair, objective and neutral
manner; inappropriate summary measures should not be used to distort facts. The real
contribution of statistics to society is a moral one. For instance, financial analysts need
to provide information that truly reflects a company’s performance so as not to mislead
individual investors; and information regarding product defects that may be harmful to
people must be analyzed and reported with truthfulness.

Synopsis

• Statistics is the science of collecting, organizing, presenting, analyzing and interpreting data to assist in making more effective decisions.
• Statistics can be applied in many functional areas of business.
• Data collected during a statistical study may be classified into different types based on various criteria: Univariate, Bivariate, Multivariate; Primary vs. Secondary; Time series vs. Cross-sectional; Nominal, Ordinal, Interval, and Ratio level data; Qualitative (Nominal vs. Ordinal) vs. Quantitative (Discrete vs. Continuous).
• Statistics is classified into descriptive and inferential. Descriptive statistics is concerned with summarizing and describing a body of data for easy understanding. Inferential statistics is the process of reaching generalizations about the population by examining a sample.
• Data analysis is objective; one should report the summary measures that best describe and communicate the important aspects of the data set. Data interpretation is subjective; it should be done in a fair, neutral and clear manner.

Wrap up Discussion Questions:

• Define statistics in your own terms.
• Explain with examples how statistics is applied in many functional areas of business.
• Explain the basic activities involved in any statistical investigation.
• Discuss the types of data used in statistical investigation, with examples.
• Briefly describe the two types of statistics.
• What ethical and reporting considerations should be taken into account while applying statistics?

Next Session’s Assignment:

• Read about the similarities and differences of descriptive and inferential statistics


Topic 2: Introduction: The Types of Statistics

Topic Learning Objectives:

By the end of this session students are expected to:


• List and describe the features of each type of statistics
• Compare and contrast descriptive and inferential statistics

Topic outline
1. Characteristics of Descriptive Statistics
2. Characteristics of Inferential Statistics
3. Synopsis
4. Wrap up discussion questions
5. Next session’s assignment

Reading Assignment Discussion:

• What are the similarities and differences of the two types of statistics?

Reading Text:

Statistics started as a purely descriptive science, but it grew into a powerful tool of
decision making as its inferential branch was developed. Modern statistical analysis
refers primarily to inferential statistics. However, descriptive and inferential statistics are
complementary.

1. Descriptive Statistics: deals with summarizing (representing) data in terms of tables, graphs and numbers for easy understanding.
• It involves organizing, presenting and analyzing data.
• It only describes facts about a population (the totality under study) or a sample (the subset of the population studied), without generalizing.
• It explains known events that have already occurred (the past), so it deals with certainty.
• It uses tables, graphs and numbers as tools.
2. Inferential (Inductive) Statistics: deals with finding out (determining) something about a population based on a sample.
• It is based on sampling (as a census is very expensive and time consuming).
• It is therefore based on the descriptive statistics of a sample.
• It infers population values (parameters) from sample values or results (statistics).
• It infers (makes generalizations, conclusions, predictions/estimations/forecasts, hypothesis tests) about the population based on the sample.
• It thus involves inductive reasoning (it ascribes properties to the whole starting with the specific).
• It may involve predicting the future based on descriptive statistics of the past.
• It deals with the unknown and thus involves uncertainty: there is a possibility of error in inference, so probability is an essential element of statistical inference.
• It can be summarized as 2 activities:
o Statistical estimation refers to estimating (determining) population parameters based on a random sample result (statistic); and
o Hypothesis testing refers to testing the validity of a claim (hypothesis) about a population parameter based on random sample data (statistics).

Synopsis

• The two types of statistics, descriptive and inferential, have similarities and differences; but together they make the theory and practice of statistics complete and useful.
• Descriptive statistics summarizes data in the form of tables, graphs and numbers for easy understanding.
• Inferential statistics, technically speaking, refers to estimation and hypothesis testing.
• Probability theory is the essential element of statistical inference in that it helps to measure uncertainty and hence the possibility of errors.

Wrap up Discussion Questions:

• Compare and contrast descriptive and inferential statistics. Give examples.
• Why do we need statistical inference? Why don’t we always conduct a census?
• Why is probability used in inferential statistics?
• Statistical inference basically involves estimation and hypothesis testing. Explain.

Next Session’s Assignment:

• Read about describing data in tables and graphs

Topic 3: Introduction: Describing Data in terms of Tables and Graphs

Topic Learning Objectives:

By the end of this session students are expected to:


• Revise how to represent data in tables and graphs

Topic outline
1. Tables
2. Graphs
3. Diagrammatic Representation
4. Exploratory Data Analysis (EDA)
5. Synopsis
6. Wrap up discussion questions
7. Next session’s assignment

Reading Assignment Discussion:


• How does descriptive statistics represent data in terms of tables and graphs?
Reading Text:

Descriptive Statistics: deals with summarizing (representing) data in terms of tables, graphs and numbers for easy understanding.

Tables:

Frequency distribution is a tabular summary of a set of data in terms of several non-overlapping classes (class intervals or groups) and the corresponding frequencies (counts). Data organized in a frequency distribution are called grouped data. In contrast, for ungrouped data every observed value of the random variable is listed.

A cross-classification table (also called a cross-tabulation or contingency table) is used to describe the relationship between two nominal variables. Stem and leaf displays, etc. are also used to organize and present data.
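
A contingency table can be built by counting how often each pair of categories occurs together. The sketch below uses a small hypothetical survey (gender and preferred payment method); the records and variable names are invented for illustration.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred payment method).
records = [
    ("Female", "Cash"), ("Male", "Card"), ("Female", "Card"),
    ("Male", "Cash"), ("Female", "Card"), ("Male", "Card"),
    ("Female", "Cash"), ("Male", "Card"),
]

counts = Counter(records)                  # joint frequency of each pair
rows = sorted({r for r, _ in records})     # categories of the first variable
cols = sorted({c for _, c in records})     # categories of the second variable

# table[row][col] holds the number of records falling in that cell.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

print(" " * 8 + "  ".join(f"{c:>5}" for c in cols))
for r in rows:
    print(f"{r:<8}" + "  ".join(f"{table[r][c]:>5}" for c in cols))
```

Each cell is a joint frequency; the row and column totals would give the ordinary frequency distribution of each variable taken alone.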

Graphs:

Graphical representation of frequency distribution:

• Histogram is a graph consisting of a series of adjacent rectangles whose bases are equal to the class widths of the corresponding classes and whose heights are proportional to the corresponding class frequencies or relative frequencies. Class intervals are plotted along the x-axis and (relative) frequencies along the y-axis. Histograms can be symmetric, left skewed, right skewed, or uniform.
• Frequency polygon is a line graph of a frequency distribution; values of a discrete variable or class marks are plotted against frequencies. It can be constructed by joining the midpoints of the tops of the histogram bars with straight lines.
• Frequency curve is a smoothed frequency polygon. The curve is drawn freehand through the points of the polygon such that the total area under the curve is equal to that of the polygon. Frequency curves may have different shapes in terms of the degree of asymmetry (skewness) and peakedness (kurtosis).
• Ogive is the graph of the cumulative frequency distribution. Ogives are of two kinds:
o ‘Less than’ ogive (< Ogive): upper class boundaries are plotted against the ‘less than’ cumulative frequencies of the respective classes, and adjacent points are joined by straight lines.
o ‘More than’ ogive (> Ogive): lower class boundaries are plotted against the ‘more than’ cumulative frequencies of the respective classes, and adjacent points are joined by straight lines. The intersection point of the two graphs gives the median.
• Lorenz curve is a graphical method of studying dispersion. The percentage cumulative frequencies (x-axis) are plotted against the percentage cumulative values of the variable (y-axis), and the plotted points are joined by a smooth curve. A straight line joining the points (0, 0) and (100, 100), the line of equal distribution, should also be drawn for comparison.

Diagrammatic Representation:

• A Pareto chart is similar to a histogram, except that it is a frequency bar chart for a qualitative variable, rather than being used for quantitative data that have been grouped into classes. The bars of the chart, which can represent either frequencies or relative frequencies (percentages), are arranged in descending order from left to right. Pareto charts are used in process control to tabulate the causes associated with assignable-cause variations in the quality of process output.
• Bar graph is a graphical portrayal consisting of equally spaced bars whose heights represent the frequencies (on the y-axis), while the horizontal axis holds the labels of the qualitative data summarized in the frequency distribution. Bar charts can be simple, multiple, subdivided, percentage, etc.
• Pie chart is a circular diagram divided into sectors that commonly represent percentage frequencies.
• Pictogram uses symbols or pictures to represent data.
• Cartograms are statistical maps, used to give quantitative information on a geographical basis. The quantities on the map can be shown in many ways, such as shades, colors, dots, etc.
• A line graph portrays time-series amounts by a connected series of line segments. A time series plot is a line graph representing the relationship between time (on the x-axis) and the values of a variable (on the y-axis). It is used for forecasting, and two time series can be compared on the same plot. A vertical line graph is a graphical representation of discrete data with respect to their frequencies; vertical solid lines are used to indicate the frequencies.
• A scatter plot is a plot of points whose coordinates represent the predictor and the variable being predicted. It is used to study the trend of the relationship in a bivariate distribution.
• Radar chart is a two-dimensional chart of multivariate data, with three or more quantitative variables represented on axes starting from the same point. It has different types, and the diagram looks like a spider web.
• Dot plot (chart) is a graphical display for small datasets that uses dots (data points) plotted on a simple scale. It is used to compare frequency counts within groups or categories.

Exploratory Data Analysis (EDA): As the name implies, it is concerned with techniques for preliminary analyses of data in order to gain insights about patterns and relationships. The stem and leaf display and the box plot are used in EDA.
• Stem and leaf display (plot) is a diagram used to present quantitative data in condensed form. Each element of the data set is split into 2 parts, the first being the stem and the trailing digits being the leaves. This pattern is analogous to a natural stem with leaves coming out of it. It resembles a horizontally oriented histogram. A legend (the leaf unit) must be given for every display.
• Box plot is a graphical summary of data that is based on a five-number summary: the smallest and largest values within the limits Q1 - 1.5*IQR and Q3 + 1.5*IQR (IQR = interquartile range = Q3 - Q1), the first quartile Q1, the second quartile (median), and the third quartile Q3. It helps to spot outliers (extreme values lying more than 1.5*IQR beyond the quartiles), which are shown with an asterisk. A box is drawn with its ends at Q1 and Q3; a vertical line in the box marks the median; dashed lines are drawn from the ends of the box to the smallest and largest values inside the limits indicated above.
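
The quantities behind a box plot can be computed as in the following sketch. It uses the ages dataset of Example 3 in this topic and Python's `statistics.quantiles`, whose default "exclusive" method places the i-th quartile at position i(n+1)/4; other software may use slightly different quartile conventions, so treat the choice of method as an assumption.

```python
import statistics

# Ages of the 30 club members from Example 3.
ages = [20, 18, 25, 68, 23, 25, 16, 22, 29, 37, 35, 49, 42, 65, 37,
        42, 63, 65, 49, 42, 53, 48, 65, 72, 69, 57, 48, 39, 58, 67]

# Default method "exclusive" uses positions i(n+1)/4 with linear interpolation.
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers end at the most extreme values that are still inside the limits.
inside = [x for x in ages if lower_limit <= x <= upper_limit]
outliers = sorted(x for x in ages if x < lower_limit or x > upper_limit)
whisker_low, whisker_high = min(inside), max(inside)

print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", iqr, "limits:", lower_limit, upper_limit)
print("whiskers:", whisker_low, whisker_high, "outliers:", outliers)
```

For this dataset no value falls outside the limits, so the whiskers simply run to the minimum and maximum ages.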

Example 3

1. Assume the following dataset represents the ages of a sample of 30 persons belonging to a club:

20, 18, 25, 68, 23, 25, 16, 22, 29, 37, 35, 49, 42, 65, 37, 42, 63, 65, 49, 42,
53, 48, 65, 72, 69, 57, 48, 39, 58, 67

Required

a. Develop the frequency distribution for the dataset, show the class
boundaries, class mark, relative frequencies, percentage relative frequencies,
less than and greater than cumulative frequencies and their percentages.
b. Draw the ogive
c. Show the stem and leaf display

Solution:

a. The ordered dataset:

16, 18, 20, 22, 23, 25, 25, 29, 35, 37, 37, 39, 42, 42, 42, 48, 48, 49, 49, 53, 57,
58, 63, 65, 65, 65, 67, 68, 69, 72
• Determine the number of classes (n). Ensure optimum size. You can use rules like n = 1 + 3.322 log N (N = total number of observations) or 2^n ≈ N; thus, by the first rule, n = 1 + 3.322*log 30 = 5.91 ≈ 6 classes.
• Determine the width of the class intervals. Class width (Cw) = Range/n = (72 - 16)/6 = 56/6 = 9.33 ≈ 10.
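
The two steps above can be checked numerically. This is a minimal sketch that assumes the usual form of Sturges' rule with the constant 3.322 (approximately 1/log10 2) and rounds the class width up to a convenient value, as the worked solution does.

```python
import math

N = 30                        # number of observations in Example 3
data_min, data_max = 16, 72   # smallest and largest ages

# Sturges' rule: n = 1 + 3.322 * log10(N), rounded up to a whole number.
classes_sturges = 1 + 3.322 * math.log10(N)
n_classes = math.ceil(classes_sturges)

# Alternative rule of thumb: the smallest k with 2**k >= N.
k = 1
while 2 ** k < N:
    k += 1

raw_width = (data_max - data_min) / n_classes   # range divided by number of classes
class_width = 10                                # rounded up to a convenient width

print(round(classes_sturges, 2), n_classes, k, round(raw_width, 2), class_width)
```

The two rules need not agree exactly (here they suggest 6 and 5 classes); either is only a starting point for choosing a sensible number of classes.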

Table 3.1 Frequency distribution for Example 3.1

Class     Class       Frequency  Class mark   Relative        % RF  LCF     LCRF (%)      GCF     GCRF (%)
interval  boundaries  (f)        (LCL+UCL)/2  frequency (RF)        (<UCL)                (>LCL)
15-24     14.5-24.5   5          19.5         5/30 = 0.167    16.7  5       5/30 = 16.7   30      30/30 = 100
25-34     24.5-34.5   3          29.5         3/30 = 0.100    10.0  8       8/30 = 26.7   25      25/30 = 83.3
35-44     34.5-44.5   7          39.5         7/30 = 0.233    23.3  15      15/30 = 50.0  22      22/30 = 73.3
45-54     44.5-54.5   5          49.5         5/30 = 0.167    16.7  20      20/30 = 66.7  15      15/30 = 50.0
55-64     54.5-64.5   3          59.5         3/30 = 0.100    10.0  23      23/30 = 76.7  10      10/30 = 33.3
65-74     64.5-74.5   7          69.5         7/30 = 0.233    23.3  30      30/30 = 100   7       7/30 = 23.3

LCF = less than cumulative frequency; LCRF = less than cumulative relative frequency; GCF = greater than cumulative frequency; GCRF = greater than cumulative relative frequency.
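
The columns of Table 3.1 can be reproduced with a short loop. The sketch below assumes inclusive classes of width 10 starting at 15, as in the table; the dictionary keys are names chosen here for illustration.

```python
ages = [20, 18, 25, 68, 23, 25, 16, 22, 29, 37, 35, 49, 42, 65, 37,
        42, 63, 65, 49, 42, 53, 48, 65, 72, 69, 57, 48, 39, 58, 67]

lower, width, n_classes = 15, 10, 6
n = len(ages)

rows = []
less_than_cf = 0
for i in range(n_classes):
    lcl = lower + i * width                      # lower class limit
    ucl = lcl + width - 1                        # upper class limit (inclusive class)
    freq = sum(lcl <= x <= ucl for x in ages)    # class frequency
    less_than_cf += freq                         # observations up to this class
    greater_than_cf = n - less_than_cf + freq    # observations in this class and above
    rows.append({
        "class": f"{lcl}-{ucl}",
        "boundaries": (lcl - 0.5, ucl + 0.5),
        "f": freq,
        "mark": (lcl + ucl) / 2,
        "rf": freq / n,
        "lcf": less_than_cf,
        "gcf": greater_than_cf,
    })

for r in rows:
    print(r["class"], r["boundaries"], r["f"], r["mark"],
          f"{r['rf']:.3f}", r["lcf"], r["gcf"])
```

Plotting `lcf` against the upper class boundaries and `gcf` against the lower class boundaries gives the two ogives asked for in part b.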

b. [Figure: ogive showing the LCF and GCF curves; vertical axis "GCF & LCF" (0 to 40), horizontal axis the classes 15-24 to 65-74]

Figure 3.1 The Ogive for Example 3.1


c.
Stem | Leaf
1 | 68
2 | 023559
3 | 5779
4 | 2228899
5 | 378
6 | 3555789
7 | 2
Leaf unit = 1

Figure 3.2 The Stem and Leaf Display for Example 3.1
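
A stem and leaf display like Figure 3.2 can be generated by splitting each value into its tens digit (stem) and units digit (leaf). This sketch assumes two-digit data with leaf unit 1.

```python
from collections import defaultdict

ages = [20, 18, 25, 68, 23, 25, 16, 22, 29, 37, 35, 49, 42, 65, 37,
        42, 63, 65, 49, 42, 53, 48, 65, 72, 69, 57, 48, 39, 58, 67]

stems = defaultdict(list)
for x in sorted(ages):
    stems[x // 10].append(x % 10)   # tens digit is the stem, units digit the leaf

# Join the leaves of each stem into one string, stems in ascending order.
display = {stem: "".join(str(leaf) for leaf in leaves)
           for stem, leaves in sorted(stems.items())}

for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
print("Leaf unit = 1")
```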

2. The following frequency distribution represents the weekly wages of 100 entry-level
workers:

Table 3.2 Weekly Wages of Entry-level Workers

Required: Draw the histogram, frequency polygon, and the frequency curve.
Solution:

Figure 3.3 The Histogram for Table 3.2

Figure 3.4 The Frequency Polygon for Table 3.2

Figure 3.5 The Frequency Curve for Table 3.2


Exercise 3

1. Assume the following data are the salaries (in thousands of Birr) of a sample of
40 employees.

48 35 57 48 52 56 51 44 52 45

40 40 50 31 52 37 51 41 55 41

47 45 46 42 53 43 44 39 46 45

50 50 44 49 45 45 50 42 54 47

a. Develop a frequency distribution using inclusive classes.
b. Show the class boundaries, class marks, relative frequencies, percentage relative
frequencies, less than and greater than cumulative frequencies and their
percentages.
c. Prepare a histogram, frequency polygon, frequency curve, ogive, and Lorenz
curve of the dataset.
2. A sample of 25 high school students who planned to go to college were asked
which of the following majors they intend to choose: accounting, management,
marketing, economics, or law. The responses of these students are listed below.

Law Economics Management Accounting Accounting


Management Accounting Law Marketing Law
Marketing Law Economics Accounting Economics
Law Marketing Law Management Law
Accounting Management Management Law Economics

a. Prepare a frequency distribution table for the data.


b. Calculate the relative frequency and percentage of distribution for the data.
c. Construct a bar graph for the frequency distribution.
d. Draw a pie chart for the percentage distribution.
3. Develop a stem and leaf display, box plot, and dot plot for the following dataset, which
represents the amount spent (in dollars) in a grocery store by a sample of 12 people.

12 28 32 24 17 6 34 18 22 42 36 26

4. Construct a stem-and-leaf display and box plot for the following data which is
collected on the monthly rents paid (for public houses) by a sample of 20
households selected from the city of Addis Ababa.

429 540 650 585 578 1020 1070 780 989 930

870 1020 750 660 975 820 550 880 956 950
5. Give your own examples for Contingency table, Pareto-chart; scatter diagram, time
series plot, vertical line graph, pictogram, dot plot, and radar chart.

Synopsis
• Descriptive statistics represents data in terms of tables and graphs for easy understanding using different methods.
• Tables: frequency distribution, contingency table, etc.
• Frequency distributions can be represented in the form of graphs like the histogram, frequency polygon, frequency curve, ogive, and Lorenz curve.
• Other graphs include: bar graph, pie chart, Pareto chart, line graph (time series plot, vertical line graph), scatter diagram, radar chart, pictogram, cartogram, dot chart, etc.
• Stem and leaf displays and box plots are graphic (diagrammatic) displays used for exploratory data analysis (EDA), i.e. preliminary analyses of data in order to gain insights about patterns and relationships.

Wrap up Discussion Questions:

• List the tabular methods of presenting data.
• Indicate the graphic displays that are used to represent a frequency distribution.
• Briefly list and describe the other graphic (diagrammatic) methods used to describe data.
• Explain how the two dimensions of the shape of a distribution can be inferred from a histogram or frequency curve.
• What is EDA? What graphic methods are used for EDA?

Next Session’s Assignment:

• Attempt Exercise 3, #1-5
• Read about the different summary measures (numbers) computed during data analysis in descriptive statistics

Topic 4: Introduction: Describing Data in terms of Numbers (Summary Measures)

Topic Learning Objectives:

By the end of this session students are expected to:


• Revise how to represent data in numbers
• List and describe the classification, methods and purposes of the summary measures employed in descriptive statistics

Topic outline

1. Measures of central tendency


2. Measures of dispersion
3. Measures of shapes of distribution
4. Measures of association
5. Synopsis
6. Wrap up discussion questions
7. Next session’s assignment

Reading Assignment Discussion:


• Indicate the summary measures employed in descriptive statistics to represent data in terms of numbers.

Reading Text:

Descriptive Statistics: deals with summarizing (representing) data in terms of tables, graphs and numbers for easy understanding. It summarizes the data of a population or sample in the form of numbers, by computing different kinds of summary measures during data analysis. If the number (summary measure) belongs to a population it is called a parameter, and if it belongs to a sample it is called a statistic.
1. Measures of central tendency (location): Helps to identify single center of
values summarizing the data set. It is the extent to which all the data values group
around a typical or central value. Common methods and formula of the measures
are summarized as follows:
The mean is a mathematical average; there are different types of mean: arithmetic
mean, harmonic mean, geometric mean, weighted mean, combined mean, moving
average, trimmed mean, etc.

The arithmetic mean is the most widely used and widely reported measure of
central tendency.
Population mean: $\mu = \sum X / N$; sample mean: $\bar{x} = \sum X / n$;
for grouped data: $\bar{x} = \sum f \cdot cm_i / n$, where f = frequency, $cm_i$ = class mark (midpoint), n = total number of
observations.
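
As a quick illustration of the two mean formulas, the sketch below computes the ungrouped mean of the Example 3 ages and the grouped mean from the frequencies and class marks of Table 3.1. The two results differ slightly because grouping replaces every value by its class mark.

```python
import statistics

ages = [20, 18, 25, 68, 23, 25, 16, 22, 29, 37, 35, 49, 42, 65, 37,
        42, 63, 65, 49, 42, 53, 48, 65, 72, 69, 57, 48, 39, 58, 67]

# Ungrouped data: sum of the values divided by the number of values.
ungrouped_mean = statistics.mean(ages)

# Grouped data (Table 3.1): sum of frequency times class mark, divided by n.
class_marks = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
frequencies = [5, 3, 7, 5, 3, 7]
n = sum(frequencies)
grouped_mean = sum(f * cm for f, cm in zip(frequencies, class_marks)) / n

print(round(ungrouped_mean, 2), round(grouped_mean, 2))
```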
The Median is the positional measure of average; it is the value (or the average of
the two values) at the middle of the dataset sorted in ascending or descending
order. For grouped data,
$$\text{Median } (\tilde{x}) = l + \frac{c}{f}\left(\frac{n}{2} - c.f\right)$$
Where: l = LCB (lower class boundary) of the median class, c = class interval of the median class,
f = frequency of the median class, c.f = cumulative frequency just less than $\frac{n}{2}$ (the cumulative frequency of the classes preceding the median class).
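
The grouped-data median formula can be applied to Table 3.1 as follows. The sketch takes the median class to be the first class whose cumulative frequency reaches n/2, which is an assumption about how ties at exactly n/2 are handled.

```python
# Lower class boundaries and frequencies from Table 3.1.
lower_boundaries = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [5, 3, 7, 5, 3, 7]
width = 10                      # class interval c
n = sum(frequencies)

cumulative = 0                  # c.f: cumulative frequency before the current class
median = None
for lcb, f in zip(lower_boundaries, frequencies):
    if cumulative + f >= n / 2:                  # median class found
        median = lcb + (width / f) * (n / 2 - cumulative)
        break
    cumulative += f

print(median)   # l = 34.5, c = 10, f = 7, c.f = 8
```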
The mode is the most frequently occurring value; for grouped data:
$$\text{Mode } (\hat{x}) = l + \left(\frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)}\right) \times c$$
or
$$\text{Mode } (\hat{x}) = l + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) \times c$$
Where:
l = LCB of the modal class
c = magnitude (width) of the class
$f_1$ = maximum frequency (frequency of the modal class)
$f_0$ = frequency preceding $f_1$; $\Delta_1 = f_1 - f_0$
$f_2$ = frequency succeeding $f_1$; $\Delta_2 = f_1 - f_2$
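
A minimal sketch of the grouped-data mode formula, again using Table 3.1. Two classes (35-44 and 65-74) tie for the maximum frequency there, so the distribution is really bimodal; the code below simply takes the first modal class, which is an assumption made to keep the example short.

```python
lower_boundaries = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [5, 3, 7, 5, 3, 7]
width = 10

i = frequencies.index(max(frequencies))                   # first modal class (35-44)
f1 = frequencies[i]                                       # maximum frequency
f0 = frequencies[i - 1] if i > 0 else 0                   # frequency preceding f1
f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0  # frequency succeeding f1

delta1 = f1 - f0
delta2 = f1 - f2
mode = lower_boundaries[i] + delta1 / (delta1 + delta2) * width

print(round(mode, 2))
```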
Fractiles (Quartiles, Deciles, and Percentiles) divide the ranked dataset into
(4, 10, and 100) equal portions respectively, and indicate the relative standing of a
value or provide information about the position of particular values relative to the
entire data set.

The Pth percentile is the value for which P percent are less than that value and
(100–P)% are greater than that value.

For ungrouped data: the Pth percentile is the ((n+1)P/100)th value; the Dth decile is the ((n+1)D/10)th value; the Qth quartile is the ((n+1)Q/4)th value of the ordered data.

For grouped data:
$$Q_i = l + \frac{c}{f}\left(\frac{iN}{4} - c.f\right)$$
$$D_i = l + \frac{c}{f}\left(\frac{iN}{10} - c.f\right)$$
$$P_i = l + \frac{c}{f}\left(\frac{iN}{100} - c.f\right)$$
Where: l = LCB of the (P, D, or Q) class, c = class interval of the (P, D, or Q) class,
f = frequency of the (P, D, or Q) class, c.f = cumulative frequency just less than $\frac{iN}{4}$, $\frac{iN}{10}$, or $\frac{iN}{100}$ respectively; Quartiles $Q_i$ (i = 1, 2, 3); Deciles $D_i$ (i = 1, 2, …, 9); Percentiles $P_i$ (i = 1, 2, …, 99)
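
For ungrouped data, the position rule (n+1)P/100 usually lands between two observations. The sketch below interpolates linearly between the neighbouring values and clamps positions that fall outside 1..n; both choices are assumptions, since the text only gives the position formula. The function name `fractile` is hypothetical.

```python
def fractile(data, fraction):
    """Value at position (n + 1) * fraction of the sorted data (1-based),
    using linear interpolation between neighbouring observations."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * fraction
    position = min(max(position, 1), n)   # keep the position inside 1..n
    whole = int(position)                 # rank of the lower neighbour
    part = position - whole               # fractional distance to the next rank
    if whole >= n:
        return float(ordered[-1])
    return ordered[whole - 1] + part * (ordered[whole] - ordered[whole - 1])


ages = [20, 18, 25, 68, 23, 25, 16, 22, 29, 37, 35, 49, 42, 65, 37,
        42, 63, 65, 49, 42, 53, 48, 65, 72, 69, 57, 48, 39, 58, 67]

q1 = fractile(ages, 1 / 4)      # first quartile, position 7.75
d4 = fractile(ages, 4 / 10)     # fourth decile, position 12.4
p80 = fractile(ages, 80 / 100)  # 80th percentile, position 24.8
print(q1, round(d4, 2), p80)
```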


2. Measures of Dispersion: It measures variation or spread in the dataset; it
indicates deviation from the center and amongst data elements. Common methods
and their respective formula are enumerated below:
Range: Difference between maximum and minimum values
Interquartile range: Difference between third and first quartile (Q3 - Q1)
Variance: Mean squared deviation from the mean;
for ungrouped data: sample variance $s^2 = \frac{\sum (X - \bar{x})^2}{n-1}$ or $\frac{\sum X^2 - (\sum X)^2/n}{n-1}$;
population variance $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
For grouped data:
$$\sigma^2 \text{ (population)} = \frac{\sum f(m - \mu)^2}{N} \quad \text{or} \quad \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N}$$
Where: m = midpoint of a class; µ or $\bar{x}$ = population or sample mean; f = class
frequency; N (n) = population (sample) size. N.B. for a sample the
denominator is n-1.
Standard deviation (σ or s) is the positive square root of the variance for a
population or sample.
Coefficient of variation (V) measures relative dispersion: (population) $V = \sigma/\mu$,
(sample) $V = s/\bar{x}$
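
The dispersion measures can be computed directly from their definitions. The sketch below treats the Example 3 ages as a sample and checks that the definitional and shortcut forms of the sample variance agree.

```python
import math
import statistics

ages = [20, 18, 25, 68, 23, 25, 16, 22, 29, 37, 35, 49, 42, 65, 37,
        42, 63, 65, 49, 42, 53, 48, 65, 72, 69, 57, 48, 39, 58, 67]

n = len(ages)
mean = sum(ages) / n

data_range = max(ages) - min(ages)

# Sample variance: definitional form and shortcut form (denominator n - 1).
sample_var = sum((x - mean) ** 2 for x in ages) / (n - 1)
shortcut_var = (sum(x * x for x in ages) - sum(ages) ** 2 / n) / (n - 1)

# Population form (denominator N), if the 30 values were the whole population.
population_var = sum((x - mean) ** 2 for x in ages) / n

sample_sd = math.sqrt(sample_var)
cv = sample_sd / mean            # coefficient of variation (relative dispersion)

print(data_range, round(sample_var, 2), round(sample_sd, 2), round(cv, 3))
```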
3. Shape of a distribution refers to its symmetry or lack of it (skewness) and its
peakedness (kurtosis). It is the pattern of distribution of values from the lowest to
the highest value. Two or more distributions may have the same mean and equal
standard deviations but differ in shape.
a. Skewness is the measure or degree of asymmetry; it influences the relative position of the mean, median and mode. There are different measures of skewness, one of which is Pearson’s coefficient of skewness, i.e. Skp = 3(Mean - Median)/Standard deviation. Skp is 0 if the distribution is symmetric. A dataset can be:
o Symmetric (Mean = Median = Mode): the left and right halves are equal.
o Left (negatively) skewed (Mean < Median < Mode): the left tail is elongated; a few small extremes drag the mean to the left of the median. E.g. consider a dataset of exam completion times: normally a few students finish their exams early, while the majority stay until about the end of the exam.
o Right (positively) skewed (Mean > Median > Mode): the right tail is elongated; a few large extremes drag the mean to the right of the median. E.g. consider the salaries of employees in a company: the salary of the majority of workers is low, while the salary of only a few executives is high.

Figure 4.1 A. Symmetric B. Right Skewed C. Left Skewed Distribution
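
Pearson's coefficient of skewness can be illustrated with two small hypothetical datasets that mirror the examples in the text: salaries with one very high value, and exam completion times with a few early finishers. The numbers are invented for illustration.

```python
import statistics


def pearson_skewness(data):
    """Pearson's coefficient: 3 * (mean - median) / sample standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    return 3 * (mean - median) / statistics.stdev(data)


# Hypothetical salaries (thousands of Birr): one executive earns far more.
salaries = [3, 3, 4, 4, 4, 5, 5, 6, 7, 25]

# Hypothetical exam completion times (minutes): a few students finish early.
exam_minutes = [45, 60, 100, 110, 115, 118, 119, 120, 120, 120]

sk_salaries = pearson_skewness(salaries)      # positive: right skewed
sk_exam = pearson_skewness(exam_minutes)      # negative: left skewed
print(round(sk_salaries, 2), round(sk_exam, 2))
```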

b. Kurtosis is the measure of peakedness or flatness. The types of kurtosis are: Platykurtic (relatively flat), Mesokurtic (normal), and Leptokurtic (relatively peaked). It can be measured by dividing the fourth moment about the mean by the standard deviation raised to the power of four. Karl Pearson’s coefficient of kurtosis for a population is $\frac{\sum f(X - \mu)^4 / N}{\sigma^4}$; for a sample it is $\frac{\sum f(X - \bar{x})^4 / n}{s^4}$; the coefficient for a normal (Mesokurtic) distribution is 3.
Figure 4.2 Types of Kurtosis


4. Measures of Association measure a relationship in terms of strength, direction and
dependence.
Covariance measures the strength and direction of the linear relationship between
two numerical variables (X and Y).
Correlation measures the relative strength and direction of the linear relationship
between two numerical variables.
Regression is a statistical method used to predict the value of an unknown variable
from a known variable, given that the two are correlated. The measures of
relationship are discussed in detail in later sections.
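
As a rough illustration of the first two measures, the sketch below computes a sample covariance and a correlation coefficient from their usual definitions. The paired values and the function names are hypothetical, chosen only for the illustration; they do not come from this text.

```python
from math import sqrt

# Hypothetical paired observations (X, Y), used only for illustration.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance scaled by both standard deviations; always between -1 and 1."""
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of X with itself is its variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


cov_xy = sample_covariance(x_values, y_values)
r_xy = correlation(x_values, y_values)
print(cov_xy, round(r_xy, 4))
```

A positive covariance indicates that the two variables tend to move in the same direction; the correlation expresses the same relationship on a unit-free scale.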

Example 4

A large department store collects data on sales made by each of its salespeople. The
sample of number of sales made on a given day by each of 20 salespeople is shown
below.

9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17

Required: Find the indicated summary measures and interpret results


a. Find the mean, median and mode of the number of sales made per salesperson on
a given day
b. Find the following fractiles: P80, P90; Q1, Q3; and D4
c. Find the range, inter-quartile range, variance, standard deviation, and coefficient
of variation for the number of sales made per salesperson on a given day
d. Find the Pearson’s coefficient of skewness and coefficient of kurtosis for the
distribution and describe its shape.
Solution:
a. The dataset is sorted from the least to the highest as follows:
6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24
The Mean is the arithmetic average = ∑Xi/n = 317/20 = 15.85
The Median is the positional average, the value at the middle of the data sorted
in order of magnitude. With n = 20 it is the average of the 10th and 11th
values = (16+16)/2 = 16
The Mode is the most frequent value = 16 (it occurs three times)
b. The Pth percentile is the (n+1)P/100th value of the sorted data.
P80: position = (20+1)*80/100 = 16.8th
The 16th observation is 19 and the 17th observation is 20. The 80th
percentile is a point lying 0.8 of the way from 19 to 20 and is thus 19.8.
Interpretation: 80% of the salespeople made fewer than 19.8 sales on the
given day
P90: position = (20+1)*90/100 = 18.9th; the 18th observation is 21 and the 19th
is 22, so P90 = 21 + 0.9(22-21) = 21.9
Interpretation: 90% of the salespeople made fewer than 21.9 sales on the
given day
Q1: position = (20+1)*1/4 = 5.25th; the 5th observation is 13 and the 6th is 14,
so Q1 = 13 + 0.25(14-13) = 13.25
Interpretation: 25% (a quarter) of the salespeople made fewer than 13.25
sales on the given day
Q3: position = (20+1)*3/4 = 15.75th; the 15th observation is 18 and the 16th is
19, so Q3 = 18 + 0.75(19-18) = 18.75
Interpretation: 75% (three quarters) of the salespeople made fewer than
18.75 sales on the given day
D4: position = (20+1)*4/10 = 8.4th; the 8th observation is 15 and the 9th is 16,
so D4 = 15 + 0.4(16-15) = 15.4
Interpretation: 40% of the salespeople made fewer than 15.4 sales on the
given day
c. Range = Maximum - Minimum values = 24 - 6 = 18
Interquartile range = Q3 - Q1 = 18.75 - 13.25 = 5.5
Variance = ∑(X-x̄)²/(n-1) = 378.55/(20-1) = 19.9237, or
[∑X² - (∑X)²/n]/(n-1) = (5403 - 5024.45)/(20-1) = 378.55/19 = 19.9237;
N.B.: The denominator for a sample is n-1

Standard deviation = √Variance = √19.9237 = 4.46


Table 4.1: Calculation for Variance, Standard Deviation and Kurtosis

X      X - x̄     (X - x̄)²     X²      (X - x̄)⁴
6      -9.85     97.0225      36      9413.366
9      -6.85     46.9225      81      2201.721
10     -5.85     34.2225      100     1171.18
12     -3.85     14.8225      144     219.7065
13     -2.85     8.1225       169     65.97501
14     -1.85     3.4225       196     11.71351
14     -1.85     3.4225       196     11.71351
15     -0.85     0.7225       225     0.522006
16     0.15      0.0225       256     0.000506
16     0.15      0.0225       256     0.000506
16     0.15      0.0225       256     0.000506
17     1.15      1.3225       289     1.749006
17     1.15      1.3225       289     1.749006
18     2.15      4.6225       324     21.36751
18     2.15      4.6225       324     21.36751
19     3.15      9.9225       361     98.45601
20     4.15      17.2225      400     296.6145
21     5.15      26.5225      441     703.443
22     6.15      37.8225      484     1430.542
24     8.15      66.4225      576     4411.949

∑X = 317; ∑(X - x̄)² = 378.55; ∑X² = 5403; ∑(X - x̄)⁴ = 20083.13
x̄ = 15.85; ∑(X - x̄)²/(n-1) = 19.9237
(∑X)² = 100489; (∑X)²/n = 5024.45

Coefficient of Variation= (S.dev/Mean)*100=(4.46/15.85)*100=28%


d. Pearson’s Coefficient of Skewness (Skp) = 3(Mean-Median)/S.dev = 3(15.85-16)/4.46
= -0.1; Interpretation: The distribution is very slightly left skewed
Karl Pearson’s Coefficient of Kurtosis = [∑(X-x̄)⁴/n]/s⁴ = (20,083.13/20)/(19.9237)²
= 1,004.16/396.95 = 2.53
Interpretation: As the coefficient is below 3, the distribution is slightly
platykurtic (a little flatter than normal)
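
The calculations in Example 4 can be cross-checked with a short script. This is only a sketch: the helper name `fractile` is an assumption made for the illustration, and it follows the (n+1)P/100 position rule used above, which differs from the default rule in some software. The kurtosis line follows the definition given earlier (fourth moment about the mean divided by the fourth power of the standard deviation), so the sum of fourth-power deviations is divided by n first.

```python
from statistics import mean, median, mode, stdev, variance

# Number of sales made by each of the 20 salespeople (Example 4).
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]


def fractile(data, fraction):
    """Value at position (n + 1) * fraction of the sorted data, interpolated linearly."""
    ordered = sorted(data)
    position = (len(ordered) + 1) * fraction  # 1-based position, possibly fractional
    whole = int(position)                     # observation just below the position
    part = position - whole                   # how far to move toward the next one
    if whole < 1:                             # position falls before the first value
        return ordered[0]
    if whole >= len(ordered):                 # position falls at or after the last value
        return ordered[-1]
    return ordered[whole - 1] + part * (ordered[whole] - ordered[whole - 1])


n = len(sales)
x_bar = mean(sales)
s = stdev(sales)                              # sample standard deviation (n - 1 denominator)
q1, q3 = fractile(sales, 0.25), fractile(sales, 0.75)

cv_percent = s / x_bar * 100                  # coefficient of variation, in percent
skp = 3 * (x_bar - median(sales)) / s         # Pearson's coefficient of skewness
fourth_moment = sum((x - x_bar) ** 4 for x in sales) / n
kurtosis = fourth_moment / s ** 4             # 3 for a normal (mesokurtic) distribution

print("mean, median, mode:", x_bar, median(sales), mode(sales))
print("P80, P90, D4:", fractile(sales, 0.80), fractile(sales, 0.90), fractile(sales, 0.40))
print("range, IQR:", max(sales) - min(sales), q3 - q1)
print("variance, s, CV%:", round(variance(sales), 4), round(s, 2), round(cv_percent, 1))
print("skewness, kurtosis:", round(skp, 2), round(kurtosis, 2))
```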

Exercise 4

1. A sample of 15 college seniors showed the following credit hours taken during
the final term of the senior year.
15 21 18 16 18 21 19 15
14 18 17 20 18 15 16

Required: Find the indicated summary measures and interpret results

a. Find the sample mean, median and mode number of credit hours taken during
the final term
b. Find the following fractiles: P60, P90; Q1, Q3; and D3
c. Find the range, inter-quartile range, variance, standard deviation, and coefficient
of variation for the number of credit hours taken during the final term
d. Find the Pearson’s coefficient of skewness and coefficient of kurtosis for the
distribution and describe its shape.
2. The time taken to serve each of a sample of 100 customers at a bank was
observed. The following table gives the frequency distribution of service times for
these 100 customers.
Service time (minutes) Number of customers
0-2 18
2-4 30
4-6 24
6-8 16
8-10 8
10-12 4
Required: Provide the following sample descriptive statistics and interpret results

a. Find the sample mean, median and mode service minutes


b. Find the following fractiles: P40; Q1, Q3; and D2
c. Find the range, inter-quartile range, variance, standard deviation, and coefficient
of variation for the service minutes
d. Find the Pearson’s coefficient of skewness and coefficient of kurtosis for the
distribution and describe its shape.
Synopsis
 Descriptive statistics represents data in terms of numbers using different
summary measures
 These numbers are called parameters when they describe a population, and
statistics when they describe a sample.
 Generally the basic measures conducted in descriptive statistics are summarized
below:

Table 4.2 Basic Summary Measures in Descriptive Statistics

Summary Measures            Methods                                  Purpose

Central tendency            Mean, Median, Mode; Quartiles,           Measures average; indicates
                            Deciles, and Percentiles                 relative position or location
Variation/Dispersion        Range, Interquartile range, Variance,    Measures the variability or
                            Standard deviation, Coefficient of       spread in the distribution
                            Variation
Shape                       Skewness and Kurtosis (coefficients)     Measures shape of distribution in
                                                                     terms of symmetry and peakedness
Association/Relationship    Covariance, Correlation, and             Measures the strength and
                            Regression                               direction of relationship
Wrap up Discussion Questions:

 List the basic types of summary measures, their methods, and purpose
 How is the shape of a distribution measured?
 Explain the relative position of the mean, median, and mode in symmetric and
skewed distributions

Next Session’s Assignment:

 Attempt Exercise 4, #1 and 2


 Read about probability theory

Topic 5: Probability Distribution: Probability Theory

Topic Learning Objectives:

By the end of this session students are expected to:


 Explain probability and related concepts
 Discuss objective and subjective approaches to assign probability

Topic outline

1. Concepts of probability
2. Approaches to assign probabilities
3. Synopsis
4. Wrap up discussion questions
5. Next session’s assignment

Reading Assignment Discussion:


 What is probability?
 What are the different approaches that are used to assign probability to
events?
Reading Text:

Descriptive statistics focuses on summarizing something that has already happened.
Inferential statistics, by contrast, makes predictions, generalizations and
conclusions about a population based on a sample taken from it. Since the possibility
of error exists in statistical inference, estimates or tests of a population
characteristic are given together with the chance or probability of being wrong.
Probability theory therefore forms the basis for inferential statistics (statistical
inference) and for other fields that require the assessment of uncertain occurrences.
Much of the decision-making environment involves uncertainty; for instance, consider
the following:
 What is the chance that the new product will be welcomed in the market?
 What is the chance that the new machinery will increase production?
 How likely is it that the project will be completed on time?
Thus, knowledge of probability is required so that calculated risks can be taken.

Probability is a numerical measure of the relative likelihood that a particular event will
occur. It is the science of uncertainty: the mathematical means of studying
uncertainty and variability. Probability theory provides a mechanism for measuring and
analyzing the uncertainties associated with future events. The probability of an event
ranges between 0 and 1 (100%) inclusive; the former represents an impossible event
(non-occurrence), and the latter a sure event.

A random experiment is an activity that generates data; it is a process of observation
whose actual outcome is uncertain but whose possible outcomes are defined. E.g. tossing
3 coins; rolling a die; and counting the number of TV sets owned by households in a
sub-city.

An event is a collection of one or more outcomes (results) of an experiment. E.g.
A = observing at least 1 head (in tossing 3 coins); B = observing an even number (in
rolling a die); C = observing at most 2 TV sets (per household).
An outcome is a particular result of an experiment. Each time a random experiment is
run, only one of the possible outcomes will occur. E.g. in rolling a die the outcome
could be any of the following: 1, 2, 3, 4, 5, 6

Sample space is the set that consists of all the possible outcomes of a random
experiment. E.g. in tossing 3 coins the sample space is: {TTT, HTT, THT, TTH, THH,
HTH, HHT, HHH}; this sample space has 8 elements (sample points).
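
The sample space and an event can also be listed mechanically. The sketch below is one way to do it, assuming each toss is written as H or T; the variable names are illustrative only.

```python
from itertools import product

# Sample space for tossing 3 coins: every ordered sequence of H and T.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Event A = observing at least 1 head: a subset of the sample space.
event_a = [outcome for outcome in sample_space if "H" in outcome]

print(sample_space)        # the 8 sample points
print(len(sample_space))   # number of sample points
print(event_a)             # every outcome except TTT
```

Each toss has 2 possible results, so three tosses give 2 x 2 x 2 = 8 sample points.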

Assigning Probabilities:

Probability for a single event can be assigned objectively (based on information) using
the classical or the empirical approach; or subjectively (based on a person’s belief or
estimate of an event’s likelihood).

Classical (a priori) method:

Probability of an event = (Number of favorable outcomes) / (Total number of possible outcomes)

It assumes each outcome of an experiment is equally likely. It is called the a priori
method because it allows the probability to be determined even before the experiment
is conducted. E.g. Probability of an even number in rolling a die = 3/6 = 0.5

Empirical probability (Relative frequency):

P(E) = (Number of times the event occurs) / (Total number of observations (trials))

This is based on relative frequencies. E.g. 10 of the 100 students registered for the
course Managerial Statistics last semester scored an A; based on this observation,
the probability that a student will score an A = 10/100 = 0.1
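
The two objective approaches can be put side by side in a short sketch. The classical and empirical figures come from the examples above; the simulated die rolls, the seed and the number of trials are assumptions added only to show a relative frequency settling near the classical value.

```python
import random
from fractions import Fraction

# Classical (a priori): favorable outcomes / total equally likely outcomes.
die_faces = [1, 2, 3, 4, 5, 6]
even_faces = [face for face in die_faces if face % 2 == 0]
p_even_classical = Fraction(len(even_faces), len(die_faces))  # 3/6, stored as 1/2

# Empirical (relative frequency): times the event occurred / number of observations.
p_grade_a = Fraction(10, 100)  # 10 A grades among 100 registered students

# Hypothetical simulation: relative frequency of "even" over many simulated rolls.
rng = random.Random(2024)      # fixed seed, an arbitrary choice for repeatability
trials = 10_000
even_count = sum(1 for _ in range(trials) if rng.choice(die_faces) % 2 == 0)
p_even_empirical = even_count / trials

print(p_even_classical, float(p_grade_a), p_even_empirical)
```

The classical value is known before any roll is made, while the empirical value depends on the observations actually collected.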

Exercise 5

1. A newly established company is planning to recruit trainees for four jobs in the
marketing department. The marketing manager contacted an employment
agency, which selected four candidates and sent them to the
company. The company will hire those who fulfill the requirements of the job.
Assume that a candidate is equally likely to pass or fail the final
evaluation. Use P=pass and Pc=fail
a. List all the sample space outcomes for the experiment
b. Identify the sample outcomes corresponding to the following events
i. None of them will qualify (pass)
ii. At least three of them will qualify (pass)

c. Find the probability of each event in b (i and ii)

Synopsis

 Probability is a numerical measure of the chance or likelihood that a particular
event will occur. 0 ≤ P(Ei) ≤ 1; where P(Ei) is the probability of an event.
 Probability for a single event can be assigned objectively (based on information)
using the classical or the empirical approach; or subjectively (based on a person’s
belief or estimate of an event’s likelihood).

Wrap up Discussion Questions:

 Define probability, random experiment, event, sample space, and sample point.
 Why should probability be studied in statistics?
 Explain the objective and subjective approaches of assigning probability.

Next Session’s Assignment:


 Attempt exercise 5, #1
 Read about probability rules
Topic 6: Probability Distribution: Probability Rules

Topic Learning Objectives:

By the end of this session students are expected to:


 Compute the probability of single and multiple events using probability rules
and approaches

Topic outline

1. Complementary events rule


2. Addition rule of probability
3. Multiplication rule (Joint Probability)
4. Synopsis
5. Wrap up discussion questions
6. Next session’s assignment

Reading Assignment Discussion:

 List and describe the basic probability rules

Reading Text:

Probability of multiple events (Probability rules):

Complementary Events Rule:

If A is any event, then the complement of A, denoted by Ā or Ac, is the event that A
does not occur. For instance, in a single toss of a coin, if A is the event Head, the
complement of A is Tail. The probabilities of an event and its complement sum
to 1:
P(A) + P(Ac) = 1.
P(Ac) = 1 - P(A).
Thus, if we know the probability of an event, we can find the probability of its
complementary event by subtracting the given probability from 1.0.

Example 6.1: In a given football match the chance of winning (W) is 25% and the
chance of tying (T) is 40%. What is the chance of losing (L)?
P(L) = 1 - [P(W) + P(T)] = 1 - (0.25 + 0.4) = 1 - 0.65 = 0.35

Rule of Addition

A. For mutually exclusive events (events that cannot occur at the same time, i.e. have
no outcome in common):
Probability of at least one (either A or B): P(A or B) = P(A∪B) = P(A) + P(B)

Example 6.2: Ethiopian Airlines tests female candidates when hiring hostesses. If the
chances of a candidate being underweight (U), overweight (O) and satisfactory (S) are
0.025, 0.075 and 0.9 respectively, what is the probability that a randomly selected
female candidate is satisfactory or overweight?
P(S∪O) = P(S) + P(O) = 0.9 + 0.075 = 0.975

B. For non-mutually exclusive events (events that can occur at the same time, i.e.
have outcomes in common):
P(A∪B) = P(A or B) = P(A) + P(B) - P(A∩B); Note: P(A∩B) = P(A and B)

Example 6.3: A student is sitting for an entrance exam in English and Mathematics;
the probability that he passes English is 0.65 and the chance that he passes math
is 50%. Also, the chance that he passes both is 0.25.

Given: P(E)=0.65, P(M)=0.5, P(E∩M)=0.25

a. What is the probability that he passes at least one exam?

P(E∪M) = 0.65 + 0.5 - 0.25 = 0.9

b. What is the probability that he passes neither of the exams?

P(E∪M)’ = 1 - P(E∪M) = 1 - 0.9 = 0.1
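
The complement and addition rules can be checked numerically with a small sketch. The function name `prob_union` is an illustrative assumption; the figures are those of Examples 6.1 to 6.3.

```python
def prob_union(p_a, p_b, p_both=0.0):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B).

    For mutually exclusive events P(A and B) = 0, so the last term drops out.
    """
    return p_a + p_b - p_both


# Example 6.1 (complement rule): win 0.25, tie 0.40, so lose = 1 - (win or tie).
p_lose = 1 - prob_union(0.25, 0.40)

# Example 6.2 (mutually exclusive): satisfactory 0.9 or overweight 0.075.
p_satisfactory_or_overweight = prob_union(0.9, 0.075)

# Example 6.3 (not mutually exclusive): English 0.65, Math 0.5, both 0.25.
p_at_least_one = prob_union(0.65, 0.5, 0.25)
p_neither = 1 - p_at_least_one

# Rounded only for display, since decimal fractions are stored approximately.
print(round(p_lose, 3), round(p_satisfactory_or_overweight, 3))
print(round(p_at_least_one, 3), round(p_neither, 3))
```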
Multiplication rule (Joint Probability):

A. For dependent events (the occurrence of one affects the probability of
occurrence of the other)

Probability of both A and B: P(A∩B) = P(A) x P(B/A) or P(B) x P(A/B);

Note: P(A/B) is the conditional probability of A given that B has already occurred.

Example 6.4: There are 16 eggs in a container, 6 of which are rotten (E). Two eggs are
selected randomly in succession without replacement. Find the (joint) probability that
both eggs are rotten.

P(E1 and E2) = P(E1∩E2) = P(E1) x P(E2/E1) = 6/16 x 5/15 = 0.125

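B. For independent events (the occurrence of A is not connected in any way to the
occurrence of B)

P(A∩B) = P(A and B) = P(A) x P(B)

Example 6.5: Two coins are tossed. What is the probability that both will land tail (T)
up? Prove by listing.

P(T1∩T2) = P(T1) x P(T2) = 1/2 x 1/2 = 1/4
By listing: the sample space is {HH, HT, TH, TT}; only one of the four equally likely
outcomes (TT) has both tails, so the probability is 1/4.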
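
Both forms of the multiplication rule can be verified with exact fractions, and the coin result can be confirmed by listing the sample space. This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Dependent events (Example 6.4): two eggs drawn without replacement.
p_first_rotten = Fraction(6, 16)
p_second_rotten_given_first = Fraction(5, 15)  # one rotten egg is already out
p_both_rotten = p_first_rotten * p_second_rotten_given_first

# Independent events (Example 6.5): two coin tosses.
p_both_tails = Fraction(1, 2) * Fraction(1, 2)

# Proof by listing: count the outcome (T, T) among all equally likely outcomes.
two_coin_space = list(product("HT", repeat=2))  # HH, HT, TH, TT
p_both_tails_by_listing = Fraction(two_coin_space.count(("T", "T")), len(two_coin_space))

print(p_both_rotten, float(p_both_rotten))
print(p_both_tails, p_both_tails_by_listing)
```

In the egg example the second factor changes because the first draw alters what remains in the container; in the coin example it does not.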

Exercise 6

1. A survey of benefits for 350 corporate managers and government officials
showed that 200 of them are provided with a mobile telephone (M), 180 are
provided with a car (C), and 150 are provided with both as part of their
position. Find the following probabilities.

a. Find P(M), P(C), and P(M∩C).


b. Compute the probability that a manager has at least one of the two perks.
c. What is the probability that a manager does not have either of these
benefits?
2. Unity business students club has 1000 members. 60% of these members are
male. 45% of all members are accounting students while the remaining are
marketing management students. 175 of those who are accounting students are
female. If a member is randomly selected what is the probability that:
a. The member is female.
b. The member is a female and accounting student.
c. The member is a male or a marketing management student.
d. The member is accounting student if we know that the member is
female.
3. A problem in statistics is given to 4 students A, B, C, and D. Their chances of
solving it are 1/2, 1/3, 1/4, and 1/5 respectively (assume independence)
a. What is the probability that the problem will be solved?
b. What is the probability that the problem will not be solved?
c. What is the probability that the problem will be solved by all?
4. Suppose that there are two events, A and B, with P(A) = 0.5, P(B) = 0.60, and
P(A∩B) = 0.40.
a. Find P(A/B)
b. Find P(B/A)
c. Are A and B independent? Why or why not?

Synopsis
 Probability rules:
o Complementary events rule
 P(A) + P(Ac) = 1.
 P(Ac) = 1 - P(A).
o Addition rule
 (Non-mutually exclusive events): P(A∪B)=P(A or B)=P(A)+P(B)-
P(A∩B)
 (Mutually exclusive events): P(A∪B)=P(A or B)=P(A)+P(B)
o Multiplication rule (Joint probability)
 (Dependent events): P(A∩B)=P(A and B)=P(A)xP(B/A) or
P(B)xP(A/B);
 (Independent events): P(A∩B)=P(A and B)=P(A)xP(B)

Wrap up Discussion Questions:

 List and describe the basic probability rules


 Compare and contrast independent and dependent events
 Distinguish between mutually exclusive and non-mutually exclusive events
 How do the probability rules vary for each of the events indicated in the
preceding two listed statements?

Next Session’s Assignment:


 Attempt exercise 6 #1-4
 Read about random variables and probability distribution

Topic 7: Probability Distribution: Random Variable and Probability Distribution

Topic Learning Objectives:

By the end of this session students are expected to:


 Explain the concepts of random variables and probability distributions
 Distinguish between discrete and continuous random variables
 Develop a discrete probability distribution and its graph

Topic outline
1. Random variable and probability distribution definitions
2. Synopsis
3. Wrap up discussion questions
4. Next session’s assignment
Reading Assignment Discussion:
 What is a random variable?
 Define probability distribution
 What are the requirements for valid probability distribution?
Reading Text:

Random Variable: A variable (subject) measured, studied or observed in a random
experiment. Its outcome is uncertain and is determined by chance (randomly). It
associates a numerical value with each possible outcome.

Depending on the nature of the numeric values of its outcomes, a random variable
can be discrete or continuous. The former assumes only certain clearly separated
values, usually resulting from a count. The latter can assume any of the infinitely
many values within a specific interval and mostly results from measurement.

E.g. Discrete random variables: the number of customers entering a bank (0, 1, 2, …);
the score of a football game; a discount offered by a retail store that is always
either 5% or 10%.
Continuous random variables: the distance between A.A. and Adama (100 km, 100.15 km,
100.155 km, …); the time of a commercial flight between Nairobi and Addis Ababa; and
today’s outside temperature.

Probability Distribution: A summary or listing of all values of the outcomes of a
random variable and their corresponding probabilities. It can be discrete or continuous.

Discrete Probability Distribution

The probability distribution of a discrete random variable possesses the following two
characteristics:
1. The probability assigned to each value of a random variable x lies in the range of
0-1 inclusive. That is, 0 ≤ P(x) ≤ 1 for each value of x.
N.B.: if random variable is X, its probability is denoted by P(x)
2. The sum of the probabilities assigned to all values of outcomes of x is equal to 1.
That is, ΣP(x) =1.0. Remember, if the probabilities are rounded, the sum may
not be exactly 1.0.

All numerical values of a discrete random variable can be listed in a
table with their accompanying probabilities. There are several standard probability
distributions that can serve as models for a wide variety of discrete random variables
involved in business applications. Some standard discrete probability distributions
known by name are: the binomial, hypergeometric, Poisson, multinomial, and
geometric probability distributions. The first three of these
will be discussed in subsequent sections.

On the other hand, for a continuous random variable, all possible fractional values of
the variable cannot be listed. The probabilities are therefore determined by a
mathematical function and portrayed graphically by a probability density function, or
probability curve. Several standard probability distributions that can serve as models for
continuous random variables are described in later sections.

Example 7

Develop a discrete probability distribution for the number of heads in three tosses of a
fair coin. (Note: H stands for head, and T stands for tail)
Results from 3 coin tosses (sample space outcomes):
TTT, TTH, THT, HTT, THH, HTH, HHT, HHH
X = the random variable, i.e. the number of heads obtained; it is a discrete random variable

X (# of Heads)        0            1            2            3
F(x) or Frequency     1            3            3            1            ∑f(x)=8
P(X) (0.5 chance)     1/8=0.125    3/8=0.375    3/8=0.375    1/8=0.125
Note: Each outcome (head and tail) is equally likely; P(H)=0.5 or 1/2 and P(T)=0.5 or
1/2. For instance, the probability of two heads is 3/8, calculated as follows: each of
the three outcomes with exactly two heads (HHT, HTH, THH) has probability
1/2*1/2*1/2=1/8, so P(X=2) = 1/8+1/8+1/8 = 3*1/8 = 3/8.
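
The distribution in Example 7 can be rebuilt by counting heads over the sample space. The sketch below assumes a fair coin, so that all 8 outcomes are equally likely; the names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of three tosses of a fair coin.
outcomes = list(product("HT", repeat=3))

# Frequency f(x): how many outcomes contain x heads.
heads_frequency = Counter(outcome.count("H") for outcome in outcomes)

# Probability P(x) = f(x) / total number of outcomes.
distribution = {
    heads: Fraction(frequency, len(outcomes))
    for heads, frequency in sorted(heads_frequency.items())
}

for heads, probability in distribution.items():
    print(heads, heads_frequency[heads], probability, float(probability))

# The two requirements of a valid discrete probability distribution.
all_within_range = all(0 <= p <= 1 for p in distribution.values())
total_probability = sum(distribution.values())
print(all_within_range, total_probability)
```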

Exercise 7
1. Classify the following random variables as discrete or continuous.
a. The number of students in a class
b. The time taken to run a marathon
c. The number of cattle owned by a farmer
d. Amount of today’s rainfall
e. The amount of fuel in the tank
f. The height of soldiers
g. The age of a house
h. The number of new accounts opened at a bank during a month
i. The number of pages in a book that contain at least one error
j. The time spent by a physician examining a patient
2. Develop a discrete probability distribution for the number of girls in four
expected births.
3. A consumer agency surveyed 2500 families living in a small town to collect
data on the number of television sets owned by them. The following table lists
the frequency distribution of the data collected by this agency.
Number of TV owned: 0 1 2 3 4
Number of families: 850 1000 400 200 50
a. Construct a probability distribution table for the number of TV sets owned
by these families. Draw the (vertical line) graph of the probability
distribution.
b. Determine the probability that the number of TV sets owned by a family
is:
 Exactly 1
 1 to 3
 more than 2
 less than or equal to 1
 at most 2
 at least 2

Synopsis

 A random variable is a variable (subject) measured, studied or observed in a random
experiment. Its outcome is determined by chance or the random process.
o Based on the numeric values it assumes, a random variable can be discrete or
continuous
 Discrete values are specific and separate, and result from a count
 Continuous values are any of the infinitely many values within a specific
range, and result from measurement
 A probability distribution is the listing of values (outcomes) of random variables and
corresponding probabilities for a random experiment
o It can be discrete or continuous probability distribution
 There are known types of discrete and continuous probability distributions
 For a valid discrete probability distribution:
o 0≤P(x)≤1 and ∑P(x)=1 (given all mutually exclusive and exhaustive
outcomes)

Wrap up Discussion Questions:

 Compare and contrast discrete and continuous random variables


 What is a probability distribution?
 What are the requirements for a valid discrete probability distribution?
 Compare and contrast discrete and continuous probability distributions

Next Session’s Assignment:

 Attempt exercise 7 #1, 2, and 3


 Read about computing the mean, variance and standard deviation for a discrete
probability distribution.

Topic 8: Discrete Probability Distribution: Mean, Variance & Standard Deviation

Topic Learning Objectives:

By the end of this session students are expected to:

 Compute the expected value (mean), variance, and standard deviation of a
discrete probability distribution

Topic outline

1. Computing expected mean for probability distribution


2. Computing variance and standard deviations for probability distribution
3. Synopsis
4. Wrap up discussion questions
5. Next session’s assignment

Reading Assignment Discussion:

 How do you calculate the expected mean, variance and standard deviation for
a probability distribution?

Reading Text:

Just as we can compute the mean, variance and standard deviation for a frequency
distribution, we can do so for a probability distribution. The formulas for
the mean (µ) or expected value E(X), variance (σ²) and standard deviation (σ) are given
below respectively:

µ = E(X) = ∑[X·P(X)]; σ² = ∑[(X-µ)²·P(X)]; σ = √σ²
Also, the variance can be calculated as: σ² = ∑[X²·P(X)] - µ²
Example 8

Compute the mean (expected value), variance and standard deviation for the
probability distribution developed previously in Example 7.

Table 8.1 Computing Mean, Variance, and Standard deviation for Example 8

Xi (# of Heads)     0            1            2            3            Total
Xi²                 0            1            4            9
P(Xi)               1/8=0.125    3/8=0.375    3/8=0.375    1/8=0.125
Xi*P(Xi)            0            3/8=0.375    6/8=0.75     3/8=0.375    ∑Xi*P(Xi) = µ = 1.5
Xi-µ (µ=1.5)        -1.5         -0.5         0.5          1.5
[Xi-µ]²             2.25         0.25         0.25         2.25         5
[Xi-µ]²*P(Xi)       0.28         0.094        0.094        0.28         ∑(Xi-µ)²*P(Xi) = σ² = 0.75
Xi²*P(Xi)           0            3/8=0.375    1.5          1.125        ∑Xi²*P(Xi) = 3

From the table:


 The expected value (mean) number of heads in three tosses is 1.5
 The variance (σ²) = 0.75; the standard deviation (σ) = √0.75 = 0.87 heads
 Another formula to calculate the variance is: σ² = ∑X²·P(X) - µ² = 3 - 2.25 = 0.75
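
The figures in Table 8.1 can be reproduced with a few lines of code. This is a sketch only: the function names are illustrative, and the distribution is the one developed in Example 7.

```python
from math import sqrt

# Probability distribution of the number of heads in three tosses (Example 7).
distribution = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}


def expected_value(dist):
    """Mean of a discrete distribution: sum of X * P(X)."""
    return sum(x * p for x, p in dist.items())


def distribution_variance(dist):
    """Variance of a discrete distribution: sum of (X - mean)^2 * P(X)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


mu = expected_value(distribution)
sigma_squared = distribution_variance(distribution)
sigma = sqrt(sigma_squared)

# Alternative formula: sum of X^2 * P(X), minus the squared mean.
sigma_squared_alt = sum(x ** 2 * p for x, p in distribution.items()) - mu ** 2

print(mu, sigma_squared, round(sigma, 2), sigma_squared_alt)
```

Both variance formulas give the same result; the alternative form is usually quicker by hand because it avoids computing each deviation from the mean.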
Exercise 8
1. Mr. J has established the following probability distribution for the number of cars
he expects to sell on a particular Saturday.
No. of cars sold (x) 0 1 2 3 4 Total
Probability, P(x) 0.1 0.2 0.3 0.3 0.1 1.0
a. What type of distribution is this?
b. On a typical Saturday, how many cars should Mr. J expect to sell?
c. What are the variance and standard deviation of the distribution?
2. The information below is the number of daily emergency service calls made by
the volunteer ambulance service of Red Cross, Hawassa, for the last 50 days. To
explain, there were 22 days on which there were two emergency calls, and 9
days on which there were three emergency calls.

Number of Calls 0 1 2 3 4
Frequency 8 10 22 9 1 =50

a. Convert this information on the number of calls to a probability distribution.


b. Is this an example of a discrete or continuous probability distribution?
c. What is the mean number of emergency calls per day?
d. What is the standard deviation of the number of calls made daily?
3. Compute the mean, variance and standard deviation for number of TV sets
owned by households (refer to Exercise 7 #3)

Synopsis

 For a discrete probability distribution:


o Mean or Expected value = µ = E(X) = ∑[X·P(X)];

o Variance (x) = σ² = ∑[(X-µ)²·P(X)] = ∑[X²·P(X)] - µ²;

o Standard deviation (x) = σ = √σ²

Wrap up Discussion Questions:

 How is the expected value, variance and standard deviation calculated for a
discrete probability distribution?

Next Session’s Assignment:


 Attempt exercise 8, #1-3
 Read about the binomial probability distribution.
