Chapter One: 1. Basic Concepts, Methods of Data Collection and Presentation
Introduction:-
Statistics is a very broad subject, with applications in a vast number of different fields. In general
one can say that statistics is the methodology for collecting, analyzing, interpreting and drawing
conclusions from information. Putting it in other words, statistics is the methodology which
scientists and mathematicians have developed for interpreting and drawing conclusions from
collected data. Everything that deals even remotely with the collection, processing, interpretation
and presentation of data belongs to the domain of statistics, and so does the detailed planning
that precedes all these activities.
Objectives of the chapter
After studying this chapter, you should be able to:-
Classifications:-
Depending on how data can be used, statistics can be classified into two broad classes.
1. Descriptive Statistics:
- This part of statistics deals only with describing some characteristics of the data collected
without going beyond the data. In other words, it deals with only describing the sample
data without going any further: that is without attempting to infer (conclude) anything
about the population.
- Descriptive statistics deals with collection of data, its presentation in various forms, such
as tables, graphs and diagrams and finding averages and other measures which would
describe the data.
- Descriptive statistics refers only to the actual data, that is, the data at hand. It is the kind of
statistics used to describe the features of the data gathered by the researcher.
Examples:
Classification of computers based on their generation.
Average score of students in a given semester
2. Inferential Statistics:
- This type of statistics is concerned with drawing statistically valid conclusions about the
characteristics of the population (large group) based on information obtained from a
sample (small group). That is, this part of statistics is concerned with generalizing the
results of a sample to the population using probabilities, performing hypothesis tests,
determining relationships between variables, and making predictions.
Examples:
Out of 50 computer science students selected at random, 10 had the last name Abebe
(a sample result).
Inference: about 20% of all people living in Ethiopia have the last name Abebe.
1.2 Stages in Statistical Investigation
There are five stages or steps in any statistical investigation.
1. Collection of data: This is the first stage of a statistical investigation. The data should be collected
with a specific and well-defined purpose so that the conclusions drawn are not misleading.
There are two methods of data collection: primary and secondary.
The primary method of data collection refers to obtaining original, first-hand data, and the secondary
method of data collection involves obtaining data from other sources.
2. Organization of data: this is a method for classifying and describing the properties of
data in summary form. Editing, coding and classification are the three steps in organization of
data.
3. Presentation of data: The process of re-organization, classification, compilation, and
summarization of data to present it in a meaningful form.
4. Analysis of data: The process of extracting relevant information from the summarized data,
mainly through the use of elementary mathematical operations.
5. Interpretation of data: The process of drawing conclusions and making inferences from the
statistical measures obtained in the analysis of the data.
1.3 Definitions of some statistical terms
a. Statistical Population: It is the collection of all possible observations of a specified
characteristic of interest (possessing certain common property) and being under study.
An example is all of the students in Adigrat University.
b. Sample: It is a subset of the population, selected using some sampling technique in such
a way that they represent the population.
c. Sampling: The process or method of sample selection from the population.
d. Sample size: The number of elements or observation to be included in the sample.
e. Census: Complete enumeration or observation of the elements of the population. Or it is
the collection of data from every element in a population
f. Data: A collection of related facts and figures from which conclusions may be
drawn.
g. Parameter: Characteristic or measure obtained from a population.
h. Statistic: Characteristic or measure obtained from a sample.
i. Variable: It is an item of interest that can take on many different numerical values.
1.4 Applications, Uses and Limitations of statistics
Applications of statistics:
In almost all fields of human endeavor.
Almost all human beings encounter numerical facts in their daily life, e.g.
about prices.
Applicable in some process e.g. invention of certain drugs, extent of environmental
pollution.
In industries especially in quality control area.
Uses of statistics:
The main function of statistics is to enlarge our knowledge of complex phenomena. The
following are some uses of statistics:
1. It presents facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variations in data.
4. Furnishes a technique of comparison
5. Estimating unknown population characteristics.
6. Formulating and testing hypotheses.
7. Studying the relationship between two or more variables.
8. Forecasting future events.
Limitations of statistics
- As a science, statistics has its own limitations. The following are some of the limitations:
Deals with only quantitative information.
Deals with only aggregates of facts and not with individual data items.
Statistical data are only approximately, not mathematically, correct.
Statistics can be easily misused and therefore should be used by experts.
1.5 Types of Variables
1. Qualitative Variables are nonnumeric variables and can't be measured. Examples include
gender, religious affiliation, and state of birth.
2. Quantitative Variables are numerical variables and can be measured. Examples include
balance in checking account, number of children in family. Note that quantitative variables are
either discrete (which can assume only certain values, and there are usually "gaps" between the
values, such as the number of bedrooms in your house) or continuous (which can assume any
value within a specific range, such as the air pressure in a tire.)
SCALE TYPES:-
Normally, when one hears the term measurement, one may think in terms of measuring the length
of something (e.g. the length of a piece of wood) or measuring a quantity of something (e.g. a cup
of flour). This represents a limited use of the term measurement. In statistics, the term
measurement is used more broadly and is more appropriately termed scales of measurement.
Scales of measurement refer to ways in which variables or numbers are defined and categorized.
Each scale of measurement has certain properties which in turn determine the appropriateness for
use of certain statistical analyses. The four scales of measurement are nominal, ordinal, interval,
and ratio.
The various measurement scales result from the fact that measurement may be carried out under
different sets of rules.
Nominal Scale:-Consists of ‘naming’ observations or classifying them into various mutually
exclusive categories. Sometimes the variable under study is classified by some quality it possesses
rather than by an amount or quantity. In such cases, the variable is called attribute.
Example
o Religion: Christianity, Islam, Hinduism, etc.
o Sex: Male, Female
o Eye color: brown, black, etc.
o Blood type: A, B, AB and O.
Ordinal Scale: - Used whenever observations are not only different from category to category, but can
also be ranked according to some criterion. The variables deal with relative differences rather than
with quantitative differences.
Ordinal data are data which can have meaningful inequalities. The inequality signs < or > may
assume any meaning like ‘stronger, softer, weaker, better than’, etc.
Example
Patients may be characterized as unimproved, improved & much improved.
Letter grading system, authority, career, etc.
Individuals may be classified according to socio-economic status as low, medium & high.
Interval Scale: With this scale it is not only possible to order measurements, but the
distance between any two measurements is also known; quotients, however, are not meaningful. There is no true
zero point, only an arbitrary zero point. Interval data are the types of information in which an increase
from one level to the next always reflects the same increase. It is possible to add or subtract interval
data, but they may not be multiplied or divided.
Example: A temperature of zero degrees does not indicate a lack of heat. Consider the two common
temperature scales, Celsius (C) and Fahrenheit (F). We can see that the same difference exists
between 10°C (50°F) and 20°C (68°F) as between 25°C (77°F) and 35°C (95°F), i.e. the
measurement scale is composed of equal-sized intervals. But we cannot say that a temperature of
20°C is twice as hot as a temperature of 10°C, because the zero point is arbitrary.
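The interval-scale claim can be checked numerically. The following is a small sketch only; the helper name `c_to_f` is an assumption introduced for illustration and does not come from the text.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Equal differences in Celsius correspond to equal differences in Fahrenheit.
diff_low = c_to_f(20) - c_to_f(10)    # 68 - 50
diff_high = c_to_f(35) - c_to_f(25)   # 95 - 77

# Ratios are not preserved, because the zero point of each scale is arbitrary.
ratio_c = 20 / 10
ratio_f = c_to_f(20) / c_to_f(10)

print("differences:", diff_low, diff_high)
print("ratios:", ratio_c, ratio_f)
```

The two differences agree, while the ratio changes from one scale to the other, which is why "twice as hot" has no meaning on an interval scale.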
Ratio Scale: - Characterized by the fact that equality of ratios as well as equality of intervals may
be determined. Fundamental to ratio scales is a true zero point. Typical examples of ratio scales
are measures of time or space. For example, as the Kelvin temperature scale is a ratio scale, not
only can we say that a temperature of 200 degrees is higher than one of 100 degrees; we can
correctly state that it is twice as high. Interval scales do not have the ratio property. Most
statistical data analysis procedures do not distinguish between the interval and ratio properties of
the measurement scales.
Example: Variables such as age, height, length, volume, rate, time, amount of rainfall, etc.
are measured on a ratio scale.
Method of primary data collection
In primary data collection, you collect the data yourself using methods such as interviews,
observations, laboratory experiments and questionnaires. The key point here is that the data you
collect is unique to you and your research and, until you publish, no one else has access to it.
There are many methods of collecting primary data and the main methods include:
Questionnaire: It is a popular means of collecting data, but it is difficult to design and often requires
many rewrites before an acceptable questionnaire is produced.
Advantages:
Can be used as a method in its own right or as a basis for interviewing or a telephone
survey.
Can be posted, e-mailed or faxed.
Can cover a large number of people or organizations.
Wide geographic coverage.
Relatively cheap.
No prior arrangements are needed.
Avoids embarrassment on the part of the respondent.
Respondent can consider responses.
Possible anonymity of respondent.
No interviewer bias.
Disadvantages:
Historically low response rate (although inducements may help).
Time delay whilst waiting for responses to be returned
Require a return deadline.
Several reminders may be required.
Assumes no literacy problems.
No control over who completes it.
Not possible to give assistance if required.
Replies not spontaneous and independent of each other.
Respondent can read all questions beforehand and then decide whether to complete it or not,
for example because it is too long, too complex, uninteresting, or too personal.
Observation: It involves recording the behavioral patterns of people, objects and events in a
systematic manner.
Diaries: A diary is a way of gathering information about the way individuals spend their time on
professional activities. They are not about records of engagements or personal journals of
thought! Diaries can record either quantitative or qualitative data, and in management research
can provide information about work patterns and activities.
Laboratory experiment: Conducting laboratory experiments on fields of chemical, biological
sciences and so on.
B) METHODS OF DATA PRESENTATION
Having collected and edited the data, the next important step is to organize it, that is, to present it
in a readily comprehensible, condensed form that helps in drawing inferences from it. It is
also necessary that like items be separated from unlike ones.
The presentation of data is broadly classified into the following three categories:
Tabular presentation
Diagrammatic and
Graphic presentation.
The process of arranging data into classes or categories according to similarities is technically
called classification.
Classification is a preliminary step that prepares the ground for proper presentation of data.
Definitions:
Raw data: recorded information in its original collected form, whether it may be counts
or measurements, is referred to as raw data.
Frequency: is the number of values in a specific class of the distribution.
Frequency distribution: is the organization of raw data in table form using classes and
frequencies.
- There are three basic types of frequency distributions
Categorical frequency distribution
Ungrouped frequency distribution
Grouped frequency distribution
There are specific procedures for constructing each type.
1) Categorical frequency Distribution:
Used for data that can be placed in specific categories, such as nominal or ordinal data. E.g. marital
status.
Example: a social worker collected the following data on marital status for 25
persons.(M=married, S=single, W=widowed, D=divorced)
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital status:
M, S, D, and W. These types will be used as the classes for the distribution. We follow the procedure below to
construct the frequency distribution.
Step 1: Make a table as shown.
Class Tally Frequency Percent
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentages of values in each class by using;
% = (f/n) × 100, where f = frequency of the class and n = total number of values.
Percentages are not normally a part of a frequency distribution, but they can be added since
they are used in certain types of diagrams such as pie charts.
Step 5: Find the total for columns (3) and (4).
Combining all the steps, one can construct the following frequency distribution.
Class   Tally      Frequency   Percent
(1)     (2)        (3)         (4)
M       //// /     6           24
S       //// //    7           28
D       //// //    7           28
W       ////       5           20
Total              25          100
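The tallying steps above can be sketched in a few lines of Python. This is only an illustration using the standard-library `Counter`; the variable names are assumptions made for the sketch.

```python
from collections import Counter

# Marital status of the 25 persons in the example, read row by row.
data = "M S D W D S S M M M W D S M M W D D S S S W W D D".split()

counts = Counter(data)   # Steps 2 and 3: tally and count each class
n = len(data)            # total number of values

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for status in ["M", "S", "D", "W"]:
    f = counts[status]
    # Step 4: percentage = f / n * 100
    print(f"{status:<6}{f:>10}{100 * f / n:>9.0f}")
# Step 5: totals of the frequency and percent columns
print(f"{'Total':<6}{n:>10}{100:>9}")
```

The printed rows can be compared directly with the frequency distribution constructed by hand.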
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown
Step 3: Tally the data.
Step 4: Compute the frequency.
Mark Tally Frequency
60 // 2
62 / 1
63 / 1
65 / 1
70 //// 4
74 / 1
75 // 2
76 / 1
80 /// 3
85 /// 3
90 / 1
Each individual value is presented separately, that is why it is named ungrouped frequency
distribution.
3) Grouped frequency Distribution:
- When the range of the data is large, the data must be grouped into classes that are more
than one unit in width.
Definitions:
- Grouped Frequency Distribution: a frequency distribution in which several numbers are
grouped in one class.
- Class limits: Separates one class in a grouped frequency distribution from another. The
limits could actually appear in the data and have gaps between the upper limits of one
class and lower limit of the next.
- Units of measurement (U): the distance between two possible consecutive measures. It
is usually taken as 1, 0.1, 0.01, 0.001, ...
- Class boundaries: Separate one class in a grouped frequency distribution from another.
The boundaries have one more decimal place than the raw data and therefore do not
appear in the data. There is no gap between the upper boundary of one class and the lower
boundary of the next class. The lower class boundary is found by subtracting U/2 from
the corresponding lower class limit and the upper class boundary is found by adding U/2
to the corresponding upper class limit.
- Class width: the difference between the upper and lower class boundaries of any class. It
is also the difference between the lower limits of any two consecutive classes or the
difference between any two consecutive class marks
- Class mark (Mid points): it is the average of the lower and upper class limits or the
average of upper and lower class boundary.
- Cumulative frequency: is the number of observations less than/more than or equal to a
specific value.
- Cumulative frequency above: it is the total frequency of all values greater than or equal
to the lower class boundary of a given class.
- Cumulative frequency below: it is the total frequency of all values less than or equal to
the upper class boundary of a given class.
- Cumulative Frequency Distribution (CFD): it is the tabular arrangement of class
interval together with their corresponding cumulative frequencies. It can be more than or
less than type, depending on the type of cumulative frequency used.
- Relative frequency (rf): it is the frequency divided by the total frequency.
- Relative cumulative frequency (rcf): it is the cumulative frequency divided by the total
frequency.
Guidelines for classes:-
1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive. This means that no data value can fall into two
different classes
3. The classes must be all inclusive or exhaustive. This means that all data values must be
included.
4. The classes must be continuous. There are no gaps in a frequency distribution.
5. The classes must be equal in width. The exception here is the first or last class. It is
possible to have a "below ..." or "... and above" class. This is often used with ages.
1. Find the largest and smallest values
2. Compute the Range(R) = Maximum – Minimum
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule
$k = 1 + 3.322 \log_{10} n$, where k is the number of classes desired and n is the total number of
observations.
4. Find the class width by dividing the range by the number of classes and rounding up, not
off: $w = R/k$.
5. Pick a suitable starting point less than or equal to the minimum value. The starting point
is called the lower limit of the first class. Continue to add the class width to this lower
limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the second
class. Then continue to add the class width to this upper limit to find the rest of the upper
limits.
7. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2 units
to the upper limits. The boundaries are also halfway between the upper limit of one
class and the lower limit of the next class. It may not be necessary to find the boundaries.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may
not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies
Example:(***)
Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest value H=39, L=6
Step 2: Find the range; R=H-L=39-6=33
Step 3: Select the number of classes desired using Sturges' formula;
$k = 1 + 3.322 \log_{10} 20 \approx 5.32$, which is rounded up to k = 6.
Step 4: Find the class width; w = R/k = 33/6 = 5.5 ≈ 6 (rounding up)
Step 5: Select the starting point, let it be the minimum observation.
6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limits; e.g. the first upper class limit = 12 - U = 12 - 1 = 11
11, 17, 23, 29, 35, 41 are the upper class limits.
So combining step 5 and step 6, one can construct the following classes.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Step 7: Find the class boundaries;
E.g. for class 1 Lower class boundary=6-U/2=5.5
Upper class boundary =11+U/2=11.5
Then continue adding w to both boundaries to obtain the remaining boundaries. By doing so
one can obtain the following classes.
Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 8: tally the data.
Step 9: Write the numeric values for the tallies in the frequency column.
Step 10: Find cumulative frequency.
Step 11: Find relative frequency or/and relative cumulative frequency.
Class     Class          Class   Tally      Freq.   Cf (less     Cf (more     rf     rcf (less
limit     boundary       mark                       than type)   than type)          than type)
6 – 11    5.5 – 11.5     8.5     //         2       2            20           0.10   0.10
12 – 17   11.5 – 17.5    14.5    //         2       4            18           0.10   0.20
18 – 23   17.5 – 23.5    20.5    //// //    7       11           16           0.35   0.55
24 – 29   23.5 – 29.5    26.5    ////       4       15           9            0.20   0.75
30 – 35   29.5 – 35.5    32.5    ///        3       18           5            0.15   0.90
36 – 41   35.5 – 41.5    38.5    //         2       20           2            0.10   1.00
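The eleven steps of the worked example can be reproduced with a short script. This is a sketch under stated assumptions: the unit of measurement is taken as U = 1 because the data are whole numbers, the starting point is the minimum observation as in Step 5, and all variable names are illustrative.

```python
import math

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
U = 1  # unit of measurement (assumed: the data are whole numbers)

n = len(data)
low, high = min(data), max(data)
R = high - low                              # range
k = math.ceil(1 + 3.322 * math.log10(n))    # Sturges' rule, rounded up
w = math.ceil(R / k)                        # class width, rounded up

rows = []
cum = 0
for j in range(k):
    lower = low + j * w        # lower class limit
    upper = lower + w - U      # upper class limit
    f = sum(lower <= x <= upper for x in data)
    cum += f
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - U / 2, upper + U / 2),
        "mark": (lower + upper) / 2,
        "f": f,
        "cf_less": cum,            # values <= upper boundary
        "cf_more": n - cum + f,    # values >= lower boundary
        "rf": f / n,
        "rcf": cum / n,
    })

for r in rows:
    print(r)
```

Each printed row corresponds to one line of the table, so the hand computation can be checked class by class.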
Diagrammatic and graphic presentation of data.
These are techniques for presenting data in visual displays using geometric figures and pictures.
Importance:
They have greater attraction.
They facilitate comparison.
They are easily understandable.
-Diagrams are appropriate for presenting discrete data.
-The three most commonly used diagrammatic presentations for discrete as well as qualitative
data are:
Pie charts
Pictogram
Bar charts
Pie chart
- A pie chart is a circle that is divided into sections or wedges according to the percentage of
frequencies in each category of the distribution. The angle of each sector is obtained using:
$\text{Angle of sector} = \dfrac{\text{value of the part}}{\text{total of all parts}} \times 360^\circ$
Example: The following table gives the details of monthly budget of a family. Represent these
figures by a suitable diagram.
Items of expenditure    Family budget
Food                    $600
Clothing                $100
House rent              $400
Fuel and lighting       $100
Miscellaneous           $300
Total                   $1500
Solutions:
Step 1: Find the percentage.
Step 2: Find the number of degrees for each class.
Step 3: Using a protractor and compass, graph each section and write its name and corresponding
percentage.
Items of expenditure    Family budget   Angle of sector   Percentage
Food                    $600            144°              40%
Clothing                $100            24°               6.67%
House rent              $400            96°               26.67%
Fuel and lighting       $100            24°               6.67%
Miscellaneous           $300            72°               20%
Total                   $1500           360°              100%
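Steps 1 and 2 of the pie chart example amount to two proportions per item. A minimal sketch, with illustrative variable names, is:

```python
# Monthly family budget from the example, in dollars.
budget = {
    "Food": 600,
    "Clothing": 100,
    "House rent": 400,
    "Fuel and lighting": 100,
    "Miscellaneous": 300,
}
total = sum(budget.values())

sectors = {}
for item, amount in budget.items():
    percent = 100 * amount / total   # Step 1: percentage of the total
    angle = 360 * amount / total     # Step 2: degrees of the sector
    sectors[item] = (round(angle, 2), round(percent, 2))
    print(f"{item:<18} {angle:6.1f} deg {percent:6.2f} %")
```

The angles add up to 360 degrees and the percentages to 100, which is a quick check that no item has been left out.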
Pictogram
In this diagram, we represent data by means of some picture symbols. We decide about a suitable
picture to represent a definite number of units in which the variable is measured.
Example: draw a pictogram to represent the following population of a town.
Bar Charts:
-A set of bars (thick lines or narrow rectangles) representing some magnitude over time or space.
-They are useful for comparing aggregates over time or space.
-Bars can be drawn either vertically or horizontally.
-There are different types of bar charts, the most common being:
Simple bar chart
Deviation or two way bar chart
Broken bar chart
Component or sub divided bar chart.
Multiple bar charts.
Simple Bar Chart
- Are used to display data on one variable.
- They are thick lines (narrow rectangles) having the same breadth. The magnitude of a
quantity is represented by the height/length of the bar.
- Example:- Draw simple bar diagram to represent the profits of a bank for 5 years.
Year                  1989   1990   1991   1992   1993
Profit (million $)    10     12     18     25     42
-When there is a desire to show how a total (or aggregate) is divided into its component parts,
we use a component bar chart.
-The bars represent the total value of a variable, with each total broken into its component parts;
different colours or designs are used for identification.
Example: The table below shows the quantity in hundred kgs of Wheat, Barley and Oats produced
on a certain farm during the years 1991 to 1994. Draw a stratified (component) bar chart.
Solution: To make the component bar chart, first of all we have to find the year-wise
total production.
The required diagram is given below:
Multiple Bars
When two or more interrelated series of data are depicted by a bar diagram, then
such a diagram is known as a multiple-bar diagram. Suppose we have export and
import figures for a few years.
We can display them by two bars close to each other, one representing exports and the
other representing imports. The figure shows such a diagram based on hypothetical
data.
[Figure: Multiple bars]
It should be noted that multiple bar diagrams are particularly suitable where some comparison is
involved.
Graphical Presentation of data
- The histogram, frequency polygon and cumulative frequency graph (ogive) are the most
commonly applied graphical representations for continuous data.
Procedures for constructing statistical graphs:
Draw and label the X and Y axes.
Choose a suitable scale for the frequencies or cumulative frequencies and label it on the
Y axis.
Represent the class boundaries for the histogram or ogive, or the mid points for the
frequency polygon, on the X axis.
Plot the points.
Draw the bars or lines to connect the points
Histogram
A graph which displays the data by using vertical bars whose heights represent the frequencies.
Class boundaries are placed along the horizontal axis. Class marks and class limits are
sometimes used as the quantity on the X axis.
Example: Construct a histogram to represent the previous data which is stated in (example ***).
Frequency Polygon:
- A line graph. The frequency is placed along the vertical axis and class mid points are placed
along the horizontal axis. It is customary to add the next higher and lower class intervals with a
corresponding frequency of zero; this is to make it a complete polygon.
Example: Draw a frequency polygon for the above data (example* * *).
Ogive (cumulative frequency polygon)
- A graph showing the cumulative frequency (less than or more than type) plotted against upper
or lower class boundaries respectively. That is class boundaries are plotted along the horizontal
axis and the corresponding cumulative frequencies are plotted along the vertical axis. The points
are joined by a free hand curve.
Example: Draw an ogive curve(less than type) for the above data in example ***.
1.7. Review Exercises
2. Classify the following statements as Descriptive and Inferential Statistics
a. The average age of the students in this class is 21 years.
b. At least 5% of the killings reported last year in city X were due to tourists.
c. Of the students enrolled in Adigrat University in this year 74% are male and 26% are
female.
d. The chance of winning the Ethiopian National Lottery in any day is 1 out of 167000.
3. Classify each of the following as Qualitative and Quantitative and if it is quantitative
classify as Discrete and Continuous.
a. Color of automobiles in a dealer’s show room.
b. Number of seats in a movie theater.
c. Classification of patients based on nursing care needed (complete, partial or seafarer)
d. Number of tomatoes on each plant on a field.
e. Weight of newly born babies.
4. Mark of 50 students out of 40
16 21 26 24 11 17 25 26 13 27 24 26 3 27 23 24 15 22 22 12 22 29 18 22
28 25 7 17 22 28 19 23 23 22 3 19 13 31 23 28 24 9 20 33 30 23 20 8 21
24
a) Construct grouped frequency distribution.
b) Construct histogram for the above data.
c) Construct frequency polygon and ogives.
CHAPTER 2
MEASURES OF CENTRAL TENDENCY
Introduction
- When we want to make comparison between groups of numbers it is good to have a
single value that is considered to be a good representative of each group. This single
value is called the average of the group. Averages are also called measures of central
tendency.
- An average which is representative is called typical average and an average which is not
representative and has only a theoretical value is called a descriptive average.
A typical average should possess the following:
It should be rigidly defined.
It should be based on all observations under investigation.
It should be affected as little as possible by extreme observations.
It should be capable of further algebraic treatment.
It should be affected as little as possible by fluctuations of sampling.
It should be easy to calculate and simple to understand.
Objectives of measures of central tendency:
To comprehend the data easily.
To facilitate comparison.
To make further statistical analysis.
The Summation Notation:
- Let X1, X2 ,X3 …XN be a number of measurements where N is the total number of
observation and Xi is ith observation.
- Very often in statistics an algebraic expression of the form X1+X2+X3+...+XN is used in
a formula to compute a statistic. It is tedious to write an expression like this very often,
so mathematicians have developed a shorthand notation to represent a sum of scores,
called the summation notation.
- The symbol $\sum_{i=1}^{N} X_i$ is a mathematical shorthand for X1+X2+X3+...+XN.
The expression is read, "the sum of X sub i from i equals 1 to N." It means "add up all the
numbers."
Example: Suppose the following were scores made on the first homework assignment for five
students in the class: 5, 7, 7, 6, and 8. In this example set of five numbers, where N=5, the
summation could be written:
$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 5 + 7 + 7 + 6 + 8 = 33$
The "i=1" in the bottom of the summation notation tells where to begin the sequence of
summation. If the expression were written with "i=3", the summation would start with the third
number in the set. For example:
$\sum_{i=3}^{N} X_i = X_3 + X_4 + \dots + X_N$
In the example set of numbers, this would give the following result:
$\sum_{i=3}^{5} X_i = 7 + 6 + 8 = 21$
The "N" in the upper part of the summation notation tells where to end the sequence of
summation. If there were only three scores then the summation and example would be:
$\sum_{i=1}^{3} X_i = X_1 + X_2 + X_3 = 5 + 7 + 7 = 19$
Sometimes if the summation notation is used in an expression and the expression must be written
a number of times, as in a proof, then a shorthand notation for the shorthand notation is
employed. When the summation sign "∑" is used without additional notation, then "i=1" and "N"
are assumed.
For example:
$\sum X = \sum_{i=1}^{N} X_i$
PROPERTIES OF SUMMATION
X Y
5 6
7 7
7 8
6 7
8 8
a) $\sum x_i$
b) $\sum y_i$
c) $\sum 10$
d) $\sum (x_i + y_i)$
e) $\sum (x_i - y_i)$
f) $\sum x_i y_i$
g) $\sum x_i^2$
h) $(\sum x_i)(\sum y_i)$
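Solutions:
a) $\sum x_i = 5 + 7 + 7 + 6 + 8 = 33$
b) $\sum y_i = 6 + 7 + 8 + 7 + 8 = 36$
c) $\sum 10 = 10 + 10 + 10 + 10 + 10 = 50$
d) $\sum (x_i + y_i) = 11 + 14 + 15 + 13 + 16 = 69$
e) $\sum (x_i - y_i) = (-1) + 0 + (-1) + (-1) + 0 = -3$
f) $\sum x_i y_i = 30 + 49 + 56 + 42 + 64 = 241$
g) $\sum x_i^2 = 25 + 49 + 49 + 36 + 64 = 223$
h) $(\sum x_i)(\sum y_i) = 33 \times 36 = 1188$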
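The eight sums in parts a) to h) can be evaluated mechanically from the X and Y columns of the table. The sketch below uses only built-in Python; the variable names are chosen for illustration.

```python
x = [5, 7, 7, 6, 8]
y = [6, 7, 8, 7, 8]

sum_x = sum(x)                                   # a) sum of x_i
sum_y = sum(y)                                   # b) sum of y_i
sum_const = sum(10 for _ in x)                   # c) constant 10 added N times
sum_plus = sum(a + b for a, b in zip(x, y))      # d) sum of (x_i + y_i)
sum_minus = sum(a - b for a, b in zip(x, y))     # e) sum of (x_i - y_i)
sum_xy = sum(a * b for a, b in zip(x, y))        # f) sum of x_i * y_i
sum_x2 = sum(a ** 2 for a in x)                  # g) sum of x_i squared
product_of_sums = sum_x * sum_y                  # h) (sum x_i)(sum y_i)

print(sum_x, sum_y, sum_const, sum_plus, sum_minus,
      sum_xy, sum_x2, product_of_sums)
```

Note that d) equals a) + b) and e) equals a) - b), while f) and h) differ: the sum of products is not the product of the sums.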
A. The Arithmetic Mean
Is defined as the sum of the magnitudes of the items divided by the number of items.
The mean of X1, X2, X3, …, Xn is denoted by A.M, m or $\bar{X}$ and is given by:
$\bar{X} = \dfrac{X_1 + X_2 + \dots + X_n}{n}$
$\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
If X1 occurs f1 times,
if X2 occurs f2 times,
.
.
and if Xk occurs fk times,
then the mean will be $\bar{X} = \dfrac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$, where k is the number of classes and $\sum_{i=1}^{k} f_i = n$.
Solution:
Xi fi Xifi
2 2 4
3 1 3
7 3 21
8 1 8
Total 7 36
$\bar{X} = \dfrac{\sum_{i=1}^{4} f_i X_i}{\sum_{i=1}^{4} f_i} = \dfrac{36}{7} \approx 5.14$
Example: calculate the mean for the following age distribution.
Class      Frequency
6 - 10     35
11 - 15    23
16 - 20    15
21 - 25    12
26 - 30    9
31 - 35    6
Solutions:
1. First find the class marks.
2. Find the product of frequency and class marks.
3. Find the mean using the formula.
Class      fi     Xi    Xifi
6 - 10     35     8     280
11 - 15    23     13    299
16 - 20    15     18    270
21 - 25    12     23    276
26 - 30    9      28    252
31 - 35    6      33    198
Total      100          1575
$\bar{X} = \dfrac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i} = \dfrac{1575}{100} = 15.75$
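The three solution steps for the grouped mean (class marks, products, formula) can be sketched as follows. The tuple layout `(lower limit, upper limit, frequency)` is an assumption made for this illustration.

```python
# (lower class limit, upper class limit, frequency) for the age distribution
classes = [(6, 10, 35), (11, 15, 23), (16, 20, 15),
           (21, 25, 12), (26, 30, 9), (31, 35, 6)]

n = 0       # running total of the frequencies
total = 0   # running total of f_i * X_i
for lower, upper, f in classes:
    mark = (lower + upper) / 2   # Step 1: class mark
    product = f * mark           # Step 2: frequency times class mark
    n += f
    total += product
    print(f"{lower:>2} - {upper:<2}  f={f:<3} X={mark:<5} fX={product}")

mean = total / n                 # Step 3: sum(f X) / sum(f)
print("n =", n, " sum fX =", total, " mean =", mean)
```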
- If the values in a series or mid values of a class are large enough, coding of values is a
good device to simplify the calculations.
- For raw data suppose we have used the following coding system.
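$d_i = X_i - A$
$X_i = d_i + A$
$\bar{X} = \dfrac{\sum X_i}{n} = \dfrac{\sum (d_i + A)}{n}$
$\bar{X} = A + \dfrac{\sum d_i}{n}$
$\bar{X} = A + \bar{d}$
Where A is an assumed mean and $\bar{d}$ is the mean of the coded data.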
- If the data are expressed in terms of ungrouped frequency distribution
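$d_i = X_i - A$
$X_i = d_i + A$
$\bar{X} = \dfrac{\sum f_i X_i}{n} = \dfrac{\sum f_i (d_i + A)}{n}$
$\bar{X} = A + \dfrac{\sum f_i d_i}{n}$
$\bar{X} = A + \bar{d}$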
- In both cases the true mean is the assumed mean plus the average of the deviations from
the assumed mean.
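- Suppose the data are given in the form of a continuous frequency distribution with a
constant class size of w; then the following coding is appropriate.
$d_i = \dfrac{X_i - A}{w}$
$X_i = w d_i + A$
$\bar{X} = \dfrac{\sum f_i X_i}{n} = \dfrac{\sum f_i (w d_i + A)}{n}$
$\bar{X} = A + \dfrac{w \sum f_i d_i}{n}$
$\bar{X} = A + w\bar{d}$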
Example:
1. Suppose the deviations of the observations from an assumed mean of 7 are:
1, -1, -2, -2, 0, -3, -2, 2, 0, -3.
a) Find the true mean
b) Find the original observation.
Solutions:
A = 7, $\sum d_i = -10$
a) $\bar{d} = -10/10 = -1$
$\bar{X} = A + \bar{d} = 7 - 1 = 6$
The true mean is 6.
b) Using Xi=A+di we obtain the following original observations:
8, 6, 5, 5, 7, 4, 5, 9, 7, 4.
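The assumed-mean calculation in this example can be verified directly. The sketch below recovers the observations from the deviations and confirms that their ordinary mean agrees with A plus the mean deviation; variable names are illustrative.

```python
A = 7                                        # assumed mean
d = [1, -1, -2, -2, 0, -3, -2, 2, 0, -3]     # deviations d_i = X_i - A

d_bar = sum(d) / len(d)                      # mean of the deviations
true_mean = A + d_bar                        # part a)

observations = [A + di for di in d]          # part b): X_i = A + d_i
direct_mean = sum(observations) / len(observations)

print("mean deviation:", d_bar)
print("true mean:", true_mean)
print("observations:", observations)
print("mean computed directly:", direct_mean)
```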
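Solutions:
Females                  Males
$\bar{X}_1 = 60$         $\bar{X}_2 = 72$
$n_1 = 30$               $n_2 = 70$
$\bar{X}_c = \dfrac{\sum n_i \bar{X}_i}{\sum n_i} = \dfrac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$
$\bar{X}_c = \dfrac{(30 \times 60) + (70 \times 72)}{30 + 70} = \dfrac{6840}{100} = 68.40$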
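The combined mean is a weighted average of the group means, with the group sizes as weights. A minimal sketch, assuming the two groups are stored as `(size, mean)` pairs (a layout chosen only for this illustration):

```python
# group name -> (group size n_i, group mean)
groups = {"Females": (30, 60), "Males": (70, 72)}

total_size = sum(n for n, _ in groups.values())
weighted_sum = sum(n * mean for n, mean in groups.values())
combined_mean = weighted_sum / total_size

print("combined mean:", combined_mean)
```

The result lies closer to 72 than to 60 because the male group is larger.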
4. If a wrong figure has been used when calculating the mean, the correct mean can be obtained
without repeating the whole process using:
$\text{Correct mean} = \text{Wrong mean} + \dfrac{\text{Correct value} - \text{Wrong value}}{n}$
b) If each of the numbers in the set are multiplied by -5, then what will be the mean of the
new set?
Solutions:
a) New mean = Old mean + 10 = 500 + 10 = 510
b) New mean = -5 × Old mean = -5 × 500 = -2500
Weighted mean
- When a proper importance is desired to be given to different data a weighted mean is
appropriate.
- Weights are assigned to each item in proportion to its relative importance.
- Let X1, X2, …, Xn be the values of the items of a series and W1, W2, …, Wn their
corresponding weights; then the weighted mean, denoted $\bar{x}_w$, is defined as:
$\bar{x}_w = \frac{\sum W_i X_i}{\sum W_i}$
Example:
A student obtained the following percentages in an examination: English 60, Biology 75,
Mathematics 63, Physics 59, and Chemistry 55. Find the student's weighted arithmetic mean if
weights 1, 2, 1, 3, 3 respectively are allotted to the subjects.
Solutions:
$\bar{x}_w = \frac{\sum W_i X_i}{\sum W_i} = \frac{(1 \times 60) + (2 \times 75) + (1 \times 63) + (3 \times 59) + (3 \times 55)}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$
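As a minimal sketch, the weighted mean of the examination example can be computed in Python as follows. The function name `weighted_mean` and the variable names are illustrative assumptions, not part of the notes.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W_i * X_i) / sum(W_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# English, Biology, Mathematics, Physics, Chemistry
marks = [60, 75, 63, 59, 55]
weights = [1, 2, 1, 3, 3]

print(weighted_mean(marks, weights))  # 615 / 10
```

With all weights equal the function reduces to the ordinary arithmetic mean.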
It cannot be used when dealing with qualitative characteristics, such as intelligence,
honesty, beauty.
It can be a number which does not exist in the series.
Sometimes it leads to wrong conclusion if the details of the data from which it is obtained
are not available.
It gives high weight to high extreme values and less weight to low extreme values.
The Geometric Mean
- The geometric mean of a set of n observations is the nth root of their product.
- The geometric mean of X1, X2, X3, …, Xn is denoted by G.M and given by:
$G.M = \sqrt[n]{x_1 \times x_2 \times \dots \times x_n}$
- Taking the logarithms of both sides:
$\log(G.M) = \frac{1}{n} \sum_{i=1}^{n} \log x_i$
- The logarithm of the G.M of a set of observations is the arithmetic mean of their
logarithms.
$G.M = \text{antilog}\left(\frac{1}{n} \sum_{i=1}^{n} \log x_i\right)$
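The logarithm form is also the convenient way to compute the geometric mean numerically, because it avoids forming a very large product. The sketch below is only an illustration: it uses natural logarithms (any base gives the same result), and the function name `geometric_mean` and the sample data are assumptions made for this example.

```python
import math


def geometric_mean(values):
    """Geometric mean computed as antilog of the mean of the logarithms."""
    if any(x <= 0 for x in values):
        raise ValueError("the geometric mean needs strictly positive values")
    mean_log = sum(math.log(x) for x in values) / len(values)
    return math.exp(mean_log)  # antilog for natural logarithms


print(geometric_mean([2, 8]))           # square root of 2 * 8
print(geometric_mean([1, 2, 3, 4, 5]))  # fifth root of 120
```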
If observations X1, X2, …, Xn have weights W1, W2, …, Wn respectively, then their harmonic
mean is given by
$H.M = \frac{\sum W_i}{\sum (W_i / X_i)}$. This is called the Weighted Harmonic Mean.
Remark: The Harmonic Mean is useful and appropriate in finding average speeds and average
rates.
Example: A cyclist pedals from his house to his college at speed of 10 km/hr and back from the
college to his house at 15 km/hr. Find the average speed.
Solution: Here the distance is constant
The simple H.M is appropriate for this problem.
X1= 10 km/hr X2= 15 km/hr
$H.M = \frac{n}{\sum \frac{1}{X_i}} = \frac{2}{\frac{1}{10} + \frac{1}{15}} = 12$ km/hr
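A short Python sketch of the simple and weighted harmonic mean, applied to the cyclist example. The function name `harmonic_mean` and its optional `weights` argument are illustrative assumptions; with no weights given, every observation gets weight 1.

```python
def harmonic_mean(values, weights=None):
    """Harmonic mean: sum(W_i) / sum(W_i / X_i); weights default to 1."""
    if weights is None:
        weights = [1] * len(values)
    return sum(weights) / sum(w / x for x, w in zip(values, weights))


speeds = [10, 15]  # km/hr, same distance in each direction
average_speed = harmonic_mean(speeds)
print(round(average_speed, 2))  # about 12 km/hr
```

The arithmetic mean of the two speeds would be 12.5 km/hr, which overstates the average because more time is spent travelling at the slower speed.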
B. Mode
- Mode is a value which occurs most frequently in a set of values
- The mode may not exist and even if it does exist, it may not be unique.
- In the case of a discrete distribution, the value having the maximum frequency is the modal value.
Examples:
1. Find the mode of 5, 3, 5, 8, 9
Mode = 5
2. Find the mode of 8, 9, 9, 7, 8, 2, and 5.
The data are bimodal: the modes are 8 and 9.
3. Find the mode of 4, 12, 3, 6, and 7.
No mode for this data.
- The mode of a set of numbers X1, X2, …, Xn is usually denoted by $\hat{x}$.
Mode for Grouped data
If data are given in the shape of continuous frequency distribution, the mode is defined as:
$\hat{x} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$
Where:
$\hat{x}$ = the mode of the distribution
$L_{mo}$ = the lower class boundary of the modal class
w = the size of the modal class
$\Delta_1 = f_{mo} - f_1$
$\Delta_2 = f_{mo} - f_2$
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class following the modal class
Note: The modal class is a class with the highest frequency.
Example: Following is the distribution of the size of certain farms selected at random from a
district. Calculate the mode of the distribution.
Solutions:
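45 - 55 is the modal class, since it is the class with the highest frequency.
$L_{mo} = 45$
w = 10
$\Delta_1 = f_{mo} - f_1 = 2$
$\Delta_2 = f_{mo} - f_2 = 26$
$f_{mo} = 31$
$f_1 = 29$
$f_2 = 5$
$\hat{x} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 45 + 10\left(\frac{2}{2 + 26}\right) = 45.71$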
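The grouped-data mode formula can be sketched in Python as below, using the figures of the farm-size example. The function and parameter names are illustrative assumptions.

```python
def grouped_mode(lower_boundary, width, f_modal, f_prev, f_next):
    """Mode of grouped data: L_mo + w * d1 / (d1 + d2)."""
    d1 = f_modal - f_prev  # excess over the preceding class
    d2 = f_modal - f_next  # excess over the following class
    return lower_boundary + width * d1 / (d1 + d2)


mode = grouped_mode(lower_boundary=45, width=10, f_modal=31, f_prev=29, f_next=5)
print(round(mode, 2))  # 45.71
```

Because the preceding class (29) is almost as frequent as the modal class (31), the mode sits close to the lower boundary of the modal class.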
Merits and Demerits of Mode
Merits:
It is not affected by extreme observations.
Easy to calculate and simple to understand.
It can be calculated for distribution with open end class
Demerits:
It is not rigidly defined.
It is not based on all observations
It is not suitable for further mathematical treatment.
It is not stable average, i.e. it is affected by fluctuations of sampling to some extent.
Often its value is not unique.
Note: being the point of maximum density, mode is especially useful in finding the most popular
size in studies relating to marketing, trade, business, and industry. It is the appropriate average to
be used to find the ideal size.
C. The Median
- In a distribution, median is the value of the variable which divides it in to two equal
halves.
- In an ordered series of data median is an observation lying exactly in the middle of the
series. It is the middle most value in the sense that the number of values less than the
median is equal to the number of values greater than it.
- If X1, X2, …Xn be the observations, then the numbers arranged in ascending order will
be X[1], X[2], …X[n], where X[i] is ith smallest value.
X[1]< X[2]< …<X[n]
- Median is denoted by $\tilde{x}$ and read as "x tilde".
Median for ungrouped data
$\tilde{x} = X_{[(n+1)/2]}$, if n is odd
$\tilde{x} = \frac{1}{2}\left\{X_{[n/2]} + X_{[(n/2)+1]}\right\}$, if n is even.
Example: Find the median of the following numbers.
a. 6, 5, 2, 8, 9, 4.
b. 2, 1, 8, 3, 5.
Solutions:
a) First order the data: 2, 4, 5, 6, 8, 9
Here n=6
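$\tilde{x} = \frac{1}{2}\{X_{[3]} + X_{[4]}\} = \frac{1}{2}(5 + 6) = 5.5$
b) First order the data: 1, 2, 3, 5, 8
Here n = 5
$\tilde{x} = X_{[3]} = 3$
Median for grouped data
If data are given in the shape of continuous frequency distribution, the median is defined as:
$\tilde{x} = L_{med} + \frac{w}{f_{med}}\left\{\frac{n}{2} - c\right\}$
Where:
$L_{med}$ = lower class boundary of the median class.
w = the size of the median class.
n = total number of observations.
c = the cumulative frequency (less than type) preceding the median class.
$f_{med}$ = the frequency of the median class.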
Remark:
The median class is the class with the smallest cumulative frequency (less than type) greater than
or equal to $\frac{n}{2}$.
Example: Find the median of the following distribution.
class frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Solutions:
1. First find the less than cumulative frequency.
2. Identify the median class.
3. Find median using formula.
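$\frac{n}{2} = \frac{75}{2} = 37.5$
The less than cumulative frequencies are 7, 17, 39, 54, 66, 72 and 75, so 50 - 54 is the median class (39 is the smallest cumulative frequency greater than or equal to 37.5).
$\tilde{x} = L_{med} + \frac{w}{f_{med}}\left\{\frac{n}{2} - c\right\}$
$= 49.5 + \frac{5}{22}\{37.5 - 17\}$
$= 54.16$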
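The three steps of the solution (cumulate, locate the median class, apply the formula) can be sketched in Python as follows. This is a minimal illustration that assumes equal class widths; the function name `grouped_median` and the list of lower class boundaries are assumptions introduced for the example.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median of grouped data: L_med + (w / f_med) * (n/2 - c)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the distribution has no observations")
    cumulative = 0  # less than cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= n / 2:
            # first class whose cumulative frequency reaches n/2
            return lower + (width / f) * (n / 2 - cumulative)
        cumulative += f


# Classes 40-44, 45-49, ..., 70-74 have lower boundaries 39.5, 44.5, ..., 69.5
lowers = [39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5]
freqs = [7, 10, 22, 15, 12, 6, 3]

print(round(grouped_median(lowers, 5, freqs), 2))  # 54.16
```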
Merits and Demerits of Median
Merits:
Median is a positional average and hence not influenced by extreme observations.
Can be calculated in the case of open end intervals.
Median can be located even if the data are incomplete.
Demerits:
It is not a good representative of data if the number of items is small.
It is not amenable to further algebraic treatment.
It is susceptible to sampling fluctuations.
D. Quantiles
When a distribution is arranged in order of magnitude of items, the median is the value of the
middle term. Other measures that depend on their positions in the distribution, namely quartiles, deciles,
and percentiles, are collectively called quantiles.
i. Quartiles:
- Quartiles are measures that divide the frequency distribution in to four equal parts.
- The value of the variables corresponding to these divisions are denoted Q1, Q2, and Q3
often called the first, the second and the third quartile respectively.
- Q1 is a value which has 25% items which are less than or equal to it.
- Similarly Q2 has 50%items with value less than or equal to it and Q3 has 75% items
whose values are less than or equal to it.
- To find Qi (i = 1, 2, 3) we count $\frac{iN}{4}$ of the observations, beginning from the lowest class.
$Q_i = L_{Q_i} + \frac{w}{f_{Q_i}}\left\{\frac{iN}{4} - c\right\}$, i = 1, 2, 3
Where,
$L_{Q_i}$ = lower class boundary of the quartile class.
w = the size of the quartile class.
N = total number of observations.
c = the cumulative frequency (less than type) preceding the quartile class.
$f_{Q_i}$ = frequency of the quartile class.
Remark:
The quartile class (class containing Qi) is the class with the smallest cumulative frequency (less
than type) greater than or equal to $\frac{iN}{4}$.
ii. Deciles:
- Deciles are measures that divide the frequency distribution in to ten equal parts.
- The values of the variables corresponding to these divisions are denoted D1, D2,.. D9
often called the first, the second,…, the ninth decile respectively.
- To find Di (i = 1, 2, …, 9) we count $\frac{iN}{10}$ of the observations, beginning from the lowest class.
$D_i = L_{D_i} + \frac{w}{f_{D_i}}\left\{\frac{iN}{10} - c\right\}$, i = 1, 2, …, 9
Where:
$L_{D_i}$ = lower class boundary of the decile class.
w = the size of the decile class.
N = total number of observations.
c = the cumulative frequency (less than type) preceding the decile class.
$f_{D_i}$ = the frequency of the decile class.
Remark:
- The decile class (class containing Di) is the class with the smallest cumulative frequency
(less than type) greater than or equal to $\frac{iN}{10}$.
iii. Percentiles:
- Percentiles are measures that divide the frequency distribution in to hundred equal parts.
- The values of the variables corresponding to these divisions are denoted P1, P2,.. P99 often
called the first, the second,…, the ninety-ninth percentile respectively.
- To find Pi (i = 1, 2, …, 99) we count $\frac{iN}{100}$ of the observations, beginning from the lowest class.
$P_i = L_{P_i} + \frac{w}{f_{P_i}}\left\{\frac{iN}{100} - c\right\}$, i = 1, 2, …, 99
Where:
$L_{P_i}$ = lower class boundary of the percentile class.
w = the size of the percentile class.
N = total number of observations.
c = the cumulative frequency (less than type) preceding the percentile class.
$f_{P_i}$ = the frequency of the percentile class.
Remark:
The percentile class (class containing Pi) is the class with the smallest cumulative frequency (less
than type) greater than or equal to $\frac{iN}{100}$.
values Frequency
140-150 17
150-160 29
160-170 42
170-180 72
180-190 84
190-200 107
200-210 49
210-220 34
220-230 31
230-240 16
240-250 12
Solutions:
First find the less than cumulative frequency.
Use the formula to calculate the required quantile.
Values Frequency Cum. Freq (less than type)
140-150 17 17
150-160 29 46
160-170 42 88
170-180 72 160
180-190 84 244
190-200 107 351
200-210 49 400
210-220 34 434
220-230 31 465
230-240 16 481
240-250 12 493
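a) Quartiles:
i. Q1
- determine the class containing the first quartile.
$\frac{N}{4} = \frac{493}{4} = 123.25$
170 - 180 is the class containing the first quartile.
$L_{Q_1} = 170$, w = 10
N = 493, c = 88, $f_{Q_1} = 72$
$Q_1 = L_{Q_1} + \frac{w}{f_{Q_1}}\left\{\frac{N}{4} - c\right\}$
$Q_1 = 170 + \frac{10}{72}\{123.25 - 88\}$
$= 174.90$
ii. Q2
- determine the class containing the second quartile.
$\frac{2N}{4} = \frac{2 \times 493}{4} = 246.5$
190 - 200 is the class containing the second quartile.
$L_{Q_2} = 190$, w = 10
N = 493, c = 244, $f_{Q_2} = 107$
$Q_2 = L_{Q_2} + \frac{w}{f_{Q_2}}\left\{\frac{2N}{4} - c\right\}$
$Q_2 = 190 + \frac{10}{107}\{246.5 - 244\}$
$= 190.23$
iii. Q3
- determine the class containing the third quartile.
$\frac{3N}{4} = \frac{3 \times 493}{4} = 369.75$
200 - 210 is the class containing the third quartile.
$L_{Q_3} = 200$, w = 10
N = 493, c = 351, $f_{Q_3} = 49$
$Q_3 = L_{Q_3} + \frac{w}{f_{Q_3}}\left\{\frac{3N}{4} - c\right\}$
$Q_3 = 200 + \frac{10}{49}\{369.75 - 351\}$
$= 203.83$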
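b) D7
- determine the class containing the 7th decile.
$\frac{7N}{10} = \frac{7 \times 493}{10} = 345.1$
190 - 200 is the class containing the seventh decile.
$L_{D_7} = 190$, w = 10
N = 493, c = 244, $f_{D_7} = 107$
$D_7 = L_{D_7} + \frac{w}{f_{D_7}}\left\{\frac{7N}{10} - c\right\}$
$D_7 = 190 + \frac{10}{107}\{345.1 - 244\}$
$= 199.45$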
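c) P90
- determine the class containing the 90th percentile.
$\frac{90N}{100} = \frac{90 \times 493}{100} = 443.7$
220 - 230 is the class containing the 90th percentile.
$L_{P_{90}} = 220$, w = 10
N = 493, c = 434, $f_{P_{90}} = 31$
$P_{90} = L_{P_{90}} + \frac{w}{f_{P_{90}}}\left\{\frac{90N}{100} - c\right\}$
$P_{90} = 220 + \frac{10}{31}\{443.7 - 434\}$
$= 223.13$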
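Quartiles, deciles and percentiles all use the same formula with a different fraction of N, so one function can cover them. The Python sketch below reproduces the values of this example under the assumption of equal class widths; the name `grouped_quantile` and its `fraction` parameter (for example 1/4 for Q1, 7/10 for D7, 90/100 for P90) are illustrative choices.

```python
def grouped_quantile(lower_boundaries, width, freqs, fraction):
    """Quantile of grouped data: L + (w / f) * (fraction * N - c)."""
    n = sum(freqs)
    target = fraction * n  # position to reach, e.g. N/4 for Q1
    cumulative = 0         # less than cumulative frequency before the class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (width / f) * (target - cumulative)
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


lowers = list(range(140, 250, 10))  # 140, 150, ..., 240
freqs = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]

for label, fraction in [("Q1", 1 / 4), ("Q2", 2 / 4), ("Q3", 3 / 4),
                        ("D7", 7 / 10), ("P90", 90 / 100)]:
    print(label, round(grouped_quantile(lowers, 10, freqs, fraction), 2))
```

Note that Q2, D5 and P50 all use the fraction 1/2 and therefore coincide with the median.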
2.2.1. Types of Measures of Dispersion
Various measures of dispersions are in use. The most commonly used measures of dispersions
are:
1. Range and relative range
2. variance
3. Standard deviation
4. Coefficient of variation and standard score.
The Range (R)
The range is the largest score minus the smallest score. It is a quick and dirty measure of
variability, although when a test is given back to students they very often wish to know the range
of scores. Because the range is greatly affected by extreme scores, it may give a distorted picture
of the scores. The following two distributions have the same range, 13, yet appear to differ
greatly in the amount of variability.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
For this reason, among others, the range is not the most important measure of variability.
$R = L - S$, where L = largest observation and S = smallest observation.
Range for grouped data:
If data are given in the shape of continuous frequency distribution, the range is computed as:
$R = UCB_k - LCB_1$, where $UCB_k$ is the upper class boundary of the last class and
$LCB_1$ is the lower class boundary of the first class.
This is sometimes expressed as:
Demerits:
It is not based on all observation.
It is highly affected by extreme observations.
It is affected by fluctuation in sampling.
It is not liable to further algebraic treatment.
It cannot be computed in the case of open end distribution.
It is very sensitive to the size of the sample.
Relative Range (RR)
- It is also sometimes called the coefficient of range and is given by:
$RR = \frac{L - S}{L + S} = \frac{R}{L + S}$
Sample Variance
One would expect the sample variance to simply be the population variance with the population
mean replaced by the sample mean. However, one of the major uses of statistics is to estimate
the corresponding parameter. This formula has the problem that the estimated value isn't the
same as the parameter. To counteract this, the sum of the squares of the deviations is divided by
one less than the sample size.
Sample Variance $= S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Standard Deviation
There is a problem with variances. Recall that the deviations were squared.
That means that the units were also squared. To get the units back the same as the original data
values, the square root must be taken.
The following steps are used to calculate the sample variance:
a. Find the arithmetic mean.
b. Find the difference between each observation and the mean.
c. Square these differences.
d. Sum the squared differences.
e. Since the data is a sample, divide the number (from step d above) by the
number of observations minus one, i.e., n-1 (where n is equal to the number of
observations in the data set).
Population standard deviation $= \sqrt{\sigma^2} = \sigma$
Sample standard deviation $= \sqrt{S^2} = S$
Examples: Find the variance and standard deviation of the following sample data
1. 5, 17, 12, 10.
2. The data is given in the form of frequency distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Solutions:
1. $\bar{x} = 11$
Xi 5 10 12 17 Total
$(X_i - \bar{x})^2$ 36 1 1 36 74
$S^2 = \frac{74}{4 - 1} \approx 24.67$ and $S = \sqrt{24.67} \approx 4.97$
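Steps a to e for the sample variance can be sketched in Python as follows, using the data of the first example. The function name `sample_variance` is an illustrative assumption.

```python
import math


def sample_variance(data):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(data) / n                                # step a
    squared_deviations = [(x - mean) ** 2 for x in data]  # steps b and c
    return sum(squared_deviations) / (n - 1)            # steps d and e


data = [5, 17, 12, 10]
variance = sample_variance(data)
std_dev = math.sqrt(variance)  # back to the original units

print(round(variance, 2), round(std_dev, 2))
```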
Approximately 99.73% of the data values fall within three standard
deviations of the mean, i.e. within $(\bar{x} - 3S, \bar{x} + 3S)$.
3. Chebyshev's Theorem
For any data set, no matter what the pattern of variation, the proportion of the values that
fall within k standard deviations of the mean, or $(\bar{x} - kS, \bar{x} + kS)$, will be at least $1 - \frac{1}{k^2}$, where
k is any number greater than 1; i.e. the proportion of items falling beyond k standard
deviations of the mean is at most $\frac{1}{k^2}$.
Example: Suppose a distribution has mean 50 and standard deviation 6.What percent of the
numbers are:
a) Between 38 and 62
b) Between 32 and 68
c) Less than 38 or more than 62.
d) Less than 32 or more than 68.
Solutions:
a) 38 and 62 are at equal distance from the mean, 50, and this distance is 12.
kS = 12
k = 12/S = 12/6 = 2
Applying the above theorem, at least $\left(1 - \frac{1}{k^2}\right) \times 100\% = 75\%$ of the numbers lie between 38 and 62.
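A small Python sketch of the Chebyshev bound, applied to the distribution with mean 50 and standard deviation 6. The function name `chebyshev_min_proportion` is an illustrative assumption; the second call applies the same reasoning to the interval from 32 to 68, which lies three standard deviations either side of the mean.

```python
def chebyshev_min_proportion(k):
    """Least proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k greater than 1")
    return 1 - 1 / k ** 2


mean, sd = 50, 6

k_a = (62 - mean) / sd  # 38 to 62 is mean +/- 12, so k = 2
k_b = (68 - mean) / sd  # 32 to 68 is mean +/- 18, so k = 3

print(chebyshev_min_proportion(k_a))            # at least 0.75
print(round(chebyshev_min_proportion(k_b), 4))  # at least about 0.8889
```

The complementary proportion, 1 minus the value returned, bounds the share of values lying outside the interval.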
another drug are obtained by the linear transformation Yi =
2Xi – 0.5 ( i = 1, 2, …, n ) then what will be the standard deviation of the new set of
capsules
2. The mean and the standard deviation of a set of numbers are respectively 500 and 10.
a. If 10 is added to each of the numbers in the set, then what will be the variance and
standard deviation of the new set?
b. If each of the numbers in the set are multiplied by -5, then what will be the variance
and standard deviation of the new set?
Solutions:
1. Using c) above, the new standard deviation $= |k| \times S = 2 \times 3 = 6$
2. a. They will remain the same.
b. New standard deviation $= |k| \times S = |-5| \times 10 = 50$
Coefficient of Variation (C.V)
It is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage.
$C.V = \frac{S}{\bar{x}} \times 100\%$
The distribution having less C.V is said to be less variable or more consistent.
Examples:
1. An analysis of the monthly wages paid (in Birr) to workers in two firms A and B belonging to
the same industry gives the following results
Values Firm A Firm B
Mean wage 52.5 47.5
Variance 100 121
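In which firm, A or B, is there greater variability in individual wages?
Solutions:
Calculate the coefficient of variation for both firms. The standard deviations are $S_A = \sqrt{100} = 10$ and $S_B = \sqrt{121} = 11$.
$C.V_A = \frac{S_A}{\bar{x}_A} \times 100\% = \frac{10}{52.5} \times 100\% = 19.05\%$
$C.V_B = \frac{S_B}{\bar{x}_B} \times 100\% = \frac{11}{47.5} \times 100\% = 23.16\%$
Since C.VA < C.VB, in firm B there is greater variability in individual wages.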
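The comparison of the two firms can be sketched in Python as below. The function name `coefficient_of_variation` is an illustrative assumption; note that the table gives variances, so the square root is taken first.

```python
import math


def coefficient_of_variation(mean, std_dev):
    """C.V = (S / mean) * 100, expressed as a percentage."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(52.5, math.sqrt(100))  # firm A
cv_b = coefficient_of_variation(47.5, math.sqrt(121))  # firm B

print(round(cv_a, 2), round(cv_b, 2))
print("Firm B" if cv_b > cv_a else "Firm A", "has the greater variability")
```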
Standard Scores (Z-scores)
If X is a measurement from a distribution with mean $\bar{x}$ and standard deviation S, then its
value in standard units is
$Z = \frac{x - \mu}{\sigma}$, for population.
$Z = \frac{x - \bar{x}}{S}$, for sample.
Student A from section 1 scored 90 and student B from section 2 scored 95.Relatively speaking
who performed better?
Solutions:
Calculate the standard score of both students.
$Z_A = \frac{x_A - \bar{x}_1}{S_1} = \frac{90 - 78}{6} = 2$
$Z_B = \frac{x_B - \bar{x}_2}{S_2} = \frac{95 - 90}{5} = 1$
- Student A performed better relative to his section because the score of student A is two
standard deviation above the mean score of his section; while, the score of student B is
only one standard deviation above the mean score of his section.
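The standard scores of the two students can be reproduced with a minimal Python sketch. The function name `z_score` is an illustrative assumption; the section means (78 and 90) and standard deviations (6 and 5) are the values used in the solution above.

```python
def z_score(x, mean, std_dev):
    """Standard score: number of standard deviations x lies from the mean."""
    return (x - mean) / std_dev


z_a = z_score(90, mean=78, std_dev=6)  # student A, section 1
z_b = z_score(95, mean=90, std_dev=5)  # student B, section 2

print(z_a, z_b)
print("A" if z_a > z_b else "B", "performed better relative to the section")
```

Although student B has the higher raw score, student A has the higher standard score.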
2. Two groups of people were trained to perform a certain task and tested to find out which
group is faster to learn the task. For the two groups the following information was given:
Value group 1 group 2
Mean 10.4 min 11.9 min
St.dev 1.2 min 1.3 min
Relatively speaking:
a) Which group is more consistent in its performance?
b) Suppose a person A from group one takes 9.2 minutes while person B from group two takes 9.3
minutes. Who was faster in performing the task? Why?
Solutions:
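a) Use coefficient of variation.
$C.V_1 = \frac{S_1}{\bar{x}_1} \times 100\% = \frac{1.2}{10.4} \times 100\% = 11.54\%$
$C.V_2 = \frac{S_2}{\bar{x}_2} \times 100\% = \frac{1.3}{11.9} \times 100\% = 10.92\%$
Since C.V2 < C.V1, group 2 is more consistent.
b) Calculate the standard score of A and B.
$Z_A = \frac{9.2 - 10.4}{1.2} = -1$
$Z_B = \frac{9.3 - 11.9}{1.3} = -2$
Person B is faster, because the time taken by person B is two standard deviations shorter than
the average time taken by group 2 while the time taken by person A is only one standard
deviation shorter than the average time taken by group 1.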
2.3. Review Exercises
1. By considering the given raw data
12, 13, 17, 12, 13, 14, 16, 12, 15 and 15.
Find: i) mean
ii) mode
iii) median
iv) variance and standard deviation
v) coefficient of variation.
2. The Addis Ababa city municipality police traffic control department has observed
the number of car accidents (per month) to be categorized as shown in the table
below.
No. of car accidents Frequency
10-14 5
15-19 6
20-24 3
25-29 4
30-34 2
Then calculate:
a) the average number of accidents
b) The median and mode
c) The variance, standard deviation and coefficient of variation
3. Marks of 75 students are summarized in the following frequency distribution:
If 20% of the students have marks between 55 and 59
i) Find the missing frequencies f4 and f5.
ii) Find the mean.
4. A teacher attaches 2 to Quiz, 3 to Mid-term and 5 for Final exam. If a student gets
90, 50 and 60 for Quiz, Mid-term and Final-exam respectively, what is his/her
average academic performance?
5. The mean weight of 50 women workers in a factory is 48 kg. The mean weight of
75 men working in the same factory is 58 kg. Find the mean weight of all workers
in the factory.
6. The mean of 200 items was found to be 40. Later on it was discovered that two
items were wrongly read as 92 and 8 instead of 192 and 88 respectively. Find the
correct mean.
7. The mean salary of 100 laborers working in a factory , running in two shifts of 40
and 60 workers respectively is birr 380. The mean salary of the 40 laborers
working in the morning shift is birr 350. Find the mean salary of the 60 laborers
working in the evening shift.
8. Find the geometric mean of A) 1, 2, 3, 4, and 5. B) 1, 2, 3, 4, 100. Is there a great
difference between the GM of A and that of B?
9. The price of a commodity increased by 5% from 1989 to 1990, 8% from 1990 to
1991 and by 77% from 1991 to 1992. Find the average price increase.
10. Find the harmonic mean of A) 1, 2, 3, 4, 5. B) 1, 2, 3, 4, 100. Is there a great
difference between the HM of A and that of B?
11. A driver traveled 400 km per day for three days at a speed of 60, 50 and 40
kilometers per hour. Find the average speed of the driver.
12. A student reads the first 100 pages of a book at a rate of 5 pages per hour, the next
100 pages at a rate of 8 pages per hour. What is the student’s average reading
speed?
13. In a certain investigation, 460 persons were involved in the study, and based on an
enquiry on their age, it was known that 75% of them were 22 or more. The
following frequency distribution shows the age composition of the persons under
study.
Mid-age in years 13 18 23 28 33 38 43 48
a. Find the median and modal life of condensers and interpret them.
b. Find the values of all quartiles.
c. Compute the 5th deciles, 25th percentile, 50th percentile and the 75th
percentile and interpret the results.
14. The mean annual salary of all employees in a company is 2500. The mean salary
of male and female is 2700 and 1700 respectively. Find the percentage of males
and females employed in the company.
15. Given the following FD.
Mid-price of a commodity 15 25 35 45 55
a. If 75% of the items were sold in birr 45 or less and most items were sold in
birr 34, find the missing frequencies.
b. If 25% of the items were sold in less than or equal to birr 45 and most items
were sold in birr 34, find the missing frequencies.
CHAPTER 3
ELEMENTARY PROBABILITY
Introduction
In Chapter One we discussed the difference between descriptive and inferential statistics. Much
statistical analysis is inferential, and probability is the cornerstone of inferential statistics.
Recall that inferential statistics involves taking a sample from the population, computing a sample value (a
statistic) on the sample, and inferring from the statistic the value of the corresponding population
value (a parameter). The reason for doing so is that it is difficult, and sometimes impossible, to
get the population parameter directly.
Objectives of the chapter
By the end of this chapter students must be able to:
- Understand the concept behind probabilistic models, random experiment, sample space,
events and fundamental axioms of probability.
- Justify the different types of outcomes, sample spaces, events and probability approaches.
- Solve various probability problems.
- Calculate permutation and combination; determine whether you should use combination
or permutation to calculate the number of outcomes.
Definition
Probability theory is the foundation upon which the logic of inference is built.
It helps us to cope with uncertainty.
In general, probability is the chance of an outcome of an experiment. It is the measure of
how likely an outcome is to occur.
3.1 Definitions of some probability terms
1. Experiment: Any process of observation or measurement or any process which generates well
defined outcome.
2. Probability Experiment: It is an experiment that can be repeated any number of times under
similar conditions and it is possible to enumerate the total number of outcomes without
predicting an individual outcome. It is also called a random experiment.
Example: If a fair die is rolled once it is possible to list all the possible outcomes i.e.1, 2, 3, 4, 5,
6 but it is not possible to predict which outcome will occur.
3. Outcome: The result of a single trial of a random experiment
4. Sample Space: Set of all possible outcomes of a probability experiment
5. Event: It is a subset of sample space. It is a statement about one or more outcomes of a
random experiment .They are denoted by capital letters.
Example: Considering the above experiment, let A be the event of odd numbers, B be the event of
even numbers, and C be the event of number 8.
A = {1, 3, 5}
B = {2, 4, 6}
C = { } or ∅, the empty set or impossible event
Remark:
If S (sample space) has n members then there are exactly $2^n$ subsets or events.
6. Equally Likely Events: Events which have the same chance of occurring.
7. Complement of an Event: the complement of an event A means non-occurrence of A and
is denoted by $A'$, or $A^c$, or $\bar{A}$. It contains those points of the sample space which don't
belong to A.
8. Elementary Event: an event having only a single element or sample point.
9. Mutually Exclusive Events: Two events which cannot happen at the same time.
10. Independent Events: Two events are independent if the occurrence of one does not affect
the probability of the other occurring.
11. Dependent Events: Two events are dependent if the first event affects the outcome or
occurrence of the second event in a way the probability is changed.
Example: What is the sample space for the following experiments?
a) Toss a die one time.
b) Toss a coin two times.
c) A light bulb is manufactured. It is tested for its life length by time.
Solution
a) S={1,2,3,4,5,6}
b) S={(HH),(HT),(TH),(TT)}
c) S={t /t≥0}
Sample space can be
Countable ( finite or infinite)
Uncountable.
3.2 Counting Rules
In order to calculate probabilities, we have to know
The number of elements of an event
The number of elements of the sample space.
That is in order to judge what is probable, we have to know what is possible .
In order to determine the number of outcomes, one can use several rules of counting.
- The addition rule
- The multiplication rule
- Permutation rule
- Combination rule
To list the outcomes of the sequence of events, a useful device called tree diagram is
used.
Example: A student goes to the nearest snack bar to have breakfast. He can take tea, coffee, or milk
with bread, cake or sandwich. How many possibilities does he have?
Solutions:
Tea
    Bread
    Cake
    Sandwich
Coffee
    Bread
    Cake
    Sandwich
Milk
    Bread
    Cake
    Sandwich
There are nine possibilities.
The Multiplication Rule:
If a choice consists of k steps of which the first can be made in n1 ways, the second can be made
in n2 ways, …, the kth can be made in nk ways, then the whole choice can be made in
$(n_1 \times n_2 \times \dots \times n_k)$ ways.
Example: The digits 0, 1, 2, 3, and 4 are to be used in 4 digit identification card.
How many different cards are possible if
a) Repetitions are permitted.
b) Repetitions are not permitted.
Solutions
a)
Where $n! = n \times (n-1) \times (n-2) \times \dots \times 3 \times 2 \times 1$
2. The arrangement of n objects in a specified order using r objects at a time is called the
permutation of n objects taken r objects at a time. It is written as $_nP_r$ and the formula is
$_nP_r = \frac{n!}{(n-r)!}$
3. The number of permutations of n objects in which k1 are alike, k2 are alike, …, kn are alike is
$\frac{n!}{k_1! \times k_2! \times \dots \times k_n!}$
Example:
1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all the four?
b) How many permutations are there taking two letters at a time?
2. How many different permutations can be made from the letters in the word “CORRECTION”?
Solutions:
1.
a) Here n = 4; there are four distinct objects.
There are 4! = 24 permutations.
b) Here n = 4, r = 2
There are $_4P_2 = \frac{4!}{(4-2)!} = 12$ permutations.
2.
Here n = 10,
of which 2 are C, 2 are O, 2 are R, 1 E, 1 T, 1 I, 1 N.
$k_1 = 2, k_2 = 2, k_3 = 2, k_4 = k_5 = k_6 = k_7 = 1$
Using the 3rd rule of permutation, there are
$\frac{10!}{2! \times 2! \times 2! \times 1! \times 1! \times 1! \times 1!} = 453600$ permutations.
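Both permutation rules can be checked with a short Python sketch. The function names `permutations_count` and `permutations_with_alike` are illustrative assumptions; the second one counts how often each letter occurs and divides n! by the factorial of each count.

```python
import math
from collections import Counter


def permutations_count(n, r):
    """Number of permutations of n objects taken r at a time: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)


def permutations_with_alike(items):
    """Distinct arrangements of items when some are alike: n! / (k1! * k2! * ...)."""
    total = math.factorial(len(items))
    for count in Counter(items).values():
        total //= math.factorial(count)
    return total


print(permutations_count(4, 4))               # all four letters: 24
print(permutations_count(4, 2))               # two letters at a time: 12
print(permutations_with_alike("CORRECTION"))  # 453600
```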
Combination
- A selection of objects without regard to order is called combination.
Example: Given the letters A, B, C, and D, list the permutations and combinations for selecting two
letters.
Solutions:
Permutation        Combination
AB BA CA DA        AB BC
AC BC CB DB        AC BD
AD BD CD DC        AD DC
Note that in permutation AB is different from BA. But in combination AB is the same as BA.
Combination Rule
- The number of combinations of r objects selected from n objects is denoted by
$_nC_r$ or $\binom{n}{r}$ and given by:
$\binom{n}{r} = \frac{n!}{(n-r)! \times r!}$
Examples:
1. In how many ways can a committee of 5 people be chosen out of 9 people?
Solutions:
n = 9, r = 5
$\binom{n}{r} = \frac{n!}{(n-r)! \times r!} = \frac{9!}{(9-5)! \times 5!} = 126$ ways.
2. Among 15 clocks there are two defectives. In how many ways can an inspector choose
three of the clocks for inspection so that:
A. There is no restriction.
B. None of the defective clocks is included.
C. Only one of the defective clocks is included.
D. Two of the defective clocks are included.
Solutions:
n = 15, of which 2 are defective and 13 are non-defective.
r = 3
a) If there is no restriction, select three clocks from 15 clocks and this can be done in:
n = 15, r = 3
$\binom{n}{r} = \frac{n!}{(n-r)! \times r!} = \frac{15!}{(15-3)! \times 3!} = 455$ ways.
b) None of the defective clocks is included.
This is equivalent to zero defective and three non-defective, which can be done in:
$\binom{2}{0} \times \binom{13}{3} = 286$ ways.
c) Only one of the defective clocks is included.
This is equivalent to one defective and two non-defective, which can be done in:
$\binom{2}{1} \times \binom{13}{2} = 156$ ways.
d) Two of the defective clocks are included.
This is equivalent to two defective and one non-defective, which can be done in:
$\binom{2}{2} \times \binom{13}{1} = 13$ ways.
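The clock example combines the combination rule with the multiplication rule, and this can be sketched in Python with the standard library function `math.comb`. The variable names are illustrative assumptions.

```python
from math import comb

defective, good, chosen = 2, 13, 3

# a) no restriction: choose any 3 of the 15 clocks
no_restriction = comb(defective + good, chosen)

# b) to d): exactly k defective clocks and (3 - k) non-defective clocks
ways_by_defectives = {
    k: comb(defective, k) * comb(good, chosen - k) for k in range(defective + 1)
}

print(no_restriction)      # 455
print(ways_by_defectives)  # {0: 286, 1: 156, 2: 13}
```

The three restricted cases are mutually exclusive and cover every possible selection, so by the addition rule their counts sum to the unrestricted count.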
3.3 Approaches to measuring Probability
There are four different conceptual approaches to the study of probability theory. These are:
The classical approach.
The frequentist approach.
The axiomatic approach.
The subjective approach.
The classical approach
This approach is used when:
- All outcomes are equally likely.
- Total number of outcome is finite, say N.
Definition: If a random experiment with N equally likely outcomes is conducted and out of these
$N_A$ outcomes are favorable to the event A, then the probability that event A occurs, denoted P(A),
is defined as:
$P(A) = \frac{N_A}{N} = \frac{\text{No. of outcomes favourable to } A}{\text{total number of outcomes}} = \frac{n(A)}{n(S)}$
Examples:
1. A fair die is tossed once. What is the probability of getting:
a) Number 4?
b) An odd number?
c) An even number?
d) Number 8?
Solutions:
First identify the sample space, say S
S = {1, 2, 3, 4, 5, 6}
N = n(S) = 6
a) Let A be the event of number 4
A = {4}
$N_A = n(A) = 1$
$P(A) = \frac{n(A)}{n(S)} = \frac{1}{6}$
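The classical definition can be sketched in Python with sets and exact fractions, applied to the four events of the die example. The function name `classical_probability` is an illustrative assumption, and the sketch is only valid when all outcomes are equally likely.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(A) = n(A) / n(S) for equally likely outcomes."""
    favourable = set(event) & set(sample_space)  # outcomes of A that are in S
    return Fraction(len(favourable), len(sample_space))


S = {1, 2, 3, 4, 5, 6}

p_four = classical_probability({4}, S)
p_odd = classical_probability({1, 3, 5}, S)
p_even = classical_probability({2, 4, 6}, S)
p_eight = classical_probability({8}, S)  # impossible event

print(p_four, p_odd, p_even, p_eight)
```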
( )
P( A 0.00001825
( )
( )
P( A
( )
( )
P( A
( )
Example: If records show that 60 out of 100,000 bulbs produced are defective.
What is the probability of a newly produced bulb to be defective?
Solution:
Let A be the event that the newly produced bulb is defective.
$N_A = 60$, N = 100,000
$P(A) = \lim_{N \to \infty} \frac{N_A}{N} \approx \frac{60}{100,000} = 0.0006$
Axiomatic Approach:
Let E be a random experiment and S be a sample space associated with E. With each event A we
associate a real number P(A), called the probability of A, which satisfies the following properties called axioms of
probability or postulates of probability.
1. $P(A) \geq 0$
2. P(S) = 1, S is the sure event.
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the
sum of the two probabilities, i.e.
$P(A \cup B) = P(A) + P(B)$
4. $P(A') = 1 - P(A)$
5. $0 \leq P(A) \leq 1$
6. P(ø) = 0, ø is the impossible event.
Remark: Venn-diagrams can be used to solve probability problems.
[Figure: Venn diagrams showing A ∪ B, A ∩ B and A′]
iii. Has a consonant and vowels alternating?
3. Out of 5 Mathematicians and 7 Statisticians a committee consisting of 2 Mathematicians and 3
Statisticians is to be formed. In how many ways can this be done if
a) There is no restriction
b) One particular Statistician should be included
c) Two particular Mathematicians cannot be included on the committee.
4 . If 3 books are picked at random from a shelf containing 5 novels, 3 books of poems, and a
dictionary, in how many ways this can be done if
a) There is no restriction.
b) The dictionary is selected?
c) 2 novels and 1 book of poems are selected?
5. What is the probability that a waitress will refuse to serve alcoholic beverages to only three
minors if she randomly checks the I.D’s of five students from among ten students of which four
are not of legal age?
6. If 3 books are picked at random from a shelf containing 5 novels, 3 books of poems, and a
dictionary, what is the probability that
a) The dictionary is selected?
b) 2 novels and 1 book of poems are selected?
7. There are 5 cooperative, 3 management, and 2 accounting students. In how many ways can
these students be arranged if:
a. students from the same department must sit together?
b. no restriction?
CHAPTER FOUR
Conditional probability and Independency
Objectives of the chapter
By the end of the chapter; you should be able to:-
- Know basic concept and important facts about conditional probability and independency.
- Determine whether two events are dependent or independent and whether one event is
conditional on another or not.
- Solve problems related to conditional probability and independence.
Conditional Events: If the occurrence of one event has an effect on the occurrence of the
other event, then the two events are conditional or dependent events.
The conditional probability of an event B in relation to an event A is the probability that event B
occurs given that the event A has already occurred. The notation for this conditional probability
is p(B/A), where the bar (/) is read as “given” and the whole symbol is referred to as
“the probability of B given A” or “the conditional probability of B given A”.
Example: Suppose we have two red and three white balls in a bag.
1. Draw two balls one after the other without replacement.
This is conditional.
Let B = the event that the second draw is red given that the first draw is
red; p(B) = 1/4
4.1 Conditional probability of an event
The conditional probability of an event A given that B has already occurred, denoted p(A/B), is
$p(A/B) = \frac{p(A \cap B)}{p(B)}$, $p(B) \neq 0$
Examples
1. For a student enrolling as a freshman at a certain university, the probability is 0.25 that he/she will
get a scholarship and 0.75 that he/she will graduate. The probability is 0.2 that he/she will get a
scholarship and will also graduate. What is the probability that a student who gets a scholarship
will graduate?
Solution:
Let A= the event that a student will get a scholarship
B= the event that a student will graduate
Given: p(A) = 0.25, p(B) = 0.75, p(A ∩ B) = 0.20
Required: p(B/A)
$p(B/A) = \frac{p(A \cap B)}{p(A)} = \frac{0.20}{0.25} = 0.80$
2. If the probability that a research project will be well planned is 0.60 and the probability that it
will be well planned and well executed is 0.54, what is the probability that it will be well
executed given that it is well planned?
Solution:
Let A = the event that a research project will be well planned
B = the event that a research project will be well executed
Given: $p(A) = 0.60$, $p(A \cap B) = 0.54$
Required: $p(B/A)$
$$p(B/A) = \frac{p(A \cap B)}{p(A)} = \frac{0.54}{0.60} = 0.9$$
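The two solutions above apply the same ratio, so they can be checked with a few lines of Python. This is only a sketch: the helper name conditional is an illustrative choice, not part of the course material, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def conditional(p_joint, p_given):
    """Return p(B/A) = p(A and B) / p(A); p(A) must be nonzero."""
    if p_given == 0:
        raise ValueError("the conditioning event has probability zero")
    return p_joint / p_given

# Example 1: p(A) = 0.25, p(A and B) = 0.20
scholarship = conditional(Fraction(20, 100), Fraction(25, 100))
# Example 2: p(A) = 0.60, p(A and B) = 0.54
research = conditional(Fraction(54, 100), Fraction(60, 100))

print(scholarship, research)
```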
4.2. Probability of Independent Events
Two events A and B are said to be independent if the occurrence of event A in a
probability experiment does not affect the probability of event B. That is, if we know that A has
occurred, then B still occurs with its usual probability. Similarly, if B has occurred, then A
occurs with its usual probability. In other words, events A and B are independent if
the conditional probability of A given B is the same as the unconditional probability of A, i.e.,
P(A/B) = P(A) (B does not affect A), and likewise the conditional probability of B
given A is the same as the unconditional probability of B, i.e., P(B/A) = P(B) (A does not affect
B). Otherwise, the events are dependent. This leads to a useful formula, which is also our
definition of independence:
$$P(A \cap B) = P(A)\,P(B) \text{ if and only if A and B are independent events.}$$
In that case $P(A/B) = P(A)$ and $P(B/A) = P(B)$.
Example; A box contains four black and six white balls. What is the probability of getting two
black balls in drawing one after the other under the following conditions?
a. The first ball drawn is not replaced
b. The first ball drawn is replaced
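Solution: Let A = the first ball drawn is black
B = the second ball drawn is black
Required: $p(A \cap B)$
a. $p(A \cap B) = p(B/A)\,p(A) = \frac{3}{9} \cdot \frac{4}{10} = \frac{2}{15}$
b. $p(A \cap B) = p(A)\,p(B) = \frac{4}{10} \cdot \frac{4}{10} = \frac{4}{25}$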
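As a cross-check of both answers, the ordered pairs of draws can simply be enumerated. The sketch below treats the ten balls as distinguishable, which is an assumption made only for counting purposes; the function name prob_two_black is illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["B"] * 4 + ["W"] * 6   # four black and six white balls

def prob_two_black(pairs):
    """Fraction of equally likely ordered pairs in which both balls are black."""
    pairs = list(pairs)
    hits = sum(1 for first, second in pairs if first == "B" and second == "B")
    return Fraction(hits, len(pairs))

# a. without replacement: the same ball cannot be drawn twice
without_replacement = prob_two_black(permutations(balls, 2))
# b. with replacement: the second draw is made from the full box again
with_replacement = prob_two_black(product(balls, repeat=2))

print(without_replacement, with_replacement)
```

Counting pairs this way shows where the factor 3/9 in part a comes from: after a black ball is removed, only 3 of the remaining 9 balls are black.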
4.3. Review exercises
1. A coin is flipped 3 times. Each of the 8 outcomes is equally likely.
Let A: heads occur on each of the first 2 flips,
B: a tail occurs on the third flip, and
C: exactly 2 tails occur in the 3 flips.
Show that A and B are independent, but B and C are dependent.
2. Let A and B be independent events with P(A) = 0.3 and P(B) = 0.4. What is P(A or B)?
3. Suppose that a fair die is tossed 2 times. Let;
A: be the event that the first die shows an even number and
B: be the event that the second die shows a 5 or 6.
Are events A and B independent?
4. Suppose that A and B are independent events associated with an experiment. If the
probability that A or B occurs equal to 0, while the probability that A occurs equals 0.4,
determine the probability that B occurs?
Chapter five
5. ONE- DIMENSIONAL RANDOM VARIABLE
Learning Objectives:
The type of a random variable is determined by the outcomes of the experiment over which it is
defined. If the outcomes of the experiment are finite or countably infinite, the random
variable defined over this experiment is a discrete random variable. If the outcomes cannot be
determined by counting but can assume any real value between two defined points, the random
variable defined over this experiment is a continuous random variable.
Depending on the number of variables that determine the outcome of a given experiment, we can
categorize the random variable as a one dimensional random variable if the outcome of the
experiment is determined by one variable, a two dimensional random variable if the outcome is
determined by two variables, and a k dimensional random variable if the outcome is
determined by k variables. In general, the type of experiment determines the type of random
variable, because the random variable is defined over that experiment.
5.1.1. Discrete random variables
A discrete random variable arises in situations where the possible outcomes are finite or
countably infinite. It is a random variable that assumes only certain clearly separated
values, such as whole numbers.
No. of heads (x)             0     1     2
Probability of heads p(x)    1/4   1/2   1/4
Exercise:
1. Construct a probability distribution for the number of heads observed in tossing a coin three times.
2. Construct a probability distribution for the number of girls if a family plans to have three children
3. f(X)= 2/9x where x=1,2,4,5
Solution: p(0<x<1/2)=∫ = | =0.04
Exercise 2: The length X (in inches) of a steel rod is assumed to be a random variable with
probability density function given by
3 − ,0 < <2
fX(x) =
0, otherwise
A rod is considered defective if its length is less than 1 inch. Find the probability that a
steel rod is defective.
5.2. Cumulative distribution function
Definition: Let X be a random variable, continuous or discrete. The cumulative distribution function (CDF) of X is
defined by $F(x) = P(X \le x)$. The way we find the CDF differs for the two types of random
variable.
If X is a discrete random variable, F(x) is given by $F(x) = P(X \le x) = \sum_{t \le x} p(t)$, where p(t) is the probability mass function of X.
If X is a continuous random variable, F(x) is given by $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$, where f(t)
is the probability density function of X.
Example 1: A student is given three true/false questions but does not know the answer to any of
them. Let X represent the number of correct answers the student gives by answering at
random. Construct the cumulative distribution function of X.
Solution:
Here X takes the values 0, 1, 2, 3. To find the CDF we first compute the probability
mass function.
x_i       0     1     2     3
P(x_i)    1/8   3/8   3/8   1/8
$$F(x) = \begin{cases} 0, & x < 0 \\ 1/8, & 0 \le x < 1 \\ 4/8, & 1 \le x < 2 \\ 7/8, & 2 \le x < 3 \\ 1, & x \ge 3 \end{cases}$$
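The step shape of a discrete CDF is easy to see by building it from the probability mass function. The following sketch assumes each of the 2^3 answer patterns is equally likely and marks a correct answer with 1; the names pmf and cdf are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 2^3 equally likely answer patterns; 1 marks a correct answer.
outcomes = list(product([0, 1], repeat=3))
pmf = {
    k: Fraction(sum(1 for o in outcomes if sum(o) == k), len(outcomes))
    for k in range(4)
}

def cdf(x):
    """F(x) = P(X <= x): add up the mass at every value not exceeding x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

for x in (-1, 0, 0.5, 1, 2, 3):
    print(x, cdf(x))
```

Note that cdf(0.5) equals cdf(0): between two consecutive values of X the CDF stays flat and only jumps at the values themselves.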
Example: Assume that the resistance X of a certain cable is a continuous random variable with the
following pdf:
$$f(x) = \begin{cases} 3x^2, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$$
Compute the CDF of the resistance of the cable.
Solution:
For $0 < x < 1$,
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt = \int_{0}^{x} 3t^2\,dt = x^3$$
The general form of the CDF is then
$$F(x) = \begin{cases} 0, & x \le 0 \\ x^3, & 0 < x < 1 \\ 1, & x \ge 1 \end{cases}$$
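The closed form x^3 can be compared with a direct numerical integration of the density. This is a rough sketch that assumes the pdf 3t^2 on (0, 1), as used in the integral above, and a simple midpoint rule; the step count is an arbitrary choice.

```python
def pdf(t):
    """Density of the cable resistance: 3 t^2 on (0, 1), zero elsewhere."""
    return 3 * t**2 if 0 < t < 1 else 0.0

def cdf(x, steps=10_000):
    """Midpoint-rule approximation of the integral of pdf from 0 to x."""
    if x <= 0:
        return 0.0
    upper = min(x, 1.0)          # no probability mass beyond 1
    h = upper / steps
    return sum(pdf((i + 0.5) * h) for i in range(steps)) * h

for x in (-0.5, 0.25, 0.5, 0.9, 2.0):
    print(x, round(cdf(x), 6))
```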
Let X represent the coordinate of the chosen point. The pdf of X is f(x) = 1/2 for
0 ≤ x ≤ 2 and 0 otherwise, and hence
$$P(1 \le X \le 3/2) = \int_{1}^{3/2} \frac{1}{2}\,dx = \frac{1}{4}$$
5.4.Review Exercises
1. Let X be a continuous with pdf
, 0≤ ≤1
( )= ( )=
0, ℎ
Where c is a constant
a. Determine the value of constant c
b. Find the cumulative distribution function of the random variable X that is,
( ).
2. The length of time (in minutes) Y that a certain woman speaks on a mobile telephone is a
random variable with probability density function (pdf)
f(y) = , ≥0 then Determine
0, ℎ
a. The value of constant c.
b. The cumulative distribution function (CDF)
c. What is the probability that the duration of her conversation will be between 10 and 15
minutes?
3. For the random variable X with probability mass function P(x)= aX,
X=1,2,3,4 and Y=2X+1. Find
a) The value of a.
b) CDF of Y.
4. From the following pmf
PX;Y(x,y) Y
0 1 2
a) Calculate PX(x)
b) Find and PY(y).
CHAPTER SIX
Example: Consider an experiment where a die is rolled twice. Let X1 denote the number of the
first roll, and X2 the number of the second roll. Then (X1, X2) is a two dimensional random
vector.
6.2. Joint Distributions
Now that we know the definition of a random vector, we can begin to use it to assign
probabilities to events. For any random vector, we can define a joint cumulative distribution
function for all of the components as follows:
Definition 6.2. Let (X1, X2, . . . , Xn) be a random vector. The joint cumulative distribution function for
this random vector is given by
$$F_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)$$
In the two dimensional case, the joint cumulative distribution function for the random vector
(X1, X2) evaluated at the point (x1, x2), namely $F_{X_1,X_2}(x_1, x_2)$, is the probability that the
experiment results in a two dimensional value within the region $\{X_1 \le x_1, X_2 \le x_2\}$.
Every joint cumulative distribution function must possess the following properties:
1. $\lim_{x_i \to -\infty} F_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = 0$ for any i, with the other arguments fixed
2. $\lim_{\text{all } x_i \to \infty} F_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = 1$
3. As xi varies, with all other xj’s (j ≠ i) fixed, $F_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n)$ is a
nondecreasing function of xi.
As in the case of one dimensional random variables, we shall identify two major classifications
of vector valued random variables: discrete and continuous. Although there are many common
properties between these two types, we shall discuss each separately.
6.2.1. Discrete Distributions
A random vector that can assume at most a countable collection of discrete values is said to
be discrete. As an example, consider once more the experiment where a die is rolled twice. The possible values for
either X1 or X2 are in the set {1, 2, 3, 4, 5, 6}. Hence, the random vector (X1, X2) can only take
on one of 36 values. If the die is fair, then each of the points can be considered to have a
probability mass of 1/36. This prompts us to define a joint probability mass function for this type
of random vector, as follows:
Definition 6.3. Let (X1, X2, . . . , Xn) be a discrete random vector. Then
$$p_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$$
is the joint probability mass function for the random vector (X1, X2, . . . , Xn).
Referring again to the example, we find that the joint probability mass
function for (X1, X2) is given by $p_{X_1,X_2}(x_1, x_2) = 1/36$ for x1 = 1, 2, . . . , 6 and x2 = 1, 2, . . . , 6.
Note that for any probability mass function,
$$F_{X_1,X_2}(b_1, b_2) = \sum_{x_1 \le b_1} \sum_{x_2 \le b_2} p_{X_1,X_2}(x_1, x_2)$$
Therefore, if we wished to evaluate $F_{X_1,X_2}(3, 4.5)$ we would sum all of the probability mass in
the specified region, and obtain $F_{X_1,X_2}(3, 4.5) = 12 \times \frac{1}{36} = \frac{1}{3}$.
This is the probability that the first roll is less than or equal to 3 and the second roll is less than
or equal to 4.5.
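The count of 12 points in the region can be reproduced by summing the joint probability mass function directly. A minimal sketch, assuming a fair die as in the example; the name joint_cdf is illustrative.

```python
from fractions import Fraction

faces = range(1, 7)
# Joint pmf of two rolls of a fair die: every pair has mass 1/36.
pmf = {(x1, x2): Fraction(1, 36) for x1 in faces for x2 in faces}

def joint_cdf(b1, b2):
    """F(b1, b2) = P(X1 <= b1, X2 <= b2), summed over the pmf."""
    return sum(
        (p for (x1, x2), p in pmf.items() if x1 <= b1 and x2 <= b2),
        Fraction(0),
    )

print(joint_cdf(3, 4.5))   # 3 values of x1 times 4 values of x2, each 1/36
print(joint_cdf(6, 6))     # the whole sample space
```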
Every joint probability mass function must have the following properties:
1. $p_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) \ge 0$ for any (x1, x2, . . . , xn)
2. $\sum_{\text{all } x_1} \sum_{\text{all } x_2} \cdots \sum_{\text{all } x_n} p_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = 1$
You should compare these properties with those of probability mass functions for single valued
discrete random variables.
6.2.2. Continuous Distributions
Definition 6.4. Let (X1, X2, . . . , Xn) be a continuous random vector with joint cumulative
distribution function FX1,...,Xn(x1, . . . , xn): The function fX1,...,Xn(x1, . . . , xn) that satisfies the
equation
3. $P(E) = \int \cdots \int_{E} f_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_n$ for any event E.
You should compare these properties with those of probability density functions for single
valued continuous random variables.
In the one dimensional case, we had the handy formula P (a<X<b)=FX(b)-FX(a). This
worked for any type of probability distribution.
Example: Let (X,Y ) be a two dimensional random variable with the following joint probability
density function.
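$$f_{X,Y}(x, y) = \begin{cases} 2 - y, & 0 \le x \le 2,\ 1 \le y \le 2 \\ 0, & \text{elsewhere} \end{cases}$$
Note that $\int_{0}^{2}\int_{1}^{2} (2 - y)\,dy\,dx = 1$.
Suppose we would like to compute P(X ≤ 1, Y ≤ 1.5). To do this, we calculate the volume under
the surface $f_{X,Y}(x, y)$ over the region {(x, y) : x ≤ 1, y ≤ 1.5}.
Performing the integration, we get
$$P(X \le 1, Y \le 1.5) = \int_{-\infty}^{1}\int_{-\infty}^{1.5} f_{X,Y}(x, y)\,dy\,dx = \int_{0}^{1}\int_{1}^{1.5} (2 - y)\,dy\,dx = \frac{3}{8}$$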
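The value 3/8 can be checked numerically. The sketch below assumes the density 2 − y is supported on 0 ≤ x ≤ 2, 1 ≤ y ≤ 2, which is the region on which it integrates to 1 and reproduces 3/8; the midpoint rule and the grid size are arbitrary illustrative choices.

```python
def f(x, y):
    """Joint density, assuming support 0 <= x <= 2 and 1 <= y <= 2."""
    return 2 - y if 0 <= x <= 2 and 1 <= y <= 2 else 0.0

def integrate2d(g, x_lo, x_hi, y_lo, y_hi, n=400):
    """Midpoint-rule approximation of a double integral over a rectangle."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy

total_mass = integrate2d(f, 0, 2, 1, 2)    # should be close to 1
prob = integrate2d(f, 0, 1, 1, 1.5)        # P(X <= 1, Y <= 1.5)
print(round(total_mass, 6), round(prob, 6))
```

Only the part of the region {x ≤ 1, y ≤ 1.5} that overlaps the support contributes, which is why the limits of integration start at x = 0 and y = 1.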
Exercise: what is p(x<0.5, y<2)?
Definition 6.5: Let (X1, . . . , Xn) be a random vector with joint cumulative distribution function
$F_{X_1,\ldots,X_n}(x_1, \ldots, x_n)$. The marginal cumulative distribution function for X1 is given by
$$F_{X_1}(x_1) = \lim_{x_2 \to \infty} \lim_{x_3 \to \infty} \cdots \lim_{x_n \to \infty} F_{X_1,\ldots,X_n}(x_1, x_2, \ldots, x_n)$$
Notice that we can renumber the components of the random vector and call any one of them X1.
So we can use the above definition to find the marginal cumulative distribution function for any
of the Xi’s.
Although Definition 6.5 is a nice definition, it is more useful to examine marginal probability
mass functions and marginal probability density functions. For example, suppose we have a
discrete random vector (X, Y) with joint probability mass function $p_{X,Y}(x, y)$. To find $p_X(x)$, we
ask “What is the probability that X = x regardless of the value that Y takes on?” This can be
written as
$$p_X(x) = P(X = x) = P(X = x,\ Y = \text{any value}) = \sum_{\text{all } y} p_{X,Y}(x, y)$$
Example: In the die example, $p_{X_1,X_2}(x_1, x_2) = 1/36$ for x1 = 1, 2, . . . , 6 and x2 = 1, 2, . . . , 6.
To find $p_{X_1}(2)$, for example, we compute
$$p_{X_1}(2) = P(X_1 = 2) = \sum_{k=1}^{6} p_{X_1,X_2}(2, k) = \frac{1}{6}$$
Y
PX,Y (x,y) 1 2 3 4 5
x
1 .15 0 0 0 0
2 .15 .1 0 0 0
Let X be the total number of items produced in a day’s work at a factory, and let Y be the
number of defective items produced. Suppose that the probability mass function for (X,Y ) is
given by Table 5.1. Using this joint distribution, we can see that the probability of producing 2
items with exactly 1 of those items being defective is
PX,Y (2, 1) = 0.15
To find the marginal probability mass function for the total daily production, X, we sum the
probabilities over all possible values of Y for each fixed x:
pX(1) = pX,Y(1, 1) = 0.05
pX(2) = pX,Y(2, 1) + pX,Y(2, 2) = 0.15 + 0.10 = 0.25
pX(3) = pX,Y(3, 1) + pX,Y(3, 2) + pX,Y(3, 3) = 0.05 + 0.05 + 0.10 = 0.20
etc.
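The procedure is similar for obtaining marginal probability density functions. Recall that a
density, fX(x), is not itself a probability measure, but fX(x) dx is. So, with a little loose-speaking
integration notation, we should be able to compute
$$f_X(x)\,dx = P(x \le X < x + dx) = P(x \le X < x + dx,\ Y = \text{any value}) = \left[\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy\right] dx$$
where y is the variable of integration in the above integral. Looking at this relationship, we obtain
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$$
For example, for the joint density $f_{X,Y}(x, y) = x + y$ on $0 \le x \le 1$, $0 \le y \le 1$,
$$f_X(x) = \int_{0}^{1} (x + y)\,dy = \left[xy + \frac{y^2}{2}\right]_{y=0}^{y=1} = x + \frac{1}{2}$$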
Let X denote the number of grade A compressors produced on a given day, and let Y denote the
number of grade B compressors produced on the same day. Suppose that the joint probability
mass function pX,Y(x, y) = P(X = x, Y = y) is given by the following table:
pX,Y(x, y)    Y = 0    Y = 1    pX(x)
X = 0         0.1      0.3      0.4
X = 1         0.2      0.1      0.3
X = 2         0.2      0.1      0.3
Given that no grade B compressors were produced on a given day, what is the probability that 2
grade A compressors were produced?
$$p(x = 2 / y = 0) = \frac{p(x = 2, y = 0)}{p(y = 0)} = \frac{0.2}{0.5} = \frac{2}{5}$$
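The marginal and conditional probabilities for the compressor table can be computed mechanically from the joint probabilities. The sketch below copies the six table entries as exact fractions; the function names are illustrative only.

```python
from fractions import Fraction

# Joint pmf of (X, Y): keys are (x, y), values are taken from the table.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(3, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(1, 10),
    (2, 0): Fraction(2, 10), (2, 1): Fraction(1, 10),
}

def marginal_x(x):
    """pX(x): sum the joint pmf over all values of y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """pY(y): sum the joint pmf over all values of x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def conditional_x_given_y(x, y):
    """p(x / y) = p(x, y) / pY(y)."""
    return joint[(x, y)] / marginal_y(y)

print(marginal_x(0), marginal_y(0), conditional_x_given_y(2, 0))
```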
Example 2: Suppose an electronic circuit contains two transistors. Let X be the time to failure of
transistor 1 and let Y be the time to failure of transistor 2, with joint density
$$f_{X,Y}(x, y) = \begin{cases} 4e^{-2(x+y)}, & x \ge 0,\ y \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
Given that the total lifetime of the two transistors is less than two hours, what is
the probability that the first transistor lasted more than one hour?
Solution:
$$p(x > 1 / x + y \le 2) = \frac{P(X > 1,\ X + Y \le 2)}{P(X + Y \le 2)}$$
$$P(X > 1,\ X + Y \le 2) = \int_{1}^{2}\int_{0}^{2-x} 4e^{-2(x+y)}\,dy\,dx = e^{-2} - 3e^{-4}$$
$$P(X + Y \le 2) = \int_{0}^{2}\int_{0}^{2-x} 4e^{-2(x+y)}\,dy\,dx = 1 - 5e^{-4}$$
$$p(x > 1 / x + y \le 2) = \frac{e^{-2} - 3e^{-4}}{1 - 5e^{-4}} \approx 0.0885$$
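The closed-form values in this solution can be evaluated and compared with a numerical integration. This sketch assumes the joint density 4e^{-2(x+y)} for x, y ≥ 0; the inner integral over y is done by hand, and the outer integral over x uses a midpoint rule with an arbitrary number of steps.

```python
import math

def midpoint(g, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

def inner(x):
    """Integral over y from 0 to 2 - x of 4 e^{-2(x + y)}, done analytically."""
    return 2 * math.exp(-2 * x) * (1 - math.exp(-2 * (2 - x)))

numerator = math.exp(-2) - 3 * math.exp(-4)     # P(X > 1, X + Y <= 2)
denominator = 1 - 5 * math.exp(-4)              # P(X + Y <= 2)
conditional = numerator / denominator

numeric_numerator = midpoint(inner, 1, 2)
numeric_denominator = midpoint(inner, 0, 2)

print(round(conditional, 4))
print(round(numeric_numerator / numeric_denominator, 4))
```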
Testing for independence:
Case I: Discrete
A discrete random variable X is independent of a discrete random variable Y if and
only if
pX,Y (x, y) = [pX(x)][pY (y)] for all possible values of x and y
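The discrete criterion can be applied cell by cell to any joint table. The sketch below is illustrative: the two-dice table comes from the die example earlier in the chapter, while the 2 x 2 table named skewed is a hypothetical example made up here to show a failing case.

```python
from fractions import Fraction
from itertools import product

def is_independent(joint, xs, ys):
    """Check p(x, y) == pX(x) * pY(y) for every pair of values."""
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(joint[(x, y)] == px[x] * py[y] for x, y in product(xs, ys))

# Two rolls of a fair die: every cell is 1/36 = (1/6)(1/6).
faces = range(1, 7)
dice = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}

# Hypothetical table whose cells do not factor into marginals.
skewed = {
    (0, 0): Fraction(1, 2), (0, 1): Fraction(0),
    (1, 0): Fraction(0), (1, 1): Fraction(1, 2),
}

print(is_independent(dice, faces, faces))
print(is_independent(skewed, [0, 1], [0, 1]))
```

A single cell that fails the product rule is enough to conclude that the variables are dependent.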
Example (continuous case): Suppose an electronic circuit contains two transistors. Let X be the time to failure of
transistor 1 and let Y be the time to failure of transistor 2.
$$f_{X,Y}(x, y) = \begin{cases} 4e^{-2(x+y)}, & x \ge 0,\ y \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
The marginal densities are
$$f_X(x) = \begin{cases} 2e^{-2x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \qquad f_Y(y) = \begin{cases} 2e^{-2y}, & y \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
We must check that $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$ for all values
of (x, y).
For x ≥ 0 and y ≥ 0: $f_{X,Y}(x, y) = 4e^{-2(x+y)} = f_X(x)\,f_Y(y) = 2e^{-2x} \cdot 2e^{-2y}$
For x ≥ 0 and y < 0: $f_{X,Y}(x, y) = 0 = f_X(x)\,f_Y(y) = 2e^{-2x} \cdot 0$
For x < 0 and y ≥ 0: $f_{X,Y}(x, y) = 0 = f_X(x)\,f_Y(y) = 0 \cdot 2e^{-2y}$
For x < 0 and y < 0: $f_{X,Y}(x, y) = 0 = f_X(x)\,f_Y(y) = (0)(0)$
So the random variables X and Y are independent.
1. A candy company distributes boxes of chocolates with a mixture of creams, coffees, and nuts
coated in both light and dark chocolate. For a randomly selected box, let X and Y, respectively, be
the proportions of the light and dark chocolates that are creams, and suppose that the joint density
function is
$$f(x, y) = \begin{cases} \frac{2}{5}(2x + 3y), & 0 < x < 1,\ 0 < y < 1 \\ 0, & \text{otherwise} \end{cases}$$
a. Verify whether or not f(x, y) is a joint probability density function of X and Y.
b. Find P(X < 1/2, 1/4 < Y < 1/2).
c. Find the marginal probability densities of X and Y.
2. Two electronic components of a missile system work in harmony for success of the total system.
Let X and Y denote the life in hours of the two components. The joint density of X and Y is
( )
, ≥ 0
( , ) =
0, ℎ .
a. Find the marginal density functions for both random variables.
b. What is the probability that both components will exceed 2 hours?
Chapter Seven
7. Expectation
Learning outcomes
If X is a discrete random variable which can take the values x1, x2, …, xn with probabilities
f(x1), f(x2), …, f(xn), then the expected value of X is
$$E(X) = \sum_{i} x_i f(x_i) = x_1 f(x_1) + x_2 f(x_2) + \cdots + x_n f(x_n)$$
Example: Consider one roll of a die. Let X be the number that turns up. To find E(X), we
must get the probability distribution of X.
Solution:
X    f(x)
1    1/6
2    1/6
3    1/6
4    1/6
5    1/6
6    1/6
$$\mu = E(X) = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = \frac{7}{2}$$
Expectations of Continuous Random Variables
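Let the continuous random variable X take values in [a, b], and let f(x) be its probability
density function. Then the expected value of the continuous random variable X is
$$E(X) = \int_{a}^{b} x f(x)\,dx$$
Example: Let X have the probability density function
$$f(x) = \begin{cases} \frac{x + 2}{18}, & -2 < x < 4 \\ 0, & \text{otherwise} \end{cases}$$
Find E[X].
Solution:
$$E[X] = \int_{-2}^{4} x \cdot \frac{x + 2}{18}\,dx = \int_{-2}^{4} \frac{x^2 + 2x}{18}\,dx = \left[\frac{x^3}{54} + \frac{x^2}{18}\right]_{-2}^{4} = 2$$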
Exercise
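For a function g(X) of the random variable X, the expected value is
$$E[g(X)] = \sum_{x} g(x) f(x)$$
if X is discrete, and
$$E[g(X)] = \int_{a}^{b} g(x) f(x)\,dx$$
if X is continuous.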
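Let X and Y be random variables with joint probability distribution f(x, y). The mean or
expected value of the random variable g(X, Y) is
$$E[g(X, Y)] = \sum_{x} \sum_{y} g(x, y) f(x, y)$$
if X and Y are discrete, and
$$E[g(X, Y)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(x, y) f(x, y)\,dx\,dy$$
if X and Y are continuous.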
Example: Let X and Y be random variables with the joint probability distribution given in the
table. Find the expected value of g(X, Y) = XY.
f(x, y)          X = 0    X = 1    X = 2    Row totals
Y = 0            3/28     9/28     3/28     15/28
Y = 1            3/14     3/14     0        3/7
Y = 2            1/28     0        0        1/28
Column totals    5/14     15/28    3/28
Solution:
$$E[XY] = \sum_{x} \sum_{y} x\,y\,f(x, y)$$
= (0)(0) f(0,0) + (0)(1) f(0,1) + (0)(2) f(0,2) + (1)(0) f(1,0) + (1)(1) f(1,1) + (1)(2) f(1,2) + (2)(0) f(2,0) +
(2)(1) f(2,1) + (2)(2) f(2,2)
= f(1,1), since f(1,2) = f(2,1) = f(2,2) = 0
= 3/14
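The double sum in this solution is short enough to evaluate directly. The sketch below stores the table with keys (x, y) and exact fractions; the generic helper expectation is an illustrative name that accepts any function g(x, y).

```python
from fractions import Fraction as F

# Joint distribution f(x, y) from the table, keyed by (x, y).
f = {
    (0, 0): F(3, 28), (1, 0): F(9, 28), (2, 0): F(3, 28),
    (0, 1): F(3, 14), (1, 1): F(3, 14), (2, 1): F(0),
    (0, 2): F(1, 28), (1, 2): F(0),     (2, 2): F(0),
}

def expectation(g):
    """E[g(X, Y)] = sum over all (x, y) of g(x, y) * f(x, y)."""
    return sum(g(x, y) * p for (x, y), p in f.items())

e_xy = expectation(lambda x, y: x * y)
e_x = expectation(lambda x, y: x)
e_y = expectation(lambda x, y: y)

print(e_xy, e_x, e_y)
```

Choosing g(x, y) = x or g(x, y) = y in the same helper returns the individual means E(X) and E(Y).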
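Example: The joint pdf of two random variables X and Y is given by
$$f_{X,Y}(x, y) = \begin{cases} \frac{1}{4}xy, & 0 \le x \le 2,\ 0 \le y \le 2 \\ 0, & \text{otherwise} \end{cases}$$
Find the joint expectation of $g(X, Y) = X^2 Y$.
Solution:
$$E[g(X, Y)] = E[X^2 Y] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f_{X,Y}(x, y)\,dx\,dy = \int_{0}^{2}\int_{0}^{2} x^2 y \cdot \frac{1}{4}xy\,dx\,dy$$
$$= \frac{1}{4}\int_{0}^{2} x^3\,dx \int_{0}^{2} y^2\,dy = \frac{1}{4} \cdot \frac{2^4}{4} \cdot \frac{2^3}{3} = \frac{8}{3}$$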
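Note:
If g(X, Y) = X, then
$$E(X) = \begin{cases} \sum_{x} \sum_{y} x\,f(x, y) = \sum_{x} x\,g(x), & \text{discrete case} \\ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,f(x, y)\,dy\,dx = \int_{-\infty}^{\infty} x\,g(x)\,dx, & \text{continuous case} \end{cases}$$
where g(x) is the marginal distribution of X. Similarly, if g(X, Y) = Y, then
$$E(Y) = \begin{cases} \sum_{y} \sum_{x} y\,f(x, y) = \sum_{y} y\,h(y), & \text{discrete case} \\ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\,f(x, y)\,dx\,dy = \int_{-\infty}^{\infty} y\,h(y)\,dy, & \text{continuous case} \end{cases}$$
where h(y) is the marginal distribution of Y.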
Properties of expectation
1. If c is a constant, then E[c] = c.
2. If c is a constant and X is a random variable, then E[cX] = cE[X].
3. If a and c are constants, then E[a + cX] = a + cE[X].
7.2. Variance of a random variable
Let x1, x2, …, xn be all the possible values of the discrete random variable X, and let f(x) be
its probability distribution. Let µ = E(X) be the expected value of X. Then the variance of
the discrete random variable X is
$$Var(X) = \sigma^2 = E\left[(X - E(X))^2\right] = \sum_{i} (x_i - \mu)^2 f(x_i) = (x_1 - \mu)^2 f(x_1) + (x_2 - \mu)^2 f(x_2) + \cdots + (x_n - \mu)^2 f(x_n)$$
Example: Consider one roll of a die. Let X be the number that turns up. To find Var(X), we must first get
the expected value of X. This is µ = E(X) = 7/2.
To find the variance of X, we form the new random variable (X − µ)² and compute its
expectation. We can easily do this using the following table.
X    f(x)    (x − 7/2)²
1    1/6     25/4
2    1/6     9/4
3    1/6     1/4
4    1/6     1/4
5    1/6     9/4
6    1/6     25/4
Var(X) = (1/6)(25/4 + 9/4 + 1/4 + 1/4 + 9/4 + 25/4) = 35/12
Let the continuous random variable X take values in [a, b], and let f(x) be its probability
density function. Let µ = E(X) be the expected value of X. Then the variance of the continuous
random variable X is
$$Var(X) = \sigma^2 = E\left[(X - E(X))^2\right] = \int_{a}^{b} (x - \mu)^2 f(x)\,dx$$
Standard Deviation
The standard deviation of X, denoted by SD(X), is $SD(X) = \sqrt{Var(X)}$. We often write σ for SD(X).
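Example 5.1.2: Find the variance of the continuous random variable X with the probability
density function
$$f(x) = \begin{cases} \frac{x + 2}{18}, & -2 < x < 4 \\ 0, & \text{otherwise} \end{cases}$$
Solution:
$$E(X) = \int_{-2}^{4} x \cdot \frac{x + 2}{18}\,dx = \int_{-2}^{4} \frac{x^2 + 2x}{18}\,dx = \left[\frac{x^3}{54} + \frac{x^2}{18}\right]_{-2}^{4} = 2$$
Since
$$E(X^2) = \int_{-2}^{4} x^2 \cdot \frac{x + 2}{18}\,dx = \int_{-2}^{4} \frac{x^3 + 2x^2}{18}\,dx = \left[\frac{x^4}{72} + \frac{x^3}{27}\right]_{-2}^{4} = 6,$$
$$Var(X) = E(X^2) - [E(X)]^2 = 6 - 2^2 = 2$$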
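Proof. We have
$$Var(X) = E\left[(X - \mu)^2\right] = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2 = E(X^2) - [E(X)]^2$$
Using this result, we can compute the variance of the outcome of a roll of a die by first
computing
$$E(X^2) = 1 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 9 \cdot \frac{1}{6} + 16 \cdot \frac{1}{6} + 25 \cdot \frac{1}{6} + 36 \cdot \frac{1}{6} = \frac{91}{6}$$
and then
$$Var(X) = E(X^2) - \mu^2 = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12},$$
in agreement with the value obtained directly from
the definition of Var(X).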
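Both routes to the variance of a die roll can be compared in a few lines. This is a small sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Probability distribution of one roll of a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())                      # E(X)
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
second_moment = sum(x**2 * p for x, p in pmf.items())          # E(X^2)
var_shortcut = second_moment - mean**2

print(mean, second_moment, var_definition, var_shortcut)
```

The shortcut needs only the two sums E(X) and E(X^2), which is usually less work than forming (x − µ)² for every value.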
Properties of Variance
The variance has properties very different from those of the expectation. If c is any
constant, E(cX) = cE(X) and E(X + c) = E(X) + c. These two statements imply that the
expectation is a linear function. However, the variance is not linear, as seen in the next
theorem. If X is any random variable and c is any constant, then Var(cX) = c²Var(X) and
Var(X + c) = Var(X).
The covariance of X and Y is
$$\sigma_{XY} = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E(XY) - \mu_X \mu_Y = E(XY) - E(X)E(Y)$$
and the correlation coefficient of X and Y is
$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{Cov(X, Y)}{\sqrt{Var(X)}\sqrt{Var(Y)}}$$
Because $\sigma_X > 0$ and $\sigma_Y > 0$, if the covariance between X and Y is positive, negative, or
zero, the correlation between X and Y is positive, negative, or zero, respectively. The
following result can be shown.
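Example: Let X and Y have the joint density function
$$f(x, y) = \begin{cases} \frac{2}{5}(2x + 3y), & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
Find
a. E(X) and Var(X)
b. E(Y) and Var(Y)
c. The covariance of X and Y, Cov(X, Y)
d. The correlation coefficient between X and Y, ρ(X, Y)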
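Solution:
a. $E(X) = \int_{0}^{1} x\,g(x)\,dx$, where g(x) is the marginal density of X:
$$g(x) = \int_{0}^{1} \frac{2}{5}(2x + 3y)\,dy = \frac{2}{5}\left[2xy + \frac{3y^2}{2}\right]_{y=0}^{y=1} = \frac{2}{5}\left(2x + \frac{3}{2}\right)$$
$$E(X) = \int_{0}^{1} x \cdot \frac{2}{5}\left(2x + \frac{3}{2}\right)dx = \frac{2}{5}\left[\frac{2x^3}{3} + \frac{3x^2}{4}\right]_{0}^{1} = \frac{2}{5}\left(\frac{2}{3} + \frac{3}{4}\right) = \frac{8 + 9}{30} = \frac{17}{30}$$
$$E(X^2) = \int_{0}^{1} x^2 \cdot \frac{2}{5}\left(2x + \frac{3}{2}\right)dx = \frac{2}{5}\left[\frac{2x^4}{4} + \frac{3x^3}{6}\right]_{0}^{1} = \frac{2}{5}\left(\frac{2}{4} + \frac{3}{6}\right) = \frac{2}{5}$$
$$Var(X) = E(X^2) - [E(X)]^2 = \frac{2}{5} - \left(\frac{17}{30}\right)^2 = \frac{360 - 289}{900} = \frac{71}{900}$$
b. $E(Y) = \int_{0}^{1} y\,h(y)\,dy$, where h(y) is the marginal density of Y:
$$h(y) = \int_{0}^{1} \frac{2}{5}(2x + 3y)\,dx = \frac{2}{5}\left[\frac{2x^2}{2} + 3xy\right]_{x=0}^{x=1} = \frac{2}{5}(1 + 3y)$$
$$E(Y) = \int_{0}^{1} y \cdot \frac{2}{5}(1 + 3y)\,dy = \frac{2}{5}\int_{0}^{1} (y + 3y^2)\,dy = \frac{2}{5}\left[\frac{y^2}{2} + \frac{3y^3}{3}\right]_{0}^{1} = \frac{2}{5}\left(\frac{1}{2} + 1\right) = \frac{3}{5}$$
$$E(Y^2) = \int_{0}^{1} y^2 \cdot \frac{2}{5}(1 + 3y)\,dy = \frac{2}{5}\int_{0}^{1} (y^2 + 3y^3)\,dy = \frac{2}{5}\left[\frac{y^3}{3} + \frac{3y^4}{4}\right]_{0}^{1} = \frac{2}{5} \cdot \frac{4 + 9}{12} = \frac{13}{30}$$
$$Var(Y) = E(Y^2) - [E(Y)]^2 = \frac{13}{30} - \left(\frac{3}{5}\right)^2 = \frac{65 - 54}{150} = \frac{11}{150}$$
c. $\sigma_{XY} = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E(XY) - \mu_X \mu_Y = E(XY) - E(X)E(Y)$
$$E(XY) = \int_{0}^{1}\int_{0}^{1} xy \cdot \frac{2}{5}(2x + 3y)\,dx\,dy = \frac{2}{5}\int_{0}^{1}\int_{0}^{1} (2x^2 y + 3xy^2)\,dx\,dy$$
$$= \frac{2}{5}\int_{0}^{1} \left[\frac{2x^3 y}{3} + \frac{3x^2 y^2}{2}\right]_{x=0}^{x=1} dy = \frac{2}{5}\int_{0}^{1} \left(\frac{2y}{3} + \frac{3y^2}{2}\right) dy = \frac{2}{5}\left[\frac{2y^2}{6} + \frac{3y^3}{6}\right]_{0}^{1} = \frac{2}{5}\left(\frac{2}{6} + \frac{3}{6}\right) = \frac{1}{3}$$
Since we have
$$E(XY) = \frac{1}{3}, \quad E(Y) = \frac{3}{5}, \quad E(X) = \frac{17}{30},$$
therefore
$$\sigma_{XY} = E(XY) - E(X)E(Y) = \frac{1}{3} - \frac{17}{30} \cdot \frac{3}{5} = \frac{50 - 51}{150} = -\frac{1}{150}$$
d. $\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{Cov(X, Y)}{\sqrt{Var(X)}\sqrt{Var(Y)}}$, with
$$\sigma_X = \sqrt{Var(X)} = \sqrt{\frac{71}{900}}, \qquad \sigma_Y = \sqrt{Var(Y)} = \sqrt{\frac{11}{150}}$$
Therefore,
$$\rho_{XY} = \frac{-\frac{1}{150}}{\sqrt{\frac{71}{900}}\sqrt{\frac{11}{150}}} \approx -0.0877$$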
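Because the density is a polynomial on the unit square, every moment in this solution has a simple closed form, which makes the arithmetic easy to verify. The sketch below assumes the density (2/5)(2x + 3y) on 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 and uses the fact that the integral of x^m y^n over the unit square is 1/((m+1)(n+1)); the helper name moment is illustrative.

```python
from fractions import Fraction
import math

def moment(a, b):
    """E[X^a Y^b] for f(x, y) = (2/5)(2x + 3y) on the unit square.

    The 2x term contributes 2 / ((a + 2)(b + 1)) and the 3y term
    contributes 3 / ((a + 1)(b + 2)).
    """
    return Fraction(2, 5) * (
        Fraction(2, (a + 2) * (b + 1)) + Fraction(3, (a + 1) * (b + 2))
    )

ex, ey = moment(1, 0), moment(0, 1)
var_x = moment(2, 0) - ex**2
var_y = moment(0, 2) - ey**2
cov = moment(1, 1) - ex * ey
rho = float(cov) / math.sqrt(var_x * var_y)

print(ex, var_x, ey, var_y, cov, round(rho, 4))
```

The small negative correlation says that X and Y are only weakly related, even though they are not independent.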
7.5. Review Exercises
1. Find the mean, variance and standard deviation of the following probability
distributions.
a. $f(x) = \begin{cases} 1, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$
b. $f(x) = \begin{cases} 3x^2, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$
2. It is given that E(x) = 3, V(x) = 16, E(y) = 4, V(y) = 9 and that x and y are independent, find
Chapter Eight
The mean of a binomial distribution is E(X) = np and the variance is V(X) = npq, where q = 1 − p; the standard deviation is the square root of V(X).
Example
A coin is tossed five times. (This is the same as a sample size of five). What is the
probability of obtaining exactly two heads in the five tosses?
Solution:
To arrive at the answer to the question the values are entered in the binomial formula.
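$$P(2 \text{ heads}) = \binom{5}{2}\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^3 = 10 \times \frac{1}{4} \times \frac{1}{8} = 0.3125 = 31.25\%$$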
Example
Solution:
$$P(1 \text{ defective chip}) = P(1) = \binom{4}{1}(0.02)^1(0.98)^3 = 4(0.02)(0.9412) = 0.0753$$
Solution:
More than one defective chip in a sample of four means two, three or four
defective chips. The probability of each may be calculated using the binomial
formula.
P(more than 1 defective chip) = P(2) or P(3) or P(4) = P(2) + P(3) + P(4)
In any trial or sample, the sum of the probabilities of the individual events always
equal one. In this problem: P(0) + P(1) + P(2) + P(3) + P(4) = 1
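The binomial calculations above can be reproduced with a short function. In this sketch the chip values n = 4 and p = 0.02 are read off the formula shown for one defective chip, and "more than one defective chip" is obtained from the complement rule just stated; the function name binomial_pmf is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exactly two heads in five tosses of a fair coin.
coin = binomial_pmf(2, 5, 0.5)

# Chips: sample of n = 4 with defect probability p = 0.02.
one_defective = binomial_pmf(1, 4, 0.02)
more_than_one = 1 - binomial_pmf(0, 4, 0.02) - one_defective

# The probabilities of all possible counts add up to one.
total = sum(binomial_pmf(x, 4, 0.02) for x in range(5))

print(coin, round(one_defective, 4), round(more_than_one, 6), round(total, 6))
```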
Exercise:
1. Suppose a coin is tossed 10 times. What is the probability of getting
a. Exactly 3 heads
b. At most 3 heads
c. At least 3 heads
d. More than 3 heads
e. No head
Find the average and variance of the number of heads.
2. The probability of a man kicking into the goal is 2/3. If a person kicks 5 times, what is
the probability of scoring
a. At least one goal.
b. At most 3 goals.
Find the average, variance and standard deviation of the number of goals.
Properties
1. The probability of success, p, is very small.
2. The experiment is performed indefinitely (n is very large).
3. The average number of events per unit of time (λ) is known.
Thus, the random variable X (number of successes) has a Poisson distribution with parameter λ,
X ~ Poisson(λ), and the probability of getting x successes is given by
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
If X is a Poisson random variable, then E(X) = λ and V(X) = λ.
Example:
In making switches, it has been determined by empirical studies that there is, on average,
one defect per switch. What is the probability of selecting a sample of five switches that
contains zero defects?
Solution:
There are two methods to solve this problem. The first method is to use the above
formula with x = 0, n = 5, and p = 1, so that λ = np = 5 × 1 = 5:
$$P(0) = \frac{e^{-5}(5)^{0}}{0!} = \frac{0.00674}{1} = 0.00674 = 0.674\%$$
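The same number can be obtained without tables by evaluating the Poisson formula directly. A minimal sketch, assuming the parameter value 5 used in the solution above; the function name poisson_pmf is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

p_zero = poisson_pmf(0, 5)
print(round(p_zero, 5))

# Complement rule: probability of at least one defect in the sample.
p_at_least_one = 1 - p_zero
print(round(p_at_least_one, 5))
```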
The second and most widely used method is to use the Poisson tables that are published
in most statistics books. To use the tables, find the value of x in the leftmost column, then
find the value of np on the top row and read P(x) at the intersection of the two values.
Exercise:
1. On average a typist commits 3 errors per page. Find the probability that she will make
a. No mistake.
b. More than one mistake.
2. Customers arrive at a photocopying machine at an average rate of two every 10 minutes.
What is the probability that there will be
a. No arrivals during any period of ten minutes.
b. Exactly one arrival during this time period.
c. More than two arrivals during this time period.
1. The normal curve is symmetrical about the mean. This means that the number of
units in the data below the mean is the same as the number of units above the
mean. This means the mean and median have the same value.
2. The height of the normal curve is maximum at the mean value. Thus, the mean and
mode coincide. This means that the normal distribution has the same value of
mean, median and mode.
3. The curve declines as we go in either direction from the mean, but never touches
the base (X-axis) so that the tails of the curve on both sides extend indefinitely.
4. The corresponding deciles, quartiles and percentiles are equi-distant from the
mean.
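The height of the normal curve Y at any value of the random variable X is given by
$$Y = f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$$
and we write X ~ N(µ, σ²), where µ is the mean of the distribution and σ is the standard
deviation of the distribution.
If X ~ N(µ, σ²), then $Z = \frac{X - \mu}{\sigma}$ is called the standard normal variate, with mean
zero and variance one, written Z ~ N(0, 1), and
$$f(z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}, \quad -\infty < z < \infty$$
Therefore,
$$P(a < X < b) = \int_{a}^{b} f(x)\,dx = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} dx = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) = \int_{(a - \mu)/\sigma}^{(b - \mu)/\sigma} f(z)\,dz$$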
The total area under the standard normal curve is one. Then the area to right and left
from the central point (µ=0) is 0.5 each.
Example: Find the area under the standard normal distribution which lies
a) P(0 ≤ Z ≤ 1.96)
b) P(Z ≥ 1.96)
Solution:
A random variable X has a normal distribution with mean 80 and standard deviation 4.8.
What is the probability that it will take a value
a) less than 87.2, b) greater than 76.4, c) between 81.2 and 86?
Solution:
X is normal with mean µ = 80 and standard deviation σ = 4.8.
a) $P(X < 87.2) = P\left(\frac{X - \mu}{\sigma} < \frac{87.2 - 80}{4.8}\right) = P(Z < 1.5) = 0.5 + 0.4332 = 0.9332$
b) $P(X > 76.4) = P\left(\frac{X - \mu}{\sigma} > \frac{76.4 - 80}{4.8}\right) = P(Z > -0.75) = 0.5 + 0.2734 = 0.7734$
c) $P(81.2 < X < 86) = P\left(\frac{81.2 - 80}{4.8} < \frac{X - \mu}{\sigma} < \frac{86 - 80}{4.8}\right) = P(0.25 < Z < 1.25)$
$= P(0 < Z < 1.25) - P(0 < Z < 0.25) = 0.3944 - 0.0987 = 0.2957$
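The table look-ups in this solution can be checked with the NormalDist class from Python's standard statistics module (available from Python 3.8). This is only a sketch: the results agree with the table values up to the rounding of a four-decimal table.

```python
from statistics import NormalDist

X = NormalDist(mu=80, sigma=4.8)

p_a = X.cdf(87.2)                 # P(X < 87.2)
p_b = 1 - X.cdf(76.4)             # P(X > 76.4)
p_c = X.cdf(86) - X.cdf(81.2)     # P(81.2 < X < 86)

# The same probability through the standard normal variate Z = (X - mu) / sigma.
Z = NormalDist()                  # mean 0, standard deviation 1
p_a_via_z = Z.cdf((87.2 - 80) / 4.8)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_a_via_z, 4))
```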
Exercise:
The IQ score of students is normally distributed with a mean of 120 and variance
400. What is the probability that a student will have an IQ?
a) Between 100 and 130.
b) Below 150.
c) Above 140.
6. A student is given 4 true or false questions. The student does not know the answer to any of the
questions. He tosses a coin. Each time he gets a head, he selects true. What is the probability
that he will get;
a. Only one correct answer.
b. At most 2 correct answers.
c. At least 3 correct answers.
d. All correct answers.
7. The number of cars pulling into a petrol pump is 3 cars in every 10 minutes. What is the
probability that exactly 2 cars will arrive in the next 10 minutes?
8. The price of one gallon bottle of milk is normally distributed with an average price of $2 and
standard deviation of 20 cents. A family stops at a booth to buy a gallon of milk. What is the
probability that they will pay;
a. More than $2.1
b. Less than $1.75
c. Between $1.85 and $ 2.15
d. Between $1.85 and $1.95
1. Construct a frequency distribution table for the following data, including class limits,
class boundaries, class marks, frequencies, LCF and MCF. Then calculate the range and
standard deviation for the constructed frequency distribution.
42 62 46 54 41 37 54 44 32 45
47 50 58 49 51 42 46 37 42 39
56 38 45 52 46 54 39 51 58 47
64 43 48 49 48 49 61 41 40 58
49 59 57 57 34 40 63 41 51 41
2. Suppose a train moves 100 km with a speed of 40 km per hour, then 150 km with a speed
of 50 km per hour and the next 135 km with a speed of 45 km per hour. Calculate the
average speed of the train.
3. In Adigrat University, 120 students were involved in the study, and based on an enquiry
on their age, it was known that 35% of them were 20 or less. The following frequency
distribution shows the age composition of the students under study.
Mid-age in years Number of persons
16 15
19 f1
22 f2
25 23
28 15
31 9
Find the mean, median and mode?
X         0    1    2     3     4     5     6      7
P(X=x)    0    k    2k    2k    3k    k²    2k²    2k²+k
i. Find k.
ii. If p(x≤k)> ½; find the minimum value of k.
iii. Determine the distribution function of X.
iv. Find mean and variance.
5. The IQ score of students is normally distributed with a mean of 120 and variance 400.
What is the probability that a student will have an IQ?
a) Between 100 and 130.
b) Below 150.
c) Above 140.
d) Between 140 and 150.
6. On average a typist commits 3 errors per page. Find the probability that she will make
a. No mistake.
b. More than one mistake.
7. Customers arrive at a photocopying machine at an average rate of two every 10 minutes.
What is the probability that there will be
a. No arrivals during any period of ten minutes.
b. Exactly one arrival during this time period.
c) Calculate PX(x)
d) PY(y).
e) Find PX=2/Y=1.
9. From the following joint probability density function (4pt)
, 0<x<2; 0<y<1
a) Determine the value of A.
b) Find the E (X), E (Y).
10. Let (X; Y) be a two dimensional continuous random variable with joint probability
density function given by
2 + , 0 < < 1 0 < < 1
; ( , )=
0 , ℎ
Find the marginal probability density function for X and y
11. If E[g(Y)] = E{E[g(Y)|X]} then,
a) E(Y)=E{E[Y|X]}
b) Var(Y)=E[Var(Y|X)]+Var(E[Y|X])
12. Graduated Statistics Aptitude Test (GSAT) scores are widely used by graduate schools of
Engineering and Technology as an entrance requirement. Suppose that in one particular year, the
mean score for the GSAT was 476 with a standard deviation of 107. Assume that the GSAT scores are normally
distributed and answer the following questions.
i. What is the probability that randomly selected score falls between 476 and 650?
ii. What is the probability of receiving score greater than 750?
iii. What is the probability of receiving score 540 or less?
iv. What is the probability of receiving score between 440 and 330?