Math 231


Introduction to Probability and Statistics
Lecture Notes

Dr. Rabab Sabry Dr. Dina Ahmed


Mathematics Department
Dr. Rabab Sabry Intro. Prob and Stat Dr. Dina Ahmed

Chapter 1: Nature of Probability and Statistics


Introduction
Statistics is used in almost all fields of human endeavor. In sports, for
example, a statistician may keep records of the number of yards a running back
gains during a football game, or the number of hits a baseball player gets in a
season. In other areas, such as public health, an administrator might be concerned
with the number of residents who contract a new strain of flu virus during a
certain year. In education, a researcher might want to know if new methods of
teaching are better than old ones. These are only a few examples of how
statistics can be used in various occupations.
Furthermore, statistics is used to analyze the results of surveys and as a tool in
scientific research to make decisions based on controlled experiments. Other uses
of statistics include operations research, quality control, estimation, and
prediction.

Statistics is the science of conducting studies to collect, organize, summarize, analyze,
and draw conclusions from data.

There are several reasons why you should study statistics.

1. Like professional people, you must be able to read and understand the
various statistical studies performed in your fields. To have this
understanding, you must be knowledgeable about the vocabulary, symbols,
concepts, and statistical procedures used in these studies.
2. You may be called on to conduct research in your field, since statistical
procedures are basic to research. To accomplish this, you must be able to
design experiments; collect, organize, analyze, and summarize data; and
possibly make reliable predictions or forecasts for future use. You must also
be able to communicate the results of the study in your own words.

3. You can also use the knowledge gained from studying statistics to become
better consumers and citizens. For example, you can make intelligent
decisions about what products to purchase based on consumer studies, about
government spending based on utilization studies, and so on.
1.1 Descriptive and Inferential Statistics
To gain knowledge about seemingly haphazard situations, statisticians collect
information for variables, which describe the situation.

A variable is a characteristic or attribute that can assume different values.

Data are the values (measurements or observations) that the variables can assume.
Variables whose values are determined by chance are called random variables.
Example 1.1: Suppose that an insurance company studies its records over the past
several years and determines that, on average, 3 out of every 100 automobiles the
company insured were involved in accidents during a 1-year period. Although there is
no way to predict the specific automobiles that will be involved in an accident (random
occurrence), the company can adjust its rates accordingly, since the company knows the
general pattern over the long run. (That is, on average, 3% of the insured automobiles will
be involved in an accident each year).
A collection of data values forms a data set. Each value in the data set is called
a data value.
In statistics it is important to distinguish between a sample and a population.

A population consists of all subjects (human or otherwise) that are being studied.
When data are collected from every subject in the population, it is called a census.
Example 1.2: Every 10 years the United States conducts a census. The primary purpose
of this census is to determine the apportionment of the seats in the House of
Representatives. The first census was conducted in 1790. As the United States grew, the
scope of the census also grew. Today the Census limits questions to populations, housing,
manufacturing, agriculture, and mortality. The Census is conducted by the Bureau of the
Census, which is part of the Department of Commerce.
Most of the time, due to the expense, time, size of population, medical concerns, etc.,
it is not possible to use the entire population for a statistical study; therefore, researchers
use samples.
A sample is a group of subjects selected from a population.

If the subjects of a sample are properly selected, most of the time they should possess the
same or similar characteristics as the subjects in the population.
However, the information obtained from a statistical sample is said to be biased if the
results from the sample of a population are radically different from the results of a
census of the population. Also, a sample is said to be biased if it does not represent
the population from which it has been selected.
The body of knowledge called statistics is sometimes divided into two main areas,
depending on how data are used. The two areas are

1. Descriptive statistics
2. Inferential statistics

Descriptive statistics consists of the collection, organization, summarization, and
presentation of data.

In descriptive statistics the statistician tries to describe a situation. Consider the
national census conducted by the U.S. government every 10 years. Results of this
census give you the average age, income, and other characteristics of the U.S.
population. To obtain this information, the Census Bureau must have some means to
collect relevant data. Once data are collected, the bureau must organize and summarize
them. Finally, the bureau needs a means of presenting the data in some meaningful
form, such as charts, graphs, or tables.

Inferential statistics consists of generalizing from samples to populations, performing
estimations and hypothesis tests, determining relationships among variables, and making
predictions.

• Here, the statistician tries to make inferences from samples to populations.


Inferential statistics uses probability, i.e., the chance of an event occurring.
You may be familiar with the concepts of probability through various forms
of gambling. If you play cards, dice, bingo, or lotteries, you win or lose
according to the laws of probability. Probability theory is also used in the
insurance industry and other areas.
• The area of inferential statistics called hypothesis testing is a decision-making
process for evaluating claims about a population, based on information
obtained from samples.

Example 1.3: A researcher may wish to know if a new drug will reduce the
number of heart attacks in men over 70 years of age. For this study,
two groups of men over age 70 would be selected. One group would be
given the drug, and the other would be given a placebo (a substance with
no medical benefits or harm). Later, the number of heart attacks
occurring in each group of men would be counted, a statistical test would be run,
and a decision would be made about the effectiveness of the drug.
Example 1.4: A study conducted at Manatee Community College revealed that
students who attended class 95 to 100% of the time usually received an A in the
class. Students who attended class 80 to 90% of the time usually received a B or C
in the class. Students who attended class less than 80% of the time usually
received a D or an F or eventually withdrew from the class.
Based on this information, attendance and grades are related. The more you attend
class, the more

likely it is you will receive a higher grade. If you improve your attendance, your
grades will probably improve. Many factors affect your grade in a course. One factor
that you have considerable control over is attendance. You can increase your
opportunities for learning by attending class more often.

1. What are the variables under study?


2. What are the data in the study?
3. Are descriptive, inferential, or both types of statistics used?
4. What is the population under study?
5. Was a sample collected? If so, from where?
From the information given, comment on the relationship between the variables.
1.2 Variables and Types of Data
Variables can be classified as qualitative or quantitative.

1.2.1 Qualitative variables are variables that have distinct categories according
to some characteristic or attribute.

For example, if subjects are classified according to gender (male or female), then
the variable gender is qualitative. Other examples of qualitative variables are
religious preference and geographic locations.

1.2.2 Quantitative variables are variables that can be counted or measured.

For example, the variable age is numerical, and people can be ranked in order
according to the value of their ages. Other examples of quantitative variables are
heights, weights, and body temperatures.
Quantitative variables can be further classified into two groups: discrete and
continuous.
(i) Discrete variables can be assigned values such as 0, 1, 2, 3 and are said
to be countable. Examples of discrete variables are the number of
children in a family, the number of students in a classroom, and the
number of calls received by a call center each day for a month.
(ii) Continuous variables, by comparison, can assume an infinite number
of values in an interval between any two specific values. Temperature,
for example, is a continuous variable, since the variable can assume an
infinite number of values between any two given temperatures.

Continuous variables can assume an infinite number of values between any two
specific values. They often include fractions and decimals.

Example 1.5: Classify each variable as a discrete or continuous variable.

a. The number of hours during a week that children ages 12 to 15 reported that
they watched television.
b. The number of touchdowns a quarterback scored each year in his college
football career.
c. The amount of money a person earns per week working at a fast-food restaurant.
d. The weights of the football players on the teams that play in the NFL this year.
Solution:
a. Continuous, since the variable time is measured.
b. Discrete, since the number of touchdowns is counted.
c. Discrete, since the smallest value that money can assume is in cents.
d. Continuous, since the variable weight is measured.
1.3 Types of Data
In statistical analysis there are different kinds of data, whose values are closely
related to the nature of the variables. Two main types of data are observed in most
practical applications, and each is divided further:

• Categorical Data: Also described as qualitative data, these data arise
when the observations fall into separate, distinct categories. Such data are
inherently discrete, i.e., there is a finite number of possible categories into
which each observation may fall. Categorical data can be further classified
as
1- Nominal Data: A nominal variable is one whose measurement indicates a
category or characteristic rather than an exact mathematical measure. There
is no clear order among the categories, so a nominal variable is just a label
for a characteristic of the observational unit, without a rating scale (order).
Examples are gender, eye colour, religion, and brand.
2- Ordinal Data: An ordinal variable is one whose measurement indicates a
clearly ordered category or characteristic. It points out a characteristic
of an observational unit that can be ranked on a rating scale. Examples are
a student's grades (A, B, C) and clothing size (small, medium, large).
• Numerical Data: Also known as quantitative data, these data arise
when the observations are counts or measurements. Examples are the
number of students in a class, the weight of an individual, and the
temperature at a particular place. Numerical data can be of two types.
1- Discrete Data: The values of discrete data are integers (counts). For example,
the number of houses in a society, the number of chapters in a book, etc.
2- Continuous Data: The domain of continuous data is the real line or some
interval on it. Such values are totally ordered, so any two of them can be compared.
Measurement Scales
In addition to being classified as qualitative or quantitative,
variables can be classified by how they are categorized, counted,
or measured.
For example, can the data be organized into specific categories,
such as area of residence (rural, suburban, or urban)? Can the data
values be ranked, such as first place, second place, etc.? Or are the
values obtained from measurement, such as heights, IQs, or
temperature? This type of classification (i.e., how variables are
categorized, counted, or measured) uses measurement scales,
and four common types of scales are used: nominal, ordinal,
interval, and ratio.
1. The nominal level of measurement classifies data into mutually
exclusive (nonoverlapping) categories in which no order or ranking can
be imposed on the data.

The first level of measurement is called the nominal level of measurement. A
sample of college instructors classified according to subject taught (e.g., English,
history, psychology, or mathematics) is an example of nominal-level
measurement. Classifying survey subjects as male or female is another example
of nominal-level measurement. No ranking or order can be placed on the data.
Classifying residents according to zip codes is also an example of the nominal
level of measurement. Even though numbers are assigned as zip codes, there is
no meaningful order or ranking. Other examples of nominal-level data are
political party (Democratic, Republican, independent, etc.), religion
(Christianity, Judaism, Islam, etc.), and marital status (single, married, divorced,
widowed, separated).

2. The ordinal level of measurement classifies data into categories
that can be ranked; however, precise differences between the ranks
do not exist.

The next level of measurement is called the ordinal level. Data measured at
this level can be placed into categories, and these categories can be ordered,
or ranked. For example, from student evaluations, guest speakers might be
ranked as superior, average, or poor. Floats in a homecoming parade might
be ranked as first place, second place, etc. Note that precise measurement of
differences in the ordinal level of measurement does not exist. For instance,
when people are classified according to their build (small, medium, or large),
a large variation exists among the individuals in each class.
Other examples of ordinal data are letter grades (A, B, C, D, F).

3. The interval level of measurement ranks data, and precise
differences between units of measure do exist; however, there is
no meaningful zero.

The third level of measurement is called the interval level. This level differs
from the ordinal level in that precise differences do exist between units. For
example, many standardized psychological tests yield values measured on
an interval scale. IQ is an example of such a variable. There is a meaningful
difference of 1 point between an IQ of 109 and an IQ of 110. Temperature is
another example of interval measurement, since there is a meaningful
difference of 1°F between each unit, such as 72 and 73°F. One property is
lacking in the interval scale: There is no true zero. For example, IQ tests do
not measure people who have no intelligence. For temperature, 0°F does not
mean no heat at all.
4. The ratio level of measurement possesses all the characteristics of
interval measurement, and there exists a true zero. In addition, true
ratios exist when the same variable is measured on two different
members of the population.

The final level of measurement is called the ratio level. Examples of ratio
scales are those used to measure height, weight, area, and number of phone
calls received. Ratio scales have differences between units (1 inch, 1 pound,
etc.) and a true zero. In addition, the ratio scale contains a true ratio between
values. For example, if one person can lift 200 pounds and another can lift
100 pounds, then the ratio between them is 2 to 1. Put another way, the first
person can lift twice as much as the second person.

TABLE 1–2 Examples of Measurement Scales

Nominal-level data                           Ordinal-level data                           Interval-level data   Ratio-level data
Zip code                                     Grade (A, B, C, D, F)                        SAT score             Height
Gender (male, female)                        Judging (first place, second place, etc.)    IQ                    Weight
Eye color (blue, brown, green, hazel)        Rating scale (poor, good, excellent)         Temperature           Time
Political affiliation                        Ranking of tennis players                                          Salary
Religious affiliation                                                                                           Age
Major field (mathematics, computers, etc.)
Nationality

EXAMPLE Measurement Levels


What level of measurement would be used to measure each variable?
a. The ages of authors who wrote the hardback versions of
the top 25 fiction books sold during a specific week
b. The colors of baseball hats sold in a store for a specific year
c. The highest temperature for each day of a specific month
d. The ratings of bands that played in the homecoming parade at a
college
Solution

a. Ratio
b. Nominal
c. Interval
d. Ordinal

Exercises

(1) Determine whether each statement is true or false. If the statement is
false, explain why.

1. Probability is used as a basis for inferential statistics.

2. When the sample does not represent the population, it is called a biased sample.

3. The difference between a sampling measure and a population measure is called
a non-sampling error.

4. When the population of college professors is divided into groups according to
their rank (instructor, assistant professor, etc.) and then several are selected from
each group to make up a sample, the sample is called a cluster sample.

5. The variable temperature is an example of a quantitative variable.

6. The height of basketball players is considered a continuous variable.

7. The boundary of a value such as 6 inches would be 5.9–6.1 inches.

(2) Select the best answer.

8. The number of ads on a one-hour television show is what type of data?


a. Nominal b. Qualitative
c. Discrete d. Continuous

9. What are the boundaries of 25.6 ounces?


a. 25–26 ounces b. 25.55–25.65 ounces
c. 25.5–25.7 ounces d. 20–39 ounces
10. A study that involves no researcher intervention is called
a. An experimental study. b. A noninvolvement study.
c. An observational study. d. A quasi-experimental study.
11. A variable that interferes with other variables in the study is called
a. A confounding variable. b. An explanatory variable.
c. An outcome variable. d. An interfering variable.

Chapter 2: Frequency Distributions and Graphs


Introduction
When conducting a statistical study, the researcher must gather data for the
particular variable under study. To describe situations, draw conclusions, or make
inferences about events, the researcher must organize the data in some meaningful
way. The most convenient method of organizing data is to construct a frequency
distribution.
After organizing the data, the researcher must present them so they can be understood
by those who will benefit from reading the study. The most useful method of
presenting the data is by constructing statistical charts and graphs. There are many
different types of charts and graphs, and each one has a specific purpose.

2.1 Organizing Data


Example 2.1: Suppose a researcher wished to do a study on the ages of the 50 wealthiest people
in the world. The researcher first would have to get the data on the ages of the people. In this case,
these ages are listed in Forbes.

45 46 64 57 85
92 51 71 54 48
27 66 76 55 69
54 44 54 75 46
61 68 78 61 83
88 45 89 67 56
81 58 55 62 38
55 56 64 81 38
49 68 91 56 68
46 47 83 71 62

Organized into a frequency distribution, the data are:

Class limits   Tally                 Frequency f
27–35          /                          1
36–44          ///                        3
45–53          ///// ////                 9
54–62          ///// ///// /////         15
63–71          ///// /////               10
72–80          ///                        3
81–89          ///// //                   7
90–98          //                         2
Total                                    50

A frequency distribution is the organization of raw data in table form, using classes and
frequencies.

Note: $\sum f = n$, where $f$ is the frequency of the class and $n$ is the total number of values.

Two types of frequency distributions that are most often used are the categorical
frequency distribution and the grouped frequency distribution. The procedures
for constructing these distributions are shown now.

2.1.1 Categorical Frequency Distributions


(i) Relative Frequency ($\frac{f}{n}$)

• The relative frequency of an event is defined as the number of times that the
event occurs during experimental trials, divided by the total number of trials
conducted.

• The relative frequency is not a theoretical quantity, but an experimental one.
We have to repeat an experiment a number of times and count how many times
the outcome of the trial is in the event set. Because it is experimental, it is
possible to get a different relative frequency every time that we repeat an
experiment.

• $\sum \frac{f}{n} = 1$
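
The experimental nature of relative frequency can be seen in a small simulation. The sketch below is only an illustration under assumed conditions: a fair coin tossed 100 times per run, with an arbitrary fixed seed so that the run is repeatable. The function name `relative_frequencies` is hypothetical and not part of these notes.

```python
import random
from collections import Counter

def relative_frequencies(outcomes):
    """Relative frequency f/n of each distinct outcome."""
    n = len(outcomes)
    return {outcome: f / n for outcome, f in Counter(outcomes).items()}

rng = random.Random(231)  # arbitrary seed (assumption), only for repeatability

# Two separate runs of the same experiment: 100 tosses of a fair coin each.
run1 = relative_frequencies([rng.choice("HT") for _ in range(100)])
run2 = relative_frequencies([rng.choice("HT") for _ in range(100)])

print("run 1:", run1)
print("run 2:", run2)
# Within each run the relative frequencies add up to 1.
print(sum(run1.values()), sum(run2.values()))
```

The two runs will generally give different relative frequencies for heads, while the relative frequencies within each run still sum to 1.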
(ii) Percentage Frequency ($\frac{f}{n} \times 100\%$)

• A percentage frequency distribution expresses each class frequency as a percentage
of the total frequency, so the percentages add up to 100.

• A percentage frequency distribution is a display of data that specifies the percentage of
observations that exist for each data point or grouping of data points. It is a particularly
useful method of expressing the relative frequency of survey responses and other data.

• Many times, percentage frequency distributions are displayed as tables or as bar graphs
or pie charts.

Example 2.2: Twenty-five army inductees were given a blood test to determine their blood type.

The data set is


A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.
Solution:
Since the data are categorical, discrete classes can be used. There are four blood types: A, B, O,
and AB. These types will be used as the classes for the distribution.
The procedure for constructing a frequency distribution for categorical data is given next.

Step 1: Make a table as shown.

Class   Tally   Frequency   Relative frequency   Percentage
A
B
O
AB
Step 2: Tally the data.

Step 3: Count the tallies.


Step 4: Find the relative frequency by using the formula $\frac{f}{n}$.

Step 5: Find the percentage of values in each class by using the formula
$\frac{f}{n} \times 100\%$.

Class   Tally         Frequency   Relative frequency   Percentage
A       /////              5            0.20               20
B       ///// //           7            0.28               28
O       ///// ////         9            0.36               36
AB      ////               4            0.16               16
Total                     25            1.00              100
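
The five steps can also be carried out programmatically. The following is a minimal sketch, not part of the original notes: it counts the blood types of Example 2.2 and derives the relative and percentage frequencies. The function name `categorical_distribution` is a hypothetical helper chosen for illustration.

```python
from collections import Counter

# Blood types of the 25 inductees in Example 2.2, row by row.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

def categorical_distribution(values, classes):
    """Return {class: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)  # Steps 2 and 3: tally and count
    n = len(values)
    return {c: (counts[c], counts[c] / n, 100 * counts[c] / n) for c in classes}

table = categorical_distribution(blood_types, ["A", "B", "O", "AB"])
for c, (f, rel, pct) in table.items():
    print(f"{c:<4}{f:>3}{rel:>7.2f}{pct:>6.0f}%")
```

Listing the classes explicitly keeps the rows in the same order as the table, rather than in the order in which the values happen to appear.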

2.1.2 Grouped Frequency Distributions


When the range of the data is large, the data must be grouped into classes that are more than one
unit in width, in what is called a grouped frequency distribution.

Constructing a Grouped Frequency Distribution

Step 1: Determine the classes.


➢ Find the highest and lowest values.
➢ Find the range: $R = \text{largest value} - \text{smallest value}$.
➢ Select the number of classes (usually between 5 and 20): $k = 1 + 3.322 \log_{10} n$.
➢ Find the width by dividing the range by the number of classes and rounding up: $w = \frac{R}{k}$.
➢ Select a starting point (usually the lowest value or any convenient number less than the
lowest value); add the width to get the lower limits.
➢ Find the upper class limits.
➢ Find the boundaries.

Step 2: Tally the data.

Step 3: Find the numerical frequencies from the tallies.
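
The arithmetic of Step 1 can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and rounding $k$ to the nearest whole number is an assumption (the notes only say the result is approximated). The formula for $k$ is the one commonly known as Sturges' rule.

```python
import math

def number_of_classes(n):
    """k = 1 + 3.322 * log10(n), rounded to the nearest whole number (assumption)."""
    return round(1 + 3.322 * math.log10(n))

def class_width(r, k):
    """Range divided by the number of classes, rounded up."""
    return math.ceil(r / k)

k = number_of_classes(50)    # 50 data values
w = class_width(34, k)       # range R = 34
print(k, w)
```

With 50 values and a range of 34 this gives 7 classes of width 5.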

Example 2.3: These data represent the record high temperatures in degrees Fahrenheit (°F) for each
of the 50 states. Construct a grouped frequency distribution for the data, using 7 classes.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Solution:
The procedure for constructing a grouped frequency distribution for numerical data follows.

Step 1 Determine the classes.

➢ Find the highest value and lowest value:

H = 134 and L = 100.

➢ Find the range:

$R = \text{highest value} - \text{lowest value} = H - L = 134 - 100 = 34$

➢ Find the number of classes:

$k = 1 + 3.322 \log_{10} 50 = 6.64 \approx 7$

➢ Find the class width:

$w = \frac{34}{7} = 4.9 \approx 5$

(Round the answer up to the nearest whole number if there is a remainder)

➢ Select a starting point for the lowest class limit. This can be the smallest data
value or any convenient number less than the smallest data value. In this case,
100 is used. Add the width to the lowest score taken as the starting point to get
the lower limit of the next class.

100, 105, 110, etc.

Subtract one unit from the lower limit of the second class to get the upper
limit of the first class. Then add the width to each upper limit to get all
the upper limits.

105 − 1 = 104

➢ Find the class boundaries by subtracting 0.5 from each lower class limit and
adding 0.5 to each upper class limit:

99.5–104.5, 104.5–109.5, etc.

Step 2: Tally the data.

Step 3: Find the numerical frequencies from the tallies. The completed frequency distribution is
Class limits   Class boundaries   Tally                     Frequency
100–104        99.5–104.5         //                             2
105–109        104.5–109.5        ///// ///                      8
110–114        109.5–114.5        ///// ///// ///// ///         18
115–119        114.5–119.5        ///// ///// ///               13
120–124        119.5–124.5        ///// //                       7
125–129        124.5–129.5        /                              1
130–134        129.5–134.5        /                              1
Total                                                           50
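
The table can be checked with a short script. The sketch below is an illustration under the assumption of whole-number data (so that one "unit" is 1 and the boundaries lie 0.5 beyond the limits); `grouped_distribution` is a hypothetical helper name.

```python
# Record high temperatures (°F) from Example 2.3.
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

def grouped_distribution(values, start, width, k):
    """Return a list of (lower limit, upper limit, frequency) for k classes."""
    rows = []
    for i in range(k):
        lower = start + i * width
        upper = lower + width - 1  # one unit below the next lower limit
        f = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, f))
    return rows

rows = grouped_distribution(temps, start=100, width=5, k=7)
for lower, upper, f in rows:
    # Boundaries: subtract 0.5 from the lower limit, add 0.5 to the upper limit.
    print(f"{lower}-{upper}   {lower - 0.5}-{upper + 0.5}   {f}")
print("Total", sum(f for _, _, f in rows))
```

Counting with the class limits rather than the boundaries gives the same frequencies here, because no whole-number value can fall exactly on a boundary.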

To construct a frequency distribution, follow these rules:

1. There should be between 5 and 20 classes. Although there is no hard-and-
fast rule for the number of classes contained in a frequency distribution, it
is important to have enough classes to present a clear description of the
collected data.

2. It is preferable but not absolutely necessary that the class width be an odd
number. This ensures that the midpoint of each class has the same place value
as the data. The class midpoint $X_m$ is obtained by adding the lower and upper
boundaries and dividing by 2, or adding the lower and upper limits and
dividing by 2:
$X_m = \frac{\text{lower boundary} + \text{upper boundary}}{2}$
3. The classes must be mutually exclusive.

4. The classes must be continuous.

5. The classes must be exhaustive. There should be enough classes to
accommodate all the data.
6. The classes must be equal in width.
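
As an illustration, the six rules can be turned into simple checks on a list of class limits. This is a sketch under the assumption of whole-number limits; the names `midpoint` and `check_classes` are hypothetical.

```python
def midpoint(lower, upper):
    """Class midpoint from the two limits (or from the two boundaries)."""
    return (lower + upper) / 2

def check_classes(limits, values):
    """Check whole-number class limits [(lower, upper), ...] against rules 1 to 6."""
    widths = {upper - lower + 1 for lower, upper in limits}
    adjacent = list(zip(limits, limits[1:]))
    return {
        "5 to 20 classes": 5 <= len(limits) <= 20,
        "odd width": all(w % 2 == 1 for w in widths),
        "mutually exclusive": all(nxt[0] > cur[1] for cur, nxt in adjacent),
        "continuous": all(nxt[0] <= cur[1] + 1 for cur, nxt in adjacent),
        "exhaustive": all(any(lo <= v <= hi for lo, hi in limits) for v in values),
        "equal width": len(widths) == 1,
    }

# Classes of Example 2.3: 100-104, 105-109, ..., 130-134.
limits = [(100 + 5 * i, 104 + 5 * i) for i in range(7)]
report = check_classes(limits, [100, 112, 134])

# Overlapping limits such as 10-20, 20-30, ... break rule 3.
overlapping = check_classes([(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)], [15])

print(report)
print(overlapping["mutually exclusive"])
print(midpoint(100, 104), midpoint(100, 103))
```

The last line shows the point of rule 2: with an odd width of 5 the midpoint of 100–104 is 102, a whole number like the data, whereas an even width of 4 (100–103) gives 101.5.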

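A cumulative frequency distribution shows the number of data values less than or
equal to a specific value (usually an upper class boundary).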

Example 2.4: The cumulative frequency distribution for the data in Example 2.3 is
as follows:
Cumulative frequency
Less than 99.5 0
Less than 104.5 2
Less than 109.5 10
Less than 114.5 28
Less than 119.5 41
Less than 124.5 48
Less than 129.5 49
Less than 134.5 50

Cumulative frequencies are used to show how many data values are accumulated up to
and including a specific class. In Example 2.3, of the total record high temperatures 28
are less than or equal to 114°F. Forty-eight of the total record high temperatures are less
than or equal to 124°F.
➢ There are two types of cumulative frequency.
(1) Less than type
(2) Greater than type
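
Both types can be computed from the class frequencies of Example 2.3. In the sketch below, the "less than" column is a running total and reproduces the table of Example 2.4. The "greater than" column is shown under the assumption that it counts the values above each boundary, since the notes do not spell it out.

```python
from itertools import accumulate

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# "Less than" type: running total, starting from 0 below the first class.
less_than = [0] + list(accumulate(frequencies))

# "Greater than" type (assumed definition): values remaining above each boundary.
n = sum(frequencies)
greater_than = [n - c for c in less_than]

for b, lt, gt in zip(boundaries, less_than, greater_than):
    print(f"Less than {b:>5}: {lt:>2}    Greater than {b:>5}: {gt:>2}")
```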

2.2 Graphical Representation of Data


Graphical representation is a way of analyzing numerical data. A graph exhibits the relation
between data, ideas, information, and concepts in a diagram. Because a diagram is easy to
understand, graphical representation is also an important learning aid. The appropriate type of
graph depends on the type of information in a particular domain.
Generally, a frequency distribution is represented by one of five methods, namely

• Histogram

• Bar graph

• Pie diagram

• Frequency Polygon

• Cumulative frequency graph (ogive)

2.2.1 Histogram

The histogram is a graph that displays the data by using contiguous vertical
bars (unless the frequency of a class is 0) of various heights to represent the
frequencies of the classes.
➢ Frequencies are along the vertical axis and the classes are along the horizontal axis.

➢ The frequencies within each interval of a histogram are represented by a rectangle, the
size of the interval being the base and the frequency of that interval the height.

➢ The area of each rectangle in a histogram corresponds to the frequency within a given
interval, while the total area of a histogram corresponds to the total frequency (n) of
the distribution.

➢ From Example 2.3:
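
As a rough stand-in for a drawn histogram, the frequencies of Example 2.3 can be rendered as text. This is only a sketch: the bars are drawn sideways (classes run down the page instead of along the horizontal axis), and `text_histogram` is a hypothetical helper name.

```python
labels = ["99.5-104.5", "104.5-109.5", "109.5-114.5", "114.5-119.5",
          "119.5-124.5", "124.5-129.5", "129.5-134.5"]
frequencies = [2, 8, 18, 13, 7, 1, 1]

def text_histogram(labels, freqs, mark="*"):
    """One line per class; the bar length equals the class frequency."""
    return [f"{label:>12} | {mark * f} {f}" for label, f in zip(labels, freqs)]

lines = text_histogram(labels, frequencies)
print("\n".join(lines))
```

Because all classes have the same width, the bar lengths are proportional to the bar areas, so the total number of marks equals the total frequency n = 50.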

2.2.2 Bar Graphs


Bar graphs, similar to histograms, are often useful in conveying information about categorical
data where the horizontal scale represents some non-numerical attribute. In a bar graph, the
bars are non-overlapping rectangles of equal width and they are equally spaced. The bars can
be vertical or horizontal. The length of a bar represents the quantity we wish to compare.
➢ From Example 2.2:
[Figure: horizontal bar graph "Blood Types", with one bar per blood type and the frequency on the horizontal axis]

2.2.3 Circle Graph


Another type of graph used to represent data is the circle graph. A circle graph, or pie chart,
consists of a circular region partitioned into disjoint sections, with each section representing a
part or percentage of a whole. To construct a pie chart we first convert the frequency into a
percentage frequency. Then, since a complete circle corresponds to 360 degrees, we obtain the
central angles of the various sectors by multiplying the percentages by 3.6. We illustrate this
method in the next example.
➢ From Example 2.2:

[Figure: pie chart "Blood Types", with one sector for each of A, B, O, and AB]
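
The central angles for the blood types of Example 2.2 follow directly from the percentages. The sketch below is an illustration with a hypothetical function name, `central_angles`; it applies the rule stated above (percentage times 3.6).

```python
frequencies = {"A": 5, "B": 7, "O": 9, "AB": 4}

def central_angles(freqs):
    """Map each class to (percentage, central angle in degrees)."""
    n = sum(freqs.values())
    result = {}
    for c, f in freqs.items():
        pct = 100 * f / n
        result[c] = (pct, pct * 3.6)  # 1% of the circle is 3.6 degrees
    return result

angles = central_angles(frequencies)
for c, (pct, angle) in angles.items():
    print(f"{c:<3}{pct:>5.0f}%{angle:>8.1f} degrees")
```

For the percentages 20, 28, 36, and 16 the angles are 72°, 100.8°, 129.6°, and 57.6°, which together make up the full 360°.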

2.2.4 Frequency Polygon


The frequency polygon is a graph that displays the data by using lines that connect points
plotted for the frequencies at the midpoints of the classes. The frequencies are represented
by the heights of the points.

➢ Using the midpoints for the x values and the frequencies as the y values, plot the points.

➢ Connect adjacent points with line segments. Draw a line back to the x axis at the
beginning and end of the graph, at the same distance that the previous and next
midpoints would be located.

➢ The frequency polygon and the histogram are two different ways to represent the same
data set.
➢ From Example 2.3:
Find the midpoints of each class. Recall that midpoints are found by adding the upper and
lower boundaries and dividing by 2:
$\frac{99.5 + 104.5}{2} = 102$
Class boundaries   Midpoints   Frequency
99.5–104.5            102          2
104.5–109.5           107          8
109.5–114.5           112         18
114.5–119.5           117         13
119.5–124.5           122          7
124.5–129.5           127          1
129.5–134.5           132          1
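
The points of the frequency polygon can be generated from the class boundaries. This is a minimal sketch, assuming equal class widths; it also adds the two closing points on the x axis, one class width before the first midpoint and one after the last.

```python
boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Midpoint of each class: (lower boundary + upper boundary) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Close the polygon with zero-frequency points where the previous and
# next midpoints would be located.
width = boundaries[1] - boundaries[0]
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

for x, y in points:
    print(x, y)
```

Connecting these points in order with line segments gives the frequency polygon for Example 2.3.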

2.2.5 Cumulative Frequency Graph (Ogive)

The ogive is a graph of a cumulative frequency distribution. It shows the data values
(class boundaries) on the horizontal axis and either the cumulative frequencies, the
cumulative relative frequencies, or the cumulative percent frequencies on the vertical axis.

➢ Cumulative frequency graphs are used to visually represent how many values are below
a certain upper class boundary.

➢ From Example 2.4:


Cumulative
frequency
Less than 99.5 0
Less than 104.5 2
Less than 109.5 10
Less than 114.5 28
Less than 119.5 41
Less than 124.5 48
Less than 129.5 49
Less than 134.5 50

To find out how many record high temperatures are less than 114.5°F, locate 114.5°F on
the x axis, draw a vertical line up until it intersects the graph, and then draw a
horizontal line at that point to the y axis. The y axis value is 28.
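
Reading a value off the ogive can be imitated numerically. The sketch below is an illustration with a hypothetical function name, `read_ogive`. It assumes the plotted points are joined by straight line segments, so values between two boundaries are linear interpolations and should be treated as estimates.

```python
from bisect import bisect_right

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
cumulative = [0, 2, 10, 28, 41, 48, 49, 50]

def read_ogive(x):
    """Cumulative frequency read off the ogive at x."""
    if x <= boundaries[0]:
        return 0
    if x >= boundaries[-1]:
        return cumulative[-1]
    i = bisect_right(boundaries, x) - 1  # segment that contains x
    x0, x1 = boundaries[i], boundaries[i + 1]
    y0, y1 = cumulative[i], cumulative[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

print(read_ogive(114.5))  # exactly at a plotted point
print(read_ogive(112))    # between two plotted points (an estimate)
```

At 114.5°F the function returns 28, the value found graphically above.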

Exercises

(1) Determine whether each statement is true or false. If the statement is
false, explain why.
23
Dr. Rabab Sabry Intro. Prob and Stat Dr. Dina Ahmed

1. In the construction of a frequency distribution, it is a good idea to have overlapping class limits, such as 10–20, 20–30, 30–40.

2. Bar graphs can be drawn by using vertical or horizontal bars.

3. It is not important to keep the width of each class the same in a frequency distribution.

4. Frequency distributions can aid the researcher in drawing charts and graphs.

5. The type of graph used to represent data is determined by the type of data collected and by
the researcher’s purpose.

6. In construction of a frequency polygon, the class limits are used for the x axis.

7. Data collected over a period of time can be graphed by using a pie graph.

(2) Select the best answer.

8. What is another name for the ogive?


a. Histogram
b. Frequency polygon
c. Cumulative frequency graph
d. Pareto chart

9. What are the boundaries for 8.6–8.8?


a. 8–9
b. 8.5–8.9
c. 8.55–8.85
d. 8.65–8.75

10. What graph should be used to show the relationship between the parts and the whole?
a. Histogram
b. Pie graph
c. Pareto chart
d. Ogive

11. Except for rounding errors, relative frequencies should add up to what sum?
a. 0
b. 1


c. 50

12. Housing Arrangements. A questionnaire on housing arrangements showed this information obtained from 25 respondents. Construct a frequency distribution for the data (H = house, A = apartment, M = mobile home, C = condominium).

H C H M H A C A M
C M C A M A C C M
C C H A H H M

Construct a pie graph for the data.

13. Items Purchased at a Convenience Store. When 30 randomly selected customers left a convenience store, each was asked the number of items he or she purchased. Construct an ungrouped frequency distribution for the data.
2 9 4 3 6
6 2 8 6 5
7 5 3 8 6
6 2 3 2 4
6 9 9 8 9
4 2 1 7 4
Construct a histogram, a frequency polygon, and an ogive for the data.


Chapter 3: Data Description


Introduction
Statisticians use samples taken from populations; however, when populations are small,
it is not necessary to use samples since the entire population can be used to gain
information. For example, suppose an insurance manager wanted to know the average
weekly sales of all the company’s representatives. If the company employed a large
number of sales people, say, nationwide, he would have to use a sample and make an
inference to the entire sales force. But if the company had only a few sales people, say,
only 87 agents, he would be able to use all representatives’ sales for a randomly chosen
week and thus use the entire population.
Measures found by using all the data values in the population are called parameters.
Measures obtained by using the data values from samples are called statistics; hence, the
average of the sales from a sample of representatives is a statistic, and the average of sales
obtained from the entire population is a parameter.

Since population parameters are usually unknown, sample statistics, which can be computed from the data at hand, are used to approximate (estimate) them.
We are now able to define a number of important summarized measures, starting with the
arithmetic average or mean.
Understanding descriptive statistics, their measures of center and their variability, helps form
the foundation of statistical analysis. Descriptive statistics tell us how frequently an
observation occurs, what is considered average, and how far data in our sample deviate from
being average. With these statistics, we are able to provide a summary of characteristics from
both large and small datasets. Measures of central tendency and variability provide valuable
information on their own, and form the cornerstone of the quantitative structures.

3.1 Measures of Central Tendency


Central tendency describes what is typical or expected in a sample: its most frequent value or its average behaviour. A measure of central tendency gives us an idea of the value around which the data are grouped. Some important measures of central tendency are the mean, median, and mode.

3.1.1 Mean

The mean is the arithmetic average of all of the data points. It is also the most common measure
of central tendency and is the most widely understood. In fact, when most people think of
average, they are imagining the mean.

(1) Raw Data

➢ The population mean, denoted by μ, is calculated by using all the values in the
population. The population mean is a parameter.
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} \]

where N represents the total number of values in the population and 𝑋1 , 𝑋2 , … … . , 𝑋𝑁 are the
population values.

➢ The sample mean, denoted by 𝑥̅, is calculated by using sample data. The sample mean is a statistic.

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
where n represents the total number of values in the sample and 𝑥1 , 𝑥2 , … … . , 𝑥𝑛 are the
sample values.

Example 3.1: The number of confirmed flu cases for a 9-year period is shown.
4 46 98 115 88 44 73 48 62

(i) Find the mean for this population.

\[ \mu = \frac{4 + 46 + 98 + 115 + 88 + 44 + 73 + 48 + 62}{9} = \frac{578}{9} \approx 64.2 \]

(ii) Find the mean for the sample 98, 115, 88, and 44.

\[ \bar{x} = \frac{98 + 115 + 88 + 44}{4} = \frac{345}{4} = 86.25 \]
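
The two means in Example 3.1 can be checked with a short Python sketch. It uses `statistics.mean` from the standard library; the variable names are illustrative only.

```python
from statistics import mean

flu_cases = [4, 46, 98, 115, 88, 44, 73, 48, 62]  # the whole population (9 years)
sample = [98, 115, 88, 44]                         # a sample taken from it

mu = mean(flu_cases)   # population mean: sum of all N values divided by N
x_bar = mean(sample)   # sample mean: sum of the n sample values divided by n

print(round(mu, 1))  # 64.2
print(x_bar)         # 86.25
```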


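(2) Ungrouped data

\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n} \]
where 𝑓𝑖 denotes the frequency of the value 𝑥𝑖, k is the number of distinct values, and n = ∑ 𝑓𝑖 is the total number of observations.
(3) Grouped data
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i m_i}{n} \]
where 𝑓𝑖 denotes the frequency of the i-th class, 𝑚𝑖 is its midpoint, k is the number of classes, and n = ∑ 𝑓𝑖.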
Notes:
1- The mean is found by using all the values of the data.
2- The mean for the data set is unique and not necessarily one of the data values.
3- The mean cannot be computed for the data in a frequency distribution that has an open-ended
class.
4- The mean is affected by extremely high or low values, called outliers, and may not be the
appropriate average to use in these situations.
5- The mean can only be found for quantitative variables.

Example 3.2: The frequency distribution shows the salaries (in millions) for a specific year
of the top 25 CEOs in the United States. Find the mean.
Class boundaries    Frequency
15.5–20.5           13
20.5–25.5           6
25.5–30.5           4
30.5–35.5           1
35.5–40.5           1
Total               25

Solution:
Construct this table

Class boundaries    Frequency f    Midpoint 𝑚𝑖    f · 𝑚𝑖
15.5–20.5           13             18             234
20.5–25.5           6              23             138
25.5–30.5           4              28             112
30.5–35.5           1              33             33
35.5–40.5           1              38             38
                    n = 25                        ∑ f · 𝑚𝑖 = 555

\[ \bar{x} = \frac{\sum f \, m_i}{n} = \frac{555}{25} = 22.2 \]
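
The table calculation in Example 3.2 can be reproduced in a few lines. This is a minimal sketch, assuming the class boundaries and frequencies given in the example; the names `classes`, `freqs`, and `grouped_mean` are illustrative only.

```python
# Class boundaries and frequencies from Example 3.2 (salaries in millions)
classes = [(15.5, 20.5), (20.5, 25.5), (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [13, 6, 4, 1, 1]

# Midpoint of each class = (lower boundary + upper boundary) / 2
midpoints = [(lower + upper) / 2 for lower, upper in classes]

n = sum(freqs)  # total number of observations
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

print(midpoints)     # [18.0, 23.0, 28.0, 33.0, 38.0]
print(grouped_mean)  # 22.2
```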
3.1.2 Median
The median is the halfway point in a data set. Before you can find this point, the data must be
arranged in ascending or increasing order. When the data set is ordered, it is called a data array. The
median either will be a specific value in the data set or will fall between two values.
➢ To find the median:

1- Arrange the data values in ascending order.

2- Determine the number of values n in the data set.

3- If n is odd, there is a unique median: the ((n + 1)/2)th value in the ordered sequence, counting from either end.

4- If n is even, there is strictly no middle observation, but the median is defined by convention as the average of the two middle observations, the (n/2)th and (n/2 + 1)th values.

Example 3.3: The number of tornadoes that have occurred in the United States
over an 8-year period is as follows. Find the median.

684,764, 656, 702, 856, 1133, 1132, 1303


Solution:
Arrange the data values in ascending order.

656, 684, 702, 764, 856, 1132, 1133, 1303


The ranks of the two middle values are

\[ \frac{n}{2} = \frac{8}{2} = 4, \qquad \frac{n}{2} + 1 = \frac{8}{2} + 1 = 5 \]

Then, the median is

\[ \text{Median} = \frac{x_{(n/2)} + x_{(n/2 + 1)}}{2} = \frac{764 + 856}{2} = 810 \]
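
The odd and even cases of the median rule can be sketched as a small Python function. The function name `median` and the variable names are illustrative; the standard library's `statistics.median` follows the same convention.

```python
def median(values):
    """Median of raw data: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)       # step 1: arrange in ascending order
    n = len(ordered)               # step 2: number of values
    mid = n // 2
    if n % 2 == 1:                 # odd n: the ((n + 1)/2)th value
        return ordered[mid]
    # even n: average of the (n/2)th and (n/2 + 1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2


tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]
print(median(tornadoes))  # 810.0
```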

The median of grouped data can be calculated using the following formula:
\[ \text{Median} = L + \left( \frac{n/2 - (\sum f)_{m-1}}{f_m} \right) h \]
where
L: lower class boundary of the median class.
n: sum of all frequencies.
𝑓𝑚 : the frequency of the median class.
(∑ 𝑓)𝑚−1 : sum of the frequencies of all classes before the median class.
h: the size of the class interval.
Example 3.3: A mobile phone company examines the ages of 150 customers in order to design special plans for them. Consider the following frequency table.
Age(years) 10-19 20-29 30-39 40-49 50-59 60-69
Frequency 14 40 28 27 24 17
Find the Median.
Solution:
Since n = 150, n/2 = 75.

Age      True class    Frequency    Cumulative frequency
10-19    9.5-19.5      14           14
20-29    19.5-29.5     40           54
30-39    29.5-39.5     28           82
40-49    39.5-49.5     27           109
50-59    49.5-59.5     24           133
60-69    59.5-69.5     17           150

The cumulative frequency first reaches 75 in the class 29.5-39.5, so this is the median class.

Here 𝐿 = 29.5, (∑ 𝑓)𝑚−1 = 54, 𝑓𝑚 = 28 and ℎ = 10, so

\[ \text{Median} = 29.5 + \left( \frac{75 - 54}{28} \right) \times 10 = 37 \]
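
The grouped median formula can be sketched in Python as follows. The function locates the median class as the first class whose cumulative frequency reaches n/2, then interpolates inside it. The name `grouped_median` and its arguments are illustrative only.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data: L + ((n/2 - sum of f before the median class) / f_m) * h."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the frequency table is empty")
    before = 0  # sum of the frequencies before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if before + f >= n / 2:      # cumulative frequency reaches n/2: median class
            h = upper - lower        # class width
            return lower + (n / 2 - before) / f * h
        before += f


# True class boundaries and frequencies of the 150 mobile phone customers
age_classes = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5),
               (39.5, 49.5), (49.5, 59.5), (59.5, 69.5)]
age_freqs = [14, 40, 28, 27, 24, 17]

print(grouped_median(age_classes, age_freqs))  # 37.0
```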
Notes:
1. The median is used to find the center or middle value of a data set.
2. The median is used when it is necessary to find out whether the data values fall into the
upper half or lower half of the distribution.
3. The median is used for an open-ended distribution.
4. The median is affected less than the mean by extremely high or extremely low values.
5. In large data sets, the median requires more work to calculate than the mean, because the data must first be ordered. It is of little use in more elaborate statistical techniques, but it remains useful as a descriptive measure for skewed distributions.

3.1.3 Mode
The mode is defined as the observation in the sample which occurs most frequently if
there is such an observation.

A data set that has only one value that occurs with the greatest frequency is said to be
unimodal. If a data set has two values that occur with the same greatest frequency, both values
are considered to be the mode and the data set is said to be bimodal. If a data set has more than
two values that occur with the same greatest frequency, each value is used as the mode, and
the data set is said to be multimodal. When no data value occurs more than once, the data set
is said to have no mode. Note: Do not say that the mode is zero. That would be incorrect,
because in some data, such as temperature, zero can be an actual value. A data set can have
more than one mode or no mode at all.

Example 3.4: The data show the number of public libraries in a sample of eight states. Find the
mode.

21, 77, 77, 101, 114, 159, 311, 382

Solution:

The mode is 77.


Example 3.5: The data show the number of licensed nuclear reactors in the United States
for a recent 15-year period. Find the mode.
104 10 104 10 104
107 10 109 10 110
109 11 112 11 109

Solution:
Since the values 104 and 109 both occur 5 times, the modes are 104 and 109. The data
set is said to be bimodal.
The mode of grouped data can be calculated using the following formula:
\[ \text{Mode} = L + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) h \]
where
L: lower class boundary of the modal class.
∆𝟏 : excess of the modal frequency over the frequency of the next lower class.
∆𝟐 : excess of the modal frequency over the frequency of the next higher class.
h: size of the modal class interval.
Example 3.6: Calculate the mode for the grouped age data in Example 3.3 (the mobile phone customers).
Solution:
The modal class is the second class, 19.5-29.5, because it has the largest frequency. From it we obtain
𝐿 = 19.5, 𝑓 = 40, 𝑓1 = 14, 𝑓2 = 28 and ℎ = 10
∆𝟏 = 𝑓 − 𝑓1 = 40 − 14 = 26
∆𝟐 = 𝑓 − 𝑓2 = 40 − 28 = 12
Then,
\[ \text{Mode} = 19.5 + \left( \frac{26}{26 + 12} \right) \times 10 \approx 26.34 \]
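
The grouped mode formula can be sketched in the same way. This is an illustrative sketch, not a definitive implementation: the name `grouped_mode` is hypothetical, the first class with the largest frequency is taken as the modal class, and a missing neighbouring class is treated as having frequency 0 (an assumption the notes do not address).

```python
def grouped_mode(boundaries, freqs):
    """Mode of grouped data: L + (d1 / (d1 + d2)) * h for the modal class."""
    i = freqs.index(max(freqs))                       # modal class (first one if tied)
    lower, upper = boundaries[i]
    f_prev = freqs[i - 1] if i > 0 else 0             # assumption: no lower class -> 0
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - f_prev   # excess over the next lower class
    d2 = freqs[i] - f_next   # excess over the next higher class
    return lower + d1 / (d1 + d2) * (upper - lower)


# True class boundaries and frequencies of the 150 mobile phone customers
age_classes = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5),
               (39.5, 49.5), (49.5, 59.5), (59.5, 69.5)]
age_freqs = [14, 40, 28, 27, 24, 17]

print(round(grouped_mode(age_classes, age_freqs), 2))  # 26.34
```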
The mode is not used widely in analytical statistics, other than as a descriptive measure, mainly
because of the ambiguity in its definition, as the fluctuations of small frequencies are apt to
produce spurious modes.
Notes:
1. The mode is used when the most typical case is desired.
2. The mode is the easiest average to compute.

3. The mode can be used when the data are nominal or categorical, such as religious preference,
gender, or political affiliation.
4. The mode is not always unique. A data set can have more than one mode, or the mode may
not exist for a data set.

3.2 Distribution Shapes


Frequency distributions can assume many shapes. The three most important shapes are
positively skewed, symmetric, and negatively skewed. Figure 3–1 shows histograms of each.
In a positively skewed or right-skewed distribution, the majority of the data values fall to the
left of the mean and cluster at the lower end of the distribution; the “tail” is to the right. Also, the
mean is to the right of the median, and the mode is to the left of the median.
In a symmetric distribution, the data values are evenly distributed on both sides of the mean. In
addition, when the distribution is unimodal, the mean, median, and mode are the same and are at the
center of the distribution. Examples of symmetric distributions are IQ scores and heights of adult
males.

When the majority of the data values fall to the right of the mean and cluster at the upper
end of the distribution, with the tail to the left, the distribution is said to be negatively skewed
or left-skewed. Also, the mean is to the left of the median, and the mode is to the right of the
median. As an example, a negatively skewed distribution results if the majority of students score
very high on an instructor’s examination. These scores will tend to cluster to the right of the
distribution.
When a distribution is extremely skewed, the value of the mean will be pulled toward the tail, but
the majority of the data values will be greater than the mean or less than the mean (depending on
which way the data are skewed); hence, the median rather than the mean is a more appropriate
measure of central tendency. An extremely skewed distribution can also affect other statistics.


3.3 Measures of Variation


Measures of central tendency, as introduced earlier, give us an idea about the location where
most of the data is concentrated. However, two different data sets may have the same value
for the measure of central tendency, say the same arithmetic means, but they may have
different concentrations around the mean. In this case, the location measures may not be
adequate enough to describe the distribution of the data. The concentration or dispersion of
observations around any particular value is another property which characterizes the data and
its distribution. We now introduce statistical methods which describe the variability or
dispersion of data.
3.3.1 Range
The simplest measure of variability in a sample is the range, which is the difference between
the largest and smallest sample values.

𝑅 = ℎ𝑖𝑔ℎ𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒 − 𝑙𝑜𝑤𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒.

Example 3.7: Calculate the Range for the data in Example 3.4.
Solution:
𝑅 = 382 − 21 = 361.
The range for these data is quite large since it depends on the highest data value and the lowest data
value. To have a more meaningful statistic to measure the variability, statisticians use measures called
the variance and standard deviation.
3.3.2 Variance
Variability refers to how widely the values in a sample are spread. A measure of variability gives us an idea of how the data are spread around a central value. Two important measures of variability are the variance and the standard deviation.


Our primary measures of variability involve the deviations from the mean, 𝑥1 − 𝑥̅ , 𝑥2 −
𝑥̅ , … … . , 𝑥𝑛 − 𝑥̅ . That is, the deviations from the mean are obtained by subtracting 𝑥̅ from each
of the n sample observations. A deviation will be positive if the observation is larger than the
mean (to the right of the mean on the measurement axis) and negative if the observation is
smaller than the mean. If all the deviations are small in magnitude, then all 𝑥𝑖 ’s are close to
the mean and there is little variability. On the other hand, if some of the deviations are large in
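magnitude, then some 𝑥𝑖 ’s lie far from 𝑥̅ , suggesting a greater amount of variability. A simple way to combine the deviations into a single quantity would be to average them (sum them and divide by n). However, the positive and negative deviations always cancel:

\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]

so the average deviation is always zero and carries no information. For this reason the deviations are squared before they are averaged.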

(1) The population variance is the average of the squares of the distances of the values from the mean. The symbol for the population variance is 𝜎 2 ,

\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2 \]

where 𝑋𝑖 = individual value, 𝜇 = population mean, and N = population size.

The population standard deviation is the square root of the variance,
\[ \sigma = \sqrt{\sigma^2} \]

(2) The sample variance 𝑆 2 of a sample composed of the observations 𝑥1 , 𝑥2 , … , 𝑥𝑛 is the sum of the squared distances between each observation and the sample mean, divided by n − 1,

\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

where n − 1 is the number of degrees of freedom of the sample variance; one degree of freedom is lost because the sample mean (equivalently ∑ 𝑥𝑖 ) is already fixed by the data.

We can use the following shortcut formula for the variance:

\[ S^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 / n}{n-1} \]

The sample standard deviation is the square root of the variance,
\[ S = \sqrt{S^2} \]


The variance is always nonnegative, and it has the squared units of the sample values. If the
data observations are more spread, the variance is higher and vice versa.
Example 3.8: Find the variance and standard deviation for the sample in Example 3.1: 98, 115, 88, and 44.
Solution:

The mean of the data is 𝑥̅ = 86.25.

Then the variance of the data is
\[ S^2 = \frac{1}{4-1} \left[ (98 - 86.25)^2 + (115 - 86.25)^2 + (88 - 86.25)^2 + (44 - 86.25)^2 \right] = \frac{2752.75}{3} \approx 917.58 \]
Hence, the standard deviation is 𝑠 = √917.58 ≈ 30.29.
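
The result can be checked both with the standard library (`statistics.variance` and `statistics.stdev` divide by n − 1) and with the shortcut formula. The variable names below are illustrative only.

```python
from statistics import stdev, variance

sample = [98, 115, 88, 44]
n = len(sample)

s2 = variance(sample)  # sample variance, divisor n - 1
s = stdev(sample)      # sample standard deviation

# Shortcut formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)
shortcut = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)

print(round(s2, 2))        # 917.58
print(round(shortcut, 2))  # 917.58
print(round(s, 2))         # 30.29
```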

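Variance and Standard Deviation for Ungrouped and Grouped Data

(1) Ungrouped data

\[ S^2 = \frac{\sum_{i=1}^{k} f_i x_i^2 - \left( \sum_{i=1}^{k} f_i x_i \right)^2 / n}{n-1} \]
where 𝑓𝑖 denotes the frequency of the value 𝑥𝑖 , k is the number of distinct values, and n = ∑ 𝑓𝑖 .
(2) Grouped data
\[ S^2 = \frac{\sum_{i=1}^{k} f_i m_i^2 - \left( \sum_{i=1}^{k} f_i m_i \right)^2 / n}{n-1} \]
where 𝑓𝑖 denotes the frequency of the i-th class, 𝑚𝑖 is its midpoint, k is the number of classes, and n = ∑ 𝑓𝑖 .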

Example 3.9: Find the sample variance and the sample standard deviation for the frequency
distribution of the data shown. The data represent the number of miles that 20 runners ran during
one week.
True Class Midpoint Frequency

5.5–10.5 8 1
10.5–15.5 13 2
15.5–20.5 18 3
20.5–25.5 23 5
25.5–30.5 28 4
30.5–35.5 33 3
35.5–40.5 38 2


Solution:
Construct this table

True class    Midpoint 𝑚𝑖    Frequency 𝑓    𝑓 𝑚𝑖    𝑓 𝑚𝑖 2
5.5–10.5      8              1              8       64
10.5–15.5     13             2              26      338
15.5–20.5     18             3              54      972
20.5–25.5     23             5              115     2,645
25.5–30.5     28             4              112     3,136
30.5–35.5     33             3              99      3,267
35.5–40.5     38             2              76      2,888
Total                        n = 20         490     13,310

Then, the variance is

\[ S^2 = \frac{13{,}310 - (490)^2 / 20}{20 - 1} = \frac{1305}{19} \approx 68.7 \]
Hence, the standard deviation is
\[ s = \sqrt{68.7} \approx 8.3 \]
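
The column totals and the final values in Example 3.9 can be verified with a short Python sketch, assuming the midpoints and frequencies of the table; the variable names are illustrative only.

```python
from math import sqrt

# Midpoints and frequencies from Example 3.9 (miles run by 20 runners)
midpoints = [8, 13, 18, 23, 28, 33, 38]
freqs = [1, 2, 3, 5, 4, 3, 2]

n = sum(freqs)                                            # total frequency
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))     # sum of f * m
sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))  # sum of f * m^2

s2 = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)  # grouped sample variance
s = sqrt(s2)                                # grouped sample standard deviation

print(n, sum_fm, sum_fm2)          # 20 490 13310
print(round(s2, 1), round(s, 1))   # 68.7 8.3
```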

3.3.3 Coefficient of Variation


Whenever two samples have the same units of measure, the variance and standard deviation for each can be compared directly. A statistic that allows you to compare standard deviations when the units are different is called the coefficient of variation. It expresses the standard deviation as a percentage of the mean:
\[ C.V = \frac{S}{\bar{x}} \times 100\% \]


Example 3.10: The mean of the number of sales of cars over a 3-month period is 87, and the
standard deviation is 5. The mean of the commissions is $5225, and the standard deviation is $773.
Compare the variations of the two.
Solution:
The coefficients of variation are
\[ C.V_1 = \frac{5}{87} \times 100\% = 5.7\% \quad \text{(sales)} \]
\[ C.V_2 = \frac{773}{5225} \times 100\% = 14.8\% \quad \text{(commissions)} \]

Since the coefficient of variation is larger for commissions, the commissions are more variable
than the sales.
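
The comparison in Example 3.10 can be sketched in Python. The function name `coefficient_of_variation` is illustrative only.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


cv_sales = coefficient_of_variation(5, 87)            # number of cars sold
cv_commissions = coefficient_of_variation(773, 5225)  # commissions in dollars

print(round(cv_sales, 1))         # 5.7
print(round(cv_commissions, 1))   # 14.8
print(cv_commissions > cv_sales)  # True: commissions are more variable
```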
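Example 3.11: Find the coefficient of variation for the data set in Example 3.9.
Solution:
The mean of the data is 𝑥̅ = 490/20 = 24.5 and the standard deviation is 𝑠 = 8.3, so the coefficient of variation is
\[ C.V = \frac{8.3}{24.5} \times 100\% \approx 33.9\% \]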

3.4 Measures of Position


In addition to measures of central tendency and measures of variation, there are measures of position
or location. These measures include percentiles, deciles, and quartiles. They are used to locate the
relative position of a data value in the data set.

Measures of position let us evaluate a relative standing, for example when comparing performance or determining a ranking. They describe the position of a data value in relation to the rest of the data.

z Scores

A z score represents the number of standard deviations a data value falls above or below the mean.

It is used as a way to measure relative position.

z Score Formula

\[ z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \]

Can a z-score be negative?

Yes.
A positive z score means that a score is above the mean.
A negative z score means that a score is below the mean.
A z score of 0 means that a score is exactly the same as the mean.
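
A z score is the distance of a value from the mean, measured in standard deviations: z = (value − mean) / standard deviation. The following minimal sketch compares two test results; the scores, means, and standard deviations are hypothetical values chosen only for illustration, and the function name `z_score` is likewise illustrative.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


# Hypothetical results, not taken from the notes
z_math = z_score(82, 75, 5)      # math test: score 82, mean 75, standard deviation 5
z_history = z_score(86, 80, 10)  # history test: score 86, mean 80, standard deviation 10

print(z_math)     # 1.4
print(z_history)  # 0.6
print("math" if z_math > z_history else "history")  # math
```

Although the raw history score is higher, the math score has the higher relative position because it lies more standard deviations above its mean.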

Example

A student scored ___ on a math test that had a mean of ___ and a standard deviation of ___. She scored ___ on a history test with a mean of ___ and a standard deviation of ___. Compare her relative position on the two tests.

Math: z = ___

History: z = ___

The student did better in math because the z score was higher.


Example

Find the z score for each test and state which test is better.

Test ___: ___
Test ___: ___

Test ___: z = ___
Test ___: z = ___
Which is higher?
Test ___ is higher; therefore it is better.
It has a higher relative position.

3.4.1 Percentiles
In statistics, percentiles are used to understand and interpret data. The nth percentile of a set
of data is the value at which n percent of the data is below it. In everyday life, percentiles are
used to understand values such as test scores, health indicators, and other measurements. For
example, an 18-year-old male who is six and a half feet tall is in the 99th percentile for his
height. This means that of all the 18-year-old males, 99 percent have a height that is equal to
or less than six and a half feet. An 18-year-old male who is only five and a half feet tall, on the
other hand, is in the 16th percentile for his height, meaning only 16 percent of males his age
are the same height or shorter.

Percentiles divide the data set into 100 equal groups. They are symbolized by 𝑃1 , 𝑃2 , 𝑃3 , … , 𝑃99 .

The percentile corresponding to a given value x is computed by using the following formula:

\[ \text{Percentile} = \frac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \times 100 \]

Example 3.12: The number of traffic violations recorded by a police department for a 10-day
period is shown. Find the percentile rank of 16.

22 19 25 24 18 15 9 12 16 20

Solution:
Arrange the data in order from lowest to highest.

9 12 15 16 18 19 20 22 24 25
Then substitute into the formula.

\[ \text{Percentile} = \frac{3 + 0.5}{10} \times 100 = 35 \]

Hence, 16 is at the 35th percentile: the value of 16 is higher than 35% of the data values.
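
The percentile rank formula can be sketched in Python as follows. The function name `percentile_rank` is illustrative only; sorting is not needed because only the count of values below x matters.

```python
def percentile_rank(data, x):
    """Percentile rank of x: ((number of values below x) + 0.5) / n * 100."""
    below = sum(1 for value in data if value < x)
    return (below + 0.5) * 100 / len(data)


violations = [22, 19, 25, 24, 18, 15, 9, 12, 16, 20]
print(percentile_rank(violations, 16))  # 35.0
```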

Finding a Data Value Corresponding to a Given Percentile

1- Arrange the data in order from lowest to highest.

2- Substitute into the formula

\[ c = \frac{n \cdot p}{100} \]
where n = total number of values and p = percentile.

3- If c is not a whole number, round up to the next whole number. Starting at the lowest
value, count over to the number that corresponds to the rounded- up value.

4- If c is a whole number, use the value halfway between the cth and (c + 1)st values
when counting up from the lowest value.

Example 3.13: Using the data in Example 3.12, find the value corresponding to the 65th
percentile and 30th percentile.

Solution:
Arrange the data in order from lowest to highest.


9 12 15 16 18 19 20 22 24 25
To compute the 65th percentile:
\[ c = \frac{10 \times 65}{100} = 6.5 \]
Since c is not a whole number, round it up to the next whole number; in this case, c = 7. Start at the lowest value and count over to the 7th value, which is 20. Hence, the value of 20 corresponds to the 65th percentile.
To compute the 30th percentile:
\[ c = \frac{10 \times 30}{100} = 3 \]
Since c is a whole number, use the value halfway between the cth and (c + 1)st values when counting up from the lowest. In this case, these are the 3rd value (15) and the 4th value (16).
9 12 15 16 18 19 20 22 24 25
The value halfway between 15 and 16 is 15.5. Hence, 15.5 corresponds to the 30th percentile.
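
The four-step procedure for finding the data value at a given percentile can be sketched as follows. The function name `value_at_percentile` is illustrative, and the sketch assumes p is a whole number between 1 and 99.

```python
def value_at_percentile(data, p):
    """Data value at the p-th percentile (p a whole number from 1 to 99)."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(data)                  # step 1: arrange in order
    c, remainder = divmod(len(ordered) * p, 100)  # step 2: c = n * p / 100
    if remainder:                           # step 3: c not whole, round up
        return ordered[c]                   # (c + 1)th value, zero-based index c
    # step 4: c whole, halfway between the cth and (c + 1)st values
    return (ordered[c - 1] + ordered[c]) / 2


violations = [22, 19, 25, 24, 18, 15, 9, 12, 16, 20]
print(value_at_percentile(violations, 65))  # 20
print(value_at_percentile(violations, 30))  # 15.5
```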

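Percentiles for grouped data

Percentiles are values of the variable that divide a ranked data set into 100 equal parts. For grouped data,

\[ P_k = L + \left( \frac{kn/100 - (\sum f)_{k-1}}{f_k} \right) h \]

where L, 𝑓𝑘 , and h refer to the class that contains 𝑃𝑘 and (∑ 𝑓)𝑘−1 is the sum of the frequencies before that class. For k = 50 this is the formula for the median.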

Example

\[ \frac{27}{4} = 6.75, \qquad \frac{54}{4} = 13.5, \qquad \frac{81}{4} = 20.25 \]

6.75 3
1 = 10 20.25 18
3 = 129. 9
13.5 10
2 =8
4

3.4.2 Quartiles
Quartiles divide the distribution into four equal groups, denoted by Q1, Q2, Q3.
Note that Q1 is the same as the 25th percentile; Q2 is the same as the 50th percentile, or the
median; Q3 corresponds to the 75th percentile, as shown:

Finding Data Values Corresponding to 𝑸𝟏 , 𝑸𝟐 , and 𝑸𝟑 :

1- Arrange the data in order from lowest to highest.

2- Find the median of the data values. This is the value for 𝑄2 .

3- Find the median of the data values that fall below 𝑄2 , This is the value for 𝑄1.

4- Find the median of the data values that fall above 𝑄2 . This is the value for 𝑄3 .


Example 3.14: Using the data in Example 3.12, find 𝑄1, 𝑄2 , and 𝑄3 .
Solution:
Arrange the data in order from lowest to highest.

9 12 15 16 18 19 20 22 24 25

Find the median (𝑄2 ):

\[ \text{Median} = \frac{18 + 19}{2} = 18.5 = Q_2 \]

Find the median of the data values below 18.5.

9 12 15 16 18

𝑄1 = 15.

Find the median of the data values greater than 18.5.

19 20 22 24 25

𝑄3 = 22.
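
The quartile procedure (median of the whole data set, then the medians of the lower and upper halves) can be sketched in Python. The function name `quartiles` is illustrative; when n is odd, the sketch leaves the middle value out of both halves, which is one common convention and an assumption here.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    q2 = median(ordered)
    lower_half = ordered[: n // 2]        # values below the median position
    upper_half = ordered[(n + 1) // 2:]   # values above the median position
    return median(lower_half), q2, median(upper_half)


violations = [22, 19, 25, 24, 18, 15, 9, 12, 16, 20]
print(quartiles(violations))  # (15, 18.5, 22)
```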

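For grouped data

\[ Q_k = L + \left( \frac{kn/4 - (\sum f)_{k-1}}{f_k} \right) h, \qquad k = 1, 2, 3 \]

where L, 𝑓𝑘 , and h refer to the class that contains 𝑄𝑘 and (∑ 𝑓)𝑘−1 is the sum of the frequencies before that class.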

3.4.3 Deciles
Deciles divide the distribution into 10 groups, as shown. They are denoted by D1, D2, etc.

Note that D1 corresponds to P10; D2 corresponds to P20; etc. Deciles can be found by using the
formulas given for percentiles. Taken altogether then, these are the relationships among percentiles,
deciles, and quartiles.

Deciles are denoted by D1, D2, D3, . . . , D9, and they correspond to
P10, P20, P30, . . . , P90.
Quartiles are denoted by Q1, Q2, Q3 and they correspond to P25, P50, P75. The median
is the same as P50 or Q2 or D5.

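For grouped data, the deciles 𝐷1 , 𝐷2 , … , 𝐷9 are

\[ D_k = L + \left( \frac{kn/10 - (\sum f)_{k-1}}{f_k} \right) h \]

where L, 𝑓𝑘 , and h refer to the class that contains 𝐷𝑘 and (∑ 𝑓)𝑘−1 is the sum of the frequencies before that class.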

Example Find D2


The interquartile range (IQR) is defined as the difference between 𝑄3 and 𝑄1 :

\[ IQR = Q_3 - Q_1 \]

An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.

Procedure for Identifying Outliers

Step 1 Arrange the data in order and find 𝑄1 and 𝑄3 .
Step 2 Find the interquartile range: IQR = 𝑄3 − 𝑄1 .
Step 3 Multiply the IQR by 1.5.
Step 4 Subtract the value obtained in step 3 from 𝑄1 and add the value to 𝑄3 .
Step 5 Check the data set for any data value that is smaller than 𝑄1 − 1.5(IQR) or larger than 𝑄3 + 1.5(IQR).

Example: Check the following data set for outliers.

Solution
The data value ___ is extremely suspect. These are the steps in checking for an outlier.
Step 1 Find 𝑄1 and 𝑄3 .

Step 2 Find the interquartile range (IQR), which is 𝑄3 − 𝑄1 .
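
The outlier check can be sketched in Python as follows: a value is flagged when it is smaller than Q1 − 1.5(IQR) or larger than Q3 + 1.5(IQR). The data set below is hypothetical, chosen only for illustration, and the function name `find_outliers` is likewise illustrative. Quartiles are found as the medians of the lower and upper halves.

```python
from statistics import median


def find_outliers(data):
    """Return the values lying outside Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    ordered = sorted(data)                   # arrange the data in order
    n = len(ordered)
    q1 = median(ordered[: n // 2])           # median of the lower half
    q3 = median(ordered[(n + 1) // 2:])      # median of the upper half
    iqr = q3 - q1                            # interquartile range
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in ordered if v < low_fence or v > high_fence]


values = [12, 14, 15, 16, 18, 19, 21, 60]  # hypothetical data
print(find_outliers(values))  # [60]
```

Here Q1 = 14.5, Q3 = 20, and IQR = 5.5, so the limits are 6.25 and 28.25; only 60 falls outside them.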

Exploratory Data Analysis


In traditional statistics, data are organized by using a frequency distribution.
From this distribution various graphs such as the histogram, frequency
polygon, and ogive can be constructed to determine the shape or nature of the
distribution. In addition, various statistics such as the mean and standard
deviation can be computed to summarize the data.
The purpose of traditional analysis is to confirm various conjectures about
the nature of the data. For example, from a carefully designed study, a
researcher might want to know if the proportion of Americans who are
exercising today has increased from 10 years ago. This study would contain
various assumptions about the population, various definitions such as the
definition of exercise, and so on.
In exploratory data analysis (EDA), data can be organized using a stem
and leaf plot. The measure of central tendency used in EDA is the median.
The measure of variation used in EDA is the interquartile range Q3–Q1. In EDA
the data are represented graphically using a boxplot (sometimes called a box
and whisker plot). The purpose of exploratory data analysis is to examine data
to find out what information can be discovered about the data, such as the
center and the spread. Exploratory data analysis was developed by John Tukey
and presented in his book Exploratory Data Analysis (Addison-Wesley, 1977).

The Five-Number Summary and Boxplots


A boxplot can be used to graphically represent the data set. These plots
involve five specific values:
1. The lowest value of the data set (i.e., minimum)
2. Q1
3. The median
4. Q3
5. The highest value of the data set (i.e., maximum)
These values are called a five-number summary of the data set.
A boxplot is a graph of a data set obtained by drawing a horizontal line from the
minimum data value to Q1, drawing a horizontal line from Q3 to the maximum data
value, and drawing a box whose vertical sides pass through Q1 and Q3 with a
vertical line inside the box passing through the median or Q2.

Procedure for Constructing a Boxplot
Step 1 Find the five-number summary for the data.
Step 2 Draw a horizontal axis and place the scale on the axis. The scale should start on or
below the minimum data value and end on or above the maximum data value.
Step 3 Locate the lowest data value, Q1, the median, Q3, and the highest data value; then draw
a box whose vertical sides go through Q1 and Q3. Draw a vertical line through the
median. Finally, draw a line from the minimum data value to the left side of the box,
and draw a line from the maximum data value to the right side of the box.
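
Step 1 of the procedure, the five-number summary, can be sketched in Python. The function name `five_number_summary` is illustrative, the quartiles are taken as medians of the lower and upper halves, and the traffic violation counts 22, 19, 25, 24, 18, 15, 9, 12, 16, 20 are used as sample input.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median(ordered[: n // 2])       # median of the lower half
    q3 = median(ordered[(n + 1) // 2:])  # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]


violations = [22, 19, 25, 24, 18, 15, 9, 12, 16, 20]
print(five_number_summary(violations))  # (9, 15, 18.5, 22, 25)
```

The box would then run from 15 to 22 with a vertical line at 18.5, and the two horizontal lines would extend to 9 and 25.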

Example


The boxplot indicates that the distribution is slightly positively skewed.


Example


Stem and Leaf Plots

The stem and leaf plot is a method of organizing data and is a combination of sorting and graphing. It has the advantage over a grouped frequency distribution of retaining the actual data while showing them in graphical form.

A stem and leaf plot is a data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes.
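
For two-digit data, the tens digit can serve as the stem and the units digit as the leaf. The following minimal Python sketch builds such a plot; the data are hypothetical values chosen only for illustration, and the function name `stem_and_leaf` is likewise illustrative.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Group whole numbers by tens digit (stem) with sorted units digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(data):          # sorting keeps the leaves in order
        stem, leaf = divmod(value, 10)  # e.g. 32 -> stem 3, leaf 2
        plot[stem].append(leaf)
    return dict(plot)


counts = [31, 20, 32, 13, 14, 43, 25, 23, 36, 32, 44, 51]  # hypothetical data
plot = stem_and_leaf(counts)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 1 | 3 4
# 2 | 0 3 5
# 3 | 1 2 2 6
# 4 | 3 4
# 5 | 1
```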

Example: At an outpatient testing center, the number of cardiograms performed each day for ___ days is shown. Construct a stem and leaf plot for the data.

Solution
Step 1 Arrange the data in order.

Arranging the data takes extra work when the data set is large; however, it is helpful in constructing a stem and leaf plot. The leaves in the final stem and leaf plot should be arranged in order.
Step 2 Separate the data according to the first digit.

From the previous Figure


mode=32 median =2

Example: The number of stories in two selected samples of tall buildings in Atlanta and Philadelphia is shown. Construct a back-to-back stem and leaf plot and compare the distributions.

Solution
Step 1 Arrange the data for both data sets in order.
Step 2 Construct a stem and leaf plot using the same digits as stems. Place the digits for the leaves for Atlanta on the left side of the stem and the digits for the leaves for Philadelphia on the right side, as shown.

Step 3 Compare the distributions. The buildings in Atlanta have a large variation in the number of stories per building. Although both distributions are peaked in the ___- to ___-story class, Philadelphia has more buildings in this class. Atlanta has more buildings that have ___ or more stories than Philadelphia does.


Exercises
(1) Determine whether each statement is true or false. If the statement is false, explain
why.

1. When the mean is computed for individual data, all values in the data set are used.

2. The mean cannot be found for grouped data when there is an open class.

3. A single, extremely large value can affect the median more than the mean.

4. One-half of all the data values will fall above the mode, and one-half will fall below the
mode.
5. In a data set, the mode will always be unique.

6. The range and midrange are both measures of variation.

7.

8. One disadvantage of the median is that it is not unique.

9. The mode and midrange are both measures of variation.

10. If a person’s score on an exam corresponds to the 75th percentile, then that person
obtained 75 correct answers out of 100 questions.
(2) Select the best answer.
11. What is the value of the mode when all values in the data set are different?

a. 0 c. There is no mode.
b. 1 d. It cannot be determined unless the data values are given.

12. When data are categorized as, for example, places of residence (rural, suburban, urban), the
most appropriate measure of central tendency is the

a. Mean c. Mode
b. Median d. Midrange

13. P50 corresponds to


a. Q2 c. IQR
b. D5 d. Midrange

14. Which is not part of the five-number summary?


a. Q1 and Q3 c. The median
b. The mean d. The smallest and the largest data values

15. A statistic that tells the number of standard deviations a data value is above or below the
mean is called

a. A quartile c. A coefficient of variation


b. A percentile d. A z score

16. When a distribution is bell-shaped, approximately what percentage of data values will fall
within 1 standard deviation of the mean?

a. 50% c. 95%
b. 68% d. 99.7%

17. The number of highway miles per gallon of the 10 worst vehicles is shown.

12 15 13 14 15 16 17 16 17 18

Find each of these.


a. Mean e. Range
b. Median f. Variance
c. Mode
d. . Standard deviation
18. The distribution of the number of errors that 10 students made on a typing test is shown.

Errors Frequency

0–2 1
3–5 3
6–8 4
9–11 1
12–14 1

Find each of these.


a. Mean c. Variance
b. Modal class d. Standard deviation

19. The average number of newspapers for sale in an airport newsstand is 56 with a standard
deviation of 6. The average number of newspapers for sale in a convenience store is 44
with a standard deviation of 5. Which data set is more variable?

20. Use each boxplot to identify the maximum value, minimum value, median, first quartile,
third quartile, and interquartile range.

1. [Boxplot figure; axis labeled 200, 225, 250, 275, 300, 325]

2. [Boxplot figure; axis labeled 50, 55, 60, ..., 95, 100]


Chapter 4: Probability

What is Probability?
The word probability means the chance or possibility of an outcome. It
describes how likely a particular event is to occur. We often use sentences
like 'It will probably rain today', 'he will probably pass the test', 'there
is very little probability of a storm tonight', and 'most probably the price
of onions will go up again'. In all these sentences, words like chance, doubt,
maybe, and likely are replaced with the word probability. Probability is
basically the prediction of an event, based either on the study of previous
records or on the number and type of possible outcomes.

The Story Behind The Discovery Of Probability


In the 17th century, a gambler named Chevalier de Méré wanted to find out
about the chances of a number appearing on the roll of dice, so he approached
the French philosopher and mathematician Blaise Pascal to solve the dice
problem. Pascal became interested in the concept of possibility and discussed
it with another French mathematician, Pierre de Fermat. Both mathematicians
started working on the concept of probability separately.

Earlier, the Italian mathematician J. Cardan (Gerolamo Cardano) had written
the first book on the subject, 'Book on Games of Chance'; it was published in
1663 and deals with the inception of probability. The subject later caught the
attention of great mathematicians such as J. Bernoulli, P. Laplace,
A. A. Markov and A. N. Kolmogorov.

Among these mathematicians, A. N. Kolmogorov, a Russian mathematician,
treated probability as a function of the outcomes of an experiment. With the
help of this concept, we can find the probability of events associated with
discrete sample spaces. It also establishes the concept of conditional
probability, which is important for understanding Bayes' Theorem, the
multiplication rule, and the independence of events. In 1812, Laplace
published 'Théorie analytique des probabilités', which is considered the
greatest contribution by an individual to the theory of probability. The
deductions and reasoning introduced by these mathematicians are now used in
biology, economics, genetics, physics, sociology, etc.

Uses of Probability
Probability is important to figure out if a particular thing is going to occur
in an event or not. It also helps us to predict future events and take action
accordingly. Below are the uses of probability in our day to day life.

1. Weather forecasting - We often check the weather forecast before
planning an outing. The forecast tells us whether the day will be
cloudy, sunny, stormy, or rainy, and we plan our day on the basis of
that prediction. Suppose the weather forecast says there is a 75%
chance of rain. How is such a probability calculated? Access to a
historical database, together with certain tools and techniques, helps
in calculating it. For example, if according to the database 60 out of
100 days with similar parameters (temperature, humidity, pressure,
etc.) were cloudy, then we can say that there is a 60% chance that the
day will be cloudy.

2. Agriculture - Temperature, season, and weather play an important
role in agriculture and farming. Earlier we did not have a good
understanding of weather forecasting, but now various technologies have
been developed for it, which help farmers to do their job well on the basis
of predictions. Undoubtedly, the occurrence of erratic weather is beyond
human control, but it is possible to prepare for adverse weather if it is
forecast beforehand. Sowing is usually done in clear weather. Thus, accurate
weather prediction enables the farmer to take major steps to prevent big
losses by saving their crops. The planning of other farming operations like
irrigation and the application of fertilizers and pesticides also depends on
the weather, so a proper weather forecast is needed.

3. Politics - Many politicians want to predict the outcome of an election
even before the polling is done. Sometimes they predict which political
party will rise to power by closely studying the results of exit polls. Some
politicians spend a lot just to predict the results, so that they can save
themselves from being dethroned. There are other good uses of probability,
like predicting the number of students who will need jobs in the upcoming
year so that vacancies can be created accordingly. Politicians can also
analyze how the rate of car and bike accidents has increased in past years
so that they can take measures to reduce road accidents.

4. Insurance - Insurance companies use probability to estimate the
chances of a person’s death by studying the person’s family history and
personal habits like drinking and smoking. Probability also helps to examine
and evaluate the best insurance plan for the benefit of a person and his
family. For example, an active smoker has a higher chance of getting lung
cancer than people who don’t smoke. Thus, it is beneficial for a smoker to
go for health insurance rather than vehicle or house insurance for the
betterment of his family.

4.1 SAMPLE SPACE AND EVENTS.

In the study of statistics, we consider experiments for which the outcome cannot
be predicted with certainty. Such experiments are called random experiments.

Although the specific outcome of a random experiment cannot be predicted with
certainty before the experiment is performed, the set of all its possible
outcomes can be described.

A sample space is the set of all possible outcomes of a probability experiment.
It is denoted by S.

Some sample spaces for various probability experiments are shown here.

Experiment                      Sample space

Toss one coin                   Head, tail
Roll a die                      1, 2, 3, 4, 5, 6
Answer a true/false question    True, false
Toss two coins                  Head-head, tail-tail, head-tail, tail-head

EXAMPLE Find the sample space for rolling two dice.

SOLUTION Since each die can land in six different ways, and two dice
are rolled, the sample space can be presented by a rectangular array, as shown
in Figure 4–1. The sample space is the list of pairs of numbers in the chart.

Die 2
Die 1 1 2 3 4 5 6
1 (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
2 (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
3 (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
4 (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
5 (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
6 (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

4.2 An Event:

Any subset of the sample space S is called an event.

• ∅ ⊆ S is an event (the impossible event).
• S ⊆ S is an event (the sure event).

Example:

Experiment: Selecting a ball from a box containing 6 balls numbered 1, 2, 3, 4, 5, 6.

• This experiment has 6 possible outcomes:
S = {1, 2, 3, 4, 5, 6}.
• Consider the following events:
E1 = getting an even number = {2, 4, 6} ⊆ S.
E2 = getting a number less than 4 = {1, 2, 3} ⊆ S.
E3 = getting 1 or 3 = {1, 3} ⊆ S.
E4 = getting an odd number = {1, 3, 5} ⊆ S.
E5 = getting a negative number = { } = ∅ ⊆ S.
E6 = getting a number less than 10 = {1, 2, 3, 4, 5, 6} = S ⊆ S.


4.3 Some Operations on Events:

Let A and B be two events defined on the sample space S.

4.3.1 Union of Two Events: A ∪ B

A ∪ B consists of all outcomes in A or in B or in both A and B.
A ∪ B occurs if A occurs, or B occurs, or both A and B occur.

4.3.2 Intersection of Two Events: A ∩ B

A ∩ B consists of all outcomes in both A and B.
A ∩ B occurs if both A and B occur.

4.3.3 Complement of an Event: Aᶜ

Aᶜ is the complement of A.
Aᶜ consists of all outcomes of S that are not in A.
Aᶜ occurs if A does not.
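
The three operations can be tried directly with Python sets. This is only an
illustrative sketch: the sample space is the ball-selection example above, and
the names A, B, union, intersection and complement_A are chosen here for
illustration.

```python
# Sample space of the ball-selection example (balls numbered 1 to 6).
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # getting an even number
B = {1, 2, 3}   # getting a number less than 4

union = A | B            # outcomes in A or in B or in both
intersection = A & B     # outcomes in both A and B
complement_A = S - A     # outcomes of S that are not in A

print(sorted(union))         # [1, 2, 3, 4, 6]
print(sorted(intersection))  # [2]
print(sorted(complement_A))  # [1, 3, 5]
```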


Special terminology associated with events that is often used by
statisticians includes the following:

1. The events A1, A2, …, An are exhaustive events if A1 ∪ A2 ∪ … ∪ An = S.
2. The events A and B are disjoint (or mutually exclusive) if A ∩ B = ∅.
It is impossible for both events to occur together (at the same time).

A ∩ B ≠ ∅: A and B are not mutually exclusive
(it is possible that both events occur at the same time).
A ∩ B = ∅: A and B are mutually exclusive (disjoint)
(it is impossible that both events occur at the same time).

A ∪ Aᶜ = S, so A and Aᶜ are exhaustive events.
A ∩ Aᶜ = ∅, so A and Aᶜ are disjoint events.

There are three major types of probability:

1. Classical probability (theoretical probability)
2. Empirical or relative frequency probability (experimental probability)
3. Axiomatic probability
4.4 Classical Probability

Classical probability uses sample spaces to determine the numerical
probability that an event will happen. You do not actually have to perform
the experiment to determine that probability. Classical probability is so
named because it was the first type of probability studied formally by
mathematicians in the 17th and 18th centuries.

Classical probability assumes that all outcomes in the sample space are equally
likely to occur.
For example, when a single die is rolled, each outcome has the same probability
of occurring. Since there are six outcomes, each outcome has a probability of 1/6.

Equally likely events are events that have the same probability of occurring.

The probability of any event E is

(Number of outcomes in E) / (Total number of outcomes in the sample space)

This probability is denoted by

P(E) = n(E) / n(S)

where n(E) is the number of outcomes in E and n(S) is the number of outcomes in the sample space S.

EXAMPLE: A coin is tossed twice. What is the probability that at least one head
occurs?
SOLUTION: The sample space for this experiment is
S = {HH, HT, TH, TT}
If the coin is fair, each of the outcomes is equally likely to occur. If A
represents the event that at least one head occurs, then A = {HH, HT, TH};
hence P(A) = 3/4.
EXAMPLE : Throw an unbiased coin three times and observe the sequence of heads
and tails. Here the sample space is the collection of all possible sequences,
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}

Since the outcomes are equally likely and mutually exclusive, the probability
of each outcome is 1/8. Let A be the event that two or more heads appear
consecutively, and B the event that all the tosses are the same. Then
A = {HHH, HHT, THH} and B = {HHH, TTT}. Therefore

P(A) = 3/8, P(B) = 2/8,
P(A ∩ B) = 1/8, P(A ∪ B) = 4/8.
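
The counts in this example can be checked by listing the sample space. The
following is a minimal sketch under the classical (equally likely) assumption;
the helper name prob is introduced here only for illustration.

```python
from fractions import Fraction
from itertools import product

# All sequences of three tosses; each is equally likely for a fair coin.
S = ["".join(t) for t in product("HT", repeat=3)]

def prob(event):
    """Classical probability: n(E) / n(S)."""
    return Fraction(len(event), len(S))

A = {s for s in S if "HH" in s}          # two or more heads appear consecutively
B = {s for s in S if len(set(s)) == 1}   # all the tosses are the same

# Fractions are printed in lowest terms: 2/8 shows as 1/4 and 4/8 as 1/2.
print(prob(A), prob(B), prob(A & B), prob(A | B))  # 3/8 1/4 1/8 1/2
```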

EXAMPLE: A fair die is tossed twice. What is the probability that sum of
upturned faces is 9?
SOLUTION: The sample space for this experiment is
S = {(i, j): i = 1,2,...,6; j = 1,2,...,6}

Since the die is fair (unbiased), each of the 36 possible outcomes is equally
likely to occur. If A represents the event that the sum of the upturned faces
is 9, then A = {(3,6), (4,5), (5,4), (6,3)}. Hence, P(A) = 4/36 = 1/9.
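
As a quick check, the 36 outcomes can be enumerated and the favorable ones
counted. This sketch assumes a fair die, as in the example; the variable names
are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (i, j) with i, j = 1, ..., 6.
S = list(product(range(1, 7), repeat=2))

# Event A: the sum of the upturned faces is 9.
A = [(i, j) for (i, j) in S if i + j == 9]

p = Fraction(len(A), len(S))
print(len(S))  # 36
print(A)       # [(3, 6), (4, 5), (5, 4), (6, 3)]
print(p)       # 1/9
```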

PROBABILITY OF AN EVENT.

The statistician is basically concerned with drawing conclusions or inferences
from experiments involving uncertainties. For these conclusions and inferences
to be reasonably accurate, an understanding of basic probability concepts is
essential.

The probability of A, denoted by P(A), for any event A, must satisfy
the following fundamental properties.

1. P(A) ≥ 0
2. P(S) = 1 and P(∅) = 0
3. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

4. If A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B). In particular,

P(A ∪ Aᶜ) = 1 → P(A) + P(Aᶜ) = 1

P(Aᶜ) = 1 − P(A)

5. If A1, A2, …, Ak are mutually exclusive and exhaustive events, then

P(A1 ∪ A2 ∪ … ∪ Ak) = P(A1) + P(A2) + ⋯ + P(Ak) = 1

∴ ∑_{i=1}^{k} P(Ai) = 1

6. If events A and B are such that A ⊂ B, then P(A) ≤ P(B).

7. For each event A, P(A) ≤ 1.

8. P(A ∩ Bᶜ) = P(A) − P(A ∩ B)
P(Aᶜ ∩ B) = P(B) − P(A ∩ B)
P(Aᶜ ∩ Bᶜ) = 1 − P(A ∪ B)

4.5 Axiomatic Probability - Axiomatic probability is a unifying theory of
probability based on a set of rules (axioms) formulated by Kolmogorov.

The three axioms are:

1. P(A) ≥ 0;
2. P(S) = 1;
3. If A1, A2, …, Ak are mutually exclusive events, then
P(A1 ∪ A2 ∪ … ∪ Ak) = ∑_{i=1}^{k} P(Ai); in particular, if they are also
exhaustive, then ∑_{i=1}^{k} P(Ai) = 1.

EXAMPLE: The probability that a student passes mathematics is 0.85, and the
probability that he passes English is 0.8. If the probability of passing at least one
course is 0.9, what is the probability that he will pass both courses?
SOLUTION: If M is the event "passing mathematics" and E the event "passing
English", then by transposing the terms in the Additive Rule, we have
P(M ∪ E) = P(M) + P(E) − P(M ∩ E)
0.9 = 0.85 + 0.80 − P(M ∩ E)
P(M ∩ E) = 0.75
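
The rearranged additive rule, P(M ∩ E) = P(M) + P(E) − P(M ∪ E), is easy to
evaluate exactly. A small sketch using exact fractions (the variable names are
chosen here for illustration):

```python
from fractions import Fraction

p_math = Fraction(85, 100)          # P(M)
p_english = Fraction(80, 100)       # P(E)
p_at_least_one = Fraction(90, 100)  # P(M ∪ E)

# Additive rule solved for the intersection.
p_both = p_math + p_english - p_at_least_one

print(p_both)         # 3/4
print(float(p_both))  # 0.75
```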

EXAMPLE: Three horses A, B and C are in a race. A is twice as likely to win as B,
and B is twice as likely to win as C. What are their respective probabilities of
winning, i.e., P(A), P(B) and P(C)?
SOLUTION: Let P(C) = p. Since B is twice as likely to win as C, P(B) = 2p, and
since A is twice as likely to win as B, P(A) = 4p. Now the sum of the
probabilities must be one, hence
P(A) + P(B) + P(C) = 1, i.e.
4p + 2p + p = 1
Therefore, p = 1/7, and
P(A) = 4p = 4/7, P(B) = 2p = 2/7 and P(C) = p = 1/7.

4.6 Empirical Probability.

The difference between classical and empirical probability is that classical
probability assumes that certain outcomes are equally likely (such as the outcomes
when a die is rolled), while empirical probability relies on actual experience to
determine the likelihood of outcomes. In empirical probability, one might actually
roll a given die 6000 times, observe the various frequencies, and use these
frequencies to determine the probability of an outcome.

Given a frequency distribution, the probability of an event being in a
given class is

P(E) = (frequency for the class) / (total frequencies in the distribution) = f/n

This probability is called empirical probability and is based on
observation.

Example: The blood types of 630 patients are recorded with the following
frequencies: O: 284, A: 258, B: 63, AB: 25. One patient is selected at random.

Find the probability that

1. The blood type of the selected patient is “O”
2. The blood type of the selected patient is “A”
3. The blood type of the selected patient is “B”
4. The blood type of the selected patient is “AB”
5. The blood type of the selected patient is “AB” or “A”
6. The blood type of the selected patient is not “O”

Solution

E1: The blood type of the selected patient is “O”
E2: The blood type of the selected patient is “A”
E3: The blood type of the selected patient is “B”
E4: The blood type of the selected patient is “AB”

1. P(E1) = 284/630
2. P(E2) = 258/630
3. P(E3) = 63/630
4. P(E4) = 25/630
5. P(E4 ∪ E2) = P(E4) + P(E2) = 25/630 + 258/630 = 283/630
6. P(E1ᶜ) = 1 − 284/630 = 346/630
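
The empirical rule P(E) = f/n can be applied mechanically to a frequency
table. In this sketch the frequencies (284, 258, 63 and 25 out of 630) are read
off the numerators in the solution above; the dictionary freq and the helper p
are illustrative names, not part of the notes.

```python
from fractions import Fraction

# Class frequencies taken from the numerators of the solution.
freq = {"O": 284, "A": 258, "B": 63, "AB": 25}
n = sum(freq.values())  # total frequency, 630

def p(*types):
    """Empirical probability that the blood type is one of the given types."""
    return Fraction(sum(freq[t] for t in types), n)

print(n)             # 630
print(p("O"))        # 142/315  (= 284/630)
print(p("AB", "A"))  # 283/630
print(1 - p("O"))    # 173/315  (= 346/630)
```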

4.7 Combinatorial Analysis

Fundamental Principles of Counting

4.7.1 Multiplication Principle
The number of ways a sequence of k events can occur (counting rule): if
the first event can occur in n1 ways,
the second event can occur in n2 ways,
the third event can occur in n3 ways,
.............
the k-th event can occur in nk ways,
then the total number of ways for the composite experiment is
n1 · n2 · … · nk.

Example: A committee of 3 members is to be formed, consisting of one
representative each from labor, management, and the public. Given the number
of possible representatives from labor, from management, and from the public,
determine how many different committees can be formed.
Solution: By the multiplication principle, the number of different committees
that can be formed is the product of these three numbers.

Permutations
The number of permutations of n objects taking r objects at a time
(order is important).
Examples
1. permutations of two digits from {0, 1}: 01, 10
2. permutations of three letters from {x, y, z}:
xyz, xzy, yxz, yzx, zxy, zyx

Example: In how many ways can 10 people be seated on a bench if only 4
seats are available?

Solution: The first seat can be filled in any one of 10 ways, and when this has
been done there are 9 ways of filling the second seat, 8 ways of filling the
third seat, and 7 ways of filling the fourth seat. Therefore
Number of arrangements of 10 people taken 4 at a time = 10 · 9 · 8 · 7 = 5040.

Example: In how many ways can 5 differently colored marbles be
arranged in a row?
Solution: Number of arrangements of 5 marbles in a row = 5 · 4 · 3 · 2 · 1
= 5! = 120.

In general,
Number of arrangements of n different objects in a row
= n(n − 1)(n − 2) ⋯ 1 = n!
This is also called the number of permutations of n different objects taken
n at a time and is denoted by nPn.

In general,
Number of arrangements of n different objects taken r at a time
= n(n − 1) ⋯ (n − r + 1) = nPr = n!/(n − r)!
This is also called the number of permutations of n different objects taken
r at a time and is denoted by nPr.

Example: How many three letter strings can be formed from the
letters in “compiler” if no letters can be repeated?

Solution: P(8, 3) = 8!/5! = 8 · 7 · 6 = 336

Example:

A library has 4 books on operating systems, 7 on programming, and 3 on data
structures. How many ways can these books be arranged, given that all books
on a subject must be together?
Solution

P(3, 3) = 3! ways to arrange the 3 subjects
P(4, 4) = 4! ways to arrange the 4 books on operating systems
P(7, 7) = 7! ways to arrange the 7 books on programming
P(3, 3) = 3! ways to arrange the 3 books on data structures

3! · 4! · 7! · 3! = 4,354,560 total arrangements
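
Both permutation results can be reproduced with Python's standard math module.
This is a small sketch assuming Python 3.8 or later, where math.perm is
available; the variable names are illustrative.

```python
import math

# Three-letter strings from the 8 distinct letters of "compiler": P(8, 3).
strings = math.perm(8, 3)
print(strings)  # 336

# Library shelf: arrange the 3 subjects, then the books inside each subject.
arrangements = (math.factorial(3)    # order of the 3 subjects
                * math.factorial(4)  # operating systems books
                * math.factorial(7)  # programming books
                * math.factorial(3)) # data structures books
print(arrangements)  # 4354560
```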



Permutations of non-distinct objects

If the permutations are drawn from a collection of objects
that includes non-distinct objects, the formula is different.
How many permutations can be made from the letters in
a. FLORIDA: the letters are all distinct, so P(7, 7) = 7! = 5,040.
b. MISSISSIPPI: the letters are not all distinct, so the number is
11!/(4! 4! 2! 1!) = 34,650, where the factors in the denominator account for
the number of Ss (4), the number of Is (4), the number of Ps (2), and the
number of Ms (1).

In general, if there are n = n1 + n2 + … + nk objects,
where the objects in each collection of ni objects are indistinguishable,
the number of distinguishable permutations is n!/(n1! n2! ⋯ nk!).
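
The counting rule for non-distinct objects can be sketched as a short function
that divides n! by the factorial of each letter count. The function name
distinguishable_permutations is hypothetical and used here only for
illustration.

```python
from collections import Counter
from math import factorial

def distinguishable_permutations(word):
    """n! divided by the factorial of the count of each repeated symbol."""
    result = factorial(len(word))
    for count in Counter(word).values():
        result //= factorial(count)  # each partial quotient is an integer
    return result

print(distinguishable_permutations("FLORIDA"))      # 5040
print(distinguishable_permutations("MISSISSIPPI"))  # 34650
```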

Combinations
The number of combinations of r objects taken from n objects (order is not
important).

• e.g., combinations of two digits from {0, 1}: 01
• e.g., combinations of two letters from {x, y, z}: xy, xz, yz

C(n, r) = nCr = P(n, r)/r! = n!/(r! (n − r)!)

Example

A committee of 8 students is to be selected from a class
consisting of 19 freshmen and 34 sophomores.
a. In how many ways can 3 freshmen and 5 sophomores be selected?

C(19, 3) · C(34, 5) = 19!/(3! 16!) · 34!/(5! 29!) = 969 · 278,256 = 269,630,064
(choose 3 freshmen, then choose 5 sophomores)

b. In how many ways can a committee with exactly 1 freshman be selected?

C(19, 1) · C(34, 7) = 19!/(1! 18!) · 34!/(7! 27!) = 19 · 5,379,616 = 102,212,704

Example
a. In how many ways can a committee with at least 1 freshman be selected?
Method 1: add the counts for each possible number of freshmen on the committee.

1 freshman: C(19, 1) · C(34, 7) +
2 freshmen: C(19, 2) · C(34, 6) +
3 freshmen: C(19, 3) · C(34, 5) +
4 freshmen: C(19, 4) · C(34, 4) +
5 freshmen: C(19, 5) · C(34, 3) + ⋯ (continuing in the same way up to 8 freshmen)
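
The committee counts can be checked with math.comb (Python 3.8 or later). The
last lines compare the Method 1 sum over 1 to 8 freshmen with a complement
count, "all committees minus committees with no freshman"; that complement
check is an addition made here for illustration and is not taken from the
notes.

```python
from math import comb

# a. 3 freshmen and 5 sophomores
a = comb(19, 3) * comb(34, 5)
print(a)  # 269630064

# b. exactly 1 freshman (and therefore 7 sophomores)
b = comb(19, 1) * comb(34, 7)
print(b)  # 102212704

# At least 1 freshman, Method 1: sum over k = 1, ..., 8 freshmen.
method_1 = sum(comb(19, k) * comb(34, 8 - k) for k in range(1, 9))

# Cross-check (assumption, not from the notes): total minus "no freshman".
complement = comb(53, 8) - comb(34, 8)
print(method_1 == complement)  # True
```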

Example: Out of 5 mathematicians and 7 physicists, a committee
consisting of 2 mathematicians and 3 physicists is to be formed. In
how many ways can this be done if
(a) any mathematician and any physicist can be included,
(b) one particular physicist must be on the committee,
(c) two particular mathematicians cannot be on the committee?

Solution
a. Total number of possible selections = C(5, 2) · C(7, 3) = 10 · 35 = 350
b. Total number of possible selections = C(5, 2) · C(6, 2) = 10 · 15 = 150
c. Total number of possible selections = C(3, 2) · C(7, 3) = 3 · 35 = 105

Sampling without replacement occurs when an object is not replaced after it
has been selected.

Sampling with replacement occurs when an object is selected and then replaced
before the next object is selected.

We've learned a number of different counting techniques in this lesson.
Now, we'll get some practice using the various techniques. As we do so, you
might want to keep these helpful (summary) hints in mind:


When you begin to solve a counting problem, it's always, always, always a good idea to create
a specific example (or two or three...) of the things you are trying to count.

If there are m ways of doing one thing, and n ways of doing another thing, then the
Multiplication Principle tells us there are m × n ways of doing both things.

If you want to place n things in n positions, and you care about order, you can do it
in n! ways. (Doing so is called permuting n items n at a time.)

If you want to fill r positions with items chosen from n things (r ≤ n), and you care
about order, you can do it in n!/(n − r)! ways. (Doing so is called permuting n items
r at a time.)

If you have m items of one kind, n items of another kind, and o items of a third kind,
then there are (m + n + o)!/(m! n! o!) ways of arranging the items. (Doing so is called
counting the number of distinguishable permutations.)

If you have m items of one kind and (n − m) items of another kind, then there are
n!/(m! (n − m)!) ways of choosing the m items without replacement and without regard
to order. (Doing so is called counting the number of combinations, and we say
"n choose m".)

Example: A student has a choice of selecting three elective courses for the
next semester. He can choose from six humanities or four psychology
courses. Find the probability that
a. all three courses selected will be humanities courses, assuming
he selects them at random;
b. one humanities and two psychology courses are selected.
Solution
a. The total number of ways of selecting 3 courses from 10 courses is C(10, 3).
Since there are six humanities courses and the student needs to select three
of them, there are C(6, 3) ways of doing so.
Hence the probability of selecting all humanities courses is
C(6, 3)/C(10, 3) = 20/120 = 1/6.
b. The number of ways to select one humanities and two psychology courses is
C(6, 1) · C(4, 2).
Hence the probability of selecting one humanities and two psychology courses is
C(6, 1) · C(4, 2)/C(10, 3) = 36/120 = 3/10.
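
Both probabilities are ratios of combination counts, so they can be evaluated
exactly. A minimal sketch with illustrative variable names, assuming (as in
the example) that every set of three courses is equally likely:

```python
from fractions import Fraction
from math import comb

total = comb(10, 3)  # ways to pick 3 courses out of 6 + 4 = 10

# a. all three courses are humanities courses
p_all_humanities = Fraction(comb(6, 3), total)

# b. one humanities course and two psychology courses
p_one_hum_two_psy = Fraction(comb(6, 1) * comb(4, 2), total)

print(total)              # 120
print(p_all_humanities)   # 1/6
print(p_one_hum_two_psy)  # 3/10
```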

Exercise: A marble is drawn at random from a box containing
red, white, blue, and orange marbles.
Find the probability that it is
(a) orange or red
(b) not red or blue
(c) not blue
(d) red or white or blue.

Chapter 5: Conditional Probability and Independence

5.1 INTRODUCTION

In this chapter we introduce one of the most important concepts in probability
theory, that of conditional probability. The importance of this concept is
twofold. In the first place, we are often interested in calculating probabilities
when some partial information concerning the result of the experiment is
available; in such a situation the desired probabilities are conditional. Second,
even when no partial information is available, conditional probabilities can
often be used to compute the desired probabilities more easily.

5.2 CONDITIONAL PROBABILITIES


If P(F) > 0, the conditional probability of E given F is
P(E|F) = P(E ∩ F)/P(F).

Example: A coin is flipped twice. If we assume that all four points in
the sample space S = {(H, H), (H, T), (T, H), (T, T)} are equally likely,
what is the conditional probability that both flips result in heads,
given that the first flip does?

Solution: If E = {(H, H)} denotes the event that both flips land heads, and
F = {(H, H), (H, T)} the event that the first flip lands heads, then the desired
probability is given by
P(E|F) = P(E ∩ F)/P(F) = P({(H, H)})/P({(H, H), (H, T)}) = (1/4)/(2/4) = 1/2

Example: An urn contains 10 white, 5 yellow, and 10 black marbles. A marble
is chosen at random from the urn, and it is noted that it is not one of the black
marbles. What is the probability that it is yellow?

Solution: Let Y denote the event that the marble selected is yellow, and
let Bᶜ denote the event that it is not black. Since every yellow marble is
not black, Y ∩ Bᶜ = Y, so
P(Y|Bᶜ) = P(Y ∩ Bᶜ)/P(Bᶜ) = (5/25)/(15/25) = 1/3
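
The conditional probability formula can be evaluated directly from the marble
counts. A short sketch; the dictionary urn and the variable names are chosen
here for illustration.

```python
from fractions import Fraction

urn = {"white": 10, "yellow": 5, "black": 10}
n = sum(urn.values())  # 25 marbles in total

p_not_black = Fraction(n - urn["black"], n)          # P(B^c) = 15/25
p_yellow_and_not_black = Fraction(urn["yellow"], n)  # P(Y ∩ B^c) = 5/25

# Conditional probability: P(Y | B^c) = P(Y ∩ B^c) / P(B^c)
p_yellow_given_not_black = p_yellow_and_not_black / p_not_black
print(p_yellow_given_not_black)  # 1/3
```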

5.3 Multiplication Rule of Probability

𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴). 𝑃(𝐵|𝐴)

𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐵). 𝑃(𝐴|𝐵)



Example: If we randomly pick two television tubes in succession from
a shipment of 240 television tubes of which 15 are defective, what is the
probability that they will both be defective?

Solution
A: the first tube is defective, B: the second tube is defective,
A ∩ B: both are defective.
P(A) = 15/240, P(B|A) = 14/239
P(A ∩ B) = P(A) · P(B|A) = 15/240 · 14/239 = 7/1912
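
The multiplication rule for two draws without replacement can be written out
with exact fractions. This sketch uses the values 15/240 and 14/239 from the
solution above; the variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(15, 240)          # P(A): first tube is defective
p_b_given_a = Fraction(14, 239)  # P(B|A): second defective, given the first was

# Multiplication rule: P(A ∩ B) = P(A) · P(B|A)
p_both = p_a * p_b_given_a

print(p_both)                   # 7/1912
print(round(float(p_both), 5))  # 0.00366
```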

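Independent Events

Two events A and B are independent if P(A ∩ B) = P(A) · P(B).

If A and B are independent, then:

1. P(Aᶜ ∩ B) = P(B) − P(A ∩ B) = P(B) − P(A) · P(B)
= P(B)(1 − P(A)) = P(Aᶜ) · P(B),
so Aᶜ and B are independent.

2. P(A ∩ Bᶜ) = P(A) − P(A ∩ B) = P(A) − P(A) · P(B)
= P(A)(1 − P(B)) = P(A) · P(Bᶜ),
so A and Bᶜ are independent.

3. P(Aᶜ ∩ Bᶜ) = 1 − P(A ∪ B) = 1 − (P(A) + P(B) − P(A ∩ B))
= 1 − P(A) − P(B) + P(A) · P(B)
= (1 − P(A)) − P(B)(1 − P(A)) = (1 − P(A))(1 − P(B))
= P(Aᶜ) · P(Bᶜ),
so Aᶜ and Bᶜ are independent.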
Example
If A and B are independent, P(A|B) = 0.2, and P(B) = 0.3, find
1. P(Aᶜ ∩ B)
2. P(Aᶜ ∩ Bᶜ)

Solution
Since A and B are independent, P(A|B) = P(A) = 0.2, so
P(Aᶜ) = 1 − 0.2 = 0.8.

1. A and B are independent, so Aᶜ and B are independent:
P(Aᶜ ∩ B) = P(Aᶜ) · P(B) = 0.8 × 0.3 = 0.24

2. A and B are independent, so Aᶜ and Bᶜ are independent:
P(Aᶜ ∩ Bᶜ) = P(Aᶜ) · P(Bᶜ) = 0.8 × (1 − 0.3) = 0.8 × 0.7 = 0.56

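Example: A box contains red and blue marbles such that a marble drawn at
random is red with probability 2/5 and blue with probability 3/5. Find the
probability that, if two marbles are drawn at random with replacement,
(a) both are blue
(b) both are red
(c) one is red and one is blue.

Solution
R: red marble, B: blue marble. Because the draws are made with replacement,
the two draws are independent.

a. P{both are blue} = P(B ∩ B) = P(B) · P(B|B) = 3/5 · 3/5 = 9/25
b. P{both are red} = P(R ∩ R) = P(R) · P(R|R) = 2/5 · 2/5 = 4/25
c. P{one red and one blue} = P{red and blue, or blue and red}
= P(R ∩ B) + P(B ∩ R)
= P(R) · P(B|R) + P(B) · P(R|B)
= 2/5 · 3/5 + 3/5 · 2/5 = 12/25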

Example: Two marbles are drawn in succession from a box containing
red, white, blue, and orange marbles, replacement being made
after each drawing. The probabilities for a single draw are
P(R) = 10/75, P(W) = 30/75, P(B) = 20/75, P(O) = 15/75.
Find the probability that

(a) both are white:
P(W ∩ W) = P(W) · P(W) = 30/75 · 30/75 = 4/25

(b) the first is red and the second is white:
P(R ∩ W) = P(R) · P(W) = 10/75 · 30/75 = 4/75
Example A system composed of n separate components is said to be a parallel
system if it functions when at least one of the components functions . For such
a system, if component i, independent of other components, functions with
probability 𝑝𝑖 , i = 1, ... , n, what is the probability that the system functions?

Solution: Let Ai denote the event that component i functions. Then

P{system functions} = 1 − P{system does not function}
= 1 − P{all components do not function}
= 1 − P(A1ᶜ ∩ A2ᶜ ∩ … ∩ Anᶜ)
= 1 − ∏_{i=1}^{n} (1 − pi), by independence
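
The parallel-system formula, 1 minus the product of the failure probabilities,
translates into a one-line function. The function name parallel_reliability and
the component probabilities below are hypothetical values chosen for
illustration, and independence of the components is assumed as in the example.

```python
from math import prod

def parallel_reliability(p):
    """Probability that at least one of the independent components functions.

    p is a sequence of the component probabilities p_1, ..., p_n.
    """
    return 1 - prod(1 - p_i for p_i in p)

# Two components that each function with probability 0.5:
print(parallel_reliability([0.5, 0.5]))  # 0.75

# A component that always functions makes the system always function:
print(parallel_reliability([0.5, 1.0]))  # 1.0
```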

Law of Total Probability

If B1, B2, …, Bn are mutually exclusive, S = B1 ∪ B2 ∪ … ∪ Bn, and A is
defined on S, then

P(A) = ∑_{i=1}^{n} P(A|Bi) P(Bi)

Proof
A = (A ∩ B1) ∪ (A ∩ B2) ∪ … ∪ (A ∩ Bn)
Since (A ∩ B1), (A ∩ B2), …, (A ∩ Bn) are mutually exclusive,
P(A) = P(A ∩ B1) + P(A ∩ B2) + … + P(A ∩ Bn)
= P(B1) · P(A|B1) + P(B2) · P(A|B2) + … + P(Bn) · P(A|Bn)
Then
P(A) = ∑_{i=1}^{n} P(A|Bi) P(Bi)

Bayes' Formula

P(Bi|A) = P(Bi ∩ A)/P(A) = P(Bi) · P(A|Bi) / ∑_{j=1}^{n} P(A|Bj) P(Bj)

If we think of the events Bi as being possible "hypotheses" about some
subject matter, then Bayes' formula may be interpreted as showing us how
opinions about these hypotheses held before the experiment [that is, the
P(Bi), the prior probabilities] should be modified by the evidence of the
experiment to give P(Bi|A) (the posterior probabilities).

Example: In a community, P(disease) = P(D) = .005.

A test is developed with P(+|D) = .99 and a false-positive rate of
P(+|Dᶜ) = .01.
What is the probability P(D|+)?
Solution
P(+) = P(D) · P(+|D) + P(Dᶜ) · P(+|Dᶜ)
= .005 × .99 + .995 × .01 = .0149
P(D|+) = P(D) · P(+|D) / (P(D) · P(+|D) + P(Dᶜ) · P(+|Dᶜ))
= .00495/.0149 = .332
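
Bayes' formula for a two-hypothesis problem (disease or no disease) can be
sketched as a small function. The function name posterior and its parameter
names are hypothetical; the inputs are the values used in the solution above
(prior .005, P(+|D) = .99, and .01 for the probability of a positive test
without the disease).

```python
def posterior(prior, p_pos_given_d, p_pos_given_not_d):
    """P(D | +) from the prior P(D) and the two conditional test probabilities."""
    # Law of total probability for P(+).
    p_pos = prior * p_pos_given_d + (1 - prior) * p_pos_given_not_d
    # Bayes' formula.
    return prior * p_pos_given_d / p_pos

p = posterior(0.005, 0.99, 0.01)
print(round(p, 3))  # 0.332
```

Even with a test that is right 99% of the time for diseased people, the
posterior is only about one in three, because the disease itself is rare.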

Example: A box contains blue and red marbles, of which 3/5 are blue, while
another box contains blue and red marbles, of which 2/7 are blue. A marble
drawn at random from one of the boxes turns out to be blue. What is the
probability that it came from the first box?

Solution
I: first box, II: second box, B: the marble drawn is blue.

P(I) = 1/2, P(II) = 1/2
P(B|I) = 3/5, P(B|II) = 2/7

P(I|B) = P(I) · P(B|I) / (P(I) · P(B|I) + P(II) · P(B|II))
= (1/2 · 3/5)/(1/2 · 3/5 + 1/2 · 2/7) = 21/31

Exercises

1. From a box containing 5 black balls and 3 green balls, 2 balls are
drawn in succession, the first ball being replaced in the box before the
second draw is made.
(a) What is the probability that both balls are the same color?
(b) What is the probability that each color is represented?
2. In a certain population of women, 4% have had breast cancer, 25% are
smokers, and 2.5% are smokers and have had breast cancer. A woman
is selected at random from the population:
(a) If she is a smoker, what is the probability that she has also had breast
cancer?
(b) If she has not had breast cancer, what is the probability that she is a
smoker?
(c) What is the probability that she is neither a smoker nor has had breast
cancer?
3. Out of 5 mathematicians and 7 statisticians, a committee consisting of
2 mathematicians and 3 statisticians is to be formed. In how many ways can
this be done if
(a) Any mathematician and any statistician can be included.
(b) One particular statistician must be on the committee.
(c) Two particular mathematicians cannot be on the committee.
5. Three urns each contain white and black balls. An urn is selected at random
and a ball drawn at random from it is found to be white. Find the probability
that a given urn was selected.
6. A box contains 8 red, 3 white, and 9 blue balls. If 3 balls are drawn at random without
replacement, determine the probability that (a) all 3 are red, (b) all 3 are white, (c) 2 are
red and 1 is white, (d) at least 1 is white, (e) 1 of each color is drawn, (f) the balls are
drawn in the order red, white, blue.
7. Of 10 girls in the class, 3 have blue eyes. If two of the girls are chosen at random,
what is the probability that
(a) both have blue eyes,
(b) neither has blue eyes,
(c) at least one has blue eyes?
