
Prob & Stat

The document discusses key concepts in statistics including variables, populations and samples, descriptive statistics, probability, and comparing data sets. It covers topics such as qualitative and quantitative variables, discrete and continuous variables, measures of central tendency and variability, and comparing the center, spread, shape, and unusual features of data sets.

Uploaded by

Zion Brynn
Copyright
© © All Rights Reserved
PROBABILITY AND STATISTICS

MN/GL/GM/MR/PE/ES 361
BY

B. ODOI
INTRODUCTION
Statistics is a way to get information from data. Statistics is a discipline
which is concerned with:

• summarizing information to aid understanding

• drawing conclusions from data,

• estimating the present or predicting the future, and

• designing experiments and other data collection.

In making predictions, Statistics uses the concept of probability, which
models chance mathematically and enables calculations of chance in
complicated cases.
Why Statistics?

• The field of statistics deals with the collection, presentation, analysis,
and use of data to make decisions, solve problems, and design
products and processes. In simple terms, statistics is the science of
data.

• Many aspects of engineering practice involve working with data, so
knowledge of statistics is just as important to an engineer
as the other engineering sciences.

• Specifically, statistical techniques can be powerful aids in designing
new products and systems, improving existing designs, and designing,
developing, and improving production processes.
Objectivity of Statistics
• Statistical analysis provides objective ways of evaluating patterns of events
or patterns in our data by computing the probability of observing such
patterns by chance alone.

• Without the use of statistics, little can be learnt from most research
studies.

• It has become very desirable to understand and practice statistical
thinking because of the increasing use of statistics in so many areas of our
lives.

• This is important even if you do not use statistical methods directly.
Branches of Statistics
Descriptive statistics
• This is the branch of statistics that involves the organization, summarization,
and display of data. Two general techniques are used to accomplish this goal.

• Organize the entire set of scores into a table or a graph that allows researchers
(and others) to see the whole set of scores (summarizing data graphically).

• Compute one or two summary values (such as the average) that describe the
entire group (summarizing data numerically).

Inferential statistics
• This is the branch of statistics that involves using a sample to draw conclusions
about a population. A basic tool in the study of inferential statistics is
probability.
Variables
In statistics, a variable has two defining characteristics:
• A variable is an attribute that describes a person, place, thing, or idea.

• The value of the variable can "vary" from one entity to another. For example, a person's hair
color is a variable: it can be black for one person and brown for another.

Qualitative vs. Quantitative Variables

• Variables can be classified as qualitative (categorical) or quantitative (numeric).

• Qualitative variables take on values that are names or labels. Examples are the color of a ball
(e.g., red, green, blue) and the breed of a dog (e.g., collie, shepherd, terrier).

• Quantitative variables are numeric. They represent a measurable quantity. For example, when we
speak of the population of a city, we are talking about the number of people in the city, a
measurable attribute of the city. Therefore, population would be a quantitative variable.

• In algebraic equations, quantitative variables are represented by symbols (e.g., x, y, or z).



Discrete vs. Continuous Variables
Quantitative variables can be further classified as discrete or continuous.

• If a variable can take on any value between its minimum value and its maximum
value, it is called a continuous variable; otherwise, it is called a discrete variable.

Univariate vs. Bivariate Data
Statistical data are often classified according to the number of variables being
studied.

• Univariate data. When we conduct a study that looks at only one variable, we say
that we are working with univariate data.

• Bivariate data. When we conduct a study that examines the relationship
between two variables, we are working with bivariate data.
Populations and Samples
• The main difference between a population and sample has to do with how
observations are assigned to the data set.
• A population includes all of the elements from a set of data.

• A sample consists of one or more observations from the population.

• Depending on the sampling method, a sample can have fewer observations
than the population, the same number of observations, or more observations
(the last case is possible only when sampling with replacement).
More than one sample can be derived from the same population.

• A measurable characteristic of a population, such as a mean or standard deviation,
is called a parameter, but a measurable characteristic of a sample is called a
statistic.

Summarizing data graphically


Selected graphs for qualitative data
• Pie chart
• Bar Chart
• (Also frequency distribution)

Selected graphs for Numerical data


• Box plot
• Dot plot
• Stem-and-leaf
• Histogram
Types of graphs
[Figure: examples of a dot plot, bar chart, pie chart, histogram, and box plot]

Summary Statistics
Measure of Location
These provide an indication of the center of the distribution where most of the
scores tend to cluster.
There are three principal measures of central tendency:
• Mode
• Median
• Mean
Measure of Spread/ Variability
Variability is the measure of the spread in the data. The three common ones for
this concept are:
• Range
• Variance
• Standard deviation
How to Describe Data Patterns in Statistics
Graphic displays are useful for seeing patterns in data. Patterns in data are
commonly described in terms of center, spread, shape, and unusual features.

Center: Graphically, the center of a distribution is the point where about half
of the observations are on either side.

Spread: The spread of a distribution refers to the variability of the data.

• If the observations cover a wide range, the spread is larger.

• If the observations are clustered around a single value, the spread is
smaller.
Shape
The shape of a distribution is described by the following characteristics:

• Symmetry: When it is graphed, a symmetric distribution can be divided at the
center so that each half is a mirror image of the other.

• Number of peaks: Distributions can have few or many peaks.
Distributions with one clear peak are called unimodal, and distributions
with two clear peaks are called bimodal.

• When a symmetric distribution has a single peak at the center, it is
referred to as bell-shaped.
Skewness. When they are displayed graphically, some distributions have many
more observations on one side of the graph than the other.

• Distributions whose observations thin out into a long tail on the right (toward
higher values) are said to be skewed right; distributions whose observations
thin out into a long tail on the left (toward lower values) are said to be
skewed left.

Uniform. When the observations in a set of data are equally spread across the
range of the distribution, the distribution is called a uniform distribution.

• A uniform distribution has no clear peaks.

Unusual Features
Sometimes, statisticians refer to unusual features in a set of data. The two most
common unusual features are gaps and outliers.

Gaps. Gaps refer to areas of a distribution where there are no observations.

• A distribution has a gap when, for example, there are no observations in the
middle of the distribution.

Outliers. Sometimes, distributions are characterized by extreme values that differ
greatly from the other observations. These extreme values are called outliers.

• As a "rule of thumb", an extreme value is often considered to be an outlier if it
is at least 1.5 interquartile ranges below the first quartile (Q1), or at least 1.5
interquartile ranges above the third quartile (Q3).
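The 1.5 interquartile range rule of thumb can be sketched in Python as follows. This is only an illustration: the data values are made up, the function name `iqr_outliers` is hypothetical, and the quartiles are computed with the "inclusive" method of the standard `statistics` module, which is one of several conventions (other textbooks or packages may give slightly different fences).

```python
import statistics

def iqr_outliers(values):
    """Return values at least 1.5 IQRs below Q1 or above Q3."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    # The "inclusive" method is an assumption; quartile conventions differ.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v <= lower_fence or v >= upper_fence]

# Made-up scores, used only to illustrate the rule.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]
outliers = iqr_outliers(scores)
print(outliers)  # [45]
```

Here Q1 = 15 and Q3 = 18.75, so the fences are 9.375 and 24.375, and only the value 45 falls outside them.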
How to Compare Data Sets
Common graphical displays (e.g., dot plots, boxplots, stem plots, bar
charts) can be effective tools for comparing data from two or more
data sets.

Four Ways to Describe Data Sets
When you compare two or more data sets, focus on four features:

Center. Graphically, the center of a distribution is the point where
about half of the observations are on either side.
Spread. The spread of a distribution refers to the variability of the
data.

• If the observations cover a wide range, the spread is larger.

• If the observations are clustered around a single value, the spread is
smaller.

Shape. The shape of a distribution is described by symmetry,
skewness, number of peaks, etc.

Unusual features. Unusual features refer to gaps (areas of the
distribution where there are no observations) and outliers.
Introduction to Probability
Definitions
Experiment:
• An experiment is any process that generates a set of data or well-defined
outcomes.
Types of experiments
• Deterministic
• Random (or Chance)
• In deterministic experiments the observed results are not subject to
chance, while the outcomes of random experiments cannot be predicted with
certainty.

• A random experiment could be as simple as tossing a coin or die and
observing an outcome, or as complex as choosing 50 people from a population
and testing them for AIDS.
Trial: Each repetition of an experiment is called a trial. That is, a trial is
a single performance of an experiment.

Outcome: Each possible result of a trial of an experiment is called an
outcome.

• When each outcome of an experiment has the same chance of occurring
as the others, the outcomes are said to be equally likely.

For example
• the toss of a coin and the toss of a die yield the possible outcomes in the sets
{H, T} and {1, 2, 3, 4, 5, 6} respectively, and a football match yields
{win (W), loss (L), draw (D)}.
Sample Space:
• Sample space is the collection of all possible outcomes of a probability
experiment. We use the notation S for sample space.

• Each element or outcome of the experiment is called a sample point.

For example
• The results of two and three tosses of a coin give the following sample
spaces, respectively:
S = {HH, HT, TH, TT}
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

• A toss of a die and a coin simultaneously gives the sample space
S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}
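As a rough illustration, the sample spaces listed here can be generated mechanically. The sketch below uses `itertools.product` from the Python standard library; the variable names are chosen only for this example.

```python
from itertools import product

coin = "HT"          # outcomes of one coin toss
die = range(1, 7)    # outcomes of one die toss

# Every sequence of 2 (or 3) coin outcomes, joined into strings such as "HT".
two_tosses = ["".join(p) for p in product(coin, repeat=2)]
three_tosses = ["".join(p) for p in product(coin, repeat=3)]

# A coin and a die tossed together: pair every coin face with every die face.
coin_and_die = [f"{c}{d}" for c, d in product(coin, die)]

print(two_tosses)         # ['HH', 'HT', 'TH', 'TT']
print(len(three_tosses))  # 8
print(len(coin_and_die))  # 12
```

The sizes 4, 8 and 12 agree with the number of sample points in each sample space.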
Event: An event is a collection of one or more outcomes from an
experiment. That is, it is a subset of a sample space. It is denoted by a capital
letter.

For example
• The event of observing exactly one head (H) in three tosses of a coin,
A = {HTT, THT, TTH}
• The event of obtaining a total score of 8 on two tosses of a die,
B = {(2,6), (3,5), (4,4), (5,3), (6,2)}
Consider a newly married couple planning to have three children.

• The event of the family having exactly two girls is:
D = {BGG, GBG, GGB}
Tree Diagram: The tree diagram represents pictorially the outcomes of a
random experiment.

• The probability of an outcome which is a sequence of trials is represented by
a path of the tree.

For example:
• Consider a couple planning to have three children, assuming each child born
is equally likely to be a boy (B) or girl (G).

• A soccer team, on winning (WT) or losing (LT) a toss, can defend either post A
or B. It plays the match and either wins (W), draws (D) or loses (L). We illustrate
the experiment on a diagram as follows:
[Figure: tree diagram of the experiment]
Determination of Probability of an Event
• The probability of an event A, denoted P(A), gives the numerical measure of
the likelihood of the occurrence of event A, and is such that 0 ≤ P(A) ≤ 1.

• If P(A) = 0, the event A is said to be impossible. If P(A) = 1, A is said to
be certain.

• If A′ is the complement of the event A, then P(A′) = 1 – P(A), called the
probability that event A will not occur.

• There are three main schools of thought in defining and interpreting the
probability of an event: the Classical Definition, the Empirical Concept and the
Subjective Approach. The first two are referred to as the Objective Approach.
• The Classical Definition: This is based on the assumption that the outcomes of
an experiment are equally likely.

For example:
• if an experiment can lead to n mutually exclusive and equally likely outcomes,
then the probability of the event A is defined by

$$P(A) = \frac{n(A)}{n(S)} = \frac{\text{Number of successful outcomes}}{\text{Number of possible outcomes}}$$

• The classical definition of probability of event A is referred to as a priori
probability because it is determined before any experiment is performed to
observe the outcomes of event A.
The Empirical Concept: This concept uses the relative frequencies of past
occurrences to develop probabilities for the future.
For example:
• The probability of an event A happening in future is determined by observing
what fraction of the time similar events happened in the past. That is,

$$P(A) = \frac{\text{Number of times } A \text{ occurred in the past}}{\text{Total number of observations}}$$

• The relative frequency of the occurrence of the event A used to estimate P(A)
becomes more accurate as the number of trials increases.

• The relative frequency approach of defining P(A) is sometimes called a posteriori
probability because P(A) is determined only after event A is observed.
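The relative frequency idea can be illustrated with a small simulation. The sketch below is not part of the original notes: the helper names (`relative_frequency`, `toss_die`, `is_six`) are hypothetical, and a pseudo-random generator with a fixed seed stands in for "past observations" of a fair die.

```python
import random

def relative_frequency(event, experiment, trials, seed=0):
    """Estimate P(event) as (number of times event occurred) / (number of trials)."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    hits = sum(1 for _ in range(trials) if event(experiment(rng)))
    return hits / trials

def toss_die(rng):
    """One trial of the random experiment: toss a fair six-sided die."""
    return rng.randint(1, 6)

def is_six(outcome):
    """The event A: the die shows a 6."""
    return outcome == 6

estimate_small = relative_frequency(is_six, toss_die, trials=100)
estimate_large = relative_frequency(is_six, toss_die, trials=100_000)
print(estimate_small, estimate_large, 1 / 6)
```

With many trials the estimate should lie close to the classical value 1/6, while an estimate from only 100 trials can be noticeably off.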
The Subjective Definition: The subjective concept of probability is based on the
degree of belief held in light of the evidence available.

• The probability of an event A may therefore be assessed through experience,
intuition, judgment or expertise.

For example
• determining the probability that a disease will be cured or that it is going to
rain today.

• This approach to probability has been developed relatively recently and is
related to Bayesian Decision Analysis.

Example 1:
Consider the problem of a couple planning to have three children, assuming each
child born is equally likely to be a boy (B) or a girl (G).
(a) List the possible outcomes in this experiment
(b) What is the probability of the couple having exactly two girls?

Solution:
(a) The sample space for this experiment is
S = {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG}

(b) Let A be the event of the couple having exactly two girls. Then,
A = {BGG, GBG, GGB}

$$P(A) = \frac{n(A)}{n(S)} = \frac{3}{8}$$
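The counting in this example can be cross-checked by enumeration. The following is a minimal sketch, assuming (as the example does) that all birth sequences are equally likely; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All sequences of three births, each child being a boy (B) or a girl (G).
sample_space = ["".join(p) for p in product("BG", repeat=3)]

# Event A: exactly two of the three children are girls.
exactly_two_girls = [s for s in sample_space if s.count("G") == 2]

# Classical definition: P(A) = n(A) / n(S).
probability = Fraction(len(exactly_two_girls), len(sample_space))

print(len(sample_space))   # 8
print(exactly_two_girls)   # ['BGG', 'GBG', 'GGB']
print(probability)         # 3/8
```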
Example 2:
(a) Suppose a card is randomly selected from a pack of 52 playing cards.
(i) What is the probability that it is a “Heart”?
(ii) What is the probability that the card bears the number 5 or a picture
of a queen?
(b) A box contains 4 red, 2 black and 3 white balls. What is the probability of
drawing a red ball?
Solution:
Let the sample space be the set S = {playing cards}, and let A = {Heart cards}, B = {Cards
numbered 5} and Q = {Cards with a picture of a queen}.

Then n(S) = 52, n(A) = 13, n(B) = 4 and n(Q) = 4

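(a)
(i)
$$P(A) = \frac{n(A)}{n(S)} = \frac{13}{52} = \frac{1}{4}$$

(ii) A card cannot be both a 5 and a queen, so B and Q have no common outcome and
$$P(B \text{ or } Q) = P(B) + P(Q) = \frac{n(B)}{n(S)} + \frac{n(Q)}{n(S)} = \frac{4}{52} + \frac{4}{52} = \frac{2}{13}$$

(b) The sample space S consists of the 9 balls (4 red, 2 black, 3 white); let R = {red balls}. Then
$$P(R) = \frac{n(R)}{n(S)} = \frac{4}{9}$$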
TRY:
A die is tossed twice. List all the outcomes in each of the following events and
compute the probability of each event.
(a) The sum of the scores is less than 4
(b) Each toss results in the same score
(c) The sum of scores on both tosses is a prime number
(d) The product of the scores is at least 20
Solution:
The sample space for the experiment is the set of ordered pairs (m, n), where
m, n each take the values 1, 2, 3, 4, 5 and 6. Thus
S = {(1, 1), (1, 2), (1, 3), . . . , (6, 6)}, where n(S) = 36

(a) A = {(1, 1), (1, 2), (2, 1)}
$$P(A) = \frac{3}{36} = \frac{1}{12}$$

(b) B = {each toss results in the same score}
= {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}
$$P(B) = \frac{6}{36} = \frac{1}{6}$$

(c) D = {sum of scores on both tosses is prime}
= {(1, 1), (1, 2), (1, 4), (1, 6), (2, 1), (2, 3), (2, 5), (3, 2), (3, 4), (4, 1), (4, 3), (5, 2), (5, 6),
(6, 1), (6, 5)}
$$P(D) = \frac{15}{36} = \frac{5}{12}$$

(d) E = {product of the scores is at least 20}
= {(4, 5), (4, 6), (5, 4), (5, 5), (5, 6), (6, 4), (6, 5), (6, 6)}
$$P(E) = \frac{8}{36} = \frac{2}{9}$$
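The four probabilities in this exercise can be cross-checked by brute-force enumeration of the 36 ordered pairs. The sketch below is only a check under the equally-likely assumption; the helper names `prob` and `is_prime` are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (m, n) for two tosses of a die.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability n(event) / n(S) for a predicate on outcomes."""
    favourable = [outcome for outcome in S if event(outcome)]
    return Fraction(len(favourable), len(S))

def is_prime(k):
    """True if k is a prime number (trial division is enough for k <= 12)."""
    return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))

p_a = prob(lambda o: o[0] + o[1] < 4)          # sum of scores less than 4
p_b = prob(lambda o: o[0] == o[1])             # same score on both tosses
p_d = prob(lambda o: is_prime(o[0] + o[1]))    # sum of scores is prime
p_e = prob(lambda o: o[0] * o[1] >= 20)        # product of scores at least 20

print(p_a, p_b, p_d, p_e)  # 1/12 1/6 5/12 2/9
```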
Probability of Compound Events
Two or more events are combined to form a single event using the set operations ∪
and ∩. The event:

• (A ∪ B) occurs if either A or B or both occur.

• (A ∩ B) occurs if both A and B occur.

Definitions:
Mutually Exclusive Events: Two or more events which have no common outcome(s) (i.e. never
occur at the same time) are said to be mutually exclusive.

• If A and B are mutually exclusive events of an experiment, then A ∩ B = ∅ and
P(A ∪ B) = P(A) + P(B), since P(A ∩ B) = 0
Independent Events: Two or more events are said to be independent if the
probability of occurrence of one is not influenced by the occurrence or
non-occurrence of the other(s).

• Mathematically, the two events A and B are said to be independent if and
only if P(A ∩ B) = P(A) · P(B)

• However, if A and B are such that P(A ∩ B) = P(A) · P(B/A) with P(B/A) ≠ P(B),
they are said to be dependent.

Conditional Probability: Let A and B be two events in the sample space S with
P(B) > 0.

• The probability that an event A occurs given that event B has already
occurred, denoted P(A/B) (also written P(A|B)), is called the conditional
probability of A given B.

The conditional probability of A given B is defined as

$$P(A/B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$

In particular, if S is a finite equiprobable space, then

$$P(A \cap B) = \frac{n(A \cap B)}{n(S)}, \quad P(B) = \frac{n(B)}{n(S)} \quad \text{and} \quad P(A/B) = \frac{n(A \cap B)}{n(B)}$$

Exhaustive Events: Two or more events defined on the same sample space are
said to be exhaustive if their union is equal to the sample space S.
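For a finite equiprobable space, the counting form of conditional probability is easy to try out. The sketch below uses two tosses of a die with two events picked purely for illustration (they do not come from the notes): A = "the total score is 8" and B = "the first score is even". The function name `conditional` is likewise hypothetical.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # two tosses of a die
A = {o for o in S if o[0] + o[1] == 8}    # illustrative event: total score is 8
B = {o for o in S if o[0] % 2 == 0}       # illustrative event: first score is even

def conditional(a, b):
    """P(a/b) = n(a and b) / n(b) on an equiprobable space; needs n(b) > 0."""
    if not b:
        raise ValueError("conditioning event must have positive probability")
    return Fraction(len(a & b), len(b))

p_a = Fraction(len(A), len(S))            # 5/36
p_b = Fraction(len(B), len(S))            # 1/2
p_a_and_b = Fraction(len(A & B), len(S))  # 1/12
p_a_given_b = conditional(A, B)           # 3/18 = 1/6

# A and B are independent exactly when P(A and B) = P(A) P(B).
independent = p_a_and_b == p_a * p_b
print(p_a_given_b, independent)  # 1/6 False
```

Because P(A/B) = 1/6 differs from P(A) = 5/36, knowing that B occurred changes the probability of A, so these two events are not independent.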
EXAMPLE 1
(a) In a certain population of women, 40% have had breast cancer, 20%
are smokers and 13% are smokers and have had breast cancer. If a woman is
selected at random from the population, what is the probability that she has had
breast cancer, smokes or both?

(b) Let A and B be events such that P(A) = 0.6, P(B) = 0.5 and P(A ∪ B) = 0.8
i. Find P(A/B)
ii. Are A and B independent?

Solution:
(a) Let B be the event of women with breast cancer and W the event of
women who smoke. Then,

P(B) = 0.4, P(W) = 0.2 and P(B ∩ W) = 0.13

$$P(B \cup W) = P(B) + P(W) - P(B \cap W) = 0.4 + 0.20 - 0.13 = 0.47$$

(b) Given that P(A) = 0.6, P(B) = 0.5 and P(A ∪ B) = 0.8

i.
$$P(A \cap B) = P(A) + P(B) - P(A \cup B) = 0.6 + 0.5 - 0.8 = 0.3$$

$$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{0.3}{0.5} = \frac{3}{5} = 0.6, \quad P(B) > 0$$

ii. A and B are independent if P(A) · P(B) = P(A ∩ B)

$$P(A) \cdot P(B) = 0.6 \times 0.5 = 0.3 = P(A \cap B)$$

which means that A and B are independent.

Exercise/Assignment
Suppose that of all individuals buying a certain digital camera, 60% include an
optional memory card in their purchase, 40% include an extra battery, and 30%
include both a card and battery. Given that the selected individual purchased
an extra battery, what is the probability that an optional card was also
purchased?
Bayes’ Rule
• The power of Bayes’ rule is that in many situations where we want to compute
P(A|B) it turns out that it is difficult to do so directly, yet we might have direct
information about P(B|A). Bayes’ rule enables us to compute P(A|B) in terms
of P(B|A).

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Bayes’ Theorem
• Let A and A^c constitute a partition of the sample space S such that P(A) > 0
and P(A^c) > 0. Then for any event B in S such that P(B) > 0,

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}$$

• The denominator P(B) in the equation can be computed as

$$\begin{aligned} P(B) &= P\big[(A \cap B) \cup (A^c \cap B)\big] \\ &= P(A \cap B) + P(A^c \cap B) \\ &= P(A)\,P(B \mid A) + P(A^c)\,P(B \mid A^c) \end{aligned}$$
Example
A paint-store chain produces and sells latex and semigloss paint. Based on
long-range sales, the probability that a customer will purchase latex paint is 0.75. Of
those that purchase latex paint, 60% also purchase rollers. But only 30% of
semigloss paint buyers purchase rollers. A randomly selected buyer purchases a
roller and a can of paint. What is the probability that the paint is latex?
Solution
Let L = {The customer purchases latex paint}, S = {The customer purchases
semigloss paint} and R = {The customer purchases a roller}. Then P(L) = 0.75,
P(S) = 0.25, P(R|L) = 0.6 and P(R|S) = 0.3.

P(R) = P(R|L)P(L) + P(R|S)P(S) = 0.6 × 0.75 + 0.3 × 0.25 = 0.525

$$P(L \mid R) = \frac{P(L \cap R)}{P(R)} = \frac{P(R \mid L)\,P(L)}{P(R)} = \frac{0.6 \times 0.75}{0.6 \times 0.75 + 0.3 \times 0.25} \approx 0.857$$
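The same calculation can be written as a small function for the two-event partition case of Bayes' theorem. This is a sketch for illustration; the function name `bayes_posterior` and its parameter names are not from the notes.

```python
def bayes_posterior(prior, likelihood, likelihood_complement):
    """Return P(A|B) from P(A), P(B|A) and P(B|A^c) for the partition {A, A^c}."""
    # Total probability: P(B) = P(B|A) P(A) + P(B|A^c) P(A^c).
    evidence = likelihood * prior + likelihood_complement * (1 - prior)
    if evidence == 0:
        raise ValueError("P(B) must be positive")
    return likelihood * prior / evidence

# Paint example: A = latex buyer, B = buys a roller.
p_latex_given_roller = bayes_posterior(
    prior=0.75,                  # P(L)
    likelihood=0.6,              # P(R|L)
    likelihood_complement=0.3,   # P(R|S)
)
print(round(p_latex_given_roller, 3))  # 0.857
```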
Axioms of Probability
A.1: For every event A, 0 ≤ P(A) ≤ 1

A.2: P(S) = 1

A.3: If A and B are mutually exclusive events, i.e. A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)

A.4: If A1, A2, A3, ..., An is a sequence of n mutually exclusive
events, then

$$P(A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + P(A_3) + \cdots + P(A_n)$$

Theorems of probability

Theorem 1: If ∅ is the empty set, then P(∅) = 0

Proof:
Let A be any event. Then A and ∅ are mutually exclusive and A = A ∪ ∅. Then by A.3,

$$P(A) = P(A \cup \emptyset) = P(A) + P(\emptyset), \quad \text{and so} \quad P(\emptyset) = 0$$

Theorem 2: If A′ is the complement of an event A, then P(A′) = 1 − P(A)

Proof:
$$\begin{aligned} S &= A \cup A' \\ P(S) &= P(A) + P(A') && \text{by A.3, since } A \cap A' = \emptyset \\ 1 &= P(A) + P(A') && \text{by A.2} \\ P(A') &= 1 - P(A) \end{aligned}$$
Some Rules of Probability
(a) The Addition Rule:
Let A1, A2, A3, ..., An be events of the sample space S. Then

(i)
$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$$

(ii)
$$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_1 \cap A_3) - P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3)$$

If the events A1, A2, A3, ..., An are mutually exclusive, then

(i)
$$P(A_1 \cup A_2) = P(A_1) + P(A_2)$$

(ii)
$$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3)$$

(iii)
$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + P(A_3) + \cdots + P(A_n)$$
(b) The Multiplication Theorem:

If A1, A2, A3, ..., An are events of the same sample space S, then

(i)
$$P(A_1 \cap A_2) = P(A_1) \cdot P(A_2/A_1)$$

(ii)
$$P(A_1 \cap A_2 \cap A_3) = P(A_1) \cdot P(A_2/A_1) \cdot P(A_3/A_1 \cap A_2)$$

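As an illustration of the multiplication theorem, consider drawing two cards in succession, without replacement, from a pack of 52 and asking for the probability that both are aces. This example is not taken from the notes; the sketch below compares the rule P(A1 ∩ A2) = P(A1) · P(A2/A1) with a direct count over all ordered draws.

```python
from fractions import Fraction
from itertools import permutations

# A 52-card pack as (rank, suit) pairs; rank 1 stands for an ace.
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]

# Multiplication theorem: P(both aces) = P(first ace) * P(second ace / first ace).
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # 3 aces left among 51 cards
p_both_by_rule = p_first_ace * p_second_ace_given_first

# Direct count over all 52 * 51 ordered draws without replacement.
ordered_draws = list(permutations(deck, 2))
both_aces = [d for d in ordered_draws if d[0][0] == 1 and d[1][0] == 1]
p_both_by_counting = Fraction(len(both_aces), len(ordered_draws))

print(p_both_by_rule, p_both_by_counting)  # 1/221 1/221
```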
Application of Counting Techniques

• The classical definition of the probability of an event A, P(A), requires
knowledge of the number of outcomes in A and the total number of
possible outcomes of the experiment, n(S).

• These outcomes can be found by listing them explicitly, which may
be impractical if there are too many.

• Counting techniques may be used to determine the number of
outcomes and compute P(A).
The Multiplication Principle/Basic Counting Principle

• It states that if an operation can be performed in n1 ways, a
second operation can be performed in n2 ways, and so on for the kth
operation, which can be performed in nk ways, then the combined
experiment or operations can be performed in

$$n_1 \times n_2 \times n_3 \times \cdots \times n_k$$

ways.

• Application of the multiplication principle results in the other two
counting techniques, Permutation and Combination, used to find
the number of possible ways when a fixed number of items are to
be picked from a lot without replacement.
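The multiplication principle can be checked on a small case by listing every combined way and counting. In the sketch below the three stages and their options are made up for illustration (n1 = 3, n2 = 2, n3 = 2).

```python
from itertools import product
from math import prod

# Hypothetical three-stage operation; the option lists are invented.
stage_options = [
    ["site A", "site B", "site C"],   # n1 = 3 ways
    ["shallow", "deep"],              # n2 = 2 ways
    ["method X", "method Y"],         # n3 = 2 ways
]

# Multiplication principle: n1 * n2 * n3.
count_by_rule = prod(len(options) for options in stage_options)

# Explicit listing of every combined way of performing the stages.
all_combined = list(product(*stage_options))

print(count_by_rule, len(all_combined))  # 12 12
```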
Permutation of Objects
• An ordered arrangement of objects is called a permutation.
For example:
• the possible permutations of the letters A, B and C are as follows: ABC, ACB,
BAC, BCA, CAB and CBA.

Definitions:
• the number of permutations of n distinct objects, taken all together is:

$${}^{n}P_{n} = n! = n(n-1)(n-2)(n-3) \cdots 3 \cdot 2 \cdot 1$$

• the number of permutations of n distinct objects, taken k at a time is:

$${}^{n}P_{k} \text{ or } P(n, k) = \frac{n!}{(n-k)!}, \quad \text{where } k \le n$$
• the number of permutations of n objects consisting of k groups, in
which n1 of the first group are alike, n2 of the second group are
alike and so on for the kth group with nk objects which are
alike is:

$$\frac{n!}{n_1!\, n_2!\, n_3! \cdots n_k!}$$

• where
$$n_1 + n_2 + n_3 + \cdots + n_k = n$$

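The permutation formulas can be tried out with the Python standard library. In this sketch `math.factorial` and `math.perm` give n! and n!/(n − k)!, while `multiset_permutations` is a hypothetical helper written here for the "alike objects" formula; the word STATISTICS is just an example input.

```python
from collections import Counter
from itertools import permutations
from math import factorial, perm

# All permutations of the letters A, B, C: 3! = 6 arrangements.
listed = ["".join(p) for p in permutations("ABC")]
print(listed)        # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(factorial(3))  # 6

# Permutations of n = 5 distinct objects taken k = 2 at a time: 5!/(5-2)!.
print(perm(5, 2))    # 20

def multiset_permutations(objects):
    """n! / (n1! n2! ... nk!) for objects in which some items are alike."""
    result = factorial(len(objects))
    for group_size in Counter(objects).values():
        result //= factorial(group_size)  # divide out each group of alike items
    return result

# STATISTICS has 10 letters: S x3, T x3, I x2, A x1, C x1.
print(multiset_permutations("STATISTICS"))  # 50400
```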
Circular Permutations: Permutations that occur when objects are arranged in a
circle are called circular permutations.

• The number of ways of arranging n different objects in a circle is given by:

$$(n-1)! = \frac{n!}{n}$$

Combination of Objects: A Combination is a selection of objects in which the
order of selection does not matter.

Definition: The number of ways in which k objects can be selected from n
distinct objects, irrespective of their order, is defined by:

$${}^{n}C_{k} \text{ or } \binom{n}{k} = \frac{n!}{(n-k)!\,k!} = \frac{{}^{n}P_{k}}{k!}, \quad \text{where } k \le n$$
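A short check of the circular permutation and combination formulas is sketched below. `math.comb` and `math.perm` are standard library functions; `count_circular_brute` is a hypothetical helper that counts arrangements which differ by more than a rotation, and the values n = 5, k = 3 are arbitrary.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

n, k = 5, 3

# Combinations: n!/((n-k)! k!), which also equals nPk / k!.
by_formula = factorial(n) // (factorial(n - k) * factorial(k))
by_perm = perm(n, k) // factorial(k)
by_listing = len(list(combinations(range(n), k)))
print(comb(n, k), by_formula, by_perm, by_listing)  # 10 10 10 10

def count_circular_brute(items):
    """Count arrangements of distinct items in a circle, ignoring rotations."""
    seen = set()
    for p in permutations(items):
        start = p.index(items[0])
        # Rotate so that the first item always leads; rotations then coincide.
        seen.add(p[start:] + p[:start])
    return len(seen)

# Circular permutations of 4 distinct objects: (4 - 1)! = 6.
print(count_circular_brute("ABCD"), factorial(4 - 1))  # 6 6
```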
THANK YOU
