Data Analytics With Python Lecture 2
Subjective Probability
• Comes from a person’s intuition or reasoning
• Subjective
– different individuals may (correctly) assign different numeric probabilities to the same event
• Degree of belief
• Useful for unique (single-trial) experiments
– New product introduction
– Initial public offering of common stock
– Site selection decisions
– Sporting events
Probability - Terminology
• Experiment
• Event
• Elementary Events
• Sample Space
• Unions and Intersections
• Mutually Exclusive Events
• Independent Events
• Collectively Exhaustive Events
• Complementary Events
Experiment, Trial, Elementary Event, Event
• Experiment: a process that produces outcomes
– More than one possible outcome
– Only one outcome per trial
• Trial: one repetition of the process
• Elementary Event: cannot be decomposed or broken down into other events
• Event: an outcome of an experiment
– may be an elementary event, or
– may be an aggregate of elementary events
– usually represented by an uppercase letter, e.g., A, E1
An Example Experiment
• Experiment: randomly select, without replacement, two
families from the residents of Tiny Town
• Elementary Event: the sample includes families A and C
• Event: each family in the sample has children in the
household
• Event: the sample families own a total of four automobiles
Sample Space
• The set of all elementary events for an experiment
• Methods for describing a sample space
– roster or listing
– tree diagram
– set builder notation
– Venn diagram
Sample Space: Roster Example
• Experiment: randomly select, without
replacement, two families from the
residents of Tiny Town
• Each ordered pair in the sample space
is an elementary event, for example,
(D,C)
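A roster can also be generated programmatically. The minimal sketch below assumes, hypothetically, that Tiny Town has five families labeled A through E (the actual family list is not reproduced in these notes). `itertools.permutations` yields every ordered pair of distinct families, which matches selecting two families without replacement while recording the order.

```python
from itertools import permutations

# Hypothetical family labels; the actual Tiny Town roster is not given here.
families = ["A", "B", "C", "D", "E"]

# Selecting two families without replacement, keeping order:
# each ordered pair of distinct families is one elementary event.
sample_space = list(permutations(families, 2))

print(len(sample_space))           # 20 ordered pairs (5 * 4)
print(("D", "C") in sample_space)  # True: the elementary event (D,C)
```

Because sampling is without replacement, pairs such as (C,C) never appear in the roster.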
Problem
• A company conducted a survey for the American Society of Interior Designers in which workers
were asked which changes in office design would increase productivity.
• Respondents were allowed to select more than one type of design change.
Problem
• If one of the survey respondents was randomly selected and asked what office design changes would increase worker productivity, what is the probability that this person would select reducing noise or more storage space?
Solution
• Let N represent the event “reducing noise.”
• Let S represent the event “more storage/filing space.”
• The probability of a person responding with N or S is a union probability, P(N ∪ S), which is found with the general law of addition: P(N ∪ S) = P(N) + P(S) − P(N ∩ S).
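The general law of addition subtracts the joint probability so that respondents who chose both changes are not counted twice. Because the survey's percentages are not reproduced in these notes, the minimal sketch below illustrates the arithmetic with the used-car-lot figures from the conditional probability example in this lecture (70% with AC, 40% with a CD player, 20% with both). The helper name `union_probability` is hypothetical.

```python
from fractions import Fraction

def union_probability(p_a, p_b, p_a_and_b):
    """General law of addition: P(A or B) = P(A) + P(B) - P(A and B)."""
    # Subtract the joint probability so the overlap is counted only once.
    return p_a + p_b - p_a_and_b

# Illustrative inputs borrowed from the used-car-lot example,
# not from the office-design survey.
p_ac = Fraction(70, 100)    # P(AC)
p_cd = Fraction(40, 100)    # P(CD)
p_both = Fraction(20, 100)  # P(AC and CD)

print(union_probability(p_ac, p_cd, p_both))  # 9/10
```

For mutually exclusive events the joint term is zero, and the formula reduces to P(A) + P(B).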
Problem
• A company's data reveal that 155 employees worked in one of four types of positions.
• Shown here again is the raw values matrix (also called a contingency table), which breaks these employees down by type of position and by sex, with frequency counts for each category and for the subtotals and totals.
Problem
• Shown here are the raw values matrix and corresponding probability matrix for the results of a
national survey of 200 executives who were asked to identify the geographic locale of their company
and their company’s industry type.
• The executives were only allowed to select one locale and one industry type.
Questions
a. What is the probability that the
respondent is from the Midwest (F)?
b. What is the probability that the
respondent is from the communications
industry (C) or from the Northeast (D)?
c. What is the probability that the respondent
is from the Southeast (E) or from the finance
industry (A)?
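Question (a) is a marginal probability (row or column total divided by the grand total), while (b) and (c) are union probabilities that combine a row, a column, and their shared cell. The survey's counts are not reproduced in these notes, so the minimal sketch below uses a small made-up table with two industries and two locales (hypothetical counts, not the survey results) stored as nested dictionaries. The helper names `marginal_row`, `marginal_col`, and `union_row_col` are hypothetical.

```python
from fractions import Fraction

# Hypothetical raw values matrix: rows are industry types, columns are locales.
# These counts are invented for illustration only.
table = {
    "A": {"D": 10, "E": 20},  # finance
    "C": {"D": 30, "E": 40},  # communications
}

grand_total = sum(sum(row.values()) for row in table.values())

def marginal_row(r):
    """P(row category) = row total / grand total."""
    return Fraction(sum(table[r].values()), grand_total)

def marginal_col(c):
    """P(column category) = column total / grand total."""
    return Fraction(sum(row[c] for row in table.values()), grand_total)

def union_row_col(r, c):
    """P(r or c) = P(r) + P(c) - P(r and c), with the joint cell from the table."""
    joint = Fraction(table[r][c], grand_total)
    return marginal_row(r) + marginal_col(c) - joint

print(marginal_col("E"))        # 3/5
print(union_row_col("C", "D"))  # 4/5
print(union_row_col("A", "E"))  # 7/10
```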
Law of Multiplication
• P(X ∩ Y) = P(X) · P(Y | X) = P(Y) · P(X | Y)
Computing Conditional Probability
• P(X | Y) = P(X ∩ Y) / P(Y), provided P(Y) > 0
• Of the cars on a used car lot, 70%
have air conditioning (AC) and 40%
have a CD player (CD). 20% of the
cars have both.
• What is the probability that a car
has a CD player, given that it has
AC?
• We want to find P(CD | AC).
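Applying P(CD | AC) = P(CD ∩ AC) / P(AC) to the lot's figures gives 0.20 / 0.70 = 2/7 ≈ 0.286. The short sketch below reproduces that arithmetic with exact fractions; the helper name `conditional_probability` is hypothetical.

```python
from fractions import Fraction

def conditional_probability(p_joint, p_given):
    """P(X | Y) = P(X and Y) / P(Y); requires P(Y) > 0."""
    if p_given == 0:
        raise ValueError("P(Y) must be positive")
    return p_joint / p_given

p_ac = Fraction(70, 100)         # P(AC)
p_cd_and_ac = Fraction(20, 100)  # P(CD and AC)

p_cd_given_ac = conditional_probability(p_cd_and_ac, p_ac)
print(p_cd_given_ac)  # 2/7
```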
Independent Events
• If X and Y are independent events, the occurrence of Y does not affect the probability of X
occurring.
• If X and Y are independent events, the occurrence of X does not affect the probability of Y
occurring.
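Equivalently, X and Y are independent exactly when P(X ∩ Y) = P(X) · P(Y). As a minimal sketch, the check below applies that test to the used-car-lot figures from the conditional probability example: P(AC) · P(CD) = 0.70 × 0.40 = 0.28, which differs from P(AC ∩ CD) = 0.20, so having AC and having a CD player are not independent on that lot. The helper name `are_independent` is hypothetical.

```python
from fractions import Fraction

def are_independent(p_x, p_y, p_x_and_y):
    """X and Y are independent iff P(X and Y) == P(X) * P(Y)."""
    # Fractions keep the comparison exact; with floats, use math.isclose instead.
    return p_x_and_y == p_x * p_y

p_ac = Fraction(70, 100)
p_cd = Fraction(40, 100)
p_both = Fraction(20, 100)

print(p_ac * p_cd)                          # 7/25, i.e. 0.28
print(are_independent(p_ac, p_cd, p_both))  # False
```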
Poisson Distribution
• Describes discrete occurrences over a continuum or interval
• A discrete distribution
• Describes rare events
• Each occurrence is independent of any other occurrence.
• The number of occurrences in each interval can vary from zero to infinity.
• The expected number of occurrences must hold constant throughout the experiment.
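The Poisson probability of exactly x occurrences in an interval with mean rate λ is P(x) = λ^x e^(−λ) / x!. A minimal standard-library sketch follows; the rate λ = 3.2 arrivals per interval is a hypothetical value chosen for illustration. SciPy users can obtain the same values from `scipy.stats.poisson.pmf`.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = lam**x * exp(-lam) / x! for x = 0, 1, 2, ..."""
    if x < 0 or lam <= 0:
        raise ValueError("x must be >= 0 and lam must be > 0")
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 3.2  # hypothetical mean number of arrivals per interval

# Probability of each count from 0 to 5 in one interval.
for x in range(6):
    print(x, round(poisson_pmf(x, lam), 4))

# P(X <= 2) is the sum of the individual probabilities.
p_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))
print(round(p_at_most_2, 4))
```

Because λ stays fixed for every interval, the same function answers questions for any interval of the same length; a longer interval needs a proportionally scaled λ.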
Poisson Distribution: Applications
• Arrivals at queuing systems
– airports -- people, airplanes, automobiles, baggage
– banks -- people, automobiles, loan applications
– computer file servers -- read and write operations
• Defects in manufactured goods
– number of defects per 1,000 feet of extruded copper wire
– number of blemishes per square foot of painted surface
– number of errors per typed page
The Hypergeometric Distribution
• The binomial distribution is applicable when
selecting from a finite population with replacement
or from an infinite population without replacement.
• The hypergeometric distribution is applicable when
selecting from a finite population without
replacement.
Hypergeometric Distribution
• Sampling without replacement from a finite
population
• The number of objects in the population is denoted
N.
• Each trial has exactly two possible outcomes,
success and failure.
• Trials are not independent
• X is the number of successes in the n trials (the sample size)
• The binomial is an acceptable approximation if n < N/10, that is, if the sample is smaller than one tenth of the population; otherwise it is not.
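With N objects in the population, of which A are successes (A is notation introduced here, not taken from the slide), the probability of x successes in a sample of n drawn without replacement is P(x) = C(A, x) · C(N − A, n − x) / C(N, n). The minimal sketch below uses `math.comb` (Python 3.8+) and compares each value with the binomial probability that uses p = A/N; the population figures are hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, N, A, n):
    """P(X = x): x successes in n draws without replacement from N objects, A of them successes."""
    # comb(k, m) returns 0 when m > k, so impossible counts get probability 0.
    return Fraction(comb(A, x) * comb(N - A, n - x), comb(N, n))

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical population: N = 10 objects, A = 4 successes, sample of n = 3.
N, A, n = 10, 4, 3
for x in range(n + 1):
    hyper = hypergeometric_pmf(x, N, A, n)
    binom = binomial_pmf(x, n, Fraction(A, N))
    print(x, float(hyper), float(binom))
```

Here n = 3 is not less than N/10 = 1, so by the rule above the binomial is not expected to be a close approximation; increasing N while keeping A/N fixed brings the two columns together.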
Continuous Probability Distributions
• A continuous random variable is a variable that can assume any value on a continuum (can assume
an uncountable number of values)
– thickness of an item
– time required to complete a task
– temperature of a solution
– height
• These can potentially take on any value, depending only on the ability to measure precisely and
accurately.
Continuous Distributions
• Uniform
• Normal
• Exponential
The Uniform Distribution
• The uniform distribution is a probability distribution whose density is constant over an interval [a, b], so every subinterval of the same length within [a, b] has the same probability
• Because of its shape it is also called a rectangular distribution
Example: Uniform Distribution
• Consider the random variable x representing the
flight time of an airplane traveling from Delhi to
Mumbai.
• Suppose the flight time can be any value in the
interval from 120 minutes to 140 minutes.
• Because the random variable x can assume any
value in that interval, x is a continuous rather than a
discrete random variable
Example: Uniform Distribution (continued)
Uniform Probability Distribution for Flight Time
• Let us assume that sufficient actual flight
data are available to conclude that the
probability of a flight time within any 1-
minute interval is the same as the probability
of a flight time within any other 1-minute
interval contained in the larger interval from
120 to 140 minutes.
• With every 1-minute interval being equally
likely, the random variable x is said to have a
uniform probability distribution.
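For a uniform distribution on [a, b], the density is f(x) = 1/(b − a) for a ≤ x ≤ b, the mean is (a + b)/2, and the probability of a subinterval [x1, x2] is (x2 − x1)/(b − a). With a = 120 and b = 140 minutes, the density is 1/20 = 0.05 per minute, so each 1-minute interval has probability 0.05 and P(120 ≤ x ≤ 130) = 10/20 = 0.5. The minimal sketch below encodes these formulas; the helper names are hypothetical.

```python
def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]; zero outside the interval."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_prob(x1, x2, a, b):
    """P(x1 <= X <= x2) for X ~ Uniform(a, b), clipping the bounds to [a, b]."""
    lo, hi = max(x1, a), min(x2, b)
    return max(hi - lo, 0) / (b - a)

a, b = 120, 140  # flight time range in minutes, Delhi to Mumbai

print(uniform_pdf(125, a, b))        # 0.05 (density per minute)
print(uniform_prob(120, 130, a, b))  # 0.5
print(uniform_prob(128, 136, a, b))  # 0.4
print((a + b) / 2)                   # 130.0 (mean flight time)
```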