Stats Unit3

Master of data science, Statistics & probability module 3 notes

Uploaded by

girab87633

Course: MSc DS

Probability & Statistics

Module: 3
Learning Objectives:

1. Understand the Foundations of Probability

2. Master the Basic Concepts of Probability

3. Apply the Core Properties of Probability

4. Examine the Binomial Distribution

5. Dive Deep into the Normal Distribution

6. Explore and Differentiate Other Common Distributions


Structure:

3.1 Understanding the Nature of Probability

3.2 Basic Concepts of Probability

3.3 Properties of Probability

3.4 Binomial Distribution

3.5 Normal Distribution

3.6 Summary

3.7 Keywords

3.8 Self-Assessment Questions

3.9 Case Study

3.10 References
3.1 Understanding the Nature of Probability

Probability can be defined as the measure of the likelihood that an event will occur. It quantifies the uncertainty inherent in predictions.

Chance and uncertainty are often used interchangeably with probability. However, while probability provides a numerical measure, chance and
uncertainty more generally describe situations where outcomes are not deterministic.

Origins and Importance in Decision Making

The concept of probability has its roots in gambling and games of chance, where early thinkers wanted to predict outcomes and optimise
their bets.

Today, probability theory extends far beyond gambling. It's essential in various disciplines like finance, science, economics, and more.

Decision making often involves uncertain outcomes.

Understanding the probability of different outcomes allows decision-makers to assess risks, make informed choices, and optimise results.

3.2 Basic Concepts of Probability


Trial: An action or experiment that leads to one of several possible outcomes.

Outcome: The result of a single trial.

Event: One or more outcomes of interest.

Defining Events and Sample Spaces

Sample Space (S): The set of all possible outcomes of a trial.

o For example, in a coin toss, the sample space is S = {Heads, Tails}.

Event (E): Any subset of the sample space.

o For the coin toss, an event could be E = {Heads}.

Simple, Compound, and Complementary Events

Simple Event: An event that consists of a single outcome.

o For example, drawing the ace of spades from a standard deck of cards. (Drawing any ace covers four outcomes, so it is a compound event.)

Compound Event: An event that consists of more than one outcome.

o For example, drawing a red card or a king from a deck.

Complementary Events: For any event A, the complementary event, denoted by A′ or A^c, consists of all outcomes in the sample space that are not in A.

o If A is the event of rolling a die and getting a 3, then A′ is the event of not rolling a 3.
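The definitions above map directly onto set operations. The following is a minimal sketch, assuming a fair six-sided die whose faces are equally likely; the names sample_space, event_a and classical_probability are illustrative and do not come from the notes.

```python
from fractions import Fraction

# Sample space S for one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# Event A: rolling a 3. Its complement is every outcome of S that is not in A.
event_a = {3}
complement_a = sample_space - event_a


def classical_probability(event, space):
    """Count-based probability |E| / |S|, valid only for equally likely outcomes."""
    return Fraction(len(event & space), len(space))


p_a = classical_probability(event_a, sample_space)
p_not_a = classical_probability(complement_a, sample_space)

print(sorted(complement_a))  # [1, 2, 4, 5, 6]
print(p_a, p_not_a)          # 1/6 5/6
```

Because A and A′ together cover the whole sample space without overlap, their probabilities add up to 1.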

The Classical, Relative Frequency, and Subjective Approaches to Probability

Classical Approach: Assumes all outcomes in the sample space are equally likely. The probability of an event E is calculated as:

P(E) = (Number of outcomes in E) / (Number of outcomes in the sample space)

o For instance, the probability of drawing an ace from a standard deck of cards is 4/52 or 1/13.

Relative Frequency Approach: Defines probability based on historical or experimental data. It is the ratio of the number of times an event
occurs to the total number of trials.

P(E) = (Number of times E occurs) / (Total number of trials)

Subjective Approach: Based on personal judgement, intuition, or experience rather than objective empirical evidence. Useful in situations
where it is impossible or impractical to collect relevant frequency data. For instance, estimating the likelihood of a new business venture
succeeding.
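To compare the classical and relative frequency approaches, the ace example can be computed both ways. This is a sketch under stated assumptions: the deck layout, the seed 42 and the 100,000 simulated draws are arbitrary choices made for illustration, not values from the notes.

```python
import random
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]

# Classical approach: count favourable outcomes among equally likely ones.
aces = [card for card in deck if card[0] == "A"]
classical = Fraction(len(aces), len(deck))  # 4/52 reduces to 1/13

# Relative frequency approach: draw one card (with replacement) many times
# and record how often an ace appears.
rng = random.Random(42)  # fixed seed so the run is repeatable
n_trials = 100_000
hits = sum(1 for _ in range(n_trials) if rng.choice(deck)[0] == "A")
relative_frequency = hits / n_trials

print(classical)           # 1/13
print(float(classical))    # about 0.0769
print(relative_frequency)  # close to the classical value, not identical
```

The simulated ratio moves closer to 1/13 as the number of trials grows, which is why the relative frequency approach needs a reasonably large sample.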

3.3 Properties of Probability

The probability of any event A is denoted by P(A) and satisfies the following three properties:

0 ≤ P(A) ≤ 1: The probability of any event A is always a value between 0 and 1, inclusive.

P(S) = 1: The probability of the sample space S, which represents all possible outcomes, is 1.

For any finite sequence of pairwise disjoint events A1, A2, …, An (events that have no outcomes in common), P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An).

1. The Probability of an Impossible Event:


An impossible event is an event that cannot happen. It is also referred to as the null event.

P(∅) = 0

For instance, the probability of rolling a 7 on a fair six-sided die is 0, since it's an impossible event.

2. The Probability of a Certain Event:

A certain event is one that is sure to occur.

P(S)=1 where S is the sample space representing all possible outcomes.

For example, the probability that a fair six-sided die will land on a number between 1 and 6 inclusive is 1.

3. Additive Rule of Probability:

The additive rule helps us find the probability of the union of two events.

For any two events A and B:


P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

This rule is essential to account for the overlap (or intersection) of two events so as not to double-count.
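The additive rule can be checked by direct counting on the red-card-or-king event used earlier as a compound event. A minimal sketch, assuming a standard 52-card deck with equally likely cards; the helper prob is illustrative.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUIT_COLOUR = {"hearts": "red", "diamonds": "red", "spades": "black", "clubs": "black"}
deck = {(rank, suit) for rank in RANKS for suit in SUIT_COLOUR}

red = {card for card in deck if SUIT_COLOUR[card[1]] == "red"}
kings = {card for card in deck if card[0] == "K"}


def prob(event):
    """Classical probability of an event, given as a subset of the deck."""
    return Fraction(len(event), len(deck))


# Left side: count the union directly.
lhs = prob(red | kings)
# Right side: add the two probabilities, then subtract the overlap (the two red kings).
rhs = prob(red) + prob(kings) - prob(red & kings)

print(lhs, rhs)  # 7/13 7/13
```

Without the subtraction the two red kings would be counted twice, giving 30/52 instead of 28/52.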

4. Multiplicative Rule of Probability:

The multiplicative rule helps us find the probability of the intersection of two events.

For any two events A and B:

P(A ∩ B) = P(A) × P(B | A)

where P(B | A) is the conditional probability of B given A.
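As an illustration of the multiplicative rule, consider drawing two cards without replacement and asking for two aces. The scenario is an assumed example, not one from the notes; the sketch compares the rule with a brute-force count over all ordered pairs of cards.

```python
from fractions import Fraction
from itertools import permutations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]

# Multiplicative rule: P(A ∩ B) = P(A) × P(B | A)
p_first_ace = Fraction(4, 52)               # A: the first card is an ace
p_second_ace_given_first = Fraction(3, 51)  # B | A: three aces remain among 51 cards
p_both_aces = p_first_ace * p_second_ace_given_first

# Brute-force check: enumerate every ordered pair of distinct cards.
ordered_pairs = list(permutations(deck, 2))
both_aces = [pair for pair in ordered_pairs if pair[0][0] == "A" and pair[1][0] == "A"]
brute_force = Fraction(len(both_aces), len(ordered_pairs))

print(p_both_aces, brute_force)  # 1/221 1/221
```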

5. Conditional Probability and Independence:

Conditional Probability: The probability of event B happening given that event A has already occurred is denoted as P(B | A). It is calculated as:

P(B | A) = P(A ∩ B) / P(A)

provided that P(A)>0.


Independence: Two events A and B are said to be independent if the occurrence of one does not change the probability of the other. For
independent events:

P(A ∩ B) = P(A) × P(B)

If this equation holds, A and B are independent. If not, they are dependent.
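The conditional probability formula and the independence test can be tried out on a single die roll. This is a sketch assuming a fair six-sided die; the events even, at_least_five and at_least_four are chosen only for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (equally likely faces assumed).
SPACE = frozenset({1, 2, 3, 4, 5, 6})


def prob(event):
    """P(E) as |E| / |S| under the equally-likely assumption."""
    return Fraction(len(event & SPACE), len(SPACE))


def conditional(b, a):
    """P(B | A) = P(A ∩ B) / P(A); undefined when P(A) = 0."""
    if prob(a) == 0:
        raise ValueError("P(A) must be greater than 0")
    return prob(a & b) / prob(a)


def independent(a, b):
    """True when P(A ∩ B) equals P(A) × P(B)."""
    return prob(a & b) == prob(a) * prob(b)


even = {2, 4, 6}
at_least_five = {5, 6}
at_least_four = {4, 5, 6}

print(conditional(at_least_five, even))  # 1/3, the same as P(at_least_five)
print(independent(even, at_least_five))  # True
print(conditional(at_least_four, even))  # 2/3, while P(at_least_four) is 1/2
print(independent(even, at_least_four))  # False
```

In the first pair, knowing the roll is even leaves the probability of "at least five" unchanged, so the events are independent. In the second pair the conditional probability shifts, so they are dependent.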

3.4 Binomial Distribution

Assumptions and Characteristics:

Trials are independent: The outcome of one trial doesn’t affect the outcome of another trial.

Two possible outcomes for each trial: These are often referred to as "success" and "failure."

Probability of success remains the same: For every trial, the probability of success is constant.

Fixed number of trials: We perform the experiment a set number of times, denoted as n.

Computing Probabilities: The probability of achieving exactly k successes in n trials is given by the formula:

P(X = k) = C(n, k) × p^k × (1 − p)^(n − k)

where C(n, k) = n! / (k! (n − k)!) is the binomial coefficient, p is the probability of success, and 1 − p is the probability of failure.
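The formula translates almost term by term into code. Below is a minimal sketch using math.comb from the Python standard library (Python 3.8 or later); the function name binomial_pmf is illustrative, and the example assumes a fair coin (p = 0.5) flipped 10 times.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 5 heads in 10 flips of a fair coin.
p_five_heads = binomial_pmf(5, 10, 0.5)
print(p_five_heads)  # 0.24609375

# The probabilities over k = 0, 1, ..., n should add up to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(total)
```

Here C(10, 5) = 252 and 0.5^10 = 1/1024, so the result is 252/1024.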

Mean and Variance of a Binomial Distribution:

Mean (μ): n × p

Variance (σ²): n × p × (1 − p)
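These two shortcuts can be checked against the definitions of mean and variance by summing over every possible number of successes. A sketch with assumed values n = 10 and p = 0.3, picked only for illustration:

```python
from math import comb, isclose


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # hypothetical parameters for the check

# Mean and variance computed from the definition (probability-weighted sums).
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(mean, n * p)                # both approximately 3.0
print(variance, n * p * (1 - p))  # both approximately 2.1
print(isclose(mean, n * p), isclose(variance, n * p * (1 - p)))
```

The comparison uses isclose because floating-point sums may differ from the closed-form values in the last few digits.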

Applications in Real-World Problems:

Estimating the probability of a certain number of successes in a fixed number of trials. E.g., the probability of getting 5 heads when flipping a
coin 10 times.

Election polling to estimate the probability of a candidate receiving a certain percentage of votes.
3.5 Normal Distribution

Properties and Shape of the Normal Curve:

Bell-shaped: The normal curve is symmetric and bell-shaped.

Mean, median, and mode are equal: They all lie at the centre of the distribution.

Spread determined by standard deviation: The spread of the distribution is determined by its standard deviation, denoted as σ.

Asymptotic: The curve approaches, but never touches, the x-axis.

Standard Normal Distribution and Z-Scores:

The standard normal distribution is a special case of the normal distribution with a mean of 0 and a standard deviation of 1.

A Z-score represents the number of standard deviations a data point is from the mean. It is calculated as:

Z = (X − μ) / σ

where X is the data point, μ is the mean, and σ is the standard deviation.
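A short sketch of the Z-score calculation follows. The score of 130 on a scale with mean 100 and standard deviation 15 is a hypothetical input chosen for illustration. The lookup of the area below the Z-score uses statistics.NormalDist from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical example: a score of 130 on a scale with mean 100 and SD 15.
z = z_score(130, 100, 15)
print(z)  # 2.0

# Share of a standard normal distribution that lies below this Z-score.
share_below = NormalDist(mu=0, sigma=1).cdf(z)
print(round(share_below, 4))  # 0.9772
```

A Z-score of 2.0 therefore sits at roughly the 97.7th percentile, provided the underlying data are normally distributed.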
The Empirical Rule and Percentiles:

For a normal distribution:

About 68% of the data lies within 1 standard deviation of the mean.

About 95% lies within 2 standard deviations.

About 99.7% lies within 3 standard deviations.
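The three percentages of the empirical rule can be reproduced from the standard normal cumulative distribution function. A minimal sketch using statistics.NormalDist; the helper name share_within is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The same shares hold for any normal distribution, because converting to Z-scores maps it onto the standard normal.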

Applications in Real-World Problems:

In quality control, it's used to understand variations in product measurements.

Height, weight, and IQ scores of a population are often modelled as approximately normal.

Stock returns in finance are often analysed using the normal distribution.

3.6 Summary

Probability: A mathematical measure of the likelihood of an event occurring, ranging from 0 (impossible event) to 1 (certain event). It helps quantify
uncertainty and make predictions about outcomes based on known data.

Basic Concepts: Fundamental terms used in probability, such as:

Events: Specific outcomes or combinations of outcomes.

Sample Spaces: The set of all possible outcomes.

Simple, Compound, and Complementary Events: Types of events based on their complexity and relationship to other events.

Properties of Probability: Fundamental rules governing the behaviour and calculation of probabilities, ensuring they remain within their defined limits of 0 and 1.

Binomial Distribution: A probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials. It has two parameters: the
number of trials and the probability of success in an individual trial.

Normal Distribution: A continuous probability distribution characterised by its bell-shaped curve. It is defined by two parameters: the mean (average) and the
standard deviation (dispersion). Many natural phenomena and statistics are approximately normally distributed.

Other Common Distributions: Various probability distributions tailored to specific types of events or processes. Examples include:

Poisson Distribution: Used for counting the number of events in a fixed interval of time or space.

Exponential Distribution: Describes the time between events in a process in which events occur continuously and independently at a constant average rate.

Uniform Distribution: All outcomes in the sample space are equally likely.

Hypergeometric Distribution: Describes the number of successes in a fixed number of draws made without replacement from a finite population.

3.7 Keywords

Sample Space: The set of all possible outcomes or results of an experiment. In any probability experiment, the sample space represents all
the outcomes that cannot be broken down any further. For example, when rolling a die, the sample space is {1, 2, 3, 4, 5, 6}.

Compound Event: An event that consists of two or more simple events. While a simple event consists of a single outcome, a compound event
encompasses multiple outcomes. For instance, when rolling a die, getting an odd number can be described as a compound event since it
includes the outcomes {1, 3, 5}.

Binomial Distribution: A probability distribution that describes the number of successes in a fixed number of Bernoulli trials. This distribution
is characterised by two parameters: the number of trials and the probability of success in a single trial. It answers questions like, "What is the
probability of getting exactly 3 heads in 5 coin tosses?"

Z-Score: A statistical measurement that describes a value's relationship to the mean of a group of values. A Z-score is expressed in terms of
standard deviations from the mean. A Z-score of 0 indicates that the data point is identical to the mean. A Z-score of 1.0
indicates a value that is one standard deviation above the mean, and a Z-score of −1.0 indicates a value one standard deviation below it.

Poisson Distribution: A probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time
or space. It is often used to model rare events in a large population or time frame. For instance, it can represent the number of emails received
in an hour or the number of phone calls at a call centre in a day.

Conditional Probability: The probability of one event occurring given that another event has already occurred. Represented as P(A | B), it
gives the probability of event A occurring given that event B has already taken place. An example could be the probability of
someone having a cold given that they are sneezing.

3.8 Self-Assessment Questions

1. How would you differentiate between a simple event and a compound event in probability?

2. What is the significance of the Z-score in the context of the Normal Distribution?

3. Which approach to probability (Classical, Relative Frequency, or Subjective) would be most suitable for predicting the outcome of a coin toss, and why?

4. How does the memorylessness property manifest itself in the Exponential Distribution?

5. What are the key assumptions to be met for a probability distribution to be considered a Binomial Distribution?

3.9 Case Study

Title: Sales Forecasting at Disha Electronics, Mumbai


Introduction:

Disha Electronics, a popular electronics store in Mumbai, had been facing unpredictable sales patterns for the past few years. With a diversified
product range from mobile phones to refrigerators, forecasting monthly sales had become increasingly challenging.

Background:

In 2021, the store witnessed an unusual peak in sales in the month of July, which was initially thought to be an anomaly. Upon closer inspection,
the management realised that a significant portion of this peak was due to the sale of air conditioners. Historically, Mumbai experienced the
monsoon in July, which led to reduced demand for air conditioners. However, due to changing climate patterns, the monsoon was delayed in 2021,
which increased the demand for air conditioners.

Disha's management team decided to employ probability and statistics to better understand these patterns and forecast sales. They collected
data over the past ten years and mapped it against Mumbai's weather patterns. Using statistical analysis, they found a strong correlation
between the onset of monsoon and the sales of specific electronic items.

For air conditioners, there was a negative correlation with monsoon onset. In contrast, sales of items like washing machines and water purifiers
showed a positive correlation. The delayed monsoon in 2021 was an outlier, but the data suggested that the onset of the monsoon was getting
gradually delayed over the years.

Equipped with these insights, Disha Electronics started making informed decisions about inventory management. They increased the stock of
air conditioners in anticipation of a delayed monsoon and ran promotional offers on washing machines and water purifiers right before the
expected onset of the monsoon.

By leveraging probability and statistics, Disha Electronics not only improved their inventory management but also enhanced customer
satisfaction by ensuring product availability as per the demand.

Questions:

1. What factors led to the unpredictability in sales patterns at Disha Electronics?

2. How did Disha Electronics utilise probability and statistics to address the sales forecasting challenge?

3. Based on the case study, how can understanding correlations between external factors (like weather) and product sales be beneficial for
businesses?

3.10 References

1. "The Art of Statistics: Learning from Data" by David Spiegelhalter

2. "Probability, Random Variables and Stochastic Processes" by Athanasios Papoulis and S. Unnikrishna Pillai

3. "A First Course in Probability" by Sheldon Ross


4. "Introduction to the Theory of Statistics" by Alexander M. Mood, Franklin A. Graybill, and Duane C. Boes

5. "Statistics" by Robert S. Witte and John S. Witte
