
22AMH32 – DATA ANALYTICS AND DATA SCIENCE

UNIT I – PROBABILITY DISTRIBUTIONS AND FITTING A MODEL

1. PROBABILITY DISTRIBUTIONS AND FITTING A MODEL


A probability distribution is a mathematical function that describes the probability of
different possible values of a variable. Probability distributions are often depicted using
graphs or probability tables.
Example: Probability distribution
We can describe the probability distribution of one coin flip using a probability table:
Outcome   Probability
Heads     .5
Tails     .5
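As a quick illustration (not part of the original notes), the same one-flip distribution can be written as a small Python dictionary and checked for validity:

```python
# The coin-flip probability table from the example above, as a Python dict.
coin_flip = {"Heads": 0.5, "Tails": 0.5}

# Every probability lies between 0 and 1, and together they sum to exactly 1.
assert all(0 <= p <= 1 for p in coin_flip.values())
assert abs(sum(coin_flip.values()) - 1.0) < 1e-12
```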
1.1 What is a probability distribution?
A probability distribution is an idealized frequency distribution.
A frequency distribution describes a specific sample or dataset. It’s the number of times each
possible value of a variable occurs in the dataset.
The number of times a value occurs in a sample is determined by its probability of
occurrence. Probability is a number between 0 and 1 that says how likely something is to
occur:
• 0 means it’s impossible.
• 1 means it’s certain.
The higher the probability of a value, the higher its frequency in a sample.
More specifically, the probability of a value is its relative frequency in an infinitely large
sample.
Infinitely large samples are impossible in real life, so probability distributions are theoretical.
They’re idealized versions of frequency distributions that aim to describe the population the
sample was drawn from.
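To make the idea of probability as a long-run relative frequency concrete, here is a small simulation sketch (an illustration, not from the original notes) using NumPy: as the number of fair coin flips grows, the relative frequency of heads settles toward 0.5.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Simulate fair coin flips: 1 = heads, 0 = tails.
for n in (10, 1_000, 100_000):
    flips = rng.integers(0, 2, size=n)
    print(f"n = {n:>7}: relative frequency of heads = {flips.mean():.4f}")

# The relative frequency drifts toward the true probability, 0.5, as n grows.
```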
Probability distributions are used to describe the populations of real-life variables, like coin
tosses or the weight of chicken eggs. They’re also used in hypothesis testing to determine p
values.

Example: Probability distributions are idealized frequency distributions


Imagine that an egg farmer wants to know the probability of an egg from her farm being a
certain size.
The farmer weighs 100 random eggs and describes their frequency distribution using a
histogram:

Figure 1: Frequency distribution of egg weight (histogram of the 100 weighed eggs)

She can get a rough idea of the probability of different egg sizes directly from this frequency
distribution. For example, she can see that there’s a high probability of an egg being around
1.9 oz., and there’s a low probability of an egg being bigger than 2.1 oz.
Suppose the farmer wants more precise probability estimates. One option is to improve her
estimates by weighing many more eggs.
A better option is to recognize that egg size appears to follow a common probability
distribution called a normal distribution. The farmer can make an idealized version of the egg
weight distribution by assuming the weights are normally distributed:

Figure 2: Normal distribution fitted to egg weight
Since normal distributions are well understood by statisticians, the farmer can calculate
precise probability estimates, even with a relatively small sample size.
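A hedged sketch of the farmer's workflow in Python with SciPy: simulate a stand-in sample, fit a normal distribution, and use the fitted model for probability estimates. The sample itself and the "true" mean of 1.9 oz. and standard deviation of 0.1 oz. are assumptions for illustration; only the general approach follows the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# Stand-in for the farmer's 100 weighed eggs (assumed mean and sd, in oz.).
egg_weights = rng.normal(loc=1.9, scale=0.1, size=100)

# Fit a normal distribution to the sample: returns estimates of mu and sigma.
mu_hat, sigma_hat = stats.norm.fit(egg_weights)
print(f"fitted mean = {mu_hat:.3f} oz., fitted sd = {sigma_hat:.3f} oz.")

# The fitted model then gives precise probability estimates, e.g. the
# probability that an egg weighs more than 2.1 oz.
p_big = 1 - stats.norm.cdf(2.1, loc=mu_hat, scale=sigma_hat)
print(f"P(weight > 2.1 oz.) ≈ {p_big:.3f}")
```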
Variables that follow a probability distribution are called random variables. There’s special
notation you can use to say that a random variable follows a specific distribution:
• Random variables are usually denoted by X.
• The ~ (tilde) symbol means “follows the distribution.”
• The distribution is denoted by a capital letter (usually the first letter of the distribution’s name), followed by brackets that contain the distribution’s parameters.
For example, the following notation means “the random variable X follows a normal distribution with a mean of µ and a variance of σ².”

X ~ N(µ, σ²)
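In code, the statement X ~ N(µ, σ²) corresponds to a "frozen" distribution object in SciPy. Note that scipy.stats.norm is parameterized by the standard deviation σ (the scale), not the variance σ²; the numbers below are placeholders.

```python
from scipy import stats

mu, sigma = 0.0, 1.0                 # placeholder parameters
X = stats.norm(loc=mu, scale=sigma)  # X ~ N(mu, sigma^2)

print(X.mean(), X.var())             # -> 0.0 1.0
```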

There are two types of probability distributions:

• Discrete probability distributions

• Continuous probability distributions

2.1 Discrete probability distributions


A discrete probability distribution is a probability distribution of a categorical or discrete
variable.
Discrete probability distributions only include the probabilities of values that are possible. In
other words, a discrete probability distribution doesn’t include any values with a probability
of zero. For example, a probability distribution of dice rolls doesn’t include 2.5 since it’s not
a possible outcome of dice rolls.
The probabilities of all possible values in a discrete probability distribution add up to one. It’s certain (i.e., a probability of one) that an observation will have one of the possible values.

2.1.1 Probability tables


A probability table represents the discrete probability distribution of a categorical variable.
Probability tables can also represent a discrete variable with only a few possible values or a
continuous variable that’s been grouped into class intervals.
A probability table is composed of two columns:

• The values or class intervals

• Their probabilities
A robot greets people using a random greeting. The probability distribution of the greetings is
described by the following probability table:
Greeting Probability

“Greetings, human!” .6

“Hi!” .1

“Salutations, organic life-form.” .2

“Howdy!” .1

Table 1: Probability table


Notice that all the probabilities are greater than zero and that they sum to one.
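As a sketch, the greeting distribution from Table 1 can be checked and sampled with NumPy; the strings and probabilities below are copied from the table, everything else is illustrative.

```python
import numpy as np

greetings = ["Greetings, human!", "Hi!",
             "Salutations, organic life-form.", "Howdy!"]
probs = [0.6, 0.1, 0.2, 0.1]

# All probabilities are greater than zero and they sum to one.
assert all(p > 0 for p in probs) and abs(sum(probs) - 1.0) < 1e-12

# Draw five random greetings according to the table's probabilities.
rng = np.random.default_rng(seed=2)
print(rng.choice(greetings, size=5, p=probs))
```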
2.1.2 Probability mass functions
A probability mass function (PMF) is a mathematical function that describes
a discrete probability distribution. It gives the probability of every possible value of a
variable.
A probability mass function can be represented as an equation or as a graph.
Example: Probability mass function
Imagine that the number of sweaters owned per person in the United States follows a Poisson
distribution.
The probability mass function of the distribution is given by the formula:

P(X = k) = (λ^k · e^(−λ)) / k!

Where:

• P(X = k) is the probability that a person owns exactly k sweaters
• λ (lambda) is the mean number of sweaters per person
• e is Euler’s number (approximately 2.718)
This probability mass function can also be represented as a graph:

Figure 3: Probability mass function


Notice that the variable can only have certain values, which are represented by closed circles.
You can have two sweaters or 10 sweaters, but you can’t have 3.8 sweaters.
The probability that a person owns zero sweaters is .05, the probability that they own one
sweater is .15, and so on. If you add together all the probabilities for every possible number
of sweaters a person can own, it will equal exactly 1.
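A minimal sketch of this PMF with scipy.stats.poisson. The notes leave the mean unspecified, so λ = 3 is an assumed value here (it is roughly consistent with the probabilities quoted above, since e⁻³ ≈ .05); treat it as an illustration rather than the notes' own parameter.

```python
from scipy import stats

lam = 3                     # assumed mean number of sweaters per person
X = stats.poisson(mu=lam)   # X ~ Poisson(lam)

# PMF values for a few possible counts of sweaters.
for k in range(5):
    print(f"P(X = {k}) = {X.pmf(k):.3f}")

# Summing the PMF over all possible counts gives (essentially) 1.
print(sum(X.pmf(k) for k in range(100)))
```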
Common discrete probability distributions

Binomial: Describes variables with two possible outcomes. It’s the probability distribution of the number of successes in n trials with probability of success p. Example: the number of times a coin lands on heads when you toss it five times.

Discrete uniform: Describes events that have equal probabilities. Example: the suit of a randomly drawn playing card.

Poisson: Describes count data. It gives the probability of an event happening k times within a given interval of time or space. Example: the number of text messages received per day.

Table 2: Common discrete probability distributions
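For reference, each distribution in Table 2 has a counterpart in scipy.stats; the parameter values below are illustrative only.

```python
from scipy import stats

# Binomial: number of heads in 5 tosses of a fair coin.
print(stats.binom(n=5, p=0.5).pmf(3))       # P(exactly 3 heads)

# Discrete uniform: one of 4 equally likely suits.
print(stats.randint(low=0, high=4).pmf(2))  # 0.25 for any single suit

# Poisson: text messages per day, with an assumed mean of 10.
print(stats.poisson(mu=10).pmf(8))          # P(exactly 8 messages)
```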


2.2 Continuous probability distributions
A continuous probability distribution is the probability distribution of a continuous variable.
A continuous variable can have any value between its lowest and highest values. Therefore,
continuous probability distributions include every number in the variable’s range.
The probability that a continuous variable will have any specific value is so infinitesimally
small that it’s considered to have a probability of zero. However, the probability that a value
will fall within a certain interval of values within its range is greater than zero.
2.2.1 Probability density functions
A probability density function (PDF) is a mathematical function that describes a continuous
probability distribution. It provides the probability density of each value of a variable, which
can be greater than one.
A probability density function can be represented as an equation or as a graph.
In graph form, a probability density function is a curve. You can determine the probability
that a value will fall within a certain interval by calculating the area under the curve within
that interval. You can use reference tables or software to calculate the area.
The area under the whole curve is always exactly one because it’s certain (i.e., a probability
of one) that an observation will fall somewhere in the variable’s range.
A cumulative distribution function is another type of function that describes a continuous
probability distribution.
Example: Probability density function
The probability density function of the normal distribution of egg weight is given by the formula:

f(x) = (1 / (σ√(2π))) · e^(−(x − µ)² / (2σ²))

Where:
• f(x) is the probability density at egg weight x
• µ is the mean egg weight in the population
• σ is the standard deviation of egg weight in the population
The probability of an egg being exactly 2 oz. is zero. Although an egg can weigh very close
to 2 oz., it is extremely improbable that it will weigh exactly 2 oz. Even if a regular scale
measured an egg’s weight as being 2 oz., an infinitely precise scale would find a tiny
difference between the egg’s weight and 2 oz.
The probability that an egg is within a certain weight interval, such as 1.98 and 2.04 oz., is
greater than zero and can be represented in the graph of the probability density function as a
shaded region:

Figure 4: Probability density function


The shaded region has an area of .09, meaning that there’s a probability of .09 that an egg
will weigh between 1.98 and 2.04 oz. The area was calculated using statistical software.
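A hedged sketch of that area-under-the-curve calculation: the area between two points equals a difference of CDF values. The notes do not state the fitted µ and σ, so the values below are hypothetical and the printed area will generally differ from the .09 quoted above.

```python
from scipy import stats

mu, sigma = 1.9, 0.15   # hypothetical fitted parameters (oz.)

# P(1.98 oz. < weight < 2.04 oz.) = area under the PDF between the bounds,
# computed as a difference of CDF values.
area = (stats.norm.cdf(2.04, loc=mu, scale=sigma)
        - stats.norm.cdf(1.98, loc=mu, scale=sigma))
print(f"P(1.98 < X < 2.04) ≈ {area:.3f}")
```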
Common continuous probability distributions

Normal: Describes data with values that become less probable the farther they are from the mean, with a bell-shaped probability density function. Example: SAT scores.

Continuous uniform: Describes data for which equal-sized intervals have equal probability. Example: the amount of time cars wait at a red light.

Log-normal: Describes right-skewed data. It’s the probability distribution of a random variable whose logarithm is normally distributed. Example: the average body weight of different mammal species.

Exponential: Describes data that has higher probabilities for small values than for large values. It’s the probability distribution of the time between independent events. Example: time between earthquakes.
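These continuous distributions also have scipy.stats counterparts; the parameters below are purely illustrative.

```python
from scipy import stats

# Normal: probability density at the mean.
print(stats.norm(loc=0, scale=1).pdf(0))

# Continuous uniform on [0, 60]: equal-sized intervals have equal probability.
print(stats.uniform(loc=0, scale=60).cdf(30))  # P(wait <= 30) = 0.5

# Log-normal: the log of the variable is normally distributed.
print(stats.lognorm(s=1.0).median())

# Exponential, with an assumed mean of 100 days between events.
print(stats.expon(scale=100).cdf(50))          # P(gap <= 50 days)
```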

DISCUSSION QUESTIONS:
1. What are the advantages and limitations of different methods for fitting probability
distributions to empirical data?
2. How can understanding the characteristics and parameters of probability distributions
enhance the accuracy of predictive modeling?
3. In what ways do the properties of specific probability distributions influence their
suitability for modeling different types of data?
