Statistical Analysis: Dr. Shahid Iqbal Fall 2021
Lecture 2
Dr. Shahid Iqbal
Fall 2021
Topics
1. Central Dogma of Statistics
2. Statistical Data Distributions
Binomial Distribution
Normal Distribution
Poisson Distribution
3. Populations and Samples
4. Data Science Process
5. Exploratory Data Analysis
6. Correlation Analysis
Pearson Correlation Coefficient
Spearman Rank Correlation Coefficient
Statistics and Data Science
1. Binomial Distribution
2. Normal Distribution
3. Poisson Distribution
Binomial Distributions (BD)
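A binomial distribution can be thought of as simply the probability of a
SUCCESS or FAILURE outcome in an experiment or
survey that is repeated multiple times.
P(x) = [n! / ((n – x)! × x!)] × p^x × q^(n – x)
Where
x = total number of “successes” (the outcome being counted, e.g. pass)
p = probability of a success in an individual trial
q = 1 – p = probability of a failure in an individual trial
n = number of trials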
Example 1
A fair coin is tossed 10 times. What is the probability of
getting exactly 6 heads?
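Example 1 can be checked numerically. The sketch below evaluates the binomial probability C(n, x) × p^x × (1 – p)^(n – x) for n = 10 tosses, x = 6 heads, and p = 0.5 (a fair coin). The function name `binomial_pmf` is an illustrative choice, not something defined in the lecture.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Example 1: fair coin (p = 0.5), 10 tosses, exactly 6 heads.
prob_six_heads = binomial_pmf(6, 10, 0.5)
print(f"P(exactly 6 heads in 10 tosses) = {prob_six_heads:.4f}")  # 0.2051
```

Here C(10, 6) = 210 and 0.5^10 = 1/1024, so the probability is 210/1024, or about 20.5%.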
Example 2……..
Solution:
Step 1: n = number of randomly selected items = 9.
Step 2: X = number you are asked to find the probability
for = 6.
Step 3: Compute the number of combinations:
n! / ((n – X)! × X!)
= 9! / ((9 – 6)! × 6!) = 84
Step 4: Find p and q. We are given p = 80%, or 0.8. So
the probability of failure is q = 1 – 0.8 = 0.2 (20%).
Step 5: p^X
= 0.8^6
= 0.262144
Step 6: q^(n – X)
= 0.2^(9 – 6)
= 0.2^3
= 0.008
Step 7: Multiply the results of Steps 3, 5, and 6:
84 × 0.262144 × 0.008 = 0.176.
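The step-by-step calculation with n = 9, X = 6, and p = 0.8 can be reproduced in a few lines. This is only a sketch that mirrors the numbered steps; the variable names are chosen for illustration.

```python
from math import comb

n, x, p = 9, 6, 0.8            # values from Steps 1, 2 and 4
q = 1 - p                      # probability of failure

coefficient = comb(n, x)       # Step 3: n! / ((n - x)! * x!)
success_part = p ** x          # Step 5: p^x
failure_part = q ** (n - x)    # Step 6: q^(n - x)
probability = coefficient * success_part * failure_part  # Step 7

print(coefficient)             # 84
print(round(success_part, 6))  # 0.262144
print(round(failure_part, 6))  # 0.008
print(round(probability, 3))   # 0.176
```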
Properties of Binomial Distributions
Normal Distribution
Examples of quantities that are commonly modeled with a normal distribution:
Heights of people.
Measurement errors.
Blood pressure.
Points on a test.
IQ scores.
Salaries.
The empirical rule tells you what percentage of your data
falls within a certain number of standard deviations from
the mean.
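The percentages behind the empirical rule can be recovered from the standard normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")
# within 1 sd: 68.3%
# within 2 sd: 95.4%
# within 3 sd: 99.7%
```

These are the values usually quoted as the 68-95-99.7 rule.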
Example 1
Solution:
Step 1: μ = 6800
Step 2: σ = 2500
Step 3: a = 6500 // lower value
Step 4: b = 7300 // upper value
Step 5: P(a < X < b) = P(X < b) – P(X < a)
Step 6: Apply the z-score formula, z = (X – μ) / σ
Step 7: P(X < b) = P(X < 7300) = P(Z < (7300 – 6800) / 2500)
= P(Z < 0.2)
= 0.57926, after matching in the z-table
Step 8: P(X < a) = P(X < 6500) = P(Z < (6500 – 6800) / 2500)
= P(Z < –0.12)
= 0.45224, after matching in the z-table
Step 9: P(a < X < b) = P(X < b) – P(X < a)
= 0.57926 – 0.45224
= 0.12702 ≈ 0.127
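The z-table lookups can be cross-checked in code. The sketch below assumes only the values given in the steps (μ = 6800, σ = 2500, a = 6500, b = 7300) and uses `statistics.NormalDist` in place of a printed table. The difference of the two table values, 0.57926 – 0.45224, is 0.12702, and the code agrees with that figure.

```python
from statistics import NormalDist

mu, sigma = 6800, 2500
lower, upper = 6500, 7300

dist = NormalDist(mu=mu, sigma=sigma)

# z-scores, as they would be looked up in a z-table
z_lower = (lower - mu) / sigma   # -0.12
z_upper = (upper - mu) / sigma   # 0.2

p_below_upper = dist.cdf(upper)  # P(X < 7300)
p_below_lower = dist.cdf(lower)  # P(X < 6500)
p_between = p_below_upper - p_below_lower

print(z_lower, z_upper)            # -0.12 0.2
print(round(p_below_upper, 5))     # 0.57926
print(round(p_below_lower, 5))     # 0.45224
print(round(p_between, 3))         # 0.127
```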
Poisson Distribution
Answer: With μ = 2 and x = 3,
P(x; μ) = (e^(–μ))(μ^x) / x!
= (2.71828^(–2))(2^3) / 3!
= (0.13534)(8) / 6
= 0.180
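The same Poisson calculation, with mean μ = 2 and x = 3 events, can be written as a short function. This is a minimal sketch; `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Probability of exactly x events when the average count is mu."""
    return exp(-mu) * mu**x / factorial(x)


print(f"{exp(-2):.5f}")            # 0.13534
print(f"{poisson_pmf(3, 2):.3f}")  # 0.180
```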
Practical Uses of the Poisson…
A textbook store sells an average of 200 books every
Saturday night. Using this data, you can predict the
probability that more books will be sold (perhaps 300 or
400) on the following Saturday nights.
Another example is the number of diners in a certain
restaurant every day. If the average number of diners over
seven days is 500, you can predict the probability of a
certain day having more diners.
Because of this application, Poisson distributions are
used by businessmen to make forecasts about the
number of customers or sales on certain days or seasons
of the year.
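A forecast of this kind amounts to an upper-tail probability P(X ≥ k). The sketch below assumes Saturday sales follow a Poisson distribution with mean 200, as in the bookstore example. It works in log space because e^(–200) × 200^300 / 300! cannot be evaluated term by term without overflow. The function names are hypothetical.

```python
import math


def poisson_pmf_log_space(k: int, mu: float) -> float:
    """Poisson probability of exactly k events, computed via logarithms."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def poisson_tail(threshold: int, mu: float) -> float:
    """P(X >= threshold), summed directly over the upper tail."""
    total = 0.0
    k = threshold
    while True:
        term = poisson_pmf_log_space(k, mu)
        total += term
        # Beyond the mean the terms shrink steadily; stop once negligible.
        if k > mu and term < 1e-17 * max(total, 1e-300):
            break
        k += 1
    return total


mean_sales = 200  # average books sold per Saturday night (from the example)
for target in (200, 300, 400):
    print(f"P(sales >= {target}) = {poisson_tail(target, mean_sales):.3e}")
```

Under this model, selling at least 200 books happens about half the time, while 300 or more is extremely unlikely. A forecaster would read that as a sign that such a night reflects a change in demand rather than chance.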
Populations and Samples
The word population immediately makes us think of the
entire world’s population of 7 billion people.
Data Science Process/Cycle
Real World: In the real world, lots of people are busy with various
activities, such as using Google+, competing in the
Olympics, or sending spam.
Exploratory Data Analysis (EDA)
“EDA” is an attitude, a state of flexibility, a willingness to
look for those things that we believe are not there, as well
as those we believe to be there.
Example
Subject   Age x   Glucose Level y
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81
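The Pearson correlation coefficient for the age and glucose data in this table can be computed from the column sums. The sketch below uses the standard computational formula r = (nΣxy – ΣxΣy) / sqrt((nΣx² – (Σx)²)(nΣy² – (Σy)²)). That this is the form shown in the lecture is an assumption, and `pearson_r` is an illustrative name.

```python
from math import sqrt

ages = [43, 21, 25, 42, 57, 59]      # x values from the table
glucose = [99, 65, 79, 75, 87, 81]   # y values from the table


def pearson_r(xs, ys):
    """Pearson correlation coefficient from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator


r = pearson_r(ages, glucose)
print(f"r = {r:.4f}")  # r = 0.5298
```

The intermediate sums are Σx = 247, Σy = 486, Σxy = 20485, Σx² = 11409, and Σy² = 40022, giving a moderate positive correlation between age and glucose level.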
Spearman Rank Correlation Coefficient
First, check whether your data follow a normal
distribution.
Example
The scores for nine students in physics and math are:
Example
Step 4: Sum (add up) all of your d-squared values.
Σd² = 4 + 4 + 1 + 0 + 1 + 1 + 1 + 0 + 0 = 12.
Insert the values into the formula ρ = 1 – (6 × Σd²) / (n(n² – 1)), with n = 9:
ρ = 1 – (6 × 12) / (9 × (81 – 1))
= 1 – 72/720
= 1 – 0.1
= 0.9
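The final calculation can be checked with the nine d-squared values listed in Step 4. The sketch below applies ρ = 1 – 6Σd² / (n(n² – 1)), which holds when there are no tied ranks. The function name is hypothetical, and the raw scores are not needed for this step.

```python
def spearman_from_d_squared(d_squared):
    """Spearman's rho from squared rank differences (assumes no tied ranks)."""
    n = len(d_squared)
    return 1 - 6 * sum(d_squared) / (n * (n**2 - 1))


d_squared = [4, 4, 1, 0, 1, 1, 1, 0, 0]  # values from Step 4
rho = spearman_from_d_squared(d_squared)

print(sum(d_squared))  # 12
print(round(rho, 2))   # 0.9
```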
Any Questions?