EDA Reviewer
TYPES OF STATISTICS
❖ Descriptive Statistics – consists of the collection, organization, summarization, and presentation of data.
■ uses the measures of central tendency (the Triple M, a.k.a. mean, median, and mode)
❖ Inferential Statistics – consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.

Example:
Graduating Age in my 4th Year Section: {16, 16, 16, 15, 16, 15, 16, 15, 16, 15, 15, 16, 17, 16, 15, 16, 18, 16, 15, 16, 16, 17, 16, 16, 15, 16, 16, 16, 15, 15, 16, 16, 16, 16, 15, 16, 16, 15, 16, 16, 16}
Mean: 15.81 ≈ 16, Median: 16, Mode: 16

Saying that the average graduating age of the section is 16 years old falls under descriptive statistics.
But saying that the average graduating age of the succeeding batches is 16 years old falls under inferential statistics.
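This summary can be reproduced with Python's built-in statistics module; a minimal sketch, assuming the list below reflects the data set as given above:

from statistics import mean, median, mode

# Graduating ages of the 4th year section (copied from the example above)
ages = [16, 16, 16, 15, 16, 15, 16, 15, 16, 15, 15, 16, 17, 16, 15, 16, 18, 16, 15,
        16, 16, 17, 16, 16, 15, 16, 16, 16, 15, 15, 16, 16, 16, 16, 15, 16, 16, 15, 16, 16, 16]

print(round(mean(ages), 2))   # about 15.8 ~ 16
print(median(ages))           # 16
print(mode(ages))             # 16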
TYPES OF STATISTICAL STUDIES
❖ Observational studies – the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations.
❖ Experimental studies – the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables.
■ Explanatory variable
● the variable that is manipulated
● also called the independent variable
■ Outcome variable
● the variable affected by the manipulated variable
● also called the dependent variable

TYPES OF VARIABLES
❖ Qualitative variables – variables that can be placed into distinct categories, according to some characteristic or attribute.
Examples: gender, sex, school graduated from, blood type, movies watched, series finished, etc.
Example: Rolling a Die
S = {1, 2, 3, 4, 5, 6}, n(S) = 6
E1: Rolling a 2, E1 = {2}, n(E1) = 1, simple event
E2: Rolling a number > 5, E2 = {6}, n(E2) = 1, simple event
E3: Rolling an even number, E3 = {2, 4, 6}, n(E3) = 3, compound event
E4: Rolling a prime number, E4 = {2, 3, 5}, n(E4) = 3, compound event

■ Mutually exclusive events – events whose sets of outcomes have nothing in common.
For classical probability, each outcome is equally likely to occur.

P(E) = n(E) / n(S) = (number of outcomes in E) / (total number of outcomes in the sample space)

Example: Probability of getting tails on a coin flip
P(E) = n(E) / n(S) = 1/2 = 0.5 = 50%

Are complementary events always mutually exclusive? Yes
Are mutually exclusive events always complementary? No
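The classical-probability formula can be checked by counting outcomes directly; a minimal Python sketch for the die events above (the helper name classical_probability is just for illustration):

from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                      # sample space for one die roll

def classical_probability(event, sample_space=S):
    """P(E) = n(E) / n(S) when every outcome is equally likely."""
    return Fraction(len(event), len(sample_space))

E1 = {2}                                    # rolling a 2
E3 = {x for x in S if x % 2 == 0}           # rolling an even number -> {2, 4, 6}
E4 = {2, 3, 5}                              # rolling a prime number

print(classical_probability(E1))            # 1/6
print(classical_probability(E3))            # 1/2
print(E3.isdisjoint(E4))                    # False -> E3 and E4 are NOT mutually exclusive (both contain 2)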
Example: X = {1, 2, 3, 4, 5, 6} (rolling a die)

X      1     2     3     4     5     6
P(X)   1/6   1/6   1/6   1/6   1/6   1/6

b) What is its cumulative probability distribution?

X      1     2     3     4     5     6
CP(X)  1/6   1/3   1/2   2/3   5/6   1

c) What is the probability of rolling at most a 4?
CP(X = 4) = 2/3

As a consequence of being a discrete probability distribution, there are two requirements:
❖ The probabilities for each value of the random variable in the sample space must add up to 1.
■ ∑ P(X) = 1
❖ The probability for each value of the random variable in the sample space must be between 0 and 1.
■ 0 ≤ P(X) ≤ 1
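The two requirements and the cumulative distribution can be verified mechanically; a minimal Python sketch for the fair-die table above (variable names are illustrative):

from fractions import Fraction
from itertools import accumulate

x = [1, 2, 3, 4, 5, 6]
p = [Fraction(1, 6)] * 6                 # P(X) for a fair die

assert sum(p) == 1                       # requirement: ∑ P(X) = 1
assert all(0 <= pi <= 1 for pi in p)     # requirement: 0 ≤ P(X) ≤ 1

cp = list(accumulate(p))                 # CP(X) = 1/6, 1/3, 1/2, 2/3, 5/6, 1
print(dict(zip(x, cp)))
print(cp[x.index(4)])                    # P(rolling at most a 4) = CP(4) = 2/3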
MEAN, VARIANCE & STANDARD DEVIATION
❖ Mean
■ It describes the “center” of all possible values of a random variable.
■ It can sometimes be interpreted as the value that the variable assumes on average.
❖ Sample mean: x̄ = ∑x / n

Example:
x = {1, 2, 3, 4, 5}, n = 5
x̄ = ∑x / n = (1 + 2 + 3 + 4 + 5) / 5 = 3

❖ Sample standard deviation: s = √( ∑(x - x̄)² / (n - 1) )

Example:
x = {1, 2, 3, 4, 5}, n = 5, x̄ = 3
x - x̄ = {1 - 3, 2 - 3, 3 - 3, 4 - 3, 5 - 3} = {-2, -1, 0, 1, 2}
(x - x̄)² = {4, 1, 0, 1, 4}
∑(x - x̄)² = 10
s = √(10 / (5 - 1)) = √2.5 ≈ 1.6
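The same computation using Python's statistics module (stdev divides by n - 1, matching the formula above):

from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
print(mean(x))              # x̄ = 3
print(round(stdev(x), 2))   # s = √(10 / 4) ≈ 1.58, which rounds to 1.6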
NORMAL DISTRIBUTION
❖ Symmetric about the mean
❖ Continuous
■ never touches the x-axis

Example (heights of 1,000 students, in inches):
f. How many students are between 57 and 69 inches tall?
P(57 < x < 69) = 0.997/2 + 0.68/2 = 0.8385 = 83.85%
1000 × 0.8385 ≈ 839 students
g. How many students are shorter than 62 inches?

INTRODUCING: z-values
z ≡ number of standard deviations from the mean
z = (X - μ) / σ
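A z-value follows directly from the definition; a minimal helper in Python (the numbers in the calls are made up for illustration, since μ and σ are not given for the height example above):

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

print(z_score(75, mu=70, sigma=5))   # 1.0  -> one standard deviation above the mean
print(z_score(62, mu=70, sigma=5))   # -1.6 -> 1.6 standard deviations below the mean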
The Central Limit Theorem states that the sample mean will be normally distributed; this leads to some useful applications.
❖ Considering all possible samples of the same sample size n, the mean of the sample means is equal to the population mean.
■ μ_x̄ = μ
❖ Considering all possible samples of the same sample size n, the standard deviation of the sample means is equal to the population standard deviation divided by the square root of the sample size n:
■ σ_x̄ = σ / √n
❖ The sample mean will exhibit a normal distribution if either X is normally distributed, or n ≥ 30.

Example:
σ_x̄ = σ / √n = 3.000 / √10 = 0.949
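A quick numerical check of σ_x̄ = σ / √n from the example above, in Python:

import math

sigma, n = 3.000, 10
sigma_xbar = sigma / math.sqrt(n)   # standard deviation of the sample mean
print(round(sigma_xbar, 3))         # 0.949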
CONFIDENCE INTERVAL
❖ Recall the empirical rule for normal distributions (i.e., the 68-95-99.7 rule).
P(-1 < z < 1) ≈ 0.68
P(-2 < z < 2) ≈ 0.95
P(-3 < z < 3) ≈ 0.997

Using NORM.S.INV:
If P(z > a) = Area, then a = NORM.S.INV(1 - Area)
If P(z < a) = Area, then a = NORM.S.INV(Area)
If P(-a < z < a) = Area, then a = -NORM.S.INV(0.5 - Area/2)

Formulas:
❖ x̄ - a·σ/√n < μ < x̄ + a·σ/√n
The constant a will now be referred to as z_{a/2}. This is to signify that it is a z-value obtained from the standard normal distribution.
❖ x̄ - z_{a/2}·σ/√n < μ < x̄ + z_{a/2}·σ/√n
❖ where z_{a/2}·σ/√n is called the maximum error of the estimate (margin of error)

Notice also that the confidence interval is two-sided – the population mean is bounded on both sides, i.e. it is of the form L < μ < U.
Lastly, this formulation of the confidence interval assumes that the population standard deviation σ is already known.
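A sketch of the two-sided interval x̄ ± z_{a/2}·σ/√n in Python, using scipy.stats.norm.ppf as the counterpart of NORM.S.INV; the figures passed in are illustrative only, and σ is assumed known:

import math
from scipy.stats import norm

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided confidence interval for μ when σ is known."""
    z_half_alpha = norm.ppf(1 - (1 - confidence) / 2)   # z_{a/2} ≈ 1.96 for 95%
    margin = z_half_alpha * sigma / math.sqrt(n)        # maximum error of the estimate
    return x_bar - margin, x_bar + margin

print(z_confidence_interval(x_bar=50, sigma=3.000, n=10))   # illustrative numbers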
HYPOTHESIS TESTING
Three Methods
❖ the traditional method
❖ the p-value method
❖ the confidence interval method

Two-tailed test     Right-tailed test     Left-tailed test
H0: μ = k           H0: μ = k             H0: μ = k
H1: μ ≠ k           H1: μ > k             H1: μ < k
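As a sketch of the p-value method for a two-tailed test of H0: μ = k, assuming σ is known; the numbers are illustrative, and the test value (x̄ - k)/(σ/√n) is the standard one-sample z statistic built from the z-value and σ_x̄ formulas above:

import math
from scipy.stats import norm

def z_test_two_tailed(x_bar, k, sigma, n, alpha=0.05):
    """Test H0: μ = k against H1: μ ≠ k using the p-value method."""
    test_value = (x_bar - k) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - norm.cdf(abs(test_value)))
    return test_value, p_value, p_value < alpha   # True -> reject H0

print(z_test_two_tailed(x_bar=52, k=50, sigma=3.000, n=10))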
STATISTICAL INFERENCE
❖ Statistical test – uses the data obtained from a sample to make a decision
❖ Test value – the numerical value obtained from a statistical test
❖ Type 1 error – occurs if you reject the null hypothesis when it is true
❖ Type 2 error – occurs if you do not reject the null hypothesis when it is false
❖ Testing the Difference Between Two Means
■ z test
■ t test
● t test for independent samples with equal variances
● t test for independent samples with unequal variances
● t test for dependent samples
❖ Testing the Difference Between Two Proportions
■ z test
❖ Testing the Difference Between Two Variances
■ F test
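scipy.stats provides the two-means t tests named above; a minimal sketch with made-up sample data (ttest_ind for independent samples, ttest_rel for dependent samples):

from scipy.stats import ttest_ind, ttest_rel

group_a = [12.1, 11.8, 12.5, 12.0, 11.9]
group_b = [11.2, 11.5, 11.0, 11.7, 11.3]

# equal_var=True  -> t test for independent samples with equal variances (pooled)
# equal_var=False -> t test for independent samples with unequal variances (Welch)
print(ttest_ind(group_a, group_b, equal_var=True))
print(ttest_ind(group_a, group_b, equal_var=False))

# t test for dependent (paired) samples
print(ttest_rel(group_a, group_b))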
PRACTICE EXERCISES

1) X     2     3     5     6     9
   P(X)  1/7   1/4   1/14  1/2   ?
   Find: ∑ P(X)

2) X     1     2     5     6     8
   P(X)  1/14  1/2   1/28  1/7   ?
   Find: ∑ P(X)

3) X     3.0   4.0   7.0   7.1   8.2   8.3
   P(X)  1/15  1/2   1/12  1/20  1/10  1/5
   Find: μ, σ², σ, p, q

4) X     2.5   6.6   6.9   8.3   8.7   9.2
   P(X)  1/10  1/6   2/5   3/20  1/20  2/15
   Find: μ, σ², σ, p, q

5) X     2     3     7     9     10
   P(X)  1/2   1/28  1/14  1/4   ?
   Find: ∑ P(X), CP(X), P(X ≥ 3), P(X ≤ 7), P(X is odd), μ, σ², σ, p, q
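For the μ, σ², and σ rows, the usual discrete-distribution formulas are μ = ∑ x·P(x) and σ² = ∑ (x - μ)²·P(x) (standard definitions, not spelled out in the notes above); a Python sketch applied to the fair-die distribution from earlier rather than to the exercise tables:

import math
from fractions import Fraction

x = [1, 2, 3, 4, 5, 6]
p = [Fraction(1, 6)] * 6                                  # a complete distribution: ∑ P(X) = 1

mu = sum(xi * pi for xi, pi in zip(x, p))                 # μ = 7/2
var = sum((xi - mu) ** 2 * pi for xi, pi in zip(x, p))    # σ² = 35/12
sigma = math.sqrt(var)                                    # σ ≈ 1.71
p_odd = sum(pi for xi, pi in zip(x, p) if xi % 2 == 1)    # P(X is odd) = 1/2

print(mu, var, round(sigma, 2), p_odd)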