EDA Reviewer

Uploaded by

Dianne Rose Nava


EDA

INTRODUCTION TO STATISTICS

❖ Statistics – the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
■ Variable – a characteristic or attribute that can assume different values.
■ Data – the values (measurements or observations) that the variables can assume.

Examples:

Variable: Ages in a Certain Family
Data: {17, 22, 26, 52, 53}

Variable: Biological Sex
Data: {Male, Female}

As in mathematics, the list of data is called a data set, and each element is called a data value.

TYPES OF STATISTICS
❖ Descriptive Statistics – consists of the collection, organization, summarization, and presentation of data.
■ uses the measures of central tendency (the Triple M a.k.a. mean, median, mode)

❖ Inferential Statistics – consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.

Example:
Graduating Age in my 4th Year Section: {16, 16, 16, 15, 16, 15, 16, 15, 16, 15, 15, 16, 17, 16, 15, 16, 18, 16, 15, 16, 16, 17, 16, 16, 15, 16, 16, 16, 15, 15, 16, 16, 16, 16, 15, 16, 16, 15, 16, 16, 16}
Mean: 15.80 ≈ 16, Median: 16, Mode: 16

Saying that the average graduating age of the section is 16 years old falls under descriptive statistics.
But saying that the average graduating age of the succeeding batches is 16 years old falls under inferential statistics.

TYPES OF VARIABLES
❖ Qualitative variables – variables that can be placed into distinct categories, according to some characteristic or attribute.

Examples: gender, sex, school graduated, blood type, movies watched, series finished, etc.

❖ Quantitative variables – numerical and can be ordered or ranked.

Examples: height, weight, age, BMI, grade equivalents, population, number of dogs, number of movies watched, etc.
■ Discrete variables – can have data values that can be counted.
■ Continuous variables – can have an infinite number of data values between any two specific values. They are obtained by measuring. They often include fractions and decimals.

Examples (D = discrete, C = continuous):
Grade Equivalents – D
Population – D
Height – C
BMI – C

Variables can also be classified by measurement scale. Four levels of measurement are commonly used:

❖ Nominal level of measurement
■ classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data.
Some examples include eye color, religion, degree program, and nationality.

❖ Ordinal level of measurement
■ classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
Some examples include letter grades, clothes sizes, and level of educational attainment.

❖ Interval level of measurement
■ ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
Some examples include temperature, dates in a certain month, and calendar years.

❖ Ratio level of measurement
■ possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population.
The simple quantitative variables you usually think of are under ratio measurements: height, weight, age, volume and so on.

TYPES OF SAMPLING

❖ Random sampling
❖ Systematic sampling
❖ Stratified sampling
❖ Cluster sampling

TYPES OF STATISTICAL STUDIES
❖ Observational studies – the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations.
❖ Experimental studies – the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables.
■ Explanatory variable
● the variable that is manipulated
● also called independent variable
■ Outcome variable
● the variable affected by the manipulated variable
● also called dependent variable

PROBABILITY & COUNTING RULES

❖ Probability – quantifies the chance of an event or outcome occurring. It says how likely (or unlikely) something is to happen in a given trial.
■ A trial is a chance process that leads to well-defined results called outcomes.

❖ Sample Space – the set of all possible outcomes in a trial.

Examples:
■ Flipping a Coin: S = { H, T }
■ Rolling a Die: S = { 1, 2, 3, 4, 5, 6 }
■ Flipping two coins in a row: S = { HH, TT, HT, TH }

■ A tree diagram may be used to determine the sample space for sequences of events.
Example: Biological Sex for two Children in a Family
Child 1 = F → Child 2 = M or F (outcomes FM, FF)
Child 1 = M → Child 2 = M or F (outcomes MM, MF)
S = { FM, FF, MM, MF }
Cardinality: n(S) = 4

There are two common ways in which probabilities are determined:
❖ Classical Probability – uses sample spaces to determine the numerical probability that an event will happen. It assumes that all outcomes in the sample space are equally likely to occur.
❖ Empirical Probability – uses actual observations to determine the frequency of an event occurring.
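The classical approach can be sketched in Python by listing the sample space and counting the outcomes that belong to an event, i.e. the ratio n(E) / n(S). This is only an illustration under the equally-likely assumption. The helper name classical_probability and the event "at least one child is female" are made up for this sketch and do not come from the notes.

```python
from fractions import Fraction
from itertools import product


def classical_probability(sample_space, event):
    """Return n(E) / n(S), assuming every outcome is equally likely."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


# Sample space for the sex of two children: one string per branch of the tree.
two_children = ["".join(pair) for pair in product("FM", repeat=2)]

# Hypothetical event for illustration: at least one child is female.
p_at_least_one_f = classical_probability(two_children, lambda o: "F" in o)

# Rolling a die: the compound event "an even number appears".
die = list(range(1, 7))
p_even = classical_probability(die, lambda x: x % 2 == 0)

print(two_children)       # the four outcomes of the tree diagram
print(p_at_least_one_f)   # exact fraction
print(p_even)             # exact fraction
```

Using Fraction keeps the results exact, which matches the way the notes write probabilities as fractions.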
Law of Large Numbers – the empirical probability should approach the classical probability, if available, assuming the trials are fair, as the number of trials grows large.

❖ Event – consists of a desired set of outcomes of a trial.
■ The set of outcomes of an event is a subset of the sample space of the trial.
■ An event is a simple event if it has only one outcome.
■ Otherwise, it is a compound event, which consists of two or more outcomes.

Example:
Rolling a Die
E1: Rolling a 2 – E1 = { 2 }, n(E1) = 1, simple event
E2: Rolling a number > 5 – E2 = { 6 }, n(E2) = 1, simple event
E3: Rolling an even number – E3 = { 2, 4, 6 }, n(E3) = 3, compound event
E4: Rolling a prime number – E4 = { 2, 3, 5 }, n(E4) = 3, compound event

■ Mutually exclusive events – events whose sets of outcomes have nothing in common.

For classical probability, each outcome is equally likely to occur.

P(E) = n(E) / n(S) = (number of outcomes in E) / (total number of outcomes in the sample space)

Example:
Probability of getting tails on a coin flip
P(E) = n(E) / n(S) = 1/2 = 0.5 = 50%

Are complementary events always mutually exclusive? Yes
Are mutually exclusive events always complementary? No

COUNTING RULES

❖ Permutation – used for counting arrangements where order matters.
❖ Combination – used for counting arrangements where order does not matter.
■ Fundamental Counting Rule – if the events are independent, then the total number of outcomes is simply the product of the number of outcomes of each successive event.

DISCRETE PROBABILITY DISTRIBUTIONS

❖ A random variable is a variable whose values are determined by chance.
■ For every possible value of the random variable, there is a corresponding probability.
■ If the probabilities for each value of the random variable are plotted, then what we get is a probability distribution.
■ If the variable can assume discrete data values, then what we get is a discrete probability distribution.

Examples:

1. Consider the trial of flipping a coin:
Let X = the number of heads that appear
a) What are the possible values of X?
b) What is the probability P(X) for each X?
S = { H, T }, n(S) = 2
X = { 0, 1 }
P(X) = n(X) / n(S)

X      0    1
P(X)   1/2  1/2

2. Now, consider the trial of flipping two coins:
Let X = the number of heads that appear
a) What are the possible values of X?
b) What is the probability P(X) for each X?
c) What is the probability of getting at most one head, i.e. P(X ≤ 1)?
S = { HH, TT, HT, TH }
X = { 0, 1, 2 }

X      0    1    2
P(X)   1/4  1/2  1/4

For a question such as c), the sum of several values of P(X) is obtained. This motivates an alternative distribution, called the cumulative probability distribution.
❖ The cumulative probability distribution is obtained by adding probabilities, and each sum is plotted versus the random variable values.

Examples:
Consider the trial of rolling a die:
Let X = the number that appears
S = { 1, 2, 3, 4, 5, 6 }, n(S) = 6
X = { 1, 2, 3, 4, 5, 6 }

a) What is its probability distribution?

X      1    2    3    4    5    6
P(X)   1/6  1/6  1/6  1/6  1/6  1/6

b) What is its cumulative probability distribution?

X      1    2    3    4    5    6
CP(X)  1/6  1/3  1/2  2/3  5/6  1

c) What is the probability of rolling at most a 4?
P(X ≤ 4) = CP(X = 4) = 2/3

As a consequence of being a discrete probability distribution, there are two requirements:
❖ The probabilities for each value of the random variable in the sample space must add up to 1.
■ ∑ P(X) = 1
❖ The probability for each value of the random variable in the sample space must be between 0 and 1.
■ 0 ≤ P(X) ≤ 1

MEAN, VARIANCE & STANDARD DEVIATION
❖ Mean
■ It describes the “center” of all possible values of a random variable.
■ can sometimes be interpreted as the value that the variable assumes on average.
■ μ ≡ mean = ∑ [X × P(X)]

❖ Variance
■ It describes how spread out the values of the random variable are around the mean.
■ σ² ≡ variance = ∑ [X² × P(X)] − μ² = ∑ [(X − μ)² × P(X)]

❖ Standard Deviation
■ σ ≡ standard deviation = √(σ²)

Rounding Rule: When reporting the mean, variance and standard deviation, report them to one decimal place more than the values of X.

EXPECTED VALUE
❖ The expected value is the single value that we expect the mean to approach as we obtain more samples.
■ if the mean is based on an ideal probability distribution (e.g. for coin flips, dice rolls), this mean is exactly the expected value: after a lot of rolls, we expect to get the mean value on average.

CONTINUOUS PROBABILITY DISTRIBUTIONS AND THE NORMAL DISTRIBUTIONS

Figure 1: normal distribution

Properties
❖ Bell-shaped
■ bell curve
■ normal curve
■ Gaussian distribution (after Carl Friedrich Gauss)

❖ Mean = Median = Mode
■ Median = value of x which divides the area under the curve into two equal parts
■ Mode = value of x that corresponds to the 'highest point'
■ Mean = calculated in a similar way as for discrete variables
● ∑ [X × P(X)]

❖ Unimodal

❖ Symmetric about the mean

❖ Continuous
■ never touches the x-axis

❖ Total Area = 1

❖ Empirical Rule (68, 95, 99.7 rule)
■ one standard deviation above and below the mean = 68% of total area
■ two standard deviations above and below the mean = 95% of total area
■ three standard deviations above and below the mean = 99.7% of total area

NORMAL DISTRIBUTION - EMPIRICAL RULE

Example:
The heights in a school's population were determined to have a mean of 66 inches, with a standard deviation of 3 inches. Assume that height is a normally distributed variable.
In a batch of 1000 students, estimate:

a. how many students are between 63 and 69 inches tall
   P(63 < x < 69) = 0.68
   1000 × 0.68 = 680 students
b. how many students are between 60 and 72 inches tall
   P(60 < x < 72) = 0.95
   1000 × 0.95 = 950 students
c. how many students are between 66 and 72 inches tall
   P(66 < x < 72) = 0.95 / 2 = 0.475
   1000 × 0.475 = 475 students
d. how many students are shorter than 69 inches
   P(X < 69) = 0.5 + 0.68 / 2 = 0.84
   1000 × 0.84 = 840 students
e. how many students are taller than 69 inches
   P(X > 69) = 0.5 − 0.68 / 2 = 0.16
   1000 × 0.16 = 160 students
f. how many students are between 57 and 69 inches tall
   P(57 < x < 69) = 0.997 / 2 + 0.68 / 2 = 0.8385
   1000 × 0.8385 ≈ 839 students
g. how many students are shorter than 62 inches
   ? (62 inches is not a whole number of standard deviations from the mean, so the empirical rule alone cannot answer this)

INTRODUCING: z-values
z ≡ number of standard deviations from the mean
z = (X − μ) / σ

STANDARD NORMAL DISTRIBUTION
❖ is a normal distribution with μ = 0 and σ = 1

CENTRAL LIMIT THEOREM
❖ Sample Mean (x̄)
■ denoted by x̄, is obtained not from the entire population, but only from a sample.
■ the best estimate that we can get as near to the population mean as possible.
■ x̄ = ∑x / n, where n = sample size

Example:
x = { 1, 2, 3, 4, 5 }
n = 5
x̄ = ∑x / n = (1 + 2 + 3 + 4 + 5) / 5 = 3

❖ Sample Standard Deviation (s)
■ s = √[ ∑(x − x̄)² / (n − 1) ]

Example:
x = { 1, 2, 3, 4, 5 }
n = 5
x̄ = 3
x − x̄ = { 1 − 3, 2 − 3, 3 − 3, 4 − 3, 5 − 3 }
x − x̄ = { −2, −1, 0, 1, 2 }
(x − x̄)² = { (−2)², (−1)², 0², 1², 2² }
∑(x − x̄)² = 10
s = √(10 / (5 − 1)) = √2.5 ≈ 1.6

The Central Limit Theorem states that the sample mean will be normally distributed. This leads to some useful applications.

❖ Considering all possible samples of the same sample size n, the mean of the sample means is equal to the population mean.
■ μ_x̄ = μ

❖ Considering all possible samples of the same sample size n, the standard deviation of the sample means is equal to the population standard deviation divided by the square root of the sample size n:
■ σ_x̄ = σ / √n

❖ The sample mean will exhibit a normal distribution if either X is normally distributed, or n ≥ 30

Example:
Consider the problem from earlier on kids’ TV habits. Determine the mean and standard deviation of the sample mean if it is known that μ = 25.000, σ = 3.000, and n = 10.
μ_x̄ = μ = 25.000
σ_x̄ = σ / √n = 3.000 / √10 = 0.949

CONFIDENCE INTERVAL
❖ Recall the empirical rule for normal distributions (i.e. 68-95-99.7 rule).
P(−1 < z < 1) ≈ 0.68
P(−2 < z < 2) ≈ 0.95
P(−3 < z < 3) ≈ 0.997

To find the constant a for a given area:
P(z > a) = Area: a = NORM.S.INV(1 − Area)
P(z < a) = Area: a = NORM.S.INV(Area)
P(−a < z < a) = Area: −a = NORM.S.INV(0.5 − Area / 2)

Formulas:
❖ x̄ − a × σ / √n < μ < x̄ + a × σ / √n

The constant a will now be referred to as z_(a/2). This is to signify that it is a z-value obtained from the standard normal distribution.

❖ x̄ − z_(a/2) × σ / √n < μ < x̄ + z_(a/2) × σ / √n
❖ where z_(a/2) × σ / √n is called the maximum error of the estimate (margin of error)

Notice also that the confidence interval is two-sided – the population mean is bounded on both sides, i.e. it is of the form
L < μ < U

Lastly, this formulation of the confidence interval assumes that the population standard deviation σ is already known.

HYPOTHESIS TESTING
Three Methods
❖ the traditional method
❖ p-value method
❖ the confidence interval method

Statistical Hypothesis – a conjecture about a population parameter.

❖ Null Hypothesis (H0) – states that there is no difference between two parameters
❖ Alternative Hypothesis (H1) – states that a difference exists between two parameters

Two-tailed test: H0: μ = k, H1: μ ≠ k
Right-tailed test: H0: μ = k, H1: μ > k
Left-tailed test: H0: μ = k, H1: μ < k

STATISTICAL INFERENCE
❖ Statistical test – uses the data obtained from a sample to make a decision
❖ Test Value – numerical value obtained from a statistical test
❖ Type 1 error – occurs if you reject the null hypothesis when it is true
❖ Type 2 error – occurs if you do not reject the null hypothesis when it is false

❖ Testing the Difference Between Two Means
■ z test
■ t test
● t test for independent samples with equal variances
● t test for independent samples with unequal variances
● t test for dependent samples

❖ Testing the Difference Between Two Proportions
■ z test

❖ Testing the Difference Between Two Variances
■ F test
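The z-based confidence interval and the confidence interval method of hypothesis testing mentioned in these notes can be tied together in a short Python sketch. It is only an illustration: the function names, the 95% confidence level, and the sample numbers are assumptions made for this example, and σ is treated as known, as the notes require. statistics.NormalDist().inv_cdf is the standard library's inverse of the standard normal distribution, which plays the role of NORM.S.INV.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided interval: sample mean plus or minus z * sigma / sqrt(n)."""
    # Area to the left of +z is 0.5 + confidence / 2 for a two-sided interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / sqrt(n)  # maximum error of the estimate
    return sample_mean - margin, sample_mean + margin


def reject_null(k, interval):
    """Confidence interval method: reject H0: mu = k if k lies outside the interval."""
    lower, upper = interval
    return not (lower < k < upper)


# Hypothetical sample: mean 26.5 from n = 10 observations, known sigma = 3.0.
interval = z_confidence_interval(sample_mean=26.5, sigma=3.0, n=10)
print(interval)
print(reject_null(25.0, interval))  # two-tailed test of H0: mu = 25
```

With a 95% interval, this decision rule corresponds to a two-tailed test whose probability of a Type 1 error is 5%.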
n
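X: 2, 3, 5, 6, 9
Known P(X) values: 1/7, 1/4, 1/14, 1/2; one P(X) value is unknown (?)
Find: ∑ P(X), CP(X), P(X ≥ 3), P(X ≤ 5), P(X is odd)

X: 1, 2, 5, 6, 8
Known P(X) values: 1/14, 1/2, 1/28, 1/7; one P(X) value is unknown (?)
Find: ∑ P(X), CP(X), P(X ≥ 2), P(X ≤ 5), P(X is odd)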
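A minimal Python sketch of the two ideas these practice tables rely on: the probabilities must add up to 1, so a single unknown entry is whatever remains, and the cumulative distribution is a running sum. The function names are made up for this sketch. The sketch uses the fair-die distribution from the notes and the four known probabilities listed for the first practice table.

```python
from fractions import Fraction
from itertools import accumulate


def missing_probability(known):
    """Return the one unknown P(X), using the requirement that all P(X) sum to 1."""
    remainder = 1 - sum(known)
    if not 0 <= remainder <= 1:
        raise ValueError("known probabilities do not leave a valid remainder")
    return remainder


def cumulative(probabilities):
    """Running sums of P(X), i.e. the cumulative probability distribution CP(X)."""
    return list(accumulate(probabilities))


# Fair die: CP(X) is the running total of six equal probabilities.
die = [Fraction(1, 6)] * 6
die_cp = cumulative(die)
print(die_cp)

# First practice table: four known probabilities, one unknown.
known = [Fraction(1, 7), Fraction(1, 4), Fraction(1, 14), Fraction(1, 2)]
unknown = missing_probability(known)
print(unknown)
```

Once the table is complete, a probability such as P(X ≤ 5) is the cumulative value at X = 5, and P(X ≥ 3) is 1 minus the cumulative value just below X = 3.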

X      3.0   4.0  7.0   7.1   8.2   8.3
P(X)   1/15  1/2  1/12  1/20  1/10  1/5
Find: μ, σ², σ

X      2.5   6.6  6.9  8.3   8.7   9.2
P(X)   1/10  1/6  2/5  3/20  1/20  2/15
Find: μ, σ², σ
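The formulas for μ, σ² and σ of a discrete probability distribution can be checked with a short Python sketch. The function names and the dictionary representation (value of X mapped to P(X)) are choices made for this illustration. It uses the fair-die distribution from the notes so that the practice tables are left to work out.

```python
from fractions import Fraction
from math import sqrt


def mean(dist):
    """mu = sum of X * P(X)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """sigma^2 = sum of X^2 * P(X), minus mu^2."""
    mu = mean(dist)
    return sum(x * x * p for x, p in dist.items()) - mu * mu


def variance_by_deviations(dist):
    """Equivalent form: sigma^2 = sum of (X - mu)^2 * P(X)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Fair die: each face 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(die.values()) == 1  # requirement: probabilities add up to 1

mu = mean(die)
var = variance(die)
sd = sqrt(var)  # standard deviation is the square root of the variance

print(mu, var, round(sd, 1))
```

Both variance functions return the same value, which is a quick way to confirm that the two forms of the variance formula agree.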

𝒑 𝒑
𝒒 𝒒
X: 2, 3, 7, 9, 10
Known P(X) values: 1/2, 1/28, 1/14, 1/4; one P(X) value is unknown (?)
Find: ∑ P(X), CP(X), P(X ≥ 3), P(X ≤ 7), P(X is odd)

X      1.8   2.1   3.0   3.5   7.0   8.7
P(X)   7/30  1/20  1/15  5/12  1/30  1/5
Find: μ, σ², σ

𝒑
𝒒
