
Statistics Reviewer

Uploaded by

Clarke Gregorio

Statistics

Saturday, 24 February 2024 11:20 am

~ Statistics ~
- Collection, organization, and interpretation of data to provide
answers or solutions
- Study of data
- Tool in decision-making
Statistical Process in Making Decisions
1. Planning or Designing the Collection of Data
2. Collecting Data as Required by the Plan
3. Verifying the Quality of Data
4. Summarizing the Information
5. Examining the Summary Statistics to Gain Insight and Meaningful
Information

Data Collection
- Data
○ Individual pieces of recorded factual information, which are
analyzed, interpreted, and presented
○ Numeric or non-numeric
- Data Collection
○ Usually takes place early in a study
○ Often formalized through a data collection plan
▪ Pre-Collection Activity
□ Agree on goals, target data, definitions and method
▪ Collection
□ Data gathering
▪ Present Findings
□ Analysis or interpretation
Common Data Collection Methods
- Survey
- Case Study
- Interview
- Observation
- Peer Review
- Group Assessment
- Test
- Photographs, Videotapes
- Diaries, Journals, Logs
- Testimonies
- Document Review
- Analysis

Statistics and Probability Page 1


Types of Data Sources
- Primary Data
○ First-hand information collected, compiled, and published by
an organization for some purpose
- Secondary Data
○ Second-hand information which was already collected by
someone for some purpose
○ Available for present study

Statistics
- Descriptive Statistics
○ Describing properties of data
- Inferential Statistics
○ Drawing conclusions about a population based on information in
a sample

Variables
- Characteristic or attribute that we observe or measure from every
element of the population
- Qualitative
○ Categorical attribute
○ Does not strictly take numerical values
○ Answers "What kind?"
○ Example: Gender, Religion
- Quantitative
○ Numerical data
○ Has actual units of measure
○ Discrete
▪ Can be COUNTED
▪ Example: Number of Males
○ Continuous
▪ Can be MEASURED
▪ Example: Height of Males

Levels of Measurement of Variables


Nominal Level
- No order, distance or origin
- Determination of Equality
- All categories are EQUAL (none ranks above another)
- Categorical and non-numerical
- Numbers, if used, are labels with no sense of ordering
- Example
○ Sex
○ Religion
○ Student Number
Ordinal Level
- Has order
- No distance or unique origin
- Determination of Greater or Lesser Values
- Categorical variables, but with levels
- ORDERING is IMPORTANT
- Values of the variables can be ranked
- Example
○ Junior High School Level
○ Ranking in Competition
○ Socio-Economic Status
Interval Level
- Both with order and distance
- No unique origin
- Determination of Equality of Intervals or Difference
- One unit differs from another unit by a certain amount or degree
- No True Zero, only an ARBITRARY ZERO POINT (zero is just another
measurement point on the scale)
- Example
○ Temperature
○ Years on a Timeline
○ Intelligence Quotient (IQ)
Ratio Level
- Has order, distance, and unique origin
- Determination of Equality of Ratios or Means
- One unit has so many times as much of the property as another
unit
- Meaningful (unique and non-arbitrary) absolute, fixed zero point that
allows all arithmetic operations, including ratios
- Example
○ Weight
○ Score in a Quiz
○ Amount of Money

Art of Data Science


Data Science
- Scientific discovery and practice that involves the collection,
management, processing, analysis, visualization, and interpretation of
vast amounts of heterogeneous data
- Lies at the intersection of the statistical and the computational sciences,
and domain-specific scholarly disciplines and application areas
Data Scientist
- One part mathematician, one part computer specialist, one part trend-setter
- Collects large amounts of unruly data and organizes it
Data Mining and Big Data
- Use of the most powerful programming systems and the most efficient
algorithms to solve problems in complex applications
Programming Languages
- R
○ Open-source language for statistical computing and graphics,
modeled on the S language from Bell Laboratories
- Python
○ Open-source, object-oriented, interpreted, and interactive language
- SAS Language
○ Proprietary statistical tool whose development was led by
Anthony James Barr



Probability
Thursday, 4 April 2024 8:48 pm

Probability
- Chance that a particular event will happen

Probability Terms
- Random Experiment
○ Any activity that can be repeated under the same conditions and whose
outcome cannot be predicted with certainty
- Sample Space
○ List of all the possible outcomes of a random experiment
- Event
○ Subset of a sample space

Tree Diagram
- Graphic organizer used to list all the possibilities of a sequence of
events in a systematic way

Classical Probability
- Ratio of the number of outcomes in which the event occurs to the total
number of possible outcomes of the random process (outcomes are
assumed to be equally likely)

P(A) = n / N

P(A) = probability of event A
n = number of outcomes favorable to A
N = number of possible outcomes
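
As a small illustration of the classical formula, the sketch below counts favorable and possible outcomes for one roll of a fair six-sided die. The die example and the helper name `classical_probability` are assumptions made for illustration and are not part of the notes.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Return P(A) = n / N, assuming all outcomes are equally likely."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

# Hypothetical example: one roll of a fair die, A = "the roll is even"
sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
print(classical_probability(even, sample_space))  # 1/2
```

Using `Fraction` keeps the answer exact, which matches how such probabilities are usually written by hand.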

Frequentist/Empirical Probability
- Limiting value of the relative frequency of occurrence of the event if the
random process were to be repeated endlessly

P(A) = (number of times A occurs) / (number of repetitions), as the
number of repetitions grows without bound

P(A) = probability of event A
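
The idea of a limiting relative frequency can be sketched with a simulation. The code below assumes a fair coin and a fixed random seed, both chosen only for illustration; the relative frequency of heads should settle near 0.5 as the number of tosses grows.

```python
import random

def relative_frequency(trials, seed=0):
    """Fraction of heads observed in `trials` simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

# The estimate is rough for few tosses and steadier for many
for trials in (10, 1_000, 100_000):
    print(trials, relative_frequency(trials))
```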

~ Events ~
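- Mutually Exclusive Events
○ Cannot occur at the same time (no outcomes in common)
○ UNION of Events: the event that A or B occurs
○ Denoted by ∪
○ P(A ∪ B) = P(A) + P(B)
- Not Mutually Exclusive Events
○ Events that have at least one outcome in common
○ INTERSECTION of Events: the outcomes that A and B share
○ Denoted by ∩
○ P(A ∪ B) = P(A) + P(B) - P(A ∩ B)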
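
The difference between the two kinds of events can be checked with Python sets, where `|` is union and `&` is intersection. This is a sketch using the standard addition rule on a hypothetical die roll; the event names are assumptions for illustration.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die (hypothetical example)

def prob(event):
    """Classical probability of an event within the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_3 = {4, 5, 6}

# Mutually exclusive: no common outcome, so P(A ∪ B) = P(A) + P(B)
assert even & odd == set()
print(prob(even | odd) == prob(even) + prob(odd))  # True

# Not mutually exclusive: the overlap {4, 6} must be subtracted once
overlap = even & greater_than_3
print(prob(even | greater_than_3)
      == prob(even) + prob(greater_than_3) - prob(overlap))  # True
```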



Random Variable
Thursday, 4 April 2024 9:08 pm

Random Variable
- Variable whose value is a real number determined by each element in the sample
space

Types of Random Variable


- Discrete Random Variable
○ Can only take countable values
○ Set of possible values is in one-to-one correspondence with a subset of
natural numbers
- Continuous Random Variable
○ Can assume any of the infinitely many values in an interval between two
specific values

Possible Values of a Random Variable


Example 1
If 3 coins are tossed, what values can be assigned to the number of tails that
occur?

Step 1. Determine the sample space. Let H represent heads and T represent tails.
S = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH}
n (S) = 8

Step 2. Construct a table to determine the values of the random variable

Possible Outcomes    Y (no. of tails)
TTT                  3
TTH                  2
THT                  2
HTT                  2
HHT                  1
HTH                  1
THH                  1
HHH                  0

So, the possible values of the random variable Y are 0, 1, 2, and 3.
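
The table in Example 1 can also be produced by enumeration. The sketch below uses `itertools.product` to list the sample space and counts the tails in each outcome; the variable names are chosen for illustration.

```python
from itertools import product

# All 2**3 = 8 outcomes of tossing three coins, written like "HTT"
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]

# Random variable Y: number of tails in each outcome
y_values = {outcome: outcome.count("T") for outcome in sample_space}

print(len(sample_space))               # 8
print(sorted(set(y_values.values())))  # [0, 1, 2, 3]
```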

Discrete Probability Distribution of a Random Variable


- Discrete Probability Distribution
○ Table, graph, or formula listing all possible values that a discrete random
variable can take on, along with the associated probabilities
- Probability Mass Function (PMF)
○ Function that specifies the probability of each possible value of a
discrete random variable

Properties of a Discrete Probability Distribution

- Probability of each value of the random variable must be between or equal to 0 and 1

0 ≤ P(x) ≤ 1

- Sum of the probabilities of all values of the random variable must be equal to 1

ΣP(x) = 1
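
As a sketch, the probability distribution of the number of tails in three coin tosses can be built and checked against the usual requirements for a discrete distribution: each probability lies from 0 to 1 inclusive, and the probabilities sum to 1. The helper name `is_valid_pmf` is an assumption for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))  # 8 equally likely outcomes
counts = Counter(outcome.count("T") for outcome in outcomes)
pmf = {y: Fraction(c, len(outcomes)) for y, c in sorted(counts.items())}

for y, p in pmf.items():
    print(y, p)  # 0 1/8, then 1 3/8, then 2 3/8, then 3 1/8

def is_valid_pmf(pmf):
    """Check 0 <= P(x) <= 1 for every value and that the probabilities sum to 1."""
    return all(0 <= p <= 1 for p in pmf.values()) and sum(pmf.values()) == 1

print(is_valid_pmf(pmf))  # True
```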



Binomial Distribution
Thursday, 4 April 2024 9:24 pm

Binomial Distribution
- Binomial Experiment
○ Experiment consisting of a fixed number of independent trials, each of
which has only two possible outcomes
- Bernoulli Trial
○ Outcome can either be "success" or "failure"
- Outcome
○ Possible results from an experiment trial
- Frequency of the Outcome
○ Number of times a certain outcome will occur
- Binomial Distribution
○ Discrete probability distribution of the number of successes in a
sequence of n independent Bernoulli Trials
○ Trials are independent: the result of one trial does not affect the
result of the next
○ Examples: Toss Coin, Exam

Binomial Probability Distribution

P(X = x) = b(x; n, p) = nCx * p^x * q^(n-x)

b(x; n, p) = value of the binomial probability
nCx = number of combinations of n trials taken x at a time
x = no. of successful trials
n = no. of independent Bernoulli trials
n - x = no. of failures
p = probability of success
q = probability of failure (q = 1 - p)
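
A minimal sketch of the binomial formula, using `math.comb` for the nCx term. The function name `binomial_pmf` and the coin-toss numbers are assumptions chosen for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, p) = nCx * p**x * q**(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Hypothetical example: exactly 2 heads in 5 tosses of a fair coin
print(binomial_pmf(2, 5, 0.5))  # 0.3125
```

Summing `binomial_pmf(x, n, p)` over x = 0, 1, ..., n should give 1, which is a quick way to check a hand computation.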

Mean μ (expected value) of a Binomial Probability Distribution

μ = E(X) = np

n = no. of independent Bernoulli trials
p = probability of success

Variance σ² of a Binomial Probability Distribution

σ² = npq

q = probability of failure (q = 1 - p)
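
The shortcut formulas np and npq can be compared with the mean and variance computed directly from the distribution. This is a sketch with hypothetical values n = 10 and p = 0.3.

```python
from math import comb

n, p = 10, 0.3  # hypothetical number of trials and probability of success
q = 1 - p

# Probability of each possible number of successes x = 0, 1, ..., n
pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

# Mean and variance from their definitions as weighted sums
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(round(mean, 6), round(n * p, 6))          # both columns should agree
print(round(variance, 6), round(n * p * q, 6))  # both columns should agree
```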



Poisson Probability Distribution
Thursday, 4 April 2024 9:31 pm

Poisson Probability Distribution


- Discrete probability distribution that expresses the probability of a
given number of events occurring in a fixed interval of time or space
- Poisson Random Variable
○ Counts the number of rare events that occur in a specified time
interval or specific region

Poisson Probability Formula

p(x; μ) = (e^(-μ) * μ^x) / x!

p(x; μ) = value of the Poisson probability
x = Poisson random variable (0, 1, 2, ...)
e = Euler's number (approximately 2.71828)
μ = average number of successes occurring in the interval (μ > 0)
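
A short sketch of the standard Poisson probability formula, p(x; μ) = e^(-μ) μ^x / x!. The function name `poisson_pmf` and the example of an average of 2 events per interval are assumptions for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """p(x; mu) = e**(-mu) * mu**x / x!"""
    return exp(-mu) * mu**x / factorial(x)

# Hypothetical example: an average of 2 events per interval,
# probability of seeing exactly 3 events in one interval
print(round(poisson_pmf(3, 2), 4))  # 0.1804
```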



Normal Distribution
Thursday, 4 April 2024 9:45 pm

Normal Distribution
- Continuous probability distribution
- Bell-shaped
- Symmetrical about the mean
- Basis of inference
- Used to approximate other distributions
- Changes position with μ (mean)
- Stretches or narrows with σ (standard deviation)

Normal Probability Distribution

- Distribution of continuous random variables
- Many random variables are either normally distributed or
approximately normally distributed
- Example: Height, Weight, Exam Scores

Empirical Rule (Three-Sigma Rule)


- 68% - 95% - 99.7%
- Theoretical results based on the analysis of the normal distribution
- Tells us that for a normally distributed variable, the following are true:

Approximately 68.27% of the data lie within 1 standard deviation of the
mean
P(μ - 1σ < X < μ + 1σ) ≈ 0.6827

Approximately 95.45% of the data lie within 2 standard deviations of the
mean
P(μ - 2σ < X < μ + 2σ) ≈ 0.9545

Approximately 99.73% of the data lie within 3 standard deviations of the
mean
P(μ - 3σ < X < μ + 3σ) ≈ 0.9973
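
The three percentages can be computed from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `within_k_sigma` is an assumption for illustration.

```python
from statistics import NormalDist

def within_k_sigma(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normally distributed X."""
    standard = NormalDist(mu=0, sigma=1)
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within_k_sigma(k) * 100, 2))  # 68.27, then 95.45, then 99.73
```

Because every normal distribution can be standardized, the same three values hold for any mean and standard deviation.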

Standard Normal Distribution


- "z - distribution"
- Special normal distribution that has a mean of 0 (μ = 0) and a
standard deviation of 1 (σ = 1)

Z Score
- "z-value"
- Standard score that tells you how many standard deviations away from
the mean an individual value lies in the distribution

z = (x - μ) / σ

z = z-score
x = value of the normal random variable X
μ = mean of X
σ = standard deviation of X

A positive z-score means that your x value is greater than the mean
(x > μ)

A negative z-score means that your x value is less than the mean
(x < μ)

A zero z-score means that your x value is equal to the mean
(x = μ)
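
The three cases can be seen with a small sketch of the standard z-score formula, z = (x - μ) / σ. The exam scores used here (mean 75, standard deviation 5) are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical exam with mean 75 and standard deviation 5
print(z_score(85, 75, 5))  # 2.0   (x > mean)
print(z_score(70, 75, 5))  # -1.0  (x < mean)
print(z_score(75, 75, 5))  # 0.0   (x = mean)
```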

Z - Tests and P - Value


- z-score is the test statistic used in a z-test
- z-test is used to compare the means of two groups, or to compare the
mean of a group to a set value
- Its null hypothesis typically assumes no difference between groups
- In a right-tailed test, the area under the curve to the right of the z-score
is the p-value: the probability of a result at least as extreme as your
observation if the null hypothesis is true
- A p-value of 0.05 or less means that your results are unlikely to have
arisen by chance alone; by convention it indicates a significant effect
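
A sketch of a one-sample z-test, assuming the population standard deviation is known. The function name `one_sample_z_test` and the sample numbers are hypothetical; the test statistic used is z = (sample mean - μ0) / (σ / √n), which is the standard form for comparing a group mean to a set value.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, right-tailed p-value, two-tailed p-value) for H0: mean = mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    standard = NormalDist()  # standard normal: mean 0, standard deviation 1
    right_tailed = 1 - standard.cdf(z)            # area to the right of z
    two_tailed = 2 * (1 - standard.cdf(abs(z)))   # area in both tails
    return z, right_tailed, two_tailed

# Hypothetical data: 36 observations with mean 52, tested against mu0 = 50, sigma = 6
z, p_right, p_two = one_sample_z_test(52, 50, 6, 36)
print(round(z, 2), round(p_right, 4), round(p_two, 4))  # 2.0 0.0228 0.0455
```

Both p-values fall below 0.05 here, so under the usual convention the difference would be called significant.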



Sampling and Sampling Distribution
Thursday, 4 April 2024 10:07 pm

Random Sampling
- Method of selecting a sample (random sample) from a statistical
population
- A random sample of n measurements from a population is a subset of the
population selected in such a manner that every sample of size n from
the population has an equal chance of being selected
- Population
○ Complete set of people or objects with a specified set of characteristics
- Sample
○ Small part or quantity intended to show what the whole is like
○ Subset of a population

Types of Sampling


- Probability Sampling
○ Every member of the target population has a known chance of
being included in the sample
○ Simple Random Sampling
▪ Every element has the same possibility of being selected
○ Systematic Sampling
▪ List of elements of the population is used as a sampling frame,
and the elements to be included in the desired sample are
selected by skipping through the list at regular intervals
○ Stratified Sampling
▪ Population is divided into strata and then the samples are
randomly selected separately from each stratum
○ Cluster or Area Sampling
▪ Entire population is broken into small groups, or clusters, and
then, some of the clusters are randomly selected
▪ The data from the randomly selected clusters are the ones
analyzed
- Non-Probability Sampling
○ Researchers choose the sample based on subjective judgment
rather than random selection
○ Convenience Sampling
▪ Units are selected for inclusion in the sample because they are
the easiest for the researchers to access
○ Purposive (Judgmental) Sampling
▪ Chosen only on the basis of the researcher's knowledge and
judgment
○ Snowball Sampling
▪ Currently enrolled research participants help recruit
future subjects
○ Quota Sampling
▪ Non-random selection of a predetermined number or
proportion of units
▪ Units are drawn from mutually exclusive subgroups (strata)
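
The four probability sampling methods can be sketched with the standard `random` module. The population of 100 numbered units, the two strata, the clusters of 10, and the fixed seed are all assumptions made for illustration.

```python
import random

rng = random.Random(42)           # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population of 100 unit IDs

# Simple random sampling: every unit has the same chance of selection
simple = rng.sample(population, 10)

# Systematic sampling: random start, then every k-th unit on the list
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: sample separately within each stratum
strata = {"low": population[:50], "high": population[50:]}
stratified = {name: rng.sample(units, 5) for name, units in strata.items()}

# Cluster sampling: randomly pick whole clusters and keep every unit in them
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [unit for cluster in chosen_clusters for unit in cluster]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```

Non-probability methods such as convenience or purposive sampling are not shown, since they depend on the researcher's judgment rather than on a random mechanism.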

Statistic
- Measure that describes a sample
- Usually denoted by Roman (Latin) letters
Parameter
- Measure that describes a population
- Usually denoted by Greek letters

Population    Measure               Sample
μ             Mean                  x̄
σ             Standard Deviation    s
σ²            Variance              s²
P             Proportion            p̂
N             Size                  n

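Formulas

Mean
Population: μ = Σx / N
Sample: x̄ = Σx / n

Median
Middle Rank = (N + 1) / 2 for a population, (n + 1) / 2 for a sample

Mode (same for population and sample)
Unimodal = 1 mode
Bimodal = 2 modes
Multimodal = 3 or more modes
No Mode = 0 modes

Standard Deviation
Population: σ = √( Σ(x - μ)² / N )
Sample: s = √( Σ(x - x̄)² / (n - 1) )

Variance
Population: σ² = Σ(x - μ)² / N
Sample: s² = Σ(x - x̄)² / (n - 1)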
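
Python's standard `statistics` module distinguishes the population and sample versions of these measures: `pstdev` and `pvariance` divide by N, while `stdev` and `variance` divide by n - 1. The data set below is hypothetical and serves only as a sketch.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

center = statistics.mean(data)
middle = statistics.median(data)    # average of the two middle values here
modes = statistics.multimode(data)  # list of all most frequent values

pop_sd = statistics.pstdev(data)       # divides by N
pop_var = statistics.pvariance(data)   # divides by N
samp_sd = statistics.stdev(data)       # divides by n - 1
samp_var = statistics.variance(data)   # divides by n - 1

print(center, middle, modes)
print(pop_sd, pop_var)
print(samp_sd, samp_var)
```

The sample versions come out slightly larger than the population versions because the same sum of squared deviations is divided by a smaller number.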
