Stats Reviewer
Random Variable
A random variable is a numerical quantity that represents the outcome of an experiment:
it maps each random outcome to a number. It is usually denoted by an uppercase letter
of the English alphabet.
Examples:
Q = the number of heads that turn up in five tosses of a coin
S = the total running time of a track and field athlete
T = the number of points scored by an MVP player
Sample Space
The collection of all possible outcomes of an experiment, usually denoted by the letter
“S”. The elements of the sample space are called sample points.
To find the possible values of a random variable:
1. Determine the sample space. Assign letters that will represent each outcome.
2. For each outcome, count the value of the random variable (the capital letter assigned).
Example 1:
Suppose a coin is tossed twice. Let X be the number of heads that occur. Determine the
possible values of the random variable X.
Step 1: List the sample space of the experiment.
H - Head
T - Tail
S = {HH, HT, TH, TT}
Step 2: Count the number of heads that occur.
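Outcome      Value of X (number of heads)
HH           2
HT           1
TH           1
TT           0

Value of X   Outcomes   Frequency
2            HH         1
1            HT, TH     2
0            TT         1
Total:                  4

Value of X   Frequency   Probability P(X)
2            1           1/4
1            2           1/2
0            1           1/4
Total:       4           1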
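The two steps of Example 1 can be sketched in Python. This is only an illustration added for review, not part of the original solution; the names sample_space, x_values, frequency, and probability are chosen here for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Step 1: list the sample space for two tosses (H = head, T = tail).
sample_space = ["".join(p) for p in product("HT", repeat=2)]

# Step 2: map each outcome to the value of X (the number of heads).
x_values = {outcome: outcome.count("H") for outcome in sample_space}

# Tally how many outcomes give each value of X, then divide by the
# total number of outcomes (this assumes a fair coin, so that all
# four outcomes are equally likely).
frequency = Counter(x_values.values())
probability = {x: Fraction(n, len(sample_space)) for x, n in frequency.items()}

for x in sorted(probability, reverse=True):
    print(x, frequency[x], probability[x])
```

Each printed row gives a value of X, its frequency, and its probability, matching the layout of the last table in the example.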
Types of Random Variable
1. Discrete Random Variable
A random variable whose possible values are countable or listable, typically whole
or counting numbers. Each value is one specific, separate number.
2. Continuous Random Variable
A random variable whose observed values lie on a continuous scale. It can take on
non-integer values, since it can be any number within an interval of the number
line. A recorded value is only as accurate as its number of decimal places.
A probability mass function (PMF) is a function over the sample space of a discrete
random variable X which gives the probability that X is equal to a certain value:
f(x) = P[X = x]
For the two-toss coin example:
Number of Heads (x)   0     1     2
f(x)                  1/4   1/2   1/4
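As a small sketch, the PMF of the two-toss coin example (X = number of heads, fair coin assumed) can be written as a lookup table. The names PMF_TABLE and f are illustrative, not taken from the reviewer.

```python
from fractions import Fraction

# Probabilities for X = number of heads in two tosses of a fair coin.
PMF_TABLE = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def f(x):
    """Return P[X = x]; values that X cannot take have probability 0."""
    return PMF_TABLE.get(x, Fraction(0))

# A valid PMF has probabilities between 0 and 1 that add up to 1.
assert all(0 <= p <= 1 for p in PMF_TABLE.values())
assert sum(PMF_TABLE.values()) == 1

print(f(1))  # probability of exactly one head
```

The two checks in the middle are a quick way to confirm that a table of probabilities really is a PMF.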
Good data allows organizations to measure: to establish baselines, find benchmarks,
and set performance goals that keep them moving forward.
Purpose of Statistics
Statistics is more than collecting and organizing data. It is about analyzing and displaying
information coherently so others can:
- Observe Patterns
- Determine Relationships
- Draw inferences or conclusions about what is seen
Statistics is the practice of gathering, organizing, analyzing, and interpreting data. It is
the art and science of designing studies and analyzing the data that the studies produce.
An element (individual) is a specific subject or object about which information is
collected. An element is sometimes referred to as a unit, case, or member and may
represent people, animals, or things.
A data set is the collection of observations of one or more variables. Data are sometimes
referred to as values.
Descriptive Statistics summarizes collected data using graphs, charts, number lines,
averages, and percentages. It summarizes what we are seeing.
DESCRIPTIVE STATISTICS
- Frequency Distribution
- Measures of Central Tendency
- Variability (Range, SD, Variance)
- Convey information
INFERENTIAL STATISTICS
- Regression Analysis
- Hypothesis Testing
- Confidence Interval
- Infer conclusions, predictions
Categorical and Quantitative Data
Data are the measurable or observable characteristics of a group of objects or people,
classified by the type of value or variable that they represent.
Types of Data:
1. Categorical Data
2. Quantitative Data
Categorical Data
- Values that describe some characteristic of the element.
- Also known as Qualitative Data. Its values are verbal descriptions.
Ex.
- Gender
- Race
- Year in school (Grade School, Junior High, Senior High)
- Type of car you drive
- Favorite color
Quantitative Data
- Data that take on numerical values, as in a count or measure, and can be
summarized with averages and ranges.
Ex.
- # of accidents in a year
- # of students in a class
- # of pets owned
- # of cars in a parking lot
- Shoe size
Discrete
- Countable or listable values that each take on one specific value (whole or
counting numbers), such as counts like the # of students in a class.
Continuous
- Data that can assume infinitely many possible values in the domain or interval (only
as accurate as the number of decimal places recorded).
Ex.
- Income
- Height
- Weight
- Time
- Hours worked
Data
- Univariate Data (one variable) – describes a single characteristic of a data set or
population.
- Bivariate Data (two variables) – describes two characteristics for each subject.
- Multivariate Data (many variables) – describes many characteristics for each
subject.
[Figures: Frequency Table, Pie Chart, Bar Graph]
MEASURES OF CENTER
Mean
- Arithmetic mean
- The average, or expected value, that measures the central value of a data set.
- It is found by adding all of the data values and dividing the sum by the number of
values.
Sample Mean
- The average of the values collected in a sample.
Population Mean
- The average of all the values in the population.
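A short Python sketch of the two means. The scores below are hypothetical numbers made up for this illustration: population holds every value in a small class, and sample holds a few values taken from it.

```python
from statistics import mean

# Hypothetical population: quiz scores of every student in a small class.
population = [7, 9, 10, 6, 8, 9, 7, 8]

# Hypothetical sample: scores of four students taken from that class.
sample = [7, 10, 8, 9]

# Add all of the data values, then divide by the number of values.
population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(population_mean, sample_mean)

# The standard library's statistics.mean gives the same results.
assert mean(population) == population_mean
assert mean(sample) == sample_mean
```

The calculation is the same in both cases. The difference is only in which values go into it: all of them (population) or a collected subset (sample).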
Median
- The middle term or number in a data set ranked in ascending order.
- It separates the lower half of the data set from the upper half.
Odd data set - The median is the middle number.
Even data set - The median is the average of the two middle terms.
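The odd and even cases can be sketched as a small Python function. The function name median and the sample inputs are illustrative only.

```python
def median(values):
    """Rank the data in ascending order and return the middle value."""
    if not values:
        raise ValueError("median needs at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd data set: the median is the middle number.
        return ordered[mid]
    # Even data set: the median is the average of the two middle terms.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 1, 3]))     # odd number of values
print(median([4, 1, 3, 2]))  # even number of values
```

Sorting first matters: the middle position only means something once the data are ranked.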
Mode
- The value that occurs most frequently in the data set and is considered the most
popular number or term.
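A minimal sketch of finding the mode by counting. The function name modes is hypothetical, and it assumes a non-empty data set. It returns a list because a data set can have more than one value tied for the highest frequency.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 3, 3, 5, 7, 7]))       # two values tie
print(modes(["red", "blue", "red"]))   # works for categorical data too
```

Unlike the mean and median, the mode can be used with categorical data, since it only requires counting.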
Range
- The difference between the maximum value and the minimum value.
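In Python, the range is a one-line calculation. The name value_range is chosen here to avoid clashing with Python's built-in range, and the data are made up for illustration.

```python
def value_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)

print(value_range([4, 9, 1, 7]))
```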
Interquartile Range (IQR)
- The difference between Q3 and Q1.
- Used to determine whether an outlier is strong or mild.
- IQR = Q3 - Q1
Q1 = median of the lower portion of the data set
Q3 = median of the upper portion of the data set
In statistics, a quartile is a type of quantile which divides the number of data points into
four parts, or quarters, of more-or-less equal size. The data must be ordered from
smallest to largest to compute quartiles; as such, quartiles are a form of order statistic.
To find the values of the quartiles, find the median of the whole data set (Quartile 2),
the median of the lower half (Quartile 1), and the median of the upper half (Quartile 3).
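The median-of-halves method can be sketched as follows. One detail is an assumption made here: when the number of values is odd, the middle value is left out of both halves. Some textbooks include it instead, which can give slightly different quartiles. The function name quartiles and the data are illustrative, and the function expects at least two values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of each half of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # lower half of the data
    upper = ordered[half + n % 2:]   # upper half; skips the middle value when n is odd
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([1, 3, 4, 6, 7, 8, 10, 12, 15])
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Library routines may use other interpolation rules, so their quartiles may not match a hand calculation exactly.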
Outliers
- Values that do not seem to fit the rest of the data set.
To identify them, we need to find the “fences”: the numbers that enclose the data and
indicate the acceptable range for our data set.
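The reviewer does not give the fence formulas, so the sketch below assumes the common convention: inner fences at 1.5 x IQR beyond Q1 and Q3, and outer fences at 3 x IQR. Values between the inner and outer fences are treated as mild outliers, and values beyond the outer fences as strong outliers. The function name, the quartile values, and the data are all hypothetical.

```python
def classify_outliers(values, q1, q3):
    """Split values into mild and strong outliers using IQR-based fences."""
    iqr = q3 - q1
    # Assumed convention: 1.5 * IQR for inner fences, 3 * IQR for outer fences.
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr

    mild, strong = [], []
    for v in values:
        if v < outer_low or v > outer_high:
            strong.append(v)   # beyond the outer fences
        elif v < inner_low or v > inner_high:
            mild.append(v)     # between the inner and outer fences
    return mild, strong

# Hypothetical quartiles Q1 = 3.5 and Q3 = 11.0, so IQR = 7.5.
mild, strong = classify_outliers([1, 8, 15, 25, 40], q1=3.5, q3=11.0)
print(mild, strong)
```

With these numbers the inner fences are -7.75 and 22.25 and the outer fences are -19 and 33.5, so any value inside the inner fences is within the acceptable range.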