Reviewer in IE-SAN1
Lesson 1: Basic Concepts of Probability and Statistics

Statistics - Statistical knowledge helps you use the proper methods to collect data, employ the correct analyses, and effectively present the results. Statistics is a crucial process behind how we make discoveries in science, make decisions based on data, and make predictions.

Probability - Plays a vital role in day-to-day life: in weather forecasts, sports and gaming strategies, buying or selling insurance, online shopping and online games, determining blood groups, and analyzing political strategies.

Definition of Statistics

Statistics is the science of dealing with numbers. It is used for the collection, summarization, presentation, and analysis of data.

Statistics provides a way of organizing data to get information on a wider and more formal (objective) basis than relying on personal experience (subjective).

Types of data

Any aspect of an individual that is measured is called a variable. Variables are either:

1. Quantitative - Numerical data.
   - Discrete data: usually whole numbers, such as the number of cases of a certain disease or the number of hospital beds (no decimal fraction).
   - Continuous data: measured on a continuous scale, e.g. height, weight, age (a decimal fraction can be present).
2. Qualitative - Non-numerical data, subdivided into two types:
   - Categorical data: purely descriptive, implying no ordering of any kind, such as sex or area of residence.
   - Ordinal data: imply some kind of ordering, such as level of education, socio-economic status, and degree of severity of disease.

Presentation of Data

The first step in statistical analysis is to present the data in a way that is easy to understand. The two basic ways of presenting data are:
1. Tabular presentation
2. Graphical presentation

Tabulation

Some rules for the construction of tables:
1. The table must be self-explanatory.
2. Title: written at the top of the table to define precisely the content, the place, and the time.
3. Clear headings for the columns and rows, with units of measurement.
4. The size of the table depends on the number of classes, usually between 2 and 10 rows or classes. Its selection depends on the form of the data and the requirement of the distribution. Too few classes may obscure some information, and too many will not differ much from the raw data.

Types of tables

o For qualitative data, draw a simple table, e.g. a list table: count the number of observations (frequencies) in each category.
o For quantitative data, we have to form a frequency distribution table.

List Table:

A table consisting of two columns, the first giving an identification of the observational unit and the second giving the value of the variable for that unit.

Example: The number of patients in each hospital department:

Medicine   100 patients
Surgery     80 patients
ENT         28 patients

Frequency Distribution Table:

Used for the presentation of qualitative (and quantitative discrete) data by recording the number of observations in each category. These counts are called frequencies.

For quantitative continuous data, the table consists of a series of classes (intervals) together with the number of observations (frequency) whose values fall within the interval of each class.

Example: Assume we have a group of 20 individuals whose blood groups were as follows: A, AB, AB, O, B, A, A, B, B, AB, O, AB, AB, A, B, B, B, A, O, A. We want to present these data in a table.

How to Construct a Frequency Distribution Table

1. Put a title
   - e.g. "Distribution of the studied individuals according to their blood group".
2. Draw a table (columns and rows)
   - First column - studied variable - "Blood Group"
   - 2nd column - heading - "Frequency (Number)"
   - 3rd column - heading - "Percentage %"
3. Enumerate
   - Count the individuals in each blood group, i.e. 6 individuals with blood group A, 6 with blood group B, 5 with AB, and 3 with O.
   - Make sure that the total number of individuals in all blood groups is 20 (the size of the studied group).
4. Calculate the relative frequency
   - The relative frequency (%) of each blood group is the frequency of that group divided by the total number of individuals, multiplied by 100.
   - The percentage of group A = 6/20 x 100 = 30%, and likewise group B = 6/20 x 100 = 30%, group AB = 5/20 x 100 = 25%, and group O = 3/20 x 100 = 15%.

In Conclusion:
o We can conclude from this table that blood groups A and B are the most common groups and the rarest is group O (based on the percentage of each group).
o Presenting data in a table makes it easier to deduce facts and information than working from raw data.

Lesson 2: Central Tendency

Measure of Central Tendency

Usually, when two or more different data sets are to be compared, it is necessary to condense the data, but for comparison the condensation of a data set into a frequency distribution and a visual presentation is not enough. It is then necessary to summarize the data set in a single value. Such a value usually lies somewhere in the center and represents the entire data set.
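To make the idea of condensing data into a frequency distribution concrete, the blood group example from Lesson 1 can be sketched in Python. This is only an illustrative sketch: the function name `frequency_table` and the printed layout are assumptions made here, not part of the lesson.

```python
from collections import Counter

# Blood groups of the 20 studied individuals, as listed in the Lesson 1 example.
blood_groups = [
    "A", "AB", "AB", "O", "B", "A", "A", "B", "B", "AB",
    "O", "AB", "AB", "A", "B", "B", "B", "A", "O", "A",
]

def frequency_table(values):
    """Return {category: (frequency, percentage)} for a list of categories.

    Hypothetical helper for illustration only.
    """
    counts = Counter(values)          # step 3: enumerate each category
    total = len(values)
    # step 4: relative frequency = frequency / total x 100
    return {cat: (n, n * 100 / total) for cat, n in counts.items()}

table = frequency_table(blood_groups)

# Title, then the three columns described in steps 1 and 2.
print("Distribution of the studied individuals according to their blood group")
print("Blood Group  Frequency  Percentage %")
for group in ["A", "B", "AB", "O"]:
    freq, pct = table[group]
    print(f"{group:<11}  {freq:>9}  {pct:>12.1f}")
print(f"{'Total':<11}  {len(blood_groups):>9}  {100.0:>12.1f}")
```

The total row is a quick check that the frequencies add up to the 20 individuals in the studied group.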
Mode

The value that occurs most frequently in a data set. A data set can have more than one mode; this happens when two or more elements occur with equal (highest) frequency in the data set.

Example: Single Mode
Data Set = 2, 5, 9, 3, 5, 4, 7
Mode = 5

Example: Bimodal
Data Set = 2, 5, 2, 3, 5, 4, 7
Mode = 2 and 5

Example: Trimodal
Data Set = 2, 5, 2, 7, 5, 4, 7
Mode = 2, 5, and 7

Median

The middle value of an ordered data set. How it is found depends on whether the number of elements in the data set is odd or even.

How to find the Median:
1. Order the data from smallest to largest.
2. For an odd number of data values in the distribution, Median = Middle data value
3. For an even number of data values in the distribution, Median = Sum of middle two values / 2

Example:
A consumer report on barbecue-flavored potato chip prices per ounce was released, and the prices (in cents) are 19, 19, 27, 28, 18, and 35.

a. Find the median.

Solution:
Order the data: 18 19 19 27 28 35
There is an even number of entries, so the median is the sum of the middle two values divided by 2.
$Median = \frac{19 + 27}{2} = 23$ cents

b. According to Consumer Reports, the brand with the lowest overall taste rating costs 35 cents/ounce.

Solution:
Eliminate that brand and find the median price per ounce for the remaining barbecue-flavored chips. Again, order the data. Note that there is an odd number of entries, so the median is simply the middle value.
18 19 19 27 28
Median = Middle Value = 19 cents

Range

The range of a data set is the difference between the largest value and the smallest value contained in the data set.

Steps in determining the Range
1. Reorder the data set from smallest to largest.
2. Subtract the first element from the last element.

Example:
Data Set = 2, 5, 9, 3, 5, 4, 7
Reorder = 2, 3, 4, 5, 5, 7, 9
Subtract = 9 - 2 = 7
Range = 7

Difference Between Sample & Population

Population                                     | Sample
-----------------------------------------------|--------------------------------------------------------
The measurable quality is called a parameter   | The measurable quality is called a statistic
The population is a complete set               | The sample is a subset of the population
Reports are a true representation of opinion   | Reports have a margin of error and confidence interval
It contains all members of a specified group   | It is a subset that represents the entire population
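The mode, median, and range examples earlier in this lesson can be checked with a short Python sketch. The function names `modes`, `median`, and `data_range` are chosen here for illustration and are not part of the lesson; the sketch simply follows the listed steps.

```python
from collections import Counter

def modes(data):
    """Return every value that occurs with the highest frequency, sorted.

    Note: if all values occur equally often, all of them are returned.
    """
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

def median(data):
    """Order the data, then take the middle value (odd n) or the
    average of the two middle values (even n)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def data_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

print(modes([2, 5, 9, 3, 5, 4, 7]))        # [5]
print(modes([2, 5, 2, 3, 5, 4, 7]))        # [2, 5]
print(modes([2, 5, 2, 7, 5, 4, 7]))        # [2, 5, 7]
print(median([19, 19, 27, 28, 18, 35]))    # 23.0
print(median([19, 19, 27, 28, 18]))        # 19
print(data_range([2, 5, 9, 3, 5, 4, 7]))   # 7
```

Sorting inside `median` mirrors step 1 of the procedure, so the caller does not need to order the prices first.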
Mean

An average that uses the exact value of each entry.

How to find the Mean:
1. Compute $\sum x$, that is, find the sum of all the data values.
2. Divide the total by the number of data values.
   - Sample statistic $\bar{x}$
     $\bar{x} = \frac{\sum x}{n}$
   - Population parameter $\mu$
     $\mu = \frac{\sum x}{N}$
   - Where,
     n = number of data values in the sample
     N = number of data values in the population

Trimmed Mean

A measure of center that is more resistant than the mean but still sensitive to specific data values is the trimmed mean. A trimmed mean is the mean of the data values left after "trimming" a specified percentage of the smallest and largest data values from the data set. Usually, a 5% trimmed mean is used. This implies that we trim the lowest 5% of the data as well as the highest 5% of the data.

HOW TO COMPUTE A 5% TRIMMED MEAN
1. Order the data from smallest to largest.
2. Delete the bottom 5% of the data and the top 5% of the data. Note: If the calculation of 5% of the number of data values does not produce a whole number, round to the nearest integer.
3. Compute the mean of the remaining 90% of the data.

Example: Find the sample mean of 6, 8, 11, 5, 2, 9, 7, 8
$\bar{x} = \frac{\sum x}{n} = \frac{6 + 8 + 11 + 5 + 2 + 9 + 7 + 8}{8} = \frac{56}{8} = 7$

Variance

The term variance refers to a statistical measurement of the spread between numbers in a data set. More specifically, variance measures how far each number in the set is from the mean and thus from every other number in the set.

Steps in Calculating the Variance
1. Find the mean of the data set. Add all data values and divide by the sample size n.
2. Find the squared difference from the mean for each data value. Subtract the mean from each data value and square the result.
3. Find the sum of all the squared differences. The sum of squares is all the squared differences added together.
4. Calculate the variance. Variance is the sum of squares divided by the number of data points (N for a population) or by the number of data points less one (n - 1 for a sample).

- Population
  $Variance = \sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
- Sample Set
  $Variance = s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Standard Deviation

A standard deviation is a statistic that measures the dispersion of a dataset relative to its mean. The standard deviation is calculated as the square root of the variance, which is found from each data point's deviation relative to the mean. If the data points are further from the mean, there is a higher deviation within the data set; thus, the more spread out the data, the higher the standard deviation.

Steps in Calculating the Standard Deviation
1. The mean value is calculated by adding all the data points and dividing by the number of data points.
2. The variance is calculated by subtracting the mean from the value of each data point. Each of those resulting values is then squared and the results summed. The result is then divided by the number of data points less one.
3. The square root of the variance from step 2 is then taken to find the standard deviation.

Formula:

Sample Standard Deviation                        | Population Standard Deviation
-------------------------------------------------|-------------------------------------------------
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$  | $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Example:
Big Blossom greenhouse was commissioned to develop an extra-large rose for the Rose Bowl Parade. A random sample of blossoms from Hybrid A bushes yielded the following diameters (in inches) for mature peak blooms.
2, 3, 3, 8, 10, 10
Find the sample variance and the standard deviation.

Solution:
$\bar{x} = \frac{\sum x}{n} = \frac{36}{6} = 6$ inches

$x$            | $x - \bar{x}$ | $(x - \bar{x})^2$
---------------|---------------|------------------------------
2              | 2 - 6 = -4    | $(-4)^2 = 16$
3              | 3 - 6 = -3    | $(-3)^2 = 9$
3              | 3 - 6 = -3    | $(-3)^2 = 9$
8              | 8 - 6 = 2     | $2^2 = 4$
10             | 10 - 6 = 4    | $4^2 = 16$
10             | 10 - 6 = 4    | $4^2 = 16$
$\sum x = 36$  |               | $\sum (x - \bar{x})^2 = 70$

Solving for Sample Variance:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{70}{5} = 14$

Solving for the Sample Standard Deviation:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{14} \approx 3.74$ inches
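The Big Blossom calculation can be reproduced with a minimal Python sketch that follows the listed steps (mean, squared differences, sum of squares, division, square root). The helper names `mean`, `sum_of_squares`, `sample_variance`, and `population_variance` are assumptions made for this illustration.

```python
import math

def mean(data):
    """Step 1: add all data values and divide by how many there are."""
    return sum(data) / len(data)

def sum_of_squares(data):
    """Steps 2 and 3: square each difference from the mean, then add them."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data)

def sample_variance(data):
    """Step 4 for a sample: divide the sum of squares by n - 1."""
    return sum_of_squares(data) / (len(data) - 1)

def population_variance(data):
    """Step 4 for a population: divide the sum of squares by N."""
    return sum_of_squares(data) / len(data)

# Hybrid A bloom diameters in inches, from the example.
diameters = [2, 3, 3, 8, 10, 10]

x_bar = mean(diameters)
ss = sum_of_squares(diameters)
s2 = sample_variance(diameters)
s = math.sqrt(s2)  # standard deviation is the square root of the variance

print(f"mean = {x_bar}, sum of squares = {ss}")
print(f"sample variance = {s2}, sample standard deviation = {s:.2f}")
# prints:
# mean = 6.0, sum of squares = 70.0
# sample variance = 14.0, sample standard deviation = 3.74
```

As a cross-check, Python's standard `statistics` module provides `variance` and `stdev` for samples and `pvariance` and `pstdev` for populations.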