Statistical Reviewer Midterm 1
Statistics is a branch of mathematics that deals with the collection, organization, presentation,
analysis, and interpretation of data. The word "statistics" is derived from the Latin word “status”,
which means state.
➢ Statistics provides us with tools needed to convert massive data into pertinent
information that can be used in decision making.
➢ Statistics can provide us information that we can use to make sensible decisions.
DATA
Data are “factual information used as a basis for reasoning, discussion, or calculation”.
Data can be numerical, as in height, or non-numerical, as in gender. In either case, data describe
characteristics of an individual.
Fields of Statistics
A. Mathematical Statistics
The study and development of statistical theory and methods in the abstract.
B. Applied Statistics
The application of statistical methods to solve real problems involving randomly
generated data and the development of new statistical methodology motivated by real
problems.
Limitation of Statistics
3. Statistical laws are not exact.
DEFINITIONS
Universe is the set of all entities under study.
Population is the total or entire group of individuals or observations from which information is
desired by a researcher.
Apart from persons, a population may consist of mosquitoes, villages, institutions, etc.
Classification of Statistics
➢ Inferential statistics uses methods that take a result from a sample, extend it to the
population, and measure the reliability of the result.
EXAMPLE
You are walking down the street and notice that a person walking in front of you drops PHP100.
Nobody seems to notice the PHP100 except you. Since you could keep the money without anyone
knowing, would you keep the money or return it to the owner?
1. Present the scenario to 50 students and use the results to make a statement about all the
students at the school.
2. Collect the information needed to answer the questions.
Suppose 39 of the 50 students stated that they would return the money to the owner.
We could present this result by saying that the percent of students in the survey who
would return the money to the owner is 78%. This is descriptive statistics.
3. If we extend the results of our sample to the population, we are performing inferential
statistics. The generalization contains uncertainty because a sample cannot tell us
everything about a population. Therefore, inferential statistics includes a level of
confidence in the results. So rather than saying that 78% of all students would return the
money, we might say that we are 95% confident that between 74% and 82% of all
students would return the money. Notice how this inferential statement includes a level
of confidence (measure of reliability) in our results.
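The two kinds of statement in this example can be sketched in a few lines of Python. The sample proportion (39 out of 50) is the descriptive part. The interval is the inferential part, computed here with the normal-approximation formula and z = 1.96 for 95% confidence; that formula is an assumption of this sketch, since the reviewer does not say how its interval was obtained. With only 50 students this approximation gives a margin of roughly 11 percentage points, so the "74% to 82%" figures in the text are best read as illustrative.

```python
import math

def proportion_summary(successes, n, z=1.96):
    """Return the sample proportion and a normal-approximation interval.

    z = 1.96 corresponds to roughly 95% confidence (an assumption of
    this sketch, not a method stated in the reviewer).
    """
    p_hat = successes / n                      # descriptive: sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
    margin = z * se                            # margin of error
    return p_hat, p_hat - margin, p_hat + margin

p_hat, low, high = proportion_summary(39, 50)
print(f"Sample proportion: {p_hat:.0%}")
print(f"Approximate 95% interval: {low:.1%} to {high:.1%}")
```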
PROCESS OF STATISTICS
1. Identify the research objective.
A researcher must determine the question(s) he or she wants answered. The
question(s) must clearly identify the population that is to be studied.
Example: A research objective is presented. For each research objective, identify
the population and sample in the study.
2. Organize and summarize the information. Descriptive statistics allow the researcher to
obtain an overview of the data and can help determine the type of statistical methods the
researcher should use.
3. Draw conclusions from the information. In this step the information collected from the
sample is generalized to the population.
Example: For each of the following statements, decide whether it belongs to the field of descriptive
statistics or inferential statistics.
1. A badminton player wants to know his average score for the past 10 games.
(Descriptive Statistics)
2. A car manufacturer wishes to estimate the average lifetime of batteries by testing a
sample of 50 batteries. (Inferential Statistics)
3. Janine wants to determine the variability of her six exam scores in Algebra.
(Descriptive Statistics)
4. A shipping company wishes to estimate the number of passengers traveling via their ships
next year using their data on the number of passengers in the past three years.
(Inferential Statistics)
5. A politician wants to determine the total number of votes his rival obtained in the past
election based on his copies of the tally sheet of electoral returns. (Descriptive Statistics)
SOURCES OF DATA
After determining the number of samples needed, the next step is to choose the method you
will use to collect the data.
Data can be collected through observation, experimentation, or conducting censuses or surveys.
These data are called primary data.
Data obtained from those already published by the government, industries, or individual sources
are called secondary data.
Data sets can consist of two types of data: qualitative data and quantitative data.
Example: Determine whether the following variables are qualitative or quantitative.
Hair color
Temperature
Stages of breast cancer
Example: Determine whether the following quantitative variables are discrete or continuous.
2. The number of cars that arrive at a McDonald’s drive-through between 12:00 P.M. and
1:00 P.M.
3. The distance a 2005 Toyota Prius can travel in city conditions with a full tank of gas.
GRAPHS AND DIAGRAMS
GRAPHS
INTRODUCTION TO THE STATISTICAL CONCEPTS
Levels of Measurement
It is important to know which type of scale is represented by your data since different statistics
are appropriate for different scales of measurement. A characteristic may be measured using
nominal, ordinal, interval, and ratio scales.
1. Nominal Level - They are sometimes called categorical scales or categorical data. Such a
scale classifies persons or objects into two or more categories. Whatever the basis for
classification, a person can only be in one category, and members of a given category
have a common set of characteristics.
Example: - Method of payment (cash, check, debit card, credit card)
- Type of school (public vs. private)
- Eye Color (Blue, Green, Brown)
2. Ordinal Level - This involves data that may be arranged in some order, but differences
between data values either cannot be determined or are meaningless. An ordinal scale not
only classifies subjects but also ranks them in terms of the degree to which they possess
a characteristic of interest. In other words, an ordinal scale puts the subjects in order from
highest to lowest, from most to least. Although ordinal scales indicate that some subjects
are higher, or lower than others, they do not indicate how much higher or how much
better.
Examples: - Food Preferences
- Stage of Disease
- Social Economic Class (First, Middle, Lower)
- Severity of Pain
3. Interval Level - This measurement level not only classifies and orders the
measurements, but it also specifies that the distances between each interval on the scale
are equivalent along the scale from low interval to high interval. A value of zero does not
mean the absence of the quantity. Arithmetic operations such as addition and subtraction
can be performed on values of the variable.
4. Ratio Level - A ratio scale represents the highest, most precise, level of measurement. It
has the properties of the interval level of measurement and the ratios of the values of the
variable have meaning. A value of zero means the absence of the quantity. Arithmetic
operations such as multiplication and division can be performed on the values of the
variable.
3. Number of vehicles registered.
4. Brands of soft drinks.
5. Number of car passers along C5 on a given day.
6. Zip code
7. Degree of pain
Data collection is the process of gathering and measuring information on variables of interest, in
an established systematic fashion that enables one to answer stated research questions, test
hypotheses, and evaluate outcomes. Without proper planning for data collection, a number of
problems can occur. If the data collection steps and processes are not properly planned, the
research project can ultimately end up with a data set that does not serve the purpose for which
it was intended.
3. Observation Method - In this method, the researcher observes the behavior of the
participants.
The investigator is the person gathering the data, while the subject is the person being
observed.
- It makes use of the human senses.
Key Design Principles of a Good Questionnaire
Sampling Techniques
When conducting research, it would be impossible to collect data from every individual in a
population. So, instead of taking data from a whole population, a small group that will represent
the population is used. Getting only a sample from the population can lessen the expenses and
increase the speed of data gathering. Various sampling techniques can be used in determining
the samples.
Sampling is the process of selecting a group from the population from which data will be
collected. For example, if you would like to get the opinion of the students in your school
regarding the “Education in the New Normal”, then you could survey a sample of 100 students.
B. Non-probability Sampling. It does not involve random selection, wherein members of the
population may not have an equal chance of being selected or they may not have a chance
at all. It allows you to collect data easily.
Example: Mr. Aquino announces that he has a graded recitation in class today. He has written each
student’s name on a chip and placed the chips inside a box. Without looking, he picks a chip
inside the box, calls on the student, and starts his graded recitation.
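Mr. Aquino's draw gives every student the same chance of being called, which is the idea behind simple random sampling. A minimal Python sketch follows; the student names are placeholders invented for illustration, and the fixed seed is there only so the sketch behaves the same on every run.

```python
import random

# Hypothetical class list; the names are placeholders, not from the reviewer.
students = ["Ana", "Ben", "Carla", "Diego", "Ella", "Felix"]

rng = random.Random(2024)  # fixed seed only to make the sketch repeatable

# One chip drawn without looking: each student is equally likely.
called = rng.choice(students)

# Several chips drawn without putting any back in the box.
three_called = rng.sample(students, k=3)

print(called)
print(three_called)
```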
2. Systematic Sampling
➢ Every member of the population is listed in order and individuals are
chosen at regular intervals.
➢ The sampling starts by choosing the first element from the list and then
selecting every nth element and so on.
Example: The principal wanted to gather feedback on the self-learning modules. The parents
were asked to fall in line outside the gate. The principal then started the survey with the first
parent who entered the school and selected every 5th person in the line.
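The principal's procedure can be sketched with a list slice. The line of 20 parents is hypothetical, and `systematic_sample` is a helper name made up for this sketch. Starting from the first person follows the example; many references instead pick the starting position at random, which the `start` argument allows.

```python
def systematic_sample(ordered_items, step, start=0):
    """Select items at regular intervals from an ordered list.

    start is the index of the first selected item (0 = first in line).
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    return ordered_items[start::step]

# Hypothetical line of 20 parents, in the order they entered the school.
parents = [f"Parent {i}" for i in range(1, 21)]

selected = systematic_sample(parents, step=5)
print(selected)  # ['Parent 1', 'Parent 6', 'Parent 11', 'Parent 16']
```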
3. Stratified Sampling
➢ It involves dividing the population into subpopulations (strata) and then
randomly selecting samples from each subgroup.
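As a rough sketch of the idea, the code below groups a hypothetical student roster by grade level and then draws the same number of students at random from each group. The roster, the helper name `stratified_sample`, and the choice of an equal count per stratum are all assumptions made for illustration; proportional allocation is another common choice.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Randomly draw up to per_stratum members from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        # Group members by the stratum they belong to.
        strata.setdefault(stratum_of(member), []).append(member)
    sample = {}
    for name, members in strata.items():
        k = min(per_stratum, len(members))   # guard against small strata
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical roster of (student id, grade level) pairs.
roster = [(i, f"Grade {7 + i % 4}") for i in range(40)]

picked = stratified_sample(roster, stratum_of=lambda s: s[1], per_stratum=3, seed=1)
for grade, members in sorted(picked.items()):
    print(grade, members)
```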
4. Cluster Sampling
1. Convenience Sampling
Example: A tourist, who is visiting the Philippines for the first time, wants to know the best tourist
spot in the country. He decides to ask the 2 passengers seated beside him.
2. Purposive Sampling
➢ This is also known as judgment sampling.
➢ In this method, the researcher selects a sample that is useful for
the study
Example: The maker of a certain adult milk drink wants to do market research on its product.
A lady with a clipboard is assigned to a grocery store to do the job. She starts by looking for
individuals whom she thinks may meet the criteria for their needed samples. After the
verification, she then asks these persons if they can be interviewed and participate in the survey.
3. Quota Sampling
➢ This is a non-probability sampling technique similar to stratified sampling.
In this method, the population is split into segments (strata) and you
have to fill a quota based on people who match the characteristics of
each stratum.
• Proportional quota sampling gives proportional numbers that represent segments in the
wider population. For this, the population frame must be known.
• Non-proportional quota sampling uses strata to divide a population, though only the
minimum sample size per stratum is decided.
4. Snowball Sampling
➢ This is a non-probability sampling type that mimics a pyramid system in its selection
pattern. You choose early sample participants, who then go on to recruit further
sample participants until the sample size has been reached. This ongoing pattern
can be perfectly described by a snowball rolling downhill: increasing in size as it
collects more snow (in this case, participants).
Bar Graph
• A horizontal bar graph is a bar graph in which the bars are horizontal. It is more
convenient to use when the categories have long names.
Pie Chart
• A pie chart is an alternative to the bar graph for displaying
relative frequency information. A pie chart is a circle. The
circle is divided into sectors, one for each category. The
relative sizes of the sectors match the
relative frequencies of the categories.
Frequency Distribution
The frequency of a category is the number of times it occurs in the data set.
• A frequency distribution is a table that presents the frequency for each category.

Credit Cards     Frequency
Discoverer       7
Visa             23
Am. Express      9
Master Card      11

• Classes – intervals of equal width that cover all the values that are observed.
• Not the same as difference between lower and upper limit of a class
• Should be expressed with the same number of
decimal places as the data
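Returning to the credit card counts above, the relative frequencies that a pie chart or relative frequency table would display can be computed by dividing each count by the total. This is a small sketch using the four counts from the table; the variable names are made up for illustration.

```python
# Frequencies copied from the credit card table in the reviewer.
frequencies = {
    "Discoverer": 7,
    "Visa": 23,
    "Am. Express": 9,
    "Master Card": 11,
}

total = sum(frequencies.values())

# Relative frequency = category frequency / total frequency.
relative = {card: count / total for card, count in frequencies.items()}

print(f"{'Credit Card':<12} {'Freq':>4} {'Rel. Freq':>9}")
for card, count in frequencies.items():
    print(f"{card:<12} {count:>4} {relative[card]:>9.2f}")
```

The relative frequencies add up to 1, so each one is also the share of the circle that the category's sector would take in a pie chart.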
Frequency Distribution
• Classes – intervals of equal width that cover all the values that are observed.
• We then count the number of observations that fall into each class to
obtain the class frequencies.
Frequency Distribution
How to make a frequency distribution table for quantitative data:
1. List classes
• Compute the lower-class limit for the second class by adding the class width to the
lower-class limit of the first class
Histogram
called relative frequency histograms. Histograms are related to bar graphs and are appropriate
for quantitative data.
Stem-and-Leaf Plot
Dot Plot
• Dot plots are graphs that can be used to give a rough
impression of the shape of a data set. They are useful
when the data set is not too large, and when there
are some repeated values.
Time-Series Plot
Time-series plots may be used when the data consist
of values of a variable measured at different points
in time.
MEASURES OF CENTRAL TENDENCY OR MEASURES OF LOCATION
OR MEASURES OF AVERAGES
Descriptive Statistics
The goal of descriptive statistics is to summarize a collection of data in a clear and
understandable way.
Central Tendency
◦ Median
The value that divides the distribution in half when observations are ordered.
◦ Mode
The most frequent score.
Mean
Is the balance point of a distribution.
Median
Definition:
The value that is larger than half the population and smaller than half the population.
Pros
Cons
◦ May not exist in the data.
Pros and Cons of the Mode
Pros
Suppose the next patient enrolls and their age is 97 years. How do the mean and median
change?
21, 24, 34, 34, 42, 44, 46, 52, 56, 64, 97
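The question can be checked with Python's standard `statistics` module. The sketch assumes the original data are the ten ages listed before 97, since 97 is described as the newly enrolled patient.

```python
from statistics import mean, median

# Assumed original ages: the listed values without the new patient (97).
ages = [21, 24, 34, 34, 42, 44, 46, 52, 56, 64]
ages_with_new = ages + [97]

print("Before:", mean(ages), median(ages))
print("After: ", round(mean(ages_with_new), 2), median(ages_with_new))
```

The mean rises from 41.7 to about 46.73, while the median only moves from 43 to 44. A single extreme value pulls the mean much more than it pulls the median.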
DESCRIPTIVE STATISTICS MEASURES OF CENTRAL TENDENCY GROUPED DATA
Class limits
- The smallest and the largest values that fall within the class interval (class)
- Taken with equal number of significant figures as the given data.
Frequency
- The number of observations falling within a particular class.
Cumulative Frequency
- Can be derived from the frequency distribution and can also be obtained by simply
adding the class frequencies
- Partial sums
Relative Frequency
- Percentage frequency of the class with respect to the total population
- Obtained by dividing the class frequency by the total frequency, and multiplying the
answer by 100
1. Get the lowest and the highest value in the distribution. We shall mark the highest and
lowest value in the distribution.
2. Get the value of the range. The range, denoted by R, refers to the difference between the
highest and the lowest value in the distribution. Thus, R = H - L.
3. Determine the number of classes. In the determination of the number of classes, it should
be noted that there is no standard method to follow. Generally, the number of
classes must not be less than 5 and should not be more than 15.
4. Determine the size of the class interval. The value of C can be obtained by dividing the
range by the desired number of classes. Hence, C = R / k.
5. Construct the classes. In constructing the classes, we first determine the lower limit of the
distribution. The value of this lower limit can be chosen arbitrarily as long as the lowest
value falls in the first interval and the highest value falls in the last interval.
6. Determine the frequency of each class. The frequency of each class is
determined by counting the number of items that fall in that interval.
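The six steps above can be sketched in Python for whole-number data. The scores are hypothetical, and two choices are assumptions of this sketch rather than rules from the reviewer: the lowest value is used as the first lower limit, and the class size is rounded up from (R + 1) / k so that k classes with inclusive limits are sure to reach the highest value. The table also reports the cumulative and relative frequencies described earlier.

```python
import math

def frequency_table(data, k):
    """Build a grouped frequency table for integer data using k classes.

    Returns rows of (lower, upper, frequency, cumulative, relative %).
    """
    lowest, highest = min(data), max(data)        # step 1: lowest and highest
    value_range = highest - lowest                # step 2: R = H - L
    # Step 4: C = R / k, rounded up. One is added to R so that k classes
    # with inclusive integer limits cover the highest value (sketch choice).
    width = math.ceil((value_range + 1) / k)
    total = len(data)
    rows, cumulative = [], 0
    for i in range(k):                            # step 5: construct classes
        lower = lowest + i * width
        upper = lower + width - 1
        freq = sum(lower <= x <= upper for x in data)   # step 6: count
        cumulative += freq
        rows.append((lower, upper, freq, cumulative, 100 * freq / total))
    return rows

# Hypothetical exam scores, for illustration only.
scores = [52, 55, 58, 61, 63, 64, 67, 70, 71, 73, 75, 78, 80, 84, 88, 91]

rows = frequency_table(scores, k=5)   # step 3: five classes chosen
for lower, upper, freq, cum, rel in rows:
    print(f"{lower}-{upper}: f={freq}, cf={cum}, rf={rel:.1f}%")
```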
Measures of Variability or Dispersion are measures of the average distance of each observation
from the center of the distribution. They measure homogeneity or heterogeneity of a particular
group.
While the measures of central tendency convey information about the commonalities of
measured properties, the measures of variability quantify the degree to which
they differ.
Measures of variability are lengths between various points within the distribution. The spread of
these data points tells you about variability.
- clustered closely around the mean
- more homogeneous;
- less variable
- more consistent and;
Range
- difference between the highest value and lowest value.
R = HV – LV
Example 3. Find the absolute deviation of the female group in Example 1.
Formula:
x̄ - mean
n - number of samples
Example. Compute the variance of the grades in Math of the two groups in Example 1.
Male group:
Example. Compute the variance of the grades in Math of the two groups in Example 1.
Female group:
Standard Deviation – is the square root of the average squared deviation from the mean, or
simply the square root of the variance.
We see that the scores of the males are more spread out than those of
the females.
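The range, variance, and standard deviation can be compared side by side with a short Python sketch. The two lists of grades are hypothetical and are not the Example 1 data, which is not reproduced in this reviewer. The sketch divides by n (the population formulas `pvariance` and `pstdev`); this is an assumption, and the sample versions `variance` and `stdev` in the same module divide by n - 1 instead.

```python
from statistics import mean, pvariance, pstdev

# Hypothetical Math grades, for illustration only (not the Example 1 data).
male = [70, 78, 85, 92, 100]
female = [82, 84, 85, 86, 88]

def spread(values):
    """Summarize variability: range, variance, and standard deviation."""
    return {
        "range": max(values) - min(values),   # R = HV - LV
        "variance": pvariance(values),        # average squared deviation
        "std_dev": pstdev(values),            # square root of the variance
    }

for label, grades in (("Male", male), ("Female", female)):
    print(label, "mean =", mean(grades), spread(grades))
```

Both hypothetical groups have the same mean (85), but the first group has the larger range, variance, and standard deviation, so its grades are more spread out.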