Lesson 1 Introduction To Statistics
Prepared by: JAKE C. MAGBANUA
Data and Statistics
Statistics
-Collection of methods for planning experiments, obtaining data, and then organizing,
summarizing, presenting, analyzing, interpreting, and drawing conclusions.
2. Inferential Statistics
Generalizing from samples to populations using probabilities. Performing
hypothesis testing, determining relationships between variables, and making
predictions.
3 main types of descriptive statistics
The 3 main types of descriptive statistics concern the frequency distribution, central
tendency, and variability of a dataset.
Distribution refers to the frequencies of different responses.
Measures of central tendency give you the average, or typical, value of the responses.
Measures of variability show you the spread or dispersion of your dataset.
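The three kinds of descriptive statistics can be sketched in a few lines of Python using only the standard library. The list `responses` is made-up illustrative data (number of siblings reported by nine respondents), not data from this lesson.

```python
from collections import Counter
import statistics

# Hypothetical data: number of siblings reported by nine respondents.
responses = [0, 1, 1, 2, 2, 2, 3, 3, 4]

# Frequency distribution: how often each response occurs.
frequency = Counter(responses)

# Central tendency: mean, median, and mode.
mean = statistics.mean(responses)
median = statistics.median(responses)
mode = statistics.mode(responses)

# Variability: range and sample standard deviation.
value_range = max(responses) - min(responses)
std_dev = statistics.stdev(responses)

print("frequency:", dict(sorted(frequency.items())))
print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "std dev:", round(std_dev, 3))
```

Here `statistics.stdev` is the sample standard deviation (it divides by n - 1); `statistics.pstdev` would be the choice if the data were a whole population.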
Inferential Statistics
-Involves drawing the right conclusions from the statistical analysis that has been performed
using descriptive statistics. In the end, it is the inferences that make studies important and
this aspect is dealt with in inferential statistics.
-Most predictions of the future and generalizations about a population by studying a smaller
sample come under the purview of inferential statistics.
-Most social science experiments study a small sample, which helps determine how the
population in general behaves.
-By designing the right experiment, the researcher is able to draw conclusions relevant to his
study.
Variable
Characteristic or attribute that can assume different values
Random Variable
A variable whose values are determined by chance.
Data is a specific measurement of a variable – it is the value you record in your data
sheet. Data is generally divided into two categories:
Note: Since continuous variables are real numbers, we usually round them or we set an
interval. This implies a boundary depending on the number of decimal places.
For example: 64 is really anything in 63.5 ≤ x < 64.5. Likewise, if there are two decimal places,
then 64.03 is really anything in 64.025 ≤ x < 64.035. Boundaries always have one more decimal
place than the data and end in a 5.
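The boundary rule in the note (one more decimal place than the data, ending in a 5) can be sketched as a small function. The name `boundaries` is hypothetical, and the recorded value is passed as a string so that `decimal.Decimal` can see how many decimal places were written down.

```python
from decimal import Decimal

def boundaries(recorded):
    """Return (lower, upper) so that lower <= x < upper is recorded as `recorded`."""
    value = Decimal(recorded)
    # Exponent of the last recorded digit: 0 for "64", -2 for "64.03".
    exponent = value.as_tuple().exponent
    # Half of one unit in the last recorded place, e.g. 0.5 or 0.005.
    half_unit = Decimal(5).scaleb(exponent - 1)
    return value - half_unit, value + half_unit

print(boundaries("64"))
print(boundaries("64.03"))
```

Using `Decimal` instead of `float` is a design choice for this sketch: it keeps the trailing 5 exact rather than subject to binary rounding.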
Variables
A social scientist explores whether there is a link between socioeconomic status and the number of
children someone has.
independent variable (input variable) - socioeconomic status
dependent variable (output variable) - number of children
Job Satisfaction and Pay
A human resources professional wonders if how much money a person earns can impact the extent to
which an individual experiences job satisfaction.
independent variable - compensation (salary or wages)
dependent variable - job satisfaction
Population
- All subjects possessing a common characteristic that is being studied.
Parameter
- Characteristic or measure obtained from a population.
Sample
- Subgroup or subset of the population
Statistic (not to be confused with Statistics)
Characteristic or measure obtained from a sample.
Example: In a recent survey, 400 students of CPSU were asked if they smoked cigarettes regularly.
Thirty (30) of the students said yes. Identify the population and the sample.
Population: responses of all CPSU students.
Sample: responses of the students in the survey.
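As a small worked step, the survey figures give a statistic for the sample: the proportion of surveyed students who said yes. The variable names below are illustrative only.

```python
# Figures from the example: 400 CPSU students surveyed, 30 answered yes.
sample_size = 400
said_yes = 30

# The sample proportion is a statistic: it describes the sample only.
sample_proportion = said_yes / sample_size

print(f"sample proportion = {sample_proportion:.3f}")
```

The matching parameter, the proportion of regular smokers among all CPSU students, is not known from the survey; the statistic is only an estimate of it.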
Population vs Sample
Population: The measurable characteristic of the population, like the mean or standard
deviation, is known as the parameter.
Sample: The measurable characteristic of the sample is called a statistic.
Population: Population data is a whole and complete set.
Sample: The sample is a subset of the population that is derived using sampling.
Population: A survey done of an entire population is accurate and more precise, with no margin
of error except human inaccuracy in responses. However, this may not always be possible.
Sample: A survey done using a sample of the population bears accurate results only after
further factoring in the margin of error and confidence interval.
Population: The parameter of the population is a numerical or measurable element that defines
the system of the set.
Sample: The statistic is the descriptive component of the sample, found by using the sample
mean or sample proportion.
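To make "margin of error and confidence interval" concrete, here is a rough sketch using the survey figures from the earlier example (30 of 400 students said yes). The lesson does not say which method to use; the normal-approximation formula for a proportion and the 95% critical value z = 1.96 are assumptions chosen for illustration, and the sketch also assumes a simple random sample.

```python
import math

sample_size = 400
said_yes = 30
z = 1.96  # assumed: critical value for a 95% confidence level

p_hat = said_yes / sample_size

# Standard error of a sample proportion (normal approximation, assumed).
standard_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)

margin_of_error = z * standard_error
interval = (p_hat - margin_of_error, p_hat + margin_of_error)

print(f"estimate = {p_hat:.3f}, margin of error = {margin_of_error:.3f}")
print(f"95% interval = ({interval[0]:.3f}, {interval[1]:.3f})")
```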
Sample rather than the Population
Reasons to choose a sample from a given population
Practicality: In most cases, a population is too large for it to be practical to collect data
from every member. Samples offer a representation of the whole population if sampled properly.
It offers urgent data: When it comes to research, the amount of time available can be a
defining factor for a study. A sample provides a smaller set of the population for review and
still delivers data that is useful to represent the whole population.
Cost-effective: The cost of conducting research is often a limiting factor for the study, and
surveying a sample costs less than surveying the whole population.
Accuracy of representation: Depending on the method of sampling, research conducted on
a sample can be accurate, with less non-response bias than if performed by census. A
sample that is selected using a probability method can be an accurate representation of the
population. The data collected can be used to gather insight into the whole community.
Inferential statistics: Inferential statistics is a process by which representative data is used
to infer insights about the entire population. Data collected from a sample represents the
whole population. Inferential statistics can only be obtained using data samples.
At times, a sample is more accurate than a census: A census of an entire population
does not always offer accurate data due to errors such as inconsistency in responses, or non-
response bias. A carefully obtained sample, however, does away with this bias and provides
Scales of Measurement
2. Ordinal
Ordinal variables are variables that have two or more categories, just like
nominal variables, only the categories can also be ordered or ranked. So if you asked
someone if they liked the policies of the Duterte Administration and they could
answer "Yes, a lot", "They are OK", or "Not very much" (a survey may offer more
categories than these), then you have an ordinal variable. Why? Because the
categories have a natural order.
Thus, the result can be ranked: you can rank them from the most positive (Yes,
a lot), to the middle response (They are OK), to the least positive (Not very much).
However, while we can rank the levels, we cannot place a "value" on them; we
cannot say that "They are OK" is twice as positive as "Not very much", for example.
3. Interval
- Numbers are assigned to the items or objects. These are used to identify and
rank the objects. They also measure the degree of difference between any two
classes.
Example: temperatures (in °C or °F), IQ, grades, test scores. Weights and heights have a
true zero, so they are ratio-level rather than interval-level measurements.
4. Ratio measurement
- The ratio of the numbers assigned in the measurement shows the ratio in the
amounts of the property being measured.
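The difference between these scales can be sketched in Python. The category labels come from the ordinal example above; the list `answers` and the two temperatures are made-up values for illustration.

```python
# Ordinal: categories can be ranked, but the gaps between ranks carry no numeric meaning.
ordered_levels = ["Not very much", "They are OK", "Yes, a lot"]  # least to most positive
rank = {label: position for position, label in enumerate(ordered_levels)}

answers = ["They are OK", "Yes, a lot", "Not very much", "They are OK"]
ranked_answers = sorted(answers, key=rank.get)
print(ranked_answers)

# Interval vs ratio: Celsius has no true zero, so only differences are meaningful.
celsius_a, celsius_b = 10.0, 20.0
difference = celsius_b - celsius_a  # a valid comparison on an interval scale

# Kelvin has a true zero (a ratio scale), so a ratio of Kelvin values is meaningful.
kelvin_ratio = (celsius_b + 273.15) / (celsius_a + 273.15)
print(difference, round(kelvin_ratio, 3))
```

Although 20 is twice 10, 20 °C is not "twice as hot" as 10 °C; on the Kelvin scale the ratio is only about 1.035.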
CPSU Education Students (2000)
Year Level 1 (100): Male (50), Female (50)
Year Level 2 (100): Male (50), Female (50)
Year Level 3 (100): Male (50), Female (50)
Year Level 4 (100): Male (50), Female (50)
Stratified sampling - sampling is per strata.
Cluster sampling - use the whole selected strata.
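The contrast between the two methods in the diagram can be sketched as follows. The roster is hypothetical, built from the per-year figures shown (four year levels, 50 male and 50 female students each); the function names, the seed, and the choice of year level as the cluster are assumptions for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical roster: four year levels, each with 50 male and 50 female students.
roster = [
    {"id": f"Y{year}-{sex}-{n}", "year": year, "sex": sex}
    for year in (1, 2, 3, 4)
    for sex in ("Male", "Female")
    for n in range(50)
]

def stratified_sample(people, per_stratum):
    """Draw `per_stratum` people at random from every (year, sex) stratum."""
    strata = {}
    for person in people:
        strata.setdefault((person["year"], person["sex"]), []).append(person)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, per_stratum))
    return chosen

def cluster_sample(people, n_clusters):
    """Pick whole year levels at random and keep everyone in them."""
    years = sorted({person["year"] for person in people})
    picked_years = set(rng.sample(years, n_clusters))
    return [person for person in people if person["year"] in picked_years]

stratified = stratified_sample(roster, per_stratum=5)
clustered = cluster_sample(roster, n_clusters=1)
print(len(stratified), len(clustered))
```

The stratified sample contains a few students from every group, while the cluster sample contains every student from one randomly chosen year level and none from the others.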
Types of Sampling (Probabilistic and non-probabilistic sampling)
Non-probability sampling, on the other hand, does not involve “random”
processes for selecting participants. In non-probability sampling, the members
of the population do not have an equal chance of being selected, and in many
cases, there will be members of the population who have no chance of being
selected.
Convenience sampling is very easy to do, but it's probably the worst technique to
use. In convenience sampling, readily available data is used; that is, the sample
consists of the first people the surveyor runs into.
Quota sampling is a non-probabilistic sampling method where we divide the survey
population into mutually exclusive subgroups. These subgroups are selected with
respect to certain known (and thus non-random) features, traits, or interests.
People in each subgroup are selected by the researcher or interviewer who is
conducting the survey.
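A minimal sketch of quota sampling, assuming the interviewer simply accepts people in the order they are met until each subgroup's quota is full. The names, the subgroups, and the quotas are all hypothetical.

```python
from collections import Counter

# Hypothetical stream of people in the order the interviewer meets them.
passers_by = [
    ("Ana", "Female"), ("Ben", "Male"), ("Carl", "Male"), ("Dina", "Female"),
    ("Ed", "Male"), ("Fe", "Female"), ("Gio", "Male"), ("Hana", "Female"),
]

def quota_sample(people, quotas):
    """Accept people in arrival order until each subgroup's quota is full."""
    counts = Counter()
    chosen = []
    for name, group in people:
        if counts[group] < quotas.get(group, 0):
            chosen.append(name)
            counts[group] += 1
    return chosen

selected = quota_sample(passers_by, {"Male": 2, "Female": 2})
print(selected)
```

No random draw is involved, which is what makes the method non-probabilistic: people met later have no chance of selection once the quotas are filled.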
Snowball sampling is where research participants recruit other participants for a test
or study. It is used where potential participants are hard to find. It’s called snowball
sampling because (in theory) once you have the ball rolling, it picks up more “snow”
along the way and becomes larger and larger. Snowball sampling is a non-probability
sampling method. It does not involve the kind of probability found in, say, simple random
sampling (where the odds are the same for any particular participant being chosen).
Rather, the researchers use their own judgment to choose participants.
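The recruiting process can be sketched as a walk through a referral network, starting from one initial participant. The network, the names, and the size cap below are hypothetical; a real study would discover referrals as it goes rather than know them in advance.

```python
from collections import deque

# Hypothetical referral network: each participant names people they can recruit.
referrals = {
    "seed": ["A", "B"],
    "A": ["C"],
    "B": ["C", "D"],
    "C": [],
    "D": ["E"],
    "E": [],
}

def snowball_sample(network, seeds, max_size):
    """Grow the sample wave by wave by following participants' referrals."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for recruit in network.get(person, []):
            if recruit not in seen:  # nobody is recruited twice
                seen.add(recruit)
                queue.append(recruit)
    return sample

print(snowball_sample(referrals, ["seed"], max_size=5))
```

Who ends up in the sample depends entirely on the starting participant and on who knows whom, so members outside the network have no chance of being selected.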
Purposive sampling is used in cases where the specialty of an authority can select a more
representative sample that can bring more accurate results than by using other probability