STATISTICS N Quantitative
WHAT IS STATISTICS?
• Statistics is the study of collecting, organizing, presenting, and analysing data, interpreting the results,
and drawing conclusions in order to make decisions.
• Statistics is both the science of uncertainty and the technology of extracting information from data.
• The general prerequisite for statistical decision making is the gathering of data. First, we need to
identify the individuals or objects to be included in the study and the characteristics or features of the
individuals that are of interest.
• Individuals are the people or objects included in the study.
• A variable is a characteristic of the individual to be measured or observed.
BRANCHES OF STATISTICS
QUANTITATIVE DATA/VARIABLE
• A quantitative variable refers to numerical or measurable information. It is expressed in terms of quantities, amounts, or values that can be represented with numbers. This type of data is used when the researcher aims to measure or quantify variables or attributes. E.g. mass, height, length, number of students in a class, age, concentration of nitrogen, volume, etc.
QUALITATIVE DATA/VARIABLE
• A qualitative variable, also known as categorical or non-numerical data, provides information about qualities, characteristics, properties, or attributes of a subject. It describes and captures subjective or non-measurable aspects of the data. E.g. gender, rank, class (1st class,…), race (black,…), type of food, blood group, sickle cell status, etc.
QUANTITATIVE VARIABLE/DATA
• Discrete: Takes countable numeric values, mostly obtained by counting or enumerating something. E.g. number of students, number of workers in a company, number of households in a house, number of languages spoken in a country, etc.
• Continuous: Takes any numeric value (real number) within some given range or interval (i.e. between given lower and upper limits), mostly obtained by measuring. E.g. height, weight, temperature, mass, volume, length, speed, etc.
TERMINOLOGIES
• Population is the entire collection of individuals, items or entities under study.
• Sample is a subset of the population under study.
• Internal Source: Internal data are facts and information that come directly from the company’s systems and are
specific to the company in question. In almost every case, internal data cannot be accessed and studied by outside
parties without the express permission of the business entity.
• Internal data provides a look into the company’s current practices and their effectiveness. Collected from sources
like website KPIs and customer surveys, internal data is an invaluable tool for evaluating company policies,
products and branding, and employee productivity.
1. Sales Data (Revenue; Distribution channels; Price points; Customer surveys; Social media impressions; Website
and online store analytics).
• This brings up the concept of errors, especially sampling errors, and also
accuracy.
SAMPLING AND NON-SAMPLING ERRORS
• Sampling error refers to the discrepancy or difference between the characteristics
or results obtained from a sample and the true characteristics or results that
would have been obtained if the entire population had been examined.
• Non-sampling errors can occur due to a variety of reasons, including human error, data
entry mistakes, measurement errors, respondent errors, sampling frame errors, data
processing errors, and biases in data collection or analysis.
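As a rough illustration of sampling error, the sketch below builds a synthetic population, draws one random sample, and compares the sample mean with the population mean. The heights, the seed, and the sizes are made-up assumptions for illustration, not data from these notes.

```python
import random
import statistics

def sampling_error(population, sample_size, rng):
    """Return (sample_mean, population_mean, sampling_error) for one random sample."""
    sample = rng.sample(population, sample_size)  # draw without replacement
    sample_mean = statistics.mean(sample)
    population_mean = statistics.mean(population)
    return sample_mean, population_mean, sample_mean - population_mean

rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 heights in cm (synthetic values)
population = [rng.gauss(165, 10) for _ in range(10_000)]

sample_mean, population_mean, error = sampling_error(population, 100, rng)
print(f"sample mean = {sample_mean:.2f}, population mean = {population_mean:.2f}")
print(f"sampling error = {error:.2f}")

# Examining the whole population (a census) leaves no sampling error
_, _, census_error = sampling_error(population, len(population), rng)
print(f"census error = {census_error:.2f}")
```

Repeating the draw with a different seed gives a different error each time, which is exactly the sample-to-sample variation the definition describes. Non-sampling errors, such as a badly calibrated scale, would remain even in the census case.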
EXAMPLES OF NON-SAMPLING ERRORS INCLUDE:
1. Measurement error: This occurs when the measurement instrument used to
collect data is faulty, imprecise, or subject to interpretation errors. For instance, if a
weighing scale is not calibrated correctly, it may produce inaccurate weight
measurements.
2. Non-response bias: This happens when some individuals or groups selected for the
study do not participate or provide incomplete responses, leading to a potential
bias in the results.
3. Sampling frame errors: These errors occur when the sampling frame, which is the
list or source used to select the sample, does not accurately represent the target
population. This can result in undercoverage or overcoverage of certain population
segments.
4. Data processing errors: Mistakes can occur during data entry, coding, or data
cleaning processes. For example, if survey responses are incorrectly transcribed or
entered into a database, it can introduce errors into the dataset.
• Researchers employ various quality assurance measures, such as rigorous
data collection protocols, validation checks, and statistical techniques, to
reduce non-sampling errors and enhance the accuracy and integrity of the
data analysis.
• The total error in a sample survey, consisting of both the sampling and non-sampling
errors, is referred to as the “degree of accuracy”.
REASONS TO USE SAMPLE
• There are several reasons for using a sample in research:
1. Cost and resource efficiency: Conducting research on an entire population can be
impractical or financially unfeasible. By using a sample, researchers can collect and
analyze data more efficiently, requiring fewer resources.
2. Time-saving: Studying a sample allows researchers to obtain results more quickly
compared to studying the entire population. This is particularly beneficial when
time is a constraint.
3. Feasibility: In some cases, it may be impossible to access or gather data from the
entire population. Using a sample allows researchers to work with a subset that is
more accessible and manageable.
4. Ethical considerations: In research involving human subjects, using a sample
reduces the burden on participants, as data collection is focused on a smaller group
rather than the entire population.
5. Accuracy: With proper sampling techniques, a well-designed sample can provide
accurate and reliable results that represent the characteristics of the population,
allowing for valid inferences and generalizations.
6. Practicality: Research may involve destructive testing, where the process of data
collection destroys or alters the subjects being studied. In such cases, using a sample
helps preserve the remaining population for future studies.
7. Statistical validity: By using appropriate sampling methods, researchers can apply
statistical techniques to estimate the sampling error, determine confidence intervals,
and assess the level of uncertainty associated with the findings. This helps ensure the
validity and reliability of the research conclusions.
• Overall, using a sample in research offers practical and efficient ways to gather data,
make inferences about a population, and draw meaningful conclusions while
considering constraints such as cost, time, and accessibility.
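The point about statistical validity can be made concrete with a small sketch that estimates the standard error of a sample mean and an approximate 95% confidence interval. It assumes a simple random sample and a normal approximation (z = 1.96). The satisfaction scores are synthetic, and the function name is only illustrative.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% CI for the population mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Estimated sampling error of the mean: sample standard deviation / sqrt(n)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return mean, std_error, (mean - z * std_error, mean + z * std_error)

rng = random.Random(0)
# Hypothetical job-satisfaction scores for a sample of 100 employees (synthetic)
sample = [rng.gauss(70, 12) for _ in range(100)]

mean, std_error, (low, high) = mean_confidence_interval(sample)
print(f"mean = {mean:.1f}, standard error = {std_error:.2f}")
print(f"approximate 95% CI: ({low:.1f}, {high:.1f})")
```

For small samples a t-based multiplier would normally replace 1.96. The normal value is used here only to keep the sketch short.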
TERMINOLOGIES
• POPULATION
Population: The population refers to the entire set of individuals, objects, or events that
share a common characteristic and are of interest to the researcher. It is the complete
group from which data is collected and analyzed. The population is typically defined by
specific criteria or attributes relevant to the research study.
Examples of samples drawn from a population:
• Randomly selecting 500 registered voters from a country to study their voting preferences.
• Choosing a representative sample of 200 students from a university to assess their
satisfaction with campus facilities.
• Sampling 100 employees from a company to investigate their job satisfaction levels.
• Selecting a sample of 50 cars produced by an automobile manufacturer for quality testing.
• TARGET POPULATION
• The target population refers to the specific group of individuals that a researcher intends to study or to which the research
findings will be generalized.
• It is the well-defined and specific group that meets the criteria for inclusion in a research study. The target population is
typically defined based on certain characteristics or attributes that are relevant to the research objectives.
• The target population is a subset of the broader population that the researcher wants to make inferences about or draw
conclusions for.
• It is important to clearly define the target population to ensure that the research study focuses on the appropriate group and
that the findings are applicable to that specific group.
For example:
• In a study on the effectiveness of a new medication, the target population might be defined as adult individuals (age 18 and
above) with a specific medical condition.
• In a survey on consumer preferences for a particular product, the target population might be defined as individuals aged 25-45
who have purchased the product in the past six months.
• In a study on the impact of a specific teaching method on student performance, the target population might be defined as
high school students attending a particular school.
• SAMPLING FRAME
• A sampling frame refers to a list, database, or other representation of the target
population from which a sample is drawn in a research study. It serves as a source or
reference from which potential participants or elements for the study can be selected.
• The sampling frame is constructed to encompass all members of the target population
and provides a comprehensive and accessible list of individuals or units that could
potentially be included in the sample.
For example:
• In a study aiming to survey university students, the sampling frame might consist of a list
of all currently enrolled students obtained from the university's registration system.
• In a study focusing on household incomes, the sampling frame might be constructed by
using a list of residential addresses obtained from a postal service or census data.
THE CONSTRUCTION OF A SAMPLING FRAME REQUIRES
CAREFUL CONSIDERATION OF THE FOLLOWING:
1. Inclusion and Exclusion Criteria: The sampling frame should clearly define the characteristics or
attributes that determine whether an individual or unit belongs to the target population and
should be included in the frame.
2. Coverage: The sampling frame should cover the entire target population and avoid excluding
any eligible individuals or units. However, certain constraints or limitations may make it difficult
to achieve complete coverage.
3. Accuracy and Currency: The sampling frame should be as accurate and up to date as possible to
reflect the current composition of the target population. It is important to regularly update and
validate the sampling frame to minimize errors and omissions.
4. Accessibility: The sampling frame should be easily accessible for the researchers to select
potential participants or units for the study. This may involve obtaining permission or access to
relevant databases or records.
5. Sampling Unit: The sampling frame should clearly identify the individual units or elements that
can be selected as part of the sample. These units should be mutually exclusive and collectively
exhaustive. (NB: Sampling units are the units or groups of units to be selected in a sample survey.)
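The coverage and currency checks above can be sketched with plain set operations. The student IDs and the function name below are hypothetical, chosen only to show how undercoverage and overcoverage might be detected when a frame is compared with the target population.

```python
def coverage_report(target_population, sampling_frame):
    """Compare a sampling frame with the target population using set operations."""
    target = set(target_population)
    frame = set(sampling_frame)
    return {
        "undercoverage": sorted(target - frame),  # eligible units missing from the frame
        "overcoverage": sorted(frame - target),   # listed units outside the target population
        "coverage_rate": len(target & frame) / len(target),
    }

# Hypothetical student IDs: the registration list is slightly out of date
enrolled_students = ["S001", "S002", "S003", "S004", "S005"]
registration_list = ["S001", "S002", "S003", "S006"]  # S006 has left; S004 and S005 are new

report = coverage_report(enrolled_students, registration_list)
print(report)
```

In practice the full target population is rarely available as a list, so a check like this is usually run against a trusted reference source or a small audit sample.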
FINITE AND INFINITE POPULATION
• The terms "finite population" and "infinite population" refer to the size and characteristics of the
population being studied. These terms help determine the appropriate sampling methods and statistical
techniques to be used in research.
1. Finite Population: A finite population refers to a population that consists of a fixed and identifiable
number of individuals or elements. The size of the population is known and can be explicitly determined.
• The employees of a specific company, where the total number of employees is known.
• The students in a particular school or university, where the enrollment records provide the exact count.
• The residents in a specific town, where the census data provides the population size.
2. Infinite Population: An infinite population refers to a population that is
theoretically unlimited or extremely large, so its size is unknown or cannot be counted in practice.
• Sample design involves making decisions about the sampling method, sample size, and
the procedure for selecting participants or units from the population.
• Surveys can be conducted in various formats, including online surveys, telephone surveys,
face-to-face interviews, or paper-based questionnaires.
• Surveys are widely used in social sciences, market research, public opinion polling, and other
fields to obtain data for analysis and draw conclusions about a larger population.
POPULATION CENSUS
• Census data provides a comprehensive profile of the population and is used for policy-making.
• Instead of collecting data from the entire population, a sample survey focuses on
collecting information from a representative subset of the population. The sample is
carefully selected using specific sampling techniques to ensure that it accurately
represents the characteristics of the population.
• The collected data from the sample is then extrapolated or generalized to make
inferences about the larger population.
• Sample surveys are cost-effective and efficient ways to gather data, especially when
conducting a population census is impractical or not feasible.
PLANNING A SURVEY
1. Statement of Objectives: State the objectives of the survey clearly and concisely, and refer to
those objectives regularly as the survey progresses.
2. Target Population: Carefully define the population to be sampled. If females of childbearing age
are to be sampled, then define what is meant by childbearing age (e.g. all females over the age of
18), and state which group of adults is included (e.g. all residents of a district).
3. The frame: Select the frame(s) so that the list of sampling units and the target population show
close agreement. Keep in mind that multiple frames may make the sampling more efficient. E.g.
Residents of a district can be sampled from the list of towns/villages coupled with a list of
households within towns/villages.
4. Sampling Design: Choose the design of the sample, including the sample size, so as to obtain
just enough information to meet the objectives.
5. Method of Measurement: Decide on the method of measurement, using one or more of the following:
personal interviews, telephone interviews, mailed questionnaires or direct observations. If it is a
questionnaire to be used, plan the questions so that they minimize non-response and incorrect
response bias.
6. Selection and Training of Field Workers: Carefully select and train field workers. After
the sampling plan is clearly and completely set up, someone must collect the data.
7. The Pretest: Select a small sample for a pretest. The pre-test is relevant since it allows
you to field-test the questionnaire or other measurement device, to screen interviewers, and
to check on the management of the field operations.
8. Organization of fieldworkers: Plan the fieldwork in detail. Any large-scale survey
involves numerous people working as interviewers, coordinators, or data managers.
9. Organization of Data Management: Outline how each piece of data is to be handled at
all stages of the survey. A well-prepared management plan is needed especially
when the data size is large.
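• Sampling is the process of selecting a subset of individuals or observations from a larger
population for the purpose of gathering data and making inferences about that population.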
• Sampling techniques are specific methods or procedures used to select the individuals
or units for inclusion in the sample. Various sampling techniques are available, each
with its own advantages, limitations, and applicability based on the research objectives.
• The key feature of simple random sampling is that every member of the population has
an equal probability of being chosen, providing an unbiased representation of the
population.
• This technique allows for the estimation of sampling error and enables generalization of
findings from the sample to the larger population.
• It is often used when the population is relatively homogenous, and there are no specific
subgroups or strata of interest.
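A minimal sketch of simple random sampling, assuming a complete sampling frame is available as a list. The student IDs and the helper name are hypothetical. Python's `random.Random.sample` draws without replacement, giving every unit the same chance of selection.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Select n units from the frame without replacement, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame of 1,000 student IDs
frame = [f"S{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(frame, 50, seed=1)
print(sample[:5])
```

Fixing the seed makes the selection reproducible, which helps when the sampling procedure has to be documented or audited.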
• Stratified Sampling: The population is divided into subgroups or strata based on specific
characteristics, and participants are randomly selected from each stratum in proportion
to its representation in the population. Stratified sampling improves representativeness
and allows for separate analyses within each stratum.
• Cluster Sampling: The population is divided into clusters, such as geographical areas or
organizations, and a random sample of clusters is selected. All individuals or elements
within the selected clusters are included in the sample. Cluster sampling is useful when
it is difficult or costly to obtain a list of the entire population.
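The two designs above can be contrasted in a short sketch. It assumes proportional allocation for the stratified sample and one-stage cluster sampling in which every unit in a chosen cluster is kept. The student and village data are hypothetical, as are the function names.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, rng):
    """Proportional allocation: sample each stratum in proportion to its size."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(units))  # rounding may shift the total slightly
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(7)

# Hypothetical population: 600 undergraduates (UG) and 400 postgraduates (PG)
students = [("UG", i) for i in range(600)] + [("PG", i) for i in range(400)]
strat = stratified_sample(students, lambda s: s[0], 100, rng)

# Hypothetical clusters: 10 villages with 20 households each
villages = {f"village_{v}": [f"v{v}_h{h}" for h in range(20)] for v in range(10)}
clus = cluster_sample(villages, 3, rng)

print(len(strat), len(clus))
```

Note the difference in what is needed up front: the stratified sample requires a list of every unit with its stratum, while the cluster sample only requires a list of clusters.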
• Suppose that you want to set up an experiment to see if mindfulness exercises can
increase the performance of long-distance runners. First, you need to recruit your
participants. You can do so by placing posters near locations where people go
running, such as parks or stadiums.
• Your ad should follow ethical guidelines, making it clear what the study involves. It
should also include more practical information, such as the types of participants
required. In this case, you decide to focus on runners who can run at least 5 km and
have no prior training or experience in mindfulness.
• Keep in mind that not all people who apply will be eligible for your research. There
is a high chance that many applicants will not fully read or understand what your
study is about, or may possess disqualifying factors. It’s important to double-check
eligibility carefully before inviting any volunteers to form part of your sample.
SNOWBALL SAMPLING
• Snowball sampling is used when the population you want to research is hard to
reach, or there is no existing database or other sampling frame to help you find
them. Research about socially marginalized groups such as drug addicts, homeless
people, or sex workers often uses snowball sampling.
• To conduct a snowball sample, you start by finding one person who is willing to
participate in your research. You then ask them to introduce you to others.
• Alternatively, your research may involve finding people who use a certain product
or have experience in the area you are interested in. In these cases, you can also
use networks of people to gain access to your population of interest.
EXAMPLE: SNOWBALL SAMPLING
• You are studying homeless people living in your city. You start by attending a
housing advocacy meeting, striking up a conversation with a homeless woman. You
explain the purpose of your research and she agrees to participate. She invites you
to a parking lot serving as temporary housing and offers to introduce you around.
• In this way, the process of snowball sampling begins. You started by attending the
meeting, where you met someone who could then put you in touch with others in
the group.
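The referral process can be pictured as walking outward through a contact network, one wave at a time. The sketch below is only an illustration under that assumption. The names and the `referrals` mapping are invented, and real recruitment depends on consent and on who is actually willing to introduce you.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Grow a sample wave by wave by following participants' referrals."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # do not recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: who each participant can introduce you to
referrals = {
    "Ama": ["Kofi", "Esi"],
    "Kofi": ["Yaw"],
    "Esi": ["Yaw", "Abena"],
    "Yaw": [],
    "Abena": ["Ama"],
}

sample = snowball_sample(referrals, seeds=["Ama"], max_size=4)
print(sample)
```

Because everyone in the sample is reached through the first contact, the result depends heavily on the starting person, which is one reason snowball samples cannot be treated as random samples.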
• Example: You are researching the long-term side effects of working with asbestos.
You determine “long-term” to mean 20 years or longer. Using homogeneous sampling,
only people who worked with asbestos for 20 years or longer are included in your
sample.
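In code, the asbestos example amounts to filtering candidates on the shared trait. The records and field names below are hypothetical. The 20-year threshold comes from the example.

```python
def homogeneous_sample(candidates, min_years=20):
    """Keep only candidates who share the defining trait (long-term exposure)."""
    return [c for c in candidates if c["years_exposed"] >= min_years]

# Hypothetical candidate records
candidates = [
    {"id": "W1", "years_exposed": 25},
    {"id": "W2", "years_exposed": 8},
    {"id": "W3", "years_exposed": 20},
    {"id": "W4", "years_exposed": 19},
]

sample = homogeneous_sample(candidates)
print([c["id"] for c in sample])
```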
TYPICAL CASE SAMPLING
• A typical case sample is composed of people who can be regarded as “typical” for a community or
phenomenon. A typical case sample allows you to develop a profile of what would generally be agreed
as being “average” or “normal.”
• Typical case samples are often used when large communities or complex problems are investigated. In
this way, you can gain an understanding in a relatively short time, even if you are not familiar with
what’s going on yourself.
• Example: Suppose you want to evaluate the level of care provided by physiotherapists to clients at a
certain clinic. To develop a typical case sample, you interact closely with both therapists and clients in
order to develop a set of criteria of what is “typical,” or average.
• For physiotherapists, this could include years of professional experience, educational background, etc. For
patients, criteria can include their age, or how often they have visited the clinic in the past year. By
comparing the two typical case samples, you can conclude whether the average physiotherapist has the
expertise needed to meet the average client’s needs.
• Note that the purpose of typical case sampling is to describe and illustrate what is typical to those
unfamiliar with the setting or situation. The purpose is not to make generalized statements about the
experiences of all participants. In other words, typical case sampling allows you to compare samples,
not generalize samples to populations.
EXTREME (DEVIANT) CASE SAMPLING
• Extreme (or deviant) case sampling uses extreme cases of a particular
phenomenon (outliers). This can mean remarkable failures, successes, or crises, as
well as any event, organization, or individual that appears to be the “exception to
the rule.” Extreme case sampling is most often used when researchers are
developing best-practice guidelines.
• Note that extreme case sampling usually occurs in combination with other sampling
strategies. The process of identifying extreme or deviant cases usually occurs after
some portion of data collection and analysis has already been completed.
• Example: You are studying serial killers. You identify a few cases where the serial
killer was female. These cases are outliers, i.e., cases that stand out in your sample. In
an effort to develop a richer, more in-depth understanding of the phenomenon, you
decide to select these outliers and analyze them further.
CRITICAL CASE SAMPLING
• Critical case sampling is used where a single case (or a small number of cases) can be
critical or decisive in explaining the phenomenon of interest. It is often used in exploratory
research, or in research with limited resources.
There are a few cues that can help show you whether or not a case is critical, such as:
• “If it happens here, it will happen anywhere”
• “If that group is having problems, then all groups are having problems”
• It is critical to ensure that your cases fit these criteria prior to proceeding with this sampling
method.
• Example: You want to know how well people understand a new tax law. If you ask tax
professionals and they do not understand it, then it’s likely laypeople won’t either.
Alternatively, if you ask people from other professional fields, irrelevant to taxes or law, and
they do understand it, then it’s safe to assume most people will.
• In other words, your critical cases could either be those with relevant expertise or those who
have no relevant expertise.
EXPERT SAMPLING