Intro To Statistics

Statistics is the science of collecting, organizing, and interpreting data. It has two broad categories: descriptive statistics, which summarizes data, and inferential statistics, which is used to draw conclusions about a population based on a sample. There are two types of variables: qualitative variables, which have categorical values such as gender, and quantitative variables, which have numeric values such as height. Data can be collected through various methods such as surveys, experiments, and observational studies. Samples are used to make inferences about populations. Probability sampling gives every unit a known, non-zero chance of selection in order to avoid bias, while non-probability sampling does not. Common probability sampling techniques include simple random sampling.

Uploaded by Vincent Kimani
Copyright © All Rights Reserved

STATISTICS

Introduction
What is statistics?
The word statistics is derived from the Latin word “status” or the Italian word “statista”; these words
refer to the “political state” or government. Early applications of statistical thinking
revolved around the needs of states to base policy on demographic and economic data.

Definition
Statistics: a branch of science that deals with the collection, presentation, analysis, and interpretation of
data.

Statistics is divided into two broad categories, namely descriptive and inferential statistics.
Descriptive statistics: summary values and presentations which give some information about the data, e.g. the mean
height of a 1st-year student at JKUAT is 170 cm. Here 170 cm is a statistic which describes the central point of
the height data.
Inferential statistics: summary values calculated from the sample in order to draw conclusions about
the target population.
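The difference between the two categories can be sketched with Python's standard `statistics` module. The height values below are hypothetical, made-up numbers used only for illustration: the mean, median and standard deviation describe the sample itself (descriptive), while the confidence interval uses the sample to say something about the whole population of students (inferential). The interval assumes roughly normal heights and uses the t critical value 2.262 for 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical heights (cm) of ten 1st-year students: illustrative numbers only.
heights = [165, 172, 168, 175, 170, 169, 171, 174, 166, 170]

# Descriptive statistics: summarise the data actually collected.
mean_height = statistics.mean(heights)
median_height = statistics.median(heights)
sd_height = statistics.stdev(heights)  # sample standard deviation (n - 1 divisor)

# Inferential statistics: use the sample to estimate the population mean.
# 95% confidence interval with the t critical value for n - 1 = 9 degrees of freedom.
t_crit = 2.262
se = sd_height / math.sqrt(len(heights))
ci_low, ci_high = mean_height - t_crit * se, mean_height + t_crit * se

print(f"sample mean = {mean_height} cm, median = {median_height} cm")
print(f"95% CI for the population mean: {ci_low:.1f} to {ci_high:.1f} cm")
```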

Types of Variables
Qualitative variables: variables whose values fall into groups or categories. They are also called
categorical variables and are further divided into two classes, namely nominal and ordinal variables:
a) Nominal variables: variables whose categories are just names with no natural ordering, e.g. gender,
marital status, skin colour, district of birth.
b) Ordinal variables: variables whose categories have a natural ordering, e.g. education level,
performance category, degree classification.

Quantitative variables: numeric variables, further divided into two classes, namely
discrete and continuous variables:
a) Discrete variables: can only assume certain values, with gaps between them, e.g. the
number of calls one makes in a day, the number of vehicles passing through a certain point.
b) Continuous variables: can assume any value in a specified range, e.g. the length of a telephone call,
the height of a 1st-year student at JKUAT.
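One way to see the four variable types is to map them onto data types, as in the sketch below. The class names and category values are hypothetical choices for illustration: a plain `Enum` models a nominal variable (its members cannot be ranked), an `IntEnum` models an ordinal variable (members compare by their order), an `int` holds a discrete count and a `float` holds a continuous measurement.

```python
from enum import Enum, IntEnum

class MaritalStatus(Enum):          # nominal: categories are just names, no order
    SINGLE = "single"
    MARRIED = "married"
    DIVORCED = "divorced"
    WIDOWED = "widowed"

class EducationLevel(IntEnum):      # ordinal: categories have a natural order
    PRIMARY = 1
    SECONDARY = 2
    DIPLOMA = 3
    DEGREE = 4

calls_per_day: int = 7              # discrete: counts, with gaps between possible values
call_length_min: float = 3.72       # continuous: any value within a range

# Ordinal categories can be ranked...
print(EducationLevel.DEGREE > EducationLevel.SECONDARY)
# ...but nominal categories cannot.
try:
    MaritalStatus.SINGLE < MaritalStatus.MARRIED
except TypeError:
    print("nominal categories have no natural ordering")
```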

1. Data Collection:
1.1 Sources of Data
There are two sources of data, namely primary and secondary data.
Primary data: data collected afresh, i.e. for the first time. They are original in character, i.e. they are
first-hand information collected, compiled and published for some purpose. They have not undergone
any statistical treatment.
Secondary data: second-hand information obtained mainly from published sources such as statistical
abstracts, books, encyclopaedias, periodicals and media reports (e.g. census reports), CD-ROMs and other
electronic media, and the internet. They are not original in character and have undergone some statistical
treatment at least once.

1.2 Data Collection Methods


The first step in any investigation (inquiry) is data collection. Information can be collected
directly or indirectly, from the entire population or from a sample.
There are many methods of collecting data, including those illustrated in the flow chart below.
[Figure: Methods of data collection]

Experimental methods are so called because in them the investigator tests a hypothesis about a
cause-and-effect relationship, typically in a laboratory, by manipulating the independent variables
under controlled conditions.
Non-experimental methods are so called because in them the investigator does not control or change
any aspect of the situation under study but simply describes what naturally occurs at a certain point or
period of time.
Non-experimental methods are widely used in the social sciences. Some of the non-experimental
methods used for data collection are outlined below.

a) Field study: aims at testing hypotheses in natural, real-life situations. It differs from a field experiment
in that the researcher does not control or manipulate the independent variables, although both
are carried out in natural conditions.
b) Census: a census is a study that obtains data from every member of a population (the totality of
individuals/items pertaining to certain characteristics). In most studies a census is not practical
because of the cost and/or time required.
c) Sample survey: a sample survey is a study that obtains data from a subset of a population in order
to estimate population attributes/characteristics. Surveys of human populations and institutions are
common in government, health, social science and marketing research.
d) Case study: a method of intensively exploring and analyzing the life of a single social unit, be it
a person, a family, an institution, a cultural group or even an entire community. In this method no
attempt is made to exercise experimental or statistical control, and phenomena related to the unit are
studied in their natural setting. The researcher has considerable discretion in gathering information from a
variety of sources such as diaries, letters, autobiographies, office records, files or personal interviews.
e) Experiment: an experiment is a controlled study in which the researcher attempts to understand
cause-and-effect relationships. The treatment is actually applied to certain individuals/units
about whom information is then drawn. The study is "controlled" in the sense that the researcher
controls how subjects are assigned to groups and which treatments each group receives.
f) Observational study: like experiments, observational studies attempt to understand cause-and-effect
relationships. However, unlike in experiments, the researcher is not able to control how subjects are
assigned to groups and/or which treatments each group receives. Under this method, information is
sought by direct observation by the investigator.
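What makes an experiment "controlled" is that the researcher decides who goes into which group. A common way to do this is random assignment, sketched below; the function name `randomly_assign` and the subject IDs are hypothetical. In an observational study this step does not exist, because subjects end up in groups on their own.

```python
import random

def randomly_assign(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group.

    Shuffling before splitting gives every subject the same chance of ending up
    in either group, so the researcher (not the subjects) controls assignment.
    """
    pool = list(subjects)
    random.Random(seed).shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

# Hypothetical subject IDs for a drug trial.
subjects = [f"S{i:02d}" for i in range(1, 21)]
treatment, control = randomly_assign(subjects, seed=7)
print("treatment:", treatment)
print("control:  ", control)
```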

1.3 Population and Sample


Population: The entire set of individuals to which the findings of a survey refer.
Sample: A subset of the population selected for a study.
Sample design: The scheme by which items are chosen for the sample.
Sample unit: An element of the population selected into the sample.
Unit of analysis: The unit at which analysis will be done for inferring about the population. Suppose
you want to examine the effect of health care facilities in a community on prenatal care. What is the
unit of analysis: the health facility or the individual woman?

Sampling Frames
For probability sampling, we must have a list of all the individuals (units) in the population. This list,
or sampling frame, is the basis for the selection of the sample. “A [sampling] frame is a clear
and concise description of the population under study, by virtue of which the population units can
be identified unambiguously and contacted, if desired, for the purpose of the survey” (Hedayat and
Sinha, 1991).
Based on the sampling frame, the sampling design can also be classified as:
a) Individual surveys: used when a list of individuals is available, when the population is small, or for
a special population.
b) Household surveys: based on a census of households, used when individual-level information is
unlikely to be available. In practice the frame is limited to small geographical areas and is known
as an “area sampling frame”. Example: Demographic and Health Surveys (DHS).
c) Institutional surveys: based on a census of institutions, e.g. hospital/clinic lists, as in
i) the 1990 National Hospital Discharge Survey
ii) the National Ambulatory Medical Care Survey

Problems of Sampling Frame

(i) Missing elements
(ii) Noncoverage
(iii) Incomplete frame
(iv) Undercoverage
(v) May not be readily available
(vi) Expensive to gather

1.4 Sampling

Sampling is the statistical process of selecting a representative sample. There are two kinds: probability
sampling and non-probability sampling.
Probability sampling involves a known mathematical chance of selecting each respondent: every unit in the
population has a chance, greater than zero, of being selected in the sample. This makes it possible to
produce unbiased estimates. Probability sampling techniques include:
(i) Simple random sampling
(ii) Systematic sampling
(iii) Stratified sampling
(iv) Cluster sampling
(v) Multi-stage sampling

Non-probability sampling is any sampling method where some elements of the population have no
chance of selection (also referred to as “out of coverage”/“undercovered”), or where the probability of
selection cannot be accurately determined. It yields a non-random sample, which makes it difficult
to extrapolate from the sample to the population. Non-probability samples include judgement samples,
purposive samples and convenience samples, whose selection is subjective, and snowball samples, used
to study rare groups or diseases.

1.4.1 Sampling Procedure


Sampling involves two tasks:
• How to select the elements?
• How to estimate the population characteristics from the sampling units?
We employ some randomization process for sample selection so that no unit receives preferential
treatment in selection, which could introduce selection bias.
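The two tasks can be seen side by side in the sketch below, which assumes a simple random sample drawn without replacement from a hypothetical, randomly generated population of household sizes. Task 1 is the random selection; task 2 estimates the population mean and total from the sample. The standard error uses the usual finite population correction (1 − n/N), which is a standard textbook formula rather than something stated in these notes.

```python
import math
import random
import statistics

# Hypothetical population: number of household members for N = 500 households.
rng = random.Random(42)
population = [rng.randint(1, 8) for _ in range(500)]
N = len(population)

# Task 1: select the elements by a randomisation process (SRS without replacement).
n = 50
sample = rng.sample(population, n)

# Task 2: estimate population characteristics from the sampling units.
y_bar = statistics.mean(sample)        # estimate of the population mean
total_hat = N * y_bar                  # estimate of the population total
s2 = statistics.variance(sample)       # sample variance (n - 1 divisor)
fpc = 1 - n / N                        # finite population correction
se_mean = math.sqrt(fpc * s2 / n)      # standard error of y_bar under SRS

print(f"estimated mean = {y_bar:.2f}, estimated total = {total_hat:.0f}, SE(mean) = {se_mean:.3f}")
print(f"true mean (normally unknown) = {statistics.mean(population):.2f}")
```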

1.4.2 Reasons Behind sampling


(i) Cost: a sample can furnish data of sufficient accuracy at much lower cost.
(ii) Time: a sample provides information faster than a census, thus enabling timely decision making.
(iii) Accuracy: it is easier to control data collection errors in a sample survey than in a census.
(iv) Risky or destructive tests call for a sample survey, not a census, e.g. testing a new drug.

1.4.3 Probability Sampling Techniques


a) Simple Random Sampling (SRS)
In this design, each element has an equal probability of being selected from a list of all population
units (a sample of n units from a population of N). Though it is attractive for its simplicity, the design
is not usually used in sample surveys in practice, for several reasons:
(i) Lack of a listing frame: the method requires that a list of population elements be available, which
is not the case for many populations.
(ii) Problem of small area estimation or domain analysis: for a small sample from a large
population, some areas may not have a large enough sample for small area estimation or
for domain analysis by variables of interest.
(iii) Not cost effective: SRS requires covering the whole population, which may reside in a large
geographic area; interviewing a few sampled units spread sparsely over a large area would be very costly.

Implementation of SRS:


(i) Listing (sampling) Frame
(ii) Random number table (from published table or computer generated)
(iii) Selection of sample
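The three implementation steps can be sketched in Python as follows. This is a hypothetical helper, not a prescribed procedure: it imitates reading a random number table by drawing random unit numbers between 1 and N and skipping any number already drawn, which gives sampling without replacement. In everyday practice `random.sample(frame, n)` does the same job in one call.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw an SRS of n units without replacement from a listing (sampling) frame.

    Mimics the random-number-table procedure: generate random unit numbers
    between 1 and N and skip any number already drawn.
    """
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and N")
    rng = random.Random(seed)
    chosen, seen = [], set()            # unit numbers are 1-based, as in the frame
    while len(chosen) < n:
        r = rng.randint(1, N)           # step (ii): a random number from 1..N
        if r not in seen:               # reject repeats -> sampling without replacement
            seen.add(r)
            chosen.append(r)
    return [frame[r - 1] for r in chosen]   # step (iii): pick the listed units

# Step (i): a hypothetical listing frame of 20 households.
frame = [f"HH-{i:03d}" for i in range(1, 21)]
print(simple_random_sample(frame, 5, seed=1))
```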

b) Systematic Sampling
Systematic sampling, either by itself or in combination with some other method, may be the most
widely used method of sampling. In systematic sampling we select samples “evenly” from the list
(sampling frame): first, we divide the list evenly into “blocks”; then we
select one sample element from each block.

In systematic sampling, only the first unit is selected at random, the rest being selected according to a
predetermined pattern. To select a systematic sample of n units, compute the sampling interval k = N/n and
choose a random start r from 1 to k (1 ≤ r ≤ k). After the first unit is selected, every kth unit is included,
i.e. units r, r + k, r + 2k, ..., r + (n − 1)k.
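A minimal sketch of the rule above, assuming the interval is rounded down to an integer (k = N // n) so that the last selected position r + (n − 1)k never runs past the end of the frame. The function name and the 100-unit frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sample of n units: random start r in 1..k, then every kth unit.

    Uses the integer interval k = N // n, so units r, r + k, ..., r + (n - 1)k
    always fall inside the frame even when N is not an exact multiple of n.
    """
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and N")
    k = N // n                                  # sampling interval
    r = random.Random(seed).randint(1, k)       # random start, 1 <= r <= k
    positions = [r + j * k for j in range(n)]   # r, r + k, r + 2k, ...
    return [frame[p - 1] for p in positions]    # frame positions are 1-based

frame = list(range(1, 101))                     # hypothetical frame of N = 100 units
print(systematic_sample(frame, 10, seed=3))     # every 10th unit after a random start
```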

c) Stratified Sampling
In stratified sampling the population is partitioned into groups, called strata, and sampling is
performed separately within each stratum.
This sampling technique is used when:
i) population groups may have different values for the responses of interest;
ii) we want to improve our estimation for each group separately;
iii) we want to ensure an adequate sample size for each group.

In stratified sampling designs:


i) Strata are mutually exclusive (non-overlapping), e.g. urban/rural areas,
economic categories, geographic regions, race, sex, etc. The principal objective of
stratification is to reduce sampling errors.
ii) The population elements should be homogeneous within each stratum and
heterogeneous between strata.

Allocation of Stratified Sampling


The major task of stratified sampling design is the appropriate allocation of samples
to different strata.
Types of allocation methods:
(i) Equal allocation
(ii) Proportional to stratum size
(iii) Cost based sample allocation
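The three allocation methods can be compared with the sketch below. Equal and proportional allocation follow directly from their names. For cost-based allocation the notes give no formula, so this sketch assumes the common textbook "optimum allocation" n_h ∝ N_h·S_h/√c_h, where S_h is the stratum standard deviation and c_h the cost per unit in stratum h; the function name, stratum sizes, standard deviations and costs are all hypothetical. Largest-remainder rounding keeps the stratum sizes summing to n.

```python
import math

def allocate(n, sizes, method="proportional", sds=None, costs=None):
    """Split a total sample size n across strata.

    sizes: population size N_h of each stratum.
    method: "equal", "proportional" (n_h ∝ N_h) or "cost" (assumed n_h ∝ N_h*S_h/sqrt(c_h)).
    """
    H = len(sizes)
    if method == "equal":
        weights = [1.0] * H
    elif method == "proportional":
        weights = [float(N_h) for N_h in sizes]
    elif method == "cost":
        weights = [N_h * S_h / math.sqrt(c_h) for N_h, S_h, c_h in zip(sizes, sds, costs)]
    else:
        raise ValueError(f"unknown method: {method}")
    total = sum(weights)
    exact = [n * w / total for w in weights]
    alloc = [math.floor(x) for x in exact]
    # Largest-remainder rounding: give leftover units to the largest fractional parts.
    leftover = n - sum(alloc)
    order = sorted(range(H), key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc

sizes = [6000, 3000, 1000]   # hypothetical strata, e.g. urban / peri-urban / rural
print(allocate(200, sizes, "equal"))          # [67, 67, 66]
print(allocate(200, sizes, "proportional"))   # [120, 60, 20]
print(allocate(200, sizes, "cost", sds=[12, 15, 20], costs=[1, 2, 4]))
```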

d) Cluster Sampling
In many practical situations the population elements are grouped into a number of clusters. A list of
clusters can be constructed as the sampling frame, but a complete list of elements is often unavailable
or too expensive to construct. In this case it is necessary to use cluster sampling, where a random
sample of clusters is taken and some or all elements in the selected clusters are observed. Cluster
sampling is also preferable in terms of cost, because it is much cheaper, easier and quicker to collect
data from adjoining elements than from elements chosen at random. On the other hand, cluster sampling is
less informative and less efficient per element in the sample, due to similarities of elements within
the same cluster. The loss of efficiency, however, can often be compensated for by increasing the overall
sample size. Thus, in terms of unit cost, the cluster sampling plan is efficient.
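A one-stage cluster sample (select clusters at random, then observe every element in them) can be sketched as below. The function name and the village/household labels are hypothetical. Note that the frame only needs the list of clusters; the elements are listed only for the clusters that are actually selected.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """One-stage cluster sample: pick m clusters at random and observe every element in them.

    clusters: dict mapping a cluster label to the list of its elements. Only the list
    of cluster labels serves as the sampling frame; no full list of elements is needed.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)     # random sample of clusters
    return {label: clusters[label] for label in chosen}

# Hypothetical frame: 5 villages, each a cluster of households.
villages = {
    "Village A": ["A1", "A2", "A3"],
    "Village B": ["B1", "B2"],
    "Village C": ["C1", "C2", "C3", "C4"],
    "Village D": ["D1", "D2"],
    "Village E": ["E1", "E2", "E3"],
}
selected = cluster_sample(villages, 2, seed=5)
households = [h for members in selected.values() for h in members]
print(list(selected), households)
```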

e) Multi-Stage Samples
Here the respondents are chosen through a process of defined stages. For example, residents of
Kibera (Nairobi) may have been chosen for a survey through the following process:
from across the country (Kenya), Nairobi may have been selected at random (stage 1); within
Nairobi, Langata constituency is selected, again at random (stage 2); Kibera is then selected
within Langata (stage 3); then polling stations within Kibera (stage 4); and finally individuals from the
electoral voters’ register (stage 5). As this shows, five stages were gone through before the final
respondents were selected from the electoral voters’ register.
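The stage-by-stage idea can be sketched with a nested frame, where each level is sampled at random and only the selected branches are followed down to the next stage. The function name, the three-stage structure and the county/station/voter labels are hypothetical placeholders (a shorter version of the five-stage Kenyan example), not real administrative data.

```python
import random

def multistage_sample(frame, per_stage, rng):
    """Select units stage by stage from a nested frame.

    frame: nested dicts for the higher-level units, with a list of individuals at the
    lowest level. per_stage[i] is how many units to draw at stage i + 1.
    """
    k, remaining = per_stage[0], per_stage[1:]
    if isinstance(frame, list):                  # final stage: list of individuals
        return rng.sample(frame, k)
    selected = []
    for unit in rng.sample(sorted(frame), k):    # random selection at this stage
        selected.extend(multistage_sample(frame[unit], remaining, rng))
    return selected

# Hypothetical three-stage frame: counties -> polling stations -> registered voters.
frame = {
    "County 1": {"Station 1a": ["V1", "V2", "V3", "V4"], "Station 1b": ["V5", "V6", "V7"]},
    "County 2": {"Station 2a": ["V8", "V9", "V10"], "Station 2b": ["V11", "V12", "V13", "V14"]},
    "County 3": {"Station 3a": ["V15", "V16", "V17"]},
}
# Stage 1: 2 counties; stage 2: 1 station in each; stage 3: 2 voters per station.
print(multistage_sample(frame, [2, 1, 2], random.Random(11)))
```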

Advantages of probability sample


(i) Provides a quantitative measure of the extent of variation due to random effects
(ii) Provides data of known quality
(iii) Provides data in a timely fashion
(iv) Provides acceptable data at minimum cost
(v) Better control over nonsampling sources of errors
(vi) Mathematical statistics and probability can be applied to analyze and interpret the data

1.4.4 Non-probability Sampling


Social research is often conducted in situations where a researcher cannot select the kinds of
probability samples used in large-scale social surveys. For example, say you wanted to study
homelessness - there is no list of homeless individuals nor are you likely to create such a list.
However, you need to get some kind of a sample of respondents in order to conduct your
research. To gather such a sample, you would likely use some form of non-probability sampling.
There are four primary types of non-probability sampling methods:

a) Convenience Sampling
It is a method of choosing subjects who are available or easy to find. This method is also sometimes
referred to as haphazard, accidental, or availability sampling. The primary advantage of the
method is that it is very easy to carry out, relative to other methods.

b) Quota Sampling
Quota sampling is designed to overcome the most obvious flaw of availability sampling. Rather
than taking just anyone, you set quotas to ensure that the sample you get represents certain
characteristics in proportion to their prevalence in the population. Note that for this method, you
have to know something about the characteristics of the population ahead of time. Say you want
to make sure you have a sample proportional to the population in terms of gender: you have to
know what percentage of the population is male and female, then collect the sample until yours
matches. Marketing studies are particularly fond of this form of research design.
The primary problem with this form of sampling is that even when we know that a quota sample
is representative of the particular characteristics for which quotas have been set, we have no way
of knowing if the sample is representative in terms of any other characteristics. If we set quotas for
gender and age, we are likely to attain a sample with good representativeness on age and gender,
but one that may not be very representative in terms of income, education or other factors.
Moreover, because researchers can set quotas for only a small fraction of the characteristics relevant to
a study, quota sampling is really not much better than availability sampling. To reiterate, you must
know the characteristics of the entire population to set quotas; otherwise there is not much point in
setting up quotas. Finally, interviewers often introduce bias when allowed to self-select respondents,
which is usually the case in this form of research. In choosing males aged 18-25, interviewers are more
likely to choose those who are better dressed or seem more approachable or less threatening. That may
be understandable from a practical point of view, but it introduces bias into research
findings.
Imagine that a researcher wants to understand more about the career goals of students at a single
university. Let’s say that the university has roughly 10,000 students. Suppose we were interested in
comparing the differences in career goals between male and female students at that university.
In this case, we would want to ensure that the sample we selected had a proportional
number of male and female students relative to the population. To create a quota sample, there are
three steps:

1. Choose the relevant grouping characteristic (here, gender) and divide the population accordingly.
2. Calculate a quota (the number of units that should be included) for each group.
3. Continue to invite units until the quota for each group is met.
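The three steps can be sketched as below, assuming the population shares are already known and that units arrive in whatever order they happen to be met (which is exactly where the bias discussed above creeps in). The function name, the 60/40 shares and the arrival stream are hypothetical.

```python
def quota_sample(arrivals, shares, n):
    """Fill quotas from a stream of (unit_id, group) arrivals.

    shares: known population proportion of each group, e.g. {"female": 0.6, "male": 0.4}.
    Units are accepted in arrival order (not at random), which is why quota samples
    can still be biased on characteristics that have no quota.
    """
    quotas = {g: round(n * p) for g, p in shares.items()}   # step 2: quota per group
    taken = {g: [] for g in shares}
    for unit_id, group in arrivals:                          # step 3: keep inviting
        if group in quotas and len(taken[group]) < quotas[group]:
            taken[group].append(unit_id)
        if all(len(taken[g]) == quotas[g] for g in quotas):
            break                                            # every quota is met
    return taken

# Hypothetical arrival order of volunteers; here roughly two in three are female.
arrivals = [(i, "female" if i % 3 else "male") for i in range(1, 1001)]
sample = quota_sample(arrivals, {"female": 0.6, "male": 0.4}, n=100)
print({g: len(ids) for g, ids in sample.items()})   # {'female': 60, 'male': 40}
```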

c)..Purposive Sampling
Purposive sampling is a sampling method in which elements are chosen based on purpose of the
study. Purposive sampling may involve studying the entire population of some limited group or a
subset of a population. As with other non-probability sampling methods, purposive sampling does not
produce a sample that is representative of a larger population, but it can be exactly what is needed in
some cases, e.g. the study of an organization, a community, or some other clearly defined and relatively
limited group.

d) Snowball Sampling
Snowball sampling is a method in which a researcher identifies one member of some population of
interest, speaks to him/her, and then asks that person to identify others in the population that the
researcher might speak to. This person is then asked to refer the researcher to yet another person, and
so on.
Snowball sampling is very good for cases where members of a special population are difficult to
locate, for example populations that are subject to social stigma and marginalisation, such as sufferers
of HIV/AIDS, as well as individuals engaged in illicit or illegal activities, including prostitution and
drug use. Although snowball sampling is useful in such scenarios, it has drawbacks.
The method creates a sample with questionable representativeness. A researcher is not sure who is in
the sample. In effect, snowball sampling often leads the researcher into a realm he/she knows little
about. It can be difficult to determine how a sample compares to a larger population. There is also an
issue of whom respondents refer you to: friends refer friends and are less likely to refer people they
dislike or fear.
Snowball sampling is a useful choice of sampling strategy when the population you are interested
in studying is hidden or hard to reach.
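The referral process can be modelled as a breadth-first walk over a "who would refer whom" network, starting from one or more seed respondents. The sketch below uses a small hypothetical referral network; it also shows the representativeness problem described above, since people who are never referred (G and H here) can never enter the sample.

```python
from collections import deque

def snowball_sample(referrals, seeds, target):
    """Grow a sample by following referrals, wave by wave (breadth-first).

    referrals: dict mapping each person to the people they would refer.
    Anyone not reachable through referrals from the seeds can never be sampled.
    """
    sample, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < target:
        person = queue.popleft()
        sample.append(person)                  # interview this person
        for contact in referrals.get(person, []):
            if contact not in seen:            # ask for referrals, skip repeats
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network; G and H know each other but nobody refers them.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"], "G": ["H"]}
print(snowball_sample(referrals, ["A"], 10))   # ['A', 'B', 'C', 'D', 'E', 'F']
```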

1.4.5 Limitations of Sampling


a) Sampling frame: may need complete enumeration
b) Errors of sampling may be high in small areas
c) May not be appropriate for the study objectives/questions
d) Representativeness may be vague or controversial

1.4.6 Characteristics of Good sampling


A good sample should:
a) Meet the requirements of the study objectives
b) Provide reliable results
c) Be clearly understandable
d) Be manageable/realistic, i.e. capable of being implemented
e) Time consideration: reasonable and timely
f) Cost consideration: economical
g) Interpretation: accurate, representative
h) Acceptability

1.5 Survey Administration


1.5.1 Steps in Survey
1. Setting the study objectives: What are the objectives of the study? Is a survey the best procedure to
collect the data? Why are other study designs (experimental, quasi-experimental, community randomized
trials, epidemiologic designs, e.g. case-control studies) not appropriate for the study? What
information/data need to be collected?

2. Defining the study population: representativeness, sampling frame.


3. Deciding the sample design: alternative considerations.
4. Questionnaire design: appropriateness, acceptability, cultural appropriateness, understandability.
Pre-test: is the questionnaire appropriate, acceptable and culturally appropriate, and will respondents answer it?
5. Fieldwork: training/supervision, quality monitoring, timing (seasonality).
6. Quality assurance: at every step, minimizing errors, bias and cheating.
7. Data entry/compilation: validation, feedback.
8. Analysis: design considerations.
9. Dissemination.
10. Plans for the next survey: what did you learn, what did you miss?
1.5.2 Modes of Survey Administration
a) Self-Administered Surveys
b) Personal interview
c) Telephone
d) Mail
e) Computer-assisted self-interviewing (CASI), with variants CAPI (personal interview) and
CATI (telephone interview), which replace paper questionnaires
f) Combination of methods

a) Self-Administered Surveys
Self-administered surveys have special strengths and weaknesses.
They are useful in describing the characteristics of a large population and make large samples
feasible.
Advantages:
i) Low cost. Extensive training is not required to administer the survey. Processing and
analysis are usually simpler and cheaper than for other methods.
ii) Reduction in biasing error. The questionnaire reduces the bias that might result from
personal characteristics of interviewers and/or their interviewing skills.
iii) Greater anonymity. Absence of an interviewer provides greater anonymity for the
respondent. This is especially helpful when the survey deals with sensitive issues.
iv) Convenience to the respondents (they may complete the questionnaire at any time convenient to them)
v) Accessibility (greater coverage, even in the remote areas)
vi) May provide more reliable information (e.g., may consult with others or check records to
avoid recall bias)

Disadvantages:
i) Requires simple questions. The questions must be straightforward enough to be
comprehended solely on the basis of printed instructions and definitions.
ii) No opportunity for probing. The answers must be accepted as final. Researchers have
no opportunity to clarify ambiguous answers.
iii) Low response rate; respondents may not respond to all questions and/or may not
return questionnaire
iv) The respondent must be literate to read and understand the questionnaire
v) May introduce self-selection bias
vi) Not suitable for complex questionnaire

b) Interview Surveys


Unlike with questionnaires, in interview surveys interviewers ask questions orally and record
respondents’ answers. This type of survey generally decreases the number of “do not know” and
“no answer” responses compared with self-administered surveys. Interviewers also provide a guard
against confusing items: if a respondent has misunderstood a question, the interviewer can clarify it,
thereby obtaining relevant responses.
Interviewer selection: background characteristics (race, sex, education, culture), listening
skill, recording skill, experience, unbiased observation/recording.
Interviewer training: familiarity with the study objectives and significance; thorough familiarity with
the questionnaire; contextual and cultural issues; privacy and confidentiality; informed consent and
ethical issues; an unbiased view; mock interview sessions.
Supervision of the interviewer: spot checks, questionnaire checks, reinterviews (reliability checks).
Advantages
i) Flexibility. Allows flexibility in the questioning process and allows the interviewer to
clarify terms that are unclear.
ii) Control of the interview situation. Can ensure that the interview is conducted in private,
and respondents do not have the opportunity to consult one another before giving their
answers.
iii) High response rate. Respondents who would not normally respond to a mail questionnaire
will often respond to a request for a personal interview.
iv) May record non-verbal behaviour, activities, facilities, contexts
v) Complex questionnaire may be used
vi) Illiterate respondents may participate

Disadvantages
i) Higher cost. Costs are involved in selecting, training, and supervising interviewers; perhaps
in paying them; and in the travel and time required to conduct interviews.
ii) Interviewer bias. The advantage of flexibility leaves room for the interviewer’s
personal influence and bias, making an interview subject to interviewer bias.
iii) Lack of anonymity. Often the interviewer knows all or many of the respondents. Respondents
may feel threatened or intimidated by the interviewer, especially if a respondent is sensitive
to the topic or to some of the questions.
iv) Less accessibility
v) Inconvenience
vi) Often no opportunity to consult records, families, relatives

c) Telephone Interview


Advantages:
(i) Less expensive
(ii) Shorter data collection period than personal interviews
(iii) Better response than mail surveys

Disadvantages:
(i) Biased against households without a telephone or with unlisted numbers
(ii) Nonresponse
(iii) Difficult for sensitive issues or complex topics
(iv) Limited to verbal responses

d) Focus Groups


Focus groups are useful in obtaining a particular kind of information that would be difficult to obtain
using other methodologies. A focus group can typically be defined as a group of people who possess
certain characteristics and provide information of a qualitative nature in a focused discussion.
Focus groups are generally composed of six to twelve people. Size is conditioned by two factors: the
group must be small enough for everyone to participate, yet large enough to provide diversity. This
group is special in terms of purpose, size, composition, and procedures. Participants are selected
because they have certain characteristics in common that relate to the topic at hand, such as parents of
gang members, and, generally, the participants are unfamiliar with each other. Typically, more than one
focus group should be convened, since a single group could be too atypical to offer any general
insights on the gang problem.
A trained moderator probes for different perceptions and points of view, without pressure to
reach consensus. Focus groups have been found helpful in assessing needs, developing plans,
testing new ideas, or improving existing programs.
Advantages:
i) Flexibility allows the moderator to probe for more in-depth analysis and ask participants
to elaborate on their responses.
ii) Outcomes are quickly known.
iii) They may cost less in terms of planning and conducting than large surveys and
personal interviews.
Limitations
i) A skilled moderator is essential.
ii) Differences between groups can be troublesome to analyze because of the qualitative nature
of the data.
iii) Groups are difficult to assemble. People must take the time to come to a designated place at
a particular time.
iv) Participants may be less candid in their responses in front of peers.

1.6 Sample Size Determination


Sample size determination is influenced by factors such as the purpose of the study, the population size,
the risk of selecting a "bad" sample, and the allowable sampling error.
There are several approaches to determining the sample size. These include using a census for small
populations, imitating the sample size of similar studies, using published tables, and applying formulas
to calculate a sample size.

Using a census for small populations


One approach is to use the entire population as the sample. Although this is impractical for large
populations, a census eliminates sampling error and provides data on all the individuals in the population.
In addition, in small populations virtually the entire population would have to be sampled anyway to
achieve a desirable level of precision.
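The formula approach, and the point that small populations need almost a census, can both be illustrated with one widely used formula for estimating a proportion. The notes do not specify a formula, so this sketch assumes Cochran's n0 = z²·p(1 − p)/e², followed by the finite population correction n = n0 / (1 + (n0 − 1)/N). Here p = 0.5 is the most conservative guess, z = 1.96 corresponds to 95% confidence, and the function name is hypothetical.

```python
import math

def sample_size(e, p=0.5, z=1.96, N=None):
    """Sample size to estimate a proportion p within margin of error e.

    Assumes Cochran's formula n0 = z^2 * p * (1 - p) / e^2, then applies the
    finite population correction n = n0 / (1 + (n0 - 1) / N) when N is known.
    """
    n0 = z ** 2 * p * (1 - p) / e ** 2
    n = n0 if N is None else n0 / (1 + (n0 - 1) / N)
    return math.ceil(n)    # round up so the precision target is still met

print(sample_size(0.05))           # 385 for a very large population
print(sample_size(0.05, N=2000))   # 323
print(sample_size(0.05, N=100))    # 80: most of a small population must be sampled
```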

Using a sample size of a similar study


Another approach is to use the same sample size as those of studies similar to the one you plan.
Without reviewing the procedures employed in these studies, you may run the risk of repeating errors
that were made in determining the sample size for another study. However, a review of the literature
in your discipline can provide guidance about "typical" sample sizes which are used.
