Lecture-1 - Ch-1 - Basic Concept
Abebe N.
Department of Statistics, Jimma University
Email: [email protected]
October, 2024
Jimma, Ethiopia
Outline of the course (chapters)
1. Introduction
2. Methods of data organization and presentation
3. Measures of central tendency and measures of variation
4. Elementary probability and probability distribution
5. Sampling and sampling techniques
6. Estimation and hypothesis testing
Chapter One: Introduction
1. Introduction
What is statistics?
o Statistics is defined as the systematic collection, organization,
analysis and interpretation of data for the purpose of
Answering a question
Solving a problem or
Adding to the body of knowledge
What is biostatistics? It is the branch of statistics directed
toward applications in the biological sciences and medicine.
o It is treated as a distinct branch because some statistical methods
are more heavily used in health applications than elsewhere.
o Biostatistics provides the most fundamental tools and techniques
of the scientific methods for generating information used for:
Gathering and summarizing data
Forming and testing hypotheses
Designing experimental and observational studies
Drawing inferences from data.
These tools help scientists and health professionals make informed
decisions based on data.
Examples: determining the major risk factors for heart disease,
lung disease and cancer; testing new drugs to combat AIDS.
Classification of statistics
Depending on how the data are used, statistics is divided into:
o Descriptive statistics: summarizes and describes the main features of
the observed data using statistical measures (e.g., sample
mean, sample variance, sample proportion, ...) and diagrams (e.g.,
graphs, charts, tables).
o Inferential statistics: generalizes the findings from the sample to
the population. It includes:
Estimation and hypothesis testing
Determining relationships
Making predictions
o Inferential statistics uses probability theory to estimate how likely
the conclusions drawn from the sample are to be true for the
entire population.
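The difference between the two branches can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the blood pressure readings are made-up values, the cutoff of 130 mm Hg is arbitrary, and the interval uses the normal critical value 1.96 as a simplifying assumption.

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mm Hg) from a sample of 10 patients.
sample_bp = [118, 125, 130, 122, 140, 135, 128, 119, 132, 126]

# Descriptive statistics: summarize the observed sample only.
sample_mean = statistics.mean(sample_bp)
sample_variance = statistics.variance(sample_bp)  # sample variance, divides by n - 1
proportion_high = sum(bp >= 130 for bp in sample_bp) / len(sample_bp)

# Inferential statistics: use the sample to say something about the population.
# Approximate 95% confidence interval for the population mean, using the
# normal critical value 1.96 (an assumption; a t critical value would be
# more appropriate for a sample this small).
standard_error = math.sqrt(sample_variance / len(sample_bp))
ci_lower = sample_mean - 1.96 * standard_error
ci_upper = sample_mean + 1.96 * standard_error

print(f"mean={sample_mean:.1f}, variance={sample_variance:.1f}, "
      f"proportion >= 130: {proportion_high:.2f}")
print(f"approximate 95% CI for the population mean: ({ci_lower:.1f}, {ci_upper:.1f})")
```

The first three quantities describe only these 10 patients; the interval is a statement about the population they were drawn from, and it is only as trustworthy as the link between sample and population.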
Inference …. Cont’d
o Inference is important because statistical data usually arise from a
sample.
o Valid inferential statistics requires a strong link between the
sample and the population about which one wishes to draw
conclusions.
Valid inferential statistics requires:
Correct statistical methodology
Correct interpretation of results
Statistical techniques based on probability theory are
required.
Stages of scientific investigation
1. Collection of data: the processes of measuring or gathering
raw data to meet predefined objectives.
Care should be exercised at this stage (garbage data give
garbage results).
Data can be collected in a variety of ways; the most common
choices include:
Observation
Interview : Face-to-face Vs Telephone
Self Administered Questionnaire
2. Organization of data: Arrange and organize the data based on
some common characteristics. It is necessary to:
Edit the data to minimize recording errors
Classify the data, and
Tabulate the data.
Cont’d……stage
3. Presentation of the data:
Provides an overview of what the data actually looks like.
Makes it easier for the audience to understand and interpret.
Enables the audience to make informed decisions, draw conclusions, or take
appropriate actions based on the data.
It can be done in the form of tables, graphs/diagrams.
4. Analysis of data:
o To dig out useful information for decision-making by extracting
relevant summaries from the data (like mean, median, mode,
range, and variance).
o The processes of cleaning, transforming, and modeling data to
discover useful information.
Cont’d……stage
5. Interpretation of data:
o Concerned with drawing conclusions from the collected
and analyzed data and giving meaning to the results.
o Generalizes the findings from the sample to the target
population.
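The organization, presentation and analysis stages can be sketched with Python's standard library. The admission counts below are hypothetical values chosen for illustration; the summary measures are the ones named under the analysis stage.

```python
import statistics
from collections import Counter

# Hypothetical raw data: number of daily admissions recorded over 12 days.
admissions = [4, 7, 5, 4, 6, 7, 4, 5, 8, 6, 4, 5]

# Organization: tabulate how often each value occurs (a frequency table).
frequency_table = Counter(admissions)

# Presentation: print the table in sorted order.
print("admissions  frequency")
for value in sorted(frequency_table):
    print(f"{value:>10}  {frequency_table[value]:>9}")

# Analysis: the summary measures named in the text.
summary = {
    "mean": statistics.mean(admissions),
    "median": statistics.median(admissions),
    "mode": statistics.mode(admissions),
    "range": max(admissions) - min(admissions),
    "variance": statistics.variance(admissions),  # sample variance (n - 1)
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

Interpretation is the step the code cannot do: deciding what an average of about five admissions per day means for the hospital, and whether these 12 days say anything about other periods.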
Definition of some basic terms
o Target population: the entire group of individuals about which inferences are to
be made. It represents the target of an investigation, and the objective of the
investigation is to draw conclusions about the population.
E.g. A researcher conducted a study on the prevalence of HIV among orphan
children in Ethiopia; a random sample of orphan children in some
selected towns was included. All orphan children in Ethiopia form the target
population.
o Study population: a group of individuals from the target population who are
available and willing to participate.
E.g. Orphan children in the selected towns of Ethiopia.
In research, it is not practical to include all members of a population.
o Sample: A subset of a study population, about which information is actually
obtained.
Selected orphan children from those towns participating in the study.
o Sampling: the process or method of selecting a sample from the
population.
Cont’d…….Basic term
o To draw valid conclusions from your results, you have to carefully decide
how you will select a sample that is representative of the target population.
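One common way to obtain a representative sample is simple random sampling, sketched below. The lecture does not prescribe a particular method at this point, so the technique, the ID codes, the population size and the sample size are all assumptions made for illustration.

```python
import random

# Hypothetical study population: ID codes for 500 children who are
# available and willing to participate.
study_population = [f"child_{i:03d}" for i in range(1, 501)]

# Fixed seed so the illustration is reproducible (an arbitrary choice).
rng = random.Random(2024)

# Simple random sampling without replacement: every child has the same
# chance of being selected, and no child is selected twice.
sample_size = 50
sample = rng.sample(study_population, k=sample_size)

print(f"selected {len(sample)} of {len(study_population)} children")
print("first five selected:", sample[:5])
```

Random selection protects against the investigator's own preferences when choosing participants; it does not fix a study population that already differs from the target population.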
Defn…Cont’d
Sample size: The number of elements or observations to be
included in the sample.
Survey: a method of collecting data from a subset of individuals
to gather insights about a larger population. Its advantages include:
Reduced cost
Saves time and energy
Greater accuracy
Census: complete enumeration of every individual in the entire
population. It is the collection of data from every element in a
population.
Parameter: Characteristic or measure obtained from a
population.
Statistic: value computed from observed data and used to
estimate the parameter, because it may not always be feasible to
measure the parameter directly.
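The distinction between a parameter and a statistic can be illustrated with a small simulation. Everything here is hypothetical: the population of 10,000 people, the 12% prevalence and the sample size of 400 are chosen only to make the idea concrete.

```python
import random

rng = random.Random(7)  # arbitrary seed for reproducibility

# Hypothetical population of 10,000 people; 1 = has the condition, 0 = does not.
# The true prevalence is set to 12% purely for illustration.
population = [1] * 1200 + [0] * 8800

# Parameter: computed from every element of the population (a census).
population_proportion = sum(population) / len(population)

# Statistic: computed from a sample and used to estimate the parameter.
sample = rng.sample(population, k=400)
sample_proportion = sum(sample) / len(sample)

print(f"parameter (population proportion): {population_proportion:.3f}")
print(f"statistic (sample proportion):     {sample_proportion:.3f}")
```

In a real study the parameter is unknown and only the statistic is observed; the simulation can show both only because the whole population was invented.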
Defn…Cont’d
Variable: A characteristic or quantity that can be measured or
quantified. It can take different values in different persons,
places, or things.
Any aspect of an individual or object that is measured (e.g., BP)
or recorded (e.g., age, sex) and that can take different values.
E.g., A study of treatment outcome of TB: Hospital name, date
of birth, sex, date of diagnosis, weight (kg), smear result
(Positive, negative or uncertain), culture result (negative,
positive), cured after 6 months (yes/no).
Defn…Cont’d
Qualitative Variables: Describe qualities or characteristics and
used to categorize or label distinct groups or categories.
Are non-numeric variables and can't be measured.
Example: Gender of patients (M, F), marital status, patients’
health status, hair color.
They can be assigned numeric codes (male = 1, female = 2),
but they are still intrinsically qualitative.
Quantitative Variables: Are numerical variables and can be
measured or counted.
Example: patients’ age, patients’ weight, BP of patients in the
hospital.
Quantitative Variables can be classified as:
Discrete variable
Continuous variable
Defn…Cont’d
o Discrete variable: a variable which can take countable values.
It takes distinct, separate values with no intermediate values in
between.
e.g. Number of daily admissions to a hospital
o Continuous variable has a set of possible values including all
values in an interval of the real line.
Can take any value within a range, including fractional or
decimal values.
Heights of pre-school children
Weight of participants
Age of a group of individuals
Blood pressure reading (mm Hg)
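The classification of variables can be sketched as a rough rule of thumb in Python. The record and the `classify` helper below are hypothetical, and deciding by storage type is only a heuristic: the real classification depends on what the variable means, not on how it is stored.

```python
# One hypothetical patient record with made-up values.
record = {
    "sex": "female",             # qualitative
    "smear_result": "positive",  # qualitative
    "previous_admissions": 2,    # quantitative, discrete (a count)
    "weight_kg": 54.3,           # quantitative, continuous (a measurement)
}


def classify(value):
    """Rough heuristic based on the Python type of the stored value."""
    if isinstance(value, (bool, str)):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative (discrete)"
    if isinstance(value, float):
        return "quantitative (continuous)"
    raise TypeError(f"unsupported value: {value!r}")


for name, value in record.items():
    print(f"{name}: {classify(value)}")
```

The heuristic fails for coded categories: if sex were stored as 1 or 2 it would be reported as discrete, although it remains intrinsically qualitative.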
Defn…Cont’d
Variables can again be classified into two broad categories:
o Outcome variable
Scale of …Cont’d
2. Ordinal Scales
When the order among categories becomes important, the
observations are referred to as ordinal data.
Property:
Ordinal data possess the property of order, but not the property
of distance & fixed zero.
Arithmetic operations are not applicable but relational
operations are applicable.
Ordering is the sole property of ordinal scale.
For example injuries may be classified as (severe, moderate, and
minor).
Blood pressure (high, good, low)
Patient satisfaction rating, education level, etc.
Scale of …Cont’d
3. Interval Scales: assign each measurement to one of an
unlimited number of categories that are equally spaced.
They possess the property of equal intervals between values,
but not the property of a fixed zero (zero does not indicate the
absence of the quantity being measured).
All arithmetic operations except division and multiplication
are applicable.
Relational operations are also possible.
Examples
IQ
Temperature in °C, °F
Grading scale
Scale of …Cont’d
4. The Ratio Scale: is characterized by meaningful comparison
in terms of ratios as well as equality of intervals, and is
usually used with quantitative data.
Level of measurement which classifies data that can be
ranked, differences are meaningful, and there is a true zero.
True ratios exist between the different units of measure.
All arithmetic and relational operations are applicable.
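Why ratios are meaningful on a ratio scale but not on an interval scale can be checked with a short calculation. The temperatures and weights below are arbitrary illustrative values, and the kilogram-to-pound factor is approximate.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def kg_to_lb(kg):
    return kg * 2.20462  # approximate conversion factor


# Interval scale: temperature in degrees Celsius has no true zero.
t1_c, t2_c = 10.0, 20.0
ratio_c = t2_c / t1_c
ratio_f = celsius_to_fahrenheit(t2_c) / celsius_to_fahrenheit(t1_c)
# The ratio changes with the unit, so "twice as hot" is not meaningful.

# Ratio scale: weight has a true zero, so ratios survive a change of unit.
w1_kg, w2_kg = 40.0, 80.0
ratio_kg = w2_kg / w1_kg
ratio_lb = kg_to_lb(w2_kg) / kg_to_lb(w1_kg)

print(f"temperature ratio: {ratio_c:.2f} in Celsius, {ratio_f:.2f} in Fahrenheit")
print(f"weight ratio: {ratio_kg:.2f} in kg, {ratio_lb:.2f} in lb")
```

The temperature ratio depends on the unit chosen, while the weight ratio is the same in kilograms and pounds, which is what the presence of a true zero buys.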
It helps to study the distribution and determinants of health-related issues in populations.
Identify risk factors for diseases
Disease outbreaks, and evaluate public health interventions
Overall, statistics provides a systematic and objective way to analyze data and make
decisions based on that data, and is widely used in a variety of fields to inform decision-
making and improve outcomes
Uses of statistics
The following are some uses of statistics:
Limitations of statistics
Cont’d…limitation
Statistics can be misused in the following ways
are misleading.
2. Types and Methods of Data Collection
Data: the raw material of statistics.
Data are numerical facts and can be obtained either by measurement or
counting.
Statistical data (numerically expressed, aggregates of facts, collected in a
systematic manner, collected for a predetermined purpose, estimated
according to reasonable standards of accuracy) may be classified under
two categories depending upon the source:
Primary data:
Data collected by the investigators themselves for the
purpose of a specific inquiry or study.
Unique: until published, no one else has access to them.
Secondary data:
Data that have already been collected by others and are
used by the investigator.
Note: Data which are primary for one may be secondary for the other.
Primary data collection
Planning:
Identify source and elements of the data.
Observation
Watching people engaged in activities and recording what occurs.
i.e., jot down the required information
Advantages:-
Gives relatively more accurate data on behavior and activities
Collection of information on facts.
Disadvantages:-
The investigator's or observer's own bias, prejudice
(discrimination) and desires may affect what is recorded.
It needs more resources and skilled manpower, especially when
high-level machines are used.
Interview
A. Personal interview (face to face)
Data collection through oral conversations
Advantages:
Serious approach by respondent resulting in accurate
information
Good response rate
Complete and immediate
Interviewer in control and can give help if there is a problem
Can use recording equipment
Characteristics of respondent assessed – tone of voice, facial
expression, hesitation, etc.
Interview… Cont’d
Disadvantages:
Time consuming
Geographic limitations
Can be expensive
Normally need a set of questions
Respondent bias – tendency to please or impress, create false
personal image, or end interview quickly
Embarrassment is possible if personal questions are asked
Training is required
Interview… Cont’d
B. Telephone interview
Advantages:
Quick
Can cover reasonably large numbers of people or organizations
Wide geographic coverage
High response rate: keep going until the required number is reached
No waiting
Spontaneous response
Help can be given to the respondent
Interview… Cont’d
Disadvantages:
Not everyone has a telephone
Questionnaire required
Repeat calls are inevitable
Straightforward questions are required
Respondent has little time to think
Cannot use visual aids
Can cause irritation
Good telephone manner is required
Experimental
The desired information can also be collected by conducting an
experiment in laboratories or at experimental sites.
Manipulating one or more independent variables to determine
their effect on a dependent variable.
Biologists, physicists, chemists and other natural scientists obtain
the required data from laboratories.
A scientist may take a blood sample and examine the blood
group, the hemoglobin content, and the nature and number of red
and white blood cells.
An agriculturist may study the soil composition in a particular area.
Example: A teacher who wants to study whether a new
teaching method is superior to the old one may
divide the students into experimental and control groups.
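The division into experimental and control groups can be sketched as a random assignment. The lecture does not say how the students are divided, so randomization is an assumption here (though a common choice), and the class list of 20 students is hypothetical.

```python
import random

# Hypothetical class list of 20 students.
students = [f"student_{i:02d}" for i in range(1, 21)]

rng = random.Random(42)  # arbitrary seed for reproducibility
shuffled = students[:]   # copy so the original list is left untouched
rng.shuffle(shuffled)

half = len(shuffled) // 2
experimental_group = shuffled[:half]  # taught with the new method
control_group = shuffled[half:]       # taught with the old method

print("experimental:", experimental_group)
print("control:     ", control_group)
```

Assigning students at random, rather than letting the teacher choose, keeps the two groups comparable on average, so a difference in results is easier to attribute to the teaching method.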
Focus group discussion
Method used to gather insights, opinions, and perceptions from
a targeted group of participants.
It is usually conducted by inviting six to ten people to gather for
a few hours with a trained moderator to talk about a product,
service or organization.
The moderator needs objectivity, knowledge of the subject and
industry, and some understanding of group and consumer
behavior.
The moderator starts with a broad question before moving to
more specific issues, encouraging open and easy discussion to
bring out true feelings and thoughts.
The meeting is held in a pleasant place, and refreshments are
served to create a relaxed environment.
Cont’d…FGD
Advantages:
Quick result and cost-effective
Groups may generate important issues
Ideas on how to proceed with the study may be generated
Disadvantages:
Topic of discussion may be missed
The discussion may be manipulated by the moderator
Needs well-trained professionals
Questionnaire
An instrument (form) consisting of a series of questions designed
to gather information from respondents
A series of written questions/items in a fixed, rational order.
Advantages:
Can cover a large number of people or organizations
No prior arrangements are needed
No interviewer bias
Great impersonality
Disadvantages:
Little opportunity to use visual aids
Low response rate
Can’t reach all types of people
Designing a questionnaire
Questions should be simple
Questions should be unambiguous
The best kinds of questions are those which allow a pre-printed
answer to be ticked
The questionnaire should be as short as possible
Questions should be neither irrelevant nor too personal
Questions should have a logical sequence.
Leading questions should be avoided
Example of a leading question: “How would you describe the taste of
our new ice-cream?” with the following response categories, all of
which are favorable:
Super, Excellent, Great, Pretty good
Types of questions
Closed ended questions
A question is asked and then a number of possible answers are
provided for the respondent. The respondent selects the answer
which is appropriate.
Sex: Male [ ] Female [ ]
Did you watch television last night? Yes [ ] No [ ]
Open ended questions
It allows the respondent to elaborate upon an earlier more
specific question.
Permit free responses
No possible answers are provided to choose from.
Mostly used for the investigation of facts with which the
researcher is not familiar.
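When questionnaire responses are recorded electronically, the two question types can be represented as a small data structure. This is a hypothetical sketch: the `Question` class, its `accepts` method and the wording of the open-ended question are illustrative and not from the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    options: list = field(default_factory=list)  # empty list means open-ended

    @property
    def is_closed(self):
        return bool(self.options)

    def accepts(self, answer):
        """Closed: answer must be one of the options. Open: any non-empty text."""
        if self.is_closed:
            return answer in self.options
        return bool(answer.strip())


# Closed-ended question taken from the text above.
tv = Question("Did you watch television last night?", ["Yes", "No"])
# Hypothetical open-ended follow-up question.
follow_up = Question("What did you think of the programme?")

print(tv.accepts("Yes"), tv.accepts("Maybe"))
print(follow_up.accepts("It was informative."))
```

Closed-ended answers can be checked and tabulated automatically, while open-ended answers are free text that must be read and coded before analysis.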
Secondary data collection
Data that have already been collected by someone else for a
different purpose.
For example, annual company reports, Government statistics, and
Health care records.
Where has the data come from?
In this case, data are obtained from existing sources such as
newspapers, magazines, CSA, DHS, hospital records and
existing data such as:
Mortality reports
Morbidity reports
Epidemic reports
Reports of laboratory utilization (including laboratory test
results)