Handling Data
By:
Abdusamed M., PhD fellow
May, 2022
Harar, Ethiopia
11/21/2023 1
Learning Outcomes
After completion of this session, the learners will be able to:
• Discuss the scales of Measurement
Biostatistics:
• The application of statistical methods to the fields of biological, medical and
public health sciences
• Concerned with the interpretation of biological data and the communication of
information derived from these data
• Has a central role in medical investigations
Question:
Please describe:
• Rationale of studying statistics
• Limitations of statistics
Uses of Biostatistics
• Provide methods of organizing information
• Assessment of health status
• Health program evaluation
• Resource allocation
• Magnitude of association
– Strong vs. weak association between exposure and outcome
• Assessing risk factors: cause-and-effect relationship
• Evaluation of a new vaccine or drug
– What can be concluded if the proportion of people free from the disease is
greater among the vaccinated than the unvaccinated?
– How effective is the vaccine (drug)?
– Is the effect due to chance or some bias?
• Drawing of inferences: Information from sample to population
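The vaccine evaluation questions above can be made concrete with a small calculation. The following is a minimal sketch, assuming efficacy is summarized as 1 minus the risk ratio; the counts are invented for illustration and the function name `vaccine_efficacy` is hypothetical. It does not answer whether the effect is due to chance or bias, which requires inferential methods and sound study design.

```python
def vaccine_efficacy(cases_vax, n_vax, cases_unvax, n_unvax):
    """Return (risk in vaccinated, risk in unvaccinated, efficacy)."""
    risk_vax = cases_vax / n_vax          # proportion diseased among vaccinated
    risk_unvax = cases_unvax / n_unvax    # proportion diseased among unvaccinated
    efficacy = 1 - risk_vax / risk_unvax  # 1 minus the risk ratio
    return risk_vax, risk_unvax, efficacy


# Hypothetical trial counts (not from any real study)
risk_v, risk_u, ve = vaccine_efficacy(cases_vax=10, n_vax=1000,
                                      cases_unvax=50, n_unvax=1000)
print(f"Risk among vaccinated:   {risk_v:.3f}")
print(f"Risk among unvaccinated: {risk_u:.3f}")
print(f"Vaccine efficacy:        {ve:.1%}")
```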
Types of Statistics
Descriptive statistics:
• Ways of organizing and summarizing data
• Help to identify the general features and trends in a set of data and to extract
useful information
• Also very important in conveying the final results of a study
• Example: tables, graphs, numerical summary measures
Inferential statistics:
• Methods used for drawing conclusions about a population based on the
information obtained from a sample of observations drawn from that population
• Example: Principles of probability, estimation, confidence interval,
comparison of two or more means or proportions, hypothesis testing,
etc.
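The two types of statistics can be contrasted on the same data. The sketch below first summarizes a sample (descriptive), then uses it to estimate the population mean with a 95% confidence interval (inferential). The blood pressure values are hypothetical, and the interval uses a normal approximation as a simplifying assumption; a t-based interval would be more appropriate for a sample this small.

```python
import math
import statistics

# Hypothetical sample: systolic blood pressure (mmHg) of 10 patients
sample = [118, 125, 130, 122, 140, 135, 128, 121, 133, 127]

# Descriptive statistics: summarize the sample itself
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: estimate the population mean from the sample.
# Assumption: normal approximation (z about 1.96) for a 95% interval.
z = statistics.NormalDist().inv_cdf(0.975)
se = sd / math.sqrt(len(sample))  # standard error of the mean
ci_low, ci_high = mean - z * se, mean + z * se

print(f"Sample mean = {mean:.1f} mmHg, SD = {sd:.1f} mmHg")
print(f"Approximate 95% CI for the population mean: "
      f"({ci_low:.1f}, {ci_high:.1f}) mmHg")
```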
Data
• Data are numbers which can be measurements or can be obtained by counting
• The raw material for statistics
• Can be obtained from: routinely kept records, literature, surveys, counting,
experiments, reports, observation, etc.
Statistical Data
• Refers to numerical descriptions of things
• These descriptions may take the form of counts or measurements
Types of Data
1. Primary data: collected directly from the items or individual respondents
by the researcher for the purpose of a study
2. Secondary data: data that were collected and statistically treated by another
person or organization, and whose information is then used by other people
for a different purpose
Characteristics of Statistical Data
• Numerical descriptions must possess the following characteristics to be called statistics:
i. They must be in aggregates – statistics are a 'number of facts'
• A single fact, even though numerically stated, cannot be called statistics
ii. They must be affected to a marked extent by a multiplicity of causes
• This means that statistics are aggregates of only those facts that grow out
of a ‘variety of circumstances’
iii. They must be enumerated or estimated according to a reasonable standard
of accuracy
iv. They must have been collected in a systematic manner for a predetermined
purpose
v. They must be placed in relation to each other, i.e. they must be comparable
For example:
– When a hospital administrator counts the number of patients (counting)
Sources of Data
• We search for suitable data to serve as the raw material for our
investigation
• Such data are available from one or more of the following sources:
1. Routinely kept records:
• Hospital medical records contain immense amounts of information on patients
• Hospital accounting records contain a wealth of data on the facility’s business
activities
2. External sources: existing data in the form of
• Published reports
• Commercially available data banks, or
• The research literature, i.e. someone else has already asked the same question
3. Surveys:
• A survey may be the source if the data are needed to answer certain
questions
• For example:
– If the administrator of a clinic wishes to obtain information regarding the
mode of transportation used by patients to visit the clinic, then a survey may
be conducted among patients to obtain this information
4. Experiments:
• The data needed to answer a question are available only as the result of an
experiment
• For example:
– If a professional wishes to know which of several strategies is best for
maximizing patient compliance, he might conduct an experiment in which
the different strategies of motivating compliance are tried with different
patients
Variable
• A characteristic that takes on different values in different persons,
places, or things
• Its value is not the same when observed in different individuals or items
• Examples:
– Diastolic blood pressure,
– Heart rate,
– The heights of adult males,
– The weights of preschool children,
– The ages of patients seen in a dental clinic
Random Variable
• The values obtained arise as a result of chance factors, so that they cannot be
exactly predicted in advance
– Example:
• Adult height
• When a child is born, we cannot predict exactly his or her height at maturity
(genetic and environmental factors)
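The idea that individual values cannot be predicted exactly, while their overall pattern can still be described, can be sketched with a short simulation. The normal model with a mean of 170 cm and a standard deviation of 8 cm is an assumption made purely for illustration, not a figure from this session.

```python
import random
import statistics

random.seed(1)  # fixed seed so repeated runs give the same draws

# Assumption: adult height modeled as normal, mean 170 cm, SD 8 cm
heights = [random.gauss(170, 8) for _ in range(1000)]

# Individual values differ from person to person and are not predictable
first_five = [round(h, 1) for h in heights[:5]]
print("First five simulated heights (cm):", first_five)

# The collection as a whole is still well described by summary measures
mean_height = statistics.mean(heights)
print(f"Mean of 1000 simulated heights: {mean_height:.1f} cm")
```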
Types of Data
• Quantitative
– Discrete
– Continuous
• Qualitative
– Nominal
– Ordinal
Measurement and Measurement Scale
Measurement
• A procedure where qualities or quantities are assigned to the characteristics of
objects or events
• Not all measurements are the same
• Example: weight in kg, height in meters, ...
• Example: measuring the status of a patient on the scale “improved”, “stable”,
“unimproved”
Measurement scales
• Are important for the statistical analysis of data
• There are four types of measurement scales:
– Nominal scale
– Ordinal scale
– Interval scale
– Ratio scale
The Nominal Scale
• The lowest measurement scale
• Consists of naming of observations or classifying them into various mutually
exclusive and collectively exhaustive categories
• The values fall into unordered categories or classes
• Uses names, labels or symbols to assign each measurement
• Example: blood types, sex, race, marital status, religion, causes of illnesses and
causes of death
• Dichotomous or binary: nominal data that can take only two possible values
• Example:
– Male / Female
– Yes / No
– Cured from the disease or not
– Well / Sick
– Child / Adult
– Married / Not married
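Because nominal categories have no order, the appropriate summaries are counts, proportions and the mode. A minimal sketch with hypothetical blood type observations:

```python
from collections import Counter

# Hypothetical nominal observations: blood types of 10 patients
blood_types = ["A", "O", "B", "O", "AB", "O", "A", "B", "O", "A"]

counts = Counter(blood_types)                # frequency of each category
mode, mode_count = counts.most_common(1)[0]  # most frequent category

for category in sorted(counts):
    proportion = counts[category] / len(blood_types)
    print(f"{category:>2}: n = {counts[category]}, proportion = {proportion:.2f}")
print(f"Mode: {mode} ({mode_count} patients)")
```

A mean or median of these labels would be meaningless, since the categories cannot be ranked or subtracted.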
The Ordinal Scale
• Assigns each measurement to a limited number of categories that are ranked in
terms of order
• Observations are not only different from category to category but can be ranked
according to some criterion
• Although non-numerical, considered to have a natural ordering
• Example:
– Patient status: unimproved, improved, and much improved
– Cancer stages
– Social classes
Example: Pain level
1. None
2. Very mild
3. Mild
4. Moderate
5. Severe
6. Very severe
• The numbers have limited meaning
• 6 > 5 > 4 > 3 > 2 > 1 is all we know, apart from their utility as labels
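Since only the order of the pain codes is meaningful, a median is a defensible summary, whereas a mean would treat the gaps between levels as equal. The sketch below uses hypothetical patient responses and `statistics.median_low`, which always returns a value that occurs in the data and so maps back to a real category.

```python
import statistics

# Ordered categories of the pain scale, lowest to highest
PAIN_LEVELS = ["None", "Very mild", "Mild", "Moderate", "Severe", "Very severe"]
rank = {label: i for i, label in enumerate(PAIN_LEVELS, start=1)}

# Hypothetical responses from seven patients
responses = ["Mild", "Severe", "Moderate", "None",
             "Mild", "Very severe", "Moderate"]

codes = sorted(rank[r] for r in responses)    # order is all we rely on
median_code = statistics.median_low(codes)    # a code that exists in the data
median_label = PAIN_LEVELS[median_code - 1]

print("Sorted codes:", codes)
print("Median pain level:", median_label)
```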
• Likert scales are ordinal scales
The Interval Scale
• With temperature data, not only is Monday at 18 °C cooler than Wednesday, it is
exactly 5 °C cooler
• It has no true zero
• Example: intelligence (IQ) score, calendar time in years, etc.
The Ratio Scale
• The highest level of measurement is the ratio scale
• Characterized by:
– Equality of ratios can be determined
– Equality of intervals can be determined
– A true zero point
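The practical difference between the interval and ratio scales can be sketched with temperature. Celsius has an arbitrary zero (interval scale), while Kelvin has a true zero (ratio scale). The two temperatures below are hypothetical.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin."""
    return c + 273.15


# Hypothetical temperatures on two days
monday_c, wednesday_c = 10.0, 20.0
monday_k = celsius_to_kelvin(monday_c)
wednesday_k = celsius_to_kelvin(wednesday_c)

# Differences (intervals) agree on both scales
diff_c = wednesday_c - monday_c
diff_k = wednesday_k - monday_k

# Ratios are only meaningful where zero is a true zero (Kelvin)
ratio_c = wednesday_c / monday_c  # 2.0, but NOT "twice as hot"
ratio_k = wednesday_k / monday_k  # the physically meaningful ratio

print(f"Difference: {diff_c:.1f} °C = {diff_k:.1f} K")
print(f"Ratio in Celsius: {ratio_c:.2f} (misleading)")
print(f"Ratio in Kelvin:  {ratio_k:.3f}")
```

Changing the zero point changes the ratio but not the difference, which is why equality of ratios requires a true zero.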
Exercise - 1
• Identify the type of data (nominal, ordinal, interval or ratio) represented by each of
the following. Confirm your answers by giving your own examples.
1. Blood group
2. Temperature (Celsius)
3. Ethnic group
4. Job satisfaction index (1-5)
5. Number of heart attacks
6. Calendar year
7. Serum uric acid (mg/100ml)
8. Number of accidents in a 3-year period
9. Number of cases of each reportable disease reported by a health worker
10. The average weight gain of six 1-year-old dogs (with a special diet supplement)
was 950 grams last month
Population and Sample
Population: Refers to any collection of objects
• The largest collection of entities for which we have an interest at a particular time
Finite population
• A population of values that consists of a fixed number of these values
Infinite population
• A population that consists of an endless succession of values
Target Population:
• A collection of items that have something in common for which we wish to draw
conclusions at a particular time
• E.g., All hospitals in Ethiopia
• The whole group of interest
Sample:
• A subset of a study population, about which information is actually obtained
• The individuals who are actually measured and comprise the actual data
• Role of statistics: using information from a sample to make inferences
about the population
• E.g. In a study of the prevalence of Covid-19 among adults in Ethiopia, a random
sample of adults in Dire Dawa was included
– Target population: All adults in Ethiopia
– Study population: All adults in Dire Dawa
– Sample: Adults in Dire Dawa who were included in the study
[Figure: a sample is drawn from the target population, and information from the
sample is used to draw conclusions about the population]
Generalizability
• Is a two-stage procedure
• We need to be able to generalize:
– From the sample to the study population, and
– Then from the study population to the target population
Parameter and Statistic
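• Parameter: A descriptive measure computed from the data of a
population
• E.g., the mean (µ) age of the target population
• Statistic: A descriptive measure computed from the data of a
sample
• E.g., the mean age of the sample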
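To illustrate the difference between a parameter and a statistic, the sketch below builds a hypothetical finite population of ages, computes its mean (the parameter µ), then draws a simple random sample and computes the sample mean. The population, its size and the sample size are all invented for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical finite population: ages of 10,000 adults
population = [random.randint(18, 90) for _ in range(10_000)]
mu = statistics.mean(population)     # parameter: computed from the population

# Simple random sample of 200 adults, drawn without replacement
sample = random.sample(population, 200)
x_bar = statistics.mean(sample)      # statistic: computed from the sample

print(f"Population mean age (parameter): {mu:.2f}")
print(f"Sample mean age (statistic):     {x_bar:.2f}")
```

In practice the parameter is usually unknown, and the statistic is what we use to estimate it.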
Major Steps in Statistical Methods
• A statistical investigation is an investigation conducted using
statistical techniques
• The main steps in statistical investigation are:
i. Collection of data,
ii. Organization of data,
iii. Presentation of data,
iv. Analysis of data, and
v. Interpretation of data
Collection of Data
• This is the process of obtaining measurements, counts, or information
• Is the first step in a statistical investigation
• Valid conclusions can only result from properly collected data
Organization of Data
• This step involves editing, classifying, and tabulating the data
• Data editing: correcting or adjusting omissions, inconsistencies, irrelevant
answers and wrong computations
• Data classification: arranging data according to some common characteristics
possessed by the items constituting the data
• Data tabulation: arranging the data in columns and rows so that there is
absolute clarity in the data presented
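The three organization steps can be sketched on a handful of hypothetical survey records. In this sketch, editing simply drops records with an impossible age or a missing sex; real editing would often correct such records rather than discard them. The age bands are an arbitrary choice for illustration.

```python
from collections import Counter

# Hypothetical raw records: (age in years, sex as entered)
raw = [(25, "F"), (41, "m"), (-3, "F"), (67, "M"),
       (33, ""), (52, "f"), (19, "M")]

# 1. Editing: remove impossible ages and missing sex, standardize case
edited = [(age, sex.upper()) for age, sex in raw
          if 0 <= age <= 120 and sex.strip()]


# 2. Classification: group records by a common characteristic (age band)
def age_band(age):
    if age < 30:
        return "<30"
    if age < 60:
        return "30-59"
    return "60+"


# 3. Tabulation: arrange counts in rows (age band) and columns (sex)
table = Counter((age_band(age), sex) for age, sex in edited)

print(f"{'Age band':<10}{'F':>4}{'M':>4}")
for band in ["<30", "30-59", "60+"]:
    print(f"{band:<10}{table[(band, 'F')]:>4}{table[(band, 'M')]:>4}")
```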
Presentation of Data
• Is about arranging the data using graphs and diagrams
• The main purpose of data presentation is to facilitate statistical analysis
Analysis of Data
• Extraction of summarised and comprehensible numerical descriptions of the
data
• The purpose: to dig out information useful for decision-making
• Methods used in analysing data: observation, measures of central tendency,
measures of variation, correlation and regression
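The analysis methods listed above can be sketched with the Python standard library. The paired age and blood pressure values are hypothetical, and the helper `correlation_and_regression` is an illustrative name that computes Pearson's r and a least-squares line directly from their textbook formulas.

```python
import math
import statistics

# Hypothetical paired data: age (years) and systolic BP (mmHg)
age = [25, 32, 41, 47, 53, 60, 68]
sbp = [112, 118, 121, 130, 134, 141, 150]

# Measures of central tendency and variation
mean_sbp = statistics.mean(sbp)
median_sbp = statistics.median(sbp)
sd_sbp = statistics.stdev(sbp)


def correlation_and_regression(x, y):
    """Return Pearson's r and the least-squares slope and intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept


r, slope, intercept = correlation_and_regression(age, sbp)

print(f"Mean = {mean_sbp:.1f}, median = {median_sbp}, SD = {sd_sbp:.1f}")
print(f"Pearson r = {r:.3f}")
print(f"Fitted line: SBP = {intercept:.1f} + {slope:.2f} * age")
```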
Interpretation of Data
• Refers to making conclusions about the data
• This step usually involves decision-making about a large collection of objects
(population), based on information gathered from a small collection of similar
objects (sample)
What does Biostatistics cover?
Research Planning
Data Analysis
Presentation
Interpretation
Publication