Data Analysis
Biostatistics
Bushra Haider
Research Officer
Medical Professional Education Department
Northwest General Hospital & Research
Center
Course Objective
At the end of this session, you will be able to
1. Define data, information, statistics, and biostatistics.
2. Understand and differentiate between different data types.
3. Select the right statistical analysis methods based on the data
type.
4. Differentiate between descriptive and inferential statistics.
5. Enter data in SPSS independently.
6. Analyze data using SPSS.
7. Interpret data findings.
What we will not discuss
We will not discuss
1. Lots of maths.
2. Numerical statistical formulas.
Why not?
• Medical students coming to statistics support usually want help
with using SPSS, choosing the right analysis, and interpreting
output.
• Maths can be a little scary for medical students.
What is data?
Data are individual pieces of factual information recorded and used
for the purpose of analysis.
They are the raw information from which results are created.
Data can be something simple and seemingly
1. Collection of data
2. Presentation of data
3. Analysis of data
4. Interpretation of data
Biostatistics
The branch of statistics that deals with data relating to living
organisms.
Statistics applied to the collection, analysis, and interpretation of
biological data and especially data relating to human biology,
health, and medicine.
Observation
In statistics, an observation often means any sort of numerical recording
of information, whether it is a physical measurement such as height or
weight; a classification such as male or female; or an answer to a
question such as yes or no.
For example:
Patients' BMI recordings in kg/m²: 18.45, 19.00, 24.50, 30.00, 28.55, 25.00
Here each observation represents a single data point from a single
patient at a given time.
Variable
A characteristic that varies from one individual or object to another is
called a variable.
For example:
Age, weight, height, BMI, health status, clinical characteristics,
gender, disease stage, laboratory diagnosis, etc. are all variables, as
they differ from person to person.
Here we will define a variable as a group name under which numerical
recordings of information, called observations, are obtained across
the different individuals included in the study. The resulting collection
of observations is called data.
Classification of variables
(Based on Nature of Data)
Variable (Data)
• Qualitative (Categorical)
  • Nominal
  • Ordinal
• Quantitative (Numerical)
  • Discrete
  • Continuous
Qualitative/Categorical Variable
Qualitative / Categorical variables refer to non-numerical data or
words.
For example, gender, medication adherence, treatment plan,
disease status, education level, socioeconomic status, etc.
All these words represent categorical entities.
We can count the number of patients/individuals in each category.
Categorical variables can be ordered (ordinal) or non-ordered (nominal).
Nominal Variable (Data type)
Data that can be categorized but not ordered.
The categories must be mutually exclusive.
Here we use numbers to denote various categories, but these numbers
don’t have any numerical importance.
Example:
• Gender (Male, Female, Non-binary).
• Blood Type (A, B, AB, O).
• Disease Type (Diabetes, Hypertension, Asthma) etc.
Ordinal Variable (Data type)
Here the categories have a meaningful order or ranking.
Intervals between the categories are not necessarily equal or known.
It allows for the comparison of order but not the magnitude of
difference between categories.
Example:
• Pain Severity (None, Mild, Moderate, Severe).
• Stage of Cancer (Stage I, Stage II, Stage III, Stage IV).
• Patient Satisfaction (Very Dissatisfied, Dissatisfied, Neutral,
Satisfied, Very Satisfied) etc.
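The idea that numbers only stand in for categories can be sketched in a few lines of Python. This is an illustration, not an SPSS feature: the code values (1, 2, 3 and 0 to 3), the dictionary names, and the data are all assumptions made up for the example.

```python
# Hypothetical value labels, in the spirit of the "Values" column in SPSS.
GENDER_LABELS = {1: "Male", 2: "Female", 3: "Non-binary"}          # nominal: codes are only names
PAIN_LABELS = {0: "None", 1: "Mild", 2: "Moderate", 3: "Severe"}   # ordinal: codes carry an order


def label(codes, labels):
    """Translate numeric codes into their category labels."""
    return [labels[c] for c in codes]


gender_codes = [1, 2, 2, 3, 1]   # illustrative entries, not real patients
pain_codes = [3, 0, 2, 1, 2]

gender_named = label(gender_codes, GENDER_LABELS)
# Sorting is meaningful for the ordinal codes only: it ranks patients by severity.
pain_ranked = label(sorted(pain_codes), PAIN_LABELS)

print(gender_named)
print(pain_ranked)
```

Averaging the gender codes would give a number with no meaning, while ranking the pain codes is legitimate because their order is real.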
Rating Scale
It is a type of ordinal data scale where
• The ratings are usually numeric and can [...] depending on the assessment.
• Data is numeric but it represents ordered categories rather than continuous values.
Likert scale
A type of rating scale used to measure attitudes, opinions, [...] of responses that indicate various degrees of [...]
• Usually has an equal number of positive and negative response options around a neutral point.
• The responses indicate order but not the precise [...]
1. Nominal data:
Frequency and graphical representation (simple bar chart, cluster bar chart
and pie chart), chi-square.
2. Ordinal data:
Frequency and graphical representation (simple bar chart, cluster bar chart
and pie chart), chi-square.
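As a rough illustration of what a frequency table and a chi-square test compute, here is a minimal Python sketch. It is not SPSS output: the function names and the data are hypothetical, and it returns only the Pearson chi-square statistic and its degrees of freedom (no p-value, no continuity correction).

```python
from collections import Counter


def crosstab(rows, cols):
    """Count how many cases fall in each (row category, column category) cell."""
    return Counter(zip(rows, cols))


def chi_square(rows, cols):
    """Pearson chi-square statistic and degrees of freedom for two categorical variables."""
    n = len(rows)
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    observed = crosstab(rows, cols)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n          # count expected if the variables were unrelated
            stat += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Illustrative data only (not from a real study).
gender = ["Male"] * 10 + ["Female"] * 10
adherent = ["Yes"] * 8 + ["No"] * 2 + ["Yes"] * 4 + ["No"] * 6

frequencies = Counter(gender)                         # simple frequency table
stat, df = chi_square(gender, adherent)
print(frequencies)
print(f"chi-square = {stat:.2f}, df = {df}")
```

With these made-up counts the statistic is about 3.33 on 1 degree of freedom. That is below the 5% critical value of about 3.84, so the association would not be significant at 0.05.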
Discrete: It takes a finite or countable number of distinct values, typically whole numbers.
Continuous: It takes an infinite number of values (any value including fractions and decimals) within a given [...]
[Figure: SPSS Data Editor window, with the tool bar, Variable view, and Data view labelled]
Windows of SPSS
There are four main windows in SPSS
1. Data editor (Data view and Variable view)
2. Output viewer
3. Syntax editor
4. Script window
1. Data Editor
Spreadsheet-like system for defining variables, entering, editing,
• The variable name and row number of the active cell are displayed in the
top left corner of the Data Editor.
• When we select a cell and enter a data value, the value is displayed in the
cell editor at the top of the Data Editor.
• Data values are not recorded until we press Enter or select another cell.
Select
1. Nominal for naming (unordered categorical) variables
2. Ordinal for ordered data/Likert scale
3. Scale for numerical variables (continuous or discrete)
Practical Exercise (Enter the following data
in SPSS)
Screening and cleaning of data
Step 1
Checking for errors: First, we need to check each of the variables for
scores that are out of range (i.e. not within the range of possible
scores).
Step 2
Finding and correcting the error in the data file: Second, we
need to find where in the data file this error occurred (i.e. which case
is involved) and correct or delete the value.
Checking for Error
When checking for errors, we primarily look for values that fall
outside the range.
For example, if gender is coded 1=male, 2=female, we should not
find any scores other than 1 or 2 for this variable. Scores that fall
outside the possible range can distort our statistical analysis.
To check for errors, we need to inspect the frequencies for each of
the variables.
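The same check can be sketched outside SPSS. The Python example below assumes the coding from the example above (1 = male, 2 = female); the function name and the data, including the two deliberate typing errors, are hypothetical.

```python
from collections import Counter

# Assumed codebook: gender is coded 1 = male, 2 = female.
VALID_CODES = {1, 2}


def find_out_of_range(values, valid_codes):
    """Return (case number, value) pairs for entries outside the valid codes."""
    return [(case, v) for case, v in enumerate(values, start=1) if v not in valid_codes]


gender = [1, 2, 2, 1, 3, 2, 1, 22]   # illustrative entries containing two typing errors

print(Counter(gender))               # frequency table: unexpected codes show up as extra rows
errors = find_out_of_range(gender, VALID_CODES)
print(errors)                        # case numbers tell us which records to look up
```

The frequency table reveals that something is wrong, and the case numbers show where to look in the data file.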
Procedure for checking Error
Click on Analyze -> Descriptive Statistics -> Frequencies
[...] box.
Click on the ‘Statistics’ option.
[...] section
Click ‘Continue’ and then ‘OK’.
Correcting the error in data file
Click on the variable (column) in which the error was identified.
Then click on Edit -> Find.
A dialogue box will open.
Enter the incorrect value in the ‘Find’ bar.
Then click ‘Find Next’.
Check the original record, then correct the entry.
Analysis of Categorical Variable
3. Click Define
4. Select the variable you wish to display on the horizontal axis, and move
it into the “Category Axis” box
5. Select the second variable, and move it to the “Define Clusters by” box
8. Click OK
Pie Chart
1. Click Graphs -> Legacy Dialogs -> Pie
2. Select “Summaries for groups of cases”
3. Click Define
4. Click “Reset” (recommended)
5. Move the variable for which you are creating a pie chart into the
“Define slices by” box
6. Select your desired option under “Slices Represent”
7. Select “Titles” to add a title (recommended)
8. Click “OK”
Analysis of Numerical / Continuous Variable
Before moving on to the statistical analysis of continuous variables,
let’s discuss some important terms:
• Level of confidence
• Hypothesis testing
• Normality of data
• P-value
Level of Confidence
It is the proportion of confidence intervals, calculated in the same way
from repeated samples, that would contain the true population parameter.
Example:
In a clinical trial investigating the efficacy of a new drug for lowering
blood pressure, researchers might aim for a 95% level of confidence.
This means that they are 95% confident that the true mean reduction
in blood pressure among patients treated with the new drug lies
within the calculated confidence interval.
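A 95% confidence interval for a mean can be sketched as follows. This is a simplified illustration that uses the normal approximation; statistical software typically uses the t distribution, which gives a slightly wider interval for small samples. The function name and the blood pressure values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / sqrt(n)                      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 for 95%
    return m - z * se, m + z * se


# Illustrative reductions in systolic blood pressure (mmHg); not real trial data.
reductions = [12, 9, 15, 11, 8, 14, 10, 13, 9, 12]

low, high = mean_confidence_interval(reductions)
print(f"mean reduction = {mean(reductions)} mmHg")
print(f"95% CI: {low:.2f} to {high:.2f} mmHg")
```

Raising the confidence level (for example to 99%) widens the interval, because a wider range is needed to capture the true mean more often.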
Hypothesis Testing
Hypothesis testing is a statistical method used to make inferences or
draw conclusions about a population based on sample data. It involves
formulating two competing hypotheses: the null hypothesis (H0) and the
alternative hypothesis (H1).
Reject H0 if p-value ≤ 0.05
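The decision rule can be written out as a small Python sketch. It assumes a test statistic that follows the standard normal distribution (a z statistic); the z values and function names are hypothetical and chosen only to show both outcomes.

```python
from statistics import NormalDist

ALPHA = 0.05  # significance level used in the decision rule above


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=ALPHA):
    """Apply the rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical test statistics, one far from zero and one close to it.
for z in (2.5, 1.0):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f} -> {decide(p)}")
```

Note the wording "fail to reject H0": a large p-value does not prove that H0 is true, only that the sample gives no strong evidence against it.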
SPSS Steps for checking data Normality
• Analyze -> Descriptive Statistics -> Explore
• The dialogue box will open
• Move the desired variable into the box under the heading “Dependent
List”
• Click on the “Plots” option
• Tick the option “Normality plots with tests”
• Click Continue
• Then click OK. The result is displayed in the output window, in the table
under the heading “Tests of Normality”
• Check the significance value (p-value) in the Shapiro-Wilk row.
• If the value is > 0.05, there is no significant departure from normality,
and the data can be treated as normally distributed.
Descriptive Analysis of Continuous Data
It involves
• Measures of central tendency (Mean, Median, Mode) if data is normal
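The three measures of central tendency can be checked by hand with Python's built-in statistics module. The BMI values are the ones from the observation example earlier in this session; the second list (number of clinic visits) is made up to show the mode, since every BMI value occurs only once.

```python
from statistics import mean, median, mode

# BMI recordings in kg/m2 from the observation example.
bmi = [18.45, 19.00, 24.50, 30.00, 28.55, 25.00]

# Hypothetical discrete data: number of clinic visits per patient.
visits = [0, 1, 2, 2, 3, 2, 1]

print("mean BMI  :", round(mean(bmi), 2))
print("median BMI:", median(bmi))      # middle of the sorted values
print("mode visits:", mode(visits))    # most frequent value
```

For these values the mean BMI is 24.25 and the median is 24.75; the most common number of visits is 2.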
Move the dependent variable to the box under the “Test Variable(s)” heading.
Enter "1" into the "Group 1:" box and enter "2" into the "Group 2:" box.
[...] variable in Value Labels, on the basis of which we compare the two samples.
Click on “Continue” and then click on “OK”.
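The Group 1 / Group 2 steps above appear to belong to an independent-samples t-test. Assuming so, the statistic behind it can be sketched in Python as below. This is the equal-variances (Student) version only and it does not compute a p-value; the function name and the readings are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2):
    """Student's t statistic and degrees of freedom, assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of the difference in means
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2


# Illustrative systolic blood pressure readings (mmHg) for groups coded 1 and 2.
group_1 = [120, 125, 130, 135, 140]
group_2 = [110, 115, 120, 125, 130]

t, df = independent_t(group_1, group_2)
print(f"t = {t:.3f}, df = {df}")
```

Here t = 2.0 on 8 degrees of freedom. The two-sided 5% critical value for 8 degrees of freedom is about 2.306, so this difference of 10 mmHg would not be significant at 0.05 with such small groups.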
ANOVA
• ANOVA (Analysis of Variance) is used to determine whether there
are significant differences between the means of three or more
independent (unrelated) groups.
• It extends the t-test to more than two groups, assessing whether at
least one group mean is significantly different from the others.
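The F statistic behind a one-way ANOVA compares the variation between group means with the variation within groups. A minimal Python sketch follows; it computes only F and its degrees of freedom (no p-value, no post hoc tests), and the function name and data are hypothetical.

```python
from statistics import mean


def one_way_anova(*groups):
    """F statistic with between-group and within-group degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Illustrative pain scores under three treatments; not real data.
treatment_a = [4, 5, 6]
treatment_b = [6, 7, 8]
treatment_c = [8, 9, 10]

f, df1, df2 = one_way_anova(treatment_a, treatment_b, treatment_c)
print(f"F({df1}, {df2}) = {f:.2f}")
```

With these numbers F(2, 6) = 12, which exceeds the 5% critical value of about 5.14, so at least one group mean differs. A post hoc test is then needed to find out which ones.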
SPSS Steps
Click Analyze -> Compare Means -> One-Way ANOVA.
Click on the Post Hoc button to specify which post hoc test to use if the
ANOVA is significant. Common choices include Tukey, Scheffé, or Bonferroni.
Select the desired post hoc test and click Continue.
One sample Wilcoxon Signed Rank Test
It is a non-parametric test used to compare a single sample median to a
hypothesized population median.
It is used when the data do not meet the assumptions of the one-sample t-test,
SPSS Steps
Analyze > Nonparametric Tests > One Sample...
Click 'Settings'.
Under 'Select an item', select 'Choose Tests'.
Choose 'Customise Tests'.
Select 'Compare median to hypothesized (Wilcoxon signed rank test)'.
Enter the value of the 'Hypothesized