Basic Statistical Concepts-1
Research Bangladesh
What is biostatistics?
Biostatistics is the application of statistical principles to questions and problems in
medicine, public health, or biology.
DATA VARIABLES
A data variable is "something that varies" or differs from person to person or group to
group. Data variables are the items that we collect data about.
Examples: sex, age, weight, marital status, satisfaction rate, etc.
DATA VARIABLE TYPES
2) ORDINAL VARIABLES
Those are categorical variables that have an order, and that order has a meaning.
1) DISCRETE VARIABLES
Examples
Number of kids in a family.
Number of stents inserted into the coronary arteries
Number of patient visits to the hospital
2) CONTINUOUS VARIABLES
Examples
Weight (in kg)
Step 2:
For the categorical variables: Is there an order?
If No, it is nominal, and if Yes, it is ordinal.
Whenever possible, collect your data at the highest level, numerical continuous or numerical discrete, as it is
more accurate and can be categorized easily later on.
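As a minimal sketch of why the finer level is preferable, the Python snippet below groups exact ages into categories after collection; the reverse step (recovering exact ages from groups) is not possible. The function name `age_group`, the sample ages, and the cut-points are illustrative assumptions, not part of the original slides.

```python
def age_group(age_years):
    """Map an exact age in years to an ordinal age group.

    The cut-points (18 and 65) are hypothetical and chosen only for illustration.
    """
    if age_years < 18:
        return "child"
    if age_years < 65:
        return "adult"
    return "older adult"


# Ages collected at the numerical level (made-up values).
ages = [4, 17, 18, 42, 64, 65, 80]

# Categorizing later is a one-line step.
groups = [age_group(a) for a in ages]
print(groups)
```

If different cut-points are needed later, only `age_group` has to change, because the exact ages are still available.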
DATA ENTRY
The goal of any data entry process is to have the data arranged in a spreadsheet.
A WELL-ARRANGED DATASHEET SHOULD SATISFY THE FOLLOWING CHARACTERISTICS
It is better to use numeric codes when entering categorical data: they are easier to enter, less prone to
typing mistakes, and more suitable for statistical software packages.
Severity of disease
Mild → 1
Moderate → 2
Severe → 3
Severity of Pain
No pain → 0
Mild pain → 1
Moderate pain → 2
Severe pain → 3
If binary (Yes/No)
Yes → 1
No → 0
If multiple answers are allowed for one question, use a column for each choice and code it as 1/0
representing Yes/No.
Do you have any of the following Chronic diseases?
DM
Hypertension
CVD
Hypothyroidism
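The coding rules on this slide can be sketched in Python as follows. This is only an illustration: the helper names `encode_severity` and `encode_multiple_choice` and the sample answers are assumptions, while the codes and the list of chronic diseases are taken from the slide.

```python
# Numeric codes from the slide.
SEVERITY_CODES = {"Mild": 1, "Moderate": 2, "Severe": 3}
CHRONIC_DISEASES = ["DM", "Hypertension", "CVD", "Hypothyroidism"]


def encode_severity(label):
    """Return the numeric code for a severity label (hypothetical helper)."""
    return SEVERITY_CODES[label]


def encode_multiple_choice(selected, choices=CHRONIC_DISEASES):
    """Return one 1/0 (Yes/No) column per choice for a multiple-answer question."""
    chosen = set(selected)
    unknown = chosen - set(choices)
    if unknown:
        # Catch typing mistakes instead of silently coding them as 0.
        raise ValueError(f"Unknown choices: {sorted(unknown)}")
    return {choice: int(choice in chosen) for choice in choices}


# One made-up respondent: moderate disease, reports DM and CVD.
row = {"severity": encode_severity("Moderate")}
row.update(encode_multiple_choice(["DM", "CVD"]))
print(row)
# {'severity': 2, 'DM': 1, 'Hypertension': 0, 'CVD': 1, 'Hypothyroidism': 0}
```

Each disease gets its own column, so a respondent with several conditions still occupies a single row of the datasheet.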
TIPS FOR DATA ENTRY OF NUMERIC VARIABLES
Before running the statistical analysis, we need to explore the data to make sure that there are no
data entry errors.
Check the range (minimum and maximum)
Are there any incorrect extreme values? Are they consistent with other data values?
Check the frequency distribution for categorical variables
Are there any typing mistakes or unusual codes or groups?
Check the missing values
Are they really not available? Or did we just forget to enter them during data entry?
Checking the consistency of data
For example, a man can't be pregnant, disease duration can't be larger than age, and diastolic blood
pressure can't be larger than systolic blood pressure.
Graphically checking the data
A histogram or a boxplot for a single numeric variable, and a scatterplot for two related variables such as
weight and waist circumference, may be helpful in exploring possible errors.
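The non-graphical checks above (range, frequency distribution, missing values, consistency) can be sketched in plain Python. The records, column names (`sex`, `age`, `sbp`, `dbp`), and helper functions below are hypothetical and only illustrate the idea; a statistical package would normally provide equivalent summaries.

```python
from collections import Counter

# Made-up records; None marks a missing value.
records = [
    {"id": 1, "sex": 1, "age": 34, "sbp": 120, "dbp": 80},
    {"id": 2, "sex": 0, "age": 340, "sbp": 130, "dbp": 85},   # age looks like a typing error
    {"id": 3, "sex": 3, "age": 51, "sbp": 90, "dbp": 110},    # unusual sex code; DBP > SBP
    {"id": 4, "sex": 1, "age": None, "sbp": 118, "dbp": 76},  # missing age
]


def value_range(rows, key):
    """Minimum and maximum of a numeric variable, ignoring missing values."""
    values = [r[key] for r in rows if r[key] is not None]
    return min(values), max(values)


def frequencies(rows, key):
    """Frequency distribution of a categorical variable."""
    return Counter(r[key] for r in rows)


def missing_ids(rows, key):
    """IDs of records where the variable is missing."""
    return [r["id"] for r in rows if r[key] is None]


def inconsistent_bp_ids(rows):
    """IDs where diastolic blood pressure exceeds systolic blood pressure."""
    return [
        r["id"]
        for r in rows
        if r["sbp"] is not None and r["dbp"] is not None and r["dbp"] > r["sbp"]
    ]


print(value_range(records, "age"))   # (34, 340): the maximum is implausible
print(frequencies(records, "sex"))   # code 3 is not a valid 1/0 code
print(missing_ids(records, "age"))   # [4]
print(inconsistent_bp_ids(records))  # [3]
```

Each flagged ID should be traced back to the original data collection form before any value is corrected or set to missing.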
Thank You!