Basic Statistical Concepts-1

Copyright
© All Rights Reserved
INNOVATION

Research Bangladesh

BASIC STATISTICAL CONCEPTS-1


MAHBUB TAREK
Founder & Chief Scientist
AGENDA
1 Introduction

2 The statistical analysis journey

3 Data, variables & their types

4 Levels of data measurement

5 Data entry

6 Data coding & exploring data for errors


INTRODUCTION
What is statistics?
Statistics is the science concerned with developing and studying methods for collecting,
analyzing, interpreting, and presenting data.

What is biostatistics?
Biostatistics is the application of statistical principles to questions and problems in
medicine, public health, or biology.

What is studying biostatistics useful for?

Designing and analyzing research studies.
Describing and summarizing the data we have.
Analyzing data to measure associations or differences.
Deciding whether an observation is of real significance or just due to chance.
Understanding and evaluating published scientific research papers.
THE STATISTICAL ANALYSIS JOURNEY

Transforming the research idea into a research question.

Choosing the proper study design and selecting a suitable sample.

Performing the study and collecting data.

Analyzing data (using the appropriate test).

Getting and interpreting the p-value.

Reaching a conclusion (answer) regarding the research question.


DATA & DATA TYPES

DATA VARIABLES
A data variable is "something that varies" or differs from person to person or group to
group. Data variables are the items that we collect data about.

Examples: sex, age, weight, marital status, satisfaction rate, etc.

DATA VARIABLE TYPES


When dealing with data, it is important to recognize the type of each data variable, because the type determines how we:
Summarize the data
Present the data graphically
Analyze the data
A) CATEGORICAL VARIABLES
They are also known as qualitative variables; they have NO unit of measurement.

Sometimes, categorical variables are coded as numbers, for example:
1 = female
2 = male
0 = no
1 = yes, and so on.
Even if they are coded or represented as numbers, they are still categories, and the data type is categorical.
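A minimal Python sketch of why numeric codes do not turn a categorical variable into a numerical one. It assumes the codes listed above (1 = female, 2 = male) and a small, made-up list of coded values; the meaningful summary is a frequency count per category, not an average of the codes.

```python
from collections import Counter

# Hypothetical coded data for the variable "sex" (1 = female, 2 = male)
SEX_CODES = {1: "female", 2: "male"}
sex = [1, 2, 2, 1, 1, 2, 1]

# Averaging the codes produces a number (about 1.43) with no meaning:
# there is no category "1.43" between female and male.
meaningless_mean = sum(sex) / len(sex)

# The appropriate summary for a categorical variable is a frequency count
frequencies = Counter(SEX_CODES[code] for code in sex)
print(frequencies)  # Counter({'female': 4, 'male': 3})
```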
1) NOMINAL VARIABLES
These are categorical variables that have no intrinsic order.
Sex: (female, male) can also be presented as (male, female).
Blood groups: (A, B, AB, O) can also be presented as (A, B, O, AB) or in any other order.
Nationality: can be presented in any order; there is no order for the countries.
If a nominal variable has only two groups, such as sex (male, female), the answer to a yes/no question (yes, no), or disease status (diseased, not diseased), we call it a dichotomous (binary) variable.

2) ORDINAL VARIABLES
These are categorical variables that have an order, and that order has a meaning (e.g., severity of disease: mild, moderate, severe).

Even if an ordinal variable such as satisfaction rate is coded in numbers from 1 to 5, it is still an ordinal variable, which is categorical, not numerical.
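Because the order of an ordinal variable is meaningful, summaries that rely only on order, such as the median category, make sense, while the codes are not treated as equally spaced measurements. This sketch assumes a hypothetical 5-level satisfaction scale coded 1 to 5 and made-up responses.

```python
import statistics

# Hypothetical codebook for an ordinal "satisfaction rate" variable.
# The codes only rank the categories; the gap between 1 and 2 is not
# assumed to equal the gap between 4 and 5.
SATISFACTION = {
    1: "very dissatisfied",
    2: "dissatisfied",
    3: "neutral",
    4: "satisfied",
    5: "very satisfied",
}

responses = [4, 5, 3, 4, 2, 5, 4]  # made-up coded answers

# median_low always returns one of the observed codes, so it maps back to a label
median_code = statistics.median_low(responses)
print(median_code, SATISFACTION[median_code])  # 4 satisfied
```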
B) NUMERICAL VARIABLES
These variables are measured or counted, are represented in numbers, and have a unit of measurement.
Numerical variables are either discrete or continuous.

1) DISCRETE VARIABLES
Examples:
Number of children in a family
Number of stents inserted into the coronary arteries
Number of patient visits to the hospital

2) CONTINUOUS VARIABLES
Examples:
Weight (in kg)
Height (in cm)
Blood glucose level (in mg/dL)
HOW TO DIFFERENTIATE BETWEEN TYPES OF DATA VARIABLES

Step 1: Is there a unit of measurement?
If no, it is categorical; if yes, it is numerical.

Step 2:
For categorical variables: Is there an order?
If no, it is nominal; if yes, it is ordinal.

For numerical variables: Is it counted or measured?
If counted, it is discrete; if measured, it is continuous.
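The two-step rule can be written as a small decision function. This is a sketch only: the function name `variable_type` and its yes/no parameters are hypothetical, and in practice the answers come from knowing how each variable was collected.

```python
def variable_type(has_unit: bool, has_order: bool = False, is_counted: bool = False) -> str:
    """Hypothetical helper applying the two-step rule from the slide."""
    # Step 1: no unit of measurement -> categorical
    if not has_unit:
        # Step 2 (categorical): is there an order?
        return "ordinal" if has_order else "nominal"
    # Step 2 (numerical): counted or measured?
    return "discrete" if is_counted else "continuous"


print(variable_type(has_unit=False))                  # blood group: nominal
print(variable_type(has_unit=False, has_order=True))  # severity of disease: ordinal
print(variable_type(has_unit=True, is_counted=True))  # hospital visits: discrete
print(variable_type(has_unit=True))                   # weight in kg: continuous
```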
LEVELS OF DATA MEASUREMENT
It is possible to change a data variable from one type into another, but only in one direction:
numerical continuous → numerical discrete → ordinal → nominal
We can change age from a numerical variable to an ordinal variable by categorizing it into age groups.
We can also change age from an ordinal variable (age groups) into a nominal variable with two levels (young and old).
However, if we collect the data in categorical form, we cannot transform it into numerical form.

Whenever possible, collect your data at the highest level (numerical continuous or numerical discrete), as it is more accurate and can easily be categorized later on.
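A sketch of the one-way conversion described above: age in years becomes age groups, and age groups become a two-level variable. The cut points (18, 40, 60) and the young/old split at 40 are arbitrary assumptions chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical cut points, chosen only for illustration
AGE_GROUP_CUTS = [18, 40, 60]
AGE_GROUP_LABELS = ["<18", "18-39", "40-59", "60+"]


def age_group(age_years: float) -> str:
    """Numerical (age in years) -> ordinal (age group)."""
    return AGE_GROUP_LABELS[bisect_right(AGE_GROUP_CUTS, age_years)]


def young_or_old(group: str) -> str:
    """Age group -> two levels; the split at 40 is an assumption."""
    return "young" if group in ("<18", "18-39") else "old"


ages = [12, 25, 40, 59.5, 73]
groups = [age_group(a) for a in ages]
print(groups)                             # ['<18', '18-39', '40-59', '40-59', '60+']
print([young_or_old(g) for g in groups])  # ['young', 'young', 'old', 'old', 'old']
```

Note that nothing can map "40-59" back to 40 or 59.5: once the exact ages are gone, the numerical information cannot be recovered, which is why collecting at the highest level is recommended.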
DATA ENTRY

The goal of any data entry process is to have the data arranged in a spreadsheet (datasheet).
A WELL-ARRANGED DATASHEET SHOULD HAVE THE FOLLOWING CHARACTERISTICS

1) Each column represents one variable
If one variable is measured twice (e.g., before and after an experiment), it should be recorded in two columns.
If a variable consists of two elements (e.g., blood pressure, which consists of systolic and diastolic blood pressure), each element should be recorded in its own column.
2) The unit of measurement is the same throughout each column
Height is recorded either in meters or in centimeters; it cannot be in meters for some patients and in centimeters for others.
3) Each row represents a case (e.g., one patient)
4) Each cell contains only one data point
A cell cannot hold both systolic and diastolic blood pressure, or gestational age in both weeks and days.
5) Nominal and ordinal data are coded using numeric codes
We use numbers as codes for each category instead of writing the name of the category. For example, we may use 1 as the code for males and 2 as the code for females. Always keep a codebook for your coded variables, where you can find the codes and their corresponding values.
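A sketch of how a badly entered record could be rearranged to satisfy these rules. It assumes hypothetical raw entries in which blood pressure was typed as "120/80" in a single cell and height was entered in meters for one patient and centimeters for another. The rule "any height below 3 was entered in meters" and the column names are assumptions made only for this illustration; the sex codes follow the codebook example above (1 = male, 2 = female).

```python
# Hypothetical raw entries that break the datasheet rules
raw_rows = [
    {"id": 1, "sex": "male", "height": 1.72, "bp": "120/80"},
    {"id": 2, "sex": "female", "height": 165, "bp": "135/85"},
]

SEX_CODEBOOK = {"male": 1, "female": 2}  # rule 5: numeric codes plus a codebook


def tidy(row: dict) -> dict:
    # Rule 4: split the two blood pressure elements into separate cells
    systolic, diastolic = (int(x) for x in row["bp"].split("/"))
    height = row["height"]
    # Rule 2: one unit per column (assumption: values below 3 are meters)
    height_cm = round(height * 100, 1) if height < 3 else height
    return {
        "id": row["id"],
        "sex": SEX_CODEBOOK[row["sex"]],
        "height_cm": height_cm,
        "sbp_mmhg": systolic,
        "dbp_mmhg": diastolic,
    }


tidy_rows = [tidy(r) for r in raw_rows]  # rule 3: one row per case
for r in tidy_rows:
    print(r)
```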
CODING OF CATEGORICAL DATA

It is better to use numeric codes when entering categorical data: they are easier to type, less prone to typing mistakes, and more suitable for statistical software packages.
Severity of disease
Mild → 1
Moderate → 2
Severe → 3
Severity of pain
No pain → 0
Mild pain → 1
Moderate pain → 2
Severe pain → 3
If binary (Yes/No)
Yes → 1
No → 0
If multiple answers are allowed for one question, use a column for each choice and code it as 1/0, representing Yes/No. For example:
Do you have any of the following chronic diseases?
DM (diabetes mellitus)
Hypertension
CVD (cardiovascular disease)
Hypothyroidism
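A sketch of the multiple-answer rule: one 1/0 column per choice. The disease list is taken from the question above; the helper name `code_multiple_choice` is hypothetical, and the severity mapping simply reuses the codes listed above.

```python
CHRONIC_DISEASES = ["DM", "Hypertension", "CVD", "Hypothyroidism"]
SEVERITY = {"Mild": 1, "Moderate": 2, "Severe": 3}


def code_multiple_choice(selected: set[str]) -> dict[str, int]:
    """One column per choice: 1 = Yes (selected), 0 = No."""
    return {disease: int(disease in selected) for disease in CHRONIC_DISEASES}


print(code_multiple_choice({"DM", "CVD"}))
# {'DM': 1, 'Hypertension': 0, 'CVD': 1, 'Hypothyroidism': 0}
print(SEVERITY["Moderate"])  # 2
```

Coding each choice as its own column keeps the rule "each cell contains only one data point" intact even when a patient ticks several answers.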
TIPS FOR DATA ENTRY OF NUMERIC VARIABLES

CODING OF MISSING DATA


It is better to use codes for missing data instead of leaving the cells empty, so that we can be sure that a value is truly missing and not a data entry mistake.
➢ Use impossible values as codes, i.e., values that cannot be correct for this variable.
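A sketch of a missing-data code in practice. It assumes 999 as the code for a missing age, since no real age in years can take that value; the code itself and the helper `clean_missing` are hypothetical. Before analysis, the code must be converted back to "missing"; otherwise it is treated as a real value.

```python
# Hypothetical missing-data code: 999 cannot be a real age in years
MISSING_AGE = 999

ages = [34, 999, 51, 27, 999]  # made-up entries; two ages are missing


def clean_missing(values, missing_code):
    """Replace the missing-data code with None so it is not analyzed as a value."""
    return [None if v == missing_code else v for v in values]


clean = clean_missing(ages, MISSING_AGE)
observed = [v for v in clean if v is not None]

print(clean)                                    # [34, None, 51, 27, None]
print(round(sum(observed) / len(observed), 1))  # 37.3
print(sum(ages) / len(ages))                    # 422.0 if the code is wrongly left in
```

The last line shows why the code must be recorded in the codebook: a forgotten missing-data code silently inflates the mean from 37.3 to 422.0.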
EXPLORING DATA FOR ERRORS

Before running the statistical analysis, we need to explore the data to make sure that there are no
data entry errors.
Check the range (minimum and maximum)
Are there any incorrect extreme values? Are they consistent with the other data values?
Check the frequency distribution of categorical variables
Are there any typing mistakes, unusual codes, or unexpected groups?
Check the missing values
Are they really not available, or did we just forget to enter them?
Check the consistency of the data
For example, a man cannot be pregnant, disease duration cannot exceed age, and diastolic blood pressure cannot exceed systolic blood pressure.
Check the data graphically
A histogram or a boxplot for a single numeric variable, and a scatterplot for two related variables such as weight and waist circumference, can help reveal possible errors.
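A sketch of the range, frequency, missing-value, and consistency checks in plain Python. The records, field names, and the helper `consistency_errors` are hypothetical; the codes follow the codebook example from the data entry section (sex: 1 = male, 2 = female; pregnant: 1 = Yes, 0 = No). The graphical checks are left out because they need a plotting library.

```python
from collections import Counter

# Hypothetical coded records containing deliberate errors
records = [
    {"id": 1, "sex": 2, "age": 30, "disease_years": 5, "sbp": 120, "dbp": 80, "pregnant": 1},
    {"id": 2, "sex": 1, "age": 45, "disease_years": 50, "sbp": 130, "dbp": 85, "pregnant": 0},
    {"id": 3, "sex": 3, "age": 52, "disease_years": 2, "sbp": 80, "dbp": 120, "pregnant": 0},
    {"id": 4, "sex": 1, "age": None, "disease_years": 1, "sbp": 110, "dbp": 70, "pregnant": 1},
]


def consistency_errors(r: dict) -> list[str]:
    """Return the logical inconsistencies found in one record."""
    errors = []
    if r["sex"] == 1 and r["pregnant"] == 1:
        errors.append("male recorded as pregnant")
    if r["age"] is not None and r["disease_years"] > r["age"]:
        errors.append("disease duration exceeds age")
    if r["dbp"] > r["sbp"]:
        errors.append("diastolic exceeds systolic")
    return errors


# Range check (ignoring missing values)
ages = [r["age"] for r in records if r["age"] is not None]
print("age range:", min(ages), max(ages))  # age range: 30 52

# Frequency check: code 3 is not in the codebook
print("sex codes:", Counter(r["sex"] for r in records))

# Missing-value check
print("missing age:", sum(r["age"] is None for r in records))  # missing age: 1

# Consistency check
for r in records:
    for problem in consistency_errors(r):
        print(f"record {r['id']}: {problem}")
```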
Thank You!
