
Intro To Stats 7 - 4 - 2021

This document provides an introduction to basic biostatistics. It discusses what statistics is, the two main branches of statistics (descriptive and inferential), and key concepts in descriptive statistics. Descriptive statistics are used to summarize and describe characteristics of data through measures like frequency, relative frequency, means, medians, ranges, and interquartile ranges. These statistics can profile characteristics of study participants and examine one or more variables, which may be categorical/qualitative (like gender) or numerical/quantitative (like age). Descriptive statistics are generally reported before addressing research questions.

Uploaded by Asyura Roslan
Copyright © All Rights Reserved

INTRODUCTION TO BASIC BIOSTATISTICS
INTRODUCTION TO STATISTICS
● What is STATISTICS?
● Why should I study statistics?
● Two main branches of statistics
WHAT IS STATISTICS?
● Statistics is the science that deals with the:
  ○ Collection
  ○ Organization
  ○ Analysis
  ○ Interpretation
  ○ Presentation
  of information that can be stated numerically
(Daniel WW, 1999)

● The application of statistics in the biological sciences and medicine is called biostatistics
WHY SHOULD I STUDY STATISTICS?
● A tool for research
● Easier communication with statisticians/biostatisticians
● Understanding medical literature (improve
literature appraisal skills)
TWO BRANCHES OF STATISTICS
● Descriptive Statistics
● Describe the data by summarizing them
● Inferential Statistics
● Involves using a sample to draw conclusions
about a population
DESCRIPTIVE STATISTICS
● Branch of statistics that involves the organization, summarization and display of data
● Data: information coming from observations, counts, measurements or responses
● Data sets: 2 types:
● Populations: the collection of all outcomes, responses, measurements or counts that are of interest
● Samples: a subset or part of a population
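The population/sample distinction underlies inferential statistics: a statistic computed from a sample is used to estimate the corresponding population value. The sketch below shows this with a made-up population. The population of ages (18 to 77) and the sample size of 10 are illustrative assumptions, not data from this document.

```python
import random
import statistics

# Hypothetical population: ages of every resident in a small district (illustrative only)
population = list(range(18, 78))  # 60 residents, one per year of age from 18 to 77

random.seed(1)  # fixed seed so the draw is reproducible
sample = random.sample(population, k=10)  # simple random sample without replacement

print("Population mean age:", statistics.mean(population))  # the parameter we usually cannot see
print("Sample mean age:", statistics.mean(sample))          # the statistic we use to estimate it
```

In a real study only the sample is observed. The gap between the two printed means is sampling error, which inferential methods try to quantify.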
WHAT ARE DESCRIPTIVE STATISTICS FOR?
● In any study, before answering the research questions, we should introduce the characteristics (e.g. age, gender, ethnicity, level of education) of the study respondents
● To do this, we use descriptive statistics
EXAMPLE
● Title : Prevalence of and factors associated with periodontal disease in
District A, Kedah
Obj 1: Prevalence of Periodontal (PO) disease?
Obj 2 : Association between age and PO?
Obj 3 : Association between family income and PO?
Obj 4 : Association between knowledge and PO?

3. Result
3.1 Descriptive statistics of the respondents
3.2 Prevalence of PO
3.3 Association between age and PO
3.4 Association between family income and PO
3.5 Association between knowledge and PO
WHAT IS A VARIABLE?
● Data item
● A variable is any characteristic, number, or quantity that can be measured or counted.
● Age, sex, business income and expenses, country of birth, capital expenditure, class grades, eye colour and vehicle type are examples of variables.
● We have to identify which variables need to be measured to fulfil our research objectives
VARIABLE
● Characteristics
(Each column is one variable; each row is one respondent.)

    Id no.  Age  Gender  Income  Decay  Missing
1   001     36   2       4500    2      1
2   002     45   1       1200    7      1
3   003     33   2       2500    1      2
4   004     41   2       700     2      3
5   005     38   1       1600    6      2
6   006     43   1       1700    8      1
7   007     32   2       2700    1      1
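Below is a minimal sketch of how the table above maps onto data in code. Each row becomes one record and each variable becomes one column that can be pulled out and summarised on its own. The lowercase key names are our own labels, and the meaning of the Gender codes is not stated on this slide, so they are kept as numbers.

```python
import statistics

# The seven respondents from the table above: each dict is one respondent (row),
# each key is one variable (column). Key names are illustrative labels.
rows = [
    {"id_no": "001", "age": 36, "gender": 2, "income": 4500, "decay": 2, "missing": 1},
    {"id_no": "002", "age": 45, "gender": 1, "income": 1200, "decay": 7, "missing": 1},
    {"id_no": "003", "age": 33, "gender": 2, "income": 2500, "decay": 1, "missing": 2},
    {"id_no": "004", "age": 41, "gender": 2, "income": 700, "decay": 2, "missing": 3},
    {"id_no": "005", "age": 38, "gender": 1, "income": 1600, "decay": 6, "missing": 2},
    {"id_no": "006", "age": 43, "gender": 1, "income": 1700, "decay": 8, "missing": 1},
    {"id_no": "007", "age": 32, "gender": 2, "income": 2700, "decay": 1, "missing": 1},
]

def column(rows, variable):
    """Return every recorded value of one variable (one column of the table)."""
    return [row[variable] for row in rows]

ages = column(rows, "age")
print(ages)                     # one variable, seven observations
print(statistics.median(ages))  # a univariate summary of that single variable
```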
TYPES OF VARIABLES
Variables
● Categorical (qualitative, non-numerical)
  ○ Nominal (e.g. gender, ethnicity)
  ○ Ordinal (e.g. education level)
● Quantitative (numerical)
  ○ Discrete (counts) (e.g. number of children in a family)
  ○ Continuous (measurements) (e.g. weight, height)
CLASSIFY THESE VARIABLES AS CATEGORICAL (QUALITATIVE) OR QUANTITATIVE
● Number of decayed teeth?
● Net weight (in grams) of a packet of cereal?
● Brand of toothpaste used?
● Type of transportation used by students?
● Knowledge on number of teeth? (item: "The number of teeth in adults is 20 teeth", answered Yes/No)
● Responses on a Likert scale:
  1: Strongly disagree,
  2: Disagree,
  3: Neither agree nor disagree,
  4: Agree,
  5: Strongly agree
INDEPENDENT & DEPENDENT VARIABLE
The variables in a study of a cause-and-effect relationship are called the independent
and dependent variables.
● The independent variable is the cause. Its value is independent of other variables
in your study.
● The dependent variable is the effect. Its value depends on changes in the
independent variable.
Examples of independent and dependent variables
● Research question: Do tomatoes grow fastest under fluorescent, incandescent, or natural light?
  • Independent variable: the type of light the tomato plant is grown under
  • Dependent variable: the rate of growth of the tomato plant
● Research question: What is the effect of diet and regular soda on blood sugar levels?
  • Independent variable: the type of soda you drink (diet or regular)
  • Dependent variable: your blood sugar levels
● Research question: How does phone use before bedtime affect sleep?
  • Independent variable: the amount of phone use before bed
  • Dependent variables: number of hours of sleep; quality of sleep
● Research question: How well do different plant species tolerate salt water?
  • Independent variable: the amount of salt added to the plants' water
  • Dependent variables: plant growth; plant wilting; plant survival rate
VARIABLE ANALYSIS
● What's the difference between univariate, bivariate and multivariate descriptive statistics?
● Univariate statistics summarize only one variable at a time.
● Bivariate statistics compare two variables.
● Multivariate statistics compare more than two variables.
What are descriptive statistics for?
In any study, before answering the research questions, we should introduce the characteristics (e.g. age, gender, ethnicity) of the study respondents.
To do this, we use descriptive statistics.
Example
Title: Prevalence of, and factors associated with periodontal disease in District
A, Kelantan
Obj. 1: Prevalence of periodontal (PO) disease?
Obj. 2: Association between age and PO?
Obj. 3: Association between family income and PO?
Obj. 4: Association between knowledge and PO?

3. Result
3.1 Descriptive statistics of the respondents
3.2 Prevalence of PO
3.3 Association between age and PO
3.4 Association between family income and PO
3.5 Association between knowledge and PO
Contents
● Descriptive Statistics for Qualitative (Categorical) Variables

● Descriptive Statistics for Quantitative (Numerical) Variables


How to describe a categorical variable?
Statistics
● Frequency
● Relative frequency
● Cumulative relative frequency

Figure/Chart
● Bar
● Pie
Frequency Table
(Percent = relative frequency; Cum. % = cumulative relative frequency)

Ethnicity   Frequency   Percent   Cum. %
Malay       478         79.7      79.7
Chinese     65          10.8      90.5
Indian      51          8.5       99.0
Others      6           1.0       100.0
Total       600         100.0
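The three columns of the frequency table can be derived from the raw counts. Relative frequency is each count divided by the total. Cumulative relative frequency is the running total divided by the total. A minimal sketch using the ethnicity counts above (the function name is our own):

```python
# Ethnicity counts from the frequency table above (n = 600)
counts = {"Malay": 478, "Chinese": 65, "Indian": 51, "Others": 6}

def frequency_table(counts, order):
    """Return rows of (category, frequency, percent, cumulative percent) and the total."""
    total = sum(counts.values())
    rows, running = [], 0
    for category in order:
        freq = counts[category]
        running += freq  # cumulative frequency up to and including this category
        rows.append((category, freq,
                     round(100 * freq / total, 1),      # relative frequency (%)
                     round(100 * running / total, 1)))  # cumulative relative frequency (%)
    return rows, total

table, total = frequency_table(counts, ["Malay", "Chinese", "Indian", "Others"])
for category, freq, pct, cum in table:
    print(f"{category:<8} {freq:>4} {pct:>6.1f} {cum:>6.1f}")
print(f"{'Total':<8} {total:>4} {100.0:>6.1f}")
```

The cumulative percentage is computed from the running count rather than by adding rounded percentages. This avoids rounding drift in longer tables.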


Bar Chart
Figure 1. Distribution of Ethnicity
[Bar chart: frequency (y-axis) by ethnicity (x-axis): Malay 478, Chinese 65, Indian 51, Others 6]
Pie Chart
[Pie chart: Malay 79%, Chinese 11%, Indian 9%, Others 1%]
Figure 1. Ethnicity
How to describe a numerical variable?
Statistics
● Central tendency
● Dispersion

Figure/Chart
● Histogram/Frequency polygon
● Box plot
Central tendency
● Mean
● Median (50th Percentile)

Dispersion
● Standard deviation (SD) / Variance
● Inter-quartile range (IQR) (3rd quartile- 1st quartile)
● Range (Maximum – Minimum)
Central tendency
● Mean (arithmetic mean or average)
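The formula for the mean does not survive in the extracted slide. For reference, this is the standard definition of the arithmetic mean of n values, worked for the seven values 16, 5, 13, 8, 10, 12, 6:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\text{e.g.}\qquad
\bar{x} = \frac{16 + 5 + 13 + 8 + 10 + 12 + 6}{7} = \frac{70}{7} = 10
```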
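Central tendency
● Median
  ○ The 50th percentile of a set of measurements

Odd number of values (n = 7): 16 5 13 8 10 12 6
Sorted: 5 6 8 10 12 13 16 → median = 10

Even number of values (n = 6): 5 13 8 10 12 6
Sorted: 5 6 8 10 12 13 → median = (8 + 10)/2 = 9

[Figure: scale marked Min., 25th percentile (1st quartile), 50th percentile (2nd quartile = median), 75th percentile (3rd quartile), Max.]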
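The two worked medians above follow one rule. Sort the values and take the middle one; when n is even, average the two middle values. A minimal sketch of that rule, cross-checked against Python's standard `statistics.median`:

```python
import statistics

def median(values):
    """Median: middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values

odd = [16, 5, 13, 8, 10, 12, 6]  # n = 7
even = [5, 13, 8, 10, 12, 6]     # n = 6

print(median(odd), median(even))  # prints: 10 9.0

# Cross-check against the standard library implementation
assert median(odd) == statistics.median(odd)
assert median(even) == statistics.median(even)
```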
Dispersion (variation)
● Range
  ○ The distance between the maximum and the minimum
  ○ Range = Maximum – Minimum

[Figure: the range spans the whole scale from Min. to Max.; scale marked Min., 25th percentile (1st quartile), 50th percentile (median), 75th percentile (3rd quartile), Max.]
Dispersion (variation)
● Interquartile range (IQR)
  ○ The distance between the 1st quartile and the 3rd quartile
  ○ IQR = 3rd quartile – 1st quartile

[Figure: the IQR spans from the 25th percentile (1st quartile) to the 75th percentile (3rd quartile) on the same Min. to Max. scale]
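Range and IQR both come from the five points marked on the scale above: minimum, 1st quartile, median, 3rd quartile and maximum. The sketch below computes them for the seven values from the median example. One caveat: there are several conventions for computing quartiles. `statistics.quantiles` defaults to its "exclusive" method, and other packages (or hand calculation) may give slightly different Q1/Q3 for the same data.

```python
import statistics

data = [16, 5, 13, 8, 10, 12, 6]

ordered = sorted(data)
minimum, maximum = ordered[0], ordered[-1]

# n=4 returns the three cut points Q1, Q2 (median) and Q3.
# The default method="exclusive" is one of several quartile conventions.
q1, q2, q3 = statistics.quantiles(data, n=4)

print("Five-number summary:", minimum, q1, q2, q3, maximum)
print("Range =", maximum - minimum)  # Maximum - Minimum
print("IQR   =", q3 - q1)            # 3rd quartile - 1st quartile
```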
Dispersion (variation)
● Variance (s²)
● Standard deviation (s)
  ○ Measures the amount of variability or spread about the mean of a sample

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Example: 5 6 8 10 12 14 15 (mean = 10)
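Below, the standard deviation formula is written step by step for the example values 5, 6, 8, 10, 12, 14, 15. The deviations from the mean of 10 are -5, -4, -2, 0, 2, 4, 5. Their squares sum to 90, so the variance is 90/6 = 15 and the SD is √15 ≈ 3.87. The result is cross-checked against `statistics.stdev`, which also uses the n - 1 denominator.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: s = sqrt(sum((x_i - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / (n - 1)  # sample variance s^2
    return math.sqrt(variance)

data = [5, 6, 8, 10, 12, 14, 15]  # mean = 10
s = sample_sd(data)
print(round(s ** 2, 2), round(s, 2))  # prints: 15.0 3.87

assert math.isclose(s, statistics.stdev(data))
```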
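Why do we need both 'central tendency' and 'dispersion' to describe a numerical variable?

Example (age)

A: 11 12 13 14 15 16 17 18 19    Mean = 15.0, SD = 2.7
B: 7 9 11 13 15 17 19 21 23      Mean = 15.0, SD = 5.5

Both groups have the same mean, but group B is more spread out, so the mean alone cannot tell them apart.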
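The summary values for groups A and B can be reproduced as follows. The means are identical and only the sample SD (n - 1 denominator) separates the two groups.

```python
import statistics

group_a = [11, 12, 13, 14, 15, 16, 17, 18, 19]
group_b = [7, 9, 11, 13, 15, 17, 19, 21, 23]

for name, ages in (("A", group_a), ("B", group_b)):
    mean = statistics.mean(ages)
    sd = statistics.stdev(ages)  # sample SD, n - 1 denominator
    print(f"Group {name}: mean = {mean:.1f}, SD = {sd:.1f}")
```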
Histogram / Frequency Polygon
[Figure: histogram of age of respondents; x-axis: age in years (0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, >50); y-axis: frequency]

Histogram / Frequency Polygon
[Figure: the same histogram of age of respondents with a frequency polygon overlaid; x-axis: age in years; y-axis: frequency]
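A histogram of a numerical variable starts from a grouped frequency count. Each value goes into a class interval (here 5 years wide, as on the age axis), and the values in each class are counted. The sketch below uses the seven ages from the data-entry example in this document. The text "bars" only illustrate the shape. The open-ended ">50" class is not handled by this simple helper.

```python
from collections import Counter

# Ages of the seven respondents in the data-entry example of this document
ages = [22, 25, 30, 35, 21, 27, 24]

def age_class(age, width=5):
    """Label the class interval an age falls in, e.g. 22 -> '20-24' for width 5."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

frequencies = Counter(age_class(a) for a in ages)

# Print classes in ascending order with one '#' per respondent
for label in sorted(frequencies, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>6} | {'#' * frequencies[label]}")
```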
Box plot
Histogram, Normal Distribution
[Figure: three histograms: reasonably (approximately) normally distributed; skewed to the right (positively skewed); skewed to the left (negatively skewed)]
● If the distribution is normally distributed, use Mean (SD)
● If the distribution is skewed, use Median (IQR)
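The choice between Mean (SD) and Median (IQR) is usually made by inspecting the histogram. As a rough numerical aid, one can compute the moment coefficient of skewness. The sketch below treats |skewness| > 1 as "skewed". That cut-off is a hypothetical rule of thumb chosen for illustration, not a rule from these slides, and it does not replace looking at the plot. The function names are our own.

```python
import statistics

def sample_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (0 for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

def summarise(values, threshold=1.0):
    """Report Mean (SD) if roughly symmetric, else Median (IQR). threshold is an assumed cut-off."""
    if abs(sample_skewness(values)) <= threshold:
        return f"Mean (SD) = {statistics.mean(values):.1f} ({statistics.stdev(values):.1f})"
    q1, _, q3 = statistics.quantiles(values, n=4)
    return f"Median (IQR) = {statistics.median(values):.1f} ({q3 - q1:.1f})"

print(summarise([11, 12, 13, 14, 15, 16, 17, 18, 19]))  # symmetric data
print(summarise([1, 1, 1, 2, 2, 3, 20]))                # one large value: right-skewed
```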
Properties of normal distribution
● Bell-shaped curve
● Symmetrical about its mean (each side is a mirror image of the other)
  ○ Mean and median are equal.
  ○ Each side of the mean contains 50% of the area.
  ○ The area between mean - 1 SD and mean + 1 SD is 68% (mean ± 1 SD).
  ○ The area between mean - 2 SD and mean + 2 SD is 95% (mean ± 2 SD).
  ○ The area between mean - 3 SD and mean + 3 SD is 99.7% (mean ± 3 SD).
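The 68% / 95% / 99.7% figures can be checked numerically with the standard library's `statistics.NormalDist`. The area within k SDs of the mean is the cumulative probability at +k minus that at -k. The exact 2 SD value is about 95.4%, which the slide rounds to 95%.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    # Area under the curve between mean - k SD and mean + k SD
    area = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"Mean ± {k} SD covers {area:.1%} of the area")
```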
3. Result

3.1 Descriptive statistics of study sample

Of the 102 selected residents, 100 agreed to participate (98% response rate). Details of the descriptive statistics of the study respondents are shown in Table 3.1.

Table 3.1: Descriptive statistics of study sample (100 respondents)

Variable               Mean (SD)     Median (IQR)     Freq. (%)
Age (year)             40.6 (5.36)   -                -
Family income (RM)     -             2500 (2000)a     -
Gender
  Male                                                48 (48.0)
  Female                                              52 (52.0)
Ethnicity
  Malay                                               80 (80.0)
  Chinese                                             15 (15.0)
  Indian                                              4 (4.0)
  Others                                              1 (1.0)
Education level
  No schooling                                        34 (34.0)
  Primary school                                      32 (32.0)
  Secondary/higher                                    34 (34.0)

SD = Standard Deviation; IQR = Interquartile Range; Freq. = Frequency
a The distribution is skewed to the right.
DATA ENTRY
● Variable names should start with a letter and use lowercase letters
● No spaces; use an underscore instead
● Avoid symbols
● Preferably use Microsoft Excel

Coding:
Gender: 1 = Male, 2 = Female
Ethnicity: 1 = Malay, 2 = Chinese, 3 = Indian, 4 = Others
Smoking: 0 = No response, 1 = Yes, 2 = No

id_no   Gender   Ethnicity   Age   Smoking_
001     1        1           22    1
002     2        1           25    2
003     1        3           30    2
004     2        2           35    2
005     2        4           21    2
006     1        1           27    1
007     1        2           24    1
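Once data are entered as numeric codes, the codebook is what turns them back into labels for frequency tables. A minimal sketch follows, assuming the slide's data were exported to CSV. The column names `gender`, `ethnic` and `smoking` are hypothetical, chosen to follow the naming rules above; only `id_no` appears on the slide. `id_no` is read as text so that leading zeros such as "001" are preserved.

```python
import csv
import io
from collections import Counter

# The coded data from the slide as CSV (column names other than id_no are hypothetical)
raw = """id_no,gender,ethnic,age,smoking
001,1,1,22,1
002,2,1,25,2
003,1,3,30,2
004,2,2,35,2
005,2,4,21,2
006,1,1,27,1
007,1,2,24,1
"""

# Codebook: numeric code (as read from the file) -> label, for each coded variable
codebook = {
    "gender": {"1": "Male", "2": "Female"},
    "ethnic": {"1": "Malay", "2": "Chinese", "3": "Indian", "4": "Others"},
    "smoking": {"0": "No response", "1": "Yes", "2": "No"},
}

records = list(csv.DictReader(io.StringIO(raw)))  # every field is read as a string
for variable, labels in codebook.items():
    counts = Counter(labels[row[variable]] for row in records)  # decode, then count
    print(variable, dict(counts))
```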
Example :
Objectives :
1. To compare duration of exercise between obese and non-obese
population
Dependent variable = ?
Independent variable = ?, how many?
2. To compare duration of exercise between groups of patients with different
levels of education
Dependent variable = ?
Independent variable =?, how many?
ASSUMPTIONS
● In statistical analysis, every parametric test assumes certain characteristics of the data; these are known as its assumptions. Violating these assumptions can change the conclusions of the research and the interpretation of the results. Therefore all research, whether for a journal article, thesis, or dissertation, should check that these assumptions are met so that the results are interpreted accurately. The assumptions vary depending on the parametric analysis.
CHOOSING THE CORRECT STATISTICAL TEST
(Outcome: numerical data)

Number of groups                       Parametric test          Non-parametric test
Two (independent), categorical         Independent t-test       Mann-Whitney test
  (e.g. smokers and non-smokers)
> Two (independent), categorical       One-way ANOVA            Kruskal-Wallis test
  (e.g. Malay, Chinese, Indian)
Two (dependent), categorical           Paired t-test            Wilcoxon Signed Rank Test
  (e.g. pre and post intervention)
Two, numerical                         Pearson's correlation    Spearman's correlation
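To show what the independent t-test compares, here is a sketch of the pooled (equal-variance, Student's) t statistic. It divides the difference in group means by its standard error. The exercise durations are hypothetical numbers invented for illustration, and the function name is our own. The p-value would then come from a t table with n1 + n2 - 2 degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=True)` returns both the statistic and the p-value.

```python
import math
import statistics

def pooled_t_statistic(group1, group2):
    """Independent-samples t statistic (equal variances assumed) and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variances
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the mean difference
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, n1 + n2 - 2

# Hypothetical exercise duration (minutes per day), for illustration only
obese = [20, 25, 30, 35, 40]
non_obese = [30, 35, 40, 45, 50]

t, df = pooled_t_statistic(obese, non_obese)
print(f"t = {t:.2f}, df = {df}")
```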
CHOOSING THE CORRECT STATISTICAL TEST
(Non-parametric tests)

Number of groups                       Assumptions met      Assumptions not met
Two (independent), categorical         Chi-square test      Fisher's exact test
  (e.g. smokers and non-smokers)
> Two (independent), categorical       Chi-square test      Fisher's exact test
  (e.g. Malay, Chinese, Indian)
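The chi-square test compares observed counts in a contingency table with the counts expected if the two variables were unrelated. The sketch below computes the uncorrected Pearson chi-square for a 2x2 table and its df = 1 p-value, using the identity P(χ² > x) = erfc(√(x/2)). The counts are hypothetical. The "any expected count below 5" check is a commonly quoted rule of thumb for preferring Fisher's exact test, used here as an assumption. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic will differ from this uncorrected one. `scipy.stats.fisher_exact` performs Fisher's test.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under no association: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper-tail probability, df = 1
    return chi2, p_value, expected

# Hypothetical counts: rows = smokers / non-smokers, columns = PO yes / PO no
observed = [[20, 30], [10, 40]]
chi2, p, expected = chi_square_2x2(observed)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

# Assumed rule of thumb: any expected count below 5 -> prefer Fisher's exact test
smallest = min(min(row) for row in expected)
print("Use Fisher's exact test" if smallest < 5 else "Chi-square assumptions look reasonable")
```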
Example:
