
Lecture 2

The document provides an overview of variables in statistics, categorizing them into quantitative (numerical) and qualitative (categorical) types. It explains the importance of independent, dependent, and control variables in research, as well as different measurement scales such as nominal, ordinal, interval, and ratio. Additionally, it discusses methods for organizing and displaying data, including frequency distributions and graphical representations like histograms and bar charts.

Uploaded by

Peter Parker
Copyright
© All Rights Reserved

VARIABLE

If, as we observe a characteristic, we find that it takes on different values in different persons, places, or things, we label the characteristic a variable.
TYPES OF VARIABLES

The description of the characteristic of interest may result in a numeric or non-numeric value.
There are two main types of variables in statistics, i.e. quantitative (numerical) and qualitative (categorical).
1. QUANTITATIVE VARIABLES
A measurement or count is required to describe the characteristic
of interest
• i.e. described by numeric values
• Examples: age, weight, height and systolic blood pressure of
patients in a clinic
Variables like those given in the example are called random variables,
• because we cannot predetermine their values (i.e. the value arises purely by chance).
Quantitative variables can be further described as discrete random variables or continuous random variables.

Discrete variables result from counts.
• They are characterised by gaps or interruptions in the values they can assume.
• The gaps indicate the absence of values between the particular values that the variable can assume.
• E.g. the number of live births for a woman can only be presented as whole numbers, i.e. 0, 1, 2, 3 etc.
A continuous variable can assume any numerical value within a specified relevant interval of values that can be assumed by the variable.
• It usually results from making measurements.
• E.g. weight, height, mid-arm circumference etc.
2. QUALITATIVE VARIABLES
A variable for which the description of the characteristic of
interest results in a non-numerical value.
Variables are categorised in groups based on presence or
absence of a characteristic of interest.
We can count elements belonging to the categories.
Examples: Residence, sex, level of education, eye colour etc.
TYPES OF STATISTICAL DATA
FAMILIES/CATEGORIES OF VARIABLES
Variables play a crucial role in formulating research
questions, designing experiments, collecting data, and
drawing conclusions.
They are broadly categorized into independent variables, dependent variables, and control variables.
INDEPENDENT VARIABLE (IV):
• The independent variable is the factor that researchers manipulate or control in an experiment.
• It is also referred to as the "cause" variable, as changes in the independent variable are believed to directly influence the dependent variable.
• For example, in a study examining the effect of different study techniques on exam performance, the study technique employed is the independent variable.
DEPENDENT VARIABLE (DV):
• The dependent variable is the outcome or response that researchers measure to determine the effects of the independent variable.
• It is also referred to as the "effect" variable, as it is expected to change based on variations in the independent variable.
• Continuing with the previous example, the exam performance of the participants is the dependent variable being measured.
CONTROL VARIABLES:
• Control variables are factors that are held constant or controlled in an experiment to prevent them from influencing the relationship between the independent and dependent variables.
• In the study on study techniques and exam performance, factors like the participants' prior knowledge, time spent studying, and the difficulty level of the exam might be controlled.
Measurement scales
TYPES OF MEASUREMENT SCALES
There are four main measurement scales, which result from the fact that measurement may be carried out under different sets of rules.
The four main scales of measurement are:
• Nominal
• Ordinal
• Interval
• Ratio
NOMINAL SCALE
Characterised by naming observations or classifying them into various mutually exclusive and collectively exhaustive categories.
• Consists of names, labels, or categories only.
• Cannot be arranged in an ordering scheme.
• Arithmetic operations (i.e. +, −, × or ÷) are not performed on nominal data.
• It is the lowest level of measurement.

Examples: sex, residence, colour of eyes, race, ethnicity, religion, blood group etc.
ORDINAL SCALE
Applies to observations that can be arranged in some order or ranked, but the differences between data values either cannot be determined or are meaningless.

Examples: intelligence level, level of income, level of pain etc.


INTERVAL SCALE
Applies to data that can be arranged in some order and for which differences between data values are meaningful (i.e. can be calculated and interpreted).

This is a more sophisticated scale, and it is a truly quantitative scale.

The value zero is arbitrary:
• it does not imply an absence of the characteristic being measured (e.g. for temperature in degrees Celsius, 0 degrees does not mean an absence of heat).

Examples: temperature, test scores, intelligence quotient (IQ) etc.


RATIO SCALE
Data can be arranged in an ordering scheme (ranked), and both differences and ratios between values can be calculated and interpreted.

It is the highest level of measurement.

All arithmetic operations, including division, can be performed on ratio data (except division by zero).

Ratio-level data have an absolute zero: a value of zero indicates a complete absence of the characteristic of interest.

Examples: weight, height, length etc.
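To see why the position of zero matters, the short Python sketch below compares a ratio computed on an interval scale (temperature in °C and °F) with one computed on a ratio scale (weight in kg and lb). The conversion formulas are standard. The values 10 and 20 are arbitrary illustrations and are not data from this lecture.

```python
# Compare "how many times larger" on an interval scale vs a ratio scale.

def celsius_to_fahrenheit(c: float) -> float:
    """Interval scale: the zero point is arbitrary, so conversion adds an offset."""
    return c * 9 / 5 + 32

def kg_to_lb(kg: float) -> float:
    """Ratio scale: a true zero, so conversion only multiplies (1 kg is about 2.20462 lb)."""
    return kg * 2.20462

# Interval scale: the ratio changes with the unit, so "twice as hot" is not meaningful.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: the ratio is the same in any unit, so "twice as heavy" is meaningful.
ratio_kg = 20 / 10
ratio_lb = kg_to_lb(20) / kg_to_lb(10)

print(f"Temperature ratio: {ratio_c:.2f} in Celsius vs {ratio_f:.2f} in Fahrenheit")
print(f"Weight ratio:      {ratio_kg:.2f} in kg vs {ratio_lb:.2f} in lb")
```

The temperature ratio changes from 2.00 to 1.36 when the unit changes. The weight ratio stays at 2.00, which is why division is only meaningful on ratio-scale data.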


WHAT SCALES OF MEASUREMENT ARE USED FOR THE VARIABLES IN THIS DATA SET?
ORGANISING DATA
Organising data is an important first step in biostatistics,
• before getting into sophisticated data analysis.
It involves making data summaries and displays, which are useful for:
• familiarising yourself with the data
• identifying unusual values or errors
Techniques for organising and displaying data may be:
• in text
• tabular or
• graphical
EXAMPLE
Given data (i.e. name and age) of 15 students in a class, presented in alphabetical order:

Agnes. B     30      Mark. L      45
Andrew. K    33      Martha. S    32
Anne. G      28      Mary. D      39
Ben. T       22      Mercy. J     42
Henry. P     22      Michael. I   39
Irene. S     44      Peter. N     37
Jane. J      27      Pretty. M    24
John. C      20

It may be very difficult to draw any useful conclusions from these data.
We may classify the data according to the sex of the students:

Male                 Female
John. C      20      Mary. D      39
Henry. P     22      Martha. S    32
Ben. T       22      Agnes. B     30
Peter. N     37      Mercy. J     42
Andrew. K    33      Pretty. M    24
Mark. L      45      Jane. J      27
Michael. I   39      Anne. G      28
Irene. S     44
Or we may classify the data according to age groups:

Age between 20 and 29   Age between 30 and 39   Age over 40
John. C      20         Peter. N     37         Mark. L      45
Henry. P     22         Andrew. K    33         Mercy. J     42
Ben. T       22         Michael. I   39         Irene. S     44
Pretty. M    24         Mary. D      39
Jane. J      27         Martha. S    32
Anne. G      28         Agnes. B     30
We still cannot draw any conclusions from the data because they are about individuals.

In biostatistics our interest is in groups of data and not individuals.

We need to transform the data into information about the group, and we can do this by tabulation (see Table 1).
TABLE 1: STUDENTS BY AGE AND SEX

               Sex
Age group      Male   Female   Total
20 – 29        3      3        6
30 – 39        3      3        6
Over 40        2      1        3
Total          8      7        15
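The tabulation step can be sketched in Python as a cross-tabulation of the 15 records. This is an illustrative sketch only. The `students` list and the `age_group` helper are hypothetical names, and each student's sex is taken from the male/female listing above, which places Irene. S in the male column. The printed counts can be compared with Table 1.

```python
from collections import Counter

# The 15 students from the example: (name, age, sex).
# Sex follows the lecture's male/female listing (Irene. S is listed under "Male").
students = [
    ("Agnes. B", 30, "Female"), ("Andrew. K", 33, "Male"), ("Anne. G", 28, "Female"),
    ("Ben. T", 22, "Male"), ("Henry. P", 22, "Male"), ("Irene. S", 44, "Male"),
    ("Jane. J", 27, "Female"), ("John. C", 20, "Male"), ("Mark. L", 45, "Male"),
    ("Martha. S", 32, "Female"), ("Mary. D", 39, "Female"), ("Mercy. J", 42, "Female"),
    ("Michael. I", 39, "Male"), ("Peter. N", 37, "Male"), ("Pretty. M", 24, "Female"),
]

def age_group(age: int) -> str:
    """Map an age to the age groups used in the lecture (hypothetical helper)."""
    if 20 <= age <= 29:
        return "20 - 29"
    if 30 <= age <= 39:
        return "30 - 39"
    if age >= 40:
        return "Over 40"
    raise ValueError(f"age {age} is outside the lecture's age groups")

# Count the students falling in each (age group, sex) cell of the table.
counts = Counter((age_group(age), sex) for _, age, sex in students)

groups = ["20 - 29", "30 - 39", "Over 40"]
print(f"{'Age group':<10}{'Male':>6}{'Female':>8}{'Total':>7}")
for g in groups:
    m, f = counts[(g, "Male")], counts[(g, "Female")]
    print(f"{g:<10}{m:>6}{f:>8}{m + f:>7}")

# Column (marginal) totals.
male_total = sum(counts[(g, "Male")] for g in groups)
female_total = sum(counts[(g, "Female")] for g in groups)
print(f"{'Total':<10}{male_total:>6}{female_total:>8}{male_total + female_total:>7}")
```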
TABLE FEATURES
[Figure: an annotated table showing its parts: table number, table title, column header, row header, row classifier, and body]
FREQUENCY TABLES
A frequency distribution is a tabulation that shows the number of times (i.e. the frequency) each different value occurs.
• For example, the following figures are the times (in minutes) taken by a laboratory technologist to perform a given repetitive task on 20 specified occasions during the working day:

3.5  3.8  3.8  3.4  3.6
3.6  3.8  3.9  3.7  3.5
3.4  3.7  3.6  3.8  3.6
3.7  3.7  3.7  3.5  3.9

If we now assemble and tabulate these figures, we obtain a frequency distribution as follows (Table 2).
TABLE 2: TIME TAKEN FOR A TECHNOLOGIST TO PERFORM A TASK

Length of time   Frequency   Cumulative   Relative    Cumulative relative
(minutes)                    frequency    frequency   frequency
3.4              2           2            0.10        0.10
3.5              3           5            0.15        0.25
3.6              4           9            0.20        0.45
3.7              5           14           0.25        0.70
3.8              4           18           0.20        0.90
3.9              2           20           0.10        1.00
Total            20                       1.00
FREQUENCY TABLES
Cumulative frequency is useful when we want to know the number of items below or equal to an individual value or category.

The de-cumulative frequency is useful if we are interested in the number of items above or equal to an individual value or category.

The relative frequency is the frequency of an individual value or category expressed as a proportion (or percentage) of the total frequency.

The cumulative relative frequency is derived in the same way as the cumulative frequency, and shows the proportion of items below or equal to the value or category of interest.
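All four quantities can be computed directly from the 20 recorded times. The Python sketch below is one way to do it, using the standard-library `collections.Counter`. The variable names are illustrative. It also adds a de-cumulative column, which Table 2 does not show but which is described above.

```python
from collections import Counter

# Times (minutes) taken by the technologist on 20 occasions, as listed above.
times = [3.5, 3.8, 3.8, 3.4, 3.6,
         3.6, 3.8, 3.9, 3.7, 3.5,
         3.4, 3.7, 3.6, 3.8, 3.6,
         3.7, 3.7, 3.7, 3.5, 3.9]

n = len(times)
freq = Counter(times)  # frequency of each distinct value

rows = []
cumulative = 0
for value in sorted(freq):
    cumulative += freq[value]
    rows.append({
        "value": value,
        "frequency": freq[value],
        "cumulative": cumulative,                      # items <= value
        "decumulative": n - cumulative + freq[value],  # items >= value
        "relative": freq[value] / n,                   # share of the total
        "cumulative_relative": cumulative / n,         # share of items <= value
    })

print("Value  Freq  Cum  De-cum   Rel  Cum rel")
for r in rows:
    print(f"{r['value']:<6} {r['frequency']:>4} {r['cumulative']:>4} {r['decumulative']:>6}"
          f" {r['relative']:>5.2f} {r['cumulative_relative']:>7.2f}")
```

The frequency, cumulative, relative and cumulative relative columns it prints should match Table 2.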
FREQUENCY DISTRIBUTION:
Definition: A frequency distribution is a table that displays
the frequency of various outcomes in a dataset.
It shows how often each different value appears in a dataset.
Grouping data into ordered arrays makes observations more
comprehensible and meaningful.
While computers can now calculate descriptive measures
directly from large data sets, grouping primarily serves
summarization purposes.
This process helps in understanding data but may also
reduce specificity.
Data are grouped into non-overlapping class intervals.
The number of intervals is crucial:
too few intervals cause loss of information, while too many
fail to summarize.
Generally, five to fifteen intervals are recommended.
Sturges's formula offers guidance on the number of intervals, k, for a data set of n observations, but adjustments can be made for clarity:

$$k = 1 + 3.322\,\log_{10}(n)$$

Symbolically, the class interval width is given by

$$w = \frac{R}{k}$$

where R (the range) is the difference between the largest and the smallest observation in the data set, and k is defined as above.
As a rule, this procedure yields a width that may be inconvenient to use.
Again, we may exercise our good judgment and select a width (usually close to the one given by w = R/k) that is more convenient.
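Both formulas can be checked with a few lines of Python. This is a minimal sketch, not a prescribed procedure, and it rests on three assumptions. The function names are hypothetical. The data are the ages of the 15 students from the earlier example. Rounding k to the nearest whole number is a judgment call, and some texts round up instead.

```python
import math

def sturges_classes(n: int) -> float:
    """Sturges's formula, k = 1 + 3.322 * log10(n), before any rounding."""
    return 1 + 3.322 * math.log10(n)

def class_width(data: list[float], k: int) -> float:
    """Class width w = R / k, where R is the range (largest minus smallest value)."""
    r = max(data) - min(data)
    return r / k

# Ages of the 15 students from the earlier example.
ages = [30, 33, 28, 22, 22, 44, 27, 20, 45, 32, 39, 42, 39, 37, 24]

k_exact = sturges_classes(len(ages))
k = round(k_exact)  # assumption: round to the nearest whole number of classes
w = class_width(ages, k)
print(f"n = {len(ages)}, k = {k_exact:.2f} -> {k} classes, w = {w}")
```

Here k comes out at about 4.91, which rounds to 5 classes. The range is 45 − 20 = 25, so w = 25/5 = 5, which happens to be a convenient width already.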
How to create a frequency distribution:
• Identify the Range: Determine the range of values in the
dataset.
• Determine the Number of Classes or Intervals: Decide on
the number of classes (groups) to divide the data.
• Calculate the Class Width: Divide the range by the number
of classes to determine the width of each class.
• Create Classes: Divide the range of values into intervals based on the class width.
• Count Frequencies: Count the number of data points falling
within each class interval.
• Construct the Frequency Table: Create a table showing the
classes and their corresponding frequencies.
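The steps above can be combined into one small function. This is a hypothetical sketch, not a standard library routine. It assumes equal-width, half-open classes (lower limit included, upper limit excluded), with the last class also including the maximum. It uses Sturges's formula, rounded, as the default number of classes. In practice you would often round the width to a more convenient value, as the lecture suggests.

```python
import math

def frequency_distribution(data, k=None):
    """Group data into k equal-width, non-overlapping classes and count frequencies.

    Classes are [lower, upper), except the last, which also includes the maximum.
    If k is not given, Sturges's formula (rounded) is used as a starting point.
    """
    lo, hi = min(data), max(data)                    # Step 1: identify the range
    if hi == lo:                                     # all values identical: one class
        return [(lo, hi, len(data))]
    if k is None:                                    # Step 2: number of classes
        k = round(1 + 3.322 * math.log10(len(data)))
    width = (hi - lo) / k                            # Step 3: class width
    counts = [0] * k
    for x in data:                                   # Step 5: count frequencies
        i = int((x - lo) / width)
        counts[min(i, k - 1)] += 1                   # the maximum goes in the last class
    # Steps 4 and 6: build the classes and pair each with its frequency
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(k)]

# Ages of the 15 students from the earlier example.
ages = [30, 33, 28, 22, 22, 44, 27, 20, 45, 32, 39, 42, 39, 37, 24]
table = frequency_distribution(ages)
for lower, upper, f in table:
    print(f"{lower:g} - {upper:g}: {f}")
```

For the ages, this gives five classes of width 5 starting at 20, with frequencies 4, 2, 3, 3 and 3.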
GRAPHICAL REPRESENTATIONS
Graphical methods are used to summarize data visually.
Common graphical representations include histograms, bar
charts, pie charts, and box plots.
Purpose: Graphical summarization helps in quickly
identifying patterns, trends, and distributions within the data.
It provides a visual understanding of the dataset.
GRAPHICAL PRESENTATION OF DATA
• Pie chart
• Bar graph
• Histogram
• Frequency polygon
• Scatter diagrams
• Shapes of frequency distributions
GRAPHICAL REPRESENTATION
• Bar charts: used for categorical data.
• Histograms: used for continuous data, showing the frequency distribution.
• Box plots: show the distribution of data based on the five-number summary (minimum, first quartile, median, third quartile, maximum).
• Scatter plots: used to examine relationships between two continuous variables.
• Line graphs: display trends over time for continuous data.
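The five-number summary behind a box plot can be computed with Python's standard `statistics` module. This is a sketch under a stated assumption: quartiles come from `statistics.quantiles` with its default "exclusive" method. That is one of several quartile conventions, so other software may report slightly different Q1 and Q3. The data are again the ages of the 15 students.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum), the values a box plot is drawn from."""
    # quantiles(..., n=4) returns the three cut points that split the data into quarters;
    # the middle cut point equals the median.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)

# Ages of the 15 students from the earlier example.
ages = [30, 33, 28, 22, 22, 44, 27, 20, 45, 32, 39, 42, 39, 37, 24]
summary = five_number_summary(ages)
print(summary)  # (20, 24.0, 32.0, 39.0, 45)
```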
LINE GRAPH
[Figure: example of a line graph]
BOX AND WHISKER PLOT
[Figure: example of a box-and-whisker plot]
PIE CHART

FIGURE 1: PERCENTAGE DISTRIBUTION OF HOUSEHOLDS BY MAIN SOURCE OF LIGHTING ENERGY IN RURAL ZAMBIA

A pie chart is a type of graph that displays data as a circle divided into slices. The size of each slice is proportional to the fraction of the whole that its category represents. In other words, each slice of the pie is relative to the size of that category in the group as a whole.
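Each slice's angle is its category's share of the total multiplied by 360 degrees. The Python sketch below illustrates this. The data behind Figure 1 are not available here, so it uses the age-group totals of the 15 students (6, 6 and 3) purely as an illustration. `pie_slices` is a hypothetical helper name.

```python
def pie_slices(counts: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Return (percentage, angle in degrees) for each category of a pie chart."""
    total = sum(counts.values())
    return {
        # each slice's share of the circle equals the category's share of the total
        category: (100 * c / total, 360 * c / total)
        for category, c in counts.items()
    }

age_groups = {"20 - 29": 6, "30 - 39": 6, "Over 40": 3}
slices = pie_slices(age_groups)
for category, (pct, angle) in slices.items():
    print(f"{category:<8} {pct:5.1f}%  {angle:6.1f} degrees")
```

The groups take 40%, 40% and 20% of the students, so their slices span 144, 144 and 72 degrees.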
BAR GRAPH/CHART

Figure 2: Education level of pregnant women tested for HIV during antenatal care visit

A bar graph (also known as a bar chart or bar diagram) is a visual tool
that uses bars to compare data among categories. A bar graph may
run horizontally or vertically. The important thing to know is that the
longer the bar, the greater its value.
HISTOGRAM

A histogram is a graphical representation of the distribution of quantitative (numerical) data, especially continuous data. It was first introduced by Karl Pearson.
Histograms plot data whose ranges are grouped into intervals called bins, while bar charts are used to plot categorical data.
The major visual difference between a bar chart and a histogram is the gaps between bars, which are typical of a bar chart but not of a histogram.
PATTERNS IN HISTOGRAMS
[Figure: common patterns (shapes) of histograms]
FREQUENCY POLYGON

A frequency polygon is a graph constructed by using lines to join the midpoints of each interval, or bin. The heights of the points represent the frequencies. A frequency polygon can be created from the histogram or by calculating the midpoints of the bins from the frequency distribution table.
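The polygon's vertices can be computed from a frequency distribution. The Python sketch below takes each class midpoint as the x-coordinate and the class frequency as the y-coordinate. It rests on two assumptions: the data are the ages of the 15 students, and the class boundaries (width 5, starting at 20, with the last class also including its upper limit) are chosen only for illustration. Many texts also add a zero-frequency point one class width beyond each end to close the polygon; this sketch omits them.

```python
# Ages of the 15 students from the earlier example.
ages = [30, 33, 28, 22, 22, 44, 27, 20, 45, 32, 39, 42, 39, 37, 24]

# Assumed class boundaries: [lower, upper), with the last class also including upper.
bounds = [(20, 25), (25, 30), (30, 35), (35, 40), (40, 45)]

def in_class(x: float, lower: float, upper: float, last: bool) -> bool:
    """True if x falls in the class; the last class is closed on the right."""
    return lower <= x < upper or (last and x == upper)

points = []
for i, (lower, upper) in enumerate(bounds):
    last = i == len(bounds) - 1
    freq = sum(1 for x in ages if in_class(x, lower, upper, last))
    midpoint = (lower + upper) / 2   # x-coordinate of the polygon vertex
    points.append((midpoint, freq))  # y-coordinate is the class frequency

print(points)  # [(22.5, 4), (27.5, 2), (32.5, 3), (37.5, 3), (42.5, 3)]
```

Joining these points with straight lines, in order of midpoint, gives the frequency polygon.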
SCATTER GRAPH

A scatter plot (also called a scatter chart or scatter graph) uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axes indicates the values for an individual data point. Scatter plots are used to observe relationships between variables.
GENERAL CONSIDERATIONS
Graphs and tables must be clearly labelled.
• Table titles are placed above the table, while graph/figure titles are placed at the bottom of the graph/figure.
Graphs and tables should be self-explanatory, even without accompanying text.
• At the same time, they should not be cluttered with too much detail.
Graphs and tables should not be misleading.

Know the type of data you have so that you can select the best type of graph to use.
• E.g. one cannot use a bar graph for age data that are continuous, but could do so if the data are categorical, i.e. age groups.
END OF LECTURE
