Module 2 - Script Tagged
Introduction
In this module, we’ll introduce the basic notation and mathematical operations that
you’ll need in order to be fully prepared to take on the concepts specific to
biostatistics. The module covers the rudiments of biostatistics practice, building a
foundation for both statistical analysis and numerical programming.
Concept 1: Numeracy
Numbers are commonly pictured as points along a Number Line, with negative
numbers to the left of zero and positive numbers to the right:
…. -2 -1 0 +1 +2 ….
If we have a point or a set of points, we show them on the number line with closed dots. For
instance, the set {0, 1, 4} is shown as
[Figure: number line with closed dots at 0, 1, and 4]
Numerical values in applied biostatistics typically fall into three categories: “whole numbers”
(integers), fractions, and decimals.
Note that American notation is to use commas to separate thousands and millions.
Decimals are numbers that include a part of a whole number: 1.5, 98.6, -2.12, 0.05
Note also that American notation uses a period (dot) to mark the decimal point.
Remember that a fraction has two parts: a numerator (on top) and a denominator (on
bottom).
Concept 2: Arithmetic
The foundation for statistics starts with the four Basic Operations: Addition, subtraction,
multiplication and division.
2+2=4
2–2=0
3×2=6
6÷2=3
Remember that division has multiple ways of being written: as a fraction, with a division
symbol, or with a slash:
$\frac{20}{5} = 20 \div 5 = 20/5 = 4$
Multiplication can likewise be written with a dot, with a multiplication symbol, or with one
value in parentheses: 3 · 2 = 3 × 2 = 3(2) = 6
Sometimes, two objects sit right next to each other with no symbol between them, and
multiplication is implied. This is a common shorthand when a numerical value is next to
a variable, e.g. ‘3x’ means “three times x”.
2 + 2 + 2 (=6)
2 + 2 – 5 (=-1)
Often, Compound Expressions are written with parentheses, which indicate Precedence:
that is, which operation should be performed first.
(2 + 2) × 5 = 20
2 + (2 × 5) = 12
You’ll see this again in a minute: operations in parentheses are the first items you
evaluate in an expression.
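As a quick check, the operations above can be tried out in code. The sketch below uses Python purely for illustration (an assumption of this example; the course itself moves on to R), and the value chosen for x is hypothetical. Note that code has no implied multiplication, so ‘3x’ must be written out as 3 * x.

```python
# Basic operations; Python is used here only for illustration.
addition = 2 + 2          # 4
subtraction = 2 - 2       # 0
multiplication = 3 * 2    # 6
division = 6 / 2          # 3.0 (Python's "/" always returns a decimal value)

# Implied multiplication such as "3x" must be written explicitly in code.
x = 5                     # hypothetical value for x
three_x = 3 * x

# Parentheses decide which operation is performed first.
grouped_first = (2 + 2) * 5     # 20
grouped_second = 2 + (2 * 5)    # 12

print(addition, subtraction, multiplication, division)
print(three_x, grouped_first, grouped_second)
```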
Concept 3: Roots, Exponents and Scientific Notation
Several key concepts in statistics involve powers, in which a number is multiplied by itself.
The number being multiplied is the base, and the exponent, written as a superscript above and
to the right of the base, indicates how many times the base appears in the product.
$3^2 = 3 \times 3 = 9$
A fractional exponent denotes a “root”; the most common root is the square root.
$16^{1/2} = \sqrt{16} = 4$
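The same two calculations can be sketched in Python (chosen here only for illustration). The `**` operator raises a base to an exponent, and `math.sqrt` from the standard library is one alternative way to take a square root.

```python
import math

# Base 3 raised to the exponent 2.
square = 3 ** 2                     # 3 × 3

# An exponent of 1/2 is the square root.
root_by_exponent = 16 ** 0.5
root_by_function = math.sqrt(16)    # same result via the standard library

print(square, root_by_exponent, root_by_function)
```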
Now that we’ve learned about exponents, we can discuss the Order of Operations, which
dictates the Precedence among the basic arithmetic operations. It is informally known as
PEMDAS:
Parentheses
Exponent
Multiplication
Division
Addition
Subtraction
When you see a compound expression, perform the operations in PEMDAS order.
Multiplication and division share the same rank and are evaluated left to right, as are
addition and subtraction. From earlier:
(2 + 2) × 5 = 20
2 + (2 × 5) = 12
We prioritized the parentheses first, in each expression, in order to calculate the answer. For
more complicated expressions, we keep applying PEMDAS, one step at a time, until the
expression is fully evaluated.
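To make the step-by-step process concrete, here is a minimal Python sketch that evaluates one compound expression in PEMDAS order. The expression itself, 2 + 3 × (4 - 1)^2 ÷ 9, is a hypothetical example and does not come from the course materials.

```python
# Hypothetical compound expression: 2 + 3 × (4 - 1)^2 ÷ 9
step_parentheses = 4 - 1                       # P: parentheses first
step_exponent = step_parentheses ** 2          # E: then the exponent
step_multiplication = 3 * step_exponent        # M: multiplication, left to right
step_division = step_multiplication / 9        # D: division
step_addition = 2 + step_division              # A: finally the addition

# Python applies the same precedence rules to the whole expression at once.
all_at_once = 2 + 3 * (4 - 1) ** 2 / 9

print(step_addition, all_at_once)
```

Both routes give the same value, which is the point of having an agreed order of operations.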
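Exponents also give us the conventional way to write very large and very small numbers,
known as Scientific Notation: a value is written as a number multiplied by 10 raised to an
exponent. For example, 5,000,000 is written $5 \times 10^{6}$ and 0.005 is written $5 \times 10^{-3}$.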
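Most programming languages have a built-in shorthand for scientific notation. The Python sketch below (Python is used only for illustration, and the values are hypothetical) shows the `e` notation, where `1.23e5` means 1.23 × 10^5.

```python
# Hypothetical values chosen for illustration.
large = 1.23e5      # 1.23 × 10^5
small = 4.5e-4      # 4.5 × 10^-4

# Formatting ordinary numbers in scientific notation.
large_text = f"{123000:.2e}"
small_text = f"{0.00045:.1e}"

print(large, small)
print(large_text, small_text)
```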
Concept 4: Units
Quantities with different units cannot be added to or subtracted from one another; for
example, 5 cm + 2 mg has no meaningful result.
For multiplication and division, the one rule is that the units of the end-result must reflect the
operation.
$5\,\text{cm} \times 3\,\text{cm} = 15\,\text{cm}^2$
$10\,\text{mg} \div 2\,\text{mg} = \frac{10\,\text{mg}}{2\,\text{mg}} = 5$ (units cancel)
In real life, and in statistics in particular, every operation involving units must include the
appropriate unit in the answer.
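One way to see these rules in action is a toy Python sketch that carries a unit alongside each value. The `Quantity` class below is hypothetical and deliberately simplified (it only handles the cases discussed above); it is not a real units library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A value with a unit label; an empty unit string means unitless."""
    value: float
    unit: str

    def __add__(self, other):
        # Addition only makes sense when the units match.
        if self.unit != other.unit:
            raise ValueError(f"cannot add {self.unit} and {other.unit}")
        return Quantity(self.value + other.value, self.unit)

    def __mul__(self, other):
        # Simplified rule: the same unit squares, different units combine.
        if self.unit == other.unit:
            unit = f"{self.unit}^2"
        else:
            unit = f"{self.unit}*{other.unit}"
        return Quantity(self.value * other.value, unit)

    def __truediv__(self, other):
        # Simplified rule: matching units cancel, leaving a unitless number.
        unit = "" if self.unit == other.unit else f"{self.unit}/{other.unit}"
        return Quantity(self.value / other.value, unit)


area = Quantity(5, "cm") * Quantity(3, "cm")     # value 15, unit "cm^2"
ratio = Quantity(10, "mg") / Quantity(2, "mg")   # value 5.0, unitless
print(area, ratio)
```

Attempting `Quantity(5, "cm") + Quantity(2, "mg")` raises an error, mirroring the rule that different units cannot be added.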
Concept 5: Symbols
When dealing with a range of numbers, square brackets [ ] indicate that the range includes
everything between the stated values, including the values themselves; round brackets ( )
indicate that the range includes everything between the values, but not the values themselves.
[0, 1] is all values on the number line between 0 and 1, including 0 and 1
(10,20) is all values between 10 and 20, but not including 10 or 20
(-1, 1] is all values between -1 and 1, including 1, but not including negative 1
Note that these ranges are shown as a thickened section on the number line. Inclusive bounds
are shown with a dark circle; exclusive bounds are shown with an empty circle.
[Figure: number lines showing the ranges [0, 1], (10, 20), and (-1, 1]]
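The difference between inclusive and exclusive bounds can be sketched as a small membership check. The function name `in_interval` and its parameters are hypothetical, introduced only for this Python illustration.

```python
def in_interval(x, low, high, include_low, include_high):
    """Return True if x lies in the range from low to high.

    include_low / include_high say whether each bound is inclusive
    (square bracket) or exclusive (round bracket).
    """
    above_low = x >= low if include_low else x > low
    below_high = x <= high if include_high else x < high
    return above_low and below_high


# [0, 1] includes both ends.
print(in_interval(0, 0, 1, True, True))
# (10, 20) excludes both ends.
print(in_interval(10, 10, 20, False, False))
# (-1, 1] excludes -1 but includes 1.
print(in_interval(-1, -1, 1, False, True), in_interval(1, -1, 1, False, True))
```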
Very commonly, we’ll use comparator symbols: greater-than (>), less-than (<),
greater-than-or-equal-to (≥), and less-than-or-equal-to (≤).
To remember the direction of the symbol, think of it as a fish opening its mouth to eat a BIG
meal: the open side faces the larger value. If the symbol is just a pointy bracket, it means
greater-than or less-than; if it is a pointy bracket over a bar, it means
greater-than-or-equal-to or less-than-or-equal-to.
The plus-minus symbol ± indicates a pair of values, which almost always mark the ends of a
range. Typically this is an inclusive range, so square brackets are used.
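±3 = [-3, 3]
[Figure: number line with the range from -3 to +3 marked, centered on 0]
5 ± 4 = [1, 9]
[Figure: number line with the range from 1 to 9 marked, centered on 5]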
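Turning a plus-minus expression into its range is a two-step calculation: subtract the spread for the lower bound and add it for the upper bound. The helper name `plus_minus` below is hypothetical, used only for this Python sketch.

```python
def plus_minus(center, spread):
    """Return the (lower, upper) bounds of center ± spread."""
    return (center - spread, center + spread)


print(plus_minus(0, 3))   # bounds of ±3
print(plus_minus(5, 4))   # bounds of 5 ± 4
```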
A set is a group of numbers with something in common. For example, suppose we take
body-weight measurements of three people and get a set w:
w = {69.2, 54.6, 68.9}
We reference elements within the set with a subscript: for instance, w3 = 68.9.
The summation symbol ∑ is a special operator that uses the Greek capital letter sigma
(the Greek ‘S’) to mean the “sum” of the elements in a set:
∑𝑤
The sigma typically carries notation above and below it that indicates clearly
what to sum over:
$\sum_{i=1}^{n=3} w_i$
w1 = 69.2
w2 = 54.6
w3 = 68.9
$\sum_{i=1}^{n=3} w_i = w_1 + w_2 + w_3 = 192.7$
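The summation above maps directly onto a short loop. In this Python sketch (Python is an illustrative choice, not the course's language), note that Python counts positions from 0, so the element written w3 in the text is `w[2]` in the code.

```python
# The three body-weight measurements from the text.
w = [69.2, 54.6, 68.9]
n = len(w)

# Sum w_i for i = 1..n; Python indexes from 0, so w_i is w[i - 1].
total = 0.0
for i in range(1, n + 1):
    total += w[i - 1]

# The built-in sum() does the same thing in one step.
total_builtin = sum(w)

print(round(total, 1), round(total_builtin, 1))
```

Rounding to one decimal place when printing hides the tiny floating-point error that can appear when decimals are added on a computer.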
Concept 6: Algebra + Functions
Algebra is the branch of mathematics where we use symbols in place of numbers; these
symbols are place-holders representing a quantity of unknown value. A lot of times in real-
world problem solving, we don’t have all the facts in front of us, and we have to use what we
do have to solve for the unknowns. This is the premise of algebra.
The most basic use of elementary algebra is solving an equation with a single unknown.
Let’s consider, for example, an equation that might roughly describe weight, in
pounds, as determined by height, in inches:
80 pounds = 6 pounds/inch × an unknown height, x, in inches – 250 pounds
We might phrase this as a mathematical expression by removing the units, and simply stating
the quantities and their respective operations
6x – 250 = 80
If we want to know what height might correspond to a weight of 80 pounds, given this
expression, we need to solve for the unknown height, x. In order to get ‘x’ by itself, we apply
the Order of Operations in reverse. First we add 250 to both sides, yielding
6x = 330
Then, division by 6:
6x ÷ 6 = 330 ÷ 6
x = 55
Thus, a weight of 80 pounds corresponds to a height of 55 inches. By the same relationship,
a person who is 75 inches tall might be expected to weigh near 200 pounds (6 × 75 – 250 = 200).
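The same "undo the operations in reverse" idea can be written as a short Python sketch. The function names `solve_for_height` and `weight_from_height` are hypothetical, and the relationship 6x – 250 is the rough illustrative one from the text, not a validated clinical formula.

```python
def weight_from_height(height_in):
    """Rough illustrative relationship from the text: 6x - 250."""
    return 6 * height_in - 250


def solve_for_height(weight_lb):
    """Reverse the operations: add 250 first, then divide by 6."""
    after_addition = weight_lb + 250
    return after_addition / 6


print(solve_for_height(80))      # height for a weight of 80 pounds
print(weight_from_height(75))    # weight for a height of 75 inches
```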
Concept 7: Graphing
It is important to be able to draw and interpret graphs of functions. We start by drawing axes,
which are one horizontal line intersecting with one vertical line. These axes are Number Lines
that share a relationship through a Function. When we plot a functional relationship, we make
marks in whatever location matches the position on the two Number Lines at the same time.
We start by drawing our axes:
[Figure: a pair of intersecting axes, with the vertical axis labeled f(x)]
This “x-y” plot is known as the Cartesian plane, after René Descartes.
We always label our plots. Remember that the variable in the argument of the function (here,
x) is the independent variable and is plotted along the horizontal axis. The output, f(x), is
the dependent variable and is plotted along the vertical axis.
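We can then mark a few ticks on our axes to get a sense for the grid.
[Figure: axes with f(x) on the vertical axis, ticked from 100 to 200 in steps of 20, and x on the horizontal axis, ticked at 60, 65, and 70]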
To plot the function manually, we take a few values of x, substitute them into the equation, and
get the output, then we mark them on the plot accordingly. Suppose our function is f(x) = 6x -
250. A few test points, taking x at the tick values 60, 65, and 70, would be
x     f(x)
60    110
65    140
70    170
So we can begin plotting by taking the first point, and matching its x- and f(x)-values along their
axes. For the first data point, we see that this corresponds to the vertical line at x=60 and the
horizontal line at f(x) = 110; the intersection is where these two lines meet.
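The test points can also be generated by substituting each x into the function. The sketch below is in Python for illustration only, and the choice of x values (the axis ticks 60, 65, and 70) is an assumption of this example.

```python
def f(x):
    """The function from the text: f(x) = 6x - 250."""
    return 6 * x - 250


# Assumed x values, taken from the tick marks on the horizontal axis.
xs = [60, 65, 70]

# Each point pairs an x value with its output f(x).
points = [(x, f(x)) for x in xs]

for x, y in points:
    print(f"x = {x}, f(x) = {y}")
```

Each pair in `points` gives the horizontal and vertical position of one dot on the plot.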
[Figure: the same axes, with guide lines drawn at x = 60 and f(x) = 110]
Once we find the respective locations of this data-point along each axis, we can plot a dot at
the intersection.
[Figure: the same axes, with a dot plotted at the intersection of x = 60 and f(x) = 110]
Then we populate the remaining points.
[Figures: the same axes, f(x) from 100 to 200 against x from 60 to 70, with the remaining points added]
Later, we’ll see how to get R to do all this plotting for us. We won’t do too much plotting by
hand! But it’s important that we know where these plots come from, and are able to generate
them step-by-step.
Closure
That’s it! Now we’ve covered the foundational concepts that you’ll need to move forward in
statistics, and to start learning R.
In this module, we learned basic concepts of numeracy, including the number line, and how to
perform arithmetic operations on numbers, including the order of operations. We
discussed a number of symbols including operators, brackets, and algebraic and functional
notations, and we used our arithmetic techniques to solve for unknowns and plot functional
relationships.
I hope you find this module informative and encouraging, and that you emerge with an
excitement to transfer these skills into a whole new world of statistics and numerical
programming!
For a practice problem set and skills check, head on over to the course website and complete
the rest of the Module materials!