Math1041 Study Notes For UNSW
Oliver Bogdanovski
altering location and spread, but not shape. They are found by the
equation:
x_new = a + bx
Measures of location follow this:
x̄_new = a + b·x̄
M_new = a + b·M
Measures of spread are only affected by the size of b:
s_new = |b|·s
IQR_new = |b|·IQR
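To see these rules in action, here is a minimal Python sketch (the temperatures are invented) using the Celsius-to-Fahrenheit conversion x_new = 32 + 1.8x: the mean shifts and scales, while the standard deviation is only scaled by |b| = 1.8.

```python
# Linear transformation x_new = 32 + 1.8x (Celsius to Fahrenheit).
# Sample data is made up purely for illustration.
import statistics

celsius = [12.0, 15.5, 18.0, 21.5, 25.0]
fahrenheit = [32 + 1.8 * c for c in celsius]

# Location (mean) follows x_new = a + b*x
assert abs(statistics.mean(fahrenheit)
           - (32 + 1.8 * statistics.mean(celsius))) < 1e-9

# Spread (standard deviation) is only scaled by |b|
assert abs(statistics.stdev(fahrenheit)
           - 1.8 * statistics.stdev(celsius)) < 1e-9
```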
Non-linear transformations change shape, and are good for
correcting skewed data and working with outliers. To pull down the
right tail (right-skewed) use log(x) [preferred], x^(1/4) or x^(1/2) (from
strongest to weakest). These are monotonically increasing (keeping
everything in order), and the base of the log only affects the scale,
not the shape, and hence will not make the data more symmetrical. Because
log(ab) = log(a) + log(b), logs change multiplicative relationships to
additive ones. To pull down the left tail (left-skewed), treat the data as -x and
then continue as for right-skewed data (e.g. log(-x)). If dealing with zeros in
right-skewed data, use log(x + 1). To stretch the proportions of data
where 0 < x < 1 to -∞ < x < ∞, use the logit transformation:
x_new = log(x / (1 - x))
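A short Python sketch of these transformations (all data values are invented; numpy is assumed to be available):

```python
import numpy as np

# Right-skewed data: log pulls down the long right tail
right_skewed = np.array([1.0, 2.0, 3.0, 50.0, 400.0])
pulled_down = np.log(right_skewed)       # strongest of log, x**(1/4), x**(1/2)

# Zeros present: shift by 1 first so the log is defined everywhere
with_zeros = np.array([0.0, 1.0, 4.0, 90.0])
shifted_log = np.log(with_zeros + 1)

# Proportions in (0, 1): the logit stretches them over the whole real line
proportions = np.array([0.05, 0.4, 0.6, 0.95])
logits = np.log(proportions / (1 - proportions))
print(pulled_down, shifted_log, logits, sep="\n")
```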
Relationships between Two Quantitative Variables
Between two quantitative variables, the relationship can be:
- existent or non-existent (random variation)
- strong, mild or weak (deviation from the line of best fit)
- increasing or decreasing (direction)
- linear or non-linear
Outliers (in scatterplots) could reveal possible systematic
structure worthy of investigation (e.g. possible external influences).
Correlation (r) - a measure of the strength of a linear relationship
between two variables (sensitive to outliers - plot data to be aware
of them):
r = (1/(n-1)) Σ ((x_i - x̄)/s_x)((y_i - ȳ)/s_y)
where
x_i = x-axis values (i = 1, 2, 3, ..., n)
y_i = y-axis values (i = 1, 2, 3, ..., n)
s_x, s_y = standard deviations of x and y
-1 ≤ r ≤ 1
close to 1 = strong positive/increasing linear relationship
close to -1 = strong negative/decreasing linear relationship
close to 0 = weak/non-existent linear relationship
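A minimal Python check (with invented data) that this formula matches a direct computation from standardised values:

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

# r = (1/(n-1)) * sum of products of standardised x and y values
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 4))  # close to 1: a strong positive linear relationship
```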
Least-Squares Regression
Regression is used to study causal relationships, where an
explanatory variable (independent, x-axis) changes the response
variable, related by a regression line (which doesn't extend
beyond the limits of the current data). This is different to correlation, as
in correlation each variable is on equal footing (association does not
imply causation).
3) P(A^c) = 1 - P(A)                     [complement rule]
4) P(A or B) = P(A) + P(B)               [addition rule, disjoint events]
5) P(1 of A, B, C) = P(A) + P(B) + P(C)  [and so on, disjoint events]
6) P(A and B) = P(A) P(B)                [multiplication rule, independent events]
7) P(A or B) = P(A) + P(B) - P(A and B)  [general addition rule; P(A and B) = 0 if disjoint, giving Rule 4]
Picture these as Venn diagrams to derive them. Two events are
independent if knowing one does not change the other. If a random
phenomenon has equally likely outcomes, then each event has:
P(A) = #outcomes in A / #outcomes in S
In binomial probability (remember: only two outcomes per trial, the trials
are independent, and the probability of success remains constant - ensured
by random sampling) we can use nCr to determine probability.
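A toy Python check (one fair six-sided die, so outcomes are equally likely) of the counting formula and of the complement and general addition rules:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space of one fair die
A = {2, 4, 6}            # "roll an even number"
B = {1, 2}               # "roll at most 2"

def P(event):
    # P(A) = #outcomes in A / #outcomes in S (equally likely outcomes)
    return Fraction(len(event), len(S))

assert P(S - A) == 1 - P(A)                  # complement rule
assert P(A | B) == P(A) + P(B) - P(A & B)    # general addition rule
print(P(A), P(B), P(A | B))
```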
Conditional Probability
P(B|A) = P(A and B) / P(A), assuming P(A) ≠ 0
Multiplication Rule
P(A and B) = P(A) P(B|A)
For more than two events:
P(A, B and C) = P(A) P(B|A) P(C|(A and B))
Both of these rules are mere rearrangements of the definition of
conditional probability.
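A short worked use of the multiplication rule in Python (a standard 52-card deck, drawing two cards without replacement):

```python
from fractions import Fraction

# P(two aces) = P(first is an ace) * P(second is an ace | first was an ace)
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace and one card removed
p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)  # 1/221
```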
Total Law of Probabilities for Multiple Outcomes
P(B) = P(A1) P(B|A1) + P(A2) P(B|A2) + P(A3) P(B|A3) + ...
where A1, A2, A3, ... are disjoint events covering all possible outcomes
(basically sums all areas of B)
Bayes' Rule
P(A|B) = P(A) P(B|A) / P(B)
       = P(A) P(B|A) / [P(A1) P(B|A1) + P(A2) P(B|A2) + ...]
The first equality comes from applying the multiplication rule to the
definition, and the second from using the total law of probabilities
on the bottom (which is helpful in circumstances where P(B) is not
directly known). This assumes P(B) ≠ 0.
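A small Python sketch of Bayes' rule (all numbers are invented for illustration): a test for a condition with prevalence P(A) = 0.01, where B is "the test is positive", P(B|A) = 0.95 and P(B|A^c) = 0.05:

```python
p_A = 0.01                # prevalence of the condition
p_B_given_A = 0.95        # P(positive | condition)
p_B_given_Ac = 0.05       # P(positive | no condition)

# Total law of probabilities gives the denominator P(B)
p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_Ac

# Bayes' rule: P(A|B) = P(A) P(B|A) / P(B)
p_A_given_B = p_A * p_B_given_A / p_B
print(round(p_A_given_B, 4))  # about 0.161, despite the accurate test
```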
Random Variables
A random variable's value is a numerical outcome of a
random phenomenon, usually represented by capital letters towards
the end of the alphabet, except Z (e.g. X, Y). They can be discrete
(where possible values are countable) or continuous (where values
are placed within some interval of real numbers, taking an infinite
number of possibilities). These are represented in tables listing
values (or ranges) and their respective probabilities (p_i), called a
probability distribution or just distribution. They must follow two
rules: 0 ≤ p_i ≤ 1 and p_1 + p_2 + ... = 1
A binomial distribution is a special case of a discrete random
variable in which an experiment repeated n times has two
outcomes: success (p) or failure (1-p). The probability of a certain
number of successes (X, an integer where 0 ≤ X ≤ n) is:
P(X=k) = nCk p^k (1-p)^(n-k)
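A direct Python implementation of this formula (n = 10, p = 0.3 and k = 4 are invented for illustration):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = nCk * p^k * (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(binom_pmf(4, 10, 0.3), 4))   # P(X = 4) for n = 10, p = 0.3
# The probabilities over all possible k must sum to 1
assert abs(sum(binom_pmf(k, 10, 0.3) for k in range(11)) - 1) < 1e-12
```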
Normal Distribution
General Equation:
y = (1/(σ√(2π))) e^(-(x-μ)²/(2σ²))
The shorthand for a normal distribution is: X ~ N(μ, σ)
The normality assumption is how much a data set looks like
a normal curve, often based on a somewhat helpful (however
sometimes misleading) histogram. However, a normal quantile
plot is made specifically to check this, having normal quantiles
(or z-scores; expected values for a normally distributed data set)
plotted horizontally against the vertical sample quantiles from the
actual data set. If proportional (a straight, increasing line with minimal
deviation - a small amount allowed near the edges) then the normality
assumption is reasonable; however, if the data is right-skewed it will
be concave up, and if left-skewed it will be concave down (but still
increasing). Other shapes are possible. The similarity can be
described as: excellent, good, fair, poor, hopeless. For normal
measurements, 68% of the data falls within 1σ of the mean μ, whilst 95%
is roughly within 2σ and 99.7% within 3σ.
In a standard normal distribution we use the letter Z,
where Z ~ N(0, 1). These values can be looked up directly in a
standard normal probability table, in which the left axis provides the
first two digits, the top axis provides the third digit, together making
the horizontal value on the normal distribution, and the
corresponding value between these is the area to the left: P(Z<z).
To find P(Z>z) we use 1-P(Z<z), as the total area sums to 1. If using
discrete values and "less than", we must jump a value below (for
P(X<10), use P(X ≤ 9), but for P(X>10) use 1-P(X ≤ 10)). Being equal
to doesn't make a difference, as the area at a point is 0 (unless
looking at discrete values). To find P(z1<Z<z2) or P(|Z|<z) = P(-z<Z<z),
simply subtract areas (z is a constant representing a
specific value of Z). Similarly, if we know the probability (the area),
the value for which this is true can be deduced. If the distribution is
a normal distribution but not standard, we can standardise it to
calculate values with tables by the linear transformation:
Z = (X - μ)/σ
Hence P(X<c) = P((X-μ)/σ < (c-μ)/σ) = P(Z < (c-μ)/σ), which can then
be found in the table.
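Assuming scipy is available, here is a small sketch of this standardisation (μ = 100, σ = 15 and c = 120 are invented), using the normal CDF in place of the table:

```python
from scipy.stats import norm

mu, sigma, c = 100, 15, 120
z = (c - mu) / sigma                     # Z = (X - mu) / sigma

print(round(norm.cdf(z), 4))             # P(Z < z) from the standard normal
print(round(norm.cdf(c, loc=mu, scale=sigma), 4))  # same value, unstandardised
```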
The sampling distribution of a statistic is the probability
distribution of values taken by the random variable (it tells us how the
statistic will behave from one sample to another). In a binomial
variable X ~ B(n, p):
μ_X = np
σ_X² = np(1-p)
σ_X = √(np(1-p))
Often we know n, but not p, so it's estimated with the sample
proportion: p̂ = X/n
The mean and variance of a sample proportion can be expressed as:
μ_p̂ = p
σ_p̂² = p(1-p)/n
σ_p̂ = √(p(1-p)/n)
If n is large enough, a binomial distribution can be approximated by
a normal distribution, where X ≈ Y ~ N(np, √(np(1-p))) - a similar
thing can be done with p̂ (using its own values):
p̂ ~ N(p, √(p(1-p)/n))
To determine if n is large enough, it must satisfy:
np > 10 AND n(1-p) > 10
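A quick Python check of this rule of thumb, and of the resulting mean and standard deviation of p̂ (n = 200 and p = 0.3 are invented):

```python
from math import sqrt

n, p = 200, 0.3
assert n * p > 10 and n * (1 - p) > 10   # n is large enough

# p-hat is approximately N(p, sqrt(p(1-p)/n))
mean_phat = p
sd_phat = sqrt(p * (1 - p) / n)
print(mean_phat, round(sd_phat, 4))
```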
if Ha: μ < μ0, the P-value = P(Z ≤ z) (one-sided)
if Ha: μ ≠ μ0, the P-value = 2P(Z ≥ |z|) (two-sided)
x̄ ± t* s/√n
Using a matched pairs design allows us to account for other
variables: each individual's "before" result is compared with their own
"after" result, so the differences isolate the effect of the treatment.
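A hedged Python sketch of a 95% matched pairs t interval (the before/after scores are invented; scipy is assumed for the t* critical value):

```python
import statistics
from scipy.stats import t

before = [72, 68, 80, 75, 70, 66]
after = [75, 71, 79, 80, 74, 70]
diffs = [a - b for a, b in zip(after, before)]   # analyse the differences

n = len(diffs)
x_bar = statistics.mean(diffs)
s = statistics.stdev(diffs)
t_star = t.ppf(0.975, df=n - 1)      # 95% confidence, n-1 degrees of freedom

half_width = t_star * s / n**0.5     # t* s/sqrt(n)
print((round(x_bar - half_width, 2), round(x_bar + half_width, 2)))
```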
Relationships between Categorical Variables
In two-way tables (of frequencies) each row or column
represents a variable (listing each of its categories), whilst in
between is the chance of each combination occurring. They are
summaries of joint distributions of the two categorical variables
(as opposed to marginal distributions, which only look at one
variable in its own right - e.g. the courses students are enrolled in,
regardless of sex - rather than both together; if asked to find these,
add the values as row and column sums on each end/margin). A
conditional distribution shows the effect of one variable upon the
other (as with conditional probability before), and is represented the
same way as a joint distribution.
Two-way tables can be visualised as multiple bar charts
(placed side-by-side), clustered bar charts (within the same chart,
each category has a set of columns of different colours, explained
through a legend), or bar charts of the conditional distribution
(these condition on one category - e.g. females only - then use
columns to show the distribution across each course, as a decimal
out of 1).
Simpson's paradox is when there are lurking variables that
may influence the results, so two variables may appear linked,
however in reality the linkage occurs between the response variable
and some lurking variable. This can only be fixed by altering the
experimental design.
To make inferences for two-way tables with categorical data,
we use χ² tests. Starting with:
H0: No association (they are independent)
Ha: An association (one is dependent upon the other)
Having the observed counts in the table, we can compute
expected counts under H0 (expected count = row total × column
total / n), and produce:
X² = Σ (observed count - expected count)² / expected count
Our assumptions are that our n observations are independent and
that the sample size is large enough so that all expected counts > 10.
We can then look up the respective P-values in the chi-square
distribution table, and our degrees of freedom are (r-1)(c-1), where
r and c are the numbers of rows and columns. The P-values are
calculated as:
P(χ² > X²)
Unlike t and Z, this distribution is not symmetric, but right-skewed
(having larger proportions to the left), and can also only be
positive. χ²(df) has mean df, and χ²(1) = Z². At low degrees of freedom
(around 2), the density on the left never decreases to zero, however as
df increases the values are shifted more and more towards the centre.
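A Python sketch of this test for a small invented 2×2 table, assuming scipy is available (Yates' correction is disabled so the statistic matches the hand formula above):

```python
from scipy.stats import chi2_contingency

observed = [[30, 20],     # e.g. rows = sex, columns = course (made-up counts)
            [15, 35]]

# Expected counts come from row total * column total / n under H0
X2, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(round(X2, 3), round(p_value, 4), dof)   # df = (r-1)(c-1) = 1
print(expected)                               # check all expected counts > 10
```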