
Basic probability and statistical distributions

The elements of probability theory serve as a cornerstone to most, if not all, statistical operations, be they descriptive or inferential. In this chapter, a brief introduction is given to these elements, with focus on the concepts that underlie the foundations of statistical informatics.
2.1 Concepts in probability
Data are generated when a random process produces a quantifiable or categorical outcome. We collect all possible outcomes from a particular random process together into a set, S, called the sample space or support space. Any subcollection of possible outcomes, including a single outcome, is called an event, E. Notice that an event is technically a subset of the sample space S. Standard set notation for this is E ⊆ S.

Probabilities of observing events are defined in terms of their long-term frequencies of occurrence, that is, how frequently the events (or combinations of events) occur relative to all other elements of the sample space. Thus if we generate a random outcome in a repeated manner and count the number of occurrences of an event E, then the ratio of this count to the total number of times the outcome could occur is the probability of the event of interest. This is the relative frequency interpretation of probability. The shorthand for P[Observe event E] is P[E] for any E ⊆ S. To illustrate, consider the following simple, if well recognized, example.

Example 2.1.1 Six-sided die roll. Roll a fair, six-sided die and observe the number of 'pips' seen on that roll. The sample space is the set of all possible outcomes from one roll of that die: S = {1, 2, ..., 6}. Any individual event is a single number, say, E = {6} = {a roll showing 6 pips}. Clearly, the single event E = {6} is contained within the larger sample space S.

If the die is fair, then each individual event is equally likely. As there are six possible events in S, to find P[E], divide 1 (for the single occurrence of E) by 6 (for the six possible outcomes): P[E] = 1/6, that is, in one out of every six tosses, we expect to observe a {6}.
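The relative frequency interpretation can be illustrated directly by simulation. The short Python sketch below is an illustration added here (not part of the original text): it rolls a fair die many times and compares the observed relative frequency of the event E = {6} to the theoretical value 1/6.

```python
import random

def relative_frequency_of_six(n_rolls: int, seed: int = 1) -> float:
    """Roll a fair six-sided die n_rolls times and return the
    proportion of rolls showing 6 (the event E = {6})."""
    rng = random.Random(seed)
    count = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return count / n_rolls

if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, round(relative_frequency_of_six(n), 4))
    # The printed proportions settle near P[E] = 1/6 ≈ 0.1667 as n grows.
```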
2.1.1 Probability rules
A variety of fundamental axioms are applied in the interpretation of a probability P[E]. The most well known are

(1a) 0 ≤ P[E] ≤ 1, and

(1b) P[S] = 1.

In addition, a number of basic rules apply for combinations of two events, E1 and E2. These are

(2a) Addition Rule. P[E1 or E2] = P[E1] + P[E2] - P[E1 and E2].

(2b) Conditionality Rule. P[E1 given E2] = P[E1 and E2]/P[E2] for any event E2 such that P[E2] > 0. For notation, conditional probabilities are written with the symbol '|', for example, P[E1|E2] = P[E1 given E2].

(2c) Multiplication Rule. P[E1 and E2] = P[E1|E2] P[E2].

Special cases of these rules occur when the events in question relate in a certain way. For example, two events E1 and E2 that can never occur simultaneously are called disjoint (or, equivalently, mutually exclusive). In this case, P[E1 and E2] = 0. Notice that if two events are disjoint, the Addition Rule in (2a) simplifies to P[E1 or E2] = P[E1] + P[E2]. Two disjoint events, E1 and E2, are complementary if the joint event {E1 or E2} makes up the entire sample space S. Notice that this implies P[E1 or E2] = 1. If two events, E1 and E2, are complementary, combining these facts produces

(2d) Complement Rule. P[E2] = 1 - P[E1].
Example 2.1.2 Six-sided die roll (Example 2.1.1, continued). Return to the roll of a fair, six-sided die. As seen in Example 2.1.1, the sample space is S = {1, 2, ..., 6}. As the die is only rolled once, no two singleton events can occur together, so, for example, observing a {6} and observing a {4} are disjoint events. Thus from the Addition Rule (2a) with disjoint events, P[4 or 6] = P[4] + P[6] = 1/6 + 1/6 = 1/3.

More involved constructions are also possible. For instance, from the Complement Rule (2d), P[not observing a 6] = P[1 or 2 or 3 or 4 or 5] = 1 - P[6] = 1 - 1/6 = 5/6.

The case where disjoint events completely enumerate the sample space S has a special name: it is called a partition. One need not be restricted to only two events, however. If a set of h ≥ 2 events, {E1, E2, ..., Eh}, exists such that (i) all the h events are disjoint from one another and (ii) together they make up the entire sample space S, the collection forms a partition of S.
A different relationship occurs between two events if they do not impact each other in any way. Suppose the knowledge that one event E1 occurs has absolutely no impact on the probability that a second event E2 occurs, and that the reverse is also true. Two such events are called independent. In effect, independent events modify the Conditionality Rule (2b) into P[E1|E2] = P[E1] and P[E2|E1] = P[E2]. More importantly, for two independent events, the Multiplication Rule (2c) simplifies to P[E1 and E2] = P[E1]P[E2].
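As a quick check of rules (2a) through (2c), the sketch below (illustrative only; the particular events E1 and E2 are chosen here, not taken from the text) enumerates the equally likely outcomes of one die roll and verifies the Addition, Conditionality, and Multiplication Rules for E1 = 'the roll is even' and E2 = 'the roll exceeds 3'.

```python
from fractions import Fraction

S = range(1, 7)                          # sample space of one fair die roll
P = {m: Fraction(1, 6) for m in S}       # equally likely outcomes

def prob(event):
    """P[E] = sum of outcome probabilities over the outcomes in the event."""
    return sum(P[m] for m in S if event(m))

def E1(m): return m % 2 == 0             # roll is even
def E2(m): return m > 3                  # roll exceeds 3

p1, p2 = prob(E1), prob(E2)
p_and = prob(lambda m: E1(m) and E2(m))
p_or = prob(lambda m: E1(m) or E2(m))

assert p_or == p1 + p2 - p_and           # (2a) Addition Rule
p_cond = p_and / p2                      # (2b) Conditionality Rule: P[E1 | E2]
assert p_and == p_cond * p2              # (2c) Multiplication Rule
print(p1, p2, p_and, p_or, p_cond)       # 1/2 1/2 1/3 2/3 2/3
```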
2.1.2 Random variables and probability functions
Suppose a random outcome can be quantified formally, either because (i) it is an actual measurement or count or (ii) it is a qualitative outcome that has been unambiguously coded into a numeric value. Such a quantified random outcome is called a random variable. Standard notation for random variables is uppercase Roman letters, such as X or Y. To distinguish between a conceptual random variable and one that has already been realized in practice, the realized value is denoted by a lowercase Roman character: x or y. The basic probability rules for events as discussed in Section 2.1.1 can then be expanded to describe random variables.

At the core of the operation is the notion of a probability function. Probability functions are unifying mathematical descriptions of how a random outcome varies. They are used to characterize two basic types of random variables: discrete and continuous. A discrete random variable takes on only discrete values; examples include simple binary variates (say, 'damaged' = 1 vs. 'operating' = 0 in a component reliability study), counts of occurrences (numbers of customers who purchase a sale item or numbers of adverse-event reports with a new drug), or even studies that result in an infinite, yet countable, number of outcomes (i.e., counts without a clear upper bound, such as the number of different insect species in a tropical forest). A continuous random variable takes on values over a continuum (mass or length, stock market averages, blood levels of a chemical, etc.). Discrete random variables often arise from counting or classification processes, while continuous random variables often arise from some sort of measurement process. In either case, the probability functions will depend on the nature of the random outcome.
Suppose the random variable X is discrete and consider the 'event' that X takes on some specific value, say X = m. Then, the values of P[X = m] over all possible values of m describe the probability distribution of X. Standard notation here is f_X(m) = P[X = m]; this is called the probability mass function (or p.m.f.) of X. As f_X(m) is a probability, it must satisfy the various axioms and rules from Section 2.1.1. Thus, for example, 0 ≤ f_X(m) ≤ 1 for all arguments m and Σ_{m∈S} f_X(m) = 1, where the sum is taken over all possible values of m in the sample space S. (The symbol '∈' is read 'is an element of.')

Summing the discrete p.m.f. over increasing values up to m produces what is called the cumulative distribution function (or c.d.f.) of X:

F_X(m) = P[X ≤ m] = Σ_{i≤m} f_X(i).

As the c.d.f. is itself a probability, it must also satisfy the probability rules from Section 2.1.1; in particular, 0 ≤ F_X(m) ≤ 1 for any m. Or, from the Complement Rule (2d), P[X > m] = 1 - P[X ≤ m] = 1 - F_X(m).

As it gives cumulative probabilities, the c.d.f. must be a nondecreasing function. In fact, for discrete random variables, the c.d.f. will typically have the appearance of a nondecreasing step function.
Example 2.1.3 Six-sided die roll (Example 2.1.1, continued). Roll a fair, six-sided die and now formally define the random variable X as the number of 'pips' seen on that roll. As seen in Example 2.1.1, the sample space is S = {1, 2, ..., 6} and, because the die is fair, the p.m.f. is P[X = m] = f_X(m) = 1/6 for any m ∈ S.

The c.d.f. F_X(m) = P[X ≤ m] is simply the cumulative sum of these uniform probabilities up to and including the argument m. So, for example,

F_X(4) = Σ_{m=1}^{4} f_X(m) = 1/6 + 1/6 + 1/6 + 1/6 = 2/3.

Notice, however, that if the argument to the c.d.f. is not an element of the sample space S, the c.d.f. remains well defined: for instance, F_X(4.5) = P[X ≤ 4.5] is still 2/3, since no additional probability mass lies between 4 and 4.5.
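A minimal sketch of this p.m.f./c.d.f. relationship, assuming only the fair-die distribution from Example 2.1.3 (the code itself is added here for illustration, not taken from the text):

```python
from fractions import Fraction

S = range(1, 7)
pmf = {m: Fraction(1, 6) for m in S}          # f_X(m) = 1/6 for m in S

def cdf(x):
    """F_X(x) = P[X <= x], defined for any real argument x."""
    return sum(p for m, p in pmf.items() if m <= x)

print(cdf(4))      # 2/3, matching F_X(4) in Example 2.1.3
print(cdf(4.5))    # also 2/3: the c.d.f. is a step function between support points
print(cdf(0))      # 0, and cdf(6) == 1, consistent with rules (1a) and (1b)
```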
2.1.3 Means, variances, and expected values
The various probability functions for discrete and continuous random variables introduced in Section 2.1.2 provide general characterizations of a variable's probability structure. In many cases, however, it is useful to construct summary measures of the random variable that encapsulate its various features. These can be derived from the underlying p.m.f. or p.d.f. The general form of such a summary measure is called an expected value, and it is based on a mathematical construct known as an expectation operator, E[·]. In its most general usage, the expected value of a function of a random variable, g(X), is defined as

E[g(X)] = Σ_{m∈S} g(m) f_X(m)     (2.5)

for a p.m.f. f_X(m) and

E[g(X)] = ∫_{-∞}^{∞} g(x) f_X(x) dx     (2.6)

for a p.d.f. f_X(x). That is, expectation involves summation for discrete random variables and integration for continuous random variables. In effect, the expected value of a function g(X) is a weighted average of g(X), with weights taken as the probability mass or density of the random variable X.
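For the discrete case (2.5), the expectation is just a probability-weighted sum. The sketch below is an illustration added here: it computes E[X] and E[X²] for the fair die, and then the variance via the standard identity Var[X] = E[X²] - (E[X])² (a well-known result, not derived in this excerpt).

```python
from fractions import Fraction

S = range(1, 7)
pmf = {m: Fraction(1, 6) for m in S}    # fair-die p.m.f.

def expect(g):
    """E[g(X)] = sum over S of g(m) * f_X(m), as in equation (2.5)."""
    return sum(g(m) * p for m, p in pmf.items())

mean = expect(lambda m: m)               # E[X] = 7/2
second_moment = expect(lambda m: m * m)  # E[X^2] = 91/6
variance = second_moment - mean ** 2     # standard identity, gives 35/12
print(mean, second_moment, variance)
```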
2.1.4 Median, quartiles, and quantiles
Another useful summarization for a random variable X relates the values it achieves to the c.d.f., F_X(x), at those values. Consider, for example, the c.d.f. at its middle value, 50%. One might ask what value of x in the sample space S satisfies F_X(x) = 1/2? The point that does so is called the median of X. More formally, the median is the quantity Q2 ∈ S where P[X ≤ Q2] ≥ 1/2 and P[X ≥ Q2] ≥ 1/2. That is, Q2 is the point below which and above which at least 50% of the probability mass or probability density rests. (The notation Q2 will be explained below.)

For most continuous distributions, the median is unique. For many discrete distributions, however, its defining equations may not be unambiguously satisfied. For example, a discrete random variable X may have two adjacent values m1 < m2 such that P[X ≤ m1] = 1/2 and P[X ≥ m2] = 1/2. In this case, any value of X between m1 and m2 could be called the median of X. If this occurs in practice, Q2 is set equal to the midpoint of the interval, Q2 = (m1 + m2)/2.

Example 2.1.5 Six-sided die roll (Example 2.1.4, continued). Roll a fair, six-sided die, and let X be the number of 'pips' seen on that roll. As seen in Example 2.1.4, the sample space is S = {1, 2, ..., 6} and the p.m.f. is f_X(m) = 1/6 for any m ∈ S.

To find the population median, recognize that P[X ≤ 3] = P[X = 1 or X = 2 or X = 3] = P[X = 1] + P[X = 2] + P[X = 3] = 0.5, because the events are disjoint. Similarly, P[X ≥ 4] = P[X = 4 or X = 5 or X = 6] = 0.5. Thus any value of Q2 between (but not including) m1 = 3 and m2 = 4 would satisfy the definition of a median for X. Simplest here is to let Q2 be the midpoint: Q2 = (3 + 4)/2 = 3.5. Notice that this is not an element of S. As with the population mean, the population median need not be an element of the sample space.

As it characterizes the 'center' of a distribution, Q2 is used as an alternative to the mean E[X] to measure X's central tendency. In fact, the two values can be equal, as in Examples 2.1.4 and 2.1.5, although this is not guaranteed. When a random variable exhibits a large skew, the median will be less influenced than the population mean by the extreme values in the skewed tail of the distribution. Thus it can be particularly useful for measuring central tendency with skewed distributions.

One can extend the concept of a median, that is, the 50% point of a distribution, to any desired probability point along the range of F_X(x). Two obvious values are the 25% and 75% points. These are known as the first (or lower) and third (or upper) quartiles of the distribution and are denoted as Q1 and Q3, respectively. (The second or middle quartile is just the median, Q2, which explains its notation.) Formally, the first (lower) quartile is defined as the point Q1 ∈ S such that P[X ≤ Q1] ≥ 0.25 and P[X ≥ Q1] ≥ 0.75. Similarly, the third (upper) quartile is defined as the point Q3 such that P[X ≤ Q3] ≥ 0.75 and P[X ≥ Q3] ≥ 0.25.

Quartiles act to separate the distribution of a random variable into equal-probability fourths (hence, their name). This concept can be applied to any desired separation, so that, for example, quintiles separate S into fifths, deciles separate S into tenths, and percentiles into hundredths. Fully generalized to any desired probability point, the pth quantile of a distribution is the point q_p that satisfies P[X ≤ q_p] ≥ p and P[X ≥ q_p] ≥ 1 - p, for 0 < p < 1. If the c.d.f. of X, F_X(x), is continuous and strictly increasing such that it has an inverse function F_X^{-1}(·), the quantiles can be defined by inverting F_X(·): q_p = F_X^{-1}(p).
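The quantile definition can be applied mechanically to a discrete distribution. The following sketch is added for illustration: it finds q_p for the fair die of Example 2.1.5, using the midpoint convention from the text when a whole interval of values qualifies.

```python
from fractions import Fraction

S = list(range(1, 7))
pmf = {m: Fraction(1, 6) for m in S}

def quantile(p):
    """Return q_p: a point with P[X <= q_p] >= p and P[X >= q_p] >= 1 - p.
    If several support points qualify, return the midpoint of the interval."""
    qualifying = [m for m in S
                  if sum(pmf[k] for k in S if k <= m) >= p
                  and sum(pmf[k] for k in S if k >= m) >= 1 - p]
    if len(qualifying) > 1:
        return (qualifying[0] + qualifying[-1]) / 2
    return qualifying[0]

print(quantile(Fraction(1, 2)))   # 3.5, the median Q2 from Example 2.1.5
print(quantile(Fraction(1, 4)))   # 2, the lower quartile Q1
print(quantile(Fraction(3, 4)))   # 5, the upper quartile Q3
```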

BIVARIATE DISTRIBUTIONS

Let x be a variable that assumes the values {x1, x2, ..., xn}. Then, a function that expresses the relative frequencies of these values is called a univariate frequency function. It must be true that

f(xi) ≥ 0 for all i and Σ_i f(xi) = 1.

The following table provides a trivial example:

 x     f(x)
-1     0.25
 1     0.75

Let x and y be variables that assume values in the sets {x1, x2, ..., xn} and {y1, y2, ..., ym}, respectively. Then the function f(xi, yj), which gives the relative frequencies of occurrence of the pairs (xi, yj), is called a bivariate frequency function. It must be true that

f(xi, yj) ≥ 0 for all i, j and Σ_i Σ_j f(xi, yj) = 1.

An example of a bivariate frequency table is as follows:


 x \ y     -1             1
  -1      0.04    0.01    0.20
   1      0.12    0.03    0.60

The values of f(xi, yj) appear in the body of the table.
The marginal frequency function of x gives the relative frequencies of the values of xi regardless of the values of yj with which they are associated; it is defined by

f(xi) = Σ_j f(xi, yj),   i = 1, ..., n.

It follows that

f(xi) ≥ 0, and Σ_i f(xi) = Σ_i Σ_j f(xi, yj) = 1.

The marginal frequency function f(yj) is defined analogously. The bivariate frequency table above provides examples of the two marginal frequency functions:

f(x = -1) = 0.04 + 0.01 + 0.20 = 0.25,

f(x = 1) = 0.12 + 0.03 + 0.60 = 0.75.
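As a quick illustration of the marginal formula f(xi) = Σ_j f(xi, yj), the sketch below (added here; the three y-columns of the table are indexed generically as j = 0, 1, 2) recovers the marginal frequencies of x from the bivariate table above.

```python
# Bivariate frequencies f(x, y) keyed by (x, column index j).
f = {
    (-1, 0): 0.04, (-1, 1): 0.01, (-1, 2): 0.20,
    ( 1, 0): 0.12, ( 1, 1): 0.03, ( 1, 2): 0.60,
}

def marginal_x(x):
    """f(x) = sum over j of f(x, y_j)."""
    return sum(p for (xi, _), p in f.items() if xi == x)

print(round(marginal_x(-1), 2))   # 0.25
print(round(marginal_x(1), 2))    # 0.75
print(round(sum(f.values()), 2))  # 1.0: the bivariate frequencies sum to one
```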
REGRESSION AND CONDITIONAL EXPECTATIONS
Linear conditional expectations. If x, y are correlated, then a knowledge of one of them enables us to make a better prediction of the other. This knowledge can be used in forming conditional expectations.

In some cases, it is reasonable to make the assumption that the conditional expectation E(y|x) is a linear function of x:

E(y|x) = α + xβ.     (i)

This function is described as a linear regression equation. The error from predicting y by its conditional expectation can be denoted by ε = y - E(y|x); and therefore we have

y = E(y|x) + ε
  = α + xβ + ε.

Our object is to express the parameters α and β as functions of the moments of the joint probability distribution of x and y. Usually, the moments of the distribution can be estimated in a straightforward way from a set of observations on x and y. Using the relationship that exists between the parameters and the theoretical moments, we should be able to find estimates for α and β corresponding to the estimated moments.

We begin by multiplying equation (i) throughout by f(x), and by integrating with respect to x. This gives the equation

E(y) = α + βE(x),     (ii)

whence

α = E(y) - βE(x).     (iii)

These equations show that the regression line passes through the point E(x, y) = {E(x), E(y)}, which is the expected value of the joint distribution.
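Equation (iii) pins down α once β is known. The excerpt stops before deriving β, but the standard moment result is β = Cov(x, y)/Var(x); the sketch below is an illustration under that standard assumption, using made-up sample data in place of the theoretical moments.

```python
# Hypothetical observations on x and y (illustrative data, not from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sample moments standing in for the moments of the joint distribution.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n

beta = cov_xy / var_x              # standard moment result (not derived in the excerpt)
alpha = mean_y - beta * mean_x     # equation (iii): alpha = E(y) - beta * E(x)

print(alpha, beta)                 # the fitted line passes through (mean_x, mean_y)
```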
