
CHAPTER 2

RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
CHAPTER CONTENTS
Mathematical Preliminaries
Probability
Random Variable and Probability Distribution
Properties of Probability Distributions
    Expectation, Median, and Mode
    Variance and Standard Deviation
    Skewness, Kurtosis, and Moments
Transformation of Random Variables

In this chapter, the notions of random variables and probability distributions are
introduced, which form the basis of probability and statistics. Then simple statistics
that summarize probability distributions are discussed.

2.1 MATHEMATICAL PRELIMINARIES


When throwing a six-sided die, the possible outcomes are only 1, 2, 3, 4, 5, 6, and
no others. Such possible outcomes are called sample points and the set of all sample
points is called the sample space.
An event is defined as a subset of the sample space. For example, event A that any
odd number appears is expressed as

A = {1, 3, 5}.

The event with no sample point is called the empty event and denoted by ∅. An
event consisting only of a single sample point is called an elementary event, while
an event consisting of multiple sample points is called a composite event. An event
that includes all possible sample points is called the whole event. Below, the notion
of combining events is explained using Fig. 2.1.
The event that at least one of the events A and B occurs is called the union of
events and denoted by A ∪ B. For example, the union of event A that an odd number
appears and event B that a number less than or equal to three appears is expressed as

A ∪ B = {1, 3, 5} ∪ {1, 2, 3} = {1, 2, 3, 5}.


FIGURE 2.1
Combination of events: (a) event A; (b) event B; (c) complementary event Ac; (d) union of events; (e) intersection of events; (f) disjoint events; (g), (h) distributive laws; (i), (j) De Morgan's laws.

On the other hand, the event that both events A and B occur simultaneously is called
the intersection of events and denoted by A ∩ B. The intersection of the above events
A and B is given by

A ∩ B = {1, 3, 5} ∩ {1, 2, 3} = {1, 3}.

If events A and B never occur at the same time, i.e.,

A ∩ B = ∅,

events A and B are called disjoint events. The event that an odd number appears
and the event that an even number appears cannot occur simultaneously and thus are
disjoint. For events A, B, and C, the following distributive laws hold:

(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C),
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C).

The event that event A does not occur is called the complementary event of A and
denoted by Ac . The complementary event of the event that an odd number appears is
that an odd number does not appear, i.e., an even number appears. For the union and
intersection of events A and B, the following De Morgan’s laws hold:

(A ∪ B)c = Ac ∩ B c ,
(A ∩ B)c = Ac ∪ B c .
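
As a quick illustration (not part of the original text), the identities above can be checked mechanically with Python's built-in set type; the die sample space and the event names A, B, C below are just the running example of this section, and the snippet is a minimal sketch rather than anything from the book.

# Checking the event-algebra identities for the six-sided-die example
# with Python's built-in set type (an illustrative sketch).
omega = {1, 2, 3, 4, 5, 6}           # whole event (sample space)
A = {1, 3, 5}                        # an odd number appears
B = {1, 2, 3}                        # a number <= 3 appears
C = {2, 4, 6}                        # an even number appears

print(A | B)                         # union: {1, 2, 3, 5}
print(A & B)                         # intersection: {1, 3}
print(A & C)                         # disjoint events: set()

# Distributive laws
assert (A | B) & C == (A & C) | (B & C)
assert (A & B) | C == (A | C) & (B | C)

# De Morgan's laws (complements taken within omega)
assert omega - (A | B) == (omega - A) & (omega - B)
assert omega - (A & B) == (omega - A) | (omega - B)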

2.2 PROBABILITY
Probability is a measure of the likelihood that an event will occur, and the probability that event A occurs is denoted by Pr(A). The Russian mathematician Kolmogorov defined probability by the following three axioms, as an abstraction of the evident properties that probability should satisfy.
1. Non-negativity: For any event Ai ,

0 ≤ Pr(Ai ) ≤ 1.

2. Unitarity: For entire sample space Ω,

Pr(Ω) = 1.

3. Additivity: For any countable sequence of disjoint events A1 , A2 , . . .,

Pr(A1 ∪ A2 ∪ · · · ) = Pr(A1 ) + Pr(A2 ) + · · · .

From the above axioms, events A and B are shown to satisfy the following
additive law:

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).

This can be extended to more than two events: for events A, B, and C,

Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C)


− Pr(A ∩ B) − Pr(A ∩ C) − Pr(B ∩ C)
+ Pr(A ∩ B ∩ C).
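
The additive law is easy to verify numerically for the die example. The sketch below is an illustration added here, assuming the uniform assignment of probability 1/6 to each sample point; the helper pr() is a hypothetical name, not notation from the book.

from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def pr(event):
    # Uniform probability: each sample point has probability 1/6.
    return Fraction(len(event), len(omega))

A = {1, 3, 5}   # odd number
B = {1, 2, 3}   # number <= 3

lhs = pr(A | B)
rhs = pr(A) + pr(B) - pr(A & B)
print(lhs, rhs)          # 2/3 2/3
assert lhs == rhs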

FIGURE 2.2
Example of probability mass function: the outcome of throwing a fair six-sided die (discrete uniform distribution U{1, 2, . . . , 6}).

2.3 RANDOM VARIABLE AND PROBABILITY DISTRIBUTION
A variable is called a random variable if probability is assigned to each realization
of the variable. A probability distribution is the function that describes the mapping
from any realized value of the random variable to probability.
A countable set is a set whose elements can be enumerated as 1, 2, 3, . . .. A
random variable that takes a value in a countable set is called a discrete random
variable. Note that the size of a countable set does not have to be finite but can be
infinite such as the set of all natural numbers. If probability for each value of discrete
random variable x is given by

Pr(x) = f (x),

f (x) is called the probability mass function. Note that f (x) should satisfy

∀x, f(x) ≥ 0  and  ∑_x f(x) = 1.

The outcome of throwing a fair six-sided die, x ∈ {1, 2, 3, 4, 5, 6}, is a discrete random
variable, and its probability mass function is given by f (x) = 1/6 (Fig. 2.2).
A random variable that takes a continuous value is called a continuous random
variable. If probability that continuous random variable x takes a value in [a, b] is
given by

Pr(a ≤ x ≤ b) = ∫_a^b f(x) dx,    (2.1)

FIGURE 2.3
Example of probability density function and its cumulative distribution function: (a) probability density function f(x); (b) cumulative distribution function F(x).

f (x) is called a probability density function (Fig. 2.3(a)). Note that f (x) should satisfy

∀x, f(x) ≥ 0  and  ∫ f(x) dx = 1.

For example, the outcome of spinning a roulette, x ∈ [0, 2π), is a continuous random
variable, and its probability density function is given by f (x) = 1/(2π). Note that
Eq. (2.1) also has an important implication, i.e., the probability that continuous
random variable x exactly takes value b is actually zero:

Pr(b ≤ x ≤ b) = ∫_b^b f(x) dx = 0.
Thus, the probability that the outcome of spinning a roulette is exactly a particular
angle is zero.
The probability that continuous random variable x takes a value less than or equal
to b,

F(b) = Pr(x ≤ b) = ∫_{−∞}^{b} f(x) dx,
is called the cumulative distribution function (Fig. 2.3(b)). The cumulative distribu-
tion function F satisfies the following properties:
• Monotone nondecreasing: x < x ′ implies F(x) ≤ F(x ′).
• Left limit: lim x→−∞ F(x) = 0.
• Right limit: lim x→+∞ F(x) = 1.
If the derivative of a cumulative distribution function exists, it agrees with the
probability density function:
F ′(x) = f (x).
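
A rough numerical illustration of the relation F′(x) = f(x), added here for concreteness: it discretizes the roulette density f(x) = 1/(2π) on [0, 2π), accumulates it into an approximate cumulative distribution function, and differentiates that numerically. The grid size and the use of NumPy are arbitrary choices.

import numpy as np

x = np.linspace(0.0, 2 * np.pi, 1001)
f = np.full_like(x, 1.0 / (2 * np.pi))      # density of the roulette outcome
F = np.cumsum(f) * (x[1] - x[0])            # crude cumulative integral of f

dF = np.gradient(F, x)                      # numerical derivative of the CDF
print(np.allclose(dF, f, atol=1e-3))        # True: F'(x) is close to f(x)
print(F[-1])                                # close to 1: total probability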

FIGURE 2.4
Expectation is the average of x weighted according to f (x), and median
is the 50% point both from the left-hand and right-hand sides. α-quantile
for 0 ≤ α ≤ 1 is a generalization of the median that gives the 100α%
point from the left-hand side. Mode is the maximizer of f (x).

Pr(a ≤ x) is called the upper-tail probability or the right-tail probability, while Pr(x ≤ b) is called the lower-tail probability or the left-tail probability. The upper-tail
and lower-tail probabilities together are called the two-sided probability, and either
of them is called a one-sided probability.

2.4 PROPERTIES OF PROBABILITY DISTRIBUTIONS


When discussing properties of probability distributions, it is convenient to have
simple statistics that summarize probability mass/density functions. In this section,
such statistics are introduced.

2.4.1 EXPECTATION, MEDIAN, AND MODE


The expectation is the value that a random variable is expected to take (Fig. 2.4). The
expectation of random variable x, denoted by E[x], is defined as the average of x
weighted according to probability mass/density function f (x):

Discrete: E[x] = ∑_x x f(x),
Continuous: E[x] = ∫ x f(x) dx.

Note that, as explained in Section 4.5, there are probability distributions, such as the Cauchy distribution, for which the expectation does not exist (the defining integral diverges).
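
For a concrete instance of the discrete definition, the expectation of the fair-die outcome can be computed directly from the probability mass function f(x) = 1/6. This is an added illustration; the use of Python's fractions module is just to keep the arithmetic exact.

from fractions import Fraction

f = {x: Fraction(1, 6) for x in range(1, 7)}        # probability mass function
expectation = sum(x * p for x, p in f.items())      # E[x] = sum_x x f(x)
print(expectation)                                   # 7/2, i.e. 3.5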
The expectation can be defined for any function ξ of x similarly:

Discrete: E[ξ(x)] = ∑_x ξ(x) f(x),
Continuous: E[ξ(x)] = ∫ ξ(x) f(x) dx.

FIGURE 2.5
Income distribution. The expectation is 62.1 thousand dollars, while the
median is 31.3 thousand dollars.



For constant c, the expectation operator E satisfies the following properties:

E[c] = c,
E[x + c] = E[x] + c,
E[cx] = cE[x].

Although the expectation represents the “center” of a probability distribution, it can be quite different from what is intuitively expected in the presence of outliers. For example, in the income distribution illustrated in Fig. 2.5, because one person earns 1 million dollars, everybody else is below the expectation of 62.1 thousand dollars. In such a situation, the median is often more appropriate than the expectation; the median is defined as b such that

Pr(x ≤ b) = 1/2.

That is, the median is the “center” of a probability distribution in the sense that it is
the 50% point both from the left-hand and right-hand sides. In the example of Fig. 2.5,
the median is 31.3 thousand dollars and it is indeed in the middle of everybody.
The α-quantile for 0 ≤ α ≤ 1 is a generalization of the median that gives b such
that

Pr(x ≤ b) = α.

That is, the α-quantile gives the 100α% point from the left-hand side (Fig. 2.4) and
is reduced to the median when α = 0.5.
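
As an added numerical illustration of the median and the α-quantile, the snippet below draws samples from an exponential distribution (an arbitrary choice of a skewed distribution, echoing the income example) and reports empirical quantiles; sample-based estimates only approximate the population quantities.

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)

print(np.mean(x))                          # expectation, close to 1.0
print(np.median(x))                        # median, close to log(2) ~ 0.69
print(np.quantile(x, [0.25, 0.5, 0.75]))   # alpha-quantiles for alpha = 0.25, 0.5, 0.75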

Let us consider a probability density function f defined on a finite interval [a, b].
Then the minimizer of the expected squared error, defined by

E[(x − y)²] = ∫_a^b (x − y)² f(x) dx,

with respect to y is shown to agree with the expectation of x. Similarly, the minimizer y of the expected absolute error, defined by

E[|x − y|] = ∫_a^b |x − y| f(x) dx,    (2.2)

with respect to y is shown to agree with the median of x. Furthermore, a weighted variant of Eq. (2.2),

∫_a^b |x − y|_α f(x) dx,  where  |x − y|_α = α(x − y) if x > y and (1 − α)(y − x) if x ≤ y,

is minimized with respect to y by the α-quantile of x.
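
A small sample-based check of this claim, added here as an illustration: the weighted absolute error is evaluated on a grid of candidate values y, and its minimizer is compared with the empirical α-quantile. The exponential distribution, the value α = 0.8, and the grid resolution are arbitrary choices, and the agreement is only approximate because samples replace the density f(x).

import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=20_000)
alpha = 0.8

def weighted_abs_error(y):
    # alpha * (x - y) when x > y, (1 - alpha) * (y - x) when x <= y, averaged over samples.
    return np.mean(np.where(x > y, alpha * (x - y), (1 - alpha) * (y - x)))

grid = np.linspace(0.0, 4.0, 801)
losses = [weighted_abs_error(y) for y in grid]
minimizer = grid[int(np.argmin(losses))]

print(minimizer)                    # close to ...
print(np.quantile(x, alpha))        # ... the empirical 0.8-quantile (about 1.6 here)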
Another popular statistic is the mode, which is defined as the maximizer of f (x)
(Fig. 2.4).

2.4.2 VARIANCE AND STANDARD DEVIATION


Although the expectation is a useful statistic to characterize probability distributions,
probability distributions can be different even when they share the same expectation.
Here, another statistic called the variance is introduced to represent the spread of the
probability distribution.
The variance of random variable x, denoted by V[x], is defined as

V[x] = E[(x − E[x])²].

In practice, expanding the above expression as

V[x] = E[x² − 2xE[x] + (E[x])²] = E[x²] − (E[x])²

often makes the computation easier. For constant c, variance operator V satisfies the
following properties:

V [c] = 0,
V [x + c] = V [x],
V [cx] = c2V [x].

Note that these properties are quite different from those of the expectation.
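
The shortcut formula V[x] = E[x²] − (E[x])² and the shift/scaling properties can be verified exactly for the fair die. The snippet below is an added sketch using exact rational arithmetic; the constant c = 10 is arbitrary.

from fractions import Fraction

xs = range(1, 7)
f = Fraction(1, 6)

E = sum(x * f for x in xs)                       # E[x]   = 7/2
E2 = sum(x * x * f for x in xs)                  # E[x^2] = 91/6
V = E2 - E**2
print(V)                                         # 35/12

c = 10
V_shift = sum((x + c - (E + c))**2 * f for x in xs)
V_scale = sum((c * x - c * E)**2 * f for x in xs)
assert V_shift == V                              # V[x + c] = V[x]
assert V_scale == c**2 * V                       # V[cx]    = c^2 V[x]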

The square root of the variance is called the standard deviation and is denoted by
D[x]:

D[x] = √V[x].

Conventionally, the variance and the standard deviation are denoted by σ² and σ,
respectively.

2.4.3 SKEWNESS, KURTOSIS, AND MOMENTS


In addition to the expectation and variance, higher-order statistics such as the
skewness and kurtosis are also often used. The skewness and kurtosis represent
asymmetry and sharpness of probability distributions, respectively, and are defined as

Skewness: E[(x − E[x])³] / (D[x])³,
Kurtosis: E[(x − E[x])⁴] / (D[x])⁴ − 3.

The factors (D[x])³ and (D[x])⁴ in the denominators are for normalization, and the −3 included in the definition of the kurtosis makes the kurtosis of the normal distribution zero (see Section 4.2). As illustrated in Fig. 2.6, the right tail is longer than the left tail if the skewness is positive, while the left tail is longer than the right tail if the skewness is negative. The distribution is perfectly symmetric if the skewness is zero. As illustrated in Fig. 2.7, the probability distribution is sharper than the normal distribution if the kurtosis is positive, while it is flatter than the normal distribution if the kurtosis is negative.
The above discussions imply that the statistic,

ν_k = E[(x − E[x])^k],

plays an important role in characterizing probability distributions. ν_k is called the kth moment about the expectation, while

µ_k = E[x^k]

is called the kth moment about the origin. The expectation, variance, skewness, and kurtosis can be expressed by using µ_k as

Expectation: µ₁,
Variance: µ₂ − µ₁²,
Skewness: (µ₃ − 3µ₂µ₁ + 2µ₁³) / (µ₂ − µ₁²)^(3/2),
Kurtosis: (µ₄ − 4µ₃µ₁ + 6µ₂µ₁² − 3µ₁⁴) / (µ₂ − µ₁²)² − 3.
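
As an added check of these expressions, the snippet below estimates the raw moments µ_k from samples of an exponential distribution (theoretical skewness 2 and excess kurtosis 6) and compares the resulting skewness and kurtosis with SciPy's estimators; sample estimates fluctuate, so the agreement is only approximate.

import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=200_000)

mu = [np.mean(x**k) for k in range(5)]           # mu[k] = E[x^k], with mu[0] = 1

var = mu[2] - mu[1]**2
skew_from_mu = (mu[3] - 3*mu[2]*mu[1] + 2*mu[1]**3) / var**1.5
kurt_from_mu = (mu[4] - 4*mu[3]*mu[1] + 6*mu[2]*mu[1]**2 - 3*mu[1]**4) / var**2 - 3

print(skew_from_mu, skew(x))            # both roughly 2
print(kurt_from_mu, kurtosis(x))        # both roughly 6 (scipy returns excess kurtosis)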

FIGURE 2.6
Skewness: (a) skewness −0.32; (b) skewness 0; (c) skewness 0.32.

FIGURE 2.7
Kurtosis: (a) kurtosis −1.2; (b) kurtosis 0; (c) kurtosis 3.

Probability distributions become more constrained as the expectation, variance, skewness, and kurtosis are specified. In the limit, if the moments of all orders are specified, the probability distribution is uniquely determined. The moment-generating function allows us to handle the moments of all orders in a systematic way:


Discrete: M_x(t) = E[e^{tx}] = ∑_x e^{tx} f(x),
Continuous: M_x(t) = E[e^{tx}] = ∫ e^{tx} f(x) dx.


Indeed, substituting zero into the kth derivative of the moment-generating function with respect to t, M_x^(k)(t), gives the kth moment:

M_x^(k)(0) = µ_k.

Below, this fact is proved.



The value of function g at point t can be expressed as

g(t) = g(0) + (g′(0)/1!) t + (g′′(0)/2!) t² + ··· .
If higher-order terms in the right-hand side are ignored and the infinite sum
is approximated by a finite sum, an approximation to g(t) can be obtained.
When only the constant term g(0) is used, g(t) is simply approximated
by g(0), which is too rough. However, when the linear term tg′(0)
is also included, the approximation gets better, as illustrated in the figure. By further
including higher-order terms, the approximation gets more accurate and
converges to g(t) if all terms are included.

FIGURE 2.8
Taylor series expansion at the origin.

Given that the kth derivative of function e^{tx} with respect to t is x^k e^{tx}, the Taylor series expansion (Fig. 2.8) of function e^{tx} at the origin with respect to t yields

e^{tx} = 1 + tx + (tx)²/2! + (tx)³/3! + ··· .

Taking the expectation of both sides gives

E[e^{tx}] = M_x(t) = 1 + tµ₁ + t²µ₂/2! + t³µ₃/3! + ··· .
Taking the derivative of both sides yields

M_x′(t) = µ₁ + µ₂t + t²µ₃/2! + t³µ₄/3! + ··· ,
M_x′′(t) = µ₂ + µ₃t + t²µ₄/2! + t³µ₅/3! + ··· ,
⋮
M_x^(k)(t) = µ_k + µ_{k+1}t + t²µ_{k+2}/2! + t³µ_{k+3}/3! + ··· .

Substituting zero into this gives M_x^(k)(0) = µ_k.
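
This relation can also be checked symbolically. The sketch below, added as an illustration, builds the moment-generating function of the fair die, M_x(t) = (1/6) ∑_{j=1}^{6} e^{jt}, and confirms with SymPy that its kth derivative at t = 0 equals µ_k = E[x^k]; SymPy is an assumed dependency.

import sympy as sp

t = sp.symbols('t')
M = sp.Rational(1, 6) * sum(sp.exp(j * t) for j in range(1, 7))      # MGF of the fair die

for k in range(1, 5):
    mgf_moment = sp.diff(M, t, k).subs(t, 0)                          # M_x^(k)(0)
    raw_moment = sp.Rational(1, 6) * sum(j**k for j in range(1, 7))   # mu_k = E[x^k]
    assert sp.simplify(mgf_moment - raw_moment) == 0
    print(k, mgf_moment)          # prints 7/2, 91/6, 147/2, 2275/6 for k = 1..4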

For some probability distributions, the moment-generating function does not exist (the expectation diverges to infinity). On the other hand, its sibling, called the characteristic function, always exists:

φ_x(t) = M_{ix}(t) = M_x(it),

where i denotes the imaginary unit such that i² = −1. The characteristic function corresponds to the Fourier transform of the probability density function.

2.5 TRANSFORMATION OF RANDOM VARIABLES


If random variable x is transformed as

r = ax + b,

the expectation and variance of r are given by

E[r] = aE[x] + b and V[r] = a²V[x].

Setting a = 1/D[x] and b = −E[x]/D[x] yields

z = x/D[x] − E[x]/D[x] = (x − E[x])/D[x],

which has expectation 0 and variance 1. This transformation from x to z is called standardization.
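
A minimal numerical illustration of standardization, added here: samples are shifted and rescaled by their sample mean and standard deviation, which gives (approximately) expectation 0 and variance 1. The normal distribution and its parameters are arbitrary choices.

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)

z = (x - np.mean(x)) / np.std(x)     # z = (x - E[x]) / D[x], estimated from the sample
print(np.mean(z), np.var(z))         # approximately 0.0 and 1.0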
Suppose that random variable x, which has probability density f(x) defined on X, is obtained from random variable r by transformation ξ as

x = ξ(r).

Then the probability density function of r is not simply given by f(ξ(r)), because f(ξ(r)) does not integrate to 1 in general. For example, when x is the height of a person in centimeters and r is the same height in meters, f(ξ(r)) must be scaled by the factor |dx/dr| = 100 to integrate to 1.
More generally, as explained in Fig. 2.9, if the Jacobian dx/dr is not zero, the scale should be adjusted by multiplying by the absolute Jacobian as

g(r) = f(ξ(r)) |dx/dr|.

g(r) integrates to 1 for any transform x = ξ(r) such that dx/dr ≠ 0.

Integration of function f(x) over X can be expressed by using function g(r) on R such that

x = g(r) and X = g(R)

as

∫_X f(x) dx = ∫_R f(g(r)) |dx/dr| dr.

This allows us to change variables of integration from x to r. |dx/dr| in the right-hand side corresponds to the ratio of lengths when variables of integration are changed from x to r. For example, for

f(x) = x and X = [2, 3],

integration of function f(x) over X is computed as

∫_X f(x) dx = ∫_2^3 x dx = [x²/2]_2^3 = 5/2.

On the other hand, g(r) = r² yields

R = [√2, √3], f(g(r)) = r², and dx/dr = 2r.

This results in

∫_R f(g(r)) |dx/dr| dr = ∫_{√2}^{√3} r²·2r dr = [r⁴/2]_{√2}^{√3} = 5/2.

FIGURE 2.9
One-dimensional change of variables in integration. For multidimensional cases, see Fig. 4.2.

For the linear transformation

r = ax + b with a ≠ 0,

x = (r − b)/a yields dx/dr = 1/a, and thus

g(r) = (1/|a|) f((r − b)/a)

is obtained.
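
As an added sanity check of this formula, the snippet below takes f to be the standard normal density (an arbitrary choice), forms g(r) = f((r − b)/a)/|a| for a = 2 and b = 1, and confirms numerically that g integrates to approximately 1.

import numpy as np

a, b = 2.0, 1.0

def f(x):
    # Density of x: standard normal (an arbitrary illustrative choice).
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def g(r):
    # Density of r = a*x + b obtained by the change-of-variables formula.
    return f((r - b) / a) / abs(a)

r = np.linspace(-20.0, 20.0, 200_001)
print(np.sum(g(r)) * (r[1] - r[0]))     # approximately 1.0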
