Random variable
A random variable (also called random quantity, aleatory variable, or stochastic variable) is a
mathematical formalization of a quantity or object which depends on random events.[1]
This graph shows how a random variable is a function from all possible outcomes to real values. It also shows how a random
variable can be used for defining probability mass functions.
It is common to consider the special cases of discrete random variables and absolutely
continuous random variables, corresponding to whether a random variable is valued in a discrete
set (such as a finite set) or in an interval of real numbers. There are other important possibilities,
especially in the theory of stochastic processes, wherein it is natural to consider random
sequences or random functions. Sometimes a random variable is taken to be automatically
valued in the real numbers, with more general random quantities instead being called random
elements.
According to George Mackey, Pafnuty Chebyshev was the first person "to think systematically in
terms of random variables".[2]
Definition
A random variable X is a measurable function X : Ω → E from a sample space Ω (the set of possible outcomes) to a measurable space E.
Standard case
In many cases, X is real-valued, i.e. E = ℝ. In some contexts, the term random element (see
extensions) is used to denote a random variable not of this form.
When the image (or range) of X is countable, the random variable is called a discrete random
variable[4]: 399 and its distribution is a discrete probability distribution, i.e. it can be described by a
probability mass function that assigns a probability to each value in the image of X. If the image
is uncountably infinite (usually an interval) then X is called a continuous random variable.[5][6] In
the special case that it is absolutely continuous, its distribution can be described by a probability
density function, which assigns probabilities to intervals; in particular, each individual point must
necessarily have probability zero for an absolutely continuous random variable. Not all
continuous random variables are absolutely continuous;[7] a mixture distribution is one such
counterexample. Such random variables cannot be described by a probability density or a
probability mass function.
Any random variable can be described by its cumulative distribution function, which describes
the probability that the random variable will be less than or equal to a certain value.
Extensions
The term "random variable" in statistics is traditionally limited to the real-valued case ( ).
In this case, the structure of the real numbers makes it possible to define quantities such as the
expected value and variance of a random variable, its cumulative distribution function, and the
moments of its distribution.
However, the definition above is valid for any measurable space E of values. Thus one can
consider random elements of other sets E, such as random Boolean values, categorical values,
complex numbers, vectors, matrices, sequences, trees, sets, shapes, manifolds, and functions.
One may then specifically refer to a random variable of type E, or an E-valued random variable.
This more general concept of a random element is particularly useful in disciplines such as
graph theory, machine learning, natural language processing, and other fields in discrete
mathematics and computer science, where one is often interested in modeling the random
variation of non-numerical data structures. In some cases, it is nonetheless convenient to
represent each element of E using one or more real numbers. In this case, a random element
may optionally be represented as a vector of real-valued random variables (all defined on the
same underlying probability space Ω, which allows the different random variables to covary). For
example:
A random word may be represented as a random integer that serves as an index into the
vocabulary of possible words. Alternatively, it can be represented as a random indicator vector,
whose length equals the size of the vocabulary, where the only values of positive probability
are (1 0 0 0 ⋯), (0 1 0 0 ⋯), (0 0 1 0 ⋯) and the position of the 1 indicates the word.
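As a rough illustration of these two representations, the following sketch (in Python, with a small hypothetical vocabulary and made-up probabilities, neither of which comes from the article) draws a random word either as an integer index or as the corresponding indicator vector.

```python
import random

# Hypothetical vocabulary and word probabilities (illustrative assumptions only).
vocabulary = ["cat", "dog", "fish"]
probabilities = [0.5, 0.3, 0.2]

def sample_word_index():
    """Represent the random word as a random integer index into the vocabulary."""
    return random.choices(range(len(vocabulary)), weights=probabilities)[0]

def sample_word_indicator():
    """Represent the same random word as an indicator (one-hot) vector:
    the position of the 1 identifies the word."""
    index = sample_word_index()
    return [1 if i == index else 0 for i in range(len(vocabulary))]

print(sample_word_index())      # e.g. 0  -> "cat"
print(sample_word_indicator())  # e.g. [1, 0, 0]
```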
Distribution functions
Given a real-valued random variable X defined on a probability space (Ω, F, P), one can ask how
likely it is that the value of X falls in a given range, for example P(X ≤ x). Recording all these
probabilities of output ranges of a real-valued random variable X yields the
probability distribution of X. The probability distribution "forgets" about the particular probability
space used to define X and only records the probabilities of various values of X. Such a
probability distribution can always be captured by its cumulative distribution function
F_X(x) = P(X ≤ x), and sometimes also by a probability density function or probability mass function.
Examples
If a person is chosen at random, one random variable is that person's height: the function that
maps the chosen person to their height. Another random variable may be the person's number of children; this is a discrete random
variable with non-negative integer values. It allows the computation of probabilities for individual
integer values – the probability mass function (PMF) – or for sets of values, including infinite
sets. For example, the event of interest may be "an even number of children". For both finite and
infinite event sets, their probabilities can be found by adding up the PMFs of the elements; that
is, the probability of an even number of children is the infinite sum
PMF(0) + PMF(2) + PMF(4) + ⋯.
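The article does not specify a distribution for the number of children, so the following sketch assumes a Poisson probability mass function purely for illustration, and approximates the infinite sum PMF(0) + PMF(2) + PMF(4) + ⋯ by truncating it once the terms become negligible.

```python
import math

def poisson_pmf(k, lam=1.7):
    """PMF of a Poisson(lam) random variable, used here only as an illustrative
    model for the number of children (an assumption, not stated in the article)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# P(even number of children) = PMF(0) + PMF(2) + PMF(4) + ...
# Truncate the infinite sum once the remaining terms are negligible.
p_even = sum(poisson_pmf(k) for k in range(0, 40, 2))
print(p_even)  # close to (1 + exp(-2*lam)) / 2 for a Poisson variable
```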
In examples such as these, the sample space is often suppressed, since it is mathematically
hard to describe, and the possible values of the random variables are then treated as a sample
space. But when two random variables are measured on the same sample space of outcomes,
such as the height and number of children being computed on the same random persons, it is
easier to track their relationship if it is acknowledged that both height and number of children
come from the same random person, for example so that questions of whether such random
variables are correlated or not can be posed.
Coin toss
The possible outcomes for one coin toss can be described by the sample space
Ω = {heads, tails}. We can introduce a real-valued random variable Y that models a $1
payoff for a successful bet on heads as follows:
Y(ω) = 1 if ω = heads, and Y(ω) = 0 if ω = tails.
If the coin is a fair coin, Y has a probability mass function f_Y given by:
f_Y(y) = 1/2 if y = 1, and f_Y(y) = 1/2 if y = 0.
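A minimal sketch of this payoff variable and its probability mass function; the simulation at the end is only an empirical sanity check, not part of the definition.

```python
import random

def Y(outcome):
    """Random variable: $1 payoff for a successful bet on heads."""
    return 1 if outcome == "heads" else 0

# PMF of Y for a fair coin: each value has probability 1/2.
pmf_Y = {0: 0.5, 1: 0.5}

# Empirical check by simulating many tosses of a fair coin.
tosses = [random.choice(["heads", "tails"]) for _ in range(100_000)]
print(sum(Y(t) for t in tosses) / len(tosses))  # approximately 0.5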
Dice roll
If the sample space is the set of possible numbers rolled on two dice, and the random variable of interest is the sum S of
the numbers on the two dice, then S is a discrete random variable whose distribution is described by the probability mass
function plotted as the height of picture columns here.
A random variable can also be used to describe the process of rolling dice and the possible
outcomes. The most obvious representation for the two-dice case is to take the set of pairs of
numbers n1 and n2 from {1, 2, 3, 4, 5, 6} (representing the numbers on the two dice) as the
sample space. The total number rolled (the sum of the numbers in each pair) is then a random
variable X given by the function that maps the pair to the sum:
X((n1, n2)) = n1 + n2
and (if the dice are fair) has a probability mass function f_X given by:
f_X(S) = (6 − |S − 7|) / 36, for S ∈ {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
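The same probability mass function can be obtained by enumerating the 36 equally likely pairs; the following sketch assumes fair dice, as in the text.

```python
from collections import Counter
from fractions import Fraction

# Enumerate the 36 equally likely outcomes (n1, n2) and map each pair to its sum.
counts = Counter(n1 + n2 for n1 in range(1, 7) for n2 in range(1, 7))

# PMF of the sum X: f_X(s) = (number of pairs summing to s) / 36.
pmf_X = {s: Fraction(c, 36) for s, c in sorted(counts.items())}
for s, p in pmf_X.items():
    print(s, p)   # e.g. 2 -> 1/36, 7 -> 1/6, 12 -> 1/36
```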
Continuous random variable
Formally, a continuous random variable is a random variable whose cumulative distribution
function is continuous everywhere. An example of a continuous random variable would be one based on a spinner that can choose
a horizontal direction. Then the values taken by the random variable are directions. We could
represent these directions by North, West, East, South, Southeast, etc. However, it is commonly
more convenient to map the sample space to a random variable which takes values which are
real numbers. This can be done, for example, by mapping a direction to a bearing in degrees
clockwise from North. The random variable then takes values which are real numbers from the
interval [0, 360), with all parts of the range being "equally likely". In this case, X = the angle spun.
Any real number has probability zero of being selected, but a positive probability can be
assigned to any range of values. For example, the probability of choosing a number in [0, 180] is
1/2. Instead of speaking of a probability mass function, we say that the probability density of X is
1/360. The probability of a subset of [0, 360) can be calculated by multiplying the measure of the
set by 1/360. In general, the probability of a set for a given continuous random variable can be
calculated by integrating the density over the given set.
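A small sketch of this last point: for the spinner, the probability of a range of directions can be obtained by integrating the constant density 1/360 over the range (here with a simple midpoint rule, which for a constant density just reproduces length × 1/360).

```python
# Density of the spinner angle X, uniform on [0, 360).
def density(x):
    return 1 / 360 if 0 <= x < 360 else 0.0

def prob_interval(a, b, steps=100_000):
    """P(a <= X <= b), computed by numerically integrating the density (midpoint rule)."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width

print(prob_interval(0, 180))   # 0.5: the same as (180 - 0) * (1/360)
print(prob_interval(90, 135))  # 0.125
```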
A continuous uniform random variable (CURV) X ~ U[a, b] assigns equal probability to
subintervals of equal length, so the probability that it falls in a subinterval [c, d] ⊆ [a, b] is
P(c ≤ X ≤ d) = (d − c) / (b − a),
where the last equality results from the unitarity axiom of probability. The probability density
function of a CURV X ~ U[a, b] is given by the indicator function of its interval of support
normalized by the interval's length:
f_X(x) = 1/(b − a) if a ≤ x ≤ b, and f_X(x) = 0 otherwise.
Of particular interest is the uniform distribution on the unit interval [0, 1]. Samples of any
desired probability distribution D can be generated by calculating the quantile function of D on
a randomly-generated number distributed uniformly on the unit interval. This exploits properties
of cumulative distribution functions, which are a unifying framework for all random variables.
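A minimal sketch of this quantile-function (inverse transform) idea; the exponential distribution and its rate parameter are chosen here only as an illustrative target, not taken from the article.

```python
import math
import random

def exponential_quantile(u, rate=1.0):
    """Quantile (inverse CDF) of an exponential(rate) distribution,
    used here purely as an illustrative target distribution."""
    return -math.log(1.0 - u) / rate

# Inverse transform sampling: push a uniform [0, 1) sample through the quantile function.
samples = [exponential_quantile(random.random()) for _ in range(100_000)]
print(sum(samples) / len(samples))  # approximately 1/rate = 1.0
```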
Mixed type
A mixed random variable is a random variable whose cumulative distribution function is neither
discrete nor everywhere-continuous.[8] It can be realized as a mixture of a discrete random
variable and a continuous random variable; in which case the CDF will be the weighted average
of the CDFs of the component variables.[8]
An example of a random variable of mixed type would be based on an experiment where a coin
is flipped and the spinner is spun only if the result of the coin toss is heads. If the result is tails, X
= −1; otherwise X = the value of the spinner as in the preceding example. There is a probability of
1/2 that this random variable will have the value −1. Other ranges of values would have half the
probabilities of the last example.
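A short simulation sketch of this mixed-type variable, assuming a fair coin and the uniform spinner of the preceding example.

```python
import random

def mixed_X():
    """Mixed-type random variable from the text: X = -1 on tails (probability 1/2),
    otherwise X = the spinner angle, uniform on [0, 360)."""
    if random.random() < 0.5:          # tails
        return -1.0
    return random.uniform(0.0, 360.0)  # heads: spin the spinner

samples = [mixed_X() for _ in range(100_000)]
print(sum(1 for x in samples if x == -1.0) / len(samples))      # about 0.5
print(sum(1 for x in samples if 0 <= x <= 180) / len(samples))  # about 0.25
```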
Most generally, every probability distribution on the real line is a mixture of a discrete part, a
singular part, and an absolutely continuous part; see Lebesgue's decomposition theorem § Refinement.
The discrete part is concentrated on a countable set, but this set may be dense (like the set of all
rational numbers).
Measure-theoretic definition
The most formal, axiomatic definition of a random variable involves measure theory. Continuous
random variables are defined in terms of sets of numbers, along with functions that map such
sets to probabilities. Because of various difficulties (e.g. the Banach–Tarski paradox) that arise
if such sets are insufficiently constrained, it is necessary to introduce what is termed a sigma-
algebra to constrain the possible sets over which probabilities can be defined. Normally, a
particular such sigma-algebra is used, the Borel σ-algebra, which allows for probabilities to be
defined over any sets that can be derived either directly from continuous intervals of numbers or
by a finite or countably infinite number of unions and/or intersections of such intervals.[9]
The measure-theoretic definition is as follows. Let (Ω, F, P) be a probability space and (E, ℰ) a
measurable space. An (E, ℰ)-valued random variable is a measurable function X : Ω → E, which
means that, for every subset B ∈ ℰ, its preimage X⁻¹(B) ∈ F. When E is a topological space, then
the most common choice for the σ-algebra ℰ is the Borel σ-algebra B(E), which is the σ-algebra
generated by the collection of all open sets in E. In such case the (E, ℰ)-valued random variable
is called an E-valued random variable. Moreover, when the space E is the real line ℝ, then such a
real-valued random variable is called simply a random variable.
Real-valued random variables
In this case the observation space is the set of real numbers. Recall, (Ω, F, P) is the
probability space. For a real observation space, the function X : Ω → ℝ is a real-valued random
variable if
{ω : X(ω) ≤ r} ∈ F for all r ∈ ℝ.
This definition is a special case of the above because the set {(−∞, r] : r ∈ ℝ} generates the
Borel σ-algebra on the set of real numbers, and it suffices to check measurability on any
generating set. Here we can prove measurability on this generating set by using the fact that
{ω : X(ω) ≤ r} = X⁻¹((−∞, r]).
Moments
Moments can only be defined for real-valued functions of random variables (or complex-valued,
etc.). If the random variable is itself real-valued, then moments of the variable itself can be taken,
which are equivalent to moments of the identity function of the random variable.
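As a brief illustration, moments of a real-valued random variable can be computed directly from its distribution; the sketch below reuses the two-dice sum from the example above (an assumed choice of distribution) to compute the first two moments and the variance.

```python
# Moments of a real-valued random variable computed from its PMF.
# Here X is the sum of two fair dice (an assumed example distribution).
pmf = {s: (6 - abs(s - 7)) / 36 for s in range(2, 13)}

mean = sum(x * p for x, p in pmf.items())              # first moment E[X]
second_moment = sum(x**2 * p for x, p in pmf.items())  # second moment E[X^2]
variance = second_moment - mean**2                     # Var(X) = E[X^2] - (E[X])^2

print(mean, second_moment, variance)  # 7.0, about 54.83, about 5.83
```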
However, even for non-real-valued random variables, moments can be taken of real-valued
functions of those variables. For example, for a categorical random variable X that can take on
the nominal values "red", "blue" or "green", the real-valued function [X = "green"] can be
constructed; this uses the Iverson bracket, and has the value 1 if X has the value "green", 0
otherwise. Then, the expected value and other moments of this function can be determined.
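A small sketch of this construction: the expectation of the Iverson-bracket function [X = "green"] equals P(X = "green"); the category probabilities used below are assumptions made purely for illustration.

```python
import random

# Assumed category probabilities, purely for illustration.
categories = {"red": 0.2, "blue": 0.5, "green": 0.3}

def iverson_green(x):
    """The real-valued function [X = "green"]: 1 if x is "green", 0 otherwise."""
    return 1.0 if x == "green" else 0.0

# Exact expectation: E[[X = "green"]] = P(X = "green").
expectation = sum(p * iverson_green(c) for c, p in categories.items())
print(expectation)  # 0.3

# Monte Carlo check.
draws = random.choices(list(categories), weights=categories.values(), k=100_000)
print(sum(map(iverson_green, draws)) / len(draws))  # approximately 0.3
```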
Functions of random variables
A new random variable Y can be defined by applying a real Borel measurable function g : ℝ → ℝ
to the outcomes of a real-valued random variable X. That is, Y = g(X). The cumulative
distribution function of Y is then
F_Y(y) = P(g(X) ≤ y).
If function g is invertible (i.e., h = g⁻¹ exists, where h is g's inverse function) and is either
increasing or decreasing, then the previous relation can be extended to obtain
F_Y(y) = F_X(h(y)) if h is increasing, and F_Y(y) = 1 − F_X(h(y)) if h is decreasing.
With the same hypotheses of invertibility of g, assuming also differentiability, the relation
between the probability density functions can be found by differentiating both sides of the above
expression with respect to y, in order to obtain[8]
f_Y(y) = f_X(h(y)) |dh(y)/dy|.
If there is no invertibility of g but each y admits at most a countable number of roots (i.e., a
finite, or countably infinite, number of x_i such that y = g(x_i)) then the previous relation
between the probability density functions can be generalized with
f_Y(y) = Σ_i f_X(x_i(y)) |dx_i(y)/dy|,
where x_i = g_i⁻¹(y), according to the inverse function theorem. The formulas for densities do
not demand g to be increasing.
Example 1
Let X be a real-valued random variable and let Y = X². Then
F_Y(y) = P(X² ≤ y).
If y < 0, then P(X² ≤ y) = 0, so
F_Y(y) = 0 for y < 0.
If y ≥ 0, then P(X² ≤ y) = P(−√y ≤ X ≤ √y),
so
F_Y(y) = F_X(√y) − F_X(−√y) for y ≥ 0.
Example 2
Example 3
Suppose X is a random variable with a standard normal distribution, whose density is
f_X(x) = (1/√(2π)) e^(−x²/2).
Consider the random variable Y = X². We can find the density using the above formula for a
change of variables:
f_Y(y) = Σ_i f_X(g_i⁻¹(y)) |dg_i⁻¹(y)/dy|.
In this case the change is not monotonic, because every value of Y has two corresponding
values of X (one positive and one negative). However, because of symmetry, both halves will
transform identically, i.e.,
f_Y(y) = 2 f_X(√y) |d√y/dy| = 2 · (1/√(2π)) e^(−y/2) · 1/(2√y).
Then,
f_Y(y) = (1/√(2πy)) e^(−y/2), for y ≥ 0.
This is a chi-squared distribution with one degree of freedom.
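The change-of-variables result can be checked numerically; the following sketch squares standard normal samples and compares the empirical probability of an interval with the integral of the density obtained above.

```python
import math
import random

def chi2_1_density(y):
    """Density obtained from the change of variables: f_Y(y) = e^(-y/2) / sqrt(2*pi*y)."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)

# Empirical check: square standard normal samples and compare with the density.
samples = [random.gauss(0.0, 1.0) ** 2 for _ in range(200_000)]
a, b = 0.5, 1.5
empirical = sum(1 for y in samples if a <= y <= b) / len(samples)

# Integrate the density over [a, b] with a simple midpoint rule.
steps = 10_000
width = (b - a) / steps
theoretical = sum(chi2_1_density(a + (i + 0.5) * width) for i in range(steps)) * width
print(empirical, theoretical)  # the two numbers should be close
```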
Example 4
Consider a random variable Y defined as a non-monotonic function of X. We can find the density using the above formula for a
change of variables:
In this case the change is not monotonic, because every value of Y has two corresponding
values of X (one positive and one negative). Unlike the previous example, however, there is
no symmetry in this case, and we have to compute the two distinct terms:
Then,
Some properties
The probability distribution of the sum of two independent random variables is the
convolution of each of their distributions (a sketch follows this list).
Probability distributions are not a vector space—they are not closed under linear
combinations, as these do not preserve non-negativity or total integral 1—but they are closed
under convex combination, thus forming a convex subset of the space of functions (or
measures).
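A brief sketch of the first property above: for discrete random variables, the probability mass function of the sum of two independent variables is the convolution of their probability mass functions; the fair-die PMF below reproduces the dice-roll example.

```python
def convolve_pmfs(pmf_a, pmf_b):
    """PMF of the sum of two independent discrete random variables:
    the convolution of their individual PMFs."""
    result = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            result[a + b] = result.get(a + b, 0) + pa * pb
    return result

die = {face: 1 / 6 for face in range(1, 7)}  # PMF of one fair die
print(convolve_pmfs(die, die))               # PMF of the sum of two dice, as in the example above
```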
Equivalence of random variables
There are several different senses in which random variables can be considered to be
equivalent. Two random variables can be equal, equal almost surely, or equal in distribution.
In increasing order of strength, the precise definition of these notions of equivalence is given
below.
Equality in distribution
If the sample space is a subset of the real line, random variables X and Y are equal in distribution
if they have the same distribution functions:
P(X ≤ x) = P(Y ≤ x) for all x.
To be equal in distribution, random variables need not be defined on the same probability space.
Two random variables having equal moment generating functions have the same distribution.
This provides, for example, a useful method of checking equality of certain functions of
independent, identically distributed (IID) random variables. However, the moment generating
function exists only for distributions that have a defined Laplace transform.
Almost sure equality
Two random variables X and Y are equal almost surely (denoted X = Y a.s.) if, and only if, the
probability that they are different is zero:
P(X ≠ Y) = 0.
For all practical purposes in probability theory, this notion of equivalence is as strong as actual
equality. It is associated to the following distance:
d∞(X, Y) = ess sup_ω |X(ω) − Y(ω)|,
where "ess sup" represents the essential supremum in the sense of measure theory.
Equality
Finally, the two random variables X and Y are equal if they are equal as functions on their
measurable space:
X(ω) = Y(ω) for all ω.
This notion is typically the least useful in probability theory because in practice and in theory, the
underlying measure space of the experiment is rarely explicitly characterized or even
characterizable.
Convergence
There are various senses in which a sequence Xn of random variables can converge to a
random variable X. These are explained in the article on convergence of random variables.
See also
Aleatoricism
Observable variable
Random element
Random function
Random measure
Random variate
Random vector
Randomness
Stochastic process
References
Inline citations
1. Blitzstein, Joe; Hwang, Jessica (2014). Introduction to Probability. CRC Press. ISBN 9781466575592.
2. George Mackey (July 1980). "Harmonic analysis as the exploitation of symmetry - a historical survey".
Bulletin of the American Mathematical Society. New Series. 3 (1).
4. Yates, Daniel S.; Moore, David S; Starnes, Daren S. (2003). The Practice of Statistics (https://web.archive.
org/web/20050209001108/http://bcs.whfreeman.com/yates2e/) (2nd ed.). New York: Freeman.
ISBN 978-0-7167-4773-4. Archived from the original (http://bcs.whfreeman.com/yates2e/) on 2005-02-
09.
6. Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hendrik Paul; Meester, Ludolf Erwin (2005). "A
Modern Introduction to Probability and Statistics" (https://doi.org/10.1007/1-84628-168-7) . Springer
Texts in Statistics. doi:10.1007/1-84628-168-7 (https://doi.org/10.1007%2F1-84628-168-7) . ISBN 978-
1-85233-896-1. ISSN 1431-875X (https://www.worldcat.org/issn/1431-875X) .
8. Bertsekas, Dimitri P. (2002). Introduction to Probability. Tsitsiklis, John N., Τσιτσικλής, Γιάννης Ν.
Belmont, Mass.: Athena Scientific. ISBN 188652940X. OCLC 51441829 (https://www.worldcat.org/oclc/5
1441829) .
9. Steigerwald, Douglas G. "Economics 245A – Introduction to Measure Theory" (http://faculty.econ.ucsb.ed
u/~doug/245a/Lectures/Measure%20Theory.pdf) (PDF). University of California, Santa Barbara.
Retrieved April 26, 2013.
Literature
Fristedt, Bert; Gray, Lawrence (1996). A modern approach to probability theory (https://books.google.com/
books?id=5D5O8xyM-kMC) . Boston: Birkhäuser. ISBN 3-7643-3807-5.
Papoulis, Athanasios (1965). Probability, Random Variables, and Stochastic Processes (http://www.mhhe.
com/engcs/electrical/papoulis/) (9th ed.). Tokyo: McGraw–Hill. ISBN 0-07-119981-0.
External links
Zukerman, Moshe (2014), Introduction to Queueing Theory and Stochastic Teletraffic Models (ht
tp://www.ee.cityu.edu.hk/~zukerman/classnotes.pdf) (PDF), arXiv:1307.2968 (https://arxiv.o
rg/abs/1307.2968)