What Is Probability
The terms probability model and probability distribution are often used interchangeably.
In probability theory and statistics, a probability distribution identifies either the probability of
each value of a random variable (when the variable is discrete), or the probability of the value
falling within a particular interval (when the variable is continuous).[1] The probability
distribution describes the range of possible values that a random variable can attain and the
probability that the value of the random variable is within any (measurable) subset of that range.
When the random variable takes values in the set of real numbers, the probability distribution is
completely described by the cumulative distribution function, whose value at each real x is the
probability that the random variable is smaller than or equal to x.
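As a small illustration of the cumulative distribution function described above, the following Python sketch evaluates F(x) = P(X <= x) for a fair six-sided die. The die and the function name die_cdf are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction
import math

def die_cdf(x):
    """P(X <= x) for a fair six-sided die (illustrative example)."""
    # Count the faces 1..6 that are <= x, clamped to the range 0..6.
    favorable = min(max(math.floor(x), 0), 6)
    return Fraction(favorable, 6)

for x in (0.5, 1, 3.7, 6, 10):
    print(x, die_cdf(x))
```

The function is 0 below the smallest face, jumps by 1/6 at each face, and stays at 1 from 6 onward, which is the step shape every discrete cumulative distribution function has.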
Many different probability distributions arise in applications. Two of
the most important are the normal distribution and the categorical distribution. The normal
distribution, also known as the Gaussian distribution, has a familiar "bell curve" shape and
approximates many naturally occurring distributions over real numbers. The categorical
distribution describes the result of an experiment with a fixed, finite number of outcomes. For
example, the toss of a fair coin follows a categorical distribution, where the possible outcomes are
heads and tails, each with probability 1/2.
Skewness and kurtosis show how a distribution departs from the normal shape. If the skewness
(which measures the deviation of the distribution from symmetry) is clearly different from 0,
then the distribution is asymmetrical, while normal distributions are perfectly symmetrical.
If the excess kurtosis (kurtosis minus 3, often described as measuring the "peakedness" of the
distribution) is clearly different from 0, then the distribution is either flatter or more
peaked than normal; the excess kurtosis of the normal distribution is 0.
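The two shape measures can be estimated from data. The sketch below is a minimal illustration, assuming the population (divide-by-n) form of the standardized moments and the convention in which 3 is subtracted from the kurtosis so that a normal distribution scores 0. The helper names, the sample sizes and the choice of an exponential distribution as the asymmetric example are assumptions made for this illustration.

```python
import math
import random

def standardized_moment(data, k):
    """Average of ((x - mean) / sd) ** k over the sample."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return sum(((x - mean) / sd) ** k for x in data) / n

def skewness(data):
    return standardized_moment(data, 3)

def excess_kurtosis(data):
    # Subtract 3 so that a normal distribution scores 0.
    return standardized_moment(data, 4) - 3.0

rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(50_000)]

print("normal:     ", skewness(normal_sample), excess_kurtosis(normal_sample))
print("exponential:", skewness(skewed_sample), excess_kurtosis(skewed_sample))
```

Both values should come out near 0 for the normal sample and clearly positive for the exponential sample, whose long right tail makes it asymmetrical.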
In probability theory and statistics, a probability mass function (pmf) is a function that gives
the probability that a discrete random variable is exactly equal to some value. The probability
mass function is often the primary means of defining a discrete probability distribution, and such
functions exist for either scalar or multivariate random variables, given that the distribution is
discrete.
A probability mass function differs from a probability density function (pdf) in that the values of
a pdf, defined only for continuous random variables, are not probabilities as such. Instead, the
integral of a pdf over a range of possible values (a,b] gives the probability of the random variable
falling within that range. The notation (a,b] is a standard form of interval notation and has a
specific meaning: the value a is excluded from the interval, while the value b is included.
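To make the distinction concrete, the sketch below evaluates a normal pdf at a point and then integrates it over an interval. The standard deviation of 0.1, the midpoint-rule integrator and the function names are assumptions chosen for illustration only.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# A density value is not a probability: here it is larger than 1.
density_at_mean = normal_pdf(0.0, sigma=0.1)

# The integral over an interval is a probability: here, within one
# standard deviation of the mean.
prob = integrate(lambda x: normal_pdf(x, sigma=0.1), -0.1, 0.1)

print(density_at_mean, prob)
```

The density at the mean is close to 4, which no probability could be, while the integral over the interval lands near 0.68. For a continuous variable, including or excluding a single endpoint does not change the integral.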
If we repeat an experiment a large number of times and construct the relative frequency histogram h(x),
we would anticipate that h(x) would be close to f(x), with the agreement improving as the number of
repetitions grows. We call this function, f(x), the probability mass function.
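The link between relative frequencies and the probability mass function can be tried out by simulation. The following sketch assumes a fair coin as the experiment; the function name, trial count and seed are illustrative choices.

```python
import random
from collections import Counter

# Assumed pmf f(x) for the illustration: a fair coin.
pmf = {"heads": 0.5, "tails": 0.5}

def relative_frequencies(pmf, trials, seed=0):
    """Simulate the experiment and return the relative frequency h(x)."""
    rng = random.Random(seed)
    outcomes = list(pmf)
    weights = [pmf[o] for o in outcomes]
    counts = Counter(rng.choices(outcomes, weights=weights, k=trials))
    return {o: counts[o] / trials for o in outcomes}

for trials in (10, 1_000, 100_000):
    print(trials, relative_frequencies(pmf, trials))
```

With only 10 trials the frequencies can be far from 1/2; with 100,000 they should differ from it only in the second or third decimal place.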
What is probability?
Probability is a number between 0 and 1. When all outcomes are equally likely, it is the ratio of
two counts: the number of outcomes in the event of interest, divided by the total number of
possible outcomes.
Normal distributions are symmetrical with a single central peak at the mean (average)
of the data. The shape of the curve is described as bell-shaped with the graph falling
off evenly on either side of the mean. Fifty percent of the distribution lies to the left
of the mean and fifty percent lies to the right of the mean.
The mean and the median are the same in a normal distribution.
Chart prepared by the NY State Education Department
Reading from the chart, we see that approximately 19.1% of normally distributed data
is located between the mean (the peak) and 0.5 standard deviations to the right (or
left) of the mean.
(The percentages are represented by the area under the curve.)
s.d. in callout boxes = standard deviation
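The chart percentages can be checked numerically. This sketch uses NormalDist from Python's standard statistics module to compute the area under the standard normal curve in half-standard-deviation bands to the right of the mean; the band edges and the helper name are illustrative choices.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def band_percentage(lo, hi):
    """Percentage of a normal distribution between lo and hi standard deviations."""
    return 100 * (std_normal.cdf(hi) - std_normal.cdf(lo))

edges = [0.0, 0.5, 1.0, 1.5, 2.0]
bands = [band_percentage(lo, hi) for lo, hi in zip(edges, edges[1:])]

for (lo, hi), pct in zip(zip(edges, edges[1:]), bands):
    print(f"{lo} to {hi} s.d.: {pct:.1f}%")
```

By symmetry, the same percentages apply to the corresponding bands to the left of the mean.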
If you are asked for the interval about the mean containing 50% of the data, you are
actually being asked for the interquartile range, IQR. The IQR (the width of an
interval which contains the middle 50% of the data set) is normally computed by
subtracting the first quartile from the third quartile. In a normal distribution (with
mean 0 and standard deviation 1), the first and third quartiles are located at -0.67448
and +0.67448 respectively. Thus the IQR for a normal distribution is:
IQR = Q3 - Q1 = 0.67448 - (-0.67448) = 1.34896 standard deviations
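The quartile positions quoted above can be reproduced with the inverse cumulative distribution function. The sketch below uses NormalDist.inv_cdf from Python's standard statistics module; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

q1 = std_normal.inv_cdf(0.25)  # first quartile
q3 = std_normal.inv_cdf(0.75)  # third quartile
iqr = q3 - q1

print(q1, q3, iqr)
```

For a normal distribution with a different standard deviation, the same width is simply multiplied by that standard deviation, so the IQR is roughly 1.35 standard deviations.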
Percentiles and the Normal Curve
The mean (at the center peak of the curve) is the 50th percentile.
1. Find the percentage of the normally distributed data that lies within 2 standard
deviations of the mean.
Solution: Read the percentages from the chart at the top of this page from -2 to +2
standard deviations.
4.4% + 9.2% + 15.0% + 19.1% + 19.1% + 15.0% + 9.2% + 4.4% = 95.4%
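The same total can be obtained without the chart. For a normal distribution, the fraction of data within k standard deviations of the mean equals erf(k / sqrt(2)); the sketch below applies that identity using math.erf from the Python standard library. The function name is an illustrative choice.

```python
import math

def within_k_sd(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} s.d.: {100 * within_k_sd(k):.2f}%")
```

The value for k = 2 differs from the chart total only by the rounding of the individual chart entries.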
2. At the New Age Information Corporation, the ages of all new employees hired during the
last 5 years are normally distributed. Within this curve, 95.4% of the ages, centered about
the mean, are between 24.6 and 37.4 years. Find the mean age and the standard deviation of
the data.
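The worked solution to this problem does not appear in the text. One way to approach it, sketched below, assumes that the central 95.4% of a normal distribution spans 2 standard deviations on each side of the mean, as in the first example. The function name and its parameter k are illustrative.

```python
def mean_and_sd_from_interval(low, high, k=2.0):
    """Recover mean and standard deviation from an interval of mean +/- k s.d."""
    mean = (low + high) / 2        # the interval is centered on the mean
    sd = (high - low) / (2 * k)    # the full width covers 2k standard deviations
    return mean, sd

mean_age, sd_age = mean_and_sd_from_interval(24.6, 37.4)
print(round(mean_age, 1), round(sd_age, 1))
```

Under that assumption the mean age is 31 years and the standard deviation is 3.2 years.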
The most accurate answer to a problem such as this cannot be obtained by using the
chart at the top of this page. One standard deviation above the mean would be located at
41.2 hours, 2 standard deviations would be at 42.4 hours, and one and one-half standard
deviations would be at 41.8 hours. None of these locations corresponds
exactly to the needed 42 hours. We need more power than we have in the chart to find
the most accurate answer. Calculator to the rescue!
Solution: Graph the normal curve. We see from the location of 42 on the graph that the
answer is going to be quite small.
Now, determine the probability of a value falling to the right of 42 hours (between
42 hours and infinity). Answer: 4.779%
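The tail probability can also be computed in a few lines of Python. The problem statement is not shown in the text, so the mean of 40 hours and standard deviation of 1.2 hours used below are inferred from the positions quoted above (41.2 hours at one standard deviation and 42.4 hours at two) and should be treated as assumptions. The function name is illustrative.

```python
import math

def normal_tail(x, mean, sd):
    """P(X > x) for a normal distribution, via the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Inferred from the text, not stated in it: mean 40 hours, s.d. 1.2 hours.
mean_hours = 40.0
sd_hours = 1.2

p = normal_tail(42.0, mean_hours, sd_hours)
print(f"{100 * p:.3f}%")
```

With these inferred parameters the result should agree closely with the calculator answer quoted above.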