RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
STATISTICS is the science of conducting studies to collect, organize,
summarize, analyze, and draw conclusions from data.
BASIC TERMS IN STATISTICS
DATA - the values that variables can assume.
VARIABLE - a characteristic or attribute that can assume different values.
POPULATION - the set of all possible values of a variable.
SAMPLE - a subgroup of a population.
A RANDOM VARIABLE is a variable whose values are determined by chance. More
formally, it is a function that assigns a value to each of an experiment's
outcomes. Random variables are often designated by letters and can be
classified as discrete, which are variables that have specific, countable
values, or continuous, which are variables that can have any value within a
continuous range.
In statistics and probability, random variables are used to quantify outcomes of
a random occurrence, and therefore can take on many values. They are required to
be measurable and are typically real numbers.
QUALITATIVE VARIABLES - Words or codes that represent a class or
category.
- Express categorical attributes such as gender, religion, marital status, and
highest educational attainment.
QUANTITATIVE VARIABLES - Numbers that represent an amount or a
count.
- Numerical data whose sizes are meaningful and that answer questions such as
“how many” or “how much”, like height, weight, household size, and number of
registered cars.
DISCRETE - Data that can be counted, e.g., number of days, number of siblings,
number of students.
CONTINUOUS - Data that can be measured, e.g., weight, height, temperature, speed.
DISCRETE PROBABILITY DISTRIBUTION
A discrete probability distribution consists of the values a random variable
can assume and the corresponding probabilities of the values. The probabilities
are determined theoretically or by observation.
It is also known as a probability mass function.
PROPERTIES OF PROBABILITY DISTRIBUTIONS OF DISCRETE
RANDOM VARIABLES
1. The probability of each value of the random variable must be between 0 and
1, inclusive. In symbols, we write it as 0 ≤ P(X) ≤ 1.
2. The sum of the probabilities of all values of the random variable must be
equal to 1. In symbols, we write it as ΣP(X) = 1.
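The two properties above can be checked mechanically. The sketch below is only an illustration: the function name `is_valid_pmf` and the two example tables are assumptions, not part of these notes. Exact fractions are used so that the sum is compared with 1 without rounding error.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    probs = list(pmf.values())
    each_in_range = all(0 <= p <= 1 for p in probs)   # property 1
    sums_to_one = sum(probs) == 1                     # property 2
    return each_in_range and sums_to_one

# Assumed example: X = number of heads in two tosses of a fair coin.
two_coin_pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
# Assumed counterexample: the probabilities sum to 5/4.
not_a_pmf = {0: Fraction(1, 2), 1: Fraction(3, 4)}

print(is_valid_pmf(two_coin_pmf))  # True
print(is_valid_pmf(not_a_pmf))     # False
```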
CONSTRUCTING PROBABILITY MASS FUNCTION OF A DISCRETE
RANDOM VARIABLE AND ITS CORRESPONDING HISTOGRAM
HISTOGRAM
Karl Pearson introduced the histogram in 1891. He used it to show time
concepts of various reigns of Prime Ministers.
The histogram is a graph that displays the data by using contiguous vertical bars
(unless the frequency of a class is 0) of various heights to represent the
frequencies of the classes.
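As a rough illustration of constructing a probability mass function and its histogram, the sketch below uses an assumed experiment (three tosses of a fair coin, with X as the number of heads). The bars are printed sideways as text rather than drawn as vertical bars.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for three tosses of a fair coin (assumed example).
outcomes = list(product("HT", repeat=3))

# X = number of heads in each outcome; count how often each value occurs.
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = number of outcomes giving x / total number of outcomes.
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    bar = "#" * counts[x]   # bar length is proportional to P(X = x)
    print(f"X = {x}  P(X) = {p}  {bar}")
```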
MEAN and VARIANCE OF A DISCRETE RANDOM
VARIABLE
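These notes give no formulas under this heading. The sketch below assumes the usual textbook definitions, mean μ = ΣX·P(X) and variance σ² = Σ(X − μ)²·P(X). The function name and the example distribution are illustrative only.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """Return (mean, variance) of a discrete random variable {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())                    # sum of X * P(X)
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # sum of (X - mean)^2 * P(X)
    return mean, variance

# Assumed example: X = number of heads in three tosses of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
mu, var = mean_and_variance(pmf)
print(mu, var)  # 3/2 3/4
```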
NORMAL DISTRIBUTION
This distribution is also known as a bell curve or a Gaussian
distribution, named for the German mathematician Carl
Friedrich Gauss (1777–1855), who derived its equation.
A normal distribution curve can be used to study many variables
that are not perfectly normally distributed but are nevertheless
approximately normal.
A normal distribution is a continuous, symmetric, bell-shaped
distribution of a variable.
The shape and position of a normal distribution curve depend on
two parameters, the mean and the standard deviation. Each
normally distributed variable has its own normal distribution curve,
which depends on the values of the variable’s mean and standard
deviation.
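To see how the two parameters control the curve, the sketch below evaluates the standard normal density formula, which is not written out in these notes and is assumed here: y = exp(−(x − μ)² / (2σ²)) / (σ√(2π)). The function name `normal_pdf` is illustrative.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of the normal curve at x for the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Changing the mean shifts the curve; changing the sd flattens or sharpens it.
peak_narrow = normal_pdf(0, mean=0, sd=1)    # peak of the curve with sd = 1
peak_wide = normal_pdf(0, mean=0, sd=2)      # a larger sd gives a lower peak
peak_shifted = normal_pdf(5, mean=5, sd=1)   # same shape, centered at 5
print(round(peak_narrow, 4), round(peak_wide, 4), round(peak_shifted, 4))
```

Doubling the standard deviation halves the height of the peak, while moving the mean only slides the curve along the x axis.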
The discovery of the equation for a normal distribution can be
traced to three mathematicians. In 1733, the French mathematician
Abraham DeMoivre derived an equation for a normal distribution
based on the random variation of the number of heads appearing
when a large number of coins were tossed. Not realizing any
connection with the naturally occurring variables, he showed this
formula to only a few friends.
About 100 years later, two mathematicians, Pierre Laplace in
France and Carl Gauss in Germany, derived the equation of the
normal curve independently and without any knowledge of
DeMoivre’s work. In 1924, Karl Pearson found that DeMoivre
had discovered the formula before Laplace or Gauss.
SYMMETRIC DISTRIBUTION
When the data values are evenly distributed about the mean, a
distribution is said to be a symmetric distribution.
NEGATIVELY-SKEWED DISTRIBUTION
When the majority of the data values fall to the right of the mean, the
distribution is said to be a negatively or left-skewed distribution. The
mean is to the left of the median, and the mean and the median are to
the left of the mode.
POSITIVELY-SKEWED DISTRIBUTION
When the majority of the data values fall to the left of the mean, a
distribution is said to be a positively or right-skewed distribution. The
mean falls to the right of the median, and both the mean and the
median fall to the right of the mode.
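The ordering of the mean, median, and mode in skewed distributions can be checked on a small data set. The values below are made up for illustration. The left-skewed set is simply a mirror image of the right-skewed one.

```python
import statistics

# Hypothetical right-skewed data: most values are small, a few are large.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image of the same data, which is left-skewed.
left_skewed = [15 - v for v in right_skewed]

r_mean = statistics.mean(right_skewed)
r_median = statistics.median(right_skewed)
r_mode = statistics.mode(right_skewed)

l_mean = statistics.mean(left_skewed)
l_median = statistics.median(left_skewed)
l_mode = statistics.mode(left_skewed)

print("right-skewed: mode, median, mean =", r_mode, r_median, r_mean)
print("left-skewed:  mean, median, mode =", l_mean, l_median, l_mode)
```

In the right-skewed set the mode is smallest and the mean is largest. In the left-skewed set the order is reversed.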
PROPERTIES OF THE THEORETICAL NORMAL
DISTRIBUTION
1. A normal distribution curve is bell-shaped.
2. The mean, median, and mode are equal and are located at the center
of the distribution.
3. A normal distribution curve is unimodal (i.e., it has only one mode).
4. The curve is symmetric about the mean, which is equivalent to
saying that its shape is the same on both sides of a vertical line
passing through the center.
5. The curve is continuous; that is, there are no gaps or holes. For
each value of X, there is a corresponding value of Y.
6. The curve never touches the x axis. Theoretically, no matter how
far in either direction the curve extends, it never meets the x axis—but
it gets increasingly closer.
7. The total area under a normal distribution curve is equal to 1.00, or
100%. This fact may seem unusual, since the curve never touches the
x axis, but one can prove it mathematically by using calculus.
8. The area under the part of a normal curve that lies within 1 standard
deviation of the mean is approximately 0.68, or 68%; within 2
standard deviations, about 0.95, or 95%; and within 3 standard
deviations, about 0.997, or 99.7%.
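The areas in property 8 can be reproduced numerically. The sketch below relies on a standard identity that is assumed here rather than taken from these notes: the area within k standard deviations of the mean equals erf(k/√2). The function name `area_within` is illustrative.

```python
import math

def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```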
THE STANDARD NORMAL DISTRIBUTION
Applications of the Normal Distribution
SAMPLING DISTRIBUTION
RANDOM SAMPLING
Population – The set of all possible values of a variable.
Sample – It consists of one or more data values drawn from
the population.
Random Sampling - It is a sampling method of choosing
representatives from the population wherein every sample has an
equal chance of being selected. Accurate data can be collected using
random sampling techniques.
Probability Sampling – The sampling techniques that involve
random selection.
Non-Probability Sampling – The sampling techniques that do not
involve random selection of data.
Different Types of Random Sampling
Simple random sampling – the most basic random sampling technique,
wherein each element in the population has an equal
probability of being selected.
Systematic random sampling – This can be done by listing all the
elements in the population and selecting every kth element in your
population list.
Stratified random sampling – a random sampling technique wherein the
population is divided into different strata or divisions, and a random
sample is then drawn from each stratum.
Cluster sampling – a random sampling technique wherein the population is
divided into clusters or groups and then the clusters are randomly
selected.
The following techniques are non-probability sampling techniques:
Convenience sampling – the researcher gathers data from
nearby sources of information, exerting minimal effort.
Snowball sampling, or chain-referral sampling, is a non-probability
sampling technique in which existing participants refer further
participants. It is used when the samples have traits that are rare to find.
Quota sampling – sample units are picked for convenience, but
certain quotas are given to interviewers.
Volunteer sampling – sample units are volunteers in studies wherein
the measuring process is painful or troublesome to a respondent.
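The four random sampling techniques described above (simple, systematic, stratified, and cluster) can be sketched in a few lines. Everything specific here is an assumption made for illustration: the population of IDs 1 to 100, the sample sizes, the two strata, and the clusters of ten. Drawing a random sample within each stratum is the usual way stratified sampling is completed.

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))   # hypothetical population: IDs 1..100

# Simple random sampling: every element has the same chance of selection.
simple = rng.sample(population, 10)

# Systematic sampling: random start, then every k-th element of the list.
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: split into strata, then sample randomly within each.
strata = {"low": population[:50], "high": population[50:]}
stratified = [x for group in strata.values() for x in rng.sample(group, 5)]

# Cluster sampling: split into clusters, then randomly pick whole clusters.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [x for c in rng.sample(clusters, 2) for x in c]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```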
PARAMETER AND STATISTIC
Parameter
The measurement or quantity that describes the
population.
Statistic
The measurement or quantity that describes the
sample.
SAMPLING DISTRIBUTION OF SAMPLE MEANS
A sampling distribution of sample means is a distribution using the
means computed from all possible random samples of a specific size taken from
a population. Because of sampling error, the means of the samples will, for the
most part, differ somewhat from the mean of the population.
Sampling error is the difference between the sample measure and the
corresponding population measure due to the fact that the sample is not a
perfect representation of the population.
PROPERTIES OF THE DISTRIBUTION OF SAMPLE MEANS
The mean of the sample means will be the same as the population mean.
The standard deviation of the sample means will be smaller than the standard
deviation of the population, and it will be equal to the population
standard deviation divided by the square root of the sample size.
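Both properties can be verified on a tiny population by listing every possible sample. The sketch below assumes a made-up population of four values and samples of size 2 drawn with replacement, which is the setting in which the standard deviation of the sample means equals the population standard deviation divided by the square root of n exactly.

```python
import math
from itertools import product
from statistics import mean, pstdev

population = [2, 4, 6, 8]   # hypothetical population
n = 2                       # sample size

# All possible samples of size n drawn WITH replacement (order matters).
sample_means = [mean(s) for s in product(population, repeat=n)]

mu = mean(population)       # population mean
sigma = pstdev(population)  # population standard deviation

print(mean(sample_means), mu)                       # both equal 5
print(pstdev(sample_means), sigma / math.sqrt(n))   # both about 1.58
```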
Steps in Constructing the Sampling Distribution of the Means
1. Determine the number of all possible random samples that can be
drawn from the given population by using the formula NCn, where N is the
population size and n is the sample size.
2. List all the possible samples and compute the mean of each sample.
3. Construct the sampling distribution.
4. Compute the mean of the sampling distribution of the sample means.
5. Compute the variance of the sampling distribution of the sample means.
6. Construct a histogram.
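The procedure above can be sketched as follows for a made-up population of five values and samples of size 2. The population and all variable names are assumptions for illustration, and the histogram is printed sideways as text.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

population = [1, 2, 3, 4, 5]   # hypothetical population, N = 5
n = 2                          # sample size

# Number of possible samples, NCn.
num_samples = comb(len(population), n)

# List every sample and compute its mean.
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each distinct mean with its probability.
counts = Counter(means)
dist = {m: Fraction(counts[m], num_samples) for m in sorted(counts)}

# Mean and variance of the sampling distribution of the sample means.
mu_xbar = sum(m * p for m, p in dist.items())
var_xbar = sum((m - mu_xbar) ** 2 * p for m, p in dist.items())
print(num_samples, mu_xbar, var_xbar)  # 10 3 3/4

# Histogram: one '#' per sample that produced each mean.
for m, p in dist.items():
    bar = "#" * counts[m]
    print(f"{float(m):>4}  {str(p):>4}  {bar}")
```

Because NCn counts samples drawn without replacement, the variance here is 3/4 rather than σ²/n = 1. For a finite population sampled without replacement, σ²/n is multiplied by the correction factor (N − n)/(N − 1), which is 3/4 in this example.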
CENTRAL LIMIT THEOREM
Population - The set that contains all the elements, individuals, or measurements from
your experimenting space.
Distribution - It describes the range of the data (population or sample) and how the data
are spread in that range.
Mean - The average value of all data from your population or sample.
Sample - A randomly selected subset of the population, where the sample size is
denoted by n.
Standard Deviation - It describes how spread out the population is.
Normal Distribution - The population is spread symmetrically around the mean value, with
the spread measured by the standard deviation σ.
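PROPERTIES OF THE DISTRIBUTION OF SAMPLE MEANS
The central limit theorem states that as the sample size n increases, the shape of the
distribution of the sample means taken with replacement from a population with mean μ and
standard deviation σ approaches a normal distribution.
The mean of the sample means will be the same as the population mean.
The standard deviation of the sample means will be smaller than the standard deviation of the
population, and it will be equal to the population standard deviation divided by the square
root of the sample size.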
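A small simulation can make these properties concrete. The sketch below is illustrative only: the skewed (exponential) population, the sample size of 40, the 2,000 repetitions, and the fixed seed are all assumptions. It compares the mean and standard deviation of the simulated sample means with the values the properties predict, and also reports the share of sample means lying within one standard deviation of their center, which should be near 0.68 if their distribution is roughly normal.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is repeatable

# Hypothetical skewed population: exponential values with mean near 1.
population = [rng.expovariate(1.0) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 40   # sample size
sample_means = [
    statistics.fmean(rng.choices(population, k=n))   # sample with replacement
    for _ in range(2_000)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

# Share of sample means within one standard deviation of their center.
within_one = sum(abs(m - center) <= spread for m in sample_means) / len(sample_means)

print("population mean vs mean of sample means:", round(mu, 3), round(center, 3))
print("sigma / sqrt(n) vs sd of sample means:  ", round(sigma / math.sqrt(n), 3), round(spread, 3))
print("share within one sd:", round(within_one, 3))
```

Although the population itself is strongly right-skewed, the sample means cluster symmetrically around the population mean, and their spread is close to σ/√n.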