What Is A Probability Distribution
• A probability distribution is a statistical function that describes all the possible values and
likelihoods that a random variable can take within a given range.
• A probability distribution depicts the expected outcomes of possible values for a given data
generating process.
• Probability distributions come in many shapes with different characteristics, as defined by the
mean, standard deviation, skewness, and kurtosis.
• Poisson distribution
• Normal distribution
• Binomial distribution
Binomial Distribution
• Describes discrete data, i.e. situations where each trial of an experiment can have only two results.
• Each observation falls into one of two categories, called success or failure.
• The probability of success (p) is the same for every observation, and the observations are independent of one another.
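As a small illustration of the bullets above, the probability of exactly k successes in n observations can be computed as C(n, k) · p^k · (1 − p)^(n − k). The sketch below assumes a hypothetical experiment of 10 observations with p = 0.5; the function name binomial_pmf is chosen here for illustration only.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 observations, each a success with probability 0.5.
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

prob_5 = pmf[5]                                  # C(10, 5) / 2**10 = 252 / 1024
total = sum(pmf)                                 # probabilities sum to 1
mean = sum(k * pk for k, pk in enumerate(pmf))   # equals n * p

print(f"P(X = 5) = {prob_5:.4f}, total = {total:.4f}, mean = {mean:.2f}")
```

The mean of the distribution comes out as n · p, which matches the intuition that half of 10 fair trials succeed on average.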
Poisson Distribution
• Describes discrete data, i.e. situations where the random variable takes non-negative integer values, such as the number of events occurring in a fixed interval.
• E.g. the number of patients arriving at a physician’s office, or the number of cars arriving at a toll booth.
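A minimal sketch of the Poisson probabilities, P(X = k) = λ^k · e^(−λ) / k!, for the patient-arrival example. The average rate of 4 arrivals per hour is an assumed value for illustration, not a figure from these notes.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), where lam is the average count per interval."""
    return lam**k * exp(-lam) / factorial(k)

# Assumed rate: on average 4 patients arrive per hour (hypothetical value).
lam = 4.0

p_zero = poisson_pmf(0, lam)                                 # no arrivals in an hour
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))     # 0, 1 or 2 arrivals

print(f"P(X = 0) = {p_zero:.4f}, P(X <= 2) = {p_at_most_2:.4f}")
```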
Normal Distribution
• Normal distribution, also known as the Gaussian distribution, is a probability distribution that is
symmetric about the mean, showing that data near the mean are more frequent in occurrence
than data far from the mean. In graph form, normal distribution will appear as a bell curve.
• The mean is at the middle and divides the area under the curve into two equal halves.
• It is completely determined by its mean μ and standard deviation σ (or variance σ²).
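The properties above can be checked with Python's standard library. This is only a sketch: the values μ = 100 and σ = 15 are assumed for illustration.

```python
from statistics import NormalDist

# Assumed parameters for illustration: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

below_mean = dist.cdf(100)                   # half of the area lies below the mean
within_1sd = dist.cdf(115) - dist.cdf(85)    # area within one sigma of the mean
within_2sd = dist.cdf(130) - dist.cdf(70)    # area within two sigma of the mean

# Symmetry: the density is the same at equal distances on either side of the mean.
symmetric = abs(dist.pdf(90) - dist.pdf(110)) < 1e-12

print(f"{below_mean:.2f} {within_1sd:.4f} {within_2sd:.4f} {symmetric}")
```

Roughly 68% of the area falls within one standard deviation of the mean and roughly 95% within two, whatever values of μ and σ are chosen.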
• A statistical hypothesis is an assumption about a population which may or may not be true.
• Hypothesis testing is a set of formal procedures used by statisticians to decide whether to reject or
fail to reject statistical hypotheses.
Characteristics of Hypothesis
Types of Hypothesis
• The null hypothesis states that a population parameter (such as the mean, the standard
deviation, and so on) is equal to a hypothesized value. The null hypothesis is often an initial
claim that is based on previous analyses or specialized knowledge.
• The alternative hypothesis states that a population parameter is smaller, greater, or different
than the hypothesized value in the null hypothesis. The alternative hypothesis is what you might
believe to be true or hope to prove true.
• One-sided
• A researcher has exam results for a sample of students who took a training course for a national
exam. The researcher wants to know if trained students score above the national average of
850. A one-sided alternative hypothesis (also known as a directional hypothesis) can be used
because the researcher is specifically hypothesizing that scores for trained students are greater
than the national average. (H0: μ = 850 vs. H1: μ > 850)
• Two-sided
• A researcher has results for a sample of students who took a national exam at a high school. The
researcher wants to know if the scores at that school differ from the national average of 850. A
two-sided alternative hypothesis (also known as a nondirectional hypothesis) is appropriate
because the researcher is interested in determining whether the scores are either less than or
greater than the national average. (H0: μ = 850 vs. H1: μ≠ 850)
• State the hypotheses - This step involves stating both the null and alternative hypotheses. The
hypotheses should be stated in such a way that they are mutually exclusive: if one is true, then
the other must be false.
• Formulate an analysis plan - The analysis plan describes how to use the sample data to
evaluate the null hypothesis. The evaluation process centers on a single test statistic.
• Analyze sample data - Find the value of the test statistic (using properties like mean score,
proportion, t statistic, z-score, etc.) stated in the analysis plan.
• Interpret results - Apply the decisions stated in the analysis plan. If the value of the test statistic
is very unlikely based on the null hypothesis, then reject the null hypothesis.
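The four steps above can be sketched for the national-exam example (H0: μ = 850) as a one-sample z-test. All numbers other than 850 are assumptions made for illustration: a sample of 50 students, a sample mean of 872, a known population standard deviation of 90, and a significance level of 0.05.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean: float, mu0: float, sigma: float, n: int):
    """Return (z, upper-tail p-value, two-sided p-value) for H0: mu = mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # test statistic
    std_normal = NormalDist()                     # standard normal, mean 0, sd 1
    p_upper = 1 - std_normal.cdf(z)               # for H1: mu > mu0
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # for H1: mu != mu0
    return z, p_upper, p_two_sided

# Hypothetical data (not from the notes): n = 50, mean = 872, sigma = 90.
alpha = 0.05
z, p_upper, p_two_sided = one_sample_z_test(sample_mean=872, mu0=850, sigma=90, n=50)

reject_one_sided = p_upper < alpha       # decision for H1: mu > 850
reject_two_sided = p_two_sided < alpha   # decision for H1: mu != 850

print(f"z = {z:.2f}, one-sided p = {p_upper:.3f}, two-sided p = {p_two_sided:.3f}")
```

With these assumed numbers the one-sided test rejects H0 at the 0.05 level while the two-sided test does not, which shows why the choice between the two alternatives should be made in the analysis plan, before looking at the data.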
• Large sample theory: If the sample size n is at least 30 (n ≥ 30), it is known as a large sample.
For large samples, the sampling distributions of statistics are approximately normal (Z test). The study
of the sampling distributions of statistics for large samples is known as large sample theory.
• Small sample theory: If the sample size n is less than 30 (n < 30), it is known as a small sample. For
small samples, the sampling distributions are the t, F and χ² distributions. The study
of sampling distributions for small samples is known as small sample theory.
• A Type I error occurs when the sample data appear to show a treatment effect when, in fact,
there is none.
• In this case the researcher will reject the null hypothesis and falsely conclude that the
treatment has an effect.
• Type I errors are caused by unusual, unrepresentative samples. Just by chance the researcher
selects an extreme sample with the result that the sample falls in the critical region even though
the treatment has no effect.
• A Type II error occurs when the sample does not appear to have been affected by the treatment
when, in fact, the treatment does have an effect.
• In this case, the researcher will fail to reject the null hypothesis and falsely conclude that the
treatment does not have an effect.
• Type II errors are commonly the result of a very small treatment effect. Although the treatment
does have an effect, it is not large enough to show up in the research study.
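Both kinds of error can be seen in a small simulation. The sketch below is an illustration under assumed settings, not a procedure from these notes: samples of size 25 are drawn from a normal population with known standard deviation 1, H0 states that the mean is 0, and a one-sided z-test at α = 0.05 is applied. The "very small treatment effect" is assumed to be a true mean of 0.2.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(true_mean: float, n: int = 25, trials: int = 2000,
                   alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of simulated samples in which H0: mean = 0 is rejected."""
    rng = random.Random(seed)                       # fixed seed for repeatability
    critical_z = NormalDist().inv_cdf(1 - alpha)    # edge of the critical region
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / sqrt(n))         # z statistic with sigma = 1
        if z > critical_z:
            rejections += 1
    return rejections / trials

# No treatment effect: every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)

# Small real effect (assumed 0.2): every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.2, seed=1)

print(f"Type I rate ~ {type_1_rate:.3f}, Type II rate ~ {type_2_rate:.3f}")
```

The Type I rate stays close to the chosen α, while the Type II rate is large because the assumed effect is small relative to the sampling variability.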
What are the sampling methods or Sampling Techniques?
In statistics, a sampling method or sampling technique is the process of studying a
population by gathering information from a subset of it (a sample) and analyzing that data. It is the
basis for collecting data when the population is too large to study in full.
Probability Sampling
Non-probability Sampling
Systematic Sampling
In the systematic sampling method, the items are selected from the target population by choosing
a random starting point and then selecting the other members at a fixed sampling interval. The interval is
calculated by dividing the total population size by the desired sample size.
Example:
Suppose the names of 300 students of a school are sorted in reverse alphabetical order. To
select a systematic sample with a sampling interval of 15, we first randomly
select a starting number, say 5. From number 5 onwards, we select every 15th person from the
sorted list (5, 20, 35, and so on). We end up with a sample of 20 students.
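The example above can be sketched in a few lines. The student labels are placeholders, and the starting number is fixed at 5 to follow the example; in practice it would be drawn at random from 1 to the interval.

```python
def systematic_sample(items, interval, start):
    """Take the item at 1-based position `start`, then every `interval`-th item after it."""
    return items[start - 1 :: interval]

# 300 placeholder names standing in for the sorted list of students.
students = [f"student_{i}" for i in range(1, 301)]

sample = systematic_sample(students, interval=15, start=5)

print(len(sample), sample[:3], sample[-1])
```

Starting at position 5 with an interval of 15 picks positions 5, 20, 35, and so on up to 290.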
Stratified Sampling
In a stratified sampling method, the total population is divided into smaller groups (strata) to complete the
sampling process. The strata are formed based on a few shared characteristics in the population. After
separating the population into strata, the statistician randomly selects a sample from each one.
For example, there are three bags (A, B and C), each with different balls. Bag A has 50 balls, bag B
has 100 balls, and bag C has 200 balls. We have to choose a sample of balls from each bag
proportionally: for instance 5 balls from bag A, 10 balls from bag B and 20 balls from bag C (10% of each bag).
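A minimal sketch of the bag example with proportional allocation: the same fraction (here 10%) is drawn at random from every stratum. The ball labels and the function name are made up for illustration.

```python
import random

def proportional_stratified_sample(strata, fraction, seed=0):
    """Randomly draw the same fraction of members from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return {
        name: rng.sample(members, round(len(members) * fraction))
        for name, members in strata.items()
    }

# Three bags (strata) of 50, 100 and 200 balls with placeholder labels.
bags = {
    "A": [f"A{i}" for i in range(50)],
    "B": [f"B{i}" for i in range(100)],
    "C": [f"C{i}" for i in range(200)],
}

sample = proportional_stratified_sample(bags, fraction=0.10)
sizes = {name: len(chosen) for name, chosen in sample.items()}

print(sizes)  # {'A': 5, 'B': 10, 'C': 20}
```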
Clustered Sampling
In the clustered sampling method, the population is divided into clusters, or groups of people.
The clusters share similar significant characteristics, and each has an equal chance of being part
of the sample. This method applies simple random sampling to the clusters rather than to individuals.
Example:
An educational institution has ten branches across the country with almost the same number of students. If
we want to collect some data regarding facilities and other things, we can’t travel to every branch to
collect the required data. Hence, we can use random sampling to select three or four branches as
clusters.
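The branch example can be sketched as follows. The branch names and the figure of 100 students per branch are assumptions for illustration; the point is that whole branches are drawn at random, and everyone in a chosen branch is surveyed.

```python
import random

# Ten hypothetical branches, each assumed to have 100 students.
branches = {
    f"branch_{i}": [f"b{i}_student_{j}" for j in range(1, 101)]
    for i in range(1, 11)
}

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Simple random sampling is applied to the clusters, not to individual students.
chosen_branches = rng.sample(sorted(branches), k=3)

# Every student in a chosen branch becomes part of the sample.
surveyed = [student for name in chosen_branches for student in branches[name]]

print(chosen_branches, len(surveyed))
```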
All these four methods can be understood in a better manner with the help of the figure given below.
The figure contains various examples of how samples will be taken from the population using
different techniques.
Convenience Sampling
In a convenience sampling method, the samples are selected from the population directly because
they are conveniently available to the researcher. The samples are easy to select, but the
researcher does not try to choose a sample that represents the entire population.
Example:
In researching customer support services in a particular region, we ask a few of our customers to
complete a survey on the products after the purchase. This is a convenient way to collect data. Still,
as we only surveyed customers buying the same product, the sample is not
representative of all the customers in that area.
Consecutive Sampling
Consecutive sampling is similar to convenience sampling with a slight variation. The researcher
picks a single person or a group of people for sampling. The researcher then studies that group for a period
of time, analyzes the results, and moves on to another group if needed.
Quota Sampling
In the quota sampling method, the researcher forms a sample of individuals chosen to
represent the population based on specific traits or qualities. The researcher chooses the sample
subsets that yield a useful collection of data that can be generalized to the entire population.
Snowball Sampling
Snowball sampling is also known as a chain-referral sampling technique. In this method, the
members of the target population have traits that are difficult to find. So, each identified member of the population is asked to
find other sampling units. Those sampling units also belong to the same target population.
Probability sampling:
• These are used for research which is conclusive.
• These involve a long time to get the data.
• There is an underlying hypothesis in probability sampling before the study starts. Also, the objective
of this method is to validate the defined hypothesis.
Non-probability sampling:
• These are used for research which is exploratory.
• These are easy ways to collect the data quickly.
• The hypothesis is derived later by conducting the research study in the case of non-probability
sampling.