
Unit - 3

The document provides a comprehensive overview of the Normal Distribution, including its definition, probability density function, and applications in statistics. It explains how to calculate the probability density function using mean and standard deviation, and discusses the normal approximation to the binomial distribution. Additionally, it covers random sampling methods and their types, emphasizing the importance of probability sampling in representing a population.

Normal Distribution Definition :

The Normal Distribution is defined by its probability density function for a continuous random variable. Let f(x) be the probability density function and X the random variable. Integrating f(x) over an interval gives the probability that X falls in that interval; in particular, f(x) dx is approximately the probability that X takes a value between x and x + dx. The density satisfies:

$f(x) \ge 0 \quad \forall x \in (-\infty, +\infty)$

and

$\int_{-\infty}^{+\infty} f(x)\,dx = 1$

Normal Distribution Formula :


The probability density function of the normal (Gaussian) distribution is given by:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Where,

 x is the variable
 μ is the mean
 σ is the standard deviation

Normal Distribution Curve


Random variables that follow the normal distribution can take any value in a given range. For example, consider the heights of the students in a school. In principle the variable can take any value, but physically it is bounded, say between 0 and 6 ft.

The normal distribution itself has no such bound. Its range extends from –∞ to +∞, and the density is still a smooth curve. Such random variables are called continuous variables, and the normal distribution gives the probability that the value lies in a particular range for a given experiment. The density at any point can be computed from just the mean and the standard deviation.

[Figure: normal distribution curve]

Normal Distribution Table :


The table here shows the area from 0 to Z-value.

Z-Value 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359

0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753

0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141

0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517

0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879

0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224

0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549

0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852

0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133

0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389

1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621

1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015

1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177

1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319

1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441

1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545

1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633

1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706

1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767

2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817

2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857

2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890

2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916

2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952

2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964

2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974

2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981

2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986

3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
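
The table entries can be reproduced numerically. The following is a minimal sketch, assuming the table holds the standard normal area between 0 and z; the function names `area_zero_to_z` and `table_row` are illustrative and do not come from the notes.

```python
import math


def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z."""
    # Phi(z) - 0.5 equals 0.5 * erf(z / sqrt(2)).
    return 0.5 * math.erf(z / math.sqrt(2.0))


def table_row(z_base):
    """Areas for z_base + 0.00, 0.01, ..., 0.09, rounded to 4 places."""
    return [round(area_zero_to_z(z_base + k / 100.0), 4) for k in range(10)]


if __name__ == "__main__":
    # Print the row of the table that starts at z = 1.0.
    print(table_row(1.0))
```

Reading the table at row 1.9 and column 0.06 corresponds to calling `area_zero_to_z(1.96)`.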

Normal Distribution Problems and Solutions :


Question 1: Calculate the probability density function of normal
distribution using the following data. x = 3, μ = 4 and σ = 2.

Solution: Given, variable, x = 3

Mean = 4 and

Standard deviation = 2

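By the formula of the probability density of normal distribution, we can write:

$f(3; 4, 2) = \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(3-4)^2}{2 \cdot 2^2}} = \frac{1}{2\sqrt{2\pi}}\, e^{-0.125}$

Hence, f(3; 4, 2) ≈ 0.1760.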


Question 2: If the value of random variable is 2, mean is 5 and the
standard deviation is 4, then find the probability density function of the
gaussian distribution.

Solution: Given,

Variable, x = 2

Mean = 5 and

Standard deviation = 4

By the formula of the probability density of normal distribution, we can write:

$f(2; 5, 4) = \frac{1}{4\sqrt{2\pi}}\, e^{-\frac{(2-5)^2}{2 \cdot 4^2}} = \frac{1}{4\sqrt{2\pi}}\, e^{-9/32}$

f(2; 5, 4) ≈ 0.0753
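
The density formula can also be evaluated in code. This is a minimal Python sketch, assuming the usual normal density with mean μ and standard deviation σ; the name `normal_pdf` is illustrative. It is evaluated at the inputs given in Questions 1 and 2.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std dev sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # standardised distance from the mean
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    print(round(normal_pdf(3, 4, 2), 4))  # inputs of Question 1
    print(round(normal_pdf(2, 5, 4), 4))  # inputs of Question 2
```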

The normal distribution has two parameters: the mean and the standard deviation. The mean is the location parameter (where the curve is centred) and the standard deviation is the scale parameter (how spread out the curve is).

Applications :
The normal distributions are closely associated with many things such as:

 Marks scored on the test


 Heights of different persons
 Size of objects produced by the machine
 Blood pressure and so on.

Normal Approximation to Binomial: Definition & Example

If X is a random variable that follows a binomial distribution with n trials and p probability of success on a given trial, then we can calculate the mean (μ) and standard deviation (σ) of X using the following formulas:

 μ = np
 σ = √(np(1-p))

It turns out that if n is sufficiently large then we can use the normal distribution to approximate the probabilities related to the binomial distribution. This is known as the normal approximation to the binomial.

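Because the binomial distribution is discrete and the normal distribution is continuous, we apply a continuity correction: we add or subtract 0.5 from the discrete value before using the normal distribution. The following table shows when you should add or subtract 0.5, based on the type of probability you’re trying to find:

Using Binomial Distribution    Using Normal Distribution with Continuity Correction
X = 45                         44.5 < X < 45.5
X ≤ 45                         X < 45.5
X < 45                         X < 44.5
X ≥ 45                         X > 44.5
X > 45                         X > 45.5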

Example: Normal Approximation to the Binomial


Suppose we want to know the probability that a coin lands on
heads less than or equal to 43 times during 100 flips.
In this situation we have the following values:

 n (number of trials) = 100


 X (number of successes) = 43
 p (probability of success on a given trial) = 0.50

To calculate the probability of the coin landing on heads less than or equal to 43 times, we can use the following steps:

Step 1: Verify that the sample size is large enough to use the
normal approximation.

First, we must verify that the following criteria are met:

 np ≥ 5
 n(1-p) ≥ 5

In this case, we have:

 np = 100*0.5 = 50
 n(1-p) = 100*(1 – 0.5) = 100*0.5 = 50

Both numbers are greater than 5, so we’re safe to use the normal approximation.

Step 2: Determine the continuity correction to apply.

Referring to the table above, we see that we should add 0.5 when we’re working with a probability in the form of X ≤ 43. Thus, we will be finding P(X < 43.5).

Step 3: Find the mean (μ) and standard deviation (σ) of the
binomial distribution.

μ = n*p = 100*0.5 = 50

σ = √(n*p*(1-p)) = √(100*.5*(1-.5)) = √25 = 5

Step 4: Find the z-score using the mean and standard deviation found in the previous step.

z = (x – μ) / σ = (43.5 – 50) / 5 = -6.5 / 5 = -1.3.


Step 5: Find the probability associated with the z-score.

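Using a normal CDF calculator, or the table above, we find that the area under the standard normal curve to the left of -1.3 is .0968. (From the table, the area from 0 to 1.30 is 0.4032, so by symmetry the left tail is 0.5 – 0.4032 = 0.0968.)

Thus, the probability that a coin lands on heads less than or equal to 43 times during 100 flips is approximately .0968.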
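
The five steps can be sketched in Python and compared with the exact binomial sum. This is an illustrative sketch under the assumptions of the example (n = 100, p = 0.5, X ≤ 43); the function names are hypothetical and not part of the notes.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution with mean mu and std dev sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def binom_cdf_exact(k, n, p):
    """Exact P(X <= k) for a binomial(n, p) variable, summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binom_cdf_normal_approx(k, n, p):
    """P(X <= k) via the normal approximation with continuity correction."""
    mu = n * p                           # Step 3: mean
    sigma = math.sqrt(n * p * (1 - p))   # Step 3: standard deviation
    return normal_cdf(k + 0.5, mu, sigma)  # Steps 2, 4, 5: add 0.5, then CDF


if __name__ == "__main__":
    print(binom_cdf_normal_approx(43, 100, 0.5))
    print(binom_cdf_exact(43, 100, 0.5))
```

Printing both values side by side shows how close the approximation is for this sample size.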

Random Sampling :
Random sampling is a method of choosing a sample of observations from a population in order to make inferences about the population. It is also called probability sampling.

Types of Random Sampling

The random sampling method relies on some form of random choice. In this method, every eligible individual in the population has a chance of being chosen for the sample. It can be a time-consuming and expensive method. The advantage of using probability sampling is that it helps ensure the sample represents the population. There are four major types of this sampling method:

1. Simple Random Sampling


2. Systematic Sampling
3. Stratified Sampling
4. Clustered Sampling

Now let us discuss its types one by one here.

Simple random sampling


In this sampling method, each item in the population has an equal chance of being selected in the sample (for example, each member of a group is assigned a unique number, and numbers are then drawn at random). Since the selection of an item depends entirely on chance, this method is called the “Method of Chance Selection”. When the sample size is large and the items are selected randomly, the sample tends to represent the population well, so it is also known as “Representative Sampling”.

Systematic Random Sampling


In this method, the items are chosen from the target population by choosing a random starting point and then picking the other members at a fixed sampling interval. The interval is equal to the ratio of the total population size to the required sample size.
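
Simple random and systematic sampling can be sketched in a few lines of Python. This is a minimal sketch, assuming the population is a list and the interval is the integer part of N/n; the function names and the explicit `rng` argument are illustrative choices.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n distinct items, each item equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random start, then every k-th item, where k = N // n."""
    k = len(population) // n   # sampling interval (assumes 0 < n <= N)
    start = rng.randrange(k)   # random starting point in the first interval
    return [population[start + i * k] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the run repeatable
    people = list(range(1000))
    print(simple_random_sample(people, 5, rng))
    print(systematic_sample(people, 5, rng))
```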

Stratified Random Sampling


In this sampling method, the population is divided into subgroups, and a simple random sample is drawn from each subgroup to complete the sampling process (for example, the girls in a class of 50 students form one subgroup). These subgroups are called strata. The strata are formed based on a few shared features in the population. After dividing the population into strata, the researcher randomly selects the sample from each one.

Clustered Sampling
Cluster sampling is similar to stratified sampling, except that the population is divided into a large number of subgroups (for example, hundreds of thousands of strata or subgroups). Some of these subgroups are then chosen at random, and simple random samples are gathered within the chosen subgroups. These subgroups are known as clusters. Cluster sampling is mainly used to reduce the cost of data collection.
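
The difference between stratified and cluster sampling is easier to see in code. The sketch below is illustrative only: it assumes proportional allocation for the strata and a fixed number of items per chosen cluster, and all names (`stratified_sample`, `cluster_sample`, `stratum_of`) are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, fraction, rng):
    """Sample the same fraction from every stratum (at least one item each)."""
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Choose some clusters at random, then sample only inside those clusters."""
    chosen = rng.sample(list(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        sample.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return sample


if __name__ == "__main__":
    rng = random.Random(1)
    students = [("girl", i) for i in range(20)] + [("boy", i) for i in range(30)]
    print(stratified_sample(students, lambda s: s[0], 0.2, rng))
    schools = {c: list(range(c * 10, c * 10 + 10)) for c in range(8)}
    print(cluster_sample(schools, 3, 4, rng))
```

Stratified sampling touches every subgroup, while cluster sampling visits only the chosen subgroups, which is where the cost saving comes from.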

Random Sampling Formula :


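If P is the probability, n is the sample size, and N is the population size, then:

 The chance that a given individual is selected in a sample drawn without replacement is given by:

$P = 1 - \frac{N-1}{N} \cdot \frac{N-2}{N-1} \cdots \frac{N-n}{N-(n-1)}$

Cancelling the common factors gives $P = 1 - \frac{N-n}{N}$, so

$P = \frac{n}{N}$

 The chance that a given individual is selected at least once when the n draws are made with replacement is given by:

$P = 1 - \left(1 - \frac{1}{N}\right)^{n}$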

Advantages of Simple Random Sampling :


Some of the advantages of random sampling are as follows:

 It helps to reduce the bias involved in the sample, compared to other methods
of sampling and it is considered as a fair method of sampling.
 This method does not require any technical knowledge, as it is a fundamental
method of collecting the data.
 The data collected through this method is well informed.
 As the population size is large in the simple random sampling method,
researchers can create the sample size that they want.
 It is easy to pick the smaller sample size from the existing larger population.

Random Sampling Example


Suppose a firm has 1000 employees, of whom 100 have to be selected for onsite work. All their names are put in a basket and 100 names are pulled out. Each employee has an equal chance of being selected, so we can easily calculate the probability (P) of a given employee being selected, since we know the sample size (n) and the population size (N).

Therefore, the chance that a given employee is selected (drawing without replacement) is:

P = n/N = 100/1000 = 10%

And the chance that a given employee is selected at least once, if each name is returned to the basket after it is drawn, is:

$P = 1 - (1 - 1/N)^{n}$

$P = 1 - (999/1000)^{100}$

P = 0.0952

P ≈ 9.5%
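
Both selection probabilities can be checked numerically. This is a small sketch, assuming n draws from a population of N with equal chances; the function names are illustrative. The first function multiplies out the telescoping product explicitly with exact fractions, so the cancellation to n/N can be observed directly.

```python
from fractions import Fraction


def p_selected_without_replacement(N, n):
    """Chance a given individual is in a sample of n drawn without replacement."""
    p_never = Fraction(1)
    for i in range(n):
        # Probability of not being picked on draw i + 1, given not picked so far.
        p_never *= Fraction(N - 1 - i, N - i)
    return 1 - p_never


def p_selected_with_replacement(N, n):
    """Chance a given individual is drawn at least once in n draws with replacement."""
    return 1 - (1 - 1 / N) ** n


if __name__ == "__main__":
    print(p_selected_without_replacement(1000, 100))
    print(p_selected_with_replacement(1000, 100))
```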

Sampling Distribution :

A sampling distribution is the probability distribution of a statistic (such as the mean) computed from samples drawn from a large population. Its primary purpose is to relate the results of small samples to a comparatively larger population. Since the population is often too large to analyze in full, you can select a smaller group and repeatedly sample or analyze it. The gathered data, or statistic, is used to calculate the likely occurrence, or probability, of an event.

Factors that influence sampling distribution :

There are three primary factors that influence the variability of a sampling distribution. They are:

 The number observed in a population: The symbol for this variable is "N." It is the measure of observed activity in a given group of data.

 The number observed in the sample: The symbol for this variable is "n." It is the measure of observed activity in a random sample of data that is part of the larger grouping.

 The method of choosing the sample: How you choose the samples can account for variability in some cases.

Types of distributions :
There are three standard types of sampling distributions
in statistics:

1. Sampling distribution of mean

The most common type is the sampling distribution of the mean. It is obtained by calculating the mean of every sample group chosen from the population and plotting these means as data points. The graph shows an approximately normal distribution whose center is the mean of the sampling distribution, which represents the mean of the entire population.

2. Sampling distribution of proportion

This sampling distribution focuses on proportions in a population. You select samples and calculate their proportions. The mean of the sample proportions from the groups represents the proportion of the entire population.

3. T-distribution

A T-distribution is a sampling distribution used when the sample is small or when little is known about the population, such as its standard deviation. It is used to estimate the mean of the population and in related procedures such as confidence intervals, tests of statistical differences and linear regression. The T-distribution uses a t-score to evaluate data for which a normal distribution wouldn't be appropriate.

The formula for t-score is:

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

In the formula, "x̄" is the sample mean, "μ" is the population mean, "s" is the sample standard deviation and "n" is the sample size.
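
A t-score can be computed directly from raw data. This is a minimal sketch, assuming s is the sample standard deviation with an n - 1 denominator; the function name `t_score` and the sample values are made up for illustration.

```python
import math
import statistics


def t_score(sample, mu):
    """t = (sample mean - mu) / (s / sqrt(n)), with s the sample std dev."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # uses the n - 1 denominator
    return (x_bar - mu) / (s / math.sqrt(n))


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
    print(t_score(data, mu=4))
    print("degrees of freedom:", len(data) - 1)
```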

Example of a sampling distribution

Here is an example of a sampling distribution using a fictional scenario with a data set and a graph:

A professor is interested in understanding the sampling distribution of their students' test scores. This professor thinks this may help determine a suitable curve for the previous tests their students completed. The professor recorded test scores from the previous three tests and created a data table and a sampling distribution graph.

            Test 1   Test 2   Test 3
Student 1   80       82       84
Student 2   78       76       80
Student 3   74       86       80
Student 4   75       81       80
Student 5   76       81       78
Student 6   88       81       89
Student 7   72       79       75
Student 8   94       95       99
Student 9   69       68       63

This sampling distribution shows the professor that their students' scores have a mostly normal distribution with a mean of around 76 to 80%.

Central Limit Theorem :

The central limit theorem, which is a statistical theory, states that when sufficiently large samples are drawn from a population with finite variance, the distribution of the sample means will be approximately normal, and the mean of the sample means will be approximately equal to the mean of the whole population.

Assumptions of the Central Limit Theorem


 The sample should be drawn randomly following the condition of
randomisation.
 The samples drawn should be independent of each other. They should not
influence the other samples.
 When the sampling is done without replacement, the sample size shouldn’t
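exceed 10% of the total population.
 The sample size should be sufficiently large.

Formula

For samples of size n drawn from a population with mean μ and standard deviation σ, the central limit theorem gives the mean and standard deviation of the sample mean x̄ as:

$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

so that $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ is approximately standard normal for large n.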

Applications of Central Limit Theorem

1] According to the central limit theorem, the sampling distribution of the mean can be treated as approximately normal even when the population distribution is unknown or not normal, provided the sample is large enough. This helps in data analysis.

2] The standard deviation of the sample mean decreases as we increase the sample size taken from the population, which helps in estimating the mean of the population more accurately.

3] The sample mean is used to create a range of values which likely includes the population mean.

4] The concept of the central limit theorem is used in election polls to estimate the percentage of people supporting a particular candidate as confidence intervals.

5] CLT is used in calculating the mean family income in a particular country.

6] It applies to rolling many identical, unbiased dice: the total of the rolls is approximately normally distributed.

7] The probability distribution for the total distance covered in a random walk will approach a normal distribution.

8] Flipping many coins will result in an approximately normal distribution for the total number of heads (or, equivalently, total number of tails).

9] By looking at the sample distribution, CLT can help judge whether the sample belongs to a particular population.

10] It enables us to make conclusions about the sample and population parameters and assists in constructing good machine-learning models.
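
The central limit theorem can be illustrated with a small simulation. The sketch below is an assumption-laden example, not part of the notes: it draws samples from an exponential population (mean 1, standard deviation 1, clearly not normal) and checks that the sample means centre on the population mean with spread close to σ/√n. The function name `sample_means` and the chosen sizes are illustrative.

```python
import math
import random
import statistics


def sample_means(draw, n, repeats, rng):
    """Return the means of `repeats` samples, each of n values from draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(repeats)]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the run repeatable
    n = 50
    means = sample_means(lambda r: r.expovariate(1.0), n, 2000, rng)
    print("mean of sample means:", statistics.fmean(means))
    print("std dev of sample means:", statistics.stdev(means))
    print("sigma / sqrt(n):", 1.0 / math.sqrt(n))
```

A histogram of `means` would look roughly bell-shaped even though the underlying exponential population is strongly skewed.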

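What is the sampling distribution of S²?

Let $X_1, X_2, \ldots, X_n$ be observations of a random sample of size n from the normal distribution. Then

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

is the sample mean of the observations, and

$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

is the sample variance of the observations. For a normal population with variance σ², the quantity (n − 1)S²/σ² follows a chi-square distribution with n − 1 degrees of freedom.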
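
The two definitions translate directly into code. This is a minimal sketch with illustrative function names and made-up data; it uses the n − 1 denominator for the sample variance.

```python
import statistics


def sample_mean(xs):
    """X-bar: the sum of the observations divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """S^2: squared deviations from X-bar, divided by n - 1."""
    x_bar = sample_mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
    print(sample_mean(data), sample_variance(data))
    # The standard library uses the same n - 1 convention.
    print(statistics.variance(data))
```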

T- Distribution Definition
The t-distribution is a theoretical probability distribution. It is also known as Student’s t-distribution and is used to make inferences about a mean when the population standard deviation is not known. It is a symmetrical, bell-shaped distribution, similar to the standard normal curve. The higher the degrees of freedom (df), the more closely this distribution approximates a standard normal distribution with a mean of 0 and a standard deviation of 1.

T Distribution Formula
A t-distribution is the whole set of t values measured for every possible random sample of a specific sample size, or a particular number of degrees of freedom. It approximates the shape of the normal distribution.

Let x have a normal distribution with mean ‘μ’. For a sample of size ‘n’ with sample mean x̄ and sample standard deviation ‘s’, the t variable has Student’s t-distribution with degrees of freedom d.f = n – 1. The formula for the t variable is given by:

$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

F DISTRIBUTION
The F-Distribution is a continuous probability distribution that has a non-negative range of values. It is the distribution of the ratio of two independent chi-square random variables, each divided by its degrees of freedom. The F-Distribution has two parameters, the numerator degrees of freedom (df1) and the denominator degrees of freedom (df2). The probability density function (PDF) of the F-Distribution, for x > 0, is given by:

$f(x) = \frac{1}{B(\mathrm{df}_1/2,\ \mathrm{df}_2/2)} \left(\frac{\mathrm{df}_1}{\mathrm{df}_2}\right)^{\mathrm{df}_1/2} x^{\mathrm{df}_1/2 - 1} \left(1 + \frac{\mathrm{df}_1}{\mathrm{df}_2}\, x\right)^{-(\mathrm{df}_1 + \mathrm{df}_2)/2}$

where B is the Beta function, which is defined as:

$B(x, y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x + y)}$

where Γ is the gamma function.

The mean and variance of the F-Distribution are given by:

$\text{Mean} = \frac{\mathrm{df}_2}{\mathrm{df}_2 - 2}$ (when df2 > 2)

$\text{Variance} = \frac{2\,\mathrm{df}_2^{2}\,(\mathrm{df}_1 + \mathrm{df}_2 - 2)}{\mathrm{df}_1\,(\mathrm{df}_2 - 2)^{2}\,(\mathrm{df}_2 - 4)}$ (when df2 > 4)

The shape of the F-Distribution depends on the degrees of freedom. As the degrees of freedom increase, the distribution becomes more symmetrical and approaches a normal distribution.
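
The F density, mean and variance can be evaluated with the standard library alone. This is a sketch that assumes the standard textbook form of the F density, computed on the log scale with `math.lgamma` for numerical stability; the function names are illustrative.

```python
import math


def log_beta(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def f_pdf(x, df1, df2):
    """Density of the F distribution with (df1, df2) degrees of freedom at x."""
    if x <= 0:
        return 0.0
    a, b = df1 / 2.0, df2 / 2.0
    ratio = df1 / df2
    log_f = (a * math.log(ratio) + (a - 1.0) * math.log(x)
             - (a + b) * math.log1p(ratio * x) - log_beta(a, b))
    return math.exp(log_f)


def f_mean(df2):
    """Mean of the F distribution; defined only for df2 > 2."""
    if df2 <= 2:
        raise ValueError("mean requires df2 > 2")
    return df2 / (df2 - 2)


def f_variance(df1, df2):
    """Variance of the F distribution; defined only for df2 > 4."""
    if df2 <= 4:
        raise ValueError("variance requires df2 > 4")
    return 2 * df2**2 * (df1 + df2 - 2) / (df1 * (df2 - 2) ** 2 * (df2 - 4))


if __name__ == "__main__":
    print(f_pdf(1.0, 5, 10), f_mean(10), f_variance(5, 10))
```

For df1 = df2 = 2 the density reduces to 1/(1 + x)², which gives a quick hand check of the implementation.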
