Statistics

The document provides an overview of statistics, including definitions, types, and methods of data analysis. It covers descriptive and inferential statistics, measures of central tendency and dispersion, types of data, scales of measurement, sampling techniques, and concepts like covariance and correlation. Key statistical tools and techniques such as T-tests, ANOVA, and various sampling methods are also discussed.

Uploaded by

adars251

@Gangadhar Tiwari

Statistics
Introduction to Statistics: -
Definition: - Statistics is the science of collecting, organizing, and analyzing data.

Data: - Facts or pieces of information.

E.g.: - 1. Heights of students in a classroom

2. Sales of a company in terms of revenue

3. IQ of students in a classroom

Types of Statistics: -

1. Descriptive Statistics
2. Inferential Statistics

1. Descriptive Statistics: - It consists of organizing, summarizing, and visualizing data.

I. Measures of Central Tendency

II. Measures of Dispersion

III. Different types of distributions of data: -

i. Bernoulli Distribution
ii. Uniform Distribution
iii. Binomial Distribution
iv. Normal or Gaussian Distribution
v. Exponential Distribution
vi. Poisson Distribution

2. Inferential Statistics: - Inferential statistics is used to draw conclusions about the population by applying analytical tools to sample data.

Tools of inferential statistics include:

T-test
Z-test
Chi-square test
ANOVA test
Hypothesis testing
P-value
Significance value

E.g.: - Let's say there are 10 cricket camps in Bangalore and you have collected the heights of cricketers from one of the camps.

The heights recorded are [175 cm, 180 cm, 140 cm, 140 cm, 135 cm, 160 cm, 135 cm] (sample data).

a. Descriptive Questions: -
I. What is the average height of the players in the camp?
II. What is the distribution of the data?
III. How many standard deviations is 140 cm away from the mean?

b. Inferential Question: -
• Is the average height of the players in camp 1 similar to that of camp 2?
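The descriptive questions above can be answered directly from the recorded heights. The following is a minimal sketch using Python's standard `statistics` module; it assumes the seven heights are a sample, so it uses the sample standard deviation (n - 1 in the denominator). The variable names are illustrative only.

```python
import statistics

# Heights (cm) recorded from one cricket camp, taken from the example above
heights = [175, 180, 140, 140, 135, 160, 135]

mean_height = statistics.mean(heights)   # average height
sd_height = statistics.stdev(heights)    # sample standard deviation (n - 1)

# How many standard deviations is 140 cm away from the mean?
z_140 = (140 - mean_height) / sd_height

print(f"mean = {mean_height:.2f} cm")
print(f"sample SD = {sd_height:.2f} cm")
print(f"140 cm is {z_140:.2f} standard deviations from the mean")
```

A negative result means that 140 cm lies below the mean.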

➢ Population and Sample Data: -

• Population data (N): - The population is the entire group (superset) of data that you are interested in studying.
• Sample data (n): - A sample is a subset of the population data.

➢ Types of Data: -

• Qualitative (categorical) data
o Nominal (no ranks). E.g.: - gender, blood group, colors, location, cities, days
o Ordinal (ranks). E.g.: - customer feedback {1, 2, 3, 4, 5}
• Quantitative (numerical) data
o Discrete (whole numbers). E.g.: - no. of children in a family, no. of bikes, no. of people working
o Continuous (any value). E.g.: - house price in Bengaluru, length of a river

➢ Scales of Measurement: - Variables are defined and categorized using different scales of measurement. Each scale has specific properties that determine which statistical analyses are appropriate.
There are four scales of measurement.

• Nominal Scale
• Ordinal Scale
• Interval Scale
• Ratio Scale

I. Nominal Scale Data: - A nominal scale is the 1st level of measurement, in which numbers serve only as “tags” or “labels” to classify or identify objects. A nominal scale usually deals with non-numeric variables, or with numbers that carry no quantitative value.

• Qualitative / categorical data
• E.g.: - Gender, color, labels
• Order or rank does not matter

II. Ordinal Scale Data: - The ordinal scale is the 2nd level of measurement. It reports the ordering and ranking of data without establishing the degree of variation between them. Ordinal represents the “order.”
Ordinal data is qualitative (categorical) data. It can be grouped, named, and also ranked.

• Rank is important
• Order matters
• Difference cannot be measured
• Example:

o Ranking of school students: 1st, 2nd, 3rd, etc.

o Assessing the degree of agreement
▪ Totally agree
▪ Agree
▪ Neutral
▪ Disagree
▪ Totally disagree

III. Interval Scale Data: - The interval scale is the 3rd level of measurement. It is a quantitative scale in which the difference between two values is meaningful. In other words, values are measured in an exact manner rather than a relative one, but the zero point is arbitrary.
• The order matters
• Difference can be measured
• The ratio cannot be measured
• No true “0” starting point
• Example:

• Likert Scale
• Net Promoter Score (NPS)
• Bipolar Matrix Table
• IQ

IV. Ratio Scale Data: - The ratio scale is the 4th level of measurement and is quantitative. Like the interval scale, it allows researchers to compare differences or intervals. Its unique feature is that it has a true origin, or zero point, so ratios between values are meaningful.

• The order matters
• Differences and ratios are measurable
• Contains a “0” starting point
• E.g.: - Students' marks in a class

➢ Descriptive Statistics
1. Measures of Central Tendency: -
o Mean
o Median
o Mode

➢ Mean: - The mean is the average value of the dataset. It is calculated as the sum of all the values in the dataset divided by the number of values.

➢ Median: - The median is the middle value of the dataset when it is arranged in ascending or descending order. When the dataset contains an even number of values, the median is the mean of the two middle values.

Consider a dataset with an odd number of observations arranged in descending order: 23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, and 2.

Here 12 is the middle (median) value, with 6 values above it and 6 values below it.

Now consider an example with an even number of observations arranged in descending order: 40, 38, 35, 33, 32, 30, 29, 27, 26, 24, 23, 22, 19, and 17.

The two middle values are 29 and 27. Their mean is (29 + 27) / 2 = 28.
Therefore, the median of this dataset is 28.

➢ Mode: - The mode is the most frequently occurring value in the dataset. A dataset may contain multiple modes, and in some cases it contains no mode at all.

Consider the dataset 5, 4, 2, 3, 2, 1, 5, 4, 5.

The value 5 occurs three times, more often than any other value, so the mode of this dataset is 5.
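The three measures of central tendency can be checked on the example datasets above. This is a small sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Example datasets from the text
odd_data = [23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, 2]
even_data = [40, 38, 35, 33, 32, 30, 29, 27, 26, 24, 23, 22, 19, 17]
mode_data = [5, 4, 2, 3, 2, 1, 5, 4, 5]

mean_value = statistics.mean(mode_data)       # sum of values / number of values
median_odd = statistics.median(odd_data)      # middle value of 13 observations
median_even = statistics.median(even_data)    # mean of the two middle values
mode_value = statistics.mode(mode_data)       # most frequent value
all_modes = statistics.multimode(mode_data)   # list of every most-frequent value

print(mean_value, median_odd, median_even, mode_value, all_modes)
```

`statistics.median` sorts the data internally, so the input does not need to be ordered first.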

2. Measures of Dispersion: - Dispersion is the state of being dispersed or spread out. Statistical dispersion is the extent to which numerical data is likely to vary about an average value. In other words, dispersion helps to understand the spread of the data.
I. Variance: - The average of the squared deviations from the mean.

• Population variance: σ² = Σ(xi - μ)² / N
• Sample variance: s² = Σ(xi - x̅)² / (n - 1)
• The sample variance is divided by n - 1 so that it is an unbiased estimator of the population variance

• The greater the spread, the greater the variance

II. Standard Deviation: - The square root of the variance is known as the standard deviation, i.e. S.D. = σ = √(σ²).
• The standard deviation describes how far the observations in a data set are spread out from the mean (average or expected value).
• It also lets us express how many standard deviations a value xi is away from the mean
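The difference between dividing by N and dividing by n - 1 is easy to see on a tiny dataset. The sketch below uses Python's standard `statistics` module; the data values are made up for illustration.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]  # illustrative values, not from the source

population_variance = statistics.pvariance(data)  # divides by N
sample_variance = statistics.variance(data)       # divides by n - 1
population_sd = math.sqrt(population_variance)    # SD is the square root of the variance
sample_sd = statistics.stdev(data)

print(population_variance, sample_variance)
print(round(population_sd, 4), round(sample_sd, 4))
```

The sample variance is always slightly larger than the population variance computed on the same numbers, because the denominator is smaller.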

➢ Random Variables: - A random variable is a function that maps the outcome of a random process or experiment to a number.
E.g.: - Tossing a coin, rolling a die

➢ Sets: -
A = {1, 2, 3, 4, 5, 6, 7, 8}
B = {3, 4, 5, 6, 7}

I. Intersection: -
A ∩ B = {3, 4, 5, 6, 7}

II. Union: -
A ∪ B = {1, 2, 3, 4, 5, 6, 7, 8}

III. Difference: -
A - B = {1, 2, 8}

IV. Subset: -
A ⊆ B = False
B ⊆ A = True

V. Superset: -
A ⊇ B = True
B ⊇ A = False
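The set operations above map directly onto Python's built-in `set` type. A short sketch with the same A and B:

```python
A = {1, 2, 3, 4, 5, 6, 7, 8}
B = {3, 4, 5, 6, 7}

intersection = A & B        # elements in both A and B
union = A | B               # elements in A or B
difference = A - B          # elements in A but not in B
a_subset_of_b = A <= B      # is every element of A also in B?
b_subset_of_a = B <= A
a_superset_of_b = A >= B    # does A contain every element of B?
b_superset_of_a = B >= A

print(intersection, union, difference)
print(a_subset_of_b, b_subset_of_a, a_superset_of_b, b_superset_of_a)
```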

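➢ Histograms and Skewness: -

Histogram: - Ages = {10, 12, 14, 18, 24, 30, 35, 36, 37, 40, 41, 42, 43, 50, 51}
A histogram groups the values into bins of equal width (the bin size) and shows how many values fall into each bin.

Bin size = 5
No. of bins = 50 / 5 = 10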
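To make the binning concrete, the sketch below counts how many ages fall into each bin of width 5 and prints a text histogram. It assumes bins of the form [start, start + 5), which is one common convention and is not stated in the source.

```python
from collections import Counter

ages = [10, 12, 14, 18, 24, 30, 35, 36, 37, 40, 41, 42, 43, 50, 51]
bin_size = 5

# Map each age to the lower edge of its bin, e.g. 37 -> 35
bin_counts = Counter((age // bin_size) * bin_size for age in ages)

for start in sorted(bin_counts):
    end = start + bin_size - 1
    print(f"{start:>2}-{end:<2} | {'#' * bin_counts[start]}")
```

Bins that contain no values are simply not printed in this sketch.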

Skewness: - Skewness is a statistical measure that describes the asymmetry (lack of symmetry) in the probability distribution of a dataset. It quantifies the degree to which the data deviates from a perfectly symmetrical distribution, such as a normal (bell-shaped) distribution. Skewness is valuable because it provides insight into the shape and nature of a dataset’s distribution.

A. No Skew (symmetrical): -

Mean = Median = Mode

B. Right Skewed (positive skew): -

Mean > Median > Mode

C. Left Skewed (negative skew): -

Mean < Median < Mode
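The ordering of mean, median, and mode for a right-skewed dataset can be verified numerically. The sketch below uses made-up data with a long right tail and a moment-based skewness coefficient (third central moment divided by the cubed standard deviation); the source does not give a skewness formula, so this particular definition is an assumption.

```python
import statistics

# Illustrative right-skewed data: most values are small, a few are large
data = [1, 2, 2, 2, 3, 4, 5, 8, 18]

mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)

def skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

print(mean_value, median_value, mode_value)
print(f"skewness = {skewness(data):.3f}")  # positive for a right-skewed dataset
```

Mirroring the data (negating every value) flips the sign of the skewness and reverses the ordering of mean, median, and mode.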

➢ Sampling Techniques: -

A. Simple random sampling:-
In a simple random sample, every member of the population has an equal chance of being selected.

Example: Simple random sampling:- You want to select a simple random sample of 100 employees of a social media marketing company. You assign a number to every employee in the company database from 1 to 1000, and use a random number generator to select 100 numbers.

B. Stratified sampling:-
Stratified sampling involves dividing the population into subpopulations that may differ in important ways. It allows you to draw more precise conclusions by ensuring that every subgroup is properly represented in the sample.

To use this sampling method, you divide the population into subgroups (called strata) based on the relevant characteristic (e.g., gender identity, age range, income bracket, job role), and then sample from each subgroup.

C. Systematic sampling:-
Systematic sampling is similar to simple random sampling, but it is usually slightly easier to conduct. Every member of the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular intervals.

Example: Systematic sampling: - All employees of the company are listed in alphabetical order. From the first 10 numbers, you randomly select a starting point: number 6. From number 6 onwards, every 10th person on the list is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100 people.
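The three probability sampling methods described so far can be sketched with Python's standard `random` module. The population of 1000 numbered employees follows the examples above; the job-role strata and their sizes are hypothetical and only serve to illustrate proportional allocation.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

employees = list(range(1, 1001))  # employee numbers 1..1000

# Simple random sampling: every employee has the same chance of selection
simple_sample = random.sample(employees, 100)

# Systematic sampling: random start among the first 10, then every 10th employee
start = random.randint(1, 10)
systematic_sample = employees[start - 1::10]

# Stratified sampling: sample each (hypothetical) stratum in proportion to its size
strata = {
    "engineering": employees[:600],
    "sales": employees[600:900],
    "support": employees[900:],
}
stratified_sample = []
for role, members in strata.items():
    k = len(members) * 100 // len(employees)  # proportional share of 100
    stratified_sample.extend(random.sample(members, k))

print(len(simple_sample), len(systematic_sample), len(stratified_sample))
```

With proportional allocation, each stratum contributes the same fraction of the sample as it does of the population (here 60, 30, and 10 people).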

D. Convenience sampling:-
A convenience sample simply includes the individuals who happen to be most accessible to the researcher.

This is an easy and inexpensive way to gather initial data, but there is no way to tell if the sample is representative of the population, so it can’t produce generalizable results. Convenience samples are at risk for both sampling bias and selection bias.

Example: Convenience sampling: - You are researching opinions about student support services in your university, so after each of your classes, you ask your fellow students to complete a survey on the topic. This is a convenient way to gather data, but as you only surveyed students taking the same classes as you at the same level, the sample is not representative of all the students at your university.

E. Purposive sampling:-
This type of sampling, also known as judgement sampling, involves the researcher
using their expertise to select a sample that is most useful to the purposes of the
research.

It is often used in qualitative research, where the researcher wants to gain detailed
knowledge about a specific phenomenon rather than make statistical inferences, or
where the population is very small and specific. An effective purposive sample must
have clear criteria and rationale for inclusion. Always make sure to describe your
inclusion and exclusion criteria and beware of observer bias affecting your
arguments.

Example: Purposive sampling: - You want to know more about the opinions and
experiences of disabled students at your university, so you purposefully select a
number of students with different support needs in order to gather a varied range of
data on their experiences with student services.

F. Cluster sampling:-
Cluster sampling also involves dividing the population into subgroups, but each
subgroup should have similar characteristics to the whole sample. Instead of sampling
individuals from each subgroup, you randomly select entire subgroups.

If it is practically possible, you might include every individual from each sampled
cluster. If the clusters themselves are large, you can also sample individuals from
within each cluster using one of the techniques above. This is called multistage
sampling.

This method is good for dealing with large and dispersed populations, but there is
more risk of error in the sample, as there could be substantial differences between
clusters. It’s difficult to guarantee that the sampled clusters are really representative of
the whole population.

Example: Cluster sampling: - The company has offices in 10 cities across the country
(all with roughly the same number of employees in similar roles). You don’t have the
capacity to travel to every office to collect your data, so you use random sampling to
select 3 offices – these are your clusters.
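Cluster sampling and its multistage variant can be sketched as follows. The 10 offices follow the example above; the office names, the 100 employees per office, and the 20 employees drawn per office in the second stage are hypothetical values chosen for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10 offices (clusters) with 100 employees each
offices = {
    f"city_{i}": [f"city_{i}_emp_{j}" for j in range(1, 101)]
    for i in range(1, 11)
}

# Cluster sampling: randomly select 3 whole offices
chosen_offices = random.sample(sorted(offices), 3)
cluster_sample = [emp for office in chosen_offices for emp in offices[office]]

# Multistage sampling: within each chosen office, draw a simple random sample
multistage_sample = [
    emp
    for office in chosen_offices
    for emp in random.sample(offices[office], 20)
]

print(chosen_offices)
print(len(cluster_sample), len(multistage_sample))
```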

➢ Covariance and Correlation: -
• Covariance is a statistical term that refers to a systematic relationship between two random variables, in which a change in one variable is reflected by a change in the other.

• The covariance value can range from -∞ to +∞, with a negative value indicating a negative relationship and a positive value indicating a positive relationship.
• A positive number denotes positive covariance, which indicates a direct relationship between the two variables.
• A negative number, on the other hand, denotes negative covariance, which indicates an inverse relationship between the two variables.
• Covariance is great for identifying the direction of a relationship, but it is poor for interpreting the magnitude, because its value depends on the units of the variables.

• Positive: An increase in one variable is associated with an increase in the other.
• Negative: The variables move in opposite directions.
• Zero: No linear relationship exists.

A. Pearson correlation coefficient: - The Pearson correlation coefficient (r) is the most common way of measuring a linear correlation. It is a number between -1 and 1 that measures the strength and direction of the relationship between two variables.

Pearson correlation coefficient (r) | Correlation type | Interpretation | Example
Between 0 and 1 | Positive correlation | When one variable changes, the other variable changes in the same direction. | Baby length & weight: The longer the baby, the heavier their weight.
0 | No correlation | There is no relationship between the variables. | Car price & width of windshield wipers: The price of a car is not related to the width of its windshield wipers.
Between 0 and -1 | Negative correlation | When one variable changes, the other variable changes in the opposite direction. | Elevation & air pressure: The higher the elevation, the lower the air pressure.

r = cov(X, Y) / (σx · σy)

where

• cov(X, Y) is the covariance of X and Y

• σx is the standard deviation of X
• σy is the standard deviation of Y
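The relationship between covariance and the Pearson coefficient can be shown in a few lines. This sketch assumes the sample covariance (dividing by n - 1); the elevation and pressure figures are rough illustrative values, not data from the source.

```python
import math

def covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r(xs, ys):
    """Pearson r = cov(X, Y) / (sd_x * sd_y)."""
    sd_x = math.sqrt(covariance(xs, xs))  # the covariance of X with itself is its variance
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Illustrative values only: elevation (m) and air pressure (kPa)
elevation_m = [0, 500, 1000, 1500, 2000]
pressure_kpa = [101.3, 95.5, 89.9, 84.6, 79.5]

print(f"cov = {covariance(elevation_m, pressure_kpa):.1f}")
print(f"r   = {pearson_r(elevation_m, pressure_kpa):.4f}")
```

The covariance changes if the units change (for example metres to kilometres), while r stays the same. This is why r is preferred for judging the strength of a relationship.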

B. Spearman's rank correlation coefficient:- A correlation can easily be drawn as a scatter graph, but a more rigorous way to assess several pairs of data is to use a statistical test. This establishes whether the correlation is statistically significant or whether it could have been the result of chance alone.
Spearman's rank correlation coefficient is a technique which can be used to summarise the strength and direction (negative or positive) of a relationship between two variables. The result will always be between -1 and 1.
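A minimal sketch of Spearman's coefficient follows. The source does not give the formula; this uses the standard rank-difference formula rho = 1 - 6 Σd² / (n(n² - 1)), which is only valid when there are no tied values, and the helper names are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman's rho from squared rank differences (no-ties formula)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# A monotonic but non-linear relationship: y = x squared
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(spearman_rho(xs, ys))  # the ranks agree perfectly
```

Because it works on ranks, Spearman's coefficient measures how consistently one variable increases with the other, whether or not the relationship is a straight line.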

➢ Probability Distribution Function: - A distribution function is a mathematical expression that describes the probability of the different possible outcomes of an experiment.

Let us say we are running an experiment of tossing a fair coin. The possible outcomes are heads and tails. If we use X to denote the outcome, the probability distribution of X takes the value 0.5 for X = heads and 0.5 for X = tails.

o Data Types: - We have qualitative and quantitative data. Within quantitative data, we have continuous and discrete data types.
▪ Continuous data is measured and can take any value in a given finite or infinite range. It can be represented in decimal format. A random variable that holds continuous values is called a continuous random variable.

Examples: A person’s height, time, distance, etc.

▪ Discrete data is counted and can take only distinct, countable values. It makes no sense when written in decimal format. A random variable that holds discrete data is called a discrete random variable.

Example: The number of students in a class, number of workers in a company, etc.

o Types of Probability Distributions

The two major kinds of distributions, based on the type of values the variable can take, are:

1. Discrete Distributions
2. Continuous Distributions

Discrete Distribution Vs Continuous Distribution

The table below compares discrete and continuous distributions.

Discrete Distributions | Continuous Distributions
Discrete distributions have a finite (or countably infinite) number of different possible outcomes | Continuous distributions have infinitely many consecutive possible values
We can add up individual values to find out the probability of an interval | We cannot add up individual values to find out the probability of an interval because there are infinitely many of them
Discrete distributions can be expressed with a graph, piece-wise function or table | Continuous distributions can be expressed with a continuous function or graph
In discrete distributions, the graph consists of bars lined up one after the other | In continuous distributions, the graph consists of a smooth curve
Expected values might not be achievable | To calculate the chance of an interval, we require integrals

1. The term probability distribution function (or probability function) is ambiguous. It may refer to:
• Probability density function (PDF)
• Cumulative distribution function (CDF)
• Probability mass function (PMF)
2. What is certain is:
• Discrete case: Probability Mass Function (PMF)
• Continuous case: Probability Density Function (PDF)
• Both cases: Cumulative Distribution Function (CDF)
3. The value at a certain x can be directly obtained from:
• PMF for the discrete case, where it is the probability P(X = x)
• PDF for the continuous case, where it is a probability density rather than a probability, since P(X = x) = 0 for a continuous variable
4. The probability of values up to x, P(X ≤ x), or of values within a range from a to b, P(a < X ≤ b), can be directly obtained from:
• CDF for both the discrete and continuous cases
5. The term distribution function refers to the CDF (cumulative distribution function)

A. Probability Density Function (PDF): - It describes the probability distribution of a continuous random variable. The probability associated with any single value is always zero, so probabilities are obtained as areas under the density curve f(x):

P(a ≤ X ≤ b) = ∫ f(x) dx, integrated from a to b

B. Probability Mass Function (PMF):- It describes the probability distribution of a discrete random variable: p(x) = P(X = x), where the probabilities over all possible values sum to 1.

C. Cumulative Distribution Function (CDF):- It is another way to describe the distribution of a random variable (either continuous or discrete): F(x) = P(X ≤ x).
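For a discrete variable, the link between PMF and CDF is just a running sum. The sketch below uses a fair six-sided die as a hypothetical example (it is not from the source) and exact fractions to avoid rounding.

```python
from fractions import Fraction

# PMF of a fair six-sided die: P(X = face) = 1/6 for each face
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """F(x) = P(X <= x): add up the PMF over all faces not exceeding x."""
    return sum(p for face, p in pmf.items() if face <= x)

# P(2 < X <= 4) from the CDF, i.e. the probability of rolling a 3 or a 4
prob_3_or_4 = cdf(4) - cdf(2)

print(pmf[3], cdf(3), prob_3_or_4)
```

For a continuous variable the same idea holds, with the sum replaced by an integral of the PDF.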

➢ Types of Probability Distribution: -
1. Bernoulli Distribution
2. Binomial Distribution
3. Poisson Distribution
4. Normal or Gaussian Distribution
5. Uniform Distribution
6. Log-Normal Distribution

1. Bernoulli Distribution: -

• The Bernoulli distribution is a discrete probability distribution

• It is concerned with discrete random variables {PMF}
• It applies to events that have one trial and two possible outcomes. These are known as Bernoulli trials.

E.g.: -

▪ Tossing a coin {H, T}

Pr(H) = 0.5 = p

Pr(T) = 0.5 = 1 - p = q

▪ Whether a person will pass or fail

Pr(Pass) = 0.85 = p

Pr(Fail) = 1 - p = 0.15 = q

PMF: P(X = k) = p^k · (1 - p)^(1 - k)

k ∈ {0, 1} is the outcome

p is the probability of one outcome (k = 1)

q = 1 - p is the probability of the other outcome (k = 0)

2. Binomial Distribution: -
• It is concerned with discrete random variables {PMF}
• Each trial has two possible outcomes: true or false, success or failure, yes or no.
• The experiment is performed for n trials
• Every trial is independent, which means the outcome of one trial does not affect the outcome of another trial.

E.g.: -
Tossing a coin 10 times

PMF: P(X = x) = nCx · p^x · q^(n - x)

nCx = n! / (x! (n - x)!)
Where,
n = the number of trials
x = the number of successes, x = 0, 1, 2, ..., n
p = probability of success in a single trial
q = probability of failure in a single trial = 1 - p

Mean, μ = np

Variance, σ² = npq

Standard deviation, σ = √(npq)
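The binomial PMF and its mean and variance can be checked for the coin example (n = 10 tosses, p = 0.5). This sketch uses `math.comb` from the standard library for nCx; the function name `binomial_pmf` is illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5  # tossing a fair coin 10 times
distribution = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = n * p                 # np
variance = n * p * (1 - p)   # npq
std_dev = math.sqrt(variance)

print(f"P(exactly 5 heads) = {binomial_pmf(5, n, p):.4f}")
print(f"mean = {mean}, variance = {variance}, sd = {std_dev:.3f}")
```

The probabilities for x = 0 to n add up to 1, which is a quick sanity check on any PMF.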

3. Poisson Distribution: -
• It is concerned with discrete random variables {PMF}
• It describes the number of events occurring in a fixed time interval

E.g.: - No. of people visiting a hospital every hour

No. of people visiting a bank at 11 am

P(x; λ) = (e^(-λ) · λ^x) / x!
Where,
e is the base of the natural logarithm
x is the number of occurrences (x = 0, 1, 2, ...)
λ is the expected number of events in each time interval
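The Poisson formula translates directly into code. In this sketch the rate λ = 3 visitors per hour is a hypothetical value chosen for illustration, and `poisson_pmf` is an illustrative name.

```python
import math

def poisson_pmf(x, lam):
    """P(x; lambda) = exp(-lambda) * lambda**x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 3  # hypothetical: on average 3 people visit per hour

for x in range(6):
    print(f"P({x} visitors in an hour) = {poisson_pmf(x, lam):.4f}")

# Probability of at most 2 visitors in an hour
prob_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))
print(f"P(at most 2) = {prob_at_most_2:.4f}")
```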

4. Normal or Gaussian Distribution: -
• It is concerned with continuous random variables {PDF}
• Normal distributions are symmetrical, but not all symmetrical distributions are normal
Characteristics of Normal Distribution
• mean = median = mode
• Symmetrical about the center
• Unimodal
• 50% of values are less than the mean and 50% are greater than the mean

f(x) = (1 / (σ√(2π))) · e^(-(x - μ)² / (2σ²))

Here, x is the value of the variable; f(x) is the probability density function; μ (mu) is the mean; and σ (sigma) is the standard deviation.
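The normal density can be evaluated by hand and compared against `statistics.NormalDist` from the standard library. This sketch assumes the usual density f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)); the mean of 170 and standard deviation of 10 are made-up values.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

mu, sigma = 170, 10  # hypothetical: heights in cm

for x in (150, 160, 170, 180, 190):
    by_hand = normal_pdf(x, mu, sigma)
    by_library = NormalDist(mu, sigma).pdf(x)
    print(f"x = {x}: f(x) = {by_hand:.5f} (library: {by_library:.5f})")
```

The density is highest at the mean and symmetric around it, matching the characteristics listed above.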

Examples that mainly follow a Normal Distribution

1. Blood pressure

2. Height of students in a class

3. Errors while taking measurements

4. Marks in a test, etc

Some Basic Terminology

1. Mean (μ): the average of a data set.

2. Median: the middle value of an ordered set of numbers.

3. Mode: the most common number (peak) in a data set. A unimodal distribution has only one peak, a bimodal distribution has two peaks, and a multimodal distribution has three or more peaks.

4. Bias: the tendency of a statistic to overestimate or underestimate a parameter.

5. Skewness: a distortion or asymmetry that deviates from the symmetrical bell curve, or normal distribution, in a set of data.

6. Standard deviation (σ): a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

• Empirical Rule of Normal Distribution: - The empirical rule in statistics, also known as the 68-95-99.7 rule, states that for normal distributions, about 68% of observed data points lie within one standard deviation of the mean, about 95% fall within two standard deviations, and about 99.7% fall within three standard deviations.

• 68.3% of values are within 1 standard deviation (1σ) of the mean

• 95.4% of values are within 2 standard deviations (2σ) of the mean

• 99.7% of values are within 3 standard deviations (3σ) of the mean

It is always good to know the standard deviation, because we can say that any value is:
• likely to be within 1 standard deviation (1σ) (about 683 out of 1000 should be)
• very likely to be within 2 standard deviations (2σ) (about 954 out of 1000 should be)
• almost certainly within 3 standard deviations (3σ) (about 997 out of 1000 should be)
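The percentages in the empirical rule can be reproduced from the CDF of the standard normal distribution. A short sketch using `statistics.NormalDist`; the helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k) * 100:.2f}%")
```

The same fractions hold for any normal distribution, whatever its mean and standard deviation, because the rule is expressed in units of σ.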

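5. Uniform Distribution: -
I. Continuous Uniform Distribution (PDF)
II. Discrete Uniform Distribution (PMF)

I. Continuous Uniform Distribution (PDF): -

• Continuous random variables {PDF}
• Every value in the interval [a, b] is equally likely: f(x) = 1 / (b - a) for a ≤ x ≤ b, and 0 otherwise

II. Discrete Uniform Distribution (PMF): -
• Discrete random variables {PMF}
• Each of the n possible outcomes is equally likely: P(X = x) = 1 / n (e.g. rolling a fair die, where n = 6)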

➢ Standard Normal Distribution and Z-Score: - The standard normal distribution is a specific type of normal distribution where the mean is equal to 0 and the standard deviation is equal to 1.

The normal distribution is the most commonly used probability distribution in statistics.

It has the following properties:

• Symmetrical
• Bell-shaped
• Mean and median are equal; both are located at the center of the distribution

The mean of the normal distribution determines its location and the standard deviation determines its spread.

A standard normal distribution has the following properties:

• About 68% of data falls within one standard deviation of the mean
• About 95% of data falls within two standard deviations of the mean
• About 99.7% of data falls within three standard deviations of the mean

• What is a “Z-score”?

The number of standard deviations from the mean is also called the “standard score”, “sigma”, or “Z-score”. Simply, a Z-score describes the position of a raw score in terms of its distance from the mean, measured in standard deviation units.

z = (x - μ) / σ

• z is the “z-score” (standard score)
• x is the value to be standardized
• μ (mu) is the mean
• σ (sigma) is the standard deviation

Standardizing: - Standardization, or Z-score normalization, is the transformation of features by subtracting the mean and dividing by the standard deviation. The transformed value is the Z-score.

We can take any normal distribution and convert it to the standard normal distribution.
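A Z-score turns a raw value into a position on the standard normal distribution, where probabilities can be looked up. In the sketch below, the exam score of 85 and the distribution with mean 70 and standard deviation 10 are hypothetical values.

```python
from statistics import NormalDist

# Hypothetical exam scores: normally distributed with mean 70 and SD 10
mu, sigma = 70, 10
x = 85

z = (x - mu) / sigma  # z = (x - mu) / sigma

# The probability of scoring at most x is the same whether it is computed
# on the original distribution or on the standard normal via the z-score
p_from_z = NormalDist(0, 1).cdf(z)
p_direct = NormalDist(mu, sigma).cdf(x)

print(f"z = {z}")
print(f"P(X <= {x}) = {p_from_z:.4f}")
```

This equivalence is what makes it possible to use a single standard normal table for any normal distribution.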

S.No. | Normalization | Standardization
1. | The minimum and maximum values of the features are used for scaling. | The mean and standard deviation are used for scaling.
2. | It is used when features are on different scales. | It is used when we want to ensure zero mean and unit standard deviation.
3. | It scales values to [0, 1] or [-1, 1]. | It is not bounded to a certain range.
4. | It is strongly affected by outliers. | It is much less affected by outliers.
5. | Scikit-Learn provides a transformer called MinMaxScaler for normalization. | Scikit-Learn provides a transformer called StandardScaler for standardization.
6. | This transformation squishes the n-dimensional data into an n-dimensional unit hypercube. | It translates the data so that the mean vector of the original data moves to the origin, and then squishes or expands it.
7. | It is useful when we don’t know the distribution. | It is useful when the feature distribution is Normal or Gaussian.
8. | It is often called Scaling Normalization. | It is often called Z-Score Normalization.
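The two scaling methods in the table can be compared on a single feature. This is a plain-Python sketch rather than the Scikit-Learn transformers named in the table; the function names, the feature values, and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics

def min_max_normalize(values):
    """Rescale values to the range [0, 1] using the minimum and maximum."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

feature = [10, 20, 30, 40, 100]  # illustrative values; 100 acts as an outlier

normalized = min_max_normalize(feature)
standardized = standardize(feature)

print([round(v, 3) for v in normalized])
print([round(v, 3) for v in standardized])
```

In the normalized output the outlier pins the maximum at 1 and compresses the other values toward 0, which illustrates row 4 of the table.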

Central Limit Theorem: - For large sample sizes, the sampling distribution of the mean approximates a normal distribution, even if the population distribution is not normal.

The theorem applies under the following conditions:

1. The sample size is sufficiently large. This condition is usually met if the sample size is n ≥ 30.
2. The samples are independent and identically distributed (i.i.d.) random variables. The sampling should be random.
3. The population’s distribution has a finite variance. The central limit theorem doesn’t apply to distributions with infinite variance.

1. What is Central Limit Theorem in Statistics?
The Central Limit Theorem states that when we take sufficiently large samples from a population, the distribution of the sample mean approximates a normal distribution.

2. When does Central Limit Theorem apply?

The Central Limit Theorem applies when the sample size is large, usually 30 or more.

3. Why is Central Limit Theorem important?

The Central Limit Theorem is important because it allows us to make inferences about a population just by analyzing a sample.

4. How to solve Central Limit Theorem problems?

Central Limit Theorem problems can be solved by finding the Z-score of the sample mean, which is calculated using the formula Z = (x̅ - μ) / (σ / √n).
How to check whether a distribution is normal

To check for normality using a histogram, overlay the normal distribution curve on the histogram of your data and check that the shape of the data approximately matches the curve. A better way to do this is to use a quantile-quantile plot, or Q-Q plot for short.

6. Log-Normal Distribution: - A log-normal distribution is a continuous distribution of a random variable y whose natural logarithm is normally distributed. For example, if the random variable y = exp(x) has a log-normal distribution, then x = log(y) has a normal distribution.
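The relationship between y and x = log(y) can be checked by simulation. This sketch uses `random.lognormvariate` from the standard library; the parameters mu = 1.0 and sigma = 0.5 (the mean and standard deviation of the underlying normal variable) are arbitrary illustrative choices.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

mu, sigma = 1.0, 0.5  # parameters of the underlying normal distribution

# y is log-normal, so x = log(y) should be normal with mean mu and SD sigma
y_values = [random.lognormvariate(mu, sigma) for _ in range(5000)]
x_values = [math.log(y) for y in y_values]

mean_x = statistics.fmean(x_values)
sd_x = statistics.stdev(x_values)

print(f"mean of log(y) = {mean_x:.3f} (expected about {mu})")
print(f"SD of log(y)   = {sd_x:.3f} (expected about {sigma})")
```

A log-normal variable only takes positive values, which makes it a common model for quantities that cannot be negative.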

➢ Inferential Statistics
Statistical inference provides methods for drawing conclusions about a population from sample data.

1. Estimate: - An observed numerical value used to estimate an unknown population parameter.

I. Point Estimate: - A single numerical value used to estimate the unknown population parameter.

II. Interval Estimate: - A range of values used to estimate the unknown population parameter.

2. Hypothesis and Hypothesis Testing Mechanism: -

Inferential statistics draws conclusions or inferences about the population data.

➢ Hypothesis Testing Mechanism: - Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about a population parameter or a population probability distribution.

- Null Hypothesis (H0):- The null hypothesis (H0) states that there is no relationship between two variables, or no effect. Under H0, any effect observed in the sample is solely due to chance and has no underlying cause.

- Alternative Hypothesis (H1):- The alternative hypothesis (H1), or research hypothesis, states that there is a relationship between two variables (where one variable affects the other). The alternative hypothesis is the main driving force for hypothesis testing.

3. P-Value: - The p-value is a number, calculated from a statistical test, that describes how likely you are to have found a set of observations at least as extreme as yours if the null hypothesis were true. P-values are used in hypothesis testing to help decide whether to reject the null hypothesis.

4. Confidence Interval and Margin of Error: - A confidence interval is a range of values within which we can be reasonably confident that the true population parameter lies. This range is estimated from a sample of the population and a chosen level of confidence. The level of confidence expresses how often intervals constructed in this way would contain the true population parameter.

Confidence Interval = [lower bound, upper bound]

The margin of error is equal to half the width of the entire confidence interval.

lower bound, upper bound = sample mean ± margin of error

➢ Hypothesis Testing and Statistical Analysis: -
1. Z-Test: used for averages (means)
2. T-Test: used for averages (means)
3. Chi-Square: used for categorical data
4. ANOVA: used for variance

1. Z-Test:-
• Population standard deviation is known
• Large sample size (n > 30)

• Z = (x̅ - μ) / (σ / √n)
σ/√n: standard error
σ: population standard deviation
μ: population mean
x̅: sample mean
n: sample size
• Degrees of freedom: not applicable
• We use the Z-test when the population standard deviation is known and the sample size is large

The z-test is a hypothesis test in which the z-statistic follows a normal distribution. The z-test is best used for samples larger than 30 because, under the central limit theorem, as the sample size gets larger, the sample mean is approximately normally distributed.

Confidence interval = point estimate ± margin of error

Confidence interval = sample mean ± margin of error
C.I. = x̅ ± Z(α/2) · σ/√n
σ/√n: standard error
σ: population standard deviation
α: significance level
n: sample size
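The z-statistic, two-tailed p-value, and confidence interval can be computed with the standard library alone. All input numbers in this sketch (sample mean 52, hypothesized population mean 50, known σ = 10, n = 100, α = 0.05) are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical inputs
sample_mean = 52.0
population_mean = 50.0   # value claimed by the null hypothesis
population_sd = 10.0     # assumed known
n = 100
alpha = 0.05

standard_error = population_sd / math.sqrt(n)
z = (sample_mean - population_mean) / standard_error

# Two-tailed p-value: probability of a z at least this extreme under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = p_value < alpha

# Confidence interval: sample mean +/- Z(alpha/2) * standard error
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
margin_of_error = z_critical * standard_error
ci = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
print(f"{(1 - alpha) * 100:.0f}% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Rejecting H0 at α = 0.05 in a two-tailed test is equivalent to the hypothesized mean falling outside the 95% confidence interval.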

@Gangadhar Tiwari
2. T-Test: - A t-test is an inferential statistic used to determine if there is a
significant difference between the means of two groups and how they are
related. T-tests are used when the data sets follow a normal distribution and
have unknown variances, like the data set recorded from flipping a coin 100
times.
• Population standard deviation is unknown
• Our sample size is small, n < 30
• T = (x̅ - μ) / (s / √n)
  s/√n ---- Standard Error
  s ---- Sample standard deviation
  μ ---- Population Mean
  x̅ ---- Sample Mean
  n ---- Sample size
• Degrees of Freedom is n-1
• We use the T-test when the population standard deviation is unknown,
especially when the sample size is small
• T-tests can be dependent (paired samples) or independent (two separate groups).
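
The one-sample t-statistic from the bullets above can be sketched in Python as follows. The sample values and the hypothesised mean of 12.0 are made up for illustration. The critical value 2.262 is the usual t-table entry for a two-sided 5% test with 9 degrees of freedom, supplied by hand because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, pop_mean):
    """Return (T, degrees of freedom) with T = (x_bar - mu) / (s / sqrt(n))."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (mean(sample) - pop_mean) / (s / sqrt(n))
    return t, n - 1


# Hypothetical small sample (n = 10), population sd unknown.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
t, df = t_statistic(sample, pop_mean=12.0)

T_CRIT = 2.262  # t-table value for df = 9, two-sided alpha = 0.05
print(round(t, 3), df)  # about 1.225 with 9 degrees of freedom
print(abs(t) > T_CRIT)  # False: fail to reject the null hypothesis
```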

Confidence interval = Point Estimate ± margin of error
Confidence interval = sample mean ± margin of error
C.I = x̅ ± T_{α/2} * s/√n   (T_{α/2} taken with n-1 degrees of freedom)
  s/√n ---- Standard error
  s ---- Sample standard deviation
  α ---- Significance level
  n ---- Sample size
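
A minimal Python sketch of the t-based interval. The sample is hypothetical, and the critical value is passed in as a parameter (here 2.262, the t-table entry for 95% confidence with 9 degrees of freedom) because the standard library does not provide the t-distribution.

```python
from math import sqrt
from statistics import mean, stdev


def t_confidence_interval(sample, t_crit):
    """Return (lower, upper) = x_bar -/+ t_crit * s / sqrt(n)."""
    n = len(sample)
    margin = t_crit * stdev(sample) / sqrt(n)
    centre = mean(sample)
    return centre - margin, centre + margin


# Hypothetical sample of n = 10 observations.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
T_CRIT_95_DF9 = 2.262  # from a t-table: df = 9, 95% confidence
low, high = t_confidence_interval(sample, T_CRIT_95_DF9)
print(round(low, 3), round(high, 3))  # roughly 11.915 and 12.285
```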

• Z-tests and T-tests are parametric tests, in which the null hypothesis states
that a population parameter is equal to (or, in one-sided tests, at most or at
least) some value.
• A z-test is used if the population variance is known, or if the sample size is
larger than 30 with an unknown population variance.
• If the sample size is less than 30 and the population variance is unknown, we
must use a t-test.
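
The selection rule in these bullets can be written as a small helper. This is only a sketch of the rule of thumb as stated here; the function name is hypothetical, and treating n = 30 exactly as a t-test case is an assumption, since the notes only cover "larger than 30" and "less than 30".

```python
def choose_test(n: int, population_variance_known: bool) -> str:
    """Pick a z-test or t-test following the rule of thumb in the notes."""
    if population_variance_known or n > 30:
        return "z-test"
    # Assumption: the boundary case n == 30 falls through to the t-test.
    return "t-test"


print(choose_test(n=50, population_variance_known=False))  # z-test
print(choose_test(n=12, population_variance_known=False))  # t-test
print(choose_test(n=12, population_variance_known=True))   # z-test
```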

Q1. When Are Z-test and T-test Used?

A. A z-test is used to test a Null Hypothesis if the population variance is known, or
if the sample size is larger than 30, for an unknown population variance. A t-test is
used when the sample size is less than 30 and the population variance is unknown.

Q2. What Is the Difference Between a Two-Tailed and One-Tailed Z-Test?

A. A one-tailed z-test allows for the possibility of rejection of the Null Hypothesis in
only one direction, whereas a two-tailed z-test tests the possibility of rejection in
both directions (left and right).
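
To make the difference concrete, the sketch below computes one-tailed and two-tailed p-values for the same z-statistic using the standard library's `NormalDist`. The function name, the tail labels and the value z = 1.8 are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_value(z, tail="two-sided"):
    """p-value of a z-statistic for a left, right or two-sided test."""
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "two-sided":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two-sided'")


z = 1.8  # hypothetical test statistic
print(p_value(z, "right") < 0.05)      # True  (p is about 0.036)
print(p_value(z, "two-sided") < 0.05)  # False (p is about 0.072)
```

The same statistic is significant at the 5% level in the right-tailed test but not in the two-tailed test, because the two-tailed test splits the rejection region across both directions.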

Q3. What Are the Assumptions of the T-Test and Z-Test?

A. It is assumed that the z-statistic follows a standard normal distribution, whereas
the t-statistic follows the t-distribution with degrees of freedom equal to n-1, where
n is the sample size.

3. Chi Square: -
• The chi-square test evaluates claims about population proportions
• It is a non-parametric test performed on categorical (nominal or
ordinal) data
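
As an illustration, a chi-square goodness-of-fit statistic can be computed by hand as the sum of (observed - expected)² / expected over the categories. The counts below are hypothetical, and 5.991 is the usual chi-square table value for 2 degrees of freedom at the 5% level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical claim: three categories occur in proportions 40% / 40% / 20%.
observed = [50, 30, 20]  # counts seen in a sample of 100
expected = [40, 40, 20]  # counts implied by the claimed proportions

chi2 = chi_square_statistic(observed, expected)
CRITICAL = 5.991  # chi-square table: df = 3 - 1 = 2, alpha = 0.05
print(chi2)             # 5.0
print(chi2 > CRITICAL)  # False: not enough evidence against the claim
```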

4. Anova(F-Test): -
• ANOVA, which stands for Analysis of Variance, is a statistical test
used to analyze the difference between the means of more than two
groups.
• ANOVA compares the variation between group means to the variation
within the groups. If the variation between group means is
significantly larger than the variation within groups, it suggests a
significant difference between the means of the groups.
• ANOVA calculates an F-statistic by comparing between-group
variability to within-group variability. If the F-statistic exceeds a
critical value, it indicates significant differences between group
means.
• ANOVA is used to compare treatments, analyse the impact of factors on a
variable, or compare means across multiple groups.
• Types of ANOVA include one-way (for comparing the means of groups defined
by a single factor) and two-way (for examining the effects of two independent
variables on a dependent variable).

Types of Anova

1. One Way Anova:- One factor with at least 2 levels; these levels are
independent

2. Repeated measures Anova:- One factor with at least 2 levels; levels are
dependent

3. Factorial Anova:- Two or more factors (each with at least 2 levels).
Levels can be either independent or dependent

Hypothesis Testing of Anova:-
• Null Hypothesis H0 : μ1 = μ2 = μ3 = ... = μk
• Alternate hypothesis H1 : At least one mean differs from the others
• F Test Statistic

F = Variation between samples / Variation within samples

One Way Anova:- One factor with at least 2 levels, levels are
independent
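
A one-way Anova F-statistic can be sketched in Python as follows. It uses the standard definitions: the between-group mean square is SSB / (k - 1) and the within-group mean square is SSW / (N - k), for k groups and N observations in total. The three groups are made-up data, and 5.14 is the usual F-table value for (2, 6) degrees of freedom at the 5% level.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Variation between samples: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation within samples: observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


# Hypothetical scores for three independent levels of one factor.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)

F_CRITICAL = 5.14  # F-table: df = (2, 6), alpha = 0.05
print(round(f_stat, 2), df_b, df_w)  # 13.0 2 6
print(f_stat > F_CRITICAL)           # True: at least one mean differs
```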
