E Book - Unit 4
HYPOTHESIS
TOPICS TO BE COVERED
STATISTICS DESCRIPTIVE
MATHEMATICS
INFERENTIAL
DESCRIPTIVE STATISTICS
Demerits of arithmetic mean:
• The arithmetic mean cannot be computed for open-ended class intervals.
• It is highly affected by extreme values (very high or very low values compared to the other observations).
• Median: The median is the middle value of a data set when the values are listed in
ascending or descending order. When the number of values is even, it is taken as the
average of the two middle values.
ADVANTAGES of Median
(1) Simplicity: In the case of a simple statistical series, just a glance at the data is enough to
locate the median value.
(2) Free from the effect of extreme values: Unlike the arithmetic mean, the median is not
distorted by the extreme values of the series.
(3) Certainty: Certainty is another merit of the median. The median is always a definite,
specific value in the series.
(4) Real value: The median is a real value and is a better representative value of the series
than the arithmetic mean, the value of which may not exist in the series at all.
(5) Graphic presentation: Besides the algebraic approach, the median can also be estimated
through a graphic presentation of the data.
(6) Possible even when data is incomplete: The median can be estimated even for certain
incomplete series. It is enough to know the number of items and the middle item of
the series.
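The effect of an extreme value on the mean, compared with the median, can be illustrated with a short Python sketch. The income figures are hypothetical and chosen only for illustration; the functions come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical incomes (illustrative values, not taken from the text).
incomes = [20, 22, 25, 27, 30]
# The same series with one extreme observation added.
incomes_with_outlier = incomes + [500]

mean_before = statistics.mean(incomes)                    # 24.8
median_before = statistics.median(incomes)                # 25
mean_after = statistics.mean(incomes_with_outlier)        # 104
median_after = statistics.median(incomes_with_outlier)    # 26.0

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

One extreme value moves the mean from 24.8 to 104, while the median only moves from 25 to 26.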
Demerits of median:
(2) Unrealistic: When the median is located somewhere between the two middle values, it
is only an approximate measure, not a precise value.
(3) Lack of algebraic treatment: The arithmetic mean is capable of further algebraic treatment, but
the median is not. For example, multiplying the median by the number of items in the series will
not give the sum total of the values of the series.
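Both demerits can be checked numerically. The following is a minimal sketch with hypothetical series: it shows that mean × n recovers the total while median × n does not, and that the median of an even-sized series may be a value that does not occur in the data.

```python
import statistics

# Hypothetical series used only for illustration.
values = [2, 4, 6, 8, 100]
n = len(values)
total = sum(values)                               # 120

mean_times_n = statistics.mean(values) * n        # 24 * 5 = 120
median_times_n = statistics.median(values) * n    # 6 * 5 = 30

# With an even number of items the median is the average of the
# two middle values, so it need not be one of the observations.
even_series = [3, 5, 9, 11]
median_even = statistics.median(even_series)      # (5 + 9) / 2 = 7.0

print(total, mean_times_n, median_times_n, median_even)
```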
• Mode: The mode is the value that occurs most frequently in a data set. It always lies
between the lowest and highest values.
Advantages of Mode
(1) Simple and popular: The mode is a very simple measure of central tendency. Sometimes,
just a glance at the series is enough to locate the modal value.
(2) Less effect of marginal values: Compared to the mean, the mode is less affected by
marginal values in the series. The mode is determined only by the value with the highest
frequency.
(3) Graphic presentation: The mode can be located graphically, with the help of a histogram.
(4) Best representative: The mode is the value which occurs most frequently in the series.
Accordingly, the mode is the best representative value of the series.
(5) No need of knowing all the items or frequencies: The calculation of the mode does not
require knowledge of all the items and frequencies of a distribution. In a simple series, it
is enough to know the items with the highest frequencies in the distribution.
Demerits of mode:
(1) Uncertain and vague: - Mode is an uncertain and vague measure of the central
tendency.
(2) Not capable of algebraic treatment: - Unlike mean, mode is not capable of further
algebraic treatment.
(3) Difficult: When the frequencies of all items are identical, it is difficult to identify the
modal value.
(4) Complex procedure of grouping: Calculation of the mode involves a cumbersome
procedure of grouping the data. If the extent of grouping changes, there will be a change
in the modal value.
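A short Python sketch of locating the mode, using hypothetical marks. It also shows the difficulty noted in demerit (3): when every value has the same frequency, no single modal value stands out. `statistics.multimode` requires Python 3.8 or later.

```python
from collections import Counter
import statistics

# Hypothetical marks; 7 occurs three times.
marks = [4, 7, 7, 9, 7, 4, 10]
frequencies = Counter(marks)            # value -> how often it occurs
mode_value = statistics.mode(marks)     # 7

# All frequencies identical: every value ties for "most frequent".
flat = [1, 2, 3, 4]
modes_flat = statistics.multimode(flat)  # [1, 2, 3, 4]

print(mode_value, dict(frequencies), modes_flat)
```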
• Range: R = H - L, where H is the highest value and L is the lowest value in the data set.
• Standard deviation: The standard deviation is the average amount of variability in your
dataset. It tells you, on average, how far each value lies from the mean. A high standard
deviation means that values are generally far from the mean, while a low standard deviation
indicates that values are clustered close to the mean.
• Variance: Variance reflects the degree of spread in the data set. The more spread out the
data, the larger the variance is in relation to the mean. Variance is the square of the
standard deviation. This means that variance is expressed in squared units (for example,
cm² if the data are in cm), so it is not on the same scale as a typical value of the data set.
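The measures of dispersion described above can be computed as follows. This is a minimal sketch with a hypothetical data set, and it assumes the population formulas (dividing by n). For a sample, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.

```python
import statistics

# Hypothetical data set; its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)       # highest minus lowest = 7
variance = statistics.pvariance(data)    # population variance = 4
std_dev = statistics.pstdev(data)        # population standard deviation = 2.0

print(data_range, variance, std_dev)
```

Squaring the standard deviation gives back the variance (2² = 4).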
Descriptive Statistics - Measure of Shape
SKEWNESS
Skewness is a statistical number that tells us if a distribution is symmetric or not.
A distribution is symmetric if the right side of the distribution is similar to the left side of
the distribution.
If a distribution is symmetric (for example, a normal distribution), then mean = median =
mode and the skewness value is 0.
If skewness is greater than 0, the distribution is called right-skewed: the right tail is longer
than the left tail.
If skewness is less than 0, the distribution is called left-skewed: the left tail is longer than
the right tail.
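The sign rule above can be checked with a small Python sketch. It assumes one common definition of skewness, the moment coefficient m3 / m2^1.5 (other definitions exist and give slightly different numbers); the function name and the data sets are hypothetical.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed definition)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]            # mirror image around 3
right_skewed = [1, 2, 2, 3, 10]        # long tail towards high values
left_skewed = [-10, -3, -2, -2, -1]    # long tail towards low values

for name, series in [("symmetric", symmetric),
                     ("right-skewed", right_skewed),
                     ("left-skewed", left_skewed)]:
    print(name, round(skewness(series), 3))
```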
Inferential analysis
INFERENTIAL STATISTICS
1. Confidence Interval
• A confidence interval uses the variability around a statistic to come up with an interval
estimate for a parameter.
• The confidence level is the percentage of such intervals that would contain the true
parameter if the study were repeated many times.
• A 95% confidence interval means that if you repeat your study with a new sample in
exactly the same way 100 times, you can expect about 95 of the resulting intervals to
contain the true value of the parameter.
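The "repeat the study" interpretation can be illustrated by simulation. The sketch below assumes a hypothetical normal population with a known standard deviation and uses the interval mean ± z·σ/√n; the population values, sample size and number of repeats are all assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN = 50.0   # hypothetical population mean (the parameter)
SIGMA = 10.0       # hypothetical, assumed-known population standard deviation
N = 40             # size of each sample
REPEATS = 1000     # number of times the "study" is repeated

# Critical value for a 95% interval (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

def confidence_interval(sample, sigma, z_value):
    """Interval estimate for the mean: sample mean +/- z * sigma / sqrt(n)."""
    centre = statistics.fmean(sample)
    margin = z_value * sigma / math.sqrt(len(sample))
    return centre - margin, centre + margin

hits = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    low, high = confidence_interval(sample, SIGMA, z)
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / REPEATS
print(f"{coverage:.1%} of the intervals contain the true mean")
```

The printed share should be close to 95%, although any single run will differ slightly.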
2. Regression TEST
3. Hypothesis testing
Ex:
H0: There is no significant relationship between eating sugar and weight gain.
Ha: There is a significant relationship between eating sugar and weight gain.
PARAMETRIC TEST V/S NON-PARAMETRIC TEST
A z test is used to check whether the means of two populations are different,
provided the data follow a normal distribution.
It checks whether the means of two large samples are different when the population
variance is known.
The null hypothesis and the alternative hypothesis must be set.
The sample size should be greater than 30.
The z test formula is used to set up the required hypothesis tests for a one-sample and a
two-sample z test.
FORMULA –
x1, x2: sample means
σ1, σ2: population standard deviations
n1, n2: sample sizes
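With the symbols listed above, the standard textbook two-sample z statistic is z = (x1 - x2) / sqrt(σ1²/n1 + σ2²/n2). The Python sketch below assumes this form; the function name and the input numbers are hypothetical.

```python
import math
import statistics

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Two-sample z statistic, assuming known population standard deviations."""
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Hypothetical inputs: both samples are larger than 30.
z = two_sample_z(mean1=52.0, mean2=50.0,
                 sigma1=6.0, sigma2=8.0,
                 n1=36, n2=64)

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, p = {p_value:.3f}")
```

Here the standard error is √(36/36 + 64/64) = √2, so z is about 1.414. The two-sided p-value is about 0.16, so the null hypothesis of equal means would not be rejected at the 5% level.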
A t-test (also known as Student's t-test) is a tool for evaluating the means of one or two
populations using hypothesis testing.
When choosing a t test, you will need to consider two things: whether the groups being
compared come from a single population or two different populations, and whether you
want to test the difference in a specific direction.
t-Test assumptions
• The data are continuous.
• The sample data have been randomly sampled from a population.
• There is homogeneity of variance (i.e., the variability of the data in each group is
similar).
• The distribution is approximately normal.
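Under these assumptions, a two-sample t statistic can be sketched as follows. The sketch assumes the pooled-variance (equal variance) form, which matches the homogeneity-of-variance assumption; the function name and data are hypothetical. It returns only the statistic and the degrees of freedom, since the standard library has no t distribution for computing a p-value.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Two-sample t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error
    return t, df

# Hypothetical groups with equal spread and means of 16 and 14.
group_a = [12, 14, 16, 18, 20]
group_b = [10, 12, 14, 16, 18]

t, df = pooled_t(group_a, group_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting t value is then compared with a critical value from a t table at the chosen significance level.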
• An F test is a test that uses the F statistic to check whether the variances
of two samples (or populations) are equal.
• The populations should have a normal distribution.
• The F test is also commonly employed to assess the fit of a proposed regression model to the
data, evaluating how well the model explains the variability in the data.
• The F test formula can be used to find the F statistic. The F test formula is given as
follows:
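A minimal Python sketch, assuming the common variance-ratio form F = s1² / s2² with the larger sample variance placed in the numerator. The function name and the samples are hypothetical.

```python
import statistics

def f_statistic(sample_a, sample_b):
    """Ratio of sample variances (larger on top) with its degrees of freedom."""
    var_a = statistics.variance(sample_a)   # sample variance (divides by n - 1)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical samples: variances are 10 and 2.5.
sample_a = [10, 12, 14, 16, 18]
sample_b = [11, 12, 13, 14, 15]

f_value, df_num, df_den = f_statistic(sample_a, sample_b)
print(f"F = {f_value} with ({df_num}, {df_den}) degrees of freedom")
```

An F value close to 1 is consistent with equal variances; the computed value is compared with a critical value from an F table for the given degrees of freedom.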