Unit 4 Class Notes
Measures of central tendency in statistics are numerical values used to represent the middle or central value of a large
collection of numerical data. These values are called central or average values. A central or average value of a
statistical data set or series is the value of the variable that is representative of the entire data
or its associated frequency distribution. Such a value is of great significance because it summarizes the nature or
characteristics of the entire data, which are otherwise very difficult to observe.
Arithmetic mean ($\bar{x}$) is defined as the sum of the individual observations ($x_i$) divided by the total number of
observations $N$, that is, $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$.
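The definition of the arithmetic mean can be sketched in a few lines of Python. The function name `arithmetic_mean` and the list `scores` are made up for illustration and do not come from the notes.

```python
def arithmetic_mean(observations):
    """Return the sum of the observations divided by their count N."""
    n = len(observations)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(observations) / n


scores = [2, 4, 6, 8, 10]  # made-up illustrative data
mean_score = arithmetic_mean(scores)
print(mean_score)  # 6.0
```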
Measures of Variability
a. Measures of central tendency (e.g., mean, median, mode) provide useful but
limited information: on their own they say nothing about the dispersion (i.e.,
variability) of the scores in a distribution.
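A small sketch of why the mean alone is not enough: the two made-up data sets below share the same mean but differ greatly in spread. The notes do not name specific measures of variability, so the range and the (population) standard deviation are used here as two common choices.

```python
import statistics

# Two made-up data sets with the same mean but very different spread.
narrow = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, data in (("narrow", narrow), ("wide", wide)):
    mean = statistics.mean(data)
    value_range = max(data) - min(data)   # largest minus smallest value
    std_dev = statistics.pstdev(data)     # population standard deviation
    print(f"{name}: mean={mean}, range={value_range}, std dev={std_dev:.2f}")
```

Both sets report a mean of 50, yet the second has a far larger range and standard deviation, which is exactly the information a measure of central tendency hides.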
● Positive Skewness: The frequencies are concentrated towards the lower values of the variable, i.e. the right tail is
longer than the left tail.
● Negative Skewness: The frequencies are concentrated towards the higher values of the variable, i.e. the left tail is
longer than the right tail.
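The sign of the skewness can be checked numerically. The notes do not give a formula, so this sketch assumes the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5). The function name and the data are made up for illustration.

```python
def moment_skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]    # long right tail
left_tailed = [-x for x in right_tailed]    # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

print(moment_skewness(right_tailed) > 0)  # positive skewness
print(moment_skewness(left_tailed) < 0)   # negative skewness
print(moment_skewness(symmetric))         # zero for a symmetric data set
```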
What is Kurtosis?
Kurtosis is also a characteristic of a frequency distribution and gives an idea about its shape. The
measure of kurtosis is the extent to which a frequency distribution is peaked in comparison with a normal curve; that is, it is
the degree of peakedness of a distribution.
1. Leptokurtic: A leptokurtic curve has a higher peak than the normal curve. In this curve, items are heavily
concentrated near the central value.
2. Mesokurtic: A mesokurtic curve has a peak similar to that of the normal curve. In this curve, items are distributed
around the central value as in a normal distribution.
3. Platykurtic: A platykurtic curve has a lower peak than the normal curve. In this curve, there is less
concentration of items around the central value.
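The three categories can be illustrated with a rough sketch. It assumes the common moment-based measure, excess kurtosis (fourth central moment divided by the squared second central moment, minus 3, so that a normal curve scores 0). The `tolerance` cut-off, the function names and the data are arbitrary choices made for this example, not part of the notes.

```python
def excess_kurtosis(data):
    """Excess kurtosis: m4 / m2**2 - 3 (population moments)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3.0


def classify_kurtosis(data, tolerance=0.5):
    """Label a data set; `tolerance` is an arbitrary illustrative cut-off."""
    k = excess_kurtosis(data)
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"


peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 10]  # most items at the centre
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]       # items spread evenly

print(classify_kurtosis(peaked))  # leptokurtic
print(classify_kurtosis(flat))    # platykurtic
```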
| Sr. No. | Skewness | Kurtosis |
| --- | --- | --- |
| 1. | It indicates the shape and size of variation on either side of the central value. | It indicates the frequencies of the distribution at the central value. |
| 2. | The measures of skewness tell us about the magnitude and direction of the asymmetry of a distribution. | It indicates the concentration of items at the central part of a distribution. |
| 3. | It indicates how far the distribution differs from the normal distribution. | It studies the divergence of the given distribution from the normal distribution. |
| 4. | The measure of skewness studies the extent to which deviations cluster above or below the average. | It indicates the concentration of items. |
| 5. | In an asymmetrical distribution, the deviations below and above the average are not equal. | No such distribution takes place. |
Hypothesis Testing
Hypothesis testing compares two opposite statements about a population and uses
sample data to decide which one is better supported. To test an assumption, we
first take a sample from the population, analyze it, and use the results of the analysis
to decide whether the claim is valid or not.
Suppose a company claims that its website gets an average of 50 user visits per day. To
verify this, we use hypothesis testing to analyze past website traffic data and determine whether
the claim is accurate. This helps us decide whether the observed data supports the
company's claim or whether there is a significant difference.
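One way to write down the two competing statements for the website claim is shown below. The symbol $\mu$ (the true mean number of daily visits) is introduced here for illustration, and the two-sided alternative is an assumption, since the notes do not say in which direction a difference is expected.

```latex
H_0:\ \mu = 50 \qquad \text{versus} \qquad H_1:\ \mu \neq 50
```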
Key Terms of Hypothesis Testing
● Level of significance: It is the probability threshold at which we decide to reject the null
hypothesis. 100% certainty is not possible when testing a hypothesis, so we select a level of
significance. It is normally denoted by α and is generally set at 0.05 (5%), which means we accept a 5%
risk of rejecting a null hypothesis that is actually true, corresponding to a 95% confidence level.
● P-value: The p-value is the probability of seeing a result at least as extreme as yours if the null
hypothesis is true. If the p-value is less than the chosen significance level, you reject the
null hypothesis; otherwise you fail to reject it.
● Test Statistic: The test statistic is the number that helps you decide whether your result is
significant. It is calculated from the sample data you collect. For example, it could be used to test whether a machine
learning model performs better than a random guess.
● Critical value: The critical value is a boundary or threshold that helps you decide whether your test
statistic is extreme enough to reject the null hypothesis.
● Degrees of freedom: Degrees of freedom are important when we conduct statistical tests. They
help you understand how much the data can vary.
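The terms above can be tied together in a short sketch. It assumes a right-tailed z-test, so the standard normal distribution applies, and uses a hypothetical test statistic of 2.1. Neither the choice of test nor the number comes from the notes.

```python
from statistics import NormalDist

alpha = 0.05        # level of significance
z_statistic = 2.1   # hypothetical test statistic, not from the notes

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Critical value: the cut-off that leaves probability alpha in the right tail.
critical_value = standard_normal.inv_cdf(1 - alpha)

# P-value: probability of a statistic at least this large if H0 is true.
p_value = 1 - standard_normal.cdf(z_statistic)

reject_by_critical_value = z_statistic > critical_value
reject_by_p_value = p_value < alpha

print(f"critical value = {critical_value:.3f}")
print(f"p-value = {p_value:.4f}")
print(reject_by_critical_value, reject_by_p_value)
```

For the same test, the two decision rules always agree: the statistic exceeds the critical value exactly when the p-value falls below α.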
Types of Hypothesis Testing
It involves basically two types of testing:
1. One-Tailed Test
A one-tailed test is used when we expect a change in only one direction, either an increase or a decrease, but not
both. For example, if we are analyzing data to see whether a new algorithm improves accuracy, we would only focus on whether the
accuracy goes up, not down.
The test looks at just one side of the distribution to decide whether the result is extreme enough to reject the null hypothesis. If the
test statistic falls in the critical region on that side, we reject the null hypothesis.
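A minimal sketch of the one-sided critical region, assuming a z-test so that the standard normal distribution can be used. The function names and the test statistics passed in are hypothetical.

```python
from statistics import NormalDist


def right_tailed_test(z_statistic, alpha=0.05):
    """Reject H0 only if the statistic lies in the upper critical region."""
    critical_value = NormalDist().inv_cdf(1 - alpha)
    return z_statistic > critical_value


def left_tailed_test(z_statistic, alpha=0.05):
    """Reject H0 only if the statistic lies in the lower critical region."""
    critical_value = NormalDist().inv_cdf(alpha)
    return z_statistic < critical_value


# A large increase is significant for the right-tailed test ...
print(right_tailed_test(2.0))   # True
# ... but an equally large decrease is ignored by it.
print(right_tailed_test(-2.0))  # False
print(left_tailed_test(-2.0))   # True
```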
How does Hypothesis Testing work?
Hypothesis testing works through the following steps:
Step 1 – Define Null and Alternative Hypothesis
We start by defining the null hypothesis (H₀), which represents the assumption that there is no difference. The
alternative hypothesis (H₁) suggests that there is a difference. These hypotheses should be contradictory to one
another. Imagine we want to test whether a new recommendation algorithm increases user engagement.
● Null Hypothesis (H₀): The new algorithm has no effect on user engagement.
● Alternative Hypothesis (H₁): The new algorithm increases user engagement.
Step 2 – Choose the Significance Level
● Next we choose a significance level (α), commonly set at 0.05. This level defines the threshold for
deciding whether the results are statistically significant. It also tells us the probability of making a Type I
error, that is, rejecting a true null hypothesis.
● The p-value, which is calculated from the sample data in the later steps, is compared against this level to
assess the evidence against the null hypothesis.
Step 3 – Collect and Analyze data.
● Now we gather data. This could come from user observations or an experiment. Once collected, we analyze the
data using appropriate statistical methods to calculate the test statistic.
● Example: We collect data on user engagement before and after implementing the algorithm. We can also find
the mean engagement score for each group.
Step 4 – Calculate the Test Statistic
The test statistic is a measure used to determine whether the sample data support rejecting the null hypothesis. The choice of
test statistic depends on the type of hypothesis test being conducted. It could be a Z-test, Chi-square test, T-test and so on. For
our example we use a t-test because:
● We have a small sample size.
● The population standard deviation is unknown.
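For reference, a t-statistic has the general shape shown below. The notes do not say which variant of the t-test is meant, so the one-sample form is given here as an assumption. The symbols $\mu_0$ (the mean assumed under H₀), $s$ (the sample standard deviation) and $n$ (the sample size) are introduced for illustration.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{degrees of freedom} = n - 1
```

The sample standard deviation $s$ stands in for the unknown population standard deviation, which is why the t-distribution is used instead of the normal distribution.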
Step 5 – Comparing Test Statistic
Now we compare the test statistic to the critical value, or the p-value to the significance level, to decide whether to
reject the null hypothesis or not.
Method A: Using critical values: We refer to a statistical distribution table (the t-distribution
in this case) to find the critical value based on the chosen significance level (α).
● If Test Statistic > Critical Value, we reject the null hypothesis.
● If Test Statistic ≤ Critical Value, we fail to reject the null hypothesis.
Method B: Using the p-value: We compare the p-value with the significance level (α) and reject the
null hypothesis when the p-value is smaller.
Example: If the p-value is 0.03 and α is 0.05, we reject the null hypothesis because the
p-value is smaller than the significance level.
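The two decision rules can be written as small helper functions. This is only a sketch: the function names are made up, and the critical-value rule is written for a right-tailed test, matching the comparison given above.

```python
def decide_by_critical_value(test_statistic, critical_value):
    """Right-tailed rule: reject H0 when the statistic exceeds the cut-off."""
    if test_statistic > critical_value:
        return "reject H0"
    return "fail to reject H0"


def decide_by_p_value(p_value, alpha=0.05):
    """Reject H0 when the p-value is smaller than the significance level."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# The example from the notes: p-value 0.03 against alpha 0.05.
print(decide_by_p_value(0.03, alpha=0.05))  # reject H0
```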
Real life Examples of Hypothesis Testing
Let’s understand hypothesis testing using real life situations. Imagine a pharmaceutical company has developed a new
drug that they believe can effectively lower blood pressure in patients with hypertension. Before bringing the drug to
market they need to conduct a study to see its impact on blood pressure.
Data:
● Before Treatment: 120, 122, 118, 130, 125, 128, 115, 121, 123, 119
● After Treatment: 115, 120, 112, 128, 122, 125, 110, 117, 119, 114
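The data above can be run through the steps described earlier. The notes stop at the data, so the following is only a sketch under stated assumptions: the two lists are treated as paired readings from the same ten patients, H₀ says the mean change is zero, H₁ says blood pressure is lower after treatment (a left-tailed test), and α is 0.05. The critical value 1.833 is the standard t-table entry for a one-tailed test at α = 0.05 with 9 degrees of freedom.

```python
import math
import statistics

before = [120, 122, 118, 130, 125, 128, 115, 121, 123, 119]
after = [115, 120, 112, 128, 122, 125, 110, 117, 119, 114]

# Paired design (assumed): work with the per-patient change.
differences = [a - b for a, b in zip(after, before)]
n = len(differences)

mean_diff = statistics.mean(differences)   # average change
sd_diff = statistics.stdev(differences)    # sample standard deviation
t_statistic = mean_diff / (sd_diff / math.sqrt(n))
degrees_of_freedom = n - 1

# One-tailed critical value from a t-table (alpha = 0.05, df = 9).
T_CRITICAL = 1.833

# Left-tailed test: reject H0 if t falls below the negative cut-off.
reject_null = t_statistic < -T_CRITICAL

print(f"mean change = {mean_diff:.1f}")
print(f"t = {t_statistic:.2f} with {degrees_of_freedom} degrees of freedom")
print("reject H0" if reject_null else "fail to reject H0")
```

Under these assumptions the mean change is -3.9 and the t-statistic is about -9.0, far below -1.833, so the null hypothesis of no change is rejected.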