
Unit 4 Class Notes

The document discusses measures of central tendency in statistics, including mean, median, and mode, which represent the central values of data sets. It also covers measures of variability such as range, variance, and standard deviation, as well as concepts like skewness and kurtosis that describe the shape of data distributions. Additionally, it explains hypothesis testing, including its steps, significance levels, and limitations.

Uploaded by

sashantnipate

Unit 4

Measures of Central Tendency in Statistics

Central tendencies in Statistics are the numerical values used to represent the mid-value or central value of a large
collection of numerical data. These values are called central or average values in Statistics. A
central or average value of any statistical data or series is the value of the variable that is representative of the entire data
or its associated frequency distribution. Such a value is of great significance because it depicts the nature or
characteristics of the entire data, which is otherwise very difficult to observe.

Measures of Central Tendency Meaning


The representative value of a data set, generally the central value or the most frequently occurring value, which gives a
general idea of the whole data set, is called a Measure of Central Tendency.
Measures of Central Tendency
Some of the most commonly used measures of central tendency are:
● Mean
● Median
● Mode
Mean
Mean in general terms refers to the arithmetic mean of the data, but besides the arithmetic mean there are also the
geometric mean and the harmonic mean, which are calculated using different formulas. Here in this article, we discuss the
arithmetic mean.

Mean for Ungrouped Data

Arithmetic mean (x̄) is defined as the sum of the individual observations (xᵢ) divided by the total number of observations N:

x̄ = (x₁ + x₂ + … + x_N) / N = (Σ xᵢ) / N

In other words, the mean is given by the sum of all observations divided by the total number of observations.
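The mean, together with the median and mode listed above, can be computed with Python's built-in statistics module; a minimal sketch, with the data list invented for illustration:

```python
import statistics

# Illustrative ungrouped data: 7 individual observations (x_i)
scores = [4, 8, 6, 5, 3, 8, 9]

mean = statistics.mean(scores)      # sum of observations / N = 43 / 7
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)  # 6.142857142857143 6 8
```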
Measures of Variability
a. Measures of central tendency (e.g., mean, median, mode) provide useful but
limited information. They say nothing about the dispersion (i.e., variability) of
the scores in a distribution.

b. Three measures of variability that researchers typically examine: range,
variance, and standard deviation. Standard deviation is the most informative and
widely used of the three.
Range
a. Definition: The range is the difference between the largest score (maximum value) and the
smallest score (minimum value) of a distribution.
b. It gives researchers a sense of how spread out the scores of a distribution are, but it can be
impractical and misleading at times, since it depends on only the two extreme scores.
c. When it may be used: Researchers may want to know whether all of the response categories
on a survey question have been used and/or to get a sense of the overall balance in the
distribution.
Interquartile Range (IQR)
a. Definition: The difference between the 75th percentile (third quartile) and the 25th
percentile (first quartile) scores in a distribution.
b. When the scores in a distribution are arranged in order, from smallest to largest (or
vice versa), the IQR contains the scores in the two middle quartiles.
Variance
a. Definition: The sum of the squared deviations (between the individual
scores and the mean of a distribution) divided by the number of cases in the
population, or by the number of cases minus one in the sample.
b. Provides a squared statistical average of the amount of dispersion in a
distribution of scores. Variance is rarely looked at by itself because, being
squared, it is not on the same scale as the original measure of the variable.
Why have variance? Why not go straight to standard deviation?
1. We need to calculate the variance before finding the standard deviation,
because we square the deviation scores so they will not sum to zero. These
squared deviations produce the variance; we then take the square root to find
the standard deviation.
c. The fundamental piece of the variance formula, the sum of the squared
deviations, is used in a number of other statistics, most notably analysis of
variance (ANOVA).
Standard Deviation
a. Definition: The average deviation between the individual scores in the
distribution and the mean of the distribution.
1. Note that this is not technically correct, particularly for sample data, where the
sum of squared deviations is divided by n – 1, not N. But the sample estimate of
the standard deviation is an estimate of the average (absolute) deviation
between the mean of a distribution and the scores in the distribution.
b. A useful statistic; it provides a handy measure of how spread out the scores in
the distribution are.
c. Combined, the mean and standard deviation provide a pretty good picture
of what the distribution of the scores is like.
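The variability measures above can be sketched with Python's standard statistics module; the data set here is invented purely for illustration:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative distribution, mean = 5

# Range: maximum score minus minimum score
rng = max(scores) - min(scores)          # 9 - 2 = 7

# IQR: 75th percentile (Q3) minus 25th percentile (Q1)
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Variance: population (divide by N) vs. sample (divide by n - 1)
pop_var = statistics.pvariance(scores)   # sum of squared deviations / N
samp_var = statistics.variance(scores)   # sum of squared deviations / (n - 1)

# Standard deviation: square root of the variance
pop_sd = statistics.pstdev(scores)
samp_sd = statistics.stdev(scores)
```

Note how the sample variance (dividing by n – 1) comes out slightly larger than the population variance, exactly as the definitions above predict.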
What is Skewness?
Skewness is an important statistical measure that describes the asymmetry of a frequency distribution, or
more precisely, the lack of symmetry between the left and right tails of the frequency curve. A distribution or dataset is
symmetric if it looks the same to the left and right of the center point.
1. Symmetric Distribution: A perfectly symmetric distribution is one in which the frequency distribution is the same on both sides of the
center point of the frequency curve. In this case, Mean = Median = Mode, and there is no skewness.
2. Asymmetric (Skewed) Distribution: An asymmetrical or skewed distribution is one in which the spread of the frequencies differs on the
two sides of the center point, or the frequency curve is stretched more towards one side. Here the values of Mean, Median, and Mode fall at
different points.

● Positive Skewness: In this, the concentration of frequencies is more towards higher values of the variable i.e. the right tail is
longer than the left tail.
● Negative Skewness: In this, the concentration of frequencies is more towards the lower values of the variable i.e. the left tail is
longer than the right tail.
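One common way to quantify this asymmetry is the moment-based skewness coefficient, m₃ / m₂^(3/2); a small sketch in plain Python, with both data sets invented for illustration:

```python
def skewness(data):
    """Moment-based skewness: third central moment divided by
    the second central moment raised to the 3/2 power."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Right tail longer than the left -> positive skewness
print(skewness([1, 2, 2, 3, 3, 3, 10]) > 0)  # True

# Symmetric around the center -> skewness of zero
print(skewness([1, 2, 3, 4, 5]) == 0)        # True
```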
What is Kurtosis?
It is also a characteristic of the frequency distribution and gives an idea about the shape of a frequency distribution. Basically, the
measure of kurtosis is the extent to which a frequency distribution is peaked in comparison with the normal curve. It is the degree of
peakedness of a distribution.
1. Leptokurtic: A leptokurtic curve has a higher peak than the normal curve. In this curve, there is a heavy
concentration of items near the central value.
2. Mesokurtic: A mesokurtic curve has a peak similar to that of the normal curve. In this curve, items are distributed
around the central value as in a normal distribution.
3. Platykurtic: A platykurtic curve has a lower, flatter peak than the normal curve. In this curve, there is less
concentration of items around the central value.
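Peakedness can be quantified with the moment-based kurtosis m₄ / m₂², usually reported as excess kurtosis (the value minus 3, so a mesokurtic, normal-shaped curve scores about 0); a sketch in plain Python with invented data:

```python
def excess_kurtosis(data):
    """Moment-based kurtosis (m4 / m2^2) minus 3, so that a
    mesokurtic (normal-shaped) distribution scores about 0."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

# Flat, evenly spread scores -> platykurtic (negative excess kurtosis)
print(excess_kurtosis(list(range(1, 11))) < 0)               # True

# Heavy concentration at the center with long tails -> leptokurtic
print(excess_kurtosis([5, 5, 5, 5, 5, 5, 5, 5, 0, 10]) > 0)  # True
```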
Skewness vs. Kurtosis
1. Skewness indicates the shape and size of variation on either side of the central value, while kurtosis indicates the
concentration of frequencies at the central value of the distribution.
2. The measures of skewness tell us about the magnitude and direction of the asymmetry of a distribution, while kurtosis
indicates the concentration of items at the central part of a distribution.
3. Skewness indicates how far the distribution differs from the normal distribution in symmetry, while kurtosis studies the
divergence of the given distribution from the normal distribution in peakedness.
4. The measure of skewness studies the extent to which deviations cluster above or below the average, while kurtosis
indicates the concentration of items around the center.
5. In an asymmetrical distribution, the deviations below and above the average are not equal; no such comparison arises
for kurtosis.
Hypothesis Testing
Hypothesis testing compares two opposite statements about a population and uses
sample data to decide which one is more likely to be correct. To test an assumption, we
first take a sample from the population, analyze it, and use the results of the analysis
to decide whether the claim is valid.
Suppose a company claims that its website gets an average of 50 user visits per day. To
verify this, we use hypothesis testing to analyze past website traffic data and determine whether
the claim is accurate. This helps us decide whether the observed data supports the
company's claim or whether there is a significant difference.
Key Terms of Hypothesis Testing
● Level of significance: The degree of significance at which we accept or reject the null
hypothesis. Since 100% accuracy is not possible when testing a hypothesis, we select a level of
significance. It is normally denoted by α and is generally 0.05 (5%), which means the result
should be 95% likely to hold in each sample.
● P-value: When analyzing data, the p-value tells you the likelihood of seeing a result at least as
extreme as yours if the null hypothesis is true. If the p-value is less than the chosen significance
level, you reject the null hypothesis; otherwise you fail to reject it.
● Test statistic: The number that helps you decide whether your result is significant. It is
calculated from the sample data you collect; for example, it could be used to test whether a
machine learning model performs better than a random guess.
● Critical value: A boundary or threshold that helps you decide whether your test statistic
is extreme enough to reject the null hypothesis.
● Degrees of freedom: Important when we conduct statistical tests; they describe how many
values in the calculation are free to vary.
Types of Hypothesis Testing
It basically involves two types of testing:
1. One-Tailed Test

A one-tailed test is used when we expect a change in only one direction, either an increase or a decrease, but not
both. Say we are analyzing data to see whether a new algorithm improves accuracy; we would only focus on whether the
accuracy goes up, not down.
The test looks at just one side of the distribution to decide whether the result is extreme enough to reject the null
hypothesis. If the test statistic falls in the critical region on that side, we reject the null hypothesis.
2. Two-Tailed Test

A two-tailed test is used when we care about a change in either direction, an increase or a decrease. The critical region
is split between both tails of the distribution, and we reject the null hypothesis if the test statistic falls in either tail.
How does Hypothesis Testing work?
Hypothesis testing involves the following steps:
Step 1: Define Null and Alternative Hypothesis

We start by defining the null hypothesis (H₀) which represents the assumption that there is no difference. The
alternative hypothesis (H₁) suggests there is a difference. These hypotheses should be contradictory to one
another. Imagine we want to test if a new recommendation algorithm increases user engagement.
● Null Hypothesis (H₀): The new algorithm has no effect on user engagement.
● Alternative Hypothesis (H₁): The new algorithm increases user engagement.

Step 2: Choose Significance Level

● Next we choose a significance level (α) commonly set at 0.05. This level defines the threshold for
deciding if the results are statistically significant. It also tells us the probability of making a Type I
error—rejecting a true null hypothesis.
● In this step we also calculate the p-value which is used to assess the evidence against the null
hypothesis.

Step 3: Collect and Analyze Data

● Now we gather data; this could come from user observations or an experiment. Once collected, we analyze the
data using appropriate statistical methods to calculate the test statistic.
● Example: We collect data on user engagement before and after implementing the algorithm. We can also find
the mean engagement scores for each group.

Step 4: Calculate Test Statistic

The test statistic is a measure used to determine whether the sample data supports rejecting the null hypothesis. The
choice of test statistic depends on the type of hypothesis test being conducted; it could be a Z-test, Chi-square test,
T-test, and so on. For our example we use a t-test because:
● We have a smaller sample size.
● The population standard deviation is unknown.
Step 5: Compare Test Statistic

Now we compare the test statistic to either the critical value or the p-value to decide whether to
reject the null hypothesis.
Method A: Using critical values: We refer to a statistical distribution table (the t-distribution
in this case) to find the critical value based on the chosen significance level (α).
● If Test Statistic > Critical Value, we reject the null hypothesis.
● If Test Statistic ≤ Critical Value, we fail to reject the null hypothesis.
Method B: Using the p-value: We compare the p-value to the chosen significance level (α). If the
p-value is less than α, we reject the null hypothesis.

Example: If the p-value is 0.03 and α is 0.05, we reject the null hypothesis because the
p-value is smaller than the significance level.
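The two decision rules in Step 5 can be written as small helper functions; a minimal sketch, with the function names chosen here just for illustration:

```python
def reject_by_critical_value(test_statistic, critical_value):
    # Method A: reject H0 when the test statistic exceeds the critical value
    return test_statistic > critical_value

def reject_by_p_value(p_value, alpha=0.05):
    # Method B: reject H0 when the p-value is below the significance level
    return p_value < alpha

# The example from the text: p-value 0.03 with alpha = 0.05 -> reject H0
print(reject_by_p_value(0.03, alpha=0.05))  # True
```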
Real life Examples of Hypothesis Testing
Let’s understand hypothesis testing using real life situations. Imagine a pharmaceutical company has developed a new
drug that they believe can effectively lower blood pressure in patients with hypertension. Before bringing the drug to
market they need to conduct a study to see its impact on blood pressure.
Data:
● Before Treatment: 120, 122, 118, 130, 125, 128, 115, 121, 123, 119
● After Treatment: 115, 120, 112, 128, 122, 125, 110, 117, 119, 114

Step 1: Define the Hypothesis


● Null Hypothesis (H₀): The new drug has no effect on blood pressure.
● Alternate Hypothesis (H₁): The new drug has an effect on blood pressure.
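Using the blood-pressure data above, the remaining steps can be sketched by hand in Python. This assumes a paired (dependent) t-test is the appropriate choice, since the before/after measurements come from the same patients, and takes the two-tailed critical value t(0.025, df = 9) ≈ 2.262 from a standard t-table:

```python
import math

before = [120, 122, 118, 130, 125, 128, 115, 121, 123, 119]
after = [115, 120, 112, 128, 122, 125, 110, 117, 119, 114]

# A paired t-test works on the per-patient differences
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean_diff = sum(diffs) / n

# Sample standard deviation of the differences (divide by n - 1)
sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))

# t statistic: mean difference divided by its standard error
t_stat = mean_diff / (sd_diff / math.sqrt(n))

# Two-tailed critical value for alpha = 0.05 with df = n - 1 = 9
critical_value = 2.262

print(round(t_stat, 2))              # 9.0
print(abs(t_stat) > critical_value)  # True -> reject H0
```

The test statistic far exceeds the critical value, so under these assumptions we would reject H₀ and conclude the drug does affect blood pressure.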
Limitations of Hypothesis Testing
Although hypothesis testing is a useful technique, it has some limitations as well:
● Limited Scope: Hypothesis testing focuses on specific questions or assumptions and may not capture the full
complexity of the problem being studied.
● Data Quality Dependence: The accuracy of the results depends on the quality of the data. Poor-quality or
inaccurate data can lead to incorrect conclusions.
● Missed Patterns: By focusing only on testing specific hypotheses, important patterns or relationships in the
data might be missed.
● Context Limitations: It doesn't always consider the bigger picture, which can oversimplify results and lead to
incomplete insights.
● Need for Additional Methods: To get a better understanding of the data, hypothesis testing should be
combined with other analytical methods, such as data visualization or machine learning techniques, which we
will study later.
