Basics of Statistics

- Statistics is a way of collecting, analyzing, and interpreting quantitative data. It involves summarizing data using descriptive statistics such as measures of central tendency (mean, median, mode) and dispersion (range, standard deviation).
- Measures of central tendency provide a single value to represent the center of a dataset, while measures of dispersion describe how spread out the data is from the central value. These statistical concepts help analyze and understand characteristics of data.

Uploaded by

Anshu Mani


DATA SCIENCE INTERVIEW PREPARATION SERIES
BASICS OF STATISTICS

Characters: Aman, Namita, Harsh, Ashneer, Nandu

Harsh, the CEO of HungerKids (a quick-bites startup producing items like chips, sandwiches, cakes, and desi beverages), recently got an opportunity to pitch his business on Shark Tank.

After the pitch was over, Harsh got back home. His 12-year-old kid, Nandu, who had watched the complete show, was curious to understand how he could answer all the questions so easily.

Nandu: How did you calculate the numbers?

Harsh replied that he had used basic statistics. Nandu wanted to know what statistics is, so Harsh took him through the pitch again. Let's look at his pitch and the questions the sharks asked.

Harsh: Hello sharks, I am Harsh. Today I will present 5 products that I am manufacturing in my company, HungerKid Pvt. Ltd., established 5 years ago.
Shark: Welcome, Harsh. What have your yearly sales been till now?
Harsh: On average, my sales have been 35 lakhs per year.

Nandu: But what is an average?
Harsh: See, my yearly sales have been 10, 20, 25, 45 and 75 lakhs.
Nandu: Then why did you say 35 lakhs per year?

Harsh: This is to present my company's sales as one value or number! Let's see how to calculate it:
No. of years since the company was established = 5
Total sales = 10 + 20 + 25 + 45 + 75 = 175 lakhs
Average = 175 / 5 = 35 lakhs per year
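
The same calculation can be sketched in a few lines of Python. This is only an illustration: the variable names are made up for this example, and the figures are the yearly sales (in lakhs) quoted in the pitch.

```python
from statistics import mean

# Yearly sales in lakhs, as quoted in the pitch.
yearly_sales = [10, 20, 25, 45, 75]

# Step 1: add up all the values.
total_sales = sum(yearly_sales)

# Step 2: divide by the number of values (years).
number_of_years = len(yearly_sales)
average_sales = total_sales / number_of_years

print(total_sales)      # 175
print(average_sales)    # 35.0

# The standard library's mean() performs the same two steps.
print(mean(yearly_sales))
```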

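Harsh: This number is called the average, or mean.

[Figure: yearly sales plotted by year (x-axis: Year, y-axis: Sales), with the mean marked]

Harsh: The mean is a single central value that stands in for all the other points in the graph. So the mean, or average, is the central value that describes the data.
Nandu: Oh, got it!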

Let's revisit the pitch again.

Shark: What is your highest-selling product?
Nandu: How did you answer this?
Harsh: Very simple. Look at the chart below showing the total sales per product.

[Figure: bar chart of total sales per product, Product 1 to Product 7]

Harsh: See, category 6 (Product 6) has the highest sales in terms of quantity. In statistics, we refer to this most frequent category as the mode.

Let's revisit the pitch again.

Shark: You told me your average sale is 35 lakhs, but what is your middle value?
Nandu: You said the mean is the central point, so why did he ask for the middle value? Are they different?
Harsh: Yes.

The mean is a central value that takes every point in the data into account. Let me explain.
Yearly sales are 10, 20, 25, 45 and 75 lakhs.
If any one year's sales increase, the mean increases too.
For example, if my sales had been 5, 15, 10, 20 and 150, my mean would be 40, i.e. (5 + 15 + 10 + 20 + 150) / 5, which is higher than four of the five values because of the single extreme value of 150.
So the mean shifts when the data contains extremely high or extremely low values. Because it is pulled by every data point, it does not necessarily show the exact middle value.
For this reason, we use the "median" in statistics to determine the middle value: sort the values and take the one in the middle. Here,
▪︎ My yearly sales in sorted order are 10, 20, 25, 45, 75, so my median is 25 lakhs.
▪︎ "Outliers" are values that are either extremely small or extremely large compared with the rest of the data. When such values are present, the mean shifts towards them (to the left for very small values, to the right for very large ones).
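
To see how an outlier pulls the mean but not the median, here is a small Python sketch using the two sets of sales figures from this explanation. The variable names are chosen only for this illustration.

```python
from statistics import mean, median

yearly_sales = [10, 20, 25, 45, 75]         # figures from the pitch
sales_with_outlier = [5, 15, 10, 20, 150]   # the example with one extreme year

for label, data in [("pitch", yearly_sales), ("outlier", sales_with_outlier)]:
    # median() sorts the data internally and picks the middle value.
    print(label, "sorted:", sorted(data))
    print(label, "mean:", mean(data), "median:", median(data))
```

In the second series the mean comes out at 40 while the median is 15, so the median stays close to the typical years and the mean is dragged up by the single value of 150.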
Let's summarize.
Statistics is a way of collecting, analyzing, and interpreting data. In simpler terms, it's a way to make sense of numbers and use them to solve problems.
So far we have summarized and described our data. This is "Descriptive Statistics", a part of statistics.
Descriptive statistics is the branch of statistics that involves summarizing and describing a set of data. It provides a way to analyze and understand the characteristics of a dataset, such as its central tendency, variability, and shape.
As you saw, we summarized and described the data with just one value, right?
When we describe the data with a single value around which the entire data revolves, that value is known as the "central value" or the "measure of central tendency".
These measures are:

Mean
Median
Mode
Measures of central tendency:

Measures of central tendency are statistical measures that describe the typical or central value of a dataset. They provide a way to summarize a large amount of data with a single value that represents the "center" of the dataset.
Mean: The mean is the arithmetic average of a dataset. It is calculated by adding up all of the values in the dataset and then dividing by the number of values. The mean is sensitive to outliers (extreme values): a few unusually high or low values can skew it.
Median: The median is the middle value of a dataset when the values are sorted in numerical order; with an even number of values, it is the average of the two middle values. It is largely unaffected by outliers and is often used when the dataset is not normally distributed or contains extreme values.
Mode: The mode is the most frequently occurring value in a dataset. It is well suited to categorical or discrete data, where values are not continuous and cannot be meaningfully averaged.
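
The mode is the one measure here that also works on categories. The sketch below uses a hypothetical list of orders (the product names and counts are invented for illustration and do not come from the pitch) to show how the most frequent category can be found.

```python
from collections import Counter
from statistics import mode

# Hypothetical data: one entry per unit sold (not from the pitch).
orders = ["chips", "cake", "chips", "sandwich", "chips", "cake"]

# mode() returns the most frequently occurring value, even for text labels.
most_common_product = mode(orders)

# Counter shows the full tally behind that answer.
units_per_product = Counter(orders)

print(most_common_product)   # chips
print(units_per_product.most_common())
```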

Harsh: You asked about the central value, but what about the values which are not lying in the center? And those that are extreme?
Nandu: Oh, I didn't think about it.
Harsh: Did you see the scattered values? These values also imply something about the data and give us meaningful insights.

To study them, we have one more branch of descriptive statistics: "Measures of Dispersion".
Now, let's look at how the values are dispersed around the mean.

Measures of dispersion are statistical measures that describe the spread or variability of a dataset. They provide information on how spread out the data points are from the central tendency of the dataset.

[Figure: sales by year (x-axis: Year, y-axis: Sales), with the mean marked]

Let's revisit this. Here are the sales of each product:

Product   Period 1   Period 2   Period 3   Period 4   Period 5
A         100        150        200        250        300
B         200        250        300        350        400
C         300        350        400        450        500
D         400        450        500        550        600
E         500        550        600        650        700

Shark: Okay, tell me about the MRP of the 5 products you have.
Harsh: Product A: 30, Product B: 130, Product C: 200, Product D: 300, Product E: 230.
Harsh: Do you know why they asked?
Nandu: No.
Harsh: To get an idea about the MRP range of my products.
Nandu: Range?
Harsh: The range is the difference between the largest and smallest values in a dataset.
The range is an important measure of dispersion because it provides a quick and simple way to understand the spread of a dataset.
A larger range indicates that the data is spread out, while a smaller range indicates that the data is more tightly clustered around a central value. This information is useful for comparing different datasets or drawing conclusions about the variability of a particular variable.
So the range of my products' MRPs is 300 - 30 = 270.
Nandu: Ohhkay.
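
As a quick check, the range of the MRPs can be computed like this. It is a minimal sketch; the dictionary name and layout are assumptions made for the example, while the prices are the ones Harsh quoted.

```python
# MRPs quoted in the pitch, keyed by product.
mrp = {"A": 30, "B": 130, "C": 200, "D": 300, "E": 230}

highest = max(mrp.values())
lowest = min(mrp.values())

# Range = largest value minus smallest value.
mrp_range = highest - lowest

print(highest, lowest, mrp_range)   # 300 30 270
```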
Shark: What's the variation of sales for the various products?
Nandu: Oh, now what's that?
Harsh: This is another measure of dispersion.

Besides the range, there are a few more measures of dispersion: variance, standard deviation and coefficient of variation.

Let's first look at variance.

As the name says, variance is a measure of variability.

It is an important measure of dispersion in statistics because it tells us how far the data points are spread out from the mean of the dataset. Specifically, it is the average of the squared differences between each data point and the mean of the dataset.
In short, it measures how far my product sales deviate from the mean.
So, let's look at the formula first:

V = Σ(x − x̄)² / N

where x is each data point, x̄ is the mean of the data and N is the number of data points.

Mean of Product A = (100 + 150 + 200 + 250 + 300) / 5 = 1000 / 5 = 200

Variance of Product A = ((100 - 200)² + (150 - 200)² + (200 - 200)² + (250 - 200)² + (300 - 200)²) / 5
= (10000 + 2500 + 0 + 2500 + 10000) / 5
= 25000 / 5
= 5000

For Product B, this worked example uses the sales figures 150, 250, 350, 450 and 550, which differ from the Product B row in the table above.

Mean of Product B = (150 + 250 + 350 + 450 + 550) / 5 = 1750 / 5 = 350

Variance of Product B = ((150 - 350)² + (250 - 350)² + (350 - 350)² + (450 - 350)² + (550 - 350)²) / 5
= (40000 + 10000 + 0 + 10000 + 40000) / 5
= 100000 / 5
= 20000

We can now say that the variance of Product A < the variance of Product B.

So, the sales of product B vary more than the sales of product A. In some periods the sales of product B are very high, while in others they are very low. Product B's sales fluctuate far above and below their mean value, whereas product A's sales stay closer to their mean.
Hence, product A is more stable in the market than product B.
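
The variance formula can be sketched in Python as follows. The function name is made up for this illustration, the Product A figures come from the sales table, and the Product B figures are the ones used in the calculation here (150 to 550). It divides by N, matching the formula in this section.

```python
from statistics import pvariance

def population_variance(values):
    """Average of the squared differences from the mean (divides by N)."""
    n = len(values)
    mean_value = sum(values) / n
    return sum((x - mean_value) ** 2 for x in values) / n

product_a = [100, 150, 200, 250, 300]   # Product A row of the sales table
product_b = [150, 250, 350, 450, 550]   # figures used in the Product B calculation

print(population_variance(product_a))   # 5000.0
print(population_variance(product_b))   # 20000.0

# Cross-check against the standard library's population variance.
print(pvariance(product_a), pvariance(product_b))
```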
Nandu: While calculating the variance, we squared the difference between each value and the mean. Why did we square each difference?
Harsh: Good question!

The difference between a particular data point and the mean can be either positive or negative. If we simply summed these differences, the positive ones would cancel out the negative ones; in fact, the differences from the mean always add up to zero, so the sum would tell us nothing about the spread. To avoid this, we square each difference, which makes every term non-negative.
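
The cancelling effect is easy to see with Product A's sales. The short sketch below (variable names are illustrative) prints the raw differences from the mean, their sum, and the sum of their squares.

```python
product_a = [100, 150, 200, 250, 300]   # Product A sales from the table
mean_a = sum(product_a) / len(product_a)

# Raw differences from the mean: some negative, some positive.
deviations = [x - mean_a for x in product_a]

# Squaring makes every term non-negative, so nothing cancels.
squared_deviations = [d ** 2 for d in deviations]

print(deviations)               # [-100.0, -50.0, 0.0, 50.0, 100.0]
print(sum(deviations))          # 0.0
print(sum(squared_deviations))  # 25000.0
```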

Nandu: Then what is the standard deviation?

Let's look at the definition:

Standard deviation is a statistical measure that describes the amount of variation or dispersion of a dataset around its mean.
It is calculated by taking the square root of the variance, which is the average of the squared differences between each data point and the mean of the dataset.

Why do we take the square root of the variance?
So that we get a value in the same units as the original data.
Variance is difficult to interpret because it is measured in squared units. The variance of product A is 25000 / 5 = 5000, and it is hard to say what this value represents. But if we take the square root of the variance, we get about 70.71, which means that the sales of product A typically lie about 71 units above or below the average sales value of product A.
SD of Product A = sqrt(variance) = sqrt(5000) ≈ 70.71
Let's conclude: a smaller deviation means the data is clustered closer to the mean, and a larger deviation means the data lies further from the mean and is spread more widely.
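
Here is a minimal Python sketch of the step from variance to standard deviation for Product A, assuming the population formula (dividing by N) used for the variance in this document. The variable names are illustrative.

```python
import math
from statistics import pstdev

product_a = [100, 150, 200, 250, 300]   # Product A sales from the table

mean_a = sum(product_a) / len(product_a)
variance_a = sum((x - mean_a) ** 2 for x in product_a) / len(product_a)

# Standard deviation is the square root of the variance,
# which brings the value back to the same units as the sales.
std_dev_a = math.sqrt(variance_a)

print(variance_a)            # 5000.0
print(round(std_dev_a, 2))   # 70.71

# pstdev() from the standard library computes the same quantity directly.
print(round(pstdev(product_a), 2))
```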

Shark: Which product has more stable sales?
Nandu: How did you answer this?
Harsh: Very simple! Using one of the most crucial concepts: the coefficient of variation.

Let's understand with an example:

X = [1, 2, 3]          mean of X = 2      Sx = 1
Y = [101, 102, 103]    mean of Y = 102    Sy = 1
Here Sx and Sy are sample standard deviations (dividing by n - 1). Series X and Y have the same standard deviation, so how can we compare them?
Which series has more variation relative to its size, and which has less?
Simply, we will use the CV.
The coefficient of variation (CV) is a statistical measure defined as the ratio of the standard deviation to the mean of the data.
CV(X) = Sx / mean of X = 1 / 2 = 0.5
CV(Y) = Sy / mean of Y = 1 / 102 ≈ 0.0098
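
The X and Y comparison can be reproduced with a short Python sketch. The helper name coefficient_of_variation is made up for this example, and it assumes the sample standard deviation (dividing by n - 1), which is what gives a standard deviation of exactly 1 for both series.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Ratio of the sample standard deviation to the mean."""
    return stdev(values) / mean(values)

series_x = [1, 2, 3]
series_y = [101, 102, 103]

# Both series have the same standard deviation...
print(stdev(series_x), stdev(series_y))               # 1.0 1.0

# ...but very different variation relative to their means.
print(coefficient_of_variation(series_x))             # 0.5
print(round(coefficient_of_variation(series_y), 4))   # 0.0098
```

Dividing by the mean is what makes the two series comparable: a spread of 1 is large next to a mean of 2 and tiny next to a mean of 102.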
How to interpret CV?
Again, very simple!

If the CV is lower, the series is more stable, with fewer fluctuations.
If the CV is higher, the series has more variation and fluctuation.

Let's take another example. Consider two products, A and B.

Product A | standard deviation of sales = 5 | average sales = 10
Product B | standard deviation of sales = 10 | average sales = 10
The coefficient of variation for Product A is 0.5 (5/10), while the coefficient of variation for Product B is 1.0 (10/10).
In this case, the higher coefficient of variation for Product B indicates a higher degree of variability, and therefore risk, relative to its average sales than Product A.
So, we can say that higher variation in sales means higher risk in investing.
That's why product A is the better choice here.
[Diagram: Statistics → Descriptive Statistics → Measures of Central Tendency and Measures of Dispersion]

This is all about descriptive statistics, where we describe and summarize the data using a few key values.

Content Designer : Hanit Kaur
Graphic Designer : Adithya Prasad
Content Lead : Sumit Shukla
With Love
