Descriptive Statistics

Descriptive statistics summarize data sets through measures of central tendency (mean, median, mode) and measures of variability (range, interquartile range, variance, standard deviation). The mean is the average but can be skewed by outliers, while the median provides the middle value and the mode indicates the most frequent score. Measures of variability describe the dispersion of data points around the center, with the standard deviation indicating the typical distance from the mean.

Uploaded by

abhijaychauhan88

Descriptive Statistics

What are Descriptive Statistics?
Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can
represent either an entire population or a sample of it.
Descriptive statistics are broken down into measures of central tendency and measures of
variability (spread).
Measures of central tendency include the mean, median, and mode, while measures of
variability include the standard deviation, variance, and the minimum and maximum values.
Kurtosis and skewness are often reported alongside them; they describe the shape of the distribution.
What is a Measure of Central Tendency?
A measure of central tendency is a summary statistic that represents the centre point or typical
value of a dataset.
These measures indicate where most values in a distribution fall and are also referred to as the
central location of a distribution.
Mean (Arithmetic)
The mean is the arithmetic average, and it is probably the measure of central tendency that you
are most familiar with. Calculating the mean is very simple. You just add up all of the values and
divide by the number of observations in your dataset.

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
When not to use the mean
The mean has one main disadvantage: it is particularly susceptible to the influence of outliers.
These are values that are unusual compared to the rest of the data set by being especially small
or large in numerical value.
For example, consider the wages of staff at a factory below:

Staff    1    2    3    4    5    6    7    8    9    10
Salary   15k  18k  16k  14k  15k  15k  12k  17k  90k  95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that
this mean value might not be the best way to reflect the typical salary of a worker, as
most workers have salaries in the $12k to $18k range. The mean is being skewed by the two large
salaries. Therefore, in this situation, we would like to have a better measure of central tendency.
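As a quick check of the figures above, the following Python sketch recomputes the mean of the ten salaries and compares it with the median, which is introduced in the next section. The variable names are illustrative choices, not part of the original slides.

```python
from statistics import mean, median

# Salaries of the ten staff, in thousands of dollars (from the table above)
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = mean(salaries_k)      # sum of the values divided by their count
median_salary = median(salaries_k)  # middle of the sorted values

print(mean_salary)    # 30.7
print(median_salary)  # 15.5
```

The two large salaries pull the mean to about twice the median, which sits inside the $12k to $18k range where most staff fall.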
Median
The median is the middle value. It is the value that splits the dataset in half.
To find the median, order your data from smallest to largest, and then find the data point that
has an equal number of values above it and below it.
The method for locating the median varies slightly depending on whether your dataset has an
even or odd number of values.
• If n is odd, the median is the middle number.
• If n is even, the median is the average of the 2 middle numbers.
For example, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark, in this case 56. It is the middle mark
because there are 5 scores before it and 5 scores after it.
With an even number of scores there is no single middle mark. Suppose we drop the last score (92),
leaving 10 scores:
65 55 89 56 35 14 56 55 87 45

We again rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89

Only now we have to take the 5th and 6th score in our data set and average them to get a
median of 55.5.
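The odd and even cases can be sketched in Python using the two sets of scores above. The helper name `median_by_hand` is a hypothetical choice for illustration; in practice `statistics.median` from the standard library does the same job.

```python
odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]   # 11 values
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]      # 10 values

def median_by_hand(values):
    """Sort, then take the middle value (odd n) or average the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_hand(odd_scores))   # 56
print(median_by_hand(even_scores))  # 55.5
```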
Mode
The mode is the most frequent score in our data set. On a bar chart or histogram it is represented
by the highest bar. You can, therefore, sometimes consider the mode as being the
most popular option.
Normally, the mode is used for categorical data where we wish to know which is the most
common category.
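Example (in R):
x <- c(8, 2, 7, 1, 2, 9, 8, 2, 10, 9, 8)
sort(x)                               # view the values in ascending order
counts <- table(x)                    # frequency of each distinct value
names(counts)[counts == max(counts)]  # most frequent value(s), returned as strings; 2 and 8 tie here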
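For readers who do not use R, the same calculation can be sketched in Python. This is one possible approach using `collections.Counter`; the variable names are illustrative. Note that this data set has two modes, since 2 and 8 each occur three times.

```python
from collections import Counter

x = [8, 2, 7, 1, 2, 9, 8, 2, 10, 9, 8]

counts = Counter(x)              # frequency of each distinct value
highest = max(counts.values())   # the largest frequency
# Keep every value that reaches the largest frequency (there may be ties)
modes = sorted(value for value, count in counts.items() if count == highest)

print(modes)  # [2, 8]
```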
Measures of Variability:
• Range
• Interquartile Range
• Variance
• Standard Deviation
What are Measures of Variability?
A measure of variability is a summary statistic that represents the amount of dispersion in a
dataset. How spread out are the values? While a measure of central tendency describes the
typical value, measures of variability define how far away the data points tend to fall from the
centre. We talk about variability in the context of a distribution of values. A low dispersion
indicates that the data points tend to be clustered tightly around the centre. High dispersion
signifies that they tend to fall further away.
Range
The range of a dataset is the difference between the largest and smallest values in that dataset.
For example, consider two datasets. Dataset 1, with a minimum of 20 and a maximum of 38, has a
range of 38 – 20 = 18, while dataset 2, with a minimum of 11 and a maximum of 52, has a range of
52 – 11 = 41. Dataset 2 has a broader range and, hence, more variability than dataset 1.
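A minimal Python sketch of the range calculation follows. The original datasets are not reproduced in this text, so the values below are invented purely so that each minimum and maximum matches the figures quoted above; the function name `value_range` is also an illustrative choice.

```python
# Hypothetical values: only the minimum and maximum of each list come from the text
dataset_1 = [20, 21, 24, 28, 33, 38]
dataset_2 = [11, 16, 25, 34, 47, 52]

def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

print(value_range(dataset_1))  # 18
print(value_range(dataset_2))  # 41
```

Because the range depends only on the two extreme values, a single unusual observation can change it substantially.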
Interquartile Range
The interquartile range is the middle half of the data. To visualize it, think about the median
value that splits the dataset in half. Similarly, you can divide the data into quarters. Statisticians
refer to the three cut points as quartiles and denote them from low to high as Q1, Q2, and Q3,
where Q2 is the median. The quarter of the dataset with the smallest values lies below the lower
quartile (Q1). The quarter with the highest values lies above the upper quartile (Q3). The
interquartile range is the middle half of the data that is in between the upper and lower quartiles.
In other words, it includes the 50% of data points that fall between Q1 and Q3, and its width is Q3 − Q1.
Example:
Consider a dataset representing the salaries of employees in a company:
Salaries (in dollars): 40000,45000,50000,55000,60000,70000,90000,150000
Step 1: Calculate Quartiles:
Arrange the data in ascending order: 40000,45000,50000,55000,60000,70000,90000,150000
Calculate the median (Q2): Q2=57500
Split the dataset into two halves:
Lower half: 40000,45000,50000,55000 and Upper half: 60000,70000,90000,150000
Calculate the median of the lower half (Q1): Q1=47500
Calculate the median of the upper half (Q3): Q3=80000
Step 2: Calculate IQR:
IQR=Q3−Q1=80000−47500=32500 dollars

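Step 3: Calculate range for outliers:

(Q1 - 1.5*IQR, Q3 + 1.5*IQR)

(47500 - 1.5*32500, 80000 + 1.5*32500) = (-1250, 128750)

Any salary outside this range is flagged as a potential outlier; here only 150000 exceeds the upper limit of 128750.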
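The three steps above can be sketched in Python as follows. This follows the median-of-halves method used in the example; library routines such as `statistics.quantiles` or `numpy.percentile` interpolate by default and may return somewhat different quartiles for the same data. The variable names are illustrative.

```python
from statistics import median

salaries = [40000, 45000, 50000, 55000, 60000, 70000, 90000, 150000]

# Step 1: quartiles as medians of the lower and upper halves
ordered = sorted(salaries)
half = len(ordered) // 2
lower_half = ordered[:half]
upper_half = ordered[-half:]   # for an odd count this leaves out the median itself
q1 = median(lower_half)
q3 = median(upper_half)

# Step 2: interquartile range
iqr = q3 - q1

# Step 3: limits beyond which a value is flagged as a potential outlier
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [s for s in ordered if s < lower_limit or s > upper_limit]

print(q1, q3, iqr)               # 47500.0 80000.0 32500.0
print(lower_limit, upper_limit)  # -1250.0 128750.0
print(outliers)                  # [150000]
```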

Variance
Variance is the average squared difference of the values from the mean.
Unlike the previous measures of variability, the variance includes all values in the calculation by
comparing each value to the mean.
To calculate this statistic, you calculate a set of squared differences between the data points and
the mean, sum them, and then divide by the number of observations.
There are two formulas for the variance depending on whether you are calculating the variance
for an entire population or using a sample to estimate the population variance.
Population variance
The formula for the variance of an entire population is the following:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

In the equation, σ² is the population parameter for the variance, μ is the parameter for the
population mean, and N is the number of data points, which should include the entire
population.
Sample variance
To use a sample to estimate the variance for a population, use the following formula.

$$s^2 = \frac{\sum_{i=1}^{N} (x_i - M)^2}{N - 1}$$

In the equation, s² is the sample variance, M is the sample mean, and N is the number of
observations in the sample. N − 1 in the denominator
corrects for the tendency of a sample to underestimate the population variance.
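To make the difference between the two denominators concrete, here is a small Python sketch. The data values are hypothetical, chosen only so that the arithmetic is easy to follow by hand; they are not the values from the original slides.

```python
from statistics import pvariance, variance

# Hypothetical data (not from the original slides)
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
m = sum(values) / n                                   # mean = 5.0
sum_sq_diff = sum((x - m) ** 2 for x in values)       # sum of squared differences = 32.0

population_var = sum_sq_diff / n        # divide by N      -> 4.0
sample_var = sum_sq_diff / (n - 1)      # divide by N - 1  -> about 4.571

print(population_var)
print(round(sample_var, 3))

# Cross-check against the standard library
print(pvariance(values), variance(values))
```

The sample variance is always the larger of the two for the same data, and the gap shrinks as the number of observations grows.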
Example of calculating the sample variance
Standard Deviation
The standard deviation is the standard or typical difference between each data point and the
mean. When the values in a dataset are grouped closer together, you have a smaller standard
deviation. On the other hand, when the values are spread out more, the standard deviation is
larger because the standard distance is greater.
The standard deviation is just the square root of the variance.
In the variance section, we calculated a variance of 201 in the table.

Therefore, the standard deviation for that dataset is 14.177.
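The final step can be checked with a couple of lines of Python. The only input is the variance of 201 quoted above; the underlying data table is not reproduced in this text, so it is not used here.

```python
import math

variance_value = 201                    # variance quoted in the text
std_dev = math.sqrt(variance_value)     # standard deviation = square root of the variance

print(round(std_dev, 3))  # 14.177
```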
