Session 12

The document discusses different statistical concepts including descriptive statistics, measures of central tendency, and measures of variability. Descriptive statistics describes and summarizes data numerically or visually. Measures of central tendency include the mean, median, and mode which show typical values in a dataset. Measures of variability like variance and standard deviation quantify how spread out the data is.

Uploaded by

darayir140


Statistics

Statistics refers to the mathematics and techniques with which we understand data.

Descriptive Statistics
Descriptive statistics is about describing and summarizing data. It uses two main approaches:

1. The quantitative approach describes and summarizes data numerically.

2. The visual approach illustrates data with charts, plots, histograms, and other graphs.

Types of Measures
Central tendency tells you about the centers of the data. Useful measures include the mean, median, and mode.
Variability tells you about the spread of the data. Useful measures include variance and standard deviation.
Correlation or joint variability tells you about the relation between a pair of variables in a dataset. Useful measures include covariance and
the correlation coefficient.
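
Covariance and the correlation coefficient are named above but not computed elsewhere in this session. The following is a minimal sketch of both, written from the textbook sample formulas (n − 1 in the denominator). The data in `xs` and `ys` are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical paired observations, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sample covariance: sum of products of deviations, divided by n - 1.
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)

# Sample variance of each variable, using the same n - 1 denominator.
var_x = sum((a - mean_x) ** 2 for a in xs) / (n - 1)
var_y = sum((b - mean_y) ** 2 for b in ys) / (n - 1)

# Pearson correlation coefficient: covariance scaled by both standard deviations.
r = cov_xy / math.sqrt(var_x * var_y)

print(cov_xy)  # 5.0
print(r)       # 1.0
```

Because `ys` is exactly twice `xs`, the two variables are perfectly linearly related and the correlation coefficient is 1.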

Getting Started With Python Statistics Libraries

In [1]: import math
        import statistics as st

In [2]: x = [8.0, 1, 2.5, 4, 28.0]
        x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
        x

Out[2]: [8.0, 1, 2.5, 4, 28.0]

In [3]: x_with_nan

Out[3]: [8.0, 1, 2.5, nan, 4, 28.0]

Measures of Central Tendency

The measures of central tendency show the central or middle values of datasets. There are several definitions of what’s considered to be the
center of a dataset. In this tutorial, you’ll learn how to identify and calculate these measures of central tendency:

Mean
Weighted mean
Geometric mean
Harmonic mean
Median
Mode

Mean
The sample mean, also called the sample arithmetic mean or simply the average, is the arithmetic average of all the items in a dataset. The mean
of a dataset 𝑥 is mathematically expressed as
$\bar{x} = \frac{\sum_{i} x_i}{n}$
, where 𝑖 = 1, 2, …, 𝑛. In other words, it’s the sum of all the elements 𝑥ᵢ divided by the number of items in the dataset 𝑥.

In [4]: mean_ = sum(x) / len(x)
        mean_

Out[4]: 8.7

In [5]: mean_ = st.mean(x)
        mean_

Out[5]: 8.7

In [6]: mean_ = st.fmean(x)
        mean_

Out[6]: 8.7

However, if there are nan values among your data, then statistics.mean() and statistics.fmean() will return nan as the output:

In [7]: mean_ = st.fmean(x_with_nan)
        mean_

Out[7]: nan
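
One common way to deal with this is to drop the nan values before averaging. The sketch below assumes that ignoring missing values is acceptable for your analysis, which is not always the case. The name `clean` is introduced here for illustration.

```python
import math
import statistics as st

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

# Keep only the values that are not nan before averaging.
clean = [v for v in x_with_nan if not math.isnan(v)]

print(math.isnan(st.fmean(x_with_nan)))  # True: nan propagates into the mean
print(st.fmean(clean))                   # 8.7, the mean of the remaining values
```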

Weighted Mean
The weighted mean, also called the weighted arithmetic mean or weighted average, is a generalization of the arithmetic mean that enables you to
define the relative contribution of each data point to the result.

You define one weight 𝑤ᵢ for each data point 𝑥ᵢ of the dataset 𝑥, where 𝑖 = 1, 2, …, 𝑛 and 𝑛 is the number of items in 𝑥. Then, you multiply each
data point by the corresponding weight, sum all the products, and divide the obtained sum by the sum of weights:

$\frac{\sum_{i} w_i x_i}{\sum_{i} w_i}$

In [8]: x = [8.0, 1, 2.5, 4, 28.0]
        w = [0.1, 0.2, 0.3, 0.25, 0.15]
        wmean = sum(w[i] * x[i] for i in range(len(x))) / sum(w)
        wmean

Out[8]: 6.95
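
The same computation can be written without index arithmetic by pairing values and weights with zip(). This is a sketch only: `weighted_mean` is a hypothetical helper name, not a function from the statistics library, and the length check is an added assumption about how mismatched inputs should be handled.

```python
x = [8.0, 1, 2.5, 4, 28.0]
w = [0.1, 0.2, 0.3, 0.25, 0.15]


def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i). Hypothetical helper for illustration."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(wi * xi for wi, xi in zip(weights, values)) / sum(weights)


wmean = weighted_mean(x, w)

# With equal weights, the weighted mean reduces to the ordinary arithmetic mean.
equal = weighted_mean(x, [1] * len(x))

print(round(wmean, 2))  # 6.95
print(round(equal, 2))  # 8.7
```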

Geometric Mean
The geometric mean is a special type of average where we multiply the 𝑛 numbers together and then take the 𝑛-th root: the square root for two
numbers, the cube root for three numbers, and so on:
$\sqrt[n]{\prod_{i=1}^{n} x_i}$
, where 𝑖 = 1, 2, …, 𝑛.
In [9]: gmean = st.geometric_mean(x)
        print(round(gmean, 2))

4.68
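
To connect the library call to the definition, the geometric mean can also be computed by hand. The sketch below shows two equivalent routes: the n-th root of the product, and the exponential of the mean of the logarithms. The second is often preferred for long datasets because the raw product can overflow. Both assume all values are positive. The variable names are illustrative.

```python
import math
import statistics as st

x = [8.0, 1, 2.5, 4, 28.0]
n = len(x)

# Route 1: multiply everything together, then take the n-th root.
gmean_product = math.prod(x) ** (1 / n)

# Route 2: average the logarithms, then exponentiate.
gmean_logs = math.exp(sum(math.log(v) for v in x) / n)

print(round(gmean_product, 2))        # 4.68
print(round(gmean_logs, 2))           # 4.68
print(round(st.geometric_mean(x), 2)) # 4.68
```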

Harmonic Mean
The harmonic mean is the reciprocal of the mean of the reciprocals of all items in the dataset.
For example, the harmonic mean of three values a, b and c will be equivalent to
$\frac{3}{1/a + 1/b + 1/c}$
If one of the values is zero, the result will be zero.

The harmonic mean is a type of average, a measure of the central location of the data. It is often appropriate when averaging rates or ratios, for
example speeds.

Suppose a car travels 10 km at 40 km/hr, then another 10 km at 60 km/hr. What is the average speed?

In [10]: st.harmonic_mean([40, 60])

Out[10]: 48.0

In [11]: st.harmonic_mean([10, 30, 50, 70, 90])

Out[11]: 27.97513321492007

In [12]: x = [8.0, 1, 2.5, 4, 28.0]
         hmean = st.harmonic_mean(x)
         print(round(hmean, 2))

2.76
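
The car example can be checked from first principles: average speed is total distance divided by total time. The sketch below does that arithmetic and compares it with the harmonic and arithmetic means of the two speeds. The variable names are illustrative.

```python
import statistics as st

distances_km = [10, 10]
speeds_kmh = [40, 60]

# Time spent on each leg, in hours: distance / speed.
times_h = [d / s for d, s in zip(distances_km, speeds_kmh)]

# Average speed over the whole trip: total distance / total time.
average_speed = sum(distances_km) / sum(times_h)

print(round(average_speed, 2))        # 48.0
print(st.harmonic_mean(speeds_kmh))   # 48.0
print(st.mean(speeds_kmh))            # 50, which overstates the true average speed
```

The harmonic mean matches because both legs cover the same distance. The car spends more time at the slower speed, so the slower speed counts for more.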
Median
The sample median is the middle element of a sorted dataset. The dataset can be sorted in increasing or decreasing order. If the number of
elements 𝑛 of the dataset is odd, then the median is the value at the middle position: 0.5(𝑛 + 1). If 𝑛 is even, then the median is the arithmetic
mean of the two values in the middle, that is, the items at the positions 0.5𝑛 and 0.5𝑛 + 1.

For example, if you have the data points 2, 4, 1, 8, and 9, then the median value is 4, which is in the middle of the sorted dataset (1, 2, 4, 8, 9). If
the data points are 2, 4, 1, and 8, then the median is 3, which is the average of the two middle elements of the sorted sequence (2 and 4).

In [13]: st.median([1, 3, 5])

Out[13]: 3

In [14]: st.median([1, 3, 5, 7])

Out[14]: 4.0

In [15]: st.median([5, 3, 7, 1])

Out[15]: 4.0

In [16]: x = [8.0, 1, 2.5, 4, 28.0]
         med = st.median(x)
         med

Out[16]: 4

In [17]: med = st.median(x[:-1])
         med

Out[17]: 3.25
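
The odd and even cases described above can be written out by hand. This is a minimal sketch of the definition, not the library's implementation. `median_sketch` is a hypothetical name, and the positions in the text are 1-based while Python indices are 0-based.

```python
def median_sketch(data):
    """Median by definition: middle item, or mean of the two middle items."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one data point")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position 0.5 * (n + 1) is 0-based index n // 2.
        return ordered[mid]
    # Even n: 1-based positions 0.5 * n and 0.5 * n + 1 are indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_sketch([2, 4, 1, 8, 9]))  # 4
print(median_sketch([2, 4, 1, 8]))     # 3.0
```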

median_low() and median_high() are two more functions related to the median in the Python statistics library. They always return an element
from the dataset:

If the number of elements is odd, then there’s a single middle value, so these functions behave just like median().
If the number of elements is even, then there are two middle values. In this case, median_low() returns the lower and median_high() the
higher middle value.

In [18]: st.median_low([1, 3, 5])

Out[18]: 3

In [19]: st.median_low([1, 3, 5, 7])

Out[19]: 3

In [20]: st.median_high([1, 3, 5])

Out[20]: 3

In [21]: st.median_high([1, 3, 5, 7])

Out[21]: 5

Mode
The sample mode is the value in the dataset that occurs most frequently. If there isn’t a single such value, then the set is multimodal since it has
multiple modal values. For example, in the set that contains the points 2, 3, 2, 8, and 12, the number 2 is the mode because it occurs twice, unlike
the other items that occur only once.

In [22]: st.mode([1, 1, 2, 3, 3, 3, 3, 4])

Out[22]: 3

In [23]: st.multimode([1, 1, 1, 1, 2, 3, 3, 3, 3, 4])

Out[23]: [1, 3]

In [24]: st.multimode('aabbbbccddddeeffffgg')

Out[24]: ['b', 'd', 'f']
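
To see what these functions are doing, the modes can also be found by counting occurrences directly. The sketch below uses collections.Counter from the standard library. `modes_sketch` is a hypothetical helper name, and it returns every value tied for the highest count, in order of first appearance.

```python
from collections import Counter


def modes_sketch(data):
    """Return all values that share the highest frequency in data."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes_sketch([2, 3, 2, 8, 12]))                 # [2]
print(modes_sketch([1, 1, 1, 1, 2, 3, 3, 3, 3, 4]))   # [1, 3]
```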

Measures of Variability
The measures of central tendency aren’t sufficient to describe data. You’ll also need the measures of variability that quantify the spread of data
points.

Variance
Standard deviation

Variance
The sample variance quantifies the spread of the data. It shows numerically how far the data points are from the mean. You can express the
sample variance of the dataset 𝑥 with 𝑛 elements mathematically as

$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
where
$S^2$ = sample variance
$x_i$ = the value of one observation
$\bar{x}$ = the mean value of all observations
$n$ = the number of observations

In [25]: x = [8.0, 1, 2.5, 4, 28.0]
         st.variance(x)

Out[25]: 123.2
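
The formula can be followed step by step in plain Python. The sketch below computes the squared deviations from the mean and divides by n − 1 for the sample variance. For comparison, it also divides by n, which gives the population variance that statistics.pvariance() returns. The variable names are illustrative.

```python
import statistics as st

x = [8.0, 1, 2.5, 4, 28.0]
n = len(x)
mean_ = sum(x) / n

# Squared distance of every observation from the mean.
squared_deviations = [(v - mean_) ** 2 for v in x]

# Sample variance: divide by n - 1.
sample_var = sum(squared_deviations) / (n - 1)

# Population variance: divide by n.
population_var = sum(squared_deviations) / n

print(round(sample_var, 2))      # 123.2
print(round(population_var, 2))  # 98.56
```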

Standard Deviation

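In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. The population standard deviation is

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
where
$\sigma$ = population standard deviation
$N$ = the size of the population
$x_i$ = each value from the population
$\mu$ = the population mean

Note that statistics.stdev() computes the sample standard deviation, that is, the square root of the sample variance with 𝑛 − 1 in the denominator. The population formula above corresponds to statistics.pstdev().
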
In [26]: st.stdev(x)

Out[26]: 11.099549540409287
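
The formula in this section divides by N, the population size, whereas statistics.stdev() divides by n − 1. The sketch below puts the two side by side so the difference is visible on the same data. It uses only functions from the standard statistics and math modules, and the variable names are illustrative.

```python
import math
import statistics as st

x = [8.0, 1, 2.5, 4, 28.0]

# Sample standard deviation: square root of the sample variance (n - 1 denominator).
sample_sd = st.stdev(x)

# Population standard deviation: square root of the population variance (N denominator).
population_sd = st.pstdev(x)

print(round(sample_sd, 2))                                      # 11.1
print(math.isclose(sample_sd, math.sqrt(st.variance(x))))       # True
print(math.isclose(population_sd, math.sqrt(st.pvariance(x))))  # True
print(population_sd < sample_sd)                                # True
```

On the same data the population value is always the smaller of the two, because dividing by N instead of n − 1 shrinks the variance.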
