08.11 Week 5, Class 2 - Descriptive Analytics and Data Wrangling With Pandas

Uploaded by

Magambo Sahiil Moris


AI/ML

Descriptive Data Analytics
Descriptive Analytics (Recap)
• Helps us understand what happened, e.g. the number of accidents
increased year-on-year.
• We can get an idea of the main trends in the data.
• Typically, when describing data, there are two types of analysis that
we can consider:
• Measures of Central Tendency (help to understand the typical center of
the data)
• Measures of Dispersion (help to understand how far apart samples are
from the central value)
Measures of Central Tendency
• Mean - sum of data points divided by the number of data points.
• Median - middle value in an ordered sample.
• Mode - most frequent value and usually the preferred measure for
categorical data.
Measures of Central Tendency (Mean)
• Most common measure for numerical data, e.g. consider these values for
age:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
• The mean is the sum (623) / number of samples (11) ≈ 56.6 years.
• One issue with the mean is that it cannot be used for categorical
data.
• It is also heavily influenced by the presence of outliers, e.g. for the
values below the sum is 863 and the mean rises to about 78.5:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 300
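The two calculations above can be checked with a minimal sketch. It uses Python's standard-library statistics module, which is a choice made here for illustration rather than something the slides prescribe, and the variable names are likewise assumptions.

```python
from statistics import mean

# Age values from the slide, and the same values with 60 replaced by an outlier (300)
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
ages_with_outlier = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 300]

mean_ages = mean(ages)                       # 623 / 11
mean_with_outlier = mean(ages_with_outlier)  # 863 / 11

print(round(mean_ages, 1))          # 56.6
print(round(mean_with_outlier, 1))  # 78.5
```

A single extreme value moves the mean by roughly 22 years, even though ten of the eleven values are unchanged.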
Measures of Central Tendency (Median)
• Another common measure for numerical data, e.g. considering the
same values as in the previous slide:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
• The median value is 57, the 6th of the 11 ordered values (remember to
order the data first).
• It is a better measure than the mean when there are outliers.
• However, it cannot be used for categorical (nominal) data because such
data cannot be ordered.
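As a rough illustration of why the median copes better with outliers, the sketch below computes it for the age values with and without an extreme value of 300. The standard-library statistics module and the variable names are assumptions made for this example.

```python
from statistics import median

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
ages_with_outlier = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 300]

# median() sorts the data internally, so the input order does not matter
median_ages = median(ages)
median_with_outlier = median(ages_with_outlier)

# Doing it by hand for an odd number of values: sort, then take the middle element
ordered = sorted(ages)
middle_value = ordered[len(ordered) // 2]

print(median_ages)          # 57
print(median_with_outlier)  # 57
print(middle_value)         # 57
```

The outlier leaves the median unchanged because only the position of the middle value matters, not the size of the largest one.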
Measures of Central Tendency (Mode)
• Unlike mean and median, it can be used for both categorical and
numerical data.
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
• We need to get the frequency (number of occurrences) of each value
to find the mode.
• Here the mode is 54, as it has the highest frequency (3 occurrences).
• The most common issue is multi-modal data, i.e. where more than one
value has the highest number of occurrences. In the values below, both
54 and 58 occur three times:
54, 54, 54, 55, 56, 57, 57, 58, 58, 58, 60, 60
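The frequency counting described above can be sketched as follows. This uses collections.Counter and statistics.multimode (available from Python 3.8) as one possible approach; the variable names are assumptions for illustration.

```python
from collections import Counter
from statistics import mode, multimode

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
ages_bimodal = [54, 54, 54, 55, 56, 57, 57, 58, 58, 58, 60, 60]

# Frequency (number of occurrences) of each value
frequencies = Counter(ages)
print(frequencies[54])  # 3

# Single most frequent value
single_mode = mode(ages)
print(single_mode)      # 54

# multimode() returns every value that shares the highest frequency
all_modes = multimode(ages_bimodal)
print(all_modes)        # [54, 58]
```

Returning a list of modes makes the multi-modal case explicit instead of silently picking one value.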
Measures of Dispersion
• The central value describes the typical data point in the sample
whereas dispersion measures the difference between a sample
and the center.
• Range tells us the difference between the smallest and largest
value in the data.
• Variance and Standard Deviation tell us how spread out the data are
around the mean.
Measures of Dispersion (Variance)
• Can be calculated for the population or a sample.
• The smaller the variance, the closer the data points are to the mean,
and vice versa.

Population variance: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2$
Sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
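The difference between the two variants can be sketched in code. The example below assumes the age values from the earlier slides as data and uses the standard-library functions pvariance (divides by N) and variance (divides by n - 1); treating the same list as either a population or a sample is purely for illustration.

```python
from statistics import pvariance, variance

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
n = len(ages)

pop_var = pvariance(ages)    # sum of squared deviations divided by N
sample_var = variance(ages)  # sum of squared deviations divided by n - 1

# The two differ only by the factor n / (n - 1)
rescaled = pop_var * n / (n - 1)

print(sample_var > pop_var)                # True
print(abs(sample_var - rescaled) < 1e-9)   # True
```

Dividing by n - 1 makes the sample variance slightly larger, and the gap shrinks as the number of data points grows.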

Measures of Dispersion (Standard Deviation)
• Is the square root of the variance. This brings the measure of spread
back to the same units as the samples.

• Consider the dataset 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8.

• Mean μ = 72 / 12 = 6
• Xi - μ = -2, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 2
• (Xi - μ)² = 4, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 4 (sum = 14)
• Population variance σ² = 14 / 12 ≈ 1.17
• Standard deviation σ = √1.17 ≈ 1.08
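The worked example can be reproduced step by step with a short sketch. It treats the twelve values as a whole population (dividing by N), which is what the figures 1.17 and 1.08 correspond to; the variable names are assumptions for illustration.

```python
from statistics import mean, pstdev, pvariance

data = [4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8]

mu = mean(data)                           # 72 / 12
deviations = [x - mu for x in data]       # Xi - mu
squared = [d ** 2 for d in deviations]    # (Xi - mu) squared

var_pop = sum(squared) / len(data)        # 14 / 12
std_pop = var_pop ** 0.5                  # square root of the variance
data_range = max(data) - min(data)        # largest minus smallest value

print(mu)                 # 6
print(sum(squared))       # 14
print(round(var_pop, 2))  # 1.17
print(round(std_pop, 2))  # 1.08
print(data_range)         # 4

# The library functions give the same population results
print(abs(pvariance(data) - var_pop) < 1e-9)  # True
print(abs(pstdev(data) - std_pop) < 1e-9)     # True
```

Using statistics.variance and statistics.stdev instead would divide by n - 1 and give slightly larger values.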
Measures of Dispersion (Standard Deviation)

[Figure: visual illustration of the spread of data around the mean]

Pandas for Data Manipulation
• Pandas is one of the most popular libraries for handling data in
Python.
• Over the next 2 weeks we are going to learn how to use pandas to:
• load and interact with data
• modify the data
• explore data
• visualize data (with matplotlib and seaborn)
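As a small preview of exploring data with pandas, the sketch below computes the descriptive measures from this class on a DataFrame. The DataFrame is built in memory with hypothetical column names ("age", "group") and a made-up "group" column; in practice the data would usually be loaded from a file, for example with pd.read_csv.

```python
import pandas as pd

# Hypothetical in-memory data: the age values from the slides plus an
# invented categorical column, used only for illustration
df = pd.DataFrame({
    "age": [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60],
    "group": ["a", "b", "a", "a", "b", "b", "a", "b", "a", "a", "b"],
})

age_mean = df["age"].mean()
age_median = df["age"].median()
age_mode = df["age"].mode().tolist()      # a list, since data can be multi-modal
age_std_sample = df["age"].std()          # pandas defaults to ddof=1 (sample)
age_std_pop = df["age"].std(ddof=0)       # population standard deviation
group_mode = df["group"].mode().tolist()  # mode also works for categorical data

print(round(age_mean, 1))  # 56.6
print(age_median)          # 57.0
print(age_mode)            # [54]
print(group_mode)          # ['a']

# describe() summarizes count, mean, std, min, quartiles and max in one call
print(df["age"].describe())
```

Note that pandas computes the sample standard deviation by default, so its result differs slightly from a population calculation unless ddof=0 is passed.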
