S09 Notes

Uploaded by

mathycheok

9/21/20

Seminar 9:
Descriptive Analysis

Descriptive Statistics

• Numbers that summarize and describe data, highlighting the
meaningful information that can be extracted from it.
• Ways to describe data:
– Measures of central tendency
• Identify the central position of the
data
– Measures of spread
• Identify the variability within the
data


Measures of Central Tendency

• Mean
– Average value
– For discrete (whole numbers) and continuous data (decimals).
– Mean is susceptible to outliers and skewness.
– Not for categorical data.
• Median
– Middle value.
– Divides the distribution in half: half of the data points are less than
the median and the other half are more than the median.
– Less susceptible to outliers and skewness.
– Not applicable to categorical data that has no natural order.
• Mode
– Most frequently occurring value.
– Suitable for discrete, continuous, and categorical data.
– In some data, mode may not be a good representation of centrality.
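
The three measures above can be compared on a small list. This is a minimal sketch using Python's built-in statistics module; the numbers and country names are invented for illustration and do not come from the seminar data.

```python
import statistics

# Hypothetical order quantities, invented for illustration; 500 is an outlier.
quantities = [2, 3, 3, 4, 5, 6, 500]

mean_value = statistics.mean(quantities)      # pulled upwards by the outlier
median_value = statistics.median(quantities)  # middle value of the sorted data
mode_value = statistics.mode(quantities)      # most frequently occurring value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)

# The mode also works for categorical data, where the mean does not apply.
countries = ["UK", "France", "UK", "Germany", "UK"]
country_mode = statistics.mode(countries)
print("most common country:", country_mode)
```

The single outlier moves the mean far away from most of the values, while the median and the mode stay among them.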

Measures of Spread / Dispersion

• If the mean comes with a large spread, the mean may not be
representative of the data.
• Less variation/risk is preferred.

Range
– Difference between the highest and lowest values in the data.
Quartiles
– Divide the sorted data into four equal parts using three cut points
(Q1, Q2, and Q3), with Q2 sitting at the median (2nd quartile is the median)
Variance
– Measures how widely the data is spread around its center.
– Average squared difference between each value and the mean.
– Denotes the variability.
Standard Deviation
– Square root of the variance.
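
The measures of spread can be worked out on a small example. This is a sketch using the built-in statistics module; the prices are invented for illustration. It computes the population variance (dividing by n), which matches the "average squared difference" definition above. pandas var() and std() divide by n - 1 by default, so their results differ slightly.

```python
import statistics

# Hypothetical unit prices, invented for illustration.
prices = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

value_range = max(prices) - min(prices)

# Population variance: the average squared difference from the mean.
variance = statistics.pvariance(prices)
# Standard deviation: the square root of the variance.
std_dev = statistics.pstdev(prices)

# Three cut points (Q1, Q2, Q3) split the sorted data into four equal parts.
q1, q2, q3 = statistics.quantiles(prices, n=4)

print("range:", value_range)
print("variance:", variance)
print("standard deviation:", std_dev)
print("quartiles:", q1, q2, q3)
```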


Online Retail Example

• Look at the following data file:
OnlineRetail.csv

Online Retail Example

• Read in the data using the pandas read_csv() function:

• Understand the dimension or size of the given data using the shape
attribute:

• Check the data types of all variables using the dtypes attribute:

InvoiceNo, etc. have been classified as object,
in particular String objects.
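
The three steps above can be sketched as follows. OnlineRetail.csv is not reproduced in these notes, so the sketch reads a few made-up rows from an in-memory stand-in; the column names are the ones mentioned in this seminar, and the values are assumptions for illustration only.

```python
import io

import pandas as pd

# Stand-in for OnlineRetail.csv: the rows below are made up for illustration.
sample_csv = io.StringIO(
    "InvoiceNo,Description,Quantity,UnitPrice\n"
    "A0001,MUG,6,2.55\n"
    "A0002,LANTERN,6,3.39\n"
    "A0003,HEART HOLDER,8,2.75\n"
)

# With the real file this would be pd.read_csv("OnlineRetail.csv").
df = pd.read_csv(sample_csv)

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # one data type per column
```

Text columns such as InvoiceNo usually appear as object in the dtypes output, although the exact label may depend on the pandas version.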


Online Retail Example

• Utilize describe() from DataFrame to generate descriptive
statistics that summarize the central tendency and dispersion:

Ø As observed from the output, only numerical statistics are generated.
Non-numerical columns are not involved.
Ø It shows the count, mean, standard deviation, min, Q1, Q2, Q3, and max.
Ø Apart from generating numbers, the more important task is to make
sense of these numbers.
Ø For example, looking at the Quantity column's mean and standard
deviation, what does it tell you?
Ø And why are the mean and standard deviation for UnitPrice so close to
each other? What could be the reason for this?
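
A minimal sketch of describe() on a small, made-up DataFrame (the values are assumptions, not rows from OnlineRetail.csv):

```python
import pandas as pd

# Small made-up frame; column names follow the ones mentioned above.
df = pd.DataFrame({
    "InvoiceNo": ["A0001", "A0002", "A0003", "A0004"],
    "Quantity": [6, 6, 8, -2],
    "UnitPrice": [2.55, 3.39, 2.75, 4.25],
})

# By default describe() summarizes only the numeric columns.
summary = df.describe()
print(summary)
```

The rows labelled 25%, 50% and 75% are Q1, Q2 (the median) and Q3. Calling df.describe(include="all") would also summarize the non-numerical columns.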

Skewness

Image source: https://upload.wikimedia.org/wikipedia/commons/c/cc/Relationship_between_mean_and_median_under_different_skewness.png


Online Retail Example

Ø Individual statistics can be generated by calling:
§ mean()
§ var()
§ quantile() # percentiles; pandas has no percentile() method
§ median()
§ mode()
§ std()
§ count()
§ sum()
§ min()
§ max()
§ abs()
§ cov() # covariance matrix
§ kurt() # kurtosis value
§ skew() # skewness index, positive for right skew and negative for left skew.

• Other basic statistics can be generated, like Pearson correlation:

• There appear to be weak negative correlations observed between pairs
of variables.
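
A few of these calls are sketched below on one column, followed by a Pearson correlation matrix. The numbers are invented for illustration, so the strength of the correlation printed here says nothing about the real OnlineRetail data.

```python
import pandas as pd

# Made-up numbers for illustration only.
df = pd.DataFrame({
    "Quantity": [6, 6, 8, 2, 32],
    "UnitPrice": [2.55, 3.39, 2.75, 7.65, 1.85],
})

quantity = df["Quantity"]
print("mean:", quantity.mean())
print("median:", quantity.median())
print("std:", quantity.std())          # sample standard deviation (divides by n - 1)
print("Q1:", quantity.quantile(0.25))  # first quartile
print("skew:", quantity.skew())        # positive means right-skewed

# Pearson correlation between every pair of numeric columns.
correlation = df.corr(method="pearson")
print(correlation)
```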

Data Indexing
• Indexing refers to the position of a subset of data within an
iterable structure.
• Iterable means loopable: you can use a for-loop to go from one
element to the next.


String Revisit

String Indexing Revisit

• A string associates each character with an
index number.
• The index number starts from 0 at the leftmost character and
increments by 1 for each character to the right.
• Use square brackets to enclose the index number.
• To refer to a particular character, refer to it
using the format:
– string_var_name[index]
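
A short sketch of the format above. The string "Welcome" is an assumed example, not taken from the slides.

```python
# The string "Welcome" is an assumed example, not taken from the slides.
word = "Welcome"

first_char = word[0]   # index 0 is the leftmost character
fourth_char = word[3]  # indices increase by 1 from the left

print(first_char, fourth_char)

# A string is iterable, so a for-loop can visit each character in turn.
for index, char in enumerate(word):
    print(index, char)
```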

String Indexing Common Mistake

• It is a common mistake to think that the first character of a String
has index number 1. That is wrong!
• The first character of a String has index number 0.
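
The off-by-one mistake can be checked directly. A minimal sketch, again with an assumed example string:

```python
word = "Welcome"  # assumed example string

print(word[0])  # first character
print(word[1])  # second character, not the first

last_index = len(word) - 1  # valid indices run from 0 to len(word) - 1

# Asking for index len(word) goes one past the end and raises IndexError.
try:
    word[len(word)]
    out_of_range = False
except IndexError:
    out_of_range = True

print("index", len(word), "is out of range:", out_of_range)
```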

String Reverse Indexing Revisit

• Default indexing starts from the left.

• Reverse indexing starts from the right,
using negative notation; the last character has index -1.
• To refer to a particular character, refer to it
using the format:
– string_var_name[-index]
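
A short sketch of reverse indexing on an assumed example string:

```python
word = "Welcome"  # assumed example string

last_char = word[-1]         # rightmost character
second_last_char = word[-2]  # one step further to the left

# For a valid index i, word[-i] is the same character as word[len(word) - i].
same_char = word[-3] == word[len(word) - 3]

print(last_char, second_last_char, same_char)
```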


String Slicing Revisit

• Slicing extracts a subset of a string sequence.
• Syntax as follows:
– string_var_name[start: stop]
– start: starting index of extraction
– stop: index at which extraction stops; the character at this
position is excluded

String Slicing Common Mistake

• It is a common mistake to assume that the character at the stopping
index is included in the extraction. That is wrong!
– In the example below, index 5 (which corresponds to
the character m) is excluded from the extraction.
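
The original example image is not part of these notes. As a stand-in, the sketch below uses the assumed string "Welcome", whose character at index 5 happens to be m:

```python
word = "Welcome"  # assumed example; the character at index 5 is "m"

extracted = word[0:5]  # takes indices 0, 1, 2, 3 and 4
print(extracted)
print(word[5])         # the character at the stop index, left out of the slice

# The length of a slice is stop - start (for indices inside the string).
slice_length = len(word[2:5])
print(slice_length)
```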


String Slicing Revisit

• The default index for start is 0.
• If stop is not specified, the slice runs to the end of the string.

• Slicing also supports a step, syntax as follows:
– string_var_name[start: stop: step]
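
The defaults and the step can be sketched on an assumed example string:

```python
word = "Welcome"  # assumed example string

from_start = word[:3]       # start defaults to 0
to_end = word[3:]           # stop defaults to the end of the string
every_second = word[0:7:2]  # indices 0, 2, 4 and 6
reversed_word = word[::-1]  # a step of -1 walks from right to left

print(from_start, to_end, every_second, reversed_word)
```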

Now, we look back to DataFrame

• Connect the code to a data file.
• Then we will look at how to retrieve data
from the dataset.

import pandas as pd
df = pd.read_csv("OnlineRetail.csv")


Try out the following pandas functions:

Function name                            Description
df                                       Display the content of the DataFrame
df.head()                                See the first 5 records
df.tail()                                See the last 5 records
df.loc[0]                                See the first row of data
df.loc[1:3]                              See the second to fourth rows of data
df.loc[0, "InvoiceNo"]                   See data for row 0 in a particular column
                                         (InvoiceNo column in this example)
df.loc[0, ["InvoiceNo", "Description"]]  See data for row 0 in 2 columns
                                         (InvoiceNo & Description columns)
df["colName"]                            See only one column of information
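
The calls in the table can be tried on a small DataFrame. The rows below are made up so that the sketch runs without OnlineRetail.csv; only the column names come from this seminar.

```python
import pandas as pd

# Made-up rows standing in for OnlineRetail.csv.
df = pd.DataFrame({
    "InvoiceNo": ["A0001", "A0002", "A0003", "A0004", "A0005", "A0006"],
    "Description": ["MUG", "LANTERN", "HOLDER", "BOTTLE", "CLOCK", "FRAME"],
    "Quantity": [6, 6, 8, 2, 4, 12],
})

first_five = df.head()             # first 5 records
last_five = df.tail()              # last 5 records
first_row = df.loc[0]              # a Series holding the first row
one_cell = df.loc[0, "InvoiceNo"]  # a single value
two_cells = df.loc[0, ["InvoiceNo", "Description"]]  # a Series with 2 entries
one_column = df["Quantity"]        # a Series holding one column

print(first_row)
print(one_cell)
print(two_cells)
```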

DataFrame: useful row operations

• A row is accessed via its index; index 0 is
the first row of data. Get a specific row or rows
(slicing) using loc[index]:
Getting the first row: df.loc[0]
Getting the second to fourth rows: df.loc[1:3]

Ø df.drop(number) # drop the row with that index; returns a new DataFrame
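
A minimal sketch of drop() on made-up rows. It relies on the default integer index, where the row labels are 0, 1, 2 and so on.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "InvoiceNo": ["A0001", "A0002", "A0003", "A0004"],
    "Quantity": [6, 6, 8, 2],
})

without_first = df.drop(0)     # drop the row labelled 0
without_two = df.drop([1, 2])  # a list drops several rows at once

print(without_first)
print(len(df))  # df still has all 4 rows; drop() returned new DataFrames
```

To keep the result, assign it back to a variable, for example df = df.drop(0).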


DataFrame Slicing Common Mistake

• When using loc[start:stop], it is a common mistake to assume that the
stopping index is excluded from the extraction, as it is in string
slicing. That is wrong! loc slices by label, so both start and stop
are included.
– In the example below, index 3 (which corresponds to
the 4th row) is included in the extraction.
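
The original example image is not part of these notes. The sketch below shows the same behaviour on made-up rows, and contrasts it with iloc, which is not covered in this seminar and is added here only for comparison because it slices by position like a string does.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({"InvoiceNo": ["A0001", "A0002", "A0003", "A0004", "A0005"]})

by_label = df.loc[1:3]      # labels 1, 2 and 3: the stop label is included
by_position = df.iloc[1:3]  # positions 1 and 2: the stop position is excluded

print(by_label)
print(by_position)
```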

Getting data from column(s)

• Getting one column

• Getting multiple columns

How can you get data from multiple rows and multiple columns?
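
One possible answer to the question above, sketched on made-up rows: pass a row slice and a list of column names to loc in a single call.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "InvoiceNo": ["A0001", "A0002", "A0003", "A0004", "A0005"],
    "Description": ["MUG", "LANTERN", "HOLDER", "BOTTLE", "CLOCK"],
    "Quantity": [6, 6, 8, 2, 4],
})

one_column = df["Quantity"]                     # one column: a Series
two_columns = df[["InvoiceNo", "Description"]]  # several columns: a DataFrame

# Rows 1 to 3 (inclusive) and two chosen columns in a single loc call.
subset = df.loc[1:3, ["InvoiceNo", "Quantity"]]
print(subset)
```

Note the double square brackets when selecting several columns: the inner pair builds the list of column names.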


You have learnt...

1. Running descriptive analysis using the pandas library
2. Selecting data
3. Slicing data
