0% found this document useful (0 votes)

23 views11 pages

Pivot Tables

The document discusses pivot tables, which are a statistics tool that summarizes and reorganizes data in a spreadsheet or database table. Pivot tables allow users to view data from different perspectives by pivoting or turning the data. The document provides examples of how to create and use pivot tables to analyze Titanic passenger data by factors like gender, class, age and fare using Pandas in Python.

Uploaded by

Smriti Sharma

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

23 views11 pages

Pivot Tables

Uploaded by

Smriti Sharma

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 11

Pivot Tables

A pivot table is a statistics tool that summarizes and reorganizes selected columns
and rows of data in a spreadsheet or database table to obtain a desired report. The
tool does not actually change the spreadsheet or database itself, it simply “pivots”
or turns the data to view it from different perspectives.

Pivot tables are especially useful with large amounts of data that would be time-
consuming to calculate by hand. A few data processing functions a pivot table can
perform include identifying sums, averages, ranges or outliers. The table then
arranges this information in a simple, meaningful layout that draws attention to key
values.

How pivot tables work

When users create a pivot table, there are four main components:

1. Columns- When a field is chosen for the column area, only the unique values
of the field are listed across the top.

2. Rows- When a field is chosen for the row area, it populates as the first column.
Similar to the columns, all row labels are the unique values and duplicates are
removed.

3. Values- Each value is kept in a pivot table cell and display the summarized
information. The most common values are sum, average, minimum and
maximum.

4. Filters- Filters apply a calculation or restriction to the entire table.

For example, a store owner might list monthly sales totals for a large number of
merchandise items in an Excel spreadsheet. If they wanted to know which items
sold better in a particular financial quarter, they could use a pivot table. The sales
quarters would be listed across the top as column labels and the products would be
listed in the first column as rows. The values in the worksheet would show the sum
of sales for each product in each quarter. A filter could then be applied to only
show specific quarters, specific products or averages.

Uses of a pivot table

A pivot table helps users answer business questions with minimal effort. Common
pivot table uses include:

 To calculate sums or averages in business situations. For example, counting

sales by department or region.

 To show totals as a percentage of a whole. For example, comparing sales for a

specific product to total sales.

 To generate a list of unique values. For example, showing which states or

countries have ordered a product.

 To create a 2x2 table summary of a complex report.

 To identify the maximum and minimum values of a dataset.

 To query information directly from an online analytical processing (OLAP)

server.

Motivating Pivot Tables

For the examples in this section, we'll use the database of passengers on the Titanic,
available through the Seaborn library (see Visualization With Seaborn):

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')
In [2]:
titanic.head()
Out[2]:
surviv pcla sibs parc embark clas adult_m dec embark_to aliv alon
sex age fare who
ed ss p h ed s ale k wn e e

22. 7.250 Thir Na Southampto Fals

00 3 male 1 0 S man True no
0 0 d N n e

fema 38. 71.28 wom Fals

11 1 1 0 C First False C Cherbourg yes
le 0 33 an e

fema 26. 7.925 Thir wom Na Southampto

21 3 0 0 S False yes True
le 0 0 d an N n

fema 35. 53.10 wom Southampto Fals

31 1 1 0 S First False C yes
le 0 00 an n e

35. 8.050 Thir Na Southampto

40 3 male 0 0 S man True no True
0 0 d N n
This contains a wealth of information on each passenger of that ill-fated voyage,
including gender, age, class, fare paid, and much more.

Pivot Tables by Hand

To start learning more about this data, we might begin by grouping according to
gender, survival status, or some combination thereof. If you have read the previous
section, you might be tempted to apply a GroupBy operation–for example, let's look at
survival rate by gender:

In [3]:
titanic.groupby('sex')[['survived']].mean()
Out[3]:

survived

sex

female0.742038

male 0.188908
This immediately gives us some insight: overall, three of every four females on board
survived, while only one in five males survived!

This is useful, but we might like to go one step deeper and look at survival by both
sex and, say, class. Using the vocabulary of GroupBy, we might proceed using
something like this: we group by class and gender, select survival, apply a mean
aggregate, combine the resulting groups, and then unstack the hierarchical index to
reveal the hidden multidimensionality. In code:

In [4]:
titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unstack()
Out[4]:

class First Second Third

sex

female0.9680850.9210530.500000

male 0.3688520.1574070.135447
This gives us a better idea of how both gender and class affected survival, but the
code is starting to look a bit garbled. While each step of this pipeline makes sense in
light of the tools we've previously discussed, the long string of code is not particularly
easy to read or use. This two-dimensional GroupBy is common enough that Pandas
includes a convenience routine, pivot_table, which succinctly handles this type of
multi-dimensional aggregation.

Pivot Table Syntax

Here is the equivalent to the preceding operation using the pivot_table method
of DataFrames:

In [5]:
titanic.pivot_table('survived', index='sex', columns='class')
Out[5]:

class First Second Third

sex

female0.9680850.9210530.500000

male 0.3688520.1574070.135447
This is eminently more readable than the groupby approach, and produces the same
result. As you might expect of an early 20th-century transatlantic cruise, the survival
gradient favors both women and higher classes. First-class women survived with
near certainty (hi, Rose!), while only one in ten third-class men survived (sorry,
Jack!).

Multi-level pivot tables

Just as in the GroupBy, the grouping in pivot tables can be specified with multiple
levels, and via a number of options. For example, we might be interested in looking
at age as a third dimension. We'll bin the age using the pd.cut function:

In [6]:
age = pd.cut(titanic['age'], [0, 18, 80])
titanic.pivot_table('survived', ['sex', age], 'class')
Out[6]:

class First Second Third

sex age

female(0, 18] 0.9090911.0000000.511628

(18, 80]0.9729730.9000000.423729

male (0, 18] 0.8000000.6000000.215686

(18, 80]0.3750000.0714290.133663
We can apply the same strategy when working with the columns as well; let's add
info on the fare paid using pd.qcut to automatically compute quantiles:

In [7]:
fare = pd.qcut(titanic['fare'], 2)
titanic.pivot_table('survived', ['sex', age], [fare, 'class'])
Out[7]:

fare [0, 14.454] (14.454, 512.329]

class FirstSecond Third First Second Third

sex age

female(0, 18] NaN 1.0000000.7142860.9090911.0000000.318182

(18, 80]NaN 0.8800000.4444440.9729730.9142860.391304

male (0, 18] NaN 0.0000000.2608700.8000000.8181820.178571

(18, 80]0.0 0.0980390.1250000.3913040.0303030.192308

The result is a four-dimensional aggregation with hierarchical indices
(see Hierarchical Indexing), shown in a grid demonstrating the relationship between
the values.

Additional pivot table options

The full call signature of the pivot_table method of DataFrames is as follows:

# call signature as of Pandas 0.18

DataFrame.pivot_table(data, values=None, index=None, columns=None,

aggfunc='mean', fill_value=None, margins=False,

dropna=True, margins_name='All')

We've already seen examples of the first three arguments; here we'll take a quick
look at the remaining ones. Two of the options, fill_value and dropna, have to do
with missing data and are fairly straightforward; we will not show examples of them
here.

The aggfunc keyword controls what type of aggregation is applied, which is a mean
by default. As in the GroupBy, the aggregation specification can be a string
representing one of several common choices
(e.g., 'sum', 'mean', 'count', 'min', 'max', etc.) or a function that implements an
aggregation (e.g., np.sum(), min(), sum(), etc.). Additionally, it can be specified as a
dictionary mapping a column to any of the above desired options:

In [8]:
titanic.pivot_table(index='sex', columns='class',
aggfunc={'survived':sum, 'fare':'mean'})
Out[8]:

fare survived

class First Second Third FirstSecondThird

sex

female106.12579821.97012116.11881091.0 70.0 72.0

male 67.226127 19.74178212.66163345.0 17.0 47.0

Notice also here that we've omitted the values keyword; when specifying a mapping
for aggfunc, this is determined automatically.

At times it's useful to compute totals along each grouping. This can be done via
the margins keyword:

In [9]:
titanic.pivot_table('survived', index='sex', columns='class', margins=True)
Out[9]:

class First Second Third All

sex

female0.9680850.9210530.5000000.742038

male 0.3688520.1574070.1354470.188908

All 0.6296300.4728260.2423630.383838
Here this automatically gives us information about the class-agnostic survival rate by
gender, the gender-agnostic survival rate by class, and the overall survival rate of
38%. The margin label can be specified with the margins_name keyword, which
defaults to "All".

Example: Birthrate Data

As a more interesting example, let's take a look at the freely available data on births
in the United States, provided by the Centers for Disease Control (CDC). This data
can be found at https://raw.githubusercontent.com/jakevdp/data-CDCbirths/master/
births.csv

In [10]:
# shell command to download the data:
# !curl -O
https://raw.githubusercontent.com/jakevdp/data-CDCbirths/master/births.csv
In [11]:
births = pd.read_csv('data/births.csv')

Taking a look at the data, we see that it's relatively simple–it contains the number of
births grouped by date and gender:

In [12]:
births.head()
Out[12]:
year monthdaygenderbirths

019691 1 F 4046

119691 1 M 4440

219691 2 F 4454

319691 2 M 4548

419691 3 F 4548
We can start to understand this data a bit more by using a pivot table. Let's add a
decade column, and take a look at male and female births as a function of decade:

In [13]:
births['decade'] = 10 * (births['year'] // 10)
births.pivot_table('births', index='decade', columns='gender', aggfunc='sum')
Out[13]:

genderF M

decade

1960 1753634 1846572

1970 1626307517121550

1980 1831035119243452

1990 1947945420420553

2000 1822930919106428
We immediately see that male births outnumber female births in every decade. To
see this trend a bit more clearly, we can use the built-in plotting tools in Pandas to
visualize the total number of births by year (see Introduction to Matplotlib for a
discussion of plotting with Matplotlib):

In [14]:
%matplotlib inline
import matplotlib.pyplot as plt
sns.set() # use Seaborn styles
births.pivot_table('births', index='year', columns='gender', aggfunc='sum').plot()
plt.ylabel('total births per year');
With a simple pivot table and plot() method, we can immediately see the annual
trend in births by gender. By eye, it appears that over the past 50 years male births
have outnumbered female births by around 5%.

Further data exploration

Though this doesn't necessarily relate to the pivot table, there are a few more
interesting features we can pull out of this dataset using the Pandas tools covered up
to this point. We must start by cleaning the data a bit, removing outliers caused by
mistyped dates (e.g., June 31st) or missing values (e.g., June 99th). One easy way
to remove these all at once is to cut outliers; we'll do this via a robust sigma-clipping
operation:

In [15]:
quartiles = np.percentile(births['births'], [25, 50, 75])
mu = quartiles[1]
sig = 0.74 * (quartiles[2] - quartiles[0])

This final line is a robust estimate of the sample mean, where the 0.74 comes from
the interquartile range of a Gaussian distribution (You can learn more about sigma-
clipping operations in a book I coauthored with Željko Ivezić, Andrew J. Connolly,
and Alexander Gray: "Statistics, Data Mining, and Machine Learning in
Astronomy" (Princeton University Press, 2014)).

With this we can use the query() method (discussed further in High-Performance
Pandas: eval() and query()) to filter-out rows with births outside these values:

In [16]:
births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)')
Next we set the day column to integers; previously it had been a string because
some columns in the dataset contained the value 'null':

In [17]:
# set 'day' column to integer; it originally was a string due to nulls
births['day'] = births['day'].astype(int)

Finally, we can combine the day, month, and year to create a Date index
(see Working with Time Series). This allows us to quickly compute the weekday
corresponding to each row:

In [18]:
# create a datetime index from the year, month, day
births.index = pd.to_datetime(10000 * births.year +
100 * births.month +
births.day, format='%Y%m%d')

births['dayofweek'] = births.index.dayofweek

Using this we can plot births by weekday for several decades:

In [19]:
import matplotlib.pyplot as plt
import matplotlib as mpl

births.pivot_table('births', index='dayofweek',
columns='decade', aggfunc='mean').plot()
plt.gca().set_xticklabels(['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun'])
plt.ylabel('mean births by day');

Apparently births are slightly less common on weekends than on weekdays! Note
that the 1990s and 2000s are missing because the CDC data contains only the
month of birth starting in 1989.

Another intersting view is to plot the mean number of births by the day of the year.
Let's first group the data by month and day separately:

In [20]:
births_by_date = births.pivot_table('births',
[births.index.month, births.index.day])
births_by_date.head()
Out[20]:
1 1 4009.225
2 4247.400
3 4500.900
4 4571.350
5 4603.625
Name: births, dtype: float64
The result is a multi-index over months and days. To make this easily plottable, let's
turn these months and days into a date by associating them with a dummy year
variable (making sure to choose a leap year so February 29th is correctly handled!)

In [21]:
births_by_date.index = [pd.datetime(2012, month, day)
for (month, day) in births_by_date.index]
births_by_date.head()
Out[21]:
2012-01-01 4009.225
2012-01-02 4247.400
2012-01-03 4500.900
2012-01-04 4571.350
2012-01-05 4603.625
Name: births, dtype: float64
Focusing on the month and day only, we now have a time series reflecting the
average number of births by date of the year. From this, we can use the plot method
to plot the data. It reveals some interesting trends:

In [22]:
# Plot the results
fig, ax = plt.subplots(figsize=(12, 4))
births_by_date.plot(ax=ax);

In particular, the striking feature of this graph is the dip in birthrate on US holidays
(e.g., Independence Day, Labor Day, Thanksgiving, Christmas, New Year's Day)
although this likely reflects trends in scheduled/induced births rather than some deep
psychosomatic effect on natural births.

Looking at this short example, you can see that many of the Python and Pandas
tools we've seen to this point can be combined and used to gain insight from a
variety of datasets. We will see some more sophisticated applications of these data
manipulations in future sections!

B "Hello, World!" Print (B (2:5) ) Llo
No ratings yet
B "Hello, World!" Print (B (2:5) ) Llo
52 pages
Lecture 14
No ratings yet
Lecture 14
33 pages
Python For DS Cheat Sheet
100% (2)
Python For DS Cheat Sheet
6 pages
Sa Lab Manual Final Merged
No ratings yet
Sa Lab Manual Final Merged
64 pages
EDA Module 3-1
No ratings yet
EDA Module 3-1
40 pages
Data Preprocessing
No ratings yet
Data Preprocessing
64 pages
Pandas 12 Pivot Table and Drop - A
No ratings yet
Pandas 12 Pivot Table and Drop - A
18 pages
Excel Pivot Tables - Basic Beginners Guide To Learn Excel Pivot Tables For Data Analysis and Modeling
No ratings yet
Excel Pivot Tables - Basic Beginners Guide To Learn Excel Pivot Tables For Data Analysis and Modeling
167 pages
Cee3804 Databases 2015
No ratings yet
Cee3804 Databases 2015
46 pages
Chapter-2 Python Pandas
100% (2)
Chapter-2 Python Pandas
33 pages
Data Analytics
No ratings yet
Data Analytics
16 pages
Python Data Cleaning
100% (1)
Python Data Cleaning
20 pages
Benton C.J.-excel Pivot Tables & Introduction To Dashboards. The Step-By-Step Guide-Amazon Digital Services (2017)
100% (2)
Benton C.J.-excel Pivot Tables & Introduction To Dashboards. The Step-By-Step Guide-Amazon Digital Services (2017)
135 pages
Pandas Cheat Sheet PDF
67% (3)
Pandas Cheat Sheet PDF
1 page
Python Libraries Cheat Sheets
No ratings yet
Python Libraries Cheat Sheets
6 pages
Data Science Unit 2 Second Half Notes
No ratings yet
Data Science Unit 2 Second Half Notes
18 pages
Exel 2016
100% (1)
Exel 2016
68 pages
Session 5
No ratings yet
Session 5
20 pages
Data Mining Lab Manaul
No ratings yet
Data Mining Lab Manaul
32 pages
Ariba Spend-Analysis-User-Guide PDF
0% (1)
Ariba Spend-Analysis-User-Guide PDF
136 pages
Pivot Tables
No ratings yet
Pivot Tables
9 pages
CS1010S Lecture 11 - Visualising Data
No ratings yet
CS1010S Lecture 11 - Visualising Data
68 pages
DV Unit - VI Notes
No ratings yet
DV Unit - VI Notes
68 pages
Data Mining - Week - 4
No ratings yet
Data Mining - Week - 4
8 pages
Excel 2013 Pivot Tables - Maayan Poleg
100% (6)
Excel 2013 Pivot Tables - Maayan Poleg
155 pages
Pandas - Data Manipulation and Analysis Library - Educative
No ratings yet
Pandas - Data Manipulation and Analysis Library - Educative
7 pages
Pivot N DF Pandas-II
No ratings yet
Pivot N DF Pandas-II
11 pages
Python Pandas
No ratings yet
Python Pandas
19 pages
Informatics Practices Class 12 Cbse Notes Data Handling
0% (1)
Informatics Practices Class 12 Cbse Notes Data Handling
17 pages
Pandas Plots
No ratings yet
Pandas Plots
14 pages
Textbook1-Learning To Use Microsoft Excel
No ratings yet
Textbook1-Learning To Use Microsoft Excel
518 pages
Session3 PDF
No ratings yet
Session3 PDF
82 pages
Python Pandas - II
No ratings yet
Python Pandas - II
28 pages
Pivot Table and Pivot Charts
No ratings yet
Pivot Table and Pivot Charts
2 pages
Pivottables and Pivotcharts
No ratings yet
Pivottables and Pivotcharts
22 pages
Excel 2016 Keyboard Shortcuts Cheat Sheet
50% (2)
Excel 2016 Keyboard Shortcuts Cheat Sheet
2 pages
Pivot Table ST
No ratings yet
Pivot Table ST
13 pages
Pandas Stands For "Python Data Analysis Library"
No ratings yet
Pandas Stands For "Python Data Analysis Library"
19 pages
Pivot Table Introduction
100% (1)
Pivot Table Introduction
13 pages
SQL To Pandas - Group Aggregations
No ratings yet
SQL To Pandas - Group Aggregations
6 pages
Unit 5 2
No ratings yet
Unit 5 2
6 pages
Python Cheat Sheet Code Academy
100% (1)
Python Cheat Sheet Code Academy
1 page
E Commerce Notes For Bca Students
No ratings yet
E Commerce Notes For Bca Students
88 pages
Unit 2
No ratings yet
Unit 2
22 pages
Define The Pivot Table and Pivot Chart
No ratings yet
Define The Pivot Table and Pivot Chart
27 pages
Expertexcel Pivot Tables A Step by Step Guide To Learn and Master Excel Pivot Tables 1724780212
100% (1)
Expertexcel Pivot Tables A Step by Step Guide To Learn and Master Excel Pivot Tables 1724780212
66 pages
Excel Shortcut Keys
No ratings yet
Excel Shortcut Keys
11 pages
Accounting Otm214
No ratings yet
Accounting Otm214
47 pages
Fair Dealing (Short Excerpt)
No ratings yet
Fair Dealing (Short Excerpt)
22 pages
Lesson 7 - Pivot - Tagged
No ratings yet
Lesson 7 - Pivot - Tagged
10 pages
TYIT-BI-Lab-Manual AY 23-24-1
No ratings yet
TYIT-BI-Lab-Manual AY 23-24-1
129 pages
Decision Tree On Classification Lab ML - Jupyter Notebook
No ratings yet
Decision Tree On Classification Lab ML - Jupyter Notebook
13 pages
MB 910
No ratings yet
MB 910
44 pages
Excel MCQ
No ratings yet
Excel MCQ
29 pages
Chikaa A2 TucsonData
No ratings yet
Chikaa A2 TucsonData
454 pages
Pivot Table Handout
No ratings yet
Pivot Table Handout
28 pages
4 PythonPandas
No ratings yet
4 PythonPandas
8 pages
Pivot Tables: Advance Excel
No ratings yet
Pivot Tables: Advance Excel
11 pages
PivotTable and Grouping For Data Analysis - Reading Material
No ratings yet
PivotTable and Grouping For Data Analysis - Reading Material
9 pages
Shahrukh - Data Analysis in Excel
No ratings yet
Shahrukh - Data Analysis in Excel
31 pages
Pivot Tables
No ratings yet
Pivot Tables
14 pages
DevOps Session 3 Pandas
No ratings yet
DevOps Session 3 Pandas
33 pages
MB-910 Exam July 2024, Q&A 101-150
No ratings yet
MB-910 Exam July 2024, Q&A 101-150
32 pages
Practice File
No ratings yet
Practice File
63 pages
Informatics Practices: Class XII (As Per CBSE Board)
No ratings yet
Informatics Practices: Class XII (As Per CBSE Board)
20 pages
Introduction To Pivot Table
No ratings yet
Introduction To Pivot Table
10 pages
Chapter 2 Advanced Operations On Dataframeseng
No ratings yet
Chapter 2 Advanced Operations On Dataframeseng
21 pages
NetSuite ReleaseNotes - 2024.2.0
No ratings yet
NetSuite ReleaseNotes - 2024.2.0
51 pages
Excel Study Material
No ratings yet
Excel Study Material
10 pages
BI Manual (E-Next - In)
No ratings yet
BI Manual (E-Next - In)
66 pages
Everything You Need For Clear and Efficient Data Visualization
No ratings yet
Everything You Need For Clear and Efficient Data Visualization
41 pages
Decision Tree Is An Upside
No ratings yet
Decision Tree Is An Upside
17 pages
Pivottable Lab Exercise
No ratings yet
Pivottable Lab Exercise
12 pages
Competancy Based Sample Questions Grade X
No ratings yet
Competancy Based Sample Questions Grade X
3 pages
It Front Page
No ratings yet
It Front Page
3 pages
Excel 2016 For Windows Pivot Tables - Tim Hill
95% (21)
Excel 2016 For Windows Pivot Tables - Tim Hill
68 pages
Pandas Dataframe Assignment No 3 - Answerkey
No ratings yet
Pandas Dataframe Assignment No 3 - Answerkey
10 pages
Bais and Variance
No ratings yet
Bais and Variance
4 pages
Matplotlib Legend
No ratings yet
Matplotlib Legend
2 pages
Pandas Cheat Sheet
No ratings yet
Pandas Cheat Sheet
2 pages
Linear Regression Mca Lab - Jupyter Notebook
No ratings yet
Linear Regression Mca Lab - Jupyter Notebook
2 pages
Excel Course Outline
No ratings yet
Excel Course Outline
27 pages
Time Tracking Template
No ratings yet
Time Tracking Template
5 pages
Simio 2
No ratings yet
Simio 2
21 pages
Syllabus Cochin Shipyard 2024
No ratings yet
Syllabus Cochin Shipyard 2024
5 pages
Advance Excel Course Outline
No ratings yet
Advance Excel Course Outline
2 pages
Mid Term Multiple Choice
No ratings yet
Mid Term Multiple Choice
5 pages
Excel 97 Glossary
No ratings yet
Excel 97 Glossary
5 pages
Quickly Publish "Live" Tables and Charts With SPSS Smart Viewer
No ratings yet
Quickly Publish "Live" Tables and Charts With SPSS Smart Viewer
2 pages
Preparing Journal Entries With Pivot Tables: Excel
No ratings yet
Preparing Journal Entries With Pivot Tables: Excel
2 pages
PowerBI 50 Interview Questions
100% (2)
PowerBI 50 Interview Questions
16 pages
The Practically Cheating Statistics Handbook TI-83 Companion Guide
From Everand
The Practically Cheating Statistics Handbook TI-83 Companion Guide
S. Deviant
3.5/5 (3)
EXCEL COURSE
From Everand
EXCEL COURSE
Robert Stetson
No ratings yet
Matrices with MATLAB (Taken from "MATLAB for Beginners: A Gentle Approach")
From Everand
Matrices with MATLAB (Taken from "MATLAB for Beginners: A Gentle Approach")
Peter Kattan
3/5 (4)
Advanced Lotto Rotation System
From Everand
Advanced Lotto Rotation System
Joseph Z. Vlasic
No ratings yet
The Practically Cheating Statistics Handbook, The Sequel! (2nd Edition)
From Everand
The Practically Cheating Statistics Handbook, The Sequel! (2nd Edition)
S. Deviant
4.5/5 (3)

Pivot Tables

Uploaded by

Pivot Tables

Uploaded by

Pivot Tables

How pivot tables work

4. Filters- Filters apply a calculation or restriction to the entire table.

Uses of a pivot table

 To calculate sums or averages in business situations. For example, counting

 To show totals as a percentage of a whole. For example, comparing sales for a

 To generate a list of unique values. For example, showing which states or

 To create a 2x2 table summary of a complex report.

 To identify the maximum and minimum values of a dataset.

 To query information directly from an online analytical processing (OLAP)

Motivating Pivot Tables

22. 7.250 Thir Na Southampto Fals

fema 38. 71.28 wom Fals

fema 26. 7.925 Thir wom Na Southampto

fema 35. 53.10 wom Southampto Fals

35. 8.050 Thir Na Southampto

Pivot Tables by Hand

class First Second Third

Pivot Table Syntax

class First Second Third

Multi-level pivot tables

class First Second Third

female(0, 18] 0.9090911.0000000.511628

male (0, 18] 0.8000000.6000000.215686

fare [0, 14.454] (14.454, 512.329]

class FirstSecond Third First Second Third

female(0, 18] NaN 1.0000000.7142860.9090911.0000000.318182

(18, 80]NaN 0.8800000.4444440.9729730.9142860.391304

male (0, 18] NaN 0.0000000.2608700.8000000.8181820.178571

(18, 80]0.0 0.0980390.1250000.3913040.0303030.192308

Additional pivot table options

# call signature as of Pandas 0.18

DataFrame.pivot_table(data, values=None, index=None, columns=None,

aggfunc='mean', fill_value=None, margins=False,

class First Second Third FirstSecondThird

female106.12579821.97012116.11881091.0 70.0 72.0

male 67.226127 19.74178212.66163345.0 17.0 47.0

class First Second Third All

Example: Birthrate Data

1960 1753634 1846572

Further data exploration

Using this we can plot births by weekday for several decades:

You might also like