IPython, pandas and matplotlib have a number of useful options you can use to make it easier to view and format your data. This notebook collects a bunch of them in one place. I hope this will be a useful reference.
The original blog posting is on http://pbpython.com/ipython-pandas-display-tips.html
First, do our standard pandas, numpy and matplotlib imports as well as configure inline displays of plots.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
One of the simple things we can do is override the default CSS to customize our DataFrame output.
This specific example is from - Brandon Rhodes' talk at pycon
For the purposes of the notebook, I'm defining CSS as a variable but you could easily read in from a file as well.
CSS = """
body {
margin: 0;
font-family: Helvetica;
}
table.dataframe {
border-collapse: collapse;
border: none;
}
table.dataframe tr {
border: none;
}
table.dataframe td, table.dataframe th {
margin: 0;
border: 1px solid white;
padding-left: 0.25em;
padding-right: 0.25em;
}
table.dataframe th:not(:empty) {
background-color: #fec;
text-align: left;
font-weight: normal;
}
table.dataframe tr:nth-child(2) th:empty {
border-left: none;
border-right: 1px dashed #888;
}
table.dataframe td {
border: 2px solid #ccf;
background-color: #f4f4ff;
}
"""
Now add this CSS into the current notebook's HTML.
from IPython.core.display import HTML
HTML('<style>{}</style>'.format(CSS))
SALES=pd.read_csv("../data/sample-sales-tax.csv", parse_dates=True)
SALES.head()
account number | name | sku | ... | Tax rate | Tax amount | date | |
---|---|---|---|---|---|---|---|
0 | 296809 | Carroll PLC | QN-82852 | ... | $0.07 | $40.48 | 2014-09-27 07:13:03 |
1 | 98022 | Heidenreich-Bosco | MJ-21460 | ... | $0.07 | $71.31 | 2014-07-29 02:10:44 |
... | ... | ... | ... | ... | ... | ... | ... |
3 | 93356 | Waters-Walker | AS-93055 | ... | $0.07 | $28.94 | 2013-11-17 20:41:11 |
4 | 659366 | Waelchi-Fahey | AS-93055 | ... | $0.07 | $125.55 | 2014-01-03 08:14:27 |
5 rows × 10 columns
You can see how the CSS is now applied to the DataFrame and how you could easily modify it to customize it to your liking.
Jupyter notebooks do a good job of automatically displaying information but sometimes you want to force data to display. Fortunately, ipython provides and option. This is especially useful if you want to display multiple dataframes.
from IPython.display import display
display(SALES.head(2))
display(SALES.tail(2))
display(SALES.describe())
account number | name | sku | ... | Tax rate | Tax amount | date | |
---|---|---|---|---|---|---|---|
0 | 296809 | Carroll PLC | QN-82852 | ... | $0.07 | $40.48 | 2014-09-27 07:13:03 |
1 | 98022 | Heidenreich-Bosco | MJ-21460 | ... | $0.07 | $71.31 | 2014-07-29 02:10:44 |
2 rows × 10 columns
account number | name | sku | ... | Tax rate | Tax amount | date | |
---|---|---|---|---|---|---|---|
998 | 304860 | Huel-Haag | LL-46261 | ... | $0.07 | $61.88 | 2014-07-26 01:10:57 |
999 | 98022 | Heidenreich-Bosco | LW-86841 | ... | $0.07 | $73.33 | 2014-06-27 05:58:33 |
2 rows × 10 columns
account number | quantity | unit price | ext price | Tax rate | Tax amount | |
---|---|---|---|---|---|---|
count | $1000.00 | $1000.00 | $1000.00 | $1000.00 | $1000.00 | $1000.00 |
mean | $535208.90 | $10.33 | $56.18 | $579.84 | $0.04 | $20.06 |
... | ... | ... | ... | ... | ... | ... |
75% | $750461.00 | $15.00 | $76.80 | $878.14 | $0.04 | $26.30 |
max | $995267.00 | $20.00 | $99.97 | $1994.80 | $0.15 | $191.25 |
8 rows × 6 columns
Pandas has many different options to control how data is displayed.
You can use max_rows to control how many rows are displayed
pd.set_option("display.max_rows",4)
SALES
account number | name | sku | ... | Tax rate | Tax amount | date | |
---|---|---|---|---|---|---|---|
0 | 296809 | Carroll PLC | QN-82852 | ... | $0.07 | $40.48 | 2014-09-27 07:13:03 |
1 | 98022 | Heidenreich-Bosco | MJ-21460 | ... | $0.07 | $71.31 | 2014-07-29 02:10:44 |
... | ... | ... | ... | ... | ... | ... | ... |
998 | 304860 | Huel-Haag | LL-46261 | ... | $0.07 | $61.88 | 2014-07-26 01:10:57 |
999 | 98022 | Heidenreich-Bosco | LW-86841 | ... | $0.07 | $73.33 | 2014-06-27 05:58:33 |
1000 rows × 10 columns
Depending on the data set, you may only want to display a smaller number of columns.
pd.set_option("display.max_columns",6)
SALES
account number | name | sku | ... | Tax rate | Tax amount | date | |
---|---|---|---|---|---|---|---|
0 | 296809 | Carroll PLC | QN-82852 | ... | $0.07 | $40.48 | 2014-09-27 07:13:03 |
1 | 98022 | Heidenreich-Bosco | MJ-21460 | ... | $0.07 | $71.31 | 2014-07-29 02:10:44 |
... | ... | ... | ... | ... | ... | ... | ... |
998 | 304860 | Huel-Haag | LL-46261 | ... | $0.07 | $61.88 | 2014-07-26 01:10:57 |
999 | 98022 | Heidenreich-Bosco | LW-86841 | ... | $0.07 | $73.33 | 2014-06-27 05:58:33 |
1000 rows × 10 columns
You can control how many decimal points of precision to display
pd.set_option('precision',2)
SALES
account number | name | sku | ... | Tax rate | Tax amount | date | |
---|---|---|---|---|---|---|---|
0 | 296809 | Carroll PLC | QN-82852 | ... | $0.07 | $40.48 | 2014-09-27 07:13:03 |
1 | 98022 | Heidenreich-Bosco | MJ-21460 | ... | $0.07 | $71.31 | 2014-07-29 02:10:44 |
... | ... | ... | ... | ... | ... | ... | ... |
998 | 304860 | Huel-Haag | LL-46261 | ... | $0.07 | $61.88 | 2014-07-26 01:10:57 |
999 | 98022 | Heidenreich-Bosco | LW-86841 | ... | $0.07 | $73.33 | 2014-06-27 05:58:33 |
1000 rows × 10 columns
pd.set_option('precision',7)
SALES
account number | name | sku | ... | Tax rate | Tax amount | date | |
---|---|---|---|---|---|---|---|
0 | 296809 | Carroll PLC | QN-82852 | ... | $0.07 | $40.48 | 2014-09-27 07:13:03 |
1 | 98022 | Heidenreich-Bosco | MJ-21460 | ... | $0.07 | $71.31 | 2014-07-29 02:10:44 |
... | ... | ... | ... | ... | ... | ... | ... |
998 | 304860 | Huel-Haag | LL-46261 | ... | $0.07 | $61.88 | 2014-07-26 01:10:57 |
999 | 98022 | Heidenreich-Bosco | LW-86841 | ... | $0.07 | $73.33 | 2014-06-27 05:58:33 |
1000 rows × 10 columns
You can also format floating point numbers using float_format
pd.set_option('float_format', '{:.2f}'.format)
SALES
account number | name | sku | ... | Tax rate | Tax amount | date | |
---|---|---|---|---|---|---|---|
0 | 296809 | Carroll PLC | QN-82852 | ... | 0.07 | 40.48 | 2014-09-27 07:13:03 |
1 | 98022 | Heidenreich-Bosco | MJ-21460 | ... | 0.07 | 71.31 | 2014-07-29 02:10:44 |
... | ... | ... | ... | ... | ... | ... | ... |
998 | 304860 | Huel-Haag | LL-46261 | ... | 0.07 | 61.88 | 2014-07-26 01:10:57 |
999 | 98022 | Heidenreich-Bosco | LW-86841 | ... | 0.07 | 73.33 | 2014-06-27 05:58:33 |
1000 rows × 10 columns
This does apply to all the data. In our example, applying dollar signs to everything would not be correct for this example.
pd.set_option('float_format', '${:.2f}'.format)
SALES
account number | name | sku | ... | Tax rate | Tax amount | date | |
---|---|---|---|---|---|---|---|
0 | 296809 | Carroll PLC | QN-82852 | ... | $0.07 | $40.48 | 2014-09-27 07:13:03 |
1 | 98022 | Heidenreich-Bosco | MJ-21460 | ... | $0.07 | $71.31 | 2014-07-29 02:10:44 |
... | ... | ... | ... | ... | ... | ... | ... |
998 | 304860 | Huel-Haag | LL-46261 | ... | $0.07 | $61.88 | 2014-07-26 01:10:57 |
999 | 98022 | Heidenreich-Bosco | LW-86841 | ... | $0.07 | $73.33 | 2014-06-27 05:58:33 |
1000 rows × 10 columns
Qtopian has a useful plugin called qgrid - https://github.com/quantopian/qgrid
Import it and install it.
import qgrid
qgrid.nbinstall(overwrite=True)
Showing the data is straighforward.
qgrid.show_grid(SALES, remote_js=True)
Widget Javascript not detected. It may not be installed or enabled properly.
The plugin is very similar to the capability of an Excel autofilter. It can be handy to quickly filter and sort your data.
I have mentioned before how the default pandas plots don't look so great. Fortunately, there are style sheets in matplotlib which go a long way towards improving the visualization of your data.
Here is a simple plot with the default values.
SALES.groupby('name')['quantity'].sum().plot(kind="bar")
<matplotlib.axes._subplots.AxesSubplot at 0x7ff1609ac7d0>
We can use some of the matplolib styles available to us to make this look better. http://matplotlib.org/users/style_sheets.html
plt.style.use('ggplot')
SALES.groupby('name')['quantity'].sum().plot(kind="bar")
<matplotlib.axes._subplots.AxesSubplot at 0x7ff159347a90>
You can see all the styles available
plt.style.available
[u'dark_background', u'bmh', u'grayscale', u'ggplot', u'fivethirtyeight']
plt.style.use('bmh')
SALES.groupby('name')['quantity'].sum().plot(kind="bar")
<matplotlib.axes._subplots.AxesSubplot at 0x7ff15919c410>
plt.style.use('fivethirtyeight')
SALES.groupby('name')['quantity'].sum().plot(kind="bar")
<matplotlib.axes._subplots.AxesSubplot at 0x7ff1537426d0>
Each of the different styles have subtle (and not so subtle) changes. Fortunately it is easy to experiment with them and your own plots.
You can find other articles at Practical Business Python
This notebook is referenced in the following post - http://pbpython.com/ipython-pandas-display-tips.html