Data Visualization II
Part II
First, Read Data from a CSV File
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Jupyter/IPython only: render plots inside the notebook
%matplotlib inline
sales = pd.read_csv(
    "https://raw.githubusercontent.com/GerhardTrippen/DataSets/master/sample-salesv2.csv",
    parse_dates=['date'])
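sales.head()                    # first five rows
sales.dtypes                    # column types ('date' should be datetime64)
sales.describe()                # summary statistics of the numeric columns
sales['unit price'].describe()  # summary of a single column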
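`parse_dates=['date']` converts the `date` column into a datetime type while the file is read, which the later resampling step relies on. A minimal sketch with a small made-up CSV held in memory (the rows are hypothetical; only a subset of the column names used on these slides is kept):

```python
import io

import pandas as pd

# Hypothetical rows standing in for sample-salesv2.csv
csv_text = """name,category,quantity,unit price,ext price,date
Acme,Belt,2,10.0,20.0,2014-01-05 10:00:00
Bolt,Shoes,1,55.5,55.5,2014-02-11 14:30:00
"""

raw = pd.read_csv(io.StringIO(csv_text))
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

print(raw['date'].dtype)     # strings (object or str dtype, depending on the pandas version)
print(parsed['date'].dtype)  # a datetime64 dtype: real timestamps
print(parsed['date'].dt.month.tolist())  # datetime accessors now work
```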
Customers
customers = sales[['name','ext price','date']]
customers.head()
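customer_group = customers.groupby('name')
customer_group.size()   # number of rows (purchases) per customer
# Sum only 'ext price': pandas 2.0+ can raise an error when asked to sum 'date'
sales_totals = customer_group[['ext price']].sum()
sales_totals.sort_values('ext price').head()
my_plot = sales_totals.plot(kind='bar')
# identical to the line above
my_plot = sales_totals.plot.bar()
# horizontal bars
my_plot = sales_totals.plot(kind='barh')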
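Summing a group that still contains the `date` column can raise a TypeError in pandas 2.0 and later, because `GroupBy.sum()` no longer silently drops columns it cannot add. A minimal sketch of the per-customer totals with made-up rows, selecting `ext price` before summing:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; skip in a notebook

import pandas as pd

# Hypothetical sales rows (names and amounts are made up)
sales = pd.DataFrame({
    'name': ['Acme', 'Bolt', 'Acme', 'Cog', 'Bolt'],
    'ext price': [100.0, 40.0, 25.0, 60.0, 10.0],
    'date': pd.to_datetime(['2014-01-05', '2014-01-09', '2014-02-01',
                            '2014-02-14', '2014-03-02']),
})

customer_group = sales[['name', 'ext price', 'date']].groupby('name')
print(customer_group.size())  # rows per customer

# Selecting the numeric column first keeps 'date' out of sum()
sales_totals = customer_group[['ext price']].sum()
print(sales_totals.sort_values('ext price'))

ax = sales_totals.plot.bar()  # same chart as sales_totals.plot(kind='bar')
```

When several numeric columns should be kept, `customer_group.sum(numeric_only=True)` is another option.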
Customers – Title and Labels
my_plot = sales_totals.sort_values('ext price',
ascending=False).plot(kind='bar', legend=None,
title="Total Sales by Customer")
my_plot.set_xlabel("Customers")
my_plot.set_ylabel("Sales ($)")
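`DataFrame.plot` returns a matplotlib `Axes`, so the title and axis labels can be read back as well as set. A small sketch with made-up totals; `legend=False` is used here as the boolean spelling of the `legend=None` above (both suppress the legend):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; skip in a notebook

import pandas as pd

# Hypothetical per-customer totals (made-up numbers)
sales_totals = pd.DataFrame({'ext price': [50.0, 125.0, 60.0]},
                            index=pd.Index(['Bolt', 'Acme', 'Cog'], name='name'))

ordered = sales_totals.sort_values('ext price', ascending=False)
ax = ordered.plot(kind='bar', legend=False, title="Total Sales by Customer")
ax.set_xlabel("Customers")
ax.set_ylabel("Sales ($)")

# The returned Axes can be queried as well as configured
print(ax.get_title(), "|", ax.get_xlabel(), "|", ax.get_ylabel())
```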
Customers with Product Category
customers = sales[['name', 'category', 'ext price',
'date']]
customers.head()
# Total 'ext price' for every (customer, category) pair
category_group = customers.groupby(['name', 'category'])[['ext price']].sum()
category_group.head(10)
category_group = category_group.unstack()
category_group.head(10)
my_plot = category_group.plot(kind='bar', stacked=True,
title="Total Sales by Customer")
my_plot.set_xlabel("Customers")
my_plot.set_ylabel("Sales ($)")
my_plot.legend(["Belts","Shirts","Shoes"], loc='best',
ncol=3)
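`unstack()` moves the inner index level (`category`) into the columns, so each customer becomes one row and each category one stacked bar segment. A sketch with made-up rows; `fill_value=0` is an addition not used on the slide (without it, customer/category pairs with no sales show up as NaN). The column order is alphabetical, which is the order the manual legend labels have to follow:

```python
import pandas as pd

# Hypothetical rows: Cog never bought a belt or a shirt
customers = pd.DataFrame({
    'name':      ['Acme', 'Acme', 'Bolt', 'Bolt', 'Cog'],
    'category':  ['Belt', 'Shoes', 'Belt', 'Shirt', 'Shoes'],
    'ext price': [20.0, 80.0, 15.0, 30.0, 60.0],
})

# Long form: one row per (name, category) pair
long_form = customers.groupby(['name', 'category'])[['ext price']].sum()

# Wide form: categories become columns; missing pairs are filled with 0
category_group = long_form.unstack(fill_value=0)
print(category_group)

# Two-level column labels, e.g. ('ext price', 'Belt')
print(category_group.columns.tolist())
# Category order that a manual legend list must match
print(category_group.columns.get_level_values(1).tolist())
```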
Customers with Product Category – Sorted!
category_group = category_group.sort_values(('ext price', 'Belt'), ascending=False)
category_group.head()
my_plot = category_group.plot(kind='bar', stacked=True,
title="Total Sales by Customer")
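Sorting by `('ext price', 'Belt')` orders the bars by one segment only. If the goal is to order by the full bar height, one option is to sort by the row total across all categories. A sketch on a made-up table shaped like the unstacked `category_group`:

```python
import pandas as pd

# Hypothetical wide table shaped like the unstacked category_group
cols = pd.MultiIndex.from_product([['ext price'], ['Belt', 'Shirt', 'Shoes']])
category_group = pd.DataFrame(
    [[20.0, 0.0, 80.0], [15.0, 30.0, 0.0], [0.0, 0.0, 60.0]],
    index=['Acme', 'Bolt', 'Cog'], columns=cols)

# Sort by one segment: a MultiIndex column is addressed with a tuple
by_belt = category_group.sort_values(('ext price', 'Belt'), ascending=False)

# Sort by the full bar height instead: the row total across all categories
totals = category_group.sum(axis=1)
by_total = category_group.loc[totals.sort_values(ascending=False).index]

print(by_belt.index.tolist())
print(by_total.index.tolist())
```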
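Purchase Patterns
purchase_patterns = sales[['ext price', 'date']]
# Distribution of the per-row sales amounts
purchase_plot = purchase_patterns['ext price'].hist(bins=20)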
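`hist(bins=20)` splits the range from the smallest to the largest value into 20 equal-width bins and counts the values in each. `np.histogram` performs the same binning explicitly, which makes the effect of `bins` easy to inspect. A sketch with made-up amounts:

```python
import numpy as np
import pandas as pd

# Hypothetical 'ext price' values
ext_price = pd.Series([12.0, 15.0, 18.0, 40.0, 42.0, 95.0, 100.0, 110.0])

# Same equal-width binning that the histogram plot draws
counts, edges = np.histogram(ext_price, bins=20)

print(len(counts), len(edges))  # 20 bins need 21 edges
print(edges[1] - edges[0])      # bin width = (max - min) / 20
print(counts.sum())             # every value falls in exactly one bin
```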
Purchase Patterns – Timeline
purchase_patterns = purchase_patterns.set_index('date')
purchase_patterns.head()
# sorted by time
purchase_patterns = purchase_patterns.sort_index()
# resampled by months: total sales per calendar month
# ('M' is spelled 'ME' from pandas 2.2 on)
purchase_plot = purchase_patterns.resample('M').sum().plot(
    title="Total Sales by Month", legend=None)
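`resample` needs a datetime index, which is why `date` is moved into the index first; it then collapses all rows in each calendar month into one row. A sketch with made-up orders. It uses the alias `'MS'` (month start), which is accepted by both older and newer pandas; the slide's `'M'` labels each month by its last day instead, but the monthly totals are the same:

```python
import pandas as pd

# Hypothetical timestamped orders, deliberately out of order
purchase_patterns = pd.DataFrame({
    'ext price': [25.0, 100.0, 10.0, 50.0],
    'date': pd.to_datetime(['2014-02-03', '2014-01-15',
                            '2014-03-10', '2014-01-20']),
})

purchase_patterns = purchase_patterns.set_index('date').sort_index()

# One row per calendar month holding that month's total,
# labelled by the first day of the month
monthly = purchase_patterns.resample('MS').sum()
print(monthly)
```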
Boxplots …
# Box and Whisker Plots
sales.boxplot(figsize=(14,10))  # Not very useful: columns on very different scales share one y-axis
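One way to make a whole-frame box plot readable, sketched here as an assumption rather than the slides' method, is to z-score each column first so all boxes share a comparable scale. The column names echo the sales data but the numbers are made up:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; skip in a notebook

import pandas as pd

# Hypothetical numeric columns on very different scales
df = pd.DataFrame({
    'account number': [740150, 714466, 218895, 307599, 412290],
    'quantity': [39, 12, 4, 21, 7],
    'unit price': [86.69, 63.66, 65.91, 22.96, 40.00],
})

# z-score: subtract each column's mean, divide by its standard deviation
standardized = (df - df.mean()) / df.std()

ax = standardized.boxplot(figsize=(8, 5))
ax.set_ylabel("Standard deviations from the column mean")
```

The other common fix is to plot only columns that share a unit, e.g. `df.boxplot(column=['quantity'])`.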
print(visitors.shape)
print(visitors.head())
print(visitors.dtypes)
Histograms, Density Plots, Box and Whisker Plots
# Univariate Histograms
visitors.hist()
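The box and whisker variant can be drawn with one panel per column, and the box itself is just the quartiles: it spans the first to third quartile with a line at the median, and by default the whiskers reach the furthest points within 1.5 × IQR of the box. A sketch on a made-up stand-in for `visitors` (its real columns are not shown here). Density plots work the same way with `kind='density'`, but pandas needs SciPy installed for them, so that call is left out:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; skip in a notebook

import pandas as pd

# Hypothetical stand-in for the visitors DataFrame (columns are made up)
visitors = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [10, 20, 30, 40, 50]})

visitors.hist()  # one histogram per numeric column

# Box and whisker plots: one panel per column, each with its own y-axis
visitors.plot(kind='box', subplots=True, layout=(1, 2),
              sharex=False, sharey=False)

# The numbers each box is drawn from: Q1, median, Q3
quartiles = visitors.quantile([0.25, 0.5, 0.75])
print(quartiles)
```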
Correlation Matrix Plot
# correlation matrix
correlations = visitors.corr()
# plot correlation matrix (generic)
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(correlations, vmin=-1, vmax=1)
fig.colorbar(cax)
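The generic plot above leaves the matrix cells unlabeled. One way to attach the column names, sketched on made-up data where the expected correlations are known (+1 and -1), is to set one tick per variable. `vmin=-1, vmax=1` pins the colour scale to the full range a correlation can take:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; skip in a notebook

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: y rises exactly with x, z falls exactly with x
x = np.arange(10, dtype=float)
visitors = pd.DataFrame({'x': x, 'y': 2 * x + 1, 'z': -x})

correlations = visitors.corr()  # Pearson correlation by default
names = correlations.columns.tolist()

fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(correlations, vmin=-1, vmax=1)
fig.colorbar(cax)

# Label each row/column of the matrix with its variable name
ticks = np.arange(len(names))
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_xticklabels(names)
ax.set_yticklabels(names)
```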
Additional Readings
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney (pub. yr. 2017). Chapters 9 and 10.
Machine Learning Mastery with Python by Jason Brownlee (pub. yr. 2017). Chapter 6.
https://github.com/chris1610/pbpython/blob/master/notebooks/Simple_Graphing.ipynb
http://pbpython.com/simple-graphing-pandas.html
https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/
https://stackoverflow.com/questions/43832311/how-to-plot-by-category-over-time
https://www.earthdatascience.org/courses/use-data-open-source-python/use-time-series-data-in-python/date-time-types-in-pandas-python/resample-time-series-data-pandas-python/
https://stackoverflow.com/questions/22642511/change-y-range-to-start-from-0-with-matplotlib
Additional Readings (cont'd)
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html
http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.boxplot.html
http://pandas.pydata.org/pandas-docs/version/0.20.3/generated/pandas.DataFrame.hist.html
https://matplotlib.org/
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html
DataCamp:
– Course: Intermediate Python for Data Science
» Chapter: Matplotlib
– Introduction to Data Visualization with Python
» Chapter: Customizing Plots