Data Visualization With Python - Matplotlib and Seaborn

The document provides a comprehensive guide on data visualization using Python's Matplotlib and Seaborn, covering various types of plots including line plots, scatter plots, pie charts, histograms, and subplots. Each section includes code snippets for plotting stock data, along with mini challenges for users to practice their skills. The document emphasizes the importance of visual clarity and customization in data representation.

Data Visualization with Python - Matplotlib and Seaborn

May 31, 2025

1 1. PLOT BASIC LINE PLOT


[ ]: from jupyterthemes import jtplot
jtplot.style(theme = 'monokai', context = 'notebook', ticks = True, grid = False)

# Setting the style of the notebook to the monokai theme.
# This line of code is important to ensure that we are able to see the x and y axes clearly.
# If you don't run this code line, the xlabel and ylabel on any plot will be black on black and hard to see.

[1]: import numpy as np


import matplotlib.pyplot as plt
import pandas as pd

[3]: # Read the stock prices data using pandas


stock_df = pd.read_csv('stock_data.csv')
stock_df

[3]: Date FB TWTR NFLX


0 2013-11-07 47.560001 44.900002 46.694286
1 2013-11-08 47.529999 41.650002 47.842857
2 2013-11-11 46.200001 42.900002 48.272858
3 2013-11-12 46.610001 41.900002 47.675713
4 2013-11-13 48.709999 42.599998 47.897144
… … … … …
1707 2020-08-20 269.010010 38.959999 497.899994
1708 2020-08-21 267.010010 39.259998 492.309998
1709 2020-08-24 271.390015 40.490002 488.809998
1710 2020-08-25 280.820007 40.549999 490.579987
1711 2020-08-26 303.910004 41.080002 547.530029

[1712 rows x 4 columns]

[4]: stock_df.plot(x = 'Date', y = 'FB', label = 'Facebook Stock Price', figsize = (15, 10), linewidth = 3)

plt.ylabel('Price [$]')
plt.title('My first plotting exercise')
# plt.legend(loc = 'upper right')
plt.grid()
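Inline rendering aside, matplotlib can also write the figure to disk with savefig. A minimal sketch, assuming synthetic stand-in data (the filename, DPI, and the fabricated price series are illustrative only, since stock_data.csv may not be available outside the course environment):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, safe outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Synthetic stand-in for the FB price column
df = pd.DataFrame({'Date': pd.date_range('2013-11-07', periods=100),
                   'FB': np.linspace(47, 60, 100)})

df.plot(x='Date', y='FB', label='Facebook Stock Price', linewidth=3)
plt.ylabel('Price [$]')
plt.grid()
plt.savefig('fb_price.png', dpi=150, bbox_inches='tight')  # write the figure as a PNG
```

bbox_inches='tight' trims surrounding whitespace so labels are not clipped.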

MINI CHALLENGE #1: - Plot a similar graph for NFLX - Change the line color to red and increase the line width

[5]: stock_df.plot(x = 'Date', y = 'NFLX', label = 'Netflix Stock Price', figsize = (15, 10), linewidth = 5, color = 'red')

plt.ylabel('Price [$]')
plt.title('My first plotting exercise')
# plt.legend(loc = 'upper right')
plt.grid()

[6]: stock_df.plot(x = 'Date', y = 'TWTR', label = 'Twitter Stock Price', figsize = (15, 10), linewidth = 5, color = 'green')

plt.ylabel('Price [$]')
plt.title('My first plotting exercise')
# plt.legend(loc = 'upper right')
plt.grid()

2 2. PLOT SCATTERPLOT
[7]: # Read daily return data using pandas
daily_return_df = pd.read_csv('stocks_daily_returns.csv')
daily_return_df

[7]: Date FB TWTR NFLX


0 2013-11-07 0.000000 0.000000 0.000000
1 2013-11-08 -0.063082 -7.238307 2.459768
2 2013-11-11 -2.798229 3.001200 0.898778
3 2013-11-12 0.887446 -2.331002 -1.237020
4 2013-11-13 4.505467 1.670635 0.464452
… … … … …
1707 2020-08-20 2.444881 0.179995 2.759374
1708 2020-08-21 -0.743467 0.770018 -1.122715
1709 2020-08-24 1.640390 3.132970 -0.710934
1710 2020-08-25 3.474701 0.148177 0.362102
1711 2020-08-26 8.222348 1.307036 11.608717

[1712 rows x 4 columns]

[9]: X = daily_return_df['FB']
X

[9]: 0 0.000000
1 -0.063082
2 -2.798229
3 0.887446
4 4.505467

1707 2.444881
1708 -0.743467
1709 1.640390
1710 3.474701
1711 8.222348
Name: FB, Length: 1712, dtype: float64

[10]: Y = daily_return_df['TWTR']
Y

[10]: 0 0.000000
1 -7.238307
2 3.001200
3 -2.331002
4 1.670635

1707 0.179995
1708 0.770018
1709 3.132970
1710 0.148177
1711 1.307036
Name: TWTR, Length: 1712, dtype: float64

[11]: plt.figure(figsize =(15, 10))


plt.scatter(X, Y)
plt.grid()

MINI CHALLENGE #2: - Plot a similar graph for Facebook and Netflix
[8]: X = daily_return_df['FB']
X

[8]: 0 0.000000
1 -0.063082
2 -2.798229
3 0.887446
4 4.505467

1707 2.444881
1708 -0.743467
1709 1.640390
1710 3.474701
1711 8.222348
Name: FB, Length: 1712, dtype: float64

[9]: Y = daily_return_df['NFLX']
Y

[9]: 0 0.000000
1 2.459768
2 0.898778
3 -1.237020
4 0.464452

1707 2.759374
1708 -1.122715
1709 -0.710934
1710 0.362102
1711 11.608717
Name: NFLX, Length: 1712, dtype: float64

[12]: plt.scatter(X, Y, color = 'purple', marker = 'x', s = 100)

[12]: <matplotlib.collections.PathCollection at 0x7abd4612f100>
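The scatter plots give a visual sense of how two return series move together; np.corrcoef turns that into a single number. A sketch on synthetic data (the fabricated x and y stand in for two daily-return columns such as FB and NFLX):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)              # stand-in for one stock's daily returns
y = 0.5 * x + rng.normal(size=500)    # a second series, partly driven by the first

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# Pearson correlation between the two series.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 2))
```

A value near +1 means the cloud of points hugs an upward-sloping line; near 0 means no linear relationship.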

3 3. PLOT PIE CHART


[13]: values = [20, 55, 5, 17, 3]
colors = ['g', 'r', 'y', 'b', 'm']
labels =['AAPL', 'GOOG', 'T', 'TSLA', 'AMZN']
explode = [0, 0.2, 0, 0, 0.2]

# Use matplotlib to plot a pie chart
plt.figure(figsize = (10, 10))
plt.pie(values, colors = colors, labels = labels, explode = explode)
plt.title('STOCK PORTFOLIO')

[13]: Text(0.5, 1.0, 'STOCK PORTFOLIO')
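One refinement not used above: plt.pie accepts an autopct argument that prints each slice's share of the total as a percentage label. A sketch with the same portfolio values:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import matplotlib.pyplot as plt

values = [20, 55, 5, 17, 3]
labels = ['AAPL', 'GOOG', 'T', 'TSLA', 'AMZN']

plt.figure(figsize=(10, 10))
# autopct formats each wedge's percentage of the total; with autopct set,
# plt.pie returns a third list holding those percentage text objects.
wedges, texts, autotexts = plt.pie(values, labels=labels, autopct='%1.1f%%')
plt.title('STOCK PORTFOLIO')
```

Since the values sum to 100, the first wedge is labeled "20.0%".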

MINI CHALLENGE #3: - Plot the pie chart for the same stocks assuming equal allocation -
Explode Amazon and Google slices
[5]: values = [20, 20, 20, 20, 20]
colors = ['g', 'r', 'y', 'b', 'm']
labels =['AAPL', 'GOOG', 'T', 'TSLA', 'AMZN']
explode = [0, 0.2, 0, 0, 0.2]

# Use matplotlib to plot a pie chart

plt.figure(figsize = (10, 10))
plt.pie(values, colors = colors, labels = labels, explode = explode)
plt.title('STOCK PORTFOLIO')

[5]: Text(0.5, 1.0, 'STOCK PORTFOLIO')

4 4. PLOT HISTOGRAMS
[14]: daily_return_df = pd.read_csv('stocks_daily_returns.csv')
daily_return_df

[14]: Date FB TWTR NFLX


0 2013-11-07 0.000000 0.000000 0.000000
1 2013-11-08 -0.063082 -7.238307 2.459768
2 2013-11-11 -2.798229 3.001200 0.898778
3 2013-11-12 0.887446 -2.331002 -1.237020
4 2013-11-13 4.505467 1.670635 0.464452
… … … … …
1707 2020-08-20 2.444881 0.179995 2.759374
1708 2020-08-21 -0.743467 0.770018 -1.122715
1709 2020-08-24 1.640390 3.132970 -0.710934
1710 2020-08-25 3.474701 0.148177 0.362102
1711 2020-08-26 8.222348 1.307036 11.608717

[1712 rows x 4 columns]

[16]: # A histogram represents data using bars of various heights.
# Each bar groups numbers into specific ranges.
# Taller bars show that more data falls within that specific range.

new_equals = daily_return_df['FB'].mean()
sigma = daily_return_df['FB'].std()

plt.figure(figsize = (15, 9))
plt.hist(daily_return_df['FB'], bins = 40, color = 'blue', edgecolor = 'black')
plt.grid()

plt.title('Histogram:new_equals =' + str(new_equals) + ', sigma = ' + str(sigma))

[16]: Text(0.5, 1.0, 'Histogram:new_equals =0.12914075422941618, sigma = 2.0337648345424886')
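Since the cell computes the mean (new_equals) and standard deviation (sigma), those two numbers fully specify a normal curve that can be drawn over the histogram for comparison. A sketch on synthetic returns (fabricated data stands in for the FB column; density=True is an addition that puts the bars on the same scale as a probability density):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
returns = rng.normal(loc=0.13, scale=2.0, size=1712)  # stand-in for FB daily returns

mu = returns.mean()
sigma = returns.std()

plt.figure(figsize=(15, 9))
# density=True rescales bar heights so the histogram integrates to 1
plt.hist(returns, bins=40, density=True, edgecolor='black')

# Normal probability density with the same mean and standard deviation
x = np.linspace(returns.min(), returns.max(), 200)
pdf = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
plt.plot(x, pdf, linewidth=3)
plt.title(f'mu = {mu:.3f}, sigma = {sigma:.3f}')
```

If the bars track the curve closely, the returns are approximately normally distributed.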

MINI CHALLENGE #4: - Plot the histogram for TWITTER returns using 30 bins
[19]: new_equals = daily_return_df['TWTR'].mean()
sigma = daily_return_df['TWTR'].std()

plt.figure(figsize = (15, 9))


plt.hist(daily_return_df['TWTR'], bins = 30, color = 'red', edgecolor= 'black' )
plt.grid()

plt.title('Histogram:new_equals =' + str(new_equals) + ', sigma = ' + str(sigma))

[19]: Text(0.5, 1.0, 'Histogram:new_equals =0.053982182398835316, sigma = 3.4121023815827907')

5 5. PLOT MULTIPLE PLOTS
[22]: stock_df

[22]: Date FB TWTR NFLX


0 2013-11-07 47.560001 44.900002 46.694286
1 2013-11-08 47.529999 41.650002 47.842857
2 2013-11-11 46.200001 42.900002 48.272858
3 2013-11-12 46.610001 41.900002 47.675713
4 2013-11-13 48.709999 42.599998 47.897144
… … … … …
1707 2020-08-20 269.010010 38.959999 497.899994
1708 2020-08-21 267.010010 39.259998 492.309998
1709 2020-08-24 271.390015 40.490002 488.809998
1710 2020-08-25 280.820007 40.549999 490.579987
1711 2020-08-26 303.910004 41.080002 547.530029

[1712 rows x 4 columns]

[34]: stock_df.plot(x = 'Date', y = ['NFLX', 'FB'], figsize = (18, 10), linewidth = 4)


plt.ylabel('Price')
plt.title('Stock Price')
plt.grid()

MINI CHALLENGE #5: - Plot a similar graph containing prices of Netflix, Twitter and Facebook - Add a legend indicating all the stocks - Place the legend in the “upper center” location
[35]: stock_df.plot(x = 'Date', y = ['NFLX', 'TWTR', 'FB'], figsize = (18, 10), linewidth = 4)

plt.ylabel('Price [$]')
plt.title('Stock Prices')
plt.legend(loc = 'upper center')
plt.grid()

6 6. PLOT SUBPLOTS
[38]: plt.figure(figsize= (20, 10))

plt.subplot(1, 2, 1)
plt.plot(stock_df['NFLX'], color = 'red', linewidth = 4)
plt.grid()

plt.subplot(1, 2, 2)
plt.plot(stock_df['FB'], color = 'blue', linewidth = 4)
plt.grid()

[39]: plt.figure(figsize= (20, 10))

plt.subplot(2, 1, 1)
plt.plot(stock_df['NFLX'], color = 'red', linewidth = 4)
plt.grid()

plt.subplot(2, 1, 2)
plt.plot(stock_df['FB'], color = 'blue', linewidth = 4)
plt.grid()

MINI CHALLENGE #6: - Create subplots like above for Twitter, Facebook and Netflix
[42]: plt.figure(figsize= (17, 17))

plt.subplot(3, 1, 1)
plt.plot(stock_df['NFLX'], color = 'red', linewidth = 4)
plt.grid()

plt.subplot(3, 1, 2)
plt.plot(stock_df['FB'], color = 'blue', linewidth = 4)
plt.grid()

plt.subplot(3, 1, 3)
plt.plot(stock_df['TWTR'], color = 'green', linewidth = 4)
plt.grid()
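The repeated plt.subplot calls above can also be written with the object-oriented plt.subplots API, which returns the figure and an array of axes in one call and makes it easy to share the x-axis across panels. A sketch with synthetic series standing in for the three stocks:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(100)
series = {'NFLX': 50 + t, 'FB': 47 + 0.5 * t, 'TWTR': 44 - 0.1 * t}  # fabricated prices

# sharex=True links the x-axis of all three panels
fig, axes = plt.subplots(3, 1, figsize=(17, 17), sharex=True)
for ax, (name, prices) in zip(axes, series.items()):
    ax.plot(t, prices, linewidth=4)
    ax.set_ylabel(name)
    ax.grid()
```

With sharex=True, zooming or setting limits on one panel applies to all of them.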

7 7. PLOT 3D PLOTS
[43]: # Toolkits are collections of application-specific functions that extend Matplotlib.

# mpl_toolkits.mplot3d provides tools for basic 3D plotting.


# https://matplotlib.org/mpl_toolkits/index.html

from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize = (15, 15))


ax = fig.add_subplot(111, projection = '3d')

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5, 6, 2, 3, 13, 4, 1, 2, 4, 8]
z = [2, 3, 3, 3, 5, 7, 9, 11, 9, 10]

ax.scatter(x, y, z, c = 'b')
ax.set_xlabel('X label')
ax.set_ylabel('Y label')
ax.set_zlabel('Z label')

[43]: Text(0.5, 0, 'Z label')

MINI CHALLENGE #7: - Create a 3D plot with daily return values of Twitter, Facebook and Netflix
[47]: daily_return_df

[47]: Date FB TWTR NFLX


0 2013-11-07 0.000000 0.000000 0.000000
1 2013-11-08 -0.063082 -7.238307 2.459768
2 2013-11-11 -2.798229 3.001200 0.898778
3 2013-11-12 0.887446 -2.331002 -1.237020
4 2013-11-13 4.505467 1.670635 0.464452
… … … … …
1707 2020-08-20 2.444881 0.179995 2.759374
1708 2020-08-21 -0.743467 0.770018 -1.122715
1709 2020-08-24 1.640390 3.132970 -0.710934
1710 2020-08-25 3.474701 0.148177 0.362102
1711 2020-08-26 8.222348 1.307036 11.608717

[1712 rows x 4 columns]

[50]: fig = plt.figure(figsize = (15, 15))


ax = fig.add_subplot(111, projection = '3d')

x = daily_return_df['TWTR']
y = daily_return_df['FB']
z = daily_return_df['NFLX']

ax.scatter(x, y, z, c = 'r', s = 1000)


ax.set_xlabel('X label')
ax.set_ylabel('Y label')
ax.set_zlabel('Z label')

[50]: Text(0.5, 0, 'Z label')

8 8. SEABORN SCATTERPLOT & COUNTPLOT
[11]: # Seaborn is a visualization library that sits on top of matplotlib
# Seaborn offers enhanced features compared to matplotlib
# https://seaborn.pydata.org/examples/index.html

# import libraries
import seaborn as sns # Statistical data visualization

[5]: # Import the cancer data from the sklearn library


from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

cancer

[5]: {'data': array([[1.799e+01, 1.038e+01, 1.228e+02, …, 2.654e-01, 4.601e-01,


1.189e-01],
[2.057e+01, 1.777e+01, 1.329e+02, …, 1.860e-01, 2.750e-01,
8.902e-02],
[1.969e+01, 2.125e+01, 1.300e+02, …, 2.430e-01, 3.613e-01,
8.758e-02],
…,
[1.660e+01, 2.808e+01, 1.083e+02, …, 1.418e-01, 2.218e-01,
7.820e-02],
[2.060e+01, 2.933e+01, 1.401e+02, …, 2.650e-01, 4.087e-01,
1.240e-01],
[7.760e+00, 2.454e+01, 4.792e+01, …, 0.000e+00, 2.871e-01,
7.039e-02]]),
'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,
0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,
1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1,
1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1,
1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1,
1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]),
'frame': None,
'target_names': array(['malignant', 'benign'], dtype='<U9'),
'DESCR': '.. _breast_cancer_dataset:\n\nBreast cancer wisconsin (diagnostic)
dataset\n--------------------------------------------\n\n**Data Set
Characteristics:**\n\n :Number of Instances: 569\n\n :Number of

Attributes: 30 numeric, predictive attributes and the class\n\n :Attribute
Information:\n - radius (mean of distances from center to points on the
perimeter)\n - texture (standard deviation of gray-scale values)\n
- perimeter\n - area\n - smoothness (local variation in radius
lengths)\n - compactness (perimeter^2 / area - 1.0)\n - concavity
(severity of concave portions of the contour)\n - concave points (number
of concave portions of the contour)\n - symmetry\n - fractal
dimension ("coastline approximation" - 1)\n\n The mean, standard error,
and "worst" or largest (mean of the three\n worst/largest values) of
these features were computed for each image,\n resulting in 30 features.
For instance, field 0 is Mean Radius, field\n 10 is Radius SE, field 20
is Worst Radius.\n\n - class:\n - WDBC-Malignant\n
- WDBC-Benign\n\n :Summary Statistics:\n\n
===================================== ====== ======\n
Min Max\n ===================================== ====== ======\n radius
(mean): 6.981 28.11\n texture (mean):
9.71 39.28\n perimeter (mean): 43.79 188.5\n area
(mean): 143.5 2501.0\n smoothness (mean):
0.053 0.163\n compactness (mean): 0.019 0.345\n
concavity (mean): 0.0 0.427\n concave points (mean):
0.0 0.201\n symmetry (mean): 0.106 0.304\n
fractal dimension (mean): 0.05 0.097\n radius (standard error):
0.112 2.873\n texture (standard error): 0.36 4.885\n
perimeter (standard error): 0.757 21.98\n area (standard error):
6.802 542.2\n smoothness (standard error): 0.002 0.031\n
compactness (standard error): 0.002 0.135\n concavity (standard
error): 0.0 0.396\n concave points (standard error): 0.0
0.053\n symmetry (standard error): 0.008 0.079\n fractal
dimension (standard error): 0.001 0.03\n radius (worst):
7.93 36.04\n texture (worst): 12.02 49.54\n
perimeter (worst): 50.41 251.2\n area (worst):
185.2 4254.0\n smoothness (worst): 0.071 0.223\n
compactness (worst): 0.027 1.058\n concavity (worst):
0.0 1.252\n concave points (worst): 0.0 0.291\n
symmetry (worst): 0.156 0.664\n fractal dimension
(worst): 0.055 0.208\n =====================================
====== ======\n\n :Missing Attribute Values: None\n\n :Class Distribution:
212 - Malignant, 357 - Benign\n\n :Creator: Dr. William H. Wolberg, W. Nick
Street, Olvi L. Mangasarian\n\n :Donor: Nick Street\n\n :Date: November,
1995\n\nThis is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic)
datasets.\nhttps://goo.gl/U2Uwz2\n\nFeatures are computed from a digitized image
of a fine needle\naspirate (FNA) of a breast mass. They
describe\ncharacteristics of the cell nuclei present in the image.\n\nSeparating
plane described above was obtained using\nMultisurface Method-Tree (MSM-T) [K.
P. Bennett, "Decision Tree\nConstruction Via Linear Programming." Proceedings of
the 4th\nMidwest Artificial Intelligence and Cognitive Science Society,\npp.
97-101, 1992], a classification method which uses linear\nprogramming to

construct a decision tree. Relevant features\nwere selected using an exhaustive
search in the space of 1-4\nfeatures and 1-3 separating planes.\n\nThe actual
linear program used to obtain the separating plane\nin the 3-dimensional space
is that described in:\n[K. P. Bennett and O. L. Mangasarian: "Robust
Linear\nProgramming Discrimination of Two Linearly Inseparable
Sets",\nOptimization Methods and Software 1, 1992, 23-34].\n\nThis database is
also available through the UW CS ftp server:\n\nftp ftp.cs.wisc.edu\ncd math-
prog/cpo-dataset/machine-learn/WDBC/\n\n.. topic:: References\n\n - W.N.
Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction \n for
breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on \n
Electronic Imaging: Science and Technology, volume 1905, pages 861-870,\n
San Jose, CA, 1993.\n - O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast
cancer diagnosis and \n prognosis via linear programming. Operations
Research, 43(4), pages 570-577, \n July-August 1995.\n - W.H. Wolberg,
W.N. Street, and O.L. Mangasarian. Machine learning techniques\n to diagnose
breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) \n
163-171.',
'feature_names': array(['mean radius', 'mean texture', 'mean perimeter', 'mean
area',
'mean smoothness', 'mean compactness', 'mean concavity',
'mean concave points', 'mean symmetry', 'mean fractal dimension',
'radius error', 'texture error', 'perimeter error', 'area error',
'smoothness error', 'compactness error', 'concavity error',
'concave points error', 'symmetry error',
'fractal dimension error', 'worst radius', 'worst texture',
'worst perimeter', 'worst area', 'worst smoothness',
'worst compactness', 'worst concavity', 'worst concave points',
'worst symmetry', 'worst fractal dimension'], dtype='<U23'),
'filename': 'breast_cancer.csv',
'data_module': 'sklearn.datasets.data'}

[6]: # Create a dataFrame named df_cancer with input/output data


df_cancer = pd.DataFrame(np.c_[cancer['data'], cancer['target']], columns = np.append(cancer['feature_names'], ['target']))

[7]: # Check out the head of the dataframe


df_cancer

[7]: mean radius mean texture mean perimeter mean area mean smoothness \
0 17.99 10.38 122.80 1001.0 0.11840
1 20.57 17.77 132.90 1326.0 0.08474
2 19.69 21.25 130.00 1203.0 0.10960
3 11.42 20.38 77.58 386.1 0.14250
4 20.29 14.34 135.10 1297.0 0.10030
.. … … … … …
564 21.56 22.39 142.00 1479.0 0.11100
565 20.13 28.25 131.20 1261.0 0.09780

566 16.60 28.08 108.30 858.1 0.08455
567 20.60 29.33 140.10 1265.0 0.11780
568 7.76 24.54 47.92 181.0 0.05263

mean compactness mean concavity mean concave points mean symmetry \


0 0.27760 0.30010 0.14710 0.2419
1 0.07864 0.08690 0.07017 0.1812
2 0.15990 0.19740 0.12790 0.2069
3 0.28390 0.24140 0.10520 0.2597
4 0.13280 0.19800 0.10430 0.1809
.. … … … …
564 0.11590 0.24390 0.13890 0.1726
565 0.10340 0.14400 0.09791 0.1752
566 0.10230 0.09251 0.05302 0.1590
567 0.27700 0.35140 0.15200 0.2397
568 0.04362 0.00000 0.00000 0.1587

mean fractal dimension … worst texture worst perimeter worst area \


0 0.07871 … 17.33 184.60 2019.0
1 0.05667 … 23.41 158.80 1956.0
2 0.05999 … 25.53 152.50 1709.0
3 0.09744 … 26.50 98.87 567.7
4 0.05883 … 16.67 152.20 1575.0
.. … … … … …
564 0.05623 … 26.40 166.10 2027.0
565 0.05533 … 38.25 155.00 1731.0
566 0.05648 … 34.12 126.70 1124.0
567 0.07016 … 39.42 184.60 1821.0
568 0.05884 … 30.37 59.16 268.6

worst smoothness worst compactness worst concavity \


0 0.16220 0.66560 0.7119
1 0.12380 0.18660 0.2416
2 0.14440 0.42450 0.4504
3 0.20980 0.86630 0.6869
4 0.13740 0.20500 0.4000
.. … … …
564 0.14100 0.21130 0.4107
565 0.11660 0.19220 0.3215
566 0.11390 0.30940 0.3403
567 0.16500 0.86810 0.9387
568 0.08996 0.06444 0.0000

worst concave points worst symmetry worst fractal dimension target


0 0.2654 0.4601 0.11890 0.0
1 0.1860 0.2750 0.08902 0.0
2 0.2430 0.3613 0.08758 0.0

3 0.2575 0.6638 0.17300 0.0
4 0.1625 0.2364 0.07678 0.0
.. … … … …
564 0.2216 0.2060 0.07115 0.0
565 0.1628 0.2572 0.06637 0.0
566 0.1418 0.2218 0.07820 0.0
567 0.2650 0.4087 0.12400 0.0
568 0.0000 0.2871 0.07039 1.0

[569 rows x 31 columns]

[8]: # Check out the tail of the dataframe


df_cancer.tail(7)

[8]: mean radius mean texture mean perimeter mean area mean smoothness \
562 15.22 30.62 103.40 716.9 0.10480
563 20.92 25.09 143.00 1347.0 0.10990
564 21.56 22.39 142.00 1479.0 0.11100
565 20.13 28.25 131.20 1261.0 0.09780
566 16.60 28.08 108.30 858.1 0.08455
567 20.60 29.33 140.10 1265.0 0.11780
568 7.76 24.54 47.92 181.0 0.05263

mean compactness mean concavity mean concave points mean symmetry \


562 0.20870 0.25500 0.09429 0.2128
563 0.22360 0.31740 0.14740 0.2149
564 0.11590 0.24390 0.13890 0.1726
565 0.10340 0.14400 0.09791 0.1752
566 0.10230 0.09251 0.05302 0.1590
567 0.27700 0.35140 0.15200 0.2397
568 0.04362 0.00000 0.00000 0.1587

mean fractal dimension … worst texture worst perimeter worst area \


562 0.07152 … 42.79 128.70 915.0
563 0.06879 … 29.41 179.10 1819.0
564 0.05623 … 26.40 166.10 2027.0
565 0.05533 … 38.25 155.00 1731.0
566 0.05648 … 34.12 126.70 1124.0
567 0.07016 … 39.42 184.60 1821.0
568 0.05884 … 30.37 59.16 268.6

worst smoothness worst compactness worst concavity \


562 0.14170 0.79170 1.1700
563 0.14070 0.41860 0.6599
564 0.14100 0.21130 0.4107
565 0.11660 0.19220 0.3215
566 0.11390 0.30940 0.3403

567 0.16500 0.86810 0.9387
568 0.08996 0.06444 0.0000

worst concave points worst symmetry worst fractal dimension target


562 0.2356 0.4089 0.14090 0.0
563 0.2542 0.2929 0.09873 0.0
564 0.2216 0.2060 0.07115 0.0
565 0.1628 0.2572 0.06637 0.0
566 0.1418 0.2218 0.07820 0.0
567 0.2650 0.4087 0.12400 0.0
568 0.0000 0.2871 0.07039 1.0

[7 rows x 31 columns]

[12]: # Plot scatter plot between mean area and mean smoothness
plt.figure(figsize = (10,10))
sns.scatterplot(x = 'mean area', y = 'mean smoothness', hue = 'target', data = df_cancer)

[12]: <Axes: xlabel='mean area', ylabel='mean smoothness'>

[20]: # Let's plot a countplot to see how many samples belong to class #0 and class #1
plt.figure(figsize = (10,10))
sns.countplot(data = df_cancer, x= 'target', hue = 'target')

[20]: <Axes: xlabel='target', ylabel='count'>

MINI CHALLENGE #8: - Plot the scatterplot between the mean radius and mean area. Comment on the plot
[22]: sns.scatterplot(x = 'mean radius', y = 'mean area', hue = 'target', data = df_cancer)

[22]: <Axes: xlabel='mean radius', ylabel='mean area'>

9 9. SEABORN PAIRPLOT, DISPLOT, AND HEATMAPS/CORRELATIONS
[23]: # Plot the pairplot
sns.pairplot(df_cancer, hue = 'target', vars = ['mean radius', 'mean texture', 'mean area', 'mean perimeter', 'mean smoothness'])

[23]: <seaborn.axisgrid.PairGrid at 0x7250a4521840>

[24]: # Strong correlation between the mean radius and mean perimeter, and between the mean area and mean perimeter

plt.figure(figsize = (30, 30))


sns.heatmap(df_cancer.corr(), annot = True)

[24]: <Axes: >
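The heatmap annotates every pairwise correlation; the same matrix can also be queried directly to confirm the strong radius/perimeter relationship noted in the comment. A sketch with a small synthetic frame (the fabricated columns stand in for df_cancer; perimeter is constructed to be nearly linear in radius):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
radius = rng.uniform(7, 28, size=200)
df = pd.DataFrame({
    'mean radius': radius,
    'mean perimeter': 2 * np.pi * radius + rng.normal(0, 1, 200),  # ~ 2*pi*r plus noise
    'mean smoothness': rng.uniform(0.05, 0.16, 200),               # unrelated noise
})

# DataFrame.corr() computes the full pairwise correlation matrix;
# individual entries can be read off with .loc
corr = df.corr()
print(round(corr.loc['mean radius', 'mean perimeter'], 3))   # close to 1
print(round(corr.loc['mean radius', 'mean smoothness'], 3))  # close to 0
```

Reading single entries this way is handy when the heatmap is too dense to inspect visually.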

[25]: # Plot the distplot
# distplot combines the matplotlib histogram function with kdeplot() (Kernel Density Estimate)

# KDE is used to plot the Probability Density of a continuous variable.

sns.distplot(df_cancer['mean radius'], bins = 25, color = 'blue')

[25]: <Axes: xlabel='mean radius', ylabel='Density'>

MINI CHALLENGE #9: - Plot two separate distplots, one for target class #0 and one for target class #1
[31]: class_0_df = df_cancer[ df_cancer['target'] == 0]
class_1_df = df_cancer[ df_cancer['target'] == 1]

class_0_df

class_1_df

[31]: mean radius mean texture mean perimeter mean area mean smoothness \
19 13.540 14.36 87.46 566.3 0.09779
20 13.080 15.71 85.63 520.0 0.10750
21 9.504 12.44 60.34 273.9 0.10240
37 13.030 18.42 82.61 523.8 0.08983
46 8.196 16.84 51.71 201.9 0.08600
.. … … … … …
558 14.590 22.68 96.39 657.1 0.08473
559 11.510 23.93 74.52 403.5 0.09261
560 14.050 27.15 91.38 600.4 0.09929
561 11.200 29.37 70.67 386.0 0.07449
568 7.760 24.54 47.92 181.0 0.05263

mean compactness mean concavity mean concave points mean symmetry \
19 0.08129 0.06664 0.047810 0.1885
20 0.12700 0.04568 0.031100 0.1967
21 0.06492 0.02956 0.020760 0.1815
37 0.03766 0.02562 0.029230 0.1467
46 0.05943 0.01588 0.005917 0.1769
.. … … … …
558 0.13300 0.10290 0.037360 0.1454
559 0.10210 0.11120 0.041050 0.1388
560 0.11260 0.04462 0.043040 0.1537
561 0.03558 0.00000 0.000000 0.1060
568 0.04362 0.00000 0.000000 0.1587

mean fractal dimension … worst texture worst perimeter worst area \


19 0.05766 … 19.26 99.70 711.2
20 0.06811 … 20.49 96.09 630.5
21 0.06905 … 15.66 65.13 314.9
37 0.05863 … 22.81 84.46 545.9
46 0.06503 … 21.96 57.26 242.2
.. … … … … …
558 0.06147 … 27.27 105.90 733.5
559 0.06570 … 37.16 82.28 474.2
560 0.06171 … 33.17 100.20 706.7
561 0.05502 … 38.30 75.19 439.6
568 0.05884 … 30.37 59.16 268.6

worst smoothness worst compactness worst concavity \


19 0.14400 0.17730 0.23900
20 0.13120 0.27760 0.18900
21 0.13240 0.11480 0.08867
37 0.09701 0.04619 0.04833
46 0.12970 0.13570 0.06880
.. … … …
558 0.10260 0.31710 0.36620
559 0.12980 0.25170 0.36300
560 0.12410 0.22640 0.13260
561 0.09267 0.05494 0.00000
568 0.08996 0.06444 0.00000

worst concave points worst symmetry worst fractal dimension target


19 0.12880 0.2977 0.07259 1.0
20 0.07283 0.3184 0.08183 1.0
21 0.06227 0.2450 0.07773 1.0
37 0.05013 0.1987 0.06169 1.0
46 0.02564 0.3105 0.07409 1.0
.. … … … …

558 0.11050 0.2258 0.08004 1.0
559 0.09653 0.2112 0.08732 1.0
560 0.10480 0.2250 0.08321 1.0
561 0.00000 0.1566 0.05905 1.0
568 0.00000 0.2871 0.07039 1.0

[357 rows x 31 columns]

[32]: class_0_df

[32]: mean radius mean texture mean perimeter mean area mean smoothness \
0 17.99 10.38 122.80 1001.0 0.11840
1 20.57 17.77 132.90 1326.0 0.08474
2 19.69 21.25 130.00 1203.0 0.10960
3 11.42 20.38 77.58 386.1 0.14250
4 20.29 14.34 135.10 1297.0 0.10030
.. … … … … …
563 20.92 25.09 143.00 1347.0 0.10990
564 21.56 22.39 142.00 1479.0 0.11100
565 20.13 28.25 131.20 1261.0 0.09780
566 16.60 28.08 108.30 858.1 0.08455
567 20.60 29.33 140.10 1265.0 0.11780

mean compactness mean concavity mean concave points mean symmetry \


0 0.27760 0.30010 0.14710 0.2419
1 0.07864 0.08690 0.07017 0.1812
2 0.15990 0.19740 0.12790 0.2069
3 0.28390 0.24140 0.10520 0.2597
4 0.13280 0.19800 0.10430 0.1809
.. … … … …
563 0.22360 0.31740 0.14740 0.2149
564 0.11590 0.24390 0.13890 0.1726
565 0.10340 0.14400 0.09791 0.1752
566 0.10230 0.09251 0.05302 0.1590
567 0.27700 0.35140 0.15200 0.2397

mean fractal dimension … worst texture worst perimeter worst area \


0 0.07871 … 17.33 184.60 2019.0
1 0.05667 … 23.41 158.80 1956.0
2 0.05999 … 25.53 152.50 1709.0
3 0.09744 … 26.50 98.87 567.7
4 0.05883 … 16.67 152.20 1575.0
.. … … … … …
563 0.06879 … 29.41 179.10 1819.0
564 0.05623 … 26.40 166.10 2027.0
565 0.05533 … 38.25 155.00 1731.0
566 0.05648 … 34.12 126.70 1124.0

567 0.07016 … 39.42 184.60 1821.0

worst smoothness worst compactness worst concavity \


0 0.1622 0.6656 0.7119
1 0.1238 0.1866 0.2416
2 0.1444 0.4245 0.4504
3 0.2098 0.8663 0.6869
4 0.1374 0.2050 0.4000
.. … … …
563 0.1407 0.4186 0.6599
564 0.1410 0.2113 0.4107
565 0.1166 0.1922 0.3215
566 0.1139 0.3094 0.3403
567 0.1650 0.8681 0.9387

worst concave points worst symmetry worst fractal dimension target


0 0.2654 0.4601 0.11890 0.0
1 0.1860 0.2750 0.08902 0.0
2 0.2430 0.3613 0.08758 0.0
3 0.2575 0.6638 0.17300 0.0
4 0.1625 0.2364 0.07678 0.0
.. … … … …
563 0.2542 0.2929 0.09873 0.0
564 0.2216 0.2060 0.07115 0.0
565 0.1628 0.2572 0.06637 0.0
566 0.1418 0.2218 0.07820 0.0
567 0.2650 0.4087 0.12400 0.0

[212 rows x 31 columns]

[33]: class_1_df

[33]: mean radius mean texture mean perimeter mean area mean smoothness \
19 13.540 14.36 87.46 566.3 0.09779
20 13.080 15.71 85.63 520.0 0.10750
21 9.504 12.44 60.34 273.9 0.10240
37 13.030 18.42 82.61 523.8 0.08983
46 8.196 16.84 51.71 201.9 0.08600
.. … … … … …
558 14.590 22.68 96.39 657.1 0.08473
559 11.510 23.93 74.52 403.5 0.09261
560 14.050 27.15 91.38 600.4 0.09929
561 11.200 29.37 70.67 386.0 0.07449
568 7.760 24.54 47.92 181.0 0.05263

mean compactness mean concavity mean concave points mean symmetry \


19 0.08129 0.06664 0.047810 0.1885

20 0.12700 0.04568 0.031100 0.1967
21 0.06492 0.02956 0.020760 0.1815
37 0.03766 0.02562 0.029230 0.1467
46 0.05943 0.01588 0.005917 0.1769
.. … … … …
558 0.13300 0.10290 0.037360 0.1454
559 0.10210 0.11120 0.041050 0.1388
560 0.11260 0.04462 0.043040 0.1537
561 0.03558 0.00000 0.000000 0.1060
568 0.04362 0.00000 0.000000 0.1587

mean fractal dimension … worst texture worst perimeter worst area \


19 0.05766 … 19.26 99.70 711.2
20 0.06811 … 20.49 96.09 630.5
21 0.06905 … 15.66 65.13 314.9
37 0.05863 … 22.81 84.46 545.9
46 0.06503 … 21.96 57.26 242.2
.. … … … … …
558 0.06147 … 27.27 105.90 733.5
559 0.06570 … 37.16 82.28 474.2
560 0.06171 … 33.17 100.20 706.7
561 0.05502 … 38.30 75.19 439.6
568 0.05884 … 30.37 59.16 268.6

worst smoothness worst compactness worst concavity \


19 0.14400 0.17730 0.23900
20 0.13120 0.27760 0.18900
21 0.13240 0.11480 0.08867
37 0.09701 0.04619 0.04833
46 0.12970 0.13570 0.06880
.. … … …
558 0.10260 0.31710 0.36620
559 0.12980 0.25170 0.36300
560 0.12410 0.22640 0.13260
561 0.09267 0.05494 0.00000
568 0.08996 0.06444 0.00000

worst concave points worst symmetry worst fractal dimension target


19 0.12880 0.2977 0.07259 1.0
20 0.07283 0.3184 0.08183 1.0
21 0.06227 0.2450 0.07773 1.0
37 0.05013 0.1987 0.06169 1.0
46 0.02564 0.3105 0.07409 1.0
.. … … … …
558 0.11050 0.2258 0.08004 1.0
559 0.09653 0.2112 0.08732 1.0
560 0.10480 0.2250 0.08321 1.0

561 0.00000 0.1566 0.05905 1.0
568 0.00000 0.2871 0.07039 1.0

[357 rows x 31 columns]

[34]: plt.figure(figsize = (10, 7))


sns.distplot(class_0_df['mean radius'], bins = 25, color = 'blue')
sns.distplot(class_1_df['mean radius'], bins = 25, color = 'red')
plt.grid()

10 EXCELLENT JOB!

11 MINI CHALLENGES SOLUTIONS


MINI CHALLENGE #1 SOLUTIONS: - Plot a similar graph for NFLX - Change the line color to red and increase the line width
[ ]: stock_df.plot(x = 'Date', y = 'NFLX', label = 'Netflix Stock Price', figsize = (15, 10), linewidth = 7, color = 'r');

plt.ylabel('Price')
plt.title('My first plotting exercise!')
plt.legend(loc = "upper left")
plt.grid()

MINI CHALLENGE #2 SOLUTIONS: - Plot a similar graph for Facebook and Netflix
[ ]: X = daily_return_df['FB']
Y = daily_return_df['NFLX']
plt.figure(figsize = (15, 10))
plt.grid()
plt.scatter(X, Y);

MINI CHALLENGE #3 SOLUTIONS: - Plot the pie chart for the same stocks assuming equal allocation - Explode Amazon and Google slices
[ ]: values = [20, 20, 20, 20, 20]
colors = ['g', 'r', 'y', 'b', 'm']
explode = [0, 0.2, 0, 0, 0.2]
labels = ['AAPL', 'GOOG', 'T', 'TSLA ', 'AMZN']
plt.figure(figsize = (10, 10))
plt.pie(values, colors = colors, labels = labels, explode = explode)
plt.title('STOCK PORTFOLIO')
plt.show()

MINI CHALLENGE #4 SOLUTIONS: - Plot the histogram for TWITTER returns with 30 bins
[ ]: num_bins = 30
plt.figure(figsize = (10,7))
plt.hist(daily_return_df['TWTR'], num_bins, facecolor = 'blue');
plt.grid()

MINI CHALLENGE #5 SOLUTION: - Plot a similar graph containing prices of Netflix, Twitter
and Facebook - Add legend indicating all the stocks - Place the legend in the “upper center” location
[ ]: stock_df.plot(x = 'Date', y = ['FB', 'TWTR', 'NFLX'], figsize = (15, 10), linewidth = 3)

plt.ylabel('Price')
plt.title('Stock Prices')
plt.legend(loc="upper center")
plt.grid()

MINI CHALLENGE #6 SOLUTION: - Create subplots like above for Twitter, Facebook and Netflix
[ ]: plt.figure(figsize = (17,17))
plt.subplot(3, 1, 1)
plt.plot(stock_df.index, stock_df['FB'], 'r--');
plt.grid()
plt.legend(['Facebook price'])

plt.subplot(3, 1, 2)
plt.plot(stock_df.index, stock_df['TWTR'], 'b.');
plt.grid()
plt.legend(['Twitter price'])

plt.subplot(3, 1, 3)
plt.plot(stock_df.index, stock_df['NFLX'], 'y--');
plt.grid()
plt.legend(['Netflix price'])

MINI CHALLENGE #7 SOLUTION: - Create a 3D plot with daily return values of Twitter, Facebook and Netflix
[ ]: daily_return_df

[ ]: fig = plt.figure(figsize=(20, 20))


ax = fig.add_subplot(111, projection = '3d')

x = daily_return_df['FB'].tolist()
y = daily_return_df['TWTR'].tolist()
z = daily_return_df['NFLX'].tolist()

ax.scatter(x, y, z, c = 'r', marker = 'o', s = 1000)

ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')

MINI CHALLENGE #8 SOLUTION: - Plot the scatterplot between the mean radius and mean area. Comment on the plot
[ ]: sns.scatterplot(x = 'mean radius', y = 'mean area', hue = 'target', data = df_cancer);

# As mean radius increases, mean area increases


# class #0 generally has larger mean radius and mean area compared to class #1

MINI CHALLENGE #9 SOLUTION: - Plot two separate distplots, one for target class #0 and one for target class #1
[ ]: class_0_df = df_cancer[ df_cancer['target'] == 0];
class_1_df = df_cancer[ df_cancer['target'] == 1];

class_0_df

class_1_df

# Plot the distplot for both classes

plt.figure(figsize=(10, 7));
sns.distplot(class_0_df['mean radius'], bins = 25, color = 'blue');
sns.distplot(class_1_df['mean radius'], bins = 25, color = 'red');
plt.grid();

12 APPENDIX
[ ]: # The np.c_ class object translates slice objects to concatenation along the second axis.

x1 = np.array([1, 2, 3])
x1.shape

x2 = np.array([4, 5, 6])
x2.shape
z = np.c_[x1, x2]
print(z)
print(z.shape)
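As a sanity check on the snippet above: for 1-D inputs, np.c_ stacks each array as a column, which is equivalent to np.column_stack. A minimal sketch:

```python
import numpy as np

x1 = np.array([1, 2, 3])
x2 = np.array([4, 5, 6])

# np.c_ turns each 1-D input into a column of the result
z = np.c_[x1, x2]
print(z)        # [[1 4]
                #  [2 5]
                #  [3 6]]
print(z.shape)  # (3, 2)

# Equivalent spelling for 1-D inputs
assert (z == np.column_stack([x1, x2])).all()
```

This is exactly how df_cancer was built earlier: the 569x30 feature matrix gained the target vector as a 31st column.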
