The Glowing Python: plotting


Thursday, December 12, 2013

Multiple axes and subplots in Plotly

Some time ago we saw how to visualize 2D histograms with Plotly. In this post we will see how to use one of the most interesting new features introduced by the Plotly guys: multiple axes and subplots. This feature lets us couple subplots, so that when you zoom or pan in one subplot, the other subplots zoom and pan with it, just like the graphs produced by D3.
Here's how to plot a matrix of subplots where each cell is a scatterplot of two features of the Iris dataset that we already used here. The first thing we need to do is convert the data into the format required by the Plotly API:

from sklearn.datasets import load_iris
iris = load_iris()

attr = [f.replace(' (cm)', '') for f in iris.feature_names]
colors = {'setosa': 'rgb(31, 119, 180)', 
          'versicolor': 'rgb(255, 127, 14)', 
          'virginica': 'rgb(44, 160, 44)'}

data = []
for i in range(4):
    for j in range(4):
        for t,flower in enumerate(iris.target_names):
            data.append({"name": flower, 
                         "x": iris.data[iris.target == t,i],
                         "y": iris.data[iris.target == t,j],
                         "type":"scatter", "mode":"markers",
                         'marker': {'color': colors[flower], 
                                    'opacity':0.7},
                         "xaxis": "x"+(str(i) if i!=0 else ''),
                         "yaxis": "y"+(str(j) if j!=0 else '')})

Then, we create a layout to adjust the look and feel:

d = 0.04 # padding between subplots
dms = [[i*d+i*(1-3*d)/4,i*d+((i+1)*(1-3*d)/4)] for i in range(4)]

layout = {
    "xaxis":{"domain":dms[0], "title":attr[0], 
             'zeroline':False,'showline':False},
    "yaxis":{"domain":dms[0], "title":attr[0], 
             'zeroline':False,'showline':False},
    "xaxis1":{"domain":dms[1], "title":attr[1], 
              'zeroline':False,'showline':False},
    "yaxis1":{"domain":dms[1], "title":attr[1], 
              'zeroline':False,'showline':False},
    "xaxis2":{"domain":dms[2], "title":attr[2], 
              'zeroline':False,'showline':False},
    "yaxis2":{"domain":dms[2], "title":attr[2], 
              'zeroline':False,'showline':False},
    "xaxis3":{"domain":dms[3], "title":attr[3], 
              'zeroline':False,'showline':False},
    "yaxis3":{"domain":dms[3], "title":attr[3], 
              'zeroline':False,'showline':False},
    "showlegend":False,
    "width": 500,
    "height": 550,
    "title":"Iris flower data set",
    "titlefont":{'color':'rgb(67,67,67)', 'size': 20}
    }
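The eight axis entries above all follow the same pattern, so they could also be generated in a loop. Here is a minimal sketch, assuming the same axis naming scheme used in the layout above (`xaxis`, `xaxis1`, ...). The helper names `axis_domains` and `grid_layout` are made up for this illustration and are not part of the Plotly API:

```python
def axis_domains(n, pad):
    """Split [0, 1] into n equal intervals separated by `pad`."""
    width = (1 - (n - 1) * pad) / n
    return [[i * (pad + width), i * (pad + width) + width] for i in range(n)]


def grid_layout(titles, pad=0.04):
    """Build the per-axis layout entries for an n x n subplot matrix."""
    layout = {}
    domains = axis_domains(len(titles), pad)
    for i, (dom, title) in enumerate(zip(domains, titles)):
        suffix = str(i) if i != 0 else ''  # the first axis has no suffix
        for axis in ('xaxis', 'yaxis'):
            layout[axis + suffix] = {'domain': dom, 'title': title,
                                     'zeroline': False, 'showline': False}
    return layout


layout_axes = grid_layout(['sepal length', 'sepal width',
                           'petal length', 'petal width'])
print(sorted(layout_axes))
```

The remaining keys (title, size, legend) would still be added by hand, for example with `layout_axes.update({...})`.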

Finally, we import the plotly module (see this page for more details about the installation) and we are ready to invoke the Plotly remote service:

import plotly
p = plotly.plotly('supersexyusername', 'mysecretkey')
# iplot shows the graph in the ipython notebook
# use plot if you're outside of the notebook
p.iplot(data,layout=layout, width=500,height=550)

The result should be as follows:

This interactive graph of the Iris data set was inspired by this wonderful D3 example by Mike Bostock. You can find more examples of Plotly visualizations in Python in the IPython notebook here.

Sunday, June 16, 2013

2D Histograms with Plotly

Plotly is an online tool that lets us create wonderful interactive visualizations of our data. It can plot data from csv files, spreadsheets, etc., but it also has a Python sandbox where we can put our Python snippets! In this post we will see a simple example that shows how to plot a 2D histogram in Plotly.

First, we need a snippet to generate some random sets of data:

from numpy import *
 
# generate some random sets of data
y0 = random.randn(100)/5. + 0.5 
x0 = random.randn(100)/5. + 0.5 
 
y1 = random.rayleigh(size=20)/7. + 0.1
x1 = random.rayleigh(size=20)/8. + 1.1
 
y2 = random.randn(50)/10. + 0.9
x2 = random.rayleigh(size=50)/10. + 0.1
 
y3 = random.randn(50)/8. + 0.1
x3 = random.randn(50)/8. + 0.1
 
y = concatenate([y0,y1,y2,y3])
x = concatenate([x0,x1,x2,x3])

The distribution of the variable x looks like:

The distribution of the variable y looks like:

And the 2D histogram of both variables looks like this:

As shown in the colorbar, cells with lighter colors correspond to high density areas of our distribution.
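To make the link between cell color and density concrete, the counts behind a 2D histogram can also be computed directly with NumPy. This is only a sketch: the data below is a stand-in generated with a fixed seed (not the sets used in this post), and the 10x10 grid is an arbitrary choice.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, for illustration only
x = rng.randn(200) / 5. + 0.5
y = rng.randn(200) / 5. + 0.5

# counts[i, j] is the number of points with x in bin i and y in bin j
counts, xedges, yedges = np.histogram2d(x, y, bins=10)

# locate the densest cell
i, j = np.unravel_index(np.argmax(counts), counts.shape)
print("densest cell: x in [%.2f, %.2f], y in [%.2f, %.2f], %d points"
      % (xedges[i], xedges[i + 1], yedges[j], yedges[j + 1], counts[i, j]))
```

Each entry of `counts` is what a histogram cell encodes with its color.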

All the plots above were made with Plotly inside their Python sandbox using the following code:

## place the data into Plotly's dict format

# histograms
histx = {'x': x, 'type':'histogramx'}
histy = {'y': y, 'type':'histogramy'}
hist2d = {'x': x, 'y': y, 'type':'histogram2d'}

# scatter plots above the 1D histograms
# "jitter" the scatter plot points to make their distribution easier to distinguish
jitterx = {'x': x, 'y': 60+3*random.rand((len(x))), 'type':'scatter','mode':'markers','marker':{'size':4,'opacity':0.5,'symbol':'square'}}

jittery = {'x': y, 'y': 35+3*random.rand((len(x))), 'type':'scatter','mode':'markers','marker':{'size':4,'opacity':0.5,'symbol':'square'}}

# scatter points in the 2D histogram
xy = {'x': x, 'y': y, 'type':'scatter','mode':'markers','marker':{'size':5,'opacity':0.5,'symbol':'square'}}

# NOTE: the following lines plot all the graphs above
plot([histx, jitterx], layout={'title': 'Distribution of Variable 1'})
plot([histy, jittery], layout={'title': 'Distribution of Variable 2'})
plot([hist2d,xy], layout={'title': 'Distribution of Variable 1 and Variable 2'})

Plots made with Plotly automatically provide interactions (click-drag to zoom, double-click to autoscale, shift-click to pan) and are very easy to embed in a web page using the embedding snippet.

Thanks to the Plotly guys for providing the code of this post and this amazing tool :)

Friday, September 14, 2012

Boxplot with matplotlib

A boxplot (also known as a box-and-whisker diagram) is a way of summarizing a set of data measured on an interval scale. In this post I will show how to make a boxplot with pylab using a dataset that contains the monthly totals of new cases of measles, mumps, and chicken pox for New York City during the years 1931-1971. The data was extracted from the Hipel-McLeod Time Series Datasets Collection and you can download it from here in MATLAB format.
Let's make a box plot of the monthly distribution of chicken pox cases:

from pylab import *
from scipy.io import loadmat

NYCdiseases = loadmat('NYCDiseases.mat') # it's a matlab file

# multiple box plots on one figure
# Chickenpox cases by month
figure(1)
# NYCdiseases['chickenPox'] is a matrix 
# with 30 rows (1 per year) and 12 columns (1 per month)
boxplot(NYCdiseases['chickenPox'])
labels = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')
xticks(range(1,13),labels, rotation=15)
xlabel('Month')
ylabel('Chickenpox cases')
title('Chickenpox cases in NYC 1931-1971')

The result should be as follows:

On each box, the central mark is the median and the edges of the box are the lower hinge (defined as the 25th percentile) and the upper hinge (the 75th percentile). The whiskers extend to the most extreme data points not considered outliers, while the outliers are plotted individually.
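The quantities drawn in each box can also be computed by hand. The sketch below assumes the usual rule that a point is an outlier when it lies more than 1.5 times the interquartile range beyond a hinge (1.5 is the default value of matplotlib's `whis` parameter). The function name `box_stats` and the sample values are made up for this illustration:

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Return median, hinges, whisker ends and outliers for one box."""
    values = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    # points outside [low, high] count as outliers
    is_inside = (values >= low) & (values <= high)
    inside = values[is_inside]
    return {'median': med, 'q1': q1, 'q3': q3,
            'whisker_low': inside.min(), 'whisker_high': inside.max(),
            'outliers': values[~is_inside]}

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

In this sample the value 100 is reported as an outlier and the upper whisker stops at 9.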
Using the graph we can compare the range and distribution of the chickenpox cases for each month. We can observe that March and April are the months with the highest number of cases but also the ones with the greatest variability. We can compare the distribution of the three diseases in the same way:

# building the data matrix
# flatten each matrix so that every disease gets a single box
data = [NYCdiseases['measles'].ravel(), 
        NYCdiseases['mumps'].ravel(), 
        NYCdiseases['chickenPox'].ravel()]

figure(2)
boxplot(data)
xticks([1,2,3],('measles','mumps','chickenPox'), rotation=15)
ylabel('Monthly cases')
title('Contagious childhood disease in NYC 1931-1971')

show()

And this is the result:

Here, we can observe that the chicken pox distribution has a higher median than the other diseases. The mumps distribution seems to have small variability compared to the others, and the measles distribution has a low median but a very high number of outliers.

Thursday, December 15, 2011

Polar Charts with matplotlib

A polar coordinate system is a two-dimensional coordinate system in which each point is described by two coordinates: the radial and the angular coordinate. The radial coordinate gives the distance of the point from a central point (the pole) and the angular coordinate gives the angle, measured from the 0 degree ray (the polar axis), required to reach the point. Let's see an example of how to make polar charts with matplotlib:

from pylab import figure,polar,show
from numpy import arange,pi,cos

theta = arange(0, 2, 1./180)*pi # angular coordinates
figure(1)
polar(3*theta, theta/5) # drawing a spiral
figure(2)
polar(theta, cos(4*theta)) # drawing the polar rose
show()

The result of this script consists of two charts. The first with the spiral

and the second with the polar rose
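The relation between the two coordinates described above and the usual Cartesian ones is x = r cos(θ), y = r sin(θ). A small sketch of the conversion, using only the standard library; the function name `polar_to_cartesian` is just an illustrative choice:

```python
from math import cos, sin, pi

def polar_to_cartesian(r, theta):
    """Convert a polar point (r, theta in radians) to Cartesian (x, y)."""
    return r * cos(theta), r * sin(theta)

# sample the polar rose r = cos(4 * theta) used above, one point per degree
thetas = [k * pi / 180 for k in range(360)]
points = [polar_to_cartesian(cos(4 * t), t) for t in thetas]
print(points[0])  # (1.0, 0.0)
```

Plotting `points` with an ordinary `plot` call should give the same rose that `polar` draws directly.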

Thursday, December 8, 2011

Lissajous curves

After the epitrochoids, we're going to see another family of wonderful figures: the Lissajous curves. The equations that describe these curves (written here with unit amplitudes, as in the snippet below) are the following:

x(t) = sin(a t + δ)
y(t) = sin(b t)

The curves are traced as the parameter t varies, and their appearance is determined by the ratio a/b and the value of δ.
As usual, I made a snippet to visualize them:

from numpy import sin,pi,linspace
from pylab import plot,show,subplot

a = [1,3,5,3] # plotting the curves for
b = [1,5,7,4] # different values of a/b
delta = pi/2
t = linspace(-pi,pi,300)

for i in range(0,4):
    x = sin(a[i] * t + delta)
    y = sin(b[i] * t)
    subplot(2,2,i+1)
    plot(x,y)

show()

This is the result
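A quick way to see how a/b and δ shape the curve is to check a case with a known outcome: with a = b and δ = π/2 we get x = cos(t), y = sin(t), which is the unit circle. Here is a minimal sketch of that check, with a hypothetical helper `lissajous` that uses only the standard library:

```python
from math import sin, pi

def lissajous(a, b, delta, n=300):
    """Sample x = sin(a*t + delta), y = sin(b*t) for t in [-pi, pi]."""
    ts = [-pi + 2 * pi * k / (n - 1) for k in range(n)]
    return [(sin(a * t + delta), sin(b * t)) for t in ts]

# with a = b and delta = pi/2 every point lies on the unit circle
circle = lissajous(1, 1, pi / 2)
print(max(abs(x * x + y * y - 1) for x, y in circle))  # very close to 0
```

With a = b and δ = 0 the same function gives a straight line, since x and y are then equal.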

How to make Bubble Charts with matplotlib

In this post we will see how to make a bubble chart using matplotlib. The snippet that we are going to see was inspired by a tutorial on flowingdata.com where R is used to make a bubble chart from a csv file of crime rates by state in the United States. I used the dataset provided by flowingdata to create a similar chart with Python. Let's see the code:

from pylab import *
from scipy import *

# reading the data from a csv file
durl = 'http://datasets.flowingdata.com/crimeRatesByState2005.csv'
rdata = genfromtxt(durl,dtype='S8,f,f,f,f,f,f,f,i',delimiter=',')

rdata[0] = zeros(8) # cutting the label's titles
rdata[1] = zeros(8) # cutting the global statistics

x = []
y = []
color = []
area = []

for data in rdata:
    x.append(data[1]) # murder
    y.append(data[5]) # burglary
    color.append(data[6]) # larceny_theft 
    area.append(sqrt(data[8])) # population
    # plotting the first eight letters of the state's name
    text(data[1], data[5], 
         data[0],size=11,horizontalalignment='center')

# making the scatter plot
sct = scatter(x, y, c=color, s=area, linewidths=2, edgecolor='w')
sct.set_alpha(0.75)

axis([0,11,200,1280])
xlabel('Murders per 100,000 population')
ylabel('Burglaries per 100,000 population')
show()

The following figure is the resulting bubble chart

It shows the number of burglaries versus the number of murders per 100,000 population. Each bubble is a US state, the size of the bubble represents the population of the state and the color represents the number of larcenies.
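A note on bubble sizes: the `s` argument of `scatter` is interpreted as the marker area in points squared. The snippet above passes the square root of the population, which keeps the bubbles small. If we wanted the area to be directly proportional to the value instead, one option is a linear rescaling. The helper `bubble_areas`, the `max_area` value and the populations below are all made up for this sketch:

```python
def bubble_areas(values, max_area=2000.0):
    """Scale values linearly so the largest one maps to max_area (points^2)."""
    largest = max(values)
    return [max_area * v / largest for v in values]

# hypothetical populations, for illustration only
populations = [500000, 2000000, 8000000]
print(bubble_areas(populations))  # [125.0, 500.0, 2000.0]
```

The resulting list could be passed as `s=` in place of `area`.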

Thursday, November 17, 2011

Fun with Epitrochoids

An epitrochoid is a curve traced by a point attached to a circle of radius r rolling around the outside of a fixed circle of radius R, where the point is a distance d from the center of the exterior circle [Ref]. Recently I found the epitrochoid's parametric equations on Wikipedia:

x(t) = (R + r) cos(t) - d cos((R + r) t / r)
y(t) = (R + r) sin(t) - d sin((R + r) t / r)

So I decided to plot them with pylab. This is the script I made:

from numpy import sin,cos,linspace,pi
import pylab

# curve parameters
R = 14
r = 1
d = 18

t = linspace(0,2*pi,300)

# Epitrochoid parametric equations (note the R+r factor in both terms)
x = (R+r)*cos(t)-d*cos( (R+r)*t / r )
y = (R+r)*sin(t)-d*sin( (R+r)*t / r )

pylab.plot(x,y,'r')
pylab.axis('equal')
pylab.show()

And this is the result

Isn't it fascinating? :)
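To experiment with other values of R, r and d, the equations can be wrapped in a function. The sketch below uses the standard epitrochoid parametrisation, with the factor R + r in both terms, and only the standard library; the function name `epitrochoid` is an illustrative choice:

```python
from math import cos, sin, pi

def epitrochoid(R, r, d, n=300):
    """Sample the standard epitrochoid for t in [0, 2*pi]."""
    pts = []
    for k in range(n):
        t = 2 * pi * k / (n - 1)
        x = (R + r) * cos(t) - d * cos((R + r) * t / r)
        y = (R + r) * sin(t) - d * sin((R + r) * t / r)
        pts.append((x, y))
    return pts

pts = epitrochoid(14, 1, 18)
print(pts[0])  # starting point (R + r - d, 0), here (-3.0, 0.0)
```

When R/r is an integer, as it is here, the curve closes after one turn, so the last sampled point coincides with the first up to rounding.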

How to find the intersection of two functions

Previously we saw how to find the roots of a function with fsolve. In this example we use fsolve to find an intersection between two functions, sin(x) and cos(x):

from scipy.optimize import fsolve
import pylab
import numpy

def findIntersection(fun1,fun2,x0):
    return fsolve(lambda x : fun1(x) - fun2(x),x0)

result = findIntersection(numpy.sin,numpy.cos,0.0)
x = numpy.linspace(-2,2,50)
pylab.plot(x,numpy.sin(x),x,numpy.cos(x),result,numpy.sin(result),'ro')
pylab.show()

In the graph we can see sin(x) (blue), cos(x) (green) and the intersection found (red dot) starting from x = 0.
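Since sin(x) = cos(x) at x = π/4, the intersection found near x = 0 should be close to 0.785. As an independent check, here is a sketch that finds the same root with plain bisection instead of fsolve. It uses only the standard library, and both the function name `bisect_root` and the bracket [0, 1.5] are choices made for this illustration:

```python
from math import sin, cos, pi

def bisect_root(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi]; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid               # the sign change is in the left half
        else:
            lo, f_lo = mid, f_mid  # the sign change is in the right half
    return (lo + hi) / 2

root = bisect_root(lambda x: sin(x) - cos(x), 0.0, 1.5)
print(root, pi / 4)
```

Bisection needs an interval where the difference changes sign, while fsolve only needs a starting guess.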

Monday, May 2, 2011

How to create a chart with Google Chart API

The example shows how to create a scatter plot using the Google Chart API.

import random
import urllib.request

def list2String(x):
    """ from a list like [1,2,5]
        return a string like '1,2,5' """
    return ",".join(str(i) for i in x)

def makeChart(x,y,filename):
    query_url = "http://chart.apis.google.com/chart?chxt=x,y&chs=300x200&cht=s&chd=t:"
    query_url += list2String(x)+"|"+list2String(y)
    chart = urllib.request.urlopen(query_url) # retrieve the chart
    print("saving",query_url)
    with open(filename,"wb") as f:
        f.write(chart.read()) # save the pic

x = random.sample(range(0,100),10) # two lists of 10 distinct
y = random.sample(range(0,100),10) # random integers in [0, 100)
makeChart(x,y,"chart.png")
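The query string can also be assembled from a dictionary of parameters, which makes the role of each one easier to see. This is only a sketch of the URL construction and performs no network request; the function name `chart_url` and its arguments are made up for this illustration:

```python
from urllib.parse import urlencode

def chart_url(x, y, base_url, size="300x200"):
    """Build a scatter-chart query URL from two lists of numbers."""
    params = {
        "chxt": "x,y",   # show both axes
        "chs": size,     # chart size in pixels
        "cht": "s",      # chart type: scatter plot
        "chd": "t:" + ",".join(map(str, x)) + "|" + ",".join(map(str, y)),
    }
    # keep ':', ',' and '|' unescaped so the URL matches the hand-built one
    return base_url + "?" + urlencode(params, safe=":,|")

print(chart_url([1, 2, 5], [3, 4, 6], "http://chart.apis.google.com/chart"))
```

The printed URL has the same form as the one built by string concatenation in makeChart.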

You can embed the picture in a web page:

<img alt="Google chart example" src="http://chart.apis.google.com/chart?chxt=x,y&amp;chs=300x200&amp;cht=s&amp;chd=t:64,10,18,42,49,83,73,27,44,51|77,89,13,87,27,34,38,44,22,42" />

Or use it from the disk.
