The Glowing Python: plotly

Showing posts with label plotly. Show all posts

Wednesday, November 26, 2014

Comparing strikers statistics

Here we compare the scoring statistics of four of the best strikers of the recent football history: Del Piero, Trezeguet, Ronaldo and Vieri. The statistics that we will look at are the scoring trajectory, scoring rate and number of appearances.
To compute these values we need to scrape the career statistics (number of goals and appearances per season) on the Wikipedia pages of the players:

from bs4 import BeautifulSoup
from urllib2 import urlopen

def get_total_goals(url):
    """
    Given the url of a wikipedia page about a football striker
    returns three numy arrays:
    - years, each element corresponds to a season
    - apprearances, contains the number of appearances each season
    - goals, contains the number of goal scored each season
    
    Unfortunately this function is able to parse 
    only the pages of few strikers.
    """
    soup = BeautifulSoup(urlopen(url).read())
    table = soup.find("table", { "class" : "wikitable" })
    years = []
    apps = []
    goals = []
    for row in table.findAll("tr"):
        cells = row.findAll("td")
        if len(cells) > 1:
            years.append(int(cells[0].text[:4]))
            apps.append(int(cells[len(cells)-2].text))
            goals.append(int(cells[len(cells)-1].text))
    return np.array(years), 
           np.array(apps, dtype='float'), 
           np.array(goals)

ronaldo = get_total_goals('http://en.wikipedia.org/wiki/Ronaldo')
vieri = get_total_goals('http://en.wikipedia.org/wiki/Christian_Vieri')
delpiero = get_total_goals('http://en.wikipedia.org/wiki/Alessandro_Del_Piero')
trezeguet = get_total_goals('http://en.wikipedia.org/wiki/David_Trezeguet')

Now we are ready to compute our statistics. For each statistics we will produce an interactive chart using plotly.

Scoring trajectory

import plotly.plotly as py
from plotly.graph_objs import *
py.sign_in("sexyusername", "mypassword")

data = Data([
    Scatter(x=delpiero[0],y=cumsum(delpiero[2]), 
            name='Del Piero', mode='lines'),
    Scatter(x=trezeguet[0],y=cumsum(trezeguet[2]), 
            name='Trezeguet', mode='lines'),
    Scatter(x=ronaldo[0],y=cumsum(ronaldo[2]), 
            name='Ronaldo', mode='lines'),
    Scatter(x=vieri[0],y=cumsum(vieri[2]), 
            name='Vieri', mode='lines'),
])

layout = Layout(
    title='Scoring Trajectory',
    xaxis=XAxis(title='Year'),
    yaxis=YAxis(title='Cumuative goal'),
    legend=Legend(x=0.0,y=1.0))

fig = Figure(data=data, layout=layout)

py.iplot(fig, filename='cumulative-goals')

The scoring trajectory is given by the yearly cumulative totals of goals scored. From the scoring trajectories we can see that Ronaldo was a goal machine since his first professional season and his worse period was from 1999 to 2001. Del Piero and Trezeguet have the longest careers (and they're still playing!). Vieri had the shortest career but it's impressive to see that the number of goals he scored increased almost constantly from 1996 to 2004.

Scoring rate

data = Data([
    Bar(
        x=['Ronaldo', 'Vieri', 'Trezeguet', 'Del Piero'],
        y=[np.sum(ronaldo[2])/np.sum(ronaldo[1]), 
           np.sum(vieri[2])/np.sum(vieri[1]),
           np.sum(trezeguet[2])/np.sum(trezeguet[1]),
           np.sum(delpiero[2])/np.sum(delpiero[1])]
    )
])
py.iplot(data, filename='goal-average')

The scoring rate is the number of goals scored divided by the number of appearances. Ronaldo has a terrific 0.67 scoring rate, meaning that, on average he scored more than three goals each five games. Vieri and Trezeguet have a very similar scoring rate, almost one goal each two games. While Del Piero has 0.40, two goals each five games.

Appearances

data = Data([
    Bar(
        x=['Del Piero', 'Trezeguet', 'Ronaldo', 'Vieri'],
        y=[np.sum(delpiero[1]),
           np.sum(trezeguet[1]),
           np.sum(ronaldo[1]),
           np.sum(vieri[1])]
    )
])
py.iplot(data, filename='appearances')

The number of Del Piero's appearances on a football field is impressive. At the moment I'm writing, he played 773 games. No one of the other players was able to play the 70% of the games played by the Italian numero 10.

Wednesday, August 27, 2014

Visualizing electricity prices with Plotly

We have already mentioned plotly many times (here are other two posts about it) and this time we'll see how to use it in order to build an interactive visualization of the latest data about the domestic electricity prices provided by International Energy Agency (IEA).

In the chart that we are going to make, we will show the prices of the domestic electricity among the countries monitored by IEA in 2013 with a bar chart where each bar shows the electricity price and the fraction of the price represented by the taxes.

First, we import the data (the full data is available here, in this post we'll use only the Table 5.5.1 in cvs format) using pandas:

import pandas as pd
ieaprices = pd.read_csv('iea_prices.csv',
                        na_values=('..','+','-','+/-'))
ieaprices = ieaprices.dropna()
ieaprices.set_index(['Country'],inplace=True)
countries = ieaprices.sort('2013_with_tax').index

Then, we arrange the data in order create a plotly bar chart:

from plotly.graph_objs import Bar,Data,Layout,Figure
from plotly.graph_objs import XAxis,YAxis,Marker,Scatter,Legend

prices_bars = []

# computing the taxes
taxes = ieaprices['2013_with_tax']-ieaprices['2013_no_tax']

# adding the prices to the chart
prices_bars.append(Bar(x=countries.values, 
             y=ieaprices['2013_no_tax'].ix[countries].values,
             marker=Marker(color='#0074D9'),
             name='price without taxes'))

# adding the taxes to the chart
prices_bars.append(Bar(x=countries.values, 
             y=taxes.ix[countries].values,
             marker=Marker(color='#0099D9'),name='taxes'))

And now we are ready to submit the data to the plotly server to render the chart:

import plotly.plotly as py

py.sign_in("SexyUser", "asexykeyforasexyuser")

meadian_line = Scatter(
    x=countries.values,
    y=np.ones(len(countries))*ieaprices['2013_with_tax'].median(),
    marker=Marker(color='rgb(40, 40, 40)'),
    opacity=0.5,
    mode='lines',
    name='Median')

data = Data(prices_bars+[meadian_line])

layout = Layout(
    title='Domestic electricity prices in the IEA in 2013',
    xaxis=XAxis(type='category'),
    yaxis=YAxis(title='Price (Pence per Kwh)'),
    legend=Legend(x=0.0,y=1.0),
    barmode='stack',
    hovermode='closest')

fig = Figure(data=data, layout=layout)

# this line will work only in ipython
# use py.plot() in other environments
plot_url = py.iplot(fig, filename='ieaprices2013')

The result should look like this:

Looking at the chart we note that, during 2013, the average domestic electricity prices, including taxes, in Denmark and Germany were the highest in the IEA. We also note that in Denmark the fraction of taxes paid is higher than the actual electricity price whereas in Germany the actual electricity price and the taxes are almost the same. Interestingly, USA has the lowest price and the lowest taxation.

This post shows how to create one of the charts commented here, where a more insights about the IEA data are provided.

Thursday, December 12, 2013

Multiple axes and subplots in Plotly

Some time ago we have seen how to visualize 2D histograms with Plotly and in this post we will see how to use one of the mostin interesting new features introduced by the Plotly guys: multiple axes into subplots. This features makes us able to couple subplots, so when you zoom or pan in one subplot, it zooms and pans in the other subplots. Just like the graphs produced by D3.
Here's how to plot a subplots matrix where each cell is a scatterplot between the features of the Iris dataset that we already used here. The first thing we need to do is to convert the data in the format required by the Plotly API:

from sklearn.datasets import load_iris
iris = load_iris()

attr = [f.replace(' (cm)', '') for f in iris.feature_names]
colors = {'setosa': 'rgb(31, 119, 180)', 
          'versicolor': 'rgb(255, 127, 14)', 
          'virginica': 'rgb(44, 160, 44)'}

data = []
for i in range(4):
    for j in range(4):
        for t,flower in enumerate(iris.target_names):
            data.append({"name": flower, 
                         "x": iris.data[iris.target == t,i],
                         "y": iris.data[iris.target == t,j],
                         "type":"scatter", "mode":"markers",
                         'marker': {'color': colors[flower], 
                                    'opacity':0.7},
                         "xaxis": "x"+(str(i) if i!=0 else ''),
                         "yaxis": "y"+(str(j) if j!=0 else '')})

Then, we create a layout to adjust the look and feel:

d = 0.04; # padding
dms = [[i*d+i*(1-3*d)/4,i*d+((i+1)*(1-3*d)/4)] for i in range(4)]

layout = {
    "xaxis":{"domain":dms[0], "title":attr[0], 
             'zeroline':False,'showline':False},
    "yaxis":{"domain":dms[0], "title":attr[0], 
             'zeroline':False,'showline':False},
    "xaxis1":{"domain":dms[1], "title":attr[1], 
              'zeroline':False,'showline':False},
    "yaxis1":{"domain":dms[1], "title":attr[1], 
              'zeroline':False,'showline':False},
    "xaxis2":{"domain":dms[2], "title":attr[2], 
              'zeroline':False,'showline':False},
    "yaxis2":{"domain":dms[2], "title":attr[2], 
              'zeroline':False,'showline':False},
    "xaxis3":{"domain":dms[3], "title":attr[3], 
              'zeroline':False,'showline':False},
    "yaxis3":{"domain":dms[3], "title":attr[3], 
              'zeroline':False,'showline':False},
    "showlegend":False,
    "width": 500,
    "height": 550,
    "title":"Iris flower data set",
    "titlefont":{'color':'rgb(67,67,67)', 'size': 20}
    }

Finally, we import the plotly module (see this page for more details about the installation) and we are read to invoke the Plotly remote service:

import plotly
p = plotly.plotly('supersexyusername', 'mysecretkey')
# iplot shows the graph in the ipython notebook
# use plot if you're outside of the notebook
p.iplot(data,layout=layout, width=500,height=550)

The result should be as follows: This interactive graph of the iris data set below was inspired by this wonderful D3 example by Mike Bostock. Find out more example of Plotly visualizations in Python inside the IPython notebook here.

Sunday, June 16, 2013

2D Histrograms with Plotly

Plotly is an online tool that makes us able to create wonderful interactive visualizations of our data. It can plot data from csv files, spreadsheet, etc. but it also has a Python sandbox where we can put our Python snippets! In this post we will see a simple example that shows how to plot a 2D histogram in Plotly.

First, we need a snippet to generate some random sets of data:

from numpy import *
 
# generate some random sets of data
y0 = random.randn(100)/5. + 0.5 
x0 = random.randn(100)/5. + 0.5 
 
y1 = random.rayleigh(size=20)/7. + 0.1
x1 = random.rayleigh(size=20)/8. + 1.1
 
y2 = random.randn(50)/10. + 0.9
x2 = random.rayleigh(size=50)/10. + 0.1
 
y3 = random.randn(50)/8. + 0.1
x3 = random.randn(50)/8. + 0.1
 
y = concatenate([y0,y1,y2,y3])
x = concatenate([x0,x1,x2,x3])

The distribution of the variable x looks like:

The distribution of the variable y looks like: And the 2D histogram of both variables looks like this:

As showed in the colorbar, cells with lighter colors correspond to high density areas of the our distribution.

All the plots above were made with Plotly inside their Python sandbox using the following code:

## place the data into Plotly's dict format

# histograms
histx = {'x': x, 'type':'histogramx'}
histy = {'y': y, 'type':'histogramy'}
hist2d = {'x': x, 'y': y, 'type':'histogram2d'}

# scatter plots above the 1D histograms
# "jitter" the scatter plot points to make their distribution easier to distinguish
jitterx = {'x': x, 'y': 60+3*random.rand((len(x))), 'type':'scatter','mode':'markers','marker':{'size':4,'opacity':0.5,'symbol':'square'}}

jittery = {'x': y, 'y': 35+3*random.rand((len(x))), 'type':'scatter','mode':'markers','marker':{'size':4,'opacity':0.5,'symbol':'square'}}

# scatter points in the 2D histogram
xy = {'x': x, 'y': y, 'type':'scatter','mode':'markers','marker':{'size':5,'opacity':0.5,'symbol':'square'}}

# NOTE: the following lines plot all the graph above
plot([histx, jitterx], layout={'title': 'Distribution of Variable 1'})
plot([histy, jittery], layout={'title': 'Distribution of Variable 2'})
plot([hist2d,xy], layout={'title': 'Distribution of Variable 1 and Variable 2'})

Plots made with Plotly automatically provide interactions (click-drag to zoom, double-click to autoscale, shift-click to pan) and are very easy to embed in web page using the embedding snippet.

Thanks to the Plotly guys for providing the code of this post and this amazing tool :)