The Glowing Python: matplotlib

Showing posts with label matplotlib. Show all posts

Friday, May 1, 2020

Tornado plots with matplotlib

Lately there's a bit of attention about charts where the values of a time series are plotted against the change point by point. This thanks to this rather colorful and cluttered Tornado plot.

In this post we will see how to make one of those charts with our favorite plotting library, matplotlib, and we'll also try to understand how to read them.
Let's start loading the records of the concentration of CO2 in the atmosphere and aggregate the values on month level. After that we can plot straight away the value of the concentration against the change compared to the previous month:

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

data_url = 'ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_weekly_mlo.txt'
co2_data = pd.read_csv(data_url, sep='\s+', comment='#', na_values=-999.99,
                       names=['year', 'month', 'day', 'decimal', 'ppm', 
                       'days', '1_yr_ago',  '10_yr_ago', 'since_1800'])
co2_data = co2_data.groupby([co2_data.year, co2_data.month]).mean()
idx = co2_data.index.to_flat_index()
co2_data['timestamp'] = [pd.Timestamp(year=y, month=m, day=1) for y, m in idx]
co2_data.set_index('timestamp', inplace=True)
co2_data = co2_data['2018-04-01':]


plt.plot(co2_data.ppm.diff(), co2_data.ppm, '-o')

The result is nicely loopy. This is because we plotted data for 2 years and the time series presents a yearly seasonality. Still, apart from the loop it's quite hard to understand anything without adding few details. Let's print the name of the months and highlight beginning and end of the time series:

from calendar import month_abbr

plt.figure(figsize=(8, 12))
diffs = co2_data.ppm.diff()
plt.plot(diffs, co2_data.ppm, '-o', alpha=.6, zorder=0)
plt.scatter([diffs[1]], [co2_data.ppm[1]],
            c='C0', s=140)
plt.scatter([diffs[-1]], [co2_data.ppm[-1]],
            c='C0', s=140)

for d, v, ts in zip(diffs,
                    co2_data.ppm,
                    co2_data.index):
    plt.text(d, v-.15, '%s\n%d' % (month_abbr[ts.month], ts.year),
             horizontalalignment='left', verticalalignment='top')

plt.xlim([-np.abs(diffs).max()-.1,
          np.abs(diffs).max()+.1])
plt.ylabel('CO2 concentrations (ppm)')
plt.xlabel('change from previous month (ppm)')

The situation is much improved. Now we can see that from June to September (value on the left) the concentrations decrease while from October to May (value on the right) they increase. If we look at the points where the line of zero chance is crossed, we can spot when there's a change of trend. We see that the trend changes from negative to positive from September to October and the opposite from May to June.

Was this easier to observe just have time on the y axis? That's easy to see looking at the chart below.

Sunday, August 11, 2019

Visualizing distributions with scatter plots in matplotlib

Let's say that we want to study the time between the end of a marked point and next serve in a tennis game. After gathering our data, the first thing that we can do is to draw a histogram of the variable that we are interested in:

import pandas as pd
import matplotlib.pyplot as plt

url = 'https://raw.githubusercontent.com/fivethirtyeight'
url += '/data/master/tennis-time/serve_times.csv'
event = pd.read_csv(url)

plt.hist(event.seconds_before_next_point, bins=10)
plt.xlabel('Seconds before next serve')
plt.show()

The histogram reveals some interesting aspects of the distribution, indeed we can see that data is slightly skewed to the right and that on average the server takes 20 seconds. However, we couldn't tell how many time the serves happens before 10 seconds or after 35. Of course, one could increase the bins of the histogram, but this would lead to a chart which is not particularly elegant and that might hide some other details.

To have a better understanding of the situation we can draw a scatter plot of the variable we are studying:

import numpy as np
from scipy.stats.kde import gaussian_kde

def distribution_scatter(x, symmetric=True, cmap=None, size=None):
    """
    Plot the distribution of x showing all the points.
    The x axis represents the samples in x
    and the y axis is function of the probability of x
    and random assignment.
    
    Returns the position on the y axis.
    """
    pdf = gaussian_kde(x)
    w = np.random.rand(len(x))
    if symmetric:
        w = w*2-1
    pseudo_y = pdf(x) * w
    if cmap:
        plt.scatter(x, pseudo_y, c=x, cmap=cmap, s=size)
    else:
        plt.scatter(x, pseudo_y, s=size)
    return pseudo_y

In this chart each sample is represented with a point and the spread of the points in the y direction depends on the probability of occurrence. In this case we can easily see that 4 serves happened before 10 seconds and 3 after 35.

Since we're not really interested on the values on y axis but only on the spread, we can remove the axis and add few details on the outliers to enrich the chart:

url = 'https://raw.githubusercontent.com/fivethirtyeight'
url += '/data/master/tennis-time/serve_times.csv'
event = pd.read_csv(url)

plt.figure(figsize=(7, 11))
title = 'Time in seconds between'
title += '\nend of marked point and next serve'
title += '\nat 2015 French Open'
plt.title(title, loc='left', fontsize=18, color='gray')
py = distribution_scatter(event.seconds_before_next_point, cmap='cool');


cut_h = np.percentile(event.seconds_before_next_point, 98)
outliers = event.seconds_before_next_point> cut_h


ha = {True: 'right', False: 'left'}
for x, y, c in zip(event[outliers].seconds_before_next_point,
                   py[outliers],
                   event[outliers].server):
    plt.text(x, y+.0005, c,
             ha=ha[x<0], va='bottom', fontsize=12)

plt.xlabel('Seconds before next serve', fontsize=15)
plt.gca().spines['left'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.yticks([])
plt.xticks(np.arange(5, 41, 5))
plt.xlim([5, 40])
plt.show()

Where to go next:

The data used in this post has also been presented in this article on fivethirtyeight.com.
If you found this visualization technique useful, you can check out the seaborn documentation about swarm plots.

Wednesday, April 17, 2019

Visualizing atmospheric carbon dioxide

Let's have a look at how to create a visualization that shows how CO2 concentrations evolved in the atmosphere. First, we fetched from the Earth System Research Laboratory website like follows:

import pandas as pd

data_url = 'ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_weekly_mlo.txt'
co2_data = pd.read_csv(data_url, sep='\s+', comment='#', na_values=-999.99,
                       names=['year', 'month', 'day', 'decimal', 'ppm', 
                       'days', '1_yr_ago',  '10_yr_ago', 'since_1800'])

co2_data['timestamp'] = co2_data.apply(lambda x: pd.Timestamp(year=int(x.year),
                                                             month=int(x.month),
                                                             day=int(x.day)),
                                       axis=1)
co2_data = co2_data[['timestamp', 'ppm']].set_index('timestamp').ffill()

Then, we group the it by year and month at the same time storing the result in a matrix where each element represents the concentration in a specific year and month:

import numpy as np
import matplotlib.pyplot as plt
from calendar import month_abbr

co2_data = co2_data['1975':'2018']
n_years = co2_data.index.year.max() - co2_data.index.year.min()
z = np.ones((n_years +1 , 12)) * np.min(co2_data.ppm)
for d, y in co2_data.groupby([co2_data.index.year, co2_data.index.month]):
  z[co2_data.index.year.max() - d[0], d[1] - 1] = y.mean()[0]
  
plt.figure(figsize=(10, 14))
plt.pcolor(np.flipud(z), cmap='hot_r')
plt.yticks(np.arange(0, n_years+1)+.5,
           range(co2_data.index.year.min(), co2_data.index.year.max()+1));
plt.xticks(np.arange(13)-.5, month_abbr)
plt.xlim((0, 12))
plt.colorbar().set_label('Atmospheric Carbon Dioxide in ppm')
plt.show()

This visualization makes us able to compare the CO2 levels month by month with single glance. For example, we see that the period from April to June gets dark quicker than other periods, meaning that it contains the highest levels every year. Conversely, the period that goes from September to October gets darker more slowly, meaning that it's the period with the lowest CO2 levels. Also, looking at the color bar we note that in 43 years there was a 80 ppm increase.

Is this bad for the planet earth? Reading Hansen et al. (2008) we can classify CO2 levels less than 300 ppm as safe, levels between 300 and 350 ppm as dangerous, while levels beyond 350 ppm are considered catastrophic. According to this, the chart is a sad picture of how the levels transitioned from dangerous to catastrophic!

Concerned by this situation I created the CO2 Forecast twitter account where I'll publish short and long term forecasts of CO2 levels in the atmosphere.

Friday, June 29, 2018

Plotting a calendar in matplotlib

And here's a function to plot a compact calendar with matplotlib:

import calendar
import numpy as np
from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt

def plot_calendar(days, months):
    plt.figure(figsize=(9, 3))
    # non days are grayed
    ax = plt.gca().axes
    ax.add_patch(Rectangle((29, 2), width=.8, height=.8, 
                           color='gray', alpha=.3))
    ax.add_patch(Rectangle((30, 2), width=.8, height=.8,
                           color='gray', alpha=.5))
    ax.add_patch(Rectangle((31, 2), width=.8, height=.8,
                           color='gray', alpha=.5))
    ax.add_patch(Rectangle((31, 4), width=.8, height=.8,
                           color='gray', alpha=.5))
    ax.add_patch(Rectangle((31, 6), width=.8, height=.8,
                           color='gray', alpha=.5))
    ax.add_patch(Rectangle((31, 9), width=.8, height=.8,
                           color='gray', alpha=.5))
    ax.add_patch(Rectangle((31, 11), width=.8, height=.8,
                           color='gray', alpha=.5))
    for d, m in zip(days, months):
        ax.add_patch(Rectangle((d, m), 
                               width=.8, height=.8, color='C0'))
    plt.yticks(np.arange(1, 13)+.5, list(calendar.month_abbr)[1:])
    plt.xticks(np.arange(1,32)+.5, np.arange(1,32))
    plt.xlim(1, 32)
    plt.ylim(1, 13)
    plt.gca().invert_yaxis()
    # remove borders and ticks
    for spine in plt.gca().spines.values():
        spine.set_visible(False)
    plt.tick_params(top=False, bottom=False, left=False, right=False)
    plt.show()

You can use it to highlight the days with full moon in 2018:

full_moon_day = [2, 31, 2, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22]
full_moon_month = [1, 1, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
plot_calendar(full_moon_day, full_moon_month)

Or just highlight all the weekends:

from datetime import datetime, timedelta

def get_weekends(year):
    weekend_day = []
    weekend_month = [] 
    start = datetime(year, 1, 1)
    for i in range(365):
        day_of_the_year = start + timedelta(days=i)
        if day_of_the_year.weekday() > 4:
            weekend_day.append(day_of_the_year.day)
            weekend_month.append(day_of_the_year.month)
    return weekend_day, weekend_month

weekend_day, weekend_month = get_weekends(2018)
plot_calendar(weekend_day, weekend_month)

Wednesday, May 30, 2018

Visualizing UK Carbon Emissions

Have you ever wanted to check carbon emissions in the UK and never had an easy way to do it? Now you can use the Official Carbon Intensity API developed by the National Grid. Let's see an example of how to use the API to summarize the emissions in the month of May. First, we download the data with a request to the API:

import urllib.request
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

period = ('2018-05-01T00:00Z', '2018-05-28T00:00Z')
url = 'https://api.carbonintensity.org.uk/intensity/%s/%s'
url = url % period
response = urllib.request.urlopen(url)
data = json.loads(response.read())['data']

We organize the result in a DataFrame indexed by timestamps:

carbon_intensity = pd.DataFrame()
carbon_intensity['timestamp'] = [pd.to_datetime(d['from']) for d in data]
carbon_intensity['intensity'] = [d['intensity']['actual'] for d in data]
carbon_intensity['classification'] = [d['intensity']['index'] for d in data]
carbon_intensity.set_index('timestamp', inplace=True)

From the classification provided we extract the thresholds to label emissions in low, high and moderate:

thresholds = carbon_intensity.groupby(by='classification').min()
threshold_high = thresholds[thresholds.index == 'high'].values[0][0]
threshold_moderate = thresholds[thresholds.index == 'moderate'].values[0][0]

Now we group the data by hour of the day and create a boxplot that shows some interesting facts about carbon emissions in May:

hour_group = carbon_intensity.groupby(carbon_intensity.index.hour)

plt.figure(figsize=(12, 6))
plt.title('UK Carbon Intensity in May 2018')
plt.boxplot([g.intensity for _,g in hour_group], 
            medianprops=dict(color='k'))

ymin, ymax = plt.ylim()

plt.fill_between(x=np.arange(26), 
                 y1=np.ones(26)*threshold_high, 
                 y2=np.ones(26)*ymax, 
                 color='crimson', 
                 alpha=.3, label='high')

plt.fill_between(x=np.arange(26), 
                 y1=np.ones(26)*threshold_moderate, 
                 y2=np.ones(26)*threshold_high, 
                 color='skyblue', 
                 alpha=.5, label='moderate')

plt.fill_between(x=np.arange(26), 
                 y1=np.ones(26)*threshold_moderate, 
                 y2=np.ones(26)*ymin, 
                 color='palegreen', 
                 alpha=.3, label='low')

plt.ylim(ymin, ymax)
plt.ylabel('carbon intensity (gCO_2/kWH)')
plt.xlabel('hour of the day')
plt.legend(loc='upper left', ncol=3,
           shadow=True, fancybox=True)
plt.show()

We notice that the medians almost always falls in the moderate emissions region and in two cases it even falls in the low region. In the early afternoon the medians reach their minimum while the maximum is reached in the evening. It's nice to see that most of the hours present outliers in the low emissions region and only few outliers are in the high region.

Do you want to know more about boxplots? Check this out!

Thursday, April 9, 2015

Stacked area plots with matplotlib

In a stacked area plot, the values on the y axis are accumulated at each x position and the area between the resulting values is then filled. These plots are helpful when it comes to compare quantities through time. For example, considering the the monthly totals of the number of new cases of measles, mumps, and chicken pox for New York City during the years 1931-1971 (data that we already considered here). We can compare the number of cases of each disease month by month. First, we need to load and organize the data properly:

from scipy.io import loadmat
NYCdiseases = loadmat('NYCDiseases.mat') # loading a matlab file
chickenpox = np.sum(NYCdiseases['chickenPox'],axis=0)
mumps = np.sum(NYCdiseases['mumps'],axis=0)
measles = np.sum(NYCdiseases['measles'],axis=0)

In the snippet above we read the data from a Matlab file and summed the number of cases for each month. We are now ready to visualize our values using the function stackplot:

import matplotlib.patches as mpatches
import matplotlib.pyplot as plt

plt.stackplot(arange(12)+1,
          [chickenpox, mumps, measles], 
          colors=['#377EB8','#55BA87','#7E1137'])
plt.xlim(1,12)

# creating the legend manually
plt.legend([mpatches.Patch(color='#377EB8'),  
            mpatches.Patch(color='#55BA87'), 
            mpatches.Patch(color='#7E1137')], 
           ['chickenpox','mumps','measles'])
plt.show()

The result is as follows:

We note that the highest number of cases happens between January and Jul, also we see that measles cases are more common than mumps and chicken pox cases.

Thursday, December 6, 2012

3D stem plot

A two-dimensional stem plot displays data as lines extending from a baseline along the x axis. In the following snippet we will see how to make a three-dimensional stem plot using the mplot3d toolkit. In this case we have that data is displayed as lines extending from the x-y plane along the z direction.Let's go with the code:

from numpy import linspace, sin, cos
from pylab import figure, show
from mpl_toolkits.mplot3d import Axes3D

# generating some data
x = linspace(-10,10,100);
y = sin(x);
z = cos(x);

fig = figure()
 
ax = Axes3D(fig)

# plotting the stems
for i in range(len(x)):
  ax.plot([x[i], x[i]], [y[i], y[i]], [0, z[i]], 
          '--', linewidth=2, color='b', alpha=.5)

# plotting a circle on the top of each stem
ax.plot(x, y, z, 'o', markersize=8, 
        markerfacecolor='none', color='b',label='ib')

ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')

show()

And this is the result

Friday, September 14, 2012

Boxplot with matplotlib

A boxplot (also known as a box-and-whisker diagram) is a way of summarizing a set of data measured on an interval scale. In this post I will show how to make a boxplot with pylab using a dataset that contains the monthly totals of the number of new cases of measles, mumps, and chicken pox for New York City during the years 1931-1971. The data was extracted from the Hipel-McLeod Time Series Datasets Collection and you can download it from here in the matlab format.
Let's make a box plot of the monthly distribution of chicken pox cases:

from pylab import *
from scipy.io import loadmat

NYCdiseases = loadmat('NYCDiseases.mat') # it's a matlab file

# multiple box plots on one figure
# Chickenpox cases by month
figure(1)
# NYCdiseases['chickenPox'] is a matrix 
# with 30 rows (1 per year) and 12 columns (1 per month)
boxplot(NYCdiseases['chickenPox'])
labels = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')
xticks(range(1,13),labels, rotation=15)
xlabel('Month')
ylabel('Chickenpox cases')
title('Chickenpox cases in NYC 1931-1971')

The result should be as follows:

On each box, the central mark is the median, the edges of the box are the lower hinge (defined as the 25th percentile) and the upper hinge (the 75th percentile), the whiskers extend to the most extreme data points not considered outliers, these ones are plotted individually.
Using the graph we can compare the range and distribution of the chickenpox cases for each month. We can observe that March and April are the month with the highest number of cases but also the ones with the greatest variability. We can compare the distribution of the three diseases in the same way:

# building the data matrix
data = [NYCdiseases['measles'], 
        NYCdiseases['mumps'], NYCdiseases['chickenPox']]

figure(2)
boxplot(data)
xticks([1,2,3],('measles','mumps','chickenPox'), rotation=15)
ylabel('Monthly cases')
title('Contagious childhood disease in NYC 1931-1971')

show()

And this is the result:

Here, we can observe that the chicken pox distribution has the median higher than the other diseases. The mumps distribution seems to have small variability compared to the other ones and the measles distribution has a low median but a very high number of outliers.

Friday, May 4, 2012

Analyzing your Gmail with Matplotlib

Lately, I read this post about using Mathematica to analyze a Gmail account. I found it very interesting and I worked a with imaplib and matplotlib to create two of the graph they showed:

A diurnal plot, which shows the date and time each email was sent (or received), with years running along the x axis and times of day on the y axis.
And a daily distribution histogram, which represents the distribution of emails sent by time of day.

In order to plot those graphs I created three functions. The first one, retrieve the headers of the emails we want to analyze:

from imaplib import IMAP4_SSL
from datetime import date,timedelta,datetime
from time import mktime
from email.utils import parsedate
from pylab import plot_date,show,xticks,date2num
from pylab import figure,hist,num2date
from matplotlib.dates import DateFormatter

def getHeaders(address,password,folder,d):
 """ retrieve the headers of the emails 
     from d days ago until now """
 # imap connection
 mail = IMAP4_SSL('imap.gmail.com')
 mail.login(address,password)
 mail.select(folder) 
 # retrieving the uids
 interval = (date.today() - timedelta(d)).strftime("%d-%b-%Y")
 result, data = mail.uid('search', None, 
                      '(SENTSINCE {date})'.format(date=interval))
 # retrieving the headers
 result, data = mail.uid('fetch', data[0].replace(' ',','), 
                         '(BODY[HEADER.FIELDS (DATE)])')
 mail.close()
 mail.logout()
 return data

The second one, make us able to make the diurnal plot:

def diurnalPlot(headers):
 """ diurnal plot of the emails, 
     with years running along the x axis 
     and times of day on the y axis.
 """
 xday = []
 ytime = []
 for h in headers: 
  if len(h) > 1:
   timestamp = mktime(parsedate(h[1][5:].replace('.',':')))
   mailstamp = datetime.fromtimestamp(timestamp)
   xday.append(mailstamp)
   # Time the email is arrived
   # Note that years, month and day are not important here.
   y = datetime(2010,10,14, 
     mailstamp.hour, mailstamp.minute, mailstamp.second)
   ytime.append(y)

 plot_date(xday,ytime,'.',alpha=.7)
 xticks(rotation=30)
 return xday,ytime

And this is the function for the daily distribution histogram:

def dailyDistributioPlot(ytime):
 """ draw the histogram of the daily distribution """
 # converting dates to numbers
 numtime = [date2num(t) for t in ytime] 
 # plotting the histogram
 ax = figure().gca()
 _, _, patches = hist(numtime, bins=24,alpha=.5)
 # adding the labels for the x axis
 tks = [num2date(p.get_x()) for p in patches] 
 xticks(tks,rotation=75)
 # formatting the dates on the x axis
 ax.xaxis.set_major_formatter(DateFormatter('%H:%M'))

Now we got everything we need to make the graphs. Let's try to analyze the outgoing mails of last 5 years:

print 'Fetching emails...'
headers = getHeaders('[email protected]',
                      'ofcourseiamsupersexy','inbox',365*5)

print 'Plotting some statistics...'
xday,ytime = diurnalPlot(headers)
dailyDistributioPlot(ytime)
print len(xday),'Emails analysed.'
show()

The result would appear as follows

We can analyze the outgoing mails just using selecting the folder '[Gmail]/Sent Mail':

print 'Fetching emails...'
headers = getHeaders('[email protected]',
                     'ofcourseiamsupersexy','[Gmail]/Sent Mail',365*5)

print 'Plotting some statistics...'
xday,ytime = diurnalPlot(headers)
dailyDistributioPlot(ytime)
print len(xday),'Emails analysed.'
show()

And this is the result:

Sunday, February 5, 2012

Convolution with numpy

A convolution is a way to combine two sequences, x and w, to get a third sequence, y, that is a filtered version of x. The convolution of the sample x_t is computed as follows:

It is the mean of the weighted summation over a window of length k and w_t are the weights. Usually, the sequence w is generated using a window function. Numpy has a number of window functions already implemented: bartlett, blackman, hamming, hanning and kaiser. So, let's plot some Kaiser windows varying the parameter beta:

import numpy
import pylab

beta = [2,4,16,32]

pylab.figure()
for b in beta:
 w = numpy.kaiser(101,b) 
 pylab.plot(range(len(w)),w,label="beta = "+str(b))
pylab.xlabel('n')
pylab.ylabel('W_K')
pylab.legend()
pylab.show()

The graph would appear as follows:

And now, we can use the function convolve(...) to compute the convolution between a vector x and one of the Kaiser window we have seen above:

def smooth(x,beta):
 """ kaiser window smoothing """
 window_len=11
 # extending the data at beginning and at the end
 # to apply the window at the borders
 s = numpy.r_[x[window_len-1:0:-1],x,x[-1:-window_len:-1]]
 w = numpy.kaiser(window_len,beta)
 y = numpy.convolve(w/w.sum(),s,mode='valid')
 return y[5:len(y)-5]

Let's test it on a random sequence:

# random data generation
y = numpy.random.random(100)*100 
for i in range(100):
 y[i]=y[i]+i**((150-i)/80.0) # modifies the trend

# smoothing the data
pylab.figure(1)
pylab.plot(y,'-k',label="original signal",alpha=.3)
for b in beta:
 yy = smooth(y,b) 
 pylab.plot(yy,label="filtered (beta = "+str(b)+")")
pylab.legend()
pylab.show()

The program would have an output similar to the following:

As we can see, the original sequence have been smoothed by the windows.

Saturday, January 14, 2012

How to plot a function of two variables with matplotlib

In this post we will see how to visualize a function of two variables in two ways. First, we will create an intensity image of the function and, second, we will use the 3D plotting capabilities of matplotlib to create a shaded surface plot. So, let's go with the code:

from numpy import exp,arange
from pylab import meshgrid,cm,imshow,contour,clabel,colorbar,axis,title,show

# the function that I'm going to plot
def z_func(x,y):
 return (1-(x**2+y**3))*exp(-(x**2+y**2)/2)
 
x = arange(-3.0,3.0,0.1)
y = arange(-3.0,3.0,0.1)
X,Y = meshgrid(x, y) # grid of point
Z = z_func(X, Y) # evaluation of the function on the grid

im = imshow(Z,cmap=cm.RdBu) # drawing the function
# adding the Contour lines with labels
cset = contour(Z,arange(-1,1.5,0.2),linewidths=2,cmap=cm.Set2)
clabel(cset,inline=True,fmt='%1.1f',fontsize=10)
colorbar(im) # adding the colobar on the right
# latex fashion title
title('$z=(1-x^2+y^3) e^{-(x^2+y^2)/2}$')
show()

The script would have the following output:

And now we are going to use the values stored in X,Y and Z to make a 3D plot using the mplot3d toolkit. Here's the snippet:

from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.gca(projection='3d')
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, 
                      cmap=cm.RdBu,linewidth=0, antialiased=False)

ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))

fig.colorbar(surf, shrink=0.5, aspect=5)

plt.show()

And this is the result:

Tuesday, January 3, 2012

Fixed point iteration

A fixed point for a function is a point at which the value of the function does not change when the function is applied. More formally, x is a fixed point for a given function f if

and the fixed point iteration

converges to the a fixed point if f is continuous.
The following function implements the fixed point iteration algorithm:

from pylab import plot,show
from numpy import array,linspace,sqrt,sin
from numpy.linalg import norm

def fixedp(f,x0,tol=10e-5,maxiter=100):
 """ Fixed point algorithm """
 e = 1
 itr = 0
 xp = []
 while(e > tol and itr < maxiter):
  x = f(x0)      # fixed point equation
  e = norm(x0-x) # error at the current step
  x0 = x
  xp.append(x0)  # save the solution of the current step
  itr = itr + 1
 return x,xp

Let's find the fixed point of the square root funtion starting from x = 0.5 and plot the result

f = lambda x : sqrt(x)

x_start = .5
xf,xp = fixedp(f,x_start)

x = linspace(0,2,100)
y = f(x)
plot(x,y,xp,f(xp),'bo',
     x_start,f(x_start),'ro',xf,f(xf),'go',x,x,'k')
show()

The result of the program would appear as follows:

The red dot is the starting point, the blue ones are the sequence x_1,x_2,x_3,... and the green is the fixed point found.
In a similar way, we can compute the fixed point of function of multiple variables:

# 2 variables function
def g(x):
 x[0] = 1/4*(x[0]*x[0] + x[1]*x[1])
 x[1] = sin(x[0]+1)
 return array(x)

x,xf = fixedp(g,[0, 1])
print '   x =',x
print 'f(x) =',g(xf[len(xf)-1])

In this case g is a function of two variables and x is a vector, so the fixed point is a vector and the output is as follows:

   x = [ 0.          0.84147098]
f(x) = [ 0.          0.84147098]

Thursday, December 15, 2011

Polar Charts with matplolib

A polar system is a two-dimensional coordinate system, where there are two coordinates: the radial and the angular coordinates. The radial coordinate denotes the point distance from a central point (pole) and the angular coordinate denotes the angle required to reach the point from the 0 degree ray (polar axis). Let's see an example of how to make polar charts with matplotlib:

from pylab import figure,polar,show
from numpy import arange,pi,cos

theta = arange(0, 2, 1./180)*pi # angular coordinates
figure(1)
polar(3*theta, theta/5) # drawing a spiral
figure(2)
polar(theta, cos(4*theta)) # drawing the polar rose
show()

The result of this script consists of two charts. The first with the spiral

and the second with the polar rose

Thursday, December 8, 2011

Lissajous curves

And after the Epitrochoids, we're going to see another family of wonderful figures: The Lissajous curves. The equations that describe these curves are the following

the curves vary with respect the parameter t and their appearance is determined by the ratio a/b and the value of δ.
As usual, I made a snippet to visualize them:

from numpy import sin,pi,linspace
from pylab import plot,show,subplot

a = [1,3,5,3] # plotting the curves for
b = [1,5,7,4] # different values of a/b
delta = pi/2
t = linspace(-pi,pi,300)

for i in range(0,4):
 x = sin(a[i] * t + delta)
 y = sin(b[i] * t)
 subplot(2,2,i+1)
 plot(x,y)

show()

This is the result

How to make Bubble Charts with matplotlib

In this post we will see how to make a bubble chart using matplotlib. The snippet that we are going to see was inspired by a tutorial on flowingdata.com where R is used to make a bubble chart that represents some data extracted from a csv file about the crime rates of America by states. I used the dataset provided by flowingdata to create a similar chart with Python. Let's see the code:

from pylab import *
from scipy import *

# reading the data from a csv file
durl = 'http://datasets.flowingdata.com/crimeRatesByState2005.csv'
rdata = genfromtxt(durl,dtype='S8,f,f,f,f,f,f,f,i',delimiter=',')

rdata[0] = zeros(8) # cutting the label's titles
rdata[1] = zeros(8) # cutting the global statistics

x = []
y = []
color = []
area = []

for data in rdata:
 x.append(data[1]) # murder
 y.append(data[5]) # burglary
 color.append(data[6]) # larceny_theft 
 area.append(sqrt(data[8])) # population
 # plotting the first eigth letters of the state's name
 text(data[1], data[5], 
      data[0],size=11,horizontalalignment='center')

# making the scatter plot
sct = scatter(x, y, c=color, s=area, linewidths=2, edgecolor='w')
sct.set_alpha(0.75)

axis([0,11,200,1280])
xlabel('Murders per 100,000 population')
ylabel('Burglaries per 100,000 population')
show()

The following figure is the resulting bubble chart

It shows the number of burglaries versus the number of murders per 100,000 population. Every bubble is a state of America, the size of the bubbles represents the population of the state and the color is the number of larcenies.

Thursday, November 17, 2011

Fun with Epitrochoids

An epitrochoid is a curve traced by a point attached to a circle of radius r rolling around the outside of a fixed circle of radius R, where the point is a distance d from the center of the exterior circle [Ref]. Lately I found the Epitrochoid's parametric equations on wikipedia:

So,I decided to plot them with pylab. This is the script I made

from numpy import sin,cos,linspace,pi
import pylab

# curve parameters
R = 14
r = 1
d = 18

t = linspace(0,2*pi,300)

# Epitrochoid parametric equations
x = (R-r)*cos(t)-d*cos( (R+r)*t / r )
y = (R-r)*sin(t)-d*sin( (R+r)*t / r )

pylab.plot(x,y,'r')
pylab.axis('equal')
pylab.show()

And this is the result

isn't it fashinating? :)

How to use ginput

ginput is a function that enables you to select points from a figure using the mouse. This post is a simple example on how to use it.

from pylab import plot, ginput, show, axis

axis([-1, 1, -1, 1])
print "Please click three times"
pts = ginput(3) # it will wait for three clicks
print "The point selected are"
print pts # ginput returns points as tuples
x=map(lambda x: x[0],pts) # map applies the function passed as 
y=map(lambda x: x[1],pts) # first parameter to each element of pts
plot(x,y,'-o')
axis([-1, 1, -1, 1])
show()

And after three clicks this is the result on the console:

Please click three times
The point selected are
[(-0.77468982630272942, -0.3418367346938776), (0.11464019851116632, -0.21428571428571436), (0.420347394540943, 0.55612244897959173)]

and this is the figure generated:

Tuesday, July 12, 2011

Dice rolling experiment

If we roll a die a large number of times, and we compute the mean and variance, as exaplained here, we’d expect to obtain a mean = 3.5 and a variance = 2.916. Let's simulate that with a script:

import pylab
import math
 
# Rolling the die 1000 times
v = pylab.randint(1,7,size=(1000))

print 'mean',pylab.mean(v)
print 'variance',pylab.var(v)

pylab.hist(v, bins=6) # histogram of the outcoming
pylab.xlim(1,6)
pylab.show()

Here's the result:

mean 3.435
variance 2.781775

Monday, July 4, 2011

How to plot biorhythm

The following script plot the biorhythm of a person born in 14/3/1988 in a range of 20 days.

from datetime import date
import matplotlib.dates
from pylab import *
from numpy import array,sin,pi

t0 = date(1988,3,14).toordinal()
t1 = date.today().toordinal()
t = array(range((t1-10),(t1+10))) # range of 20 days

y = 100*[sin(2*pi*(t-t0)/23),  # Physical
         sin(2*pi*(t-t0)/28),  # Emotional
         sin(2*pi*(t-t0)/33)]; # Intellectual

# converting ordinals to date
label = []
for p in t:
 label.append(date.fromordinal(p))

fig = figure()
ax = fig.gca()
plot(label,y[0],label,y[1],label,y[2])
# adding a legend
legend(['Physical', 'Emotional', 'Intellectual'])
# formatting the dates on the x axis
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%d/%b'))

show()

The resulting graph:

Thursday, June 30, 2011

Animation with matplotlib, bringin the Sierpinski triangle to life

Few post ago we have seen how to plot the Sierpinski triangle. This post is an update that shows how to create an animation where at each step a new point of the fractal is plotted:

from numpy import *
import pylab

x = [0, 0];

A = [ [.5, 0], [0, .5] ];
b1 = [0, 0];
b2 = [.5, 0];
b3 = [.25, sqrt(3)/4];

pylab.ion() # animation on

#Note the comma after line. This is placed here because plot returns a list of lines that are drawn.
line, = pylab.plot(x[0],x[1],'m.',markersize=6) 
pylab.axis([0,1,0,1])

data1 = []
data2 = []
iter = 0

while True:
 r = fix(random.rand()*3)
 if r==0:
  x = dot(A,x)+b1
 if r==1:
  x = dot(A,x)+b2
 if r==2:
  x = dot(A,x)+b3
 data1.append(x[0]) 
 data2.append(x[1])
 line.set_xdata(data1)  # update the data
 line.set_ydata(data2)
 pylab.draw() # draw the points again
 iter += 1
 print iter

This is the result:

Friday, May 1, 2020

Sunday, August 11, 2019

Wednesday, April 17, 2019

Friday, June 29, 2018

Wednesday, May 30, 2018

Thursday, April 9, 2015

Thursday, December 6, 2012

Friday, September 14, 2012

Friday, May 4, 2012

Sunday, February 5, 2012

Saturday, January 14, 2012

Tuesday, January 3, 2012

Thursday, December 15, 2011

Thursday, December 8, 2011

Wednesday, November 23, 2011

Thursday, November 17, 2011

Thursday, August 25, 2011

Tuesday, July 12, 2011

Monday, July 4, 2011

Thursday, June 30, 2011

Quote