Mastering Python Data Visualization - Sample Chapter
Kirthi Raman

Generate effective results in a variety of visually appealing charts using the plotting packages in Python
Preface
Data visualization is intended to present information clearly and help the viewer understand it qualitatively. The well-known expression that a picture is worth a thousand words may be rephrased as "a picture tells a story as well as a large collection of words". Visualization is, therefore, a very precious tool that helps the viewer grasp a concept quickly. However, data visualization is more of an art than a skill because if you overdo it, it can have the reverse effect.
We are currently faced with a plethora of data containing many insights that hold the key to success in the modern day. It is important to find the data, clean it, and use the right tools to visualize it. This book explains several different ways to visualize data using Python packages, along with very useful examples in many different areas, such as numerical computing, financial models, statistics and machine learning, and genetics and networks.
This book presents example code developed on Mac OS X 10.10.5 using Python 2.7, IPython 0.13.2, matplotlib 1.4.3, NumPy 1.9.2, SciPy 0.16.0, and conda build version 1.14.1.
NumPy
NumPy provides not only array objects, but also linear algebra functions that can be conveniently used for computations. It offers a fast implementation of arrays and the associated array functionality. Using an array object, one can perform operations such as matrix multiplication, transposition of vectors and matrices, solving systems of equations, vector multiplication, normalization, and so on.
NumPy's ndarray is similar to a Python list, but it is rather strict in storing only a homogeneous type of object. In other words, with a Python list, one can mix the element types, such as the first element as a number, the second element as a list, and the next element as another list (or dictionary). Operating on the elements of an ndarray is significantly faster for a large array, as the following example demonstrates by measuring the running time. For readers who are curious about the NumPy implementation in C, documentation is available at http://docs.scipy.org/doc/numpy/reference/internals.code-explanations.html.
import numpy as np

arr = np.arange(10000000)
listarr = arr.tolist()

def scalar_multiple(alist, scalar):
    for i, val in enumerate(alist):
        alist[i] = val * scalar
    return alist

# Using IPython's magic timeit command
timeit arr * 2.4
10 loops, best of 3: 31.7 ms per loop
# the above result shows 31.7 ms (not seconds)

timeit scalar_multiple(listarr, 2.4)
1 loops, best of 3: 1.39 s per loop
# the above result shows 1.39 seconds (not ms)
In the preceding code, each array element occupies 4 bytes; the integer array therefore occupies approximately 44 MB of memory, whereas the list uses about 711 MB. Arrays are slower for small collection sizes, but for large collections, they use less memory space and are significantly faster than lists.
NumPy comes with many useful functions that are broadly categorized as trigonometric functions, arithmetic functions, exponent and logarithmic functions, and miscellaneous functions. Among the miscellaneous functions, convolve() for linear convolution and interp() for linear interpolation are popular. In addition, for most experimental work that involves equally spaced data, the linspace() and random.rand() functions are among the ones used most widely.
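As a quick illustration (a small sketch, not taken from the book's own listings), convolve() can implement a simple moving average and interp() evaluates a piecewise-linear interpolant at arbitrary points:

import numpy as np

# moving average of a signal as a linear convolution
signal = np.linspace(0, 1, 10)
window = np.ones(3) / 3.0
print np.convolve(signal, window, mode='same')

# linear interpolation of sin(x) at a few arbitrary points
xp = np.linspace(0, 2*np.pi, 10)
fp = np.sin(xp)
print np.interp([0.5, 1.5, 2.5], xp, fp)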
np.random.rand(8).reshape(2,4)
array([[ 0.39698544,  0.88843637,  0.66260474,  0.61106802],
       [ 0.97622822,  0.47652548,  0.56163488,  0.43602828]])
In the preceding example, after creating eight random values, they are reshaped into a dimension of choice. The following code shows another example of shape manipulation:
#another example
a = np.array([[11,12,13,14,15,16],[17,18,19,20,21,22]])
print a
[[11 12 13 14 15 16]
 [17 18 19 20 21 22]]

# shape is used to know the dimensions
a.shape
(2, 6)

#Now change the shape of the array
a.shape = (3, 4)
print a
[[11 12 13 14]
 [15 16 17 18]
 [19 20 21 22]]
xrange is used instead of range because it is faster for loops: it avoids storing the whole list of integers and just generates them one by one. The inverse of reshape is ravel(), which flattens the array, as shown in the following code:
#ravel example
a = np.array([[11,12,13,14,15,16],[17,18,19,20,21,22]])
a.ravel()
array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22])
An example of interpolation
Here is an example of interpolation using interp():
n = 30
# create n values of x from 0 to 2*pi
x = np.linspace(0, 2*np.pi, n)
y = np.zeros(n)

# for the range of x values, evaluate the y values
for i in xrange(n):
    y[i] = np.sin(x[i])
The following figure shows the result of a simple sine curve interpolation:
The following code plots the curve with and without interpolation:
import numpy as np
import matplotlib.pyplot as plt

# create 100 values of x from 0 to 8*pi
x = np.linspace(0, 8*np.pi, 100)
y = np.sin(x/2)

#interpolate new y-values
yinterp = np.interp(x, x, y)

# plot the points and the interpolated curve (these plot calls are
# assumed; the original's plotting lines are not in this extract)
plt.plot(x, y, 'o')
plt.plot(x, yinterp, '-x')
plt.show()
Vectorizing functions
Vectorizing functions via vectorize() in NumPy and SciPy can be very efficient.
Vectorize has the capability to convert a function that takes scalars as arguments to
a function that takes arrays as arguments by applying the same rule element-wise.
We will demonstrate this here with two examples.
The first example uses a function that takes three scalar arguments to produce a
vectorized function that takes three array arguments, as shown in the following code:
import numpy as np

def addition(x, y, z):
    return x + y + z

def addpoly():
    i = np.random.randint(25)
    poly1 = np.arange(i, i+10)
    i = np.random.randint(25)
    poly2 = np.arange(i, i+10)
    poly3 = np.arange(10, 20)   # ten values, to match poly1 and poly2
    print poly1
    print poly2
    print poly3
    print '-' * 32
    vecf = np.vectorize(addition)
    print vecf(poly1, poly2, poly3)

addpoly()

[ 4  5  6  7  8  9 10 11 12 13]
[13 14 15 16 17 18 19 20 21 22]
[10 11 12 13 14 15 16 17 18 19]
--------------------------------
[27 30 33 36 39 42 45 48 51 54]
Note that arange is an array-valued version of the built-in Python range function.
The second example uses a function that takes one scalar argument to produce a vectorized function that takes an array argument, as shown in the following code:
import numpy as np

def posquare(x):
    if x >= 0:
        return x**2
    else:
        return -x

i = np.random.randint(25)
poly1 = np.arange(i, i+10)
print poly1
vecfunc = np.vectorize(posquare, otypes=[float])
vecfunc(poly1)

[14 15 16 17 18 19 20 21 22 23]
array([ 196., 225., 256., 289., 324., 361., 400., 441., 484., 529.])
Another interesting example compares three ways of incrementing the array elements by a constant and measures the running time to determine which method is faster:
import numpy as np
from time import time

def incrembyone(x):
    return x + 1

dataarray = np.linspace(1, 5, 1000000)

t1 = time()
lendata = len(dataarray)
print "Len = " + str(lendata)
print dataarray[1:7]
for i in range(lendata):
    dataarray[i] += 1
print " time for loop (No vectorization)->" + str(time() - t1)

t2 = time()
vecincr = np.vectorize(incrembyone)   #1
vecincr(dataarray)                    #2
print " time for vectorized version-1:" + str(time() - t2)

t3 = time()
# the third method is cut off at the page break in this extract; presumably
# it uses NumPy's built-in element-wise (broadcast) addition:
dataarray += 1
print " time for numpy broadcast addition:" + str(time() - t3)
Besides the vectorizing techniques, there is another simple coding practice that can make programs more efficient. If a dotted name such as math.sin is looked up repeatedly inside a loop, it is best practice to create a local alias and use this alias in the loop. One such example is shown here:
import math

fastsin = math.sin
x = range(1000000)
for i in x:
    x[i] = fastsin(x[i])
The following table lists some of the commonly used linear algebra functions in NumPy:

Function              Description
dot(a, b)             Dot product of two arrays
linalg.norm(x)        Matrix or vector norm
linalg.cond(x)        Condition number of a matrix
linalg.solve(A, b)    Solution of the linear system Ax = b
linalg.inv(A)         Inverse of a matrix
linalg.pinv(A)        Moore-Penrose pseudoinverse of a matrix
linalg.eig(A)         Eigenvalues and eigenvectors of a square matrix
linalg.eigvals(A)     Eigenvalues of a matrix
linalg.svd(A)         Singular value decomposition
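A few of these functions in action (a minimal sketch, not from the book's code):

import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

print np.dot(A, np.linalg.inv(A))   # A times its inverse, approximately the identity
print np.linalg.norm(b)             # Euclidean norm of b
print np.linalg.eigvals(A)          # eigenvalues of A
print np.linalg.solve(A, b)         # solves Ax = b, giving [ 2.  3.]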
SciPy
NumPy already has many convenient functions that can be used in computation. Then, why do we need SciPy? SciPy is an extension of NumPy for mathematics, science, and engineering that has many subpackages available for linear algebra, integration, interpolation, fast Fourier transforms, large matrix manipulation, statistical computation, and so on. The following table gives a brief description of these subpackages:
scipy.cluster: This consists of clustering algorithms, such as vector quantization and k-means.
scipy.fftpack: This denotes the fast Fourier transform routines.
scipy.integrate: This denotes integration routines and ordinary differential equation solvers.
scipy.interpolate: This denotes the functions and classes for interpolation objects with discrete numeric data and linear and spline interpolation.
scipy.linalg: This denotes the linear algebra routines that extend those of numpy.linalg.
scipy.optimize: This denotes the optimization and root-finding routines.
scipy.sparse: This specifies the functions that can work with large sparse matrices.
scipy.special: This denotes special mathematical functions, such as the Bessel or gamma functions.
In addition to the preceding subpackages, SciPy also has a scipy.io package with functions such as spio.loadmat() to load a matrix and spio.savemat() to save one; images can be read with scipy.misc.imread(). When there is a need to develop computational programs in Python, it is good practice to check the SciPy documentation to see whether it already contains the functions that accomplish the intended task.
Let's take a look at an example using scipy.poly1d():
import scipy as sp

# multiply two polynomials; the coefficients below are read off the
# printed output that follows (part of the original listing falls on a
# page that is not fully reproduced in this extract)
p1 = sp.poly1d([3, 4, 5, 5])
p2 = sp.poly1d([4, 1, -3, 3])
print p1
print p2
print '-' * 37
print p1 * p2

   3     2
3 x + 4 x + 5 x + 5
   3     2
4 x + 1 x - 3 x + 3
-------------------------------------
    6      5      4      3     2
12 x + 19 x + 15 x + 22 x + 2 x + 15

The result matches the multiplication done in the traditional term-by-term method.
The following example (whose introduction falls on a page not included in this extract) computes a parametric spline interpolation of points on a circle:
import numpy as np
import scipy as sp
import scipy.interpolate
import matplotlib.pyplot as plt

t = np.arange(0, 2.5, .1)
x = np.sin(2*np.pi*t)
y = np.cos(2*np.pi*t)

tcktuples, uarray = sp.interpolate.splprep([x, y], s=0)
unew = np.arange(0, 1.01, 0.01)
splinevalues = sp.interpolate.splev(unew, tcktuples)

plt.figure(figsize=(10,10))
plt.plot(x, y, 'x', splinevalues[0], splinevalues[1],
         np.sin(2*np.pi*unew), np.cos(2*np.pi*unew), x, y, 'b')
plt.legend(['Linear', 'Cubic Spline', 'True'])
plt.axis([-1.25, 1.25, -1.25, 1.25])
plt.title('Parametric Spline Interpolation Curve')
plt.show()
The following diagram is the result of this spline interpolation using SciPy and NumPy:
Let's take a look at an example of numerical integration and solving linear equations using some of the SciPy functions (such as simps() and romberg()) and compare them with the NumPy trapezoidal function, trapz(). We know that when the function f(x) = 9 - x² is integrated from -3 to 3, we expect 36 units, as shown in the following diagram:
The preceding plot shows the 9 - x² function (which is symmetric along the y axis). Mathematically, the integration from -3 to 3 is twice that of the integration from 0 to 3. How do we numerically integrate using SciPy? The following code shows one way to perform it, including the trapezoidal method from NumPy:
import numpy as np
from scipy.integrate import simps, romberg
a = -3.0; b = 3.0;
N = 10
x = np.linspace(a, b, N)
y = 9-x*x
yromb = lambda x: (9-x*x)
t = np.trapz(y, x)
s = simps(y, x)
r = romberg(yromb, a, b)
#actual integral value
aiv = (9*b-(b*b*b)/3.0) - (9*a-(a*a*a)/3.0)
print 'trapezoidal = {0} ({1:%} error)'.format(t, (t - aiv)/aiv)
print 'simpsons = {0} ({1:%} error)'.format(s, (s - aiv)/aiv)
print 'romberg = {0} ({1:%} error)'.format(r, (r - aiv)/aiv)
print 'actual value = {0}'.format(aiv)
trapezoidal = 35.5555555556 (-1.234568% error)
simpsons = 35.950617284 (-0.137174% error)
romberg = 36.0 (0.000000% error)
actual value = 36.0
Now, let's solve a system of linear equations:

x + 2y - z = 2
2x - 3y + 2z = 2
3x + y - z = 2

Note that np.dot(A, v) is a matrix multiplication (not A*v). The solution vector v = [1, 2, 3] is the correct expected result.
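The solving code itself is not shown in this extract; a minimal sketch using scipy.linalg.solve() (np.linalg.solve() works equally well) for the preceding system might look like this:

import numpy as np
from scipy import linalg

A = np.array([[1, 2, -1],
              [2, -3, 2],
              [3, 1, -1]])
b = np.array([2, 2, 2])

v = linalg.solve(A, b)
print v                  # expected: [ 1.  2.  3.]
print np.dot(A, v)       # check: reproduces b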
In the following example, we can see how to plot the actual function, its derivative, and the forward difference approximation in the same plot. The actual derivative is plugged into dy_actual, and the forward difference is calculated using diff() from NumPy.
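The original code and figure fall on pages that are not part of this extract; a minimal sketch of the idea, assuming a sine curve as the function, could look like this:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2*np.pi, 50)
y = np.sin(x)

# analytical derivative of sin(x)
dy_actual = np.cos(x)

# forward difference approximation using np.diff()
dy_forward = np.diff(y) / np.diff(x)

plt.plot(x, y, label='sin(x)')
plt.plot(x, dy_actual, label='actual derivative')
plt.plot(x[:-1], dy_forward, 'o', label='forward difference')
plt.legend(loc='lower left')
plt.show()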
MKL functions
The MKL functions from Intel provide high-performance routines on vectors and matrices. In addition, they include FFT functions and vector statistical functions. These functions have been enhanced and optimized to work efficiently on Intel processors. For Anaconda users, Continuum has packaged these functions into binary versions of the Python libraries with MKL optimizations. However, the MKL optimizations are available as an add-on as part of the Anaconda Accelerate package. The following graph shows how much slower the computation is without MKL:
For larger array inputs, MKL offers a significant performance improvement, as shown in the following screenshot:
There are a few other options available to improve the performance of computationally intensive programs in Python:

Use scipy.weave: This is a module that lets you insert snippets of C code and seamlessly transports NumPy arrays into the C layer. It also has some efficient macros.

Use the process pool, Pool: This is another class in the multiprocessing package. With Pool, you can define the number of worker processes to be created in the pool and then pass an iterable object containing the parameters for each process, as sketched in the code after this list.
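A minimal sketch of the Pool approach (not the book's own example) squares a list of numbers using four worker processes:

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)               # four worker processes
    results = pool.map(square, range(10))  # distribute the iterable across the workers
    pool.close()
    pool.join()
    print results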
Scalar selection
Scalar selection is the simplest method to select elements from an array. It is implemented using [rowindex] for one-dimensional arrays, [rowindex, columnindex] for two-dimensional arrays, and so on. The following simple code shows an array element reference:
import numpy as np
x = np.array([[2.0,4,5,6], [1,3,5,9]])
x[1,2]
5.0
A pure scalar selection always returns a single element, not an array. The data type of the selected element matches the data type of the array used in the selection. Scalar selection can also be used to assign a value to an array element, as shown in the following code:
x[1,2] = 8
x
array([[ 2.,  4.,  5.,  6.],
       [ 1.,  3.,  8.,  9.]])
Slicing
Arrays can be sliced just like lists and tuples. Array slicing is identical to list slicing, except that the syntax is simpler. Arrays are sliced using the [ : , : , ... , : ] syntax, where the number of dimensions of the array determines the size of the slice; for any dimension where the slice is omitted, all elements are selected. For example, if b is a three-dimensional array, b[0:2] is the same as b[0:2,:,:].
There are shorthand notations for slicing. Some common ones are:
: and :: are the same as 0:n:1, where n is the length of the array
m: and m:n: are the same as m:n:1, where n is the length of the array
All these slicing methods have been described with arrays, but they are also applicable to lists. Slicing a one-dimensional array is identical to slicing a simple list (as a one-dimensional array can be seen as equivalent to a list), and the return type of all the slicing operations matches the array being sliced. The following is a simple mechanism that shows array slices:
x = np.array([5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])

# default start, but the end index is 2 (exclusive)
y = x[:2]
array([5, 6])

# default start and end, but steps of 2
y = x[::2]
array([ 5,  7,  9, 11, 13, 15, 17, 19])
NumPy attempts to convert the data type automatically if an element with one data type is inserted into an array with a different data type. For example, if an array has an integer data type, placing a float into the array results in the float being truncated and stored as an integer. This can be dangerous; therefore, in such cases, arrays should be initialized to contain floats unless a considered decision is taken to use a different data type for a good reason. The following example shows that even if only one element is a float and the rest are integers, the whole array is assumed to be of the float type so that everything works properly:
a = [1.0, 2, 3, 6, 7]
b = np.array(a)
b.dtype
dtype('float64')
Linear slicing assigns an index to each element of the array in the order in which the elements are read. In two-dimensional arrays or lists, linear slicing works by first counting across the rows and then down the columns. In order to use linear slicing, you have to use the flat attribute, as shown in the following code:
a = np.array([[4,5,6],[7,8,9],[1,2,3]])
b = a.flat[:]
print b
[4 5 6 7 8 9 1 2 3]
Array indexing
Elements of NumPy arrays can be selected using four methods: scalar selection, slicing, numerical indexing, and logical (or Boolean) indexing. Scalar selection and slicing are the basic methods to access elements in an array, and they have already been discussed here. Numerical indexing and logical indexing are closely related and allow more flexible selection. Numerical indexing uses lists or arrays of locations to select elements, whereas logical indexing uses arrays that contain Boolean values to select elements.
Numerical indexing
Numerical indexing is an alternative to slice notation. The idea in numerical indexing is to use coordinates to select elements, which is similar to slicing. Arrays created using numerical indexing are copies of the data, whereas slices are only views of the data, not copies. For performance's sake, slicing should be preferred. Slices are similar to one-dimensional arrays, but the shape of the slice is determined by the slice inputs.
Numerical indexing in one-dimensional arrays uses the numerical index values as locations in the array (0-based indexing) and returns an array with the same dimensions as the numerical index.
Note that the numerical index can be either a list or a NumPy array and must contain integer data, as shown in the following code:
a = 10 * np.arange(4.0)
array([  0.,  10.,  20.,  30.])

a[[1]]        # the array index is a list with one element
array([ 10.])

a[[0,3,2]]    # the array indices are the 0th, 3rd, and 2nd
array([  0.,  30.,  20.])

sel = np.array([3,1,0,2,3,3])
a[sel]
array([ 30.,  10.,   0.,  20.,  30.,  30.])

sel = np.array([[3,1],[0,2]])
a[sel]
array([[ 30.,  10.],
       [  0.,  20.]])
These examples show that the numerical indices determine the element locations, and the shape of the numerical index array determines the shape of the output.
Similar to slicing, numerical indexing can be combined with the flat attribute to select elements using the row-major ordering of the array. The behavior of numerical indexing with flat is identical to that of using numerical indexing on a flattened version of the underlying array. A few examples are shown here:
a = 10 * np.arange(8.0)
array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.])

a.flat[[3,4,1]]
array([ 30.,  40.,  10.])

a.flat[[[3,4,7],[1,5,3]]]
array([[ 30.,  40.,  70.],
       [ 10.,  50.,  30.]])
Logical indexing
Logical indexing is different from slicing and numerical indexing; it uses logical indices to select elements, rows, or columns. Logical indices act like light switches and are either true or false. Pure logical indexing uses a logical indexing array with the same size as the array being used for selection and always returns a one-dimensional array, as shown in the following code:
x = np.arange(-4, 5)

x < 0
array([ True,  True,  True,  True, False, False, False, False, False], dtype=bool)

x[x > 0]
array([1, 2, 3, 4])

x[abs(x) >= 2]
array([-4, -3, -2,  2,  3,  4])

# logical indexing flattens a two-dimensional array as well
x = np.arange(-8, 8).reshape(4, 4)
x
array([[-8, -7, -6, -5],
       [-4, -3, -2, -1],
       [ 0,  1,  2,  3],
       [ 4,  5,  6,  7]])

x[x < 0]
array([-8, -7, -6, -5, -4, -3, -2, -1])
Stacks
A Python list can conveniently be used as a stack, which is an abstract data type that operates on the last-in, first-out (LIFO) principle. The known operations include adding an item to the top of the stack using append(), extracting the item from the top of the stack using pop(), and removing a specific item using remove(item-value), as shown in the following code:
stack = [5, 6, 8]
stack.append(6)
stack.append(8)
stack
[5, 6, 8, 6, 8]
stack.remove(8)
stack
[5, 6, 6, 8]
stack.pop()
8
stack.remove(8)
Traceback (most recent call last):
File "<ipython-input-339-61d6322e3bb8>", line 1, in <module>
stack.remove(8)
ValueError: list.remove(x): x not in list
The pop() function is the most efficient (constant time) because all the other elements remain in their locations. However, the parameterized version, pop(k), removes the element at index k (where k < n) of the list, shifting all the subsequent elements to fill the gap that results from the removal. The efficiency of this operation is linear because the amount of shifting depends on the choice of the index k, as illustrated in the following image:
Tuples
A tuple is a sequence of immutable objects that looks similar to a list. Tuples are heterogeneous data structures, which means that their elements can have different meanings, whereas lists are a homogeneous sequence of elements. Tuples have structure, and lists have order. Some examples of tuples are days of the week, course names, and grading scales, as shown in the following code:
#days of the week
weekdays = ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday")

#course names
courses = ("Chemistry", "Physics", "Mathematics", "Digital Logic",
    "Circuit Theory")

#grades
grades = ("A+", "A", "B+", "B", "C+", "C", "I")

Tuples are immutable. This means that you cannot change or remove individual elements from a tuple; however, the whole tuple can be deleted, for example, del grades will delete this tuple. After this, if an attempt is made to use that tuple, an error will occur. The following are the built-in tuple functions:
cmp(tup1, tup2): This function can be used to compare the elements of two tuples
len(tuple): This function can be used to get the total length of the tuple
max(tuple): This function can be used to find the item with the maximum value in the tuple
min(tuple): This function can be used to find the item with the minimum value in the tuple
Python has a max() function that behaves as expected for numerical values. However, if we pass a list of strings, max() returns the item that is largest in lexicographic (alphabetical) order.
weekdays = ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday")
print max(weekdays)
Wednesday
When we need to find how many elements are in an array or list, len() is a
convenient method that does the job.
len(weekdays)
7
Sets
Sets are similar to lists, but differ in two aspects. Firstly, they are an unordered collection, as compared to lists (which are ordered by location or index). Secondly, they do not have duplicates (if you know the mathematical definition of sets). The notation used for a set is shown in the following code:
setoftrees = {'Basswood', 'Red Pine', 'Chestnut', 'Gray Birch', 'Black Cherry'}

newtree = 'Tulip Tree'
if newtree not in setoftrees:
    setoftrees.add(newtree)
Then, build charsinmath and charsinchem from the appropriate spellings, as shown in the following code:
#example of set operations on letters
charsinmath = set('mathematics')
charsinchem = set('chem')

Now, let's take a look at the values in these sets:
charsinmath # typing this shows the letters in charsinmath
{'a', 'c', 'e', 'h', 'i', 'm', 's', 't'}
charsinchem # typing this shows the letters in charsinchem
{'c', 'e', 'h', 'm'}
Queues
Just like stacks, it is possible to use a list as a queue. However, the difference is that elements can be added to or removed from either the end of the list or its beginning. Although adding and removing from the end of a list is efficient, doing the same from the beginning is not, because in that case all the other elements have to be shifted.
Fortunately, Python has deque in its collections package, which efficiently implements the adding and removing of elements from both ends using append(), pop(), appendleft(), and popleft(), as shown in the following code:
from collections import deque

queue = deque(["Daniel", "Sid", "Mathew", "Michael"])
queue.append("Dirk")    # Dirk arrives
queue.append("Monte")   # Monte arrives
queue
deque(['Daniel', 'Sid', 'Mathew', 'Michael', 'Dirk', 'Monte'])

queue.popleft()
'Daniel'
queue.pop()
'Monte'

queue.appendleft('William')
queue
deque(['William', 'Sid', 'Mathew', 'Michael', 'Dirk'])

queue.append('Lastone')
queue
deque(['William', 'Sid', 'Mathew', 'Michael', 'Dirk', 'Lastone'])
Dictionaries
Dictionaries are collections of unordered data values composed of key/value pairs, which have the unique advantage of accessing a value based on the key as an index. If the key is a string, how does this indexing work? The key has to be hashable: a hash function is applied to the key to extract the location where the value is stored. In other words, the hash function takes a key value and returns an integer. Dictionaries then use these integers (or hash values) to store and retrieve the values. Some examples are shown here:
#example 1: top GDP values of Africa
gdp_dict = { 'South Africa': 285.4, 'Egypt': 188.4, 'Nigeria': 173,
    'Algeria': 140.6, 'Morocco': 91.4, 'Angola': 75.5, 'Libya': 62.3,
    'Tunisia': 39.6, 'Kenya': 29.4, 'Ethiopia': 28.5, 'Ghana': 26.2,
    'Cameroon': 22.2}
gdp_dict['Angola']
75.5

#example 2: English to Spanish for the numbers one to ten
english2spanish = { 'one': 'uno', 'two': 'dos', 'three': 'tres',
    'four': 'cuatro', 'five': 'cinco', 'six': 'seis', 'seven': 'siete',
    'eight': 'ocho', 'nine': 'nueve', 'ten': 'diez'}
english2spanish['four']
'cuatro'
The keys should be immutable to have a predictable hash value; otherwise, a change in the hash value would result in a different location and unpredictable things could occur. The default dictionary does not keep the values in the order in which they were inserted; therefore, when iterating after insertion, the order of the key/value pairs is arbitrary.
Python's collections package has an OrderedDict class that keeps the pairs in their insertion order. One additional difference between the default dictionary and the ordered dictionary is that, in the former, equality always returns true if the two dictionaries have an identical set of key/value pairs (not necessarily in the same order), whereas in the latter, equality returns true only when they have an identical set of key/value pairs and the pairs are in the same order. The following example demonstrates this:
# using the default dictionary; the assigned category values (and the
# second block's exact keys) were lost in extraction, so placeholder
# strings are used here
dict = {}
dict['cat-ds1'] = 'value-1'
dict['cat-ds2'] = 'value-2'
dict['cat-la1'] = 'value-3'
dict['cat-la2'] = 'value-4'
dict['cat-pda'] = 'value-5'
dict['cat-ps1'] = 'value-6'
dict['cat-ps2'] = 'value-7'
for key, val in dict.items():
    print key, val

# using OrderedDict, which preserves the insertion order
from collections import OrderedDict
odict = OrderedDict()
odict['cat-ds1'] = 'value-1'
odict['cat-ds2'] = 'value-2'
odict['cat-la1'] = 'value-3'
odict['cat-la2'] = 'value-4'
odict['cat-pda'] = 'value-5'
odict['cat-ps1'] = 'value-6'
odict['cat-ps2'] = 'value-7'
for key, val in odict.items():
    print key, val
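The equality behavior described earlier can be verified with a quick sketch (not from the book's code):

from collections import OrderedDict

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
print d1 == d2      # True: plain dictionaries ignore insertion order

od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 2), ('a', 1)])
print od1 == od2    # False: ordered dictionaries also compare the order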
Sparse matrices
Let's examine the space utilization of a matrix. For a 100 x 100 matrix represented using a list, each element occupies 4 bytes; therefore, the matrix needs 40,000 bytes, which is approximately 40 KB of space. However, if only 100 of these 10,000 elements have a nonzero value and all the others are zero, then the space is wasted. Now, let's consider a smaller matrix for the simplicity of discussion, as shown in the following image:
However, this representation makes it harder to access the (i, j)th value of A. There is a better way to represent this sparse matrix using a dictionary, as shown in the following code:
def getElement(row, col):
    if (row, col) in A.keys():
        r = A[row, col]
    else:
        r = 0
    return r

A = {(0,4): 2, (0,7): 1, (1,1): 4, (1,3): 3, (1,8): 1, (2,0): 6,
     (0,9): 2, (2,2): 1, (2,5): 7, (3,9): 1, (5,0): 3, (5,2): 2,
     (5,8): 3, (6,3): 2, (6,6): 1, (7,8): 1, (8,0): 3, (8,2): 2,
     (8,9): 1, (9,1): 3}
To access an element at (1, 3) of the matrix A, we could use A[(1, 3)], but if the
key does not exist, it will throw an exception. In order to get the nonzero value
using the key and return 0 if the key does not exist, we can use a function called
getElement(), as shown in the preceding code.
Visualizing sparseness
We can visually see how sparse the matrix is with the help of a square-box diagram. The sparseDisplay() function shown in the following code draws a small square for each matrix entry: the black color represents sparseness (zero entries), whereas the green color represents nonzero elements:
import matplotlib.pyplot as plt

def sparseDisplay(nonzero, squaresize):
    # the opening of this function falls on a page not included in this
    # extract; the definition and loop headers here are a reconstruction
    ax = plt.gca()
    ax.patch.set_facecolor('black')   # black background represents the zero entries
    for row in range(squaresize):
        for col in range(squaresize):
            if (row, col) in nonzero.keys():
                el = nonzero[(row, col)]
                if el == 0:
                    color = 'black'
                else:
                    color = '#008000'
                rect = plt.Rectangle([col, row], 1, 1,
                                     facecolor=color, edgecolor=color)
                ax.add_patch(rect)
    ax.autoscale_view()
    ax.invert_yaxis()

if __name__ == '__main__':
    nonzero = {(0,4): 2, (0,7): 1, (1,1): 4, (1,3): 3, (1,8): 1,
               (2,0): 6, (0,9): 2, (2,2): 1, (2,5): 7, (3,9): 1,
               (5,0): 3, (5,2): 2, (5,8): 3, (6,3): 2, (6,6): 1,
               (7,8): 1, (8,0): 3, (8,2): 2, (8,9): 1, (9,1): 3}
    plt.figure(figsize=(4,4))
    sparseDisplay(nonzero, 10)
    plt.show()
This is only a quick example to display the sparse matrix. Imagine that you have a 30 x 30 matrix with only a few nonzero values; the display would then look somewhat similar to the following image. The saving in this case is 97 percent as far as space utilization is concerned. In other words, the larger and sparser the matrix, the greater the space saved, as shown in the following image:
Having found a way to store the sparse matrix using a dictionary, remember that there is no need to reinvent the wheel. Storing a sparse matrix this way is worth considering to understand the power of dictionaries, but what is really recommended is to take a look at the SciPy and pandas packages for sparse matrices. There may be further opportunities in this book to use these approaches in some examples.
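As a pointer in that direction, here is a minimal sketch (not from the book's code) that stores the same kind of dictionary of nonzero entries as a scipy.sparse matrix:

import numpy as np
from scipy import sparse

nonzero = {(0,4): 2, (0,7): 1, (1,1): 4, (1,3): 3, (1,8): 1, (2,0): 6}

rows, cols, vals = [], [], []
for (r, c), v in nonzero.items():
    rows.append(r)
    cols.append(c)
    vals.append(v)

A = sparse.coo_matrix((vals, (rows, cols)), shape=(10, 10))
print A.toarray()        # dense view of the same matrix
print A.tocsr()[1, 3]    # element access, returns 3 here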
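The dictionary-based (memoized) Fibonacci code that the following output refers to falls on a page that is not part of this extract; a minimal sketch of the idea, using a dictionary named fibvalues as the cache, could look like this:

fibvalues = {0: 0, 1: 1}

def fibonacci(n):
    # compute fib(n), caching every intermediate result in fibvalues
    if n not in fibvalues:
        fibvalues[n] = fibonacci(n-1) + fibonacci(n-2)
    return fibvalues[n]

fibonacci(40)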
print sorted(fibvalues.values())
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987,
1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393,
196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887,
9227465, 14930352, 24157817, 39088169, 63245986, 102334155]
#regular fibonacci without using a dictionary
def fib(n):
    if n <= 1:
        return 1
    sumval = fib(n-1) + fib(n-2)
    return sumval
Tries
A trie (pronounced "tree" or "try") is a data structure that goes by different names (digital tree, radix tree, or prefix tree). Tries are very efficient for search, insert, and delete operations, and the data structure is also very economical in terms of storage. For example, when the words add, also, algebra, assoc, all, to, trie, tree, tea, and ten are stored in a trie, it looks similar to the following diagram:
The characters are shown in uppercase in the preceding diagram just for clarity; in real storage, the characters are stored as they appear in the words. In an implementation of a trie, it makes sense to store the word count. The search functionality is very efficient; in particular, when the pattern does not match, the result is determined even more quickly. In other words, if the search is for are, then the failure is determined at the level where the letter r is not found.
One of the popular functionalities is longest prefix matching: finding all the words in the dictionary that share the longest prefix with a particular search string, for example base. The results could be base, based, baseline, or basement, or even more words if they are found in the dictionary of words.
Python has many different trie implementations: suffix_tree, pytrie, trie, datrie, and so on. There is a nice comparison study done by J. F. Sebastian that can be accessed at https://github.com/zed/trie-benchmark.
Most search engines use a trie-like structure in their implementation of the inverted index. This is the central component where space optimization is very important. Moreover, searching this kind of structure is very efficient for finding the relevance between the search string and the documents. Another interesting application of tries is IP routing, where the ability to cover large ranges of values is particularly suitable; it also saves space.
A simple implementation in Python (not necessarily the most efficient) is shown in
the following code:
_end = '_end_'

# to search if a word is in the trie
def in_trie(trie, word):
    current_dict = trie
    for letter in word:
        if letter in current_dict:
            current_dict = current_dict[letter]
        else:
            return False
    else:
        if _end in current_dict:
            return True
        else:
            return False

#create a trie stored with words
def create_trie(*words):
    root = dict()
    for word in words:
        current_dict = root
        for letter in word:
            current_dict = current_dict.setdefault(letter, {})
        current_dict = current_dict.setdefault(_end, _end)
    return root

def insert_word(trie, word):
    if in_trie(trie, word):
        return
    current_dict = trie
    for letter in word:
        current_dict = current_dict.setdefault(letter, {})
    current_dict = current_dict.setdefault(_end, _end)

def remove_word(trie, word):
    current_dict = trie
    for letter in word:
        current_dict = current_dict.get(letter, None)
        if current_dict is None:
            # the trie doesn't contain this word.
            break
    else:
        del current_dict[_end]

dict = create_trie('foo', 'bar', 'baz', 'barz', 'bar')
print dict
print in_trie(dict, 'bar')
print in_trie(dict, 'bars')
insert_word(dict, 'bars')
print dict
print in_trie(dict, 'bars')
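Building on the same dictionary-of-dictionaries representation, a sketch of prefix matching (not part of the book's code) could look like this:

def words_with_prefix(trie, prefix):
    # walk down to the node for the prefix, then collect every word below it
    current_dict = trie
    for letter in prefix:
        if letter not in current_dict:
            return []
        current_dict = current_dict[letter]
    words = []
    def collect(node, sofar):
        for key, child in node.items():
            if key == _end:
                words.append(sofar)
            else:
                collect(child, sofar + key)
    collect(current_dict, prefix)
    return words

print words_with_prefix(dict, 'ba')   # e.g. ['baz', 'bar', 'barz', 'bars'] (order may vary)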
matplotlib has the support of a very active developer community. John Hunter, the creator and project leader of this package, summed it up as: matplotlib tries to make easy things easy and hard things possible. You can generate very high-quality, publication-ready graphs with very little effort. In this section, we will pick a few interesting examples to illustrate the power of matplotlib.
Word clouds
Word clouds give greater prominence to words that appear more frequently in a given text. They are also called tag clouds or weighted words. You can tweak word clouds with different fonts, layouts, and color schemes. The significance of a word, in terms of its number of occurrences, maps visually to the size of its appearance: the word that appears largest in the visualization is the one that appears most often in the text.
Beyond this obvious mapping to occurrences, word clouds have several useful applications for social media and marketing. Some of the applications are as follows:
Businesses can get to know their customers and how they view their products. Some organizations have used a very creative method of asking their fans or followers to post words about what they think of the brand, and then collecting all these words into a word cloud to better understand the most common impressions of their product brand.
In order to create a word cloud, you can write the Python code or use something
that already exists. Andreas Mueller from the NYU Center for Data Science created
a pretty simple and easy-to-use word cloud in Python. It can be installed with the
instructions given in the next section.
Alternatively, you can obtain the package via wget on Linux or curl on Mac OS with
the following code:
wget https://github.com/amueller/word_cloud/archive/master.zip
unzip master.zip
rm master.zip
cd word_cloud-master
sudo pip install -r requirements.txt
For the Anaconda IDE, you will have to install it using conda with the following three steps:
#step-1 command
conda install wordcloud

# step-2 command
binstar show derickl/wordcloud

Found 1 packages

wordcloud
Summary:
Access:         public
Package Types:  conda
Versions:
   + 1.0

# step-3 command
conda install --channel https://conda.binstar.org/derickl wordcloud
The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    cython-0.22                |           py27_0         2.2 MB
    django-1.8                 |           py27_0         3.2 MB
    pillow-2.8.1               |           py27_1         454 KB
    image-1.3.4                |           py27_0          24 KB
    setuptools-15.1            |           py27_1         435 KB
    wordcloud-1.0              |       np19py27_1          58 KB
    conda-3.11.0               |           py27_0         167 KB
    ------------------------------------------------------------
                                           Total:         6.5 MB

The following NEW packages will be INSTALLED:

    django:     1.8-py27_0
    image:      1.3.4-py27_0
    pillow:     2.8.1-py27_1
    wordcloud:  1.0-np19py27_1

The following packages will be UPDATED:

    cython:     0.21-py27_0  --> 0.22-py27_0
    setuptools: 15.0-py27_0  --> 15.1-py27_1

The following packages will be DOWNGRADED:

    libtiff:    4.0.3-0  --> 4.0.2-1

Proceed ([y]/n)? y
Web feeds
Most news and technology service websites today provide well-grouped and structured RSS or Atom feeds. Although our aim is to restrict the context to technology alone, we can define a handful of feed lists, as shown in the following code. In order to parse these feeds, the parse() method of feedparser comes in handy. Word cloud has its own stopwords list, but in addition to this, we can also use one while collecting the data, as shown here (the stopwords list here is not complete, but you can gather more from any known resource on the Internet):
import feedparser
from os import path
import re

d = path.dirname(__file__)

mystopwords = ['test', 'quot', 'nbsp']

feedlist = ['http://www.techcrunch.com/rssfeeds/',
    'http://www.computerweekly.com/rss',
    'http://feeds.twit.tv/tnt.xml',
    'https://www.apple.com/pr/feeds/pr.rss',
    'https://news.google.com/?output=rss',
    'http://www.forbes.com/technology/feed/',
    'http://rss.nytimes.com/services/xml/rss/nyt/Technology.xml',
    'http://www.nytimes.com/roomfordebate/topics/technology.rss',
    'http://feeds.webservice.techradar.com/us/rss/reviews',
    'http://feeds.webservice.techradar.com/us/rss/news/software',
    'http://feeds.webservice.techradar.com/us/rss',
    'http://www.cnet.com/rss/',
    'http://feeds.feedburner.com/ibm-big-data-hub?format=xml',
    'http://feeds.feedburner.com/ResearchDiscussions-DataScienceCentral?format=xml',
    'http://feeds.feedburner.com/BdnDailyPressReleasesDiscussions-BigDataNews?format=xml']
# the original feed list continues on a page that is not part of this extract
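The code that actually walks these feeds is on pages not included in this extract; a minimal sketch using feedparser.parse() together with the stopword list might look like this (the output filename matches the one used for the word cloud later on):

allwords = []
for url in feedlist:
    feed = feedparser.parse(url)
    for entry in feed.entries:
        # strip HTML tags and split the entry summary into words
        text = re.sub(r'<[^>]+>', ' ', entry.get('summary', ''))
        for word in text.lower().split():
            if word.isalpha() and word not in mystopwords:
                allwords.append(word)

with open(path.join(d, 'wordcloudInput_fromFeeds.txt'), 'w') as f:
    f.write(' '.join(allwords).encode('utf-8'))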
The next example (its introduction falls on pages not included in this extract) streams tweets using tweepy and writes the filtered text to a file:
import tweepy
import json
import sys
import codecs

counter = 0
MAX_TWEETS = 500

#Variables that contain the user credentials to access the Twitter API
access_token = "Access Token"
access_token_secret = "Access Secret"
consumer_key = "Consumer Key"
consumer_secret = "Consumer Secret"

fp = codecs.open("filtered_tweets.txt", "w", "utf-8")

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        global counter
        # the body of this listener (writing status.text to fp and stopping
        # after MAX_TWEETS) continues on a page not included in this extract
Using any bag of words, you can write fewer than 20 lines of Python code to generate a word cloud. A word cloud generates an image, and using matplotlib.pyplot, you can use imshow() to display the word cloud image. The following word cloud code can be used with any input file of words:
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
from os import path
d = path.dirname(__file__)
text = open(path.join(d, 'filtered_tweets.txt')).read()
wordcloud = WordCloud(
font_path='/Users/MacBook/kirthi/RemachineScript.ttf',
stopwords=STOPWORDS,
background_color='#222222',
width=1000,
height=800).generate(text)
# Open a plot of the generated image.
plt.figure(figsize=(13,13))
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
The required font file can be downloaded from any of a number of sites (one specific resource for this font is available at http://www.dafont.com/remachine-script.font). Wherever the font file is located, you will have to set that exact path in font_path. To use the data from feeds instead, only one line changes, as shown in the following code:
text = open(path.join(d, 'wordcloudInput_fromFeeds.txt')).read()
Using the similar idea of extracting text from tweets to create word clouds, you could extract text within the context of mobile phone vendors, with keywords such as iPhone, Samsung Galaxy, Amazon Fire, LG Optimus, Nokia Lumia, and so on, to determine the sentiments of consumers. In this case, you may need an additional set of information, that is, the positive and negative sentiment values associated with words.
There are a few approaches that you can follow for sentiment analysis on tweets in a restricted context. First, a very naïve approach would be to associate weights to words that correspond to a positive sentiment (wp) and a negative sentiment (wn), and use the notation p(+) for the probability of a positive sentiment and p(-) for that of a negative sentiment.
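The weighting formula itself appears as a figure that is not part of this extract; one plausible (hypothetical) way of turning such word weights into a score is sketched here:

# naive sentiment score for a tweet; wp and wn are hypothetical dictionaries
# mapping words to positive and negative weights (not the book's exact formula)
def sentiment_probability(words, wp, wn):
    pos = sum(wp.get(w, 0.0) for w in words)
    neg = sum(wn.get(w, 0.0) for w in words)
    if pos + neg == 0:
        return 0.5                   # no sentiment-bearing words found
    return pos / (pos + neg)         # p(+); p(-) is 1 - p(+)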
The second approach would be to use a natural language processing tool and apply trained classifiers to obtain better results. TextBlob is a text processing package that also provides sentiment analysis (http://textblob.readthedocs.org/en/dev). TextBlob builds a text classification system by creating a training set in the JSON format. Later, using this training data and a Naïve Bayes classifier, it performs the sentiment analysis. We will attempt to use this tool in later chapters to demonstrate our working examples.
Obtaining data
One of the websites from which to obtain data is Yahoo, which provides data via an API. For example, to obtain the stock price (low, high, open, close, and volume) of Amazon, you can use the URL http://chartapi.finance.yahoo.com/instrument/1.0/amzn/chartdata;type=quote;range=3y/csv. Depending on the plotting method you select, some data conversion is required. For instance, the data obtained from this resource includes dates in a raw, undelimited format, as shown in the following output:
uri:/instrument/1.0/amzn/chartdata;type=quote;range=3y/csv
ticker:amzn
Company-Name:Amazon.com, Inc.
Exchange-Name:NMS
unit:DAY
timestamp:
first-trade:19970516
last-trade:20150430
currency:USD
previous_close_price:231.9000
Date:20120501,20150430
labels:20120501,20120702,20121001,20130102,20130401,20130701,20131001,
20140102,20140401,20140701,20141001,20150102,20150401
values:Date,close,high,low,open,volume
close:208.2200,445.1000
high:211.2300,452.6500
low:206.3700,439.0000
open:207.4000,443.8600
volume:984400,23856100
20120501,230.0400,232.9700,228.4000,229.4000,6754900
20120502,230.2500,231.4400,227.4000,227.8200,4593400
20120503,229.4500,232.5300,228.0300,229.7400,4055500
...
...
20150429,429.3700,434.2400,426.0300,426.7500,3613300
20150430,421.7800,431.7500,419.2400,427.1100,3609700
We will discuss three approaches to creating the plots. Each one has its own advantages and limitations.
In the first approach, with the matplotlib.cbook package and the pylab package, you can create a plot with the following lines of code:
from pylab import plotfile, show, gca
import matplotlib.cbook as cbook

fname = cbook.get_sample_data('/Users/MacBook/stocks/amzn.csv',
    asfileobj=False)
plotfile(fname, ('date', 'high', 'low', 'close'), subplots=False)
show()
This will create a plot similar to the one shown in the following screenshot:
There is one additional programming effort required before attempting to plot using this approach: the date values have to be reformatted, so that 20150430, for example, is represented as %d-%b-%Y. With this approach, the plot can also be split into two, one showing the stock price and the other showing the volume, as shown in the following code:
from pylab import plotfile, show, gca
import matplotlib.cbook as cbook

fname = cbook.get_sample_data('/Users/MacBook/stocks/amzn.csv',
    asfileobj=False)

# columns 0, 1, and 5 are date, close, and volume;
# the volume column is drawn as a bar plot
plotfile(fname, (0, 1, 5), plotfuncs={5: 'bar'})
show()
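The date-format conversion mentioned earlier is not shown at this point in the book; a small hypothetical helper that rewrites the raw Yahoo dates (such as 20150430) as %d-%b-%Y before plotting could look like this:

import csv
from datetime import datetime

def reformat_dates(infile, outfile):
    # hypothetical helper: copies the CSV, rewriting the first column
    # from 20150430-style dates to 30-Apr-2015-style dates
    with open(infile) as fin, open(outfile, 'wb') as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        writer.writerow(next(reader))   # copy the header row
        for row in reader:
            row[0] = datetime.strptime(row[0], '%Y%m%d').strftime('%d-%b-%Y')
            writer.writerow(row)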
When you attempt to plot the stock price comparison, it does not make sense to
display the volume information because for each stock ticker, the volumes are
different. Also, it becomes too cluttered to view the stock chart.
matplotlib already has a working example to plot a stock chart, which is elaborate enough and includes the Relative Strength Indicator (RSI) and Moving Average Convergence/Divergence (MACD); it is available at http://matplotlib.org/examples/pylab_examples/finance_work2.html. For details on RSI and MACD, you can find many resources online, but there is one interesting explanation at http://easyforextrading.co/how-to-trade/indicators/.
In an attempt to use the existing code, modify it, and make it work for multiple charts, a function called plotTicker() was created. This helps in plotting each ticker within the same axis, as shown in the following code:
import datetime
import numpy as np
import matplotlib.finance as finance
import matplotlib.dates as mdates
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt

startdate = datetime.date(2014, 4, 12)
today = enddate = datetime.date.today()

plt.rc('axes', grid=True)
plt.rc('grid', color='0.75', linestyle='-', linewidth=0.5)

rect = [0.4, 0.5, 0.8, 0.5]
fig = plt.figure(facecolor='white', figsize=(12, 11))
axescolor = '#f6f6f6'   # the axes background color
ax = fig.add_axes(rect, axisbg=axescolor)
ax.set_ylim(10, 800)

def plotTicker(ticker, startdate, enddate, fillcolor):
    """
    matplotlib.finance has fetch_historical_yahoo(), which fetches
    stock price data; the URL where it gets the data from is
    http://ichart.yahoo.com/table.csv. The data is stored in a numpy
    record array with the following fields:
    date, open, high, low, close, volume, adj_close
    """
    fh = finance.fetch_historical_yahoo(ticker, startdate, enddate)
    r = mlab.csv2rec(fh)
    fh.close()
    r.sort()
    ### plot the relative strength indicator
    ### adjusted close removes the impacts of splits and dividends
    prices = r.adj_close

    ### plot the price and volume data
    ax.plot(r.date, prices, color=fillcolor, lw=2, label=ticker)
    ax.legend(loc='upper right', shadow=True, fancybox=True)

    # set the labels rotation and alignment
    for label in ax.get_xticklabels():
        # To display date labels slanting at 30 degrees
        label.set_rotation(30)
        label.set_horizontalalignment('right')

    ax.fmt_xdata = mdates.DateFormatter('%Y-%m-%d')

#plot the tickers now
plotTicker('BIDU', startdate, enddate, 'red')
plotTicker('GOOG', startdate, enddate, '#1066ee')
plotTicker('AMZN', startdate, enddate, '#506612')
plt.show()
When you use this to compare the stock prices of Baidu, Google, and Amazon, the plot will look similar to the following screenshot:
Use the following code to compare the stock prices of Twitter, Facebook,
and LinkedIn:
plotTicker('TWTR', startdate, enddate, '#c72020')
plotTicker('LNKD', startdate, enddate, '#103474')
plotTicker('FB', startdate, enddate, '#506612')
Now, you can add the volume plot as well. For a single ticker plot with volume, use
the following code:
import datetime
import matplotlib.finance as finance
import matplotlib.dates as mdates
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt

startdate = datetime.date(2013, 3, 1)
today = enddate = datetime.date.today()

rect = [0.1, 0.3, 0.8, 0.4]
fig = plt.figure(facecolor='white', figsize=(10, 9))
ax = fig.add_axes(rect, axisbg='#f6f6f6')
def plotSingleTickerWithVolume(ticker, startdate, enddate):
    global ax

    fh = finance.fetch_historical_yahoo(ticker, startdate, enddate)
    # a numpy record array with fields:
    #   date, open, high, low, close, volume, adj_close
    r = mlab.csv2rec(fh)
    fh.close()
    r.sort()

    plt.rc('axes', grid=True)
    plt.rc('grid', color='0.78', linestyle='-', linewidth=0.5)

    axt = ax.twinx()
    prices = r.adj_close
    fcolor = 'darkgoldenrod'

    ax.plot(r.date, prices, color=r'#1066ee', lw=2, label=ticker)
    ax.fill_between(r.date, prices, 0, prices, facecolor='#BBD7E5')
    ax.set_ylim(0.5*prices.max())
    ax.legend(loc='upper right', shadow=True, fancybox=True)

    volume = (r.close*r.volume)/1e6
    vmax = volume.max()
    # the rest of the volume plotting (drawing the bars on the twin axis)
    # continues on a page that is not included in this extract
With the single ticker plot along with volume and the preceding changes in the
earlier code, the plot will look similar to the following screenshot:
You may also have the option of using the third approach: using the blockspring
package. In order to install blockspring, you have to use the following pip command:
pip install blockspring
Blockspring's approach is to generate the HTML code. It autogenerates data for the
plots in the JavaScript format. When this is integrated with D3.js, it provides a very
nice interactive plot. Amazingly, there are only two lines of code:
import blockspring
import json
print blockspring.runParsed("stock-price-comparison",
{ "tickers": "FB, LNKD, TWTR",
"start_date": "2014-01-01", "end_date": "2015-01-01" }).params
Depending on the operating system, when this code is run, it generates the HTML
code in a default area.
The team value is one significant factor in comparing different teams, but championships also have a value. A simple plot of this data, with years completed along the x axis, the number of championships along the y axis, and the bubble size representing the average number of championships per year, would give us something similar to the following image:
However, unless you can make it interactive by displaying the labels or details,
the preceding plot may not be very useful. The preceding plot is possible with
matplotlib, as shown in the following code:
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(15,10), facecolor='w')

def plotCircle(x, y, radius, color, alphaval):
    circle = plt.Circle((x, y), radius=radius, fc=color,
                        alpha=alphaval)
    fig.gca().add_patch(circle)
    nofcircle = plt.Circle((x, y), radius=radius, ec=color,
                           fill=False)
    fig.gca().add_patch(nofcircle)

x = [55,83,90,13,55,82,96,55,69,19,55,95,62,96,82,30,22,39,
     54,50,69,56,58,55,55,47,55,20,86,78,56]
y = [5,3,4,0,1,0,1,3,5,2,2,0,2,4,6,0,0,1,0,0,0,0,1,1,0,0,3,0,
     0,1,0]
r = [23,17,15,13,13,12,12,11,11,10,10,10,10,10,9,9,9,8,8,8,8,
     8,8,8,7,7,7,7,6,6,6]

for i in range(0, len(x)):
    plotCircle(x[i], y[i], r[i], 'b', 0.1)

plt.axis('scaled')
plt.show()
You can even use this numeric data to convert into a format that JavaScript can
understand (JSON format) so that when integrated with an SVG map, it is possible to
display the valuation on the map, as shown in the following screenshot:
The preceding map with bubbles would be better if there were associated labels
displayed. However, due to the lack of space in certain regions of the map, it would
make much more sense to add an interactive implementation to this map and have
the information displayed via navigation.
You can refer to the original data source at http://tinyurl.com/oyxk72r.
An alternate source is available at http://www.knapdata.com/python/nfl_
franch.html.
There are several other visualization methods you could apply, apart from the plain bubble chart and the bubble chart on maps. One of the visual formats that will look cluttered when displaying the statistics of 32 teams is a pie chart or a bar chart.
Not only does the following pie chart look cluttered, the labels are also hardly readable. The whole point of showing this pie chart is to illustrate that, for this sort of data, one has to seek alternate methods of visualization, as shown in the following image:
If we combine the teams that fall within a certain range of team value into groups, thereby reducing the number of categories, we may be able to show them in a more organized fashion, as shown in the following image:
The preceding image is one alternative way to display the value of teams, by segregating them into groups; for example, $2,300,000,000 is denoted as 2300 million dollars. This way, the data labels are readable.
Summary
During the last several decades, computing has emerged as a very important
part of many fields. In fact, the curriculum of computer science in many schools,
such as Stanford, UC-Berkeley, MIT, Princeton, Harvard, Caltech, and so on, has
been revised to accommodate interdisciplinary courses because of this change. In
most scientific disciplines, computational work is an important complement to
experiments and theory. Moreover, a vast majority of experimental and theoretical
papers involve some numerical calculations, simulations, or computer modeling.
Python has come a long way, and today the community of Python has grown to the
extent that there are sources and tools to help write minimal code to accomplish
almost everything that one may need in computing very efficiently. We could only
pick a few working examples in this chapter, but in the following chapters, we will
take a look at more examples.