Lab description file (4)
• NumPy
• Matplotlib
• SciPy
1 Introduction
2 NumPy
NumPy is the main library for creating and manipulating matrices in Python, including linear algebra and statistics capabilities. It is therefore a fundamental library for scientific computing and data science. NumPy matrices are similar to data structures used in other scientific computing frameworks, such as Matlab. In fact, if you are a Matlab user, you can see the main similarities and differences at https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html.
As mentioned previously, we first need to import the library. We do that
by typing:
import numpy as np
Now we can use np to refer to functions implemented in the NumPy library.
The main object in NumPy is the array, which can be seen as a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. Note that, as in the C language, indexing starts from 0. In NumPy, dimensions are called axes. For example, the array [3, 4, 2] holding the coordinates of a point in 3D space has one axis. That axis has 3 elements in it, so we say it has a length of 3. Arrays can also have more axes; for example, the following array has 2 axes:
[[1., 0., 0.],
 [0., 1., 2.]]
The first axis has a length of 2, the second axis has a length of 3.
NumPy’s array class is called ndarray.
You can check the shape of an ndarray, i.e. the length along each axis, with its attribute ndarray.shape. For a matrix with n rows and m columns, the shape is (n, m).
Arithmetic operators apply to arrays element-wise.
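The short sketch below illustrates these ideas with the two-axis array shown earlier. The variable names m, x and y are only illustrative.

```python
import numpy as np

# The two-axis array from the text: 2 rows (first axis), 3 columns (second axis)
m = np.array([[1., 0., 0.],
              [0., 1., 2.]])
print(m.shape)  # (2, 3)
print(m.ndim)   # 2, the number of axes

# Arithmetic operators act element by element on arrays of the same shape
x = np.array([1, 2, 3])
y = np.array([10, 20, 30])
print(x + y)    # [11 22 33]
print(x * y)    # [10 40 90]
print(x ** 2)   # [1 4 9]
```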
Arrays can be initialised with zeros, or with sequences of numbers, as follows:
>>> np.zeros((2, 1))
will produce
array([[0.],
       [0.]])
To create sequences of numbers, NumPy provides the arange function
which is analogous to the Python built-in range, but returns an array.
c = np.arange(3, 21, 2)
will produce
array([ 3,  5,  7,  9, 11, 13, 15, 17, 19])
We defined a sequence that starts at 3, increases in steps of 2 and stops before 21. Note that the stop value (21) is not included.
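Because arange mirrors range, you can compare the two directly. This is a minimal check that uses the same arguments as above:

```python
import numpy as np

c = np.arange(3, 21, 2)      # start=3, stop=21 (excluded), step=2
r = list(range(3, 21, 2))    # the built-in equivalent, as a list

print(c)                     # [ 3  5  7  9 11 13 15 17 19]
print(c.tolist() == r)       # True
print(21 in c)               # False: the stop value is excluded
print(len(c))                # 9
```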
Exercise 1:
For the scalar λ = 0.1 and the two vectors a = [3, 45, 7, 2] and b = [2, 54, 13, 15], define the vectors as ndarrays and then calculate:
• the sum of a and b;
• the product of λ and a;
• the element-wise product of a and b.
One-dimensional arrays can be indexed, sliced and iterated over, much like
lists and other Python sequences.
Let us take the sequence c we defined earlier and index some of its elements (remember that, as in C, indexing starts at 0). We get:
c[0]   # 3
c[2]   # 7
c[8]   # 19
Some examples of slicing of the same array follow:
c[1:4]
array([5, 7, 9])
c[7:3:-1]
array([17, 15, 13, 11])
c[1:7:2] = 100   # assigns 100 to c[1], c[3] and c[5]; c is modified in place
c
array([  3, 100,   7, 100,  11, 100,  15,  17,  19])
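The script below reproduces the indexing and slicing examples above. Assigning to a slice changes the array itself, so the sketch works on a copy (d, an illustrative name) to keep c intact for comparison. The loop at the end shows iteration over a one-dimensional array, which the text mentions but does not illustrate.

```python
import numpy as np

c = np.arange(3, 21, 2)

print(c[0], c[2], c[8])   # 3 7 19
print(c[1:4])             # [5 7 9]        start 1, stop 4 (excluded)
print(c[7:3:-1])          # [17 15 13 11]  walking backwards with step -1

d = c.copy()              # work on a copy so that c stays unchanged
d[1:7:2] = 100            # every second element from index 1 up to index 5
print(d)                  # [  3 100   7 100  11 100  15  17  19]

# One-dimensional arrays can be iterated over like lists
for value in c[:3]:
    print(value)          # 3, then 5, then 7
```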
3 Matplotlib
import matplotlib.pyplot as plt

plt.plot([1, 4, 3, 2])
plt.ylabel('some numbers')
plt.show()
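When plot receives a single list, as above, Matplotlib treats it as the y values and uses 0, 1, 2, ... as the x values. The sketch below passes the x values explicitly and writes the figure to a file. Two details are assumptions: the non-interactive Agg backend and the file name line_plot.png. They let the script run on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")            # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]                 # the x values that plt.plot([1, 4, 3, 2]) uses implicitly
y = [1, 4, 3, 2]

plt.plot(x, y, marker="o")       # same line as above, with a marker at each point
plt.xlabel("index")
plt.ylabel("some numbers")
plt.savefig("line_plot.png")     # hypothetical output file name
plt.close()
```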
You can create multiple figures by using multiple figure calls with an
increasing figure number. Each figure can contain as many axes and subplots
as necessary:
import matplotlib.pyplot as plt
plt.figure(1)          # the first figure
plt.subplot(211)       # the first subplot in the first figure
plt.plot([1, 2, 3])
plt.subplot(212)       # the second subplot in the first figure
plt.plot([6, 5, 4])
plt.show()             # display the figure
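You can also build the same two-panel figure with plt.subplots. It returns the figure together with an array of Axes objects, and you then draw on each Axes directly. As in the previous sketch, the Agg backend and the output file name are assumptions that let the script run without a display.

```python
import matplotlib
matplotlib.use("Agg")                  # assumption: no display needed
import matplotlib.pyplot as plt

# 2 rows, 1 column: the same layout as subplot(211) and subplot(212)
fig, axes = plt.subplots(2, 1)
axes[0].plot([1, 2, 3])                # top panel
axes[1].plot([6, 5, 4])                # bottom panel
axes[0].set_title("first subplot")
axes[1].set_title("second subplot")
fig.tight_layout()                     # avoid overlapping titles and labels
fig.savefig("two_subplots.png")        # hypothetical output file name
plt.close(fig)
```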
For more details, we refer you to the official PyPlot tutorial.
Exercise 2:
Please follow the tutorial at https://matplotlib.org/tutorials/introductory/pyplot.html and try the examples on your own computer.
4 SciPy
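SciPy is a library of scientific computing routines built on NumPy. It is organised into subpackages, which include: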
• linalg: Linear algebra
• ndimage: N-dimensional image processing
• odr: Orthogonal distance regression
• optimize: Optimization and root-finding routines
• signal: Signal processing
• sparse: Sparse matrices and associated routines
• spatial: Spatial data structures and algorithms
• special: Special functions
• stats: Statistical distributions and functions
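Each subpackage is imported separately from scipy. The sketch below illustrates two entries from the list above. It uses optimize.brentq to find the root of x² − 2 on the interval [0, 2], and evaluates the gamma function from special.

```python
import math
from scipy import optimize, special

# optimize: root-finding. brentq needs an interval on which f changes sign.
root = optimize.brentq(lambda x: x**2 - 2, 0, 2)
print(root)                  # approximately 1.41421356, the square root of 2
print(math.sqrt(2))          # for comparison

# special: special functions, e.g. the gamma function, where Gamma(5) = 4! = 24
print(special.gamma(5))
```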
4.1 Statistics
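Here is a minimal sketch of the kind of comparison this section describes. It assumes that rvs1 and rvs2 each contain 500 values drawn from a normal distribution with mean 5 and standard deviation 10. These parameters are an assumption made for illustration, since the text only gives those of rvs3.

```python
from scipy import stats

# Two samples from the same normal distribution
# (loc = mean, scale = standard deviation; values assumed for illustration)
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)

# Independent two-sample t-test; the null hypothesis is that both means are equal
result = stats.ttest_ind(rvs1, rvs2)
print(result)
```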
Here we generate two samples from the same distribution (rvs1 and rvs2) and then perform an independent two-sample t-test on them. Because rvs1 and rvs2 are regenerated randomly on every run, the output of the t-test will vary each time, but you will see something like:
Ttest_indResult(statistic=-0.5489036175088705, pvalue=0.5831943748663959)
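As you can see, the p-value was around 0.58. Since 0.58 is well above a significance level such as 0.01, we cannot reject the null hypothesis that the two samples have equal means. This is consistent with the two samples coming from the same distribution.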
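To see what ttest_ind computes, you can also evaluate the statistic by hand. By default (equal_var=True) it runs Student's t-test with a pooled variance:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}, \qquad s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2},$$
with n1 + n2 − 2 degrees of freedom. The sketch below fixes the random seeds (arbitrary values) so that the samples, and therefore the result, are reproducible. The means and standard deviations are assumptions, as in the previous sketch.

```python
import numpy as np
from scipy import stats

# Fixed seeds make the samples reproducible (seed values are arbitrary)
x1 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=0)
x2 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=1)

n1, n2 = len(x1), len(x2)
s1_sq = np.var(x1, ddof=1)            # unbiased sample variances
s2_sq = np.var(x2, ddof=1)
sp = np.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))  # pooled std

t = (x1.mean() - x2.mean()) / (sp * np.sqrt(1 / n1 + 1 / n2))
dof = n1 + n2 - 2
p = 2 * stats.t.sf(abs(t), dof)       # two-sided p-value from the t distribution

print(t, p)
print(stats.ttest_ind(x1, x2))        # should report the same statistic and p-value
```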
Now if you try:
rvs3 = stats.norm.rvs(loc=8, scale=10, size=500)
Result_1_3 = stats.ttest_ind(rvs1, rvs3)
print(Result_1_3)
The output will be similar to:
Ttest_indResult(statistic=-4.533414290175026, pvalue=6.507128186389019e-06)
This time we compared rvs1 against a sample from a different distribution, since rvs3 has a different mean. The t-test returned a very low p-value (about $6.5 \times 10^{-6}$), so we reject the null hypothesis of equal means. The test therefore correctly indicates that the samples were generated by different distributions.
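The decision rule used in the two examples above compares the p-value against a significance level, and you can write it out explicitly. This minimal sketch uses the level 0.01 from the text and fixed random seeds (arbitrary values) so that the run is reproducible. The mean 5 for rvs1 and rvs2 is an assumption, and compare is a hypothetical helper, not a SciPy function.

```python
from scipy import stats

ALPHA = 0.01  # significance level used in the text

def compare(sample_a, sample_b, alpha=ALPHA):
    """Hypothetical helper: run a two-sample t-test and report whether equal means are rejected."""
    result = stats.ttest_ind(sample_a, sample_b)
    return result, bool(result.pvalue < alpha)

# Fixed seeds (arbitrary values) make the run reproducible;
# the mean 5 for rvs1 and rvs2 is an assumption, rvs3 matches the text
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=10)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=11)
rvs3 = stats.norm.rvs(loc=8, scale=10, size=500, random_state=12)

for name, other in [("rvs2", rvs2), ("rvs3", rvs3)]:
    result, rejected = compare(rvs1, other)
    verdict = "reject equal means" if rejected else "cannot reject equal means"
    print(f"rvs1 vs {name}: p = {result.pvalue:.3g} -> {verdict}")
```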
SciPy provides several functions for linear algebra in the linalg package. It contains the linear algebra functions in NumPy plus some more advanced ones. Because SciPy is always built with BLAS/LAPACK support, its routines may also run faster. We will therefore show some examples of linear algebra using SciPy here.
For instance, let's see how to find the inverse of a matrix. Recall that the inverse of a square matrix A is a matrix B such that AB = BA = I, where I is the identity matrix. In the 3 × 3 case,
$$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
In SciPy, we can obtain the inverse of a matrix A by using linalg.inv(A). We can also multiply a matrix A by a matrix B using A.dot(B) (or, equivalently, A @ B). The following example calculates the inverse and checks the result:
import numpy as np
from scipy import linalg

A = np.array([[1, 3, 5], [2, 5, 1], [2, 3, 8]])
B = linalg.inv(A)    # the inverse of A
print(B)
print(A.dot(B))      # should be (close to) the identity matrix
You will find the matrix
$$B = \begin{pmatrix} -1.48 & 0.36 & 0.88 \\ 0.56 & 0.08 & -0.36 \\ 0.16 & -0.12 & 0.04 \end{pmatrix},$$
and the product AB is:
$$AB = \begin{pmatrix} 1.00000000 & -1.11022302\times10^{-16} & -5.55111512\times10^{-17} \\ 3.05311332\times10^{-16} & 1.00000000 & 1.87350135\times10^{-16} \\ 2.22044605\times10^{-16} & -1.11022302\times10^{-16} & 1.00000000 \end{pmatrix}.$$
As we can see, this is very close to the “ideal” identity matrix: the off-diagonal entries are of the order of $10^{-16}$, which is floating-point round-off error.
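Instead of inspecting the entries by eye, you can automate the check with np.allclose, which compares two arrays up to a small tolerance. Here is a minimal sketch with the same matrix A:

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 3, 5], [2, 5, 1], [2, 3, 8]])
B = linalg.inv(A)

I = np.eye(3)                       # the 3 x 3 identity matrix
print(np.allclose(A @ B, I))        # True: AB equals I up to round-off
print(np.allclose(B @ A, I))        # True: BA equals I as well
print(np.round(B, 2))               # the inverse, rounded to 2 decimals
```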
Another common linear algebra operation is finding the eigenvalues and eigenvectors of a matrix A. For instance, we will need these operations when implementing PCA in the next lab session. The eigenvalues and eigenvectors of A are the scalars λ and corresponding non-zero vectors v such that Av = λv. You can find them in SciPy with the function linalg.eig(A). It returns an array of eigenvalues followed by a matrix whose columns are the corresponding eigenvectors. For instance, consider the following example:
import numpy as np
from scipy import linalg

A = np.array([[1, 2], [3, 4]])
la, v = linalg.eig(A)
l1, l2 = la

print(l1, l2)                           # eigenvalues
print(v[:, 0])                          # first eigenvector
print(v[:, 1])                          # second eigenvector
print(A.dot(v[:, 0]) - l1 * v[:, 0])    # check the computation
print(A.dot(v[:, 1]) - l2 * v[:, 1])
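By running this example, you will find that the matrix $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ has the eigenvalues −0.3723 and 5.3723 (rounded to four decimals). linalg.eig returns them as complex numbers with zero imaginary part. The respective eigenvectors are [−0.8246, 0.5658] and [−0.4160, −0.9094]. Eigenvectors are only defined up to a scalar factor, so their signs may differ in other implementations.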
We check the computation by evaluating Av − λv for each pair, which should return a vector of zeros. Indeed, our code outputs:
[ 0.00000000e+00+0.j  5.55111512e-17+0.j]
[-4.4408921e-16+0.j  0.0000000e+00+0.j]
As we can see, every entry is either zero or within floating-point round-off of zero.
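Because linalg.eig returns the eigenvectors as the columns of v, you can check all the pairs at once. A @ v multiplies A by every column, and v * la scales column i by la[i] through broadcasting. Here is a minimal sketch:

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 2], [3, 4]])
la, v = linalg.eig(A)

# Column i of A @ v is A v_i; column i of v * la is la[i] * v_i (broadcasting)
print(np.allclose(A @ v, v * la))   # True

# For this real matrix the imaginary parts are zero, so keep the real parts
print(la.real)                      # approximately [-0.3723  5.3723]
print(np.allclose(la.imag, 0))      # True
```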
Exercise 3:
Try additional examples in the SciPy tutorials (https://docs.scipy.org/doc/scipy/reference/). While doing so, plot graphs using the PyPlot library.