Applied Machine Learning for Engineers
FS 2020 - B. Vennemann
Introduction to NumPy
What is NumPy
Python module for scientific computing
Provides an efficiency boost over Python's built-in datatypes
Offers matrix operations and linear algebra
Provides other useful mathematical operations (trigonometric functions, statistical computations, random numbers, ...)
Installation: conda install -c conda-forge numpy
Module import using alias
In [1]:
import numpy as np
Creating NumPy arrays
NumPy arrays belong to the ndarray class
In [2]:
a = np.array([1, 2, 3])
print(a)
print(type(a))
print(a.dtype) # element datatype
print(a.shape) # shape
print(a.ndim) # number of dimensions
print(a.size) # total number of elements
[1 2 3]
<class 'numpy.ndarray'>
int64
(3,)
1
3
Unlike Python lists, all elements in a NumPy array must have the same datatype. The datatype can be set explicitly during array
creation.
In [3]:
b = np.array([1, 2, 3], dtype='int16')
print(b)
print(b.dtype)
[1 2 3]
int16
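A short sketch of what the single-datatype rule means in practice, assuming a list that mixes integers and floats. The variable names are illustrative only.

```python
import numpy as np

# Mixed int/float input: NumPy picks one common datatype for all elements.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# astype returns a converted copy; converting floats to integers
# truncates the fractional part.
as_int = mixed.astype('int32')
print(as_int)        # [1 2 3]
print(as_int.dtype)  # int32
```

The original array keeps its datatype; astype always produces a new array.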
Arrays can also be multidimensional
In [4]:
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b)
print(b.shape)
[[1 2 3]
[4 5 6]]
(2, 3)
Creating placeholder arrays
In [5]:
z = np.zeros((3, 4))
print(z)
print(z.dtype)
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
float64
The datatype can also be explicitly defined
In [6]:
o = np.ones((3, 4), dtype='uint8')
print(o)
print(o.dtype)
[[1 1 1 1]
[1 1 1 1]
[1 1 1 1]]
uint8
Arrays can also be filled with any other number
In [7]:
threes = np.full((4, 2), fill_value=3)
print(threes)
[[3 3]
[3 3]
[3 3]
[3 3]]
Create sequence of numbers using np.arange (similar to Python's range, but returns a NumPy array)
In [8]:
r = np.arange(10, 30, 5) # (start, stop[exclusive], step)
print(r)
print(type(r))
[10 15 20 25]
<class 'numpy.ndarray'>
When a fixed number of elements is required, use np.linspace instead
In [9]:
r = np.linspace(10, 30, 8) # (start, stop[inclusive], Nelements)
print(r)
[10. 12.85714286 15.71428571 18.57142857 21.42857143 24.28571429
27.14285714 30. ]
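Two optional keywords of np.linspace are worth knowing; the following is a minimal sketch with illustrative values, not part of the original notebook.

```python
import numpy as np

# retstep=True additionally returns the spacing between neighbouring samples.
samples, step = np.linspace(0.0, 1.0, 5, retstep=True)
print(samples)  # [0.   0.25 0.5  0.75 1.  ]
print(step)     # 0.25

# endpoint=False excludes the stop value, which mimics the
# half-open interval used by np.arange.
half_open = np.linspace(0.0, 1.0, 4, endpoint=False)
print(half_open)  # [0.   0.25 0.5  0.75]
```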
Basic operations
Basic operations are performed elementwise!
In [10]:
a = np.arange(0, 10)
print(a)
print(a + 2)
print(a - 2)
print(a * 2)
print(a / 2)
print(a ** 2)
[0 1 2 3 4 5 6 7 8 9]
[ 2 3 4 5 6 7 8 9 10 11]
[-2 -1 0 1 2 3 4 5 6 7]
[ 0 2 4 6 8 10 12 14 16 18]
[0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0 1 4 9 16 25 36 49 64 81]
This also applies to operations between multiple arrays
In [11]:
a = np.arange(0, 10)
b = np.arange(10, 20)
print(a)
print(b)
print(a + b)
print(a * b)
[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[10 12 14 16 18 20 22 24 26 28]
[ 0 11 24 39 56 75 96 119 144 171]
The ndarray class provides some handy methods, e.g.
mean
max
min
std
...
By default, these work on the entire array
In [14]:
A = np.array([[1, 5, 7],
[2, 3, 0],
[14, 12, 11]
])
print(A.min())
print(A.max())
print(A.mean())
print(A.std())
0
14
6.111111111111111
4.863570806275398
These operations can also be computed along one axis using the axis keyword
In [15]:
print(np.min(A, axis=0))
print(np.min(A, axis=1))
[1 3 0]
[ 1 0 11]
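The axis keyword names the axis that is collapsed by the reduction. A small sketch using the same matrix A with sum and mean (the choice of these two methods is just for illustration):

```python
import numpy as np

A = np.array([[1, 5, 7],
              [2, 3, 0],
              [14, 12, 11]])

col_sums = A.sum(axis=0)  # collapse the rows: one value per column
row_sums = A.sum(axis=1)  # collapse the columns: one value per row
print(col_sums)  # [17 20 18]
print(row_sums)  # [13  5 37]

# keepdims=True keeps the reduced axis with length 1, which is
# convenient when the result is combined with A again.
row_means = A.mean(axis=1, keepdims=True)
print(row_means.shape)  # (3, 1)
```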
Accessing elements of an ndarray
In [16]:
A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(A)
[[1 2 3 4]
[5 6 7 8]]
Indices start at 0!!
In [17]:
A[0, 1] # [row, column]
Out[17]:
2
Accessing entire rows/columns
In [18]:
print(A[:, 0])
print(A[0, :])
[1 5]
[1 2 3 4]
3D example
In [19]:
B = np.random.uniform(low=0, high=1, size=(3, 5, 2))
B
Out[19]:
array([[[0.08324252, 0.15436984],
[0.64798569, 0.18131407],
[0.6383836 , 0.21887796],
[0.39368218, 0.98799556],
[0.22712382, 0.50594579]],
[[0.31433161, 0.59736486],
[0.91134491, 0.9936921 ],
[0.33073117, 0.00934313],
[0.65878591, 0.28912556],
[0.4903908 , 0.32505673]],
[[0.63503595, 0.24154275],
[0.32045564, 0.21570191],
[0.88073534, 0.8438529 ],
[0.47166065, 0.55325182],
[0.97349501, 0.12223283]]])
In [20]:
B[0,:,:]
Out[20]:
array([[0.08324252, 0.15436984],
[0.64798569, 0.18131407],
[0.6383836 , 0.21887796],
[0.39368218, 0.98799556],
[0.22712382, 0.50594579]])
In [21]:
B[0,...]
Out[21]:
array([[0.08324252, 0.15436984],
[0.64798569, 0.18131407],
[0.6383836 , 0.21887796],
[0.39368218, 0.98799556],
[0.22712382, 0.50594579]])
In [22]:
B[0]
Out[22]:
array([[0.08324252, 0.15436984],
[0.64798569, 0.18131407],
[0.6383836 , 0.21887796],
[0.39368218, 0.98799556],
[0.22712382, 0.50594579]])
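The three indexing forms above select the same data. The sketch below checks this on a deterministic stand-in array (np.arange instead of random numbers, an assumption made only so the result is reproducible) and shows that the ellipsis can also replace leading axes.

```python
import numpy as np

# Deterministic stand-in for the random (3, 5, 2) array.
B = np.arange(30).reshape(3, 5, 2)

print(np.array_equal(B[0, :, :], B[0, ...]))  # True
print(np.array_equal(B[0, ...], B[0]))        # True

# The ellipsis can also stand for the leading axes.
last = B[..., 0]   # same as B[:, :, 0]
print(last.shape)  # (3, 5)
```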
Accessing multiple elements
In [23]:
print(A[1, 1:3]) # stop index is not inclusive!
[6 7]
Slicing with defined step
In [24]:
print(A[1, 0:3:2]) # [start:stop:step]
print(A[1, ::2])
[5 7]
[5 7]
Indices can also be defined relative to the end of the array
In [25]:
print(A[1, -1]) # second row, last column
print(A[0, -2:]) # first row, last two columns
8
[3 4]
It is also possible to provide a list of indices
In [26]:
columns = [0, 2, 3]
print(A[0, columns])
[1 3 4]
Quick exercise
Create the following matrix using NumPy slicing operations
In [27]:
%%latex
\begin{bmatrix}
1 & 1 & 1 & 1 & 1\\
1 & 2 & 2 & 2 & 1\\
1 & 2 & 3 & 2 & 1\\
1 & 2 & 2 & 2 & 1\\
1 & 1 & 1 & 1 & 1\\
\end{bmatrix}
⎡1 1 1 1 1⎤
⎢1 2 2 2 1⎥
⎢1 2 3 2 1⎥
⎢1 2 2 2 1⎥
⎣1 1 1 1 1⎦
In [28]:
M = np.ones((5, 5))
M[1:-1, 1:-1] = np.full((3, 3), fill_value=2)
M[2, 2] = 3
M
Out[28]:
array([[1., 1., 1., 1., 1.],
[1., 2., 2., 2., 1.],
[1., 2., 3., 2., 1.],
[1., 2., 2., 2., 1.],
[1., 1., 1., 1., 1.]])
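One possible variation of this solution: a scalar can be assigned to a slice directly, so the helper array from np.full is not strictly needed. This is a sketch of an alternative, not the reference solution.

```python
import numpy as np

M = np.ones((5, 5))
M[1:-1, 1:-1] = 2  # the scalar is broadcast to the whole inner 3x3 block
M[2, 2] = 3
print(M)
```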
In [29]:
# Boolean indexing / masking
A = np.array([0, 1, 2, 3, 4, 5])
mask = np.array([0, 0, 0, 0, 1, 1], dtype='bool')
mask2 = np.array([False, False, False, False, True, True])
print(A[mask])
print(A[mask2])
print(A[A > 3])
[4 5]
[4 5]
[4 5]
In [30]:
# Boolean masking is very handy for thresholding
A = np.array([-1, 1, 5, 6, 7, -3, 2, -8])
negIndices = A < 0
print(negIndices)
A[negIndices] = 0
print(A)
# or in one single step
A = np.array([-1, 1, 5, 6, 7, -3, 2, -8])
A[A < 0] = 0
print(A)
[ True False False False False True False True]
[0 1 5 6 7 0 2 0]
[0 1 5 6 7 0 2 0]
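Masked assignment modifies the array in place. When the original values should be preserved, np.where or np.clip can produce a thresholded copy instead; a minimal sketch:

```python
import numpy as np

A = np.array([-1, 1, 5, 6, 7, -3, 2, -8])

# np.where(condition, x, y) takes x where the condition holds, otherwise y,
# and returns a new array.
thresholded = np.where(A < 0, 0, A)
print(thresholded)  # [0 1 5 6 7 0 2 0]

# np.clip limits the values to a range; None means "no upper bound" here.
clipped = np.clip(A, 0, None)
print(clipped)      # [0 1 5 6 7 0 2 0]

print(A)            # the original array is unchanged
```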
Mutating array elements
In [31]:
arr = np.array([[1, 4, 6], [2, 5, 7]])
In [32]:
arr[0,0] = 20
arr
Out[32]:
array([[20, 4, 6],
[ 2, 5, 7]])
Matrix operations
In [33]:
print(arr * 3) # Basic operations are performed elementwise
[[60 12 18]
[ 6 15 21]]
In [34]:
print(arr + 3)
[[23 7 9]
[ 5 8 10]]
In [35]:
print(arr / 2)
[[10. 2. 3. ]
[ 1. 2.5 3.5]]
In [36]:
A = np.array([[1, 2], [1, 2]])
B = np.array([[3, 4], [3, 4]])
print(A)
print(B)
print(A + B)
print(A * B) # Also operations between matrices are performed elementwise
[[1 2]
[1 2]]
[[3 4]
[3 4]]
[[4 6]
[4 6]]
[[3 8]
[3 8]]
In [37]:
# Matrix multiplication is achieved using np.dot(a, b)
print(A)
print(B)
print(np.dot(A, B))
[[1 2]
[1 2]]
[[3 4]
[3 4]]
[[ 9 12]
[ 9 12]]
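For 2D arrays, the @ operator computes the same matrix product as np.dot. The sketch below reuses the matrices from above and also shows that the order of the operands matters.

```python
import numpy as np

A = np.array([[1, 2], [1, 2]])
B = np.array([[3, 4], [3, 4]])

print(A @ B)
print(np.array_equal(A @ B, np.dot(A, B)))  # True

# Matrix multiplication is not commutative in general.
print(np.array_equal(A @ B, B @ A))         # False
```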
In [38]:
A = np.array([[1, 2, 3]])
B = np.array([[4], [5], [6]])
print(A.shape)
print(B.shape)
print(np.dot(A, B)) # Shapes must be consistent
(1, 3)
(3, 1)
[[32]]
In [39]:
# When arrays of different datatypes are combined, the result takes the more general datatype
A = np.array([1, 2, 3], dtype='uint8')
B = np.array([[4], [5], [6]], dtype='int64')
C = A * B
print(C)
print(C.dtype)
[[ 4 8 12]
[ 5 10 15]
[ 6 12 18]]
int64
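Two things happen in the cell above: the shapes (3,) and (3, 1) are broadcast to (3, 3), and the datatypes are promoted. Both can be inspected without performing the multiplication; the following sketch uses np.broadcast and np.result_type, with the names row and col chosen only for illustration.

```python
import numpy as np

row = np.array([1, 2, 3], dtype='uint8')        # shape (3,)
col = np.array([[4], [5], [6]], dtype='int64')  # shape (3, 1)

# Shape that results from combining the two arrays elementwise.
print(np.broadcast(row, col).shape)  # (3, 3)

# Datatype that results from combining the two arrays.
print(np.result_type(row, col))      # int64

# The promoted type can be larger than both inputs: no 8-bit integer type
# holds every uint8 and every int8 value, so NumPy moves up to int16.
print(np.result_type(np.uint8, np.int8))  # int16
```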
Reshaping arrays
In [40]:
A = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(np.reshape(A, (8, 1)))
print(np.reshape(A, (2, 4)))
[[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]]
[[1 2 3 4]
[5 6 7 8]]
In [41]:
B = A.reshape((2, 4))
print(B) # also works
[[1 2 3 4]
[5 6 7 8]]
In [42]:
print(B.ravel()) # Flattens the array
[1 2 3 4 5 6 7 8]
In [43]:
print(B)
print(B.T) # Transpose the array
[[1 2 3 4]
[5 6 7 8]]
[[1 5]
[2 6]
[3 7]
[4 8]]
In [44]:
# When dimension is given as '-1', it is automatically inferred
print(A)
print(A.reshape((2, -1)))
[1 2 3 4 5 6 7 8]
[[1 2 3 4]
[5 6 7 8]]
Stacking arrays
Stacking along first axis using np.vstack
In [45]:
A = np.random.uniform(low=0, high=10, size=(2, 2))
B = np.random.uniform(low=0, high=1., size=(2, 2))
print(A)
print(' ')
print(B)
print(' ')
print(np.vstack((A, B)))
[[3.16485187 5.50691504]
[8.19219751 2.46261689]]
[[0.57457897 0.8490203 ]
[0.70893401 0.2810346 ]]
[[3.16485187 5.50691504]
[8.19219751 2.46261689]
[0.57457897 0.8490203 ]
[0.70893401 0.2810346 ]]
Stacking along second axis using np.hstack
In [46]:
print(np.hstack((A, B)))
[[3.16485187 5.50691504 0.57457897 0.8490203 ]
[8.19219751 2.46261689 0.70893401 0.2810346 ]]
np.vstack stacks along the first axis, np.hstack stacks along the second axis.
np.concatenate lets you specify the axis explicitly: np.concatenate((a1, a2, ...), axis=0)
In [47]:
print(np.concatenate((A, B), axis=0))
[[3.16485187 5.50691504]
[8.19219751 2.46261689]
[0.57457897 0.8490203 ]
[0.70893401 0.2810346 ]]
In [48]:
print(np.concatenate((A, B), axis=1))
[[3.16485187 5.50691504 0.57457897 0.8490203 ]
[8.19219751 2.46261689 0.70893401 0.2810346 ]]
Also see np.r_ and np.c_ (concatenation helpers similar to the stacking functions, which additionally accept slice notation such as 4:7)
In [49]:
a = np.r_[1, 2, 3, 4:7, 9]
a
Out[49]:
array([1, 2, 3, 4, 5, 6, 9])
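A brief sketch of np.c_ next to np.r_, assuming two 1D arrays of equal length (x and y are illustrative names): np.c_ places them side by side as columns, while np.r_ joins them end to end.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

columns = np.c_[x, y]   # 1D inputs become the columns of a 2D array
print(columns)
print(columns.shape)    # (3, 2)

joined = np.r_[x, y]    # 1D inputs are joined end to end
print(joined)           # [1 2 3 4 5 6]
```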
Splitting arrays
Horizontal splitting using np.hsplit by specifying either the number of equally-shaped arrays, or the index where to
split
In [50]:
C = np.random.random(size=(6, 6))
C
Out[50]:
array([[1.61868575e-01, 4.13366875e-01, 8.73954107e-01, 7.30927857e-01,
4.99028994e-01, 8.62912958e-01],
[3.72698791e-02, 4.71009944e-01, 3.72423965e-01, 3.87900439e-01,
7.35999477e-01, 2.59235618e-01],
[4.56622699e-01, 8.28516837e-01, 1.68090024e-01, 1.50134252e-01,
9.11764736e-01, 4.13908301e-01],
[9.63742500e-02, 9.32547885e-01, 2.25458400e-01, 9.27527998e-01,
3.70444468e-01, 7.99349586e-01],
[4.05271519e-01, 6.24753424e-02, 6.29190614e-01, 6.46237548e-01,
5.82782336e-01, 5.79910001e-01],
[4.99657300e-04, 7.05094876e-01, 8.96608319e-01, 1.15221035e-01,
8.87348878e-01, 8.20046926e-01]])
In [51]:
D, E, F = np.hsplit(C, 3)
print(D)
print('')
print(E)
print('')
print(F)
[[1.61868575e-01 4.13366875e-01]
[3.72698791e-02 4.71009944e-01]
[4.56622699e-01 8.28516837e-01]
[9.63742500e-02 9.32547885e-01]
[4.05271519e-01 6.24753424e-02]
[4.99657300e-04 7.05094876e-01]]
[[0.87395411 0.73092786]
[0.37242396 0.38790044]
[0.16809002 0.15013425]
[0.2254584 0.927528 ]
[0.62919061 0.64623755]
[0.89660832 0.11522103]]
[[0.49902899 0.86291296]
[0.73599948 0.25923562]
[0.91176474 0.4139083 ]
[0.37044447 0.79934959]
[0.58278234 0.57991 ]
[0.88734888 0.82004693]]
In [52]:
H, I, J = np.hsplit(C, (2, 5)) # split after second and fifth column
print(H)
print('')
print(I)
print('')
print(J)
[[1.61868575e-01 4.13366875e-01]
[3.72698791e-02 4.71009944e-01]
[4.56622699e-01 8.28516837e-01]
[9.63742500e-02 9.32547885e-01]
[4.05271519e-01 6.24753424e-02]
[4.99657300e-04 7.05094876e-01]]
[[0.87395411 0.73092786 0.49902899]
[0.37242396 0.38790044 0.73599948]
[0.16809002 0.15013425 0.91176474]
[0.2254584 0.927528 0.37044447]
[0.62919061 0.64623755 0.58278234]
[0.89660832 0.11522103 0.88734888]]
[[0.86291296]
[0.25923562]
[0.4139083 ]
[0.79934959]
[0.57991 ]
[0.82004693]]
np.vsplit works in the same way along the vertical axis
np.array_split lets you specify the axis along which to split explicitly (cf. np.concatenate)
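A minimal sketch of both functions on a deterministic 6x6 array (np.arange is used here instead of random numbers, purely for reproducibility). Unlike np.hsplit and np.vsplit with an integer argument, np.array_split also accepts a number of sections that does not divide the axis evenly.

```python
import numpy as np

C = np.arange(36).reshape(6, 6)

# Split the rows into two equally sized blocks.
top, bottom = np.vsplit(C, 2)
print(top.shape, bottom.shape)  # (3, 6) (3, 6)

# Split the 6 columns into 4 parts; the first parts get the extra columns.
parts = np.array_split(C, 4, axis=1)
print([p.shape for p in parts])  # [(6, 2), (6, 2), (6, 1), (6, 1)]
```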
Common pitfalls:
Be careful when reassigning an array to a different variable!
NumPy differentiates between copies and views!
In [53]:
a = np.arange(0, 10)
a
Out[53]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [54]:
b = a # no data is copied! b is just a second name for the same array object
b
Out[54]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [55]:
b[0] = 10
b
Out[55]:
array([10, 1, 2, 3, 4, 5, 6, 7, 8, 9])
So far, so good. But we also changed the original array a in the process, because b points to the same object in
memory.
In [56]:
a
Out[56]:
array([10, 1, 2, 3, 4, 5, 6, 7, 8, 9])
We need to make a deep copy using .copy() to avoid this
In [57]:
a = np.arange(0, 10)
b = a.copy() # this creates a separate object in memory
b[0] = 10
print(b)
print(a)
[10 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
Views are very memory-efficient, but need to be handled with care.
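The same care is needed with slicing. The sketch below, with illustrative variable names, contrasts plain assignment, a basic slice (which returns a view on the same data) and indexing with a list (which returns an independent copy), and uses np.shares_memory to check which results share data with the original array.

```python
import numpy as np

a = np.arange(0, 10)

alias = a             # the same object under a second name
view = a[2:5]         # basic slicing: new array object sharing a's data
fancy = a[[2, 3, 4]]  # indexing with a list: independent copy

print(alias is a)                  # True
print(view is a)                   # False
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, fancy))  # False

view[0] = 99     # writes through to a
print(a[2])      # 99
print(fancy[0])  # 2, the copy is unaffected
```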
Further reading
More info at https://docs.scipy.org/doc/