
Applied Machine Learning For Engineers: Introduction To Numpy

NumPy is a Python library used for working with multi-dimensional arrays and matrices. It provides high-performance operations for numerical computing. NumPy arrays have a fixed datatype and can be created, accessed, and operated on efficiently. Basic operations like addition, subtraction, and multiplication are performed elementwise on arrays. NumPy also provides linear algebra operations like matrix multiplication using dot products. Arrays can be reshaped, stacked, and sliced for both accessing and modifying elements.


Applied Machine Learning for Engineers

FS 2020 - B. Vennemann

Introduction to NumPy

What is NumPy
Python module for scientific computing
Provides an efficiency boost over Python's built-in datatypes
Offers matrix operations and linear algebra
Provides other useful mathematical operations (trigonometric functions, statistical computations, random numbers, ...)

Installation: conda install -c conda-forge numpy

Module import using alias

In [1]:

import numpy as np

Creating NumPy arrays

NumPy arrays belong to the ndarray class

In [2]:

a = np.array([1, 2, 3])
print(a)
print(type(a))
print(a.dtype) # element datatype
print(a.shape) # shape
print(a.ndim) # number of dimensions
print(a.size) # total number of elements

[1 2 3]
<class 'numpy.ndarray'>
int64
(3,)
1
3

Unlike Python lists, all elements in NumPy arrays must have the same datatype. It can be explicitly defined during array
creation.
In [3]:

b = np.array([1, 2, 3], dtype='int16')
print(b)
print(b.dtype)

[1 2 3]
int16
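
When no dtype is given, NumPy infers a single common datatype from the elements. A minimal sketch of this, and of converting an existing array with astype (the variable names mixed and as_int are chosen here for illustration only):

```python
import numpy as np

# Mixing ints and floats: NumPy picks one common dtype for all elements
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)               # float64

# astype returns a new array with the requested dtype
as_int = mixed.astype('int32')   # the fractional part is truncated
print(as_int)                    # [1 2 3]
print(as_int.dtype)              # int32
```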

Arrays can also be multidimensional

In [4]:

b = np.array([[1, 2, 3], [4, 5, 6]])
print(b)
print(b.shape)

[[1 2 3]
[4 5 6]]
(2, 3)

Creating placeholder arrays

In [5]:

z = np.zeros((3, 4))
print(z)
print(z.dtype)

[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
float64

The datatype can also be explicitly defined

In [6]:

o = np.ones((3, 4), dtype='uint8')
print(o)
print(o.dtype)

[[1 1 1 1]
[1 1 1 1]
[1 1 1 1]]
uint8

Arrays can also be filled with any other number

In [7]:

threes = np.full((4, 2), fill_value=3)
print(threes)

[[3 3]
[3 3]
[3 3]
[3 3]]

Create sequence of numbers using np.arange (similar to Python's range, but returns a NumPy array)
In [8]:

r = np.arange(10, 30, 5) # (start, stop[exclusive], step)
print(r)
print(type(r))

[10 15 20 25]
<class 'numpy.ndarray'>

When a fixed number of elements is required, use np.linspace instead

In [9]:

r = np.linspace(10, 30, 8) # (start, stop[inclusive], Nelements)
print(r)

[10. 12.85714286 15.71428571 18.57142857 21.42857143 24.28571429
27.14285714 30. ]
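
The difference between the two functions can be sketched side by side: np.arange fixes the step, np.linspace fixes the number of elements. The retstep and endpoint keywords are part of np.linspace; the variable names are illustrative.

```python
import numpy as np

# np.arange: the step is fixed, the number of elements follows from it
by_step = np.arange(10, 30, 5)
print(by_step.size)       # 4 (the stop value 30 is excluded)

# np.linspace: the number of elements is fixed, the step follows from it
by_count, step = np.linspace(10, 30, 5, retstep=True)
print(by_count)           # [10. 15. 20. 25. 30.]
print(step)               # 5.0

# endpoint=False excludes the stop value, as np.arange does
half_open = np.linspace(10, 30, 4, endpoint=False)
print(half_open)          # [10. 15. 20. 25.]
```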

Basic operations

Basic operations are performed elementwise!

In [10]:

a = np.arange(0, 10)
print(a)
print(a + 2)
print(a - 2)
print(a * 2)
print(a / 2)
print(a ** 2)

[0 1 2 3 4 5 6 7 8 9]
[ 2 3 4 5 6 7 8 9 10 11]
[-2 -1 0 1 2 3 4 5 6 7]
[ 0 2 4 6 8 10 12 14 16 18]
[0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]
[ 0 1 4 9 16 25 36 49 64 81]

This also applies to operations between multiple arrays

In [11]:

a = np.arange(0, 10)
b = np.arange(10, 20)
print(a)
print(b)
print(a + b)
print(a * b)

[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[10 12 14 16 18 20 22 24 26 28]
[ 0 11 24 39 56 75 96 119 144 171]
The ndarray class provides some handy methods, e.g.

mean
max
min
std
...

By default, these work on the entire array

In [14]:

A = np.array([[1, 5, 7],
              [2, 3, 0],
              [14, 12, 11]])
print(A.min())
print(A.max())
print(A.mean())
print(A.std())

0
14
6.111111111111111
4.863570806275398

These operations can also be computed along one axis using the axis keyword

In [15]:

print(np.min(A, axis=0))
print(np.min(A, axis=1))

[1 3 0]
[ 1 0 11]
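
A short sketch of how the axis keyword behaves, using the same matrix: the given axis is the one that gets collapsed. The keepdims keyword is a standard option of these reductions; the variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 5, 7],
              [2, 3, 0],
              [14, 12, 11]])

# axis=0 collapses the rows: one result per column
col_means = A.mean(axis=0)
print(col_means.shape)    # (3,)

# axis=1 collapses the columns: one result per row
row_max = A.max(axis=1)
print(row_max)            # [ 7  3 14]

# keepdims=True keeps the reduced axis with length 1
row_max_2d = A.max(axis=1, keepdims=True)
print(row_max_2d.shape)   # (3, 1)
```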

Accessing elements of an ndarray

In [16]:

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(A)

[[1 2 3 4]
[5 6 7 8]]

Indices start at 0!!

In [17]:

A[0, 1] # [row, column]

Out[17]:

2

Accessing entire rows/columns

In [18]:

print(A[:, 0])
print(A[0, :])

[1 5]
[1 2 3 4]
3D example

In [19]:

B = np.random.uniform(low=0, high=1, size=(3, 5, 2))
B

Out[19]:

array([[[0.08324252, 0.15436984],
[0.64798569, 0.18131407],
[0.6383836 , 0.21887796],
[0.39368218, 0.98799556],
[0.22712382, 0.50594579]],

[[0.31433161, 0.59736486],
[0.91134491, 0.9936921 ],
[0.33073117, 0.00934313],
[0.65878591, 0.28912556],
[0.4903908 , 0.32505673]],

[[0.63503595, 0.24154275],
[0.32045564, 0.21570191],
[0.88073534, 0.8438529 ],
[0.47166065, 0.55325182],
[0.97349501, 0.12223283]]])

In [20]:

B[0,:,:]

Out[20]:

array([[0.08324252, 0.15436984],
[0.64798569, 0.18131407],
[0.6383836 , 0.21887796],
[0.39368218, 0.98799556],
[0.22712382, 0.50594579]])

In [21]:

B[0,...]

Out[21]:

array([[0.08324252, 0.15436984],
[0.64798569, 0.18131407],
[0.6383836 , 0.21887796],
[0.39368218, 0.98799556],
[0.22712382, 0.50594579]])

In [22]:

B[0]

Out[22]:

array([[0.08324252, 0.15436984],
[0.64798569, 0.18131407],
[0.6383836 , 0.21887796],
[0.39368218, 0.98799556],
[0.22712382, 0.50594579]])
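
The three notations above select the same data. A minimal check, assuming a seeded generator from np.random.default_rng (used here instead of np.random.uniform only to make the sketch reproducible):

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen arbitrarily
B = rng.uniform(low=0, high=1, size=(3, 5, 2))

first_block = B[0]
print(first_block.shape)                          # (5, 2)
print(np.array_equal(first_block, B[0, :, :]))    # True
print(np.array_equal(first_block, B[0, ...]))     # True

# The ellipsis also works at the front: last entry along the last axis
last_entries = B[..., -1]
print(last_entries.shape)                         # (3, 5)
```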

Accessing multiple elements

In [23]:

print(A[1, 1:3]) # stop index is not inclusive!

[6 7]
Slicing with defined step

In [24]:

print(A[1, 0:3:2]) # [start:stop:step]
print(A[1, ::2])

[5 7]
[5 7]

Indices can also be defined relative to the end of the array

In [25]:

print(A[1, -1]) # second row, last column
print(A[0, -2:]) # first row, last two columns

8
[3 4]

It is also possible to provide a list of indices

In [26]:

columns = [0, 2, 3]
print(A[0, columns])

[1 3 4]

Quick exercise
Create the following matrix using numpy slicing operations

In [27]:

%%latex
\begin{bmatrix}
1 & 1 & 1 & 1 & 1\\
1 & 2 & 2 & 2 & 1\\
1 & 2 & 3 & 2 & 1\\
1 & 2 & 2 & 2 & 1\\
1 & 1 & 1 & 1 & 1\\
\end{bmatrix}

⎡1 1 1 1 1⎤
⎢1 2 2 2 1⎥
⎢1 2 3 2 1⎥
⎢1 2 2 2 1⎥
⎣1 1 1 1 1⎦

In [28]:

M = np.ones((5, 5))
M[1:-1, 1:-1] = np.full((3, 3), fill_value=2)
M[2, 2] = 3
M

Out[28]:

array([[1., 1., 1., 1., 1.],
[1., 2., 2., 2., 1.],
[1., 2., 3., 2., 1.],
[1., 2., 2., 2., 1.],
[1., 1., 1., 1., 1.]])

In [29]:

# Boolean indexing / masking
A = np.array([0, 1, 2, 3, 4, 5])
mask = np.array([0, 0, 0, 0, 1, 1], dtype='bool')
mask2 = np.array([False, False, False, False, True, True])
print(A[mask])
print(A[mask2])
print(A[A > 3])

[4 5]
[4 5]
[4 5]

In [30]:

# Boolean masking is very handy for thresholding
A = np.array([-1, 1, 5, 6, 7, -3, 2, -8])
negIndices = A < 0
print(negIndices)
A[negIndices] = 0
print(A)

# or in one single step
A = np.array([-1, 1, 5, 6, 7, -3, 2, -8])
A[A < 0] = 0
print(A)

[ True False False False False True False True]
[0 1 5 6 7 0 2 0]
[0 1 5 6 7 0 2 0]
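
Masked assignment modifies the array in place. When the original values should be preserved, the same thresholding can be sketched with np.where or np.maximum, which return new arrays (the name thresholded is illustrative):

```python
import numpy as np

A = np.array([-1, 1, 5, 6, 7, -3, 2, -8])

# np.where(condition, x, y) picks x where the condition holds, else y
thresholded = np.where(A < 0, 0, A)
print(thresholded)        # [0 1 5 6 7 0 2 0]

# np.maximum compares elementwise against 0 and gives the same result here
print(np.maximum(A, 0))   # [0 1 5 6 7 0 2 0]

# The original array is unchanged, unlike with A[A < 0] = 0
print(A)                  # [-1  1  5  6  7 -3  2 -8]
```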

Mutating array elements


In [31]:

arr = np.array([[1, 4, 6], [2, 5, 7]])

In [32]:

arr[0,0] = 20
arr

Out[32]:

array([[20, 4, 6],
[ 2, 5, 7]])

Matrix operations
In [33]:

print(arr * 3) # Basic operations are performed elementwise

[[60 12 18]
[ 6 15 21]]

In [34]:

print(arr + 3)

[[23 7 9]
[ 5 8 10]]

In [35]:

print(arr / 2)

[[10. 2. 3. ]
[ 1. 2.5 3.5]]

In [36]:

A = np.array([[1, 2], [1, 2]])
B = np.array([[3, 4], [3, 4]])
print(A)
print(B)
print(A + B)
print(A * B) # Operations between matrices are also performed elementwise

[[1 2]
[1 2]]
[[3 4]
[3 4]]
[[4 6]
[4 6]]
[[3 8]
[3 8]]

In [37]:

# Matrix multiplication is achieved using np.dot(a, b)
print(A)
print(B)
print(np.dot(A, B))

[[1 2]
[1 2]]
[[3 4]
[3 4]]
[[ 9 12]
[ 9 12]]
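
For 2-D arrays, the @ operator (equivalent to np.matmul) gives the same result as np.dot. A brief sketch with the matrices from above:

```python
import numpy as np

A = np.array([[1, 2], [1, 2]])
B = np.array([[3, 4], [3, 4]])

product_dot = np.dot(A, B)
product_at = A @ B            # same as np.matmul(A, B)

print(product_at)
print(np.array_equal(product_dot, product_at))   # True
```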

In [38]:

A = np.array([[1, 2, 3]])
B = np.array([[4], [5], [6]])
print(A.shape)
print(B.shape)
print(np.dot(A, B)) # Inner dimensions must match: (1, 3) times (3, 1) gives (1, 1)

(1, 3)
(3, 1)
[[32]]
In [39]:

# When arrays of different datatypes are combined, the result takes the datatype that can
# represent both inputs (here int64); the shapes (3,) and (3, 1) are broadcast to (3, 3)
A = np.array([1, 2, 3], dtype='uint8')
B = np.array([[4], [5], [6]], dtype='int64')
C = A * B
print(C)
print(C.dtype)

[[ 4 8 12]
[ 5 10 15]
[ 6 12 18]]
int64
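
The resulting datatype can be checked without building arrays by using np.result_type. The sketch below also shows a related caveat, assuming two arrays of the same small integer type: no promotion takes place, so values can wrap around.

```python
import numpy as np

# np.result_type reports the dtype NumPy picks when combining two dtypes
print(np.result_type(np.uint8, np.int64))     # int64
print(np.result_type(np.uint8, np.int8))      # int16 (holds both value ranges)
print(np.result_type(np.int32, np.float32))   # float64

# Caution: identical dtypes are not promoted, so integers can wrap around
small = np.array([200], dtype='uint8')
wrapped = small + np.array([100], dtype='uint8')
print(wrapped)         # [44], since 300 does not fit into uint8
```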

Reshaping arrays

In [40]:

A = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(np.reshape(A, (8, 1)))
print(np.reshape(A, (2, 4)))

[[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]]
[[1 2 3 4]
[5 6 7 8]]

In [41]:

B = A.reshape((2, 4))
print(B) # also works

[[1 2 3 4]
[5 6 7 8]]

In [42]:

print(B.ravel()) # Flattens the array

[1 2 3 4 5 6 7 8]

In [43]:

print(B)
print(B.T) # Transpose the array

[[1 2 3 4]
[5 6 7 8]]
[[1 5]
[2 6]
[3 7]
[4 8]]
In [44]:

# When dimension is given as '-1', it is automatically inferred
print(A)
print(A.reshape((2, -1)))

[1 2 3 4 5 6 7 8]
[[1 2 3 4]
[5 6 7 8]]
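
A small sketch of the '-1' placeholder in other positions, and of what happens when the requested shape does not match the number of elements:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# -1 may stand in for any single dimension
print(A.reshape((-1, 2)).shape)      # (4, 2)
print(A.reshape((2, 2, -1)).shape)   # (2, 2, 2)

# The total number of elements must stay the same
try:
    A.reshape((3, -1))               # 8 is not divisible by 3
except ValueError as err:
    print('reshape failed:', err)
```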

Stacking arrays

Stacking along first axis using np.vstack

In [45]:

A = np.random.uniform(low=0, high=10, size=(2, 2))
B = np.random.uniform(low=0, high=1., size=(2, 2))
print(A)
print(' ')
print(B)
print(' ')
print(np.vstack((A, B)))

[[3.16485187 5.50691504]
[8.19219751 2.46261689]]

[[0.57457897 0.8490203 ]
[0.70893401 0.2810346 ]]

[[3.16485187 5.50691504]
[8.19219751 2.46261689]
[0.57457897 0.8490203 ]
[0.70893401 0.2810346 ]]

Stacking along second axis using np.hstack

In [46]:

print(np.hstack((A, B)))

[[3.16485187 5.50691504 0.57457897 0.8490203 ]
[8.19219751 2.46261689 0.70893401 0.2810346 ]]

np.vstack stacks along the first axis, np.hstack stacks along the second axis.
np.concatenate lets you specify the axis explicitly: np.concatenate((a1, a2, ...), axis=0)

In [47]:

print(np.concatenate((A, B), axis=0))

[[3.16485187 5.50691504]
[8.19219751 2.46261689]
[0.57457897 0.8490203 ]
[0.70893401 0.2810346 ]]

In [48]:

print(np.concatenate((A, B), axis=1))

[[3.16485187 5.50691504 0.57457897 0.8490203 ]
[8.19219751 2.46261689 0.70893401 0.2810346 ]]

Also see np.r_ and np.c_ (concatenate along the first and second axis, respectively, and accept slice notation such as 4:7)
In [49]:

a = np.r_[1, 2, 3, 4:7, 9]
a

Out[49]:

array([1, 2, 3, 4, 5, 6, 9])
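
A minimal sketch contrasting np.r_ and np.c_ on two 1-D arrays (x, y, joined and columns are illustrative names):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# np.r_ joins along the first axis: 1-D inputs give one longer 1-D array
joined = np.r_[x, y]
print(joined)             # [1 2 3 4 5 6]

# np.c_ joins along the second axis: 1-D inputs become columns of a 2-D array
columns = np.c_[x, y]
print(columns.shape)      # (3, 2)
print(columns)
```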

Splitting arrays

Horizontal splitting using np.hsplit: specify either the number of equally shaped arrays or the indices at which to split

In [50]:

C = np.random.random(size=(6, 6))
C

Out[50]:

array([[1.61868575e-01, 4.13366875e-01, 8.73954107e-01, 7.30927857e-01,
4.99028994e-01, 8.62912958e-01],
[3.72698791e-02, 4.71009944e-01, 3.72423965e-01, 3.87900439e-01,
7.35999477e-01, 2.59235618e-01],
[4.56622699e-01, 8.28516837e-01, 1.68090024e-01, 1.50134252e-01,
9.11764736e-01, 4.13908301e-01],
[9.63742500e-02, 9.32547885e-01, 2.25458400e-01, 9.27527998e-01,
3.70444468e-01, 7.99349586e-01],
[4.05271519e-01, 6.24753424e-02, 6.29190614e-01, 6.46237548e-01,
5.82782336e-01, 5.79910001e-01],
[4.99657300e-04, 7.05094876e-01, 8.96608319e-01, 1.15221035e-01,
8.87348878e-01, 8.20046926e-01]])

In [51]:

D, E, F = np.hsplit(C, 3)
print(D)
print('')
print(E)
print('')
print(F)

[[1.61868575e-01 4.13366875e-01]
[3.72698791e-02 4.71009944e-01]
[4.56622699e-01 8.28516837e-01]
[9.63742500e-02 9.32547885e-01]
[4.05271519e-01 6.24753424e-02]
[4.99657300e-04 7.05094876e-01]]

[[0.87395411 0.73092786]
[0.37242396 0.38790044]
[0.16809002 0.15013425]
[0.2254584 0.927528 ]
[0.62919061 0.64623755]
[0.89660832 0.11522103]]

[[0.49902899 0.86291296]
[0.73599948 0.25923562]
[0.91176474 0.4139083 ]
[0.37044447 0.79934959]
[0.58278234 0.57991 ]
[0.88734888 0.82004693]]
In [52]:

H, I, J = np.hsplit(C, (2, 5)) # split after the second and fifth column
print(H)
print('')
print(I)
print('')
print(J)

[[1.61868575e-01 4.13366875e-01]
[3.72698791e-02 4.71009944e-01]
[4.56622699e-01 8.28516837e-01]
[9.63742500e-02 9.32547885e-01]
[4.05271519e-01 6.24753424e-02]
[4.99657300e-04 7.05094876e-01]]

[[0.87395411 0.73092786 0.49902899]
[0.37242396 0.38790044 0.73599948]
[0.16809002 0.15013425 0.91176474]
[0.2254584 0.927528 0.37044447]
[0.62919061 0.64623755 0.58278234]
[0.89660832 0.11522103 0.88734888]]

[[0.86291296]
[0.25923562]
[0.4139083 ]
[0.79934959]
[0.57991 ]
[0.82004693]]

np.vsplit works in the same way along the vertical axis

np.array_split lets you specify the axis along which to split explicitly (cf. np.concatenate)
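
A short sketch of both functions, using a 6x6 array built with np.arange so the result is reproducible (an assumption for illustration; the random matrix C above would behave the same way):

```python
import numpy as np

C = np.arange(36).reshape((6, 6))

# np.vsplit splits along the first axis (rows)
top, bottom = np.vsplit(C, 2)
print(top.shape, bottom.shape)          # (3, 6) (3, 6)

# np.array_split takes the axis as an argument ...
left, middle, right = np.array_split(C, 3, axis=1)
print(left.shape)                       # (6, 2)

# ... and also accepts divisions that do not come out evenly
parts = np.array_split(C, 4, axis=0)
print([p.shape[0] for p in parts])      # [2, 2, 1, 1]
```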

Common pitfalls:
Be careful when assigning an array to a second variable!

NumPy differentiates between copies and views!

In [53]:

a = np.arange(0, 10)
a

Out[53]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [54]:

b = a # no copy is made: b is a second name for the same array object
b

Out[54]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [55]:

b[0] = 10
b

Out[55]:

array([10, 1, 2, 3, 4, 5, 6, 7, 8, 9])
So far, so good. But we also changed the original array a in the process, because b points to the same object in
memory.

In [56]:

a

Out[56]:

array([10, 1, 2, 3, 4, 5, 6, 7, 8, 9])

We need to make a deep copy using .copy() to avoid this

In [57]:

a = np.arange(0, 10)
b = a.copy() # this creates a separate object in memory
b[0] = 10
print(b)
print(a)

[10 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]

Views are very memory-efficient, but need to be handled with care.
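
Whether two arrays share data can be checked with np.shares_memory. A minimal sketch of the common cases (alias, view, picked and copied are illustrative names):

```python
import numpy as np

a = np.arange(0, 10)

alias = a              # same object under a second name
view = a[2:5]          # basic slicing returns a view on a's data
picked = a[[2, 3, 4]]  # indexing with a list returns a copy
copied = a.copy()      # explicit copy

print(alias is a)                     # True
print(np.shares_memory(a, view))      # True
print(np.shares_memory(a, picked))    # False
print(np.shares_memory(a, copied))    # False

view[0] = 99           # writes through to a
print(a[2])            # 99
```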

Further reading
More info at https://docs.scipy.org/doc/
