05 Numpy
NumPy is one of the fundamental packages for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including
mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more. In this tutorial we will cover some of the basic concepts in NumPy, such as arrays and their operations, a
few mathematical functions, and sampling random numbers from some widely used probability distributions.
You can find the documentation for the latest stable NumPy version here, and more resources for learning NumPy here.
NumPy Overview
In [377… # Do you know which version of numpy you are using?
# Why is it important to know that? Try to answer it yourself, with reasons.
import numpy as np
print(np.__version__)
1.25.2
In [378… # Is it necessary to always use "np" as an alias for numpy? Run the lines below and check it out
# Ans: No, "np" is only the conventional alias; any valid name works
import numpy as a
print(a.array([1]))
[1]
Arrays
NumPy works with "multidimensional homogeneous arrays": n-dimensional containers whose elements all share the same data type.
Two dimensional arrays are similar to matrices that we are familiar with, whereas higher dimensional arrays are analogous to tensors from linear algebra.
In [379… # You can create a numpy array using an iterable (eg: lists, tuples)
[1 2 3 4 5] [5. 6. 7. 8.] [0 1 2 3]
-----
[[1 2]
[3 4]]
-----
[0 1 2 3 4 5]
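The code for this cell did not survive in this copy. Below is a minimal sketch that reproduces the output above. The variable names are assumptions: arr1, arr4 and arr5 are used by later cells, while arr2, arr3 and the exact constructors are guesses inferred from the printed values.

```python
import numpy as np

# Assumed reconstruction: names and constructors are inferred from the printed output
arr1 = np.array([1, 2, 3, 4, 5])        # from a list of integers
arr2 = np.array((5.0, 6.0, 7.0, 8.0))   # from a tuple of floats
arr3 = np.array(range(4))               # from a range object
print(arr1, arr2, arr3)
print("-----")
arr4 = np.array([[1, 2], [3, 4]])       # nested lists give a 2-D array
print(arr4)
print("-----")
arr5 = np.arange(6)                     # the integers 0 to 5
print(arr5)
```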
In [380… # Q: Can you create arrays of non-numerical types? Like strings? Try it!
'''Yes '''
list1 = ["a","b","c","d"]
arr = np.array(list1)
print(arr)
# Q: Can you create an array of mixed types? Try it! What happens to the elements of the array?
# What can you conclude from this?
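One possible way to explore the mixed-type question is sketched below. The names mixed and num_mixed are hypothetical and chosen only for illustration. Because an array holds a single data type, NumPy converts every element to a common type that can represent all of them.

```python
import numpy as np

# Numbers mixed with a string: everything is stored as a string
mixed = np.array([1, 2.5, "three"])
print(mixed)
print(mixed.dtype)      # a Unicode string dtype

# Integers, floats and booleans mixed: everything is stored as a float
num_mixed = np.array([1, 2.5, True])
print(num_mixed)
print(num_mixed.dtype)  # float64
```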
In [381… # Remember, everything in python is an object, which means that the numpy array
# is also an object. You can see what type of object it is by using the
# type() function
type(arr1)
Out[381… numpy.ndarray
Out[382… dtype('int32')
In [383… # You can check the size of a numpy array along each dimension by using the "shape" attribute of the array
# (the length of this tuple, also available as "ndim", is the number of dimensions)
arr4.shape
Out[383… (2, 2)
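As a small illustration of how the shape relates to the number of dimensions and the number of elements, consider the sketch below. The array m is a hypothetical stand-in with the same shape as the one above.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])  # hypothetical 2 x 2 example

print(m.shape)  # (2, 2): size along each dimension
print(m.ndim)   # 2: number of dimensions
print(m.size)   # 4: total number of elements
print(len(m.shape) == m.ndim)
```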
In [384… # You can change the shape of an array using the "reshape" method.
arr6 = arr5.reshape(3,2)
print(arr6)
[[0 1]
[2 3]
[4 5]]
In [385… # Note that the default behaviour of 'reshape' is to arrange the values row-wise
# It is also possible to arrange the values column-wise by passing order='F'
arr6 = arr5.reshape(3,2,order='F')
print(arr6)
[[0 3]
[1 4]
[2 5]]
In [386… # What's the easiest way to create an array with shape (2,5) that returns
# the same array as the statement below? (Hint: Use the "numpy.arange" function)
#np.array([[1,2,3,4,5],[6,7,8,9,10]])
[[ 1 2 3 4 5]
[ 6 7 8 9 10]]
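One way to answer this, following the hint, is to combine np.arange with reshape. This is only a sketch of one possible answer; other constructions work as well.

```python
import numpy as np

# np.arange(1, 11) yields 1..10 (the stop value is excluded); reshape makes it 2 x 5
arr = np.arange(1, 11).reshape(2, 5)
print(arr)
```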
In [387… # numpy has some convenient functions to create certain special arrays ...
# How would you create a numpy array with shape (3,10) that contains all 0's?
# Write code below:
# Using the np.zeros() function with a shape tuple
zero_array = np.zeros((3,10))
print(zero_array)
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
Slicing arrays
In [389… # Akin to a regular list, we can slice numpy arrays as well, in various ways
# Note that indexing starts at 0 and not 1, and that the last index is not included
print(arr1)
print(arr1[0:5])
print(arr1[0:4])
print(arr1[0:4:1])
print(arr1[0:4:2])
print(arr1[::-1])
[1 2 3 4 5]
[1 2 3 4 5]
[1 2 3 4]
[1 2 3 4]
[1 3]
[5 4 3 2 1]
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]]
[[ 8 9 10]
[14 15 16]
[20 21 22]]
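The cell that produced the two-dimensional output above is missing from this copy. A sketch that reproduces it, assuming a 6 x 6 array built with np.arange and hypothetical names grid and block, could look like this:

```python
import numpy as np

grid = np.arange(36).reshape(6, 6)
print(grid)

# Slice rows and columns at once: rows 1 to 3 and columns 2 to 4
block = grid[1:4, 2:5]
print(block)
```

Each axis takes its own start:stop:step slice, separated by a comma.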
Copies
new_arr = arr : does not copy the data; both names refer to the same array object
arr = np.arange(6)
print(arr)
new_arr = arr
print(new_arr)
new_arr[2] = 100 # Here we are changing a value in new_arr only, but ...
print(arr)
print(new_arr)
[0 1 2 3 4 5]
[0 1 2 3 4 5]
[ 0 1 100 3 4 5]
[ 0 1 100 3 4 5]
In [392… # Why did "arr" also change when only new_arr was changed?
# Because when "new_arr = arr" was executed, the reference to the array object that
# "arr" holds was also assigned to "new_arr", so both names point to the same array
[0 1 2 3 4 5] [ 0 1 100 3 4 5]
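The code that produced the last output line is not present in this copy. A plausible sketch, assuming the copy() method was used, is shown below together with np.shares_memory, which reports whether two arrays use the same underlying data. The name view is hypothetical.

```python
import numpy as np

arr = np.arange(6)
new_arr = arr.copy()   # an independent copy of the data
new_arr[2] = 100       # only the copy changes
print(arr, new_arr)

view = arr[1:4]        # a slice is a view onto the same data
print(np.shares_memory(arr, view))     # True
print(np.shares_memory(arr, new_arr))  # False
```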
Mathematical Operations
In [393… # You can calculate the sum of all the elements in an array using np.sum()
print(np.sum(arr1))
15
# There are many aggregate operations you can run on numpy arrays other than
# the sum() method.
'''2. arr.mean() '''
print(f"the mean is {arr1.mean()}") #calculate mean
[1 2 3 4 5]
the sum is 15
the mean is 3.0
the std deviation is 1.4142135623730951
the min is 1
the max is 5
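Most of the code for this cell is missing here. The sketch below reproduces the printed values using the array methods sum, mean, std, min and max; the exact statements used in the original cell are an assumption.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
print(arr1)
print(f"the sum is {arr1.sum()}")
print(f"the mean is {arr1.mean()}")
print(f"the std deviation is {arr1.std()}")  # population standard deviation (ddof=0)
print(f"the min is {arr1.min()}")
print(f"the max is {arr1.max()}")
```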
[0 1 2 3] [0 1 4 9]
In [396… # How to obtain an array that has the squares of each element in "arr"?
# Type code below:
squared_arr = arr*arr
print(squared_arr)
[0 1 4 9]
Operations on 2d array
In [397… # In the case of multi-dimensional arrays, the above functions can take an additional parameter 'axis'
# to define the dimension along which the operation is to be performed. For example
arr = np.arange(36).reshape(6,6)
print(arr,"\n-----")
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]]
-----
[ 90 96 102 108 114 120]
-----
[ 15 51 87 123 159 195]
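The statements that produced the two sums above are missing from this copy. A sketch consistent with the printed values follows; the names col_sums and row_sums are hypothetical.

```python
import numpy as np

arr = np.arange(36).reshape(6, 6)

# axis=0 collapses the rows, giving one sum per column
col_sums = arr.sum(axis=0)
print(col_sums)
print("-----")

# axis=1 collapses the columns, giving one sum per row
row_sums = arr.sum(axis=1)
print(row_sums)
```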
print(np.dot(arr1, arr2))
[0 1 2 3]
[3 4 5 6]
32
# The following lines are all equivalent; for 2-D arrays they all compute the matrix product of the two arrays
print(np.dot(arr1,arr2),"\n-----")
print(arr1.dot(arr2),"\n-----")
print(arr1 @ arr2)
[[0 1]
[2 3]
[4 5]]
-----
[[1. 1. 1.]
[1. 1. 1.]]
-----
[[1. 1. 1.]
[5. 5. 5.]
[9. 9. 9.]]
-----
[[1. 1. 1.]
[5. 5. 5.]
[9. 9. 9.]]
-----
[[1. 1. 1.]
[5. 5. 5.]
[9. 9. 9.]]
arr1 = np.arange(6).reshape(2,3)
arr2 = np.arange(12).reshape(4,3)
# The lines below are left commented out because the matrix product raises a ValueError
# print(arr1,"\n-----")
# print(arr2,"\n-----")
# print(arr1 @ arr2)
In [401… # As you might have guessed, the matrix dimensions are not compatible for matrix multiplication
# One often has to 'transpose' a matrix prior to carrying out operations like multiplication
# This can be done with the `.T` attribute as shown below
# We can also use the `np.transpose()` function to achieve the same result
print(arr1,"\n-----") # 2 x 3 matrix
print(arr2.T,"\n-----") # 3 x 4 matrix
print(arr1 @ arr2.T,"\n-----") # Now there will be no error
print(arr1 @ np.transpose(arr2)) # Same effect as the above statement
[[0 1 2]
[3 4 5]]
-----
[[ 0 3 6 9]
[ 1 4 7 10]
[ 2 5 8 11]]
-----
[[ 5 14 23 32]
[ 14 50 86 122]]
-----
[[ 5 14 23 32]
[ 14 50 86 122]]
In [402… # Let's look into some standard operations that can be performed on square matrices.
# You can obtain the determinant of a square matrix by using the `np.linalg.det()` function
arr1 = np.arange(9).reshape(3,3)
arr1 *= arr1
# arr1 = np.array([1,2,10,5,6,3,8,9,23]).reshape(3,3)
print(arr1,"\n-----")
print("Value of determinant " , np.linalg.det(arr1))
[[ 0 1 4]
[ 9 16 25]
[36 49 64]]
-----
Value of determinant -216.00000000000006
In [403… # You can use the `np.linalg.inv()` function to calculate the inverse of a
# square matrix. What will happen when you run the following code?
print(np.linalg.inv(arr1))
In [404… # Q: What changes will you make to the above matrix to make it invertible?
# Try it out here ...
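''' The original np.arange(9).reshape(3,3) matrix is singular (its determinant is 0); squaring it element-wise (arr1 *= arr1) gives a matrix with a non-zero determinant, which is invertible '''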
# What do you think will be the output if we try finding a determinant and
# inverse for a rectangular matrix? Do try it out.
''' A numpy.linalg.LinAlgError is raised, since both operations require a square matrix '''
# Write code here:
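A minimal sketch for trying this out is given below. The 2 x 3 array rect and the list failures are hypothetical names used only for illustration; both functions are expected to reject the input because it is not square.

```python
import numpy as np

rect = np.arange(6, dtype=float).reshape(2, 3)  # hypothetical 2 x 3 example

failures = []
for name, func in (("det", np.linalg.det), ("inv", np.linalg.inv)):
    try:
        func(rect)
    except np.linalg.LinAlgError as err:
        # Both functions require the last two dimensions to be equal
        failures.append(name)
        print(f"{name} failed: {err}")
```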
In [405… # Now let's find the eigenvalues and eigenvectors of the matrix
mat = np.arange(1,10).reshape(3,3)
eigen_value, eigen_vector = np.linalg.eig(mat)
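To make the result easier to interpret, the sketch below checks the defining relation A v = λ v for each pair. It relies on the fact that np.linalg.eig returns the eigenvectors as the columns of the second array, so column i belongs to eigenvalue i.

```python
import numpy as np

mat = np.arange(1, 10).reshape(3, 3)
eigen_value, eigen_vector = np.linalg.eig(mat)
print(eigen_value)

# Column i of eigen_vector pairs with eigen_value[i]
for i in range(3):
    lhs = mat @ eigen_vector[:, i]
    rhs = eigen_value[i] * eigen_vector[:, i]
    print(np.allclose(lhs, rhs))
```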
Broadcasting arrays
arr = np.arange(9).reshape(3,3)
print(arr,"\n-----")
[[0 1 2]
[3 4 5]
[6 7 8]]
-----
[[ 0 2 4]
[ 6 8 10]
[12 14 16]]
-----
[[ 0 1 4]
[ 9 16 25]
[36 49 64]]
print(arr2*arr)
[[0 1 2]
[3 4 5]
[6 7 8]]
-----
[0 1 2]
-----
[[ 0 1 4]
[ 0 4 10]
[ 0 7 16]]
What happened here? To perform the multiplication meaningfully, "arr2" is transformed (broadcast) into another array of shape (3,3), after which element-wise multiplication is performed. There are 2 rules for broadcasting:
1. Identify the array with fewer dimensions and increase its number of dimensions (by prepending "1" to its shape) until it matches the number of dimensions of the other array.
2. Identify dimensions along which an array has size "1" and stretch the array along those dimensions so that its size matches the other array.
In the above example, the shape of arr2 changed as follows: (3,) -> (1,3) -> (3,3), following which element-wise multiplication took place.
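The two steps can be made explicit, as a sketch, with np.newaxis and np.broadcast_to. The names step1 and step2 are hypothetical. NumPy performs these steps implicitly and without copying data, so this is only meant to show the intermediate shapes.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)
arr2 = np.arange(3)

step1 = arr2[np.newaxis, :]             # rule 1: (3,) -> (1, 3)
step2 = np.broadcast_to(step1, (3, 3))  # rule 2: (1, 3) -> (3, 3)
print(step1.shape, step2.shape)

# The explicit version gives the same result as the implicit broadcast
print(np.array_equal(arr2 * arr, step2 * arr))

# np.broadcast_shapes reports the resulting shape without building any arrays
print(np.broadcast_shapes((3,), (3, 3)))  # (3, 3)
```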
print(arr1+arr2)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
-----
[0 1 2 3]
-----
[[ 0 2 4 6]
[ 4 6 8 10]
[ 8 10 12 14]]
In [425… # Can you explain why the following code throws an error?
try:
    arr1 = np.arange(12).reshape(3,4)
    arr2 = np.arange(3)
    print(arr1,"\n-----")
    print(arr2,"\n-----")
    print(arr1+arr2) # Error in this line
except ValueError as e:
    print(f"An error occurred: {e}")
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
-----
[0 1 2]
-----
An error occurred: operands could not be broadcast together with shapes (3,4) (3,)
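The shapes (3,4) and (3,) fail because prepending a 1 gives (1,3), and 3 does not match 4 along the last dimension. If the intent is to add one value per row, one possible fix (an assumption about the intent) is to turn arr2 into a column of shape (3,1). The names col and result are hypothetical.

```python
import numpy as np

arr1 = np.arange(12).reshape(3, 4)
arr2 = np.arange(3)

col = arr2[:, np.newaxis]  # shape (3, 1): one value per row
result = arr1 + col        # (3, 4) + (3, 1) broadcasts to (3, 4)
print(result)
```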
Comparison Functions
In [424… # Say you want to check whether a numpy array contains any number greater
# than 5. Would a regular python-styled comparison work?
try:
    arr = np.arange(8)
    if arr > 5:
        print("True")
except ValueError as e:
    print(f"An error occurred: {e}")
An error occurred: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
check = np.any(arr>5)
check is a boolean
In [412… # The > operator, like any other comparison operator in numpy, returns a boolean
# array of the same shape. To check if any number in the array is greater than 5,
# use numpy.any().
np.any(arr > 5)
Out[412… True
In [413… # To check if all the numbers in the array are greater than 5, you can use
# numpy.all()
np.all(arr > 5)
Out[413… False
In [414… # What if you want to see which elements in arr are greater than 5?
# You can use the boolean array as an index to index the original array!
print(arr)
print(arr>5)
print(arr[arr > 5])
[0 1 2 3 4 5 6 7]
[False False False False False False True True]
[6 7]
In [415… # Say you want to create a new array arr2 which has the same size as arr,
# but all the elements less than or equal to 5 are replaced by 5. How would
# you create such an array? (Hint: can you assign values to indexed arrays?)
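One possible answer, following the hint, is to copy the array and assign through a boolean index. The alternative with np.maximum is shown only for comparison, and the name alt is hypothetical.

```python
import numpy as np

arr = np.arange(8)

arr2 = arr.copy()     # copy first so that arr itself stays unchanged
arr2[arr2 <= 5] = 5   # assign to every position where the mask is True
print(arr2)           # [5 5 5 5 5 5 6 7]

# Element-wise maximum with 5 gives the same result in one call
alt = np.maximum(arr, 5)
print(np.array_equal(arr2, alt))
```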
Constants
In [416… # A number of oft used constants are defined in the numpy package
# These include numpy.inf, numpy.e, numpy.pi
rg = np.random.default_rng()
print(rg.random(5))
# Did you observe that every time you run this cell, you get 5 different numbers?
# Can you make them stay the same? We will try it out in the next cell
In [418… # Run this cell multiple times and you will see the numbers in the array are always constant
# What is the reason for this? It is because we have provided a 'seed' value of 42
# Providing a seed value allows you to reproduce the same random numbers, which helps to
# repeat, verify and validate simulations and experiments
rg = np.random.default_rng(42)
print(rg.random(5))
# Now change the seed and re-run the cell, you will find a different set of numbers repeating ...
rg = np.random.default_rng(56)
print(rg.random(5))
In [419… # The above statements create random numbers between 0 and 1. What if we want integers?
rg = np.random.default_rng(42)
print(rg.integers(22))
# Further, how will you generate a 3 x 5 matrix of random integers between 0 and 45?
# Write your code here ...
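A sketch of one possible answer uses the size argument of Generator.integers. It assumes "between 0 and 45" means both ends are included, which is why endpoint=True is passed; without it the upper bound is excluded. The name mat is hypothetical.

```python
import numpy as np

rg = np.random.default_rng(42)

# 3 x 5 matrix of integers drawn from 0 to 45, both inclusive (assumed intent)
mat = rg.integers(0, 45, size=(3, 5), endpoint=True)
print(mat)
```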
arr = np.arange(12).reshape(6,2,order='F')
np.savetxt("numpy-array.csv", arr, delimiter=',',fmt='%.4f',newline='\n', header='y,x', footer='', comments='', encoding=None)
In [421… # Read back the array from the file and print it
new_arr = np.loadtxt("numpy-array.csv", delimiter=',', skiprows=1, comments='#', encoding=None)
print(new_arr)
[[ 0. 6.]
[ 1. 7.]
[ 2. 8.]
[ 3. 9.]
[ 4. 10.]
[ 5. 11.]]
In [422… # Multiple numpy arrays can be written to a single binary ".npz" file using the "np.savez" function
# (use "np.savez_compressed" if the data should also be compressed)
# Data written in this way can be read back using the "np.load" function
# Refer to the documentation at
# https://numpy.org/doc/stable/reference/generated/numpy.savez.html and
# https://numpy.org/doc/stable/reference/generated/numpy.load.html
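A minimal round-trip sketch is shown below. To avoid creating files, it writes to an in-memory buffer, which is an assumption made for this example; passing a file name such as "arrays.npz" works the same way. The array names and the keywords first and second are hypothetical.

```python
import io
import numpy as np

a = np.arange(5)
b = np.ones((2, 3))

# Store both arrays in one compressed archive, each under its own keyword
buffer = io.BytesIO()
np.savez_compressed(buffer, first=a, second=b)

# Rewind the buffer and read the arrays back by name
buffer.seek(0)
with np.load(buffer) as data:
    print(data.files)
    a_back = data["first"]
    b_back = data["second"]

print(np.array_equal(a, a_back), np.array_equal(b, b_back))
```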
That's it ...
In [423… # This unit has introduced the basics of numpy and numpy arrays.
# The functionality covered in this unit is sufficient to start using numpy for data analysis.
# We will start solving problems using these concepts and functions ...
# The numpy package has many more sub modules, functions and methods.
# Definitely visit the following links: