Numpy - Ipynb - Colaboratory
This tutorial series is a beginner-friendly introduction to programming and data analysis using the Python programming language. These
tutorials take a practical and coding-focused approach. The best way to learn the material is to execute the code and experiment with it
yourself.
https://colab.research.google.com/drive/1YAk035QAPEJmzo93hWc8OTcsR0jBxUDQ#scrollTo=T8BCFcGBL0T_&printMode=true 1/32
04/06/2023, 20:17 Copy of python-numerical-computing-with-numpy.ipynb - Colaboratory
The easiest way to start executing the code is to click the Run button at the top of this page and select Run on Binder. You can also select "Run
on Colab" or "Run on Kaggle", but you'll need to create an account on Google Colab or Kaggle to use these platforms.
To run the code on your computer locally, you'll need to set up Python, download the notebook and install the required libraries. We recommend
using the Conda distribution of Python. Click the Run button at the top of this page, select the Run Locally option, and follow the instructions.
Jupyter Notebooks: This tutorial is a Jupyter notebook - a document made of cells. Each cell can contain code written in Python
or explanations in plain English. You can execute code cells and view the results, e.g., numbers, messages, graphs, tables, files,
etc., instantly within the notebook. Jupyter is a powerful platform for experimentation and analysis. Don't be afraid to mess
around with the code & break things - you'll learn a lot by encountering and fixing errors. You can use the "Kernel > Restart & Clear
Output" menu option to clear all outputs and start again from the top.
Suppose we want to use climate data like the temperature, rainfall, and humidity to determine if a region is well suited for
growing apples. A simple approach for doing this would be to formulate the relationship between the annual yield of apples (tons
per hectare) and the climatic conditions like the average temperature (in degrees Fahrenheit), rainfall (in millimeters) & average
relative humidity (in percentage) as a linear equation:
yield_of_apples = w1 * temperature + w2 * rainfall + w3 * humidity
We're expressing the yield of apples as a weighted sum of the temperature, rainfall, and humidity. This equation is an approximation since the
actual relationship may not necessarily be linear, and there may be other factors involved. But a simple linear model like this often works well in
practice.
Based on some statistical analysis of historical data, we might come up with reasonable values for the weights w1, w2, and w3. Here's an
example set of values:
w1, w2, w3 = 0.3, 0.2, 0.5
Given some climate data for a region, we can now predict the yield of apples. Here's some sample data:
To begin, we can define some variables to record climate data for a region.
kanto_temp = 73
kanto_rainfall = 67
kanto_humidity = 43
We can now substitute these variables into the linear equation to predict the yield of apples.
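The substitution can be sketched as follows. The weights and Kanto values are the ones given above, repeated so the snippet runs on its own; the variable name kanto_yield_apples is an assumption introduced for illustration.

```python
# Weights and climate data for Kanto, as given in the text
w1, w2, w3 = 0.3, 0.2, 0.5
kanto_temp = 73
kanto_rainfall = 67
kanto_humidity = 43

# Weighted sum of temperature, rainfall, and humidity
# (kanto_yield_apples is an illustrative name)
kanto_yield_apples = kanto_temp * w1 + kanto_rainfall * w2 + kanto_humidity * w3

# Round for display, since floating-point sums can carry tiny errors
print(round(kanto_yield_apples, 2))
```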
56.8
The expected yield of apples in Kanto region is 56.8 tons per hectare.
To make it slightly easier to perform the above computation for multiple regions, we can represent the climate data for each region as a vector,
i.e., a list of numbers.
The three numbers in each vector represent the temperature, rainfall, and humidity data, respectively.
We can also represent the set of weights used in the formula as a vector.
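A minimal sketch of these vectors as Python lists is shown below. Only the Kanto values appear in the text; the Johto and Unova values are assumptions, chosen to be consistent with the yields printed for those regions further down.

```python
# Climate data per region as [temperature, rainfall, humidity]
kanto = [73, 67, 43]   # from the text
johto = [91, 88, 64]   # assumed values
unova = [69, 96, 70]   # assumed values

# Weights in the same order as the climate measurements
w1, w2, w3 = 0.3, 0.2, 0.5
weights = [w1, w2, w3]

# Each measurement lines up with the weight at the same position
for measurement, weight in zip(kanto, weights):
    print(measurement, weight)
```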
We can now write a function crop_yield to calculate the yield of apples (or any other crop) given the climate data and the respective weights.
def crop_yield(region, weights):
    result = 0
    for x, w in zip(region, weights):
        result += x * w
    return result
crop_yield(kanto, weights)
56.8
crop_yield(johto, weights)
76.9
crop_yield(unova, weights)
74.9
The Numpy library provides a built-in function to compute the dot product of two vectors. However, we must first convert the lists into Numpy
arrays.
Let's install the Numpy library using the pip package manager.
Next, let's import the numpy module. It's common practice to import numpy with the alias np.
import numpy as np
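The conversion from lists to Numpy arrays can be sketched with np.array, as below. The Kanto values and weights are the ones used earlier in the text.

```python
import numpy as np

# np.array builds a Numpy array from a Python list
kanto = np.array([73, 67, 43])
weights = np.array([0.3, 0.2, 0.5])

print(kanto)
print(weights)
```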
kanto
weights
type(kanto)
numpy.ndarray
type(weights)
numpy.ndarray
weights[0]
0.3
kanto[2]
43
np.dot(kanto, weights)
56.8
We can achieve the same result with low-level operations supported by Numpy arrays: performing an element-wise multiplication and
calculating the resulting numbers' sum.
(kanto * weights).sum()
56.8
The * operator performs an element-wise multiplication of two arrays if they have the same size. The sum method calculates the sum of
numbers in an array.
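Here is a small sketch of both operations. The values of arr1 and arr2 are assumptions picked for illustration.

```python
import numpy as np

# Two small example vectors (assumed values)
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

product = arr1 * arr2   # element-wise: 1*4, 2*5, 3*6
total = arr2.sum()      # 4 + 5 + 6

print(product)
print(total)
```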
arr1 * arr2
arr2.sum()
15
Ease of use: You can write small, concise, and intuitive mathematical expressions like (kanto * weights).sum() rather than using
loops & custom functions like crop_yield.
Performance: Numpy operations and functions are implemented internally in compiled C code, which makes them much faster than using Python
statements & loops that are interpreted at runtime.
Here's a comparison of dot products performed using Python loops vs. Numpy arrays on two vectors with a million elements each.
# Python lists
arr1 = list(range(1000000))
arr2 = list(range(1000000, 2000000))
# Numpy arrays
arr1_np = np.array(arr1)
arr2_np = np.array(arr2)
%%time
result = 0
for x1, x2 in zip(arr1, arr2):
    result += x1*x2
result
CPU times: user 151 ms, sys: 1.35 ms, total: 153 ms
%%time
np.dot(arr1_np, arr2_np)
CPU times: user 1.96 ms, sys: 751 µs, total: 2.71 ms
Wall time: 1.6 ms
833332333333500000
As you can see, np.dot is more than 50 times faster than the for loop in the run above (2.71 ms versus 153 ms of total CPU time). This makes Numpy
especially useful while working with really large datasets with tens of thousands or millions of data points.
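The %%time magic only works inside a notebook. As a rough sketch, the same comparison can be run in a plain Python script with time.perf_counter. The timing variable names are assumptions, and dtype=np.int64 is requested explicitly on the assumption that some platforms default to a smaller integer type that would overflow. Actual timings depend on your machine.

```python
import time
import numpy as np

n = 1_000_000
arr1 = list(range(n))
arr2 = list(range(n, 2 * n))

# Request 64-bit integers so the large result does not overflow
arr1_np = np.array(arr1, dtype=np.int64)
arr2_np = np.array(arr2, dtype=np.int64)

# Dot product with a Python loop
start = time.perf_counter()
loop_result = 0
for x1, x2 in zip(arr1, arr2):
    loop_result += x1 * x2
loop_seconds = time.perf_counter() - start

# Dot product with Numpy
start = time.perf_counter()
numpy_result = np.dot(arr1_np, arr2_np)
numpy_seconds = time.perf_counter() - start

print(f"loop:  {loop_seconds * 1000:.2f} ms")
print(f"numpy: {numpy_seconds * 1000:.2f} ms")
print(loop_result == int(numpy_result))
```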
import jovian
jovian.commit()
climate_data
If you've taken a linear algebra class in high school, you may recognize the above 2-d array as a matrix with five rows and three columns. Each
row represents one region, and the columns represent temperature, rainfall, and humidity, respectively.
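A 2-d array like this can be built from a list of lists, as in the sketch below. Only the first row (Kanto) comes from the text; the remaining rows are assumed values for illustration.

```python
import numpy as np

# Each row is one region: [temperature, rainfall, humidity]
climate_data = np.array([[73, 67, 43],     # Kanto, from the text
                         [91, 88, 64],     # assumed values
                         [87, 134, 58],    # assumed values
                         [102, 43, 37],    # assumed values
                         [69, 96, 70]])    # assumed values

print(climate_data)
print(climate_data.shape)   # (rows, columns)
```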
Numpy arrays can have any number of dimensions and different lengths along each dimension. We can inspect the length along each
dimension using the .shape property of an array.
# 2D array (matrix)
climate_data.shape
(5, 3)
weights
# 1D array (vector)
weights.shape
(3,)
# 3D array
arr3 = np.array([
    [[11, 12, 13],
     [13, 14, 15]],
    [[15, 16, 17],
     [17, 18, 19.5]]])
arr3.shape
(2, 2, 3)
All the elements in a numpy array have the same data type. You can check the data type of an array using the .dtype property.
weights.dtype
dtype('float64')
climate_data.dtype
dtype('int64')
If an array contains even a single floating point number, all the other elements are also converted to floats.
arr3.dtype
dtype('float64')
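The conversion to floats can be seen on a smaller example. The arrays below are assumptions used only to illustrate the behavior.

```python
import numpy as np

ints = np.array([1, 2, 3])       # all integers
mixed = np.array([1, 2, 3.5])    # a single float among integers

print(ints.dtype)    # an integer type; the exact width depends on the platform
print(mixed.dtype)   # float64
print(mixed)         # the integers 1 and 2 are stored as floats
```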
We can now compute the predicted yields of apples in all the regions, using a single matrix multiplication between climate_data (a 5x3
matrix) and weights (a vector of length 3). Here's what it looks like visually:
You can learn about matrices and matrix multiplication by watching the first 3-4 videos of this playlist:
https://www.youtube.com/watch?v=xyAuNHPsq-g&list=PLFD0EB975BA0CC1E0&index=1 .
We can use the np.matmul function or the @ operator to perform matrix multiplication.
np.matmul(climate_data, weights)
climate_data @ weights
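As a sketch of what this multiplication computes, the example below checks that np.matmul and @ give the same result as taking the dot product of each row with the weights. It uses a three-row matrix in which only the first row comes from the text; the other rows are assumed values.

```python
import numpy as np

climate_data = np.array([[73, 67, 43],    # Kanto, from the text
                         [91, 88, 64],    # assumed values
                         [69, 96, 70]])   # assumed values
weights = np.array([0.3, 0.2, 0.5])

yields_matmul = np.matmul(climate_data, weights)
yields_at = climate_data @ weights

# A (3, 3) matrix times a length-3 vector is one dot product per row
yields_rows = np.array([np.dot(row, weights) for row in climate_data])

print(yields_matmul)
print(np.allclose(yields_matmul, yields_at))
print(np.allclose(yields_matmul, yields_rows))
```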
temperature,rainfall,humidity
25.00,76.00,99.00
39.00,65.00,70.00
59.00,45.00,77.00
84.00,63.00,38.00
66.00,50.00,52.00
41.00,94.00,77.00
91.00,57.00,96.00
49.00,96.00,99.00
67.00,20.00,28.00
...
CSVs: A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is
a data record. Each record consists of one or more fields, separated by commas. A CSV file typically stores tabular data
(numbers and text) in plain text, in which case each line will have the same number of fields. (Wikipedia)
To read this file into a numpy array, we can use the genfromtxt function.
import urllib.request
urllib.request.urlretrieve(
    'https://gist.github.com/BirajCoder/a4ffcb76fd6fb221d76ac2ee2b8584e9/raw/4054f90adfd361b7aa4255e99c2e874664094cea/clim
    'climate.txt')
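Once the file is on disk, reading it with genfromtxt might look like the sketch below. To keep the snippet self-contained, it writes a small stand-in file containing the header and the first three rows shown above; the file name climate_sample.txt and the variable climate_sample are assumptions.

```python
import os
import tempfile
import numpy as np

# Stand-in for the downloaded file: header plus the first rows shown above
sample = (
    "temperature,rainfall,humidity\n"
    "25.00,76.00,99.00\n"
    "39.00,65.00,70.00\n"
    "59.00,45.00,77.00\n"
)
path = os.path.join(tempfile.mkdtemp(), "climate_sample.txt")
with open(path, "w") as f:
    f.write(sample)

# delimiter=',' splits fields on commas; skip_header=1 skips the column names
climate_sample = np.genfromtxt(path, delimiter=",", skip_header=1)

print(climate_sample)
print(climate_sample.shape)
```

For the real dataset, the same call would be pointed at 'climate.txt' instead of the stand-in path.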
climate_data
climate_data.shape
(10000, 3)
We can now perform a matrix multiplication using the @ operator to predict the yield of apples for the entire dataset using a given set of
weights.
yields
yields.shape
(10000,)
Let's add the yields to climate_data as a fourth column using the np.concatenate function.
climate_results
Since we wish to add new columns, we pass the argument axis=1 to np.concatenate. The axis argument specifies the dimension for
concatenation.
The arrays should have the same number of dimensions, and the same length along each except the dimension used for concatenation.
We use the np.reshape function to change the shape of yields from (10000,) to (10000, 1).
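The steps above can be sketched on a four-row sample taken from the rows of the CSV file shown earlier, so the shapes are (4,) and (4, 1) rather than (10000,) and (10000, 1). The name yields_column is an assumption introduced for illustration.

```python
import numpy as np

# First rows of the climate data: [temperature, rainfall, humidity]
climate_data = np.array([[25.0, 76.0, 99.0],
                         [39.0, 65.0, 70.0],
                         [59.0, 45.0, 77.0],
                         [84.0, 63.0, 38.0]])
weights = np.array([0.3, 0.2, 0.5])

# Predicted yields: one number per row, shape (4,)
yields = climate_data @ weights

# Turn the 1-d vector into a column, shape (4, 1); -1 infers the row count
yields_column = np.reshape(yields, (-1, 1))

# Join along axis=1 to add the yields as a fourth column
climate_results = np.concatenate((climate_data, yields_column), axis=1)

print(climate_results.shape)
print(np.round(climate_results, 2))
```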
Here's a visual explanation of np.concatenate along axis=1 (can you guess what axis=0 results in?):
The best way to understand what a Numpy function does is to experiment with it and read the documentation to learn about its arguments &
return values. Use the cells below to experiment with np.concatenate and np.reshape.
Let's write the final results from our computation above back to a file using the np.savetxt function.
climate_results
np.savetxt('climate_results.txt',
           climate_results,
           fmt='%.2f',
           delimiter=',',
           header='temperature,rainfall,humidity,yield_apples',
           comments='')
The results are written back in the CSV format to the file climate_results.txt.
temperature,rainfall,humidity,yield_apples
25.00,76.00,99.00,72.20
39.00,65.00,70.00,59.70
59.00,45.00,77.00,65.20
84.00,63.00,38.00,56.80
...
Numpy provides hundreds of functions for performing operations on arrays. Here are some commonly used functions:
How to find the function you need? The easiest way to find the right function for a specific operation or use-case is to do a web
search. For instance, searching for "How to join numpy arrays" leads to this tutorial on array concatenation.
import jovian
jovian.commit()
The first time you run jovian.commit, you'll be asked to provide an API Key to securely upload the notebook to your Jovian account. You can
get the API key from your Jovian profile page after logging in / signing up.
jovian.commit uploads the notebook to your Jovian account, captures the Python environment, and creates a shareable link for your
notebook, as shown above. You can use this link to share your work and let anyone (including you) run your notebooks and reproduce your
work.
# Adding a scalar
arr2 + 3
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 4, 5, 6]])
# Element-wise subtraction
arr3 - arr2
# Division by scalar
arr2 / 2
array([[0.5, 1. , 1.5, 2. ],
[2.5, 3. , 3.5, 4. ],
[4.5, 0.5, 1. , 1.5]])
# Element-wise multiplication
arr2 * arr3
# Modulus with scalar
arr2 % 4
array([[1, 2, 3, 0],
       [1, 2, 3, 0],
       [1, 1, 2, 3]])
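The arithmetic operations in this section can be reproduced with the sketch below. The values of arr2 are reconstructed from the outputs of arr2 + 3 and arr2 / 2 shown above; arr3 is an assumed array of the same shape, used only for illustration.

```python
import numpy as np

# arr2 is reconstructed from the printed outputs; arr3 holds assumed values
arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 1, 2, 3]])
arr3 = np.array([[11, 12, 13, 14],
                 [15, 16, 17, 18],
                 [19, 11, 12, 13]])

print(arr2 + 3)      # adds 3 to every element
print(arr3 - arr2)   # subtracts matching elements
print(arr2 / 2)      # true division produces floats
print(arr2 * arr3)   # multiplies matching elements
print(arr2 % 4)      # remainder of each element divided by 4
```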
Array Broadcasting
Numpy arrays also support broadcasting, allowing arithmetic operations between two arrays with different numbers of dimensions but
compatible shapes. Let's look at an example to see how it works.
arr2.shape
(3, 4)
arr4.shape
(4,)
arr2 + arr4
array([[ 5, 7, 9, 11],
[ 9, 11, 13, 15],
[13, 6, 8, 10]])
When the expression arr2 + arr4 is evaluated, arr4 (which has the shape (4,)) is replicated three times to match the shape (3, 4) of
arr2. Numpy performs the replication without actually creating three copies of the smaller array, which improves performance and
uses less memory.
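One way to see this replication without copying is np.broadcast_to, which returns a read-only view of the smaller array with the larger shape. In the sketch below, arr2 and arr4 are reconstructed from the outputs shown above, and arr4_view is a name introduced for illustration.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 1, 2, 3]])
arr4 = np.array([4, 5, 6, 7])

# A (3, 4) view of arr4; the three rows share the same underlying memory
arr4_view = np.broadcast_to(arr4, arr2.shape)

print(arr4_view)
print(arr4_view.strides)   # the first stride is 0, so no rows are copied
print(np.array_equal(arr2 + arr4, arr2 + arr4_view))
```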
Broadcasting only works if one of the arrays can be replicated to match the other array's shape.
arr5.shape
(2,)
arr2 + arr5
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-72-c22e92053c39> in <module>
----> 1 arr2 + arr5
ValueError: operands could not be broadcast together with shapes (3,4) (2,)
In the above example, even if arr5 is replicated three times, it will not match the shape of arr2. Hence arr2 + arr5 cannot be evaluated
successfully. Learn more about broadcasting here: https://numpy.org/doc/stable/user/basics.broadcasting.html .
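The sketch below contrasts an incompatible shape with a compatible one. Numpy compares shapes starting from the last dimension, and each pair of lengths must either be equal or include a 1. The values of arr5 and column are assumptions; only their shapes matter here.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 1, 2, 3]])
arr5 = np.array([7, 8])                  # shape (2,), assumed values

# (3, 4) and (2,) disagree in the last dimension: 4 vs. 2
try:
    arr2 + arr5
except ValueError as err:
    message = str(err)
    print("Broadcast failed:", message)

# (3, 4) and (3, 1) are compatible: 4 vs. 1, then 3 vs. 3
column = np.array([[10], [20], [30]])    # shape (3, 1), assumed values
print((arr2 + column).shape)
```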
Array Comparison
Numpy arrays also support comparison operations like ==, !=, >, etc. The result is an array of booleans.
arr1 == arr2
arr1 != arr2
Array comparison is frequently used to count the number of equal elements in two arrays using the sum method. Remember that True
evaluates to 1 and False evaluates to 0 when booleans are used in arithmetic operations.
(arr1 == arr2).sum()
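A short sketch of counting equal elements is shown below. The values of arr1 and arr2 are assumptions chosen for illustration.

```python
import numpy as np

# Example arrays with the same shape (assumed values)
arr1 = np.array([[1, 2, 3], [3, 4, 5]])
arr2 = np.array([[2, 2, 3], [1, 2, 5]])

# Element-wise comparison gives an array of booleans
equal_mask = arr1 == arr2
print(equal_mask)

# True counts as 1 and False as 0, so the sum is the number of matches
print(equal_mask.sum())
```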
arr3 = np.array([
[[11, 12, 13, 14],
[13, 14, 15, 19]],
arr3.shape
(3, 2, 4)
# Single element
arr3[1, 1, 2]
36.0
array([[[15., 16.]],
[[98., 32.]]])
array([18., 43.])
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-87-fbde713646b3> in <module>
1 # Using too many indices
----> 2 arr3[1,3,2,1]
IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed
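The indexing and slicing results in this section can be reproduced with the sketch below. Only the first block of arr3 and the values visible in the outputs above come from the text; the remaining entries are assumptions, as are the slice expressions, which were chosen because they produce the outputs shown.

```python
import numpy as np

# First block and the values seen in the outputs are from the text;
# the other entries are assumed
arr3 = np.array([
    [[11, 12, 13, 14],
     [13, 14, 15, 19]],
    [[15, 16, 17, 21],
     [63, 92, 36, 18]],
    [[98, 32, 81, 23],
     [17, 18, 19.5, 43]]])

print(arr3[1, 1, 2])        # single element
print(arr3[1:, 0:1, :2])    # subarray using ranges
print(arr3[1:, 1, 3])       # mixing ranges and indices
print(arr3[1].shape)        # fewer indices return a subarray

# Using more indices than dimensions raises an IndexError
try:
    arr3[1, 3, 2, 1]
except IndexError as err:
    print("IndexError:", err)
```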
Numpy also provides some handy functions to create arrays of desired shapes with fixed or random values. Check out the official
documentation or use the help function to learn more.
# All zeros
np.zeros((3, 2))
array([[0., 0.],
[0., 0.],
[0., 0.]])
# All ones
np.ones([2, 2, 3])
# Identity matrix
np.eye(3)
# Random vector
np.random.rand(5)
# Random matrix
np.random.randn(2, 3) # rand vs. randn - what's the difference?
# Fixed value
np.full([2, 3], 42)
array([10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55, 58,
61, 64, 67, 70, 73, 76, 79, 82, 85, 88])
array([ 3., 6., 9., 12., 15., 18., 21., 24., 27.])
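The last two outputs above look like the results of np.arange and np.linspace. The sketch below uses arguments that are assumptions, chosen because they reproduce those outputs, and also hints at the rand vs. randn question raised earlier.

```python
import numpy as np

# Range with start, end (exclusive), and step
range_arr = np.arange(10, 90, 3)

# Nine equally spaced numbers from 3 to 27, both ends included
linspace_arr = np.linspace(3, 27, 9)

print(range_arr)
print(linspace_arr)

# rand samples uniformly from [0, 1);
# randn samples from the standard normal distribution (mean 0, variance 1)
uniform_sample = np.random.rand(1000)
normal_sample = np.random.randn(1000)

print(uniform_sample.min(), uniform_sample.max())
print(normal_sample.mean(), normal_sample.std())
```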
import jovian
jovian.commit()
Exercises
Try the following exercises to become familiar with Numpy arrays and practice your skills:
Check out the following resources for learning more about Numpy:
You are ready to move on to the next tutorial: Analyzing Tabular Data using Pandas.
Try answering the following questions to test your understanding of the topics covered in this notebook:
1. What is a vector?
2. How do you represent vectors using a Python list? Give an example.
3. What is a dot product of two vectors?
4. Write a function to compute the dot product of two vectors.
5. What is Numpy?
6. How do you install Numpy?
7. How do you import the numpy module?
8. What does it mean to import a module with an alias? Give an example.
9. What is the commonly used alias for numpy?
10. What is a Numpy array?
11. How do you create a Numpy array? Give an example.
12. What is the type of Numpy arrays?
13. How do you access the elements of a Numpy array?
14. How do you compute the dot product of two vectors using Numpy?
15. What happens if you try to compute the dot product of two vectors which have different sizes?
16. How do you compute the element-wise product of two Numpy arrays?
17. How do you compute the sum of all the elements in a Numpy array?
18. What are the benefits of using Numpy arrays over Python lists for operating on numerical data?
19. Why do Numpy array operations have better performance compared to Python functions and loops?
20. Illustrate the performance difference between Numpy array operations and Python loops using an example.
21. What are multi-dimensional Numpy arrays?
22. Illustrate the creation of Numpy arrays with 2, 3, and 4 dimensions.
23. How do you inspect the number of dimensions and the length along each dimension in a Numpy array?
24. Can the elements of a Numpy array have different data types?
25. How do you check the data type of the elements of a Numpy array?
26. What is the data type of a Numpy array?
54. How do you create a Numpy array with a given shape containing all ones?
55. How do you create an identity matrix of a given shape?
56. How do you create a random vector of a given length?
57. How do you create a Numpy array with a given shape with a fixed value for each element?
58. How do you create a Numpy array with a given shape containing randomly initialized elements?
59. What is the difference between np.random.rand and np.random.randn? Illustrate with examples.
60. What is the difference between np.arange and np.linspace? Illustrate with examples.