Round - 0 - Jupyter Notebook
Content
What is Python?
Jupyter Notebooks
Printing output
Iterations
Python Libraries
NumPy Arrays
What is Python?
Python is a programming language used to "give instructions" to a computer to produce the desired actions or
output. Like many other programming languages, such as Ruby, PHP, C++, and Java, Python is a high-level
programming language (https://computersciencewiki.org/index.php/Higher_level_and_lower_level_languages),
which makes it relatively easy to learn and use.
There are plenty of resources for learning basic Python, and we recommend using them if you are new
to Python or programming in general. Here (https://wiki.python.org/moin/BeginnersGuide/NonProgrammers),
you can find a comprehensive list of books and courses for beginners. For example, the books "Automate the
Boring Stuff with Python" (https://automatetheboringstuff.com) by Al Sweigart and "Think Python: How to
Think Like a Computer Scientist" (http://greenteapress.com/thinkpython/html/index.html) by Allen B. Downey
are freely available online and are excellent places to start. You do not need to read the whole book, as the
chapters about data types, indexing, loops, and functions will provide sufficient knowledge for this course. If
you prefer more interactive learning, you can find short tutorials and Python exercises that you can run in-
browser here (https://www.w3schools.com/python/python_intro.asp), amongst other places.
https://jupyter.cs.aalto.fi/user/rojasa3/notebooks/notebooks/mlpython2021b/R0_Intro/Round_0.ipynb 1/35
01/06/2021 Round_0 - Jupyter Notebook
Jupyter Notebooks
Jupyter Notebook is an interactive environment for running Python code in the browser. You can run notebooks
locally on your computer (given pre-installed Python and Jupyter Notebook), but we will be using Jupyter Hub
in this course. If you are reading this, you have probably successfully logged in to Jupyter Hub and fetched a
notebook. After completing the notebook exercises, you will need to submit the latest notebook version.
A Jupyter notebook consists of blocks/cells containing text (markdown) or code (Python in our case). Below
you can see an example of both types of cells:
In [1]:
print("Hello world!")
Hello world!
You can find a more elaborate introduction to Jupyter notebooks here (https://realpython.com/jupyter-notebook-introduction/).
Printing output
In [1]:
# Define a variable and display output
myvar = 42
print("The answer is =", myvar)
print(f"The answer is = {myvar+2}")
print("The answer is = {}".format(myvar*0.5))
The answer is = 42
The answer is = 44
The answer is = 21.0
In [2]:
# Numeric: integers
myint = 42
print(myint)
# Boolean
mybool = 40+2 == 42
print("The statement 40+2 equals 42 is", mybool)
# Strings
mystr = "forty two"
print(mystr)
# Lists
mylist = [1, 2, "cat", 0.5, False]
print(mylist)
42
42.5
forty two
<class 'int'> <class 'float'> <class 'bool'> <class 'str'> <class 'list'>
You can create a sequence of integers using the built-in functions range(start, stop[, step]) and
list() . See https://docs.python.org/3/library/stdtypes.html#range
(https://docs.python.org/3/library/stdtypes.html#range) for more information. The function range() creates the
sequence [start,start+step,start+2*step,...], which ends before the value stop is reached. If the argument step
is omitted, it defaults to 1. If the start argument is omitted, it defaults to 0.
Caution!
In range(stop), the sequence starts from 0 and does not include the stop value.
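The behaviour of range() described above can be checked with a small sketch. The variable names are only illustrative and are not part of the course material:

```python
# range(stop): starts at 0, step 1, and the stop value itself is excluded
first_five = list(range(5))

# range(start, stop): 11 is excluded, so 10 is the last element
one_to_ten = list(range(1, 11))

# range(start, stop, step): every second integer
evens = list(range(0, 11, 2))

# a negative step counts downwards
countdown = list(range(5, 0, -1))

print(first_five)
print(one_to_ten)
print(evens)
print(countdown)
```

Wrapping range() in list() is only needed when you want to see or store all values at once; a for-loop can iterate over range() directly.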
Below is a blue markdown cell with an example of a "Demo" coding exercise, which explains the task, followed by a
code cell with an implementation of this task. "Demo" exercises also help with the "Student task" exercises, which
are in yellow.
For student tasks, you need to fill in the part after the ### STUDENT TASK ### expression. Often the variable
names are already provided, for example:
# Create lists
# list1 = ...
# list2 = ...
In this case, # Create lists is a comment that clarifies the task, and # list1 = ... and # list2 =
... are the lines you need to, first, uncomment (remove # ) and, second, complete. In addition, you need to
remove the raise NotImplementedError() line.
You will also see "Sanity check" cells after the student tasks. These cells are used to catch really obvious
mistakes, such as returning a string instead of a float, or a list with the wrong number of elements (length). If
your answer passes these tests, "Sanity checks passed!" will be printed out.
Caution!
Passing sanity checks does NOT mean that the task is solved correctly. You will
know if the student tasks were solved correctly only after the deadline.
In [3]:
1. list list1 which stores a sequence of integers from 1 to 10 (including 10) with step size=1.
2. list list2 which stores a sequence of integers from 0 to 10 (including 10) with step size=2.
In [23]:
In [ ]:
# %load solutions/student-task-1.py
list1 = list(range(1,11))
list2 = list(range(0,11,2))
Iterations
In [39]:
hi
how
are
you
One of the main uses of range() is to create loops that iterate over a sequence of values.
Caution!
Indexing in Python starts by default at 0 (and not at 1!)
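As a minimal sketch of the points above (the list words and the other names are illustrative choices), a for-loop can iterate over the values directly, over the indices produced by range(), or over index/value pairs from the built-in enumerate():

```python
words = ["hi", "how", "are", "you"]

# iterate directly over the values
for word in words:
    print(word)

# iterate over the indices 0, 1, ..., len(words)-1 created by range()
for i in range(len(words)):
    print("index:", i, "value:", words[i])

# enumerate() yields (index, value) pairs, so no manual indexing is needed
pairs = list(enumerate(words))
print(pairs)
```

Note that the first element is words[0], not words[1], because indexing starts at 0.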
In [37]:
index: 0 value: hi
In [26]:
# Nested for-loops
# create a list
mylist = [[1,2,3],[4,5,6],[7,8,9]]
# outer loop
for i in range(len(mylist)):
    print("\nouter loop, iteration: {} values: {}\n ".format(i, mylist[i]))
    # inner loop
    for j in range(len(mylist[0])):
        print("inner loop, iteration: {} value: {} ".format(j, mylist[i][j]))
In [27]:
# create a list
some_sequence = ["hi","how","are","you"]
index: 0 value: hi
If you need to iterate over two sequences of the same size, you can use the built-in function zip().
In [28]:
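# create lists
some_sequence = ["one","two","three","four"]
another_sequence = ["eins","zwei","drei","vier"]
# iterate over both lists in parallel
for a, b in zip(some_sequence, another_sequence):
    print(a, b)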
one eins
two zwei
three drei
four vier
In [29]:
# create lists
some_sequence = ["one","two","three","four"]
another_sequence = ["eins","zwei","drei","vier"]
index: 0
index: 1
index: 2
index: 3
In [31]:
# initialize variables
odd_count = 0
even_count = 0
In [ ]:
# %load solutions/student-task-2.py
for num in numbers:
    if not num%2:
        even_count+=1
    else:
        odd_count+=1
User-Defined Functions
As in other programming languages, users can define their own functions in Python. The basic syntax of a
Python function uses the def and return keywords.
The code snippet below shows how to define a function multiply() which takes two arguments, x and
y . This function computes the product of the arguments and returns it.
In [2]:
# define a function
def multiply(x,y):
    '''
    this function takes input x and y
    and returns multiplication of x and y
    '''
    # perform computation
    out = x*y
    return out
<class 'int'>
In [3]:
out = power_of_two(5)
print(out)
In [ ]:
# %load solutions/student-task-3.py
def power_of_two(n):
    out = [ 2**i for i in range(1,n+1) ]
    return out
Python Libraries
Python programs can import functions from libraries or so-called packages. Some of the most commonly used
Python libraries are:
NumPy - (Numerical Python) for operations involving arrays of numbers. One-dimensional NumPy arrays are
used to represent Euclidean vectors. Two-dimensional NumPy arrays can represent matrices and higher-
dimensional arrays represent tensors.
https://numpy.org/ (https://numpy.org/)
Pandas - A library providing the DataFrame class for storing and manipulating tabular data.
https://pandas.pydata.org/docs/ (https://pandas.pydata.org/docs/)
Matplotlib - A library for data visualization containing many useful tools, e.g., for plotting time series or images.
https://matplotlib.org/3.1.1/contents.html (https://matplotlib.org/3.1.1/contents.html)
Scikit-learn - A library containing implementations of several traditional machine learning methods, such as
linear regression, decision trees, and clustering methods.
https://scikit-learn.org/stable/ (https://scikit-learn.org/stable/)
In order to use functions or classes provided by a library, the library must first be imported. For example, the command
import numpy as np
imports the main numpy module under the name np .
The error NameError: name 'np' is not defined arises if a function of the library np is used without the library
having been imported beforehand.
The library Pandas provides the class (object type) DataFrame . A DataFrame is a two-dimensional (with
rows and columns) tabular structure. DataFrames are convenient for storing and manipulating heterogeneous
data, such as mixtures of numeric and text data.
In [6]:
# create dictionary
mydict = {'animal':['cat', 'dog','mouse','rat', 'cat'],
'name':['Fluffy','Chewy','Squeaky','Spotty', 'Diablo'],
'age, years': [3,5,0.5,1,8]}
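A dictionary like the one above can be turned into a DataFrame. The following is a minimal sketch that repeats the dictionary so that it runs on its own; the names first_row and cats are illustrative and not taken from the notebook:

```python
import pandas as pd

# each key becomes a column name, each list becomes the column's values
mydict = {'animal': ['cat', 'dog', 'mouse', 'rat', 'cat'],
          'name': ['Fluffy', 'Chewy', 'Squeaky', 'Spotty', 'Diablo'],
          'age, years': [3, 5, 0.5, 1, 8]}

df = pd.DataFrame(mydict)
print(df)
print(df.shape)                    # (number of rows, number of columns)

# select the first row by its integer position
first_row = df.iloc[0]
print(first_row)

# select all rows for which a condition on a column holds
cats = df[df["animal"] == "cat"]
print(cats)
```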
In [7]:
animal cat
name Fluffy
age, years 3
animal cat
name Fluffy
age, years 3
In [8]:
id1 cat
id2 dog
id3 mouse
id4 rat
id5 cat
id1 cat
id2 dog
id3 mouse
id4 rat
id5 cat
id1 cat
id2 dog
id3 mouse
id4 rat
id5 cat
In [11]:
animal cat
name Fluffy
animal cat
name Fluffy
In [12]:
# select cats and dogs by using "|" operator (equivalent to `OR` operator)
print(df[(df["animal"]=="cat") | (df["animal"]=="dog")])
In [18]:
Out[18]:
0 1
0 0.471435 -1.190976
1 1.432707 -0.312652
2 -0.720589 0.887163
3 0.859588 -0.636524
4 0.015696 -2.242685
In [16]:
Out[16]:
[ 1.43270697, -0.3126519 ],
[-0.72058873, 0.88716294],
...,
[ 3.16009399, 3.83897138],
[ 3.28939313, 3.68964166],
[ 3.39549918, 4.36393359]])
With pd.read_<format name> it is also possible to read Excel, JSON, HTML, SQL, and many other types of
files:
https://pandas.pydata.org/pandas-docs/stable/reference/io.html (https://pandas.pydata.org/pandas-docs/stable/reference/io.html)
Matplotlib is a library that provides plotting functionality for Python. Good introductory tutorials for Matplotlib
can be found at https://matplotlib.org/tutorials/index.html (https://matplotlib.org/tutorials/index.html).
A useful command for creating a plot in Python is plt.subplots() , which returns a figure and axes (an Axes
object or an array of Axes objects).
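As a rough sketch of how plt.subplots() might be used (the plotted functions, titles, and figure size are illustrative choices, not taken from the notebook):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)

# one row and two columns of axes; axes is then a 1D array with two Axes objects
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot(x, np.sin(x))        # line plot on the left
axes[0].set_title("sin(x)")
axes[0].set_xlabel("x")

axes[1].scatter(x, np.cos(x))     # scatter plot on the right
axes[1].set_title("cos(x)")
axes[1].set_xlabel("x")

fig.tight_layout()                # avoid overlapping labels
```

In a notebook, the figure is usually displayed automatically at the end of the cell; in a plain script, you would typically call plt.show() as well.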
In [19]:
In [20]:
Numpy Arrays
The Python library numpy provides implementations of many matrix operations as well as other useful
features, such as random number generators. Many functions of this library are based on the data type "numpy
array". A numpy array is an object that stores $N$-dimensional arrays of numbers, where $N$ is the number of
dimensions. The shape of a numpy array is given by a sequence of $N$ integers that indicate the number of
"elements" in each dimension. Perhaps the most important special cases of numpy arrays are $N=1$,
corresponding to vectors, and $N=2$, corresponding to matrices. A matrix with $5$ rows and $2$ columns is
represented by a numpy array of shape $(5,2)$.
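The relation between the number of dimensions and the shape can be inspected with the attributes ndim and shape. A small sketch with illustrative array names:

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])   # N = 1: a vector with 3 elements
matrix = np.zeros((5, 2))            # N = 2: a matrix with 5 rows and 2 columns
tensor = np.ones((2, 3, 4))          # N = 3: a higher-dimensional array

for name, arr in [("vector", vector), ("matrix", matrix), ("tensor", tensor)]:
    # ndim is the number of dimensions N, shape is a tuple of N integers
    print(name, "ndim:", arr.ndim, "shape:", arr.shape)
```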
Some additional resources to learn more about numpy arrays and related operations can be found here:
Using NumPy arrays allows for vectorized computation, which in turn allows faster code execution:
https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html
(https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html)
https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html
(https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html)
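To illustrate what "vectorized" means here, the sketch below computes the same squares once with an explicit Python loop and once with a single array expression. The names are illustrative, and the sketch makes no timing claims; it only shows that both approaches give the same result:

```python
import numpy as np

values = np.arange(1000)

# explicit Python loop: one element at a time
squares_loop = np.array([v ** 2 for v in values])

# vectorized: the operation is applied to the whole array at once
squares_vectorized = values ** 2

print(np.array_equal(squares_loop, squares_vectorized))
```

The vectorized version is shorter and is typically much faster for large arrays, because the loop runs inside NumPy's compiled code instead of the Python interpreter.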
In [22]:
Out[22]:
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
In [23]:
print(zeroarray,'\n')
print(onesarray)
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
In [1]:
Number of rows: 2
Number of columns: 3
[[1 2 3]
[4 5 6]]
(2, 3)
Caution!
A numpy array of shape (n,1) is different from a numpy array of shape (n,)!
In [7]:
# Note! Array of shape (n,1) is not equal to the array of shape (n,)
# Use .shape attribute to check the array's dimensions
# Use .reshape() function to get the array with desired dimensions
myarray1 = np.array(range(10))
myarray2 = np.array(range(10)).reshape(-1,1)
[0 1 2 3 4 5 6 7 8 9] [[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]]
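One practical reason to keep track of the difference between shapes (n,) and (n,1) is that mixing them in arithmetic may give an unexpected result. A minimal sketch (the names flat and column are illustrative; the mechanism is the broadcasting discussed later in this notebook):

```python
import numpy as np

flat = np.arange(3)               # shape (3,)
column = flat.reshape(-1, 1)      # shape (3, 1)

same_shape = flat + flat          # elementwise sum, shape (3,)
mixed = flat + column             # broadcast to a full table, shape (3, 3)

print(flat.shape, column.shape)
print(same_shape)
print(mixed)
```

Here mixed[i, j] equals column[i, 0] + flat[j], so the result is a 3 x 3 array rather than a vector with 3 entries.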
In [12]:
# 1D array
myarray = np.arange(10,0,-1)
print(myarray)
print("First element of the array: {}\n".format(myarray[0]))
# 2D array
myarray = np.array([[1,2,3],[4,5,6]])
print(myarray)
print("2nd row, 3rd column element of the array: {}\n".format(myarray[1,2]))
[10 9 8 7 6 5 4 3 2 1]
[[1 2 3]
[4 5 6]]
[[1 2 3]
[4 5 6]]
Values >2: [3 4 5 6]
In [15]:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
Sliced array:
[[1 2]
[6 7]]
In [16]:
[[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[2. 2. 2. 2. 2.]
[2. 2. 2. 2. 2.]]
[[0. 0. 0. 0. 0. 2. 2. 2. 2. 2.]
[0. 0. 0. 0. 0. 2. 2. 2. 2. 2.]]
Caution!
Modifying an array slice will also modify the original array.
In [17]:
# Slice view, creates view of the array and any modification of it will update that
# modify variable 'myslice' - assign value zero to all entries of the array
myslice[:] = 0
Original array: [0 1 2 3 4 5 6 7 8 9]
Original array: [0 1 2 3 4 0 0 0 0 0]
In [18]:
Original array: [0 1 2 3 4 5 6 7 8 9]
Original array: [0 1 2 3 4 5 6 7 8 9]
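The difference between a view and a copy can be sketched as follows. This is an illustrative example with made-up variable names, not the notebook's own cells:

```python
import numpy as np

# a slice is a view: it shares memory with the original array
original = np.arange(10)
view = original[5:]
view[:] = 0                        # also changes the last five entries of original
print("After modifying the view:", original)

# .copy() creates an independent array
untouched = np.arange(10)
copied = untouched[5:].copy()
copied[:] = 0                      # untouched keeps its values
print("After modifying the copy:", untouched)

print(np.shares_memory(original, view), np.shares_memory(untouched, copied))
```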
You can find further reading about view and copy of NumPy Arrays here:
https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html (https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html)
In [19]:
x = np.arange(10)
y = np.arange(20,30)
print(x, y)
print(x + y)
print(x - y)
print(x * y)
print(x / y)
# elementwise power
print(x**2)
[0 1 2 3 4 5 6 7 8 9] [20 21 22 23 24 25 26 27 28 29]
[20 22 24 26 28 30 32 34 36 38]
[-20 -20 -20 -20 -20 -20 -20 -20 -20 -20]
[ 0 1 4 9 16 25 36 49 64 81]
In [20]:
print(x)
print("\nSum of the array: ", x_sum)
print("\nMaximum and minimun values: {}, {} \nIndices of maximum and minimum values:
x_max, x_min, x_indmax, x_indmin))
[10 9 8 7 6 5 4 3 2 1]
Broadcasting
Sometimes we need to add the same constant value to all entries of a numpy array. Consider a numpy array
a of arbitrary size and a numpy array b containing a single number. We would like to be able to write a+b
to get a numpy array whose entries are given by adding the value in b to all entries in a . The concept of
"broadcasting" for numpy arrays makes this possible!
https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
(https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
https://numpy.org/devdocs/user/theory.broadcasting.html
(https://numpy.org/devdocs/user/theory.broadcasting.html)
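A minimal sketch of broadcasting, using the names a and b from the paragraph above (the array row and the concrete values are illustrative assumptions):

```python
import numpy as np

a = np.arange(1, 13).reshape(4, 3)   # array with 4 rows and 3 columns
b = np.array([10])                   # array containing a single number

# b is "stretched" to the shape of a, so 10 is added to every entry
a_plus_b = a + b
print(a_plus_b)

# broadcasting also works along one dimension: the single row is added to each row of a
row = np.ones((1, 3))
a_plus_row = a + row
print(a_plus_row)
```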
In [21]:
x =
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
y = [[1. 1. 1.]]
x+y =
[[ 2. 3. 4.]
[ 5. 6. 7.]
[ 8. 9. 10.]
In [22]:
import numpy as np
### STUDENT TASK ###
x1 = np.arange(12).reshape(3,4)
x2 = np.copy(x1[:,:2])
x3 = x2*5
x2 = np.hstack([x2, np.zeros((x2.shape[0],2))])
x3 = x1+x2
# remove the line raise NotImplementedError() before testing your solution and submi
print (x1)
print (x2)
print (x3)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[0. 1. 0. 0.]
[4. 5. 0. 0.]
[8. 9. 0. 0.]]
[[ 0. 2. 2. 3.]
[ 8. 10. 6. 7.]
In [ ]:
# %load solutions/student-task-4.py
x1 = np.arange(12).reshape(3,4)
x2 = np.copy(x1[:,:2])
x2 = x2*5
x2 = np.hstack([x2, np.zeros((x2.shape[0],2))])
x3 = x1+x2
It is often useful to represent data in a numerical format as vectors or matrices. For example, suppose we have
collected weather data (daily minimum, maximum, and average temperatures) for many days. In that case, we
can represent the observations for one day as a vector (or as a NumPy array in Python code) and stack all
observations in a matrix. Each row of this matrix would contain the weather observations for one day, and each
column would contain the minimum, maximum, or average temperatures across all days.
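The weather example could be sketched in NumPy as follows. The temperature values are made up purely for illustration and the variable names are assumptions:

```python
import numpy as np

# made-up observations (minimum, maximum, average temperature), one vector per day
day1 = np.array([-3.0, 4.0, 0.5])
day2 = np.array([-1.0, 6.0, 2.0])
day3 = np.array([0.0, 7.5, 3.5])

# stack the day vectors as rows of a matrix: shape (number of days, 3)
weather = np.vstack([day1, day2, day3])
print(weather)

second_day = weather[1]        # a row: all observations for one day
max_temps = weather[:, 1]      # a column: the maximum temperature on every day
print(second_day, max_temps)
```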
We will soon present the mathematical notation and the basic operations commonly used when working with
vectors and matrices. If the concepts seem difficult to grasp, you can start by watching the animated video
series "Essence of linear algebra" (https://www.youtube.com/watch?v=kjBOesZCoqc&list=PL0-GT3co4r2y2YErbmuJw2L5tW4Ew2O5B)
from 3Blue1Brown. For more detailed but still accessible
explanations, you can check the book "Mathematics for Machine Learning" (https://mml-book.github.io) by
M. P. Deisenroth, A. A. Faisal, and C. S. Ong (a PDF is available on the website).
Vectors
We denote vectors with lower-case bold letters, e.g. vector 𝐱 consisting of 𝑛 elements:
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
Traditionally, vectors are represented as column vectors (elements of the vector stacked vertically). Vectors are
also sometimes written as $\mathbf{x} = (x_1, \ldots, x_n)^T$, i.e., as the transpose (see below) of a row vector,
just for convenience.
Below, you can see how to create a vector $\mathbf{x}$ consisting of $n$ elements, where $n = 5$, with the Python library numpy.
In [25]:
[1 2 3 4 5]
The $i$:th entry of vector $\mathbf{x}$ is denoted as $x_i$, e.g. the first element of vector $\mathbf{x}$ is $x_1$.
Note! Indexing in Python starts from zero!
In [26]:
$$\mathbf{x}^T \mathbf{y} = (x_1, x_2, \ldots, x_m) \cdot \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = x_1 y_1 + x_2 y_2 + \ldots + x_m y_m$$
Geometrically, it is the product of the Euclidean lengths (magnitudes) of the two vectors and the cosine of the angle
between them.
The dot product is also defined for NumPy arrays with more than one dimension (see numpy documentation
(https://numpy.org/doc/stable/reference/generated/numpy.dot.html?highlight=dot#numpy.dot) for more info).
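Both views of the dot product can be checked numerically. The sketch below uses illustrative vectors; the two-dimensional vectors u and v are chosen so that the angle between them (45 degrees) can be computed independently with np.arctan2:

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([3, 4, 5])

# algebraic definition: x_1*y_1 + x_2*y_2 + ... + x_m*y_m
dot_by_hand = sum(xi * yi for xi, yi in zip(x, y))
dot_numpy = np.dot(x, y)          # x @ y gives the same value for 1D arrays
print(dot_by_hand, dot_numpy)

# geometric view in the plane: length(u) * length(v) * cos(angle between u and v)
u = np.array([2.0, 0.0])
v = np.array([1.0, 1.0])
angle = np.arctan2(v[1], v[0]) - np.arctan2(u[1], u[0])
geometric = np.linalg.norm(u) * np.linalg.norm(v) * np.cos(angle)
print(geometric, np.dot(u, v))
```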
In [27]:
[0 1 2] [3 4 5]
Out[27]:
14
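$$\mathbf{x}\mathbf{y}^T = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix} \cdot (y_1, y_2, \ldots, y_m) = \begin{pmatrix} x_1 y_1 & x_1 y_2 & \ldots & x_1 y_m \\ x_2 y_1 & x_2 y_2 & \ldots & x_2 y_m \\ \vdots & \vdots & \ddots & \vdots \\ x_m y_1 & x_m y_2 & \ldots & x_m y_m \end{pmatrix}$$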
As you can see, the result of the outer product is a matrix, whereas the output of the dot product is a scalar.
In [28]:
[0 1 2] [3 4 5]
Out[28]:
array([[ 0, 0, 0],
[ 3, 4, 5],
[ 6, 8, 10]])
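The outer product can be computed with np.outer() or, equivalently, by multiplying a column vector with a row vector. A small sketch with the same example vectors as above (the names outer_np and outer_matmul are illustrative):

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([3, 4, 5])

# built-in outer product: entry (i, j) is x[i] * y[j]
outer_np = np.outer(x, y)

# the same result as a (3,1) column vector times a (1,3) row vector
outer_matmul = x.reshape(-1, 1) @ y.reshape(1, -1)

print(outer_np)
print(np.array_equal(outer_np, outer_matmul))
```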
Matrices
We will discuss how to represent our data as a matrix for further analyses in the next round.
Matrices are denoted in bold capital letters, e.g. matrix 𝐗 with 𝑚 rows and 𝑛 columns or 𝑚 × 𝑛 matrix.
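$$\mathbf{X} = \begin{pmatrix} x^{(1)}_1 & x^{(1)}_2 & \ldots & x^{(1)}_n \\ x^{(2)}_1 & x^{(2)}_2 & \ldots & x^{(2)}_n \\ \vdots & \vdots & \ddots & \vdots \\ x^{(m)}_1 & x^{(m)}_2 & \ldots & x^{(m)}_n \end{pmatrix}$$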
Below, you can see how to create a matrix 𝐗 with 𝑚 = 3 and 𝑛 = 4, containing the range of numbers 0,1,…,11.
In [32]:
X = np.arange(12).reshape(3,4)
print(X)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[ 0 4 8]
[ 1 5 9]
[ 2 6 10]
[ 3 7 11]]
[0 1 2]
In Python, matrix multiplication can be performed using NumPy with the @ operator, which is equivalent to the
function numpy.matmul() .
In [35]:
[[ 20 23 26 29]
[ 56 68 80 92]
Note!
(1) For matrix multiplication, the number of columns in the first matrix must be equal to the number of
rows in the second matrix. The resulting matrix has the number of rows of the first and the number of columns of
the second matrix.
(2) The order of matrix multiplication is important: in general, np.matmul(A,B) != np.matmul(B,A).
(3) A*B is element-wise multiplication in Python, not matrix multiplication.
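The three points of the note can be checked with a short sketch. The matrices A, B, and Z below are illustrative choices and are not necessarily the ones used in the notebook cells:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)     # 2 rows, 3 columns
B = np.arange(12).reshape(3, 4)    # 3 rows, 4 columns

# (1) columns of A (3) match rows of B (3); the result has shape (2, 4)
C = A @ B
print(C)
print(np.array_equal(C, np.matmul(A, B)))

# (2) the order matters: B @ A is not even defined here (4 columns vs. 2 rows)
try:
    B @ A
    reversed_defined = True
except ValueError:
    reversed_defined = False
print("B @ A defined:", reversed_defined)

# (3) * multiplies entry by entry, @ performs matrix multiplication
Z = np.arange(9).reshape(3, 3)
elementwise = Z * Z
matrix_product = Z @ Z
print(elementwise)
print(matrix_product)
```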
In [36]:
[[ 20 23 26 29]
[ 56 68 80 92]
[[ 42 48 54]
Matrix Z:
[[0 1 2]
[3 4 5]
[6 7 8]]
[[ 0 1 4]
[ 9 16 25]
[36 49 64]]
[[ 15 18 21]
[ 42 54 66]
[ 69 90 111]]
The L1-norm of a vector 𝐱 is defined as the sum of the absolute values of its elements, and is denoted by
$$\|\mathbf{x}\|_1 = |x_1| + \ldots + |x_n|$$
L2-norm of a vector or matrix
The L2-norm of a vector or matrix is defined as the square root of the sum of the squared components of the vector
or matrix, which corresponds to the intuitive notion of distance. It is denoted by
$$\|\mathbf{x}\|_2 = \sqrt{x_1^2 + \ldots + x_n^2},$$
although the subscript is often omitted since the L2-norm is the standard norm in $\mathbb{R}^n$. It is often useful to
calculate the squared L2-norm
$$\|\mathbf{x}\|^2 = x_1^2 + \ldots + x_n^2,$$
which is equivalent to the inner (dot) product of the vector with itself.
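The norm definitions above translate directly into NumPy. A minimal sketch with an illustrative vector, comparing the formulas written out by hand with np.linalg.norm():

```python
import numpy as np

x = np.array([3.0, -4.0])

# L1-norm: sum of the absolute values
l1_by_hand = np.sum(np.abs(x))
l1_numpy = np.linalg.norm(x, 1)

# L2-norm: square root of the sum of the squared elements
l2_by_hand = np.sqrt(np.sum(x ** 2))
l2_numpy = np.linalg.norm(x)        # the L2-norm is the default for vectors

# squared L2-norm: the dot product of the vector with itself
squared_l2 = x @ x

print(l1_by_hand, l1_numpy)
print(l2_by_hand, l2_numpy)
print(squared_l2)
```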
In the picture below, you can see the L1 and L2 norms between two points in the Euclidean plane:
Summation
The sum of all elements (from $1$ to $n$) of an indexed collection $(x_1, x_2, \ldots, x_n)$ (e.g., a vector $\mathbf{x}$) is denoted by
$$\sum_{i=1}^{n} x_i = x_1 + \ldots + x_n$$
For example, we can rewrite the formula for the squared vector norm as
$$\|\mathbf{x}\|^2 = \sum_{i=1}^{n} x_i^2 = x_1^2 + \ldots + x_n^2$$
Product
Product notation ∏ is used to indicate repeated multiplication.
For example,
$$\prod_{k=3}^{7} k = 3 \cdot 4 \cdot 5 \cdot 6 \cdot 7,$$
or
$$\prod_{i=1}^{n} x_i = x_1 \cdot \ldots \cdot x_n.$$
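In NumPy, the summation and product notation correspond to np.sum() and np.prod(). A short sketch with illustrative values:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

total = np.sum(x)                  # x_1 + ... + x_n
squared_norm = np.sum(x ** 2)      # sum over i of x_i^2
product_x = np.prod(x)             # x_1 * ... * x_n

k = np.arange(3, 8)                # the integers 3, 4, 5, 6, 7
product_k = np.prod(k)             # 3 * 4 * 5 * 6 * 7

print(total, squared_norm, product_x, product_k)
```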
In [37]:
np.random.seed(42)
x = np.arange(5).reshape(-1,)
w = np.random.rand(5).reshape(-1,)
A = np.arange(15).reshape(5,3)
In [ ]:
# %load solutions/student-task-5.py
vector_sum = x.sum()
vector_norm = sum(x**2)
vector_dotprod = w.dot(x)
vector_outerprod = np.outer(w,x)
mat_mult = vector_outerprod@A