CS229
Python & Numpy
Jingbo Yang, Andrey Kurenkov
How is Python related to other languages?
Python 2.0 released in 2000 (Python 2.7 reached “end-of-life” in 2020)
Python 3.0 released in 2008 (CS 229 uses Python 3.6)
Can be run interpreted, like MATLAB
Figure: Genealogy of programming languages (https://www.researchgate.net/figure/Genealogy-of-Programming-Languages-doi101371-journalpone0088941g001_fig1_260447599)
Before you start
Use Anaconda
Create an environment (full Conda)
conda create -n cs229
Create an environment (Miniconda)
conda env create -f environment.yml
Activate an environment
conda activate cs229
Notepad is not your friend …
Get a text editor/IDE
• PyCharm (IDE)
• Visual Studio Code (IDE??)
• Sublime Text (IDE??)
• Notepad++/gedit
• Vim (for Linux)
To be better prepared
PyCharm magic:
FYI, the Professional edition is free for students:
https://www.jetbrains.com/student/
Basic Python
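Where does my program start?
• It just works: Python runs a script's top-level statements from top to bottom
• A function: wrap the logic in a function so it can be reused
• Properly: call that function from an if __name__ == '__main__': block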
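A minimal sketch of the "proper" entry point described above; the function name main and the message are hypothetical. Code under the __name__ check runs only when the file is executed directly, not when it is imported from another module.

```python
# Hypothetical script illustrating where execution starts.

def main():
    """Entry point: put the program's logic here."""
    message = "Hello, CS229!"
    print(message)
    return message


# Runs only when this file is executed directly (e.g. python my_script.py),
# not when the file is imported by another module.
if __name__ == "__main__":
    main()  # prints: Hello, CS229!
```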
What is a class?
• Initialize the class, using some parameters, to get an instance
• Instance variables store data belonging to that instance
• Methods do something with the instance
To use a class
• Instantiate a class to get an instance
• Call an instance method
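A small sketch tying the class vocabulary together, using a hypothetical Student class: __init__ receives the parameters and sets instance variables, average is an instance method, and the last lines instantiate the class and call the method.

```python
class Student:
    """Hypothetical example class."""

    def __init__(self, name, scores):
        # Instance variables, set from the parameters passed at instantiation
        self.name = name
        self.scores = list(scores)

    def average(self):
        # Instance method: does something with this instance's data
        return sum(self.scores) / len(self.scores)


# Instantiate the class to get an instance
student = Student("Ada", [90, 80, 100])
# Call an instance method on that instance
print(student.name, student.average())  # Ada 90.0
```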
HW1 with random classifier
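The homework code itself is not reproduced here; the following is only a hypothetical sketch of what a random-guessing baseline class might look like. The class name RandomClassifier and its fit/predict methods are assumptions, not the HW1 interface.

```python
import numpy as np


class RandomClassifier:
    """Hypothetical baseline: predicts one of the seen labels uniformly at random."""

    def __init__(self, seed=0):
        # Seeded generator so the predictions are reproducible
        self.rng = np.random.default_rng(seed)
        self.classes = None

    def fit(self, x, y):
        # Only remembers which labels exist; the features x are ignored
        self.classes = np.unique(y)
        return self

    def predict(self, x):
        # One random label per row of x
        return self.rng.choice(self.classes, size=len(x))


x_train = np.zeros((6, 2))
y_train = np.array([0, 1, 0, 1, 1, 0])
clf = RandomClassifier(seed=229).fit(x_train, y_train)
predictions = clf.predict(np.zeros((4, 2)))
accuracy = np.mean(predictions == np.array([0, 1, 1, 0]))
print(predictions, accuracy)
```

A baseline like this is useful as a sanity check: a trained model should beat its accuracy.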
Data Structures
Basic data structures
List
example_list = [1, 2, '3', 'four']
Set (unordered, unique)
example_set = set([1, 2, '3', 'four'])
Dictionary (mapping)
example_dictionary = {
    '1': 'one',
    '2': 'two',
    '3': 'three'
}
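A quick sketch of how the three structures behave differently: lists keep order and allow indexing, sets drop duplicates, and dictionaries look values up by key. Note that '3' (a string) and 3 (an integer) are different elements.

```python
example_list = [1, 2, '3', 'four']
example_set = set([1, 2, '3', 'four', 'four'])  # the duplicate 'four' is dropped
example_dictionary = {'1': 'one', '2': 'two', '3': 'three'}

print(example_list[0], example_list[-1])      # 1 four   (ordered, indexable)
print(len(example_set))                       # 4        (unique elements only)
print('3' in example_set, 3 in example_set)   # True False  ('3' is a string)
print(example_dictionary['2'])                # two      (look up by key)
example_dictionary['4'] = 'four'              # add a new key/value pair
print(len(example_dictionary))                # 4
```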
More on List
2D list
list_of_list = [[1,2,3], [4,5,6], [7,8,9]]
List comprehension
initialize_a_list = [i for i in range(9)]
initialize_a_list = [i ** 2 for i in range(9)]
initialize_2d_list = [[i + j for i in range(5)] for j in range(9)]
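To make the comprehensions above concrete, this sketch prints what each one produces. The extra filtered comprehension (evens) is an addition showing the optional if clause.

```python
initialize_a_list = [i for i in range(9)]
print(initialize_a_list)   # [0, 1, 2, 3, 4, 5, 6, 7, 8]

squares = [i ** 2 for i in range(9)]
print(squares)             # [0, 1, 4, 9, 16, 25, 36, 49, 64]

evens = [i for i in range(9) if i % 2 == 0]   # keep only items passing the condition
print(evens)               # [0, 2, 4, 6, 8]

# The outer comprehension (over j) builds the rows; the inner one (over i) builds each row
initialize_2d_list = [[i + j for i in range(5)] for j in range(9)]
print(len(initialize_2d_list), len(initialize_2d_list[0]))   # 9 5
print(initialize_2d_list[2])                                 # [2, 3, 4, 5, 6]

list_of_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(list_of_list[1][2])  # 6  (row 1, column 2)
```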
Insert/Pop
my_list = [1, 2, 3]
my_list.insert(0, 'stuff')
print(my_list.pop(0))
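A short sketch of insert/pop semantics: insert(index, item) places an item before index, pop(index) removes and returns it, and pop() with no argument takes the last item. Inserting or popping at index 0 shifts every other element, so for queue-like use the standard library's collections.deque offers appendleft/popleft instead.

```python
from collections import deque

my_list = [1, 2, 3]
my_list.insert(0, 'stuff')   # -> ['stuff', 1, 2, 3]
first = my_list.pop(0)       # remove and return the item at index 0
print(first, my_list)        # stuff [1, 2, 3]

my_list.append(4)            # add to the end -> [1, 2, 3, 4]
last = my_list.pop()         # no index: removes the last item
print(last, my_list)         # 4 [1, 2, 3]

queue = deque([1, 2, 3])
queue.appendleft('stuff')    # fast insert at the front
front = queue.popleft()      # fast removal from the front
print(front, list(queue))    # stuff [1, 2, 3]
```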
More on List
Sort a list
random_list = [3,12,5,6]
sorted_list = sorted(random_list)
random_list = [(3, 'A'), (12, 'D'), (5, 'M'), (6, 'B')]
sorted_list = sorted(random_list, key=lambda x: x[1])
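The sketch below shows the results of the sorts above, plus two variants: reverse=True, and list.sort(), which sorts in place and returns None, whereas sorted() returns a new list and leaves the original untouched. Without a key, tuples are compared by their first element.

```python
random_list = [3, 12, 5, 6]
print(sorted(random_list))                # [3, 5, 6, 12]
print(sorted(random_list, reverse=True))  # [12, 6, 5, 3]
print(random_list)                        # [3, 12, 5, 6]  (unchanged by sorted())

pairs = [(3, 'A'), (12, 'D'), (5, 'M'), (6, 'B')]
by_letter = sorted(pairs, key=lambda x: x[1])
print(by_letter)                          # [(3, 'A'), (6, 'B'), (12, 'D'), (5, 'M')]
by_number = sorted(pairs)
print(by_number)                          # [(3, 'A'), (5, 'M'), (6, 'B'), (12, 'D')]

random_list.sort()                        # in place; returns None
print(random_list)                        # [3, 5, 6, 12]
```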
More on Dict/Set
Comprehension
my_dict = {i: i ** 2 for i in range(10)}
my_set = {i ** 2 for i in range(10)}
Get dictionary keys
my_dict.keys()
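A few more dictionary and set operations in the same style: get returns a default instead of raising KeyError, items() yields key/value pairs (here used in a filtered dict comprehension), and in tests set membership.

```python
my_dict = {i: i ** 2 for i in range(10)}
my_set = {i ** 2 for i in range(10)}

print(list(my_dict.keys())[:3])     # [0, 1, 2]
print(my_dict[3])                   # 9
print(my_dict.get(42, 'missing'))   # missing  (default instead of KeyError)

# items() gives (key, value) pairs; keep only the even keys
squares_of_evens = {k: v for k, v in my_dict.items() if k % 2 == 0}
print(squares_of_evens)             # {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}

print(81 in my_set, 10 in my_set)   # True False
```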
Numpy & Scipy
What is Numpy? What is Scipy?
Numpy – package for vector and matrix manipulation
Scipy – package for scientific and technical computing
How do “those guys” make things run faster?
Read up on the AVX instruction set (SIMD) and the structure of x86 and RISC processors
Read up on OpenMP and CUDA for parallel processing
Read up on assembly-level optimization, memory stride, caching, etc.
Or even on memory management and virtualization
Closer to bare metal: FPGAs, TPUs
Some numpy usage
Popular usage, read before use!
Python Command          Description
scipy.linalg.inv        Inverse of a matrix (NumPy has an equivalent)
scipy.linalg.eig        Eigenvalues and eigenvectors (read the documentation on eigh and the NumPy equivalents)
scipy.spatial.distance  Compute pairwise distances
np.matmul               Matrix multiplication
np.zeros                Create a matrix filled with zeros (read up on np.ones)
np.arange               Evenly spaced values from start, stop, and step size (read up on np.linspace)
np.identity             Create an identity matrix
np.vstack               Stack arrays vertically (read up on np.hstack)
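A sketch exercising the table entries, using the NumPy counterparts (np.linalg.inv, np.linalg.eigh) so it only needs NumPy; scipy.linalg offers functions of the same names. eigh assumes a symmetric (Hermitian) matrix and returns eigenvalues in ascending order, while eig handles general square matrices. The pairwise distances are computed with broadcasting here; scipy.spatial.distance.cdist computes the same Euclidean distances.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
I = np.identity(2)                         # 2x2 identity matrix
A_inv = np.linalg.inv(A)                   # NumPy counterpart of scipy.linalg.inv
print(np.allclose(np.matmul(A, A_inv), I)) # True: A @ A^-1 = I

# A is symmetric, so eigh applies; eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(A)

z = np.zeros((2, 3))                       # 2x3 matrix of zeros
o = np.ones((2, 3))                        # 2x3 matrix of ones
print(np.vstack([z, o]).shape)             # (4, 3): stacked vertically
print(np.hstack([z, o]).shape)             # (2, 6): stacked horizontally

steps = np.arange(0, 1, 0.25)              # start, stop (excluded), step
points = np.linspace(0, 1, 5)              # start, stop (included), number of points

# Pairwise Euclidean distances between the rows of X, via broadcasting
X = np.array([[0.0, 0.0], [3.0, 4.0]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
print(D)                                   # distance 5.0 off the diagonal
```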
Your friend for debugging
Python Command                 Description
array.shape                    Get the shape of a NumPy array
array.dtype                    Check the data type of an array (for precision, or to explain weird behavior)
type(stuff)                    Get the type of a variable
import pdb; pdb.set_trace()    Set a breakpoint (https://docs.python.org/3/library/pdb.html)
print(f'My name is {name}')    Easy way to construct a message
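A sketch of the inspection tools in action, including one classic dtype surprise: assigning a float into an integer array silently truncates it. The variable names are hypothetical. The pdb breakpoint is left out because it would pause the script.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)   # integer array
y = x / 2                        # true division always produces floats
name = "x"

print(type(x))                   # <class 'numpy.ndarray'>
print(x.shape, x.dtype)          # (2, 3) and an integer dtype (int64 on most platforms)
print(y.dtype)                   # float64
print(f"{name} has shape {x.shape} and dtype {x.dtype}")

# "Weird behavior": the integer array cannot hold 2.7, so it is truncated
x[0, 0] = 2.7
print(x[0, 0])                   # 2
```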
So many things to remember
Why can’t I just write loops?
Remember all the fancy low-level stuff? Plain Python loops cannot take advantage of it; NumPy’s vectorized operations do.
Power of vectorization
import time
import numpy as np

a = [i for i in range(10000)]
b = [i for i in range(10000)]

# Plain Python loop
tic = time.perf_counter()  # time.clock() was removed in Python 3.8
dot = 0.0
for i in range(len(a)):
    dot += a[i] * b[i]
toc = time.perf_counter()
print("dot_product = " + str(dot))
print("Computation time = " + str(1000 * (toc - tic)) + "ms")

# Vectorized NumPy version
n_tic = time.perf_counter()
n_dot_product = np.array(a).dot(np.array(b))
n_toc = time.perf_counter()
print("\nn_dot_product = " + str(n_dot_product))
print("Computation time = " + str(1000 * (n_toc - n_tic)) + "ms")
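The same idea applies beyond dot products. This sketch, on hypothetical random data, centers every feature column first with nested loops and then with broadcasting: X.mean(axis=0) has shape (50,) and is subtracted from all 2000 rows at once. Timings depend on the machine, so none are quoted here.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))   # hypothetical data: 2000 examples, 50 features

# Loop version: compute each column mean, then subtract element by element
tic = time.perf_counter()
n_rows, n_cols = X.shape
means = [sum(X[i, j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
X_loop = np.empty_like(X)
for i in range(n_rows):
    for j in range(n_cols):
        X_loop[i, j] = X[i, j] - means[j]
toc = time.perf_counter()

# Vectorized version: shape (2000, 50) minus shape (50,) broadcasts over rows
n_tic = time.perf_counter()
X_vec = X - X.mean(axis=0)
n_toc = time.perf_counter()

print("Same result:", np.allclose(X_loop, X_vec))
print(f"Loop: {1000 * (toc - tic):.1f} ms, vectorized: {1000 * (n_toc - n_tic):.1f} ms")
```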
Plotting
Matplotlib is your friend
Scatter plot
Line plot
Dual y-axis
Log-log
Bar plot (Histogram)
3D plot
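A compact sketch of several plot types from the list above on one figure, using random example data. It assumes Matplotlib is installed and selects the non-interactive "Agg" backend so it runs without a display; the figure is saved to an in-memory buffer rather than a file.

```python
import io

import matplotlib
matplotlib.use("Agg")            # non-interactive backend: draw without opening a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 50)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Scatter plot
axes[0, 0].scatter(x, x + rng.normal(size=x.size))
axes[0, 0].set_title("Scatter")

# Line plot with a second (dual) y-axis sharing the same x-axis
axes[0, 1].plot(x, np.sin(x), color="tab:blue")
twin = axes[0, 1].twinx()
twin.plot(x, x ** 2, color="tab:red")
axes[0, 1].set_title("Dual y-axis")

# Log-log plot: both axes on a logarithmic scale
axes[1, 0].loglog(x, x ** 3)
axes[1, 0].set_title("Log-log")

# Histogram: bar plot of binned counts
counts, bins, _ = axes[1, 1].hist(rng.normal(size=1000), bins=20)
axes[1, 1].set_title("Histogram")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")   # pass a filename instead to write a file
```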
Jupyter Notebook is another friend
And if you want to get fancy:
Example plots
https://matplotlib.org/3.1.1/gallery/index.html
# Import
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

# Create data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

# Plotting
fig, ax = plt.subplots()
ax.plot(t, s)

# Format plot
ax.set(xlabel='time (s)', ylabel='voltage (mV)',
       title='About as simple as it gets, folks')
ax.grid()

# Save/show
fig.savefig("test.png")
plt.show()
Plot with dashed lines and a legend
https://matplotlib.org/3.1.1/gallery/index.html
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 500)
y = np.sin(x)
fig, ax = plt.subplots()
line1, = ax.plot(x, y, label='Using set_dashes()')
line1.set_dashes([2, 2, 10, 2])  # 2pt line, 2pt break, 10pt line, 2pt break
line2, = ax.plot(x, y - 0.2, dashes=[6, 2],
                 label='Using the dashes parameter')
ax.legend()
plt.show()
Another way to make a legend
Using subplots
Scatter plot
Plot area under curve
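A sketch covering the captions above with made-up data: two side-by-side subplots, a legend built by passing the plotted objects and labels explicitly, a scatter plot, and a shaded area under sin(x) on [0, pi] via fill_between. It assumes Matplotlib is installed and uses the "Agg" backend. The area is also estimated with the trapezoid rule; the exact value is 2.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, np.pi, 200)
y = np.sin(x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))   # one row, two subplots

# Left: line plus shaded area under the curve
line, = ax1.plot(x, y)
area = ax1.fill_between(x, 0, y, alpha=0.3)
# Legend built from explicit handles and labels instead of label= keywords
ax1.legend([line, area], ["sin(x)", "area under curve"])

# Right: scatter plot of every 20th point
ax2.scatter(x[::20], y[::20])
ax2.set_title("Scatter")

# Trapezoid-rule estimate of the shaded area (exact value: 2)
area_estimate = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))
ax1.set_title(f"Area ~ {area_estimate:.3f}")
fig.tight_layout()
```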
Confusion matrix
https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
import numpy as np
import matplotlib.pyplot as plt

def plot_confusion_matrix(cm, classes, normalize=False,
                          title='Confusion matrix', cmap=plt.cm.Blues):
    # cm: square array of counts, rows = true labels, columns = predicted labels
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    fig, ax = plt.subplots()
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    # We want to show all ticks...
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           xticklabels=classes, yticklabels=classes,
           title=title, ylabel='True label', xlabel='Predicted label')
    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
    # Loop over data dimensions and create text annotations.
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    return ax
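The plotting code expects a ready-made count matrix cm. A minimal NumPy sketch of building one from label arrays follows; the helper name confusion_matrix_counts is hypothetical (scikit-learn's sklearn.metrics.confusion_matrix serves the same purpose). np.add.at is used because it accumulates correctly when the same (true, predicted) pair occurs more than once.

```python
import numpy as np


def confusion_matrix_counts(y_true, y_pred, n_classes):
    """Hypothetical helper: cm[i, j] counts examples with true label i predicted as j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Unbuffered add: repeated (i, j) index pairs are all counted
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix_counts(y_true, y_pred, n_classes=3)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

# Dividing each row by its sum gives per-class rates (each row sums to 1)
row_normalized = cm / cm.sum(axis=1, keepdims=True)
```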
Links
CS 231N Python Tutorial
Good luck on your HW/Project!
Questions?