Python Tutorial Completed - Michigan.pdf
To check which version of Python you have, run:
python --version
If you have Python 2.x, install Python 3.7+ and use the python3 command instead.
This assumes you are using Ubuntu or another Linux distribution that uses apt for package
management.
To create a virtual environment named env in your current folder, run this (only once):
This will create an env/ folder which contains the files and binaries for your virtual environment.
To activate the virtual environment, run:
source env/bin/activate
To deactivate it when you are done, run:
deactivate
Note!
Install all packages and do your work inside the (activated) virtual environment —
this will make your life a lot easier since you will not have to keep track of outside
dependencies!
To install dependencies, first make sure your virtualenv is activated. You should see (env) to the
left of your shell prompt.
Then, run
pip install <package names>
To run a Python script, create a file (e.g. helloworld.py) and write some code in it (e.g. print("hello, world")). Then run the following in your terminal:
python3 helloworld.py
Just run:
jupyter notebook
in the same directory as the notebook(s) you want to open. This should launch Jupyter Notebook in
a new browser tab.
Note!
If you are on WSL, you may get an error and the browser tab won't open. This is
because your browser is not installed through WSL, and that's totally okay! Just copy
the URL printed in the error message and paste it into your browser.
You can click on a notebook to open it, then run its cells using the buttons on the top bar, or press
shift+enter to run a single cell.
The interface should be similar to (but not exactly the same as) the local Jupyter notebook.
Note!
Loading external data on Colab is slightly different from the other methods,
since notebooks are stored in the cloud. See this Colab notebook
(https://colab.research.google.com/notebooks/io.ipynb) for how to load datasets.
4. NumPy Basics
From the NumPy homepage (https://numpy.org/):
NumPy is the fundamental package for scientific computing with Python. It contains
among other things:
We will focus mostly on the first line (the array object), with some coverage of other functions
numpy provides.
We can create numpy arrays from python lists, or in any number of ways described in the
documentation (https://numpy.org/devdocs/user/quickstart.html#array-creation).
In [2]: # creating numpy array from python list using np.array() function
a = np.array([1, 2, 3, 4, 5, 6])
In [3]: l = [1, 2, 3, 4]
In [5]: type(l)
Out[5]: list
Properties:
Homogeneous
All the elements of a numpy array are of the same type. You can see what type the values are
stored as with the dtype attribute.
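As a quick illustration of homogeneity (the arrays here are our own examples, not from the original notebook), mixing an integer and a float in one list makes numpy store every element as a float:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # one float is enough to make the whole array float

print(ints.dtype)   # an integer type; the exact width (e.g. int64) depends on the platform
print(mixed.dtype)  # float64
print(mixed[0])     # the integer 1 was stored as a float
```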
N-dimensional
Numpy arrays don't need to be 1-dimensional. They can be 2-dimensional (like a matrix), or even
more. Each dimension is called an axis and the shape is a tuple of sizes along each axis.
Remember these terms: axis and shape. They are essential to using numpy.
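A small sketch of how axes and shape relate, using an arbitrary 2x3 array of zeros (our own example):

```python
import numpy as np

m = np.zeros((2, 3))   # 2 rows, 3 columns
print(m.ndim)          # 2      -> number of axes
print(m.shape)         # (2, 3) -> size along axis 0, then along axis 1
print(m.shape[0])      # 2      -> length of axis 0 (the rows)
print(m.shape[1])      # 3      -> length of axis 1 (the columns)
```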
In [8]: b
Caution!
Out[11]: (array([[1],
        [2],
        [3],
        [4]]),
 (4, 1))
You can think of x as 1D (you index once to get to a value) and y as 2D (you have to
index twice to get to a value).
In [12]: x[0]
Out[12]: 1
In [13]: print(y[0])
print(y[0][0])
[1]
1
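The cells that define x and y are not shown above. Judging from Out[11] and Out[12], a plausible reconstruction (an assumption on our part) is x = np.array([1, 2, 3, 4]) and y = x.reshape(4, 1). Under that assumption, indexing behaves like this:

```python
import numpy as np

x = np.array([1, 2, 3, 4])   # assumed definition: shape (4,), one axis
y = x.reshape(4, 1)          # assumed definition: shape (4, 1), two axes

print(x.shape, y.shape)  # (4,) (4, 1)
print(x[0])              # 1   -> one index reaches a value
print(y[0])              # [1] -> one index only reaches a row
print(y[0, 0])           # 1   -> two indices reach a value (same as y[0][0])
```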
Indexing
Like vanilla Python lists, we can access data stored in a numpy array by indexing into it with
integers. Slices work, too.
In [15]: a
Out[14]: 1
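A short sketch of integer indexing and slicing on the array a created earlier (the specific indices are our own choice):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])  # the same values as the array created earlier

print(a[0])     # 1       -> first element
print(a[-1])    # 6       -> negative indices count from the end, like lists
print(a[1:4])   # [2 3 4] -> slice: start inclusive, stop exclusive
print(a[::2])   # [1 3 5] -> every second element
```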
Unlike Python lists, we have some more options when indexing into Numpy arrays.
Multiple Indices
You can pass a list (or a numpy array) as the index to get data from multiple indices.
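For example, with the same array a as above (the index lists are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

print(a[[0, 2, 4]])          # [1 3 5] -> elements at indices 0, 2 and 4
print(a[np.array([5, 0])])   # [6 1]   -> a numpy array works as an index too
print(a[[1, 1, 1]])          # [2 2 2] -> indices may repeat
```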
What if, instead of the first row, we want just the second value of the first row? We can use
commas to separate along axes.
In [20]: b
In [21]: # index 0 of axis 0 gives us the first row: [1.1, 2.2, 3.3].
# index 1 of axis 1 gives us the second column: [2.2, 5.5]
# together, we get the value at the first row, second column.
b[0, 1]
Out[21]: 2.2
We can also use slices to get, for instance, the entire second column.
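A sketch of slicing along each axis. The contents of b are only partly visible above: the first row [1.1, 2.2, 3.3] and the value 5.5 come from the comments, while 4.4 and 6.6 are our assumption.

```python
import numpy as np

# assumed contents of b (4.4 and 6.6 are guesses; the rest matches the comments above)
b = np.array([[1.1, 2.2, 3.3],
              [4.4, 5.5, 6.6]])

print(b[:, 1])    # [2.2 5.5]     -> every row (":"), column 1
print(b[0, :])    # [1.1 2.2 3.3] -> row 0, every column (same as b[0])
print(b[:, 1:])   # columns 1 and 2 of every row, shape (2, 2)
```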
In [29]: np.random.seed(42)
# let's create two 3x3 matrices filled with random integers in the range [0, 10)
x = np.random.randint(0, 10, size=(3, 3))
y = np.random.randint(0, 10, size=(3, 3))
In [30]: x
In [31]: y
In [32]: x + y # element-wise addition
In [33]: x * y # element-wise multiplication (not matrix multiplication)
In [34]: x - y # element-wise subtraction
In [35]: x / y # element-wise division
Out[35]: array([[1.5       , 1.        , 1.        ],
       [0.57142857, 3.        , 1.8       ],
       [0.5       , 6.        , 1.        ]])
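Since * on arrays is element-wise, it is easy to confuse with matrix multiplication. A minimal comparison on two small hand-picked matrices (our own example):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print((A * B).tolist())   # [[5, 12], [21, 32]]  -> element-wise product
print((A @ B).tolist())   # [[19, 22], [43, 50]] -> matrix product (same as np.matmul)
```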
In [39]: x
In [38]: np.sum(x)
Out[38]: 50
In [41]: np.mean(x)
Out[41]: 5.555555555555555
We can also sum along just a single axis. For example, to get the sum of each row, we sum
across the columns (axis 1).
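A sketch of summing along each axis. The 3x3 values are hard-coded so the results are easy to check by hand; they were chosen so that np.sum(x) matches Out[38] (50), but treat them as illustrative rather than as a guaranteed copy of the seeded x.

```python
import numpy as np

# hard-coded 3x3 array (illustrative values)
x = np.array([[6, 3, 7],
              [4, 6, 9],
              [2, 6, 7]])

print(np.sum(x))          # 50         -> sum of every element
print(np.sum(x, axis=1))  # [16 19 15] -> sum across the columns: one total per row
print(np.sum(x, axis=0))  # [12 15 23] -> sum down the rows: one total per column
print(x.sum(axis=1))      # the method form gives the same result as np.sum(x, axis=1)
```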
In [43]: g
In [44]: g.shape
Out[44]: (2, 3)
Out[48]: (3,)
[[3, 0, 0, 2],
[2, 2, 1, 3],
[3, 3, 3, 2]]])
In [52]: p.shape
Out[52]: (2, 3, 4)
Out[55]: (1, 3)
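The outputs above show shapes such as (3,) and (1, 3), but the cells that produced them are missing, so we can't say exactly which calls were used. As an illustrative sketch (arrays are our own), here is how reducing along an axis removes that axis, and how keepdims=True keeps it with size 1:

```python
import numpy as np

g = np.arange(6).reshape(2, 3)      # illustrative array with shape (2, 3)
p = np.arange(24).reshape(2, 3, 4)  # illustrative array with shape (2, 3, 4)

print(np.sum(g, axis=0).shape)                 # (3,)   -> axis 0 is summed away
print(np.sum(g, axis=0, keepdims=True).shape)  # (1, 3) -> axis 0 kept with size 1
print(np.sum(p, axis=2).shape)                 # (2, 3) -> the last axis is summed away
print(np.sum(p, axis=(0, 2)).shape)            # (3,)   -> several axes at once
```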
Linear Algebra
There are also linear algebra functions you can use.
Out[57]: 9.695359714832659
Out[58]: 105
In [59]: # multiply two matrices
np.matmul(x, y)
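A few other functions from np.linalg, sketched on a small hand-picked matrix and vector (our own values, not the ones behind Out[57] and Out[58]):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.array([3.0, 4.0])

print(np.linalg.norm(v))   # 5.0 -> Euclidean length sqrt(3**2 + 4**2)
print(np.linalg.det(A))    # about 5.0 (2*3 - 1*1), up to floating-point rounding

A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))  # True -> A times its inverse is the identity

# solve A @ s = rhs for s; usually preferred over computing inv(A) @ rhs
rhs = np.array([3.0, 5.0])
s = np.linalg.solve(A, rhs)
print(s)                   # approximately [0.8 1.4]
```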
In [63]: g
In [64]: g.shape
Out[64]: (2, 3)
Broadcasting
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic
operations.
Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they
have compatible shapes.
This avoids making needless copies of data and usually leads to efficient algorithm
implementations.
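When operating on two arrays, numpy compares their shapes element-wise: it starts with the trailing
dimensions and works its way forward. Two dimensions are compatible when they are equal, or when
one of them is 1 (a missing leading dimension counts as 1).
If these conditions are not met, a ValueError: operands could not be broadcast together exception
is thrown, indicating that the arrays have incompatible shapes.
Along each axis, the size of the resulting array is the size of the inputs that is not 1.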
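To make the rules concrete, here is a minimal sketch of them in plain Python: walk both shapes from the trailing dimension forward, treat a missing dimension as 1, require the sizes to be equal or one of them to be 1, and keep the size that is not 1. The helper broadcast_shape is hypothetical (our own name); np.broadcast_shapes (NumPy 1.20+) is used only to cross-check it.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: combine two shapes using the broadcasting rules."""
    ndim = max(len(shape_a), len(shape_b))
    result = []
    # walk both shapes from the trailing (rightmost) dimension forward;
    # a missing leading dimension behaves like a dimension of size 1
    for i in range(1, ndim + 1):
        da = shape_a[-i] if i <= len(shape_a) else 1
        db = shape_b[-i] if i <= len(shape_b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"operands could not be broadcast: {da} vs {db}")
        # keep the size that is not 1 (if both are equal, either one)
        result.append(da if db == 1 else db)
    return tuple(reversed(result))

print(broadcast_shape((4, 1), (5,)))      # (4, 5)
print(broadcast_shape((3, 4), (4,)))      # (3, 4)
print(broadcast_shape((4, 1, 1), (5,)))   # (4, 1, 5)
print(np.broadcast_shapes((4, 1), (5,)))  # numpy's built-in check agrees: (4, 5)
```

Calling broadcast_shape((4,), (5,)) raises a ValueError, mirroring the x + y error below.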
In [66]: # The simplest broadcasting example occurs when an array
# and a scalar value are combined in an operation (e.g. y + 1).
# Here we also set up arrays of different shapes for the examples that follow:
y = np.ones(5)       # [1., 1., 1., 1., 1.]
z = np.ones((3, 4))  # [[1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.]]
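A minimal sketch of the array-and-scalar case (the arrays are our own examples): the scalar is stretched to the array's shape, so it is applied to every element.

```python
import numpy as np

v = np.arange(4)        # [0 1 2 3]
print(v + 10)           # [10 11 12 13] -> the scalar 10 is added to every element
print(v * 2)            # [0 2 4 6]

z = np.ones((3, 4))
print((z * 5).shape)    # (3, 4) -> the scalar is broadcast over the whole 2D array
```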
In [68]: x.shape
Out[68]: (4,)
In [69]: y.shape
Out[69]: (5,)
In [70]: x + y
# Unable to broadcast
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-dbfbbd7bce8d> in <module>
----> 1 x + y
      2
      3 # Unable to broadcast
In [71]: xp = np.arange(5)
In [72]: xp
In [78]: xp.dtype
Out[78]: dtype('int64')
In [79]: y.dtype
Out[79]: dtype('float64')
In [73]: xp + y
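The output of In [73] isn't shown. Here is a sketch of what it should look like: the shapes (5,) and (5,) match exactly, so no stretching is needed, and because xp holds integers while y holds float64 values, numpy upcasts the result to float64.

```python
import numpy as np

xp = np.arange(5)   # integers 0..4
y = np.ones(5)      # float64 ones

result = xp + y     # shapes (5,) and (5,) match, so the addition is purely element-wise
print(result)       # [1. 2. 3. 4. 5.]
print(result.dtype) # float64 -> the integers are upcast so no precision is lost
```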
If we check the broadcasting rules for x + y, we'll see numpy tries to match 4 with 5. They're not equal,
and neither of them is 1, so we get an error. How can we fix this?
In [75]: xx = x.reshape(4, 1)
In [76]: xx.shape
Out[76]: (4, 1)
In [77]: y.shape
Out[77]: (5,)
Now, what comparisons is numpy making? Remember, it "starts with the trailing dimensions and
works its way forward".
4, 1   <- shape of xx
   5   <- shape of y
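So what does numpy do? Conceptually, it stretches xx along its size-1 axis 5 times, giving an array
with shape (4, 5). y (shape (5,)) is treated as having shape (1, 5) and is stretched 4 times along that
leading axis, so both operands line up as (4, 5). No data is actually copied; numpy just behaves as if it were.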
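One way to see this stretching explicitly is np.broadcast_to, which returns a read-only view with the target shape. The sketch below assumes xx = np.arange(4).reshape(4, 1), which matches Out[81]:

```python
import numpy as np

xx = np.arange(4).reshape(4, 1)   # [[0], [1], [2], [3]], shape (4, 1)
y = np.ones(5)                    # shape (5,)

# read-only views with the stretched shape (no data is copied)
xx_big = np.broadcast_to(xx, (4, 5))
y_big = np.broadcast_to(y, (4, 5))

print(xx_big[2])                               # row 2 repeats the value 2 five times
print((xx + y).shape)                          # (4, 5)
print(np.array_equal(xx + y, xx_big + y_big))  # True -> same as adding the stretched arrays
```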
In [81]: xx
Out[81]: array([[0],
       [1],
       [2],
       [3]])
Out[95]: (1, 3)
Out[94]: (1, 3)
In [82]: y
In [ ]: # xx stretched to shape (4, 5):
[[0 0 0 0 0],
 [1 1 1 1 1],
 [2 2 2 2 2],
 [3 3 3 3 3]]
# y stretched to shape (4, 5):
[[ 1 1 1 1 1],
 [ 1 1 1 1 1],
 ...]
In [80]: xx + y
In [83]: (x + z).shape
Out[83]: (3, 4)
In [84]: x+z
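A sketch of why x + z works, assuming x = np.arange(4) (inferred from Out[81], where xx = x.reshape(4, 1) holds 0 to 3): the trailing sizes 4 and 4 are equal, and x's missing leading axis counts as 1, so x is added to each row of z.

```python
import numpy as np

x = np.arange(4)      # assumed: shape (4,), values 0..3
z = np.ones((3, 4))   # shape (3, 4)

# trailing dims: 4 vs 4 (equal); x has no leading dim, so it is treated as 1 vs 3
s = x + z
print(s.shape)   # (3, 4)
print(s)         # every row is x + 1, i.e. the values 1, 2, 3, 4
```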
In [87]: xpp.shape
Out[87]: (4, 1, 1)
In [ ]: (4, 1, 5)
In [90]: y.shape
Out[90]: (5,)
In [88]: xpp + y
Out[89]: (4, 1, 5)
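The cell defining xpp isn't shown; only its shape (4, 1, 1) is. Assuming values 0 to 3 (our guess), this sketch reproduces the (4, 1, 5) result and shows np.newaxis as an alternative to reshape for adding size-1 axes:

```python
import numpy as np

xpp = np.arange(4).reshape(4, 1, 1)   # shape (4, 1, 1); the values are an assumption
y = np.ones(5)                        # shape (5,)

# compare trailing dims: 1 vs 5 -> 5; 1 vs (missing) -> 1; 4 vs (missing) -> 4
print((xpp + y).shape)   # (4, 1, 5)

# the same (4, 1, 1) array built with np.newaxis instead of reshape
xpp2 = np.arange(4)[:, np.newaxis, np.newaxis]
print(xpp2.shape)        # (4, 1, 1)
```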
Broadcasting is tricky, and there is room for bugs! If you have buggy numpy code, a safe bet is to: