Python - Week 1 PDF
You are currently looking at version 1.1 of this notebook. To download notebooks and datafiles, as well as get
help on Jupyter notebooks in the Coursera platform, visit the Jupyter Notebook FAQ
(https://www.coursera.org/learn/python-data-analysis/resources/0dhYG) course resource.
add_numbers is a function that takes two numbers and adds them together.
In [1]:
add_numbers(1, 2)
Out[1]:
3
add_numbers is updated to take an optional third parameter. Using print allows multiple expressions to be
printed within a single cell.
In [2]:
def add_numbers(x,y,z=None):
    if z is None:  # z was not supplied, so add only the first two numbers
        return x+y
    else:
        return x+y+z
print(add_numbers(1, 2))
print(add_numbers(1, 2, 3))
3
6
In [4]:
Flag is true!
6
In [6]:
def add_numbers(x,y):
    return x+y
a = add_numbers
a(1,2)
Out[6]:
3
In [7]:
type('This is a string')
Out[7]:
str
In [8]:
type(None)
Out[8]:
NoneType
In [9]:
type(1)
Out[9]:
int
In [10]:
type(1.0)
Out[10]:
float
In [11]:
type(add_numbers)
Out[11]:
function
In [1]:
Out[1]:
tuple
In [3]:
Out[3]:
list
In [4]:
x.append(3.3)
print(x)
In [15]:
for item in x:
    print(item)
1
a
2
b
3.3
In [5]:
i=0
while i != len(x):
    print(x[i])
    i = i + 1
1
a
2
b
3.3
In [17]:
[1,2] + [3,4]
Out[17]:
[1, 2, 3, 4]
In [18]:
[1]*3
Out[18]:
[1, 1, 1]
In [19]:
1 in [1, 2, 3]
Out[19]:
True
In [20]:
x = 'This is a string'
print(x[0]) #first character
print(x[0:1]) #first character, but we have explicitly set the end character
print(x[0:2]) #first two characters
T
T
Th
In [21]:
x[-1]
Out[21]:
'g'
This will return the slice starting from the 4th element from the end and stopping before the 2nd element from
the end.
In [22]:
x[-4:-2]
Out[22]:
'ri'
This is a slice from the beginning of the string, stopping before index 3 (the 4th character).
In [23]:
x[:3]
Out[23]:
'Thi'
And this is a slice starting from the 4th element of the string and going all the way to the end.
In [24]:
x[3:]
Out[24]:
's is a string'
In [25]:
firstname = 'Christopher'
lastname = 'Brooks'
Christopher Brooks
ChristopherChristopherChristopher
True
split returns a list of all the words in a string, or a list split on a specific character.
In [26]:
firstname = 'Christopher Arthur Hansen Brooks'.split(' ')[0] # [0] selects the first element
lastname = 'Christopher Arthur Hansen Brooks'.split(' ')[-1] # [-1] selects the last element
print(firstname)
print(lastname)
Christopher
Brooks
In [27]:
'Chris' + 2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-27-1623ac76de6e> in <module>()
----> 1 'Chris' + 2
In [28]:
'Chris' + str(2)
Out[28]:
'Chris2'
In [29]:
Out[29]:
'brooksch@umich.edu'
In [33]:
In [32]:
for name in x:
    print(x[name])
brooksch@umich.edu
billg@microsoft.com
None
In [34]:
brooksch@umich.edu
billg@microsoft.com
None
In [35]:
Christopher Brooks
brooksch@umich.edu
Bill Gates
billg@microsoft.com
Kevyn Collins-Thompson
None
In [36]:
In [37]:
fname
Out[37]:
'Christopher'
In [38]:
lname
Out[38]:
'Brooks'
Make sure the number of values you are unpacking matches the number of variables being assigned.
In [39]:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-39-9ce70064f53e> in <module>()
1 x = ('Christopher', 'Brooks', 'brooksch@umich.edu', 'Ann Arbor')
----> 2 fname, lname, email = x
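A minimal sketch of the rule above, reusing the four-element tuple from the traceback. The names `location`, `first`, and `rest` are illustrative additions and do not come from the notebook.

```python
x = ('Christopher', 'Brooks', 'brooksch@umich.edu', 'Ann Arbor')

# Four values and four names: the counts match, so unpacking succeeds.
fname, lname, email, location = x

# A starred name collects the remaining values into a list.
first, *rest = x

# Three names for four values raises ValueError.
try:
    a, b, c = x
    error = None
except ValueError as exc:
    error = str(exc)

print(fname, lname, email, location)
print(rest)
print(error)
```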
In [40]:
print('Chris' + 2)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-82ccfdd3d5d3> in <module>()
----> 1 print('Chris' + 2)
In [41]:
print('Chris' + str(2))
Chris2
In [43]:
sales_record = {
    'price': 3.24,
    'num_items': 4,
    'person': 'Chris'}
print(sales_statement.format(sales_record['person'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items']*sales_record['price']))
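The cell above relies on `sales_statement`, whose definition did not survive in this copy. Here is a sketch, assuming a template with four positional `{}` placeholders. The wording of the template is hypothetical.

```python
sales_record = {'price': 3.24, 'num_items': 4, 'person': 'Chris'}

# Hypothetical template: one {} placeholder per argument passed to format().
sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'

message = sales_statement.format(sales_record['person'],
                                 sales_record['num_items'],
                                 sales_record['price'],
                                 sales_record['num_items'] * sales_record['price'])
print(message)
```

Placeholders are filled in order, so the number of arguments should match the number of `{}` fields.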
Let's import our datafile mpg.csv, which contains fuel economy data for 234 cars.
In [7]:
import csv
%precision 2
Out[7]:
[OrderedDict([('', '1'),
('manufacturer', 'audi'),
('model', 'a4'),
('displ', '1.8'),
('year', '1999'),
('cyl', '4'),
('trans', 'auto(l5)'),
('drv', 'f'),
('cty', '18'),
('hwy', '29'),
('fl', 'p'),
('class', 'compact')]),
OrderedDict([('', '2'),
('manufacturer', 'audi'),
('model', 'a4'),
('displ', '1.8'),
('year', '1999'),
('cyl', '4'),
csv.DictReader has read in each row of our csv file as a dictionary. len shows that our list consists of
234 dictionaries.
In [8]:
len(mpg)
Out[8]:
234
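The cell that loads the file is missing from this copy. As a rough sketch of how `csv.DictReader` turns rows into dictionaries, the example below reads a small in-memory CSV instead of mpg.csv. The sample text is made up and uses only a few of the columns; with the real file you would pass an open file object instead of `io.StringIO`.

```python
import csv
import io

# Made-up stand-in for mpg.csv with a subset of its columns.
sample_csv = """manufacturer,model,cyl,cty,hwy,class
audi,a4,4,18,29,compact
audi,a4,6,16,26,compact
"""

with io.StringIO(sample_csv) as csvfile:
    # Each row becomes a dictionary keyed by the header line.
    rows = list(csv.DictReader(csvfile))

print(len(rows))             # number of data rows
print(list(rows[0].keys()))  # column names
print(rows[0]['cty'])        # values are strings, not numbers
```

On Python 3.8 and later the rows are plain dict objects rather than OrderedDict, so the printed form may differ from the output shown above.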
In [9]:
mpg[0].keys()
Out[9]:
odict_keys(['', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans', 'drv', 'cty', 'hwy', 'fl', 'class'])
This is how to find the average cty fuel economy across all cars. All values in the dictionaries are strings, so we
need to convert to float.
In [11]:
Out[11]:
3945.00
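The averaging cell is missing here, so the following is only a sketch of one way to do it: convert each string to float, sum the values, and divide by the number of rows. The three rows are made-up stand-ins for the `mpg` list.

```python
# Made-up rows standing in for the list of dictionaries called mpg.
mpg = [
    {'cty': '18', 'hwy': '29'},
    {'cty': '21', 'hwy': '29'},
    {'cty': '15', 'hwy': '25'},
]

# Sum the converted values, then divide by the row count to get the mean.
avg_cty = sum(float(d['cty']) for d in mpg) / len(mpg)
avg_hwy = sum(float(d['hwy']) for d in mpg) / len(mpg)

print(avg_cty)
print(avg_hwy)
```

Without the division by `len(mpg)`, the result is the total rather than the average.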
Similarly this is how to find the average hwy fuel economy across all cars.
In [ ]:
Use set to return the unique values for the number of cylinders the cars in our dataset have.
In [13]:
Out[13]:
{'4', '5', '6', '8'}
Here's a more complex example where we are grouping the cars by number of cylinders, and finding the average
cty mpg for each group.
In [2]:
CtyMpgByCyl = []
CtyMpgByCyl.sort(key=lambda x: x[0])
CtyMpgByCyl
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-6a38e09e7d36> in <module>()
1 CtyMpgByCyl = []
2
----> 3 for c in cylinders: # iterate over all the cylinder levels
4 summpg = 0
5 cyltypecount = 0
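The NameError above arises because `cylinders` was never defined in that session. Below is a self-contained sketch of the grouping, reusing the variable names visible in the traceback. The rows are made up, and the inner loop body is an assumption about how the missing lines might look.

```python
# Made-up rows standing in for mpg; values are strings, as csv.DictReader returns them.
mpg = [
    {'cyl': '4', 'cty': '18'},
    {'cyl': '4', 'cty': '22'},
    {'cyl': '6', 'cty': '16'},
    {'cyl': '8', 'cty': '12'},
    {'cyl': '8', 'cty': '14'},
]

cylinders = set(d['cyl'] for d in mpg)  # unique cylinder counts

CtyMpgByCyl = []
for c in cylinders:              # iterate over all the cylinder levels
    summpg = 0
    cyltypecount = 0
    for d in mpg:                # iterate over all rows
        if d['cyl'] == c:        # the row belongs to this group
            summpg += float(d['cty'])
            cyltypecount += 1
    CtyMpgByCyl.append((c, summpg / cyltypecount))  # (group, average)

CtyMpgByCyl.sort(key=lambda x: x[0])  # sort by cylinder level
print(CtyMpgByCyl)
```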
Use set to return the unique values for the class types in our dataset.
In [15]:
Out[15]:
{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}
And here's an example of how to find the average hwy mpg for each class of vehicle in our dataset.
In [3]:
HwyMpgByClass = []
HwyMpgByClass.sort(key=lambda x: x[1])
HwyMpgByClass
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-3-189fb3d201ff> in <module>()
1 HwyMpgByClass = []
2
----> 3 for t in vehicleclass: # iterate over all the vehicle classes
4 summpg = 0
5 vclasscount = 0
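The same grouping can also be done in a single pass over the rows. This sketch swaps the nested loops for a `collections.defaultdict` that keeps a running total and count per class; it is an alternative approach, not the notebook's own code, and the rows are made up.

```python
from collections import defaultdict

# Made-up rows standing in for mpg.
mpg = [
    {'class': 'compact', 'hwy': '29'},
    {'class': 'compact', 'hwy': '27'},
    {'class': 'suv', 'hwy': '18'},
    {'class': 'pickup', 'hwy': '17'},
    {'class': 'suv', 'hwy': '20'},
]

totals = defaultdict(lambda: [0.0, 0])  # class -> [sum of hwy, row count]
for d in mpg:
    totals[d['class']][0] += float(d['hwy'])
    totals[d['class']][1] += 1

HwyMpgByClass = [(t, s / n) for t, (s, n) in totals.items()]
HwyMpgByClass.sort(key=lambda x: x[1])  # sort by average hwy mpg, lowest first
print(HwyMpgByClass)
```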
import datetime as dt
import time as tm
time returns the current time in seconds since the Epoch (January 1st, 1970).
In [9]:
tm.time()
Out[9]:
1573149245.2140586
In [10]:
dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow
Out[10]:
datetime.datetime(2019, 11, 7, 17, 54, 42, 650901)
In [14]:
Out[14]:
(2019, 11, 7, 17, 54, 42)
In [15]:
Out[15]:
datetime.timedelta(100)
In [16]:
today = dt.date.today()
In [17]:
Out[17]:
datetime.date(2019, 7, 30)
In [18]:
Out[18]:
True
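Several of the date cells above are missing their input. The outputs suggest they read the components of a datetime, build a 100-day timedelta, and compare two dates. Here is a sketch along those lines, using a fixed timestamp taken from the output above (instead of the current time) so the result is reproducible. The names `parts` and `earlier` are illustrative.

```python
import datetime as dt

# Fixed timestamp copied from the output above, used in place of the current time.
dtnow = dt.datetime(2019, 11, 7, 17, 54, 42, 650901)

# Individual components are available as attributes.
parts = (dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second)

delta = dt.timedelta(days=100)  # a duration of 100 days
today = dtnow.date()            # drop the time of day
earlier = today - delta         # the date 100 days before

print(parts)
print(earlier)
print(today > earlier)          # dates can be compared directly
```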
In [19]:
class Person:
    department = 'School of Information' #a class variable
In [20]:
person = Person()
person.set_name('Christopher Brooks')
person.set_location('Ann Arbor, MI, USA')
print('{} lives in {} and works in the department {}'.format(person.name, person.location, p
Christopher Brooks lives in Ann Arbor, MI, USA and works in the department School of Information
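The class definition above is cut off before the methods that the next cell calls. Here is a sketch of what a complete class could look like, assuming `set_name` and `set_location` simply store their argument on the instance. The method bodies and parameter names are assumptions.

```python
class Person:
    department = 'School of Information'  # a class variable, shared by all instances

    def set_name(self, new_name):          # assumed body: store the name on the instance
        self.name = new_name

    def set_location(self, new_location):  # assumed body: store the location on the instance
        self.location = new_location


person = Person()
person.set_name('Christopher Brooks')
person.set_location('Ann Arbor, MI, USA')

summary = '{} lives in {} and works in the department {}'.format(
    person.name, person.location, person.department)
print(summary)
```

Note that `person.department` is looked up on the class because the instance has no attribute of that name.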
In [21]:
Out[21]:
<map at 0x7f79f46617f0>
Now let's iterate through the map object to see the values.
In [22]:
9.0
11.0
12.34
2.01
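A short sketch of why a map object has to be iterated: `map` is lazy and produces values only on demand. The two price lists and their names are hypothetical, chosen so that the element-wise minimum can be checked by eye.

```python
# Hypothetical price lists for the same four items at two stores.
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

# map applies min to pairs of items; nothing is computed yet.
cheapest = map(min, store1, store2)

# Iterating (here via list) forces the computation and consumes the map.
result = list(cheapest)
print(result)

# A second pass finds nothing, because the map object is already exhausted.
leftover = list(cheapest)
print(leftover)
```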
Here's an example of lambda that takes in three parameters and adds the first two.
In [23]:
my_function = lambda a, b, c : a + b
In [24]:
my_function(1, 2, 3)
Out[24]:
3
In [25]:
my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
my_list
Out[25]:
[0,
2,
4,
6,
8,
10,
12,
14,
16,
18,
20,
22,
24,
26,
28,
30,
32,
34,
In [26]:
Out[26]:
[0,
2,
4,
6,
8,
10,
12,
14,
16,
18,
20,
22,
24,
26,
28,
30,
32,
34,
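The second cell above produces the same list, but its input is missing. One compact way to get that result is a list comprehension, sketched here next to the loop for comparison. The name `my_list_comp` is illustrative.

```python
# The explicit loop from the earlier cell.
my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)

# The same logic as a list comprehension: expression, loop, then filter.
my_list_comp = [number for number in range(0, 1000) if number % 2 == 0]

print(my_list == my_list_comp)
print(len(my_list_comp))
```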
import numpy as np
Creating Arrays
In [28]:
mylist = [1, 2, 3]
x = np.array(mylist)
x
Out[28]:
array([1, 2, 3])
In [29]:
y = np.array([4, 5, 6])
y
Out[29]:
array([4, 5, 6])
In [30]:
Out[30]:
array([[ 7, 8, 9],
[10, 11, 12]])
Use the shape attribute to find the dimensions of the array. (rows, columns)
In [31]:
m.shape
Out[31]:
(2, 3)
In [33]:
Out[33]:
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
reshape returns an array with the same data with a new shape.
In [34]:
Out[34]:
array([[ 0, 2, 4, 6, 8],
[10, 12, 14, 16, 18],
[20, 22, 24, 26, 28]])
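The inputs for the two cells above are missing. Their outputs are consistent with `np.arange` followed by `reshape`, so here is a sketch under that assumption. The names `grid` and `auto` are illustrative.

```python
import numpy as np

# Start at 0, count up by 2, and stop before 30.
n = np.arange(0, 30, 2)

# reshape returns a new array object; n itself keeps its original shape.
grid = n.reshape(3, 5)

# Passing -1 lets NumPy infer that dimension from the array's size.
auto = n.reshape(3, -1)

print(n.shape)
print(grid)
print(auto.shape)
```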
In [35]:
Out[35]:
array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])
In [36]:
o.resize(3, 3)
o
Out[36]:
array([[ 0. , 0.5, 1. ],
[ 1.5, 2. , 2.5],
[ 3. , 3.5, 4. ]])
ones returns a new array of given shape and type, filled with ones.
In [37]:
np.ones((3, 2))
Out[37]:
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])
zeros returns a new array of given shape and type, filled with zeros.
In [38]:
np.zeros((2, 3))
Out[38]:
eye returns a 2-D array with ones on the diagonal and zeros elsewhere.
In [39]:
np.eye(3)
Out[39]:
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
In [40]:
np.diag(y)
Out[40]:
array([[4, 0, 0],
[0, 5, 0],
[0, 0, 6]])
In [41]:
np.array([1, 2, 3] * 3)
Out[41]:
array([1, 2, 3, 1, 2, 3, 1, 2, 3])
In [42]:
np.repeat([1, 2, 3], 3)
Out[42]:
array([1, 1, 1, 2, 2, 2, 3, 3, 3])
Combining Arrays
In [43]:
Out[43]:
array([[1, 1, 1],
[1, 1, 1]])
In [44]:
np.vstack([p, 2*p])
Out[44]:
array([[1, 1, 1],
[1, 1, 1],
[2, 2, 2],
[2, 2, 2]])
In [45]:
np.hstack([p, 2*p])
Out[45]:
array([[1, 1, 1, 2, 2, 2],
[1, 1, 1, 2, 2, 2]])
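The cell that defines `p` is missing; its output is a 2x3 array of integer ones. A sketch under that assumption, showing how the two stacking functions change the shape:

```python
import numpy as np

# Assumed definition of p: a 2x3 array of integer ones, matching the output above.
p = np.ones([2, 3], int)

stacked_v = np.vstack([p, 2 * p])  # stack vertically: rows are appended
stacked_h = np.hstack([p, 2 * p])  # stack horizontally: columns are appended

print(stacked_v.shape)
print(stacked_h.shape)
```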
Operations
Use +, -, *, / and ** to perform element-wise addition, subtraction, multiplication, division and power.
In [46]:
[5 7 9]
[-3 -3 -3]
In [47]:
[ 4 10 18]
[ 0.25 0.4 0.5 ]
In [48]:
[1 4 9]
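The inputs for the three cells above are missing. Their outputs match element-wise operations on the arrays `x` and `y` defined earlier, which this sketch reproduces:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x + y)   # element-wise addition
print(x - y)   # element-wise subtraction
print(x * y)   # element-wise multiplication
print(x / y)   # element-wise division (always produces floats)
print(x ** 2)  # element-wise power
```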
Dot Product:
$$\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = x_1 y_1 + x_2 y_2 + x_3 y_3$$
In [49]:
Out[49]:
32
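A sketch of the computation behind the result above, assuming `x` and `y` are the arrays defined earlier. It shows the `dot` method, the `@` operator, and the explicit sum of products from the formula.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

by_method = x.dot(y)         # dot product via the array method
by_operator = x @ y          # the same product via the @ operator
by_formula = (x * y).sum()   # x1*y1 + x2*y2 + x3*y3 written out

print(by_method, by_operator, by_formula)
```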
In [50]:
z = np.array([y, y**2])
print(len(z)) # number of rows of array
Let's look at transposing arrays. Transposing permutes the dimensions of the array.
In [51]:
z = np.array([y, y**2])
z
Out[51]:
array([[ 4, 5, 6],
[16, 25, 36]])
In [52]:
z.shape
Out[52]:
(2, 3)
In [53]:
z.T
Out[53]:
array([[ 4, 16],
[ 5, 25],
[ 6, 36]])
In [54]:
z.T.shape
Out[54]:
(3, 2)
Use .dtype to see the data type of the elements in the array.
In [55]:
z.dtype
Out[55]:
dtype('int64')
In [56]:
z = z.astype('f')
z.dtype
Out[56]:
dtype('float32')
Math Functions
NumPy has many built-in math functions that can be performed on arrays.
In [ ]:
In [ ]:
a.sum()
In [ ]:
a.max()
In [ ]:
a.min()
In [ ]:
a.mean()
In [ ]:
a.std()
argmax and argmin return the index of the maximum and minimum values in the array.
In [ ]:
a.argmax()
In [ ]:
a.argmin()
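The cell defining `a` is missing and the cells above have no output. This sketch uses a small hypothetical array so the results can be checked by hand.

```python
import numpy as np

# Hypothetical array; the notebook's own definition of a is not shown.
a = np.array([-4, -2, 1, 3, 5])

print(a.sum())     # total of all elements
print(a.max())     # largest value
print(a.min())     # smallest value
print(a.mean())    # arithmetic mean
print(a.std())     # population standard deviation
print(a.argmax())  # index of the largest value
print(a.argmin())  # index of the smallest value
```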
Indexing / Slicing
In [ ]:
s = np.arange(13)**2
s
Use bracket notation to get the value at a specific index. Remember that indexing starts at 0.
In [ ]:
Leaving start or stop empty will default to the beginning/end of the array.
In [ ]:
s[1:5]
In [ ]:
s[-4:]
Here we are starting at the 5th element from the end, and counting backwards by 2 until the beginning of the
array is reached.
In [ ]:
s[-5::-2]
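The slicing cells above are shown without output. The sketch below evaluates the same expressions on `s`, plus two single-index lookups added for illustration.

```python
import numpy as np

s = np.arange(13) ** 2  # squares of 0 through 12

first = s[0]            # indexing starts at 0
fifth = s[4]            # the fifth element
middle = s[1:5]         # from index 1 up to, but not including, index 5
last_four = s[-4:]      # the last four elements
backwards = s[-5::-2]   # from the 5th element from the end, stepping back by 2

print(first, fifth)
print(middle)
print(last_four)
print(backwards)
```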
In [ ]:
r = np.arange(36)
r.resize((6, 6))
r
In [ ]:
r[2, 2]
In [ ]:
r[3, 3:6]
Here we are selecting all the rows up to (and not including) row 2, and all the columns up to (and not including)
the last column.
In [ ]:
r[:2, :-1]
This is a slice of the last row, and only every other element.
In [ ]:
r[-1, ::2]
We can also perform conditional indexing. Here we are selecting values from the array that are greater than 30.
(Also see np.where)
In [ ]:
Here we are assigning all values in the array that are greater than 30 to the value of 30.
In [ ]:
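The two cells above are empty in this copy. Here is a sketch of conditional indexing on a fresh 6x6 array like `r`: a boolean mask first selects values and then serves as an assignment target. The names `mask` and `selected` are illustrative.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

mask = r > 30        # boolean array with the same shape as r
selected = r[mask]   # 1-D array of the values where the mask is True
print(selected)

r[r > 30] = 30       # assign 30 to every element greater than 30
print(r.max())
```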
Copying Data
r2 is a slice of r
In [ ]:
r2 = r[:3,:3]
r2
Set this slice's values to zero ([:] selects the entire array)
In [ ]:
r2[:] = 0
r2
In [ ]:
To avoid this, use r.copy to create a copy that will not affect the original array
In [ ]:
r_copy = r.copy()
r_copy
In [ ]:
r_copy[:] = 10
print(r_copy, '\n')
print(r)
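A compact sketch of the difference shown above, on a fresh array: a basic slice is a view that shares memory with the original, while `copy` gives an independent array. `np.shares_memory` is used here only to make the sharing visible.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

r2 = r[:3, :3]   # a slice is a view onto r's data
r2[:] = 0        # writing through the view changes r as well
print(r[0, 0], r[2, 2])
print(np.shares_memory(r, r2))

r_copy = r.copy()   # an independent copy of the data
r_copy[:] = 10      # r is not affected by this assignment
print(r[5, 5], r_copy[5, 5])
print(np.shares_memory(r, r_copy))
```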
In [57]:
Out[57]:
array([[3, 3, 2],
[4, 5, 9],
[3, 1, 2],
[3, 7, 6]])
Iterate by row:
In [58]:
[3 3 2]
[4 5 9]
[3 1 2]
[3 7 6]
Iterate by index:
In [59]:
for i in range(len(test)):
    print(test[i])
[3 3 2]
[4 5 9]
[3 1 2]
[3 7 6]
In [60]:
row 0 is [3 3 2]
row 1 is [4 5 9]
row 2 is [3 1 2]
row 3 is [3 7 6]
In [61]:
test2 = test**2
test2
Out[61]:
array([[ 9, 9, 4],
[16, 25, 81],
[ 9, 1, 4],
[ 9, 49, 36]])
In [62]:
[3 3 2] + [9 9 4] = [12 12 6]
[4 5 9] + [16 25 81] = [20 30 90]
[3 1 2] + [9 1 4] = [12 2 6]
[3 7 6] + [ 9 49 36] = [12 56 42]
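Some of the iteration cells above are missing their input. The outputs are consistent with `enumerate` (row together with its index) and `zip` (two arrays side by side), so this sketch assumes those. `test` is entered literally from the output above, and the `labels` and `sums` lists are illustrative additions.

```python
import numpy as np

# Values copied from the output above.
test = np.array([[3, 3, 2],
                 [4, 5, 9],
                 [3, 1, 2],
                 [3, 7, 6]])
test2 = test ** 2

# enumerate yields each row together with its index.
labels = []
for i, row in enumerate(test):
    labels.append(i)
    print('row', i, 'is', row)

# zip pairs up the rows of both arrays.
sums = []
for i, j in zip(test, test2):
    sums.append((i + j).tolist())
    print(i, '+', j, '=', i + j)
```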