Hello Python!
INTRODUCTION TO PYTHON
Hugo Bowne-Anderson
Data Scientist at DataCamp
Variables and Types
Variable
Specific, case-sensitive name
Call up value through variable name
1.79 m - 68.7 kg
height = 1.79
weight = 68.7
height
1.79
Calculate BMI
height = 1.79
weight = 68.7
height
1.79
BMI = weight / height^2
68.7 / 1.79 ** 2
21.4413
weight / height ** 2
21.4413
bmi = weight / height ** 2
bmi
21.4413
Reproducibility
height = 1.79
weight = 68.7
bmi = weight / height ** 2
print(bmi)
21.4413
Reproducibility
height = 1.79
weight = 74.2 # <-
bmi = weight / height ** 2
print(bmi)
23.1578
Python Types
type(bmi)
float
day_of_week = 5
type(day_of_week)
int
Python Types (2)
x = "body mass index"
y = 'this works too'
type(y)
str
z = True
type(z)
bool
Python Types (3)
2 + 3
5
'ab' + 'cd'
'abcd'
Different type = different behavior!
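A minimal sketch of the point above: the same `+` operator adds numbers but concatenates strings. The variable names (`int_sum`, `str_sum`, `mixed_sum`) are illustrative and not from the slides.

```python
# The same + operator behaves differently depending on operand types.
int_sum = 2 + 3          # integer addition
str_sum = 'ab' + 'cd'    # string concatenation
mixed_sum = 2 + 3.5      # int + float gives a float

for value in (int_sum, str_sum, mixed_sum):
    print(value, type(value))

# Mixing incompatible types raises an error instead of guessing.
try:
    'ab' + 2
except TypeError as err:
    print("TypeError:", err)
```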
Python Lists
Python Data Types
float - real numbers
int - integer numbers
str - string, text
bool - True, False
height = 1.73
tall = True
Each variable represents a single value
Problem
Data Science: many data points
Height of entire family
height1 = 1.73
height2 = 1.68
height3 = 1.71
height4 = 1.89
Inconvenient
Python List
[a, b, c]
[1.73, 1.68, 1.71, 1.89]
[1.73, 1.68, 1.71, 1.89]
fam = [1.73, 1.68, 1.71, 1.89]
fam
[1.73, 1.68, 1.71, 1.89]
Name a collection of values
Contain any type
Contain different types
Python List
[a, b, c]
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
fam
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam2 = [["liz", 1.73],
["emma", 1.68],
["mom", 1.71],
["dad", 1.89]]
fam2
[['liz', 1.73], ['emma', 1.68], ['mom', 1.71], ['dad', 1.89]]
List type
type(fam)
list
type(fam2)
list
Specific functionality
Specific behavior
Subsetting lists
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
fam
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam[3]
1.68
Subsetting lists
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam[6]
'dad'
fam[-1]
1.89
fam[7]
1.89
List slicing
fam
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam[3:5]
[1.68, 'mom']
fam[1:4]
[1.73, 'emma', 1.68]
List slicing
fam
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam[:4]
['liz', 1.73, 'emma', 1.68]
fam[5:]
[1.71, 'dad', 1.89]
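The slicing rules above can be sketched in a few lines: the start index is included, the end index is excluded. The step-based slices at the end go slightly beyond the slides and are added only as an illustration; all variable names other than `fam` are made up for this example.

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

middle = fam[3:5]      # indexes 3 and 4: [1.68, 'mom']
first_four = fam[:4]   # from the start up to (not including) index 4
from_five = fam[5:]    # from index 5 to the end

# Optional third value is a step (illustrative extra, not on the slides).
names = fam[0::2]      # every second element starting at index 0
heights = fam[1::2]    # every second element starting at index 1

print(middle, first_four, from_five)
print(names, heights)
```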
Manipulating Lists
List Manipulation
Change list elements
Add list elements
Remove list elements
Changing list elements
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
fam
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam[7] = 1.86
fam
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86]
fam[0:2] = ["lisa", 1.74]
fam
['lisa', 1.74, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86]
Adding and removing elements
fam + ["me", 1.79]
['lisa', 1.74, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86, 'me', 1.79]
fam_ext = fam + ["me", 1.79]
del(fam[2])
fam
['lisa', 1.74, 1.68, 'mom', 1.71, 'dad', 1.86]
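A short sketch of one detail that is easy to miss here: `+` builds a new list and leaves `fam` untouched, while `del` changes `fam` in place and shifts later elements one position to the left. Deleting the same index a second time is an assumption about what a reader would want next (removing the leftover height), not something the slides show.

```python
fam = ["lisa", 1.74, "emma", 1.68, "mom", 1.71, "dad", 1.86]

fam_ext = fam + ["me", 1.79]   # new list; fam itself is unchanged
print(len(fam), len(fam_ext))

del fam[2]                     # removes "emma"; 1.68 slides into index 2
print(fam)

del fam[2]                     # removes the now-orphaned height 1.68
print(fam)
```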
Behind the scenes (1)
x = ["a", "b", "c"]
y = x
y[1] = "z"
y
['a', 'z', 'c']
x
['a', 'z', 'c']
Behind the scenes (2)
x = ["a", "b", "c"]
y = list(x)
y = x[:]
y[1] = "z"
x
['a', 'b', 'c']
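The two "Behind the scenes" slides can be combined into one sketch: plain assignment copies the reference, while `list(x)` and `x[:]` each create a new list. The names `alias`, `copy_a`, and `copy_b` are illustrative only.

```python
x = ["a", "b", "c"]

alias = x          # same list object under a second name
copy_a = list(x)   # new list with the same elements
copy_b = x[:]      # slicing the whole list also makes a new list

alias[1] = "z"     # changes the one list that x and alias share

print(x)           # the change is visible through x
print(copy_a)      # the copies are unaffected
print(copy_b)
print(alias is x, copy_a is x)
```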
Functions
Nothing new!
type()
Piece of reusable code
Solves particular task
Call function instead of writing code yourself
Example
fam = [1.73, 1.68, 1.71, 1.89]
fam
[1.73, 1.68, 1.71, 1.89]
max(fam)
1.89
tallest = max(fam)
tallest
1.89
round()
round(1.68, 1)
1.7
round(1.68)
2
help(round) # Open up documentation
Help on built-in function round in module builtins:
round(number, ndigits=None)
Round a number to a given precision in decimal digits.
The return value is an integer if ndigits is omitted or None.
Otherwise the return value has the same type as the number. ndigits may be negative.
round(number)
round(number, ndigits)
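As a small sketch of the two call forms above, the following shows how the optional `ndigits` argument changes both the value and the type of the result, as described in the help text. The variable names are illustrative.

```python
one_decimal = round(1.68, 1)   # ndigits given: result keeps the float type
no_digits = round(1.68)        # ndigits omitted: result is an int
hundreds = round(1234.5, -2)   # negative ndigits rounds left of the decimal point

print(one_decimal, type(one_decimal))
print(no_digits, type(no_digits))
print(hundreds, type(hundreds))
```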
Find functions
How to know?
Standard task -> probably function exists!
The internet is your friend
Methods
Built-in Functions
Maximum of list: max()
Length of list or string: len()
Get index in list: ?
Reversing a list: ?
Back 2 Basics
sister = "liz"
height = 1.73
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
Methods: Functions that belong to objects
list methods
fam
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam.index("mom") # "Call method index() on fam"
4
fam.count(1.73)
1
str methods
sister
'liz'
sister.capitalize()
'Liz'
sister.replace("z", "sa")
'lisa'
Methods
Everything = object
Objects have methods associated, depending on type
sister.replace("z", "sa")
'lisa'
fam.replace("mom", "mommy")
AttributeError: 'list' object has no attribute 'replace'
Methods
sister.index("z")
2
fam.index("mom")
4
Methods (2)
fam
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam.append("me")
fam
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89, 'me']
fam.append(1.79)
fam
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89, 'me', 1.79]
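A hedged sketch that pulls the method slides together: some methods return a new value and leave the object alone (`str.replace`), while others change the object in place and return `None` (`list.append`, `list.reverse`). Using `reverse()` for the "Reversing a list" question is one possible answer, and the helper names (`renamed`, `result`, `reversed_fam`) are made up for this example.

```python
sister = "liz"
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# String methods return a new string; the original is unchanged.
renamed = sister.replace("z", "sa")
print(sister, renamed, sister.capitalize(), sister.index("z"))

# List methods that query the list.
print(fam.index("mom"), fam.count(1.73))

# append() changes the list in place and returns None.
result = fam.append("me")
print(result, fam[-1])

# reverse() also works in place, so reverse a copy to keep fam intact.
reversed_fam = list(fam)
reversed_fam.reverse()
print(reversed_fam[0])

# Methods depend on the type: lists have no replace().
try:
    fam.replace("mom", "mommy")
except AttributeError as err:
    print("AttributeError:", err)
```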
Summary
Functions
type(fam)
list
Methods: call functions on objects
fam.index("dad")
6
Packages
Motivation
Functions and methods are powerful
All code in Python distribution?
Huge code base: messy
Lots of code you won't use
Maintenance problem
Packages
Directory of Python Scripts
Each script = module
Specify functions, methods, types
Thousands of packages available
NumPy
Matplotlib
scikit-learn
Install package
http://pip.readthedocs.org/en/stable/installing/
Download get-pip.py
Terminal:
python3 get-pip.py
pip3 install numpy
Import package
import numpy
array([1, 2, 3])
NameError: name 'array' is not defined
numpy.array([1, 2, 3])
array([1, 2, 3])
import numpy as np
np.array([1, 2, 3])
array([1, 2, 3])
from numpy import array
array([1, 2, 3])
array([1, 2, 3])
from numpy import array
my_script.py
from numpy import array
fam = ["liz", 1.73, "emma", 1.68,
"mom", 1.71, "dad", 1.89]
...
fam_ext = fam + ["me", 1.79]
...
print(str(len(fam_ext)) + " elements in fam_ext")
...
np_fam = array(fam_ext) # Using NumPy, but not very clear
import numpy
import numpy as np
fam = ["liz", 1.73, "emma", 1.68,
"mom", 1.71, "dad", 1.89]
...
fam_ext = fam + ["me", 1.79]
...
print(str(len(fam_ext)) + " elements in fam_ext")
...
np_fam = np.array(fam_ext) # Clearly using NumPy
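The three import styles from the slides can be compared side by side. This sketch assumes NumPy is installed (for example via `pip3 install numpy`); the variable names `a`, `b`, and `c` are illustrative.

```python
import numpy                   # access as numpy.array
import numpy as np             # same package under a shorter alias
from numpy import array        # brings only array into the namespace

a = numpy.array([1, 2, 3])
b = np.array([1, 2, 3])
c = array([1, 2, 3])           # works, but hides where array comes from

# All three names refer to the same function in the same package.
print(np is numpy, array is np.array)
print(a, b, c)
```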
NumPy
Lists Recap
Powerful
Collection of values
Hold different types
Change, add, remove
Need for Data Science
Mathematical operations over collections
Speed
Illustration
height = [1.73, 1.68, 1.71, 1.89, 1.79]
height
[1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
weight
[65.4, 59.2, 63.6, 88.4, 68.7]
weight / height ** 2
TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
Solution: NumPy
Numeric Python
Alternative to Python List: NumPy Array
Calculations over entire arrays
Easy and Fast
Installation
In the terminal: pip3 install numpy
NumPy
import numpy as np
np_height = np.array(height)
np_height
array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array(weight)
np_weight
array([65.4, 59.2, 63.6, 88.4, 68.7])
bmi = np_weight / np_height ** 2
bmi
array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])
Comparison
height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
weight / height ** 2
TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
np_height = np.array(height)
np_weight = np.array(weight)
np_weight / np_height ** 2
array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])
NumPy: remarks
np.array([1.0, "is", True])
array(['1.0', 'is', 'True'], dtype='<U32')
NumPy arrays: contain only one type
NumPy: remarks
python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])
python_list + python_list
[1, 2, 3, 1, 2, 3]
numpy_array + numpy_array
array([2, 4, 6])
Different types: different behavior!
NumPy Subsetting
bmi
array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])
bmi[1]
20.975
bmi > 23
array([False, False, False, True, False])
bmi[bmi > 23]
array([24.7473475])
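A minimal end-to-end sketch of the NumPy slides, assuming NumPy is installed: element-wise BMI, the equivalent plain-Python loop for comparison, and boolean subsetting. The names `bmi_list`, `mask`, and `high_bmi` are illustrative and not from the slides.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# With arrays, the arithmetic is applied element by element.
np_height = np.array(height)
np_weight = np.array(weight)
bmi = np_weight / np_height ** 2

# The same calculation with plain lists needs an explicit loop.
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]

# A comparison yields a boolean array, which can then select elements.
mask = bmi > 23
high_bmi = bmi[mask]

print(bmi)
print(mask)
print(high_bmi)
```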
2D NumPy Arrays
Type of NumPy Arrays
import numpy as np
np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
type(np_height)
numpy.ndarray
type(np_weight)
numpy.ndarray
2D NumPy Arrays
np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
[65.4, 59.2, 63.6, 88.4, 68.7]])
np_2d
array([[ 1.73, 1.68, 1.71, 1.89, 1.79],
[65.4 , 59.2 , 63.6 , 88.4 , 68.7 ]])
np_2d.shape
(2, 5) # 2 rows, 5 columns
np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
[65.4, 59.2, 63.6, 88.4, "68.7"]])
array([['1.73', '1.68', '1.71', '1.89', '1.79'],
['65.4', '59.2', '63.6', '88.4', '68.7']], dtype='<U32')
Subsetting
            0      1      2      3      4
array([[ 1.73,  1.68,  1.71,  1.89,  1.79],   0
       [ 65.4,  59.2,  63.6,  88.4,  68.7]])  1
np_2d[0]
array([1.73, 1.68, 1.71, 1.89, 1.79])
np_2d[0][2]
1.71
np_2d[0, 2]
1.71
np_2d[:, 1:3]
array([[ 1.68, 1.71],
[59.2 , 63.6 ]])
np_2d[1, :]
array([65.4, 59.2, 63.6, 88.4, 68.7])
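The 2D subsetting forms above can be tried together in one sketch, assuming NumPy is installed. The index before the comma selects rows and the index after it selects columns. The names `element`, `middle_cols`, `weights`, and `bmi` are illustrative.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

print(np_2d.shape)            # (rows, columns)

element = np_2d[0, 2]         # row 0, column 2; same as np_2d[0][2]
middle_cols = np_2d[:, 1:3]   # all rows, columns 1 and 2
weights = np_2d[1, :]         # the whole second row

# Rows are ordinary 1D arrays, so element-wise math still applies.
bmi = np_2d[1] / np_2d[0] ** 2

print(element)
print(middle_cols)
print(weights)
print(bmi)
```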
NumPy: Basic Statistics
Data analysis
Get to know your data
Little data -> simply look at it
Big data -> ?
City-wide survey
import numpy as np
np_city = ... # Implementation left out
np_city
array([[1.64, 71.78],
[1.37, 63.35],
[1.6 , 55.09],
...,
[2.04, 74.85],
[2.04, 68.72],
[2.01, 73.57]])
NumPy
np.mean(np_city[:, 0])
1.7472
np.median(np_city[:, 0])
1.75
NumPy
np.corrcoef(np_city[:, 0], np_city[:, 1])
array([[ 1. , -0.01802],
[-0.01803, 1. ]])
np.std(np_city[:, 0])
0.1992
sum(), sort(), ...
Enforce single data type: speed!
Generate data
Arguments for np.random.normal()
distribution mean
distribution standard deviation
number of samples
height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))
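A reproducible variant of the data generation above, followed by the summary statistics from the earlier slides. It assumes NumPy is installed and swaps in a seeded generator from `np.random.default_rng` in place of `np.random.normal`, so repeated runs give the same numbers; the seed value 42 is arbitrary. The results will not match the figures on the slides, because the original data were generated without a known seed.

```python
import numpy as np

rng = np.random.default_rng(42)   # seeded generator (assumption, for repeatability)

# Arguments: distribution mean, standard deviation, number of samples.
height = np.round(rng.normal(1.75, 0.20, 5000), 2)
weight = np.round(rng.normal(60.32, 15, 5000), 2)

# Two columns: height in column 0, weight in column 1.
np_city = np.column_stack((height, weight))
print(np_city.shape)

mean_height = np.mean(np_city[:, 0])
median_height = np.median(np_city[:, 0])
std_height = np.std(np_city[:, 0])
corr = np.corrcoef(np_city[:, 0], np_city[:, 1])

print(mean_height, median_height, std_height)
print(corr)
```

Because height and weight are drawn independently here, the off-diagonal correlation should come out close to zero, as on the slide.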