Ch4 Slides Python Learn
NumPy
Intro to Python for Data Science
Lists Recap
● Powerful
● Collection of values
● Hold different types
● Change, add, remove
● Need for Data Science
● Mathematical operations over collections
● Speed
Illustration
In [1]: height = [1.73, 1.68, 1.71, 1.89, 1.79]
In [2]: height
Out[2]: [1.73, 1.68, 1.71, 1.89, 1.79]
In [3]: weight = [65.4, 59.2, 63.6, 88.4, 68.7]
In [4]: weight
Out[4]: [65.4, 59.2, 63.6, 88.4, 68.7]
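The need for "mathematical operations over collections" can be made concrete with a small sketch. It assumes the goal is a BMI per person (weight divided by height squared), as the later slides suggest. Plain lists do not support this arithmetic, so the attempt raises a `TypeError`. The list comprehension is only an illustrative workaround and is not part of the original slides.

```python
height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Arithmetic on whole lists is not defined, so this raises TypeError.
try:
    bmi = weight / height ** 2
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Illustrative pure-Python workaround: compute one element at a time.
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]
print(bmi_list)
```

The workaround runs, but it is verbose and slow for large collections, which is the gap NumPy is meant to fill.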
Solution: NumPy
● Numeric Python
● Alternative to Python List: NumPy Array
● Calculations over entire arrays
● Easy and Fast
● Installation
● In the terminal: pip3 install numpy
NumPy
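In [6]: import numpy as np
In [7]: np_height = np.array(height)
In [8]: np_height
Out[8]: array([ 1.73, 1.68, 1.71, 1.89, 1.79])
In [9]: np_weight = np.array(weight)
In [10]: np_weight
Out[10]: array([ 65.4, 59.2, 63.6, 88.4, 68.7])
In [11]: bmi = np_weight / np_height ** 2
In [12]: bmi
Out[12]: array([ 21.852, 20.975, 21.75 , 24.747, 21.441])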
Element-wise calculations
● First element of bmi = 65.4 / 1.73 ** 2
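As a runnable sketch of the element-wise calculation, the snippet below rebuilds the two arrays from the values shown on the slides and computes the BMI for all five people at once. The name `first_bmi` is introduced here for illustration only.

```python
import numpy as np

np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])

# The operators are applied element by element across both arrays.
bmi = np_weight / np_height ** 2
print(bmi)

# The first element is the same calculation done on scalars.
first_bmi = 65.4 / 1.73 ** 2
print(first_bmi)
```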
Comparison
In [13]: height = [1.73, 1.68, 1.71, 1.89, 1.79]
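One way to see the comparison is to apply the same operator to lists and to arrays. This is a minimal sketch using the values from the earlier slides. The choice of `+` as the operator is an assumption, made because it is defined for both types but behaves differently.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Lists: + concatenates, giving one list of 10 values.
list_sum = height + weight
print(list_sum)

# Arrays: + adds element by element, giving an array of 5 values.
np_sum = np.array(height) + np.array(weight)
print(np_sum)
```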
NumPy: remarks
In [19]: np.array([1.0, "is", True])
Out[19]:
array(['1.0', 'is', 'True'],
dtype='<U32')
NumPy arrays: contain only one type
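The single-type rule can be checked through the `dtype` attribute. The sketch below repeats the mixed example from the slide and adds a second, purely numeric example (`numeric`, an illustrative name not in the slides) to show that NumPy converts to a common numeric type when one exists.

```python
import numpy as np

# Float, string, and boolean: everything is converted to a string.
mixed = np.array([1.0, "is", True])
print(mixed, mixed.dtype)

# Integer, float, and boolean: everything is converted to a float.
numeric = np.array([1, 2.5, True])
print(numeric, numeric.dtype)
```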
NumPy Subsetting
In [24]: bmi
Out[24]: array([ 21.852, 20.975, 21.75 , 24.747, 21.441])
In [25]: bmi[1]
Out[25]: 20.975
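Besides integer indexes, NumPy arrays can also be subsetted with an array of booleans. The sketch below is an addition to the slide content. The threshold of 23 and the names `mask` and `high` are chosen only for illustration.

```python
import numpy as np

bmi = np.array([21.852, 20.975, 21.75, 24.747, 21.441])

# Integer index, as on the slide.
second = bmi[1]
print(second)

# A comparison yields a boolean array of the same length.
mask = bmi > 23
print(mask)

# Using that boolean array as an index keeps only the True positions.
high = bmi[mask]
print(high)
```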
Let’s practice!
2D NumPy Arrays
In [4]: type(np_height)
Out[4]: numpy.ndarray
ndarray = N-dimensional array
In [5]: type(np_weight)
Out[5]: numpy.ndarray
2D NumPy Arrays
In [6]: np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
[65.4, 59.2, 63.6, 88.4, 68.7]])
In [7]: np_2d
Out[7]:
array([[ 1.73, 1.68, 1.71, 1.89, 1.79],
[ 65.4 , 59.2 , 63.6 , 88.4 , 68.7 ]])
In [8]: np_2d.shape
Out[8]: (2, 5)
2 rows, 5 columns
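The shape can be unpacked directly into a row count and a column count. The sketch below also reads two related attributes, `ndim` and `size`, which are real `ndarray` attributes but are not mentioned on the slides.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

# shape is a tuple: (number of rows, number of columns).
n_rows, n_cols = np_2d.shape
print(n_rows, n_cols)

# ndim is the number of dimensions, size the total number of elements.
print(np_2d.ndim, np_2d.size)
```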
Subsetting
         0      1      2      3      4
array([[ 1.73,  1.68,  1.71,  1.89,  1.79],    0
       [ 65.4,  59.2,  63.6,  88.4,  68.7]])   1
In [10]: np_2d[0]
Out[10]: array([ 1.73, 1.68, 1.71, 1.89, 1.79])
In [11]: np_2d[0][2]
Out[11]: 1.71
In [12]: np_2d[0,2]
Out[12]: 1.71
In [13]: np_2d[:,1:3]
Out[13]:
array([[ 1.68, 1.71],
[ 59.2 , 63.6 ]])
In [14]: np_2d[1,:]
Out[14]: array([ 65.4, 59.2, 63.6, 88.4, 68.7])
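The subsetting steps can be collected into one runnable sketch. The final line, which computes the BMI from the two rows, is an addition that ties back to the earlier BMI example. All variable names other than `np_2d` are illustrative.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

# Two equivalent ways to reach row 0, column 2.
chained = np_2d[0][2]
direct = np_2d[0, 2]

# All rows, columns 1 and 2 (the end index 3 is excluded).
middle_cols = np_2d[:, 1:3]

# Row 1, all columns: the weights.
weights = np_2d[1, :]

# Rows behave like 1D arrays, so element-wise math still applies.
bmi = np_2d[1, :] / np_2d[0, :] ** 2
print(chained, direct, middle_cols, weights, bmi, sep="\n")
```

The `[row, column]` form is usually preferred over chained brackets because it selects in a single step.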
Let’s practice!
Data analysis
● Get to know your data
● Little data -> simply look at it
● Big data -> ?
City-wide survey
In [1]: import numpy as np
In [3]: np_city
Out[3]:
array([[ 1.64, 71.78],
[ 1.37, 63.35],
[ 1.6 , 55.09],
...,
[ 2.04, 74.85],
[ 2.04, 68.72],
[ 2.01, 73.57]])
NumPy
In [4]: np.mean(np_city[:,0])
Out[4]: 1.7472
In [5]: np.median(np_city[:,0])
Out[5]: 1.75
In [7]: np.std(np_city[:,0])
Out[7]: 0.1992
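The full `np_city` data is not reproduced on the slides, so the sketch below applies the same functions to a small stand-in array, `np_sample`, built from the five height and weight values used earlier. It assumes the same layout as `np_city`: heights in column 0, weights in column 1.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Stand-in for np_city: one row per person, columns are height and weight.
np_sample = np.column_stack((height, weight))

mean_height = np.mean(np_sample[:, 0])
median_height = np.median(np_sample[:, 0])
# np.std computes the population standard deviation by default (ddof=0).
std_height = np.std(np_sample[:, 0])

print(mean_height, median_height, std_height)
```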
Generate data
● Arguments: distribution mean, distribution standard dev., number of samples
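The slide's code did not survive, so the following is only a sketch of how such data could be generated. It assumes a normal distribution drawn through NumPy's `Generator.normal`, which takes a mean, a standard deviation, and a number of samples. The parameter values, the seed, and the sample count of 5000 are illustrative assumptions, not figures confirmed by the slides.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (seed is arbitrary).
rng = np.random.default_rng(seed=42)
n_samples = 5000  # assumed number of samples

# normal(distribution mean, distribution standard dev., number of samples)
height = np.round(rng.normal(1.75, 0.20, n_samples), 2)   # assumed parameters
weight = np.round(rng.normal(60.32, 15, n_samples), 2)    # assumed parameters

# Combine into one 2D array: column 0 is height, column 1 is weight.
np_city = np.column_stack((height, weight))

print(np_city.shape)
print(np.mean(np_city[:, 0]), np.std(np_city[:, 0]))
```

With this many samples, the computed mean and standard deviation of the height column should land close to the values passed in.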
Let’s practice!