python-intro
October 9, 2019
1 Python For Data Science
Felix Biessmann
Lecture 1 - Introduction
2 Short intro
• Name
• Educational background (Physics/Computer Science/…?)
• Work experience
• Experience with Python? If yes, what project?
• I’m Felix Biessmann (ß = ss)
• BSc in Cognitive Science
– Computational Linguistics
– Neuroscience
• MSc in Neuroscience
– Brain-Computer-Interfaces
– Functional Magnetic Resonance Imaging data
• PhD in Machine Learning
– Biomedical Applications
– Multimodal neuroimaging data
– Text analysis on web data
• Assistant Professor for Machine Learning at Korea University, Seoul
• Amazon Research
– Computer Vision
– Recommender Systems
– Machine Learning Infrastructure
– ML for Data Quality
• Einstein Center for Digital Future / Beuth
– Data Quality
– ML for Humans
2.1 Useful Resources
• Jake Vanderplas: Whirlwind Tour of Python
• Jake Vanderplas: Python Data Science Handbook
• Wes McKinney: Python for Data Analysis
• Andreas Mueller: Introduction to ML with Python
• Joel Grus: Data Science from Scratch
• Scikit-Learn Documentation
• stackoverflow
3 Why Python?
3.0.1 Popularity
Python is the fastest growing programming language
3.0.2 Popularity
https://stackoverflow.blog/2017/09/06/incredible-growth-python/
3.0.3 Popularity
According to a comprehensive Stack Overflow survey, Python is the most wanted language for the
second year in a row, meaning that it is the language that developers who do not yet use it most
often say they want to learn.
3.0.4 Popularity
• Simple
• Versatile
• No boilerplate code
• De facto standard for Data Science / Machine Learning
• In all technical interviews I conducted at Amazon, candidates chose Python
3.0.5 Libraries
• NumPy
• SciPy
• Matplotlib
• IPython
• Pandas
• sklearn
• tensorflow
• pytorch
• mxnet
• …
3.0.6 Time
Programmer time is more expensive than CPU time.
4 The Zen of Python
Python lovers are quick to point out how intuitive Python is.
But intuition is strongly related to familiarity.
Many developers used to traditional (compiled, typed) languages won’t find Python intuitive.
If you want to dive deep into the philosophy behind Python, try this easter egg in a Python shell:
[1]: import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
4.1 How to Install Python
• Most systems come with (an outdated version of) Python pre-installed
• Download binaries for your system at https://www.python.org/downloads/
• You can use system-specific package managers (apt for Linux, Homebrew for OSX)
• Or you can use Anaconda (preferred if you don’t like fiddling with installations)
4.2 Python 2 vs 3
• The days of Python 2 are numbered
• Use Python 3
• Most visible difference: print is a statement in Python 2 and a function in Python 3
– Python 2:
print "foobar"
– Python 3:
print("foobar")
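Because print is an ordinary function in Python 3, it accepts keyword arguments such as sep, end and file. The following is a minimal sketch of that; the helper name render is hypothetical and only used for illustration.

```python
import io


def render(*values, sep=" ", end="\n"):
    """Return the text that print() would write (hypothetical helper)."""
    buffer = io.StringIO()
    # print is a function, so separator, line ending and target are plain arguments
    print(*values, sep=sep, end=end, file=buffer)
    return buffer.getvalue()


print(render("foo", "bar", sep="-"), end="")
```

In Python 2 the same call syntax is only available after from __future__ import print_function.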
5 Virtual Environments
• For different projects you will need different dependencies (or versions thereof)
• It is good practice to encapsulate your dependencies in a virtual environment
• Virtualenv (python documentation)
python3 -m venv [path_to_venv]
source [path_to_venv]/bin/activate
• Anaconda
conda create -n my_env python=3.7
conda activate my_env
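To check from within Python whether a virtual environment is active, one option is to compare sys.prefix with sys.base_prefix. This is a sketch for environments created with venv; the function name in_virtualenv is an assumption for illustration, and conda environments are organised differently, so they may not be detected this way.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points to the environment directory,
    # while sys.base_prefix still points to the base installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("Environment location:", sys.prefix)
```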
6 How to Install Python Packages
To avoid cluttered dependencies, you might want to install packages only in a virtual environment,
not system-wide.
• For plain python
– pip install [packagename]
• For Anaconda installations
– conda install [packagename]
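After installing, it can be useful to check whether a package is visible to the interpreter you are running. A minimal sketch using the standard library; the helper name is_installed is hypothetical, and numpy is only used as an example name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be imported."""
    # find_spec returns None when no such top-level module is found
    return importlib.util.find_spec(module_name) is not None


for name in ["json", "numpy"]:
    print(name, "available:", is_installed(name))
```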
7 How to Run Python Code
7.1 How to Run Python Code
• Shell command $ python [somefilename].py
• Interactively
– Python Interpreter $ python
– IPython Interpreter $ ipython
– Jupyter Notebooks $ jupyter notebook
• Make sure you use the right python/ipython/jupyter binaries!
We will discuss pros and cons after looking into each of those options
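One way to verify which binary is actually running is to ask the interpreter itself. The sketch below works the same way in a Python shell, an IPython shell, a notebook cell or a script.

```python
import platform
import sys

# Absolute path of the interpreter that executes this code
print("Interpreter:", sys.executable)
# Version of that interpreter, e.g. to confirm that it is Python 3
print("Version:", platform.python_version())
```

If the printed path does not point into your virtual environment, the environment is probably not activated or the tool was installed elsewhere.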
7.2 Setup Python in Room D132L
Open a terminal (e.g. by hitting cmd-space, typing terminal and pressing enter)
Create a new virtual environment and activate it
python3 -m venv venv
source venv/bin/activate
Upgrade the python package manager and install jupyter
pip install --upgrade pip
pip install prompt_toolkit==2.0.3
pip install jupyter
7.3 The Python Interpreter
Type python at the command prompt:
$ python
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:24:55)
Type "help", "copyright", "credits" or "license" for more information.
>>>
You can type and execute code snippets:
>>> 1 + 1
2
>>> x = 5
>>> x * 3
15
You can run python code files:
exec(open('some_file.py').read())
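An alternative to exec from the standard library is runpy.run_path. The sketch below creates a throwaway script first so that it is self-contained; the file contents are made up for illustration. Unlike exec, run_path does not put the script's variables into the current namespace but returns them as a dictionary.

```python
import os
import runpy
import tempfile

# Create a small throwaway script so the example is self-contained
with tempfile.TemporaryDirectory() as folder:
    script_path = os.path.join(folder, "some_file.py")
    with open(script_path, "w") as script:
        script.write("x = 5\nresult = 3 * x\n")

    # run_path executes the file and returns its global variables as a dict
    namespace = runpy.run_path(script_path)

print(namespace["result"])
```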
7.4 The IPython Interpreter
The Python shell lacks many useful features of a modern shell
• tab-completion
• history (by default)
• help/documentation functionality
• plotting
IPython adds all that.
7.5 The IPython Interpreter
IPython is an enhanced Python interpreter created by Fernando Pérez.
7.6 The IPython Interpreter
After installation, type ipython at the command prompt:
$ ipython
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:24:55)
Type "copyright", "credits" or "license" for more information.
IPython 4.0.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]:
7.7 The IPython Interpreter
Visual difference:
• Python uses >>>
• IPython uses numbered commands (e.g. In [1]:)
IPython offers the same functionality as the standard Python shell
In [1]: 1 + 1
Out[1]: 2
In [2]: x = 5
In [3]: x * 3
Out[3]: 15
… and many more
7.8 Some Nice IPython Functionalities
• help / documentation with ?
• pasting code
• timing code
• debugging code
• plotting inline
7.9 The helper/documentation function ?
In [8]: b = [1, 2, 3]
In [9]: b?
Type: list
String Form:[1, 2, 3]
Length: 3
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items
In [10]: print?
Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type: builtin_function_or_method
def add_numbers(a, b):
"""
Add two numbers together
Returns
-------
the_sum : type of arguments
"""
return a + b
In [11]: add_numbers?
Signature: add_numbers(a, b)
Docstring:
Add two numbers together
Returns
-------
the_sum : type of arguments
File: <ipython-input-9-6a548a216e27>
Type: function
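Outside IPython, similar information is available through the standard library. A minimal sketch using the inspect module, reusing the add_numbers function from above:

```python
import inspect


def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b


# The call signature, comparable to the "Signature:" line shown by ?
print(inspect.signature(add_numbers))
# The cleaned-up docstring, comparable to the "Docstring:" section
print(inspect.getdoc(add_numbers))
```

The built-in help(add_numbers) prints the same information in a pager-style layout.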
import numpy as np
np.*load*?
np.__loader__
np.load
np.loads
np.loadtxt
np.pkgload
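The wildcard search is an IPython feature. In plain Python a comparable search can be sketched with dir and fnmatch; the helper name search_names is hypothetical, and the standard-library json module stands in for NumPy so that the example runs without extra packages.

```python
import fnmatch
import json


def search_names(module, pattern):
    """Return the attribute names of a module that match a wildcard pattern."""
    # fnmatchcase compares case-sensitively on every platform
    return sorted(name for name in dir(module) if fnmatch.fnmatchcase(name, pattern))


print(search_names(json, "*load*"))
```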
7.10 Pasting Code
In [17]: %paste
x = 5
y = 7
if x > 5:
    x += 1

    y = 8
## -- End pasted text --
7.11 Timing Code
[2]: %timeit x = 4 + 5
14.8 ns ± 0.206 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
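%timeit is only available in IPython and Jupyter. In a plain script, the standard-library timeit module offers similar measurements. The sketch below reports the best of 7 runs rather than the mean and standard deviation that %timeit prints; the number of loops is chosen arbitrarily.

```python
import timeit

# Run the statement 100000 times per measurement and repeat the measurement 7 times
timings = timeit.repeat("x = 4 + 5", repeat=7, number=100_000)

# Convert the best measurement to nanoseconds per loop
best_ns = min(timings) / 100_000 * 1e9
print(f"best of 7 runs: {best_ns:.1f} ns per loop")
```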
7.12 Interactive Debugging
Some buggy function
def foo():
    bar = [3] # bar is a list with one entry
    return bar[5] # this tries to return the 6th entry, which will not work
In IPython shell:
In [15]: foo()
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-186-624891b0d01a> in <module>()
----> 1 foo()
<ipython-input-185-48a597a45538> in foo()
      1 def foo():
      2     bar = [3] # bar is a list with one entry
----> 3     return bar[5] # this tries to return the 6th entry, which will not work
IndexError: list index out of range
In [16]: %debug
> <ipython-input-185-48a597a45538>(3)foo()
      1 def foo():
      2     bar = [3] # bar is a list with one entry
----> 3     return bar[5] # this tries to return the 6th entry, which will not work
ipdb> bar
[3]
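What %debug does interactively can be approximated in plain Python by inspecting the traceback of the caught exception. The sketch below is non-interactive and only reads the local variables of the frame in which the error occurred; the variable name frame_locals is an assumption for illustration.

```python
def foo():
    bar = [3]      # bar is a list with one entry
    return bar[5]  # raises IndexError


try:
    foo()
except IndexError as error:
    tb = error.__traceback__
    # Walk to the innermost frame, i.e. the one in which the error was raised
    while tb.tb_next is not None:
        tb = tb.tb_next
    # Copy the local variables of that frame
    frame_locals = dict(tb.tb_frame.f_locals)

print(frame_locals["bar"])
```

For an interactive session outside IPython, pdb.post_mortem can be called with the traceback inside the except block.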
7.13 Running Code
Some code in a file called ipython_script_test.py
print("Hello World!")
In IPython shell:
In [14]: %run ipython_script_test.py
Hello World!
7.14 The Jupyter Notebook
• made for Julia, Python and R (hence Ju-py-teR)
• has all functionality of the IPython shell
• runs in browser (executed remotely or locally)
• simpler plotting
• sharing of analyses simpler (even dashboards)
• markdown cells for documentation
• popular (also in companies)
7.14.1 Starting a notebook
In your activated virtual/conda environment you can start jupyter with
jupyter notebook
and a browser window should open.
Click File and New Notebook
7.15 The Jupyter Notebook
• adds many useful features
– remote execution
– inline plotting
– interactive plots
• has many problems
– hidden state
– version control is difficult
– testing is difficult
• check out Joel Grus’s talk “I Don’t Like Notebooks”
[3]: ### Inline Plotting
from matplotlib import pyplot as plt
import numpy as np
%matplotlib inline
plt.plot(np.sin( np.arange(-np.pi,np.pi,.1)));
7.16 Run Python scripts from shell
Open a file test.py and type:
# file: test.py
print("Running test.py")
x = 5
print("Result is", 3 * x)
Go to the directory of the file and type python followed by the file name in a shell:
$ python test.py
Running test.py
Result is 15
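A common convention for scripts, not used in test.py above, is to put the logic into functions and call them under an if __name__ == "__main__": guard. The code then runs when the file is executed from the shell but not when it is imported. A sketch with the hypothetical function names compute_result and main:

```python
def compute_result(x):
    """Return three times x, as in test.py."""
    return 3 * x


def main():
    print("Running test.py")
    print("Result is", compute_result(5))


# __name__ equals "__main__" only when the file is executed directly
if __name__ == "__main__":
    main()
```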
7.17 Pros and Cons of Execution Options
7.17.1 Python Shell
Pros
• interactive development / debugging
Cons
• very inconvenient
I only use it if IPython is not installed (and there is no internet connection).
7.17.2 IPython Shell
Pros
• interactive development / debugging
• Convenience
• Documentation
• Debugging
• Tab-Completion
• History
Cons
• too convenient (for people too lazy to organize code)
• hidden state
7.17.3 Jupyter Notebooks
Pros
• all of IPython’s advantages
• inline plotting
• code and markdown cells allow better documentation
Cons
• too convenient (for people too lazy to organize code)
• hidden state
• version control is difficult
7.17.4 Python scripts
Pros
• version control
• clean reproducible execution (e.g. no hidden state)
• precise timing
Cons
• not interactive
• debugging more difficult
8 Python Syntax
8.1 Comments Are Marked by #
# this is a comment and will not be interpreted
Python does not have multiline comments
But any decent editor will take care of that
8.2 End of line terminates a statement
Other languages require ; or similar things - Python doesn’t
If you really need to do multiline statements
• you can use a backslash \
x = 1 + 2 + 3 + 4 +\
5 + 6 + 7 + 8
• or you use grouping
y = [1, 3, 4,
     5, 6, 7]
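Grouping also works with parentheses around an arithmetic expression, which avoids the backslash altogether. A short sketch comparing both styles:

```python
# Backslash continuation
x = 1 + 2 + 3 + 4 + \
    5 + 6 + 7 + 8

# Grouping with parentheses: no backslash needed
y = (1 + 2 + 3 + 4 +
     5 + 6 + 7 + 8)

print(x, y)
```

Both statements compute the same sum; the parenthesised form is less fragile because a stray space after a backslash is a syntax error.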
8.3 ; also terminates a statement
Normally you would do
x = 1
y = 2
Alternatively:
x = 1; y = 2
8.4 Indentation!
Whitespace Matters in Python
In Python, code is grouped by indentation:
for i in range(10):
    if i < 5:
        print("Lower")
    else:
        print("Higher")
Don’t mix tabs and spaces!
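Since indentation defines the blocks, moving a statement by one level changes what the program does. The sketch below uses two hypothetical functions that differ only in the indentation of one line:

```python
def count_inside_if(values):
    count = 0
    for value in values:
        if value < 5:
            count += 1  # belongs to the if block: counts only values below 5
    return count


def count_inside_loop(values):
    count = 0
    for value in values:
        if value < 5:
            pass
        count += 1  # belongs to the for block: counts every value
    return count


print(count_inside_if(range(10)), count_inside_loop(range(10)))
```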
9 Exercises
• Print Hello World
– in a Python shell
– in an IPython shell
– in a Jupyter notebook
• Write a *.py file that prints Hello World when executed, and execute that file
– from a shell
– from a Python shell
– from an IPython shell
– from a Jupyter notebook