python-intro

October 9, 2019

1 Python For Data Science

Felix Biessmann
Lecture 1 - Introduction

2 Short intro

• Name
• Educational background (Physics/Computer Science/…?)
• Work experience
• Experience with Python? If yes, what project?
• I’m Felix Biessmann (ß = ss)
• BSc in Cognitive Science
– Computational Linguistics
– Neuroscience
• MSc in Neuroscience
– Brain-Computer-Interfaces
– Functional Magnetic Resonance Imaging data
• PhD in Machine Learning
– Biomedical Applications
– Multimodal neuroimaging data
– Text analysis on web data
• Assistant Professor for Machine Learning at Korea University, Seoul
• Amazon Research
– Computer Vision
– Recommender Systems
– Machine Learning Infrastructure
– ML for Data Quality
• Einstein Center for Digital Future / Beuth
– Data Quality
– ML for Humans

2.1 Useful Resources

• Jake Vanderplas: Whirlwind Tour of Python
• Jake Vanderplas: Python Data Science Handbook
• Wes McKinney: Python for Data Analysis
• Andreas Mueller: Introduction to ML with Python
• Joel Grus: Data Science from Scratch
• Scikit-Learn Documentation
• Stack Overflow

3 Why Python?

3.0.1 Popularity

Python is the fastest growing programming language

3.0.2 Popularity

https://stackoverflow.blog/2017/09/06/incredible-growth-python/

3.0.3 Popularity

According to a comprehensive Stack Overflow survey, Python is the most wanted language for the
second year in a row, meaning that it is the language that developers who do not yet use it most
often say they want to learn.

3.0.4 Popularity

• Simple
• Versatile
• No boilerplate code
• De facto standard for Data Science / Machine Learning
• In all technical interviews I conducted at Amazon, candidates chose Python

3.0.5 Libraries

• NumPy
• SciPy
• Matplotlib
• IPython
• Pandas
• sklearn
• tensorflow
• pytorch
• mxnet
• …

3.0.6 Time

Programmer time is more expensive than CPU time.

4 The Zen of Python

Python lovers are quick to point out how intuitive Python is.
But intuition is strongly related to familiarity.
Many developers used to traditional (compiled, typed) languages won’t find Python intuitive.
If you want to dive deep into the philosophy behind Python, try this easter egg in a Python shell:
[1]: import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

4.1 How to Install Python

• Most systems come with (an outdated version of) Python pre-installed
• Download binaries for your system at https://www.python.org/downloads/
• You can use system-specific package managers (apt for Linux, Homebrew for macOS)
• Or use Anaconda (preferred if you don’t like fiddling with installations)

4.2 Python 2 vs 3

• The days of Python 2 are numbered
• Use Python 3
• Most important difference:
• Python 2:
print "foobar"
• Python 3:
print("foobar")
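Because print is an ordinary function in Python 3, it accepts keyword arguments such as sep and end. The following is a small sketch of what that allows. The division lines are an addition that is not part of the lecture; they are included only as another frequently cited difference between the two versions.

```python
# Python 3: print is a function, so it takes keyword arguments.
print("foo", "bar", sep="")   # no separator between the two values
print("no newline", end="")   # suppress the trailing newline
print()                       # finish the line

# Assumption (not from the slides): division is another common difference.
true_quotient = 7 / 2     # true division in Python 3
floor_quotient = 7 // 2   # floor division; Python 2 returned this for 7 / 2
print(true_quotient, floor_quotient)
```

In Python 2 the first three lines would not behave this way unless print_function is imported from __future__.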

5 Virtual Environments

• For different projects you will need different dependencies (or versions thereof)
• It is good practice to encapsulate your dependencies in a virtual environment
• Virtualenv (python documentation)
python3 -m venv [path_to_venv]
source [path_to_venv]/bin/activate
• Anaconda
conda create -n my_env python=3.7
conda activate my_env
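To confirm that an environment created with venv is actually active, one option is to compare two attributes of the sys module. This is a minimal sketch; the function name in_virtual_environment is made up for illustration, and the check targets venv environments (conda environments may not be detected this way).

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("prefix:     ", sys.prefix)
print("base prefix:", sys.base_prefix)
print("inside a virtual environment:", in_virtual_environment())
```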

6 How to Install Python Packages

To avoid cluttered dependencies, you might want to install packages only in a virtual environment,
not system-wide.
• For plain python
– pip install [packagename]
• For Anaconda installations
– conda install [packagename]
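After installing, it can be useful to check from Python which version of a package the current environment sees. The sketch below uses importlib.metadata from the standard library (Python 3.8 or newer); installed_version is a hypothetical helper name, and the package names are only examples.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# The names below are examples; any distribution name can be queried.
for name in ["pip", "jupyter"]:
    print(name, "->", installed_version(name))
```

A result of None for a package you just installed usually means the code is running in a different environment than the one pip or conda installed into.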

7 How to Run Python Code

7.1 How to Run Python Code

• Shell command: $ python [somefilename].py
• Interactively
– Python Interpreter: $ python
– IPython Interpreter: $ ipython
– Jupyter Notebooks: $ jupyter notebook
• Make sure you use the right python/ipython/jupyter binaries!
We will discuss pros and cons after looking into each of those options.
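One way to check which binaries are in use is to ask the interpreter itself. This is a sketch using only the standard library; the dictionary name found is arbitrary.

```python
import shutil
import sys

# Path and version of the interpreter that is executing this code.
print("running interpreter:", sys.executable)
print("version:", ".".join(str(part) for part in sys.version_info[:3]))

# First match on PATH for each command name; None means it was not found.
found = {}
for command in ["python", "ipython", "jupyter"]:
    found[command] = shutil.which(command)
    print(command, "->", found[command])
```

With an activated virtual environment, the reported paths would be expected to point into that environment's bin directory.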

7.2 Setup Python in Room D132L

Open a terminal (e.g. by hitting cmd-space, typing terminal and pressing enter)
Create a new virtual environment and activate it
python3 -m venv venv
source venv/bin/activate
Upgrade the python package manager and install jupyter
pip install --upgrade pip
pip install prompt_toolkit==2.0.3
pip install jupyter

7.3 The Python Interpreter

Type python at the command prompt:


$ python
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:24:55)
Type "help", "copyright", "credits" or "license" for more information.
>>>
You can type and execute code snippets:
>>> 1 + 1
2
>>> x = 5
>>> x * 3
15
You can run python code files:
exec(open('some_file.py').read())
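The exec call above runs the file's code inside the current namespace. As an alternative sketch (not part of the lecture), the standard library's runpy.run_path executes a file and returns its global variables as a dictionary. The example writes a throwaway some_file.py to a temporary folder so it is self-contained; the file contents are invented for illustration.

```python
import os
import runpy
import tempfile

# Create a throwaway script so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    script_path = os.path.join(folder, "some_file.py")
    with open(script_path, "w") as handle:
        handle.write("x = 5\nprint('x * 3 =', x * 3)\n")

    # run_path executes the file and returns its global namespace as a dict.
    namespace = runpy.run_path(script_path)

print(namespace["x"])
```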

7.4 The IPython Interpreter

The Python shell lacks many useful features of a modern shell:
• tab-completion
• history (by default)
• help/documentation functionality
• plotting
IPython adds all that.

7.5 The IPython Interpreter

IPython is an enhanced Python interpreter created by Fernando Pérez.

7.6 The IPython Interpreter

After installation, type ipython at the command prompt:


$ ipython
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:24:55)
Type "copyright", "credits" or "license" for more information.

IPython 4.0.0 -- An enhanced Interactive Python.


? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.

In [1]:

7.7 The IPython Interpreter

Visual difference:
• Python uses >>>
• IPython uses numbered commands (e.g. In [1]:)
IPython has the same functionality as a standard Python shell
In [1]: 1 + 1
Out[1]: 2

In [2]: x = 5

In [3]: x * 3
Out[3]: 15
… and many more

7.8 Some Nice IPython Functionalities

• help / documentation with ?
• pasting code
• timing code
• debugging code
• plotting inline

7.9 The helper/documentation function ?

In [8]: b = [1, 2, 3]

In [9]: b?
Type: list
String Form:[1, 2, 3]
Length: 3
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items

In [10]: print?
Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Prints the values to a stream, or to sys.stdout by default.


Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type: builtin_function_or_method
def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b
In [11]: add_numbers?
Signature: add_numbers(a, b)
Docstring:
Add two numbers together

Returns
-------
the_sum : type of arguments
File: <ipython-input-9-6a548a216e27>
Type: function
import numpy as np
np.*load*?
np.__loader__
np.load
np.loads
np.loadtxt
np.pkgload
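The ? helper only exists in IPython and Jupyter. In a plain Python shell or a script, roughly the same information can be obtained with the inspect module and dir(). The sketch below uses the json module from the standard library as a stand-in for NumPy, so that it runs without extra packages.

```python
import inspect
import json


def add_numbers(a, b):
    """Add two numbers together."""
    return a + b


# Roughly what `add_numbers?` reports: signature and docstring.
signature = str(inspect.signature(add_numbers))
docstring = inspect.getdoc(add_numbers)
print(signature)
print(docstring)

# Rough counterpart of a wildcard search such as `json.*load*?`.
matches = [name for name in dir(json) if "load" in name]
print(matches)
```

The built-in help(add_numbers) prints a similar summary interactively.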

7.10 Pasting Code

In [17]: %paste
x = 5
y = 7
if x > 5:
    x += 1

    y = 8
## -- End pasted text --

7.11 Timing Code

[2]: %timeit x = 4 + 5

14.8 ns ± 0.206 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
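Outside IPython, the timeit module from the standard library offers comparable measurements. This is a minimal sketch; the loop count and the number of repeats are arbitrary choices, and the printed timings will differ from machine to machine.

```python
import timeit

# Run the statement one million times and report the total time in seconds.
loops = 1_000_000
total_seconds = timeit.timeit("x = 4 + 5", number=loops)
print(f"{total_seconds / loops * 1e9:.1f} ns per loop")

# repeat() returns one total per run; the minimum is a common summary.
runs = timeit.repeat("x = 4 + 5", number=loops, repeat=3)
print(f"best of {len(runs)} runs: {min(runs) / loops * 1e9:.1f} ns per loop")
```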

7.12 Interactive Debugging

Some buggy function


def foo():
    bar = [3] # bar is a list with one entry
    return bar[5] # this tries to return the 6th entry, which will not work
In IPython shell:
In [15]: foo()
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-186-624891b0d01a> in <module>()
----> 1 foo()

<ipython-input-185-48a597a45538> in foo()
      1 def foo():
      2     bar = [3] # bar is a list with one entry
----> 3     return bar[5] # this tries to return the 6th entry, which will not work

IndexError: list index out of range


In [16]: %debug
> <ipython-input-185-48a597a45538>(3)foo()
      1 def foo():
      2     bar = [3] # bar is a list with one entry
----> 3     return bar[5] # this tries to return the 6th entry, which will not work

ipdb> bar
[3]
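%debug is interactive and specific to IPython. As a non-interactive sketch of the same idea, the traceback module can report where an exception was raised. The variable names last_frame and message are chosen for illustration only.

```python
import traceback


def foo():
    bar = [3]      # bar is a list with one entry
    return bar[5]  # index 5 does not exist, so this raises IndexError


try:
    foo()
except IndexError as error:
    # The last traceback entry is the frame where the exception was raised.
    last_frame = traceback.extract_tb(error.__traceback__)[-1]
    message = str(error)
    print("error:   ", message)
    print("function:", last_frame.name)
    print("line no.:", last_frame.lineno)
```

For an interactive session without IPython, the standard library's pdb module provides pdb.post_mortem().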

7.13 Running Code

Some code in a file called ipython_script_test.py


print("Hello World!")
In IPython shell:
In [14]: %run ipython_script_test.py
Hello World!

7.14 The Jupyter Notebook

• made for Julia, Python and R (hence Ju-py-teR)
• has all functionality of the IPython shell
• runs in browser (executed remotely or locally)
• simpler plotting
• sharing of analyses simpler (even dashboards)
• markdown cells for documentation
• popular (also in companies)

7.14.1 Starting a notebook

In your activated virtual/conda environment you can start jupyter with


jupyter notebook
and a browser window should open.
Click File and New Notebook

7.15 The Jupyter Notebook

• adds many useful features
– remote execution
– inline plotting
– interactive plots
• has many problems
– hidden state
– version control is difficult
– testing is difficult
• check out Joel Grus's talk
[3]: ### Inline Plotting
from matplotlib import pyplot as plt
import numpy as np
%matplotlib inline

plt.plot(np.sin( np.arange(-np.pi,np.pi,.1)));

7.16 Run Python scripts from shell

Open a file test.py and type:


# file: test.py
print("Running test.py")
x = 5
print("Result is", 3 * x)
Go to the directory of the file and type python test.py in a shell:
$ python test.py
Running test.py
Result is 15
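The same shell invocation can be reproduced from Python, which is sometimes handy for automation. The sketch below writes the test.py contents shown above to a temporary folder and runs it with subprocess; the use of a temporary folder is an assumption made to keep the example self-contained (Python 3.7 or newer for capture_output).

```python
import os
import subprocess
import sys
import tempfile

script = 'print("Running test.py")\nx = 5\nprint("Result is", 3 * x)\n'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test.py")
    with open(path, "w") as handle:
        handle.write(script)

    # sys.executable ensures the same interpreter (and environment) is used.
    completed = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, check=True
    )

print(completed.stdout, end="")
```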

7.17 Pros and Cons of Execution Options

7.17.1 Python Shell

Pros
• interactive development / debugging
Cons
• very inconvenient
I only use it if there is no IPython installed (and there’s no internet).

7.17.2 IPython Shell

Pros
• interactive development / debugging
• Convenience
• Documentation
• Debugging
• Tab-Completion
• History
Cons
• too convenient (for people too lazy to organize code)
• hidden state

7.17.3 Jupyter Notebooks

Pros
• all of IPython’s advantages
• inline plotting
• code and markdown cells allow better documentation
Cons
• too convenient (for people too lazy to organize code)
• hidden state
• version control is difficult

7.17.4 Python scripts

Pros
• version control
• clean reproducible execution (e.g. no hidden state)
• precise timing
Cons
• not interactive
• debugging more difficult
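Hidden state, listed above as a drawback of the interactive options, can be sketched without a notebook. The example simulates cells as strings that are executed in a shared namespace; cell_1, cell_2 and the namespace dictionaries are stand-ins invented for this illustration, not how Jupyter is implemented.

```python
# Simulate a notebook kernel: all cells share one namespace.
kernel_namespace = {}

cell_1 = "x = 5"
cell_2 = "result = x * 3"

exec(cell_1, kernel_namespace)
exec(cell_2, kernel_namespace)

# Suppose cell_1 is now deleted from the notebook. The kernel still knows x,
# so re-running cell_2 keeps working in the old session.
exec(cell_2, kernel_namespace)
print("old kernel:", kernel_namespace["result"])

# A fresh kernel (or a script run from the shell) only sees the remaining cell.
fresh_namespace = {}
try:
    exec(cell_2, fresh_namespace)
    failed = False
except NameError as error:
    failed = True
    print("fresh kernel:", error)
```

Restarting the kernel and running all cells from top to bottom is a common way to detect this situation early.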

8 Python Syntax

8.1 Comments Are Marked by #

# this is a comment and will not be interpreted

Python does not have multiline comments
But any decent editor will take care of that
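In practice, a longer comment is written as several consecutive # lines. The short sketch below also shows a docstring, which is a string literal rather than a comment; the function area is made up for illustration.

```python
# A block comment is simply several
# consecutive lines that each start with '#'.
total = 1 + 2  # a comment can also follow code on the same line


def area(width, height):
    """Docstrings are string literals, not comments; help() displays them."""
    return width * height


print(total, area(2, 3))
```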

8.2 End of line terminates a statement

Other languages require ; or similar things - Python doesn’t


If you really need multiline statements:
• you can use a backslash \
x = 1 + 2 + 3 + 4 + \
    5 + 6 + 7 + 8
• or you can use grouping
y = [1, 3, 4,
     5, 6, 7]
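Grouping also works with parentheses around an arithmetic expression, which avoids the backslash altogether. A small sketch:

```python
# Parentheses group the expression, so no backslash is needed.
x = (1 + 2 + 3 + 4 +
     5 + 6 + 7 + 8)

# Brackets work the same way; a trailing comma after the last item is allowed.
y = [
    1, 3, 4,
    5, 6, 7,
]

print(x, len(y))
```

Many style guides prefer grouping over backslashes because a stray space after a backslash causes a syntax error.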

8.3 ; also terminates a statement

Normally you would do


x = 1
y = 2
Alternatively:
x = 1; y = 2

8.4 Indentation!

Whitespace matters in Python.
In Python, code is grouped by indentation:
for i in range(10):
    if i < 5:
        print("Lower")
    else:
        print("Higher")
Don’t mix tabs and spaces!
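Because indentation defines the grouping, moving a single line one level changes what the program does. The sketch below runs two loops that differ only in the indentation of one statement; the list names are arbitrary.

```python
log_inside = []
for i in range(3):
    if i < 1:
        log_inside.append("low")
    log_inside.append("done")  # part of the loop body: runs on every pass

log_after = []
for i in range(3):
    if i < 1:
        log_after.append("low")
log_after.append("done")       # after the loop: runs exactly once

print(log_inside)
print(log_after)
```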

9 Exercises

• Print Hello World
– in a Python shell
– in an IPython shell
– in a Jupyter notebook
• Write a *.py file that prints Hello World when executed, and execute that file
– from a shell
– from a Python shell
– from an IPython shell
– from a Jupyter notebook
