Pre-Course Programming
Python Foundations and Tooling
Study Program Data Science
Prof. Dr. Tillmann Schwörer
Access to Course Materials
Enrolled data science students have permanent access via:
https://collab.fh-kiel.de/course/view.php?id=321
Students not yet enrolled can use the following Dropbox folder:
https://bit.ly/precourse-programming-2024
Note that this Dropbox folder will be deleted after the precourse!
The first materials are already online; further materials will be uploaded during the precourse.
Kickoff Survey
https://forms.gle/TWUGUAcxUBcZ3F2d9
Python Foundations
Data types
Operators
Functions
Control flow and iterators
Finding help
Data Science Workflow
Python Tooling
Installation
Visual Studio Code
Jupyter Notebooks
Python Packages
Virtual Environments
Agenda (preliminary)
Day 1
Installation
Primitive Types
Using Python as a Calculator
Python Packages and Virtual Environments
Day 2
Functions
List, Tuple, Dictionary and Set
Control Flow
Day 3
List and Dict Comprehensions
Useful Iterators
Developing our own quiz application
Python Setup and Tooling
Recommended Setup
CPython (not Anaconda Python)
Integrated Development Environment: Visual Studio Code
Python Extension and Jupyter Extension for Visual Studio Code
Installation of Python packages in virtual environments (one virtual environment per university course)
Python Installation Guide
No Python so far → just follow the instructions
I have an older Python version installed
Python 2.X: you MUST install a new version
Python 3.X: you CAN install a new version
In general, it is absolutely fine to have multiple Python versions installed
I have Anaconda Python installed
Either stick with Anaconda: some aspects will be slightly different for you
Or install CPython in addition: it is fine to have both distributions installed
I have strange problems due to multiple Python versions installed on my system
This happens occasionally: let's try to fix these problems NOW!
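When several Python versions are installed, the interpreter itself can tell you which version it is and where it lives. The following is a minimal sketch using only the standard-library module sys. The messages it prints are illustrative.

```python
import sys

# Full version string of the interpreter running this code
print(sys.version)

# sys.version_info behaves like a tuple: (major, minor, micro, ...)
major, minor = sys.version_info[:2]
print(f"Python {major}.{minor}")

# Python 2.X must be replaced; any Python 3.X is fine to start with
if major < 3:
    print("Please install a current Python 3 version.")
else:
    print("Python 3 detected.")

# Path of the executable: helpful when several Python versions are installed
print(sys.executable)
```

If `sys.executable` points to an unexpected location, it usually means that VS Code or the terminal picked a different interpreter than intended.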
Which Python distribution?
CPython distribution (https://www.python.org) vs. Anaconda distribution (https://anaconda.org/)
What is it?
• CPython: standard Python distribution
• Anaconda: data science toolkit with a Python and R distribution, the package management system "conda", the graphical user interface (GUI) "Anaconda Navigator", and a selection of IDEs
How are projects managed?
• CPython: the relevant Python version and Python packages are installed in a virtual environment
• Anaconda: the Python/R versions and all required packages, including requirements of your operating system, are defined in conda environments
How are packages installed?
• CPython: packages are installed from the Python Package Index (https://pypi.org/) via pip install packagename
• Anaconda: packages are installed from the conda package repository via conda install packagename
What are the pros and cons?
• CPython: requires little disk space; 395,000 Python packages are available from PyPI; does not handle system dependencies
• Anaconda: requires a lot of disk space; fewer Python packages are available, and sometimes not in their most recent versions; installing missing packages via pip creates its own problems; handles system dependencies
Integrated Development Environment (IDE)
Visual Studio Code
• Jupyter Notebooks (.ipynb): ✓ | Python scripts (.py): ✓ | Git: ✓ | Pricing: free
• General-purpose code editor: support for specific functionalities and programming languages is available via extensions
PyCharm
• Jupyter Notebooks (.ipynb): ✓ | Python scripts (.py): ✓ | Git: ✓ | Pricing: free for students
• Comprehensive functionalities for Python; somewhat overloaded interface
JupyterLab/Jupyter Notebook (via browser)
• Jupyter Notebooks (.ipynb): ✓ | Python scripts (.py): x | Git: (✓) | Pricing: free
• Data science focus, but cannot replace a full IDE for larger projects and production code
Spyder
• Jupyter Notebooks (.ipynb): x | Python scripts (.py): ✓ | Git: x | Pricing: free
• Similarity to RStudio
Visual Studio Code
File Explorer
Git / Version Control
Extensions
Command Palette (Ctrl + Shift + P): fast access to everything
Settings
Packages
Python
Standard library: built-in modules with core Python functionalities
3rd-party packages: e.g., for data science
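The difference between the two layers shows up when importing. Standard-library modules are always available, while 3rd-party packages exist only if they were installed into the active environment. The helper is_installed below is a hypothetical name chosen for this sketch. It relies on importlib.util.find_spec, which returns None for top-level modules it cannot find.

```python
import importlib.util
import math          # standard library: ships with every Python installation
import statistics    # standard library


def is_installed(package_name: str) -> bool:
    """Return True if package_name can be imported in the active environment."""
    return importlib.util.find_spec(package_name) is not None


print(math.sqrt(16))                 # 4.0
print(statistics.median([1, 3, 5]))  # 3
print(is_installed("json"))          # True: json belongs to the standard library
print(is_installed("pandas"))        # True only if pandas was installed, e.g. via pip
```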
Virtual Environments
Different projects may need different
versions of Python and/or packages →
Virtual Environment
Ensure reproducibility: code needs to
run stably on any system, today and in
the future
Safely experiment
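In practice, an environment is usually created once per course with the terminal command python -m venv .venv and then selected in VS Code. The folder name .venv is a common convention, not a requirement. The sketch below does the same from Python with the standard-library module venv, and checks whether the current interpreter runs inside a virtual environment. It passes with_pip=False only to keep the demo fast; a real project environment would normally include pip.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtual_environment() -> bool:
    """Return True if this interpreter runs inside a virtual environment."""
    # In a venv, sys.prefix points to the environment folder,
    # whereas sys.base_prefix still points to the base Python installation.
    return sys.prefix != sys.base_prefix


def create_environment(env_dir: Path) -> bool:
    """Create a virtual environment in env_dir and report whether it worked."""
    venv.create(env_dir, with_pip=False)  # without pip, for speed in this demo
    # Every virtual environment contains a pyvenv.cfg configuration file
    return (env_dir / "pyvenv.cfg").exists()


print("Inside a virtual environment:", in_virtual_environment())

# Create a throwaway environment in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    created = create_environment(Path(tmp) / ".venv")
    print("Environment created:", created)
```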
Virtual Environments in VS Code
Python Script (.py) vs. Jupyter Notebook (.ipynb)
Python scripts
Text file (editable in any text editor) with the file ending .py
Use cases:
Focus on code
Concise documents
Re-use code elsewhere by importing the Python file
Production setting
Comments start with the # symbol and are not executed; all other lines are.
Execution modes in VS Code: interactive (run the current selection or line) or script mode (run the entire file)
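Re-using a script by importing it works best when the file's top-level actions are guarded. The file name circle.py and the function circle_area are hypothetical and serve only this sketch. The `if __name__ == "__main__":` block runs when the file is executed in script mode. It does not run when another file imports the function.

```python
# circle.py (hypothetical file name)
import math


def circle_area(radius: float) -> float:
    """Return the area of a circle with the given radius."""
    # Comments like this one are ignored when the code runs
    return math.pi * radius**2


if __name__ == "__main__":
    # Runs only when the file is executed directly (e.g. "Run Python File" in VS Code),
    # not when another file uses: from circle import circle_area
    print(round(circle_area(2.0), 2))  # 12.57
```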
Python scripts: execution modes
Run selection/line: Shift + Enter
Execute entire file
Jupyter Notebooks
Three views of a notebook: what we see, what we type, and what happens under the hood
Jupyter Notebooks
Content aspects:
Code, output, text, formulas, images, etc. all in one file; text cells are formatted via Markdown
Julia, Python, and R code (among other languages)
Use cases: exploration, presentation, documentation, books
Technical aspects:
Files with JSON structure and the file ending .ipynb (IPython Notebook)
Web-based
Editable in the browser via JupyterLab/Jupyter Notebook, and via IDEs such as VS Code
Convertible into HTML, PDF, or Python scripts
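Because a notebook is a JSON file, it can be read like any other JSON document. The notebook below is a simplified, hand-written example in nbformat 4 style; real files carry more metadata. A file on disk, such as a hypothetical analysis.ipynb, could be loaded the same way with json.load on the opened file.

```python
import json

# Simplified notebook as it is stored on disk (real files contain more metadata)
notebook_text = """
{
  "nbformat": 4,
  "nbformat_minor": 4,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "outputs": [], "source": ["x = 2 + 3\\n", "print(x)"]}
  ]
}
"""

notebook = json.loads(notebook_text)

# Each cell is a dictionary; its source is stored as a list of lines
for cell in notebook["cells"]:
    print(cell["cell_type"], "|", "".join(cell["source"]))
```

This structure also explains why notebooks are harder to version-control than scripts: outputs and metadata end up in the same file as the code.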
Jupyter Notebooks
Disadvantages
Editing functionalities are limited compared to Python scripts
Working productively requires that you remember many shortcuts
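Not suited for re-using code: a notebook cannot simply be imported like a .py file
Not ideal for version control: the JSON format also stores outputs and metadata, which makes changes hard to compare
→ More information on working with Jupyter Notebooks in VS Code.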
Markdown
Formatting of text documents
Intuitive, easy to read, platform-independent
Applications: Jupyter Notebooks, R Markdown, GitHub, …
Useful links:
Cheat Sheet, Tutorial
VS Code extension: Markdown Preview Enhanced
Python Foundations
Data types: primitive types, list, tuple, dictionary, set
Manipulating data: create | subset | edit | delete
Operators: math | comparison | logical
Functions: using functions, writing functions, lambda functions, generator functions
Control flow and iterators: if, else and loops; list and dict comprehensions; generator expressions
Concepts / paradigms: mutability, object-oriented programming, functional programming
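Built-In Data Types
Primitive types (immutable):
• Numbers: integer 20, float 37.5, complex 1+3j
• Boolean: True/False
• String: 'Jessa'
Collections:
• Tuple (immutable): (3, 4.5, 'b')
• List (mutable): [2, 'a', 5.7]
• Dictionary (mutable): {1: 'a', 2: 'b'}
• Set (mutable): {2, 4, 6}
Strings, tuples, and lists are sequences: their elements are ordered and can be accessed by index position.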
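The built-in function type() shows which data type a value has. The sketch below runs it over the example values from the overview.

```python
examples = [20, 37.5, 1 + 3j, True, "Jessa", (3, 4.5, "b"),
            [2, "a", 5.7], {1: "a", 2: "b"}, {2, 4, 6}]

for value in examples:
    # !r shows the value as it would be written in code; :<20 pads to 20 characters
    print(f"{value!r:<20} -> {type(value).__name__}")
```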
Math Operators
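The operator overview on this slide did not survive text extraction. As a stand-in, this sketch shows Python's standard arithmetic operators applied to two integers.

```python
a, b = 7, 2

print(a + b)    # 9    addition
print(a - b)    # 5    subtraction
print(a * b)    # 14   multiplication
print(a / b)    # 3.5  true division always returns a float
print(a // b)   # 3    floor division rounds down
print(a % b)    # 1    modulo: remainder of the division
print(a ** b)   # 49   exponentiation
print(-a // b)  # -4   floor division rounds towards minus infinity
```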
Functions
Formatting:
Consistent indentation
Docstring: function help
Type Hints (optional)
Default values (optional)
A function may
carry out some operations, e.g., print() displays text on the screen
and/or return objects
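The formatting elements listed above (indentation, docstring, type hints, default value) can all appear in one small function. The function greet is a hypothetical example. The sketch also shows that print() carries out an operation but returns None.

```python
def greet(name: str, greeting: str = "Hello") -> str:
    """Return a greeting for the given name.

    Args:
        name: Person to greet.
        greeting: Greeting word; defaults to "Hello".
    """
    return f"{greeting}, {name}!"


message = greet("Anna")               # uses the default value for greeting
print(message)                        # Hello, Anna!
print(greet("Max", "Good morning"))   # Good morning, Max!

# print() carries out an operation (writing to the screen) but returns None
result = print("side effect only")
print(result)                         # None

print(greet.__doc__)                  # the docstring is what help(greet) displays
```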
Methods
Methods are functions that are associated with a specific class
'Data Science' is an instance of the class string (str)
The split method operates on the string 'Data Science'
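Calling a method on an object and calling the same function through its class give the same result. The object is passed implicitly in the first form. A short sketch with the string from the slide:

```python
text = "Data Science"

print(type(text))        # <class 'str'>: text is an instance of the class str
print(text.split())      # ['Data', 'Science']: split operates on text
print(str.split(text))   # same result: the method is a function defined on str
print(text.split("a"))   # ['D', 't', ' Science']: split at every "a"
```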
Strings
Create strings:
'single quotes'
"double quotes"
"""triple quotes""" (for multiline strings)
Subsetting strings
[start:stop:step]
zero-based
start is inclusive, stop is exclusive
negative indexing allowed
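The slicing rules above can be checked on a concrete string. The sketch shows zero-based indexing, exclusive stop positions, omitted bounds, negative indices, and steps, plus a triple-quoted multiline string.

```python
s = "Data Science"

print(s[0])      # D             zero-based: first character
print(s[0:4])    # Data          start inclusive, stop exclusive
print(s[5:])     # Science       omitted stop: up to the end
print(s[-1])     # e             negative index counts from the end
print(s[-7:])    # Science
print(s[::2])    # Dt cec        every second character
print(s[::-1])   # ecneicS ataD  negative step reverses the string

multiline = """first line
second line"""
print(multiline.splitlines())  # ['first line', 'second line']
```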
Strings
Concatenate strings via +
Repeat strings via *
Iterate over string elements
String Methods
str.upper()
str.lower()
str.replace() (note: there is no str.reverse(); use slicing s[::-1] instead)
…
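The string operations listed above, in a short runnable sketch:

```python
first, last = "Data", "Science"

full = first + " " + last   # concatenation with +
print(full)                 # Data Science
print("ab" * 3)             # ababab  repetition with *

for char in first:          # iterating yields one character at a time
    print(char)             # D, a, t, a on separate lines

print(full.upper())                       # DATA SCIENCE
print(full.lower())                       # data science
print(full.replace("Data", "Computer"))   # Computer Science
print(full[::-1])                         # ecneicS ataD
```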
Comparison and Membership Operators
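The operator table for this slide was not preserved in the text. The following sketch shows Python's comparison operators (==, !=, <, >=, including chained comparisons) and the membership operators in / not in on a few built-in types.

```python
x, y = 3, 5

print(x == y, x != y)   # False True
print(x < y, x >= y)    # True False
print(1 < x < y)        # True: comparisons can be chained

print("a" in "Data")            # True: substring test
print(4 in [2, 4, 6])           # True: element test
print("Max" not in ["Anna"])    # True
print(1 in {1: "a", 2: "b"})    # True: dictionaries test their keys
```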
Assignment, names, values
Assignment: x = 2337
Variable x is a name that stores the reference to value 2337
Consequences:
Dynamic typing: we can change the type of x by pointing x to another object
Aliasing: multiple names can point to the same object
Side effects: if a mutable object is changed, this affects all aliases
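The three consequences of name-based assignment can be observed directly. The sketch rebinds x to objects of different types, and then creates an alias for a list and changes the list through that alias.

```python
x = 2337
print(type(x))        # <class 'int'>
x = "now a string"    # dynamic typing: x now points to a str object
print(type(x))        # <class 'str'>

a = [2, 3, 4]
b = a                 # aliasing: b points to the same list object as a
print(a is b)         # True: same object, hence id(a) == id(b)
b.append(5)           # side effect: the shared list is changed in place
print(a)              # [2, 3, 4, 5]: the change is visible through a as well
```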
Mutability
Mutable objects can be changed in place, i.e., without altering their memory address
Use the copy method to avoid side effects between two names
Figure: a = [2, 3, 4] is stored at address #123456; after b = a, both names point to #123456; after the last element is changed to 5, both a and b show [2, 3, 5] at #123456
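The sketch below shows both halves of this slide. It uses id(), which in CPython reflects the memory address, to show that an in-place change keeps the same object. It then shows that list.copy() creates a separate object, so later changes no longer affect the original. Note that copy() is a shallow copy; nested lists would still be shared (copy.deepcopy handles that case).

```python
a = [2, 3, 4]
address_before = id(a)
a[2] = 5                         # in-place change of a mutable object
print(id(a) == address_before)   # True: same object, same address

b = a.copy()                     # new list object with the same elements
b[2] = 4                         # changing b no longer affects a
print(a, b)                      # [2, 3, 5] [2, 3, 4]
print(a is b)                    # False
```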
List and Tuple
Lists and tuples: ordered collections, can contain mixed types
Tuples are immutable, lists are mutable
Tuples have very few methods
Lists have many methods: append, extend, pop, remove, …
Shared features of all sequence types:
Subset via index position
Concatenate sequences via + operator
Repeat sequences via * operator
Iterate over elements in a for loop
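The list methods named above, tuple immutability, and the shared sequence features are combined in one sketch. The only public tuple methods are count and index.

```python
numbers = [2, 4]
numbers.append(6)          # [2, 4, 6]
numbers.extend([8, 10])    # [2, 4, 6, 8, 10]
last = numbers.pop()       # removes and returns the last element, 10
numbers.remove(2)          # removes the first occurrence of the value 2
print(numbers, last)       # [4, 6, 8] 10

point = (3, 4.5, "b")
print(point.count(3), point.index("b"))  # 1 2
try:
    point[0] = 1           # tuples are immutable
except TypeError as err:
    print("TypeError:", err)

# Shared sequence features: indexing, + and *, iteration
print(point[1], numbers + [12], point * 2)
for item in point:
    print(item)
```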
If - elif - else statement
Condition that evaluates to True or False
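An if - elif - else statement checks its conditions from top to bottom and runs only the first branch whose condition is True. The function describe and its temperature thresholds are hypothetical, chosen only for illustration.

```python
def describe(temperature: float) -> str:
    """Classify a temperature in degrees Celsius (illustrative thresholds)."""
    if temperature < 0:        # each condition evaluates to True or False
        return "freezing"
    elif temperature < 20:     # only checked if the previous condition was False
        return "mild"
    else:                      # runs if no condition above was True
        return "warm"


print(describe(-5), describe(12), describe(25))  # freezing mild warm
```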
While loop
Initialize variable i
Condition that evaluates to True or False: once the condition evaluates to False, the while loop stops
Change the value of the variable for the next iteration
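The three parts of a while loop (initialization, condition, update) in a minimal sketch:

```python
i = 0                 # initialize the variable
while i < 3:          # the condition is checked before every iteration
    print(i)          # 0, 1, 2
    i += 1            # change the variable; without this the loop would never stop
print("Loop ended with i =", i)  # Loop ended with i = 3
```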
For loop
for number in [2, 4, 6]:
    print(number)
for <var> in <iterable>:
    # do something #
Iterables
for <var> in <iterable>:
    # do something #
list: [2, 4, 6] → 2 4 6
tuple: (2, 4, 6) → 2 4 6
string: 'data' → d a t a
range: range(0, 10, 2) → 0 2 4 6 8
dictionary: d = {1: 'Anna', 2: 'Max'}
    d.keys() → 1 2
    d.values() → 'Anna' 'Max'
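The same for loop works unchanged for every iterable listed above, which is what makes iteration so uniform in Python. This sketch collects the elements of each iterable into a list. The dictionary is named d rather than dict so that the built-in dict is not shadowed.

```python
d = {1: "Anna", 2: "Max"}

iterables = {
    "list": [2, 4, 6],
    "tuple": (2, 4, 6),
    "string": "data",
    "range": range(0, 10, 2),
    "dictionary": d,            # iterating over a dict yields its keys
    "d.values()": d.values(),
}

results = {}
for label, iterable in iterables.items():
    elements = []
    for element in iterable:    # the same for loop works for every iterable
        elements.append(element)
    results[label] = elements
    print(f"{label:<12} -> {elements}")
```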
Iterables
for <var1>, <var2>, <…> in <iterable>:
    # do something #
dict
for i, j in {1: 'Anna', 2: 'Max'}.items():
    print(f'{i}. {j}')
→ 1. Anna
→ 2. Max
enumerate
for i, j in enumerate('Joe'):
    print(f'{i}: {j}')
→ 0: J
→ 1: o
→ 2: e
zip
names = ['Anna', 'Max']
ages = [18, 20]
for name, age in zip(names, ages):
    print(f'{name} is {age}')
→ Anna is 18
→ Max is 20
Special statements in a loop
pass: placeholder where a statement is syntactically required, e.g., to avoid an error due to an empty loop body
continue: skip the rest of the current iteration and continue with the next iteration
break: exit the loop immediately
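The three special statements in one sketch. The loop skips odd numbers with continue and stops entirely with break once a number exceeds 6. The hypothetical function not_implemented_yet uses pass as a placeholder body.

```python
kept = []
for number in range(10):
    if number % 2 == 1:
        continue          # skip odd numbers: jump to the next iteration
    if number > 6:
        break             # leave the loop entirely; execution continues after it
    kept.append(number)
print(kept)               # [0, 2, 4, 6]


def not_implemented_yet():
    pass                  # placeholder: an empty body would be a syntax error


print(not_implemented_yet())  # None
```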
List comprehension
Concise alternative to a list-creating for loop
For loop:
squares = []
for x in [1, 3, 5]:
    squares.append(x**2)
List comprehension:
squares = [x**2 for x in [1, 3, 5]]
Both versions produce [1, 9, 25]
Dictionary comprehension
Concise alternative to a dictionary-creating for loop
For loop:
squares = {}
for x in [1, 3, 5]:
    squares[x] = x**2
Dict comprehension:
squares = {x: x**2 for x in [1, 3, 5]}
Both versions produce {1: 1, 3: 9, 5: 25}
Conventions
PEP 20: The Zen of Python
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
…
PEP 8: Style Guide for Python Code
4 spaces per indentation level
Code should always use UTF-8 encoding
Imports should usually be on separate lines
…
PEP 257: Docstring Conventions
Tools for PEP 8- and PEP 257-consistent code:
• pip install autopep8
• VS Code docstring generator extension