1. Python Introduction
ADVIST
by
Dmitrii Nechaev & Dr. Lothar Richter
21.10.22
Organization
Team behind the course
Overview
Fourth Iteration
Experience-driven syllabus, subject to change
Depending on progress in the lecture, individual topics may be added or
dropped
The sequence of topics might be shuffled
Hybrid nature: the presentation of theoretical concepts is blended with back- and
front-end technology
Motivation
Present topics beyond the canonical syllabus which are useful in working
with (bioinformatics) data
DATA:
Data Processing: Python
Efficient Data Processing: NumPy and pandas
Efficient Data Storage: NoSQL
Data Analysis: DM and ML techniques
Data Visualization
Data Acquisition: Biological Databases
Topic I: Data (Pre-)Processing - Python
Python:
Syntax
Data types and operations on them
Control structures
Functions
Modules
Objects
Idiomatic & Efficient Python
Topic II: Data (Pre-)Processing - NumPy & pandas
Topic III: Data Analysis
Topic IV: Data Storage
Topic V: Data Visualization
Topic VI: Data Acquisition - DBs
Biological Databases:
Database Taxonomy: Primary and Secondary DBs
Where to obtain data? GenBank, UniProt & PDB
Accessing biological databases (through a web interface and
programmatically)
Parsing biological data (manually and through Biopython)
Assembling a dataset from biological data
Exercises
Exercise Structure
Schedule
date           topic         date           topic
October 21st   Intro         December 23rd  Christmas Eve
October 28th   Pythonics 1   December 30th  New Year’s Eve
November 4th   Pythonics 2   January 6th    Epiphany
November 11th  Pythonics 3   January 13th   NoSQL DBs 2
November 18th  NumPy 1       January 20th   Biological DBs 1
November 25th  NumPy 2       January 27th   Biological DBs 2
December 2nd   Pandas 1      February 3rd   Visualization
December 9th   Pandas 2      February 10th  Environments
December 16th  NoSQL DBs 1   February 17th  Exam
Python: Overview
We start with a shallow and narrow overview of Python. We will cover the following:
Variables
Basic data types
Control structures
Functions
Classes
Modules
Packages
Python: Variables
one
    my_variable = 1 + 'one'  # raises TypeError: an int and a str can't be added
Variable names can contain letters, digits, and the underscore symbol. Variable names
can’t start with a digit:
    myvar1 = 1
    my_var_1 = 'one'
    _myvar1 = 'and another one'
    1myvar = 'illegal'  # SyntaxError: a name can't start with a digit
Python keywords can’t be used as variable names. We can get a list of keywords by
running the following code:
    import keyword
    print(keyword.kwlist)
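The two naming rules can also be checked programmatically. The following is a minimal sketch that combines str.isidentifier() with keyword.iskeyword() from the standard library; the helper name is_valid_variable_name is made up for illustration.

```python
import keyword


def is_valid_variable_name(name):
    """Return True if `name` could be used as a variable name."""
    # isidentifier() checks the character rules (no leading digit, no spaces, ...);
    # iskeyword() rejects reserved words such as 'for'.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ['myvar1', '_myvar1', '1myvar', 'for']:
    print(candidate, is_valid_variable_name(candidate))
```

Only the first two candidates pass both checks.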
Use lowercase for the variable names, with words separated by underscores:
    use_this = True
    doNotUseThis = True  # valid Python, but against the naming convention
Python: Data Types and Control Flow
Python: Data Types
Python: Control Flow
Python has the usual control flow constructs, such as branches and loops.
Branching is achieved via an if statement.
Pattern matching (switch-like behavior) is available starting with Python 3.10
via the match-case statement.
for and while statements allow us to repeatedly execute a block of code. There is no
do ... while statement in Python.
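A do ... while loop runs its body once before testing the condition. A common way to get the same effect in Python is an endless while loop that is left with break; the sketch below uses made-up variable names and is only one possible pattern.

```python
# Emulating do ... while: the body runs once before the condition is checked.
values = []
x = 10
while True:
    values.append(x)   # loop body
    x = x + 1
    if not x < 5:      # the "while" condition, tested after the body
        break

print(values)
```

Even though x < 5 is false from the start, the body has been executed exactly once.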
Data types: bool
True and False are Boolean values. We can perform logical operations using the
following operators:
    True and False
False
    True or False
True
    not True
False
Data Types: int
int is a built-in integer data type. The range of possible values is limited only by the
machine’s memory:
    9999999999999999999 * 2
19999999999999999998
We can compare integers using comparison operators:
    5 > 3
True
    5 >= 3
True
    5 == 3
False
    5 <= 3
False
    5 < 3
False
    5 != 3
True
Control Flow: if
We can use Boolean values together with an if statement to perform branching, that is,
execute a block of code if a condition is true:
    x = 6
    if x < 0:
        print('Negative')
    if x > 0:
        print('Positive')
Positive
Control Flow: if-else
We can add another code branch using an else clause. The block of code
associated with else is executed if the condition is false:
    x = 6
    if x < 0:
        print('Negative')
    else:
        print('Positive')
Positive
Control Flow: if-elif-else
We can add even more branches using an elif statement:
    x = 0
    if x < 0:
        print('Negative')
    elif x == 0:
        print('Zero')
    else:
        print('Positive')
Zero
Control Flow: match-case
We can also execute a block of code associated with a pattern that matches a given
expression:
    x = 4
    match x:
        case 2:
            print('Too low')
        case 4:
            print('Just right')
        case 8:
            print('Too high')
Just right
We will cover this new powerful feature later.
Data Types: int
We can perform arithmetic operations on integers via arithmetic operators:
    5 + 3
8
    5 - 3
2
    5 * 3
15
    5 / 3
1.6666666666666667
    5 // 3
1
    5 % 3
2
    5 ** 3
125
    -5
-5
6.0
    7 + 6 - 5 * 4 / 3 ** (2 + 1)  # parentheses first, then **, then * and /, then + and -
12.25925925925926
Data Types: float
float is a built‑in type for floating‑point values. These values have limited precision:
    0.1 == 0.10000000000000001
True
We can use a decimal point or exponential notation to create floating-point values:
    12e-3
0.012
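Limited precision also means that exact comparisons of computed floats can fail. A short sketch of the usual workaround, math.isclose() from the standard library (the variable name total is only for illustration):

```python
import math

total = 0.1 + 0.2
print(total == 0.3)               # False: both sides carry rounding errors
print(math.isclose(total, 0.3))   # True: equal within a relative tolerance
```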
We can compare floats and use arithmetic operators with them as well:
    1.2 + 3.4 > 9.8 - 7.6
True
    5.1 * 3.2 // 4
4.0
Data Types: bool, int, and float
Boolean values are evaluated as 1 and 0 when used with arithmetic operators:
    5 - True
4
    4.3 + False
4.3
Control Flow: while
A while loop executes a block of code as long as its condition is true:
    x = 0
    while x < 5:
        print(x)
        x = x + 1
0
1
2
3
4
Data Types: None
The null object, named None, is used to represent the absence of a value.
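A small sketch of how None typically shows up in practice; the variable names are invented for this example. The idiomatic test is the identity check with is.

```python
result = None                 # "no value yet"
print(result is None)         # identity test, the idiomatic check

printed = print('hello')      # print() has no return value ...
print(printed is None)        # ... so the call evaluates to None
```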
Data Types: str
Python has a data type for strings (sequences of characters) and doesn’t have a specific
data type for a single character, although we can create a string containing only one
character (or an empty string containing zero characters):
    'Hello, World!'
'Hello, World!'
    'H'
'H'
    ''
''
Python has single-quoted and double-quoted strings:
    "Hello, World!" == 'Hello, World!'
True
Whether we use single or double quotes, the result is the same. However, having both is
convenient:
    "Let's play!"
"Let's play!"
    'I said, "We want to play!"'
What if we have both single and double quotes (or other special characters) in a string?
We can escape them using the backslash character:
    'I said, "Let\'s play!"'
Triple quotes also allow us to create multi-line strings:
    """This is
    a multiline string."""
A string is a sequence of characters, and we can access individual characters by
indexing:
    'Hello, World!'[0]
'H'
    'Hello, World!'[-1]
'!'
We can get the length of a string by using the function len():
    len('Hello, World!')
13
Sequence Data Types
A sequence is an ordered collection of values. Strings are examples of sequences. Other
basic sequence types in Python are lists, tuples, and ranges.
    [1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
    (1, 2, 3, 4, 5)
(1, 2, 3, 4, 5)
    range(1, 6)
range(1, 6)
Data Types: list
A Python list is a mutable, heterogeneous, ordered sequence of elements:
    my_list = [True, 2, 'three']
    my_list[1]
2
    my_list[2] = 'third'
    my_list
[True, 2, 'third']
    len(my_list)
We can add elements to a list via extend() and append():
    my_list.append(['IV'])
    my_list
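The difference between the two methods is easiest to see side by side. The sketch below uses its own small lists (not the my_list from the slides): append() adds its argument as a single element, while extend() adds each element of its argument.

```python
numbers = [1, 2, 3]
numbers.append([4, 5])      # the list [4, 5] becomes ONE new element
print(numbers)              # [1, 2, 3, [4, 5]]

letters = ['a', 'b']
letters.extend(['c', 'd'])  # each element of the argument is added
print(letters)              # ['a', 'b', 'c', 'd']
```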
[1, 2, 3, 4, 5, 6]
Data Types: tuple
5
    my_tuple[2] = 'third'  # raises TypeError: tuples do not support item assignment
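A brief sketch of what happens when we try to assign to a tuple element, and of the usual alternative. The tuple point and its contents are hypothetical; the definition of my_tuple is not part of this excerpt.

```python
point = (1, 2, 'three')   # hypothetical example tuple
try:
    point[2] = 'third'
except TypeError as error:
    print('Cannot modify a tuple:', error)

# "Changing" a tuple means building a new one from the old elements.
new_point = (point[0], point[1], 'third')
print(new_point)
```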
We can’t add elements to a tuple (it is immutable). We can still concatenate tuples:
    tuple_1 = (1, 2)
    tuple_2 = (3, 4)
    tuple_3 = (5, 6)
    tuple_4 = (7, )
    tuple_5 = ()
    big_tuple = tuple_1 + tuple_2 + tuple_3 + tuple_4 + tuple_5
    big_tuple
(1, 2, 3, 4, 5, 6, 7)
Control Flow: for
Unlike Java, Python doesn’t have a three‑statement loop such as
    for (int i = 0; i < 5; i++) {
        System.out.println(i);
    }
A for loop in Python iterates over the elements of a collection, such as a list or a tuple:
    for el in [True, 2, 'three']:
        print(el)
True
2
three
Data Types: range
range is another basic sequence type in Python. It is a sequence of integers from start
(inclusive) to stop (exclusive):
    for el in range(3):
        print(el)
0
1
2
    for el in range(3, 6):
        print(el)
3
4
5
    for el in range(3, 9, 2):
        print(el)
3
5
7
    for el in range(0, -10, -4):
        print(el)
0
-4
-8
Be careful with negatives:
    for el in range(0, -10):
        print(el)  # never runs: the default step is +1, so the range is empty
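Converting the ranges to lists makes the pitfall visible. A minimal sketch (the variable names are chosen for this example): without a negative step, a range that should count downwards is simply empty.

```python
empty = list(range(0, -10))           # default step is +1, so nothing is produced
countdown = list(range(0, -10, -1))   # an explicit negative step counts down

print(empty)       # []
print(countdown)   # [0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
```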
Control Flow: for and range
The range data type comes in handy when we want to iterate a specific number
of times or over specific numbers. Sure, in some cases we can simply
do the following:
    for el in (0, 1, 2, 3, 4, 5, 6, 7, 8, 9):
        pass  # do something
Control Flow: loops
The break statement terminates a loop; the continue statement skips to the next
iteration of a loop:
    for x in range(5, 15):
        if x > 10:
            break
        if x % 2 == 0:
            continue
        print(x)
5
7
9
Control Flow: Blocks of Code
In Java, a block of code is delimited by curly braces {}. In Python, a block of code is
marked by indentation. Indent your blocks of code with four spaces!
Data Types: dict
A dictionary is a mapping from hashable keys to arbitrary values. Another way to
describe it is as a mutable, heterogeneous collection of key-value pairs (since Python 3.7,
iteration follows insertion order):
    my_dict = {'c': 1, 'b': 2, 'a': 3}
    my_dict['d'] = 4
    my_dict['c'] = 'Hello, World!'
    my_dict
Can we concatenate two dictionaries the way we concatenated tuples and lists? No:
    my_dict + {'z': True}  # raises TypeError: + is not supported between two dicts
We have to use the union operator | for dictionary ’concatenation’ (in-quotes,
with-an-asterisk, pay-attention, you get the idea):
    dict1 = {'a': 1, 'b': 2}
    dict2 = {'a': 3, 'c': 4}
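The reason for the quotes becomes clear once the two dictionaries share a key. A short sketch using dict1 and dict2 from above (the name merged is added for illustration; the | operator on dictionaries requires Python 3.9 or newer): for duplicate keys, the value from the right-hand operand wins.

```python
dict1 = {'a': 1, 'b': 2}
dict2 = {'a': 3, 'c': 4}

merged = dict1 | dict2   # requires Python 3.9 or newer
print(merged)            # {'a': 3, 'b': 2, 'c': 4}
print(dict2 | dict1)     # {'a': 1, 'c': 4, 'b': 2}
print(dict1)             # {'a': 1, 'b': 2}, the operands are unchanged
```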
We can also update one dictionary with the values from another dictionary using the
update() method:
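A minimal sketch of update() with invented example data: unlike the union operator, it modifies the dictionary it is called on and does not create a new one.

```python
inventory = {'apples': 3, 'pears': 5}   # hypothetical example data
delivery = {'pears': 7, 'plums': 2}

inventory.update(delivery)   # modifies inventory in place and returns None
print(inventory)             # {'apples': 3, 'pears': 7, 'plums': 2}
print(delivery)              # {'pears': 7, 'plums': 2}, unchanged
```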
Data Types: set
{1, 2, 3, 4, 'string'}
We can add elements to a set using the add() method:
    my_set.add('string2')
    my_set
True
    'string3' not in my_set
True
We can compute union, intersection, difference, and symmetric difference of sets in
Python:
    my_set_1 = {1, 2, 3, 4, 5}
    my_set_2 = {4, 5, 6, 7, 8}
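One way to compute the four operations is with the set operators, sketched below for the two sets defined above. The result variable names are chosen for this example.

```python
my_set_1 = {1, 2, 3, 4, 5}
my_set_2 = {4, 5, 6, 7, 8}

union = my_set_1 | my_set_2                  # elements in either set
intersection = my_set_1 & my_set_2           # elements in both sets
difference = my_set_1 - my_set_2             # in the first set but not the second
symmetric_difference = my_set_1 ^ my_set_2   # in exactly one of the sets

print(union, intersection, difference, symmetric_difference)
```

The same results are available through the methods union(), intersection(), difference(), and symmetric_difference().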
Data Types: Conversion
    bool(10)
True
    bool(0)
False
    bool('string')
True
    bool('')
False
    bool(['value'])
True
    bool([])
False
    int(True)
1
    int(False)
0
    int(5.4)
5
    int(5.9)
5
    int('10')
10
    int('10', 2)
2
    float(True)
1.0
    float(False)
0.0
    float(5)
5.0
    float('10.0')
10.0
    str(True)
'True'
    str(False)
'False'
    str(12.5)
'12.5'
    str([1, 2, 3])
'[1, 2, 3]'
    list((1, 2, 3))
[1, 2, 3]
    tuple([1, 2, 3])
(1, 2, 3)
    list({1, 2, 3})
[1, 2, 3]
    tuple({1, 2, 3})
(1, 2, 3)
    list({'a': 1, 'b': 2})
['a', 'b']
    tuple({'a': 1, 'b': 2})
('a', 'b')
    list('ABCD')
['A', 'B', 'C', 'D']
    tuple('ABCD')
('A', 'B', 'C', 'D')
    set([1, 2, 3, 1])
{1, 2, 3}
    set((1, 2, 3))
{1, 2, 3}
    set({'a': 1, 'b': 2})
{'a', 'b'}
    set('Hello, world!')
{' ', '!', ',', 'H', 'd', 'e', 'l', 'o', 'r', 'w'}
Python: Functions
A function is a named block of code that can accept arguments and can have a return
value. We have already seen a function that returns the number of elements in a sequence:
    len('Hello, World!')
13
Functions: Defining a Function
To define a function, we use the keyword def followed by a function name, a list of
parameters in parentheses, a colon, and the function body (use the same indentation
with four spaces to delimit a function’s body):
    def print_square(x):
        print(x * x)

    print_square(9)
81
Functions: Parameters
Functions can have an arbitrary number of parameters:
    def really_stale_joke():
        print(42)

    really_stale_joke()
42
    def my_sum(first, second, third, fourth):
        print(first + second + third + fourth)

    my_sum(10, 9, 8, 7)
34
Functions: Return Value
    cube_of_four = my_pow(4, 3)
    cube_of_four
64
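The definition of my_pow is not part of this excerpt. The sketch below assumes it raises its first argument to the power of its second, which is consistent with the result 64; the parameter names and the contrasting print_pow function are invented for illustration.

```python
def my_pow(base, exponent):
    # return hands the result back to the caller instead of printing it
    return base ** exponent


def print_pow(base, exponent):
    print(base ** exponent)   # prints the result, but returns None implicitly


cube_of_four = my_pow(4, 3)
print(cube_of_four)           # 64

result = print_pow(4, 3)      # prints 64 as a side effect
print(result is None)         # True: there is nothing to reuse later
```

A returned value can be stored and used in further computations; a printed value cannot.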
Functions: Pass
    empty_func()
Functions: Valid Names
Use lowercase for the function names, with words separated by underscores (just like
the variables).
Python: Classes
The class mechanism allows us to create new data types, combining state and behavior.
To create a new class, use the class keyword:
    class Duck:
        pass

    my_duck = Duck()
    type(my_duck)
__main__.Duck
Classes: __init__
To initialize an object’s state, we use the __init__ method. This method is
automatically invoked after a new object has been instantiated:
    class Duck:
        def __init__(self, name, color):
            self.name = name
            self.color = color
Donald
white
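The lines that create the object are not part of this excerpt. A minimal sketch of how the class could be used, assuming a duck named 'Donald' with the color 'white' (consistent with the output above); the variable name my_duck is borrowed from the earlier example.

```python
class Duck:
    def __init__(self, name, color):
        self.name = name
        self.color = color


# Calling the class creates an object and passes the arguments on to __init__.
my_duck = Duck('Donald', 'white')
print(my_duck.name)
print(my_duck.color)
```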
Classes: self
When we invoke a method on an object, the object is automatically passed to the
method as the first parameter. We can give the first parameter any name we want, but
conventionally it is called self:
    class UnusualDuck:
        def __init__(this, name, color):
            this.name = name
            this.color = color
green
Greeny
Classes: Methods
We define methods just like we define functions, except we do it inside a class:
    class Duck:
        def __init__(self, name, color):
            self.name = name
            self.color = color
        def talk(self):
            return "Quack! I'm " + self.name
Classes: Inheritance
We can create a subclass by specifying its superclass in the class statement. Let’s
define the classes:
    class Animal:
        def __init__(self, name, color):
            self.name = name
            self.color = color

    class Duck(Animal):
        def talk(self):
            return "Quack! I'm " + self.name

    class Cow(Animal):
        def talk(self):
            return "Moo! I'm " + self.name
Classes: Overriding Methods
We can override a method of a superclass by reimplementing it in a subclass:
    class Animal:
        def __init__(self, name, color):
            self.name = name
            self.color = color
        def talk(self):
            return "Hi! I'm " + self.name

    class Duck(Animal):
        pass

    class Cow(Animal):
        def talk(self):
            return "Moo! I'm " + self.name
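To see the effect of overriding, we can call talk() on objects of both subclasses. The sketch repeats the class definitions so that it runs on its own; the animal names and colors are made up.

```python
class Animal:
    def __init__(self, name, color):
        self.name = name
        self.color = color

    def talk(self):
        return "Hi! I'm " + self.name


class Duck(Animal):
    pass  # inherits talk() from Animal unchanged


class Cow(Animal):
    def talk(self):  # overrides Animal.talk()
        return "Moo! I'm " + self.name


duck = Duck('Donald', 'white')   # hypothetical example objects
cow = Cow('Bella', 'brown')
print(duck.talk())
print(cow.talk())
```

Duck falls back to the implementation in Animal, while Cow uses its own.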
We can invoke a method of a superclass via super():
    class ChocoCow(Cow):
        def __init__(self, name, color, cocoa_content):
            super().__init__(name, color)
            self.cocoa_content = cocoa_content
65
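A self-contained sketch of the whole chain. The object creation is not part of this excerpt, so the name, color, and the cocoa content of 65 are assumptions chosen to match the output above.

```python
class Animal:
    def __init__(self, name, color):
        self.name = name
        self.color = color


class Cow(Animal):
    def talk(self):
        return "Moo! I'm " + self.name


class ChocoCow(Cow):
    def __init__(self, name, color, cocoa_content):
        super().__init__(name, color)        # let Animal set name and color
        self.cocoa_content = cocoa_content   # then add the new attribute


choco_cow = ChocoCow('Bella', 'brown', 65)   # hypothetical values
print(choco_cow.talk())
print(choco_cow.cocoa_content)
```

Calling super().__init__() avoids repeating the attribute assignments from Animal in the subclass.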
Classes: Valid Names
Modules
Modules: Overview
We’ve decided to develop a computer game about charming rogues, palace intrigues,
and time travel that has the following characters:
Robin Hood, who likes to shoot arrows with a bow;
Antonio Vivaldi, who likes to play on a violin with a bow;
Marie Antoinette, who likes to tie her hair with a bow;
…
You get the idea. We have a naming conflict. We need namespaces. We need to use
modules.
A Python module is a file containing Python code* and having the '.py' extension. The name
of the module is the name of the file without the extension. Let us create two such files:
greetings_en.py:
    def hello_world():
        return 'Hello World!'
greetings_de.py:
    def hello_world():
        return 'Hallo Welt!'
* a module can also be written in C or be a part of the interpreter itself, but we will not
cover those cases
'Hello World!'
    greetings_de.hello_world()
'Hallo Welt!'
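Where does the import statement load files from? Simplifying a bit: from the directories
listed in sys.path, which include the standard library directories of the Python
installation, any directories named in the PYTHONPATH environment variable, and the
working directory (or, when running a script, the directory of that script).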
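The search locations can be inspected at runtime. A small sketch using sys.path from the standard library; the exact entries depend on the installation and on how Python was started, so no particular output is shown.

```python
import sys

# sys.path is a plain list of directory names that import searches in order.
search_path = list(sys.path)
for directory in search_path:
    print(directory)
```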
We can specify a custom name for a module using import as. We can also import
specific names from a module:
    import greetings_de as dtsch
    dtsch.hello_world()
'Hallo Welt!'
    from greetings_en import hello_world
    hello_world()
'Hello World!'
    from greetings_en import hello_world as oi
    oi()
'Hello World!'
Packages
Packages: Overview
What if our collection of i18n modules grew to include French, Spanish, Italian,
Portuguese, etc.? We don’t want to have all these modules (and, presumably, many
others) in the root directory of our program. We can use packages to solve the problem!
Packages: Namespace Packages
A namespace package is just a directory that contains modules. Using directories allows
us to create hierarchies of modules and to group similar modules together. Let’s create
a greetings directory and the following files in the directory:
en.py:
    def hello_world():
        return 'Hello World!'
it.py:
    def hello_world():
        return 'Ciao mondo!'
de.py:
    def hello_world():
        return 'Hallo Welt!'
Now, we can again load our modules by name using import:
    from greetings import de
    from greetings import en as english
    from greetings.it import hello_world as ciao
    de.hello_world()
'Hallo Welt!'
    english.hello_world()
'Hello World!'
    ciao()
'Ciao mondo!'
We explicitly specified the modules we wanted to import from a package. What
happens if we import only the package itself? Let’s create a farewells directory and
the following files in the directory:
en.py:
    def bye():
        return 'Bye!'
it.py:
    def bye():
        return 'Ciao!'
de.py:
    def bye():
        return 'Tschüs!'
    import farewells
    farewells.de  # raises AttributeError: the submodule de has not been imported
Loading a namespace package doesn’t give us access to the modules contained in the
package. To access the modules we either have to explicitly import them or use a
“regular” package.
Packages: Regular Packages
A “regular” package contains an __init__.py file. This file is supposed to contain
the initialization logic. Let’s make a copy of the farewells directory, name it
farewells2, and create an __init__.py file in the farewells2 directory with
the following code in it:
__init__.py:
    from . import de
    from . import en
    from . import it
Pay attention to the dot: we are not importing these modules from the working directory,
so we had to use a relative import (in other words, we had to specify where the modules
are relative to the location of the import statement) instead of an absolute import.
Let’s load the package and check if we can access the modules:
    import farewells2
    farewells2.de.bye()
'Tschüs!'
Package Installation and Version Management
There are pip, pipenv, venv, poetry, and other tools I probably haven’t even heard
about. We are going to use conda in this course. Please visit
https://docs.conda.io/en/latest/miniconda.html, download the Miniconda installer
and run it.
conda: Overview
To show a list of installed packages, use conda list.
To remove a package, use conda remove package_name.
To show a list of environments, use conda info -e.
To switch to a different environment, use conda activate env_name.
Finally, to remove an environment, use conda env remove --name env_name.
Let’s remove the matplotlib package, show a list of installed packages, then switch
to the base environment and remove the advist environment:
    conda remove matplotlib
    conda list
    conda activate base
    conda env remove --name advist-00
Python: Overview
This ends our shallow and narrow overview of Python. Hopefully you all now have a
basic understanding of the following topics:
Modules
Classes
Functions
Control Structures
Basic Data Types
Variables
Python: Documentation
Thank you!
QUESTIONS?