Biochemical Society Online Training Course
Practical Python for beginners: a Biochemist's guide
Course overview - Practical Python for beginners:
a Biochemist's guide
Please do not share this document. This course is © the Authors with an exclusive licence
to publish/reuse ‘Practical Python for beginners: a Biochemist's guide’ belonging to the
Biochemical Society. The course is only supplied for personal use. By registering, delegates
agree not to: copy the material for distribution, reuse it (for purposes other than personal
learning), or share it with third parties without permission from the Biochemical Society.
Learning objectives
After completion of the course, the successful learner will be able to:
• Explain the rationale for scripting
• Install Anaconda and navigate Spyder
• Use the IPython command line as a calculator and to assign variables
• Use the basic data types and some simple functions
• Create lists and select elements from them
• Use For Loops to perform operations iteratively
• Explain what a library is, import a Python library and use functions it contains
• Read biochemical data from a file into Python
• Understand computing concepts such as what is meant by the working directory, absolute
and relative paths and be able to apply these concepts to data import
• Analyse and visualise biochemical data using powerful Python packages such as NumPy,
Pandas, scikit-learn and Matplotlib
• Run examples of more complex analyses in Jupyter notebooks
Definitions
Fill in the table of definitions below as you complete the course.
Command
Script
Comment
Syntax Highlighting
Operator
Operand
Assignment
Function
Integer
Float
String
Reproducibility
List
Indexing a list
Method
For loops
If
Else
Break
Continue
Elif
Library
Alias
Element-wise
NumPy
Array
Pandas
DataFrame
Working directory
Csv file
Modules
Module 0 - Welcome to the course
The analysis and reporting phase should be entirely reproducible - given the data generated, we
should be able to reproduce every aspect of its processing, analysis and reporting. Scripting, in which
each step is explicitly articulated, is the best way to achieve this.
The more reproducible your analysis, the faster you, and others, will be able to apply similar
analyses in the future and the more transparent and open your work will be; this leads to better science.
Notes:
Module 1 - Getting set up
By the end of this module, you will be able to:
• Install Anaconda and open a Python IDE called Spyder
• Describe the different parts of Spyder
• Customise the appearance of Spyder
• Understand how to use the Interactive Python console and a script
1.2 - Anaconda
Q. What is Anaconda?
Download and install Anaconda following the instructions on the course site. Start Anaconda
Navigator as you would start any other program, e.g. from the Start menu (Windows) or the Dock (Mac).
1.3 - Spyder
Start Anaconda Navigator and click on Spyder to launch it.
The Spyder window is divided into three panes and each of these has different tabs.
The top right pane has a Help tab, File Explorer tab and a Variable Explorer tab. Click on the File
Explorer tab and navigate to a folder for working.
1.5 - Typing commands in the console
The bottom right pane is the Interactive Python (IPython) console. This is where commands are
executed.
• In [#]: is the prompt, a command is typed after this.
• Pressing enter will send a command and return the result on a line starting: Out [#]
• For the rest of this course, we won't show In [#] or Out [#].
1.6 - Using a script
The pane on the left is a script file.
• To run a selection of code, highlight what you want to run and press the "Run selection or
current line" button (or F9).
• Make sure you save your Python scripts with the file extension .py
Q. What are three advantages of using a script?
-
-
-
1.7 – Commenting
It is good practice to comment your code. You can write as much information in comments as you
like by adding a hash, #, to the start of the line.
Example:
# Using python as a calculator
3 + 4
# I get an answer of 7
Q. Why comment your code?
Notes:
Module 2 - Basic data types and simple functions
By the end of this module, you will be able to:
• Describe the integer, float and string data types for Python variables
• Assign values to variables and explain why this is good practice
• Use functions to discover the type of a variable and print it
2.3 - IPython as a calculator and operators
Example:
In [ ]: 4 + 3
Out[ ]: 7
Here the ‘+’ is known as an operator and both ‘4’ and ‘3’ are known as operands.
Other operators include:
• - subtraction
• / division
• * multiplication
• ** exponentiation (“to the power of”)
• % modulus (remainder after division)
• < is less than
• > is greater than
• == is equal to
• != is not equal to
2.4 – Assignment
The = is the assignment operator. To assign the numbers 4 and 3 to variables we can use:
num1 = 4
num2 = 3
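Once values are assigned, the variable names can be used with any of the operators from section 2.3. The short sketch below illustrates this; the result names (total, quotient and so on) are our own choices and not part of the course material.

```python
num1 = 4
num2 = 3

total = num1 + num2        # addition
quotient = num1 / num2     # division always returns a float
power = num1 ** num2       # 4 to the power of 3
remainder = num1 % num2    # remainder after dividing 4 by 3
is_equal = num1 == num2    # a comparison gives True or False

print(total, quotient, power, remainder, is_equal)
```

Note that = assigns a value, whereas == asks whether two values are equal.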
2.5 - Your first function
A function has a name followed by brackets. Inside the brackets go arguments which tell the
function what to act on and often how to act on it.
For example the function type() returns the datatype of an object:
type(num1)
Out[ ]: int
int is short for integer.
If you use a decimal point, even for a whole number, the type will be a float:
num3 = 4.0
type(num3)
Out[ ]: float
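To see these types side by side, you could try a small check like the one below. The variable names are illustrative only. When code is run from a script rather than typed into the console, the result of type() is only displayed if it is wrapped in print().

```python
whole = 4
decimal = 4.0
ratio = 4 / 2   # division returns a float even when the answer is a whole number

print(type(whole))       # <class 'int'>
print(type(decimal))     # <class 'float'>
print(type(ratio))       # <class 'float'>
print(whole == decimal)  # True: the values are equal even though the types differ
```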
2.6 - Basic data types
Integers and floats are two different kinds of numerical data. Words are of the data type ‘string’.
We can add strings together:
word1 = 'hello'
word2 = 'Emma'
word1 + word2
Out[ ]: 'helloEmma'
You would need to add a space explicitly:
word1 + ' ' + word2
Out[ ]: 'hello Emma'
In the console, typing word1 on its own displays the word, but in other situations (for example
when running a whole script) we must tell Python to print using the print() function:
print(word1)
Out[ ]: hello
2.7 - Why we assign to variables
Q. Why do we use variable assignment?
Example:
pi = 3.14
radius = 5
Using the variables to calculate the circumference:
2 * pi * radius
And the area:
pi * radius**2
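As a minimal sketch, the two calculations can themselves be assigned to variables so that the results can be reused. The names circumference and area are our own choices for illustration.

```python
pi = 3.14
radius = 5

circumference = 2 * pi * radius
area = pi * radius**2

print('circumference:', circumference)
print('area:', area)
```

If you change the value of radius on one line and re-run the script, both results update without any other edits.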
Notes:
Module 3 - Lists and selection, more functions and some methods
By the end of this module, you will be able to:
• Create lists and be able to index them for selection
• Use functions and methods to work with lists
• Appreciate the difference between functions and methods in Python
3.3 - Data structures: list
A list is denoted by using square brackets with a comma between each element.
To make a list called values:
values = [2, 5, 3, 7]
We can use the function type() on a list to reveal its data structure:
type(values)
Out[ ]: list
Some other useful functions for finding the number of elements of a list and the biggest element:
len(values)
Out[ ]: 4
max(values)
Out[ ]: 7
3.5 - List indexing
One important fact to remember is that the index starts at 0.
index:    0  1  2  3
values = [2, 5, 3, 7]
We denote the index of an element with square brackets. Thus, the first element of values is
extracted with:
values[0]
Out[ ]: 2
And the third element by:
values[2]
Out[ ]: 3
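Because counting starts at 0, the last valid index is always one less than the length of the list. The sketch below shows this using the len() function from section 3.3; the names first, third and last are illustrative.

```python
values = [2, 5, 3, 7]

first = values[0]                # index 0 is the first element
third = values[2]                # index 2 is the third element
last = values[len(values) - 1]   # the last index is the length minus 1

print(first, third, last)
```

Asking for values[4] here would raise an IndexError, because the list has only four elements (indices 0 to 3).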
3.7 - Lists: data types
Lists can contain other data types, such as strings (str):
names = ['maria', 'isaac', 'sam', 'jamie']
type(names)
Out[ ]: list
type(names) tells us what type of object names is, whereas
type(names[0])
Out[ ]: str
tells us the type of the first element of the names list.
3.9 - Methods
To use an object’s methods, you use the dot notation.
Here are some examples of methods that do require an argument:
Lists have a count() method used like this:
names.count('maria')
Out[ ]: 1
We can also get the index of ‘maria’
names.index('maria')
Out[ ]: 0
An example of a method that doesn’t require an argument and that changes a list object as well as
returning a value is the pop method.
names.pop()
Out[ ]: 'jamie'
pop() returns the last element of names but it also changes names!
print(names)
Out[ ]: ['maria', 'isaac', 'sam']
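The difference between a function and a method can be seen by using both on the same list. This is a small sketch with variable names of our own choosing.

```python
names = ['maria', 'isaac', 'sam', 'jamie']

# Function: the list goes inside the brackets
n_before = len(names)

# Methods: called on the list using dot notation
position = names.index('sam')   # returns a value, the list is unchanged
removed = names.pop()           # returns a value AND changes the list

n_after = len(names)
print(n_before, position, removed, n_after)
```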
Notes:
Module 4 - For Loops and other control statements
At the end of this module, you will be able to:
• Create For loops to iterate through different data structures
• Use Else, If and Elif statements to change how data is treated
• Understand the difference between Continue and Break statements and when to use them
4.2 - Making your code loopy
For Loops take the general structure:
for element in sequence:
    do something to the current element
Q. What is a For loop? And what data structures do they work with?
Examples:
for i in 'Spam':
    print(i)
Out[ ]: S
Out[ ]: p
Out[ ]: a
Out[ ]: m
for letter in 'Spam':
    print(letter)
Out[ ]: S
Out[ ]: p
Out[ ]: a
Out[ ]: m
With For Loops you can repeat a function a specified number of times.
for i in range(3):
    print('Hello')
Out[ ]: Hello
Out[ ]: Hello
Out[ ]: Hello
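The loop variable can also be used inside the loop. In the sketch below, range(3) supplies the numbers 0, 1 and 2 in turn, and a second loop adds up the elements of a list. The list and the name total are illustrative.

```python
# range(3) gives 0, 1, 2
for i in range(3):
    print('Repeat number', i)

# Add up the elements of a list one at a time
total = 0
for value in [2, 5, 3, 7]:
    total = total + value   # add the current element to the running total

print(total)
```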
4.5 - Combining loops with methods
For Loops are also powerful in their ability to access data from inside data structures in an iterative
fashion, which you then could manipulate, transform, store, or do anything else you can think of.
present = ['kick', 'lick', 'chuck']
past = []
for verb in present:
    past.append(verb + 'ed')
print(past)
Out[ ]: ['kicked', 'licked', 'chucked']
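The same pattern of starting with an empty list and appending inside a loop works for numbers too. As a hedged illustration with made-up values, the sketch below converts concentrations from millimolar to micromolar.

```python
conc_mM = [2, 10, 25]   # hypothetical concentrations in millimolar
conc_uM = []            # empty list to collect the converted values

for conc in conc_mM:
    conc_uM.append(conc * 1000)   # 1 millimolar = 1000 micromolar

print(conc_uM)
```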
4.8 - If Statements
If statements specify that the code should only be run if a condition is satisfied. They take the
general form:
if condition:
    code
Example:
present = ['kick', 'lick', 'chuck', 'tie']
past = []
for verb in present:
    if 'ck' in verb:
        past.append(verb + 'ed')
print(past)
Out[ ]: ['kicked', 'licked', 'chucked']
4.9 - If and Else Statements
Else specifies what code to run if the if statement is not satisfied.
present = ['kick', 'lick', 'chuck', 'tie']
past = []
for verb in present:
    if 'ck' in verb:
        past.append(verb + 'ed')
    else:
        past.append(verb + 'd')
print(past)
Out[ ]: ['kicked', 'licked', 'chucked', 'tied']
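Conditions can also compare numbers using the operators from Module 2. The following sketch labels some absorbance readings; the readings, the cut-off of 1.0 and the label text are all invented for illustration.

```python
absorbances = [0.12, 0.85, 1.40, 0.33]   # hypothetical readings
threshold = 1.0                          # hypothetical cut-off
labels = []

for reading in absorbances:
    if reading > threshold:
        labels.append('dilute and repeat')
    else:
        labels.append('ok')

print(labels)
```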
4.11 - Other control statements: Skip or Stop?
Break will end a loop entirely, whereas continue will end the current iteration of the loop and move
on to the next.
Examples:
for number in range(4):
    if number == 1:
        break
    print(number)
Out[ ]: 0
for number in range(4):
    if number == 1:
        continue
    print(number)
Out[ ]: 0
Out[ ]: 2
Out[ ]: 3
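Both statements can appear in the same loop. As a rough sketch using a made-up list of codons, continue skips an ambiguous codon ('NNN') and break stops the loop at the first stop codon. The variable names are our own.

```python
codons = ['ATG', 'GCT', 'NNN', 'AAA', 'TAA', 'GGG']   # made-up sequence
stop_codons = ['TAA', 'TAG', 'TGA']
coding = []

for codon in codons:
    if codon == 'NNN':
        continue          # skip this codon and move on to the next one
    if codon in stop_codons:
        break             # stop codon reached: leave the loop entirely
    coding.append(codon)

print(coding)
```

'GGG' is never reached because the loop has already ended at 'TAA'.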
4.12 - It’s Elif all the way down
Elif (else if) statements can be stacked multiple times, but they need to come after an if statement.
for number in range(5):
    if number == 0:
        print('Zero')
    elif number % 2 == 1:
        print('Odd')
    else:
        print('Even')
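Python checks the conditions from top to bottom and runs only the first block whose condition is satisfied. A similar three-way choice, sketched here with invented pH measurements, might look like this:

```python
ph_values = [2.0, 7.0, 9.5]   # hypothetical measurements
descriptions = []

for ph in ph_values:
    if ph < 7:
        descriptions.append('acidic')
    elif ph == 7:
        descriptions.append('neutral')
    else:
        descriptions.append('basic')

print(descriptions)
```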
Notes:
Module 5 – NumPy
By the end of this module, you will be able to:
• Understand what is meant by the term library
• Import the NumPy library into Python
• Use some functions of the NumPy library
• Use some NumPy functions to handle simple data arrays
5.3 - Why use libraries?
Q. Why do we use libraries?
5.4 - Common libraries
NumPy
NumPy is the fundamental package for scientific computing in Python. It enables the use and manipulation
of data arrays (really important for data analysis), basic algebra and statistics.
Pandas
Pandas is a specialist data analysis library which enables you to import data from .csv files easily and
put it into a DataFrame.
Seaborn
This is a data visualisation library which enables you to draw some spectacularly pretty graphs within
Python.
5.7 - Importing NumPy
In order to use an object defined within the NumPy library, we will have to import it. We can do this
by typing import <name of library>.
Example:
import numpy
a = numpy.array([1,2,3])
5.9 - Aliases for library names
Example:
import numpy as np
b = np.array([1,2,3])
5.11 - NumPy array methods
Making a 2-dimensional array
Example:
a = np.array([(1,2,3), (4,5,6)])
print(a)
Adding arrays together (element-wise):
c = a + b
Other operations:
cc = c * 10
np.log(c)
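The example above adds a 2-dimensional array (a) to a 1-dimensional array (b). The self-contained sketch below repeats it and prints the shapes, to show that NumPy adds b to each row of a. This behaviour is called broadcasting in the NumPy documentation.

```python
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)])   # 2 rows, 3 columns
b = np.array([1, 2, 3])                # a single row of 3 values

c = a + b      # b is added to each row of a, element by element
cc = c * 10    # every element is multiplied by 10

print(a.shape, b.shape, c.shape)
print(c)
print(cc)
```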
5.15 - Multidimensional NumPy arrays
Example:
z = np.array([(1,2,3), (4,5,6), (7,8,9), (10,11,12)])
5.16 - NumPy array methods: creating patterns and empty arrays
Creating an array with a pattern of numbers, example:
x = np.arange(0,8,1)
5.18 - Creating a placeholder array
Example:
x = np.zeros((3,3))
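The three arguments of np.arange() are the start, the stop (which is not included) and the step. The sketch below varies the step and then overwrites one element of a placeholder array. The names evens and placeholder are illustrative, and the [row, column] indexing shown goes slightly beyond the examples above.

```python
import numpy as np

x = np.arange(0, 8, 1)       # start at 0, stop before 8, step of 1
evens = np.arange(0, 8, 2)   # same range with a step of 2

placeholder = np.zeros((3, 3))   # 3 x 3 array filled with 0.0
placeholder[0, 0] = 5            # overwrite the element in row 0, column 0

print(x)
print(evens)
print(placeholder)
```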
Notes:
Module 6 - Data import using Pandas
By the end of this module, you will be able to:
• Import the Pandas library
• Explain the difference between a Pandas DataFrame and Series
• Use os to find and set a working directory
• Use Pandas to import data from a csv file into a Pandas DataFrame
• Use pandas to select specific parts of the DataFrame using loc and iloc
6.2 - Introduction to Pandas
A Pandas DataFrame object can be made using Python code, or by importing data from txt, csv and
Excel files. Pandas can also create objects called Series.
6.3 - Making a DataFrame using Pandas
Example:
import pandas as pd
List1 = [('Maria', 98, 70, 11), ('Isaac', 20, 87, 34),
         ('Sam', 93, 60, 100), ('Jamie', 100, 68, 0)]
df = pd.DataFrame(data = List1)
print(df)
6.5 - DataFrame indices
df is a common name to assign a DataFrame to. The extra column on the left is the index. Each row
has a unique index value (by default a number starting at 0), which is used to access the data in that row.
You can also set one of the columns as the index.
Example:
df = pd.DataFrame(data = List1,
                  columns = ['student', 'maths', 'chemistry', 'biology'])
df = df.set_index('student')
print(df)
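After set_index(), the student names are row labels rather than an ordinary column. The self-contained sketch below rebuilds the same DataFrame and inspects its index and columns. Selecting a single column with square brackets, as in the last line, is standard Pandas but is not covered above; it returns a Series.

```python
import pandas as pd

List1 = [('Maria', 98, 70, 11), ('Isaac', 20, 87, 34),
         ('Sam', 93, 60, 100), ('Jamie', 100, 68, 0)]
df = pd.DataFrame(data=List1,
                  columns=['student', 'maths', 'chemistry', 'biology'])
df = df.set_index('student')

print(df.index)      # the student names are now the row labels
print(df.columns)    # the three remaining columns
print(df['maths'])   # one column on its own is a Series
```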
6.7 - Working Directories
Before importing any files, use the os library to find your working directory.
Import the os library:
import os
And use one of its functions to display (get) the current working directory:
os.getcwd()
6.8 - Creating a new directory
Making a folder/directory:
os.mkdir('python_work')
Change directory:
os.chdir('python_work')
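A name such as 'python_work' is a relative path: it is interpreted starting from the current working directory. An absolute path gives the full location. As a minimal sketch, the os.path functions below show how the two are related; the file name data.csv is hypothetical and no files or folders are created.

```python
import os

cwd = os.getcwd()                                    # absolute path of the working directory
relative = os.path.join('python_work', 'data.csv')   # hypothetical relative path
absolute = os.path.abspath(relative)                 # the same location as an absolute path

print(cwd)
print(relative)
print(absolute)
print(os.path.isabs(relative), os.path.isabs(absolute))
```

The absolute path is the working directory followed by the relative path, which is why changing the working directory changes where Python looks for a relative file name.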
6.9 - Pandas can import different file formats
A common format is Comma Separated Values (csv).
To import from a .csv file we would use:
df = pd.read_csv('filename.csv')
We can also import from text or Excel files, examples:
df = pd.read_csv('filename.txt')
df = pd.read_excel('filename.xlsx', sheet_name='Sheet1')
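To try read_csv() without needing a file on disk, the sketch below reads csv-formatted text held in a string. The column names and values are invented for illustration, and io.StringIO simply stands in for a file; normally a file name would go in its place.

```python
import io
import pandas as pd

# Text standing in for the contents of a small csv file (made-up values)
csv_text = """sample,concentration,absorbance
A,0.1,0.12
B,0.2,0.25
C,0.4,0.49
"""

df = pd.read_csv(io.StringIO(csv_text))   # a file name would normally go here

print(df)
print(df.shape)   # (number of rows, number of columns)
```

The first line of the file is used for the column names. For a text file whose columns are separated by tabs rather than commas, read_csv() accepts the argument sep='\t'.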
6.13 - Viewing a DataFrame
For bigger DataFrames you will need to explicitly tell Pandas to print all of the data. In the
example below, df2 is assumed to be a larger DataFrame.
pd.set_option('display.max_rows', df2.shape[0])
pd.set_option('display.max_columns', df2.shape[1])
print(df2)
6.14 - Using loc and iloc
You can print specific rows and/or columns of the DataFrame using loc and iloc (location and
integer-location).
Example:
print(df.loc['Maria':'Isaac','maths'])
print(df.iloc[0:2,0])
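The two lines above select the same data in different ways. The self-contained sketch below rebuilds the student DataFrame from section 6.5 and compares the results; the names by_label and by_position are our own.

```python
import pandas as pd

df = pd.DataFrame(
    data=[('Maria', 98, 70, 11), ('Isaac', 20, 87, 34),
          ('Sam', 93, 60, 100), ('Jamie', 100, 68, 0)],
    columns=['student', 'maths', 'chemistry', 'biology'])
df = df.set_index('student')

by_label = df.loc['Maria':'Isaac', 'maths']   # labels: the end label is included
by_position = df.iloc[0:2, 0]                 # positions: the end position is excluded

print(by_label)
print(by_position)
print(by_label.equals(by_position))
```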
Q. What is the difference between iloc and loc?
Notes:
Module 7 - Using Jupyter notebooks
By the end of this module, you will be able to:
• Launch a Jupyter notebook
• Add code to a notebook using cells and run that code
• Add cells for text to a notebook
7.2 - Opening Jupyter notebooks
1. Launch Jupyter Notebook from Anaconda Navigator.
2. A browser window opens showing a web-based file explorer. The files listed are
your own, because the notebook is running on your machine (not on the internet).
3. Start a new notebook using the New button and choosing Python 3. This opens a new,
currently untitled, notebook with a single code cell.
7.3 – 7.6 Notebooks have cells
• A notebook has one code ‘cell’ at first. You can type Python code and comments into it.
• Run the code by pressing the Run button, which is on the toolbar at the top.
• The output will appear underneath along with an additional code cell.
• A notebook can have as many cells as you like to lay out your work. Cells can be code or text.
You can use a text cell to write about your work.
• To turn a cell from code to text, use the dropdown menu on the right of the toolbar and
choose the 'Markdown' option; choose 'Code' to turn it back.
• If you want to alter one of the cells, click on it, edit as you require and run it again.
Notes:
Module 8 - Summarising, analysing and visualising biochemical data
By the end of this module, you will be able to:
• Subset a Pandas DataFrame
• Use the seaborn package to create scatterplots and regression plots
• Use the scikit-learn package to carry out a linear regression
• Access regression model estimates for use
Notes:
Module 9 – Case study, Exploring Metagenomics Data
By the end of this module, you will be able to:
• Use your understanding of data types and control statements to follow a complex example
• Run complex code in a Jupyter notebook
• Appreciate how to compare genes in a metagenomic dataset to those in specific pathways
using Python
Notes: