0% found this document useful (0 votes)
72 views28 pages

Python Lectures 2

This document shows a brief overview about programming in Python. It's an introductory presentation about programming language, especially in Python.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
72 views28 pages

Python Lectures 2

This document shows a brief overview about programming in Python. It's an introductory presentation about programming language, especially in Python.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 28

FIRST

STEPS
TOWARDS
PROGRAMMING
PYTHON FOR
GENOMIC DATA
SCIENCE

Interactive Mode Programming


Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 23 2015, 02:52:03)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
>>> print(Hello world!)
Hello world!
>>>

NUMBERS AND STRINGS

Using Python As A Calculator


>>>
10
>>>
4.5
>>>
100
>>>
5.0
>>>
2
>>>
17

5+5
10.5-2*3
10**2

** is used to calculate powers

17.0 // 3

floor division discards the fractional part

17 % 3
5 * 3 + 2

the % operator returns the remainder after division

* takes precedence over +, -

Numbers can be different types


1. Integer numbers:
>>> type(5)
<type 'int>

2. Real numbers:
>>> type(3.5)
<type 'float'>
>>> 12/5
2
>>> float(12)/5
2.4
>>> 12.0/5
2.4

What happened?

3. Complex numbers:
>>> type(3+2j)
<type complex'>
>>> (2+1j)**2
(3+4j)
5

Strings
Single quoted strings:

>>> atg
atg
Double quoted strings:
>>> atg
atg

>>> This is a codon, isn't it?'


File "<stdin>", line 1
'This is a codon, isn't it?'
^
SyntaxError: invalid syntax
Why? Use double quotes instead:
>>> This is a codon, isn't it?
"This is a codon, isn't it?
Or use \ to escape quotes:
>>> This is a codon, isn\'t
it?
"This is a codon, isn't it?

Strings (contd)
String can span multiple lines with triple quotes:
use triple-quotes: """...""" or '''...'''
>>> """
... >dna1
... atgacgtacgtacgtacgtagctagctataattagc
... atgatdgdgtata
... >dna2
... ttggtgtcgccgcccccgcgttttaatatgcgctat
... """
\n>dna1\natgacgtacgtacgtacgtagctagctataattagc
\natgatdgdgtata
\n>dna2\nttggtgtcgccgcccccgcgttttaatatgcgctat\n
\n is a special character that signiWies a new line
7

Escape Characters
Backslash is an escape character that gives
the next character a special meaning:

Construct

Meaning

\n

Newline

\t

Tab

\\

Backslash

\"

Double quote
8

Printing Strings Nicely


The print() function produces a more readable output, by

omitting the enclosing quotes and by printing escaped and special


characters:
End of lines are automatically included in the string,
but its possible to prevent this by adding a \ at the
>>> print("""\
end of the line.
... >dna1
... atgacgtacgtacgtacgtagctagctataattagc
... atgatdgdgtata
... >dna2
... ttggtgtcgccgcccccgcgttttaatatgcgctat
... """)
>dna1
atgacgtacgtacgtacgtagctagctataattagc
atgatdgdgtata
>dna2
ttggtgtcgccgcccccgcgttttaatatgcgctat
9

Basic String Operators


+
*
in
not in

concatenate strings
copy string (replicate)
membership: true if Wirst string exists inside
non-membership: true if Wirst string does not
exist in second string

>>> 'atg' + 'gtacgtccgt'


'atggtacgtccgt'
>>> 'atg'*3
'atgatgatg'
>>> 'atg' in 'atggccggcgta'
True
>>> 'n' in 'atgtgggg'
False
10

VARIABLES

11

Variables
Variables are storage containers for numbers, strings, etc.
The equal sign (=) is used to assign a value to a variable:
>>> codon = 'atg'

Note: no result is displayed before the next


interactive prompt

>>> dna_sequence="gtcgcctaaccgtatatttttcccgt"

If a variable is not deWined (assigned a value), trying to use

it will give you an error:


>>> dna
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'dna' is not defined

12

Variables (contd)
The name we associate to a value is called a variable
because its value can change:
>>>
>>>
4
>>>
>>>
4
>>>
>>>
7
>>>
4

a=4
a
b=a
b
b=b+3
b

#yes, this is legal

a
13

Variables Tips And Notes


Tips:
Give meaningful names to variables: e.g. dnaSequence is better than s

Notes: variable names follow specific rules:


They are case-sensitive. This means that the following variables are
different (i.e. they refer to different values):
>>> myvariable = 1
>>> MyVariable = 2
>>> MYVARIABLE = 3
Variable names can ONLY consist of letters, numbers (not the first letter), and
the underscore character.
Valid names: name, _str, DNA , sequence1
Invalid names: 1string, name#, year@20

14

More String Operators


[x]
[x:y]

indexing: gives the character at position x in a string


slicing: gives a substring between positions x and y in a string

>>> dna="gatcccccgatattatttgc"
>>> dna[0]
'g'
>>> dna[-1]

the position in the string is called its index and the Wirst
character has index 0
indices may also be negative numbers, which start counting
from the right, beginning at -1

'c'
>>> dna[-2]
'g'
>>> dna[0:3]

the start is always included, and the end always excluded

'gat
>>> dna[:3]
gat
>>> dna[2:]
tcccccgatattatttgc

an omitted Wirst index defaults to zero


an omitted second index defaults to the size of the
whole string being sliced

15

A Useful String Function


The built-in function len() returns the length of a string:
>>> dna="acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag"
>>> len(dna)
49

Note. Python has a number of functions and types built into it that are
always available. Some other examples:
type()
print()

You can Wind more information about any built-in function by using the
pydoc command in a shell or the help function of the interpreter. Try:
>>> help(len)
16

Strings as Objects
string variable

value of the dna string

>>> dna="aagtccgcgcgctttttaaggagccttttgacggc"

String variables are things that know more than their values. They are
objects. They can also perform speciWic actions (functions), called methods
that only they can perform:
>>> dna.count('c')
9

the . (dot) operator can be interpreted as: ask object dna to do


something such as: count how many c characters it contains.

The method count() can also count substrings:


>>> dna.count('gc')
5

17

More Useful String Functions


>>> dna
'acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag

there is also a lower() function

>>> dna.upper()

'ACGCTCGCGCGGCGATAGCTGATCGATCGGCGCGCTTTTTTTTTAAAAG'
>>> dna

Note that the value of the string dna did not change.

'acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag
>>> dna.find('ag')
16
>>> dna.find('ag',17)

only the Wirst occurrence of ag is reported


we need to tell the method where to start looking

47
>>> dna.rfind(ag)

same as find(), but search backwards in string

47
You can Wind all methods for the type str:
>>> help(str)

18

More Useful String Functions (contd)


>>> dna
'acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag
>>> dna.islower()
True
>>> dna.isupper()
False

Another useful string method for bioinformatics programming is:


>>> dna.replace('a','A')
'AcgctcgcgcggcgAtAgctgAtcgAtcggcgcgctttttttttAAAAg
19

A First Program In Python


read DNA sequence from user
>>> dna = 'acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag'

count the number of Cs in DNA sequence


>>> no_c=dna.count('c')

count the number of Gs in DNA sequence


>>> no_g=dna.count('g')

determine the length of the DNA sequence


>>> dna_length=len(dna)

compute the GC%

The .0 after 100 is only required in Python 2.x

>>> gc_percent=(no_c+no_g)*100.0/dna_length

print GC%
>>> print(gc_percent)
53.06122448979592

20

Executing Code From A File


Write the code of your program in a Wile and then pass the Wile
to the Python interpreter:
gc.py

It is a convention for files containing python code to have a py extension

dna = 'acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag'
no_c = dna.count('c')
no_g = dna.count('g')
dna_length = len(dna)
gc_percent = (no_c + no_g) * 100.0 / dna_length
print(gc_percent)

Now pass this Wile to the interpreter:


~/Work/courses/python>python gc.py
53.06122448979592
21

Executing Code From a File (contd)


gc.py
#!/usr/bin/python

You can make your file executable if you put this


line at the beginning of your file

dna = 'acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag'
no_c=dna.count('c')
no_g=dna.count('g')
dna_length=len(dna)
gc_percent=(no_c+no_g)*100/dna_length
print(gc_percent)

Now you can run your python program like this:


~/Work/courses/python>./gc.py
53.06122448979592

22

Executing Code From a File (contd)


Make your python Wile executable by setting the x execution bit.
chmod a+x gc.py
You can Wind where the Python interpreter is located by running which:
~/Work/courses/python>which python
/usr/bin/python

Now you can run your python program like this:


~/Work/courses/python>./gc.py
53.06122448979592

23

Adding Comments
gc.py
#! /usr/bin/python

multiple line comments are included in between


This is my first Python program.
It computes the GC content of a DNA sequence.

# get DNA sequence:


everything that follows a # is ignored up to the line end
dna = 'acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag'
no_c=dna.count('c') # count Cs in DNA sequence
no_g=dna.count(g') # count Gs in DNA sequence
dna_len=len(dna) # get the length of the DNA sequence
gc_perc=(no_c+no_g)*100.0/dna_len # compute gc percentage
print(gc_perc) # print GC% to screen

24

Reading Input
Python 2.x Python 3.x
>>> dna=raw_input("Enter a DNA sequence, please:")
Enter a DNA sequence, please:agtagcatgaggagggacttc
>>> dna
'agtagcatgaggagggacttc'

input() (Python 3) and raw_input() (Python 2) always return strings:


>>> my_number=raw_input(Please enter a number:")
Please enter a number:10
>>> type(my_number)
<type 'str'>
use the built-in function int() to
>>> actual_number=int(my_number)
transform the string into an integer
>>> type(actual_number)
<type 'int'>
25

Some Conversion Functions


Function

Description

int(x [,base]) converts x to an integer

float(x)

converts x to a Wloating-point (real)


number

complex(real
[,imag])

creates a complex number

str(x)

converts x to a string

chr(x)

converts an integer to a character

>>> chr(65)
A'
>>> str(65)
65'

every number <256 has a unique character associated to it

26

Fancier Output Formatting


leave out the parentheses in Python 2.x
>>> print("The DNA sequences GC content is",gc_perc,%)
The DNA sequences GC content is 53.06122448979592 %

The value of the gc_perc variable has many digits following the dot
which are not very signiWicant. You can eliminate the display of too many
digits by imposing a certain format to the printed string:
note the double % to print a % symbol
>>> print("The DNA sequences GC content is %5.3f %% % gc_perc)
The DNA sequences GC content is 53.061 %
value that is
formatted
percent operator separating the
formatting string and the value to
replace the format placeholder 27

formatting string

The Formatting String


total number of digits

% 5 . 3 f

letter indicating the type


of the value to format

indicates that a number of digits


format follows following the dot
Other formatting characters:
>>> print("%d" % 10.6) # 10.6 is first transformed into an integer
10
>>> print("%3d" % 10.6) # notice the space in front of 10
10
>>> print("%o"% 10) # use %o for an octal/%x for a hexadecimal integer
12
>>> print("%e" % 10.6) # 'E' notation uses powers of 10
1.060000e+01
>>> print("%s" % dna)
gtataggagatatattag
28

You might also like