Python Lectures 2
STEPS TOWARDS PROGRAMMING
PYTHON FOR GENOMIC DATA SCIENCE
5+5
10.5-2*3
10**2
17.0 // 3
17 % 3
5 * 3 + 2
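As a quick check, the expressions listed above can be evaluated in one go. This is a minimal sketch assuming Python 3; the dictionary name `results` is only an illustration and not part of the lecture.

```python
# Each expression from the list above, paired with its value (Python 3 assumed).
results = {
    "5+5": 5 + 5,                # addition
    "10.5-2*3": 10.5 - 2 * 3,    # multiplication is applied before subtraction
    "10**2": 10 ** 2,            # exponentiation
    "17.0 // 3": 17.0 // 3,      # floor division; a float operand gives a float
    "17 % 3": 17 % 3,            # remainder of the division
    "5 * 3 + 2": 5 * 3 + 2,      # multiplication is applied before addition
}

for expression, value in results.items():
    print(expression, "=", value)
```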
2. Real numbers:
>>> type(3.5)
<type 'float'>
>>> 12/5
2
>>> float(12)/5
2.4
>>> 12.0/5
2.4
What happened?
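The session above shows Python 2 behaviour, where `/` between two integers drops the fractional part. As a hedged comparison, the sketch below assumes Python 3, where `/` always performs true division and `//` is the explicit floor division. The variable names are illustrative only.

```python
# Python 3 assumed: "/" is true division, "//" is floor division.
true_quotient = 12 / 5            # keeps the fractional part
floor_quotient = 12 // 5          # discards it, like 12/5 in Python 2
float_quotient = float(12) / 5    # converting one operand first also works

print(true_quotient, floor_quotient, float_quotient)
```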
3. Complex numbers:
>>> type(3+2j)
<type 'complex'>
>>> (2+1j)**2
(3+4j)
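A complex value also exposes its parts directly. The following is a small sketch (Python 3 assumed) that repeats the squaring shown above and then reads the real part, the imaginary part, and the magnitude; the name `z` is just for illustration.

```python
# Square the complex number from the example above.
z = (2 + 1j) ** 2

print(z)        # the complex result
print(z.real)   # real part, stored as a float
print(z.imag)   # imaginary part, stored as a float
print(abs(z))   # magnitude (distance from zero)
```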
Strings
Single quoted strings:
>>> 'atg'
'atg'
Double quoted strings:
>>> "atg"
'atg'
Strings (contd)
Strings can span multiple lines. Use triple quotes: """...""" or '''...'''
>>> """
... >dna1
... atgacgtacgtacgtacgtagctagctataattagc
... atgatdgdgtata
... >dna2
... ttggtgtcgccgcccccgcgttttaatatgcgctat
... """
'\n>dna1\natgacgtacgtacgtacgtagctagctataattagc\natgatdgdgtata\n>dna2\nttggtgtcgccgcccccgcgttttaatatgcgctat\n'
\n is a special character that signifies a new line
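To see the newline characters at work, the sketch below (Python 3 assumed) stores the same triple-quoted text in a variable and splits it back into lines. The names `fasta_text` and `lines` are illustrative and not from the lecture.

```python
# The same multi-line text as in the example above.
fasta_text = """
>dna1
atgacgtacgtacgtacgtagctagctataattagc
atgatdgdgtata
>dna2
ttggtgtcgccgcccccgcgttttaatatgcgctat
"""

# repr() shows the string with its \n characters made visible.
print(repr(fasta_text))

# strip() removes the leading and trailing newline; split("\n") cuts at each remaining one.
lines = fasta_text.strip().split("\n")
print(lines)
```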
Escape Characters
Backslash is an escape character that gives the next character a special meaning:
Construct    Meaning
\n           Newline
\t           Tab
\\           Backslash
\"           Double quote
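Each construct in the table can be tried in a short string. This is a minimal sketch assuming Python 3; the example strings and variable names are made up for illustration.

```python
newline_example = "atg\ncgt"               # \n starts a new line
tab_example = "atg\tcgt"                   # \t inserts a tab
backslash_example = "seq\\dna"             # \\ stands for a single backslash
quote_example = "the \"atg\" codon"        # \" places a double quote inside double quotes

for text in (newline_example, tab_example, backslash_example, quote_example):
    print(text)
```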
+         concatenate strings
*         copy string (replicate)
in        membership: true if first string exists inside second string
not in    non-membership: true if first string does not exist in second string
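The four string operations described above correspond to the Python operators `+`, `*`, `in`, and `not in`. Here is a short sketch, assuming Python 3, with illustrative strings and variable names that are not part of the lecture.

```python
codon = "atg"
tail = "ccg"

joined = codon + tail             # concatenate strings
repeated = codon * 3              # copy (replicate) a string
is_inside = "tg" in codon         # membership: is "tg" found inside "atg"?
is_absent = "gg" not in codon     # non-membership: is "gg" missing from "atg"?

print(joined, repeated, is_inside, is_absent)
```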
VARIABLES
Variables
Variables are storage containers for numbers, strings, etc. The equal sign (=) is used to assign a value to a variable:
>>> codon = 'atg'
>>> dna_sequence="gtcgcctaaccgtatatttttcccgt"
Variables (contd)
The name we associate to a value is called a variable because its value can change:
>>> a=4
>>> a
4
>>> b=a
>>> b
4
>>> b=b+3
>>> b
7
>>> a
4
>>> dna="gatcccccgatattatttgc"
>>> dna[0]
'g'
The position in the string is called its index, and the first character has index 0.
>>> dna[-1]
'c'
Indices may also be negative numbers, which start counting from the right, beginning at -1.
>>> dna[-2]
'g'
>>> dna[0:3]
'gat'
>>> dna[:3]
'gat'
>>> dna[2:]
'tcccccgatattatttgc'
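Slices can be combined with a loop to walk along a sequence. The sketch below (Python 3 assumed) reuses the `dna` string from the session above and cuts it into consecutive three-letter pieces; the names `first_three`, `last_three`, and `codons` are illustrative assumptions.

```python
dna = "gatcccccgatattatttgc"

first_three = dna[0:3]    # indices 0, 1 and 2
last_three = dna[-3:]     # the final three characters

# Step through the string three characters at a time,
# stopping before an incomplete piece at the end.
codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]

print(first_three, last_three)
print(codons)
```

The last two characters of this 20-letter string do not fill a complete triplet, so they are left out of `codons`.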
Note. Python has a number of functions and types built into it that are always available. Some other examples:
type()
print()
You can find more information about any built-in function by using the pydoc command in a shell or the help function of the interpreter. Try:
>>> help(len)
Strings as Objects
A string variable:
>>> dna="aagtccgcgcgctttttaaggagccttttgacggc"
String variables are things that know more than their values. They are objects. They can also perform specific actions (functions), called methods, that only they can perform:
>>> dna.count('c')
9
>>> dna.upper()
'ACGCTCGCGCGGCGATAGCTGATCGATCGGCGCGCTTTTTTTTTAAAAG'
>>> dna
'acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag'
Note that the value of the string dna did not change.
>>> dna.find('ag')
16
>>> dna.find('ag',17)
47
>>> dna.rfind('ag')
47
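The second argument of `find` makes it possible to collect every occurrence of a pattern, not just the first. The sketch below assumes Python 3 and uses the `dna` value shown in the session above; the loop and the name `positions` are an illustration, not the lecture's own code.

```python
dna = "acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag"

positions = []
start = dna.find("ag")                  # first occurrence, or -1 if there is none
while start != -1:
    positions.append(start)
    start = dna.find("ag", start + 1)   # continue searching after the last hit

print(positions)
```

Starting the next search at `start + 1` rather than after the whole match also catches overlapping occurrences.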
You can find all methods for the type str:
>>> help(str)
>>> gc_percent=(no_c+no_g)*100.0/dna_length
Print the GC%:
>>> print(gc_percent)
53.06122448979592
dna = 'acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag'
no_c = dna.count('c')
no_g = dna.count('g')
dna_length = len(dna)
gc_percent = (no_c + no_g) * 100.0 / dna_length
print(gc_percent)
dna = 'acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag'
no_c=dna.count('c')
no_g=dna.count('g')
dna_length=len(dna)
gc_percent=(no_c+no_g)*100/dna_length
print(gc_percent)
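The same computation can be wrapped so that it works for any sequence. This is only a sketch under stated assumptions: Python 3, a hypothetical function name `gc_content`, and two additions that the scripts above do not make (lowercasing the input, and returning 0.0 for an empty string to avoid dividing by zero).

```python
def gc_content(dna):
    """Return the percentage of 'c' and 'g' characters in dna."""
    dna = dna.lower()          # assumption: treat upper and lower case alike
    if len(dna) == 0:          # assumption: an empty sequence has 0% GC
        return 0.0
    no_c = dna.count("c")
    no_g = dna.count("g")
    return (no_c + no_g) * 100.0 / len(dna)


print(gc_content("acgctcgcgcggcgatagctgatcgatcggcgcgctttttttttaaaag"))
```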
Adding Comments
gc.py
#! /usr/bin/python
Reading Input
Python 2.x (in Python 3.x, raw_input() was renamed to input()):
>>> dna=raw_input("Enter a DNA sequence, please:")
Enter a DNA sequence, please:agtagcatgaggagggacttc
>>> dna
'agtagcatgaggagggacttc'
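The session above uses `raw_input`, which exists only in Python 2. In Python 3 the equivalent built-in is `input`. The sketch below assumes Python 3 and wraps the call in a hypothetical helper, `read_dna`, whose `reader` parameter is an illustrative hook so the function can be tried without typing at the keyboard. Lowercasing and stripping whitespace are also assumptions, not part of the lecture.

```python
def read_dna(prompt="Enter a DNA sequence, please:", reader=input):
    """Ask for a DNA sequence and return it lowercased, without surrounding whitespace.

    reader is a hypothetical hook: it defaults to the built-in input()
    and can be replaced by any function that takes a prompt and returns text.
    """
    return reader(prompt).strip().lower()


# Interactive use (waits for keyboard input):
# dna = read_dna()
```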
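Function                 Description
float(x)                 converts x to a floating-point number
complex(real [,imag])    creates a complex number
str(x)                   converts x to a string
chr(x)                   converts an integer to a character
>>> chr(65)
'A'
>>> str(65)
'65'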
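The conversion functions listed above can be tried side by side. This is a short sketch assuming Python 3; the input values and variable names are chosen only for illustration.

```python
gc_text = "53.06"              # text, for example as typed by a user
gc_value = float(gc_text)      # text -> floating-point number

z = complex(3, 2)              # real part 3, imaginary part 2
label = str(65)                # number -> its text representation
letter = chr(65)               # integer code -> the corresponding character

print(gc_value, z, label, letter)
```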
The value of the gc_perc variable has many digits following the dot which are not very significant. You can eliminate the display of too many digits by imposing a certain format on the printed string:
>>> print("The DNA sequence's GC content is %5.3f %%" % gc_perc)
The DNA sequence's GC content is 53.061 %
Note the double % to print a % symbol.
gc_perc: the value that is formatted.
%: the percent operator separating the formatting string and the value that replaces the format placeholder.
formatting string: %5.3f
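In a placeholder such as `%5.3f`, the number before the dot is the minimum total width, the number after the dot is the count of digits kept after the decimal point, and `f` marks a floating-point value. The sketch below (Python 3 assumed) varies these parts; the value assigned to `gc_perc` is the GC percentage printed earlier, and the other variable names are illustrative.

```python
gc_perc = 53.06122448979592

three_digits = "%5.3f" % gc_perc    # at least 5 characters wide, 3 digits after the dot
padded = "%8.3f" % gc_perc          # wider field: padded with spaces on the left
one_digit = "%.1f" % gc_perc        # no minimum width, 1 digit after the dot

message = "The DNA sequence's GC content is %5.3f %%" % gc_perc

print(repr(three_digits), repr(padded), repr(one_digit))
print(message)
```

The width is only a minimum: when the formatted number needs more characters than the field allows, Python prints it in full.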