Lec 2 PDF

This document provides an overview of Biopython, a widely used Python package for bioinformatics. It discusses why Python is well-suited for bioinformatics applications due to its cross-platform use, built-in features, dynamic and modular nature. The document then describes several popular Python tools for bioinformatics including Biopython, PyMOL, Scikit-learn, and NumPy. Biopython is highlighted as an open-source collection of Python modules for biological computations that can work with DNA, RNA, protein sequences and structures. The document also discusses common data types used as inputs for Biopython.

Uploaded by

ziadmohamad3412
Copyright
© All Rights Reserved

BioPython

Edited by
Python Programming
in bioinformatics
Why Python?
• Python can be installed and used on different platforms, including Windows, Mac, and Linux.
• Python has several built-in features that make it well-suited for bioinformatics applications.
• Python's dynamic and modular nature allows researchers to reuse and share code, reducing development time and increasing productivity.
• Python has a relatively simple syntax, making it easy to learn and use.
• Python is a high-level language that offers advanced data structures and functions that make it easy to work with complex biological data.
Tools for Python Programming in
Bioinformatics

1. Biopython
• One of the most widely used bioinformatics packages for Python. Biopython is an open-source collection of Python modules that provides a set of powerful and easy-to-use tools for performing biological computations.
• Biopython requires very little code. Some of its features and tasks are:
  • Biopython provides tools for working with DNA, RNA, and protein sequences, including sequence alignment, motif and pattern matching, and translation between nucleotide and protein sequences.
  • Biopython includes tools for working with protein structures, such as parsing and manipulating PDB files and performing structure comparisons.
  • Biopython supports file formats commonly used in bioinformatics, such as FASTA, GenBank, and BLAST output.
  • Biopython includes tools for visualizing biological data, such as sequence alignment plots and phylogenetic trees.
  • BioSQL: a standard set of SQL tables for storing sequences plus features and annotations.
2. PyMOL
• PyMOL is a free and open-source molecular visualization software used in bioinformatics. It creates high-quality images and animations of molecular structures, which can be useful in a variety of applications including drug discovery, protein engineering, and molecular biology research.
• PyMOL can be scripted in Python and can easily integrate with other Python-based tools and libraries.
3. Scikit-learn
• Scikit-learn is a Python library that provides tools for machine learning. It offers a wide range of algorithms that can be used to analyze complex biological datasets and make predictions about biological systems.
• Some uses of Scikit-learn in bioinformatics are:
  • Classifying biological samples based on gene expression data or proteomics data.
  • Clustering biological samples or reducing the dimensionality of large datasets.
  • Developing machine learning models to predict the structure of proteins and protein-protein interactions based on their amino acid sequences.
4. NumPy (Numerical Python)
• NumPy is a Python library for working with numerical data. It is used extensively by Pandas, SciPy, Matplotlib, Scikit-learn, and many other scientific Python packages. NumPy provides a multidimensional array object called 'ndarray' and can be used to perform a wide range of mathematical operations on arrays.
To install and import Biopython:
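The install and import steps might look like the following sketch. It assumes Biopython is installed from PyPI under the package name `biopython` and imported as `Bio`. The short sequence is made up for illustration, and the `try`/`except` guard is only there so the snippet still runs on a machine where Biopython is missing.

```python
# Install once from a terminal (not from inside the Python prompt):
#     pip install biopython

try:
    import Bio
    from Bio.Seq import Seq
except ImportError:
    Bio = None  # Biopython is not installed in this environment

if Bio is not None:
    print("Biopython version:", Bio.__version__)
    dna = Seq("ATGGCC")                 # illustrative sequence, not from the lecture
    print(dna.reverse_complement())     # reverse complement of the DNA
    print(dna.translate())              # translate codons into amino acids
else:
    print("Biopython is not installed; run 'pip install biopython' first.")
```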
What are the input data types for Biopython?
• Text file:
  1. Sequence file (sequence.txt)
  2. Cell microarray
• CSV file
• FASTA file
• Other file formats, such as:
  • BLAST output
  • GenBank
  • PubMed and Medline
  • SCOP, including 'dom' and 'lin' files
  • UniGene
  • SwissProt
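To make the FASTA format concrete, here is a minimal pure-Python sketch of reading FASTA records. It is not Biopython's own parser (Biopython reads these files through `Bio.SeqIO.parse`). The function name `read_fasta`, the record names, and the sequences are all made up for illustration.

```python
import io

# Illustrative FASTA text: a '>' header line followed by one or more sequence lines.
fasta_text = """>seq1 example record
ATGGCCATTG
TAATGGGCC
>seq2 another record
GATTACA
"""

def read_fasta(handle):
    """Return a dict mapping each record ID to its full sequence."""
    records = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue                          # skip blank lines
        if line.startswith(">"):
            current_id = line[1:].split()[0]  # the ID is the first word after '>'
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line       # sequence lines may wrap
    return records

records = read_fasta(io.StringIO(fasta_text))
for name, seq in records.items():
    print(name, len(seq), seq)
```

The same function works on a real file by passing an open file object instead of the `io.StringIO` wrapper.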
Overview on

Some key notes in Python


Data Types: Number Types

int, float, complex

1. Integer numbers:
>>> type(4)
<class 'int'>

2. Real numbers:
>>> type(4.5)
<class 'float'>

3. Complex numbers:
>>> type(3+2j)
<class 'complex'>

>>> 17/5
3.4
>>> 17//5
3
>>> (2+1j)**2
(3+4j)
Data Types: Strings

• Single quote:
>>> 'atg'
'atg'

• Double quote:
>>> "atg"
'atg'
>>> 'This is a codon, isn't it?'
SyntaxError  (the apostrophe in isn't closes the string early)

>>> "This is a codon, isn't it?"    # or: 'This is a codon, isn\'t it?'
"This is a codon, isn't it?"


String Operators
• Escape character: the backslash '\' gives special meaning to the character that follows it.

Construct    Meaning
\n           Newline
\t           Tab
\\           Backslash
\"           Double quote

• To produce more readable output, use print().
• String operators:

Operator     Meaning
+            Concatenate
*            Copy or replicate
in           Checks if the first string is in the second
not in       Checks if the first string is not in the second

>>> 'atg' + 'gcc'
'atggcc'
>>> 'atg' * 3
'atgatgatg'
>>> 'tg' in 'atgatgatg'
True
>>> 'tc' in 'atgatgatg'
False
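The operators and escape sequences above can be combined in a short script. This is only a sketch; the variable names and the printed table are illustrative.

```python
codon_a = 'atg'
codon_b = 'gcc'

joined = codon_a + codon_b          # concatenation
repeated = codon_a * 3              # replication
has_tg = 'tg' in repeated           # membership test
lacks_tc = 'tc' not in repeated     # negated membership test

print(joined)
print(repeated)
print(has_tg, lacks_tc)

# Escape sequences: \t inserts a tab, \n starts a new line.
print("codon\tamino acid\natg\tMet")
# \\ prints a single backslash, \" prints a double quote inside a double-quoted string.
print("A backslash: \\ and a double quote: \"")
```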
Variables
• Variables are containers that store numbers, strings, and other data types and structures.
• Variables are names given to values, and the value a name refers to can be changed.
• Variables are assigned values using the equals sign (=).
>>> codon = 'tag'
>>> dna_sequence = "gtcgcctaaccgtatatttttcccgt"
• A variable cannot be used before it has been assigned a value; doing so raises an error.
>>> dna
NameError: name 'dna' is not defined


Variables
• Naming
• Select meaningful names: dnaSequence is better than s.
• Follow the naming rules:
  • Names are case-sensitive, so these are three different variables:
    DnaSequence = 1
    DNASEQUENCE = 2
    Dnasequence = 3
  • Names consist of letters, digits, and underscores.
    Valid: Dna1, dna_1, dnaSeq
  • A name cannot start with a digit.
    Invalid: 1dna
  • No special characters.
    Invalid: dna#, dna@1
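The naming rules can be checked from Python itself. The sketch below uses the built-in `str.isidentifier()` method and the standard `keyword` module. The helper name `is_valid_name` and the extra candidate `class` (a reserved word, which is also not allowed as a variable name) are additions for illustration and are not part of the slides.

```python
import keyword

# Candidate names taken from the rules above, plus the reserved word 'class'.
candidates = ["dna_sequence", "Dna1", "dnaSeq", "1dna", "dna#", "dna@1", "class"]

def is_valid_name(name):
    """True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

for name in candidates:
    print(name, is_valid_name(name))

# Names are case-sensitive: these are three different variables.
DnaSequence = 1
DNASEQUENCE = 2
Dnasequence = 3
print(DnaSequence, DNASEQUENCE, Dnasequence)
```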
String Operators
• [i]: returns the character at index i in a string (index).
• [i:j]: returns the substring from index i up to, but not including, index j (slice).
>>> dna = "gatcccccgatattatttgc"
>>> dna[0]      # the first position in a string is position 0
'g'
>>> dna[-1]     # negative indices count from the right, starting at -1
'c'
>>> dna[-2]
'g'
>>> dna[0:3]    # in slices the start index is included and the end index is excluded
'gat'
>>> dna[:3]     # omitting the start index uses the default, 0
'gat'
>>> dna[2:]     # omitting the end index uses the default, the end of the string
'tcccccgatattatttgc'
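Slicing is enough to split a sequence into codons. The sketch below reuses the `dna` string from the slide; reading in frame 0 and setting aside the incomplete trailing bases are choices made here for illustration.

```python
dna = "gatcccccgatattatttgc"

# Step through the string three characters at a time.
# Stopping at len(dna) - 2 skips any incomplete codon at the end.
codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]

# Bases that do not fill a whole codon (an empty string if the length is a multiple of 3).
leftover = dna[len(codons) * 3:]

print(codons)
print(leftover)
```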
Strings as Objects
• String variables are objects that can perform specific actions using built-in functions and methods:
>>> dna = "gatcccccgatattatttgc"
>>> len(dna)            # number of characters
20
>>> dna.count('t')      # count characters
7
>>> dna.count('ga')     # count substrings
2
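A common use of `len()` and `count()` on sequences is computing base composition. The sketch below calculates the GC content of the same `dna` string; the variable names are illustrative.

```python
dna = "gatcccccgatattatttgc"

g_count = dna.count('g')
c_count = dna.count('c')

# GC content: fraction of bases that are g or c.
gc_content = (g_count + c_count) / len(dna)

print("g:", g_count, "c:", c_count)
print("GC content:", gc_content)
```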
String Functions
>>> dna = "gatcccccgatattatttgc"
>>> dna.upper()             # convert all to upper case; lower() converts to lower case
'GATCCCCCGATATTATTTGC'
>>> dna.find('ga')          # index of the first occurrence of 'ga', -1 if not found
0
>>> dna.find('at', 5)       # index of the first occurrence of 'at' starting from index 5
9
>>> dna.rfind('ga')         # index of the last occurrence of 'ga', -1 if not found
8
>>> dna.islower()           # True if all letters are lower case
True
>>> dna.isupper()           # True if all letters are upper case
False
>>> dna.replace('a', 'A')   # replaces every 'a' with 'A'
'gAtcccccgAtAttAtttgc'
Inputs

>>> dna = input("Enter a DNA sequence, please: ")
Enter a DNA sequence, please: agtagcatgaggagggacttc
>>> dna
'agtagcatgaggagggacttc'
Examples:
• Create a random DNA sequence of length 10
import random

alphabet = "AGCT"
sequence = ""
for i in range(10):
    # pick a random position in the alphabet (randint includes both end points)
    index = random.randint(0, 3)
    sequence = sequence + alphabet[index]
print(sequence)
Read from a text file
• read(x): reads up to x characters from the file and returns them as a single string. If you don't supply a size, it reads the entire file.
• readline(): reads one line, up to and including the newline (\n) or the end of the file.
• readlines(): reads all the remaining lines and returns them as a list of strings, one per line. If you supply a size hint x, it stops reading lines once roughly x characters have been read.
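The reading methods can be compared on a small file. The sketch below first writes a throwaway file so that it is self-contained; the file name and its three-line content are made up for illustration.

```python
import os
import tempfile

# Create a small temporary file to read from (contents are illustrative).
path = os.path.join(tempfile.mkdtemp(), "sequence.txt")
with open(path, "w") as f:
    f.write("atg\ngcc\ntaa\n")

with open(path) as f:
    first_two = f.read(2)         # at most 2 characters
    rest_of_line = f.readline()   # the rest of the current line, including '\n'
    remaining = f.readlines()     # list with all the lines that are left

with open(path) as f:
    whole_file = f.read()         # no size given: the entire file as one string

print(repr(first_two))
print(repr(rest_of_line))
print(remaining)
print(repr(whole_file))
```

Each call continues from where the previous one stopped, which is why the second `with` block reopens the file before reading all of it.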
Write a text file
Notes about file modes

What does + mean in open()?

• The + adds the missing direction (reading or writing) to an existing open mode (update mode).
• r means read only; r+ means read and write. The file must already exist and is not truncated.
• w means write only; w+ means read and write. Both create the file if needed and erase any existing content.
• a means write only in append mode; a+ means read and write in append mode. Writes always go to the end of the file.
Examples:
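• Difference between r and r+ in open()

Mode r (read only):
with open('file.txt', 'r') as f:
    print(f.read())
Output (terminal):
welcome to python 1
welcome to python 2
welcome to python 3
welcome to python 4

Mode r+ (read and write):
with open('file.txt', 'r+') as f:
    f.write("new line \n")
Output (file.txt):
new line
 python 1
welcome to python 2
welcome to python 3
welcome to python 4
Note: r+ starts writing at the beginning of the file and overwrites the existing characters instead of inserting a new line, so the start of the first line is replaced.

Writing in mode r fails:
with open('file.txt', 'r') as f:
    f.write("test \n")
io.UnsupportedOperation: not writable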
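The behaviour of r+ is easy to check on a temporary file. The sketch below is an illustration under stated assumptions: the two-line file content is made up, and `newline="\n"` is passed so that the character counts are the same on every operating system. It shows that writing in r+ mode replaces characters at the start of the file, and that writing in r mode raises `io.UnsupportedOperation`.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w", newline="\n") as f:
    f.write("welcome to python 1\nwelcome to python 2\n")

# r+ opens an existing file for reading and writing, positioned at the start.
with open(path, "r+", newline="\n") as f:
    f.write("new line \n")      # replaces the first 10 characters, inserts nothing

with open(path, newline="\n") as f:
    content = f.read()
print(content)

# Mode r is read only, so write() is rejected.
try:
    with open(path, "r") as f:
        f.write("test \n")
except io.UnsupportedOperation as err:
    error_message = str(err)
    print("io.UnsupportedOperation:", error_message)
```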
Examples:
• Difference between w and w+ in open()

Mode w (write only):
with open('file.txt', 'w') as f:
    f.write("test 1\n")
    f.write("test 2\n")
    f.write("test 3\n")
Output (file.txt):
test 1
test 2
test 3

Mode w+ (write and read):
with open('file.txt', 'w+') as f:
    f.write("test 1\n")
    f.write("test 2\n")
    f.write("test 3\n")
    f.seek(0)
    lines = f.read()
    print(lines)
Output (terminal):
test 1
test 2
test 3

Note: f.seek(0) moves the file pointer to the beginning of the file.


Examples:
• Difference between a and a+ in open()

Mode a (append only):
with open('file.txt', 'a') as f:
    f.write("3")
Output (file.txt):
welcome to python 1
welcome to python 2
welcome to python 3
welcome to python 4
3

Mode a+ (append and read):
with open('file.txt', 'a+') as f:
    f.seek(0)
    lines = f.readlines()
    f.write("\n" + str(len(lines)))
Output (file.txt):
welcome to python 1
welcome to python 2
welcome to python 3
welcome to python 4
4
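The a+ example can be reproduced on a temporary file. In this sketch the starting content (two lines, no trailing newline) is an assumption made for illustration. It shows that the file can be read after `seek(0)` while every write still lands at the end.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("welcome to python 1\nwelcome to python 2")

# a+ allows reading, but every write is appended to the end of the file.
with open(path, "a+") as f:
    f.seek(0)                            # the pointer starts at the end; go back to read
    lines = f.readlines()
    f.write("\n" + str(len(lines)))      # appended even though we just read from the top

with open(path) as f:
    final_content = f.read()
print(final_content)
```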
Assignment
• Apply all the functions discussed to a text file that you create yourself.
