User Guide For Propy 1.0: 1.1 What Is This?

This document provides an overview of how to use the propy Python package for analyzing protein sequences. The propy package can be installed on Linux and Windows systems from its download page. It allows users to obtain protein sequences from Uniprot by ID and calculate various protein descriptors, including amino acid composition, autocorrelation, and pseudo amino acid composition based on properties from the AAindex database. The propy package supports calculating over 1,000 built-in protein features for characterization and analysis of protein sequences.

Uploaded by

suryasan
Copyright
© Attribution Non-Commercial (BY-NC)
User Guide for propy 1.0

1.1 What is this?

This document is intended to provide an overview of how one can use the propy functionality from Python. It's not comprehensive and it's not a manual. If you find mistakes, or have suggestions for improvements, please either fix them yourself in the source document (the .py file) or send them to the mailing list: [email protected]

1.2 Install the propy package

propy has been successfully tested on Linux and Windows systems. Users can download the propy package (.zip or .tar.gz) from http://code.google.com/p/protpy/downloads/list. Installing propy is straightforward:

On Windows:
(1): download the propy package (.zip)
(2): extract or uncompress the .zip file
(3): cd propy-1.0
(4): python setup.py install

On Linux:
(1): download the propy package (.tar.gz)
(2): tar -zxf propy-1.0.tar.gz
(3): cd propy-1.0
(4): python setup.py install or sudo python setup.py install

1.3 Download proteins from Uniprot

You can get a protein sequence from the Uniprot website by providing a Uniprot ID.
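As an illustration of what such a download involves, here is a minimal standard-library sketch. It is not propy's own code: the function names are hypothetical, and the URL pattern is an assumption based on the current UniProt REST service, which is newer than propy 1.0.

```python
from urllib.request import urlopen

# Assumption: FASTA endpoint of the current UniProt REST service.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{uid}.fasta"


def parse_fasta_sequence(fasta_text):
    """Return the residues of the first FASTA record as one string."""
    chunks = []
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second header starts the next record
            seen_header = True
            continue
        chunks.append(line)
    return "".join(chunks)


def fetch_sequence(uniprot_id):
    """Download one entry in FASTA format (requires network access)."""
    with urlopen(UNIPROT_FASTA_URL.format(uid=uniprot_id)) as response:
        return parse_fasta_sequence(response.read().decode("utf-8"))
```

Calling `fetch_sequence` with a Uniprot accession returns the sequence as a plain string; `parse_fasta_sequence` can also be used on FASTA text that has already been saved to disk.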

You can get all sub-sequences of length 2*window+1 whose central residue is the given amino acid ToAA.
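The windowing idea can be sketched in a few lines of plain Python. The function name and defaults below are hypothetical, and skipping occurrences that lie too close to either end of the sequence is an assumption of this sketch rather than documented propy behaviour.

```python
def get_subsequences(sequence, to_aa="S", window=3):
    """Return every stretch of 2*window+1 residues centred on to_aa.

    Occurrences too close to either end to fill a complete window
    are skipped (an assumption of this sketch).
    """
    result = []
    for i, residue in enumerate(sequence):
        has_full_window = i - window >= 0 and i + window < len(sequence)
        if residue == to_aa and has_full_window:
            result.append(sequence[i - window:i + window + 1])
    return result
```

For example, with `window=2` each returned sub-sequence has five residues, and the third residue is always `to_aa`.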

You can also get several protein sequences by providing a file containing Uniprot IDs of these proteins.

Here, the downloaded protein sequences are saved to "F:/target1.txt".

You can check whether an input sequence is a valid protein sequence.

The output is the number of amino acids in the protein sequence if it is valid; otherwise it is 0.
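A check of this kind might look like the following sketch. The function name is hypothetical, and it assumes that "valid" means every character is one of the 20 standard upper-case amino acid letters.

```python
# The 20 standard amino acids in one-letter code.
AA_LETTERS = frozenset("ARNDCEQGHILKMFPSTWYV")


def check_protein(sequence):
    """Return the sequence length if all residues are standard, else 0."""
    if sequence and all(residue in AA_LETTERS for residue in sequence):
        return len(sequence)
    return 0
```

Because an invalid sequence yields 0, the result can be used directly in a condition such as `if check_protein(seq):`.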

1.4 Obtaining the property from the AAindex database

You can get the properties of amino acids from the AAindex database by providing a property name (e.g., KRIW790103). The output is given as a dictionary.
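To make the dictionary form concrete, the sketch below parses one aaindex1-style entry into a dictionary keyed by amino acid letter. It assumes the flat-file layout in which an "I" line lists residue pairs such as A/L, followed by two rows of ten values. The entry shown is hypothetical (its name and numbers are placeholders, not real AAindex data), and the parser is not propy's implementation.

```python
# Hypothetical entry: the accession and all values are placeholders.
EXAMPLE_ENTRY = """\
H HYPO000001
D Hypothetical property for illustration only
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
     1.0     2.0     3.0     4.0     5.0     6.0     7.0     8.0     9.0    10.0
    11.0    12.0    13.0    14.0    15.0    16.0    17.0    18.0    19.0    20.0
//
"""


def parse_aaindex1_entry(entry_text):
    """Map each amino acid letter to its value (None where the file has NA)."""
    lines = entry_text.splitlines()
    for n, line in enumerate(lines):
        if line.startswith("I "):
            # Each header token "X/Y" names the residue in row 1 and row 2.
            pairs = [token.split("/") for token in line.split()[1:]]
            first_row = lines[n + 1].split()
            second_row = lines[n + 2].split()
            values = {}
            for (top, bottom), v1, v2 in zip(pairs, first_row, second_row):
                values[top] = None if v1 == "NA" else float(v1)
                values[bottom] = None if v2 == "NA" else float(v2)
            return values
    raise ValueError("no index data (I block) found in entry")
```

The resulting dictionary has 20 keys, one per amino acid, which is the shape a user-defined property would also take.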

If the user provides the directory containing the AAindex database, the program will read the given database to get the property. The AAindex database can be downloaded from ftp://ftp.genome.jp/pub/db/community/aaindex/ and consists of three files: aaindex1, aaindex2 and aaindex3.

Note that the propy package already includes the AAindex database. The GetAAIndex1 method in AAIndex gets the property from the aaindex1 database.

If the user does not provide the directory containing the AAindex database, the program will download the three databases (i.e., aaindex1, aaindex2 and aaindex3) to obtain the property. By default, the downloaded AAindex files are saved in the current directory, but you can also specify a different directory.

Here, the downloaded databases are saved to the F: drive. The GetAAIndex23 method in AAIndex gets the property from the aaindex2 and aaindex3 databases.

1.5 Calculating protein descriptors

There are two ways to calculate protein descriptors in the propy package. One is to call the corresponding functions directly; the other is to first construct a GetProDes object and then call its methods to obtain the protein descriptors. Note that the output is a dictionary whose keys and values are the descriptor names and the descriptor values, respectively, so the meaning of each descriptor is clear from its key.

Use functions:

Use GetProDes class:

Example 1: Calculating amino acid composition descriptors
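For orientation, amino acid composition is simply the fraction of each of the 20 amino acids in the sequence. The sketch below is a plain-Python illustration, not propy's code; the function name is hypothetical, and reporting percentages rounded to three decimals is an assumption.

```python
# The 20 standard amino acids in one-letter code.
AA_LETTERS = "ARNDCEQGHILKMFPSTWYV"


def amino_acid_composition(sequence):
    """Return {amino acid: percentage of the sequence} for all 20 letters."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    total = len(sequence)
    return {
        aa: round(sequence.count(aa) / total * 100, 3)
        for aa in AA_LETTERS
    }
```

The result always has 20 entries, matching the descriptor count for this feature; amino acids absent from the sequence get 0.0.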

Example 2: Calculating Geary autocorrelation descriptors
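As background, the Geary autocorrelation at lag d compares the property values of residues that are d positions apart, scaled by the overall variance of the property along the sequence. The sketch below implements the textbook formula for a single lag and a single property. The function name is hypothetical, and the property dictionary is used as given; any standardization of the property values is left to the caller.

```python
def geary_autocorrelation(sequence, prop, lag):
    """Geary autocorrelation of one property at one lag.

    C(d) = [sum_i (P_i - P_{i+d})^2 / (2 (N - d))]
           / [sum_i (P_i - mean)^2 / (N - 1)]
    """
    n = len(sequence)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(sequence)")
    values = [prop[aa] for aa in sequence]
    mean = sum(values) / n
    # Average squared difference between residues that are `lag` apart.
    numerator = sum(
        (values[i] - values[i + lag]) ** 2 for i in range(n - lag)
    ) / (2 * (n - lag))
    # Sample variance of the property along the sequence.
    denominator = sum((v - mean) ** 2 for v in values) / (n - 1)
    return numerator / denominator
```

A dictionary of descriptors can then be built with a comprehension over lags, for example `{lag: geary_autocorrelation(seq, prop, lag) for lag in range(1, 31)}`. If every residue has the same property value, the variance is zero and the function raises `ZeroDivisionError`.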

Example 3: Calculating pseudo amino acid composition descriptors

Changing the values of lamda and weight yields different PAAC values. Note that the number of PAAC descriptors depends on the choice of lamda: with lamda = 10, we obtain 20 + lamda = 30 PAAC descriptors.
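The relationship between lamda and the number of descriptors can be seen in a compact sketch of the general pseudo amino acid composition scheme: 20 composition terms plus one sequence-order term per lag from 1 to lamda. This is not propy's implementation. The function and key names are hypothetical, and using a single property (Kyte-Doolittle hydropathy, standardized over the 20 amino acids) is a simplifying assumption.

```python
from statistics import mean, pstdev

AA_LETTERS = "ARNDCEQGHILKMFPSTWYV"

# Kyte-Doolittle hydropathy, the only property used in this sketch.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop):
    """Shift and scale a property to zero mean and unit deviation."""
    values = list(prop.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in prop.items()}


def pseudo_aac(sequence, lamda=10, weight=0.05, prop=KD_HYDROPATHY):
    """Return 20 + lamda descriptors for a sequence longer than lamda."""
    n = len(sequence)
    if lamda >= n:
        raise ValueError("lamda must be smaller than the sequence length")
    h = standardize(prop)
    freq = [sequence.count(aa) / n for aa in AA_LETTERS]
    # theta[k-1]: mean squared property difference between residues k apart.
    theta = [
        sum((h[sequence[i]] - h[sequence[i + k]]) ** 2 for i in range(n - k))
        / (n - k)
        for k in range(1, lamda + 1)
    ]
    denom = sum(freq) + weight * sum(theta)
    result = {aa: f / denom for aa, f in zip(AA_LETTERS, freq)}
    for k, t in enumerate(theta, start=1):
        result[f"theta{k}"] = weight * t / denom  # hypothetical key names
    return result
```

With `lamda=10` the returned dictionary has 30 entries. Raising `weight` shifts importance from the 20 composition terms to the sequence-order terms.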

Example 4: Calculating all protein descriptors

The GetProDes class includes a built-in method that calculates all protein descriptors.

Example 5: Calculating protein descriptors based on the user-defined property

The user can provide a property as a Python dictionary, and propy will then calculate the descriptors based on this user-defined property.

Example 6: Calculating protein descriptors based on the property from AAindex

A powerful feature of propy is that it can easily calculate thousands of protein features by automatically obtaining the needed property from AAindex.

Table: List of propy-computed features for protein sequences

Feature group                   Features                                      Number of descriptors
Amino acid composition          Amino acid composition                        20
                                Dipeptide composition                         400
                                Tripeptide composition                        8000
Autocorrelation                 Normalized Moreau-Broto autocorrelation       240 (a)
                                Moran autocorrelation                         240 (a)
                                Geary autocorrelation                         240 (a)
CTD                             Composition                                   21
                                Transition                                    21
                                Distribution                                  105
Quasi-sequence order            Sequence order coupling number                60
                                Quasi-sequence order descriptors              100
Pseudo amino acid composition   Pseudo amino acid composition                 50 (b)
                                Amphiphilic pseudo amino acid composition     50 (c)

(a) The number depends on the choice of the number of amino acid properties and the choice of the maximum value of the lag. The default is to use eight types of properties and lag = 30.

(b) The number depends on the choice of the number of sets of amino acid properties and the choice of the lamda value. The default is to use the three types of properties proposed by Chou et al. and lamda = 30.

(c) The number depends on the choice of the lamda value. The default is lamda = 30.
