
Task Een

This document reviews different approaches for predicting the subcellular locations of proteins. It discusses why subcellular location is important for understanding protein function and describes the main subcellular locations in eukaryotic cells, such as the nucleus, endoplasmic reticulum, and Golgi apparatus. It reviews four main categories of prediction approaches: 1) those based on amino acid composition, 2) those using sorting signals, 3) homology-based approaches, and 4) hybrid methods. The performance and coverage of different predictors are compared, and machine learning algorithms such as neural networks, support vector machines, and Bayesian networks are discussed for predicting subcellular locations from protein features.

Uploaded by Imran Javed
Copyright: © Attribution Non-Commercial (BY-NC)

Performance of Different Approaches for Predicting the Subcellular Locations of Proteins: A Review

Muhammad Taskeen Raza
Department of Electrical Engineering, UET; Department of Electrical Engineering, LCWU, Lahore, Pakistan

Noor M. Sheikh
Department of Electrical Engineering, University of Engineering & Technology, Lahore, Pakistan

Muhammad Abuzar Fahiem
Department of Computer Science, Lahore College for Women University, Lahore, Pakistan

Ahmed M. Mehdi
Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia

Abstract - The subcellular location of a protein is closely related to its function. Knowing the subcellular localization of proteins is important in molecular cell biology, proteomics, systems biology, and drug discovery. Different predictors have been developed that predict the presence, location, and interaction of molecules using various computational techniques, including probabilistic models, machine learning, and artificial intelligence algorithms. These predictors partially cover the different aspects of exploring subcellular locations. Some of them are equally applicable to many types of organisms (human, yeast, mouse, bacteria), while others are organism-specific and focus on better accuracy of the predicted results. Similarly, some techniques cover a "few" proteins but more accurately, whereas other algorithms predict the subcellular locations of "many" proteins at the expense of prediction accuracy. This research reviews the most common and efficient techniques, grouped into four categories: 1) amino acid composition and order-based predictors, 2) sorting signal predictors, 3) homology-based predictors, and 4) hybrid methods that use several sources of information to predict localization. The work elucidates the performance and coverage of these subcellular location predictors in comparison with one another.

Keywords - Predictor; Subcellular Location; Organisms; Machine Learning; Eukaryotic Cells.

I. INTRODUCTION

Predicting the subcellular localization of a protein is crucial for determining the protein's role in the cell in general and for the drug discovery process in particular. If the subcellular localization of a protein is known, the formulation of a drug and its targets can be suggested: with reliable information about the subcellular location of a protein, drug molecules can be affixed to the protein of interest so that they reach their target. However, mutations in genes and proteins make unusual subcellular localizations of proteins quite likely, as in diseases such as cancer. In this scenario, information about the subcellular location of particular proteins becomes even more valuable. A wide range of genomic projects has generated large quantities of protein sequence, interaction, and expression data for almost every organism [1]. However, the functions of many proteins are still un-annotated, and various research efforts aim to interpret the role of these proteins in the cell in particular and in the organism in general. Genome function annotation can be approached in several ways, such as protein composition, protein structure and shape, protein interaction, and subcellular location. Subcellular location information is key to this knowledge discovery in proteomics.

Subcellular locations are the compartments in the cells of living organisms; in eukaryotic cells these compartments have defined walls and boundaries. They vary in number and function from cell to cell. Cells can be classified into two main types: prokaryotic and eukaryotic. Prokaryotic cells are simpler and lack a nucleus and defined compartment boundaries; the best characterized example is the bacterial cell. Eukaryotic cells are larger in size and volume and have well-defined, membrane-bound compartments (the subcellular locations); an example is the mammalian cell. A typical mammalian eukaryotic cell has the following subcellular locations: nucleolus, nucleus, ribosome, vesicle, rough endoplasmic reticulum, Golgi apparatus, cytoskeleton, smooth endoplasmic reticulum, lysosome, mitochondria, vacuole, cytoplasm, and centriole. These are the possible locations that proteins can occupy.

As far as subcellular location is concerned, it is not only un-annotated for many proteins; there is also the issue of multi-compartmental proteins, since some proteins keep changing their localization in the cell depending on their role and function. Experimental approaches are employed to determine subcellular locations, but this work is laborious and time-consuming, especially when huge numbers of proteins have to be examined in this fashion. Fortunately, the issue can be resolved with satisfactory efficiency by changing the domain of analysis. Machine learning, an area of statistical pattern recognition, offers a variety of methods and techniques. Probabilistic models, including neural networks, support vector machines, Bayesian networks, and Markov random fields, can be used for high-throughput inference of the subcellular localizations of such proteins. Although these techniques are relatively less accurate than experimental methods, probabilistic models are far ahead in throughput, which matters because vast numbers of proteins have been discovered only through their sequence composition, with their functions still to be annotated. As far as accuracy is concerned, experimental methods are always the better option, but for probabilistic models there is always a trade-off between throughput and accuracy. Importantly, machine learning models such as Bayesian networks (BN), also called "belief networks", are not based on the frequentist concept of event frequencies, but on the past record and experience of events, expressed as "prior probabilities". A BN updates these priors with the observed evidence to obtain posterior probabilities, which improves the accuracy of probabilistic inference. When the results of these models are statistically analyzed and compared with experimental results for predicting the subcellular locations of proteins, they are, interestingly, very close in some cases, and even better in situations where experimental methods are difficult to perform due to unavailability of resources and/or inaccessibility of samples.

In this paper, Section II describes the techniques employed, the protein features utilized, and the datasets used by researchers. Section III discusses the results of those research works, including a comparative study of the selected models and the methods of statistical analysis commonly in use.

II. MATERIALS & METHODS UTILISED

The statistical approaches used for the inference and prediction of protein subcellular locations differ in their objectives, results, and techniques. Some of them infer more accurate results, while others cover a larger number of proteins. Similarly, some authors [2, 3] employed support vector machine techniques, some [4] applied neural networks, some [5, 6] used Bayesian networks, and other researchers used other probabilistic bioinformatics models. They also used different datasets from reliable resources such as Swiss-Prot and BioGRID to obtain biological data, such as protein sequence data, protein interaction data, protein expression data, and signal peptide codes. They efficiently utilized different features and characteristics of the input query proteins, such as amino acid composition and order [3, 4], signal peptides [9, 10], and InterPro motifs [6].

A. Features

Selecting biological features determines the unique and important characteristics of the input data on the basis of which the input can be categorized into the desired output groups. In the case under consideration, the inputs are proteins and the desired output is the inference of their subcellular locations, so information about the features and properties of proteins is needed for proper inference. Table I summarizes the features used by various researchers and the accuracy they achieved. In [6], the researchers initially used only InterPro motifs as the main protein feature. Later, in [12], they refined their approach by combining more than one protein feature in a hybrid predictor design to achieve better prediction accuracy. Some of the protein characteristics are described below.

Protein composition: provides information about the constituents of a protein and their order.

Protein interaction: provides information about the corresponding interacting proteins in any compartment.

Protein expression: provides information about the protein density/quantity in the cell at a particular location.

B. Datasets

Genomics and proteomics are a live and dynamic research area for researchers throughout the world, and even some developing countries are now contributing to this cause, ultimately in the service of humanity. The more accurate our knowledge of the basic building block of life, the cell, and of its constituents, particularly proteins, the better positioned we will be to cure and control diseases and cellular disorders in human beings especially and in other living organisms generally. A few of the data resources used by analysts are the following.

UniProt/Swiss-Prot: for protein sequence data and localization data [3, 4].

BioGRID: for protein interaction data [7, 8].

Other commonly used datasets are the Hera human dataset, the yeast dataset, the LIFEdb GFP dataset, and NucProt.

Table I: Protein features utilized by different researchers.

Researchers | Features utilized | Accuracy
Z. Lu [1] | Text annotation of homologs | 81%
S. Hua [3] | Protein composition | 79%
M. S. Scott [6] | InterPro motifs | 78%
O. Emanuelsson [9] | Signal peptides | 85%
M. S. Scott [12] | Protein motifs, protein interaction, signal peptides | 94%
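The protein composition feature of Section II-A can be made concrete with a short Python sketch. It computes the 20-dimensional amino acid composition vector that SubLoc [3] uses as SVM input (Section C-1) and applies the one-versus-rest (1-v-r) decomposition described there. To stay dependency-free, a simple perceptron stands in for the SVM; the toy sequences, class labels, and all function names are hypothetical and are not taken from [3].

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes); the ordering is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(sequence: str) -> list[float]:
    """20-dimensional amino acid composition: fraction of each standard residue.

    Non-standard letters (X, B, Z, U, ...) are skipped; this is an assumption
    of the sketch, not a rule taken from SubLoc [3].
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def _score(model, x):
    """Linear decision value w.x + b."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_binary(X, y, epochs=50):
    """Train a linear classifier on labels y in {+1, -1}.

    A perceptron is used here only as a dependency-free stand-in for the SVM.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * _score((w, b), xi) <= 0:  # misclassified or on the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:  # training data separated; stop early
            break
    return w, b


def train_one_vs_rest(X, labels, epochs=50):
    """1-v-r: model i treats class i as positive and all other classes as negative."""
    return {
        cls: train_binary(X, [1 if lab == cls else -1 for lab in labels], epochs)
        for cls in sorted(set(labels))
    }


def predict_one_vs_rest(models, x):
    """Predict the class whose binary model gives the largest decision value."""
    return max(models, key=lambda cls: _score(models[cls], x))


if __name__ == "__main__":
    # Hypothetical toy data: K/R-rich "nuclear" vs A/L-rich "cytoplasmic" sequences.
    train_seqs = ["KKKKRRRR", "KKKKKKRR", "AAAALLLL", "AAAAAALL"]
    train_labels = ["nuclear", "nuclear", "cytoplasmic", "cytoplasmic"]
    models = train_one_vs_rest([composition_vector(s) for s in train_seqs], train_labels)
    print(predict_one_vs_rest(models, composition_vector("KKKRRA")))
```

On this toy set each binary perceptron separates its class after a single corrective pass, and the lysine/arginine-rich query is assigned to the hypothetical "nuclear" class. A real SubLoc-style system would train margin-maximizing SVMs (for example with SVMlight, as in [3]) on curated Swiss-Prot data.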

C. Existing Approaches

Many machine learning approaches are being applied to the inference of protein subcellular locations, differing technique-wise (neural networks, support vector machines, Bayesian networks, and others) and feature-wise (protein sequence, protein interaction, protein expression, and others). From the point of view of the input protein characteristic, all these techniques can be categorized into the following groups:

1. Amino acid composition and order-based predictors.
2. Sorting signal predictors.
3. Homology-based predictors.
4. Hybrid predictors.

A detailed description of these approaches is given below.

C-1: Amino Acid Composition and Order-Based Predictors

Amino acid composition-based predictors use the composition information of proteins. Chemically, a protein is a chain of amino acids bonded to each other in a specific order. It has been found experimentally that proteins located in a specific compartment of the cell have a characteristic composition and order that is nearly common to all proteins of that compartment [4]. Several attempts have been made to use this protein feature for the prediction of subcellular locations. Hua S. and Sun Z. [3] developed a prediction system for subcellular localization named SubLoc. They applied the support vector machine (SVM) technique and effectively utilized the amino acid composition feature of proteins. The SVM technique itself was proposed by Vapnik (1995, 1998) for pattern recognition applications, and we refer the reader there for the complete approach; SVM is now a very common tool in machine learning applications across many areas. In [3], the SVM input vector has dimension 20, one entry for the composition of each of the 20 amino acids. SubLoc predicts 3 subcellular locations (cytoplasmic, periplasmic, and extracellular) for prokaryotic cells and 4 subcellular locations (cytoplasmic, extracellular, mitochondrial, and nuclear) for eukaryotic cells, so it is a multiclass classification problem. SubLoc was therefore built mainly from two classifiers: a 3-class classifier for prokaryotic cells and a 4-class classifier for eukaryotic cells. The authors simplified the classifier design by decomposing the multiclass problem into a series of binary classifications, because binary classifiers are easy to implement. The SVMs were trained with the one-versus-rest (1-v-r) approach: the ith SVM is trained with all samples of the ith class labeled positive and all other samples labeled negative. The open-source software SVMlight was used to develop the system. The jackknife test was used to examine prediction accuracy, which reached 91.4% for prokaryotic cells and 79.4% for eukaryotic cells. These results were more accurate than the previous works of Reinhardt & Hubbard [4] and Chou & Elrod [16].

C-2: Sorting Signal Predictors

Protein sorting is the process by which a cell accurately transports a protein to its destined subcellular location. Correct positioning of proteins in the cell is crucial and is linked with the dynamic organization of the cell, and incorrect sorting can cause several diseases, such as cancers [15]. Sorting is carried out on the basis of information contained in the protein itself, called the protein sorting signal or signal peptide. A signal peptide is usually a chain of 3-60 amino acids. Emanuelsson O. and co-workers [9] proposed a tool based on neural networks for predicting the subcellular locations of proteins, named TargetP. Its other characteristics are as follows. It predicts the mitochondrial, chloroplast, secretory pathway, and "other" locations in the cell. Prediction accuracy is 85% for plants (4 subcellular locations) and 90% for non-plants (3 subcellular locations). The dataset was taken from SWISS-PROT. TargetP was developed using two layers of neural networks. The first layer consists of a separate subnetwork for each input signal peptide type, i.e., each presequence: mitochondrial targeting peptides (mTP), chloroplast transit peptides (cTP), and secretory pathway (SP) signal peptides. In the non-plant version, this layer contains only the mTP and SP subnetworks, because only these two presequence types are predicted. The output of the first layer is fed to a second, integrating network layer, which outputs a score for each location for every query protein; the location with the highest score is the prediction. All the networks in the model were of the feed-forward type, with zero or one layer of hidden neurons, and all were trained using error backpropagation.

C-3: Homology-Based Predictors

It is usually believed that proteins with similar sequences perform the same function [3]. Homology-based predictors therefore perform a similarity search against sequence data with known subcellular locations, extract text from the homologs found, and then apply a classifier to these text features [1]. This approach has shown considerably better performance than the earlier amino acid composition and signal peptide approaches. Lu Z. and colleagues [1] employed this approach as part of ongoing efforts to annotate protein subcellular locations, using the database text annotations of homologs. Key points of their strategy are as follows. They built five classifiers to predict subcellular locations for five organism groups: plants, animals, fungi, Gram-positive bacteria, and Gram-negative bacteria. Accuracy was 81% for the fungi classifier and 93% for the other four classifiers. Another strength of their work is that location coverage, taxonomic coverage, and sequence coverage were much better than in previous works [2, 3]. Their tool, Proteome Analyst (PA), also offers open access on the web and lets users easily build their own customized classifiers. The tool is available at http://www.cs.ualberta.ca/~bioinfo/PA. Specifically, the query sequence is compared with SWISS-PROT database entries whose subcellular locations are known, to obtain the homologs of the query sequence, and the results are encoded as Boolean values for the query protein. Using the most similar homologs, the classifier then predicts the subcellular location of the query protein. The classifier must be trained before it can be used for inference; this training is done with labeled training data.
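The core homology idea in C-3, transferring location information from annotated homologs and trusting the most similar ones most, can be sketched in Python as below. This is a deliberately simplified stand-in, not Proteome Analyst's method: PA trains a classifier on features derived from the homologs' annotations [1], whereas this sketch only takes a similarity-weighted vote. The `(identity, location)` input format, the identity threshold, and `k` are assumptions, and the hits are assumed to come from an earlier similarity search such as BLAST.

```python
from collections import defaultdict


def predict_from_homologs(hits, k=3, min_identity=0.3):
    """Similarity-weighted vote over the locations of the top-k annotated homologs.

    hits: (identity, location) pairs for database entries with known locations,
          where identity in [0, 1] is sequence identity to the query.
    k, min_identity: hypothetical defaults chosen for illustration only.
    Returns None when no hit passes the identity threshold.
    """
    usable = sorted(
        (hit for hit in hits if hit[0] >= min_identity),
        key=lambda hit: hit[0],
        reverse=True,
    )[:k]
    if not usable:
        return None  # no sufficiently similar homolog: no prediction
    votes = defaultdict(float)
    for identity, location in usable:
        votes[location] += identity  # more similar homologs carry more weight
    return max(votes, key=votes.get)
```

For hits `[(0.9, "nucleus"), (0.8, "cytoplasm"), (0.75, "cytoplasm")]`, two moderately close cytoplasmic homologs (total weight 1.55) outvote a single closer nuclear one (0.9); with `k=1`, only the closest hit counts and the prediction becomes "nucleus". Returning `None` mirrors the coverage limitation of homology-based predictors: proteins without annotated homologs receive no prediction.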
C-4: Hybrid Predictors

Biological data available for annotating protein subcellular locations are incomplete and, in some cases, inaccurate, because some data, such as protein interaction data, are generated only from the results of probabilistic models without any experimental verification. Missing data for some proteins in the datasets is another problem when predicting protein locations. All three categories discussed above therefore suffer from these input-data problems. The logical solution is to integrate all these data, whether incomplete, inaccurate, or both, and even when they are obtained from diverse resources; integration allows the predictor to compensate for these deficiencies. Hybrid predictors therefore take different types of data as input at once, such as protein composition, protein interaction, signal peptide, and protein expression data, and may draw each data type from more than one resource to compensate for gaps in data availability. A Bayesian network is well suited to building hybrid models and integrating large amounts of data, since it can both integrate data and compensate for missing data. Bayesian networks, also called belief networks, are directed acyclic graphs (DAGs). Their nodes are random variables, which may be Boolean or continuous, hidden or observed, and their arcs represent the conditional dependencies of nodes on their parent nodes.

Earlier, Drawid A. and Gerstein M. [11] proposed a hybrid predictor based on Bayesian networks that integrated 30 diverse protein features; training and testing later revealed that nearly half of those 30 features were redundant. Their predictor achieved an accuracy of 75%. A key characteristic of the model was the integration of expression data with protein sequence data, which improved the prediction accuracy of protein subcellular locations. Later, Scott M. S. and co-workers [6] applied this approach to refining protein subcellular localization, again using the Bayesian network technique. Their prediction system, named PSL2, is able to predict the localization of all yeast proteins into 9 compartments of the cell: the endoplasmic reticulum, Golgi apparatus, cytosol, nucleus, peroxisome, plasma membrane, lysosome, mitochondrion, and extracellular space. The model was implemented at two levels. The first level consists of three independent modules: a motif module, a targeting module, and an interaction module. Each module can independently predict the subcellular location of input proteins from its corresponding protein feature: the motif module is based on InterPro motifs, the targeting module uses protein signal peptides, and the interaction module uses protein-protein interaction information from the CORE dataset. At the second level, a naive Bayes network combines the predictions of all three modules, refining the prediction accuracy. The results were verified using 10-fold cross-validation. An important characteristic of this hybrid model is that the basic modules play complementary roles in overall performance: the interaction module is the most accurate of the three but has the lowest coverage, whereas the targeting module has the highest coverage (up to 100%) but low accuracy. The benefit of integration shows in the result: the combined predictor provided the highest coverage and accuracy for yeast proteins of all previous work on this organism. More recently, Scott M. S. and co-workers [12] improved their previous PSL2 predictor into PSLT, which is also based on a Bayesian network model that integrates different protein data. Its salient features are that it is applicable to human proteins and can predict all subcellular localizations of proteins in the cell, so its location coverage is 100%; it also annotates multi-compartmental proteins. For nine subcellular locations, it achieved an accuracy of 78% under 10-fold cross-validation while covering 74% of Homo sapiens proteins. When applied to related species, it achieved an accuracy of more than 80%.
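The second level of PSL2, where a naive Bayes network combines the outputs of the motif, targeting, and interaction modules, can be sketched in Python as follows. The sketch also shows one simple way a Bayesian combiner copes with missing data: a module that has no prediction for a protein contributes no evidence. The module names, the add-one (Laplace) smoothing, and the use of discrete "predicted location" inputs are assumptions of this sketch rather than details reported for PSL2 [6].

```python
from collections import Counter, defaultdict


def train_combiner(examples, locations, alpha=1.0):
    """Learn priors and P(module prediction | true location) with add-alpha smoothing.

    examples: list of (module_outputs, true_location) pairs, where module_outputs
              maps a module name to its predicted location, or to None when the
              module has no data for that protein.
    """
    k = len(locations)
    prior_counts = Counter(loc for _, loc in examples)
    priors = {
        loc: (prior_counts[loc] + alpha) / (len(examples) + alpha * k)
        for loc in locations
    }
    cond = defaultdict(lambda: defaultdict(Counter))  # cond[module][true_loc][predicted]
    for outputs, loc in examples:
        for module, predicted in outputs.items():
            if predicted is not None:
                cond[module][loc][predicted] += 1

    def likelihood(module, predicted, loc):
        counts = cond[module][loc]
        return (counts[predicted] + alpha) / (sum(counts.values()) + alpha * k)

    return priors, likelihood


def combine(priors, likelihood, outputs):
    """Posterior over locations given module outputs (naive Bayes assumption)."""
    scores = {}
    for loc, p in priors.items():
        for module, predicted in outputs.items():
            if predicted is not None:  # a missing module contributes no evidence
                p *= likelihood(module, predicted, loc)
        scores[loc] = p
    total = sum(scores.values())
    return {loc: s / total for loc, s in scores.items()}
```

Because each module enters only as a multiplicative likelihood term, a highly accurate but low-coverage module (like the interaction module) sharpens the posterior when it is present, while a high-coverage module still supplies evidence for proteins the others cannot score.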

III. COMPARATIVE STUDY

A. Statistical Analysis

Various statistical methods have been employed to verify the results of the proposed predictors, to measure their accuracy, and to compare them with other predictors. Some of them are as follows.

10-fold cross-validation test: In this method, the input data is randomly partitioned into 10 non-overlapping subsets. Nine of the subsets are used to train the predictor, and the remaining subset is used for inference. This procedure is repeated so that each of the 10 subsets is used once for inference. It was used in [6, 12].
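A minimal Python sketch of the 10-fold procedure above, assuming generic `train` and `predict` callables (hypothetical placeholders for any of the predictors discussed in Section II) and a fixed random seed for reproducibility:

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly partition range(n) into k non-overlapping folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train, predict, k=10, seed=0):
    """k-fold cross-validation accuracy: each fold is held out once for testing.

    train(samples, labels) -> model and predict(model, sample) -> label are
    caller-supplied placeholders.
    """
    correct = 0
    for test_fold in k_fold_indices(len(samples), k, seed):
        held_out = set(test_fold)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        model = train([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict(model, samples[i]) == labels[i] for i in test_fold)
    return correct / len(samples)
```

Because every protein lands in exactly one test fold, each is predicted exactly once by a model that never saw it during training, and the reported accuracy is the fraction of all proteins predicted correctly.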

Self-consistency test: In this analysis, the same dataset is used first for training the predictor and then for inference [12].
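Because the self-consistency test evaluates a predictor on its own training data, it tends to be optimistic. The Python sketch below illustrates this with a hypothetical 1-nearest-neighbour "memorizer" on one-dimensional toy samples: on any training set with distinct samples it scores 100%, however arbitrary the labels are. All names here are illustrative.

```python
def self_consistency_accuracy(samples, labels, train, predict):
    """Train and evaluate on the same data (resubstitution accuracy)."""
    model = train(samples, labels)
    return sum(predict(model, x) == y for x, y in zip(samples, labels)) / len(samples)


def train_memorizer(samples, labels):
    """Hypothetical 1-nearest-neighbour 'predictor' that simply stores the training set."""
    return list(zip(samples, labels))


def predict_memorizer(model, x):
    """Return the label of the stored sample closest to x (absolute difference)."""
    return min(model, key=lambda pair: abs(pair[0] - x))[1]
```

With alternating labels such as `["a", "b", "a", "b"]` on samples `[0, 1, 2, 3]`, the memorizer is 100% self-consistent, yet it predicts every one of these samples wrongly when each is held out, which is why held-out tests such as cross-validation or the jackknife are the more informative measures.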

Jackknife test: The basic idea behind this variance estimator is that the test statistic is recomputed after leaving one or more observations out of the sample. By doing so, a variance estimate is calculated [3, 4].
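The jackknife variance estimate described above, and the leave-one-out form in which the jackknife test is commonly applied to localization predictors such as [3] (each protein is held out once, the predictor is trained on the remaining proteins, and the held-out protein is predicted), can be sketched in Python as follows. Function names are hypothetical; `statistic`, `train`, and `predict` are caller-supplied.

```python
def jackknife_variance(data, statistic):
    """Jackknife variance of statistic(data) via leave-one-out recomputation.

    Var = (n - 1) / n * sum_i (theta_(i) - theta_bar) ** 2, where theta_(i) is
    the statistic with observation i left out and theta_bar is their mean.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = sum(loo) / n
    return (n - 1) / n * sum((t - theta_bar) ** 2 for t in loo)


def leave_one_out_accuracy(samples, labels, train, predict):
    """Jackknife test as used for localization predictors: hold out each protein once."""
    correct = 0
    for i in range(len(samples)):
        model = train(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)
```

As a sanity check, for the sample mean the jackknife variance reduces to the familiar s²/n; for `[1.0, 2.0, 3.0, 4.0]` this is (5/3)/4 = 5/12.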


B. Comparisons

Genome annotation to gain knowledge of protein function is a live research topic, and many researchers and groups are contributing to this cause. Some of the models are presented in this paper to give an overall view of the different aspects of the various approaches. Table II shows such a comparison between the different categories of predictors and the techniques used. Comparing these models is neither simple nor entirely fair, because some authors [4, 7] are pioneers in this area, while others [1, 6] proposed innovative approaches later. Some of them [6, 10] focus on protein coverage, while others focus on prediction accuracy [1, 9]. Still, we can take a brief but comprehensive look at the performance of the proposed models. The comparison shows that, due to the nature of the available data, the most efficient approach is that of hybrid predictors. Hybrid predictors not only achieve reasonable accuracy but also have significant location and taxonomic coverage, especially for inference on human proteins, for which relatively little data is available compared with other species.

In this evaluation, the parameters considered are the technique used, coverage, accuracy, and the statistical analysis method employed.

TABLE II: Comparison of Different Subcellular Location Predictors.

Category | Author | Technique | Accuracy | Location coverage | Taxonomic coverage | Statistical test
Amino acid composition predictors | Hua S. and Sun Z. [3] | Support vector machine | 91.4% / 79.4% | 3 / 4 | Prokaryotic / eukaryotic | Jackknife test
Amino acid composition predictors | Reinhardt A. and Hubbard T. [4] | Neural network | 81.0% / 66.0% | 3 / 4 | Prokaryotic / eukaryotic | Jackknife test
Sorting signal predictors | Emanuelsson O. et al. [9] | Neural network | 85.0% / 90.0% | 4 / 3 | Plants / non-plants | Redundancy-reduced test
Sorting signal predictors | Bendtsen J. D. et al. [10] | Neural network and hidden Markov model | 75% | 3 | Eukaryotes, Gram-negative and Gram-positive bacteria | 10-fold cross-validation
Homology-based predictors | Marcotte E. M. et al. [13] | BLAST search | 50% | 2 | Yeast and worm | Self-consistency test
Homology-based predictors | Lu Z. et al. [1] | NB classifier | 81.0% / 93.0% | 9 / 9 | Fungi / animals, plants, Gram-negative and Gram-positive bacteria | 10-fold cross-validation
Hybrid methods | Drawid A. and Gerstein M. [11] | Bayesian network | 75.0% | 5 | Yeast | 10-fold cross-validation; self-consistency test
Hybrid methods | Scott M. S. et al. [6] | Bayesian network | 78% | 9 | Human | 10-fold cross-validation

IV. SUMMARY

In this review paper, we discussed a very active and important research area in bioinformatics: the inference of protein subcellular localization. First, the motivation and background of the topic were discussed. Then different aspects of research on modeling subcellular location predictors were covered, including the protein features commonly utilized, the dataset resources used, and machine learning techniques such as neural networks, support vector machines, and Bayesian networks, reviewed in the context of this inference. Finally, selected results were presented in a comparative table, which shows that the predictors vary significantly in coverage and accuracy, because each focuses on different aspects of performance in its prediction model.

REFERENCES

[1] Z. Lu, D. Szafron, R. Greiner, P. Lu, D. S. Wishart, B. Poulin, J. Anvik, C. Macdonell and R. Eisner, "Predicting subcellular localization of proteins using machine-learned classifiers," Bioinformatics, Vol. 20, No. 4, pp. 547-556, Jan. 22, 2004.
[2] Kuo-Chen Chou and Yu-Dong Cai, "Using functional domain composition and support vector machines for prediction of protein subcellular location," The Journal of Biological Chemistry, Vol. 277, No. 48, pp. 45765-45769, Aug. 16, 2002.
[3] Sujun Hua and Zhirong Sun, "Support vector machine approach for protein subcellular localization prediction," Bioinformatics, Vol. 17, No. 8, pp. 721-728, Apr. 24, 2001.
[4] A. Reinhardt and T. Hubbard, "Using neural networks for prediction of the subcellular location of proteins," Nucleic Acids Research, Vol. 26, No. 9, Mar. 9, 1998.
[5] Jennifer L. Gardy, Cory Spencer, Ke Wang, Martin Ester, Gabor E. Tusnady, Istvan Simon, Sujun Hua, Katalin deFays, Christophe Lambert, Kenta Nakai and Fiona S. L. Brinkman, "PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria," Nucleic Acids Research, Vol. 31, No. 13, pp. 3613-3617, 2003.
[6] Michelle S. Scott, David Y. Thomas and Michael T. Hallett, "Predicting subcellular localization via protein motif co-occurrence," Genome Research, Vol. 14, pp. 1957-1966, 2004.
[7] Zheng Yuan, "Prediction of protein subcellular locations using Markov chain models," FEBS Letters, Vol. 451, No. 1, pp. 23-26, May 14, 1999.
[8] Ahmed M. Mehdi, Muhammad Shoaib B. Sehgal, Bostjan Kobe, Timothy L. Bailey and Mikael Boden, "A probabilistic model of nuclear import of proteins," Bioinformatics, Vol. 27, No. 9, pp. 1239-1246, Feb. 28, 2011.
[9] Olof Emanuelsson, Henrik Nielsen, Søren Brunak and Gunnar von Heijne, "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence," Journal of Molecular Biology, Vol. 300, No. 4, pp. 1005-1016, Jul. 21, 2000.
[10] Jannick Dyrløv Bendtsen, Henrik Nielsen, Gunnar von Heijne and Søren Brunak, "Improved prediction of signal peptides: SignalP 3.0," Journal of Molecular Biology, Vol. 340, No. 4, pp. 783-795, Jul. 16, 2004.
[11] Amar Drawid and Mark Gerstein, "A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome," Journal of Molecular Biology, Vol. 301, No. 4, pp. 1059-1075, Aug. 25, 2000.
[12] Michelle S. Scott, Sara J. Calafell, David Y. Thomas and Michael T. Hallett, "Refining protein subcellular localization," PLoS Computational Biology, Vol. 1, No. 6, Nov. 2005.
[13] Edward M. Marcotte, Ioannis Xenarios, Alexander M. van der Bliek and David Eisenberg, "Localizing proteins in the cell from their phylogenetic profiles," PNAS, Vol. 97, No. 22, pp. 12115-12120, 2000.
[14] Nancy Y. Yu, James R. Wagner, Matthew R. Laird, Gabor Melli, Sebastien Rey, Raymond Lo, Phuong Dao, S. Cenk Sahinalp, Martin Ester, Leonard J. Foster and Fiona S. L. Brinkman, "PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes," Bioinformatics, Vol. 26, No. 13, pp. 1608-1615, 2010.
[15] http://en.wikipedia.org/wiki/Protein_sorting
[16] Kuo-Chen Chou and David W. Elrod, "Protein subcellular location prediction," PEDS, Vol. 12, No. 2, pp. 107-118, 1999.
