This document reviews different approaches for predicting the subcellular locations of proteins. It discusses how subcellular location is important for understanding protein function. It describes the main subcellular locations in eukaryotic cells like the nucleus, endoplasmic reticulum, Golgi apparatus. It reviews four main categories of prediction approaches: 1) those based on amino acid composition, 2) those using sorting signals, 3) homology-based approaches, and 4) hybrid methods. The performance and coverage of different predictors is compared. Machine learning algorithms like neural networks, support vector machines, and Bayesian networks are discussed for predicting subcellular locations from protein features.

Performance of Different Approaches for Predicting the Subcellular Locations of Proteins: A Review

Muhammad Taskeen Raza

Department of Electrical Engineering, VET Department of Electrical Engineering, LCWU Lahore, Pakistan [email protected]

Noor M. Sheikh
Department of Electrical Engineering, University of Engineering & Technology, Lahore, Pakistan [email protected]

Muhammad Abuzar Fahiem

Department of Computer Science Lahore College for Women University Lahore, Pakistan [email protected]

Ahmed M. Mehdi
Institute for Molecular Bioscience The University of Queensland Brisbane, Australia [email protected]

Abstract- The subcellular location of a protein is closely related to its function. Knowing the subcellular localization of proteins is important in molecular cell biology, proteomics, systems biology and drug discovery. Different predictors have been developed that predict the presence, location and interaction of molecules using various computational techniques, including probabilistic models and Machine Learning and artificial intelligence algorithms. These predictors partially cover the different aspects of the exploration of subcellular locations. Some of them are equally applicable to many types of organisms (human, yeast, mouse, bacteria), while others are specific and focus on better accuracy of the predicted results. Similarly, some techniques cover only a "few" proteins but more accurately, whereas other algorithms predict the subcellular locations of "many" proteins at the expense of prediction accuracy. This paper reviews the most common and efficient techniques, grouped into four categories: 1) amino acid composition and order-based predictors, 2) sorting signal predictors, 3) homology-based predictors and 4) hybrid methods that use several sources of information to predict localization. The work elucidates the performance and coverage comparisons among the subcellular location predictors.

Keywords- Predictor; Subcellular Location; Organisms; Machine Learning; Eukaryotic Cells.

I. INTRODUCTION

Predicting the subcellular localization of a protein is crucial for determining the protein's role in the cell in general and for the drug discovery process in particular. If the subcellular localization of a protein in the cell is known, the formulation of a drug and its targets can be suggested. Once reliable information about the subcellular location of a protein is available, drug molecules can be attached to the protein of interest to reach their target. However, due to mutations in genes and proteins, there is a high chance of unusual subcellular localization of proteins, for example in diseases such as cancer. In this scenario, information about the subcellular location of particular proteins is even more valuable.

A wide range of genomic projects has generated large quantities of protein sequence, interaction and expression data for almost every organism [1]. However, the functions of many proteins are still un-annotated. Different research efforts are being carried out to interpret the role of these proteins in the cell in particular and in the organism in general. There are a number of sources from which genome function annotation can be inferred, such as protein composition, protein structure and shape, protein interaction and subcellular location. Subcellular location information is key to this knowledge discovery in proteomics.

Subcellular locations are the compartments in the cells of living organisms, which in eukaryotic cells have defined walls and boundaries. These compartments vary in number and in their functions. Cells can be classified into two main types, prokaryotic cells and eukaryotic cells. Prokaryotic cells are simpler, lacking a nucleus and defined internal boundaries; the best characterized example is the bacterial cell. Eukaryotic cells are greater in size and volume and have membrane-bound, well-defined cell compartments, the subcellular locations; an example is the mammalian cell. A typical mammalian eukaryotic cell has the following subcellular locations: nucleolus, nucleus, ribosome, vesicle, rough endoplasmic reticulum, Golgi apparatus, cytoskeleton, smooth endoplasmic reticulum, mitochondria, vacuole, cytoplasm, lysosome and centriole. These are the possible accommodations of the proteins.

As far as subcellular location is concerned, not only is it un-annotated for many proteins, but there is also the issue of multi-compartmental proteins. Some proteins keep changing their localization in the cell depending on their role and function as defined by nature. Experimental approaches and methods are being employed to determine subcellular locations. This work is laborious and time consuming, especially when huge numbers of proteins have to be examined in this fashion. Fortunately, the issue can be resolved with satisfactory efficiency when we change the domain of analysis. Machine Learning is the area of statistical pattern recognition, which offers a variety of methods and techniques. Probabilistic models, including Neural Networks, Support Vector Machines, Bayesian Networks and Markov Random Fields, can be used for the inference of the subcellular localizations of such proteins. When throughput is compared, probabilistic models are far ahead of experimental techniques, because the billions of proteins that have been discovered are known only in their composition, not in their functions, and still have to be annotated. As far as accuracy is concerned, experimental methods are the better option, so there is always a trade-off between the throughput and the accuracy of probabilistic models.

Importantly, machine learning models such as Bayesian Networks (BN), which are also called "Belief Networks", are not based on the frequency concept of occurring events, but on the past record and experience of the events, called "prior probabilities". A BN also takes the posterior probabilities into account, which improves the prediction accuracy of probabilistic inference. When we perform a statistical analysis of the results of these models and compare them with experimental results for predicting the subcellular locations of proteins, we obtain very close results in some cases. The models are even preferable in situations where experimental methods are difficult to perform due to unavailability of resources and/or inaccessibility of the samples.

In this paper, Section II covers the techniques employed, the protein features utilized and the datasets used by researchers. Section III discusses the results of those research works, including a comparative study of the selected models and the methods of statistical analysis commonly in use.

II. MATERIALS & METHODS UTILISED

The statistical approaches used for the inference and prediction of the subcellular locations of proteins differ in their objectives, results and techniques. Some of them infer more accurate results, while others cover a larger number of proteins. Some authors [2, 3] employed Support Vector Machine techniques, some [4] applied Neural Networks, some [5, 6] used Bayesian Networks, and some researchers used other probabilistic bioinformatics models.

They also use different datasets from reliable resources such as Swiss-Prot and BioGRID to obtain biological data such as protein sequence data, protein interaction data, protein expression data and signal peptide codes. For the input query proteins they utilize different features and characteristics, such as amino acid composition and order [3,4], signal peptides [9,10] and InterPro motifs [6].

A. Features

Selecting biological features determines the unique and important characteristics of the input data, on the basis of which the input information can be categorized into the desired output groups. In the case under consideration, the inputs are the proteins and the desired output is the inference of the subcellular locations of those proteins. There should therefore be information about the features and properties of proteins that leads to proper inference. Table I provides concise information about the features used by various researchers and their relation to the accuracy of the results. In [6] the researchers utilized only InterPro motifs as the main protein feature. Later on, in [12], they refined their approach by using more than one protein feature in a hybrid design of predictors to achieve better prediction accuracy. Some of the protein characteristics are described below.

Protein composition: provides information about the constituents of the proteins and their order.

Protein interaction: provides information about the corresponding interacting proteins in any compartment.

Protein expression: provides information about the protein density/quantity in the cell at a particular location.

B. Datasets

Genomic projects and proteomics research are a live and dynamic area for researchers throughout the world; even some developing countries are coming forward in this cause, ultimately for the service of humanity. The more accurate our knowledge about the basic building block of life, the cell, and its constituents, most significantly the proteins, the better our position to cure and control diseases and disorders in the cells of human beings in particular and other living organisms in general. A few of the data resources in use by analysts are the following.

UniProt/Swiss-Prot: for protein sequence data and for localization data [3,4]

BioGRID: for protein interaction data [7,8]

Other commonly used datasets are the Hera human dataset, the yeast dataset, the LifeDB GFP dataset and NucProt.

Table I: Protein features utilized by different Researchers.

Researchers | Ref. | Features utilized | Accuracy
Z. Lu | [1] | Text annotation of homologs | 81%
Sujun | [3] | Protein composition | 79%
Scott | [6] | InterPro motifs | 78%
Olof | [9] | Signal peptides | 85%
Scott | [12] | Protein motif, protein interaction, signal peptides | 94%
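As an illustration of the protein composition feature listed in Table I, the following is a minimal sketch of how a sequence might be turned into a 20-dimensional amino acid composition vector. The function name `amino_acid_composition`, the residue ordering and the toy sequence are assumptions made for this example; they are not taken from any of the reviewed predictors.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
# The fixed order defines the layout of the feature vector (an assumption of this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the 20-dimensional composition vector of a protein sequence.

    Each component is the fraction of residues equal to the corresponding
    amino acid. Non-standard symbols (for example 'X') are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not taken from any dataset mentioned in the paper.
vector = amino_acid_composition("MKKLLAAG")
```

A vector of this kind only captures composition; the order information mentioned alongside it would need additional features (for example residue-pair frequencies), which this sketch does not attempt.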

C. Existing Approaches

Many Machine Learning approaches are being applied to the inference of the subcellular locations of proteins. They differ by technique (neural networks, support vector machines, Bayesian networks and others) and by feature (protein sequence, protein interaction, protein expression and others). From the point of view of the input protein characteristic, all the techniques used for predicting protein locations in the cell can be categorized into the following groups:

1. Amino acid composition and order-based predictors.
2. Sorting signal predictors.
3. Homology-based predictors.
4. Hybrid predictors.

A detailed description of these approaches is given in the following sections.

C-1: Amino acid composition and Order based Predictors

Amino acid composition based predictors use the composition information of the proteins. Chemically, a protein is composed of amino acids, which are bonded to each other in a specific pattern and order for a given protein. It has been found experimentally that proteins located in a specific compartment of the cell have a characteristic composition and order that is nearly common to all of the proteins of that compartment [4]. Several attempts have been made to use this protein feature for the prediction of subcellular locations.

Hua S. & Sun Z. [3] developed a prediction system for subcellular localization named SubLoc. They applied the Support Vector Machine (SVM) technique and utilized the amino acid composition feature of the proteins effectively. The SVM technique itself was proposed by Vapnik (1995, 1998) for pattern recognition applications; we refer to that work for a complete description of the technique. The SVM is now a very common tool for machine learning applications in various areas. In [3] the input vector of the SVM has dimension 20, one component for the composition of each of the 20 amino acids. The system predicts 3 subcellular locations (cytoplasmic, periplasmic and extracellular) for prokaryotic cells and 4 subcellular locations (cytoplasmic, extracellular, mitochondrial and nuclear) for eukaryotic cells, so it addresses a multiclass classification problem. SubLoc consists mainly of two classifiers, one 3-class classifier for prokaryotic cells and one 4-class classifier for eukaryotic cells. The authors simplified the classifier design by decomposing the multiclass classification into a series of binary classifications, because a binary classifier is easy to implement. They trained the SVM model with the 1-v-r (one-versus-rest) approach. In this training procedure the i-th SVM is trained with all the samples of the i-th class carrying positive labels and all other samples carrying negative labels. The system was developed with the SVMlight software. To examine the prediction accuracy, the jackknife test was used. Prediction accuracy reached up to 91.4% for prokaryotic cells and up to 79.4% for eukaryotic cells. These results were more accurate than the previous works of Reinhardt & Hubbard [4] and Chou & Elrod [16].

C-2: Sorting Signal Predictors

Protein sorting is the process by which the cell accurately transports a protein to its desired subcellular location. Correct positioning of the protein in the cell is crucial and is linked with the dynamic organization of the cell. Incorrect sorting can cause several diseases, such as cancers [15]. The process is carried out on the basis of information contained in the protein itself, called the protein sorting signal or signal peptide. A signal peptide is usually an amino acid chain 3-60 residues long.

Emanuelsson O. and co-workers [9] proposed a tool for the prediction of the subcellular locations of proteins named TargetP. This tool is based on the Neural Network technique. It predicts the mitochondrion, the chloroplast, the secretory pathway and "other" locations in the cell. Prediction accuracy is 85% for plants over 4 subcellular locations and 90% for non-plants over three subcellular locations. The dataset used was taken from SWISS-PROT. TargetP was developed using two layers of neural networks. The first layer consists of a separate sub-network for each type of input presequence, i.e. mitochondrial targeting peptides (mTP), chloroplast transit peptides (cTP) and secretory pathway (SP) signal peptides. In the non-plant version this layer contains only the two presequence networks for mTP and SP, because no chloroplast prediction is made. The output of the first layer is applied to the 2nd layer, an integrating network, which outputs a score for each location for the input query protein; the predicted subcellular location is the one with the highest score. All the networks used in the model are of the feed-forward type with zero or one layer of hidden neurons, and all of them were trained using the error back-propagation method.

C-3: Homology-based Predictors

It is usually believed that proteins with similar sequences perform the same function [3]. Homology-based predictors are therefore based on the principle of performing a similarity search on sequence data with known subcellular locations, extracting the text from the homologs and then applying a classifier to the feature texts [1]. This approach has shown considerable performance gains over the previous approaches based on amino acid composition and signal peptides.

Lu Z. and colleagues [1] employed this approach as part of the ongoing efforts to annotate protein subcellular locations. They used the database text annotation of homologs. They modeled five classifiers for predicting subcellular locations in five groups of organisms: plants, animals, fungi, Gram-positive bacteria and Gram-negative bacteria. The accuracy of these classifiers is 81% for fungi and 93% for the other four. Another strength of their work is that the location coverage, taxonomic coverage and sequence coverage were much better than in the previous works [2, 3]. The tool they developed, named Proteome Analyst (PA), also has the unique feature of providing open access on the web and allowing users to build their own customized classifiers easily. The tool is available at http://www.cs.ualberta.ca/~bioinfo/PA. In their approach the query sequence is compared with SWISS-PROT database entries that have known subcellular locations, and the homologs of the query sequence are obtained on the basis of the Boolean-valued results for the query protein. Using the most similar homologs, the classifier then predicts the subcellular location of the query protein. The classifier must be trained before it can be used for inference; this training is done with labeled training data.
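The homology principle described above can be illustrated with a deliberately simplified sketch. It assumes that sequence similarity is approximated by the Jaccard overlap of 3-mers and that the location of the single most similar annotated sequence is transferred to the query. This is not how Proteome Analyst works (which, per the text, searches SWISS-PROT and classifies annotation text); the function names, the similarity threshold and the toy sequences are all hypothetical.

```python
def kmers(sequence: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_by_homology(query, annotated, k=3, min_similarity=0.3):
    """Transfer the location of the most similar annotated sequence.

    `annotated` maps sequence -> known location. Returns None when no
    annotated sequence reaches `min_similarity`, i.e. no usable homolog.
    """
    query_kmers = kmers(query, k)
    best_location, best_score = None, min_similarity
    for sequence, location in annotated.items():
        score = jaccard(query_kmers, kmers(sequence, k))
        if score >= best_score:
            best_location, best_score = location, score
    return best_location


# Hypothetical toy "database" of sequences with known locations.
annotated = {
    "MKTLLVAGAVLA": "extracellular",
    "MPKKKRKVEDPS": "nucleus",
}
prediction = predict_by_homology("MKTLLVAGAVLS", annotated)
no_homolog = predict_by_homology("GGGGGGGG", annotated)
```

The `None` result for a query without a sufficiently similar entry reflects the coverage limitation of homology-based methods: when no homolog with a known location exists, no prediction can be made.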
C-4: Hybrid Predictors

The biological data available for the annotation of the subcellular locations of proteins is incomplete and, in some cases, inaccurate. Some of the data, such as protein interaction data, is generated only from the results of probabilistic models without any experimental verification. Missing data for some proteins in the datasets is also a problem when predicting the locations of proteins in cells. All three categories discussed earlier therefore suffer from problems in the input data used for inference. The logical solution is to integrate these data, which are incomplete or inaccurate or both and which are obtained from diverse resources. By integrating the data, the deficiencies can be compensated for when predicting the results. Hybrid predictors are those that use different types of input data at the same time, such as protein composition data, protein interaction data, protein signal peptides and protein expression data. These predictors may also take each data type from more than one resource to compensate for deficiencies in data availability.

A Bayesian network is well suited to building hybrid models and integrating large amounts of data. The Bayesian network (BN) model offers data integration as well as the ability to compensate for missing data. Bayesian networks, also called Belief Networks, are Directed Acyclic Graphs (DAG). The nodes of a Bayesian network are random variables, which may be Boolean or continuous, hidden or known. The arcs joining the nodes show the conditional dependencies of nodes on their parent nodes.

Drawid A. & Gerstein M. [11] proposed an early hybrid predictor based on Bayesian networks. They integrated 30 diverse features of proteins. Testing and training later revealed that nearly half of those 30 features were redundant. The accuracy of their predictor was 75%. A key characteristic of the model was the integration of expression data with protein sequence data, which improved the prediction accuracy of the subcellular locations of the proteins.

Later, Scott M. S. and co-workers [6] utilized this approach in refining protein subcellular locations. Their approach was also based on the Bayesian network technique. The prediction system, named PSL2, is able to predict the localization of all yeast proteins into 9 compartments of the cell: the endoplasmic reticulum, Golgi apparatus, cytosol, nucleus, peroxisome, plasma membrane, lysosome, mitochondrion and extracellular space. The model was implemented at two levels. The first level has three independent modules, namely the motif module, the targeting module and the interaction module. Each of these modules can independently predict the subcellular location of the input proteins from its corresponding protein feature. The motif module is based on the InterPro motifs in the proteins. The targeting module utilizes the protein signal peptide feature, while the interaction module uses protein-protein interaction information from the CORE dataset. At the second level, a naive Bayes network combines the predictions of all three basic modules, thus refining the prediction accuracy. The results were verified using the 10-fold cross-validation method. One important characteristic of this hybrid model is that each basic module plays a complementary role in the overall performance of the predictor. The interaction module is the most accurate of the three but has the least coverage. The targeting module, conversely, has the highest coverage, even 100%, but its results have low accuracy. Integration thus provides the highest coverage and accuracy for yeast proteins among all the previous works on this organism.

More recently, Scott M. S. and co-workers [12] improved their previous PSL2 predictor to PSLT, which is also based on a Bayesian network model that integrates different protein data. The salient features of this predictor are that it is applicable to human proteins and that it can predict all the subcellular localizations of proteins in the cell, so the location coverage is 100%. It also annotates multi-compartmental proteins. For nine subcellular locations it achieved an accuracy of 78% in a 10-fold cross-validation test while covering 74% of the Homo sapiens proteins. It was also applied to related species, where an accuracy of more than 80% was found.
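The second-level integration described for PSL2, where a naive Bayes model combines the outputs of the motif, targeting and interaction modules, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each module is assumed to emit a single predicted location or nothing at all, probabilities are estimated from counts with Laplace smoothing, and the function names and toy training records are hypothetical.

```python
import math
from collections import defaultdict


def train_naive_bayes(records, modules):
    """Count class frequencies and per-module outputs given the true class.

    `records` is a list of (module_outputs, true_location) pairs, where
    module_outputs maps module name -> predicted location. A module that
    made no prediction (no coverage) is simply absent from the mapping.
    """
    class_counts = defaultdict(int)
    # output_counts[module][true_location][module_output] -> count
    output_counts = {m: defaultdict(lambda: defaultdict(int)) for m in modules}
    for outputs, location in records:
        class_counts[location] += 1
        for module, predicted in outputs.items():
            output_counts[module][location][predicted] += 1
    return class_counts, output_counts


def combine(outputs, class_counts, output_counts, alpha=1.0):
    """Posterior over locations given the available module outputs.

    Missing modules contribute no factor, which is how this sketch
    compensates for missing data. `alpha` is the Laplace smoothing constant.
    """
    locations = sorted(class_counts)
    total = sum(class_counts.values())
    log_scores = {}
    for loc in locations:
        log_p = math.log(class_counts[loc] / total)  # prior P(loc)
        for module, predicted in outputs.items():
            seen = output_counts[module][loc]
            n = sum(seen.values())
            # Smoothed estimate of P(module says `predicted` | true location is loc).
            log_p += math.log((seen.get(predicted, 0) + alpha)
                              / (n + alpha * len(locations)))
        log_scores[loc] = log_p
    top = max(log_scores.values())
    norm = sum(math.exp(s - top) for s in log_scores.values())
    return {loc: math.exp(s - top) / norm for loc, s in log_scores.items()}


# Hypothetical toy training records (module outputs, experimentally known location).
MODULES = ["motif", "targeting", "interaction"]
records = [
    ({"motif": "nucleus", "targeting": "nucleus", "interaction": "nucleus"}, "nucleus"),
    ({"motif": "nucleus", "targeting": "cytosol"}, "nucleus"),
    ({"targeting": "cytosol", "interaction": "cytosol"}, "cytosol"),
    ({"motif": "cytosol", "targeting": "cytosol"}, "cytosol"),
]
class_counts, output_counts = train_naive_bayes(records, MODULES)

# The targeting and interaction modules disagree; the motif module is silent.
posterior = combine({"targeting": "cytosol", "interaction": "nucleus"},
                    class_counts, output_counts)
prior_only = combine({}, class_counts, output_counts)
```

In this toy setting the interaction module's output outweighs the less reliable targeting module, which mirrors the complementary roles of the modules: a high-coverage but noisy source is corrected wherever a more accurate source is available.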

III. COMPARATIVE STUDY

A. Statistical Analysis

To verify the results of the proposed predictors, various statistical methods have been employed to estimate their accuracy and to compare them with other predictors. Some of them are the following.

10-Fold Cross-Validation Test: In this method the input data is randomly partitioned into 10 non-overlapping subsets. First, nine of the subsets are used to train the predictor and the remaining 10th subset is used for inference. This procedure is then repeated so that each of the 10 subsets is held out once. Used by [6,12].

Self-Consistency Test: In this analysis the same dataset is used first for training the predictor and then for inference [12].

Jackknife Test: The basic idea behind this variance estimator is that, after leaving out one or more observations from the sample, the test statistic estimate is recomputed. By doing so, a variance estimate is calculated [3,4].
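The cross-validation and jackknife procedures above can be sketched in a few lines. The sketch assumes that the jackknife test is run as leave-one-out cross-validation, which is how it is commonly applied when evaluating predictors. The 1-nearest-neighbour classifier is only a stand-in for a real predictor, and the function names and toy data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Randomly partition range(n_samples) into k non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train, predict, k):
    """Accuracy over k folds; k == len(samples) gives leave-one-out."""
    correct = 0
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(samples) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        model = train(train_x, train_y)
        correct += sum(predict(model, samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)


# Stand-in predictor: 1-nearest neighbour on numeric feature vectors.
def train_1nn(xs, ys):
    return list(zip(xs, ys))


def predict_1nn(model, x):
    def squared_distance(pair):
        return sum((a - b) ** 2 for a, b in zip(pair[0], x))
    return min(model, key=squared_distance)[1]


# Hypothetical toy data: two well-separated clusters of 2-D feature vectors.
samples = [(i * 0.1, 0.0) for i in range(10)] + [(5 + i * 0.1, 1.0) for i in range(10)]
labels = ["cytoplasmic"] * 10 + ["nuclear"] * 10

acc_10fold = cross_validate(samples, labels, train_1nn, predict_1nn, k=10)
acc_jackknife = cross_validate(samples, labels, train_1nn, predict_1nn, k=len(samples))

# Self-consistency test: train and evaluate on the same data.
full_model = train_1nn(samples, labels)
acc_self = sum(predict_1nn(full_model, x) == y for x, y in zip(samples, labels)) / len(samples)
```

The self-consistency accuracy is optimistic by construction, since every test sample was seen during training; the held-out estimates are the ones that indicate how a predictor behaves on new proteins.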

B. Comparisons

Genome annotation for the purpose of understanding protein function is a live research topic, and many researchers and groups are contributing to it. Some of the models are presented in this paper to give an overall view of the different aspects of the various approaches. Table II compares the different categories of predictors and the techniques they use. Comparing these models is neither simple nor entirely fair, because some of the authors [4, 7] were pioneers in this area, while others proposed innovative approaches [1, 6] later. Some of them [6, 10] focus on protein coverage, while others focus on prediction accuracy [1, 9]. Nevertheless, a brief and comprehensive look at the performance of the proposed models is possible. The comparison shows that, owing to the nature of the available data, the most efficient approach is that of hybrid predictors. Hybrid predictors not only achieve reasonable accuracy, but their location and taxonomic coverage is also significant, especially for inference on human proteins, for which relatively little data is available compared to other species.

The parameters considered in this evaluation are the technique used, coverage, accuracy and the statistical analysis method employed.
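TABLE II: Comparison of Different Subcellular Locations Predictors.

Category | Authors | Technique | Accuracy | Locations | Organisms | Statistical test
Amino acid composition predictors | Hua S. and Sun Z. [3] | Support Vector Machine | 91.4% | 03 | Prokaryotic | Jackknife test
Amino acid composition predictors | Hua S. and Sun Z. [3] | Support Vector Machine | 79.4% | 04 | Eukaryotic | Jackknife test
Amino acid composition predictors | Reinhardt A. and Hubbard T. [4] | Neural Network | 81.0% | 03 | Prokaryotic | Jackknife test
Amino acid composition predictors | Reinhardt A. and Hubbard T. [4] | Neural Network | 66.0% | 04 | Eukaryotic | Jackknife test
Sorting signal predictors | Emanuelsson O. et al. [9] | Neural Network | 85.0% | 04 | Plants | Redundancy-reduced test
Sorting signal predictors | Emanuelsson O. et al. [9] | Neural Network | 90.0% | 03 | Non-plants | Redundancy-reduced test
Sorting signal predictors | Bendtsen J. D. et al. [10] | Neural network and hidden Markov model | 75% | 03 | Eukaryotes, Gram-negative and Gram-positive bacteria | 10-fold cross-validation
Homology-based predictors | Marcotte E. M. et al. [13] | BLAST search | 50% | 02 | Yeast and worm | Self-consistency test
Homology-based predictors | Lu Z. et al. [1] | NB classifier | 81.0% | 09 | Fungi | 10-fold cross-validation
Homology-based predictors | Lu Z. et al. [1] | NB classifier | 93.0% | 09 | Animals, plants, Gram-negative bacteria and Gram-positive bacteria | 10-fold cross-validation
Hybrid methods | Drawid A. and Gerstein M. [11] | Bayesian Network | 75.0% | - | Yeast | 10-fold cross-validation, self-consistency test
Hybrid methods | Scott M. S. et al. [6] | Bayesian Network | 78% | - | Human | 10-fold cross-validation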

IV. SUMMARY

In this review paper we discussed a very active and important research area in bioinformatics, the inference of subcellular localization. First, the motivation and background of the topic were discussed. Then different aspects of the research on modeling subcellular location predictors were covered, including the protein features commonly utilized, the dataset resources used and different machine learning techniques such as neural networks, support vector machines and Bayesian networks, which were reviewed in the context of this inference. Finally, some of the results were presented in the form of a comparative table. It shows that the performance of the predictors varies significantly in coverage and accuracy, because they all focus on different aspects of performance in their prediction models.

REFERENCES

[1] Z. Lu, D. Szafron, R. Greiner, P. Lu, D.S. Wishart, B. Poulin, J. Anvik, C. Macdonell and R. Eisner, "Predicting subcellular localization of proteins using machine-learned classifiers", Bioinformatics, Vol. 20, No. 4, pp. 547-556, Jan 22, 2004.
[2] Kuo-Chen Chou and Yu-Dong Cai, "Using Functional Domain Composition and Support Vector Machines for Prediction of Protein Subcellular Location", The Journal of Biological Chemistry, Vol. 277, No. 48, pp. 45765-45769, Aug 16, 2002.
[3] Sujun Hua and Zhirong Sun, "Support vector machine approach for protein subcellular localization prediction", Bioinformatics, Vol. 17, No. 8, pp. 721-728, Apr 24, 2001.
[4] A. Reinhardt and T. Hubbard, "Using Neural Networks for prediction of the subcellular location of proteins", Nucleic Acids Research, Vol. 26, No. 9, March 9, 1998.
[5] Jennifer L. Gardy, Cory Spencer, Ke Wang, Martin Ester, Gabor E. Tusnady, Istvan Simon, Sujun Hua, Katalin deFays, Christophe Lambert, Kenta Nakai and Fiona S.L. Brinkman, "PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria", Nucleic Acids Research, Vol. 31, No. 13, pp. 3613-3617, 2003.
[6] Michelle S. Scott, David Y. Thomas and Michael T. Hallett, "Predicting Subcellular Localization via Protein Motif Co-Occurrence", Genome Research, Vol. 14, pp. 1957-1966, 2004.
[7] Zheng Yuan, "Prediction of protein subcellular locations using Markov chain models", FEBS Letters, Vol. 451, No. 1, pp. 23-26, May 14, 1999.
[8] Ahmed M. Mehdi, Muhammad Shoaib B. Sehgal, Bostjan Kobe, Timothy L. Bailey and Mikael Boden, "A probabilistic model of nuclear import of proteins", Bioinformatics, Vol. 27, No. 9, pp. 1239-1246, February 28, 2011.
[9] Olof Emanuelsson, Henrik Nielsen, Søren Brunak and Gunnar von Heijne, "Predicting Subcellular Localization of Proteins Based on their N-terminal Amino Acid Sequence", Journal of Molecular Biology, Vol. 300, No. 4, pp. 1005-1016, July 21, 2000.
[10] Jannick Dyrløv Bendtsen, Henrik Nielsen, Gunnar von Heijne and Søren Brunak, "Improved Prediction of Signal Peptides: SignalP 3.0", Journal of Molecular Biology, Vol. 340, No. 4, pp. 783-795, July 16, 2004.
[11] Amar Drawid and Mark Gerstein, "A Bayesian System Integrating Expression Data with Sequence Patterns for Localizing Proteins: Comprehensive Application to the Yeast Genome", Journal of Molecular Biology, Vol. 301, No. 4, pp. 1059-1075, August 25, 2000.
[12] Michelle S. Scott, Sara J. Calafell, David Y. Thomas and Michael T. Hallett, "Refining Protein Subcellular Localization", PLoS Computational Biology, Vol. 1, No. 6, November 2005.
[13] Edward M. Marcotte, Ioannis Xenarios, Alexander M. van der Bliek and David Eisenberg, "Localizing proteins in the cell from their phylogenetic profiles", PNAS, Vol. 97, No. 22, pp. 12115-12120, 2000.
[14] Nancy Y. Yu, James R. Wagner, Matthew R. Laird, Gabor Melli, Sebastien Rey, Raymond Lo, Phuong Dao, S. Cenk Sahinalp, Martin Ester, Leonard J. Foster and Fiona S. L. Brinkman, "PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes", Bioinformatics, Vol. 26, No. 13, pp. 1608-1615, 2010.
[15] http://en.wikipedia.org/wiki/Protein_sorting
[16] Kuo-Chen Chou and David W. Elrod, "Protein subcellular location prediction", PEDS, Vol. 12, No. 2, pp. 107-118, 1998.


