A bioinformatics approach to the identification, classification, and analysis of hydroxyproline-rich glycoproteins

Plant Physiol. 2010 Jun;153(2):485-513. doi: 10.1104/pp.110.156554. Epub 2010 Apr 15.

Abstract

Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall proteins that function in diverse aspects of plant growth and development. This superfamily consists of three members: hyperglycosylated arabinogalactan proteins (AGPs), moderately glycosylated extensins (EXTs), and lightly glycosylated proline-rich proteins (PRPs). Hybrid and chimeric versions of HRGP molecules also exist. To "mine" genomic databases for HRGPs and to facilitate and guide research in the field, the BIO OHIO software program was developed; it identifies and classifies AGPs, EXTs, PRPs, hybrid HRGPs, and chimeric HRGPs from proteins predicted from DNA sequence data. This bioinformatics program is based on searching for biased amino acid compositions and for particular protein motifs associated with known HRGPs. HRGPs identified by the program are subsequently analyzed to elucidate the following: (1) repeating amino acid sequences, (2) signal peptide and glycosylphosphatidylinositol lipid anchor addition sequences, (3) similar HRGPs via the Basic Local Alignment Search Tool (BLAST), (4) expression patterns of their genes, (5) other HRGP, glycosyl transferase, prolyl 4-hydroxylase, and peroxidase genes coexpressed with their genes, and (6) gene structure and whether genetic mutants exist for their genes. The program was used to identify and classify 166 HRGPs from Arabidopsis (Arabidopsis thaliana) as follows: 85 AGPs (including classical AGPs, lysine-rich AGPs, arabinogalactan peptides, fasciclin-like AGPs, plastocyanin AGPs, and other chimeric AGPs), 59 EXTs (including SP(5) EXTs, SP(5)/SP(4) EXTs, SP(4) EXTs, SP(4)/SP(3) EXTs, a SP(3) EXT, "short" EXTs, leucine-rich repeat-EXTs, proline-rich extensin-like receptor kinases, and other chimeric EXTs), 18 PRPs (including PRPs and chimeric PRPs), and four AGP/EXT hybrid HRGPs.
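
The two search criteria named in the abstract, biased amino acid composition and characteristic motifs, can be illustrated with a minimal Python sketch. This is not the BIO OHIO program itself. The residue set, the cutoffs, the regular expression, and all function names below are hypothetical choices made for illustration; the abstract does not state the actual thresholds or motif definitions used.

```python
import re

# Illustrative assumptions only; none of these values come from the abstract.
AGP_RESIDUES = set("PAST")          # residues assumed to be enriched in AGPs
AGP_MIN_FRACTION = 0.5              # hypothetical composition cutoff
EXT_MOTIF = re.compile(r"SP{3,5}")  # Ser followed by 3 to 5 Pro, SP(3) to SP(5) style
EXT_MIN_REPEATS = 2                 # hypothetical motif-count cutoff


def residue_fraction(seq, residues):
    """Return the fraction of positions in seq occupied by the given residues."""
    if not seq:
        return 0.0
    return sum(1 for aa in seq if aa in residues) / len(seq)


def count_motif(seq, pattern):
    """Count non-overlapping matches of a compiled regex in seq."""
    return len(pattern.findall(seq))


def screen_candidate(seq):
    """Return coarse labels suggested by composition bias and motif counts."""
    seq = seq.upper()
    labels = []
    if residue_fraction(seq, AGP_RESIDUES) >= AGP_MIN_FRACTION:
        labels.append("AGP-like")
    if count_motif(seq, EXT_MOTIF) >= EXT_MIN_REPEATS:
        labels.append("EXT-like")
    return labels
```

A sequence that satisfies both tests receives both labels, which loosely mirrors the idea of hybrid HRGPs. A real screen would also need the downstream checks listed in the abstract, such as signal peptide prediction.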

MeSH terms

  • Amino Acid Motifs
  • Amino Acid Sequence
  • Arabidopsis / metabolism
  • Computational Biology / methods*
  • Data Mining
  • Databases, Protein
  • Genes, Plant
  • Glycoproteins / chemistry*
  • Glycoproteins / classification*
  • Molecular Sequence Data
  • Plant Proteins / chemistry*
  • Plant Proteins / classification*
  • Sequence Analysis, Protein
  • Software

Substances

  • Glycoproteins
  • Plant Proteins
  • extensin protein, plant