FASTA Format Description

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. It is recommended that all lines of text be shorter than 80 characters in length. Sequences are expected to be represented in the standard IUB / IUPAC amino acid and nucleic acid codes.

Uploaded by

Pritika Sharma
Copyright
© Attribution Non-Commercial (BY-NC)

FASTA format description

A sequence in FASTA format begins with a single-line description,
followed by lines of sequence data. The description line is distinguished
from the sequence data by a greater-than (">") symbol in the first
column. It is recommended that all lines of text be shorter than 80
characters in length. An example sequence in FASTA format is:
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
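
As a minimal sketch of the rule above, the following Python snippet separates the description line from the sequence data of a single record. The function name `split_record` is a hypothetical helper introduced here for illustration; it is not part of any FASTA tool. Only the first two sequence lines of the example are used, to keep the snippet short.

```python
def split_record(text):
    """Split one FASTA record into (description, sequence).

    Hypothetical helper: assumes `text` holds exactly one record.
    """
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    # The description line is recognised by ">" in the first column.
    if not lines or not lines[0].startswith(">"):
        raise ValueError("record must begin with a '>' description line")
    description = lines[0][1:]                         # text after the ">"
    sequence = "".join(ln.strip() for ln in lines[1:])  # join the data lines
    return description, sequence


record = """>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
"""
description, sequence = split_record(record)
print(description)
print(len(sequence))
```

The line breaks inside the sequence data carry no meaning, so the sketch joins the data lines into one string.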

Sequences are expected to be represented in the standard IUB/IUPAC
amino acid and nucleic acid codes, with these exceptions: lower-case
letters are accepted and are mapped into upper-case; a single hyphen or
dash can be used to represent a gap of indeterminate length; and in
amino acid sequences, U and * are acceptable letters (see below). Before
submitting a request, any numerical digits in the query sequence should
either be removed or replaced by appropriate letter codes (e.g., N for
unknown nucleic acid residue or X for unknown amino acid residue).
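
The clean-up described above could be sketched in Python as follows. The function `clean_query` and its `unknown` parameter are assumptions made for illustration. By default digits are removed. Passing a letter replaces each digit with that letter, which is only appropriate when a digit really stands for an unknown residue rather than a position number.

```python
def clean_query(raw, unknown=None):
    """Upper-case a raw query and remove (or replace) numerical digits.

    Hypothetical helper. `unknown` is the letter substituted for each
    digit, e.g. "N" for nucleic acids or "X" for amino acids; when it is
    None, digits are simply dropped.
    """
    cleaned = []
    for ch in raw:
        if ch.isspace():
            continue                     # ignore blanks and line breaks
        if ch.isdigit():
            if unknown is not None:
                cleaned.append(unknown)  # replace the digit
            continue                     # otherwise drop it
        cleaned.append(ch.upper())       # lower-case maps to upper-case
    return "".join(cleaned)


print(clean_query("1 acgtn-acgt 10"))
print(clean_query("ac5gt", unknown="N"))
```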
The nucleic acid codes supported are:

    A --> adenosine            M --> A C (amino)
    C --> cytidine             S --> G C (strong)
    G --> guanosine            W --> A T (weak)
    T --> thymidine            B --> G T C
    U --> uridine              D --> G A T
    R --> G A (purine)         H --> A C T
    Y --> T C (pyrimidine)     V --> G C A
    K --> G T (keto)           N --> A G C T (any)
                               -  gap of indeterminate length

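
One way to use the nucleic acid table in a program is to store it as a lookup from each code to the bases it can stand for. The sketch below does this in Python. The names `IUPAC_NUCLEOTIDES`, `expand` and `is_nucleotide_sequence` are illustrative assumptions, not part of any BLAST or FASTA software.

```python
# Each code mapped to the bases it may represent, as listed in the table.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}


def expand(code):
    """Return the set of bases that a single code can stand for."""
    return set(IUPAC_NUCLEOTIDES[code.upper()])


def is_nucleotide_sequence(sequence):
    """True if every character is a listed code or the gap symbol '-'."""
    return all(ch == "-" or ch.upper() in IUPAC_NUCLEOTIDES for ch in sequence)


print(sorted(expand("r")))
print(is_nucleotide_sequence("acgtRYn-"))
print(is_nucleotide_sequence("ACGE"))
```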
For those programs that use amino acid query sequences (BLASTP and
TBLASTN), the accepted amino acid codes are:

    A  alanine                    P  proline
    B  aspartate or asparagine    Q  glutamine
    C  cysteine                   R  arginine
    D  aspartate                  S  serine
    E  glutamate                  T  threonine
    F  phenylalanine              U  selenocysteine
    G  glycine                    V  valine
    H  histidine                  W  tryptophan
    I  isoleucine                 Y  tyrosine
    K  lysine                     Z  glutamate or glutamine
    L  leucine                    X  any
    M  methionine                 *  translation stop
    N  asparagine                 -  gap of indeterminate length
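
A query can be checked against the amino acid table before it is submitted. The following Python sketch reports every character that is not among the listed codes. `AMINO_ACID_CODES` and `invalid_residues` are hypothetical names chosen for this example. Positions are counted from 1.

```python
# The codes listed in the table above, plus "*" (stop) and "-" (gap).
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def invalid_residues(sequence):
    """Return (position, character) pairs for characters not in the table.

    Lower-case letters are accepted, since they map to upper-case.
    """
    return [
        (position, ch)
        for position, ch in enumerate(sequence, start=1)
        if ch.upper() not in AMINO_ACID_CODES
    ]


print(invalid_residues("mditihnpl*"))
print(invalid_residues("MDJT1"))
```

Under this table the letters J and O are not listed, so the sketch reports them along with digits and other stray characters.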

FASTA format description


• A sequence in FASTA format consists of:
  o One line starting with a ">" sign, followed by a sequence identification
    code. It may optionally be followed by a textual description of the
    sequence. Since the description is not part of the official definition
    of the format, software can choose to ignore it when it is present.
  o One or more lines containing the sequence itself.
• A file in FASTA format may comprise more than one sequence.

• The FASTA format is sometimes also referred to as the "Pearson" format (after the
  author of the FASTA program, who also defined the format).
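
The structure listed above can be sketched as a small Python reader that handles files with more than one sequence. The generator `parse_fasta` is a hypothetical helper written for this description. It treats the text up to the first space of the header as the identification code and the rest as the optional description.

```python
def parse_fasta(lines):
    """Yield (identifier, description, sequence) for each record.

    Hypothetical helper: `lines` is any iterable of text lines, such as
    an open file or the result of str.splitlines().
    """
    identifier, description, chunks = None, "", []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, description, "".join(chunks)
            header = line[1:].strip()
            identifier, _, description = header.partition(" ")
            description = description.strip()
            chunks = []
        elif line.strip():
            if identifier is None:
                raise ValueError("sequence data found before any '>' line")
            chunks.append(line.strip())
    if identifier is not None:
        yield identifier, description, "".join(chunks)


sample = ">seq1 first example\nACGT\nACGT\n\n>seq2\nTTTT\n"
records = list(parse_fasta(sample.splitlines()))
for identifier, description, sequence in records:
    print(identifier, repr(description), len(sequence))
```

Because the description is optional, the second record in the sample yields an empty description string rather than an error.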

FASTA format example


Copy and paste the sequence(s) below into the appropriate input window.

>BTBSCRYR
tgcaccaaacatgtctaaagctggaaccaaaattactttctttgaagacaaaaactttca
aggccgccactatgacagcgattgcgactgtgcagatttccacatgtacctgagccgctg
caactccatcagagtggaaggaggcacctgggctgtgtatgaaaggcccaattttgctgg
gtacatgtacatcctaccccggggcgagtatcctgagtaccagcactggatgggcctcaa
cgaccgcctcagctcctgcagggctgttcacctgtctagtggaggccagtataagcttca
gatctttgagaaaggggattttaatggtcagatgcatgagaccacggaagactgcccttc
catcatggagcagttccacatgcgggaggtccactcctgtaaggtgctggagggcgcctg
gatcttctatgagctgcccaactaccgaggcaggcagtacctgctggacaagaaggagta
ccggaagcccgtcgactggggtgcagcttccccagctgtccagtctttccgccgcattgt
ggagtgatgatacagatgcggccaaacgctggctggccttgtcatccaaataagcattat
aaataaaacaattggcatgc
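
The record above wraps its sequence at 60 characters per line, which stays within the recommended limit of fewer than 80 characters. A minimal Python sketch for writing a record this way is shown below. The function `format_fasta` and its `width` parameter are assumptions introduced for illustration.

```python
def format_fasta(identifier, sequence, description="", width=60):
    """Return one FASTA record as text, wrapping the sequence lines.

    Hypothetical helper. `width` must stay below 80 so that sequence
    lines respect the recommended line length.
    """
    if not 0 < width < 80:
        raise ValueError("width must be between 1 and 79")
    header = ">" + identifier
    if description:
        header += " " + description
    # Cut the sequence into consecutive pieces of at most `width` letters.
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + body) + "\n"


text = format_fasta("BTBSCRYR", "acgt" * 35)
print(text, end="")
```

Only the sequence lines are wrapped here. A long description line would still need to be shortened by hand to stay under 80 characters.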

>crab_anapl ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDITIHNPLIRRPLFSWLAPSRIFDQIFGEHLQESELLPASPSLSPFLMR
SPIFRMPSWLETGLSEMRLEKDKFSVNLDVKHFSPEELKVKVLGDMVEIH
GKHEERQDEHGFIAREFNRKYRIPADVDPLTITSSLSLDGVLTVSAPRKQ
SDVPERSIPITREEKPAIAGAQRK

>crab_anapl ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDITIHNPLIRRPLFSWLAPSRIFDQIFGEHLQESELLPASPSLSPFLMR
SPIFRMPSWLETGLSEMRLEKDKFSVNLDVKHFSPEELKVKVLGDMVEIH
GKHEERQDEHGFIAREFNRKYRIPADVDPLTITSSLSLDGVLTVSAPRKQ
SDVPERSIPITREEKPAIAGAQRK
>crab_bovin ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDIAIHHPWIRRPFFPFHSPSRLFDQFFGEHLLESDLFPASTSLSPFYLR
PPSFLRAPSWIDTGLSEMRLEKDRFSVNLDVKHFSPEELKVKVLGDVIEV
HGKHEERQDEHGFISREFHRKYRIPADVDPLAITSSLSSDGVLTVNGPRK
QASGPERTIPITREEKPAVTAAPKK
>crab_chick ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDITIHNPLVRRPLFSWLTPSRIFDQIFGEHLQESELLPTSPSLSPFLMR
SPFFRMPSWLETGLSEMRLEKDKFSVNLDVKHFSPEELKVKVLGDMIEIH
GKHEERQDEHGFIAREFSRKYRIPADVDPLTITSSLSLDGVLTVSAPRKQ
SDVPERSIPITREEKPAIAGSQRK
>crab_human ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDIAIHHPWIRRPFFPFHSPSRLFDQFFGEHLLESDLFPTSTSLSPFYLR
PPSFLRAPSWFDTGLSEMRLEKDRFSVNLDVKHFSPEELKVKVLGDVIEV
HGKHEERQDEHGFISREFHRKYRIPADVDPLTITSSLSSDGVLTVNGPRK
QVSGPERTIPITREEKPAVTAAPKK
>crab_mesau ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDIAIHHPWIRRPFFPFHSPSRLFDQFFGEHLLESDLFSTATSLSPFYLR
PPSFLRAPSWIDTGLSEMRMEKDRFSVNLDVKHFSPEELKVKVLGDVVEV
HGKHEERQDEHGFISREFHRKYRIPADVDPLTITSSLSSDGVLTVNGPRK
QASGPERTIPITREEKPAVTAAPKK
>crab_mouse ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN) (P23).
MDIAIHHPWIRRPFFPFHSPSRLFDQFFGEHLLESDLFSTATSLSPFYLR
PPSFLRAPSWIDTGLSEMRLEKDRFSVNLDVKHFSPEELKVKVLGDVIEV
HGKHEERQDEHGFISREFHRKYRIPADVDPLAITSSLSSDGVLTVNGPRK
QVSGPERTIPITREEKPAVAAAPKK
>crab_rabit ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDIAIHHPWIRRPFFPFHSPSRLFDQFFGEHLLESDLFPTSTSLSPFYLR
PPSFLRAPSWIDTGLSEMRLEKDRFSVNLDVKHFSPEELKVKVLGDVIEV
HGKHEERQDEHGFISREFHRKYRIPADVDPLTITSSLSSDGVLTVNGPRK
QAPGPERTIPITREEKPAVTAAPKK
>crab_rat ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDIAIHHPWIRRPFFPFHSPSRLFDQFFGEHLLESDLFSTATSLSPFYLR
PPSFLRAPSWIDTGLSEMRMEKDRFSVNLDVKHFSPEELKVKVLGDVIEV
HGKHEERQDEHGFISREFHRKYRIPADVDPLTITSSLSSDGVLTVNGPRK
QASGPERTIPITREEKPAVTAAPKK
>crab_squac ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDIAIQHPWLRRPLFPSSIFPSRIFDQNFGEHFDPDLFPSFSSMLSPFYW
RMGAPMARMPSWAQTGLSELRLDKDKFAIHLDVKHFTPEELRVKILGDFI
EVQAQHEERQDEHGYVSREFHRKYKVPAGVDPLVITCSLSADGVLTITGP
RKVADVPERSVPISRDEKPAVAGPQQK
