E1 - Biological Databases and Data Organization: General Content
General content
1. SwissProt has 565254 sequence entries and a higher percentage of entries with
evidence at both the protein and the transcript level than TrEMBL, which has
219174961 sequence entries and low evidence percentages. The distributions differ
because TrEMBL is unreviewed and computationally annotated, whereas SwissProt is
annotated manually and reviewed by a curator, who verifies the entries against the
literature.
- SwissProt
- TrEMBL
2. The total number of annotated active sites is found under features, act_site, in
the total number column. SwissProt has 168907 annotated active sites.
3. The number of proteins with at least one active site is 102056, listed under
number of entries.
4. This may be because most of the proteins in the given model organisms have
already been sequenced, so the graph levels off.
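The counts quoted in answers 1 to 3 can be put in relation to each other with a few lines of arithmetic. The following is only a minimal sketch: the numbers are copied from the answers above, while the variable names and the `summarize` helper are made up for illustration and are not part of any UniProt tool.

```python
# Counts copied from the answers above (release statistics at the time of writing).
swissprot_entries = 565_254
trembl_entries = 219_174_961
active_sites = 168_907               # total act_site features in SwissProt
entries_with_active_site = 102_056   # SwissProt entries with at least one active site


def summarize():
    """Return a few ratios derived from the counts above (hypothetical helper)."""
    return {
        # How many TrEMBL entries exist per SwissProt entry.
        "trembl_to_swissprot_ratio": trembl_entries / swissprot_entries,
        # Share of SwissProt entries that have at least one annotated active site.
        "fraction_with_active_site": entries_with_active_site / swissprot_entries,
        # Average number of active sites among entries that have any.
        "sites_per_annotated_entry": active_sites / entries_with_active_site,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

The ratios show that TrEMBL is several hundred times larger than SwissProt, and that only a minority of SwissProt entries carry an active-site annotation.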
2. The sequence is 912 amino acids long. After downloading the FASTA sequence, it
was pasted into https://www.browserling.com/tools/letter-frequency; according to
this online tool, the protein contains 79 serines.
3. According to NCBI, the taxonomic identifier for Homo sapiens is 9606, and
approximately 1480000 proteins are listed for it:
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?lvl=0&id=9606. The
human body contains a large number of different proteins, which explains why the
protein count in the database is so high.
4. Existence of the protein is supported by experimental evidence at protein level.
5. Manual annotation means that a curator has read the literature about the
protein, checked databases, and reviewed papers.
6. 9 mutations were found under pathology and biotech; HeLa cells are mentioned in 4
of them. HeLa cells are a cancer cell line, and cancer is a sign of mutated cells,
which is why the mutations were tested in these cells.
7. Without quotation marks, the database searches for each word separately and
returns hits for every single word. With quotation marks, the search engine
searches for everything between the quotation marks as one exact phrase.
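The serine count from answer 2 can also be reproduced locally instead of using an online letter-frequency tool. The sketch below assumes the downloaded file is in standard FASTA format (a header line starting with `>` followed by sequence lines). The function names and the toy record are hypothetical and only illustrate the approach; the toy record is not the protein from the exercise.

```python
from collections import Counter


def read_fasta_sequence(text):
    """Return the sequence of the first record in FASTA-formatted text."""
    parts = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second header means the first record has ended
            seen_header = True
            continue  # the header is not part of the sequence
        parts.append(line)
    return "".join(parts).upper()


def residue_counts(sequence):
    """Count how often each one-letter amino acid code occurs."""
    return Counter(sequence)


# Toy record for illustration only (hypothetical, not the exercise protein).
example = """>toy_protein hypothetical example
MSSKLS
TSAS
"""

seq = read_fasta_sequence(example)
counts = residue_counts(seq)
print(len(seq), counts["S"])  # sequence length and number of serines
```

Skipping the header line matters: if the whole FASTA file, header included, is pasted into a generic letter-frequency tool, any letter S in the header is counted as well.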