Biological Databases Lab 2
BBIT418P
Assessment – 2
By:
Suryaa .A
21BCB0032
Exercise 3
1. How many nucleotide sequences are there for the bacterium Bacillus
subtilis in the NCBI Sequence Database?
2. How many nucleotide sequences are there from the bacterium Bacillus
subtilis in the RefSeq part of the NCBI Sequence Database?
Ans. There are 35191 nucleotide sequences from the bacterium Bacillus
subtilis in the RefSeq part of the NCBI Sequence Database.
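The two counts above can also be requested programmatically. The following is a minimal sketch, assuming the NCBI E-utilities ESearch endpoint and the Entrez filter `srcdb_refseq[PROP]` for the RefSeq subset. The helper name `count_url` is hypothetical, and the sketch only builds the request URLs without contacting NCBI. With `retmode=json`, the total is reported in the `count` field of the response.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(term: str, db: str = "nuccore") -> str:
    """Build an ESearch URL; retmax=0 returns the hit count without any IDs."""
    params = {"db": db, "term": term, "retmax": 0, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Question 1: every Nucleotide record for the organism.
all_records = '"Bacillus subtilis"[Organism]'
# Question 2: the same search restricted to RefSeq records.
refseq_only = all_records + " AND srcdb_refseq[PROP]"

print(count_url(all_records))
print(count_url(refseq_only))
```

Counts returned by such a query change as the database grows, so they may differ from the figure recorded in the answer.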
4. What is the accession number for the Bacillus subtilis genome in NCBI?
6. Obtaining the most recent submissions. We will search for recent submissions
on Bacillus subtilis in the Nucleotide database. How many submissions of
Bacillus subtilis sequences were made between January 1st 2015 and
December 31st 2015?
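A date-limited search like this one can be expressed with the ESearch parameters `datetype`, `mindate` and `maxdate`. The sketch below assumes that the publication date (`pdat`) is the date of interest. The function name `date_range_params` is hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def date_range_params(organism: str, start: str, end: str) -> dict:
    """ESearch parameters for Nucleotide records of one organism in a date window.

    Dates use the YYYY/MM/DD form expected by E-utilities.
    """
    return {
        "db": "nuccore",
        "term": f'"{organism}"[Organism]',
        "datetype": "pdat",   # assumption: filter on publication date
        "mindate": start,
        "maxdate": end,
        "retmax": 0,          # only the hit count is needed
        "retmode": "json",
    }


params = date_range_params("Bacillus subtilis", "2015/01/01", "2015/12/31")
print(f"{EUTILS_ESEARCH}?{urlencode(params)}")
```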
7. What are the accession numbers of the 10 largest mouse (Mus musculus) mRNA
sequences in GenBank? You need an advanced search for this:
▪ Set Organism to Mus musculus and click Search
▪ Activate the mRNA filter in the Filter menu at the left of the search results
page
▪ Activate the Sequence Length filter and set the range to 70000 to 999999. To
know the ideal lower limit, you have to do some trial and error until you have
length thresholds that return between 10 and 20 mRNA sequences
Ans.
Accession: NM_001385708.1
Accession: XM_036165622.1
Accession: XM_036160535.1
Accession: XM_036160534.1
Accession: XM_036160529.1
Accession: XM_036160528.1
Accession: XM_036160527.1
Accession: XM_036160524.1
Accession: XM_036160520.1
Accession: XM_006519771.5
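The filter steps used for this question correspond to a single Entrez query string, and the trial-and-error search for a lower length limit can be written as a small loop. This is a sketch under stated assumptions: `biomol_mrna[PROP]` and `[SLEN]` are the Entrez tags for the mRNA and Sequence Length filters, while `mrna_length_term`, `find_lower_limit` and the `count_for` callable (anything that returns the hit count for a given lower limit) are hypothetical names introduced for illustration.

```python
from typing import Callable, Optional


def mrna_length_term(organism: str, min_len: int, max_len: int = 999999) -> str:
    """Entrez query for mRNA records of one organism within a length range."""
    return (
        f'"{organism}"[Organism] AND biomol_mrna[PROP] '
        f"AND {min_len}:{max_len}[SLEN]"
    )


def find_lower_limit(
    count_for: Callable[[int], int],
    start: int = 100000,
    step: int = 5000,
    lo: int = 10,
    hi: int = 20,
) -> Optional[int]:
    """Lower the length threshold until the hit count falls between lo and hi."""
    limit = start
    while limit > 0:
        hits = count_for(limit)
        if lo <= hits <= hi:
            return limit
        if hits > hi:
            return None  # overshot the window; retry with a smaller step
        limit -= step
    return None


print(mrna_length_term("Mus musculus", 70000))
```

Sorting the resulting hits by sequence length and keeping the first ten would then give the list of accession numbers.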
Exercise 4
Extraction of secondary data from TRANSFAC and InterPro Databases
AIM:
PROTOCOL
3. Go to the structure section and click 'here', which opens a new webpage
4. Go to 'Superclass: Basic Domains' and click 'Jun-B' under Jun Factors
6. Visit 'Tissue, Cell type, Pathology, Brain, Blood and Cell' to answer the following
queries
Specific Information
1. Tissue: Provide screenshot for Protein Expression of JUN in different organs
and RNA expression of JUN in different tissues
2. Cell Type: Provide screenshot for the expression pattern of JUN across
several cells in the body
3. Pathology: Provide screenshot for the Interactive survival scatter plot of JUN
in Cervical Cancer and Lung Cancer
4. Brain: Provide the screenshot for RNA expression of JUN in Brain
5. Blood: Provide the screenshot for RNA expression of JUN in different blood
cells
II. Identification of protein domains using InterPro
PROTOCOL
Input query accession number: UniProt ID: P48742
Open a browser (Chrome or Firefox) on your system
1. Go to the UniProt database https://www.uniprot.org/
2. Paste or type the query accession number P48742
3. Retrieve the protein sequence from the UniProt database
4. Provide the sequence details such as protein name, gene name, review status
and organism
Ans.
Protein: LIM/homeobox protein Lhx1
Gene: LHX1
Status: UniProtKB reviewed (Swiss-Prot)
Organism: Homo sapiens (Human)
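The same four details appear in the header line of the UniProt FASTA record, so they can be extracted without the browser. The sketch below assumes the UniProt REST address `https://rest.uniprot.org/uniprotkb/<accession>.fasta`. The example header is hand-written from the values in the answer above, and its entry name `LHX1_HUMAN` is an assumption; real headers also end with PE and SV fields, which the pattern ignores. The names `fasta_url` and `parse_header` are hypothetical.

```python
import re

UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def fasta_url(accession: str) -> str:
    """Address of the FASTA record for one UniProtKB accession."""
    return f"{UNIPROT_REST}/{accession}.fasta"


# >db|accession|entry_name protein name OS=organism OX=taxon_id [GN=gene] ...
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry>\S+)\s+"
    r"(?P<protein>.+?)\s+OS=(?P<organism>.+?)\s+OX=\d+"
    r"(?:\s+GN=(?P<gene>\S+))?"
)


def parse_header(header: str) -> dict:
    """Split a UniProtKB FASTA header into protein, gene, organism and status."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError("not a UniProtKB FASTA header")
    info = match.groupdict()
    # 'sp' marks reviewed Swiss-Prot entries, 'tr' unreviewed TrEMBL entries.
    db = info.pop("db")
    info["status"] = "reviewed (Swiss-Prot)" if db == "sp" else "unreviewed (TrEMBL)"
    return info


# Hand-written example header (assumed entry name, PE/SV fields omitted).
example_header = (
    ">sp|P48742|LHX1_HUMAN LIM/homeobox protein Lhx1 "
    "OS=Homo sapiens OX=9606 GN=LHX1"
)

print(fasta_url("P48742"))
print(parse_header(example_header))
```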
7. Report the domain information associated with the query sequence and
provide the screenshot
Ans.
1. Znf_LIM - IPR001781
Domain Architectures: 1k
Aligned Sequences: 0
Interactions: 45
Proteins: 126k
Structures: 86
Taxonomy: 10k
Proteomes: 3k
AlphaFold: 78k
Pathways: 101
2. Lhx1/5_LIM1 - IPR049618
Domain Architectures: 0
Aligned Sequences: 0
Interactions: 0
Proteins: 2k
Structures: 0
Taxonomy: 3k
Proteomes: 694
AlphaFold: 1k
Pathways: 3
3. HD - IPR001356
Domain Architectures: 2k
Aligned Sequences: 0
Interactions: 29
Proteins: 311k
Structures: 233
Taxonomy: 15k
Proteomes: 3k
AlphaFold: 213k
Pathways: 311
4. Lhx1/5_LIM2 - IPR049619
Domain Architectures: 0
Aligned Sequences: 0
Interactions: 0
Proteins: 2k
Structures: 0
Taxonomy: 3k
Proteomes: 779
AlphaFold: 1k
Pathways: 3
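To compare the four InterPro entries, the abbreviated counts (such as 126k) can be converted to approximate integers. The following sketch uses only the figures reported above. The helper `parse_count` is a hypothetical name, and values ending in "k" are rounded by InterPro, so the converted numbers are approximate.

```python
def parse_count(text: str) -> int:
    """Convert an InterPro-style count such as '126k' or '694' to an integer."""
    text = text.strip().lower()
    if text.endswith("k"):
        return int(float(text[:-1]) * 1000)  # 'k' means thousands (rounded)
    return int(text)


# Protein and structure counts copied from the answer above.
entries = {
    "IPR001781": {"name": "Znf_LIM", "proteins": "126k", "structures": "86"},
    "IPR049618": {"name": "Lhx1/5_LIM1", "proteins": "2k", "structures": "0"},
    "IPR001356": {"name": "HD", "proteins": "311k", "structures": "233"},
    "IPR049619": {"name": "Lhx1/5_LIM2", "proteins": "2k", "structures": "0"},
}

# Rank entries from the most to the fewest matching proteins.
ranked = sorted(
    entries,
    key=lambda acc: parse_count(entries[acc]["proteins"]),
    reverse=True,
)

for acc in ranked:
    entry = entries[acc]
    print(acc, entry["name"], parse_count(entry["proteins"]))
```

The ranking shows that the two general entries (HD and Znf_LIM) cover far more proteins than the two Lhx1/5-specific LIM entries.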