
Biological Databases Lab 2

The document contains exercises related to genome mining using NCBI resources, detailing nucleotide sequences for Bacillus subtilis and submissions by Matthew Berriman. It also includes extraction of secondary data from TRANSFAC and InterPro databases, focusing on the JUNB gene and LIM/homeobox protein Lhx1. Additionally, it reports on protein domains, architectures, and interactions associated with the query sequences.

Uploaded by

Suryaa
Copyright
© All Rights Reserved

Biological Databases Lab

BBIT418P

Assessment – 2

By:
Suryaa .A
21BCB0032
Exercise 3

Genome mining- NCBI Resources

1. How many nucleotide sequences are there for the bacterium Bacillus
subtilis in the NCBI Sequence Database?

Ans. There are 89,571 nucleotide sequences in total.

2. How many nucleotide sequences are there from the bacterium Bacillus
subtilis in the RefSeq part of the NCBI Sequence Database?

Ans. There are 35,191 nucleotide sequences from the bacterium Bacillus
subtilis in the RefSeq part of the NCBI Sequence Database.

3. How many nucleotide sequences were submitted to NCBI by Matthew
Berriman?

Ans. Matthew Berriman submitted 648,243 nucleotide sequences to NCBI.
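
The counts in questions 1 to 3 come from searches that can also be written as Entrez query terms. The following is a minimal sketch, assuming NCBI's public E-utilities `esearch` endpoint, the `[Organism]` and `[Author]` search fields, and the `srcdb_refseq[PROP]` filter for RefSeq records. The helper names are hypothetical, and the counts returned today will differ from those reported above because the database keeps growing.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def organism_term(organism, refseq_only=False):
    """Build an Entrez term for one organism (hypothetical helper)."""
    term = f'"{organism}"[Organism]'
    if refseq_only:
        # Restrict the hits to records whose source database is RefSeq.
        term += " AND srcdb_refseq[PROP]"
    return term


def author_term(last_name, initials):
    """Build an Entrez author term in 'Lastname Initials' form."""
    return f"{last_name} {initials}[Author]"


def count_url(term, db="nucleotide"):
    """Return an esearch URL that asks only for the number of hits."""
    params = {"db": db, "term": term, "rettype": "count"}
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(count_url(organism_term("Bacillus subtilis")))
    print(count_url(organism_term("Bacillus subtilis", refseq_only=True)))
    print(count_url(author_term("Berriman", "M")))
```

Opening any of the printed URLs in a browser returns a small XML document whose `Count` element holds the number of matching records.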

4. What is the accession number for the Bacillus subtilis genome in NCBI?

Ans. The accession number is GCA_000009045.1.

5. Retrieve the sequence of Accession: NM_002289.3. Give its name, phylum,
source, product and function.

Ans. Name: Homo sapiens lactalbumin alpha (LALBA), mRNA
Phylum: Chordata
Source: Homo sapiens (human)
Product: "alpha-lactalbumin precursor"
Function: This gene encodes alpha-lactalbumin, a principal protein of milk.
Alpha-lactalbumin forms the regulatory subunit of the lactose synthase (LS)
heterodimer and beta-1,4-galactosyltransferase (beta4Gal-T1) forms the
catalytic component. Together, these proteins enable LS to produce lactose by
transferring galactose moieties to glucose. As a monomer, alpha-lactalbumin
strongly binds calcium and zinc ions and may possess bactericidal or antitumor
activity. A folding variant of alpha-lactalbumin, called HAMLET, likely induces
apoptosis in tumor and immature cells.
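
The record used in question 5 can also be retrieved as a GenBank flat file and scanned for the requested fields. This is a rough sketch, assuming the E-utilities `efetch` endpoint with `rettype=gb`. The function names are hypothetical, and the regular expressions only cover the `SOURCE` line and the first `/product` qualifier, not the full flat-file grammar.

```python
import re
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_record_url(accession):
    """Return an efetch URL for the GenBank flat file of one accession."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def summarize_genbank(text):
    """Pull the SOURCE line and the first /product qualifier from flat-file text."""
    source = re.search(r"^SOURCE\s+(.+)$", text, re.MULTILINE)
    product = re.search(r'/product="([^"]+)"', text)
    return {
        "source": source.group(1).strip() if source else None,
        # Qualifier values may wrap across lines, so collapse the whitespace.
        "product": " ".join(product.group(1).split()) if product else None,
    }


if __name__ == "__main__":
    print(genbank_record_url("NM_002289.3"))
```

The text downloaded from the printed URL can be passed to `summarize_genbank` to cross-check the source and product reported in the answer.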

6. Obtaining the most recent submissions. We will search for recent submissions
on groundnut (Arachis hypogaea) in the Nucleotide database. How many
groundnut sequences were submitted between January 1st 2015 and
December 31st 2015?
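
A date-limited search like the one in question 6 can be sketched as an Entrez term. The example assumes that groundnut refers to Arachis hypogaea and that the Nucleotide database's publication date field `[PDAT]` is the intended date limit. The function name is hypothetical, and no result count is implied.

```python
from datetime import date


def date_range_term(organism, start, end):
    """Entrez term limiting an organism search to a publication-date window."""
    fmt = "%Y/%m/%d"
    # Entrez date ranges are written as START:END followed by the field tag.
    return (
        f'"{organism}"[Organism] AND '
        f"{start.strftime(fmt)}:{end.strftime(fmt)}[PDAT]"
    )


if __name__ == "__main__":
    print(date_range_term("Arachis hypogaea", date(2015, 1, 1), date(2015, 12, 31)))
```

The printed term can be pasted into the Nucleotide search box; the hit count shown on the results page is the answer to the question.
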
7. What are the accession numbers of the 10 largest mouse mRNA sequences in
GenBank? You need an advanced search for this:
▪ Set Organism to Mus musculus and click Search
▪ Activate the mRNA filter in the Filter menu at the left of the search results
page
▪ Activate the Sequence Length filter and set the range to 70000 to 999999. To
find a suitable lower limit, use trial and error until the length
threshold returns between 10 and 20 mRNA sequences

Ans.
Accession: NM_001385708.1
Accession: XM_036165622.1
Accession: XM_036160535.1
Accession: XM_036160534.1
Accession: XM_036160529.1
Accession: XM_036160528.1
Accession: XM_036160527.1
Accession: XM_036160524.1
Accession: XM_036160520.1
Accession: XM_006519771.5
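
The filter steps and the trial-and-error threshold search in question 7 can be sketched as follows. The sketch assumes the Entrez `biomol_mrna[PROP]` filter and the `[SLEN]` sequence-length field. `count_for` is a hypothetical callable that returns the hit count for a given lower limit (for example by querying esearch), so nothing here contacts NCBI on its own.

```python
def long_mrna_term(organism, min_len, max_len):
    """Entrez term for mRNA records of one organism within a length range."""
    return (
        f'"{organism}"[Organism] AND biomol_mrna[PROP] '
        f"AND {min_len}:{max_len}[SLEN]"
    )


def tune_lower_limit(count_for, low, high, want=(10, 20)):
    """Binary-search a lower length limit whose hit count falls inside `want`.

    count_for(limit) must return the number of hits with length >= limit,
    which can only stay equal or shrink as the limit grows.
    """
    while low <= high:
        mid = (low + high) // 2
        hits = count_for(mid)
        if hits > want[1]:
            low = mid + 1   # too many hits: raise the threshold
        elif hits < want[0]:
            high = mid - 1  # too few hits: lower the threshold
        else:
            return mid
    return None  # no threshold in the range gives a count inside the window


def largest(records, n=10):
    """Return the n longest (accession, length) pairs, longest first."""
    return sorted(records, key=lambda rec: rec[1], reverse=True)[:n]


if __name__ == "__main__":
    print(long_mrna_term("Mus musculus", 70000, 999999))
```

Binary search replaces manual guessing here only because the hit count cannot increase as the lower limit rises; the final top-10 list still has to be sorted by length.
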
Exercise 4
Extraction of secondary data from TRANSFAC and InterPro Databases

AIM:

To extract real matrices and family data from TRANSFAC and InterPro.

Accessing data from TRANSFAC

PROTOCOL

1. Open Browser (Chrome or Firefox) from your system

2. Visit the TRANSFAC database (https://genexplain.com/transfac/#section1)

3. Go to the structure section and click on 'here' which will direct you to a new webpage

4. Go to the 'Superclass: Basic Domains' and click 'Jun-B' under Jun Factors

5. Click on ( ) which directs to 'The Human Protein Atlas'

6. Visit 'Tissue, Cell type, Pathology, Brain, Blood and Cell' to answer the following queries

QUERIES AND RESULT


Ans.
General Information – Tissue
Gene Name: JUNB
Gene Description: JunB proto-oncogene, AP-1 transcription factor subunit
Protein Class: Human disease related genes, Plasma proteins, Transcription factors
Location: Intracellular

General Information – Cell Type
Gene Name: JUNB
Gene Description: JunB proto-oncogene, AP-1 transcription factor subunit
Protein Class: Human disease related genes, Plasma proteins, Transcription factors
Location: Intracellular

General Information – Pathology
Gene Name: JUNB
Gene Description: JunB proto-oncogene, AP-1 transcription factor subunit
Protein Class: Human disease related genes, Plasma proteins, Transcription factors
Location: Intracellular

General Information – Brain (Human gene)
Gene Name: JUNB
Gene Description: JunB proto-oncogene, AP-1 transcription factor subunit
Protein Class: N/A
Location: Intracellular

General Information – Blood

Specific Information
1. Tissue: Provide screenshot for Protein Expression of JUNB in different organs
and RNA expression of JUNB in different tissues
2. Cell Type: Provide screenshot for the expression pattern of JUNB across
several cells in the body
3. Pathology: Provide screenshot for the Interactive survival scatter plot of JUNB
in Cervical Cancer and Lung Cancer
4. Brain: Provide the screenshot for RNA expression of JUNB in Brain
5. Blood: Provide the screenshot for RNA expression of JUNB in different blood
cells
II. Identification of protein domains using InterPro
PROTOCOL
Input query accession number: UniProt ID P48742
Open Browser (Chrome or Firefox) from your system
1. Go to the UniProt database https://www.uniprot.org/
2. Paste/type the query accession number P48742
3. Retrieve the protein sequence from the UniProt database
4. Provide the sequence details such as protein name, gene name, review status
and organism

Ans.
Protein: LIM/homeobox protein Lhx1
Gene: LHX1
Status: UniProtKB reviewed (Swiss-Prot)
Organism: Homo sapiens (Human)

5. Visit InterPro https://www.ebi.ac.uk/interpro/

InterPro provides functional analysis of proteins by classifying them into
families and predicting domains and important sites.

6. Paste the protein sequence copied from the UniProt database


Ans.
>sp|P48742|LHX1_HUMAN LIM/homeobox protein Lhx1 OS=Homo sapiens OX=9606 GN=LHX1 PE=1 SV=2
MVHCAGCKRPILDRFLLNVLDRAWHVKCVQCCECKCNLTEKCFSREGKLYCKNDFF
RCFGTKCAGCAQGISPSDLVRRARSKVFHLNCFTCMMCNKQLSTGEELYIIDENKFV
CKEDYLSNSSVAKENSLHSATTGSDPSLSPDSQDPSQDDAKDSESANVSDKEAGSNE
NDDQNLGAKRRGPRTTIKAKQLETLKAAFAATPKPTRHIREQLAQETGLNMRVIQVW
FQNRRSKERRMKQ
LSALGARRHAFFRSPRRMRPLVDRLEPGELIPNGPFSFYGDYQSEYYGPGGNYDFFPQ
GPPSSQAQTPVDLPFVPSSGPSGTPLGGLEHPLPGHHPSSEAQRFTDILAHPPGDSPSPE
PSLPGPLHSMSAEVFGPSPPFSSLSVNGGASYGNHLSHPPEMNEAAVW
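
The details asked for in step 4 (protein name, gene name, organism) are also encoded in the FASTA header above. A minimal sketch of reading them back out, assuming the UniProtKB header layout `>db|accession|entry name protein OS=... OX=... GN=... PE=... SV=...`; the function name is hypothetical, and a header that has been wrapped across lines is rejoined before matching.

```python
import re

HEADER = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry>\S+)\s+"
    r"(?P<protein>.+?)\s+OS=(?P<organism>.+?)\s+OX=(?P<taxid>\d+)"
    r"(?:\s+GN=(?P<gene>\S+))?\s+PE=(?P<pe>\d)\s+SV=(?P<sv>\d+)$"
)


def parse_uniprot_fasta(text):
    """Split a single-record UniProtKB FASTA text into header fields and sequence."""
    header_parts, seq_parts = [], []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Sequence lines hold only uppercase letters; anything else is header text.
        if header_parts and line.isalpha() and line.isupper():
            seq_parts.append(line)
        else:
            header_parts.append(line)
    match = HEADER.match(" ".join(header_parts))
    if match is None:
        raise ValueError("not a UniProtKB FASTA header")
    record = match.groupdict()
    record["sequence"] = "".join(seq_parts)
    return record
```

In the returned dictionary, `db` is `sp` for reviewed (Swiss-Prot) entries and `tr` for unreviewed (TrEMBL) entries, which corresponds to the review status reported in the answer to step 4.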

7. Report the domain information associated with the query sequence and
provide the screenshot

8. Report the number of architectures, aligned sequences, reported interactions
and the number of protein structures of each domain by clicking on each
domain given at the right side.

Ans.
1. Znf_LIM - IPR001781
Domain Architectures: 1k
Aligned Sequences: 0
Interactions: 45
Proteins: 126k
Structures: 86
Taxonomy: 10k
Proteomes: 3k
AlphaFold: 78k
Pathways: 101

2. Lhx1/5_LIM1 - IPR049618
Domain Architectures: 0
Aligned Sequences: 0
Interactions: 0
Proteins: 2k
Structures: 0
Taxonomy: 3k
Proteomes: 694
AlphaFold: 1k
Pathways: 3

3. HD - IPR001356
Domain Architectures: 2k
Aligned Sequences: 0
Interactions: 29
Proteins: 311k
Structures: 233
Taxonomy: 15k
Proteomes: 3k
AlphaFold: 213k
Pathways: 311

4. Lhx1/5_LIM2 - IPR049619
Domain Architectures: 0
Aligned Sequences: 0
Interactions: 0
Proteins: 2k
Structures: 0
Taxonomy: 3k
Proteomes: 779
AlphaFold: 1k
Pathways: 3
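
The four entries reported above can be compared programmatically once the abbreviated counts are turned into numbers. This is a small sketch, assuming the InterPro REST API path `/api/entry/interpro/<accession>` for looking an entry up; the helper names are hypothetical, and the "k" values are rounded figures copied from the answer, so the converted numbers are approximate.

```python
INTERPRO_API = "https://www.ebi.ac.uk/interpro/api/entry/interpro/"


def entry_url(accession):
    """Return the InterPro API URL for one entry accession."""
    return INTERPRO_API + accession


def approx_count(text):
    """Convert an abbreviated count such as '126k' into an integer."""
    text = text.strip().lower()
    if text.endswith("k"):
        return int(float(text[:-1]) * 1000)
    return int(text)


# Protein counts as reported in the answer above.
PROTEINS = {
    "IPR001781": "126k",
    "IPR049618": "2k",
    "IPR001356": "311k",
    "IPR049619": "2k",
}

if __name__ == "__main__":
    ranked = sorted(PROTEINS, key=lambda acc: approx_count(PROTEINS[acc]), reverse=True)
    for acc in ranked:
        print(acc, approx_count(PROTEINS[acc]), entry_url(acc))
```

Ranked this way, the homeobox domain entry (IPR001356) covers the most proteins, followed by the general LIM zinc finger entry (IPR001781).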
