Biological Databases Lab 2

The document contains exercises related to genome mining using NCBI resources, detailing nucleotide sequences for Bacillus subtilis and submissions by Matthew Berriman. It also includes extraction of secondary data from TRANSFAC and InterPro databases, focusing on the JUNB gene and LIM/homeobox protein Lhx1. Additionally, it reports on protein domains, architectures, and interactions associated with the query sequences.

Uploaded by

Suryaa


Biological Databases Lab

BBIT418P

Assessment – 2

By:
Suryaa .A
21BCB0032
Exercise 3

Genome mining- NCBI Resources

1. How many nucleotide sequences are there for the bacterium Bacillus subtilis in the NCBI Sequence Database?

Ans. There is a total of 89,571 nucleotide sequences.

2. How many nucleotide sequences are there from the bacterium Bacillus subtilis in the RefSeq part of the NCBI Sequence Database?

Ans. There are 35,191 nucleotide sequences from Bacillus subtilis in the RefSeq part of the NCBI Sequence Database.

3. How many nucleotide sequences were submitted to NCBI by Matthew Berriman?

Ans. Matthew Berriman submitted 648,243 nucleotide sequences to NCBI.
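
The counts in answers 1 to 3 were read from the NCBI web pages. As a rough sketch, the same counts could also be requested through the NCBI E-utilities ESearch endpoint. The search terms below (the organism tag, the RefSeq property filter and the author spelling "Berriman M") are assumptions about how these questions map onto Entrez queries, and the counts returned today will differ from those recorded above because the database keeps growing.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Public NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(term, db="nuccore"):
    """Return an ESearch URL that asks only for the number of matching records."""
    params = {"db": db, "term": term, "rettype": "count"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def parse_count(xml_text):
    """Extract the integer held in the <Count> element of an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


# Assumed Entrez terms for questions 1 to 3 (not taken from the lab sheet).
QUERIES = {
    "Q1 all B. subtilis": '"Bacillus subtilis"[Organism]',
    "Q2 RefSeq B. subtilis": '"Bacillus subtilis"[Organism] AND srcdb_refseq[PROP]',
    "Q3 Berriman submissions": "Berriman M[Author]",
}

if __name__ == "__main__":
    for label, term in QUERIES.items():
        # Open each URL in a browser, or fetch it and pass the body to parse_count().
        print(label, "->", build_count_url(term))
```

The script only prints the URLs, so it runs without network access. Fetching a URL and passing the response text to parse_count() gives the current count.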

4. What is the accession number for the Bacillus subtilis genome in NCBI?

Ans. The accession number is GCA_000009045.1.

5. Retrieve the sequence of Accession: NM_002289.3. Give its name, phylum, source, product and function.

Ans. Name: Homo sapiens lactalbumin alpha (LALBA), mRNA
Phylum: Chordata
Source: Homo sapiens (human)
Product: "alpha-lactalbumin precursor"
Function: This gene encodes alpha-lactalbumin, a principal protein of milk.
Alpha-lactalbumin forms the regulatory subunit of the lactose synthase (LS)
heterodimer and beta-1,4-galactosyltransferase (beta4Gal-T1) forms the
catalytic component. Together, these proteins enable LS to produce lactose by
transferring galactose moieties to glucose. As a monomer, alpha-lactalbumin
strongly binds calcium and zinc ions and may possess bactericidal or antitumor
activity. A folding variant of alpha-lactalbumin, called HAMLET, likely induces
apoptosis in tumor and immature cells.

6. Obtaining the most recent submissions. We will search for recent submissions
on groundnut (Arachis hypogaea) in the Nucleotide database. How many
groundnut sequences were submitted between January 1st 2015 and
December 31st 2015?
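
A date-restricted search like the one in question 6 can be written as a single Entrez term. The sketch below assumes that groundnut refers to Arachis hypogaea and that the publication date field ([PDAT]) is the appropriate date field for "submissions". Both are assumptions, and the helper name is hypothetical.

```python
from datetime import date


def date_range_term(organism, start, end):
    """Build an Entrez term limited to an organism and a publication date range."""
    fmt = "%Y/%m/%d"  # Entrez expects dates as YYYY/MM/DD
    return '"{}"[Organism] AND {}:{}[PDAT]'.format(
        organism, start.strftime(fmt), end.strftime(fmt)
    )


# Assumed species name for groundnut; adjust if the lab sheet intends another organism.
term = date_range_term("Arachis hypogaea", date(2015, 1, 1), date(2015, 12, 31))
print(term)
```

Pasting the printed term into the Nucleotide search box should give the same result as setting the organism and a custom date range in the web interface.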

7. What are the accession numbers of the 10 largest mouse (Mus musculus) mRNA sequences in
GenBank? This requires an advanced search:
▪ Set Organism to Mus musculus and click Search
▪ Activate the mRNA filter in the Filter menu at the left of the search results
page
▪ Activate the Sequence Length filter and set the range to 70000 to 999999. Finding
a suitable lower limit takes some trial and error: adjust the length threshold
until the search returns between 10 and 20 mRNA sequences

Ans.
Accession: NM_001385708.1
Accession: XM_036165622.1
Accession: XM_036160535.1
Accession: XM_036160534.1
Accession: XM_036160529.1
Accession: XM_036160528.1
Accession: XM_036160527.1
Accession: XM_036160524.1
Accession: XM_036160520.1
Accession: XM_006519771.5
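
The filter steps used for question 7 can be sketched as one Entrez term plus a small helper that keeps the longest records. The property filter biomol_mrna[PROP] and the length field [SLEN] are standard Entrez tags; the helper names and the (accession, length) record layout are hypothetical, and no sequence lengths are taken from the results above.

```python
def length_filtered_term(organism, min_len, max_len):
    """Build an Entrez term for mRNA records of one organism within a length range."""
    return '"{}"[Organism] AND biomol_mrna[PROP] AND {}:{}[SLEN]'.format(
        organism, min_len, max_len
    )


def largest(records, n=10):
    """Return the n longest records from an iterable of (accession, length) pairs."""
    return sorted(records, key=lambda rec: rec[1], reverse=True)[:n]


# Same organism and length range as in the protocol steps.
print(length_filtered_term("Mus musculus", 70000, 999999))
```

Raising or lowering min_len plays the same role as the trial-and-error step in the protocol: the threshold is tuned until only a handful of records remain, and largest() then picks the top ten.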
Exercise 4
Extraction of secondary data from TRANSFAC and InterPro Databases

AIM:

To extract real matrices and family data from TRANSFAC and InterPro.

Accessing data from TRANSFAC

PROTOCOL

1. Open Browser (Chrome or Firefox) from your system

2. Visit the TRANSFAC database (https://genexplain.com/transfac/#section1)

3. Go to the structure section and click on 'here' which will direct you to a new webpage

4. Go to the 'Superclass: Basic Domains' and click 'Jun-B' under Jun Factors

5. Click on ( ) which directs to 'The Human Protein Atlas'

6. Visit 'Tissue, Cell type, Pathology, Brain, Blood and Cell' to answer the following

Queries

QUERIES AND RESULT

Ans.
General Information – Tissue
Gene Name: JUNB
Gene Description: JunB proto-oncogene, AP-1 transcription factor subunit
Protein Class: Human disease related genes, Plasma proteins, Transcription factors
Location: Intracellular

General Information – Cell Type
Gene Name: JUNB
Gene Description: JunB proto-oncogene, AP-1 transcription factor subunit
Protein Class: Human disease related genes, Plasma proteins, Transcription factors
Location: Intracellular

General Information – Pathology
Gene Name: JUNB
Gene Description: JunB proto-oncogene, AP-1 transcription factor subunit
Protein Class: Human disease related genes, Plasma proteins, Transcription factors
Location: Intracellular

General Information – Brain (Human gene)
Gene Name: JUNB
Gene Description: JunB proto-oncogene, AP-1 transcription factor subunit
Protein Class: N/A
Location: Intracellular

General Information – Blood

Specific Information
1. Tissue: Provide a screenshot of the protein expression of JUN in different organs and the RNA expression of JUN in different tissues
2. Cell Type: Provide a screenshot of the expression pattern of JUN across several cells in the body
3. Pathology: Provide a screenshot of the interactive survival scatter plot of JUN in cervical cancer and lung cancer
4. Brain: Provide a screenshot of the RNA expression of JUN in the brain
5. Blood: Provide a screenshot of the RNA expression of JUN in different blood cells

II. Identification of protein domains using InterPro
PROTOCOL
Input Query accession numbers: Uniprot ID: P48742
Open Browser (Chrome or Firefox) from your system
1. Go to the UniProt database https://www.uniprot.org/
2. Paste or type the query accession number P48742
3. Retrieve the protein sequence from the UniProt database
4. Provide the sequence details such as protein name, gene name, review status and organism

Ans.
Protein: LIM/homeobox protein Lhx1
Gene: LHX1
Status: UniProtKB reviewed (Swiss-Prot)
Organism: Homo sapiens (Human)

5. Visit InterPro https://www.ebi.ac.uk/interpro/

InterPro provides functional analysis of proteins by classifying them into
families and predicting domains and important sites.

6. Paste the protein sequence copied from the UniProt database

Ans.
>sp|P48742|LHX1_HUMAN LIM/homeobox protein Lhx1 OS=Homo sapiens
OX=9606 GN=LHX1 PE=1 SV=2
MVHCAGCKRPILDRFLLNVLDRAWHVKCVQCCECKCNLTEKCFSREGKLYCKNDFF
RCFGTKCAGCAQGISPSDLVRRARSKVFHLNCFTCMMCNKQLSTGEELYIIDENKFV
CKEDYLSNSSVAKENSLHSATTGSDPSLSPDSQDPSQDDAKDSESANVSDKEAGSNE
NDDQNLGAKRRGPRTTIKAKQLETLKAAFAATPKPTRHIREQLAQETGLNMRVIQVW
FQNRRSKERRMKQ
LSALGARRHAFFRSPRRMRPLVDRLEPGELIPNGPFSFYGDYQSEYYGPGGNYDFFPQ
GPPSSQAQTPVDLPFVPSSGPSGTPLGGLEHPLPGHHPSSEAQRFTDILAHPPGDSPSPE
PSLPGPLHSMSAEVFGPSPPFSSLSVNGGASYGNHLSHPPEMNEAAVW
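
The sequence details asked for in step 4 (protein name, gene name, review status, organism) are all encoded in the FASTA header shown above. The following is a minimal sketch of how that header could be parsed, assuming the standard UniProtKB header layout. The function names are hypothetical, and the REST URL pattern is the public UniProt one for downloading an entry as FASTA.

```python
import re


def uniprot_fasta_url(accession):
    """Return the UniProt REST URL that serves one entry in FASTA format."""
    return "https://rest.uniprot.org/uniprotkb/{}.fasta".format(accession)


# Pattern for a single-line UniProtKB FASTA header; GN= is optional.
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry>\S+)\s+"
    r"(?P<protein>.+?)\s+OS=(?P<organism>.+?)\s+OX=(?P<taxid>\d+)"
    r"(?:\s+GN=(?P<gene>\S+))?\s+PE=(?P<pe>\d)\s+SV=(?P<sv>\d+)$"
)


def parse_header(header):
    """Split a UniProtKB FASTA header into named fields."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError("not a recognised UniProtKB FASTA header")
    fields = match.groupdict()
    # 'sp' marks reviewed Swiss-Prot entries, 'tr' marks unreviewed TrEMBL entries.
    fields["reviewed"] = fields["db"] == "sp"
    return fields


def clean_sequence(text):
    """Join wrapped sequence lines into one string without whitespace."""
    return "".join(line.strip() for line in text.splitlines())


header = (
    ">sp|P48742|LHX1_HUMAN LIM/homeobox protein Lhx1 "
    "OS=Homo sapiens OX=9606 GN=LHX1 PE=1 SV=2"
)
info = parse_header(header)
print(info["protein"], info["gene"], info["organism"], info["reviewed"])
```

clean_sequence() is useful before pasting into InterPro when the copied sequence contains line breaks in odd places, as in the block above.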

7. Report the domain information associated with the query sequence and provide the screenshot

8. Report the number of architectures, aligned sequences, reported interactions and the number of protein structures of each domain by clicking on each domain given at the right side.

Ans.
1. Znf_LIM - IPR001781
Domain Architectures: 1k
Aligned Sequences: 0
Interactions: 45
Proteins: 126k
Structures: 86
Taxonomy: 10k
Proteomes: 3k
AlphaFold: 78k
Pathways: 101

2. Lhx1/5_LIM1 - IPR049618
Domain Architectures: 0
Aligned Sequences: 0
Interactions: 0
Proteins: 2k
Structures: 0
Taxonomy: 3k
Proteomes: 694
AlphaFold: 1k
Pathways: 3

3. HD - IPR001356
Domain Architectures: 2k
Aligned Sequences: 0
Interactions: 29
Proteins: 311k
Structures: 233
Taxonomy: 15k
Proteomes: 3k
AlphaFold: 213k
Pathways: 311

4. Lhx1/5_LIM2 - IPR049619
Domain Architectures: 0
Aligned Sequences: 0
Interactions: 0
Proteins: 2k
Structures: 0
Taxonomy: 3k
Proteomes: 779
AlphaFold: 1k
Pathways: 3
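
The counts above are reported in the abbreviated form shown on the InterPro entry pages (for example 126k). To compare or sum them, they can be converted to approximate integers. This is a small sketch with a hypothetical helper name. Because the site rounds the values, the results are only approximate, and support for an "m" suffix is an assumption.

```python
def parse_abbreviated_count(text):
    """Convert strings such as '126k' or '86' into an approximate integer."""
    text = text.strip().lower()
    multipliers = {"k": 1_000, "m": 1_000_000}  # 'm' is assumed, not seen in the report
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


# Protein counts reported above for the four InterPro entries.
protein_counts = {
    "IPR001781": "126k",
    "IPR049618": "2k",
    "IPR001356": "311k",
    "IPR049619": "2k",
}
for entry, raw in protein_counts.items():
    print(entry, parse_abbreviated_count(raw))
```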
