Bioinformatics Lab Assignment 2

The document outlines an assignment for a Bioinformatics Lab course focused on Retinol Binding Protein 4 and HIV pol, guiding students through the use of NCBI's website for gene research. It includes specific tasks such as searching for proteins, understanding gene functions, and exploring protein domains and mutations. The assignment emphasizes the importance of precise search terms and provides insights into the biological significance of RBP4 in vitamin A transport and its implications for human health.


NAME: Alishba Salman
CLASS: 3rd year (1st semester)
DEPARTMENT: Biotechnology
COURSE TITLE: BIOINFORMATICS LAB
COURSE CODE: BTH-3051
COURSE INCHARGE: Miss Ayesha Aman

ASSIGNMENT OF BIOINFORMATICS LAB
 You will be looking at the Retinol Binding Protein 4 and HIV pol to learn to navigate
through NCBI’s website and the different linked databases. After performing this assignment
with one gene, you will perform a similar assignment with your gene of interest.

Go to NCBI’s website (http://www.ncbi.nlm.nih.gov/)

When doing database searches, your search terms need to be as specific as possible in order to
eliminate large returns of data, some of it useless. A record is an individual file or NCBI “hit”
obtained from a search.
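
The effect of search-term specificity can also be explored programmatically. The sketch below only
builds query URLs for the `esearch` endpoint of NCBI's E-utilities; it does not send any request.
The helper name `build_esearch_url` and the choice of the `protein` database are assumptions made
for illustration, and the hit counts NCBI returns change over time.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="protein"):
    """Return an esearch URL that asks only for the number of hits (rettype=count)."""
    params = {"db": db, "term": term, "rettype": "count"}
    return ESEARCH + "?" + urlencode(params)


# Progressively more specific search terms, mirroring the steps of this lab.
terms = [
    'retinol binding protein',            # unquoted: words may match separately
    '"retinol binding protein"',          # quoted: exact phrase
    '"retinol binding protein 4"',
    'rbp4 AND "Homo sapiens"[Organism]',  # gene symbol restricted by organism
]

urls = [build_esearch_url(t) for t in terms]
for u in urls:
    print(u)
```

Opening each URL in a browser should return a small XML document with a hit count, which makes it
easy to compare how quoting and organism filters narrow the results.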

1. Start at the main NCBI page. Use All Databases on the NCBI home page. To retrieve a large
number of results, use “retinol binding protein” as your search term. How many hits did you
find in the Entrez Page?

a. How many of these hits fell into the protein category?

5966 hits fell into the protein category.

b. Would you get a different answer without using the quotation marks around your search
term? Why?
2. Now try “retinol binding protein 4” on the All Databases search. How many proteins do you
find in the Entrez Page?

3. What about “rpb4” on the All Databases search?

The term “rpb4” in the All Databases search likely refers to the gene encoding the fourth-largest
subunit of RNA polymerase II (Pol II), a crucial enzyme responsible for synthesizing messenger
RNA in eukaryotic cells. In Saccharomyces cerevisiae (baker’s yeast), this gene is known as
RPB4, and in humans, it is encoded by the POLR2D gene.

a. How many proteins do you find in the Entrez Page?

There are 1435 proteins in the Entrez Page.

b. To make it even more specific let’s add “rbp4 homo sapiens.”

c. How many proteins do you find using the All Databases Search?
42 proteins are found in the All Databases Search.
4. In question 3 you are actually looking for the full-length rbp4 for Homo sapiens with
accession number NP_006735.

5. When you type in “rbp4 homo sapiens,” what about searching with this tool do you think still
gives you other hits?
6. What is the full name of this gene’s protein product?
7. Give a brief description of what the protein does. If you quote a record, give me the link you
used?

Retinol-binding protein (RBP) is a specialized transport protein. Retinol binding protein 4 (RBP4)
belongs to the lipocalin family and is the specific carrier for retinol (vitamin A alcohol) in the
bloodstream. Synthesized mainly in the liver and adipose tissue, RBP4 binds retinol in a 1:1 ratio,
forming a complex that solubilizes the hydrophobic vitamin, protects it from oxidative damage, and
facilitates its transport to various target tissues, including the retina, skin, lungs, and gonads.
It delivers retinol from the liver stores to the peripheral tissues. In plasma, the RBP-retinol
complex interacts with transthyretin, which prevents its loss by filtration through the kidney
glomeruli. A deficiency of vitamin A blocks secretion of the binding protein post-translationally
and results in defective delivery and supply to the epidermal cells.

https://en.wikipedia.org/wiki/Retinol_binding_protein_4

8. How many amino acids are in this protein?

The number of amino acids depends on which retinol-binding protein (RBP) is meant:

1. Human Serum Retinol-Binding Protein (RBP4): The precursor described in record NP_006735 is
201 amino acids long. After the signal peptide is removed, the mature circulating protein has
about 182-183 amino acid residues.
2. Human Cellular Retinol-Binding Protein 1 (CRBP1): This protein comprises 135
amino acid residues.

These differences reflect the distinct functions and structures of each RBP type.

9. Are there functional protein domains described for this protein? You will find this in the
conserved domain database. This is either in RefSeq or can be linked from Domains through
the record. List them.

Yes, the RBP4 (Retinol Binding Protein 4) protein contains functional domains that are well-
characterized in the Conserved Domain Database (CDD) and other resources like UniProt and
InterPro.

Functional Domains in RBP4

1. Lipocalin_RBP_like (CDD: cd00743)
o Location: Amino acids 22–192
o Description: This domain is characteristic of the lipocalin family, which includes
proteins involved in the transport of small hydrophobic molecules. In RBP4, this
domain is crucial for binding retinol (vitamin A) and its transport in the
bloodstream.
2. Lipocalin (Pfam: PF00061)
o Location: Amino acids 37–175
o Description: This domain is also associated with the lipocalin family and is
involved in binding and transporting small hydrophobic molecules, such as
retinol.
3. Retinol Binding Protein/Purpurin (InterPro: IPR002449)
o Description: This domain is specific to retinol-binding proteins and purpurin,
highlighting the protein's role in binding and transporting retinol.

Accessing Domain Information

• NCBI Conserved Domain Database (CDD): Provides detailed information on
conserved domains within protein sequences.
• UniProt: Offers comprehensive protein sequence and functional information, including
domain annotations.
• InterPro: Integrates multiple protein signature databases to provide functional
annotations.
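
As a small illustration of how the domain coordinates listed above relate to each other, the
following sketch computes each domain's length and checks whether the Pfam hit lies inside the CDD
hit. The coordinates are the ones quoted in this answer; the `Domain` class is a hypothetical
helper, not part of any NCBI tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    start: int  # first residue, 1-based inclusive
    end: int    # last residue, 1-based inclusive

    def length(self):
        # Inclusive coordinates: both end points count.
        return self.end - self.start + 1

    def contains(self, other):
        # True when `other` lies entirely within this domain.
        return self.start <= other.start and other.end <= self.end


cdd = Domain("Lipocalin_RBP_like (cd00743)", 22, 192)
pfam = Domain("Lipocalin (PF00061)", 37, 175)

print(cdd.length(), pfam.length(), cdd.contains(pfam))
```

Domain hits from different databases often overlap like this because each database draws the
boundaries of the same fold slightly differently.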

10. How many amino acids are in the sig_peptide____________? What is the sig peptide? How
many are in the mat_peptide__________? What is the mat peptide?

Signal peptide (sig_peptide)

The signal peptide is a short amino acid sequence at the beginning (N-terminus) of a newly
synthesized protein. It typically consists of 15–30 amino acids.

• Its function is to direct the protein to the endoplasmic reticulum (ER) for secretion or
membrane insertion in eukaryotic cells.
• After reaching the ER, the signal peptide is usually cleaved off by a signal peptidase.
• It is not part of the mature protein.

Mature peptide (mat_peptide)

The mature peptide is the final, functional form of the protein after all processing steps (like
signal peptide removal, cleavage, and folding).

• It is the protein that performs the biological function.
• It typically lacks the signal peptide.

Amino acids in the sig_peptide

• Signal peptides are typically 15–30 amino acids long. The exact number is given by the
sig_peptide feature of the protein record (e.g., in UniProt or NCBI). For the RBP4 precursor
NP_006735 the signal peptide covers residues 1–18, i.e. 18 amino acids.

Amino acids in the mat_peptide

• This is the full precursor length minus the number of amino acids in the signal peptide
(if there are no other propeptides or cleavages). For RBP4: 201 − 18 = 183 amino acids.

Example for comparison, preproinsulin:

• Signal peptide = 24 amino acids
• Mature insulin = ~51 amino acids (after proinsulin processing)
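
The relationship between precursor, signal peptide, and mature peptide lengths described above can
be sketched in a few lines. GenBank-style feature tables write simple locations as `start..end`
with 1-based inclusive coordinates. The locations used here (`1..18` and `19..201`) are assumed
values for the RBP4 precursor and should be checked against the sig_peptide and mat_peptide
features of the actual record; `feature_length` is a hypothetical helper.

```python
def feature_length(location):
    """Length of a simple GenBank-style location such as '19..201'."""
    start, end = (int(x) for x in location.split(".."))
    return end - start + 1


# Assumed feature locations for the RBP4 precursor; verify against the record.
features = {"sig_peptide": "1..18", "mat_peptide": "19..201"}

lengths = {name: feature_length(loc) for name, loc in features.items()}
precursor_length = sum(lengths.values())

print(lengths, precursor_length)
```

This only handles simple `start..end` locations; joined or partial locations would need a real
parser.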

11. What does CDS stand for and how many nucleotides are in the CDS for this gene?

CDS stands for coding sequence (also written Coding DNA Sequence). It refers to the portion of a
gene's DNA or mRNA that codes for protein: the region, from start codon to stop codon, that is
translated into amino acids. The number of nucleotides in the CDS for a particular gene can be
found in a database such as NCBI, Ensembl, or a genome browser:

• From the gene sequence: count the nucleotides between the start and end of the CDS.
• From the gene name: look up the mRNA record and read the coordinates of its CDS feature.
• From a FASTA or GenBank file: extract the CDS feature and take its length.
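
The link between protein length and CDS length is simple arithmetic: each amino acid is encoded by
one codon of three nucleotides, and a complete CDS also includes the stop codon. A minimal sketch,
using a 201-residue precursor purely as an example value; both function names are hypothetical.

```python
def cds_length_nt(protein_length, include_stop=True):
    """Number of nucleotides in a CDS that encodes `protein_length` residues."""
    codons = protein_length + (1 if include_stop else 0)
    return 3 * codons


def protein_length_from_cds(cds_nt):
    """Inverse: residues encoded by a complete CDS (stop codon included)."""
    if cds_nt % 3 != 0:
        raise ValueError("a complete CDS has a length divisible by 3")
    return cds_nt // 3 - 1


print(cds_length_nt(201))            # 606
print(protein_length_from_cds(606))  # 201
```

The value computed this way should match the difference between the end and start coordinates of
the CDS feature in the mRNA record, plus one.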

12. Can you find any PubMed references for this gene? Give me the link(s) of 3 of these.
13. What does it mean when the record states that it has been “curated by NCBI staff?”

When a record states that it has been “curated by NCBI staff,” it means that the information in
that record has been reviewed, verified, and possibly edited by experts at the National Center
for Biotechnology Information (NCBI). This manual curation ensures greater accuracy,
consistency, and reliability of the data, compared to automatically generated or unreviewed
records.

Specifically, curation by NCBI staff may involve:

• Checking for correct gene/protein annotations
• Resolving discrepancies in the data
• Adding cross-references to related databases
• Improving functional descriptions or metadata
• Making sure the data follows NCBI’s quality standards

14. Read the section on RefSeq. http://www.ncbi.nlm.nih.gov/RefSeq/RSfaq.html Based on
your earlier searching, explain in your own words why RefSeq is useful in bioinformatics.

The NCBI Reference Sequence (RefSeq) database is a curated, non-redundant collection of
genomic, transcript, and protein sequences. It serves as a foundational resource in
bioinformatics by providing standardized and well-annotated sequences that are essential for
various analyses, including gene identification, mutation detection, and functional
annotation.

For instance, in the study of retinol-binding proteins, which are crucial for vitamin A
transport and metabolism, RefSeq offers a single well-annotated reference for the
RBP4 gene and its associated protein (NP_006735), instead of the many overlapping hits returned
by the earlier searches. This enables researchers to confidently interpret
sequence variations, understand gene function, and explore potential implications in health
and disease.

RefSeq's utility extends beyond individual gene studies. Its integration with other NCBI
resources facilitates comparative genomics, evolutionary studies, and the development of
diagnostic tools. By providing a stable and consistent coordinate system, RefSeq supports the
accurate reporting of clinical variations and enhances the reproducibility of bioinformatics
analysis.

In summary, RefSeq is indispensable in bioinformatics for its role in providing high-quality,
curated sequence data that underpin a wide range of genomic analyses and applications.
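
RefSeq accessions carry a short prefix that indicates what kind of molecule a record describes and
whether it is a known sequence or a predicted model. The sketch below classifies an accession by
its prefix; the table covers only a few common prefixes, and the function name is an illustrative
assumption.

```python
# A few common RefSeq accession prefixes (not an exhaustive list).
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule",
    "NG_": "genomic region",
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "mRNA (predicted model)",
    "XR_": "non-coding RNA (predicted model)",
    "XP_": "protein (predicted model)",
}


def classify_accession(accession):
    """Return the record type for a RefSeq accession, or None if unrecognized."""
    return REFSEQ_PREFIXES.get(accession[:3].upper())


print(classify_accession("NP_006735"))
```

An accession without one of these prefixes, such as a UniProt identifier, returns `None`, which is
a quick way to tell RefSeq records apart from other hits in a result list.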
15. Click on the link associated with Conserved Domains under Entrez Gene (it is in the list to
the right). What is a conserved domain of this protein called? What is its function?
16. What chromosome is this gene on? Which chromosome arm is it on? How many nucleotides
are listed in this entire chromosome? You will find this information in Entrez GENE
database or in the Map viewer links to the right on the page.

The RBP4 gene, which encodes retinol binding protein 4, is located on chromosome 10 at
cytoband 10q23.33. Its precise genomic coordinates are 93591694 to 93601744 on the reverse
strand of chromosome 10, according to the GRCh38.p14 human genome assembly.

This gene is situated on the long arm (q arm) of chromosome 10, specifically in the 10q23.33
region. The "q" designation indicates the long arm of the chromosome, distinguishing it from the
short arm, labeled "p".

1. Entire Chromosome 10

• In the human genome (GRCh38), chromosome 10 contains:
o 133,797,422 base pairs (nucleotides)

2. RBP4 Gene Length

• The RBP4 gene spans:
o From position 93,591,694 to 93,601,744 on chromosome 10 (GRCh38)
o That's a total of 10,051 nucleotides (genomic span)
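
The genomic span follows from the coordinates because both end positions are included in the
count. A minimal sketch using the numbers quoted above; the variable and function names are
illustrative.

```python
# Coordinates quoted above (GRCh38, chromosome 10).
GENE_START = 93_591_694
GENE_END = 93_601_744
CHR10_LENGTH = 133_797_422


def span(start, end):
    """Length of an interval with 1-based inclusive coordinates."""
    return end - start + 1


gene_span = span(GENE_START, GENE_END)
fraction = gene_span / CHR10_LENGTH

print(gene_span)
print(f"{fraction:.6%} of chromosome 10")
```

Forgetting the `+ 1` is a common off-by-one error when working with inclusive genomic coordinates.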

17. Click on Map viewer. What is the accession number of the genomic contig for
RBP4_____________? How many nucleotides does it contain___________? What is a
genomic contig?
• A genomic contig (short for contiguous sequence) is a continuous stretch of DNA
sequence that has been assembled from shorter sequence reads during genome
sequencing.
18. Click on the annotation links labeled sv, pr, dl, ev, mm, hm, sts in Map viewer in the pink
box. What does each of these link abbreviations stand for?

In NCBI's Map Viewer, the annotation links labeled sv, pr, dl, ev, mm, hm, and sts correspond
to specific tools and resources that provide detailed information about genomic regions. Here's
what each abbreviation stands for:

Annotation Links in Map Viewer

1. sv – Sequence Viewer
o Displays a graphical representation of the nucleotide sequence for the selected
region, allowing users to view gene structures, exons, and other genomic features.
2. pr – Protein
o Links to the protein sequence(s) associated with the gene or genomic region of
interest, providing insights into the translated product.
3. dl – Download
o Offers options to download sequence data from the specified chromosomal region
in various formats for further analysis.
4. ev – Evidence Viewer
o Shows alignments of RefSeq and GenBank transcript sequences (such as mRNAs
and ESTs) to the genomic contig, highlighting supporting evidence for gene
models.
5. mm – Model Maker
o Provides tools to construct or refine gene models based on available transcript
data and genomic sequence, aiding in the prediction of gene structure.
6. hm – HomoloGene
o Links to the HomoloGene database, which identifies homologous genes across
different species, facilitating comparative genomics studies.
7. sts – Sequence Tagged Site
o Directs to information about Sequence Tagged Sites, which are short, unique
DNA sequences used as landmarks in genetic mapping and marker-assisted
studies.

These tools collectively enhance the functionality of Map Viewer by providing comprehensive
resources for viewing, analyzing, and interpreting genomic data.
19. Does this gene contain introns? If so, how many and where are the splice junctions? Which
link did you use to discover this? There are several, including looking for the gene name in
the genomic contig sequence, or looking in the whole chromosome sequence.

Yes, the RBP4 gene (retinol binding protein 4) contains introns. According to the NCBI Gene
database, the RBP4 gene comprises 8 exons. Consecutive exons are separated by introns (a
transcript with n exons has n − 1 introns), and the gene is transcribed into precursor mRNA
(pre-mRNA) that includes both exons and introns.

How to Determine the Number of Introns and Splice Junctions

To identify the exact number and locations of introns, as well as the splice junctions, you can
utilize the NCBI Genome Data Viewer or the NCBI Gene database. Here's how:

1. NCBI Gene Database:
o Visit the RBP4 Gene page on NCBI.
o Under the "Genomic regions, transcripts, and products" section, click on the
"See RBP4 in Genome Data Viewer" link.
o This will open a graphical representation of the gene's structure, displaying the
exons and introns along the chromosome.
2. NCBI Genome Data Viewer:
o In the Genome Data Viewer, you can zoom in on the specific region of
chromosome 10 where RBP4 is located (10q23.33).
o The viewer will show the gene's exons and introns, along with the splice
junctions, which are typically located at the boundaries between exons and
introns.

By examining the graphical representation, you can determine the number of introns and the
precise locations of the splice junctions.
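
Once exon coordinates have been read from the viewer, the introns and splice junctions follow
directly: each intron fills the gap between two neighbouring exons. The sketch below shows the
idea with hypothetical exon coordinates; they are not the real RBP4 exons, and
`introns_from_exons` is an illustrative helper.

```python
def introns_from_exons(exons):
    """Given exon (start, end) pairs, 1-based inclusive, return the introns between them."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # The intron occupies the bases strictly between two neighbouring exons.
        introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical exon coordinates for illustration only; not the real RBP4 exons.
exons = [(100, 250), (400, 520), (900, 1100)]
introns = introns_from_exons(exons)

for start, end in introns:
    # For a forward-strand gene the donor site is at the intron's first base
    # and the acceptor site at its last base.
    print(f"intron {start}-{end}: donor at {start}, acceptor at {end}")
```

For a gene on the reverse strand, as reported above for RBP4, the donor and acceptor ends are
swapped relative to the chromosome coordinates.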
19. Click on the OMIM link. What is a biological consequence of a mutation in this protein for
humans?

A mutation in the RBP4 (Retinol Binding Protein 4) gene can have several biological
consequences in humans, primarily due to its essential role in transporting vitamin A (retinol)
from the liver to peripheral tissues.

Biological Function of RBP4

RBP4 binds retinol (vitamin A) in the blood and delivers it to cells via interaction with a receptor
called STRA6. Vitamin A is crucial for:

• Vision (especially night vision)
• Immune function
• Embryonic development
• Cellular growth and differentiation

Consequences of Mutations in RBP4

1. Eye and Vision Disorders
o Night blindness and other retinal dysfunctions can occur due to impaired
vitamin A transport.
o Rare inherited retinal dystrophies have been linked to RBP4 mutations.
2. Congenital Malformations
o RBP4 mutations have been linked to congenital eye malformations, including:
▪ Microphthalmia (abnormally small eyes)
▪ Coloboma
3. Metabolic Disorders
o Elevated or altered RBP4 levels (due to mutations or regulation issues) have been
associated with:
▪ Insulin resistance
▪ Type 2 diabetes
▪ Obesity
o Though these are not typically caused by coding mutations, RBP4 is being studied
as a biomarker for these conditions.

Case Example:

A loss-of-function mutation in RBP4 can lead to vitamin A deficiency, even if dietary intake is
adequate, because the body cannot transport retinol efficiently. This can result in symptoms like:

• Dry eyes
• Impaired immunity
• Skin issues
• Growth retardation in children

20. Can you find your gene in SwissProt (http://us.expasy.org/sprot/) database? Give me the
accession number in SwissProt.
21. What is the advantage of SWISS-PROT vs. NCBI?

Both SWISS-PROT (now part of UniProtKB/Swiss-Prot) and NCBI provide protein sequence
data, but they serve different purposes and offer different strengths.

Here’s a direct comparison:

SWISS-PROT (UniProtKB/Swiss-Prot)

Advantages:

1. Manual Curation
o Every entry is reviewed by experts.
o Errors are corrected, and information is added based on experimental evidence.
2. High-Quality Functional Annotations
o Includes detailed info on:
▪ Protein function
▪ Domain structure
▪ Post-translational modifications
▪ Variants and disease links
3. Non-redundant
o Only one entry per protein per species (no duplicated submissions).
4. Stable and Consistent Format
o Better suited for reliable data mining, modeling, and pathway analysis.

NCBI (GenPept / RefSeq / GenBank)

Advantages:

1. Broad Coverage
o Includes both curated (RefSeq) and unreviewed submissions (GenBank).
o Contains more recent and raw data, including novel or predicted proteins.
2. Integrated with Genomic Data
o Easily connects with gene locations, mRNA, genomic contigs, and other NCBI
tools (BLAST, Gene, Genome Data Viewer).
3. Rapid Updates
o New sequences are submitted and published quickly—useful for cutting-edge
research.

Feature             | SWISS-PROT (UniProtKB)      | NCBI (RefSeq/GenPept)
--------------------+-----------------------------+---------------------------------
Curation            | Manual (high quality)       | Mixed: curated + automated
Annotation Depth    | Rich, functional            | Basic to moderate
Redundancy          | Non-redundant               | May include redundant entries
Update Speed        | Slower, carefully curated   | Fast, includes raw data
Genomic Integration | Limited                     | Strong (linked to genome)
Best For            | Trusted annotations, models | Broad discovery, genome analysis
