SQH7001 Bioinformatics Task - Velda Rifka Almira

Name : Velda Rifka Almira
Student ID : 22098713
Course ID : SQH7001
Course name : Research Methodology in Environmental Management Technology
Assignment on Bioinformatics
Lab Exercise – Part 1
1. Using the primary databases on the NCBI website, search for the mRNA and protein sequences of
insulin. The table below lists the required information:
| Type of Data | Insulin mRNA Sequence | Insulin Protein Sequence |
|---|---|---|
| Name of Database & URL | NCBI Nucleotide Database (https://www.ncbi.nlm.nih.gov/nuccore/?term=mRNA+of+insulin) | NCBI Protein Database (https://www.ncbi.nlm.nih.gov/protein/?term=insulin+protein) |
| Number of Search Hits | 54607 | 72218 |
| Top Five Search Result Accession Numbers | 1. MN57671.1; 2. NM_001204686.1; 3. M61153.1; 4. U03610.1; 5. JF909299.1 | 1. AAA40590.1; 2. NP_001191615.1; 3. KAB1251309.1; 4. NP_001035835.1; 5. NP_571131.1 |
| Source Databases | DDBJ (374); EMBL (170); GenBank (15737); INSDC [GenBank] (16281); RefSeq (38322); TPA (4) | PDB (2925); RefSeq (32346); UniProtKB / Swiss-Prot (3109); DDBJ (551); EMBL (1149); GenBank (31926); PIR (53) |
| Number of Sequence Search Hits Specifically for Humans | 6499 | 5484 |
| Number of Search Hits for Sequences Released from 2016 to 2017 for Humans | 384 | 603 |
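The hit counts in the table were read from the NCBI web interface. As a rough sketch, the same kind of count could also be requested programmatically through the NCBI E-utilities ESearch endpoint. The helper names below are hypothetical, and the `[Organism]` and `[PDAT]` field tags are an assumption about how the web filters for species and release date map onto a query, so the numbers returned may not match the table exactly.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, organism=None, year_range=None):
    """Build an ESearch URL that asks only for the hit count, as JSON."""
    query = term
    if organism:
        # Assumption: the species filter corresponds to the [Organism] field.
        query += f' AND "{organism}"[Organism]'
    if year_range:
        start, end = year_range
        # Assumption: the release-date filter corresponds to the [PDAT] field.
        query += f" AND {start}:{end}[PDAT]"
    params = {"db": db, "term": query, "retmode": "json", "retmax": 0}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def parse_hit_count(response_text):
    """Extract the total number of hits from an ESearch JSON response."""
    return int(json.loads(response_text)["esearchresult"]["count"])


def fetch_hit_count(url):
    """Send the request (needs network access) and return the hit count."""
    with urlopen(url) as response:
        return parse_hit_count(response.read().decode("utf-8"))


# Print the request URLs for the two searches used in the table (no network call).
for db, term in (("nuccore", "mRNA of insulin"), ("protein", "insulin protein")):
    print(build_esearch_url(db, term))
    print(build_esearch_url(db, term, organism="Homo sapiens", year_range=(2016, 2017)))
```

Calling `fetch_hit_count` on one of the printed URLs would return the current count, which changes as new records are deposited.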
Lab Exercise – Part 2
2. Obtain the following required information from a secondary database regarding the
hemoglobin protein sequence. The table below lists the required information:
| Type of Data | Hemoglobin Protein Sequence |
|---|---|
| Name & URL of the database used | UniProtKB (https://www.uniprot.org/uniprotkb?query=hemoglobin+protein+sequences) |
| Total number of results obtained | 65110 |
| Number of Computationally & Manually Curated Sequences | 1. Manually annotated & reviewed (Swiss-Prot): 1191; 2. Computationally/automatically annotated & not reviewed (TrEMBL): 63919 |
| Number of Sequences that may be related to disease(s) | 37 |
| Number of hb protein sequences for Danio rerio that are computationally and manually reviewed | 1. Manually annotated & reviewed (Swiss-Prot): 16; 2. Computationally annotated & not reviewed (TrEMBL): 54 |
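The split between reviewed (Swiss-Prot) and unreviewed (TrEMBL) entries can be sanity-checked: 1191 + 63919 = 65110, which matches the reported total. The sketch below encodes that check and shows how similar queries might be built for the UniProt REST search endpoint. The function names are hypothetical, and the `reviewed:` and `organism_id:` query fields (with 7955 as the NCBI taxonomy ID of Danio rerio) are assumed to correspond to the filters applied on the website.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# UniProt REST search endpoint for UniProtKB.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_uniprot_url(keywords, reviewed=None, organism_id=None):
    """Build a UniProtKB search URL with optional review-status and species filters."""
    clauses = [f"({keywords})"]
    if reviewed is not None:
        # reviewed:true -> Swiss-Prot, reviewed:false -> TrEMBL
        clauses.append(f"(reviewed:{str(reviewed).lower()})")
    if organism_id is not None:
        clauses.append(f"(organism_id:{organism_id})")
    params = {"query": " AND ".join(clauses), "format": "json", "size": 1}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def fetch_total(url):
    """Send the request (needs network access) and read the total from the response header."""
    with urlopen(url) as response:
        return int(response.headers["X-Total-Results"])


def counts_are_consistent(total, reviewed, unreviewed):
    """Reviewed and unreviewed entries together should account for every result."""
    return reviewed + unreviewed == total


# Check the figures reported in the table above.
print(counts_are_consistent(total=65110, reviewed=1191, unreviewed=63919))

# Request URL for reviewed hemoglobin entries from Danio rerio (no network call).
print(build_uniprot_url("hemoglobin", reviewed=True, organism_id=7955))
```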
Lab Exercise – Part 3
1. State the difference between:
a. Nucleotide and Nucleic Acid
b. Nucleic Acids and Amino Acids
2. Why is each deposited sequence in a database given an accession number or an ID?
3. State the similarities and differences between primary and secondary biological databases.
4. In what situations is the use of sequences from the primary databases permitted (for
scientific research)?
Answers :
1. a. A nucleotide is the monomer of nucleic acids, and a nucleic acid is a polymer of nucleotides.
A nucleotide is composed of a phosphate group and a nitrogenous base, both attached to a
pentose sugar. A nucleic acid is composed of a chain of nucleotides linked by
phosphodiester bonds.
They also have different functions. Nucleotides are polymerized to form DNA or RNA, and
they also serve as energy sources and signal transducers. Nucleic acids store genetic
information and are involved in gene expression. Below are examples of nucleotides and
nucleic acids:
• Nucleotides → ATP, ADP, CMP, dGTP, ddATP
• Nucleic Acids → DNA, RNA
b. A nucleic acid is a complex organic molecule made up of nucleotides (pentose sugars,
nitrogenous bases, and phosphate groups) linked in a long chain. An amino acid is a simple
organic molecule that contains both a carboxyl group and an amino group. A nucleic acid is a
polymer whose monomers are nucleotides, while an amino acid is a monomer whose polymer is a
protein.
They also serve different roles. Nucleic acids store the genetic information of the cell and
are involved in the synthesis of functional proteins. Amino acids, on the other hand, are the
building blocks of proteins and are joined together during the translation of mRNA.
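The relationship between the two kinds of molecule can be illustrated with a small sketch: a DNA coding strand (a nucleic acid) is transcribed into mRNA, and the mRNA is translated codon by codon into a chain of amino acids. The DNA sequence below is a made-up example, not a database record, and the codon table is deliberately partial (it covers only the codons used here plus the stop codons).

```python
# Partial standard codon table: only the codons needed for the example below.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "UUU": "F",  # phenylalanine
    "GUG": "V",  # valine
    "AAC": "N",  # asparagine
    "CAG": "Q",  # glutamine
    "CAC": "H",  # histidine
    "CUG": "L",  # leucine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_dna):
    """Convert a DNA coding strand to mRNA by replacing thymine with uracil."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA into one-letter amino acid codes, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if the codon is not in the partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative sequence only (hypothetical, not taken from a database).
dna = "ATGTTTGTGAACCAGCACCTGTAA"
mrna = transcribe(dna)
print(mrna)             # nucleic acid: a polymer of nucleotides
print(translate(mrna))  # protein: a polymer of amino acids
```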
2. Each deposited sequence in a database is given an accession number or an ID to identify the
sequence record and to track its updates over time. Sequence records can also be linked to
relevant entries in different databases using accession numbers.
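As a small illustration of how an identifier supports tracking updates, the accession numbers listed in Part 1 have the form `ACCESSION.VERSION`, where the version suffix increases when the sequence is revised. The sketch below splits such identifiers. The regular expression is a simplified assumption that fits the identifiers in this report, not a complete definition of every accession format.

```python
import re

# Simplified pattern: letters, an optional underscore (RefSeq style), digits, then ".version".
ACCESSION_RE = re.compile(r"^(?P<accession>[A-Z]{1,6}_?\d+)\.(?P<version>\d+)$")


def parse_accession(identifier):
    """Split an 'accession.version' identifier into its parts."""
    match = ACCESSION_RE.match(identifier.strip())
    if match is None:
        raise ValueError(f"Unrecognised accession format: {identifier!r}")
    accession = match.group("accession")
    return {
        "accession": accession,                 # stable part, identifies the record
        "version": int(match.group("version")),  # changes when the sequence is updated
        "is_refseq": "_" in accession,           # RefSeq accessions use a prefix such as NM_ or NP_
    }


for identifier in ("NM_001204686.1", "M61153.1", "NP_571131.1"):
    print(parse_accession(identifier))
```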
3. Primary and secondary biological databases are both highly important in bioinformatics, but
they have different purposes and distinctive characteristics. They nevertheless share
several similarities:
• Data Collection → Both collect and store biological data.
• Accessible Online → Both are typically available online and are used by researchers
worldwide for various purposes.
• Both are essential tools in biological research, providing data that help answer a
wide range of questions.
The main differences between primary and secondary biological databases are as follows:

| Aspect of Differences | Primary Databases | Secondary Databases |
|---|---|---|
| Type of Data Stored | Contain raw data from experimental results (directly obtained from sequencing experiments). | Contain processed or derived data, frequently including annotations, interpretations, and predictions based on primary data. |
| Curation Process | Curation typically focuses on ensuring the quality and integrity of the data (with less interpretation or analysis), since the data are submitted directly from experimental results. | Higher levels of curation are typically applied, including manual annotation by experts. The data in secondary databases have usually been evaluated, categorized, and frequently linked to other relevant information. |
| Objective and Function | Function as archives for original, uninterpreted data. These databases hold the experimental data that researchers deposit for public access and record-keeping. | Help in grasping the structural and functional implications of the original data. Researchers use these databases to learn more about biological phenomena. |
4. The use of sequences from the primary databases is usually permitted for:
a. Academic research and educational activities
b. Use accompanied by citation or acknowledgement of the database
c. Non-commercial use (commercial use might require additional permissions)
When using the sequences, we need to adhere to the terms and conditions set by the
database and follow the relevant ethical guidelines.
Lab Exercise – Part 4
1. Obtain the structure file with the PDB ID 1A6M from the PDB. Following the
instructions, the answers to the questions are given below:
a. The required information for the given structure obtained from the PDB:
• Name of protein : Oxy-Myoglobin
• Protein function : Oxygen Transport
• Method of 3D capture : X-Ray Diffraction
• Resolution : 1.00 Å
• Number of chains : 1 unique chain
• Protein sequence length : 151
• Name of ligands :
- HEM (Protoporphyrin IX Containing Fe - C34 H32 Fe N4 O4)
- SO4 (Sulfate Ion - O4 S)
- OXY (Oxygen Molecule - O2)
• Structure release date : 06 April 1994 (06/04/1994)
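Some of the fields listed above (resolution and ligand names) are also recorded in the text of the PDB file itself. The sketch below shows one way they might be read from the legacy PDB format, using the `REMARK   2` resolution line and the `HET` records. The sample lines are illustrative only: the ligand IDs and resolution come from this report, but the residue numbers and atom counts are placeholders, not values copied from the 1A6M file.

```python
import re

# Download location for legacy-format PDB files on the RCSB file server.
PDB_URL_TEMPLATE = "https://files.rcsb.org/download/{pdb_id}.pdb"

RESOLUTION_RE = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS")


def pdb_download_url(pdb_id):
    """Return the download URL for a given PDB ID."""
    return PDB_URL_TEMPLATE.format(pdb_id=pdb_id.upper())


def resolution_angstrom(pdb_lines):
    """Return the resolution from the 'REMARK 2' record, or None if absent."""
    for line in pdb_lines:
        match = RESOLUTION_RE.match(line)
        if match:
            return float(match.group(1))
    return None


def ligand_ids(pdb_lines):
    """Collect unique ligand (heterogen) IDs from HET records, in file order."""
    ids = []
    for line in pdb_lines:
        # "HET" padded to six characters distinguishes HET from HETNAM and HETATM.
        if line.startswith("HET   "):
            het_id = line[7:10].strip()  # columns 8-10 hold the ligand ID
            if het_id not in ids:
                ids.append(het_id)
    return ids


# Illustrative lines only: residue numbers and atom counts are placeholders.
SAMPLE_LINES = [
    "REMARK   2 RESOLUTION.    1.00 ANGSTROMS.",
    "HET    HEM  A 154      43",
    "HET    SO4  A 156       5",
    "HET    OXY  A 155       2",
]

print(pdb_download_url("1a6m"))
print(resolution_angstrom(SAMPLE_LINES))
print(ligand_ids(SAMPLE_LINES))
```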

b. Execution of the commands via RasMol:
1A6M structure (before executing any command)
After executing the first command in red: colour [170,170,240]
After executing the second command in red: colour [200,0,40]
After executing the third command in red: cpk 250
After executing the fourth command in red: cpk 200
After executing the last command in red: wireframe 100
Description of the purpose of each of the provided (bolded) commands:
• Backbone off : This command turns off the backbone representation of the molecule,
which typically illustrates the main chain of the protein, excluding side chains.
• Cartoon on : This command activates the cartoon representation, a simplified and more
visually intuitive way to show protein structures that highlights the secondary
structures (for instance alpha helices and beta sheets).
• Select hem : This command selects the heme group. "hem" is typically the shorthand
for the heme group in proteins like hemoglobin or myoglobin.
• Colour [200,0,40] : This command sets the colour of the selected part (in our case, the
heme group) to a specific colour. The Red, Green, Blue (RGB) value [200,0,40] is a
shade of red.
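To make the colour values easier to interpret, the RGB triples used in the commands above can be converted to hexadecimal colour codes and checked for their strongest channel. This is only a small sketch with hypothetical helper names, assuming each component lies in the range 0 to 255.

```python
def rgb_to_hex(rgb):
    """Convert an (R, G, B) triple with components 0-255 to a hex colour code."""
    for component in rgb:
        if not 0 <= component <= 255:
            raise ValueError(f"RGB component out of range: {component}")
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def dominant_channel(rgb):
    """Return the name of the strongest channel (the first one wins on a tie)."""
    names = ("red", "green", "blue")
    strongest = max(range(3), key=lambda i: rgb[i])
    return names[strongest]


# The two colour values used in the RasMol commands.
for rgb in ((170, 170, 240), (200, 0, 40)):
    print(rgb, rgb_to_hex(rgb), dominant_channel(rgb))
```

For [200,0,40] the red channel dominates, which agrees with the description above, while [170,170,240] is a pale colour in which blue is the strongest channel.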
