BS1005 Computer Lab 1 - U1940636A

The document discusses using the Protein Data Bank and PyMOL software to learn about biomolecules. Hands-on tasks involving constructing a 3D model of a water molecule from a PDB file and identifying the amino acid sequence of an unknown protein are described. Key aspects of PDB files such as atom coordinates and PyMOL's visualization capabilities are also explained.

Uploaded by

Doug A. Hole
BS1005 – BIOCHEMISTRY I

COMPUTER PRACTICAL 1
Using the Protein Data Bank (PDB) and PyMOL

Name: Ryan Kairon


Matriculation Number: U1940636A
Date: 27/03/20
Abstract

This computer laboratory introduced the Protein Data Bank (PDB) and PyMOL as resources for biology and
biochemistry. The PDB was introduced as a repository that collects and distributes structural information
about proteins and other biomolecules via [.pdb] files, and PyMOL as molecular visualization software.
Hands-on sessions modifying [.pdb] files showed how PyMOL interprets data in text format and renders it in
3-D. The sessions comprised two tasks: the first was constructing a 3-D render of a water molecule from a
[.pdb] file, and the second was identifying the amino acid sequence of an unknown protein. The first task
was solved using molecular geometry and the second using knowledge of amino acid structure. PDB and
PyMOL were concluded to be very useful tools for learning biochemistry, since they can be used with only
fundamental knowledge of the subject.

Introduction

The Protein Data Bank, or PDB, is an international repository for the 3-D structures of biomolecules,
including proteins and nucleic acids [1]. The PDB is a reliable and efficient resource that provides scientists
with up-to-date information to support their research; one example is the structure of the protease of the
novel coronavirus that emerged during 2019 (SARS-CoV-2, the cause of COVID-19) [1].

Figure 1 [1]: Screenshot of the PDB website on 20/03/20. Note that resources about the current COVID-19 pandemic
are available.

The Protein Data Bank stores and organizes information (e.g. atomic coordinates) about biological
macromolecules by assigning each entry a unique ID. Structural data are obtained by methods such as X-ray
crystallography, NMR spectroscopy and cryo-electron microscopy [1]. Certain methods, such as
crystallography, also yield additional parameters such as temperature factors (B-factors), which are recorded
alongside the coordinates. This information can be downloaded from the PDB website itself.
Figure 2 [1]: PDB website entry for the novel coronavirus COVID-19 protease. The unique ID that the PDB assigns to
the entry is highlighted in green, and the menu sequence used to download the [.pdb] file is highlighted in orange.

The PDB uses its own file format, [.pdb], which encodes information about molecular structures as text. The
format includes information such as the atoms present, their atomic coordinates (x, y, z) and secondary
structure assignments. It also carries more specialized values that describe atomic position and movement [1]:
'occupancy' (the fraction of molecules in which an atom is found at the given coordinates; an occupancy of 1
means that the atom is always present at those coordinates) and 'temperature factor' (a measure of how much
an atom is displaced from its position by vibration and similar effects; higher values indicate greater
movement). Information is divided into different 'records'; for example, atomic information is stored in
ATOM records, and atomic connections are stored in CONECT records. An added advantage of [.pdb] files is
that the text can be read and edited directly by humans. The format is also widely supported, so many
molecular visualization programs can open [.pdb] files and display them within their GUIs.

Figure 3 [1]: A description of the ATOM record in a [.pdb] file. Note that other parameters, such as the temperature
factor, are not included in this figure.
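The fixed-column layout of an ATOM record can be illustrated with a short Python sketch. This is not part of the original practical: the function name `parse_atom_record` is hypothetical, and the column positions are an assumption based on the commonly documented fixed-width ATOM record layout, so they should be checked against the format description in reference 1. The example line is assembled from explicit field-width pieces so that the spacing is visible.

```python
def parse_atom_record(line):
    """Split one fixed-width PDB ATOM line into named fields (0-based slices)."""
    return {
        "record": line[0:6].strip(),       # columns 1-6: record name
        "serial": int(line[6:11]),         # columns 7-11: atom serial number
        "name": line[12:16].strip(),       # columns 13-16: atom name
        "res_name": line[17:20].strip(),   # columns 18-20: residue name
        "chain": line[21].strip(),         # column 22: chain identifier
        "res_seq": int(line[22:26]),       # columns 23-26: residue number
        "x": float(line[30:38]),           # columns 31-38: x coordinate
        "y": float(line[38:46]),           # columns 39-46: y coordinate
        "z": float(line[46:54]),           # columns 47-54: z coordinate
        "occupancy": float(line[54:60]),   # columns 55-60: occupancy
        "temp_factor": float(line[60:66]), # columns 61-66: temperature factor
        "element": line[76:78].strip(),    # columns 77-78: element symbol
    }


# One ATOM line built from its fields, one string literal per field.
example = (
    "ATOM  "      # record name
    "    5"       # serial
    " "           # blank
    " CA "        # atom name
    " "           # alternate location (unused)
    "SER"         # residue name
    " "           # blank
    "A"           # chain
    "   1"        # residue number
    "    "        # insertion code + 3 blanks
    "  29.570" "  28.760" "  31.930"   # x, y, z
    "  1.00" "  0.00"                  # occupancy, temperature factor
    "          "  # 10 blanks
    " C"          # element
)

atom = parse_atom_record(example)
print(atom["name"], atom["res_name"], atom["x"], atom["y"], atom["z"])
```

Reading by column position rather than by splitting on spaces matters because adjacent fields in a real [.pdb] file may touch, for example when a coordinate is large or negative.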
However, it should be noted that when crystallography is the primary source of structural information, the
PDB distributes the asymmetric unit of each structure [1]. This means that only the smallest portion of the
structure from which the complete structure can be generated (by applying symmetry operations such as
rotation and translation) is distributed. This can be visualized as seen below.

Figure 4 [1]: The green arrow represents the subunit that is distributed by the PDB, as it can simply be rotated or
translated to complete the whole unit.

A real-life example can be seen with the PDB file on COVID-19 protease:

Figure 5 [1]: (Left image) Full structure of COVID-19 protease with 2 subunits. (Right image) Asymmetric unit
structure of COVID-19 protease obtained from PDB and visualized on PyMOL.

PyMOL is an open-source (meaning freely distributed and open to user customization), cross-platform
molecular visualization program [2], which was used in this lab. PyMOL is useful because:
1. It is compatible with [.pdb] files.
2. It can produce high-quality 3-D renders of biomolecules.
3. It has an interactive GUI that is simple for beginners and does not necessarily require coding to
use.
4. It is cross-platform, meaning that it is available on multiple operating systems.
Figure 6: COVID-19 protease with the b-factor putty preset applied on PyMOL.

PyMOL also integrates display presets that are specific to each file type the program is compatible with.
For example, the b-factor putty preset on PyMOL uses the 'temperature factor' values explained earlier
from the [.pdb] file [2] and colors the visualized molecule accordingly. As seen in Figure 6, COVID-19
protease with the b-factor putty preset is colored by region: red and warm colors indicate higher
temperature factors, while blue and cooler colors indicate lower temperature factors. Despite the name, the
temperature factor does not measure how hot a region is; it describes how far atoms are displaced from
their average positions. An explanation for these colors would be that the outer regions of the
macromolecule (colored in red) are more exposed and flexible, and therefore more mobile, whereas the inner
regions (colored in blue) are more tightly packed and less susceptible to environmental factors, which
translates into lower temperature factors.

PyMOL also allows measurements and calculations to be performed on the biomolecule, such as the surface
area; for example, the surface area of the asymmetric unit of COVID-19 protease was calculated as
34275.961 Å² (square angstroms).

Materials & Methods


Materials Used:
1. Notepad++ (or any text editor) software
2. PyMOL software
3. [.pdb] file of unknown protein

Method: Encoding the molecular structure of H2O (Water)

Figure 2: Labelling of essential parameters that code for the structural properties of the water molecule
1. The [.pdb] file was created from scratch using a text editor; more specifically, only an ATOM
record [1] was created.

2. A layout of an ATOM record was created with reference [1] to ensure proper column spacing; the
atom label, residue, and element were modified to represent atoms of water:

OW, HW, WAT: Indicate that the oxygen and hydrogen atoms are part of a water molecule. The
choice of these labels is arbitrary and serves as a personal reference. Highlighted in yellow in Fig 2.
O, H: The element that the atom represents. The correct periodic symbol must be used for input, in
this case oxygen and hydrogen. Highlighted in purple in Fig 2.

3. Next, the atomic coordinates were calculated. The dimensions used were the well-known 0.957 Å
O-H bond length and the undistorted tetrahedral angle of 109.5° [4] for the H-O-H bond angle
between the two hydrogen atoms.

Figure 3: Visualization of atomic coordinates on a cartesian plane. The red circle denotes the oxygen, while the
black dots represent hydrogen atoms.

As the three atoms of a water molecule lie in a single plane, the z coordinate is not needed and was
left at 0. For the x and y coordinates, as seen in Figure 3, oxygen was taken as the origin (0, 0) of a
cartesian plane. Trigonometry can then be used to find the coordinates of the hydrogen atoms: for a
bond length r and bond angle θ, one hydrogen lies at (r, 0) and the other at (r cos θ, r sin θ). After
calculation, the coordinates were inserted into their respective columns as seen in Figure 2.

4. Next, the [.pdb] file was opened in PyMOL. The general structure of water was shown, with the
red section representing oxygen and the white sections representing hydrogen.

5. The measurement wizard was used to add measurement labels to the structure; the bond length and
the bond angle were displayed, rounded, as 1.0 Å and 109.0°.
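The coordinate calculation and record layout in the steps above can be sketched in Python. This is an illustration rather than the procedure actually used in the lab (the file was written by hand in a text editor): the function names `water_coordinates` and `atom_line` are hypothetical, and the column widths assume the standard fixed-width ATOM record layout. The bond length and angle are the values quoted in step 3.

```python
import math

BOND_LENGTH = 0.957  # O-H distance in angstroms, as quoted in the text
BOND_ANGLE = 109.5   # H-O-H angle in degrees, as quoted in the text


def water_coordinates(bond_length, angle_deg):
    """Oxygen at the origin, one H on the +x axis, the other rotated by the bond angle."""
    theta = math.radians(angle_deg)
    oxygen = (0.0, 0.0, 0.0)
    h1 = (bond_length, 0.0, 0.0)
    h2 = (bond_length * math.cos(theta), bond_length * math.sin(theta), 0.0)
    return [("OW", "O", oxygen), ("HW", "H", h1), ("HW", "H", h2)]


def atom_line(serial, name, element, xyz, res_name="WAT", chain="A", res_seq=1):
    """Format one fixed-width ATOM record with occupancy 1.00 and temperature factor 0.00."""
    x, y, z = xyz
    return (
        f"ATOM  {serial:>5} {' ' + name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}"
    )


atoms = water_coordinates(BOND_LENGTH, BOND_ANGLE)
lines = [atom_line(i, name, element, xyz)
         for i, (name, element, xyz) in enumerate(atoms, start=1)]
print("\n".join(lines))
```

Generating the lines with format specifiers keeps every field in its column automatically, which is the part that is easiest to get wrong when typing the file by hand.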
Method: Finding the amino acid sequence of an unknown protein

Figure 4: Section of the solved amino acid code, with the essential parameters highlighted in colored boxes.

1. The [.pdb] file of the unknown peptide was opened using a text editor. The file contents resembled
Figure 4; however, the column in the red box was filled with 'UNK' instead of the residue name.

2. To identify the unknown amino acids, the N-terminal and C-terminal ends of each residue were
located. This was done by examining the column of atom labels to find the peptide linkages, which
can be seen in the yellow and green boxes in Figure 4, marking the carboxylic and amine groups of
a peptide linkage respectively.

3. After identifying the termini, each amino acid could be identified by distinguishing the backbone
atoms between the termini from the atoms belonging to the R group. The atom labels gave further
insight into the R group structure, as highlighted in blue in Figure 4.

4. Thus, the amino acid sequence identified was:

[N] Serine-Glycine-Phenylalanine-Arginine-Lysine-Methionine-Alanine-Phenylalanine-
Proline-Serine-Glycine-Lysine [C]
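The identification logic in the steps above (set the backbone atoms aside, then recognize the R group from the remaining atom labels) can be sketched as a lookup in Python. This is a minimal illustration, not the method used in the lab, where the residues were identified by hand. The table covers only the eight residue types that occur in this peptide, the atom names are the standard labels that appear in the file, and the names `SIDE_CHAIN_ATOMS`, `is_hydrogen` and `identify_residue` are hypothetical.

```python
# Side-chain heavy atoms for the residue types found in this peptide only.
# A complete table would need all 20 standard amino acids.
SIDE_CHAIN_ATOMS = {
    frozenset(): "GLY",
    frozenset({"CB"}): "ALA",
    frozenset({"CB", "OG"}): "SER",
    frozenset({"CB", "CG", "CD"}): "PRO",
    frozenset({"CB", "CG", "SD", "CE"}): "MET",
    frozenset({"CB", "CG", "CD", "CE", "NZ"}): "LYS",
    frozenset({"CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"}): "ARG",
    frozenset({"CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ"}): "PHE",
}

# Atoms shared by every residue; OXT is the extra C-terminal oxygen, if present.
BACKBONE = {"N", "CA", "C", "O", "OXT"}


def is_hydrogen(atom_name):
    """Hydrogen labels start with H, optionally after a digit (e.g. HB2, 1HH1)."""
    return atom_name.lstrip("0123456789").startswith("H")


def identify_residue(atom_names):
    """Return the three-letter code matching the side-chain heavy atoms, or UNK."""
    heavy = {name for name in atom_names if not is_hydrogen(name)}
    side_chain = frozenset(heavy - BACKBONE)
    return SIDE_CHAIN_ATOMS.get(side_chain, "UNK")


# Atom labels of the first residue in the file.
first_residue = ["N", "H1", "H2", "H3", "CA", "HA", "CB", "HB2", "HB3",
                 "OG", "HG", "C", "O"]
print(identify_residue(first_residue))
```

Hydrogens are ignored because their number changes with protonation state and terminal position, whereas the heavy-atom labels of the R group stay the same.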
Results
Code used for Task 1:
ATOM      1  OW  WAT A   1       0.000   0.000   0.000  1.00 0.000           O
ATOM      2  HW  WAT A   1       0.957   0.000   0.000  1.00 0.000           H
ATOM      3  HW  WAT A   1      -0.311   0.904   0.000  1.00 0.000           H

Screenshot of GUI of water molecule with dimensions labelled:

Code used for Task 2:


ATOM 1 N SER A 1 28.680 28.210 30.950 1.00 0.00 N
ATOM 2 H1 SER A 1 28.080 28.980 30.690 1.00 0.00 H
ATOM 3 H2 SER A 1 28.130 27.440 31.310 1.00 0.00 H
ATOM 4 H3 SER A 1 29.270 27.970 30.170 1.00 0.00 H
ATOM 5 CA SER A 1 29.570 28.760 31.930 1.00 0.00 C
ATOM 6 HA SER A 1 29.060 29.090 32.830 1.00 0.00 H
ATOM 7 CB SER A 1 30.710 27.790 32.320 1.00 0.00 C
ATOM 8 HB2 SER A 1 30.080 26.900 32.360 1.00 0.00 H
ATOM 9 HB3 SER A 1 31.400 27.790 31.470 1.00 0.00 H
ATOM 10 OG SER A 1 31.360 28.180 33.490 1.00 0.00 O
ATOM 11 HG SER A 1 30.730 27.990 34.180 1.00 0.00 H
ATOM 12 C SER A 1 30.230 29.990 31.440 1.00 0.00 C
ATOM 13 O SER A 1 30.750 29.990 30.300 1.00 0.00 O
ATOM 14 N GLY A 2 30.330 31.050 32.250 1.00 0.00 N
ATOM 15 H GLY A 2 29.780 31.070 33.100 1.00 0.00 H
ATOM 16 CA GLY A 2 31.060 32.240 31.910 1.00 0.00 C
ATOM 17 HA2 GLY A 2 32.090 31.970 31.700 1.00 0.00 H
ATOM 18 HA3 GLY A 2 30.590 32.670 31.020 1.00 0.00 H
ATOM 19 C GLY A 2 30.980 33.240 33.040 1.00 0.00 C
ATOM 20 O GLY A 2 29.960 33.420 33.590 1.00 0.00 O
ATOM 21 N PHE A 3 32.160 33.810 33.460 1.00 0.00 N
ATOM 22 H PHE A 3 33.040 33.410 33.170 1.00 0.00 H
ATOM 23 CA PHE A 3 32.250 34.710 34.630 1.00 0.00 C
ATOM 24 HA PHE A 3 31.420 35.410 34.570 1.00 0.00 H
ATOM 25 CB PHE A 3 32.320 33.850 35.860 1.00 0.00 C
ATOM 26 HB2 PHE A 3 31.340 33.400 36.000 1.00 0.00 H
ATOM 27 HB3 PHE A 3 33.120 33.100 35.860 1.00 0.00 H
ATOM 28 CG PHE A 3 32.540 34.710 37.100 1.00 0.00 C
ATOM 29 CD1 PHE A 3 31.460 35.470 37.670 1.00 0.00 C
ATOM 30 HD1 PHE A 3 30.490 35.500 37.210 1.00 0.00 H
ATOM 31 CE1 PHE A 3 31.650 36.180 38.850 1.00 0.00 C
ATOM 32 HE1 PHE A 3 30.800 36.630 39.350 1.00 0.00 H
ATOM 33 CZ PHE A 3 32.910 36.230 39.480 1.00 0.00 C
ATOM 34 HZ PHE A 3 32.980 36.730 40.440 1.00 0.00 H
ATOM 35 CE2 PHE A 3 33.900 35.400 39.070 1.00 0.00 C
ATOM 36 HE2 PHE A 3 34.790 35.260 39.670 1.00 0.00 H
ATOM 37 CD2 PHE A 3 33.700 34.620 37.920 1.00 0.00 C
ATOM 38 HD2 PHE A 3 34.530 33.960 37.710 1.00 0.00 H
ATOM 39 C PHE A 3 33.570 35.530 34.610 1.00 0.00 C
ATOM 40 O PHE A 3 34.590 35.060 34.160 1.00 0.00 O
ATOM 41 N ARG A 4 33.620 36.750 35.160 1.00 0.00 N
ATOM 42 H ARG A 4 32.820 37.070 35.690 1.00 0.00 H
ATOM 43 CA ARG A 4 34.900 37.440 35.470 1.00 0.00 C
ATOM 44 HA ARG A 4 35.700 36.700 35.570 1.00 0.00 H
ATOM 45 CB ARG A 4 35.250 38.430 34.280 1.00 0.00 C
ATOM 46 HB2 ARG A 4 35.020 38.010 33.300 1.00 0.00 H
ATOM 47 HB3 ARG A 4 34.610 39.320 34.320 1.00 0.00 H
ATOM 48 CG ARG A 4 36.680 38.980 34.110 1.00 0.00 C
ATOM 49 HG2 ARG A 4 36.710 39.700 33.300 1.00 0.00 H
ATOM 50 HG3 ARG A 4 37.060 39.480 35.000 1.00 0.00 H
ATOM 51 CD ARG A 4 37.730 37.900 33.820 1.00 0.00 C
ATOM 52 HD2 ARG A 4 38.600 38.550 33.780 1.00 0.00 H
ATOM 53 HD3 ARG A 4 37.860 37.120 34.560 1.00 0.00 H
ATOM 54 NE ARG A 4 37.580 37.360 32.440 1.00 0.00 N
ATOM 55 HE ARG A 4 37.220 37.970 31.730 1.00 0.00 H
ATOM 56 CZ ARG A 4 37.520 36.080 32.070 1.00 0.00 C
ATOM 57 NH1 ARG A 4 38.110 35.240 32.840 1.00 0.00 N
ATOM 58 1HH1 ARG A 4 38.560 35.560 33.680 1.00 0.00 H
ATOM 59 2HH1 ARG A 4 37.920 34.260 32.640 1.00 0.00 H
ATOM 60 NH2 ARG A 4 37.070 35.730 30.910 1.00 0.00 N
ATOM 61 1HH2 ARG A 4 37.020 36.430 30.180 1.00 0.00 H
ATOM 62 2HH2 ARG A 4 37.220 34.800 30.530 1.00 0.00 H
ATOM 63 C ARG A 4 34.830 38.240 36.730 1.00 0.00 C
ATOM 64 O ARG A 4 33.790 38.880 36.950 1.00 0.00 O
ATOM 65 N LYS A 5 35.970 38.360 37.470 1.00 0.00 N
ATOM 66 H LYS A 5 36.800 37.800 37.360 1.00 0.00 H
ATOM 67 CA LYS A 5 36.030 39.070 38.730 1.00 0.00 C
ATOM 68 HA LYS A 5 35.190 38.830 39.380 1.00 0.00 H
ATOM 69 CB LYS A 5 37.360 38.670 39.430 1.00 0.00 C
ATOM 70 HB2 LYS A 5 37.310 39.210 40.380 1.00 0.00 H
ATOM 71 HB3 LYS A 5 37.340 37.620 39.710 1.00 0.00 H
ATOM 72 CG LYS A 5 38.710 38.910 38.740 1.00 0.00 C
ATOM 73 HG2 LYS A 5 38.760 38.390 37.790 1.00 0.00 H
ATOM 74 HG3 LYS A 5 38.690 40.000 38.600 1.00 0.00 H
ATOM 75 CD LYS A 5 40.000 38.430 39.460 1.00 0.00 C
ATOM 76 HD2 LYS A 5 39.910 38.860 40.460 1.00 0.00 H
ATOM 77 HD3 LYS A 5 40.040 37.350 39.510 1.00 0.00 H
ATOM 78 CE LYS A 5 41.300 38.910 38.970 1.00 0.00 C
ATOM 79 HE2 LYS A 5 41.370 40.000 38.860 1.00 0.00 H
ATOM 80 HE3 LYS A 5 42.020 38.690 39.760 1.00 0.00 H
ATOM 81 NZ LYS A 5 41.730 38.250 37.690 1.00 0.00 N
ATOM 82 HZ1 LYS A 5 41.660 37.240 37.650 1.00 0.00 H
ATOM 83 HZ2 LYS A 5 41.210 38.480 36.850 1.00 0.00 H
ATOM 84 HZ3 LYS A 5 42.700 38.370 37.450 1.00 0.00 H
ATOM 85 C LYS A 5 35.910 40.540 38.520 1.00 0.00 C
ATOM 86 O LYS A 5 36.400 41.170 37.560 1.00 0.00 O
ATOM 87 N MET A 6 35.210 41.230 39.460 1.00 0.00 N
ATOM 88 H MET A 6 34.800 40.720 40.230 1.00 0.00 H
ATOM 89 CA MET A 6 34.900 42.630 39.420 1.00 0.00 C
ATOM 90 HA MET A 6 35.170 42.980 38.420 1.00 0.00 H
ATOM 91 CB MET A 6 33.410 42.890 39.710 1.00 0.00 C
ATOM 92 HB2 MET A 6 32.800 42.340 38.990 1.00 0.00 H
ATOM 93 HB3 MET A 6 33.210 42.610 40.740 1.00 0.00 H
ATOM 94 CG MET A 6 33.010 44.340 39.500 1.00 0.00 C
ATOM 95 HG2 MET A 6 33.340 44.950 40.340 1.00 0.00 H
ATOM 96 HG3 MET A 6 33.560 44.710 38.630 1.00 0.00 H
ATOM 97 SD MET A 6 31.260 44.740 39.300 1.00 0.00 S
ATOM 98 CE MET A 6 30.500 43.990 40.770 1.00 0.00 C
ATOM 99 HE1 MET A 6 30.970 43.010 40.920 1.00 0.00 H
ATOM 100 HE2 MET A 6 30.670 44.650 41.610 1.00 0.00 H
ATOM 101 HE3 MET A 6 29.460 43.760 40.560 1.00 0.00 H
ATOM 102 C MET A 6 35.840 43.410 40.290 1.00 0.00 C
ATOM 103 O MET A 6 36.060 43.080 41.430 1.00 0.00 O
ATOM 104 N ALA A 7 36.270 44.580 39.770 1.00 0.00 N
ATOM 105 H ALA A 7 35.740 44.850 38.950 1.00 0.00 H
ATOM 106 CA ALA A 7 36.960 45.590 40.570 1.00 0.00 C
ATOM 107 HA ALA A 7 37.580 45.130 41.340 1.00 0.00 H
ATOM 108 CB ALA A 7 37.950 46.200 39.510 1.00 0.00 C
ATOM 109 HB1 ALA A 7 38.670 46.810 40.050 1.00 0.00 H
ATOM 110 HB2 ALA A 7 38.430 45.380 38.970 1.00 0.00 H
ATOM 111 HB3 ALA A 7 37.320 46.840 38.890 1.00 0.00 H
ATOM 112 C ALA A 7 36.000 46.620 41.240 1.00 0.00 C
ATOM 113 O ALA A 7 34.860 46.750 40.830 1.00 0.00 O
ATOM 114 N PHE A 8 36.560 47.510 42.080 1.00 0.00 N
ATOM 115 H PHE A 8 37.530 47.370 42.340 1.00 0.00 H
ATOM 116 CA PHE A 8 35.990 48.770 42.480 1.00 0.00 C
ATOM 117 HA PHE A 8 34.920 48.670 42.660 1.00 0.00 H
ATOM 118 CB PHE A 8 36.780 49.220 43.730 1.00 0.00 C
ATOM 119 HB2 PHE A 8 37.850 49.140 43.540 1.00 0.00 H
ATOM 120 HB3 PHE A 8 36.540 50.270 43.900 1.00 0.00 H
ATOM 121 CG PHE A 8 36.370 48.510 44.980 1.00 0.00 C
ATOM 122 CD1 PHE A 8 35.130 48.730 45.550 1.00 0.00 C
ATOM 123 HD1 PHE A 8 34.460 49.460 45.110 1.00 0.00 H
ATOM 124 CE1 PHE A 8 34.790 48.140 46.740 1.00 0.00 C
ATOM 125 HE1 PHE A 8 33.780 48.290 47.100 1.00 0.00 H
ATOM 126 CZ PHE A 8 35.750 47.370 47.410 1.00 0.00 C
ATOM 127 HZ PHE A 8 35.550 46.910 48.370 1.00 0.00 H
ATOM 128 CE2 PHE A 8 36.970 47.080 46.870 1.00 0.00 C
ATOM 129 HE2 PHE A 8 37.650 46.560 47.540 1.00 0.00 H
ATOM 130 CD2 PHE A 8 37.300 47.650 45.620 1.00 0.00 C
ATOM 131 HD2 PHE A 8 38.210 47.370 45.110 1.00 0.00 H
ATOM 132 C PHE A 8 36.040 49.770 41.330 1.00 0.00 C
ATOM 133 O PHE A 8 36.810 49.590 40.380 1.00 0.00 O
ATOM 134 N PRO A 9 35.260 50.840 41.410 1.00 0.00 N
ATOM 135 CD PRO A 9 34.210 51.130 42.410 1.00 0.00 C
ATOM 136 HD2 PRO A 9 34.670 51.060 43.390 1.00 0.00 H
ATOM 137 HD3 PRO A 9 33.440 50.370 42.340 1.00 0.00 H
ATOM 138 CG PRO A 9 33.620 52.540 42.100 1.00 0.00 C
ATOM 139 HG2 PRO A 9 34.210 53.420 42.350 1.00 0.00 H
ATOM 140 HG3 PRO A 9 32.560 52.610 42.340 1.00 0.00 H
ATOM 141 CB PRO A 9 33.760 52.460 40.590 1.00 0.00 C
ATOM 142 HB2 PRO A 9 33.660 53.430 40.100 1.00 0.00 H
ATOM 143 HB3 PRO A 9 32.990 51.770 40.230 1.00 0.00 H
ATOM 144 CA PRO A 9 35.070 51.820 40.320 1.00 0.00 C
ATOM 145 HA PRO A 9 34.950 51.270 39.380 1.00 0.00 H
ATOM 146 C PRO A 9 36.190 52.820 40.140 1.00 0.00 C
ATOM 147 O PRO A 9 36.860 53.270 41.080 1.00 0.00 O
ATOM 148 N SER A 10 36.400 53.490 39.000 1.00 0.00 N
ATOM 149 H SER A 10 35.620 53.560 38.360 1.00 0.00 H
ATOM 150 CA SER A 10 37.660 54.170 38.590 1.00 0.00 C
ATOM 151 HA SER A 10 38.470 53.870 39.240 1.00 0.00 H
ATOM 152 CB SER A 10 38.060 53.750 37.210 1.00 0.00 C
ATOM 153 HB2 SER A 10 38.970 54.300 36.960 1.00 0.00 H
ATOM 154 HB3 SER A 10 38.360 52.700 37.280 1.00 0.00 H
ATOM 155 OG SER A 10 37.160 53.940 36.220 1.00 0.00 O
ATOM 156 HG SER A 10 37.210 54.830 35.860 1.00 0.00 H
ATOM 157 C SER A 10 37.510 55.710 38.560 1.00 0.00 C
ATOM 158 O SER A 10 38.430 56.430 38.920 1.00 0.00 O
ATOM 159 N GLY A 11 36.320 56.230 38.150 1.00 0.00 N
ATOM 160 H GLY A 11 35.570 55.580 37.980 1.00 0.00 H
ATOM 161 CA GLY A 11 35.890 57.570 38.000 1.00 0.00 C
ATOM 162 HA2 GLY A 11 36.250 57.910 37.030 1.00 0.00 H
ATOM 163 HA3 GLY A 11 34.800 57.620 38.000 1.00 0.00 H
ATOM 164 C GLY A 11 36.490 58.590 39.030 1.00 0.00 C
ATOM 165 O GLY A 11 37.030 59.600 38.610 1.00 0.00 O
ATOM 166 N LYS A 12 36.280 58.280 40.330 1.00 0.00 N
ATOM 167 H LYS A 12 35.820 57.420 40.590 1.00 0.00 H
ATOM 168 CA LYS A 12 36.890 59.160 41.420 1.00 0.00 C
ATOM 169 HA LYS A 12 36.470 60.160 41.440 1.00 0.00 H
ATOM 170 CB LYS A 12 36.590 58.560 42.810 1.00 0.00 C
ATOM 171 HB2 LYS A 12 36.810 57.500 42.740 1.00 0.00 H
ATOM 172 HB3 LYS A 12 37.290 58.970 43.530 1.00 0.00 H
ATOM 173 CG LYS A 12 35.150 58.770 43.190 1.00 0.00 C
ATOM 174 HG2 LYS A 12 34.820 59.810 43.190 1.00 0.00 H
ATOM 175 HG3 LYS A 12 34.430 58.290 42.540 1.00 0.00 H
ATOM 176 CD LYS A 12 34.930 58.180 44.580 1.00 0.00 C
ATOM 177 HD2 LYS A 12 35.060 57.110 44.510 1.00 0.00 H
ATOM 178 HD3 LYS A 12 35.560 58.580 45.380 1.00 0.00 H
ATOM 179 CE LYS A 12 33.480 58.520 44.930 1.00 0.00 C
ATOM 180 HE2 LYS A 12 33.460 59.540 45.300 1.00 0.00 H
ATOM 181 HE3 LYS A 12 32.830 58.430 44.060 1.00 0.00 H
ATOM 182 NZ LYS A 12 32.980 57.740 46.070 1.00 0.00 N
ATOM 183 HZ1 LYS A 12 32.960 56.760 45.860 1.00 0.00 H
ATOM 184 HZ2 LYS A 12 33.460 57.810 46.950 1.00 0.00 H
ATOM 185 HZ3 LYS A 12 32.010 58.000 46.240 1.00 0.00 H
ATOM 186 C LYS A 12 38.410 59.340 41.400 1.00 0.00 C
ATOM 187 O LYS A 12 38.980 60.220 42.000 1.00 0.00 O
(End of Code)
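The sequence can also be read back from the solved file programmatically. The following Python sketch is an illustration only: it splits each record on whitespace, which suits the listing as printed above but is an assumption that would not hold for every [.pdb] file (fixed column positions are the safer choice in general). The function name `sequence_from_records` is hypothetical, and the sample contains just the first record of each residue from the listing.

```python
THREE_TO_ONE = {
    "SER": "S", "GLY": "G", "PHE": "F", "ARG": "R",
    "LYS": "K", "MET": "M", "ALA": "A", "PRO": "P",
}

# First ATOM record of each residue, copied from the listing above.
SAMPLE = """\
ATOM 1 N SER A 1 28.680 28.210 30.950 1.00 0.00 N
ATOM 14 N GLY A 2 30.330 31.050 32.250 1.00 0.00 N
ATOM 21 N PHE A 3 32.160 33.810 33.460 1.00 0.00 N
ATOM 41 N ARG A 4 33.620 36.750 35.160 1.00 0.00 N
ATOM 65 N LYS A 5 35.970 38.360 37.470 1.00 0.00 N
ATOM 87 N MET A 6 35.210 41.230 39.460 1.00 0.00 N
ATOM 104 N ALA A 7 36.270 44.580 39.770 1.00 0.00 N
ATOM 114 N PHE A 8 36.560 47.510 42.080 1.00 0.00 N
ATOM 134 N PRO A 9 35.260 50.840 41.410 1.00 0.00 N
ATOM 148 N SER A 10 36.400 53.490 39.000 1.00 0.00 N
ATOM 159 N GLY A 11 36.320 56.230 38.150 1.00 0.00 N
ATOM 166 N LYS A 12 36.280 58.280 40.330 1.00 0.00 N
"""


def sequence_from_records(text):
    """Collect one residue name per residue number, ordered N- to C-terminus."""
    residues = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "ATOM":
            continue  # skip anything that is not a complete ATOM record
        residues[int(fields[5])] = fields[3]
    return [residues[number] for number in sorted(residues)]


sequence = sequence_from_records(SAMPLE)
one_letter = "".join(THREE_TO_ONE[name] for name in sequence)
print("-".join(sequence))
print(one_letter)
```

Running the same function over the complete listing should give the same result, since every atom of a residue carries that residue's name and number.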

Screenshot of PyMOL render of unknown peptide:


Discussion

Task 1:
The first task required the use of trigonometry to solve for the coordinates of the hydrogen atoms; since the
three atoms of a water molecule lie in one plane, their coordinates can easily be mapped onto a cartesian
plane. From the GUI measurements, the bond length was 1.0 angstrom and the bond angle 109.0°, in agreement
with the 0.957 angstrom and 109.5° used to construct the file once rounding of the entered coordinates is
taken into account. It should be noted that 109.5° is the undistorted angle of the tetrahedral electron
geometry [4] that water has; the bond angle observed in a real water molecule is about 104.5°, because
repulsion from the two lone pairs on oxygen compresses the H-O-H angle. The model therefore reproduces the
idealized geometry rather than the experimental one.
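The GUI measurements can be cross-checked directly from the coordinates listed for Task 1. The Python sketch below is an illustration, not part of the lab procedure, and the helper name `angle_deg` is hypothetical. It suggests that the coordinates as entered correspond to an angle very close to 109.0°, which would account for the value PyMOL displayed.

```python
import math

# Coordinates from the Task 1 ATOM records (angstroms).
oxygen = (0.000, 0.000, 0.000)
h1 = (0.957, 0.000, 0.000)
h2 = (-0.311, 0.904, 0.000)


def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees, from the dot product of the two bond vectors."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(b, vertex)]
    dot = sum(p * q for p, q in zip(v1, v2))
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))


length_1 = math.dist(oxygen, h1)
length_2 = math.dist(oxygen, h2)
angle = angle_deg(h1, oxygen, h2)

print(f"O-H1 = {length_1:.3f}, O-H2 = {length_2:.3f}, H-O-H = {angle:.1f}")
```

Coordinates in a [.pdb] file carry only three decimal places, so small deviations from the intended bond length and angle are expected.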

Task 2:
For the second task, general knowledge of amino acid structure was required to solve the unknown protein's
amino acid sequence. The [.pdb] file was opened as text to identify the residues. Since amino acids differ in
their functional (R) groups, the labeled atoms (alpha carbon, beta carbon etc.) allow the general structure of
each group to be recognized using a reference [3]. Knowing that a peptide bond is formed by a condensation
reaction that releases water allows the start and end of each amino acid within the peptide to be pinpointed:
at each linkage, the carboxyl end is missing an -OH and the amine end is missing a hydrogen. This method
was simple to execute because the atoms of the residues were listed in sequence; if they had been unordered,
the residues would have had to be identified on PyMOL.

Identification on PyMOL is nevertheless possible. For example, peptide linkages can be confirmed by locating
the carbonyl and amino groups that form them, using the colors of the atoms:

Figure 5: Shows how linkages can be identified on PyMOL. Carbons are green, oxygen is red, nitrogen is blue,
hydrogen is white – counting the number of each atom and identifying the molecular structure (e.g. the carboxylate
group is readily identifiable by the adjacent oxygen atoms) can be used to recognize linkages (labeled by the yellow
oval).

After identifying the ends of each residue, the molecular structure of the unknown residues may be inferred
to identify the properties of a residue. For example, polarity may be identified by the geometry of the residue
and what atoms are present in each position:
Figure 6: The polarity of a residue can be identified by the structure of the R group and the atoms present; in this
example, a clear polarity can be identified with oxygen at one end of the residue.

Another example is identifying hydrophobic/hydrophilic qualities. This can be done by recognizing side
chains that contain only carbon and hydrogen (e.g. in proline, alanine, glycine):

Figure 7: Glycine can be identified by its side chain, which consists of a single hydrogen atom.

Residues with aromatic groups can be easily identified with ring structures present in the R group:

Figure 8: Phenylalanine is easily identifiable by its aromatic side chain; to differentiate it from other aromatic
residues (e.g. tyrosine), the atoms present in the ring substituents need to be analyzed.
Residues with elements other than C, H, O, N can also be easily identified:

Figure 9: Methionine is easily identifiable by the sulfur atom (SD) within its R group.

Lastly, the charge can be identified from the bonding of certain atoms:

Figure 10: Arginine and lysine are positively charged amino acids owing to the protonated nitrogen-containing
groups in their side chains (the guanidinium group of arginine and the amino group of lysine), which can be
identified on PyMOL (circled in yellow).

Inferring properties such as these makes identifying the residues much easier, as they narrow down which
amino acid each residue could be.

Conclusion

From this laboratory, basic knowledge of [.pdb] files and PyMOL was acquired, such as the meaning of each
column in a [.pdb] file and the GUI tools in PyMOL. These skills, along with general knowledge of
biochemistry, were used to create and visualize the molecular structure of water and to identify the amino
acid sequence of an unknown protein. Editing a [.pdb] file required visualizing molecular coordinates, for
instance on a cartesian plane when working with a planar molecule like water. Using PyMOL allowed the
practical application of knowledge about amino acids: their structure was used to identify peptide linkages,
and the properties of their R groups were used to identify the residues themselves. A helpful aspect of
PyMOL is that these features can be identified simply by viewing the structure, which gives a greater
appreciation of how the properties of these residues arise.

Thus, PDB and PyMOL are invaluable tools for scientists, as they can be navigated with only a general
knowledge of biochemistry.
References
[1] Research Collaboratory for Structural Bioinformatics. Protein Data Bank Homepage. https://www.rcsb.org/ (accessed Mar 20, 2020)
[2] PyMOL by Schrödinger. https://pymol.org/2/ (accessed Mar 20, 2020)
[3] School of Biomedical Sciences. Amino Acids. https://teaching.ncl.ac.uk/bms/wiki/ (accessed Mar 20, 2020)
[4] Chemistry LibreTexts. Geometry of Molecules. https://chem.libretexts.org/ (accessed Mar 20, 2020)
