Homework 4
This homework assignment contains 3 programming questions. Each question may contain
multiple parts.
Download Hw4.zip from Blackboard and unzip the contents to a convenient location on your
computer. The Python files (.py extension) contain starter code for the programming
questions. Remaining files are sample text/data files for the File I/O questions.
Solve each question in its own Python file. Do not change the names of the files. Do not
change the headers of the given functions (function names, function parameters).
When you are finished, see the end of this document for submission instructions.
● Your submitted code should run as is. It should not yield syntax errors.
● We will not edit or comment/uncomment parts of your code in order to fix syntax
errors.
Sparse matrices can get quite large. To save space, only their non-zero elements are
stored. There are multiple ways of storing such matrices. In this homework, you are going to
use dictionaries to store them. Let A be a sparse matrix with N rows and M columns (an N×M
matrix), and let A_ij be the element of A in the i-th row and the j-th column
(i = 0, ..., N-1; j = 0, ..., M-1). Then this sparse matrix should be represented as follows:
● Let sp_A be a Python dictionary that stores A.
● The keys are tuples of the row (i) and column (j) indices of the non-zero elements:
sp_A[(i,j)] = A_ij for every A_ij ≠ 0.
● A special key (-1) stores the matrix dimensions: sp_A[-1] = [N,M].
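As a concrete illustration of this representation, the sketch below writes a small 2×3 matrix both as a list of lists and as the dictionary described above. The matrix values and the helper sparse_get are made up for illustration and are not part of sparse_matrices.py.

```python
# Hypothetical 2x3 example matrix in list-of-lists (dense) form.
dense_A = [[0, 5, 0],
           [7, 0, 0]]

# The same matrix in the dictionary-based sparse form:
# only the two non-zero elements are stored, plus the dimensions under key -1.
sp_A = {(0, 1): 5, (1, 0): 7, -1: [2, 3]}


def sparse_get(sp, i, j):
    """Illustrative helper: return element (i, j), or 0 if it is not stored."""
    return sp.get((i, j), 0)


n_rows, n_cols = sp_A[-1]
print(n_rows, n_cols)          # dimensions N and M
print(sparse_get(sp_A, 0, 1))  # a stored non-zero element
print(sparse_get(sp_A, 1, 2))  # a zero element, which is not stored
```

Note that zero elements never appear as keys, so looking them up needs a default value such as the one supplied by dict.get.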
The starter code for this question is given to you in sparse_matrices.py. This question has
three parts:
Part 1: You will be given the matrices in a list of lists format. You need to convert them to the
aforementioned sparse representation. Fill in the dense_to_sparse function to complete
this part.
Part 2: You are going to implement the matrix transpose operation (switching the row and
column indices of the non-zero elements, A_ij ← A_ji) for the dictionary-based sparse matrix
representation described in this question. Fill in the sparse_transpose function to
complete this part.
Part 3: The code for matrix multiplication with the list-of-lists representation is given to you.
You are going to implement matrix multiplication for the dictionary-based sparse matrix
representation described in this question. Fill in the sparse_mat_mult function to
complete this part.
For all of the parts, follow the comments. Do not iterate over entire rows and columns
in any of the parts! Doing so is inefficient and you will lose points. The function
sparse_mat_add is provided as an example for you.
There are also other functions that may help you debug or give you inspiration. Make sure
to go over the sparse_matrices.py file, both comments and code, for your own benefit.
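To illustrate the efficiency requirement, here is a minimal sketch of a loop that visits only the stored non-zero elements rather than all N×M positions. The function sparse_row_sums is a hypothetical example and is not one of the functions you are asked to implement.

```python
def sparse_row_sums(sp):
    """Sum each row of a dictionary-based sparse matrix.

    Only the stored (non-zero) entries are visited, so the work grows with
    the number of non-zero elements instead of N * M.
    """
    n_rows, _ = sp[-1]
    sums = [0] * n_rows
    for key, value in sp.items():
        if key == -1:      # skip the special dimensions entry
            continue
        i, _ = key         # every other key is a (row, column) tuple
        sums[i] += value
    return sums


# Hypothetical 2x3 matrix with three non-zero elements.
sp_A = {(0, 1): 5, (1, 0): 7, (1, 2): 2, -1: [2, 3]}
print(sparse_row_sums(sp_A))
```

The same pattern (loop over the dictionary items, skip the -1 key, unpack the tuple) applies whenever an operation only needs the non-zero elements.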
Q2: Protein Center of Mass Calculation - 35 pts
The Protein Data Bank archive serves as a single repository of information about the 3D
structures of proteins, nucleic acids and complex assemblies. Each PDB file contains
various kinds of information. The type of information is indicated in the first six characters of
each line, such as HEADER, SOURCE, COMPND, AUTHOR, REMARKS, etc. For this
homework, you just need to concentrate on lines beginning with the word “ATOM”. An
example is given below, where the columns represent the atom record, atom number, atom
identifier, amino acid type, chain identifier, residue sequence number, x-coordinate (in Å),
y-coordinate (in Å), z coordinate (in Å), occupancy, β-factor and element symbol
respectively. The symbol Å denotes Angstrom (1 Å = 10^-10 m).
[Example ATOM records not reproduced; the labeled columns in the example are X, Y, Z and ELEMENT.]
Implement the following functions:
A. pdb_parser(pdb_filename):
Receives one argument, pdb_filename. Opens the PDB file, reads all lines starting
with ATOM and stores the X, Y, Z coordinates and the ELEMENT type of the atom in
a list, atoms.
e.g.
atoms = [[27.340, 24.430, 2.614, 'N'], [26.266, 25.413, 2.842, 'C'], ...]
The atoms list has length N, the number of atoms that make up the protein in the PDB file.
The function should return atoms.
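Below is a minimal sketch of extracting the required fields from one ATOM line, assuming the twelve columns listed above are separated by whitespace. The helper name parse_atom_line is hypothetical. In the sample line, only the coordinates and the element come from the example above; the remaining fields are invented for illustration. PDB files are fixed-width, so whitespace splitting can fail when adjacent fields touch.

```python
def parse_atom_line(line):
    """Return [x, y, z, element] for one whitespace-separated ATOM line."""
    fields = line.split()
    # Column order assumed: record, atom number, atom identifier, amino acid,
    # chain, residue number, x, y, z, occupancy, beta-factor, element.
    x, y, z = (float(value) for value in fields[6:9])
    element = fields[11]
    return [x, y, z, element]


# Sample line: coordinates and element from the example; other fields invented.
sample = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"

# Only lines that begin with the word ATOM should be parsed.
if sample.startswith("ATOM"):
    print(parse_atom_line(sample))
```

Converting the coordinates to float at parsing time keeps the later arithmetic simple.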
B. center_of_mass(atoms):
This function calculates the center of mass of the protein as follows:
$$ r_{cm} = \frac{\sum_{i=1}^{N} m_i r_i}{\sum_{i=1}^{N} m_i} $$
Here r_cm is the coordinates of the center of mass of the protein, r_i is a list containing the
[x, y, z] coordinates of the i-th atom, m_i is the mass of the i-th atom, and N is the total
number of atoms. m_i should be obtained from the dictionary mass={'C':12.01, 'O':16.00,
'H':1.008, ...} by using the element type of the i-th atom as the key. The mass dictionary is
already provided in the pdb.py template.
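The center-of-mass calculation described above can be sketched as follows. This is one possible way to organize the computation, not the required implementation. The function name com_sketch and the two-atom input are hypothetical, and the mass dictionary here contains only the three entries quoted above.

```python
# Subset of the mass dictionary quoted above (the template provides the full one).
mass = {'C': 12.01, 'O': 16.00, 'H': 1.008}


def com_sketch(atoms):
    """Return [x_cm, y_cm, z_cm] as the mass-weighted average position."""
    total_mass = 0.0
    weighted = [0.0, 0.0, 0.0]
    for x, y, z, element in atoms:
        m = mass[element]          # look up the mass by element symbol
        total_mass += m
        weighted[0] += m * x
        weighted[1] += m * y
        weighted[2] += m * z
    return [w / total_mass for w in weighted]


# Two hypothetical atoms on the x-axis: the result lies closer to the heavier 'O'.
atoms = [[0.0, 0.0, 0.0, 'C'], [1.0, 0.0, 0.0, 'O']]
print(com_sketch(atoms))
```

Accumulating the total mass and the weighted coordinates in one pass avoids looping over the atoms twice.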
C. shift(atoms, vec):
This function translates the protein by vec and returns the updated coordinates of the
protein, atomsnew. vec is a list of size 3.
E.g. vec=[a, b, c], atoms=[[x, y, z, 'C']] -> atomsnew=[[x+a, y+b, z+c, 'C']]
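A minimal sketch of the translation, assuming that a new list should be built rather than the input changed. The name shift_sketch and the sample numbers are hypothetical.

```python
def shift_sketch(atoms, vec):
    """Return a new list with every atom translated by vec = [a, b, c]."""
    a, b, c = vec
    # The element symbol is carried over unchanged; only x, y, z are shifted.
    return [[x + a, y + b, z + c, element] for x, y, z, element in atoms]


atoms = [[1.0, 2.0, 3.0, 'C']]
atomsnew = shift_sketch(atoms, [1.0, -2.0, 0.5])
print(atomsnew)
print(atoms)  # the original list is left as it was
```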
The starter code for this question is given in CourseScheduling.py. Implement the empty
functions in the starter code to solve the following parts.
PART A: Write a function called read_schedules() that reads course schedules from the 3
given files and stores them in a dictionary of dictionaries:
data = { "COMP110": { "Name": "Introduction to Programming with Matlab",
                      "Instructor": "Emre Kutukoglu",
                      "Location": "SCI103",
                      "Days": "TuTh",
                      "Start Time": "8:30",
                      "End Time": "9:45"},
         "COMP125": { "Name": "Programming with Python",
                      "Instructor": "Ayca Tuzmen",
                      "Location": "ENGZ50",
                      "Days": "TuTh",
                      "Start Time": "13:00",
                      "End Time": "14:15"},
         ...
       }
For the outer dictionary data, the key should be the course ID (e.g.: “COMP110”,
“COMP125”, “ELEC204”) and the value should be a dictionary. For the inner dictionary, the
keys should be “Name”, “Instructor”, “Location”, “Days”, “Start Time”, “End Time”; and the
values should be the corresponding information for that course as shown above.
Hint: You should first read instructors.txt when creating the outer dictionary data, and later
populate the location and time details using locations.txt and times.txt.
PART B: Some courses included in instructors.txt are missing location assignments or time
assignments, i.e., they are included in instructors.txt but not in locations.txt or times.txt. Write
a function find_unscheduled(data) that:
● Takes as input the main dictionary data constructed above
● Has two return values:
○ First return value is the list of courses that have no location
■ Example: ["COMP306", "INDR460"]
○ Second return value is the list of courses that have no timeslot
■ Example: ["ELEC422", "MECH435", …]
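The two return values can be sketched as below. The handout does not say how a missing location or time appears in data, so this sketch assumes that the corresponding key is either absent or holds an empty value. The function name find_unscheduled_sketch and the sample dictionary are hypothetical.

```python
def find_unscheduled_sketch(data):
    """Return (courses without a location, courses without a timeslot)."""
    no_location = []
    no_time = []
    for course_id, info in data.items():
        # .get() returns None when the key is absent; empty strings also count as missing.
        if not info.get("Location"):
            no_location.append(course_id)
        if not (info.get("Days") and info.get("Start Time") and info.get("End Time")):
            no_time.append(course_id)
    return no_location, no_time


# Hypothetical data: COMP306 has no location, ELEC422 has no time details.
data = {
    "COMP125": {"Name": "Programming with Python", "Location": "ENGZ50",
                "Days": "TuTh", "Start Time": "13:00", "End Time": "14:15"},
    "COMP306": {"Days": "MoWe", "Start Time": "10:00", "End Time": "11:15"},
    "ELEC422": {"Location": "SCI103"},
}
no_location, no_time = find_unscheduled_sketch(data)
print(no_location)
print(no_time)
```

Returning the two lists separated by a comma produces a tuple, which the caller can unpack into two variables as shown.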
PART C: Write a function clean_schedule(data, courses_to_remove) such that:
● data is the main dictionary constructed in Part A
● courses_to_remove is the list of courses that should be removed from data
○ Example: ["ELEC422", "MECH435", …]
● Your function should return the resulting dictionary after the courses are removed
from it
Caution: This function must return the resulting dictionary. Do not just modify the dictionary
in place.
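One way to satisfy this requirement is to build and return a new dictionary, as in the minimal sketch below. The name clean_schedule_sketch and the sample data are hypothetical.

```python
def clean_schedule_sketch(data, courses_to_remove):
    """Return a new dictionary without the listed courses; data is left untouched."""
    to_remove = set(courses_to_remove)  # set membership tests are fast
    return {course_id: info for course_id, info in data.items()
            if course_id not in to_remove}


data = {"COMP125": {"Days": "TuTh"}, "ELEC422": {"Days": "MoWe"}}
cleaned = clean_schedule_sketch(data, ["ELEC422"])
print(cleaned)
print(len(data))  # the original dictionary still holds both courses
```

The inner dictionaries are shared between data and the result (a shallow copy), which is sufficient here because the courses are only removed, not edited.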
When you are finished, compress your Hw4 folder containing all of your answers. The result
should be a SINGLE compressed file (file extension: .zip, .rar, .tar, .tar.gz, or .7z). Upload
this compressed file to Blackboard.
Follow the instructions, input-output formats, and return values closely. Your code may be
graded by an autograder, which means any inconsistency will be automatically penalized.
You are only going to be graded based on your Blackboard submission. We will not accept
homework via e-mail or other means.