6040 Syllabus
Course description
This course is your hands-on introduction to programming techniques relevant to data analysis
and machine learning. Most of the programming exercises will be based on Python and SQL.
The basic philosophy of this course is that you'll learn the material best by actively doing.
Therefore, you should make an effort to complete all assignments, including any ungraded
("optional") parts, and go a bit beyond on your own (see “How much time and effort are
expected of you?” below).
● Notebooks: 50%
● Midterm 1: 10%
● Midterm 2: 15%
● Final exam: 25%
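To see how the weights above combine into a final percentage, here is a minimal sketch. It assumes each category score is already expressed as a percentage from 0 to 100; the function name `final_grade` and the example scores are hypothetical, and how the individual notebooks are averaged into the notebook score is not shown.

```python
# Category weights as listed above, written as fractions of the final grade.
WEIGHTS = {
    "notebooks": 0.50,
    "midterm1": 0.10,
    "midterm2": 0.15,
    "final": 0.25,
}


def final_grade(scores):
    """Combine per-category percentages (0-100) into a weighted course percentage.

    Hypothetical helper for illustration; not an official grade calculator.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores, for illustration only.
example_scores = {"notebooks": 90, "midterm1": 80, "midterm2": 70, "final": 85}
print(final_grade(example_scores))  # 45 + 8 + 10.5 + 21.25, i.e. about 84.75
```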
There is approximately one assignment (lab notebook) or exam due every week. The
assignments vary in difficulty but are weighted roughly equally. Some students find this pace
very demanding; the reason we set it up this way is that we believe learning to program is like
learning a foreign language, which demands constant and consistent practice.
What does "programming proficiency" mean? For context, this course aims to fill in gaps in your
programming background that might keep you from succeeding in other programming-intensive
courses of Georgia Tech’s MS Analytics program, most notably, CSE 6242. If you already have
a significant programming background, consider placing out. If you have no programming
background, you will need to ramp up very quickly. See below for more specific guidance on
what we expect on the two hardest gaps to fill, namely, programming proficiency and linear
algebra.
Please make sure you are aware of the due date and time for your local area. We will not grant
extensions based on your misunderstanding of how to translate dates and times.
Late policy. For your lab notebooks, you get an automatic 72-hour extension on every
assignment. (This extension does not apply to exams.) However, you will lose points every
day the assignment is late, and we will not accept any assignment after the 72-hour period.
The penalty is a deduction of 15% of the value of the assignment each day. For instance, if the
assignment is worth 25 points, then you will lose (0.15 * 25) = 3.75 points out of 25
for each day it is late, up to 3 days.
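The late-penalty arithmetic can be sketched as follows. This is an illustration only: the function name `late_penalty` is hypothetical, and the sketch assumes that any part of a day counts as a full late day, which the policy above does not state explicitly.

```python
import math


def late_penalty(total_points, hours_late, rate=0.15, max_hours=72):
    """Points deducted from a notebook submitted `hours_late` hours past the deadline.

    Hypothetical helper; rate and max_hours mirror the 15%-per-day penalty
    and the 72-hour extension described above.
    """
    if hours_late <= 0:
        return 0.0  # on time: no deduction
    if hours_late > max_hours:
        raise ValueError("submissions are not accepted after the 72-hour extension")
    # Assumption: a partial day late is charged as a whole day.
    days_late = math.ceil(hours_late / 24)
    return rate * total_points * days_late


# A 25-point notebook submitted 30 hours late is 2 days late under this assumption.
print(late_penalty(25, 30))  # 2 * 3.75, i.e. about 7.5 points
```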
The reason we do not grant extensions beyond 72 hours is that we want to post sample
solutions so your classmates can benefit from seeing them; we do not want to delay everyone
else’s learning because a few people need significantly more time. Keep in mind that there are
many assignments, so any given assignment is only worth a couple percent of your final grade.
Exam procedures. For the exams, you will receive a window of about five (5) days in which to
attempt the exam, with a hard deadline to submit (absolutely no extensions). Once you start
an exam, you have up to 24 hours to submit all your work or the hard deadline, whichever
comes first. (That is, if you start the exam 12 hours before the hard deadline, you’ll only have 12
hours.)
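The "whichever comes first" rule can be sketched with Python's standard `datetime` module. The function name `exam_submit_by` and the example deadline are hypothetical, and the sketch assumes all times are expressed in the same time zone.

```python
from datetime import datetime, timedelta


def exam_submit_by(start, hard_deadline, limit=timedelta(hours=24)):
    """Return the latest submission time for an exam begun at `start`.

    Hypothetical helper: the earlier of (start + 24 hours) and the hard deadline.
    """
    if start >= hard_deadline:
        raise ValueError("the exam window has already closed")
    return min(start + limit, hard_deadline)


# Made-up hard deadline, for illustration only.
deadline = datetime(2024, 3, 10, 12, 0)

early_start = deadline - timedelta(days=2)    # plenty of time left in the window
late_start = deadline - timedelta(hours=12)   # started 12 hours before the deadline

print(exam_submit_by(early_start, deadline) - early_start)  # the full 24 hours
print(exam_submit_by(late_start, deadline) - late_start)    # only 12 hours
```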
The “3 credit hours” part translates into an average of about 10-12 hours per
week. However, the actual amount of time you will spend depends heavily on your background
and preparation. Past students who are very good at programming and math have reported spending
much less time per week (maybe as few as 4-5 hours), and students who are rusty or novices at
programming or math have reported spending more (maybe 15 or more hours).
The “graduate-level” part means you are mature and independent enough to try to understand
the material at more than a superficial level. That is, you don’t just watch some videos, go
through the assignments, and stop there; rather, you spend some extra time looking at the code
and examples in detail, trying to cook up your own examples, and coming up with self-tests to
check your understanding. Also, you will need to figure out, quickly, where your gaps are and
make time to get caught up.
As noted above, in past runs of this course we’ve found the two hardest parts for many students
are catching up on (a) basic programming proficiency and (b) linear algebra, which are both
prerequisites to this course. We’ll supply some refresher material but expect that you can catch
up. Here is some additional advice on these two areas.
Programming proficiency. We expect that you have taken
at least one introductory programming course in any language, though Python will save you the
most time. You should be familiar with basic programming ideas at least at the level of the
Python Bootcamp that most on-campus MS Analytics students take just before they start. We
also strongly recommend having gone through a course like CS 1301x, which is Georgia
Tech’s undergraduate introduction to Python class. Students who struggled with this course in
the past have reported success after taking CS 1301x and then retaking this class. Beyond
that, code drill sites, like CodeSignal and codewars.com (the latter’s absurdly combative name
notwithstanding) can help improve your speed at general computational problem solving.
Please spend time looking at these or similar resources.
Part of developing and improving your programming proficiency is learning how to find answers.
We can’t give you every detail you might need; but, thankfully, you have access to the entire
internet! Getting good at formulating queries, searching for helpful code snippets, and adapting
those snippets into your solutions will be a lifelong skill and is common practice in the “real
world” of software development, so use this class to practice doing so. (During exams, you will
be allowed to search for stuff on the internet!) It’s also a good skill to have because whatever
we teach now might no longer be state of the art five years from now, so knowing how to pick up
new things quickly will be a competitive advantage for you. Of course, the time to search may
make the assignments harder and more time-consuming, but you'll find that you get better and
faster at it as you go, which will save you the same learning curve when you're on the job.
Math proficiency. For math, and more specifically linear algebra, we do
provide some refresher material within this course. However, it is non-graded self-study
material. Therefore, you should be prepared to fill in any gaps you find when you encounter
unfamiliar ideas. We strongly recommend looking at the notes from the edX course Linear
Algebra: Foundations to Frontiers (LAFF). Its website includes a freely downloadable PDF with
many nice examples and exercises.
But what does “whiteboard level” mean? It’s hard to define precisely, but here is what we have
in mind.
The spirit of this policy is that we do not want someone posting their solution attempt (possibly
with bugs) and then asking their peers, "Hey can someone help me figure out why this doesn't
work?" That's essentially asking others to debug your work for you. That’s a no-no.
What can I do instead? In such situations, try to reduce the problem to the simplest possible
example that also fails. Posting code, in that case, would be OK. (And the process of distilling
an example often reveals the bug!)
In other words, it's fine and encouraged to post and discuss code examples as a way of
learning. But you want to avoid doing so in a way that might reveal the solution to an
assignment that you are being asked to produce.
You must do all exams completely on your own, without any assistance from others.
Honor code. All course participants—you and we—are expected and required to abide by the
letter and the spirit of the edX Honor Code. In particular, always keep the following in mind:
● Ethical behavior is extremely important in all facets of life. Honest and ethical behavior is
expected at all times.
● You are responsible for completing your own work.
● Any learner found in violation of the edX Honor Code will be subject to any or all of the
actions listed in the edX Honor Code.
● William McKinney. Python for Data Analysis: Data wrangling with Pandas, NumPy, and
IPython, 2nd edition. O'Reilly Media, September 2017. ISBN-13: 978-1449319793.
Here are some tips to improve the response time for your questions. First, make your post
public (rather than private to the instructors), so that anyone in the class can see and respond to
your post. Secondly, adhere to the “Collaboration Policy,” above. If you create a post that
violates this policy, the instructors may ignore your post or even delete it. Thirdly, post during
the week rather than the weekend; the instructors are also trying to maintain some semblance
of work-life balance, so you can expect slower responses over the weekend. Lastly, be sure to
tag your post with the relevant notebook assignment so we can better triage issues. (In Piazza,
a “tag” is also called a “folder,” though unlike desktop folders, you can place a post in more than
one folder.)
What if my question is private in nature? In that case, you can make your post private to the
instructors. (After pressing “new post” to create the post, look for the “Post to” field, select
“Individual student(s)/instructor(s),” and then type “Instructors” to make the post visible only to the
instructors. It’s important to include all instructors so that all of them will see and have a
chance to address your post, which will be faster than addressing only one person.)
Office hours (GT students only). We will have live “dial-in” office hours, to be scheduled.
Watch Piazza for an announcement and logistical details.
Accommodations for individuals with disabilities (GT students only). If you have learning
needs that require special accommodation, please contact the Office of Disability Services at
(404) 894-2563 or http://disabilityservices.gatech.edu/, as soon as possible, to make an
appointment to discuss your special needs and to obtain an accommodations letter.
Module 0: Fundamentals.
● Topic 0: Course and co-developer intros
● Topic 1: Python bootcamp review + intro to Jupyter
● Topic 2: Pairwise association mining
○ Default dictionaries, asymptotic running time
● Topic 3: Mathematical preliminaries
○ probability, calculus, linear algebra
● Topic 4: Representing numbers
○ floating-point arithmetic, numerical analysis