Harvard CS109B Syllabus Draft 20211216

Syllabus

Draft Syllabus Subject to Change


Advanced Topics in Data Science (Spring 2022)
CS 109b, AC 209b, Stat 121b, or CSCI E-109b

Instructors
Pavlos Protopapas (SEAS) & Mark Glickman (Statistics)

Lectures: Mon & Wed 9:45-11am (Location TBA)


Labs: Fri 9:45-11am (Location TBA)
Advanced Sections: Wed 12:45-2pm (starting 2/2; Location TBA)
Office Hours: TBA

Prerequisites: CS 109a, AC 209a, Stat 121a, or CSCI E-109a or the equivalent.

• Course description
• Tentative Course Topics
• Course Objectives
• Course Components
◦ Lectures
◦ Labs
◦ Advanced Sections
◦ Midterm
◦ Projects
◦ Homework Assignments
• Course Resources
◦ Online Materials
◦ Recommended Textbooks & Articles
◦ Getting Help
• Course Policies and Expectations
◦ Grading
◦ Collaboration Policy
◦ Late or Incorrectly Submitted Assignments
◦ Re-grade Requests
◦ Auditing the Class
◦ Academic Integrity
◦ Accommodations for students with disabilities
◦ Diversity and Inclusion Statement

Course Description
Advanced Topics in Data Science (CS109b) is the second half of a one-year introduction to data science. Building upon the material in
Introduction to Data Science, the course introduces advanced methods for data wrangling, data visualization, statistical modeling, and
prediction. Topics include big data, multiple deep learning architectures such as CNNs, RNNs, language models, transformers,
autoencoders, and generative models as well as Bayesian modeling, sampling methods, and unsupervised learning.

The programming language will be Python.

Tentative Course Topics


• Unsupervised Learning, Clustering
• Bayesian Inference
• Hierarchical Bayesian Modeling
• Fully Connected Neural Networks
• Convolutional Neural Networks
• Autoencoders
• Recurrent Neural Networks
• NLP / Text Analysis
• Transformers
• Variational AutoEncoders & Generative Models
• Generative Adversarial Networks

Course Objectives
Upon successful completion of this course, you should feel comfortable with the material mentioned above, and you will have gained
experience working with others on real-world problems. The content knowledge, the project, and teamwork will prepare you for the
professional world or further studies.

Course Components
Lectures, labs, and advanced sections will be live-streamed for Extension School students and can be accessed through the Zoom
section on Canvas. Video recordings of the live stream will be made available to all students within 24 hours after the event, and will be
accessible from the Lecture Video section on Canvas.

Lectures
The class meets for lectures twice a week (M & W). Attending and participating in lectures is a crucial component of learning the
material presented in this course. Students may be asked to complete short readings before certain lectures. Some lectures will also
include real-time coding exercises which we will complete as a class.

Labs
Lab will be held every Friday. Labs will present guided, hands-on coding challenges to prepare students for successfully completing the
homework assignments.

Advanced Sections
The course will include advanced sections for AC209b students, each covering a different topic. These 75-minute sessions will cover
advanced topics such as the mathematical underpinnings of the methods seen in the main course lectures and labs, as well as extensions of
those methods. The material covered in the advanced sections is required for all AC209b students. Tentative topics are:

• Gaussian Mixture Models
• Laplace Approximation
• NN as Universal Approximator
• Solvers
• Segmentation Techniques, YOLO, U-Net, and Mask R-CNN
• Variational Autoencoders
• Word2Vec
• BERT
• GANs, CycleGANs, etc.

Note: Advanced Sections are not held every week. Consult the course calendar for exact dates.

Midterm
There will be a midterm exam on Friday, March 25th from 9:45-11am (regular lab time). The exam will likely consist of multiple choice
questions with a take-home coding component. More information to follow.

Projects
Beginning the last week of classes (4/25), students will join groups of 3-4 and be divided into break-out, thematic sections to study an
open problem in one of various domains. The domains are tentative at the moment but may include medicine, law, astronomy,
e-commerce, government, and areas in the humanities. Each section will include lectures by Harvard faculty who are experts in the field.
Project work will continue through reading period and conclude with final submissions on 5/6. The final submission will consist of a
written report, a Jupyter notebook with all relevant code, and a 6-minute, pre-recorded presentation video.

Homework Assignments
There will be 7 graded homework assignments. Some of them will be due one week after being assigned and some will be due two
weeks after being assigned. For 5 assignments, you have the option to work and submit in pairs; the remaining 2 are to be completed
individually.

Standard assignments are graded out of 5 points.

AC209b students will have additional homework content for most assignments worth 1 point.

Course Resources
Online Materials
All course materials, including lecture notes, lab notes, and section notes will be published on Ed, the course GitHub repo, and the public
site's 'Materials' section.

Note: Lecture content for lectures 1-7 will only be accessible to registered students.

Assignments will only be posted on Canvas.

Working Environment
You will be working in Jupyter Notebooks, which you can run on your own machine or in the SEAS JupyterHub.

Recommended Textbooks
• ISLR: An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani (Springer:  New York, 2013)
• BDA3: Bayesian Data Analysis by Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, Donald B. Rubin (CRC
Press:  New York, 2013)
• DL: Deep Learning by Goodfellow, Bengio and Courville. (The MIT Press: Cambridge, 2016)
• Glassner: Deep Learning, Vol. 1 & 2 by Andrew Glassner
• SLP: Speech and Language Processing by Jurafsky and Martin (3rd Edition Draft)
• INLP: Introduction to Natural Language Processing by Jacob Eisenstein (The MIT Press: Cambridge, 2019)

Free electronic versions are available (ISLR, DL, SLP, INLP) or hard copy through Amazon (ISLR, DL, Glassner, SLP, INLP).

Articles & Excerpts

• Unsupervised learning:

◦ Basics: James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning (2nd ed.). New York:
Springer. Chapter 12 https://hastie.su.domains/ISLR2/ISLRv2_website.pdf
◦ Silhouette plots: https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
◦ Gap statistic: Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap
statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411-423.
https://hastie.su.domains/Papers/gap.pdf
◦ DBSCAN: Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017). DBSCAN revisited, revisited: why and how you
should (still) use DBSCAN. ACM Transactions on Database Systems (TODS), 42(3), 1-21.
https://dl.acm.org/doi/pdf/10.1145/3068335
• Bayesian material

◦ Basics: Glickman, Mark E. and Van Dyk, David A. (2007) "Basic Bayesian Methods" In Topics in Biostatistics (Methods in
Molecular Biology). Edited by Walter Ambrosius. The Humana Press Inc., Totowa, NJ. ISBN 1-58829-531-1. pp 319-338.
Chapter accessible from http://www.glicko.net/research/glickman-vandyk.pdf

◦ Importance sampling, rejection sampling, MCMC, Metropolis, Gibbs sampler: Andrieu, C., De Freitas, N., Doucet, A., & Jordan,
M. I. (2003). An introduction to MCMC for machine learning. Machine learning, 50(1), 5-43. Article accessible from:
https://www.cs.ubc.ca/~arnaud/andrieu_defreitas_doucet_jordan_intromontecarlomachinelearning.pdf

◦ Bayesian examples - regression, hierarchical modeling: Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., & Rubin,
D.B. (2013). Bayesian Data Analysis (3rd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b16018
http://www.stat.columbia.edu/~gelman/book/BDA3.pdf
Chapter 14: Introduction to regression models
Chapter 15: Hierarchical linear models
Chapter 16: Generalized linear models (includes logistic regression)

Getting Help
For questions about homework, course content, or package installation, the process is:

• try to troubleshoot yourself by reading the lecture, lab, and section notes, and looking up online resources.
• go to office hours; this is the best way to get help.
• post on the class Ed forum; we want you and your peers to engage in helping each other. TFs also monitor Ed and will respond
within 24 hours. Note that Ed questions are visible to everyone. If you are citing homework solution code you must post privately
so that only the staff sees your message.
• watch for official announcements via Ed. These announcements will also be sent to the email address associated with your
Canvas account so make sure you have it set appropriately.
• send an email to the Helpline [email protected] for administrative issues, regrade requests, and non-content-specific
questions.
• for personal matters that you do not feel comfortable sharing with the TFs, you may send an email to either or both of the
instructors.

Course Policies and Expectations


Grading for CS109b, STAT121b, and AC209b:
Your final score for the course will be computed using the following weights:

Assignment                 Final Grade Weight
Paired Homework (5)        45%
Individual Homework (2)    23%
Midterm                    12%
Project                    20%
Total                      100%

Note: Regular homework (for everyone) counts as 5 points. 209b extra homework counts as 1 point.
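
As a rough illustration of how the weights above combine, here is a minimal sketch in Python. It assumes each component is summarized as the fraction of its available points earned (0.0 to 1.0); the syllabus does not say how homework points are pooled within a component, so that summary, the `WEIGHTS` dictionary, and the `final_score` helper are hypothetical and not the official grade calculation.

```python
# Weights copied from the grading table; everything else here is illustrative.
WEIGHTS = {
    "paired_homework": 0.45,
    "individual_homework": 0.23,
    "midterm": 0.12,
    "project": 0.20,
}


def final_score(component_fractions):
    """Return the weighted final score on a 0-100 scale.

    component_fractions maps each component name in WEIGHTS to the fraction
    of available points earned (an assumed summary, between 0.0 and 1.0).
    """
    missing = set(WEIGHTS) - set(component_fractions)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name, fraction in component_fractions.items():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {fraction}")
    # Weighted sum of the per-component fractions, scaled to percent.
    return 100 * sum(WEIGHTS[name] * component_fractions[name] for name in WEIGHTS)


# Hypothetical student: 90% on paired homework, 80% on individual homework,
# 75% on the midterm, and 85% on the project.
example = {
    "paired_homework": 0.90,
    "individual_homework": 0.80,
    "midterm": 0.75,
    "project": 0.85,
}
print(round(final_score(example), 1))
```

Because the weights sum to 100%, a student who earns full marks on every component reaches a final score of 100 under this sketch.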

Collaboration Policy
We expect you to adhere to the Harvard Honor Code at all times. Failure to adhere to the honor code and our policies may result in
serious penalties, up to and including automatic failure in the course and referral to the Administrative Board. If you work with a
partner on an assignment, make sure both parties solve all the problems. Do not divide and conquer. You are expected to be
intellectually honest and give credit where credit is due. In particular:

• if you work with a fellow student but decide to submit different papers, include each other's names in the designated area of
the submission paper.
• if you work with a fellow student and want to submit the same paper you need to form a group prior to the submission. Details in
the assignment. Not all assignments will permit group submissions.
• you need to write your solutions entirely on your own or with your collaborator.
• you are welcome to take ideas from code presented in labs, lecture, or sections but you need to change it, adapt it to your style,
and ultimately write your own. We do not want to see code copied verbatim from the above sources.
• if you use code found on the internet, books, or other sources you need to cite those sources.
• you should not view any written materials or code created by other students for the same assignment.
• you may not provide or make available solutions to individuals who take or may take this course in the future.
• if the assignment allows it you may use third-party libraries and example code, so long as the material is available to all students
in the class and you give proper attribution. Do not remove any original copyright notices and headers.

Late or Incorrectly Submitted Assignments


Each student is allowed up to 3 late days over the semester, with at most 1 day applied to any single homework. Outside of these
allotted late days, late homework will not be accepted unless there is a medical reason (accompanied by a doctor's note) or another
official University-excused reason. There is no need to ask before using one of your late days.

If you forgot to join a Group with your peer and are asking for the same grade, we will accept this with no penalty up to HW3. By
then, we feel that you should be familiar with the process of joining groups, so for later homeworks there will be a penalty of -1
point for both members of the group, provided the submission was on time.

Grading Guidelines
Homework will be graded based on:

1. How correct your code is (the Notebook cells should run; we are not troubleshooting code)

2. How you have interpreted the results — we want text not just code. It should be a report.

3. How well you present the results.

The scale is 0 to 5 for each assignment and 0 to 1 for the additional 209 assignments.

Re-grade Requests
Our graders and instructors make every effort to grade accurately and to give you detailed feedback.

If you discover that your answer to a homework problem was correct but it was marked as incorrect, send an email to the Helpline with
a description of the error. Please do not submit regrade requests based on what you perceive as overly harsh grading. The points we
take off are based on a grading rubric that is being applied uniformly to all assignments.

If you decide to send a regrade request, email the Helpline within 48 hours of the grade release with the subject line
"Regrade HW1: Grader=johnsmith", replacing 'HW1' with the current assignment and 'johnsmith' with the name of the grader.

Auditing the Class


You are welcome to audit this course. To request access, send an email to [email protected] with your HUID (required) and a
statement of agreement to the terms below.

All auditors must agree to abide by the following rules:

• All auditors are held to the same standard of academic honesty as our registered students. Please do not share homeworks or
solutions with anyone. Violations will be reported to the Harvard Administrative Board.
• Auditors are not permitted to take the course for credit in the future.
• Auditors should not submit HWs or participate in projects.
• Auditors should refrain from using any course and TF resources that are designed for our registered students, like Ed, JupyterHub,
and office hours.

Academic Integrity
Ethical behavior is an important trait of a Data Scientist, from ethically handling data to attribution of code and work of others. Thus, in
CS109b we give a strong emphasis to Academic Honesty. As a student your best guidelines are to be reasonable and fair. We
encourage teamwork for problem sets, but you should not split the homework and you should work on all the problems together. For
more detailed expectations, please refer to the Collaboration Policy section above.

You are responsible for understanding Harvard Extension School policies on academic integrity
https://www.extension.harvard.edu/resources-policies/student-conduct/academic-integrity and how to use sources responsibly. Stated
most broadly, academic integrity means that all course work submitted, whether a draft or a final version of a paper, project,
take-home exam, online exam, computer program, oral presentation, or lab report, must be your own words and ideas, or the sources
must be clearly acknowledged. The potential outcomes for violations of academic integrity are serious and ordinarily include all of
the following: required withdrawal (RQ), which means a failing grade in the course (with no refund), the suspension of registration
privileges, and a notation on your transcript.

Using sources responsibly https://www.extension.harvard.edu/resources-policies/resources/avoiding-plagiarism is an essential part of
your Harvard education. We provide additional information about our expectations regarding academic integrity on our website. We
invite you to review that information and to check your understanding of academic citation rules by completing two free online
15-minute tutorials that are also available on our site. (The tutorials are anonymous open-learning tools.)

Accommodations for students with disabilities


Harvard students needing academic adjustments or accommodations because of a documented disability must present their Faculty
Letter from the Accessible Education Office (AEO) and speak with the professor by the end of the second week of the term (fill in
specific date). Failure to do so may result in the Course Head's inability to respond in a timely manner. All discussions will remain
confidential, although Faculty are invited to contact AEO to discuss appropriate implementation.

Harvard Extension School is committed to providing an inclusive, accessible academic community for students with disabilities and
chronic health conditions. The Accessibility Services Office (ASO)
https://www.extension.harvard.edu/resources-policies/accessibility-services-office-aso offers accommodations and supports to
students with documented disabilities. If you have a need for accommodations or adjustments in your course, please contact the
Accessibility Services Office by email at [email protected] or by phone at 617-998-9640.

Diversity and Inclusion Statement


Data Science and Computer Science have historically been representative of only a small sliver of the population. This is despite the
contributions of a diverse group of early pioneers - see Ada Lovelace, Dorothy Vaughan, and Grace Hopper for just a few examples.

As educators, we aim to build a diverse, inclusive, and representative community offering opportunities in data science to those who
have been historically marginalized. We will encourage learning that advances ethical data science, exposes bias in the way data
science is used, and advances research into fair and responsible data science.

We need your help to create a learning environment that supports a diversity of thoughts, perspectives, and experiences, and honors
your identities (including but not limited to race, gender, class, sexuality, religion, ability, etc.). To help accomplish this:

• If you have a name and/or set of pronouns that differ from those in your official Harvard records, please let us know!

• If you feel like your performance in the class is being impacted by your experiences outside of class, please do not hesitate to
come and talk with us. We want to be a resource for you. Remember that you can also submit anonymous feedback (which will
lead to us making a general announcement to the class, if necessary, to address your concerns). If you prefer to speak with
someone outside of the course, you may find helpful resources at the Harvard Office of Diversity and Inclusion.

• We (like many people) are still learning about diverse perspectives and identities. If something was said in class (by anyone) that
made you feel uncomfortable, please talk to us about it.

• As a participant in course discussions, you are expected to respect your classmates’ diverse backgrounds and perspectives.

Our course will discuss diversity, inclusion, and ethics in data science. Please contact us (in person or electronically) or submit
anonymous feedback if you have any suggestions for how we can improve.
Copyright 2018 © Institute for Applied Computational Science