Data Analytics-Syllabus-Spring 2020
Software: R or Python
‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
Course Description
We live in a world saturated with information, and big data is everywhere. With the
rapid evolution of web technology and mobile use, people are increasingly
enthusiastic about interacting, communicating, and sharing with each other
through social platforms and media. In recent years, this collective
intelligence has spread to many domains, particularly e‐commerce, healthcare,
and social networks, causing the volume of user‐generated data to grow
exponentially. Extracting knowledge from such a large amount of unstructured,
dynamically changing data is a challenging task. Typical data include social
comments from Facebook, Twitter, and other popular social platforms, online
customer reviews, shopping transaction records, mobile messages, financial news,
and climate data. In the transportation field, mobile devices such as GPS units and
smartphone apps make it possible to track vehicle traces, and traffic surveillance
data such as speeds and link counts also generate data in large volumes.
However, the methods, models, and algorithms used in the transportation field to
mine and explore data, from traffic estimation, prediction, and validation to
transportation theories and models, may not perform well under these new
conditions. The same issue exists in other fields.
‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
Assignments
There are two homework assignments, drawn mainly from the lectures. They will
cover topics such as basic data visualization, decision trees, k‐Means, and text mining
or social network analysis. These assignments will help you understand the concepts
and ideas you have learned in the lectures. You need to submit a report together with
your code.
Plagiarism Policy: In a programming course, a few people inevitably submit
homework that they did not code themselves. Please keep in mind that copied
programs are not hard to detect, even when a program has been modified to hide its
source. Copying a program, or letting someone else copy your program, is a form of
academic dishonesty and the penalties can be found here.
Late Assignment Policy: The penalty for a late submission is 50% off the grade of the
project or assignment concerned.
‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
Project
There will be one class project per group. Each group may have at most two
members. Each group will be assigned a case with real data and real‐world problems.
Groups may also use existing online datasets or download their own datasets from
online resources such as Facebook, Twitter, or Yelp. We expect each group to produce
a technical report that presents interesting findings obtained by running existing big
data analysis algorithms. We encourage each group or student to use a dataset from
their own field. You need to submit a detailed technical report along with the source code.
‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
Grading
Your final grade will be composed of the following items:
Attendance: 2% * 10 = 20%
Sometimes I will pose open questions for the next lecture, and you
will get something to read or think about in advance. Please be prepared
for a one‐ or two‐minute in‐class presentation. Depending on the time, I
may randomly ask some students to present their findings.
Assignments: 20% * 2 = 40%
Final project: 40% * 1 = 40%
Letter grades are assigned as follows:
Letter Grade    Percentage
A               100 – 90
A‐              89 – 85
B+              84 – 80
B               79 – 75
B‐              74 – 70
C+              69 – 65
C               64 – 60
F               Below 60
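As an illustration only, the weighting and the letter‐grade table above can be sketched in Python. The function and variable names are hypothetical, and two points are assumptions not stated in this syllabus: attendance is treated as ten checks worth 2 points each, and a percentage that falls between two bands (for example 89.5) is given the lower band, with no rounding.

```python
# Weights come from the Grading section; all names here are hypothetical.
ATTENDANCE_POINTS_EACH = 2    # 2% * 10 = 20%
ASSIGNMENT_WEIGHT = 20        # 20% * 2 = 40%
PROJECT_WEIGHT = 40           # 40% * 1 = 40%

# Lower bound of each band in the letter-grade table, highest first.
LETTER_CUTOFFS = [
    (90, "A"), (85, "A-"), (80, "B+"), (75, "B"),
    (70, "B-"), (65, "C+"), (60, "C"),
]


def final_percentage(attended, assignment_scores, project_score):
    """Combine the three graded items into a final percentage.

    attended: number of attendance checks passed, 0 to 10 (assumption).
    assignment_scores: two scores, each on a 0-100 scale.
    project_score: one score on a 0-100 scale.
    """
    if not 0 <= attended <= 10:
        raise ValueError("attended must be between 0 and 10")
    if len(assignment_scores) != 2:
        raise ValueError("exactly two assignment scores are expected")
    weighted = sum(ASSIGNMENT_WEIGHT * s for s in assignment_scores)
    weighted += PROJECT_WEIGHT * project_score
    return ATTENDANCE_POINTS_EACH * attended + weighted / 100


def letter_grade(percentage):
    """Map a final percentage to a letter using the table's lower bounds."""
    for cutoff, letter in LETTER_CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"


# Full attendance, assignments scored 90 and 80, project scored 85.
total = final_percentage(10, [90, 80], 85)   # 88.0
print(total, letter_grade(total))            # 88.0 A-
```

The late penalty is left out of the sketch on purpose, since how it combines with the item weights is not spelled out here.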
‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐
Office Hours, E‐mail
I am on campus for most of the day, and you are welcome to come in anytime if
you have any questions. Office visits are not limited to my regular office hours, but
an appointment by email is preferred outside of those hours. Even for my regular
office hours, it would be great if you could send me an email to confirm, in case I
have a conflict. Email is a good way to communicate with me since I usually
answer messages within one day of receiving them.