Data Analytics Applications Syllabus


Data Analytics Applications

DA 6813 Summer 2019

Instructor: Prof. Ashwin Malshe


Class Timing: TR 8:00 pm – 10:00 pm
Classroom: BB 3.03.02
Office: BB 4.06.20
Office Hours: By appointment
Email: [email protected]

Reply policy - I will try to respond to emails within 24 hours. If you do not receive a reply to
your email within a reasonable period of time, please email again. Sometimes emails are
caught by a spam filter, are addressed incorrectly, or are never actually sent.

Course Description – This course is designed to showcase various data analytics
applications. As such, it brings together various concepts from R programming, data
visualization, statistics, and machine learning that have been covered in the previous MSDA
courses. The applications discussed in this course cover business management areas such as
1) marketing, 2) finance, 3) social media, and 4) operations. The course primarily focuses on
practicing data wrangling skills, building predictive models, and evaluating model
performance.

Course Resources

Needs     Name                          Available at
Textbook  Data Analytics Applications   https://ashgreat.github.io/analyticsAppBook/
Manual    caret Package Documentation   https://topepo.github.io/caret/index.html
Vignette  Twitter authentication        https://rtweet.info/articles/auth.html
Data      Datasets GitHub repo          https://github.com/ashgreat/datasets
R Code    Code GitHub repo              https://github.com/ashgreat/DA6813

Instructions for installing R and RStudio and for creating RStudio projects are provided
below.

Instructions for installing R and RStudio: https://www.ashwinmalshe.com/post/installr/

Instructions for creating RStudio projects: https://www.ashwinmalshe.com/post/using-rstudio-projects/

Topics

Please see the textbook for the topics to be covered.

Course Evaluation

Evaluation and Grading


I reserve the right to curve the scale depending on overall class scores at the end of the
semester. Any curve will only ever make it easier to obtain a certain letter grade. The course
grade weights the assessments in the following proportions:

• 60% Group Project Report
• 10% Group Project Concept Presentation
• 30% Group Project Final Presentation

The following grading scale will be used:


A = 90 - 100%, B = 80 - 89%, C = 70 - 79%, D = 60 - 69%, F = below 60%
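
As a worked example of the weighting and scale above, the following Python sketch combines the three component scores into a course percentage and maps it to a letter grade. It is an illustration only: the course itself uses R, the function names and sample scores are hypothetical, and the sketch assumes no curve and no rounding of fractional percentages, neither of which the syllabus specifies.

```python
# Illustrative only: weights and cutoffs come from the syllabus;
# function names and sample scores are hypothetical.
WEIGHTS = {
    "report": 0.60,
    "concept_presentation": 0.10,
    "final_presentation": 0.30,
}


def course_percentage(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a percentage to a letter; assumes no curve and no rounding."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percentage >= cutoff:
            return letter
    return "F"


example = {"report": 85, "concept_presentation": 90, "final_presentation": 80}
total = course_percentage(example)
print(round(total, 1), letter_grade(total))
```

With these sample scores the weighted total is 0.60 × 85 + 0.10 × 90 + 0.30 × 80 = 84%, which falls in the B range.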

Group Project

The course grade is based entirely on a group project. Each group will have 4-5 students.
The groups will select their own project topics. Each project involves the following steps:

1. Identify a business problem from marketing, operations, finance, or any other field
2. Determine the data needed to tackle this problem, including its availability
3. Identify the method to analyze data
4. Perform the analysis
5. Get the results in presentable form
6. Provide a business solution

Group Project Submission Guidelines

Project report (60% Grade)

Your project report should cover at least the following topics and subtopics. You are free to
add more if you see fit.

1. A compelling argument for the need to study the topic (Relevance)
• Why should anyone care?
• Who will benefit from this?
2. Thorough review of state of the art (Novelty)
• This is an important part of the project report and will carry significant weight.
• What has been done to address similar issues in the past?
• What has been done to address this specific issue in the past?
• How do you plan to differ?

3. Data description
• Describe the data, source of the data, etc.
• Describe your dependent variable
• Describe your independent variables
• What’s the unit of analysis? (firm, individual, real estate, movie, etc.)
• Does your data set have multiple observations per unit of analysis?
• Do you have data that has observations across different time periods?
4. Data analysis
• Did you transform any variables? How and why?
• Did you have to restructure the data before performing the analysis?
• What did you do about the missing values?
• What analysis technique did you use, and why did you use it?
• Describe your hyperparameter tuning if you used any machine learning models
• Describe the model selection process including assessment of accuracy of the model
5. Results
• Present your results using tables and plots
• Describe your results in words
• Discuss your results
• Did you find anything unusual/unexpected?
• Can you explain that?
• Did you get good accuracy out-of-sample if this was a predictive task?
6. Appendix
• Appendix should have all the relevant information such as the data characteristics
(univariate and bivariate stats), graphs, etc.
• You will need to submit the code for your data cleaning and analysis.

Submission format
The final report, excluding the appendix, should be 15-20 pages long, with 20 pages as the
maximum. The report must be submitted as a PDF file. The appendix can be submitted as a PDF
or HTML file, since you will use R Markdown to generate it. Note that the appendix is required.
Finally, all the data and R code must be submitted separately as a zip archive.

Grading rubric for the project report

                                          Score
Component                         Weight  Excellent  Good  Average  Poor
Novelty of the project topic      10%     100%       80%   70%      60%
Relevance of the project topic    10%     100%       80%   70%      60%
Exploratory data analysis         10%     100%       80%   70%      60%
Modeling rigor                    20%     100%       80%   70%      60%
Modeling performance              20%     100%       80%   70%      60%
Discussion of results             10%     100%       80%   70%      60%
Interpretation of the findings    10%     100%       80%   70%      60%
Managerial implications           10%     100%       80%   70%      60%
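
To illustrate how the rubric could translate into a report score, here is a short Python sketch (illustration only; the course itself uses R). The weights and level scores are taken from the table above. The ratings are hypothetical, and the sketch assumes that each component receives exactly one of the four levels and that the report score is the weighted sum, which the syllabus implies but does not state outright.

```python
# Weights and level scores come from the rubric table above.
# The ratings below are hypothetical, and the weighted-sum rule is an assumption.
RUBRIC_WEIGHTS = {
    "Novelty of the project topic": 0.10,
    "Relevance of the project topic": 0.10,
    "Exploratory data analysis": 0.10,
    "Modeling rigor": 0.20,
    "Modeling performance": 0.20,
    "Discussion of results": 0.10,
    "Interpretation of the findings": 0.10,
    "Managerial implications": 0.10,
}
LEVEL_SCORES = {"Excellent": 100, "Good": 80, "Average": 70, "Poor": 60}


def report_score(ratings):
    """Weighted sum of level scores, on a 0-100 scale."""
    return sum(
        weight * LEVEL_SCORES[ratings[component]]
        for component, weight in RUBRIC_WEIGHTS.items()
    )


# Hypothetical ratings: "Good" everywhere except two components.
ratings = {component: "Good" for component in RUBRIC_WEIGHTS}
ratings["Novelty of the project topic"] = "Average"
ratings["Modeling rigor"] = "Excellent"

print(round(report_score(ratings), 1))
```

Under these assumptions, the lowest score a submitted report can receive is 60 (all components rated Poor), and the two modeling components together account for 40% of the report score.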

Description of the grading components

Novelty of the project topic: This can also mean the originality of the project topic. You are
free to work on a topic that someone else online has already dealt with. But you will not get
100% for the novelty in that case. If you use an existing Kaggle challenge data but use it to
address a question that was not a part of the challenge, you should score high on novelty.

Relevance of the project topic: Novelty alone is not enough. The project topic also has to be
relevant, preferably to businesses, individuals, or policymakers, as we are all business school
professors. Sometimes students pick controversial topics such as politics. Such topics are
interesting but may not have any business relevance.

Exploratory data analysis: In addition to getting summary statistics, this will also cover
preprocessing variables—imputing missing values, centering and scaling, dummy variables,
text cleaning, etc.

Modeling rigor: This will cover choice of the model among many alternatives and
hyperparameter tuning for each model.

Modeling performance: Performance assessment of the model in-sample and out-of-sample.
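
The distinction between in-sample and out-of-sample performance can be sketched as follows. This is a minimal illustration in plain Python on synthetic data, not the course's method: the course uses R and the caret package, and the data, seed, 80/20 split, and choice of RMSE as the metric are all assumptions made for the example.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(xs, ys, intercept, slope):
    """Root mean squared error of the fitted line on the given data."""
    squared_errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic data (an assumption for illustration): y = 2 + 0.5 * x + noise.
random.seed(6813)  # arbitrary seed so the example is reproducible
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + random.gauss(0, 1) for x in xs]

# Hold out the last 20% as a test set; the rows are already in random order.
split = int(0.8 * len(xs))
train_x, test_x = xs[:split], xs[split:]
train_y, test_y = ys[:split], ys[split:]

# Fit on the training data only, then evaluate on both sets.
intercept, slope = fit_line(train_x, train_y)
in_sample_rmse = rmse(train_x, train_y, intercept, slope)
out_of_sample_rmse = rmse(test_x, test_y, intercept, slope)

print(f"in-sample RMSE:     {in_sample_rmse:.3f}")
print(f"out-of-sample RMSE: {out_of_sample_rmse:.3f}")
```

The model never sees the held-out rows during fitting, so the out-of-sample error is the fairer estimate of how it would perform on new data. A large gap between the two numbers usually signals overfitting.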

Discussion of results: Describe your findings in words.

Interpretation of the findings: Is there anything interesting, unexpected, etc. that you found?
How do you explain these results? What is your rationale for the findings?

Managerial implications: What do you suggest managers do differently because of your
results?

Project concept presentation (10% Grade)

On June 20, 2019, all groups will present their project concepts. The grade is based on the
level of detail each group provides on the following aspects:

1. The business problem
2. Why is the business problem unique?
3. Why is the business problem relevant?
4. Sources of data
5. The roadmap for analysis

Final presentation (30% Grade)

The final presentations will be held on July 30 and August 1.

The grade will be assigned based on the following criteria:

1. Presentation quality (clarity, timeliness, conciseness, etc.) – 40%
2. Rigor of the analysis including data gathering, wrangling, and modeling – 20%
3. Actionable insights – 40%

Each group will get 15 minutes to present their work.

Course Policies

Late Submission Policy


A 20% penalty applies to a project report turned in 1 day late. After a 1-day delay, you will
be given a zero for the assignment. Email me early if you need an extension. Do not wait until
the last minute.

Policy on Cheating

Students are expected to be above reproach in scholastic activities. Students who violate
University rules on scholastic dishonesty are subject to disciplinary penalties, including the
possibility of failure in the course and dismissal from the University. "Scholastic dishonesty
includes, but is not limited to, cheating, plagiarism, collusion, the submission for credit of any
work or materials that are attributable in whole or in part to another person, taking an exam
for another person, any act designed to give unfair advantage to a student or the attempt to
commit such acts." From the University of Texas System Rules and Regulations of the Board
of Regents, Rule: 50101. (www.utsystem.edu/BOR/rules.htm).

Right to Privacy

Except under the specific exceptions provided in the Family Educational Rights and Privacy Act of
1974, I will not give information concerning your grades, academic progress, attendance,
address, phone, or e-mail to anyone outside the UTSA system unless you give your prior
written permission. In addition, I will not give or discuss grade information over the phone or
by e-mail.

Special Needs

If you feel that you are eligible for or may be helped by accommodations in the class due to a
disability or special need, contact the Office of Disability Services (ODS). Students with
disabilities must be registered with the ODS located in MS 2.03.18 (458 4157 – voice; 458 4981
– TTY) or UTSA Downtown in FS 1.526 (458-2816), in order to receive support services. To see
if you are eligible for these services and privileges, visit the website below:
http://www.utsa.edu/disability/studeligibility.htm

Computing Assistance

Please email [email protected] (Director of Computing) and [email protected]
(System Analyst) for assistance.
