Welcome To DS4A Colombia: Projects
Curriculum Forum
Curriculum Outline
Projects Overview
Learning Resources
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
PROJECTS
Datathon Project ▼
Project Work
INTRODUCTORY DECK
PROBLEM STATEMENT
DATASETS
Final Project ▼
Final Project Material: You should also download the Projects Overview
document from the section above.
Project Work
https://ds4a-colombia.correlation1.com/lesson-plan 1/22
20/1/2020 DS4A
RESOURCES
Curriculum Forum
WEEK 1
Introduction
READING
INTRODUCTION
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 1.2: How are trading volume and volatility related for energy stocks? ▼
You are an analyst at a large bank focused on natural resource stock
investments. This case begins with a brief overview of some data, after which
you will: (1) learn how to use the Python library pandas to load the data; (2)
use pandas to transform this data into a form amenable to analysis; and finally
(3) use pandas to analyze the above question and come to a conclusion. As
you may have guessed, pandas is an enormously useful library for data
analysis and manipulation.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
You have recently joined the data science division of a multinational bank.
Over the past month you've been working with a variety of stock data and are
looking to gather fundamental data on a select group of energy stocks. The
firm has no experience doing this in an automated fashion, instead relying on
time-consuming manual labor up to this point.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 2.1: Should we develop a commercial SNAP test for predicting recovery ▼
from spinal cord injuries?
You are a consultant for a pharmaceutical company. They would like you to
answer the following: "How well do SNAP tests predict six-month recovery
rates and should they be commercially developed?"
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 2.2: How do users engage with a mobile app for automobiles? ▼
You are a data scientist for a large luxury automobile company. Your company
wants you to uncover behavioral patterns of the users who engage with a
mobile app. They believe that if you can find discernible patterns, your
company can leverage those insights to give users incentives to use the app
more frequently.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 1.4: How are returns and volatility related for energy stocks and the ▼
broader market?
You recently conducted an analysis of several energy stocks and how their
trading volume is related to their volatility. Your boss was quite pleased with
your previous analysis, and now wants you to conduct additional analysis so
he can figure out how to size potential positions in these stocks... i.e. what
percentage of the investment portfolio should be dedicated to each of these
stocks. Specifically, he wants you to look at daily returns and volatility for
each stock as well as for the broader market (i.e. not just the energy sector).
Data Transformation
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 1.5: What patterns exist between energy consumption and generation? ▼
Energy supply and demand is a hotly debated topic across world governments
and political parties. You are an analyst for a new nuclear power plant firm,
and are responsible for discerning patterns in electric power generation and
consumption across different energy sources as well as across sectors of the
U.S. economy in order to help drive business strategy.
CLASS VIDEO
The city of New York has seen a rise in the number of accidents on the roads
in the city. They would like to know if the number of accidents has increased
in the last few weeks. Your task is to format the given data and provide
visualizations that would answer the specific questions the client has.
Python basics Web Scraping Data Transformation Data Visualizations with Python
DOWNLOAD CASE
CASE ANSWERS
WEEK 2
FICO Case: Is it sound practice to use FICO credit scores to evaluate credit ▼
risk?
Lenders, such as banks and credit card companies, use credit scores to
evaluate the potential risk posed by lending money to consumers and to
mitigate losses due to bad debt. You are an analyst for a large loan agency,
and your role is to help your company make decisions on whether or not to
approve a loan.
Additional Case
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
What information do I have? What information do I need? Do I have enough
information? These are crucial questions that all data scientists must ask
themselves before beginning any analysis. In this case, you will be furthering
your skills in gathering information and assessing information sufficiency.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Airbnb has rolled out a new service to help listers set prices. Airbnb makes a
percentage commission off of the listings, so they are incentivized to help
listers price optimally; that is, at the maximum possible point where they will
still close a deal. You are an Airbnb consultant helping with this new pricing
service.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Understanding crop production and yield patterns over seasons, years, and
different geographies may yield insights that can improve resilience to food
shortages and, ultimately, improve international food security. You are a
researcher for a think tank that is interested in proposing scientific
investigations and policy solutions based on your insights.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 4.4: How do we deploy the Chicago police force to efficiently fight ▼
crime?
So, you found a dataset available to the Chicago PD from 2017 with
information on crimes committed throughout the city. In this case, we will
focus on exploratory analysis to construct some preliminary strategies for
police deployment. These strategies can be further consolidated or dismissed
using more rigorous statistical analysis.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Companies generally like to retain their employees, since hiring and training
new employees is costly and risky. The rate at which employees are quitting is
called the turnover rate, and the high cost of employee turnover has been
pointed out extensively in management literature. You work in your company’s
new talent analytics department. Your company has thousands of employees
and is interested in reducing the time spent recruiting and hiring new
employees to replace the people that quit.
DOWNLOAD CASE
CASE ANSWERS
WEEK 3
You are a data analyst at a large financial services firm that sells a diverse
portfolio of products. In order to make these sales, the firm relies on a call
center where sales agents make calls to current as well as prospective
customers. The company would like you to dive into their data to devise
strategies to increase their revenue or reduce their costs. Specifically, they
would like to double down on their most reliable customers, and to cut out
sales agents who are not producing outcomes.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 12.2: How can we build a company database to handle product sales ▼
end-to-end?
You are a data analyst for the same large financial services firm as in the
previous case. The firm was pleased with your analysis and now sees the
value of having databases that can easily be queried using SQL. It would
therefore like to move its data, which is currently stored as CSV files, into a
database.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
DOWNLOAD CASE
CLASS VIDEO
Case 12.3: Analyzing Net Promoter Score (NPS) data with SQL ▼
You are a data scientist at a new but fast-growing startup. The startup
released its first product 12 months ago and has been tracking Net Promoter
Score (NPS) over its growing customer base since the product's launch. The
startup wants you to investigate the data and answer the following
question: "Has our NPS improved over time? And has our average NPS
decreased in specific periods over the last 12 months?"
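As a rough illustration of the question above, the sketch below computes NPS per month in plain Python. It assumes the common NPS convention (scores on a 0-10 scale, promoters scoring 9-10, detractors scoring 0-6); the case itself works in SQL and may define things differently. The function names and the sample responses are hypothetical.

```python
from collections import defaultdict


def nps(scores):
    """Net Promoter Score: percentage of promoters minus percentage of detractors.

    Assumes the common convention: promoters score 9-10, detractors score 0-6.
    """
    if not scores:
        raise ValueError("need at least one score")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def nps_by_month(responses):
    """Group (month, score) pairs by month and compute the NPS of each month."""
    by_month = defaultdict(list)
    for month, score in responses:
        by_month[month].append(score)
    return {month: nps(scores) for month, scores in sorted(by_month.items())}


# Hypothetical survey responses: (month, score on a 0-10 scale).
responses = [
    ("2019-01", 10), ("2019-01", 6), ("2019-01", 8), ("2019-01", 3),
    ("2019-02", 9), ("2019-02", 9), ("2019-02", 7), ("2019-02", 5),
]

monthly = nps_by_month(responses)
for month, value in monthly.items():
    print(month, value)
```

Comparing consecutive monthly values is one simple way to see whether NPS rose or dipped in a given period.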
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Week 3 Extended Case: How are different customer cohorts affecting our NPS ▼
ratings over time?
You are a data scientist for the same fast-growing startup as in the previous case.
You presented your analysis to the executive team of the startup, and now
they want to dig deeper into various customer segments. They want to zoom
in on specific interesting customers, such as those who have left a wide range
of scores over time.
DOWNLOAD CASE
CASE ANSWERS
WEEK 4
AWS S3 Dash
ENVIRONMENT
PRECASE
Case 5.1: How do we prepare data for use with an analytics platform? ▼
The company has hundreds of thousands of users and has been collecting
data about trips taken on each of their bikes. Since the dataset collected is
quite large and increasing by the day, they have subscribed to a new analytics
platform which gives them information and insights when they feed their trips
data into it. However, the analytics platform requires the collected data to be
cleaned and converted into a certain format, for which the client requires your
help.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 4.1: How does my company's sales data compare across different states ▼
over time?
Your company has stores all over the United States. The company has
collected data consisting of line-level order information from all its stores. The
company wishes to compare sales data across different months and
geographies and make this information available to executive members and
key shareholders. If each person has access to an interactive dashboard, it can
offer a foundation for further dialogue and great decision making.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Yelp is a very popular website where anyone can write a review about
restaurants, hotels, spas or any place that runs a business. They have decided
to start analysing the data they have to add new features to their website
based on the analysis.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 4.2: How are different types of crimes distributed by district across ▼
time in Boston?
You are a data consultant for the Boston Police Department. The department
is looking to optimize its police deployment strategy so that it can tackle the
most crimes as they occur with the fewest number of resources.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
DOWNLOAD CASE
CASE ANSWERS
WEEK 5
Environment Setup
ENVIRONMENT
Case 7.1: Do there exist significant differences between the balances of my ▼
various customers' cohorts?
You are leading a business analytics unit in a bank and have been asked to
support the marketing unit in conducting a customer segmentation analysis. You
are provided with a dataset comprising a sample of customers, their bank
account balances, and some demographic information about them.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Previously, you investigated crime data for the Chicago police department,
and discovered many potential factors that could be associated with crime
incidents. Now, the police department wants you to finalize your report to
them so that they can start implementing some strategies based on your
findings. However, because deploying a new strategy is resource intensive,
they want you to confirm that the patterns you observed are not merely due to
randomness.
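One possible way to check whether an observed difference could be due to randomness is a permutation test, sketched below in plain Python. This is only an illustration under stated assumptions: the case may use a different statistical test, and the function name and the daily incident counts here are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Repeatedly shuffles the pooled observations, splits them into two groups
    of the original sizes, and counts how often the shuffled difference in
    means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical daily incident counts for two kinds of days.
weekend_counts = [30, 32, 31, 29, 33, 30]
weekday_counts = [20, 21, 19, 22, 20, 21]

p_value = permutation_p_value(weekend_counts, weekday_counts)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests the difference between the two groups is unlikely to arise from random relabeling alone; it does not by itself establish a cause.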
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 7.3: How can I experiment with online travel deals to attract more ▼
customers?
A/B Testing
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 5.3: How do I pre-process text data from Yelp reviews so I can analyze ▼
it?
You are a business consultant for small and medium-sized businesses with a
large number of customers. You would like to help your small businesses
understand what factors are driving positive and negative customer
experiences.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Week 5 Extended Case: What are the most important factors driving negative ▼
reviews?
CASE ANSWERS
WEEK 6
Environment Setup
ENVIRONMENT
Case 6.1: Is there pay discrimination between men and women in your ▼
organization?
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
You have been hired as a data scientist by a large real estate company in their
Seattle office. Your job is to help Seattle residents who want to sell their
homes determine an optimal selling price that maximizes their proceeds while
still attracting willing buyers.
DOWNLOAD CASE
CASE ANSWERS
You are a property developer who frequently buys properties. It would be very
useful to get a fair estimate of the price of a property before seeing the
asking price, based on features like its size and location. Your task is to build
a model to predict property prices in the city of Milwaukee, Wisconsin.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
You are the same property developer from the previous case, with the same
goal. Although the previous model you built was a good start, it did not
incorporate all the variables you wished to include, and you are skeptical of
how well it might work on data that it was not trained on.
Collinearity Regression
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Week 6 Extended Case: How can I price an insurance quote across the states ▼
of the US?
You are the chief data scientist in a large insurance company and you are
tasked with building an accurate predictive model to understand what factors
affect the claim amount.
Regression Modeling
DOWNLOAD CASE
CASE ANSWERS
WEEK 7
Environment Setup
ENVIRONMENT
Case 9.1: How important is the income source of an online loan applicant? ▼
As a data scientist at an emerging P2P lending company, you must answer the
following question: "Should the company verify the income source of an
online loan applicant before approving their loan?"
Logistic Regression
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 11.1: How can I compare different models that predict the probability ▼
of defaulting on a loan?
Cross Validation
CASE ANSWERS
CLASS VIDEO
Case 9.2: How do I build a prediction model for Lending Club loan defaults? ▼
The biggest question for every P2P lending company is whether a user will
default or not. Your task is to build a classification model for determining
whether a user will default on their loan or not.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
You are working for the fraud team at a large insurance company. Issued
policies are tracked, and any filed claims are examined and evaluated to
determine legitimacy and final approval for payout by the
insurance company. It is the role of the fraud team to determine which filed
claims should be approved and which should be denied.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Week 7 Extended Case: Does a job training program improve the earnings of ▼
disadvantaged workers?
In this case we will continue our discussion of causal inference. We will study
the importance of covariate balance and explore how to perform matching to
get this balance. We will leverage a lot of the new classification models we
have learned for this purpose.
Causal Inference
DOWNLOAD CASE
CASE ANSWERS
WEEK 8
Environment Setup
ENVIRONMENT
L1/L2 Regularization
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 17.2: How can we predict the sentiment associated with a customer ▼
interaction?
system (1 being least satisfied and 5 being most satisfied). You also have a
customer support team which interacts with customers over call and
messaging services. Your task is to build models which can identify the
sentiment (positive or negative) of each of these non-rated interactions.
Sentiment Analysis Natural Language Processing
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Sampling
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 18.1: How do I predict the fair market prices of used cars? ▼
There are many companies selling used (refurbished) cars across the United
States. As automobiles depreciate in value as they age, this is an extremely
competitive industry, and we have to price the car right in order to win
business. You are a data scientist tasked with building a predictive model for
the prices of used car sales around the country.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Case 18.2: How does computer vision work? ▼
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
WEEK 9
Environment Setup
WEEK 8 ENV
Many social networks analyze the communications that run through them, and
would be interested in classifying them according to whether they exhibit
positive or negative sentiment.
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
You want to launch promotions on your website. However, targeting users
effectively is a complex task, and you would like to build a system that can
handle this for you.
Reinforcement Learning
DOWNLOAD CASE
CASE ANSWERS
CLASS VIDEO
Guide
DOWNLOAD NOW
WEEK 10
Please use this Google Form to submit your final project. Please add the report
and your project files to a zip folder and submit. Only one submission per
team should be made. Deadline: 10th December 2019, 11:59 PM
Final Project
SUBMIT
Please use this Google Form to submit your final project presentation. Only
one submission per team should be made. Deadline: 10th December 2019,
11:59 PM
SUBMIT
Please use this Google Form to submit your datathon project report. Please
add the report and your project files to a zip folder and submit. Only one
submission per team should be made.
Datathon
SUBMIT