The Data Incubator Data Science RFI Packet
Information
About The Data Incubator
Career Services
Resume polishing, interview prep
Placement Services
Network with our hiring partners and appear in our resume book
Priority Enrollment Program (Full-Time Only): Up to $2,000 off tuition
Data Wrangling
Machine Learning Basics
SQL
Visualization
Python
pandas
Spark
TensorFlow
Tableau
Advanced Machine Learning
Data + Business
Data Communication
Scikit-learn
AWS
NoSQL
Data Pipelines
Production Systems
The Data Incubator
Data Science Fellowship
The Data Incubator Data Science Program is an immersive, hands-on data science bootcamp for those with a passion for data
looking to take the business world by storm. Designed by expert data scientists with feedback from industry partners, TDI's data
program helps you master in-demand skills so you can launch your career in data science. You'll work on projects that showcase
your data science skills, using real-world data to solve real business problems, and you'll use a public data set to build a
functional capstone project that shows potential employers the skills you've mastered.
Program Outline
Get hands-on, business-focused data science training from expert instructors, and career services to help you find your
dream job in data. The part-time program takes place in the evenings twice a week from 7:00 PM ET to 9:30 PM ET. Expect
to spend 10 hours per week working outside of class. The full-time program lasts for 8 weeks, Monday-Friday, from 9:00
AM ET to 5:00 PM ET. Expect to spend 40 hours per week on the program. Office hours are available each week.
Modules
1. Data Wrangling: Acquire and manipulate data in Python with the foundational tools of data science.
2. Machine Learning: Master the basics of machine learning while building and training different types of models.
3. SQL: Access databases using SQL interfaces and programs in a professional environment.
4. Visualization: Present data visually for both technical and non-technical audiences.
5. Advanced Machine Learning: Learn more advanced machine learning topics and techniques, including unstructured data and time series.
6. Thinking Beyond the Data: Understand how data science interacts with business and business concerns.
7. Spark: Distribute computations across multiple computers, such as a cluster or the cloud, using PySpark.
8. TensorFlow: Build and train neural networks using TensorFlow. Understand both theoretical and practical applications.
Career Services
To help you find your first (or next) job as a data scientist, we include a robust career services program.
Resume Review
Our career experts review your resume and cover letter to position
your experience in the right light to appeal to employers.
Interview Coaching
Work closely with our career experts to hone your interviewing skills
so you can put your best foot forward during every interview.
Hiring Partners
We work closely with a number of organizations in a wide variety of
industries to place our candidates in their exciting open positions.
Data Science
Fellowship Curriculum
THANK YOU FOR YOUR INTEREST IN THE DATA INCUBATOR! We are the innovative fellowship program for up-and-coming data professionals. Below you'll find our Data Science Fellowship Curriculum and more about what sets us apart from other fellowship programs.
DATA WRANGLING
MODULE 1: Students learn how to acquire and manipulate data in Python with the foundational tools of data science.
The first step of data science is mastering the computational foundations on which data science is built. We cover the
fundamental programming topics relevant for data science, including pandas, NumPy, SciPy, Matplotlib, regular
expressions, SQL, JSON, XML, checkpointing, and web scraping, which together form the core toolkit for handling structured
and unstructured data in Python. Students gain practical experience manipulating messy, real-world data using these
libraries. They also walk away with a firm understanding of tools like pip, git, Python, Jupyter notebooks, pdb, and unit
testing that leverage existing open-source packages to accelerate data exploration, development, debugging, and
collaboration.
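To give a flavor of this kind of wrangling, here is a minimal pandas sketch; the file name and column names are hypothetical, chosen only for illustration.

    import pandas as pd

    # Hypothetical CSV of restaurant inspections (file and columns are illustrative)
    df = pd.read_csv("inspections.csv", parse_dates=["inspection_date"])

    # Typical cleanup: drop exact duplicates, fill missing scores with the median,
    # and pull a five-digit ZIP code out of a free-text address with a regex
    df = df.drop_duplicates()
    df["score"] = df["score"].fillna(df["score"].median())
    df["zip_code"] = df["address"].str.extract(r"(\d{5})", expand=False)

    # Aggregate: average inspection score per ZIP code
    summary = df.groupby("zip_code")["score"].mean().sort_values(ascending=False)
    print(summary.head())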
INTRODUCTION TO MACHINE LEARNING
MODULE 2: Students learn the basics of machine learning while building and training different types of models.
In a world with abundant data, leveraging machines to learn valuable patterns from structured data can be extremely
powerful. We explore the basics of machine learning, discussing concepts like regression, classification, model
evaluation metrics, overfitting, variance versus bias, linear regression, ensemble methods, model selection, and
hyperparameter optimization. Through powerful packages such as Scikit-learn, students come away with a strong
understanding of core concepts in machine learning as well as the ability to efficiently train and benchmark accurate
predictive models. They gain hands-on experience building complex ETL pipelines to handle data in a variety of formats,
developing models with tools like feature unions and pipelines to reduce duplicate work, and practicing tricks like
parallelization to speed up prototyping and development.
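As a rough sketch of the pipeline and hyperparameter-search workflow described above, using scikit-learn's small built-in diabetes data set rather than course data:

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A Pipeline keeps preprocessing and the model together, so the same steps
    # are applied consistently during cross-validation and prediction
    pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])

    # Cross-validated grid search for hyperparameter optimization;
    # n_jobs=-1 parallelizes the search across available cores
    grid = GridSearchCV(pipe, {"model__alpha": [0.1, 1.0, 10.0]}, cv=5, n_jobs=-1)
    grid.fit(X_train, y_train)
    print(grid.best_params_, grid.score(X_test, y_test))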
SQL
MODULE 3: Students learn to access databases using SQL interfaces and programs in a professional environment.
Most data is stored in databases, and they have to be accessed through interfaces. The most common one is SQL, a
declarative language that lets us tell the database which data we want and how to present it. We cover the basics of the
language itself and some of the Python tools related to it; a short SQLAlchemy sketch follows the topic list below.
- Advanced SQL
  – Creating Tables
  – Database connectors
  – Temporary tables and views
  – SQLAlchemy
  – A note on SQL flavors
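A minimal sketch of the SQLAlchemy connector pattern listed above, using a throwaway in-memory SQLite database; the table and values are made up for illustration.

    from sqlalchemy import create_engine, text

    engine = create_engine("sqlite:///:memory:")  # a real project would use a server URL

    with engine.connect() as conn:
        conn.execute(text("CREATE TABLE sales (region TEXT, amount REAL)"))
        conn.execute(text("INSERT INTO sales VALUES ('east', 100.0), ('west', 250.0)"))
        # SQL is declarative: we say which data we want and how to present it
        result = conn.execute(text(
            "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"))
        for row in result:
            print(row.region, row.total)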
VISUALIZATION
MODULE 4: Students learn to present data visually for both technical and non-technical audiences.
Data science is about helping humans understand the story behind the data, and visualizations provide a powerful tool
for helping the analyst understand and communicate that story. We discuss the biases and limitations of both visual
and statistical analysis to promote a more holistic approach.
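For instance, a few lines of Matplotlib (one of the plotting libraries introduced earlier) can already tell a simple visual story; the data below are synthetic.

    import numpy as np
    import matplotlib.pyplot as plt

    # Synthetic monthly signal with a seasonal swing and some noise
    rng = np.random.default_rng(0)
    months = np.arange(1, 13)
    values = 100 + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 3, size=12)

    fig, ax = plt.subplots()
    ax.plot(months, values, marker="o")
    ax.set_xlabel("Month")
    ax.set_ylabel("Value (arbitrary units)")
    ax.set_title("A simple seasonal pattern")
    plt.show()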
ADVANCED MACHINE LEARNING
MODULE 5: Students learn more advanced machine learning topics and techniques, including unstructured data and time series.
While machine learning on structured data lays an important foundation, a larger world of analytical opportunities
becomes available through understanding advanced machine learning techniques and how to handle unstructured
data. We explore techniques such as support vector machines, decision trees, random forests, neural nets, clustering,
KMeans, expectation-maximization, time series, and signal processing. Students come away with intuition about the
suitability of different techniques for different problems. In addition to handling structured data, students directly apply
these techniques to large volumes of real-world unstructured data, solving problems in natural language processing
using Word2Vec, bag of words, feature hashing, and topic modeling.
Students will examine methods of dealing with seasonality, as they build models to predict temperatures in several
cities. The training data come from National Weather Service observations and must be cleaned before use.
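One common way to capture that seasonality, sketched here with synthetic data rather than the National Weather Service observations, is to feed sine and cosine features of the annual cycle into an ordinary regression.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Synthetic stand-in for three years of daily temperature observations
    rng = np.random.default_rng(0)
    days = np.arange(3 * 365)
    temps = 15 + 10 * np.sin(2 * np.pi * days / 365.25) + rng.normal(0, 2, size=days.size)

    # Encode the annual cycle as sine/cosine features so a linear model can fit it
    X = np.column_stack([
        np.sin(2 * np.pi * days / 365.25),
        np.cos(2 * np.pi * days / 365.25),
    ])
    model = LinearRegression().fit(X, temps)
    print("R^2 on training data:", model.score(X, temps))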
THINKING BEYOND THE DATA
MODULE 6: Students learn how data science interacts with business and business concerns.
Sometimes the most important question to ask in data science comes from thinking beyond the data itself. We
explore a myriad of topics that affect data science decision making as a whole, and affect the implementation of
data-driven business policies. Important topics include data fidelity, relevance, and the value of additional data. Bias
is a major theme, and students think about how their conclusions are influenced by data collection, external factors,
internal structuring, procedural artifacts, and more. Students gain a broader understanding of how to balance trade-offs
to suit the business problem, such as when to favor accuracy over interpretability and vice versa. We also discuss
more practical engineering considerations like building for prediction speed or robustness, and deploying to different
environments. Students apply this knowledge to case studies that simulate what they would be expected to contribute
as part of a real-world team faced with a business problem.
SPARK
MODULE 7: Students learn to distribute computations across multiple computers, such as a cluster or the cloud, using PySpark.
Spark is a technology at the forefront of distributed computing that offers a more abstract but more powerful API. This
module is taught using the Python API. We cover core concepts of Spark like resilient distributed data sets, memory
caching, actions, transformations, tuning, and optimization. Students get to build functioning applications from end to
end. They apply that knowledge to directly developing, building, and deploying Spark jobs to run on large, real-world data
sets in the cloud (AWS and Google Cloud Platform).
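A minimal PySpark sketch of the transformation/action distinction covered in the module, run locally; the input file name is hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("wordcount").getOrCreate()
    sc = spark.sparkContext

    # Transformations (flatMap, map, reduceByKey) are lazy; nothing runs yet
    lines = sc.textFile("notes.txt")  # hypothetical input file
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    # An action (take) finally triggers the distributed computation
    print(counts.take(10))
    spark.stop()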
TENSORFLOW
MODULE 8: Students learn to build and train neural networks using TensorFlow, gaining both theoretical and practical understanding.
TensorFlow is taking the world of deep learning by storm. We demonstrate its capabilities through its Python and Keras
interfaces and build some simple machine learning models. We give a brief overview of the theory of neural networks,
including convolutional and recurrent layers. Students will practice building and testing these networks in TensorFlow
and Keras, using real-world data. They will come away with both a theoretical and a practical understanding of the
algorithms behind deep learning.
ROBERT SCHROLL
Robert studied squishy physics in Chicago, Amherst, and Santiago, Chile, before uniting his
love of computers, teaching, and making pretty graphs at The Data Incubator. In his free
time, he plays tuba and right field, usually not simultaneously.
DON FOX
Born and raised in deep South Texas, Don studied chemical engineering at MIT and
Cornell where he researched renewable energy systems. Don was attracted to data
science because it is an interdisciplinary field that combines math, statistics, and
computer science to derive insights about processes from data. He enjoys puns, wearing ties,
cardigans, and everything fall. He is a Data Scientist in Residence.
ANA HOCEVAR
Ana obtained her PhD in Physics before becoming a postdoctoral fellow at the Rockefeller
University where she worked on developing and implementing an underwater touchscreen
for dolphins. Now she combines her love for coding and teaching as a Data Scientist in
Residence. She spends her free time doing pottery, sometimes climbing, and every now
and then scuba diving.
RUSSELL MARTIN
Russ was born in TN, grew up in NY, and got his PhD in Applied Mathematics from Georgia
Tech. After that he lived and worked for seventeen years in the United Kingdom, including at
Warwick University and the University of Liverpool. In his spare time, Russ reads all sorts
of science-y things he probably doesn’t really understand and plays board games.
RICHARD OTT
Rich moved from particle physics to data science when he left academia, and is excited
to be joining his interests in data and programming with his love of teaching. In his spare
time, he’s a fan of science, speculative fiction, board games, and hiking.
TDI not only helps students refine basic data skills but also empowers them to take
on larger data tasks with different sets of tools.
Additionally, after completing a program you’ll have a project that you’ve worked
on with the instructor’s guidance from start to finish, and you can use that work to
show companies what kind of skills you can bring to their data team.
What are the prerequisites for acceptance to TDI?
The first requirement is a deep interest in and curiosity about a career in data. To demonstrate this, applicants are asked to find a data set and work with it to explore what it might be like to do similar work in a data career. The exercise also uncovers the kinds of intuition an applicant has when it comes to working with data sets.
Acceptance to TDI isn't dependent on having specific data skills, but it is dependent on demonstrating a certain inclination for data.
What is the difference between a fellow and a scholar at TDI?
The approach limited the number of students TDI could serve each year. There
was a massive demand for this type of training.
How do I get a full-tuition scholarship?
Fellows are offered a full-tuition scholarship to the full-time program, and there are a limited number of these scholarships for each cohort. Fellows are expected to leave any current employment during the program and to interview exclusively with our hiring partners.
Ultimately, we’re looking for candidates who’d be a perfect fit for our hiring partners
after they’ve completed the program.
We know our hiring partners are looking for data professionals who can focus on
practical and impactful data projects, who learn quickly and can translate technical
topics for non-technical stakeholders.
Then we choose fellows based on their performance on two challenges.
We want to know that if you are given a somewhat vague problem, you can figure
out how to solve it. Why?
Because that is a part of working in the real world. You’re not going to get a
step-by-step instruction manual for anything you’ll do at your job. You need to be
comfortable making decisions with incomplete information.
Finally, the fellows will be judged on their ability to work in groups, their
understanding of technical topics and their flexibility during a one-hour group
interview.
Let’s Get Serious About
Data Careers
So, maybe your curiosity is growing. We want to help
you take the next step in this journey.
Data Science
An immersive data science bootcamp for those with STEM
degrees and a passion for data looking to take the business
world by storm. Choose from our full-time, 8-week program,
or our new part-time, 20-week program when you apply.
LEARN MORE AND APPLY