The Data Incubator Data Science RFI Packet

The Data Incubator offers an 8-week bootcamp designed to help individuals with advanced degrees transition into data careers, focusing on essential skills for the data industry. The program includes both full-time and part-time options, career services, and partnerships with hiring companies to facilitate job placements. Participants engage in hands-on training covering topics like data wrangling, machine learning, SQL, and data visualization, culminating in real-world projects to showcase their skills.

Requested Information

About The Data Incubator

The Data Incubator is an 8-week bootcamp to help people with PhDs or master's degrees transition from academia into a career in the data industry. The program also helps people whose background isn't in data learn and develop new high-demand data skills, and each year there are four cohorts.

TDI started because there was a challenge when it came to starting a career in data and hiring data professionals.

On one side, people trying to enter the data industry struggle to know what skills are in the highest demand and how to prepare for interviews at companies in the largest industries. On the flip side, there are people trying to hire data scientists, and it's difficult to know how to spot an excellent candidate in a job interview.

So, the TDI program sets out to solve this problem by establishing a rigorous training program to ensure students' readiness for the type of work they'll do in the industry. Then, through the hiring partners program, those students have the chance to match with companies actively hiring for relevant positions.

The Data Incubator


Find the Right Data Program for You.
The Data Incubator makes it easy to enhance your data skills. Check out our
data program options below to determine the right option for you.

| Data Science | Data Engineering
Full-Time Program | Monday-Friday, 9 AM to 5 PM ET; 8 weeks | Monday-Friday, 9 AM to 5 PM ET; 8 weeks
Part-Time Program | Evenings twice a week, 7:00 PM ET to 9:30 PM ET; 20 weeks

Career Services: resume polishing, interview prep
Placement Services: network with our hiring partners and appear in our resume book

Costs | Scholarships | Payment Options

Cost | $11,000 | $10,000
Priority Enrollment Program (Full-Time Only) | Up to $2,000 off tuition | Up to $2,000 off tuition
Full-Tuition Scholarships | Data Excellence Scholarship | Data Excellence Scholarship
Income Sharing Agreement: pay $0 until you're employed!
Private Loans from Skills Fund: multiple repayment options available!

Programs | Tools | Skills


Skill levels: Fundamental Training | Intermediate Training | Advanced Training

Data Wrangling
Machine Learning Basics
SQL
Visualization
Python

pandas
Spark
TensorFlow
Tableau
Advanced Machine Learning
Data + Business
Data Communication
Scikit-learn
AWS
NoSQL
Data Pipelines
Production Systems
The Data Incubator
Data Science Fellowship
The Data Incubator Data Science Program is an immersive, hands-on data science bootcamp for those with a passion for data looking to take the business world by storm. Designed by expert data scientists with feedback from industry partners, TDI's data program helps you master in-demand skills so you can launch your career in data science. You'll work on projects that use real-world data to solve real-world business problems, and you'll use a public data set to build a functional capstone project that shows off the skills you've mastered to potential employers.

Program Outline
Get hands-on, business-focused data science training from expert instructors, plus career services to help you find your dream job in data. The part-time program takes place in the evenings twice a week from 7:00 PM ET to 9:30 PM ET; expect to spend 10 hours per week working outside of class. The full-time program lasts for 8 weeks, Monday-Friday, from 9:00 AM ET to 5:00 PM ET; expect to spend 40 hours per week on the program. Office hours are available each week.

Module | Topic

1 | Data Wrangling: Acquire and manipulate data in Python with the foundational tools of data science.

2 | Machine Learning: Master the basics of machine learning while building and training different types of models.

3 | SQL: Access databases using SQL interfaces and programs in a professional environment.

4 | Visualization: Present data visually for both technical and non-technical audiences.

5 | Advanced Machine Learning: Learn more advanced machine learning topics and techniques, including unstructured data and time series.

6 | Thinking Beyond the Data: Understand how data science interacts with business and business concerns.

7 | Spark: Distribute computations across multiple computers, such as a cluster or the cloud, using PySpark.

8 | TensorFlow: Build and train neural networks using TensorFlow. Understand both theoretical and practical applications.

For a more detailed outline, visit TheDataIncubator.com/programs/data-science-fellowship



Career Services
To help you find your first—or next—job as a data scientist, we include a robust career services program.

Resume Review
Our career experts review your resume and cover letter to position
your experience in the right light to appeal to employers.

Interview Coaching
Work closely with our career experts to hone your interviewing skills
so you can put your best foot forward during every interview.

Hiring Partners
We work closely with a number of organizations in a wide variety of
industries to place our candidates in their exciting open positions.

Learn more by visiting: TheDataIncubator.com

Data Science Fellowship Curriculum
THANK YOU FOR YOUR INTEREST IN THE DATA INCUBATOR! We are the innovative fellowship program for up-and-coming data professionals. Below you'll find our Data Science Fellowship Curriculum and more about what sets us apart from other fellowship programs.

We offer practical and actionable training that provides immediate impact to candidates by focusing on what works. Based on decades of hands-on experience, our immersive programs set the benchmark for professional data education.

Our alumni are the pillar of our brand—they've trusted our programs and elevated their skills with top-tier career training to get hired with stand-out partners. Our alumni have exclusive access to our network of hiring partners, including thousands of companies around the world, from startups to Fortune 500, who apply our models to drive their business and power their data teams.

We don't just do training—we provide proven methodologies, adaptable resources, experienced instructors, a robust hiring program and world-class support.



Our Curriculum

MODULE 1: DATA WRANGLING
Students learn how to acquire and manipulate data in Python with the foundational tools of data science.
Prerequisites: Basic Python

The first step of data science is mastering the computational foundations on which data science is built. We cover the
fundamental topics of programming relevant for data science - including pandas, NumPy, SciPy, Matplotlib, regular
expressions, SQL, JSON, XML, checkpointing, and web scraping - that form the core libraries around handling structured
and unstructured data in Python. Students gain practical experience manipulating messy, real-world data using these
libraries. They also walk away with a firm understanding of tools like pip, git, Python, Jupyter notebooks, pdb, and unit
testing that leverage existing open-source packages to accelerate data exploration, development, debugging, and
collaboration.
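
To give a sense of the day-to-day flavor of this module, here is a minimal pandas sketch of the kind of wrangling covered; the file and column names are hypothetical, not part of the curriculum.

    import pandas as pd

    # Load a (hypothetical) messy CSV of venue data.
    df = pd.read_csv("venues.csv")

    # Filter rows, clean up a string column, and fill missing values.
    df = df[df["city"] == "New York"]
    df["name"] = df["name"].str.strip().str.title()
    df["rating"] = df["rating"].fillna(df["rating"].median())

    # Aggregate by group and export the result.
    summary = df.groupby("category")["rating"].agg(["mean", "count"])
    summary.to_csv("rating_summary.csv")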

Associated Project Work


Students will scrape picture captions off of a website that tracks the goings-on of New York’s socially well-to-do. By
extracting names from these captions, they will assemble a graph of friendships amongst this crowd. Analysis of this
graph will produce insights about the most connected New Yorkers.
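
As a rough illustration of that project workflow (not the official solution), the pieces might fit together like this; the URL, CSS selector, and caption format are placeholder assumptions.

    import itertools
    import requests
    import networkx as nx
    from bs4 import BeautifulSoup

    # Fetch a page and pull out photo captions (selector is hypothetical).
    html = requests.get("https://example.com/party-photos").text
    soup = BeautifulSoup(html, "html.parser")
    captions = [div.get_text() for div in soup.select("div.photocaption")]

    # Pretend each caption is a comma-separated list of names, and link
    # every pair of people who appear in the same caption.
    graph = nx.Graph()
    for caption in captions:
        names = [n.strip() for n in caption.split(",") if n.strip()]
        graph.add_edges_from(itertools.combinations(names, 2))

    # The best-connected New Yorkers, by degree centrality.
    top = sorted(nx.degree_centrality(graph).items(), key=lambda kv: -kv[1])[:10]
    print(top)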

SKILLS AND TOOLS ADDRESSED:

n How to (Software) Engineer Real Good
– Writing functional code
– Version control and other tools
– Testing
– Testing the web in Flask
– Linting
– Writing "good code"
– Self-documenting code
– Code review
– Time management

n Consuming APIs (and JSON)
– Handling URL parameters
– Authenticated APIs
– API request limitations

n Iterators, Generators, and Coroutines
– Iterables and iterators
– Generators
– Generator "pipelines"
– Generator comprehensions
– Time complexity
– Itertools in Python
– Coroutines
– Coroutine "pipelines"
– Broadcasting
– Coroutines as classes
– Unifying generators and coroutines

n Overview of Scraping and Munging Technologies
– Concepts, languages, and tools
– Concrete tasks in Python
– Python library cheat sheet

n Pandas
– Pandas series
– Pandas DataFrame
– Loading data into pandas
– Pandas indices and selecting and slicing data
– Using pandas for data analysis
– Filtering data
– String operations and transformations
– Merging data sets
– Dealing with missing values
– Adding and dropping columns
– Aggregating by groups
– Automating and repeating the analysis
– Exporting data frames to CSV or Excel file
– Pandas best practices
– Conclusion



n Scraping
– HTTP requests and responses
– Understanding URLs
– HTML and the DOM
– Parsing HTML
– CSS selectors
– Fetching subsequent pages
– Scrapy in Python

n Dealing with Strings in Python
– The string data structure
– Unicode and byte strings
– Basic string processing
– StringIO in Python
– Regular expressions

n NumPy and SciPy
– NumPy
– Data types (the nouns)
– Operations (the verbs)
– Persisting NumPy objects
– SciPy

n Matplotlib
– Matplotlib and Pyplot
– Matplotlib plots from Pandas
– Seaborn

n Functions
– Functions as first-class objects
– Closures
– Variable arguments and keywords
– Decorators

n Exceptions
– Catching general exceptions
– Handling success
– Doing something with the error
– Raising errors
– Exceptions and the call stack
– Reading traceback

n Debugging
– NameError
– TypeError
– AttributeError
– KeyError
– Reading code critically

n Python
– Jupyter notebooks and the kernel
– Variables
– Functions
– Logic and program flow
– Iteration
– Whitespace matters
– Putting it all together

n Object-oriented programming
– Everything is an object
– Defining a Python class
– Adding attributes and methods
– Inheritance
– Putting it all together again
MODULE 2: INTRODUCTION TO MACHINE LEARNING
Students learn the basics of machine learning, building and training different types of models.
Prerequisites: Basic Python, Basic to intermediate statistics, Basic linear algebra

In a world with abundant data, leveraging machines to learn valuable patterns from structured data can be extremely
powerful. We explore the basics of machine learning, discussing concepts like regression, classification, model
evaluation metrics, overfitting, variance versus bias, linear regression, ensemble methods, model selection, and
hyperparameter optimization. Through powerful packages such as Scikit-learn, students come away with a strong
understanding of core concepts in machine learning as well as the ability to efficiently train and benchmark accurate
predictive models. They gain hands-on experience building complex ETL pipelines to handle data in a variety of formats,
developing models with tools like feature unions and pipelines to reduce duplicate work, and practicing tricks like
parallelization to speed up prototyping and development.
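
A minimal sketch of the scikit-learn workflow described above, using the California housing data mentioned in the syllabus; the particular model and hyperparameter grid are illustrative choices, not the course's.

    from sklearn.datasets import fetch_california_housing
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV

    X, y = fetch_california_housing(return_X_y=True)

    # Chain preprocessing and a model so they are tuned and applied together.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("model", Ridge()),
    ])

    # Grid search over hyperparameters with cross-validation, in parallel.
    search = GridSearchCV(pipe, {"model__alpha": [0.1, 1.0, 10.0]},
                          cv=5, n_jobs=-1)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)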

Associated project work


Students will develop a series of models to predict a venue’s star rating from various features. Working from 100MB
of real-world data, they will start with location-based models before building models based on other attributes of the
venues. Finally, an ensemble model will blend the individual models into a final prediction of the venue’s popularity.
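
The final blending step could be as simple as a weighted average of each model's predictions, as in this sketch; location_model and attribute_model are hypothetical stand-ins for the fitted models from earlier milestones.

    import numpy as np

    def blended_prediction(models, X, weights=None):
        """Average several fitted models' predictions into one ensemble estimate."""
        preds = np.column_stack([m.predict(X) for m in models])
        return np.average(preds, axis=1, weights=weights)

    # stars = blended_prediction([location_model, attribute_model], X_test)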

SKILLS AND TOOLS ADDRESSED:

n Introduction to Machine Learning
– Statistics vs machine learning
– Types of machine learning problems
– Data as a matrix
– Models as functions
– Parameters and learning

n Regression
– Linear regression
– Regression metrics
– Optimization
– Stochastic gradient descent
– Adding features
– Regularization
– Example: California housing data set
– Reference: statistical motivation
– Scikit-Learn API
– Classes vs objects
– Estimators
– Transformers
– Pipelines

n Classification
– Precision and recall
– Other classification metrics
– Probabilistic models
– Logistic regression
– Multiclass classification

n Bias, Variance, and Overfitting
– Decision trees
– In-sample error
– Out-of-sample error
– Variance-bias tradeoff
– Cross-validation strategies
– Grid search for tuning hyperparameters

n Scikit-learn Workflow
– Writing custom estimators and transformers
– Pipelines
– Feature unions
– Data types
– Validating your implementations

n Transformers and Preprocessing
– Feature scaling
– Encoding categorical variables
– Imputation
– Dimensionality reduction
– Natural language processing
– Custom transformers
– Answers to questions

n K Nearest Neighbors
– Tuning k and other hyperparameters
– Normalizing features

n Unsupervised Learning
– Metrics for clustering
– K-Means clustering
– Elongated clusters
– Gaussian mixture models
– Dimensionality reduction
– Random projections
– Matrix factorization
– Principal Components Analysis (PCA)
– Non-negative Matrix Factorization (NMF)
– Comparison of PCA and NMF

MODULE 3: SQL AND PRODUCTION TOPICS
Students learn how to access databases using SQL interfaces, and topics related to programming in a professional environment.
Prerequisites: Basic Python, Basic SQL

Most data is stored in databases, and they have to be accessed through interfaces. The most common one is SQL, a
declarative language that lets us tell the database which data we want and how to present it. We cover the basics of the
language itself and some of the Python tools related to it.

Associated project work


Students will assemble a SQL database of 4 years worth of NYC restaurant inspection data. They will write and execute
queries against this database to understand the variations in scores and violations across the city and between
different types of restaurants.

SKILLS AND TOOLS ADDRESSED:

n Advanced SQL
– Creating Tables
– Database connectors
– Temporary tables and views
– SQL Alchemy
– A note on SQL flavors

n SQL - Structured Query Language
– SELECT - Getting information from the tables
– COUNT, SUM, and DISTINCT - Let SQL do work for you!
– WHERE, LIKE, and IN - Filtering the data
– ORDER BY - Sorting your outputs
– GROUP BY - Aggregating data
– HAVING - The “WHERE” clause for grouped data
– JOIN - Putting tables together
– Creating and using subqueries
– CASE - Returning values based on conditional statements
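
To see several of these clauses working together, here is a self-contained sketch using Python's built-in sqlite3 module; the inspections schema and rows are invented for illustration.

    import sqlite3

    # A toy version of an inspections database (schema is hypothetical).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inspections (boro TEXT, cuisine TEXT, score REAL)")
    conn.executemany("INSERT INTO inspections VALUES (?, ?, ?)",
                     [("Manhattan", "Pizza", 12), ("Queens", "Pizza", 9),
                      ("Manhattan", "Deli", 20), ("Queens", "Deli", 31)])

    # SELECT, WHERE, GROUP BY, HAVING, and ORDER BY working together.
    rows = conn.execute("""
        SELECT cuisine, AVG(score) AS avg_score, COUNT(*) AS n
        FROM inspections
        WHERE boro IN ('Manhattan', 'Queens')
        GROUP BY cuisine
        HAVING COUNT(*) >= 2
        ORDER BY avg_score
    """).fetchall()
    print(rows)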

MODULE 4: VISUALIZATIONS
Students learn how to present data visually for both technical and non-technical audiences.
Prerequisites: Basic Python

Data science is about helping humans understand the story behind the data, and visualizations provide a powerful tool
for helping the analyst understand and communicate that story. We discuss the biases and limitations of both visual
and statistical analysis to promote a more holistic approach.

Associated project work

Students will build an interactive website giving information on NYC's bus system. They will process historical data and develop plots to illustrate trends. Using a live feed of bus information, they will compare the current state to this historical average. All of the visualizations will be deployed as a Flask app running on Heroku.
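
As a sketch of the Altair-based, explanatory half of this module, the following builds a small chart and saves it as standalone HTML that a Flask app could serve; the routes and numbers are invented.

    import altair as alt
    import pandas as pd

    # Hypothetical historical averages for two bus routes.
    df = pd.DataFrame({
        "hour": list(range(6, 12)) * 2,
        "speed_mph": [8, 7, 6, 5.5, 6, 7, 9, 8.5, 7, 6.5, 7, 8],
        "route": ["M15"] * 6 + ["B46"] * 6,
    })

    chart = (
        alt.Chart(df)
        .mark_line()
        .encode(x="hour:Q", y="speed_mph:Q", color="route:N")
        .properties(title="Average bus speed by hour")
    )

    # Save as a standalone HTML document, ready to serve from Flask.
    chart.save("bus_speeds.html")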

SKILLS AND TOOLS ADDRESSED:

n Overview of Data Visualization
– Introduction
– Pandas plots
– Altair plots

n Exploratory Visualization
– Python visualization tools
– Describing a distribution
– Histograms
– Box plots and violin plots
– Relationships between variables
– Non-obvious patterns in the data
– Interactivity in visualizations

n Explanatory Visualization
– Multiple interactive plots
– Data transformations: filtering and aggregating
– Layout and design
– Using Altair with large data sets
– Exporting an Altair chart as HTML
– Embedding a chart in an HTML document
– Plotting geographic data

n Visualization Theory
– Different types of data for visualization purposes
– Seven categories of visual cues
– Generic algorithm for creating a visualization
– Portability & accessibility
– Perception and visual response
– Attention and memory
– Visual storytelling

n Layout and Design
– Design elements & principles
– Examples (mostly bad, sometimes good)
– Axes (use them!)
– Choosing the right mark
– Data-ink ratio
– Dealing with multiple scales
– Small multiples
MODULE 5: ADVANCED MACHINE LEARNING
Students learn more advanced machine learning topics and techniques, including dealing with unstructured data and time series.
Prerequisites: Intermediate to advanced statistics, Intermediate linear algebra, Basic programming

While machine learning on structured data lays an important foundation, a larger world of analytical opportunities
becomes available through understanding advanced machine learning techniques and how to handle unstructured
data. We explore techniques such as support vector machines, decision trees, random forests, neural nets, clustering,
KMeans, expectation-maximization, time series, and signal processing. Students come away with intuition about the
suitability of different techniques for different problems. In addition to handling structured data, students directly apply
these techniques to large volumes of real-world unstructured data, solving problems in natural language processing
using Word2Vec, bag of words, feature hashing, and topic modeling.

Associated project work


Students will use NLP techniques to extract sentiment from English text. Working with 300MB of venue reviews, they
will build a series of models to predict the star rating associated with a given review. They will also examine statistically
improbable phrases that appear in the text corpus.

Students will examine methods of dealing with seasonality, as they build models to predict temperatures in several
cities. The training data come from National Weather Service observations and must be cleaned before use.
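
A stripped-down sketch of the bag-of-words approach in the first project, with a few invented reviews standing in for the 300MB corpus; real work would involve far more data and feature engineering.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline

    reviews = ["Great food and friendly staff", "Slow service, cold fries",
               "Absolutely loved the tacos", "Would not come back"]
    stars = [5, 2, 5, 1]

    # TF-IDF turns each review into a weighted bag-of-words vector,
    # and a linear model regresses star ratings on those weights.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge())
    model.fit(reviews, stars)
    print(model.predict(["friendly staff, loved it"]))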

SKILLS AND TOOLS ADDRESSED:

n Support Vector Machines (SVM)
– Maximal margin classifier
– Linear SVM
– Non-linear SVM
– Multi-class SVM
– Approximating kernels
– Support vector regression
– Outlier detection using SVM

n Decision Trees and Random Forests
– Decision trees
– Ensemble methods
– Determining feature importance

n Natural Language Processing
– Text as a "bag of words"
– Word importance: term frequency-inverse document frequency (TF-IDF)
– Document similarity metrics
– Engineering your features
– Building the classifier
– Additional NLP topics and resources

n Sentiment Analysis
– Bag of words model
– Interpreting the model
– Grammar and other tools

n Time Series
– Trends in time series data
– Cross-validation for time series
– Modeling drift
– Modeling seasonality
– Modeling "noise"
– Using external data sources as features
– More advanced time series modeling frameworks

n Naive Bayes
– Predictive modeling using Naive Bayes
– Classifying mushrooms (an example)


n Outlier Detection
– Motivation
– Concepts
– Scikit-learn implementation
– One-class SVM
– Isolation forest
– Case study: anomaly detection in time series
– Modeling the background
– Detecting seasonality with Fourier transforms
– Detrending
– z-Score
– Moving-window averages
– Including windowed data in model
– Bayesian change points
– Online learning
– References

n Recommendation Engine
– Problem definition and data format
– Feature engineering
– Nearest neighbors
– Tag data
– Dimensional reduction
– Recommendation for a user
– Cooperative learning
– Regression of ratings
– Baseline model overfitting and cross-validation
– Modeling interaction
– Surpriselib
– References

n Unbalanced Classes
– Introduction: cancer detection case study
– Definition and common scenarios for unbalanced data
– Simple techniques to deal with unbalanced data
– Undersampling
– Oversampling
– Synthetic data augmentation
– Additional approaches
– Train/test split with unbalanced data
– Probabilities with unbalanced data
– The Python imbalanced-learn package

n Digital Signals
– Sampling
– Noise & filters
– Audio files
– Filters
– Frequency domain

n Choosing the Correct Machine Learning Algorithm
– Few features
– Many features
– Few observations
– Many observations
– Underfitting
– Overfitting
– Explicability
– Prediction speed
– Parallelization
– Online learning
– Feature scaling
– Outlier detection/novelty detection
– Comparing ML algorithms
MODULE 6: THINKING OUTSIDE THE DATA
Students learn how data science interacts with business and business concerns.
Prerequisites: Intermediate to advanced statistics, Basic to intermediate programming

Sometimes the most important question to ask in data science comes from thinking beyond the data itself. We
explore a myriad of topics that affect data science decision making as a whole, and affect the implementation of
data-driven business policies. Important topics include data fidelity, relevance, and the value of additional data. Bias
is a major theme, and students think about how their conclusions are influenced by data collection, external factors,
internal structuring, procedural artifacts, and more. Students gain a broader understanding of how to balance trade-
offs to suit the business problem, such as when to favor accuracy over interpretability and vice versa. We also discuss
more practical engineering considerations like building for prediction speed or robustness, and deploying to different
environments. Students apply this knowledge to case studies that simulate what they would be expected to contribute
as part of a real-world team faced with a business problem.

SKILLS AND TOOLS ADDRESSED:

n Hypothesis Testing
– False positives versus false negatives
– Z-score
– CDF and the uniform distribution
– T-test
– Standard error for a rate
– Standard error for a counting process
– Power calculations
– Mnemonic summary
– A/B testing
– Causality versus correlation
– Distributional tests
– Multiple tests
– How trustworthy are your data?

n Personal Interview Questions

n Algorithms and Data Structures
– Sorting
– Searching
– Dynamic programming
– Graph theory

n What the data really says
– Fallacies
– Data fidelity
– Data relevance
– Modeling tradeoffs
– Protecting privacy

n Statistics
– Linearity of expectation
– Bayes Theorem
– Combinatorics
– Continuous probability
– Hypothesis testing
– Memoryless processes

n Data Management
– Differing needs
– Data warehouses
– Data lakes
– Self-service

n Managing data science projects
– Strategies in software development
– Minimum viable products
– Agile and data science

n Metrics and Levers
– Metrics in business: KPIs
– Improving KPIs - pulling levers
– Translating metrics
– Real world considerations
– Systematic bias

n Case Studies
– The prompt
– The process
– The product
– Know your audience
– Exercises

n Data Science Case Studies
– What is a case study?
– How to ace a case study
– Analysis
– General advice
– Practice case studies
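
As one concrete example of the hypothesis-testing material above, a two-proportion z-test for an A/B test can be computed directly; the conversion counts here are invented.

    from scipy import stats

    # Invented A/B test data: conversions out of visitors for two variants.
    conv_a, n_a = 200, 5000
    conv_b, n_b = 250, 5000

    # Two-proportion z-test via the normal approximation.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (conv_b / n_b - conv_a / n_a) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    print(f"z = {z:.2f}, p = {p_value:.4f}")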

MODULE 7: DISTRIBUTED COMPUTING WITH SPARK
Students learn how to distribute computations across multiple computers, such as a cluster or the cloud, using PySpark.
Prerequisites: Basic Python, Basic to intermediate programming

Spark is a technology at the forefront of distributed computing that offers a more abstract but more powerful API than earlier MapReduce frameworks. This module is taught using the Python API. We cover core concepts of Spark like resilient distributed data sets, memory caching, actions, transformations, tuning, and optimization. Students get to build functioning applications from end to end. They apply that knowledge to directly developing, building, and deploying Spark jobs to run on large, real-world data sets in the cloud (AWS and Google Cloud Platform).

Associated project work


Students will use Spark to parse and process 10GB of data on posts and users at a popular Q&A website. They will
extract insights on the posting habits of users and develop predictors of users’ behavior from their posts. Spark’s
machine-learning capabilities will be used to discover meaning in unstructured text data.
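
The curriculum's word-count example, the "Hello World" of distributed computing, looks roughly like this in PySpark; the input path is a placeholder.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount").getOrCreate()

    # Read text, split into words, and count occurrences with RDD
    # transformations (lazy) and an action (takeOrdered) that triggers work.
    lines = spark.sparkContext.textFile("posts.txt")
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    print(counts.takeOrdered(10, key=lambda kv: -kv[1]))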

SKILLS AND TOOLS ADDRESSED:

n Introduction to Distributed Computing
– Big data
– Distributed computing
– MapReduce: A simple distributed-computing framework
– Word Count: The "Hello World" of distributed computing
– Word Count in Spark
– Other Spark features

n Introduction to Functional Programming Style
– Stateful vs. stateless code
– Decorators
– Map, filter, and reduce
– Anonymous functions

n Tweet mini case study
– Spark SQL and DataFrames - a convenient abstraction
– Caching and persistence - the key to Spark's speed

n Streaming Technologies
– Apache Kafka
– Apache Storm
– Spark streaming
– Building a Spark streaming application
– Keeping track of state
– Windowed state
– Streaming tweets demo

n PySpark Intro
– The Spark API
– Word count example
– ETL example
– Computing statistics
– Translating from SQL
– Joins in Spark

n Creating Spark Applications
– REPLs
– Building Spark applications
– Spark on Amazon Web Services
– Spark on Google Cloud Platform

n PySpark ML
– Algorithms
– ML vs. MLlib packages
– Spark ML
– Pipeline
– Cross-validation and grid search
– Feature processing

n PySpark DataFrames
– Motivation and Spark SQL
– Exploring the Catalyst Optimizer
– SQL and DataFrames
– Adding columns and functions
– Type safety and DataSets
– DataFrame optimization
– Joins

n Advanced Topics in Spark
– Key terminology
– Relation to Hadoop and MapReduce
– Understanding the shuffle
– Data partitioning
– Shared variables
– Best practices and optimization
– Resource tuning
– Spark UI

MODULE 8: DEEP LEARNING IN TENSORFLOW
Students learn to build and train neural networks using TensorFlow. Both theoretical understanding and practical applications and concerns are addressed.
Prerequisites: Basic Python

TensorFlow is taking the world of deep learning by storm. We demonstrate its capabilities through its Python and Keras interfaces and build some simple machine learning models. We give a brief overview of the theory of neural networks, including convolutional and recurrent layers. Students will practice building and testing these networks in TensorFlow and Keras, using real-world data. They will come away with both a theoretical and a practical understanding of the algorithms behind deep learning.

Associated project work

Students will build a series of models to classify images from the CIFAR-10 data set. These models will include basic image analysis, convolutional neural networks, and transfer-learned deep neural networks.
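
A bare-bones sketch of the kind of Keras model this project starts from; the real project models are larger and add transfer learning.

    import tensorflow as tf

    # CIFAR-10: 60,000 32x32 color images in 10 classes.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # A small convolutional network.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))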

SKILLS AND TOOLS ADDRESSED:

n Introduction to TensorFlow
– Linear models
– Error metrics
– Gradient descent
– Gradient descent in TensorFlow
– Tensors and operations
– Automatic differentiation and tf.GradientTape
– Built-in optimization
– TensorFlow API overview

n Optimization with the Computation Graph
– Computation graph
– Using accelerators (GPUs/TPUs)

n Basic Neural Networks
– The XOR problem
– Logistic regression
– Neural networks and hidden layers
– Activation functions
– Initial weights

n Deep Neural Networks
– What is deep learning?
– Keras API
– TensorBoard

n Optimization
– Stochastic gradient descent
– Exploring the loss surface and learning curves
– Overfitting
– Regularization
– Dropout
– Batch normalization

n Adversarial Noise
– Fooling neural networks
– Attacking networks
– How do you find adversarial noise?
– Putting it all together
– Exercise: extending immunity

n Convolutional Neural Networks
– Convolutions
– Convolutional neural networks
– Pre-trained CNNs (applications)

n The Inception Model and the Deep Dream Algorithm
– Inception model
– The deep dream algorithm

n Variational Autoencoders
– Autoencoders
– Building an autoencoder
– Adam optimizer
– Application: noise removal
– Generating new images
– Variational Autoencoders (VAEs)
– KL-Divergence
– Exercise: new numbers
– Exercise: different compression

n Recurrent Neural Networks
– Backpropagation through time
– Applications
– Example: name classification
– Exercise: introduce an embedding layer
– Long short-term memory
– Example: generating strata abstracts



Our Instructors

ROBERT SCHROLL
Robert studied squishy physics in Chicago, Amherst, and Santiago, Chile, before uniting his
love of computers, teaching, and making pretty graphs at The Data Incubator. In his free
time, he plays tuba and right field, usually not simultaneously.
View Resume

DON FOX
Born and raised in deep South Texas, Don studied chemical engineering at MIT and
Cornell where he researched renewable energy systems. Don was attracted to data
science because it is an interdisciplinary field that combines math, statistics, and
computer science to derive insights of processes using data. He enjoys puns, wearing ties,
cardigans, and everything fall. He is a Data Scientist in Residence.
View Resume

ANA HOCEVAR
Ana obtained her PhD in Physics before becoming a postdoctoral fellow at the Rockefeller
University where she worked on developing and implementing an underwater touchscreen
for dolphins. Now she combines her love for coding and teaching as a Data Scientist in
Residence. She spends her free time doing pottery, sometimes climbing, and every now
and then scuba diving.
View Resume

RUSSELL MARTIN
Russ was born in TN, grew up in NY, and got his PhD in Applied Mathematics from Georgia Tech. After that he lived and worked for seventeen years in the United Kingdom, including at Warwick University and the University of Liverpool. In his spare time, Russ reads all sorts of science-y things he probably doesn't really understand and plays board games.
View Resume

RICHARD OTT
Rich moved from particle physics to data science when he left academia, and is excited
to be joining his interests in data and programming with his love of teaching. In his spare
time, he’s a fan of science, speculative fiction, board games, and hiking.
View Resume



What is the benefit of TDI?

The Data Incubator


When you start a program at TDI, you’re taught by live expert data scientists and
data engineers, so you can be confident that the tools and skills you are learning
are used in industry and are sought after by hiring companies.

TDI not only helps students refine some basic data skills but will also empower
students to take on larger data tasks with different sets of tools.

Additionally, after completing a program you’ll have a project that you’ve worked
on with the instructor’s guidance from start to finish, and you can use that work to
show companies what kind of skills you can bring to their data team.

Finally, as a graduate of TDI, you'll have access to a network of hiring partners who are familiar with and confident in data professionals who have completed the coursework.

Without TDI, the same work of learning specific skills and tools, demonstrating experience with real-world data, and finding and applying to jobs is more difficult and can take more time to reach the same goals.

What are the prerequisites for acceptance to TDI?

The Data Incubator


We want students to have a rewarding experience in the bootcamp. To ensure
success both during and after the program, there are several prerequisites that help
identify the individuals who will be both capable of doing the work and challenged
by the curriculum.

The first requirement is a deep interest in and curiosity about a career in data. To demonstrate this, students are asked to find a data set and work with it to explore what it might be like doing similar work in a data career. The exercise also uncovers the kinds of intuition an applicant might have when it comes to working with data sets.

Acceptance to TDI isn't dependent on having specific data skills, but it is dependent on demonstrating a certain inclination for data.

Additionally, data is in many ways at the intersection of programming, math, and storytelling. So, TDI applicants are asked to show they understand the fundamentals of those skills.

What is the difference between a fellow and a scholar at TDI?

The Data Incubator


The program started by only offering fellowships, which means the program was free. However, there were many applicants, and TDI could only select the top 2 percent; it was a very selective program.

That approach limited the number of students TDI could serve each year, even though there was massive demand for this type of training.

So, scholars pay for the experience, like a traditional educational model.

Fellows and scholars sit in the same room and work on the same material, so there is no difference in the education and training they receive. Creating these two tracks was simply a way to serve more students.

The difference is only in whether or not the student is paying for the training.

How do I get a full-tuition scholarship?
Fellows are offered a full-tuition scholarship to the
full-time program and there are a limited number
of these scholarships for each cohort. Fellows
are expected to leave any current employment
during the program and are expected to interview
exclusively with hiring partners.

The Data Incubator


A TDI applicant who'd make an excellent fellow:

n Has a master's degree or Ph.D. in a scientific or technical field
n Has solved real-world problems with data in a job or research program
n Is comfortable in at least one programming language
n Wishes to start a new job at the conclusion of the program
n Keeps an open mind about industries and locations for their next job
n Will complete an engaging and purposeful capstone project
n Scores well on the coding challenge

Ultimately, we’re looking for candidates who’d be a perfect fit for our hiring partners
after they’ve completed the program.

We know our hiring partners are looking for data professionals who can focus on
practical and impactful data projects, who learn quickly and can translate technical
topics for non-technical stakeholders.

Then, we choose fellows based on their performance on two challenges:

1. The data challenge: how do you analyze data?
2. The coding challenge: how well can you solve a puzzle?

We want to know that if you are given a somewhat vague problem, you can figure
out how to solve it. Why?

Because that is a part of working in the real world. You’re not going to get a
step-by-step instruction manual for anything you’ll do at your job. You need to be
comfortable making decisions with incomplete information.

Finally, the fellows will be judged on their ability to work in groups, their
understanding of technical topics and their flexibility during a one-hour group
interview.

Let's Get Serious About Data Careers
So, maybe your curiosity is growing. We want to help
you take the next step in this journey.

The next step is to choose a program and apply!

Data Science Essentials

Data Science Essentials from The Data Incubator is a part-time 8-week online class designed to strengthen your data skills, whether it's to improve your core data wrangling and analysis or to qualify for our data science program.
LEARN MORE AND REGISTER

Data Science
An immersive data science bootcamp for those with STEM
degrees and a passion for data looking to take the business
world by storm. Choose from our full-time, 8-week program,
or our new part-time, 20-week program when you apply.
LEARN MORE AND APPLY

Data Science & Engineering

This program is ideal for those with a computer science or software engineering background who have a passion for improving productivity with data and enjoy the challenge of the constantly evolving sources of semi-structured data.
LEARN MORE AND APPLY
