The Data Incubator Data Science RFI Packet

The Data Incubator offers an 8-week bootcamp designed to help individuals with advanced degrees transition into data careers, focusing on essential skills for the data industry. The program includes both full-time and part-time options, career services, and partnerships with hiring companies to facilitate job placements. Participants engage in hands-on training covering topics like data wrangling, machine learning, SQL, and data visualization, culminating in real-world projects to showcase their skills.

Requested Information

About The Data Incubator

The Data Incubator is an 8-week bootcamp to help people with PhDs or master's degrees transition from academia into a career in the data industry. The program also helps people whose background isn't in data learn and develop new high-demand data skills, and each year there are four cohorts.

TDI started because there was a challenge when it came to starting a career in data and hiring data professionals.

On one side, people trying to enter the data industry struggle to know what skills are in the highest demand and how to prepare for interviews at companies in the largest industries. On the flip side, there are people trying to hire data scientists, and it's difficult to know how to spot an excellent candidate in a job interview.

So, the TDI program sets out to solve this problem by establishing a rigorous training program to ensure students' readiness for the type of work they'll do in the industry. Then, through the hiring partners program, those students have the chance to match with companies actively hiring for relevant positions.

The Data Incubator


Find the Right Data Program for You.
The Data Incubator makes it easy to enhance your data skills. Check out our
data program options below to determine the right option for you.

| Data Science | Data Engineering
Full-Time Program | Monday-Friday, 9 AM to 5 PM ET; 8 weeks | Monday-Friday, 9 AM to 5 PM ET; 8 weeks
Part-Time Program | Evenings twice a week, 7:00 PM ET to 9:30 PM ET; 20 weeks

Career Services: resume polishing, interview prep
Placement Services: network with our hiring partners and appear in our resume book

Costs | Scholarships | Payment Options

Cost | $11,000 | $10,000
Priority Enrollment Program (Full-Time Only) | Up to $2,000 off tuition | Up to $2,000 off tuition
Full-Tuition Scholarships | Data Excellence Scholarship | Data Excellence Scholarship
Income Sharing Agreement: pay $0 until you're employed!
Private Loans from Skills Fund: multiple repayment options available!

Programs | Tools | Skills


Skill levels: Fundamental Training | Intermediate Training | Advanced Training

Data Wrangling
Machine Learning Basics
SQL
Visualization
Python

pandas
Spark
TensorFlow
Tableau
Advanced Machine Learning
Data + Business
Data Communication
Scikit-learn
AWS
NoSQL
Data Pipelines
Production Systems
The Data Incubator
Data Science Fellowship
The Data Incubator Data Science Program is an immersive, hands-on data science bootcamp for those with a passion for data looking to take the business world by storm. Designed by expert data scientists with feedback from industry partners, TDI's data program helps you master in-demand skills so you can launch your career in data science. You'll work on projects that use real-world data to solve real-world business problems, and you'll use a public data set to build a functional capstone project that shows off the skills you've mastered to potential employers.

Program Outline
Get hands-on, business-focused data science training from expert instructors, plus career services to help you find your dream job in data. The part-time program takes place in the evenings twice a week from 7:00 PM ET to 9:30 PM ET; expect to spend 10 hours per week working outside of class. The full-time program lasts for 8 weeks, Monday-Friday, from 9:00 AM ET to 5:00 PM ET; expect to spend 40 hours per week on the program. Office hours are available each week.

Module | Topic

1 | Data Wrangling: Acquire and manipulate data in Python with the foundational tools of data science.

2 | Machine Learning: Master the basics of machine learning while building and training different types of models.

3 | SQL: Access databases using SQL interfaces and programs in a professional environment.

4 | Visualization: Present data visually for both technical and non-technical audiences.

5 | Advanced Machine Learning: Learn more advanced machine learning topics and techniques, including unstructured data and time series.

6 | Thinking Beyond the Data: Understand how data science interacts with business and business concerns.

7 | Spark: Distribute computations across multiple computers, such as a cluster or the cloud, using PySpark.

8 | TensorFlow: Build and train neural networks using TensorFlow. Understand both theoretical and practical applications.

For a more detailed outline, visit TheDataIncubator.com/programs/data-science-fellowship



Career Services
To help you find your first—or next—job as a data scientist, we include a robust career services program.

Resume Review
Our career experts review your resume and cover letter to position
your experience in the right light to appeal to employers.

Interview Coaching
Work closely with our career experts to hone your interviewing skills
so you can put your best foot forward during every interview.

Hiring Partners
We work closely with a number of organizations in a wide variety of
industries to place our candidates in their exciting open positions.

Learn more by visiting: TheDataIncubator.com

Data Science Fellowship Curriculum
THANK YOU FOR YOUR INTEREST IN THE DATA INCUBATOR! We are the innovative fellowship program for up-and-coming data professionals. Below you'll find our Data Science Fellowship Curriculum and more about what sets us apart from other fellowship programs.

We offer practical and actionable training that provides immediate impact to candidates by focusing on what works. Based on decades of hands-on experience, our immersive programs set the benchmark for professional data education.

Our alumni are the pillar of our brand—they've trusted our programs and elevated their skills with top-tier career training to get hired with stand-out partners. Our alumni have exclusive access to our network of hiring partners, including thousands of companies around the world, from startups to Fortune 500, who apply our models to drive their business and power their data teams.

We don't just do training—we provide proven methodologies, adaptable resources, experienced instructors, a robust hiring program and world-class support.



Our Curriculum

MODULE 1: DATA WRANGLING
Students learn how to acquire and manipulate data in Python with the foundational tools of data science.
Prerequisites: Basic Python

The first step of data science is mastering the computational foundations on which data science is built. We cover the
fundamental topics of programming relevant for data science - including pandas, NumPy, SciPy, Matplotlib, regular
expressions, SQL, JSON, XML, checkpointing, and web scraping - that form the core libraries around handling structured
and unstructured data in Python. Students gain practical experience manipulating messy, real-world data using these
libraries. They also walk away with a firm understanding of tools like pip, git, Python, Jupyter notebooks, pdb, and unit
testing that leverage existing open-source packages to accelerate data exploration, development, debugging, and
collaboration.
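
To give a sense of the day-to-day flavor of this module, here is a minimal pandas sketch of the kind of wrangling covered; the file and column names are hypothetical, not part of the curriculum.

    import pandas as pd

    # Load a (hypothetical) messy CSV of venue data.
    df = pd.read_csv("venues.csv")

    # Filter rows, clean up a string column, and fill missing values.
    df = df[df["city"] == "New York"]
    df["name"] = df["name"].str.strip().str.title()
    df["rating"] = df["rating"].fillna(df["rating"].median())

    # Aggregate by group and export the result.
    summary = df.groupby("category")["rating"].agg(["mean", "count"])
    summary.to_csv("rating_summary.csv")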

Associated Project Work


Students will scrape picture captions off of a website that tracks the goings-on of New York’s socially well-to-do. By
extracting names from these captions, they will assemble a graph of friendships amongst this crowd. Analysis of this
graph will produce insights about the most connected New Yorkers.
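
As a rough illustration of that project workflow (not the official solution), the pieces might fit together like this; the URL, CSS selector, and caption format are placeholder assumptions.

    import itertools
    import requests
    import networkx as nx
    from bs4 import BeautifulSoup

    # Fetch a page and pull out photo captions (selector is hypothetical).
    html = requests.get("https://example.com/party-photos").text
    soup = BeautifulSoup(html, "html.parser")
    captions = [div.get_text() for div in soup.select("div.photocaption")]

    # Pretend each caption is a comma-separated list of names, and link
    # every pair of people who appear in the same caption.
    graph = nx.Graph()
    for caption in captions:
        names = [n.strip() for n in caption.split(",") if n.strip()]
        graph.add_edges_from(itertools.combinations(names, 2))

    # The best-connected New Yorkers, by degree centrality.
    top = sorted(nx.degree_centrality(graph).items(), key=lambda kv: -kv[1])[:10]
    print(top)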

SKILLS AND TOOLS ADDRESSED:

n How to (Software) Engineer Real Good
– Writing functional code
– Version control and other tools
– Testing
– Testing the web in Flask
– Linting
– Writing "good code"
– Self-documenting code
– Code review
– Time management

n Consuming APIs (and JSON)
– Handling URL parameters
– Authenticated APIs
– API request limitations

n Iterators, Generators, and Coroutines
– Iterables and iterators
– Generators
– Generator "pipelines"
– Generator comprehensions
– Time complexity
– Itertools in Python
– Coroutines
– Coroutine "pipelines"
– Broadcasting
– Coroutines as classes
– Unifying generators and coroutines

n Overview of Scraping and Munging Technologies
– Concepts, languages, and tools
– Concrete tasks in Python
– Python library cheat sheet

n Pandas
– Pandas series
– Pandas DataFrame
– Loading data into pandas
– Pandas indices and selecting and slicing data
– Using pandas for data analysis
– Filtering data
– String operations and transformations
– Merging data sets
– Dealing with missing values
– Adding and dropping columns
– Aggregating by groups
– Automating and repeating the analysis
– Exporting data frames to CSV or Excel file
– Pandas best practices
– Conclusion



n Scraping
– HTTP requests and responses
– Understanding URLs
– HTML and the DOM
– Parsing HTML
– CSS selectors
– Fetching subsequent pages
– Scrapy in Python

n Dealing with Strings in Python
– The string data structure
– Unicode and byte strings
– Basic string processing
– StringIO in Python
– Regular expressions

n NumPy and SciPy
– NumPy
– Data types (the nouns)
– Operations (the verbs)
– Persisting NumPy objects
– SciPy

n Matplotlib
– Matplotlib and Pyplot
– Matplotlib plots from Pandas
– Seaborn

n Functions
– Functions as first-class objects
– Closures
– Variable arguments and keywords
– Decorators

n Exceptions
– Catching general exceptions
– Handling success
– Doing something with the error
– Raising errors
– Exceptions and the call stack
– Reading traceback

n Debugging
– NameError
– TypeError
– AttributeError
– KeyError
– Reading code critically

n Python
– Jupyter notebooks and the kernel
– Variables
– Functions
– Logic and program flow
– Iteration
– Whitespace matters
– Putting it all together

n Object-oriented programming
– Everything is an object
– Defining a Python class
– Adding attributes and methods
– Inheritance
– Putting it all together again
MODULE 2: INTRODUCTION TO MACHINE LEARNING
Students learn the basics of machine learning, building and training different types of models.
Prerequisites: Basic Python, Basic to intermediate statistics, Basic linear algebra

In a world with abundant data, leveraging machines to learn valuable patterns from structured data can be extremely
powerful. We explore the basics of machine learning, discussing concepts like regression, classification, model
evaluation metrics, overfitting, variance versus bias, linear regression, ensemble methods, model selection, and
hyperparameter optimization. Through powerful packages such as Scikit-learn, students come away with a strong
understanding of core concepts in machine learning as well as the ability to efficiently train and benchmark accurate
predictive models. They gain hands-on experience building complex ETL pipelines to handle data in a variety of formats,
developing models with tools like feature unions and pipelines to reduce duplicate work, and practicing tricks like
parallelization to speed up prototyping and development.
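
A minimal sketch of the scikit-learn workflow described above, using the California housing data mentioned in the syllabus; the particular model and hyperparameter grid are illustrative choices, not the course's.

    from sklearn.datasets import fetch_california_housing
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV

    X, y = fetch_california_housing(return_X_y=True)

    # Chain preprocessing and a model so they are tuned and applied together.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("model", Ridge()),
    ])

    # Grid search over hyperparameters with cross-validation, in parallel.
    search = GridSearchCV(pipe, {"model__alpha": [0.1, 1.0, 10.0]},
                          cv=5, n_jobs=-1)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)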

Associated project work


Students will develop a series of models to predict a venue’s star rating from various features. Working from 100MB
of real-world data, they will start with location-based models before building models based on other attributes of the
venues. Finally, an ensemble model will blend the individual models into a final prediction of the venue’s popularity.
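
The final blending step could be as simple as a weighted average of each model's predictions, as in this sketch; location_model and attribute_model are hypothetical stand-ins for the fitted models from earlier milestones.

    import numpy as np

    def blended_prediction(models, X, weights=None):
        """Average several fitted models' predictions into one ensemble estimate."""
        preds = np.column_stack([m.predict(X) for m in models])
        return np.average(preds, axis=1, weights=weights)

    # stars = blended_prediction([location_model, attribute_model], X_test)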

SKILLS AND TOOLS ADDRESSED:

n Introduction to Machine Learning
– Statistics vs machine learning
– Types of machine learning problems
– Data as a matrix
– Models as functions
– Parameters and learning

n Regression
– Linear regression
– Regression metrics
– Optimization
– Stochastic gradient descent
– Adding features
– Regularization
– Example: California housing data set
– Reference: statistical motivation
– Scikit-Learn API
– Classes vs objects
– Estimators
– Transformers
– Pipelines

n Classification
– Precision and recall
– Other classification metrics
– Probabilistic models
– Logistic regression
– Multiclass classification

n Bias, Variance, and Overfitting
– Decision trees
– In-sample error
– Out-of-sample error
– Variance-bias tradeoff
– Cross-validation strategies
– Grid search for tuning hyperparameters

n Scikit-learn Workflow
– Writing custom estimators and transformers
– Pipelines
– Feature unions
– Data types
– Validating your implementations

n Transformers and Preprocessing
– Feature scaling
– Encoding categorical variables
– Imputation
– Dimensionality reduction
– Natural language processing
– Custom transformers
– Answers to questions

n K Nearest Neighbors
– Tuning k and other hyperparameters
– Normalizing features

n Unsupervised Learning
– Metrics for clustering
– K-Means clustering
– Elongated clusters
– Gaussian mixture models
– Dimensionality reduction
– Random projections
– Matrix factorization
– Principal Components Analysis (PCA)
– Non-negative Matrix Factorization (NMF)
– Comparison of PCA and NMF

MODULE 3: SQL AND PRODUCTION TOPICS
Students learn how to access databases using SQL interfaces, and topics related to programming in a professional environment.
Prerequisites: Basic Python, Basic SQL

Most data is stored in databases, and they have to be accessed through interfaces. The most common one is SQL, a
declarative language that lets us tell the database which data we want and how to present it. We cover the basics of the
language itself and some of the Python tools related to it.

Associated project work


Students will assemble a SQL database of 4 years worth of NYC restaurant inspection data. They will write and execute
queries against this database to understand the variations in scores and violations across the city and between
different types of restaurants.

SKILLS AND TOOLS ADDRESSED:

n Advanced SQL
– Creating Tables
– Database connectors
– Temporary tables and views
– SQL Alchemy
– A note on SQL flavors

n SQL - Structured Query Language
– SELECT - Getting information from the tables
– COUNT, SUM, and DISTINCT - Let SQL do work for you!
– WHERE, LIKE, and IN - Filtering the data
– ORDER BY - Sorting your outputs
– GROUP BY - Aggregating data
– HAVING - The “WHERE” clause for grouped data
– JOIN - Putting tables together
– Creating and using subqueries
– CASE - Returning values based on conditional statements
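
To see several of these clauses working together, here is a self-contained sketch using Python's built-in sqlite3 module; the inspections schema and rows are invented for illustration.

    import sqlite3

    # A toy version of an inspections database (schema is hypothetical).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inspections (boro TEXT, cuisine TEXT, score REAL)")
    conn.executemany("INSERT INTO inspections VALUES (?, ?, ?)",
                     [("Manhattan", "Pizza", 12), ("Queens", "Pizza", 9),
                      ("Manhattan", "Deli", 20), ("Queens", "Deli", 31)])

    # SELECT, WHERE, GROUP BY, HAVING, and ORDER BY working together.
    rows = conn.execute("""
        SELECT cuisine, AVG(score) AS avg_score, COUNT(*) AS n
        FROM inspections
        WHERE boro IN ('Manhattan', 'Queens')
        GROUP BY cuisine
        HAVING COUNT(*) >= 2
        ORDER BY avg_score
    """).fetchall()
    print(rows)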

MODULE 4: VISUALIZATIONS
Students learn how to present data visually for both technical and non-technical audiences.
Prerequisites: Basic Python

Data science is about helping humans understand the story behind the data, and visualizations provide a powerful tool
for helping the analyst understand and communicate that story. We discuss the biases and limitations of both visual
and statistical analysis to promote a more holistic approach.

Associated project work

Students will build an interactive website giving information on NYC's bus system. They will process historical data and develop plots to illustrate trends. Using a live feed of bus information, they will compare the current state to this historical average. All of the visualizations will be deployed as a Flask app running on Heroku.
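
As a sketch of the Altair-based, explanatory half of this module, the following builds a small chart and saves it as standalone HTML that a Flask app could serve; the routes and numbers are invented.

    import altair as alt
    import pandas as pd

    # Hypothetical historical averages for two bus routes.
    df = pd.DataFrame({
        "hour": list(range(6, 12)) * 2,
        "speed_mph": [8, 7, 6, 5.5, 6, 7, 9, 8.5, 7, 6.5, 7, 8],
        "route": ["M15"] * 6 + ["B46"] * 6,
    })

    chart = (
        alt.Chart(df)
        .mark_line()
        .encode(x="hour:Q", y="speed_mph:Q", color="route:N")
        .properties(title="Average bus speed by hour")
    )

    # Save as a standalone HTML document, ready to serve from Flask.
    chart.save("bus_speeds.html")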

SKILLS AND TOOLS ADDRESSED:

n Overview of Data Visualization
– Introduction
– Pandas plots
– Altair plots

n Exploratory Visualization
– Python visualization tools
– Describing a distribution
– Histograms
– Box plots and violin plots
– Relationships between variables
– Non-obvious patterns in the data
– Interactivity in visualizations

n Explanatory Visualization
– Multiple interactive plots
– Data transformations: filtering and aggregating
– Layout and design
– Using Altair with large data sets
– Exporting an Altair chart as HTML
– Embedding a chart in an HTML document
– Plotting geographic data

n Visualization Theory
– Different types of data for visualization purposes
– Seven categories of visual cues
– Generic algorithm for creating a visualization
– Portability & accessibility
– Perception and visual response
– Attention and memory
– Visual storytelling

n Layout and Design
– Design elements & principles
– Examples (mostly bad, sometimes good)
– Axes (use them!)
– Choosing the right mark
– Data-ink ratio
– Dealing with multiple scales
– Small multiples
MODULE 5: ADVANCED MACHINE LEARNING
Students learn more advanced machine learning topics and techniques, including dealing with unstructured data and time series.
Prerequisites: Intermediate to advanced statistics, Intermediate linear algebra, Basic programming

While machine learning on structured data lays an important foundation, a larger world of analytical opportunities
becomes available through understanding advanced machine learning techniques and how to handle unstructured
data. We explore techniques such as support vector machines, decision trees, random forests, neural nets, clustering,
KMeans, expectation-maximization, time series, and signal processing. Students come away with intuition about the
suitability of different techniques for different problems. In addition to handling structured data, students directly apply
these techniques to large volumes of real-world unstructured data, solving problems in natural language processing
using Word2Vec, bag of words, feature hashing, and topic modeling.

Associated project work


Students will use NLP techniques to extract sentiment from English text. Working with 300MB of venue reviews, they
will build a series of models to predict the star rating associated with a given review. They will also examine statistically
improbable phrases that appear in the text corpus.

Students will examine methods of dealing with seasonality, as they build models to predict temperatures in several
cities. The training data come from National Weather Service observations and must be cleaned before use.
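
A stripped-down sketch of the bag-of-words approach in the first project, with a few invented reviews standing in for the 300MB corpus; real work would involve far more data and feature engineering.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline

    reviews = ["Great food and friendly staff", "Slow service, cold fries",
               "Absolutely loved the tacos", "Would not come back"]
    stars = [5, 2, 5, 1]

    # TF-IDF turns each review into a weighted bag-of-words vector,
    # and a linear model regresses star ratings on those weights.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge())
    model.fit(reviews, stars)
    print(model.predict(["friendly staff, loved it"]))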

SKILLS AND TOOLS ADDRESSED:

n Support Vector Machines (SVM)
– Maximal margin classifier
– Linear SVM
– Non-linear SVM
– Multi-class SVM
– Approximating kernels
– Support vector regression
– Outlier detection using SVM

n Decision Trees and Random Forests
– Decision trees
– Ensemble methods
– Determining feature importance

n Natural Language Processing
– Text as a "bag of words"
– Word importance: term frequency-inverse document frequency (TF-IDF)
– Document similarity metrics
– Engineering your features
– Building the classifier
– Additional NLP topics and resources

n Sentiment Analysis
– Bag of words model
– Interpreting the model
– Grammar and other tools

n Time Series
– Trends in time series data
– Cross-validation for time series
– Modeling drift
– Modeling seasonality
– Modeling "noise"
– Using external data sources as features
– More advanced time series modeling frameworks

n Naive Bayes
– Predictive modeling using Naive Bayes
– Classifying mushrooms (an example)


n Outlier Detection
– Motivation
– Concepts
– Scikit-learn implementation
– One-class SVM
– Isolation forest
– Case study: anomaly detection in time series
– Modeling the background
– Detecting seasonality with Fourier transforms
– Detrending
– z-Score
– Moving-window averages
– Including windowed data in model
– Bayesian change points
– Online learning
– References

n Recommendation Engine
– Problem definition and data format
– Feature engineering
– Nearest neighbors
– Tag data
– Dimensional reduction
– Recommendation for a user
– Cooperative learning
– Regression of ratings
– Baseline model overfitting and cross-validation
– Modeling interaction
– Surpriselib
– References

n Unbalanced Classes
– Introduction: cancer detection case study
– Definition and common scenarios for unbalanced data
– Simple techniques to deal with unbalanced data
– Undersampling
– Oversampling
– Synthetic data augmentation
– Additional approaches
– Train/test split with unbalanced data
– Probabilities with unbalanced data
– The Python imbalanced-learn package

n Digital Signals
– Sampling
– Noise & filters
– Audio files
– Filters
– Frequency domain

n Choosing the Correct Machine Learning Algorithm
– Few features
– Many features
– Few observations
– Many observations
– Underfitting
– Overfitting
– Explicability
– Prediction speed
– Parallelization
– Online learning
– Feature scaling
– Outlier detection/novelty detection
– Comparing ML algorithms
MODULE 6: THINKING OUTSIDE THE DATA
Students learn how data science interacts with business and business concerns.
Prerequisites: Intermediate to advanced statistics, Basic to intermediate programming

Sometimes the most important question to ask in data science comes from thinking beyond the data itself. We
explore a myriad of topics that affect data science decision making as a whole, and affect the implementation of
data-driven business policies. Important topics include data fidelity, relevance, and the value of additional data. Bias
is a major theme, and students think about how their conclusions are influenced by data collection, external factors,
internal structuring, procedural artifacts, and more. Students gain a broader understanding of how to balance trade-
offs to suit the business problem, such as when to favor accuracy over interpretability and vice versa. We also discuss
more practical engineering considerations like building for prediction speed or robustness, and deploying to different
environments. Students apply this knowledge to case studies that simulate what they would be expected to contribute
as part of a real-world team faced with a business problem.

SKILLS AND TOOLS ADDRESSED:

n Hypothesis Testing
– False positives versus false negatives
– Z-score
– CDF and the uniform distribution
– T-test
– Standard error for a rate
– Standard error for a counting process
– Power calculations
– Mnemonic summary
– A/B testing
– Causality versus correlation
– Distributional tests
– Multiple tests
– How trustworthy are your data?

n Personal Interview Questions

n Algorithms and Data Structures
– Sorting
– Searching
– Dynamic programming
– Graph theory

n What the data really says
– Fallacies
– Data fidelity
– Data relevance
– Modeling tradeoffs
– Protecting privacy

n Statistics
– Linearity of expectation
– Bayes Theorem
– Combinatorics
– Continuous probability
– Hypothesis testing
– Memoryless processes

n Data Management
– Differing needs
– Data warehouses
– Data lakes
– Self-service

n Managing data science projects
– Strategies in software development
– Minimum viable products
– Agile and data science

n Metrics and Levers
– Metrics in business: KPIs
– Improving KPIs - pulling levers
– Translating metrics
– Real world considerations
– Systematic bias

n Case Studies
– The prompt
– The process
– The product
– Know your audience
– Exercises

n Data Science Case Studies
– What is a case study?
– How to ace a case study
– Analysis
– General advice
– Practice case studies
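
As one concrete example of the hypothesis-testing material above, a two-proportion z-test for an A/B test can be computed directly; the conversion counts here are invented.

    from scipy import stats

    # Invented A/B test data: conversions out of visitors for two variants.
    conv_a, n_a = 200, 5000
    conv_b, n_b = 250, 5000

    # Two-proportion z-test via the normal approximation.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (conv_b / n_b - conv_a / n_a) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    print(f"z = {z:.2f}, p = {p_value:.4f}")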

MODULE 7: DISTRIBUTED COMPUTING WITH SPARK
Students learn how to distribute computations across multiple computers, such as a cluster or the cloud, using PySpark.
Prerequisites: Basic Python, Basic to intermediate programming

Spark is a technology at the forefront of distributed computing that offers a more abstract but more powerful API than earlier MapReduce frameworks. This module is taught using the Python API. We cover core concepts of Spark like resilient distributed data sets, memory caching, actions, transformations, tuning, and optimization. Students get to build functioning applications from end to end. They apply that knowledge to directly developing, building, and deploying Spark jobs to run on large, real-world data sets in the cloud (AWS and Google Cloud Platform).

Associated project work


Students will use Spark to parse and process 10GB of data on posts and users at a popular Q&A website. They will
extract insights on the posting habits of users and develop predictors of users’ behavior from their posts. Spark’s
machine-learning capabilities will be used to discover meaning in unstructured text data.
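
The curriculum's word-count example, the "Hello World" of distributed computing, looks roughly like this in PySpark; the input path is a placeholder.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount").getOrCreate()

    # Read text, split into words, and count occurrences with RDD
    # transformations (lazy) and an action (takeOrdered) that triggers work.
    lines = spark.sparkContext.textFile("posts.txt")
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    print(counts.takeOrdered(10, key=lambda kv: -kv[1]))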

SKILLS AND TOOLS ADDRESSED:

n Introduction to Distributed Computing
– Big data
– Distributed computing
– MapReduce: A simple distributed-computing framework
– Word Count: The "Hello World" of distributed computing
– Word Count in Spark
– Other Spark features

n Introduction to Functional Programming Style
– Stateful vs. stateless code
– Decorators
– Map, filter, and reduce
– Anonymous functions

n Tweet mini case study
– Spark SQL and DataFrames - a convenient abstraction
– Caching and persistence - the key to Spark's speed

n Streaming Technologies
– Apache Kafka
– Apache Storm
– Spark streaming
– Building a Spark streaming application
– Keeping track of state
– Windowed state
– Streaming tweets demo

n PySpark Intro
– The Spark API
– Word count example
– ETL example
– Computing statistics
– Translating from SQL
– Joins in Spark

n Creating Spark Applications
– REPLs
– Building Spark applications
– Spark on Amazon Web Services
– Spark on Google Cloud Platform

n PySpark ML
– Algorithms
– ML vs. MLlib packages
– Spark ML
– Pipeline
– Cross-validation and grid search
– Feature processing

n PySpark DataFrames
– Motivation and Spark SQL
– Exploring the Catalyst Optimizer
– SQL and DataFrames
– Adding columns and functions
– Type safety and DataSets
– DataFrame optimization
– Joins

n Advanced Topics in Spark
– Key terminology
– Relation to Hadoop and MapReduce
– Understanding the shuffle
– Data partitioning
– Shared variables
– Best practices and optimization
– Resource tuning
– Spark UI

MODULE 8: DEEP LEARNING IN TENSORFLOW
Students learn to build and train neural networks using TensorFlow. Both theoretical understanding and practical applications and concerns are addressed.
Prerequisites: Basic Python

TensorFlow is taking the world of deep learning by storm. We demonstrate its capabilities through its Python and Keras interfaces and build some simple machine learning models. We give a brief overview of the theory of neural networks, including convolutional and recurrent layers. Students will practice building and testing these networks in TensorFlow and Keras, using real-world data. They will come away with both a theoretical and a practical understanding of the algorithms behind deep learning.

Associated project work

Students will build a series of models to classify images from the CIFAR-10 data set. These models will include basic image analysis, convolutional neural networks, and transfer-learned deep neural networks.
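
A bare-bones sketch of the kind of Keras model this project starts from; the real project models are larger and add transfer learning.

    import tensorflow as tf

    # CIFAR-10: 60,000 32x32 color images in 10 classes.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # A small convolutional network.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))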

SKILLS AND TOOLS ADDRESSED:

n Introduction to TensorFlow
– Linear models
– Error metrics
– Gradient descent
– Gradient descent in TensorFlow
– Tensors and operations
– Automatic differentiation and tf.GradientTape
– Built-in optimization
– TensorFlow API overview

n Optimization with the Computation Graph
– Computation graph
– Using accelerators (GPUs/TPUs)

n Basic Neural Networks
– The XOR problem
– Logistic regression
– Neural networks and hidden layers
– Activation functions
– Initial weights

n Deep Neural Networks
– What is deep learning?
– Keras API
– TensorBoard

n Optimization
– Stochastic gradient descent
– Exploring the loss surface and learning curves
– Overfitting
– Regularization
– Dropout
– Batch normalization

n Adversarial Noise
– Fooling neural networks
– Attacking networks
– How do you find adversarial noise?
– Putting it all together
– Exercise: extending immunity

n Convolutional Neural Networks
– Convolutions
– Convolutional neural networks
– Pre-trained CNNs (applications)

n The Inception Model and the Deep Dream Algorithm
– Inception model
– The deep dream algorithm

n Variational Autoencoders
– Autoencoders
– Building an autoencoder
– Adam optimizer
– Application: noise removal
– Generating new images
– Variational Autoencoders (VAEs)
– KL-Divergence
– Exercise: new numbers
– Exercise: different compression

n Recurrent Neural Networks
– Backpropagation through time
– Applications
– Example: name classification
– Exercise: introduce an embedding layer
– Long short-term memory
– Example: generating strata abstracts



Our Instructors

ROBERT SCHROLL
Robert studied squishy physics in Chicago, Amherst, and Santiago, Chile, before uniting his
love of computers, teaching, and making pretty graphs at The Data Incubator. In his free
time, he plays tuba and right field, usually not simultaneously.
View Resume

DON FOX
Born and raised in deep South Texas, Don studied chemical engineering at MIT and
Cornell where he researched renewable energy systems. Don was attracted to data
science because it is an interdisciplinary field that combines math, statistics, and
computer science to derive insights of processes using data. He enjoys puns, wearing ties,
cardigans, and everything fall. He is a Data Scientist in Residence.
View Resume

ANA HOCEVAR
Ana obtained her PhD in Physics before becoming a postdoctoral fellow at the Rockefeller
University where she worked on developing and implementing an underwater touchscreen
for dolphins. Now she combines her love for coding and teaching as a Data Scientist in
Residence. She spends her free time doing pottery, sometimes climbing, and every now
and then scuba diving.
View Resume

RUSSELL MARTIN
Russ was born in TN, grew up in NY, and got his PhD in Applied Mathematics from Georgia Tech. After that he lived and worked for seventeen years in the United Kingdom, including at Warwick University and the University of Liverpool. In his spare time, Russ reads all sorts of science-y things he probably doesn't really understand and plays board games.
View Resume

RICHARD OTT
Rich moved from particle physics to data science when he left academia, and is excited
to be joining his interests in data and programming with his love of teaching. In his spare
time, he’s a fan of science, speculative fiction, board games, and hiking.
View Resume



What is the benefit of TDI?

The Data Incubator


When you start a program at TDI, you’re taught by live expert data scientists and
data engineers, so you can be confident that the tools and skills you are learning
are used in industry and are sought after by hiring companies.

TDI not only helps students refine some basic data skills but will also empower
students to take on larger data tasks with different sets of tools.

Additionally, after completing a program you’ll have a project that you’ve worked
on with the instructor’s guidance from start to finish, and you can use that work to
show companies what kind of skills you can bring to their data team.

Finally, as a graduate of TDI, you'll have access to a network of hiring partners who are familiar with and confident in data professionals who have completed the coursework.

Without TDI, the same work of learning specific skills and tools, demonstrating experience with real-world data, and finding and applying to jobs is more difficult and can take more time to reach the same goals.

What are the prerequisites for acceptance to TDI?

The Data Incubator


We want students to have a rewarding experience in the bootcamp. To ensure
success both during and after the program, there are several prerequisites that help
identify the individuals who will be both capable of doing the work and challenged
by the curriculum.

The first requirement is a deep interest in and curiosity about a career in data. To demonstrate this, students are asked to find a data set and work with it to explore what it might be like doing similar work in a data career. The exercise also uncovers the kinds of intuition an applicant might have when it comes to working with data sets.

Acceptance to TDI isn't dependent on having specific data skills, but it is dependent on demonstrating a certain inclination for data.

Additionally, data is in many ways at the intersection of programming, math, and storytelling. So, TDI applicants are asked to show they understand the fundamentals of those skills.

What is the difference between a fellow and a scholar at TDI?

The Data Incubator


The program started by only offering fellowships, which means the program was free. However, there were many applicants, and TDI could only select the top 2 percent; it was a very selective program.

That approach limited the number of students TDI could serve each year, even though there was massive demand for this type of training.

So, scholars pay for the experience, like a traditional educational model.

Fellows and scholars sit in the same room and work on the same material, so there is no difference in the education and training they receive. Creating these two tracks was simply a way to serve more students.

The difference is only in whether or not the student is paying for the training.

How do I get a full-tuition scholarship?
Fellows are offered a full-tuition scholarship to the
full-time program and there are a limited number
of these scholarships for each cohort. Fellows
are expected to leave any current employment
during the program and are expected to interview
exclusively with hiring partners.

The Data Incubator


A TDI applicant who'd make an excellent fellow:

n Has a master's degree or Ph.D. in a scientific or technical field
n Has solved real-world problems with data in a job or research program
n Is comfortable in at least one programming language
n Wishes to start a new job at the conclusion of the program
n Keeps an open mind about industries and locations for their next job
n Will complete an engaging and purposeful capstone project
n Scores well on the coding challenge

Ultimately, we’re looking for candidates who’d be a perfect fit for our hiring partners
after they’ve completed the program.

We know our hiring partners are looking for data professionals who can focus on
practical and impactful data projects, who learn quickly and can translate technical
topics for non-technical stakeholders.

Then, we choose fellows based on their performance on two challenges:

1. The data challenge: how do you analyze data?
2. The coding challenge: how well can you solve a puzzle?

We want to know that if you are given a somewhat vague problem, you can figure
out how to solve it. Why?

Because that is a part of working in the real world. You’re not going to get a
step-by-step instruction manual for anything you’ll do at your job. You need to be
comfortable making decisions with incomplete information.

Finally, the fellows will be judged on their ability to work in groups, their
understanding of technical topics and their flexibility during a one-hour group
interview.

Let's Get Serious About Data Careers
So, maybe your curiosity is growing. We want to help
you take the next step in this journey.

The next step is to choose a program and apply!

Data Science Essentials

Data Science Essentials from The Data Incubator is a part-time 8-week online class designed to strengthen your data skills, whether it's to improve your core data wrangling and analysis or to qualify for our data science program.
LEARN MORE AND REGISTER

Data Science
An immersive data science bootcamp for those with STEM
degrees and a passion for data looking to take the business
world by storm. Choose from our full-time, 8-week program,
or our new part-time, 20-week program when you apply.
LEARN MORE AND APPLY

Data Science & Engineering

This program is ideal for those with a computer science or software engineering background who have a passion for improving productivity with data and enjoy the challenge of the constantly evolving sources of semi-structured data.
LEARN MORE AND APPLY
