
Data Science Syllabus

The document outlines the 12-week curriculum for the NYC Data Science Academy bootcamp. The curriculum covers topics in data science tools and programming languages like Linux, Git, SQL, R, Python, and machine learning algorithms. Some key areas covered include data wrangling with R and Python, data visualization with ggplot2 and Matplotlib, interactive apps with Shiny, statistical analysis, linear and logistic regression, and clustering. Students will complete projects on exploratory data analysis, interactive applications, and linear regression programming.


NYC Data Science Academy
12-Week Data Science Bootcamp Curriculum

Week 1
Data Science Toolkit - Linux, Git, Bash, and SQL
Data Science with R - Data Analytics Part I
Linux system
o Introduce Linux environment
o Learn Linux commands
o I/O redirection and pipes
o Introduce server-side Linux usage
Git
o Introduce modern source code management
o Learn common git operations
o Set up GitHub and a personal portfolio page
Other server related topics
o Text editors and IDEs
o ssh: how to communicate with a remote server
o Linux environment variables
SQL
o Introduction to relational databases
o Introduction to structured query language
o SQL major commands and examples
Programming foundation in R I
o Syntax
o Data objects: Vectors, Matrices, Data Frames, and Lists
o Common functions
o RStudio environment and package management
o Local data input/output
o Introduction to R data visualization
Programming foundation in R II
o Data sorting and merging
o String manipulation
o Dates and times
o Connecting to an external database
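The Week 1 SQL topics (relational tables and the major commands) can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The course may use a different database engine; the `students` and `grades` tables and their rows are hypothetical examples, not course material.

```python
import sqlite3

# In-memory database; tables and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE defines the schema of each relation.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, cohort TEXT)")
cur.execute("CREATE TABLE grades (student_id INTEGER, project INTEGER, score REAL)")

# INSERT adds rows; executemany binds each tuple to the ? placeholders.
cur.executemany("INSERT INTO students VALUES (?, ?, ?)",
                [(1, "Ana", "A"), (2, "Ben", "A"), (3, "Cy", "B")])
cur.executemany("INSERT INTO grades VALUES (?, ?, ?)",
                [(1, 1, 90.0), (1, 2, 80.0), (2, 1, 70.0), (3, 1, 88.0)])

# UPDATE changes only the rows matched by WHERE.
cur.execute("UPDATE grades SET score = 75.0 WHERE student_id = 2 AND project = 1")

# SELECT with JOIN, GROUP BY, and ORDER BY: average score per student.
rows = cur.execute("""
    SELECT s.name, AVG(g.score) AS avg_score
    FROM students AS s
    JOIN grades AS g ON g.student_id = s.id
    GROUP BY s.name
    ORDER BY avg_score DESC
""").fetchall()

for name, avg_score in rows:
    print(name, avg_score)

conn.close()
```

Using `?` placeholders instead of string formatting lets the driver handle quoting, which is also the usual defense against SQL injection.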

Week 2
Data Science with R - Data Analytics Part II
Data manipulation with dplyr
o Tables in R
o Join
o Subset
o Advanced manipulations with dplyr
Data Visualization with "ggplot2"
Updated June 29, 2016

o Histogram
o Point graphics
o Columnar graphics
o Line charts
o Pie charts
o Box plots
o Scatter plots
o Visualizing multivariate data
o Matrix-based visualizations
o Maps
Introduction to Shiny
o Shiny introduction
o Design the user interface
o Control widgets
o Build reactive output
o Use data table in Shiny Apps
o Use R scripts, data and packages
o UI and server for the App
o Make Shiny perform quickly
o Matrix-based visualizations
o Use reactive expressions
o Share and deploy Shiny apps
Lab: Moneyball
Project 1 Due: Exploratory Data Visualization
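The "Join" topic under dplyr is about combining two tables on a shared key. The difference between an inner join and a left join (the semantics of dplyr's `inner_join()` and `left_join()`) can be sketched in plain Python. This is an illustration of the join rules only, not dplyr itself; the `players` and `salaries` tables are hypothetical.

```python
def _index_by_key(rows, key):
    """Group rows by key value; one key may match several rows."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def inner_join(left, right, key):
    """Keep only row combinations whose key appears in both tables."""
    index = _index_by_key(right, key)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def left_join(left, right, key):
    """Keep every left row; unmatched rows get None in the right-hand columns."""
    index = _index_by_key(right, key)
    right_cols = {col for row in right for col in row if col != key}
    result = []
    for lrow in left:
        matches = index.get(lrow[key])
        if matches:
            result.extend({**lrow, **rrow} for rrow in matches)
        else:
            result.append({**lrow, **{col: None for col in right_cols}})
    return result


players = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
salaries = [{"id": 1, "salary": 500}]

print(inner_join(players, salaries, "id"))
# [{'id': 1, 'name': 'Ana', 'salary': 500}]
print(left_join(players, salaries, "id"))
# [{'id': 1, 'name': 'Ana', 'salary': 500}, {'id': 2, 'name': 'Ben', 'salary': None}]
```

Indexing the right table by key first keeps each join roughly linear in the table sizes instead of comparing every pair of rows.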

Week 3
Data Science with Python - Data Analytics Part I
Python Programming Language I
o Simple Values and Expressions
o Functions
o Lists
o Conditionals
o Functional programming: map, filter, and reduce
Python Programming Language II
o String operations
o File input/output and searching
o Data structures:
   - Mutating operations on lists
   - Tuples, sets, and dictionaries
Python Programming Language III
o Control flows
o Errors and exceptions
o Object-oriented programming
Web scraping
o Regular expressions
o HTML, Beautiful Soup, and Scrapy
o NoSQL and MongoDB
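The functional-programming topic (map, filter, and reduce) can be shown in a few lines. Note that in Python 3 `reduce` lives in `functools`; the `scores` list and the 5-point curve are made-up values for illustration.

```python
from functools import reduce

scores = [72, 88, 95, 61, 80]  # hypothetical exam scores

# map: apply a function to every element (add a 5-point curve).
curved = list(map(lambda s: s + 5, scores))

# filter: keep the elements for which the predicate returns True.
passing = list(filter(lambda s: s >= 80, curved))

# reduce: fold the sequence into one value (a running sum starting at 0).
total = reduce(lambda acc, s: acc + s, passing, 0)

print(curved)   # [77, 93, 100, 66, 85]
print(passing)  # [93, 100, 85]
print(total)    # 278
```

In everyday Python, list comprehensions and the built-in `sum` often replace these calls, but the three functions make the map/filter/fold pattern explicit.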


Week 4
Data Science with Python - Data Analytics Part II
NumPy and SciPy
o Basic data structures and operations
o Matrices and linear algebra
o Stats module
o Random sampling
Pandas
o Series and DataFrames
o I/O of pandas DataFrames
o Concatenation and merge
o Arithmetic, drop, apply, and describe
o Selection and filtering
o Missing values
o Grouping and aggregation
o Time series
o Interacting with databases
Matplotlib and Seaborn
o Basic plots
o Statistical plots:
   - Scatter plots
   - Histograms
   - Box plots
   - Bar charts
o Multiple figures
o Advanced plots with Seaborn
Python lab: linear regression from scratch
Project 2 Due: R Shiny Interactive Applications
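The "linear regression from scratch" lab can be previewed with the closed-form ordinary least squares estimates for one predictor: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. This is a standard-library sketch under that assumption; the lab itself may use NumPy or a gradient-based approach instead.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-deviation numerator and squared-deviation denominator.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)                   # 1.0 2.0
print(r_squared(x, y, b0, b1))  # 1.0
```

Because the toy data lie exactly on a line, the fit recovers the true coefficients and R² is 1; noisy data would give R² below 1.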

Week 5
Data Science with R - Machine Learning Part I
Foundations of Statistics
o Descriptive Statistics
   - Measures of Centrality
   - Measures of Variability
   - Frequency, Proportion & Contingency Tables
   - Correlation
o Hypothesis Testing
   - One-Sample t-test
   - Two-Sample t-test
   - F-test
   - One-way ANOVA
   - χ² Test of Independence
o Introduction to Machine Learning
   - Supervised Learning
      - Regression
      - Classification
   - Unsupervised Learning
      - Clustering
      - Dimension Reduction
Missingness & Imputation
o Types of Missingness
   - MCAR (missing completely at random)
   - MAR (missing at random)
   - MNAR (missing not at random)
o Basic Methods of Imputation
   - Mean Value Imputation
   - Simple Random Imputation
   - Regression Prediction
o K-Nearest Neighbors
   - Voronoi Tessellations
   - KNN for Classification
   - KNN for Regression
   - Distance Measures
Linear Regression I
o Simple Linear Regression
   - From a Mathematical Standpoint
   - Accuracy of the Coefficient Estimates
   - Performing Hypothesis Tests
   - Constructing Confidence Intervals
o Assumptions & Diagnostics
o Transformations
   - Power Transformation
   - Box-Cox Transformation
   - The Coefficient of Determination R²
Linear Regression II
o Multiple Linear Regression
   - From a Mathematical Standpoint
o Assumptions & Diagnostics
o Potential Problems
o Research Questions
o Variable Selection
o Factors
o Interactions
o Higher-Order Terms
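The K-Nearest Neighbors topics (KNN for classification, KNN for regression, and distance measures) can be sketched from scratch: find the k training points closest to a query, then take a majority vote (classification) or an average (regression). The course covers this in R; this Python version is an illustrative sketch, and the toy points, labels, and targets are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def nearest(train_X, train_y, query, k, distance):
    """Return the k (point, target) pairs closest to query."""
    return sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))[:k]


def knn_classify(train_X, train_y, query, k=3, distance=euclidean):
    """Majority vote among the k nearest neighbors."""
    votes = Counter(label for _, label in nearest(train_X, train_y, query, k, distance))
    # most_common breaks ties by first occurrence; an odd k avoids ties with two classes.
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3, distance=euclidean):
    """Average target value of the k nearest neighbors."""
    neighbors = nearest(train_X, train_y, query, k, distance)
    return sum(target for _, target in neighbors) / len(neighbors)


X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]
targets = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]

print(knn_classify(X, labels, (2, 2)))                      # a
print(knn_classify(X, labels, (7, 8), distance=manhattan))  # b
print(knn_regress(X, targets, (1, 1)))                      # 2.0
```

Passing the distance function as a parameter makes it easy to compare distance measures; in practice features are usually standardized first so that no single feature dominates the distance.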


Week 6
Data Science with R - Machine Learning Part II
Lab: Building Bridges
Generalized Linear Models
o Logistic Regression
The Curse of Dimensionality
o Ridge Regression
o Lasso Regression
o Cross-Validation
o Bias/Variance Tradeoff
o Density
o Principal Component Analysis

Guest Lecture: Dataiku Part I


Project 3 Due: Python Web Scraping
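Cross-validation, listed under The Curse of Dimensionality in Week 6, estimates out-of-sample error by rotating which fold is held out. The sketch below assumes contiguous, unshuffled folds and scores a hypothetical baseline model that always predicts the training-fold mean; with real data you would normally shuffle first and plug in the model being tuned (for example, a ridge or lasso penalty).

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous, non-overlapping folds of range(n)."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # Give the first n % k folds one extra index so fold sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def cv_mse_mean_baseline(y, k):
    """k-fold cross-validated MSE of a baseline that predicts the training-fold mean."""
    squared_errors = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        # "Fit" on the training folds only, then score on the held-out fold.
        prediction = sum(y[i] for i in train_idx) / len(train_idx)
        squared_errors.extend((y[i] - prediction) ** 2 for i in test_idx)
    return sum(squared_errors) / len(squared_errors)


print(list(k_fold_indices(5, 2)))
# [([3, 4], [0, 1, 2]), ([0, 1, 2], [3, 4])]
print(cv_mse_mean_baseline([1, 2, 3, 4], k=2))  # 4.25
```

Every observation is used for testing exactly once, so the averaged error reflects performance on data the model did not see during fitting.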

Week 7
Data Science with R - Machine Learning Part III
Classification
o Feature Selection
o Support Vector Machines
o Decision Trees
o Pruning/Purity/Entropy/Gini
o Random Forests
o Bagging
o Boosting
Cluster Analysis
o K-Means Clustering
o Agglomerative Clustering
o Hierarchical Clustering
Neural Networks
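The decision-tree purity measures (entropy and Gini impurity) are short formulas: Gini = 1 − Σ pᵢ² and entropy = −Σ pᵢ log₂ pᵢ over the class proportions pᵢ in a node. A tree chooses the split that most reduces the size-weighted impurity of the child nodes. A minimal sketch with hypothetical labels:

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits over the classes present in labels."""
    n = len(labels)
    h = -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
    return h + 0.0  # normalizes -0.0 to 0.0 for pure nodes


def weighted_impurity(left, right, measure=gini):
    """Impurity of a split: each child's impurity weighted by its share of rows."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)


parent = ["yes", "yes", "no", "no"]
print(gini(parent))     # 0.5
print(entropy(parent))  # 1.0
# A perfect split leaves both children pure, so the weighted impurity drops to 0.0.
print(weighted_impurity(["yes", "yes"], ["no", "no"]))  # 0.0
```

Both measures are 0 for a pure node and largest when classes are evenly mixed; they usually rank candidate splits similarly.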


Week 8
Data Science with R - Machine Learning Part IV
Introduction to Natural Language Processing
Case Study: Spam Detection
Association Rules
o Market Basket Analysis
Naïve Bayes Analysis
Introduction to Natural Language Processing
o Creating a corpus: stemming and lemmatization
o POS tagging and chunking
o Text classification
Time Series Analysis
o Smoothing
o Seasonal Decomposition
o ARIMA
Guest Lecture: Dataiku Part II
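Market basket analysis rests on three quantities: support (how often an itemset appears), confidence (an estimate of P(consequent | antecedent)), and lift (confidence divided by the consequent's baseline support). The sketch below computes them and enumerates frequent item pairs; it mirrors the first step of Apriori-style mining but is not a full Apriori implementation, and the baskets are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent together with consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to how often the consequent appears anyway."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


def frequent_pairs(transactions, min_support):
    """All 2-item sets whose support is at least min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(transactions, pair)
        if s >= min_support:
            result[pair] = s
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

print(frequent_pairs(baskets, 0.6))
print(round(confidence(baskets, {"diapers"}, {"beer"}), 2))  # 0.75
print(round(lift(baskets, {"diapers"}, {"beer"}), 2))        # 1.25
```

A lift above 1 means the items co-occur more often than they would if purchases were independent.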



Week 9
Data Science with Python - Machine Learning
Machine Learning Recap / Linear Regression
o Introduction to scikit-learn
o Simple linear regression
o Multiple linear regression
o Stats module
Classification Part I
o Logistic regression
o Discriminant analysis
o Naïve Bayes
Model Selection
o Cross-validation
o Bootstrap
o Feature selection
o Regularization
o Grid search
Classification Part II
o Support vector machines
o Decision trees
o Random forests
Unsupervised Learning
o Principal Components Analysis
o K-means and hierarchical clustering
Project 4 Due: Machine Learning Project (it can be a Kaggle competition, a hiring-partner project, or a non-profit project from our partners)
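The bootstrap, listed under Model Selection, estimates the sampling variability of a statistic by resampling the observed data with replacement. A minimal percentile-interval sketch for the mean, using only the standard library; the `scores` data, the 2,000 resamples, and the fixed seed are illustrative assumptions.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of data."""
    if not data:
        raise ValueError("data must be non-empty")
    rng = random.Random(seed)  # local, seeded generator for reproducibility
    n = len(data)
    means = sorted(
        # Each resample draws n observations with replacement.
        statistics.fmean(rng.choices(data, k=n))
        for _ in range(n_resamples)
    )
    # Read off the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


scores = [62, 70, 71, 75, 78, 80, 84, 88, 91, 95]
low, high = bootstrap_mean_ci(scores)
print(f"95% CI for the mean: ({low:.1f}, {high:.1f})")
```

The percentile method is the simplest bootstrap interval; bias-corrected variants exist but follow the same resampling loop.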

Week 10
Big Data
Parallel processing: Introduction to Hadoop and MapReduce
o HDFS
o MapReduce
   - Conceptual framework
   - Streaming and Python
o Examples and lab work
MapReduce design patterns
o Filtering patterns
   - Simple filtering
   - Top N
o Summarization patterns
   - Numerical summarizations
   - Inverted index summarizations
Apache Hive
o Databases for Hadoop
o Hive
   - Select
   - Joins
o Compiling HiveQL to MapReduce
o Technical aspects of Hive
o Extending Hive with TRANSFORM
Spark
o Basic concepts
   - RDDs, transformations, and actions
   - PairRDDs
o Examples
   - Word count
   - Mean and variance
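The word-count example used for both MapReduce and Spark follows three phases: map each record to (key, value) pairs, shuffle/sort so equal keys are adjacent, then reduce each key's values. The sketch below simulates those phases in a single Python process; it is not Hadoop itself. Hadoop Streaming runs separate mapper and reducer programs that read stdin and write key/value lines to stdout, and in PySpark the same job is commonly written with `flatMap`, `map`, and `reduceByKey`. The input documents are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all counts emitted for one key."""
    return word, sum(counts)


def map_reduce_wordcount(lines):
    # Map: each line is processed independently (the parallelizable step).
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort: bring identical keys together, as the framework does between phases.
    pairs.sort(key=itemgetter(0))
    # Reduce: one reducer call per distinct key.
    return dict(reducer(word, (count for _, count in group))
                for word, group in groupby(pairs, key=itemgetter(0)))


docs = ["the quick brown fox", "the lazy dog", "The fox"]
print(map_reduce_wordcount(docs))
# {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The sort step is what lets `groupby` see all values for a key in one contiguous run, which is the same guarantee a MapReduce reducer relies on.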


Week 11
Big Data and Algorithms

Spark MLlib
Amazon Web Services
Introduction to Algorithms
o Analysis of algorithms: big-O notation
Sorting
o Elementary sorts
o Merge sort
o Quicksort
Searching
o Linear search
o Binary search
o Hash tables
Machine Learning Theory Defense Practice
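Two of the Week 11 algorithms pair naturally: merge sort produces sorted data in O(n log n) time, and binary search then finds an element in O(log n) comparisons by halving the search range. A minimal sketch of both; the sample list is arbitrary.

```python
def merge_sort(items):
    """Stable O(n log n) sort: split in half, sort each half, merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return -1


data = merge_sort([38, 27, 43, 3, 9, 82, 10])
print(data)                     # [3, 9, 10, 27, 38, 43, 82]
print(binary_search(data, 43))  # 5
print(binary_search(data, 4))   # -1
```

Binary search is only correct on sorted input, which is why sorting usually comes first; a hash table trades that ordering requirement for average O(1) lookups.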


Week 12
Capstone Project Presentations and Review

Machine learning theory defense practice
SQL code review
R code review
Python code review
From the beginning of the bootcamp, you will work on hands-on projects. The Capstone Project lets you create your own data product that showcases your interests and talents, and you are free to use anything covered in class.

Project 5 Due: Capstone Project

