EdYoda Data Scientist Program Curriculum

The document outlines the curriculum for EdYoda's Data Scientist Program. The program covers Python, data wrangling, mathematics fundamentals, and machine learning. Key topics include data visualization, NumPy, Pandas, linear regression, decision trees, clustering, and support vector machines. The goal is to teach students to implement machine learning techniques using Python and analyze raw data.

EdYoda

Data Scientist Program

Program Curriculum
Learning outcomes:
• Learn to implement Machine Learning techniques using Python
• Learn data visualization techniques
• Learn to analyze raw data
• Learn Big Data and Spark

Python

1. Introduction to Python

• Useful Python Resources
• Python Tools and Utilities
• Python Features

2. Python Environment

• Local Environment Setup
• Downloads and Installations
• Setting up Environment Path

3. Executing Python

• Interactive Mode
• Scripting Mode
• Integrated Development Environment

4. Python Basic Syntax

• Python Identifiers
• Reserved Words
• Lines and Indentation

www.edyoda.com [email protected]
5. Python Variable Types

• Assigning Values to Variables
• Multiple Assignment
• Standard Data Types
• Data Type Conversion

6. Python Basic Operators

• Arithmetic Operators
• Comparison Operators
• Assignment Operators
• Bitwise Operators
• Logical Operators
• Membership Operators
• Identity Operators
• Operators Precedence

7. Python Decision Making

• IF statements
• IF...ELIF...ELSE Statements
• Nested IF statements

8. Python Loops

• While loop
• For loop
• Nested loop
• Break control statement
• Continue statement
• Pass statement

9. Python Numbers

• Number type conversion
• Mathematical function
• Random number function
• Trigonometric function

10. Python Strings

• String special operators
• String formatting operator
• Built-in string methods

11. Python Lists

• Basic list operations
• Indexing and slicing
• Built-in functions and methods

12. Python Tuples

• Basic tuple operations
• Indexing and slicing
• Built-in functions

13. Python Dictionary

• Basic Dictionary operations
• Built-in Functions and Methods
• Use cases

14. Python Functions

• Pass by reference and value
• Function Arguments
• Scope of variables
• Default Argument Values
• Keyword Arguments
• Arbitrary Argument Lists
• Unpacking Argument Lists
• Lambda Expressions
• Documentation Strings

15. Python Modules

• Importing Modules
• Namespaces and scoping
• Packages

16. Python Files I/O

• Writing and Parsing Text Files
• Parsing Text Using Regular Expressions
• Writing and Parsing XML Files
• Writing and Parsing JSON Files
• Writing and Parsing CSV Files
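As a rough illustration of the JSON and CSV topics above, the following sketch round-trips a few records through both formats using only the standard library. The records and field names are made up for the example, and an in-memory buffer stands in for a real file.

```python
import csv
import io
import json

# Hypothetical records used only for illustration
records = [{"name": "Asha", "score": 91}, {"name": "Ravi", "score": 78}]

# JSON: serialize to a string and parse it back
json_text = json.dumps(records)
from_json = json.loads(json_text)

# CSV: write rows to an in-memory buffer, then read them back
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "score"])
writer.writeheader()
writer.writerows(records)

buffer.seek(0)
from_csv = list(csv.DictReader(buffer))  # every CSV value comes back as a string
```

Note that JSON preserves the integer scores, while the CSV reader returns them as strings that would need explicit conversion.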

17. Python Exceptions

• The except clause with multiple exceptions
• The try-finally clause
• Argument of an Exception
• Raising an exception
• User-Defined Exceptions
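The exception topics above could be tied together roughly as follows. This is a minimal sketch: `InsufficientFundsError` and `withdraw` are hypothetical names invented for the example, not part of the curriculum.

```python
class InsufficientFundsError(Exception):
    """Hypothetical user-defined exception carrying extra context."""

    def __init__(self, balance, amount):
        super().__init__(f"cannot withdraw {amount}; balance is {balance}")
        self.balance = balance
        self.amount = amount


def withdraw(balance, amount):
    """Return the new balance, raising an exception on invalid requests."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise InsufficientFundsError(balance, amount)
    return balance - amount


log = []
try:
    withdraw(100, 250)
except (ValueError, InsufficientFundsError) as exc:  # one clause, several types
    log.append(exc.args[0])  # the argument passed to the exception
finally:
    log.append("done")  # runs whether or not an exception was raised
```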

18. Python Classes and Objects

• Creating Classes
• Creating instance objects
• Destroying Objects (Garbage Collection)
• Custom Classes
• Attributes and Methods
• Inheritance and Polymorphism
• Using Properties to Control Attribute Access

19. Functional Programming

• Lambda
• Filter
• Map
• Functools
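A short sketch of how the four topics above typically fit together, using only the standard library. The data and variable names are illustrative assumptions.

```python
from functools import partial, reduce

numbers = [1, 2, 3, 4, 5, 6]  # illustrative data, not from the curriculum

# lambda: a small anonymous function
square = lambda x: x * x

# filter: keep only the items for which the predicate is true
evens = list(filter(lambda x: x % 2 == 0, numbers))

# map: apply a function to every item
squares_of_evens = list(map(square, evens))

# functools.reduce: fold a sequence into a single value
total = reduce(lambda acc, x: acc + x, squares_of_evens, 0)

# functools.partial: pre-fill the first argument of an existing function
power_of_two = partial(pow, 2)  # power_of_two(n) computes 2 ** n
```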

20. Iterators and Generators

• Itertools
• Generators
• Decorators
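The three topics above might be illustrated along these lines. `countdown`, `call_counter`, and `double` are hypothetical names chosen for this sketch.

```python
import functools
import itertools


def countdown(n):
    """Generator: yields n, n-1, ..., 1 lazily."""
    while n > 0:
        yield n
        n -= 1


def call_counter(func):
    """Decorator: counts how many times the wrapped function is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@call_counter
def double(x):
    return 2 * x


# itertools.islice takes a slice of any iterator without building a list first
first_two = list(itertools.islice(countdown(5), 2))
doubled = [double(n) for n in countdown(3)]
```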

21. Collections

• Deque
• Counter
• OrderedDict
• ChainMap
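For orientation, here is a small sketch of the four `collections` types listed above. All data values are invented for the example.

```python
from collections import ChainMap, Counter, OrderedDict, deque

# deque: fast appends and pops at both ends, optionally bounded
recent = deque(maxlen=3)
for item in ["a", "b", "c", "d"]:
    recent.append(item)  # the oldest item is dropped once maxlen is reached

# Counter: tally hashable items
word_counts = Counter("the cat and the hat".split())

# OrderedDict: a dict with order-aware helpers such as move_to_end
ordered = OrderedDict([("x", 1), ("y", 2), ("z", 3)])
ordered.move_to_end("x")

# ChainMap: look up keys across several mappings; the first match wins
defaults = {"color": "blue", "size": "M"}
overrides = {"color": "red"}
settings = ChainMap(overrides, defaults)
```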

23. Debugging, Testing

• Pdb
• Breakpoints

24. Regular Expressions

• Characters and Character Classes
• Quantifiers
• Grouping and Capturing
• Assertions and Flags
• The Regular Expression Module
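The topics above can be seen together in a short sketch using the standard `re` module. The log line is a made-up input used only for illustration.

```python
import re

# Hypothetical log line used only for illustration
line = "2024-01-15 ERROR disk usage at 91%"

# \d is a character class, {4} and {2} are quantifiers,
# and (?P<name>...) is a named capturing group
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match = date_pattern.search(line)
year = match.group("year")

# (?=%) is a lookahead assertion: match digits only when a percent sign follows
percent = re.search(r"\d+(?=%)", line).group()

# re.IGNORECASE is a flag that changes how the pattern is matched
has_error = re.search(r"error", line, flags=re.IGNORECASE) is not None
```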

25. Deploying Python Applications

• Pip
• Virtualenv
• The __init__.py files
• The setup.py file
• Installing the package
• Software deployment in Python

Data Wrangling

1. Black Box Introduction to Machine Learning

• What is not Machine Learning
• What is Machine Learning
• Types of ML - Supervised, Unsupervised
• Supervised - Classification, Regression
• Unsupervised - Clustering, Association
• Machine Learning Pipeline

2. Essential NumPy

• Introduction to NumPy
• Creation
• Access
• Stacking and Splitting
• Methods
• Broadcasting

3. Pandas for Machine Learning

• Introduction to Pandas
• Understanding Series & DataFrames
• Loading CSV, JSON
• Connecting databases
• Descriptive Statistics
• Accessing subsets of data - Rows, Columns, Filters
• Handling Missing Data
• Dropping rows & columns
• Handling Duplicates
• Function Application - map, apply, groupby, rolling, str
• Merge, Join & Concatenate
• Stacking, Unstacking & Melting
• Pivot-tables
• Normalizing JSON
• Application - EDA on Employee data, sales data

4. Understanding Visualization

• Introduction to matplotlib & seaborn
• Basic Plotting
• Title, Labels, Legends, Grid, colormap, xticks, yticks
• Color, linewidth
• Sub Plotting
• Scatter plot
• Histogram
• Bar Graphs
• Plotting distributions
• Plotting 3D data
• Fundamentals of Tableau

Mathematics Fundamentals

1. Essential Maths & Statistics

• Essential Linear Algebra
• Matrix Operations
• Understanding distributions
• Probability Concepts
• Calculus
• Mean, Median, Mode, Quantile
• Other Statistics Concepts
• Sampling Techniques

Machine Learning

1. Linear Models for Classification & Regression

• Simple Linear Regression using Ordinary Least Squares
• Gradient Descent Algorithm
• Regularized Regression Methods - Ridge, Lasso, Elastic Net
• Logistic Regression for Classification
• Online Learning Methods - Stochastic Gradient Descent & Passive Aggressive
• Robust Regression - Dealing with outliers & Model errors
• Polynomial Regression
• Bias-Variance Tradeoff
• Application - House Price, Cancer Prediction, Insurance Prediction

2. Preprocessing for Machine Learning

• Introduction to Preprocessing
• StandardScaler
• MinMaxScaler
• RobustScaler
• Normalization
• Binarization
• Encoding Categorical (Ordinal & Nominal) Features
• Imputation
• Polynomial Features
• Custom Transformer
• Text Processing
• CountVectorizer
• TfIdf
• HashingVectorizer
• Image processing using skimage

3. Decision Trees

• Introduction to Decision Trees
• The Decision Tree Algorithms
• Decision Tree for Classification
• Decision Tree for Regression
• Advantages & Limitations of Decision Trees
• Application - Cloth Prediction

4. Naive Bayes

• Introduction to Bayes' Theorem
• Naive Bayes Classifier
• Gaussian Naive Bayes
• Multinomial Naive Bayes
• Bernoulli Naive Bayes
• Naive Bayes for out-of-core learning
• Application - Text Classification, Sentiment Analysis and Spam & Non-spam classification

5. Composite Estimators using Pipelines & FeatureUnions

• Introduction to Composite Estimators
• Pipelines
• Transformed Target Regressor
• FeatureUnions
• ColumnTransformer
• GridSearch on pipeline
• Application - Author classification

6. Model Selection & Evaluation

• Cross Validation
• Hyperparameter Tuning
• Model Evaluation
• Model Persistence
• Validation Curves
• Learning Curves

7. Feature Selection & Dimensionality Reduction

• Introduction to Feature Selection
• Variance Threshold
• Chi-squared stats
• ANOVA using f_classif
• Univariate Linear Regression Tests using f_regression
• F-score vs Mutual Information
• Mutual Information for discrete values
• Mutual Information for continuous values
• SelectKBest
• SelectPercentile
• SelectFromModel
• Recursive Feature Elimination
• PCA
• SVD
• Application - Credit Risk Prediction

8. Nearest Neighbors

• Fundamentals of Nearest Neighbor Algorithm
• Unsupervised Nearest Neighbors
• Nearest Neighbors for Classification
• Nearest Neighbors for Regression
• Nearest Centroid Classifier
• Application - Nearest Neighbors for face inpainting

9. Clustering Techniques

• Introduction to Unsupervised Learning
• Clustering
• Similarity or Distance Calculation
• Clustering as an Optimization Function
• Types of Clustering Methods
• Partitioning Clustering - KMeans & Meanshift
• Hierarchical Clustering - Agglomerative
• Density Based Clustering - DBSCAN
• Measuring Performance of Clusters
• Comparing all clustering methods
• Application - Grouping similar customers

10. Anomaly Detection

• What are Outliers?
• Statistical Methods for Univariate Data
• Using Gaussian Mixture Models
• Fitting an elliptic envelope
• Isolation Forest
• Local Outlier Factor
• Using clustering methods like DBSCAN
• Application - Anomaly detection for credit risk prediction

11. Support Vector Machines

• Introduction to Support Vector Machines
• Maximal Margin Classifier
• Soft Margin Classifier
• SVM Algorithm for Classification
• SVM for Regression
• Hyper-parameters in SVM
• Application - Face recognition and breast cancer classification

12. Dealing with Imbalanced Classes

• What are imbalanced classes & their impact?
• OverSampling
• UnderSampling
• Connecting Sampler to pipelines
• Making classification algorithm aware of Imbalance
• Anomaly Detection
• Application - Fraud detection

13. Ensemble Methods

• Introduction to Ensemble Methods
• RandomForest
• AdaBoost
• Gradient Boosting Tree
• VotingClassifier
• XGBoost
• Application - Malicious data detection

14. Recommendation Engine

• Understanding distance vector calculation - cosine, euclidean, manhattan
• Types of Recommendation Engines
• Recommendation based on similarity
• Application - Grouping videos based on description, user rating prediction
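The three measures named in the first topic above can be written in a few lines of plain Python. This is a minimal sketch with made-up rating vectors. The cosine function assumes neither vector is all zeros.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical rating vectors for two users
user_a = [3.0, 4.0]
user_b = [6.0, 8.0]
```

Here the two users are far apart by Euclidean and Manhattan distance but point in the same direction, so their cosine similarity is 1. This is one reason cosine similarity is often preferred when rating scales differ between users.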

15. Time Series Modeling

• Simple Average & Moving Average
• Single Exponential Smoothing
• Holt’s linear trend method
• Holt-Winters seasonal method
• ARIMA
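The two simplest methods in the list above can be sketched in plain Python. Initializing the smoothed series with the first observation is one common convention and is an assumption here. The `sales` data is invented for the example.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def single_exponential_smoothing(series, alpha):
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    smoothed = [series[0]]  # assumed initialization: first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


sales = [10, 20, 30, 40]  # hypothetical data
averaged = moving_average(sales, window=2)
smoothed = single_exponential_smoothing(sales, alpha=0.5)
```

A larger `alpha` makes the smoothed series react faster to recent values. A smaller one gives more weight to history.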

16. Packaging & Deployment

• Creating Python Package
• Deploy trained model behind REST interface
• Deploy model behind API call
• Deploy on AWS cloud (optional)

Big Data Ecosystem

1. Introduction to Big Data

• Big Data
• Understanding distributed computing
• Introduction to Hadoop
• HDFS, YARN, MapReduce
• Limitations of Hadoop
• Introduction to Spark
• Introduction to Kafka
• Hive
• Cassandra

2. Internal Details of Spark

• Driver
• Executors
• Partitions
• Jobs
• Stages
• Tasks
• Resilient Distributed Dataset (RDD)
• DataFrames as a High-Level Data Structure

3. Foundations of Spark using RDD

• Basics of Distributed Computing
• Resilient Distributed Dataset
• Simple Transformations - map, filter, groupBy
• Actions - collect, count, foreach
• Complex API - combineByKey
• Caching, Debugging
• Important Configuration

4. Data Wrangling using DataFrames

• Creating DataFrames from collections
• Creating DataFrames from CSV, JSON, etc.
• DataFrame Row
• DataFrame Column
• Creating tables from DataFrames
• SQL query
• DataFrame Grouping
• DataFrame Functions
• User Defined Functions (UDF)

5. Packaging & Deployment of Spark Applications

• The spark-submit command
• Command line parameters
• Deploying the app programmatically
• Configuring your SparkSession
• Modularizing code
• Structure of the module
• Building an egg
• User defined functions in Spark
• Submitting a job
• Monitoring execution
