Intro To Data Science: Ashwin Yenigalla PGP Data Science and Engineering From Great Lakes
Ashwin Yenigalla
PGP Data Science and Engineering from Great Lakes
Agenda
• To understand problems solvable with Data Science
• Stochasticity
• Homoscedasticity/Heteroscedasticity
• Bias/Variance Tradeoff
Stochasticity
• Which one is less stochastic?
[Figure: two panels, labelled 1 and 2]
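The original figure did not survive extraction, so the following is only a minimal sketch of the idea behind the question. It assumes two made-up series that share the same trend (y = 2x + 1) but carry different amounts of Gaussian noise; the names `noisy_line`, `residual_sd`, `series_1` and `series_2` are hypothetical. The series whose values scatter less around the trend is the less stochastic one.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def noisy_line(n, noise_sd):
    """Return y = 2x + 1 plus Gaussian noise with the given standard deviation."""
    return [2 * x + 1 + random.gauss(0, noise_sd) for x in range(n)]


def residual_sd(ys):
    """Standard deviation of the deviations from the known trend y = 2x + 1."""
    residuals = [y - (2 * x + 1) for x, y in enumerate(ys)]
    return statistics.pstdev(residuals)


series_1 = noisy_line(200, noise_sd=0.5)  # hypothetical "panel 1": little noise
series_2 = noisy_line(200, noise_sd=5.0)  # hypothetical "panel 2": heavy noise

# The series with the smaller residual spread is the less stochastic one.
less_stochastic = 1 if residual_sd(series_1) < residual_sd(series_2) else 2
print(less_stochastic)
```

In practice the trend is not known in advance, so the residuals would be taken against a fitted model rather than the true line.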
Machine Learning
• Unsupervised
• Supervised
Unsupervised Learning
• Clustering for unlabeled data
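As an illustration of clustering unlabeled data, here is a minimal k-means sketch written with NumPy. The slides do not name a specific algorithm, so k-means is an assumption, as are the synthetic two-blob data, the function name `kmeans`, and the farthest-first initialisation used to keep the run deterministic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up, unlabeled 2-D data: two blobs of 50 points each.
points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[6.0, 6.0], scale=0.5, size=(50, 2)),
])


def kmeans(x, k, n_iter=20):
    """Plain Lloyd's algorithm with a deterministic farthest-first start."""
    # Start at x[0], then repeatedly add the point farthest from the
    # centroids chosen so far.
    centroids = [x[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(x - c, axis=1) for c in centroids], axis=0)
        centroids.append(x[nearest.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids


labels, centroids = kmeans(points, k=2)
print(centroids)
```

No labels are passed in at any point: the grouping comes only from distances between points. For real work, `sklearn.cluster.KMeans` offers the same technique with better initialisation and stopping rules.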
Supervised Learning
• Classification
• Regression
Regression
• Linear Regression, fitted by Ordinary Least Squares (OLS)
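A minimal sketch of what OLS does, assuming made-up data generated from a line with slope 3 and intercept 2 plus noise. OLS chooses the coefficients that minimise the sum of squared residuals; here `np.linalg.lstsq` solves that least-squares problem directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: y = 3x + 2 plus Gaussian noise.
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=x.size)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# OLS: find b that minimises the sum of squared residuals ||y - X b||^2.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

residuals = y - X @ coef
print(intercept, slope, np.sum(residuals ** 2))
```

The recovered slope and intercept should land close to the values used to generate the data, with the gap shrinking as the noise decreases or the sample grows.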
Classification
• Logistic Regression with log-loss (cross-entropy) minimization
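A minimal sketch of logistic regression for binary classification, assuming made-up one-dimensional data and plain gradient descent with a hand-picked learning rate. Note that logistic regression is normally fitted by minimising the log loss (cross-entropy), which is equivalent to maximum likelihood, rather than a squared-error criterion; the sketch follows that standard objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 1-D data: class 0 centred at -2, class 1 centred at +2.
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
y = np.concatenate([np.zeros(100), np.ones(100)])
X = np.column_stack([np.ones_like(x), x])  # intercept column + feature


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(w):
    """Mean cross-entropy between the labels and the predicted probabilities."""
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


w = np.zeros(2)
initial_loss = log_loss(w)

for _ in range(500):
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)  # gradient of the mean log loss
    w -= 0.1 * grad  # learning rate 0.1 is an arbitrary choice for this sketch

final_loss = log_loss(w)
accuracy = np.mean((sigmoid(X @ w) >= 0.5) == y)
print(initial_loss, final_loss, accuracy)
```

The model outputs a probability, and the class is obtained by thresholding it at 0.5. That is what makes this a classifier even though the name says "regression".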
ML Pipeline
• Data Mining
• Data Cleaning and Imputation
• EDA and Visualisations
• Inference
• Define X and Y -> Split Train and Test Data
• Train the ML Estimator
• Predict/Validate/Score the Model
• Hyper-parameter Optimisation
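The later steps of the list above (imputation, defining X and y, splitting, training, scoring, and hyper-parameter optimisation) can be sketched with scikit-learn as follows. This is one possible arrangement, not a prescribed one: the Iris dataset, the choice of logistic regression as the estimator, and the small grid over `C` are all assumptions made for illustration. Data mining and EDA are left out.

```python
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Define X and y, then hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Cleaning steps live inside the pipeline so they are fitted on training data only.
# Iris has no missing values, so the imputer is a no-op here and is shown
# only to mark where imputation would go.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hyper-parameter optimisation: 5-fold cross-validated grid search over C.
search = GridSearchCV(pipe, param_grid={"model__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)  # train the estimator

# Score the best model on data it has never seen.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

Keeping the test set out of both the fitting and the grid search is the point of the split: the final score is then an honest estimate of performance on new data.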
ML Pipeline
• Expectation
ML Pipeline
• Reality
Python Ecosystem for DS
• Anaconda
• Pandas
• SciPy
• NumPy
• Matplotlib
• Seaborn
• scikit-learn
Python Ecosystem
• TensorFlow
• PyTorch
Learning Sources
• Competitions
• Datasets
• Kernels
• Job Postings
• Tutorials
Learning Sources
• Code snippets
• Solutions to common problems
• Responsive user base
Learning Sources
• Practice Puzzles and Tests
• HackerRank, Codewars
Learning Sources
• MOOCs
• Coursera
• edX
• Tutorialspoint