
Intro To Data Science: Ashwin Yenigalla PGP Data Science and Engineering From Great Lakes

The document introduces data science topics, including supervised versus unsupervised machine learning and the machine learning pipeline. It discusses which problems data science can solve and some of its limitations. It covers key concepts in statistics and machine learning models such as regression and classification. It introduces the Python ecosystem for data science, highlighting popular packages such as Pandas, SciPy, and NumPy. Finally, it lists learning sources such as online courses, tutorials, forums, and coding platforms.

Uploaded by

Ashwin Yenigalla
Copyright
© All Rights Reserved

Intro to Data Science

Ashwin Yenigalla
PGP Data Science and Engineering from Great Lakes
Agenda
• To understand problems solvable with Data Science

• Unsupervised vs Supervised and applicable data

• Machine Learning Pipeline

• Data Science Ecosystem (Python) and further learning


What Data Science Can vs Can’t do

• Neural Nets can’t learn patterns where none exist in data


What Data Science Can do
Statistics
• Sampling Distributions and Population Estimates

• Stochasticity

• Homoscedasticity/Heteroscedasticity

• Bias/Variance Tradeoff
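
The first bullet above, sampling distributions and population estimates, can be illustrated with a minimal NumPy sketch. The exponential "population", the sample size of 50, and the helper name `sample_means` are all assumptions made up for this illustration; they are not from the slides.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# Hypothetical "population": 100,000 draws from a skewed (exponential) distribution.
population = rng.exponential(scale=2.0, size=100_000)


def sample_means(population, sample_size, n_samples, rng):
    """Draw n_samples random samples and return the mean of each one."""
    samples = rng.choice(population, size=(n_samples, sample_size), replace=True)
    return samples.mean(axis=1)


# The collection of sample means approximates the sampling distribution of the mean.
means = sample_means(population, sample_size=50, n_samples=2_000, rng=rng)

print("population mean:     ", population.mean())
print("mean of sample means:", means.mean())
# The spread of the sample means should be close to sigma / sqrt(n).
print("std of sample means: ", means.std(ddof=1))
print("sigma / sqrt(n):     ", population.std() / np.sqrt(50))
```

The sample means cluster around the population mean, and their spread shrinks roughly as one over the square root of the sample size, which is why a modest sample can still give a usable population estimate.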
Stochasticity
• Which one is less stochastic: plot 1 or plot 2?
Machine Learning
• Unsupervised

• Supervised
Unsupervised Learning
• Clustering for unlabeled data
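
A minimal sketch of clustering unlabeled data, assuming scikit-learn's `KMeans` as the algorithm (the slides do not name one). The two synthetic blobs and the choice of `n_clusters=2` are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=0)

# Two hypothetical, well-separated blobs of unlabeled 2-D points.
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2))
X = np.vstack([blob_a, blob_b])

# n_clusters=2 is an assumption; in practice it is chosen by inspection or a metric.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)  # no y is passed: the model only sees X

print("cluster centers:")
print(model.cluster_centers_)
print("points per cluster:", np.bincount(labels))
```

No labels are supplied at any point; the algorithm groups points purely by their distance to the cluster centers it learns.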
Supervised Learning
• Classification

• Regression
Regression
• Linear Regression with OLS minimization
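
Ordinary least squares picks the coefficients that minimize the sum of squared residuals. A minimal sketch with NumPy, assuming made-up data generated from y = 3 + 2x plus noise (the true coefficients are chosen only so the fit can be checked by eye):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical data: y = 3 + 2x plus Gaussian noise.
x = rng.uniform(0.0, 10.0, size=200)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=200)

# Design matrix with a column of ones for the intercept.
A = np.column_stack([np.ones_like(x), x])

# lstsq returns the coefficients that minimize ||A @ beta - y||^2.
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = beta

residuals = y - A @ beta
print("intercept:", intercept, "slope:", slope)
print("sum of squared residuals:", np.sum(residuals ** 2))
```

The recovered intercept and slope should land near 3 and 2; scikit-learn's `LinearRegression` solves the same least-squares problem behind a fit/predict interface.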
Classification
• Logistic Regression with log-loss minimization (maximum likelihood estimation)
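
A minimal classification sketch using scikit-learn's `LogisticRegression`, which fits the model by minimizing a regularised log loss. The synthetic one-feature dataset and its true coefficients (0.5 and 2.0) are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(seed=0)

# Hypothetical 1-D data: class 1 becomes more likely as x grows.
x = rng.normal(size=(500, 1))
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x[:, 0])))
y = (rng.uniform(size=500) < p_true).astype(int)

clf = LogisticRegression()  # default settings include L2 regularisation
clf.fit(x, y)

# Predicted probability of class 1 for every training point.
proba = clf.predict_proba(x)[:, 1]

print("coefficient:", clf.coef_[0, 0], "intercept:", clf.intercept_[0])
print("log loss:", log_loss(y, proba))
print("accuracy:", clf.score(x, y))
```

The model outputs probabilities rather than raw values, so it is scored here with log loss and accuracy instead of squared error.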
ML Pipeline
• Data Mining
• Data Cleaning and Imputation
• EDA and Visualisations
• Inferencing
• Define X and Y -> Split Train and Test Data
• Train the ML Estimator
• Predict/Validate/Score the Model
• Hyper-parameter Optimisation
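
The later steps of the pipeline above (define X and Y, split, train, predict, score, tune) can be sketched with scikit-learn. The Iris dataset, the k-nearest-neighbours estimator, and the parameter grid are stand-ins chosen for this illustration, not choices made in the slides; data mining, cleaning, and EDA are omitted because the bundled dataset is already clean.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Define X and y (Iris stands in for an already cleaned dataset).
X, y = load_iris(return_X_y=True)

# Split into train and test sets, keeping class proportions equal in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling plus estimator in one object, so the scaler is fitted on training folds only.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Hyper-parameter optimisation: a small, hypothetical grid searched with 5-fold CV.
param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)  # train the estimator on the training data only

# Predict and score on the held-out test set.
y_pred = search.predict(X_test)
test_accuracy = search.score(X_test, y_test)

print("best parameters:", search.best_params_)
print("test accuracy:", test_accuracy)
```

The test set is touched only once, at the very end, so the reported accuracy is not inflated by the hyper-parameter search.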
ML Pipeline
• Expectation
ML Pipeline
• Reality
Python Ecosystem for DS

• Anaconda
• Pandas
• SciPy
• NumPy
• Matplotlib
• Seaborn
• scikit-learn
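
As a small taste of how two of these packages fit together, here is a sketch of a cleaning and imputation step with pandas and NumPy. The toy table, its column names, and the choice of median imputation are all made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with one missing value.
df = pd.DataFrame(
    {
        "height_cm": [160.0, 172.0, np.nan, 181.0],
        "group": ["a", "b", "a", "b"],
    }
)

# Impute the missing height with the column median (one simple choice among many).
median_height = df["height_cm"].median()
df["height_cm"] = df["height_cm"].fillna(median_height)

# Group-wise summary of the cleaned column.
group_means = df.groupby("group")["height_cm"].mean()

print(df)
print(group_means)
```

pandas holds the labelled table and NumPy supplies the missing-value marker; Matplotlib or Seaborn would typically take over from here for plotting.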
Python Ecosystem

• TensorFlow

• PyTorch
Learning Sources
• Competitions
• Datasets
• Kernels
• Job Postings
• Tutorials
Learning Sources
• Code snippets
• Solutions to common problems
• Responsive user base
Learning Sources
• Practice Puzzles and Tests
• HackerRank, Codewars
Learning Sources
• MOOCs

• Coursera

• EdX

• Tutorialspoint

• Official websites of Python libraries


Learning Sources
• r/datascience
• r/MachineLearning
• r/Python
• r/bigdata
