Introduction to Machine Learning
Sourav Nandi
ABOUT.ME/SOURAV.NANDI
Symphony AI, IIT Kanpur
Social Handles- @souravstat
How much Data is Created Every Day?
‘Google’ has become a Verb! (3.5 billion search queries every day)
2.5 quintillion bytes of data are produced by us every day (18 Zeroes!)
~90% of the World's Data was created in the last 2 years – an Accelerating Pace
Every Day-
➢ ~250 billion emails are sent (45% are Spam- Hit the Unsubscribe button!)
➢ 100+ million photos and videos are shared on Instagram
➢ ~500 million Tweets are made (~45% of Covid-19 tweets estimated to be Bot-Generated!)
By the end of 2020, there were ~31 Billion IoT devices. The estimated size of the entire
digital universe will be a whopping 44 Zettabytes (21 Zeroes in a ZB!)
What to do with all this Data?
Source- Statista, IEEE, TechJury, ILS, Raconteur, NPR, Sendpulse
Machine Learning- The Art and Science of
Learning from Data
We are drowning in Information and starving for
Knowledge — John Naisbitt (Author of ‘Megatrends’)
Is Learning Possible?
Generalization/ Pattern Recognition (Easy) vs
Extrapolation/ Finding Higher Dimensional Insights (Hard)
Timeline of Machine Learning (Wikipedia)
The ML
Family Tree
Image Credit: Vas3k, https://noeliagorod.com/
Classification- Split into Categories
Usage: Fraud Detection (Online Transaction), Spam Filtering (Email),
Sentiment Analysis (+ve/-ve/neutral), Handwriting Recognition (MNIST) etc.
Overview of Popular Algorithms
➢ Logistic Regression (GLM* with Logit link), Multinomial Logit
➢ Decision Tree, Random Forest, Bagging, Boosting
➢ Naïve Bayes (Conditional Independence
b/w Features, Given a Category)
* GLMs allow the Response (Dependent) Variable to have an error distribution other than the Normal distribution
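To make the Conditional Independence idea behind Naïve Bayes concrete, here is a minimal toy spam filter in plain Python — the documents, words, and labels are invented for illustration; this is a sketch of the technique, not a production filter or any library's API:

```python
from collections import Counter, defaultdict
import math

# Toy training data: (words, label) pairs -- purely illustrative.
docs = [("win money now".split(), "spam"),
        ("meeting at noon".split(), "ham"),
        ("win a prize".split(), "spam"),
        ("lunch at noon".split(), "ham")]

# Per-class word frequencies and class priors.
word_counts = defaultdict(Counter)
class_counts = Counter()
for words, label in docs:
    class_counts[label] += 1
    word_counts[label].update(words)

vocab = {w for words, _ in docs for w in words}

def predict(words):
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # log P(label) + sum of log P(word | label): the conditional
        # independence assumption lets us just ADD per-word log terms.
        score = math.log(class_counts[label] / len(docs))
        total = sum(word_counts[label].values())
        for w in words:
            # Laplace (add-one) smoothing avoids zero probabilities.
            p = (word_counts[label][w] + 1) / (total + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict("win money".split()))   # -> spam
```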
Classification (Contd.)
KNN (K-Nearest Neighbors) Algorithm
(Idea: Find Closest K Neighbors)
Distance Measures: Euclidean distance,
Mahalanobis distance, Manhattan distance,
Cosine Distance etc
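The "find the closest K neighbors" idea fits in a few lines of plain Python — a sketch with Euclidean distance and hypothetical 2-D points, not a tuned implementation:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points, using Euclidean distance."""
    nearest = sorted(train, key=lambda pt: math.dist(pt[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Two toy clusters with made-up labels.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]

print(knn_predict(train, (2, 2)))   # -> A
print(knn_predict(train, (8, 7)))   # -> B
```

Swapping `math.dist` for a Manhattan or Cosine distance function changes the neighborhood shape without touching the rest of the algorithm.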
Support Vector Machine and Kernel Trick
Image Credit: Towards Data Science
Regression
Multiple Linear Regression
(Discussed in earlier classes in
detail)
Ordinary Least Squares Method:
computes the unique line (or
hyperplane) that minimizes the sum
of squared (usually vertical)
distances between the observed
data and that line
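For one predictor, the least-squares line has a closed form: slope = cov(x, y) / var(x), and the line passes through the point of means. A small sketch on hypothetical data (not a substitute for a stats library):

```python
def ols_fit(xs, ys):
    """Closed-form OLS for simple linear regression:
    slope = cov(x, y) / var(x); the fitted line passes
    through the point of means (x-bar, y-bar)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical points scattered around y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = ols_fit(xs, ys)
print(round(slope, 2), round(intercept, 2))   # slope close to 2, intercept close to 0
```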
Ridge Regression and LASSO
(reduce model complexity to
prevent Overfitting; LASSO also
performs Variable Selection)
Image Credit: ISLR book (Ref 3)
Unsupervised Learning
Market Segmentation (Clustering),
Anomaly Detection, Image Compression
(Dimensionality Reduction) etc
K-means Clustering
Principal Component Analysis
(Projection into Lower Dimensional
Space, Summarizing Information)
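The K-means loop — assign each point to its nearest centroid, then move each centroid to the mean of its points — can be sketched in plain Python on a made-up two-blob dataset:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean of its cluster
        # (keep the old centroid if a cluster went empty).
        centroids = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else centroids[j]
            for j, pts in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated toy blobs (hypothetical data).
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))   # -> [3, 3]
```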
Image Credit: Vas3k
Unsupervised Learning (Contd.)
Hierarchical Clustering
Distance between Clusters
➢ Single Linkage (Min)
➢ Complete Linkage (Max)
➢ Average Linkage (All Pairs)
➢ Centroid Linkage
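The four linkage rules differ only in how they turn pairwise point distances into one cluster-to-cluster distance — a toy illustration with two hypothetical 2-D clusters:

```python
import math

# Two small clusters of made-up points.
A = [(0, 0), (0, 1)]
B = [(3, 0), (5, 0)]

pair_dists = [math.dist(a, b) for a in A for b in B]

single   = min(pair_dists)                    # Single Linkage: closest pair
complete = max(pair_dists)                    # Complete Linkage: farthest pair
average  = sum(pair_dists) / len(pair_dists)  # Average Linkage: mean over all pairs

# Centroid Linkage: distance between the cluster centroids.
centroid_a = tuple(sum(c) / len(A) for c in zip(*A))
centroid_b = tuple(sum(c) / len(B) for c in zip(*B))
centroid = math.dist(centroid_a, centroid_b)

print(single, complete)   # single <= average <= complete always holds
```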
Association Rule Mining (Looking
for patterns, eg, analyzing Shopping
behavior, Marketing Strategy)
Image Credit: saedsayad.com
Reinforcement Learning
Model is trained by having an
Agent interact with environment
Desired Action gets Rewarded
“Good Behaviors are Reinforced”
One of Three fundamental ML
Paradigms (along with Supervised
learning and Unsupervised
learning)
Used in active Environments,
like Video Games (Super Mario!),
Self-Driving Cars etc.
Goal is to Maximize cumulative
Reward (it may be difficult to
predict all possible moves)
Credit: TWIML Online
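The agent–reward loop can be sketched with a tiny epsilon-greedy agent on a 2-armed bandit (the arm probabilities and settings are invented for illustration; real RL problems have states and sequences of moves):

```python
import random

rng = random.Random(1)
true_win_prob = [0.2, 0.8]   # hidden reward probability of each arm
q = [0.0, 0.0]               # agent's running value estimate per arm
counts = [0, 0]

for step in range(5000):
    # Explore 10% of the time; otherwise exploit the best-looking arm.
    if rng.random() < 0.1:
        arm = rng.randrange(2)
    else:
        arm = q.index(max(q))
    reward = 1.0 if rng.random() < true_win_prob[arm] else 0.0
    counts[arm] += 1
    # Running-mean update: rewarded actions get reinforced.
    q[arm] += (reward - q[arm]) / counts[arm]

print(q.index(max(q)))   # with enough steps, the higher-paying arm wins out
```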
Diving Deeper into some Core Ideas
Correlation does not
Imply Causation
Image Credit: xkcd
Bias-Variance Tradeoff:
Underfitting vs Overfitting
Curse of Dimensionality
Problems in High Dimensions due to Data Sparsity
Each new dimension (ie, each added feature)
increases the amount of data required exponentially
Separation of Wind Turbines- 2D vs 3D view
Image Credit: deepai.org
Comparison of
some Popular
Algorithms
Table Credit: dataiku.com
Things to keep in mind
Splitting the Dataset:
➢ Training Set (Data Sample used to Fit the Model, to get the Parameters)
➢ Validation Set (Tuning Hyperparameters to choose final model)
➢ Test Set (To evaluate the final model, should not be used for training)
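The three-way split above can be sketched in a few lines of Python — the 60/20/20 fractions and the helper name are arbitrary choices for illustration:

```python
import random

def three_way_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then carve off test and validation sets;
    whatever remains is the training set."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_test = int(len(data) * test_frac)
    n_val = int(len(data) * val_frac)
    test_set = data[:n_test]                # held out until the very end
    val_set = data[n_test:n_test + n_val]   # for tuning hyperparameters
    train_set = data[n_test + n_val:]       # for fitting parameters
    return train_set, val_set, test_set

train_set, val_set, test_set = three_way_split(range(100))
print(len(train_set), len(val_set), len(test_set))   # -> 60 20 20
```

Fixing the seed makes the split reproducible; the test set should never be looked at while choosing the model.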
Other Common Pitfalls in ML (Violation of Assumptions):
➢ Non-Linearity (detect by Plotting Residuals against Fitted Values; fix via a Non-linear Transformation)
➢ High Leverage Points (Cook's Distance Plot)
➢ Correlation of Error Terms (eg, Time Series data; mitigate via a Controlled Experiment)
➢ Heteroscedasticity (Non-constant Variance of the Error Term)
➢ Multicollinearity (Correlated Predictors, eg. Dummy Variable Trap)
That’s All, Friends!
STAT&ML Lab is a non-profit
organization to bring young minds
into research projects on
Statistics & Machine Learning.
We aim to provide training and
research projects on Statistics,
Data Science, and ML.
The primary goal of this lab is to
promote research in Statistics in
India and throughout the world.
Thanks & References
1) Special Thanks to All the Participants, BKC College, WBSU and STAT & ML Lab:
https://www.ctanujit.org/statml-lab.html
2) Image Credit- Wikipedia, Reddit, SlideShare, me.me, Imgflip, xkcd
3) http://faculty.marshall.usc.edu/gareth-james/ISL/ (ISLR Book)
4) https://developers.google.com/machine-learning/guides/good-data-analysis
5) https://hackernoon.com/choosing-the-right-machine-learning-algorithm-68126944ce1f
6) https://en.wikipedia.org/wiki/Reinforcement_learning
7) Download Latest Version Of this PPT (& other Materials):
https://github.com/souravstat/
8) Please feel free to reach out to me anytime for a discussion:
https://about.me/sourav.nandi , [email protected]