Introduction To Machine Learning and Python

This document provides an introduction to machine learning. It discusses how much data is created every day from sources like Google searches, social media posts, and IoT devices. It then gives an overview of machine learning, including common algorithms for classification, regression, unsupervised learning, and reinforcement learning. Some key concepts discussed are bias-variance tradeoff, curse of dimensionality, and splitting data into training, validation, and test sets. The document aims to explain core ideas in machine learning at a high level.

Uploaded by

rokr58


Introduction to Machine Learning
Sourav Nandi
ABOUT.ME/SOURAV.NANDI
Symphony AI, IIT Kanpur
Social Handles: @souravstat
How much Data is Created Every Day?
• ‘Google’ has become a Verb! (3.5 billion search queries every day)
• 2.5 quintillion bytes of data are produced by us every day. (18 Zeroes!)
• ~90% of the World’s Data was created in the last 2 years – Accelerating Pace
• Every Day:
➢ ~250 billion emails are sent (45% are Spam; Hit the Unsubscribe button!)
➢ 100+ million photos and videos are shared on Instagram
➢ ~500 million Tweets are made (~45% of Covid-19 tweets estimated to be Bot-Generated!)
• By the end of 2020, ~31 Billion IoT devices. The estimated size of the entire digital universe will be a whopping 44 zettabytes (21 Zeroes in a ZB!)
• What to do with all this Data?

Source: Statista, IEEE, TechJury, ILS, Raconteur, NPR, Sendpulse

Machine Learning: The Art and Science of Learning from Data
• “We are drowning in Information and starving for Knowledge” (John Naisbitt, Author of ‘Megatrends’)
• Is Learning Possible?
• Generalization/ Pattern Recognition (Easy) vs Extrapolation/ Finding Higher Dimensional Insights (Hard)

Timeline of Machine Learning (Wikipedia)

The ML Family Tree

Image Credit: Vas3k, https://noeliagorod.com/

Classification: Split into Categories
• Usage: Fraud Detection (Online Transaction), Spam Filtering (Email), Sentiment Analysis (+ve/-ve/neutral), Handwriting Recognition (MNIST) etc.
• Overview of Popular Algorithms
➢ Logistic Regression (GLM* with Logit link), Multinomial Logit
➢ Decision Tree, Random Forest, Bagging, Boosting
➢ Naïve Bayes (Conditional Independence between Features, Given a Category)

* GLM (Generalized Linear Model): allows the Response (Dependent) Variable to have an error distribution other than the normal distribution
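
To make the logit link concrete, here is a minimal sketch of one-feature logistic regression fitted by gradient descent in plain Python. The toy data, the learning rate, the epoch count, and the function names are illustrative assumptions and do not come from the slides.

```python
import math

def sigmoid(z):
    """Inverse of the logit link: maps any real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to w and b
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, x, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical toy data: one feature, two classes
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(predict(w, b, 1.0), predict(w, b, 4.0))
```

The linear part `w*x + b` models the log-odds, which is what "Logit link" refers to; the sigmoid converts it back to a probability.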
Classification (Contd.)
• KNN (K-Nearest Neighbors) Algorithm (Idea: Find the K Closest Neighbors)
• Distance Measures: Euclidean distance, Mahalanobis distance, Manhattan distance, Cosine Distance etc.
• Support Vector Machine and Kernel Trick

Image Credit: Towards Data Science
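
The KNN idea and three of the distance measures listed above can be sketched in a few lines of plain Python. The sample points, labels, and function names are made up for illustration; Mahalanobis distance is left out because it needs a covariance matrix.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: distance(pair[0], query))
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (1.2, 1.4)))
print(knn_predict(points, labels, (6.8, 7.0), distance=manhattan))
```

Swapping the `distance` argument changes what "closest" means without touching the voting logic.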

Regression
• Multiple Linear Regression (Discussed in earlier classes in detail)
• Ordinary Least Squares Method: Computes the line (or hyperplane) that minimizes the sum of squared vertical distances (residuals) between the observed data and that line; the solution is unique when the predictors are not perfectly collinear
• Ridge Regression and LASSO (penalizing large coefficients to reduce model complexity and prevent overfitting; LASSO also performs Variable Selection)

Image Credit: ISLR book (Ref 3)
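
For a single predictor, the least-squares line has a closed form, and a ridge penalty simply shrinks the slope. The sketch below assumes one feature, an unpenalized intercept, and made-up data; `fit_line` and `ridge_lambda` are illustrative names. LASSO is not shown because it has no equally short closed form in general.

```python
def fit_line(xs, ys, ridge_lambda=0.0):
    """Return (slope, intercept) minimizing the sum of squared residuals
    plus ridge_lambda * slope**2 (ridge_lambda=0 gives ordinary least squares)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + ridge_lambda)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data lying roughly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

ols_slope, ols_intercept = fit_line(xs, ys)
ridge_slope, ridge_intercept = fit_line(xs, ys, ridge_lambda=5.0)
print(ols_slope, ols_intercept)      # close to 1.97 and 1.09
print(ridge_slope, ridge_intercept)  # slope shrunk toward zero
```

The larger `ridge_lambda` is, the more the slope is pulled toward zero, trading a little bias for lower variance.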

Unsupervised Learning
• Market Segmentation (Clustering), Anomaly Detection, Image Compression (Dimensionality Reduction) etc.
• K-means Clustering
• Principal Component Analysis (Projection into Lower Dimensional Space, Summarizing Information)
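
K-means can be sketched as a loop that alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its points. The version below is a simplified illustration: the starting centroids are passed in explicitly (practical implementations usually pick them at random), the iteration count is fixed, and the data are made up.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between the assignment step and the centroid-update step."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical toy data: two visible groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```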

Image Credit: Vas3k
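
The "projection into a lower dimensional space" behind Principal Component Analysis can be illustrated for 2-D data, where the top eigenvector of the covariance matrix has a closed form. This is a sketch for the two-feature case only, with made-up data and illustrative function names; real use would rely on a linear-algebra library.

```python
import math

def pca_first_component(points):
    """Return the unit direction of maximum variance for 2-D points
    and the share of total variance it explains."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    largest = trace / 2 + math.sqrt(max(trace ** 2 / 4 - det, 0.0))
    # Matching eigenvector; fall back to an axis when the features are uncorrelated
    if abs(sxy) > 1e-12:
        vx, vy = largest - syy, sxy
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), largest / trace

def project(points, direction):
    """Summarize each centered 2-D point by one coordinate along the direction."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    return [(p[0] - mean_x) * direction[0] + (p[1] - mean_y) * direction[1] for p in points]

# Hypothetical data lying close to the line y = x
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
direction, explained = pca_first_component(points)
scores = project(points, direction)
print(direction, explained)
print(scores)
```

Here a single coordinate per point keeps almost all of the variance, which is the sense in which PCA summarizes information.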
Unsupervised Learning (Contd.)
• Hierarchical Clustering
• Distance between Clusters
➢ Single Linkage (Min)
➢ Complete Linkage (Max)
➢ Average Linkage (Mean over All Pairs)
➢ Centroid Linkage (Distance between Centroids)
• Association Rule Mining (Looking for patterns, e.g., analyzing Shopping behavior, Marketing Strategy)

Image Credit: saedsayad.com
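
The four linkage rules differ only in how they turn point-to-point distances into one cluster-to-cluster distance. A small sketch, assuming Euclidean distance and two made-up clusters (the function name and `method` strings are illustrative):

```python
import math
from itertools import product

def linkage_distance(cluster_a, cluster_b, method="single"):
    """Distance between two clusters under the chosen linkage rule."""
    pair_distances = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if method == "single":      # closest pair
        return min(pair_distances)
    if method == "complete":    # farthest pair
        return max(pair_distances)
    if method == "average":     # mean over all pairs
        return sum(pair_distances) / len(pair_distances)
    if method == "centroid":    # distance between the cluster means
        centroid_a = [sum(c) / len(cluster_a) for c in zip(*cluster_a)]
        centroid_b = [sum(c) / len(cluster_b) for c in zip(*cluster_b)]
        return math.dist(centroid_a, centroid_b)
    raise ValueError(f"unknown linkage method: {method}")

# Hypothetical clusters on a line, so the distances are easy to check by hand
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]
for method in ("single", "complete", "average", "centroid"):
    print(method, linkage_distance(cluster_a, cluster_b, method))
```

Hierarchical clustering repeatedly merges the two clusters with the smallest such distance, so the choice of linkage shapes the resulting tree.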
Reinforcement Learning
• Model is trained by having an Agent interact with an Environment
• Desired Action gets Rewarded
• “Good Behaviors are Reinforced”
• One of Three fundamental ML Paradigms (along with Supervised learning and Unsupervised learning)
• Used in an active Environment, like Video Games (Super Mario!), Self Driving Cars etc.
• Goal is to Minimize Error, i.e., maximize the total Reward (it may be difficult to predict all possible moves)

Credit: TWIML Online
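
The agent/environment/reward loop can be illustrated with tabular Q-learning on a tiny corridor. Everything here is a hypothetical toy: the five-cell environment, the reward of 1 at the goal, and the constants are assumptions, and random exploration is replaced by trying every action in every state on each sweep so that the result is reproducible.

```python
N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (assumed values)

def step(state, action):
    """Hypothetical environment: reward 1 only when the agent reaches the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def train(sweeps=200):
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        # Simplification: visit every non-goal state and try both actions
        for s in range(N_STATES - 1):
            for a in ACTIONS:
                s_next, reward = step(s, a)
                done = s_next == N_STATES - 1
                best_next = 0.0 if done else max(q[(s_next, b)] for b in ACTIONS)
                # Q-learning update: move the estimate toward reward + discounted future value
                q[(s, a)] += ALPHA * (reward + GAMMA * best_next - q[(s, a)])
    return q

q_table = train()
# Greedy policy: in each state pick the action with the highest learned value
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Actions that lead toward the reward end up with higher values, which is the sense in which good behaviors are reinforced.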
Diving Deeper into some Core Ideas

Correlation does not Imply Causation

Image Credit: xkcd

Bias-Variance Tradeoff: Underfitting vs Overfitting
Curse of Dimensionality
• Problems in High Dimension due to Data Sparsity
• Adding each new dimension (i.e., adding a feature) increases the data set requirement exponentially
• Separation of Wind Turbines: 2D vs 3D view

Image Credit: deepai.org
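
A rough way to see the exponential growth: if each feature is divided into a fixed number of bins, the number of cells to fill is that number raised to the power of the dimension, so a fixed sample becomes sparse very quickly. The bin count and sample size below are arbitrary illustrative choices.

```python
def samples_needed(points_per_axis, dimensions):
    """Cells in a grid over the feature space at a fixed resolution per axis."""
    return points_per_axis ** dimensions

def expected_points_per_cell(n_samples, bins_per_axis, dimensions):
    """Average number of samples per grid cell if the data were spread evenly."""
    return n_samples / bins_per_axis ** dimensions

for d in (1, 2, 3, 10):
    print(d, samples_needed(10, d), expected_points_per_cell(1000, 10, d))
```

With 1,000 samples and 10 bins per feature, one feature gives 100 samples per cell on average, while ten features leave almost every cell empty.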

Comparison of some Popular Algorithms

Table Credit: dataiku.com

Things to keep in mind
• Splitting the Dataset:
➢ Training Set (Data Sample used to Fit the Model, to get the Parameters)
➢ Validation Set (Tuning Hyperparameters to choose the final model)
➢ Test Set (To evaluate the final model; must not be used for training or tuning)
• Other Common Pitfalls in ML (Violation of Assumptions):
➢ Non-Linearity (Plotting Residuals against Fitted Values, Non-linear Transformation)
➢ High Leverage Points (Cook’s Distance Plot)
➢ Correlation of Error Terms (e.g., Time Series data): Controlled Experiment
➢ Heteroscedasticity (Non-constant Variance of Error Term)
➢ Multicollinearity (Correlated Predictors, e.g., Dummy Variable Trap)
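
The three-way split described above can be sketched with the standard library alone. The 60/20/20 proportions, the seed, and the function name are assumptions chosen for illustration; the slides do not prescribe specific fractions.

```python
import random

def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into disjoint training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]                 # held out until the very end
    val = shuffled[n_test:n_test + n_val]    # used to tune hyperparameters
    train = shuffled[n_test + n_val:]        # used to fit the parameters
    return train, val, test

rows = list(range(100))   # stand-in for 100 data records
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))
```

Shuffling before cutting suits independent records; for time series data, where error terms are correlated, a chronological split is usually the safer choice.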
That’s All, Friends!

STAT&ML Lab is a non-profit organization to bring young minds into research projects on Statistics & Machine Learning.

We aim to provide training and research projects on Statistics, Data Science, and ML.

The primary goal of this lab is to promote research in Statistics in India and throughout the world.

Thanks & References
1) Special Thanks to All the Participants, BKC College, WBSU and STAT & ML Lab: https://www.ctanujit.org/statml-lab.html
2) Image Credit: Wikipedia, Reddit, SlideShare, me.me, Imgflip, xkcd
3) http://faculty.marshall.usc.edu/gareth-james/ISL/ (ISLR Book)
4) https://developers.google.com/machine-learning/guides/good-data-analysis
5) https://hackernoon.com/choosing-the-right-machine-learning-algorithm-68126944ce1f
6) https://en.wikipedia.org/wiki/Reinforcement_learning
7) Download Latest Version Of this PPT (& other Materials): https://github.com/souravstat/
8) Please feel free to reach out to me anytime for a discussion: https://about.me/sourav.nandi , souravsijna@gmail.com
