Intro To ML

The document provides an overview of machine learning (ML), detailing its types such as classification and regression, and the essential steps in the ML process including data mining, pre-processing, and model training. It emphasizes the importance of data cleaning, scaling, and selecting appropriate algorithms while addressing the concepts of overfitting and underfitting. Additionally, it recommends resources for further learning in ML, including books and online platforms.

Uploaded by

Dion Wisent

Lunch and Mlearn

Dexter Fichuk
https://goo.gl/VaWHrb (content here)
https://www.continuum.io/downloads

http://tiny.cc/conda
What is ML?
Types of ML

Classification
Classification tells you whether something is true or false (1 or 0), or which category it belongs to, such as classifying a picture as a cat or a dog, or a shape as a square, triangle, or circle.

Regression
Regression predicts a value based on the input, such as a credit score, the temperature, stock prices, or anything else where the output is continuous (e.g. 2.4893, 1.00049, 59.23).
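To make the difference concrete, here is a minimal sketch using Scikit-Learn. The synthetic datasets and the choice of LogisticRegression and LinearRegression are assumptions for illustration only; they are not taken from the slides.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: the target is a discrete label (here 0 or 1).
X_cls, y_cls = make_classification(n_samples=100, n_features=4, random_state=0)
clf = LogisticRegression().fit(X_cls, y_cls)
class_predictions = clf.predict(X_cls[:5])

# Regression: the target is a continuous number.
X_reg, y_reg = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)
reg = LinearRegression().fit(X_reg, y_reg)
value_predictions = reg.predict(X_reg[:5])

print("classification output:", class_predictions)
print("regression output:", value_predictions)
```

The classifier only ever returns one of the known labels, while the regressor returns arbitrary floating-point values.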
The Flow

Data Mining: Collecting a Dataset
● Mostly doing supervised learning here, meaning that our training set already has outcome labels.
● Could also involve creating simulation datasets (transactions, etc.)

Pre-Processing: Cleaning the Data
● Detecting the values/features (columns) that matter, removing ones that don’t.
● Normalizing/Scaling data
● Sometimes plotting

Training/Evaluating: Building a Complete Model
● Involves testing different algorithms/hyperparameters to find the highest accuracy for the dataset.
Data Mining
● Try searching sites like Kaggle, open data government sites, and the UCI Machine Learning Repository besides Google.
● If simulating the data, make sure to research reasonable ranges and occurrences of different cases.
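A simulated dataset could be sketched as follows with NumPy. Every range, rate, and feature name here (transaction amounts, hour of day, a 2% fraud rate) is a hypothetical placeholder, not a researched value; that research is exactly the step the slide asks for.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_transactions = 1000

# Hypothetical ranges; in practice, research realistic values first.
amounts = rng.lognormal(mean=3.5, sigma=1.0, size=n_transactions)  # skewed, always > 0
hour_of_day = rng.integers(0, 24, size=n_transactions)             # 0..23

# Assume roughly 2% of transactions are fraudulent (illustrative rate only).
is_fraud = rng.random(n_transactions) < 0.02

X = np.column_stack([amounts, hour_of_day])  # inputs
y = is_fraud.astype(int)                     # outcome labels

print(X.shape, y.mean())
```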
Pre-Processing
Cleaning your dataset

Involves tasks such as:
● Removing irrelevant features
● Deciding what to do with null entries (replace with column avg., remove row, etc.)
● Scaling inputs and transforming text fields to numerical representations.
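The first two tasks can be sketched with NumPy and Scikit-Learn's SimpleImputer. The tiny dataset and its column meanings (age, income, row_id) are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical raw data: columns are [age, income, row_id]; np.nan marks null entries.
raw = np.array([
    [25.0, 50000.0, 1.0],
    [np.nan, 62000.0, 2.0],
    [40.0, np.nan, 3.0],
    [35.0, 58000.0, 4.0],
])

# 1. Remove an irrelevant feature (the row_id column carries no signal).
features = raw[:, :2]

# 2a. Option A: replace nulls with the column average.
imputed = SimpleImputer(strategy="mean").fit_transform(features)

# 2b. Option B: remove every row that contains a null.
complete_rows = features[~np.isnan(features).any(axis=1)]

print(imputed)
print(complete_rows)
```

Which option is better depends on how much data you can afford to lose: dropping rows is simplest, while averaging keeps every sample at the cost of inventing values.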
Data
Variables: X (input data), y (output label)

Representing Data
Each row of X is one sample, each column of X is one feature, and y holds the outputs / labels.

X =                          y =
1.1  2.2  3.4  5.6  1.0      1.6
6.7  0.5  0.4  2.6  1.6      2.7
2.4  9.3  7.3  6.4  2.8      4.4
1.5  0.0  4.3  8.3  3.4      0.5
0.5  3.5  8.1  3.6  4.6      0.2
5.1  9.7  3.5  7.9  5.1      5.6
3.7  7.8  2.6  3.2  6.3      6.7
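In code, this layout maps naturally onto NumPy arrays. The sketch below assumes the slide's figure shows seven samples with five features each, plus one label per sample; that reading of the figure is an assumption.

```python
import numpy as np

# X: one row per sample, one column per feature.
X = np.array([
    [1.1, 2.2, 3.4, 5.6, 1.0],
    [6.7, 0.5, 0.4, 2.6, 1.6],
    [2.4, 9.3, 7.3, 6.4, 2.8],
    [1.5, 0.0, 4.3, 8.3, 3.4],
    [0.5, 3.5, 8.1, 3.6, 4.6],
    [5.1, 9.7, 3.5, 7.9, 5.1],
    [3.7, 7.8, 2.6, 3.2, 6.3],
])
# y: one output label per sample.
y = np.array([1.6, 2.7, 4.4, 0.5, 0.2, 5.6, 6.7])

print(X.shape)   # (n_samples, n_features)
print(X[0])      # one sample (a row)
print(X[:, 0])   # one feature (a column)
```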

Categorical Variables
If you mapped {red->0, green->1, blue->2}, a linear relationship would be imposed between the values. It is therefore better to perform a categorical transformation on text fields that are options, rather than ratings.

“red” “green” “blue”
1 0 0
0 1 0

Whether an input should be scaled is largely dependent on the learning algorithm you’re selecting. Scaling is great for algorithms such as Neural Networks and SVMs.

1 -> .2
3 -> .6
5 -> 1
2 -> .4
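Both transformations can be sketched with Scikit-Learn. OneHotEncoder is one way to do the categorical transformation, and MaxAbsScaler (divide by the largest absolute value) is an assumption chosen because it matches the slide's numbers; other scalers such as MinMaxScaler or StandardScaler give different results.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, OneHotEncoder

# Categorical transformation: one column per option, no implied ordering.
colors = np.array([["red"], ["green"], ["blue"]], dtype=object)
encoder = OneHotEncoder(categories=[["red", "green", "blue"]])
one_hot = encoder.fit_transform(colors).toarray()
print(one_hot)

# Scaling: divide each value by the column's largest absolute value.
values = np.array([[1.0], [3.0], [5.0], [2.0]])
scaled = MaxAbsScaler().fit_transform(values)
print(scaled.ravel())
```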
Training A Model
Splitting the Data
Simple Splitting
The gold standard for evaluating a model is testing it on data it has not seen in training. This means holding out a percentage of the dataset (typically 10-20%) as a test set and running it through the trained model to see its accuracy.

It’s important to set a random state for the split, so the same samples land in the training and test sets every time, making your results reproducible.
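A minimal sketch of a simple split with Scikit-Learn's train_test_split. The iris dataset, the 20% test size, the LogisticRegression model, and random_state=42 are illustrative choices, not taken from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples; random_state fixes which samples are held out.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train on the training set only, then score on the unseen test set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")

# Repeating the split with the same random_state yields the same test set.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
same_split = bool((X_test == X_test_again).all())
print("reproducible split:", same_split)
```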
Training and Testing Data

training set:
X =                          y =
1.1  2.2  3.4  5.6  1.0      1.6
6.7  0.5  0.4  2.6  1.6      2.7
2.4  9.3  7.3  6.4  2.8      4.4
1.5  0.0  4.3  8.3  3.4      0.5
0.5  3.5  8.1  3.6  4.6      0.2

BAD SPLIT

test set:
5.1  9.7  3.5  7.9  5.1      5.6
3.7  7.8  2.6  3.2  6.3      6.7
Picking an Algorithm
There are many algorithms to choose from, but lucky for us, Scikit-Learn has a ton built in, and they can be used mostly interchangeably. This means different classifiers can be tried in a loop and then plotted to compare performance.

Each algorithm has better use cases and could outperform others for a
specific task. There is no master algorithm.

Scikit-Learn has a great cheat sheet for picking algorithms.

Source: https://goo.gl/liKQbr
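Because Scikit-Learn estimators share the same fit/score interface, comparing several in a loop can be sketched as below. The three classifiers, the iris dataset, and the split settings are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Every classifier exposes the same fit/score methods, so they are interchangeable here.
classifiers = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "k-Nearest Neighbors": KNeighborsClassifier(),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # accuracy on the held-out test set

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The scores dictionary can then be passed to any plotting library, for example as a bar chart, to compare performance visually.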
Generalizing
Overfitting and Underfitting
[Figure: accuracy vs. model complexity, with a training curve and a testing (generalization) curve. The sweet spot lies between the underfitting region and the overfitting region.]
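The effect can be reproduced in a small experiment. In this sketch, decision-tree depth stands in for model complexity, and the noisy synthetic dataset is an assumption made so that overfitting is visible; exact numbers will vary by dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise, so a perfect fit means memorizing noise.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Increase model complexity by allowing deeper trees (None = unlimited depth).
results = {}
for depth in [1, 3, 5, 10, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))

for depth, (train_acc, test_acc) in results.items():
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A very shallow tree tends to score poorly on both sets (underfitting), while the unlimited-depth tree memorizes the training set and leaves a gap between training and testing accuracy (overfitting).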
Algorithms
A few great ones for baselining.
● Gradient Boosting (XGBoost, LightGBM)
● Random Forests
● Multi-Layer Perceptron (NN)
● Support Vector Machines
Parameter Tuning
Each algorithm has a variety of parameters, and there are a few ways of finding optimal ones.
● GridSearch
● RandomSearch
● Hyperopt
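A grid search can be sketched with Scikit-Learn's GridSearchCV. The SVC model, the parameter grid, and the 5-fold cross-validation are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every combination in the grid is trained and scored with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

Scikit-Learn's RandomizedSearchCV follows the same pattern but samples a fixed number of combinations instead of trying them all, which helps when the grid is large. Hyperopt is a separate library with its own interface.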
Recap

Data Mining -> Pre-Processing -> Splitting Data -> Training -> Evaluating
Jupyter Notebook Use

Recommended Resources
● Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron
● Deep Learning with Python by François Chollet
● Kaggle

github.com/dexterfichuk/ML-Bootcamp
https://goo.gl/VaWHrb
http://scikit-learn.org/
