
CMPE 442: INTRODUCTION TO MACHINE LEARNING
MACHINE LEARNING
 Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)
 A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1997)
MACHINE LEARNING
 Example: Spam filter: given examples of spam e-mails and examples of ham (non-spam) e-mails, it learns to flag spam.
 Training set: the examples that the system uses to learn.
 T (task): flag spam for new e-mails
 E (experience): the training data
 P (performance): needs to be defined
 Ex: the ratio of correctly classified e-mails → accuracy
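The accuracy measure above can be computed directly from the predictions; a minimal sketch in Python (the labels and data are made up for illustration):

```python
def accuracy(predicted, actual):
    """Performance measure P: ratio of correctly classified e-mails."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# 1 = spam, 0 = ham
predicted = [1, 0, 1, 1, 0]
actual    = [1, 0, 0, 1, 0]
print(accuracy(predicted, actual))  # 4 of 5 correct -> 0.8
```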
EVALUATING PERFORMANCE ON A TASK
 Machine learning problems don't have a single "correct" answer.
 Consider the sorting problem:
 Many sorting algorithms are available: bubble sort, quick sort, insertion sort, ...
 Their performance is measured in terms of how fast they are and how much data they can handle.
 Would we compare sorting algorithms with respect to the correctness of the result? No:
 An algorithm that isn't guaranteed to produce a sorted list every time is useless as a sorting algorithm.
EVALUATING PERFORMANCE ON A TASK
 There is no perfect solution in machine learning:
 A perfect e-mail spam filter does not exist!
 In many cases the data is "noisy":
 Examples may be mislabelled
 Features may contain errors
 This is why performance evaluation of learning algorithms is so important in machine learning.
WHY USE MACHINE LEARNING?
 Let's write a spam filter using traditional programming techniques:
 1) Study spam e-mails and identify patterns and the most frequently occurring words.
 2) Write a detection algorithm.
 3) Test, and repeat steps 1 and 2 until it is good enough.
WHY USE MACHINE LEARNING?
 Traditional approach: Study the problem → Write rules → Evaluate → Launch! (with an "Analyze errors" loop back to studying the problem)
WHY USE MACHINE LEARNING?
 Machine learning approach: Study the problem → Train an ML algorithm on data → Evaluate → Launch! (with an "Analyze errors" loop back to studying the problem)
WHY USE MACHINE LEARNING?
 Consider the example of recognizing handwritten digits.
 Each digit corresponds to a 28x28 pixel image, and so can be represented by a vector x comprising 784 real numbers.
 Goal: build a machine that will take such a vector x as input and produce the identity of the digit 0, …, 9 as the output.
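The flattening step described above can be sketched in a few lines of Python (an all-zero image stands in for real pixel data):

```python
# A digit image as a 28x28 grid of pixel intensities (zeros here for brevity).
image = [[0.0] * 28 for _ in range(28)]

# Flatten row by row into a single vector x of 784 real numbers.
x = [pixel for row in image for pixel in row]
print(len(x))  # 784
```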
WHY USE MACHINE LEARNING?
 Better to use a machine learning approach, where a large set of N digits, called the training set, is used to tune the parameters of an adaptive model.
 The categories of the digits in the training set are known in advance → target vector t.
 The goal is to determine the function y(x), which takes a new digit image x as input and generates an output vector y → the learning (training) phase.
 Once the model is trained, we can run it on the test set.
 The ability to correctly categorize new examples that differ from those used for training is known as generalization.
WHY USE MACHINE LEARNING?
 For problems that are too complex for the traditional approach.
 For problems that have no known algorithmic solution.
 Ex.: speech recognition
 Helps humans learn: applying ML techniques to large amounts of data can reveal patterns that were not immediately apparent → data mining.
SOME ML PROBLEMS
 Speech Recognition
 Document Classification
 Face Detection and Recognition
 ...
TYPES OF MACHINE LEARNING SYSTEMS
 Whether or not they are trained with human supervision: supervised, unsupervised, semi-supervised, reinforcement learning.
 Instance-based versus model-based learning.
SUPERVISED LEARNING
 The training data includes the desired solutions, called labels.
 Spam filter → classification
SPAM FILTERING AS A CLASSIFICATION TASK
MACHINE LEARNING FOR SPAM FILTERING
SUPERVISED LEARNING
 The training data includes the desired solutions, called labels.
 House price prediction → regression
SUPERVISED LEARNING
 Some of the most important supervised algorithms:
 K-Nearest Neighbours
 Linear Regression
 Naïve Bayes
 Logistic Regression
 Support Vector Machines
 Decision Trees and Random Forests
 Neural Networks
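To make one entry in this list concrete, a minimal k-nearest-neighbours classifier can be sketched in plain Python (the toy points and labels below are made up):

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    k_labels = [label for _, label in dists[:k]]
    return Counter(k_labels).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (2, 2)))  # "a"
print(knn_predict(points, labels, (8, 7)))  # "b"
```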
UNSUPERVISED LEARNING
 The training data is unlabelled.
 The system tries to learn without anyone's
guidance.
UNSUPERVISED LEARNING
 Some of the most important unsupervised algorithms:
 Clustering
   K-Means
   Hierarchical Cluster Analysis (HCA)
   Expectation Maximization
 Visualization and Dimensionality Reduction
   Principal Component Analysis (PCA)
   Locally-Linear Embedding (LLE)
   t-Distributed Stochastic Neighbour Embedding (t-SNE)
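To make the first entry concrete, here is a minimal k-means sketch in plain Python (toy 2-D points and a fixed seed; a real implementation would also check for convergence):

```python
import math
import random

def kmeans(points, k, iters=10, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its cluster (keep old one if empty).
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
print(sorted(kmeans(points, 2)))  # two centroids, one per cluster
```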
SUPERVISED/UNSUPERVISED LEARNING
INSTANCE-BASED VS. MODEL-BASED LEARNING
 Most ML problems are about making predictions:
 Given training examples, the system needs to be able to generalize to examples it has never seen before.
 The true goal is to perform well on new instances.
 Two main generalization approaches:
 Instance-based: the system learns the examples by heart, then generalizes to new cases using a similarity measure.
 Model-based: the system generalizes from a set of examples by building a model of these examples, then uses that model to make predictions.
INSTANCE-BASED LEARNING
MODEL-BASED LEARNING
REGRESSION PROBLEM
LINEAR REGRESSION
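Model-based learning can be illustrated with the simplest possible model, a least-squares line fit; the data below is made up so that the fit is exact:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w0 + w1*x (a minimal model-based learner)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
         / sum((x - mean_x) ** 2 for x in xs)
    w0 = mean_y - w1 * mean_x
    return w0, w1

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # exactly y = 1 + 2x
w0, w1 = fit_line(xs, ys)
print(w0, w1)              # 1.0 2.0
print(w0 + w1 * 10)        # prediction for a new case: 21.0
```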
PROJECT PHASES
 Study the data
 Select a learning algorithm
 Train it on the training data
 Apply the model to make predictions on new cases
MAIN CHALLENGES IN MACHINE LEARNING
 Two things that can go wrong:
 Bad data
 Bad algorithm
BAD DATA
 Insufficient quantity of training data
 It takes a lot of data for most ML algorithms to work
properly.
 Non-representative training data
 It is crucial that your training data is representative of the
new cases you want to generalize to.
 Poor-quality data
 It is better to spend time cleaning up the training data:
decide about outliers and missing features.
 Irrelevant features
 Feature engineering involves:
 Feature selection: selecting the most useful features to train on
among existing features
 Feature extraction: combining existing features to produce a
more useful one
 Creating new features by gathering new data
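Feature extraction, as described above, can be sketched in a couple of lines; the housing fields below are hypothetical:

```python
# Feature extraction: combine two raw features into one more useful feature.
# Hypothetical housing records with area in square metres and sale price.
records = [
    {"area_m2": 100, "price": 250_000},
    {"area_m2": 80,  "price": 240_000},
]

for r in records:
    # A derived feature that is often more informative than either raw one.
    r["price_per_m2"] = r["price"] / r["area_m2"]

print([r["price_per_m2"] for r in records])  # [2500.0, 3000.0]
```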
BAD ALGORITHM
 Overfitting the training data:
 Happens when the model performs well on the training data but does not generalize well.
 Underfitting the training data:
 Happens when the model is too simple to learn the underlying structure of the data.
BAD ALGORITHM: EXAMPLE
 Simple regression problem: suppose we observe a real-valued input variable x and we wish to use this observation to predict the value of a real-valued target variable t.
 The data for this example is generated from the function sin(2πx), with random noise included in the target values.
 Suppose we are given a training set containing N observations of x, written x = (x_1, ..., x_N), together with the corresponding observations of t, written t = (t_1, ..., t_N).
BAD ALGORITHM: EXAMPLE
 N = 10; the input data set x is generated by choosing values of x_n, for n = 1, ..., N, spaced uniformly in the range [0, 1].
 The target data set t is obtained by computing sin(2πx_n) for the corresponding x values and adding a small level of noise having a Gaussian distribution.
 Goal: exploit the training set in order to make predictions of the value of the target variable for some new value of the input variable.
 In other words, we are trying to discover the underlying function sin(2πx).
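Assuming, as in the classic version of this example, that the underlying function is sin(2πx), the training set can be generated as follows (the noise level and seed are illustrative choices):

```python
import math
import random

def make_dataset(n=10, noise_std=0.3, seed=1):
    """N inputs spaced uniformly in [0, 1]; targets are sin(2*pi*x) plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    ts = [math.sin(2 * math.pi * x) + rng.gauss(0, noise_std) for x in xs]
    return xs, ts

xs, ts = make_dataset()
print(len(xs), len(ts))  # 10 10
```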
POLYNOMIAL CURVE FITTING
 Fit the data using a polynomial function of the form:
 y(x, w) = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M
 M: order of the polynomial
 w = (w_0, ..., w_M): coefficients
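The polynomial y(x, w) can be evaluated efficiently with Horner's scheme; a short sketch (the coefficients are made up):

```python
def poly(x, w):
    """Evaluate y(x, w) = w[0] + w[1]*x + ... + w[M]*x**M via Horner's scheme."""
    result = 0.0
    for coeff in reversed(w):
        result = result * x + coeff
    return result

w = [1.0, -2.0, 3.0]       # M = 2: y = 1 - 2x + 3x^2
print(poly(2.0, w))        # 1 - 4 + 12 = 9.0
```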
CURVE FITTING
TESTING AND VALIDATING
 Once you have a trained model, evaluate it and fine-tune it.
 Split your data into two sets: the training set and the test set.
 Generalization error: the error rate on new cases, estimated by evaluating the model on the test set.
 If the training error is low (the model makes few mistakes on the training set) but the generalization error is high, the model is overfitting the training set.
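The overfitting check described above can be sketched with a deliberately bad model that memorizes its training set (all data here is made up):

```python
def error_rate(model, examples):
    """Fraction of labelled examples the model misclassifies."""
    wrong = sum(model(x) != y for x, y in examples)
    return wrong / len(examples)

# A model that memorized its training set (illustrative overfitting):
train = [(1, "spam"), (2, "ham"), (3, "spam")]
test  = [(4, "ham"), (5, "spam")]
memorized = dict(train)
model = lambda x: memorized.get(x, "spam")  # guesses "spam" for unseen inputs

print(error_rate(model, train))  # 0.0 -> low training error
print(error_rate(model, test))   # 0.5 -> high generalization error: overfitting
```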
HOW DOES ML HELP TO SOLVE A TASK?
