CSE445 1 Intro To ML

CSE 445 Machine Learning
Introduction to Machine Learning

Mirza Mohammad Lutfe Elahi; Silvia Ahmed
Department of Electrical and Computer Engineering
CSE445 Machine Learning Introduction to Machine Learning ECE@NSU
Topics

• What is ML
• Types of ML
  • Supervised/Unsupervised/Semi-Supervised/Reinforcement Learning
  • Online/Batch Learning
  • Instance/Model-Based Learning
• Challenges of ML

Learning goals

After this presentation, you should be able to:
• Explain what machine learning is
• Understand the different types of machine-learning systems
• Know the jargon related to machine learning
• Understand the main challenges in machine learning

Problem statement

Spam email/SMS:
Emails and SMS messages that contain unwanted or dangerous content.

Solution:
Use a spam filter to identify such messages and flag them as spam.

Traditional Approach

1. Study the problem: consider what spam typically looks like, and find patterns in the sender’s name and the email/SMS body.
2. Write rules: write a detection algorithm for each of the patterns.
3. Evaluate.
4. Analyze errors, or launch.

Problem:
Spammers change their patterns, so new rules must be written indefinitely.

Figure 1: The traditional approach to software design
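
To make the maintenance problem concrete, here is a minimal sketch of a hand-written rule-based filter. The patterns in `RULES` and the function name `is_spam_by_rules` are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
import re

# Hypothetical hand-written patterns; a real filter would need many more.
RULES = [
    re.compile(r"free\s+money", re.IGNORECASE),
    re.compile(r"4U\b", re.IGNORECASE),
    re.compile(r"click\s+here", re.IGNORECASE),
]

def is_spam_by_rules(message: str) -> bool:
    """Flag a message as spam if any hand-written pattern matches."""
    return any(rule.search(message) for rule in RULES)

print(is_spam_by_rules("Get FREE money now"))  # matches the first rule
print(is_spam_by_rules("Get fr3e m0ney now"))  # evades every rule
```

The second message shows the weakness: a small change in spelling slips past every rule, so someone has to notice it and write a new pattern.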

Machine Learning (ML) Approach

Figure 2: The Machine Learning approach

Figure 3: Automatically adapting to change

What is ML?

• “[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.”
- Arthur Samuel, 1959

• “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
- Tom Mitchell, 1997

ML: Example

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Example:
A spam filter is a machine learning (ML) program that, given examples of spam emails and examples of regular emails, can learn to flag spam.
Here,
• Task, T: to flag spam among new emails,
• Experience, E: the training data,
• Performance, P: the ratio of correctly classified emails (accuracy).
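
The roles of T, E, and P can be sketched in a few lines. This is a toy word-counting filter written for illustration only, not the method the slides have in mind; the data and the function names `train`, `predict`, and `accuracy` are made up.

```python
from collections import Counter

def train(examples):
    """Experience E: count how often each word appears in spam and in regular mail."""
    spam_words, ham_words = Counter(), Counter()
    for text, is_spam in examples:
        target = spam_words if is_spam else ham_words
        target.update(text.lower().split())
    return spam_words, ham_words

def predict(model, text):
    """Task T: flag a new message as spam (True) or not (False)."""
    spam_words, ham_words = model
    score = sum(spam_words[w] - ham_words[w] for w in text.lower().split())
    return score > 0

def accuracy(model, examples):
    """Performance P: fraction of messages classified correctly."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)

# Toy, made-up data for illustration only.
training_data = [
    ("win free money now", True),
    ("free prize click now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]
test_data = [
    ("claim your free prize", True),
    ("agenda for lunch meeting", False),
]

model = train(training_data)
print(accuracy(model, test_data))
```

Adding more labeled examples to `training_data` changes the word counts without any new rule being written, which is the sense in which the program learns from experience.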

Application of ML

• Analyzing images of products on a production line to classify them automatically;
• Detecting tumors in brain scans;
• Automatically classifying news articles;
• Automatically flagging offensive comments on discussion forums;
• Summarizing long documents automatically;
• Creating a chatbot or a personal assistant;
• Forecasting your company’s revenue next year, based on many performance metrics;
• Making your app react to voice commands;
• Detecting credit card fraud;
• Segmenting clients based on their purchases so that you can design a different marketing strategy for each segment;
• Many more…

Types of ML

Broad categories based on the following criteria:

Trained with human supervision?
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
• Reinforcement learning

Incremental learning?
• Online learning
• Batch learning

Patterns in examples or rule-based?
• Instance-based learning
• Model-based learning

Supervised Learning

• Probably the most common problem type in machine learning
• A data set is given in which the correct output for each input is already known
• The underlying assumption is that there is a relationship between the input and the output

Figure 4: A labeled training set for spam classification (example of supervised learning)

Supervised Learning – Categories and Example

• Regression: trying to predict results within a continuous output
• Classification: trying to predict results in a discrete output

Example 1:
• Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

Figure 5: A labeled training set for housing price prediction (example of supervised learning)

• We could turn this example into a classification problem by instead making our output whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.

Supervised Learning Example (contd.)

What approaches can we use to solve this?
• Fit a straight line through the data
  • Predicted price: maybe $150 000
• Fit a second-order polynomial
  • Predicted price: maybe $200 000
• One thing we discuss later: how to choose between a straight and a curved line?
• We know the actual prices of the houses in the training data
• The idea is that we can learn what makes the price a certain value from the training data
• The algorithm should then produce more right answers on new data where we don't already know the price
  • i.e. predict the price
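
The straight-line approach can be sketched with ordinary least squares. The sizes and prices below are made up for illustration, and `fit_line` and `predict_price` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: house size (square feet) and known price (USD).
sizes = [500, 1000, 1500, 2000]
prices = [100_000, 150_000, 200_000, 250_000]

slope, intercept = fit_line(sizes, prices)

def predict_price(size):
    """Predict the price of a house whose price we do not know yet."""
    return slope * size + intercept

print(predict_price(1250))
```

A second-order polynomial would be fitted the same way with one extra parameter; which of the two generalizes better depends on the data.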

Supervised Learning Example (contd.)

Example 2:
• Can we classify a breast tumor as malignant or benign based on its size?

Figure 6: A labeled training set for breast cancer detection (example of supervised learning)

• Can you estimate prognosis based on tumor size?

Supervised Learning Example (contd.)

Example 2:
• This is an example of a classification problem
• Classify data into one of two discrete classes: no in between, either malignant or not
• In classification problems, the output can take a discrete number of possible values
  • e.g. maybe four values:
    • 0 - benign
    • 1 - type 1
    • 2 - type 2
    • 3 - type 3

Supervised Learning Example (contd.)

Example 2:
• In classification problems we can plot the data in a different way
• Using only one attribute (size)

Figure 7: A labeled training set for breast cancer detection with one attribute only

• Other problems may have multiple attributes
• We may also, for example, know age and tumor size

Figure 8: A labeled training set for breast cancer detection with multiple attributes

Supervised Learning Example (contd.)

Example 3:
• (a) Regression: given a picture of a person, predict their age.
• (b) Classification: given a picture of a person, predict whether they are of high school, college, or graduate age.
• Another classification example: a bank has to decide whether or not to give someone a loan on the basis of their credit history.

Supervised Learning Example (contd.)

You’re running a company, and you want to develop learning algorithms to address each of two problems.

Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.

Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.

Should you treat these as classification or as regression problems?

Supervised Learning Algorithm

• Here are some of the most important supervised learning algorithms:
  • K-Nearest Neighbors
  • Linear Regression
  • Logistic Regression
  • Support Vector Machines (SVMs)
  • Decision Trees and Random Forests
  • Neural Networks

Unsupervised Learning

• Here is a data set; can you structure it?
• We can derive structure from data where we don't necessarily know the effect of the variables.
• We can derive this structure by clustering the data based on relationships among the variables in the data.
• There is no feedback based on the prediction results.

Figure 9: An unlabeled training set for unsupervised learning
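
As one illustration of clustering without labels, here is a minimal k-means sketch on one-dimensional data. The data, the starting centroids, and the function name `k_means_1d` are assumptions made for this example.

```python
def k_means_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up unlabeled measurements that visibly form two groups.
points = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(points, centroids=[0, 15])
print(centroids, clusters)
```

No label is ever supplied; the two groups emerge only from the distances between the points.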

Unsupervised Learning Example

• Have a group of individuals
• For each individual, measure the expression of a gene
• Run an algorithm to cluster the individuals into types of people

Unsupervised Learning Algorithms

• Clustering
  • K-Means
  • DBSCAN
  • Hierarchical Cluster Analysis (HCA)
• Anomaly detection and novelty detection
  • One-class SVM
  • Isolation Forest
• Visualization and dimensionality reduction
  • Principal Component Analysis (PCA)
  • Kernel PCA
  • Locally Linear Embedding (LLE)
  • t-Distributed Stochastic Neighbor Embedding (t-SNE)
• Association rule learning
  • Apriori
  • Eclat

Semi-Supervised Learning

• Supervised learning works on labeled data
• Unsupervised learning works on unlabeled data
• In semi-supervised learning, the training data contains both labeled and unlabeled data
• Most semi-supervised algorithms are combinations of unsupervised and supervised algorithms
• Example: Google Photos

Figure 10: A partially labeled training set for semi-supervised learning

Reinforcement Learning
• The learning system, called an agent, can observe the environment, select and perform actions, and get rewards (or penalties) in return.
• It must then learn by itself what is the best strategy, called a policy, to get the most reward over time.

Figure 11: Reinforcement learning
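
The agent, action, reward, and policy vocabulary can be sketched with a very small example. This is an epsilon-greedy agent on a toy "bandit" environment in which each action always pays a fixed reward; the environment, the reward values, and the name `run_bandit` are assumptions for illustration, and real reinforcement learning problems also involve states and noisy rewards.

```python
import random

def run_bandit(rewards, episodes=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent; action i always pays rewards[i] in this toy environment."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    estimates = [0.0] * n_actions  # agent's estimated value of each action
    counts = [0] * n_actions
    for step in range(episodes):
        if step < n_actions:
            action = step                      # try every action once
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        reward = rewards[action]               # environment returns a reward
        counts[action] += 1
        # Running average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    # Learned policy: always choose the action with the highest estimate.
    policy = max(range(n_actions), key=lambda a: estimates[a])
    return policy, estimates

policy, estimates = run_bandit([1.0, 5.0, 2.0])
print(policy, estimates)
```

The agent is never told which action is best; it only sees the rewards that follow its own choices.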

Batch and Online Learning

• Another criterion used to classify machine learning systems is whether or not the system can learn incrementally from a stream of incoming data
• Batch Learning
  • Incapable of learning incrementally
  • It must be trained using all available data
  • Generally takes a lot of time and computing resources, so it is typically done offline
  • Hence, also known as offline learning

Batch and Online Learning (contd.)

• Online Learning
  • Train the system incrementally by feeding data instances sequentially, either individually or in small groups called mini-batches.
  • Each learning step is fast and cheap.
  • The system can learn about new data on the fly as it arrives.
  • Suitable for systems that receive data as a continuous flow (e.g. stock prices) and need to adapt autonomously.
  • Also suitable for training systems on huge datasets that cannot fit in one machine’s main memory (out-of-core learning).
  • Challenge: if bad data is fed to the system, its performance will gradually decline.
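
A minimal sketch of incremental training: a linear model updated by one gradient step per mini-batch. The stream generator `data_stream` is hypothetical and simply produces noise-free points from y = 2x + 1; the learning rate and batch layout are arbitrary choices for the example.

```python
def sgd_step(weights, batch, learning_rate=0.05):
    """One online update of a linear model y = w * x + b on a mini-batch."""
    w, b = weights
    grad_w = grad_b = 0.0
    for x, y in batch:
        error = (w * x + b) - y
        grad_w += error * x
        grad_b += error
    n = len(batch)
    return (w - learning_rate * grad_w / n, b - learning_rate * grad_b / n)

def data_stream(n_batches):
    """Hypothetical stream: yields mini-batches of two points from y = 2x + 1."""
    for i in range(n_batches):
        start = float(i % 4)
        yield [(x, 2 * x + 1) for x in (start, start + 1.0)]

weights = (0.0, 0.0)
for batch in data_stream(5000):
    # Each batch is used once and can then be discarded.
    weights = sgd_step(weights, batch)
print(weights)
```

Only the current mini-batch is held in memory, which is what makes the same loop usable for out-of-core learning.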

Instance-Based vs Model-Based Learning

• One more way to classify machine learning algorithms is by how they generalize.
• Instance-Based Learning
  • The most trivial form of learning is rote learning (learning by heart)
  • The system learns the examples by heart, then generalizes to new cases by using a similarity measure to compare them to the learned examples (or a subset of them)

Figure 12: Instance-Based learning
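
A minimal sketch of instance-based learning is k-nearest neighbors: the stored examples are the "model", and Euclidean distance serves as the similarity measure. The (age, tumor size) examples and the name `knn_predict` are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(training_set, query, k=3):
    """Classify `query` by majority vote among its k nearest stored examples."""
    # Similarity measure: Euclidean distance (smaller means more similar).
    by_distance = sorted(training_set, key=lambda ex: math.dist(ex[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up examples: (age, tumor size) -> label.
training_set = [
    ((30, 1.0), "benign"),
    ((35, 1.5), "benign"),
    ((40, 1.2), "benign"),
    ((55, 4.0), "malignant"),
    ((60, 4.5), "malignant"),
    ((65, 5.0), "malignant"),
]
print(knn_predict(training_set, (58, 4.2)))
```

In practice the attributes would usually be rescaled first, since here the age values dominate the distance.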

Instance-Based vs Model-Based Learning

• Model-Based Learning
  • Another way to generalize from a set of examples is to build a model of these examples and then use the model to make predictions

Figure 13: Model-Based learning

Main challenges of Machine Learning

• Insufficient quantity of training data
• Non-representative training data
• Poor-quality data
• Irrelevant features
• Overfitting the training data
• Underfitting the training data

Challenge: Insufficient quantity of training data

Figure 14: The importance of data versus algorithms. Figure reproduced with permission from Banko and Brill (2001), “Learning Curves for Confusion Set Disambiguation.”

Challenge: Nonrepresentative Training Data

Figure 15: A more representative training sample

Challenge: Poor Quality Data

If training data is full of:
• Errors,
• Outliers,
• Missing values, and
• Noise,
it will be harder for the system to detect the underlying pattern, so the system is less likely to perform well.
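
One possible cleaning step, sketched under assumptions: drop missing values, then discard points that lie far from the median. The threshold of three median absolute deviations and the name `clean` are arbitrary choices for this example; whether an outlier should be dropped or corrected depends on the data.

```python
import statistics

def clean(values, max_deviations=3.0):
    """Drop missing values, then drop points far from the median."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(v - median) for v in present)
    if mad == 0:
        return present
    return [v for v in present if abs(v - median) / mad <= max_deviations]

# Made-up measurements with two missing values and one obvious outlier.
raw = [10.0, 11.0, None, 9.0, 10.5, 250.0, None, 9.5]
print(clean(raw))
```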

Challenge: Irrelevant Features

• Feature Engineering steps:
  • Feature Selection (selecting the most useful features to train on among existing features)
  • Feature Extraction (combining existing features to produce more useful ones)
  • Creating new features by gathering new data
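
The first two steps can be sketched on made-up housing data. Using Pearson correlation with a 0.5 cut-off as the selection criterion is an assumption for this example, as are all the feature names and values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up housing data for illustration.
width = [20, 25, 30, 40]
length = [25, 40, 50, 50]
door_color = [1, 2, 2, 1]        # arbitrary color code, unrelated to price
price = [100, 150, 200, 250]

# Feature extraction: combine two existing features into a more useful one.
area = [w * l for w, l in zip(width, length)]

# Feature selection: keep only features strongly correlated with the target.
candidates = {"area": area, "door_color": door_color}
selected = [name for name, column in candidates.items()
            if abs(pearson(column, price)) >= 0.5]
print(selected)
```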

Challenge: Overfitting the training data

Figure 16: Overfitting the training data

Figure 17: Regularization reduces the risk of overfitting
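
One common form of regularization is an L2 (ridge) penalty on the slope of a linear model. The sketch below is an illustration under that assumption, with the intercept left unpenalized; the slides do not say which regularizer the figure uses, and the data and the parameter name `alpha` are made up.

```python
def fit_ridge_line(xs, ys, alpha=0.0):
    """Fit y = slope * x + intercept with an L2 penalty alpha * slope**2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha = 0 gives ordinary least squares
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up noisy data.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 3.0, 2.0, 7.0]

print(fit_ridge_line(xs, ys, alpha=0.0))  # unconstrained fit
print(fit_ridge_line(xs, ys, alpha=5.0))  # constrained fit, smaller slope
```

A larger `alpha` constrains the model more, trading some fit on the training data for a simpler model.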

Challenge: Underfitting the training data

• It occurs when your model is too simple to learn the underlying structure of the data.
• Potential Solutions:
  • Select a more powerful model, with more parameters
  • Feed better features to the learning algorithm
  • Reduce the constraints on the model
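
A small sketch of the first solution, on made-up data that follows y = 2x + 1: a one-parameter model that always predicts the mean underfits, while a two-parameter line captures the trend. The helper name `mse` is hypothetical.

```python
def mse(predictions, ys):
    """Mean squared error between predictions and true values."""
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # made-up data following y = 2x + 1

# Too simple: one parameter, always predicts the mean of the targets.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
constant_error = mse([mean_y] * len(ys), ys)

# More powerful: two parameters (slope and intercept), fitted by least squares.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
linear_error = mse([slope * x + intercept for x in xs], ys)

print(constant_error, linear_error)
```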

Reference and further reading

• Chapter 1: “Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow”, Aurélien Géron.
• https://www.coursera.org/collections/machine-learning

Python Tutorials:
• https://www.w3schools.com/python/python_syntax.asp
• Video tutorial: Jupyter Notebook Tutorial for Beginners with Python
• https://www.geeksforgeeks.org/how-to-use-jupyter-notebook-an-ultimate-guide/
• Google Colab tutorial: https://colab.research.google.com/drive/16pBJQePbqkz3QFV54L4NIkOn1kwpuRrj
