CSE445 1 Intro To ML
• What is ML
• Types of ML
• Supervised/Unsupervised/Semi-Supervised/Reinforcement Learning
• Online/Batch Learning
• Instance/Model Based Learning
• Challenges of ML
Spam email/SMS:
Emails/SMS that contain unwanted or dangerous content.
Solution:
Use a spam filter to identify such emails/SMSs and flag them as spam.
Problem:
Spammers change their patterns, so there is a need to keep writing new rules forever.
Figure 1: The traditional approach to software design
Figure 4: A labeled training set for spam classification (example of supervised learning)
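As a minimal sketch of the supervised setup in Figure 4, the snippet below trains a simple Naive Bayes spam classifier on a tiny set of labeled messages. The messages and labels are invented purely for illustration, and scikit-learn is assumed to be available.

```python
# Sketch: supervised spam classification with scikit-learn.
# The tiny labeled dataset below is invented purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "Win a FREE prize now",            # spam
    "Lowest price pills, click here",  # spam
    "Meeting moved to 3 pm",           # ham
    "Can you review my report?",       # ham
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

# Turn raw text into word-count features, then fit the classifier.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)
model = MultinomialNB().fit(X, labels)

# Flag a new, unseen message.
new = vectorizer.transform(["FREE prize waiting, click now"])
print(model.predict(new))  # expected: [1] -> flagged as spam
```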
Figure 5: A labeled training set for housing price prediction (example of supervised learning)
• We could turn this example into a classification problem by instead predicting whether the house sells for more or less than the asking price. Here we are classifying the houses into two discrete categories based on price.
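A hedged sketch of both framings is given below: regression on the continuous sale price, and binary classification against the asking price. The house sizes and prices are made-up numbers, and scikit-learn is assumed.

```python
# Sketch: the same housing data framed as regression and as classification.
# Sizes (sq ft), sale prices, and asking prices below are invented.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

size = np.array([[1000], [1500], [2000], [2500], [3000]])   # single feature
sale_price = np.array([200_000, 280_000, 360_000, 430_000, 520_000])
asking_price = np.array([210_000, 270_000, 370_000, 420_000, 500_000])

# (a) Regression: predict the continuous sale price from size.
reg = LinearRegression().fit(size, sale_price)
print(reg.predict([[1800]]))   # estimated price for a 1800 sq ft house

# (b) Classification: predict whether the house sells above its asking price.
above_asking = (sale_price > asking_price).astype(int)  # 1 = sold above asking
clf = LogisticRegression().fit(size, above_asking)
print(clf.predict([[1800]]))   # 0 or 1
```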
Supervised Learning Example (contd.)
Example 2:
• Can we classify a breast tumor as malignant or benign based on
tumor size?
Example 2 (contd.):
• This is an example of a classification problem
• Classify data into one of two discrete classes - no in between, either malignant or not
• In classification problems, the output can take on a discrete number of possible values
• e.g. there could be four values:
• 0 - benign
• 1 - type 1
• 2 - type 2
• 3 - type 3
Example 2 (contd.):
• In classification problems we can plot data in a different way
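A hedged sketch of Example 2 is shown below using scikit-learn's bundled breast cancer dataset and logistic regression. Note this is only a stand-in: the bundled dataset describes tumors by many features, not tumor size alone.

```python
# Sketch: binary classification of tumors as malignant vs. benign.
# Uses scikit-learn's bundled breast cancer dataset as a stand-in.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)   # y: 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit a simple linear classifier and check accuracy on held-out data.
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```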
Example 3:
• (a) Regression - Given a picture of a male/female, we have to
predict his/her age from the picture.
• (b) Classification - Given a picture of a male/female, we have to
predict whether he/she is of high-school, college, or graduate age.
• Another example of classification - Banks have to decide whether
or not to give someone a loan on the basis of their credit history.
• Clustering
• K-Means
• DBSCAN
• Hierarchical Cluster Analysis (HCA)
• Anomaly Detection and novelty detection
• One-class SVM
• Isolation Forest
• Visualization and dimensionality reduction
• Principal Component Analysis (PCA)
• Kernel PCA
• Locally Linear Embedding (LLE)
• t-Distributed Stochastic Neighbor Embedding (t-SNE)
• Association rule learning
• Apriori
• Eclat
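As a minimal sketch of two of the unsupervised techniques listed above (K-Means clustering and PCA for dimensionality reduction), the snippet below runs both on synthetic, unlabeled data generated just for illustration; scikit-learn is assumed.

```python
# Sketch: unsupervised learning on unlabeled data.
# make_blobs generates synthetic points purely for illustration.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# Clustering: group the points into 3 clusters without using any labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])   # cluster id assigned to the first 10 points

# Dimensionality reduction: project the 5-D data down to 2-D for visualization.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)            # (300, 2)
```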
• Online Learning
• Train the system incrementally by feeding data instances sequentially,
either individually or in small groups called mini-batches.
• Each learning step is fast and cheap
• The system can learn about new data on the fly as it arrives.
• Suitable for systems that receive data as a continuous flow (e.g. stock
prices) and need to adapt autonomously.
• Also suitable to train systems on huge datasets that cannot fit in one
machine’s main memory (out-of-core learning)
• Challenge: feeding the system bad data gradually degrades its performance (a minimal incremental-learning sketch follows below)
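The sketch below illustrates online (incremental) learning under the assumptions that scikit-learn is available and that its SGDClassifier with partial_fit stands in for the learner; the streamed mini-batches are synthetic and generated only for illustration.

```python
# Sketch: online learning -- the model is updated one mini-batch at a time,
# so the full dataset never has to fit in memory at once (out-of-core style).
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(random_state=0)
classes = np.array([0, 1])            # must be declared on the first partial_fit

rng = np.random.default_rng(0)
for step in range(100):               # each iteration simulates a new mini-batch
    X_batch = rng.normal(size=(32, 4))
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)  # synthetic labels
    model.partial_fit(X_batch, y_batch, classes=classes)

# The model can keep learning on the fly as more data arrives.
print(model.predict(rng.normal(size=(5, 4))))
```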
Figure 14: The importance of data versus algorithms. Figure reproduced with permission from Banko and Brill (2001), "Learning Curves for Confusion Set Disambiguation."
Python Tutorials:
• https://www.w3schools.com/python/python_syntax.asp
• https://www.geeksforgeeks.org/how-to-use-jupyter-notebook-an-ultimate-guide/
• Google Colab tutorial: https://colab.research.google.com/drive/16pBJQePbqkz3QFV54L4NIkOn1kwpuRrj