
Lecture01 Introduction To Machine Learning (Chapter1)

Uploaded by emad qedies
Copyright © All Rights Reserved

Lecture 1: Introduction

CSC 484 / 584, DA 515

Introduction to Machine Learning

Fall 2024

Outline
- Fundamental concepts:
  - What is machine learning, and why do we use it?
  - Types of learning
    - Supervised learning
    - Unsupervised learning
    - Recommendation
    - Reinforcement learning
- Design cycle
  - Data collection
  - Feature choice
  - Model choice
  - Training
  - Evaluation
- Demo: SK-learn example, linear regression

The main goal for this course
- Get a deeper view into the process:
  - the theory of machine learning
  - why, when, which, what…
- Many people teach themselves machine learning and start working in the area, only to realize later that they are missing a lot of the foundations.
- On the practical side of ML, the Internet is full of resources for you to learn from.

What is Machine Learning?
- Definition 1: Machine Learning is the science (and art) of programming computers so they can learn from data.
- Definition 2: A computer program is said to learn from experience E (or data D) with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E (or data D).
  - For example, think about how a doctor diagnoses an illness.

Why Use Machine Learning?
- Consider how you would write a spam-filter program.
- The traditional approach: you figure out the rules yourself.

What are the rules you can use?

Why Use Machine Learning?
- Consider how you would write a spam-filter program.
- The machine learning approach: learn the rules from data.

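The following minimal Python sketch makes the contrast between the two approaches concrete. It assumes scikit-learn is installed. The keyword list, the six training emails, and the helper names (`rule_based_is_spam`, `learned_is_spam`) are hypothetical, and a real filter would need far more data. The first function encodes hand-written rules. The second learns word statistics from labeled examples with a multinomial Naive Bayes classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Traditional approach: a programmer writes the rules by hand (hypothetical rule set).
SUSPICIOUS_WORDS = {"free", "win", "prize", "cash"}

def rule_based_is_spam(text):
    return any(word in SUSPICIOUS_WORDS for word in text.lower().split())

# Machine learning approach: the "rules" (word statistics) are learned from labeled examples.
train_texts = [
    "win a free prize now",
    "free money click now",
    "limited offer win cash",
    "meeting agenda for monday",
    "please review the attached report",
    "lunch with the team tomorrow",
]
train_labels = [1, 1, 1, 0, 0, 0]          # toy labels: 1 = spam, 0 = not spam

vectorizer = CountVectorizer()             # turns each email into word counts
X_train = vectorizer.fit_transform(train_texts)
classifier = MultinomialNB().fit(X_train, train_labels)

def learned_is_spam(text):
    return bool(classifier.predict(vectorizer.transform([text]))[0])

print(rule_based_is_spam("Claim your FREE prize"), learned_is_spam("claim your free prize"))
print(rule_based_is_spam("Review the report"), learned_is_spam("review the report"))
```

When spammers change their wording, the learned filter can simply be retrained on new labeled emails. The hand-written rule set, by contrast, has to be edited by a programmer.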
Why Use Machine Learning?
- Classification: looking for a function y = f(X)
- Simple regression: y = f(x) => y = ax + b
- For many tasks, the relation y = f(X) is very hard to figure out by hand:
  - Speech recognition: y = f(audio clip) => “How are you”
  - Image recognition: y = f(image) => “dog”
[Figure: the machine learning approach]

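For simple regression, "learning" f amounts to estimating a and b from examples. The minimal NumPy sketch below fits the line by least squares. It assumes synthetic data generated with a = 2 and b = 1 plus Gaussian noise; these values are illustrative and do not come from the lecture.

```python
import numpy as np

# Hypothetical noisy samples of an unknown relation y = f(x)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.2, size=x.size)   # generated with a = 2, b = 1

# Learn f by choosing a and b that minimize the squared error (degree-1 least squares)
a, b = np.polyfit(x, y, deg=1)
print(f"learned f(x) = {a:.2f} x + {b:.2f}")
```

For speech or image recognition, no such simple formula is known. In those tasks the same idea (choose f to fit examples) is applied with far more flexible models.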
Why Is Machine Learning Possible?
- Mass storage
  - More data available
- Higher-performance computers and distributed computing
  - Larger memory for handling the data
  - Greater computational power for calculation, and even online learning
- More and more algorithms

Types of Learning
- Supervised learning (classification)
  - Training data includes desired outputs
- Unsupervised learning (clustering)
  - Training data does not include desired outputs
- Recommendation (collaborative filtering)
- Semi-supervised learning
  - Training data includes a few desired outputs
- Reinforcement learning
  - Rewards from a sequence of actions
  - Learning from delayed feedback by interacting with the environment

Supervised Learning
- In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels.
- Example: learning how to classify new emails (spam or not spam).

Supervised Learning Algorithms
- k-Nearest Neighbors
- Linear Regression/Generalized Regression
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees
- Random Forests
- Neural Networks
- Naïve Bayes

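Most of the supervised algorithms listed above are available in scikit-learn behind the same `fit`/`predict`/`score` interface. The sketch below uses scikit-learn's bundled Iris dataset as a stand-in for labeled data. It trains a subset of the algorithms (linear regression and neural networks are left out to keep it short) and reports accuracy on held-out data. Default hyperparameters are used, so the scores are only indicative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)          # labeled data: features X, labels y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # same training call for every algorithm
    scores[name] = model.score(X_test, y_test)   # accuracy on held-out data
    print(f"{name:20s} {scores[name]:.3f}")
```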
Unsupervised Learning
- In unsupervised learning, the training data you feed to the algorithm is unlabeled.
[Figure: an unlabeled training set for unsupervised learning (clustering)]
- Visualization: https://projector.tensorflow.org/

Unsupervised Machine Learning Algorithms
- Clustering
  - k-Means
  - Hierarchical Cluster Analysis (HCA)
  - Expectation Maximization
- Visualization and dimensionality reduction
  - Principal Component Analysis (PCA)
  - Kernel PCA
  - Locally-Linear Embedding (LLE)
  - t-distributed Stochastic Neighbor Embedding (t-SNE)
- Association rule learning (skipped)
  - Apriori
  - Eclat

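The following code sketches the two main families above using scikit-learn. It clusters unlabeled 5-dimensional points with k-Means and projects them to 2-D with PCA for visualization. The three group centers are hypothetical. The hidden group labels are used only afterwards to check the result, never during training.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

# Hypothetical 5-D data with three well-separated groups; the algorithms never see the labels
centers = np.array([[0.0] * 5, [8.0] * 5, [-8.0, 8.0, -8.0, 8.0, -8.0]])
X, hidden_groups = make_blobs(n_samples=150, centers=centers, cluster_std=0.5, random_state=0)

# Clustering: k-Means groups similar points using only X
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Dimensionality reduction: PCA projects the 5-D points to 2-D, e.g. for a scatter plot
X_2d = PCA(n_components=2).fit_transform(X)

# Only for checking: 1.0 means the clusters match the hidden groups exactly (up to renaming)
print(adjusted_rand_score(hidden_groups, cluster_ids), X_2d.shape)
```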
Semi-supervised Learning
- Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data.
- Example: Google Photos
  - You upload all your family photos to the service, and it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering).
  - Now the system needs you to tell it who these people are. With just one label per person, it is able to name everyone in every photo, which is useful for searching photos.

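The Google Photos scenario can be imitated with scikit-learn's `LabelSpreading`, one of several semi-supervised methods (it is not necessarily what Google uses). In this sketch, the 2-D points are hypothetical stand-ins for photo features, and exactly one example per person is labeled. The kernel and `gamma` value are assumptions chosen for this toy data.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelSpreading

# Hypothetical "photos": 2-D feature vectors for two people (0 and 1)
X, true_person = make_blobs(n_samples=100, centers=[[0, 0], [10, 10]],
                            cluster_std=0.5, random_state=0)

# Only ONE photo per person is labeled; -1 marks unlabeled samples (scikit-learn's convention)
y_partial = np.full(len(X), -1)
for person in (0, 1):
    first = np.flatnonzero(true_person == person)[0]
    y_partial[first] = person

model = LabelSpreading(kernel="rbf", gamma=1.0)
model.fit(X, y_partial)                 # spreads the two labels through similar (nearby) points
predicted_person = model.transduction_  # inferred label for every sample

print((predicted_person == true_person).mean())
```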
Reinforcement Learning
- The learning system, called an agent, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards).
- It must then learn by itself the best strategy, called a policy.
- A policy defines what action the agent should choose when it is in a given situation.

Reinforcement Learning
- Learning a policy: a sequence of outputs
- No supervised output, but a delayed reward
- Credit assignment problem
- Game playing
- Robot in a maze
- Multiple agents, partial observability, ...

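The minimal tabular Q-learning sketch below, in plain Python, illustrates an agent learning a policy from delayed rewards. The five-state corridor environment, the reward of 1 at the goal, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all hypothetical choices. Q-learning is one classic reinforcement learning algorithm, not the only one. The reward arrives only at the last step, so the agent must solve a small credit assignment problem to learn that moving right is best in every state.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (terminal)
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right

def step(state, action):
    """Hypothetical corridor environment: reward 1 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def greedy(q, state, rng):
    best = max(q[state])
    # break ties randomly so the untrained agent explores both directions
    return rng.choice([a for a in range(len(ACTIONS)) if q[state][a] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # Q-table: expected return per (state, action)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: mostly exploit the current policy, sometimes explore
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(q, state, rng)
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])   # move estimate toward the target
            state = next_state
    return q

q = train()
policy = ["left" if row[0] > row[1] else "right" for row in q[:-1]]
print(policy)  # expected: ['right', 'right', 'right', 'right']
```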
Another criterion used to classify ML systems: Batch and Online Learning
- Whether or not they can learn incrementally:
  (1) Batch learning (offline)
  (2) Learning on the fly (online)
- Example of online learning (such as stock prices): a model is trained and launched into production, and then it keeps learning as new data comes in.

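Online learning can be sketched with scikit-learn's `SGDRegressor.partial_fit`. This method updates an existing model with each new mini-batch instead of retraining from scratch. The stream below is simulated; the relation y = 3x + 2, the batch size, and the learning rate are hypothetical. With real stock prices, the data would arrive from a live feed.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
model = SGDRegressor(learning_rate="constant", eta0=0.05, random_state=0)

# Simulated data stream: each mini-batch arrives, the model is updated, and the batch is discarded
for _ in range(200):
    X_batch = rng.uniform(-1, 1, size=(50, 1))
    y_batch = 3 * X_batch[:, 0] + 2          # hypothetical target: y = 3x + 2
    model.partial_fit(X_batch, y_batch)      # incremental update, no full retraining

print(model.coef_, model.intercept_)         # close to [3.] and [2.]
```

A batch learner would instead call `fit` on the full accumulated dataset each time it is retrained.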
Instance-Based Versus Model-Based Learning
- A third way to categorize machine learning systems is by whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do:
  - instance-based versus model-based learning

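The difference shows up clearly when predicting far from the training data. This sketch assumes scikit-learn and uses five toy points that follow y = 2x. k-nearest neighbors with k = 1 serves as the instance-based learner and linear regression as the model-based one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[0], [1], [2], [3], [4]])   # toy training inputs
y = np.array([0, 2, 4, 6, 8])             # toy targets following y = 2x

# Model-based: learns parameters (slope, intercept) that summarize the data
model_based = LinearRegression().fit(X, y)
# Instance-based: keeps the training points and compares new points to them
instance_based = KNeighborsRegressor(n_neighbors=1).fit(X, y)

x_new = np.array([[10]])                  # a point far outside the training range
print(model_based.predict(x_new))         # approximately [20.], follows the learned line
print(instance_based.predict(x_new))      # [8.], copies the nearest stored example (x = 4)
```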
Applications
Application: Character recognition
- Automated mail sorting, processing bank checks
- A scanner captures an image of the text
- The image is converted into its constituent characters

Different Algorithms

Application: Fingerprint recognition

Application: Image Segmentation

Application: Brain Tissue Segmentation

More Applications (book, p. 5)
- Analyzing images of products on a production line to automatically classify them
- Detecting tumors in brain scans
- Automatically classifying news articles
- Automatically flagging offensive comments on discussion forums
- Summarizing long documents automatically
- Creating a chatbot or a personal assistant
- Forecasting your company’s revenue next year, based on many performance metrics
- Making your app react to voice commands
- Detecting credit card fraud
- Segmenting clients based on their purchases so that you can design a different marketing strategy for each segment
- Representing a complex, high-dimensional dataset in a clear and insightful diagram
- Recommending a product that a client may be interested in, based on past purchases

ChatGPT (Nov. 2022)
- ChatGPT is a chatbot launched by OpenAI.
- It is built on top of OpenAI's GPT-3 family of large language models (now GPT-4o) and fine-tuned using both:
  - supervised learning and
  - reinforcement learning
- Similar work: Copilot (Microsoft), Gemini (Google), LLaMA (Meta, formerly Facebook), etc.
- What can you do with LLMs?
  - Demo: Copilot from Microsoft

A typical machine learning system
- A machine learning system contains:
  - A sensor
  - A preprocessing mechanism
  - A feature extraction mechanism (manual or automated)
  - A classification algorithm
  - A set of examples (training set) already classified or described

Common Machine Learning Algorithms
1. Supervised:
- Regression: Linear Regression and more
- Classification:
  - Naïve Bayes Classifier Algorithm
  - Support Vector Machine Algorithm
  - Decision Trees
  - Nearest Neighbors
  - Logistic Regression
  - Artificial Neural Networks
  - Random Forests
2. Unsupervised:
- K-Means Clustering Algorithm
- Hierarchical Cluster Analysis (HCA)
- PCA, ...
I will try to explain these algorithms to you this semester.

The design cycle
- Data collection
  - Probably the most time-intensive component of a pattern recognition (PR) project
  - How many examples are enough?
- Feature Selection/Engineering
  - Critical to the success of the PR problem
  - “Garbage in, garbage out”
  - Requires basic prior knowledge
- Model choice
  - Statistical, neural, and structural approaches
  - Parameter settings
- Training
  - Given a feature set and a “blank” model, adapt the model to explain the data
  - Supervised, unsupervised, and reinforcement learning
- Evaluation
  - How well does the trained model do?
  - Overfitting vs. generalization

Features
- Features: these are measurable quantities obtained from the patterns, and the classification task is based on their respective values $x_1, \ldots, x_l$.
- Feature vectors: a number of features constitute the feature vector
  $\mathbf{x} = [x_1, \ldots, x_l]^T \in \mathbb{R}^l$
- Feature vectors are treated as random vectors.

Features
- The combination of d features is represented as a d-dimensional column vector called a feature vector.
- The d-dimensional space defined by the feature vector is called the feature space.
- Objects are represented as points in feature space. This representation is called a scatter plot.

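In code, a set of n objects with d features is usually stored as an n × d array, with one row per feature vector. The minimal NumPy sketch below uses hypothetical fish measurements (length and average scale intensity).

```python
import numpy as np

# Hypothetical measurements for 4 fish: each row is one object (a point in feature space).
# Columns (features): length in cm, average scale intensity (0-255)
X = np.array([
    [62.0, 110.0],
    [58.0, 105.0],
    [75.0, 160.0],
    [80.0, 170.0],
])
n_objects, d = X.shape          # n points in a d-dimensional feature space
x0 = X[0]                       # feature vector of the first object, a point in R^d
dist = np.linalg.norm(X[0] - X[2])   # Euclidean distance between two objects in feature space
print(n_objects, d, x0, dist)
```

Plotting column 0 against column 1 gives exactly the scatter plot described on the slide.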
Feature extraction
Task: to extract features that are good for classification.
Good features:
- Objects from the same class have similar feature values.
- Objects from different classes have different values.
[Figure: “good” features vs. “bad” features]

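One simple way to quantify "good" vs. "bad" for a single feature is the Fisher criterion. It divides the squared distance between the class means by the sum of the class variances. The sketch below applies it to hypothetical feature values. This score is one possible measure and is not taken from the lecture.

```python
import numpy as np

def fisher_score(feature_a, feature_b):
    """Between-class separation divided by within-class spread for one feature."""
    a, b = np.asarray(feature_a, float), np.asarray(feature_b, float)
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())

# Hypothetical values of two features measured on objects of class A and class B
good_a, good_b = [1.0, 1.2, 0.9, 1.1], [3.0, 3.1, 2.9, 3.2]   # tight, well-separated groups
bad_a, bad_b = [1.0, 3.0, 2.0, 2.5], [1.5, 2.8, 2.2, 1.9]     # spread out, overlapping groups

good_score = fisher_score(good_a, good_b)   # large: same class similar, classes differ
bad_score = fisher_score(bad_a, bad_b)      # near zero: classes indistinguishable
print(good_score, bad_score)
```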
More feature properties

Consider the following scenario: Classification
- A fish processing plant wants to automate the process of sorting incoming fish according to species (salmon or sea bass).
- The automation system consists of:
  - a conveyor belt for incoming products
  - two conveyor belts for sorted products
  - a pick-and-place robotic arm
  - a vision system with an overhead CCD camera
  - a computer to analyze images and control the robot arm

From [Duda, Hart and Stork, 2001]


Improving the performance of our ML system
- We first use “length” as a feature for classification.
- Determined to achieve a recognition rate of 95%, we try a number of features, such as width, area, and position of the eyes with respect to the mouth...
- Finally, we find a “good” feature, the average intensity of the scales, and reach a recognition rate of 93%.

Improving the performance of our ML system
- We combine “length” and “average intensity of the scales” to improve class separability.
- We compute a linear discriminant function to separate the two classes and obtain a classification rate of 95.7%.

Task: maximization of classification accuracy.
Task: minimization of classification error.

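The effect of combining features can be reproduced on synthetic data. This sketch assumes two hypothetical, standardized features drawn from Gaussian distributions (not the Duda, Hart and Stork measurements) and uses scikit-learn's `LinearDiscriminantAnalysis` as the linear discriminant. Each feature alone separates the classes only partly. Using both features lets the linear boundary exploit the diagonal separation between the classes.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical standardized features: column 0 ~ "length", column 1 ~ "scale intensity"
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))   # class 0 (e.g. salmon)
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(n, 2))   # class 1 (e.g. sea bass)
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

def accuracy(columns):
    """Fit a linear discriminant on the chosen feature columns and return test accuracy."""
    lda = LinearDiscriminantAnalysis().fit(X_train[:, columns], y_train)
    return lda.score(X_test[:, columns], y_test)

acc_length = accuracy([0])
acc_intensity = accuracy([1])
acc_both = accuracy([0, 1])
print(acc_length, acc_intensity, acc_both)
```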
Cost versus Classification Rate
- Our linear classifier was designed to minimize the overall misclassification rate.
- Is this the best objective function for our fish processing plant?
  - The cost of misclassifying salmon as sea bass is that the end customer will occasionally find a tasty piece of salmon when he purchases sea bass.
  - The cost of misclassifying sea bass as salmon is an upset end customer who finds a piece of sea bass purchased at the price of salmon.
- Intuitively, we could adjust the decision boundary to minimize this cost function.

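Adjusting the decision boundary for unequal costs can be sketched with a one-feature threshold rule. The lightness values and the 10:1 cost ratio below are hypothetical. The point is only that the threshold that minimizes the number of errors need not minimize the total cost.

```python
salmon = [1, 2, 3, 5, 6]        # hypothetical lightness values of salmon
sea_bass = [4, 7, 8, 9, 10]     # hypothetical lightness values of sea bass
COST_SALMON_AS_BASS = 1         # customer gets a bonus piece of salmon
COST_BASS_AS_SALMON = 10        # customer pays salmon price for sea bass (assumed 10x worse)

def errors(t):
    """Rule: predict 'salmon' if x < t, else 'sea bass'. Returns (salmon_as_bass, bass_as_salmon)."""
    salmon_as_bass = sum(x >= t for x in salmon)
    bass_as_salmon = sum(x < t for x in sea_bass)
    return salmon_as_bass, bass_as_salmon

def total_cost(t):
    s_as_b, b_as_s = errors(t)
    return COST_SALMON_AS_BASS * s_as_b + COST_BASS_AS_SALMON * b_as_s

thresholds = [k + 0.5 for k in range(11)]
best_error_t = min(thresholds, key=lambda t: sum(errors(t)))   # minimizes misclassifications
best_cost_t = min(thresholds, key=total_cost)                   # minimizes total cost
print(best_error_t, best_cost_t)  # 6.5 3.5
```

The minimum-error threshold (6.5) makes one mistake, a sea bass sold as salmon, which costs 10. The minimum-cost threshold (3.5) instead makes two cheaper mistakes, costing 2 in total.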
The issue of generalization
- The recognition rate of our linear classifier (95.7%) met the design specs, but we still think we can improve the performance of the system.
- We then design an artificial neural network with five hidden layers and a combination of logistic and hyperbolic tangent activation functions. We train it with the Levenberg-Marquardt algorithm and obtain an impressive classification rate of 99.9975% with the following decision boundary:
[Figure: decision boundary of the neural network]

The issue of generalization
- Satisfied with our classifier, we integrate the system and deploy it to the fish processing plant.
- After a few days, the plant manager calls to complain that the system is misclassifying an average of 25% of the fish.
- What went wrong?

Overfitting and underfitting
Problem: training vs. testing
[Figure panels: underfitting, good fit, overfitting]

Avoid overfitting/underfitting
- Dataset splitting: split your data into two sets, a training set and a test set.
- Compare the training error with the testing error:
  - Underfitting: the model is too simple (the error is high even on the training set).
  - Overfitting: the training error is low (i.e., your model makes few mistakes on the training set), but the generalization error (on the test set) is high.

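The training-vs-testing comparison can be sketched with polynomial fits. This sketch assumes NumPy and a hypothetical linear "true" relation with Gaussian noise. A degree-9 polynomial can pass through (almost) every one of the 10 training points, which drives the training error toward zero. However, it typically oscillates between those points, so its error on unseen points is larger than that of the simple line.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
true_f = lambda x: 2 * x + 1                       # hypothetical "true" relation
x_train = np.linspace(0, 1, 10)
y_train = true_f(x_train) + rng.normal(0, 0.1, x_train.size)
x_test = (x_train[:-1] + x_train[1:]) / 2           # unseen points between training points
y_test = true_f(x_test) + rng.normal(0, 0.1, x_test.size)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 9):
    model = Polynomial.fit(x_train, y_train, deg=degree)   # least-squares polynomial fit
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, test MSE {results[degree][1]:.4f}")
```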
Bias, Variance, and Residual Error
Assume:
(1) There is an underlying model, and we use data to estimate (generalize) it.
(2) The data is not accurate, so we allow some errors.
Total error = Bias² (model error) + Variance (sampling error) + residual (irreducible) error

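For squared error, the decomposition above can be written precisely. Assume y = f(x) + ε, with E[ε] = 0 and Var(ε) = σ². Taking expectations over random training sets (which determine the learned model \hat{f}) and over the noise, the expected error at a point x is:

```latex
\mathbb{E}\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(f(x) - \mathbb{E}[\hat{f}(x)]\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible (residual) error}}
```

The first term is the model error (bias), and the second is the sampling error (variance). The last term, σ², is the residual error that no model can remove.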
DA 515 vs. DA 516 will cover:
- DA 515:
  - Basic knowledge
  - Traditional algorithms
  - SK-learn
- DA 535: Deep Learning
  - Convolutional Neural Networks
  - Deep neural learning
  - Recurrent Neural Networks
  - Transformer, GPT
  - Reinforcement Learning
  - Keras/TensorFlow

DEMO: Linear Regression

Example: Linear Regression
- Linear regression
  - Formula
  - Gradient descent
  - Sk-Learn

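The linear regression formula can be solved in closed form with the normal equation, θ = (XᵀX)⁻¹Xᵀy, where X includes a column of ones for the intercept. The minimal NumPy sketch below applies it to hypothetical noise-free data from y = 4 + 3x.

```python
import numpy as np

# Hypothetical data from y = 4 + 3x (noise-free so the answer is easy to check)
x = np.linspace(0, 2, 20)
y = 4 + 3 * x

# Add a column of ones so that theta[0] acts as the intercept (bias) term
X_b = np.column_stack([np.ones_like(x), x])

# Normal equation: solve (X^T X) theta = X^T y instead of inverting X^T X explicitly
theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
print(theta)   # approximately [4. 3.]
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which is numerically safer. On this data, scikit-learn's `LinearRegression` should give the same coefficients.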
Regression-1: Evaluation
How to measure your model

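Common ways to measure a regression model include the mean squared error (MSE), its square root (RMSE), the mean absolute error (MAE), and the coefficient of determination R². The sketch below applies scikit-learn's metric functions to four toy predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = [3, -0.5, 2, 7]      # toy targets
y_pred = [2.5, 0.0, 2, 8]     # toy model predictions

mse = mean_squared_error(y_true, y_pred)    # average squared error: 0.375
rmse = np.sqrt(mse)                         # in the same units as the target
mae = mean_absolute_error(y_true, y_pred)   # average absolute error: 0.5
r2 = r2_score(y_true, y_pred)               # 1.0 = perfect, 0.0 = no better than predicting the mean
print(mse, rmse, mae, r2)
```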
Regression-2: Finding the optimum of the fitness (cost) function
- An optimization problem: find the parameters that minimize the cost function
- Optimization methods: gradient descent, stochastic …

Behind the scenes: Gradient Descent (figure from the Internet)

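Gradient descent finds the same parameters iteratively: it starts from a guess and repeatedly steps against the gradient of the cost. The minimal batch gradient descent sketch below, in NumPy, minimizes the MSE cost. It assumes hypothetical noise-free data from y = 4 + 3x and a hand-picked learning rate and iteration count.

```python
import numpy as np

x = np.linspace(0, 2, 20)                  # hypothetical inputs
y = 4 + 3 * x                              # hypothetical targets (y = 4 + 3x)
X_b = np.column_stack([np.ones_like(x), x])
m = len(y)

theta = np.zeros(2)                        # start from [intercept, slope] = [0, 0]
learning_rate = 0.1                        # step size (assumed; too large a value diverges)
for _ in range(1000):
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)   # gradient of the MSE cost
    theta -= learning_rate * gradients              # step downhill
print(theta)   # close to [4. 3.]
```

Stochastic gradient descent follows the same idea but estimates the gradient from one example (or a small batch) at a time.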
Demo: 1. A Linear Regression.ipynb, 1. B Decision Tree.ipynb
- Data preparation
  X_train, y_train, X_test, y_test
- SK-Learn library
  1. Model representation
     from sklearn.linear_model import LinearRegression
     lin_reg = LinearRegression()
  2. Training (optimization)
     lin_reg.fit(X_train, y_train)
  3. Testing
     predictions = lin_reg.predict(X_test)

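Below is a self-contained version of the three demo steps, assuming scikit-learn and NumPy. The slides do not show the notebook's actual dataset, so synthetic data from y = 4 + 3x plus noise stands in for it (a hypothetical choice). `train_test_split` produces the X_train, y_train, X_test, and y_test variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Data preparation (hypothetical synthetic data standing in for the notebook's dataset)
rng = np.random.default_rng(42)
X = 2 * rng.random((500, 1))                           # one feature in [0, 2)
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.5, size=500)     # y = 4 + 3x + noise
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 1. Model representation
lin_reg = LinearRegression()
# 2. Training (optimization)
lin_reg.fit(X_train, y_train)
# 3. Testing
predictions = lin_reg.predict(X_test)

print(lin_reg.intercept_, lin_reg.coef_)   # roughly 4 and [3]
print(lin_reg.score(X_test, y_test))       # R^2 on the test set
```

The decision tree demo follows the same three steps. For regression, `DecisionTreeRegressor` from `sklearn.tree` can be swapped in for `LinearRegression`.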
Summary of Chapter 1
- Data collection
  - Probably the most time-intensive component of a pattern recognition (PR) project
  - How many examples are enough?
- Feature choice
  - Critical to the success of the PR problem
  - “Garbage in, garbage out”
  - Requires basic prior knowledge
- Model choice
  - Statistical, neural, and structural approaches
  - Parameter settings
- Training
  - Given a feature set and a “blank” model, adapt the model to explain the data
  - Supervised, unsupervised, and reinforcement learning
- Evaluation
  - How well does the trained model do?
  - Overfitting vs. generalization

Setup Python Environment
- We will use Jupyter Notebook throughout this class.
- Installation instructions are given separately.

Software
- Learning Python
  - Google Developer Python Tutorial: https://developers.google.com/edu/python/
  - NumPy Tutorial: https://www.tutorialspoint.com/numpy/
  - Python tutorial: http://docs.python.org/tutorial/
  - Python quick reference: https://www.python.org/ftp/python/doc/quick-ref.1.3.html

Software
- Python libraries
  - Scikit-learn -- machine learning in Python
    https://scikit-learn.org/stable/
  - TensorFlow -- open-source, low-level machine learning library
    https://www.tensorflow.org/
    https://www.tensorflow.org/tutorials
  - Keras -- Python deep learning library
    https://keras.io/

Resources
- Kaggle competitions: https://www.kaggle.com/
- Web pages:
  https://machinelearningmastery.com/
  https://www.geeksforgeeks.org/machine-learning/
- Ready-to-use data science code snippets created by industry experts: https://www.dezyre.com/project/recipe-list
- There are many more resources on GitHub, YouTube, and Google.

End of lecture
- SK-learn overview: https://scikit-learn.org/stable/
- Students:
  - Read Chapter 1
  - Next lecture: Chapter 2 (ML pipeline)

Last Several Slides

For your reference


What is the difference?
- Statistics
- Data Mining
- Pattern Recognition
- Machine Learning
- Artificial Intelligence (AI)

These aren't just buzzwords we use to sound cool. To the uninitiated, all the terms tend to sound alike, and many of them have been used more or less interchangeably in the popular press. However, there are subtle differences.

What is the difference?
- Statistics is just about the numbers and quantifying the data (descriptive vs. inferential statistics).
  - There are many tools for finding relevant properties of the data, but this is pretty close to pure mathematics.
- Data Mining is about using statistics as well as other programming methods to find patterns hidden in the data so that you can explain some phenomenon.
  - Data Mining builds intuition about what is really happening in some data; it still leans a little more toward math than programming, but uses both.
- (Statistical) Pattern Recognition has more to do with the task a machine learning system is trying to accomplish. It is a branch of machine learning that works by recognizing the patterns and regularities in data.

What is the difference?
- Machine Learning is an umbrella term that covers all technologies in which a machine is able to “learn” on its own, without having that knowledge explicitly programmed into it.
- Machine Learning is a form of Pattern Recognition: it is basically the idea of training machines to recognize patterns and apply them to practical problems.
- Machine Learning uses Data Mining techniques and other learning algorithms to build models of what is happening behind some data so that it can predict future outcomes.

What is the difference?
- Artificial Intelligence uses models built by Machine Learning, and other methods, to reason about the world and give rise to intelligent behavior, whether this is playing a game or driving a robot/car.
- Artificial Intelligence has some goal to achieve; it predicts how actions will affect the model of the world and chooses the actions that will best achieve that goal.

Summary
- Statistics quantifies numbers
- Pattern Recognition finds patterns
- Data Mining explains patterns
- Machine Learning predicts with models
- Artificial Intelligence behaves and reasons
- Data Analytics vs. Data Science:
  1. Data analysts examine large data sets to identify trends, develop charts, and create visual presentations to help businesses make more strategic decisions.
  2. Data scientists, on the other hand, design and construct new processes for data modeling and production using prototypes, algorithms, predictive models, and custom analysis.

