Module 01 - Introduction (1)

The document outlines the course CYBR 7240, which provides an introduction to machine learning (ML) concepts and algorithms for upper-level undergraduates. It covers various topics including supervised, unsupervised, and reinforcement learning, along with practical applications and evaluation methods. Prerequisites include basic knowledge of probability, linear algebra, and calculus.

Uploaded by

KarimBerra

CYBR 7240

Cyber Analytics and Intelligence


Introduction
(Adapted from various sources)
Course Description
• This introductory course is designed to:
• give upper-level undergraduates a broad overview of many concepts and algorithms in ML.
• cover the theory and practical algorithms for machine learning from a variety of perspectives.
• Topics:
Statistical and probabilistic methods, generative and discriminative models, linear and logistic regression, decision tree learning, unsupervised learning, clustering, and dimensionality reduction.

• In addition, the course covers fundamental concepts such as training, validation, overfitting, and error rates.
Prerequisites
• Basic knowledge of probability, linear algebra, and calculus.
• For example, standard probability distributions and how to calculate derivatives.
A Few Quotes
• “A breakthrough in machine learning would be worth ten Microsofts” (Bill
Gates, Founder, Microsoft)
• “Machine learning is the next Internet” (Tony Tether, Director, DARPA)
• “Machine learning is the hot new thing” (John Hennessy, President, Stanford)
• “Machine learning is Google’s top priority” (Eric Schmidt, Chairman,
Alphabet)
• “Machine learning is Microsoft Research’s largest investment area” (Peter
Lee, Head, Microsoft Research)
• “‘Data scientist’ is the hottest job title in Silicon Valley” (Tim O’Reilly,
Founder, O’Reilly Media)
What is ML?
• Algorithms and processes that learn from past data in order to predict future outcomes.
• A set of mathematical techniques that enables information mining, pattern discovery, and drawing inferences from data.
What is ML?
Example:
• If an object is round and red, it is labelled Apple.
• If an object is a long, curved cylinder and green-yellow, it is labelled Banana.
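One way to read the example above is as a mapping from features (shape, color) to a label. The sketch below hard-codes that mapping as rules written by hand. The function name `label_fruit` and the string encodings of the features are assumptions made for illustration. A learned model would infer such rules from labeled samples rather than have them typed in.

```python
def label_fruit(shape: str, color: str) -> str:
    """Hand-written rules mirroring the two examples on the slide."""
    if shape == "round" and color == "red":
        return "Apple"
    if shape == "long curved cylinder" and color in ("green", "yellow"):
        return "Banana"
    return "Unknown"  # no rule matches this combination of features


print(label_fruit("round", "red"))
print(label_fruit("long curved cylinder", "yellow"))
print(label_fruit("round", "purple"))
```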
What is Machine Learning?
Machine Learning Based Data Analytics
• Machine learning eliminates much of the need for human monitoring in analytics
• That is not to say it can do everything by itself
• The models can learn from data, or be trained
from data to determine
• Which features are important
• New features that are not known to the users
• Which set of rules will best map features to the
desired output
• Machine learning models can continue learning to
adapt to new data

Key difference: Machine learning models learn from data (or are trained from data) instead of being built step-by-step by analysts
Sample, Feature, and Label
Types of Data
• Numerical vs. Categorical (Nominal)
ML in practice
• Understanding domain, prior knowledge, and goals
• Data integration, selection, cleaning, pre-processing, etc.
• Learning models
• Interpreting results
• Consolidating and deploying discovered knowledge
• Loop
Types of learning
• Supervised (inductive) learning
• Training data includes desired outputs
• Unsupervised learning
• Training data does not include desired outputs
• Semi-supervised learning
• Training data includes a few desired outputs
• Reinforcement learning
• Rewards from sequence of actions
What is ML (Supervised)?
• Supervised:
• We teach or train the machine using well-labeled data.
• The data is already tagged with the correct answer.
• Given a new set of examples, the supervised learning algorithm uses what it learned from the labeled training data to predict their outcomes.

• Two categories:
• Classification: A classification problem is when the output variable is a
category, such as “red” or “blue”, or “disease” or “no disease”.
• Regression: A regression problem is when the output variable is a real value,
such as “dollars” or “weight”.
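The difference between the two categories is the type of the output. As a minimal sketch, the code below uses k nearest neighbors (one of the techniques listed in this deck) for both: a majority vote gives a category, an average gives a real value. The toy data, the single numeric feature, and the function names are assumptions chosen for illustration.

```python
from collections import Counter


def nearest_neighbors(train, query, k):
    """Return the k training pairs (x, y) whose feature x is closest to query."""
    return sorted(train, key=lambda pair: abs(pair[0] - query))[:k]


def knn_classify(train, query, k=3):
    """Classification: majority vote over the neighbors' category labels."""
    labels = [y for _, y in nearest_neighbors(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: average of the neighbors' real-valued targets."""
    values = [y for _, y in nearest_neighbors(train, query, k)]
    return sum(values) / len(values)


# Hypothetical toy data: one numeric feature per sample.
disease_data = [(1.0, "no disease"), (1.5, "no disease"), (2.0, "no disease"),
                (6.0, "disease"), (6.5, "disease"), (7.0, "disease")]
weight_data = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]

print(knn_classify(disease_data, 6.2))  # output is a category
print(knn_regress(weight_data, 2.0))    # output is a real value
```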
Classification
Example:
Ref: Western Digital
Typical Supervised Learning Techniques
• K Nearest Neighbors
• Linear Regression
• Logistic Regression
• Support Vector Machines (SVMs)
• Decision Trees
• Random Forests
• Certain Types of Neural Networks
Supervised Learning Techniques
Unsupervised learning
• Create an internal representation of the input, capturing
regularities/structure in data
• Example:
Clustering: Discover groups of similar inputs (documents, images, etc)
What is ML (Unsupervised)?
Unsupervised
• The ML algorithm is trained on information that is neither classified nor labeled.
• The algorithm acts on that information without guidance, grouping unsorted information according to similarities, patterns, and differences without any prior training on labeled data.
• No labeled examples are given to the machine.
• The algorithm is therefore limited to finding hidden structure in unlabeled data.
What is ML?
Two categories of algorithms:
• Clustering: discovering the inherent groupings in the data, such as
grouping customers by purchasing behavior.
• Association: discovering rules that describe large portions of data,
such as people that buy X also tend to buy Y.
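To make the clustering idea concrete, here is a minimal k-means sketch on hypothetical customer data (visits per month, average spend). The data values and the choice to pass the initial centroids explicitly are assumptions made to keep the example deterministic. Practical implementations usually pick starting centroids at random and stop when the assignments no longer change.

```python
def assign(points, centroids):
    """Assignment step: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
        clusters[dists.index(min(dists))].append(p)
    return clusters


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and update steps for a fixed number of iterations."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)


# Hypothetical customers: (visits per month, average spend).
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 82)]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
for center, members in zip(centroids, clusters):
    print(center, members)
```

No labels are supplied anywhere: the two groups emerge only from the distances between the feature vectors.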
Unsupervised Learning Techniques
What is ML?
Example:
• Document Clustering
• Finding fraudulent transactions
Clustering
Example:
Clustering
Example:
Ref: Western Digital
Example: Association Rules
• Stores can use customers’ purchase history
to determine their
shopping patterns
• If someone buys certain combinations of
products, it’s more likely they will also
buy some other products
• Useful for placing items in stores and
targeting ads
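The two standard measures behind a rule such as "people who buy bread also tend to buy butter" are support and confidence. The sketch below computes both on a handful of made-up baskets. The basket contents and function names are assumptions for illustration, and this is only the scoring step, not a full rule-mining algorithm such as Apriori.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = support(transactions, antecedent | consequent)
    return both / support(transactions, antecedent)


# Hypothetical purchase history: one set of items per basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]

print(support(baskets, {"bread", "butter"}))       # 3 of the 5 baskets
print(confidence(baskets, {"bread"}, {"butter"}))  # 3 of the 4 bread baskets
```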
Typical Unsupervised Learning Techniques
• Clustering
• k-Means
• Hierarchical Cluster Analysis (HCA)
• Expectation Maximization
• Visualization and dimensionality reduction
• Principal Component Analysis (PCA)
• Kernel PCA
• Locally-Linear Embedding (LLE)
• t-distributed Stochastic Neighbor Embedding (t-SNE)
• Association rule learning
• Apriori
• Eclat
Supervised vs. Unsupervised Learning
Supervised models

#Account  Balance  3-month past due  6-month past due  Outcome
3         120      0                 0                 Good
1         100      120               0                 Good
5         1000     600               300               Bad
3         300      100               0                 Bad

• Need both the features and the labels during training
• This means we feed both the features and the labels in the data to a supervised model
• Try to recreate labels from the features that match the true labels as accurately as possible

Unsupervised models

History  Literature  Math  Chemistry  Physics  Group
70       75          90    95         93       Good at Science
77       79          85    83         81       Average at both
90       95          75    80         73       Good at Social
90       90          95    90         95       Good at both

• Do not need the labels during training, only the features
• This means we feed only the features to an unsupervised model
• The outcome is inferred solely by the model
• It may or may not be true!
What is ML?
• Reinforcement Learning
• The agent acts in an environment in order to maximize rewards and minimize penalties.
• Unlike supervised learning, no labeled training data is provided to the agent.
• The agent itself takes an action or a sequence of actions, whether right or wrong, to perform a task and learns from the experience.
• Example:
• Game Playing
• Robot Navigation
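As a minimal sketch of learning from rewards, the code below runs an epsilon-greedy agent on a multi-armed bandit. This is a simplified reinforcement learning setting with no state, chosen here for brevity, and it is not one of the slide's examples. The fixed reward values, the parameter defaults, and the function name are assumptions.

```python
import random


def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit with fixed (deterministic) rewards."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    counts = [0] * n_actions
    estimates = [0.0] * n_actions  # running average reward per action
    total = 0.0
    for step in range(steps):
        if step < n_actions:
            action = step                              # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)          # explore: random action
        else:
            action = estimates.index(max(estimates))   # exploit: best action so far
        reward = rewards[action]                       # feedback from the environment
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    return estimates, total


estimates, total = run_bandit([0.1, 0.5, 0.9])
print("preferred action:", estimates.index(max(estimates)))
```

The agent is never told which action is correct. It only observes the reward of the action it took and updates its estimates from that experience.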
Reinforcement Learning
• Trains agents to take actions in an environment that result in the most reward
• Is more on the Artificial Intelligence side of machine learning
Semi-supervised Learning

• A hybrid between supervised and unsupervised learning
• Predefined labels are available for a small portion of the data
• The rest are unlabeled
The Overall Picture
Machine Learning
• Supervised Learning
  • Classification
  • Regression
• Unsupervised Learning
  • Clustering
  • Dimension Reduction
  • Association Rules
• Semi-Supervised Learning
• Reinforcement Learning
The machine learning framework
y = f(x)
where y is the output, f is the prediction function, and x is the image feature

• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the
prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never before seen test example x and output the predicted
value y = f(x)

Slide credit: L. Lazebnik
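The training and testing steps of this framework can be sketched with the simplest possible model. Here f is assumed to be a straight line f(x) = a*x + b, fitted by ordinary least squares, which minimizes the squared prediction error on the training set. The model family, the toy data, and the function name `fit_line` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Estimate f(x) = a*x + b by minimizing squared error on the training set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx             # slope of the least-squares line
    b = mean_y - a * mean_x   # intercept
    return lambda x: a * x + b


# Training: labeled examples (x_i, y_i), hypothetical values on the line y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]
f = fit_line(train_x, train_y)

# Testing: apply f to an input that was not in the training set.
print(f(10.0))
```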


ML Steps
Training: Training Images -> Image Features (together with Training Labels) -> Training -> Learned model
Testing: Test Image -> Image Features -> Learned model -> Prediction
Slide credit: D. Hoiem and L. Lazebnik
Types of testing
• Evaluate performance by testing on data NOT used for training
(both should be randomly sampled)
• Cross validation methods for small data sets
• The more (relevant) data the better.
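A hold-out split and k-fold cross validation can be sketched as follows. The function names, the 25% test fraction, and the fixed seed are assumptions for illustration. The fold helper does not shuffle, so in practice the data would typically be shuffled before the folds are formed.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data, then hold out a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        test_idx = indices[fold::k]   # every k-th sample, offset by the fold number
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


train, test = train_test_split(range(20))
print(len(train), len(test))

for train_idx, test_idx in k_fold_indices(6, 3):
    print("train on", train_idx, "test on", test_idx)
```

Cross validation reuses a small data set: every sample serves as test data in one fold and as training data in the others.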
Testing
• How well does the learned system work?
• Generalization
• Performance on unseen or unknown scenarios or data
• Brittle vs robust performance
Evaluation
• Given some data, how can we tell if a function is “good”?
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• Etc.
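A few of these measures can be computed directly from true and predicted values. The sketch below covers accuracy, precision, recall, and mean squared error. The label vectors and the convention that 1 marks the positive class are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_squared_error(y_true, y_pred):
    """Average squared difference, used for real-valued outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical classification results: 3 TP, 2 FP, 1 FN, 2 TN.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(accuracy(y_true, y_pred))
print(precision_recall(y_true, y_pred))

# Hypothetical regression results.
print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))
```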
