Module 1

Uploaded by

anushaj
Copyright
© All Rights Reserved

The Machine Learning Landscape

• Common Misconception: Machine Learning = Robots (helpful or harmful)
• Reality: Machine Learning is already here (decades old)
• Examples of Existing Machine Learning Applications:
– Optical Character Recognition (OCR)
– Spam Filters (1990s)
• Machine Learning in Everyday Products & Features:
– Improved Recommendations (e.g., online shopping)
– Voice Search
Introduction
• What is Machine Learning?
– Not simply downloading data
– Machine Learning Exploration:
– Understanding Core Concepts
• Key Regions & Landmarks:
– Supervised vs Unsupervised Learning
– Online vs Batch Learning
– Instance-based vs Model-based Learning
What Is Machine Learning?
• Science (and art) of programming computers so that they can learn
from data
• General Definition: Give computers ability to learn without explicit
programming (Arthur Samuel, 1959)
• Engineering Definition: Program learns from experience to
improve performance on a specific task (Tom Mitchell, 1997)
• Machine Learning in Action - Spam Filter Example
– Task (T): Flag spam emails
– Experience (E): Training data (examples of spam & non-spam
emails)
– Performance Measure (P): Accuracy (ratio of correctly
classified emails)
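To make the performance measure P concrete, here is a minimal sketch of how accuracy could be computed for the spam filter. The two label lists are made-up illustration data, not results from a real filter, and the function name `accuracy` is our own choice.

```python
# Hypothetical labels for eight emails: True = spam, False = ham (not spam).
actual    = [True, False, True, True, False, False, True, False]
predicted = [True, False, False, True, False, True, True, False]

def accuracy(actual, predicted):
    """Performance measure P: fraction of emails classified correctly."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Two of the eight predictions disagree with the true labels.
print(accuracy(actual, predicted))  # 0.75
```

If the filter learns from experience E, this number should go up as more training examples are provided.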
Cont...
• Machine Learning vs. Just Downloading Data
– Downloading data (e.g., Wikipedia) doesn't make a computer
learn or improve at tasks
– Machine Learning requires using data to improve performance
on a specific task
Why Use Machine Learning?
The traditional approach - a spam filter using traditional
programming technique
1. First you would look at what spam
typically looks like. You might notice that
some words or phrases (such as “4U,”
“credit card,” “free,” and “amazing”) tend to
come up a lot in the subject. Perhaps you
would also notice a few other patterns
in the sender’s name, the email’s body,
and so on.
2. You would write a detection algorithm for
each of the patterns that you noticed,
and your program would flag emails as
spam if a number of these patterns are
detected.
3. You would test your program, and repeat
steps 1 and 2 until it is good enough.
Problem: your program will likely
become a long list of complex rules that is
pretty hard to maintain.
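The three steps above can be sketched roughly as follows. This is only an illustration of the hand-written-rules style: the pattern list, the function name `is_spam`, and the threshold of two matches are all assumptions made for the example.

```python
import re

# One hand-written pattern per observation from step 1 (all hypothetical).
SPAM_PATTERNS = [
    re.compile(r"\b4U\b", re.IGNORECASE),
    re.compile(r"\bcredit card\b", re.IGNORECASE),
    re.compile(r"\bfree\b", re.IGNORECASE),
    re.compile(r"\bamazing\b", re.IGNORECASE),
]

def is_spam(subject, threshold=2):
    """Step 2: flag an email when at least `threshold` patterns match its subject."""
    hits = sum(1 for pattern in SPAM_PATTERNS if pattern.search(subject))
    return hits >= threshold

# Step 3: test the program on a few subjects.
print(is_spam("Amazing offer 4U: free credit card"))  # True (all four patterns match)
print(is_spam("Meeting notes for Monday"))            # False (no pattern matches)
print(is_spam("Amazing offer For U"))                 # False (only one pattern matches)
```

Every new spam trick means another entry in `SPAM_PATTERNS`, which is how the rule list keeps growing.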
Traditional Programming:
• Pros:
– Precise control over program logic and functionality
– Easier to understand and interpret the code
– More efficient for well-defined tasks with clear rules
– Often faster for simpler tasks
• Cons:
– Requires manual coding for every specific task
– Can be inflexible for adapting to new data or situations
– Difficulty in handling complex or large datasets
– Time-consuming to modify or update code for changing requirements
Machine Learning approach

• The program is much shorter, easier to maintain, and most likely
more accurate.
• A spam filter based on Machine Learning techniques automatically
learns which words and phrases are good predictors of spam by
detecting unusually frequent patterns of words in the spam examples.
• If spammers notice that all their emails containing “4U” are
blocked, they might start writing “For U” instead. A spam filter
using traditional programming techniques would need to be updated
to flag “For U” emails. If spammers keep working around your spam
filter, you will need to keep writing new rules forever.
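For contrast with hand-written rules, here is a minimal sketch of a filter that learns word scores from labeled examples. It uses a simple smoothed log-odds score per word (a naive-Bayes-style heuristic chosen for this illustration, not necessarily what a production filter does), and all emails, names, and functions are made up.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Learn a per-word spam score (smoothed log-odds) from labeled emails."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam_label in examples:
        (spam_counts if is_spam_label else ham_counts).update(tokenize(text))
    vocabulary = set(spam_counts) | set(ham_counts)
    # Add-one smoothing so unseen words never produce log(0).
    spam_total = sum(spam_counts.values()) + len(vocabulary)
    ham_total = sum(ham_counts.values()) + len(vocabulary)
    return {
        word: math.log((spam_counts[word] + 1) / spam_total)
              - math.log((ham_counts[word] + 1) / ham_total)
        for word in vocabulary
    }

def is_spam(scores, text):
    """Sum the learned scores; words never seen in training contribute nothing."""
    return sum(scores.get(word, 0.0) for word in tokenize(text)) > 0

examples = [
    ("free prize 4u", True),
    ("amazing credit card 4u", True),
    ("lunch at noon", False),
    ("project status report", False),
]
scores = train(examples)
print(is_spam(scores, "amazing prize 4u"))   # True
print(is_spam(scores, "cheap loans for u"))  # False: "for u" is not yet known

# Spammers switch to "for u"; users flag the new emails, and retraining picks it up.
examples += [("free prize for u", True), ("amazing card for u", True)]
scores = train(examples)
print(is_spam(scores, "cheap loans for u"))  # True
```

No rule for “for u” was ever written by hand; the score came from the newly flagged examples.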
Machine Learning Approach:
• Pros:
– Learns from data, improving performance over time
– Can identify patterns and make predictions in complex data
– Adapts to new data and situations without explicit programming
– Efficient for handling large and evolving datasets
• Cons:
– Can be less interpretable ("black box") - understanding how the
model arrives at a decision can be difficult
– Requires expertise in Machine Learning and data preparation
– Training data can be time-consuming and expensive to collect
and label
– Performance can be unpredictable and may require ongoing fine-tuning
Automatically adapting to change
• A spam filter based on Machine
Learning techniques automatically
notices that “For U” has become
unusually frequent in spam flagged
by users, and it starts flagging
such emails without your intervention.

• Speech recognition - e.g., telling the
spoken words “one” and “two” apart
• No simple hand-coded algorithm works,
because the problem is too complex
• So Machine Learning can be
used instead, by providing numerous
example recordings of each word.
Machine Learning can help humans learn
• ML algorithms can be inspected to see what
they have learned (although for some
algorithms this can be tricky).
• For instance, once the spam filter has been
trained on enough spam, it can easily be
inspected to reveal the list of words and
combinations of words that it believes are the
best predictors of spam. Sometimes this will
reveal unsuspected correlations or new
trends, and thereby lead to a better
understanding of the problem.
• Applying ML techniques to dig into large
amounts of data can help discover patterns that
were not immediately apparent. This is called
data mining.
Summary:
• Traditional programming is ideal for well-defined tasks
with clear rules and where precise control is needed.
• Machine Learning is a powerful tool for complex problems
with large datasets, where the ability to learn and adapt is
crucial.
Machine Learning is great for:
• Problems for which existing solutions require a lot of hand-tuning
or long lists of rules: one Machine Learning algorithm can often
simplify code and perform better.
• Complex problems for which there is no good solution at all using
a traditional approach: the best Machine Learning techniques can
find a solution.
• Fluctuating environments: a Machine Learning system can adapt
to new data.
• Getting insights about complex problems and large amounts of
data
Types of Machine Learning Systems
• Classify them in broad categories based on:
– Whether or not they are trained with human
supervision (supervised, unsupervised,
semisupervised, and Reinforcement Learning)
– Whether or not they can learn incrementally on the fly
(online versus batch learning)
– Whether they work by simply comparing new data
points to known data points, or instead detect patterns in
the training data and build a predictive model, much like
scientists do (instance-based versus model-based
learning)
Supervised/Unsupervised Learning

• There are four major categories according to the
amount and type of supervision they get during
training:
– supervised learning,
– unsupervised learning,
– semisupervised learning, and
– Reinforcement Learning
Supervised learning
In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels.
A typical supervised learning task is classification. The spam filter is a good example of this: it
is trained with many example emails along with their class (spam or ham), and it must learn
how to classify new emails.
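As an illustration of classification, here is a small sketch of a k-Nearest Neighbors classifier written from scratch. The two numeric features and the six training emails are invented for the example; a real spam filter would use far richer features and far more data.

```python
import math
from collections import Counter

def knn_classify(training_set, point, k=3):
    """Return the majority label among the k training points nearest to `point`."""
    nearest = sorted(training_set, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features per email: (number of links, number of exclamation marks).
training_set = [
    ((8, 5), "spam"), ((7, 6), "spam"), ((9, 4), "spam"),
    ((0, 0), "ham"),  ((1, 1), "ham"),  ((2, 0), "ham"),
]

print(knn_classify(training_set, (6, 5)))  # spam
print(knn_classify(training_set, (1, 0)))  # ham
```

The labels in `training_set` are what makes this supervised: each example comes with its class.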
Supervised learning and Type - Regression
Another typical task is to predict a target numeric value, such as the price of a car, given a set of
features (mileage, age, brand, etc.) called predictors. This sort of task is called regression. To
train the system, you need to give it many examples of cars, including both their predictors and
their labels (i.e., their prices).
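A minimal regression sketch, assuming a single predictor (mileage) and a straight-line model fitted by ordinary least squares. The numbers are invented and deliberately lie on a perfect line so the result is easy to check by hand; real car prices would be noisy and depend on several predictors.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training examples: predictor = mileage (thousands of km),
# label = price (thousands of currency units).
mileage = [10, 50, 90, 130]
price = [19, 15, 11, 7]

slope, intercept = fit_line(mileage, price)

def predict_price(km_thousands):
    return slope * km_thousands + intercept

# The data follow price = 20 - 0.1 * mileage, so a car with 70 (thousand) km
# should be predicted at about 13.
print(round(predict_price(70), 2))
```

Here the target is a numeric value rather than a class, which is what distinguishes regression from classification.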
Supervised learning and Types
• Classification problems ask the algorithm to predict a discrete
value that can identify the input data as a member of a particular
class or group. Taking up the animal photos dataset, each photo
has been labeled as a dog, a cat, etc., and then the algorithm has
to classify the new images into any of these labeled categories.
• Regression problems ask the algorithm to predict a continuous value,
e.g., the price of a piece of land in a city, given the area,
location, etc. Here, the input is sent to the machine, which predicts
the price based on previous instances. The machine learns a function
that maps the inputs to the outputs. If its predictions are not
accurate enough, the model's parameters are adjusted iteratively (in
neural networks, via backpropagation) until the results are satisfactory.
Supervised learning algorithms
• Important supervised learning algorithms:
– k-Nearest Neighbors
– Linear Regression
– Logistic Regression
– Support Vector Machines (SVMs)
– Decision Trees and Random Forests
– Neural networks
Unsupervised learning

In unsupervised learning, as you might guess, the training data is unlabeled. The system tries
to learn without a teacher.
Unsupervised learning
Types of unsupervised learning

There are four types of unsupervised learning tasks:
– clustering,
– anomaly detection and novelty detection,
– association rules, and
– visualization and dimensionality reduction.
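To give a feel for the first of these tasks, here is a small clustering sketch: a one-dimensional k-means loop written from scratch. The data, the starting centroids, and the function name `k_means_1d` are assumptions made for illustration. Note that no labels are given to the algorithm.

```python
def k_means_1d(values, centroids, iterations=10):
    """Alternate between assigning each value to its nearest centroid
    and moving each centroid to the mean of its assigned values."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled data: minutes each visitor spent on a website.
visits = [1, 2, 3, 20, 21, 22]
centroids, clusters = k_means_1d(visits, centroids=[0.0, 10.0])

print(centroids)  # [2.0, 21.0]
print(clusters)   # [[1, 2, 3], [20, 21, 22]]
```

The algorithm finds the two groups of visitors on its own; deciding what the groups mean is left to a human.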
Cont..

• Clustering
• Visualization
Cont..

• Anomaly Detection
• Association rules

Reinforcement Learning
Comparison
Semisupervised Learning
Batch and Online Learning
• Another criterion used to classify Machine Learning systems is
whether or not the system can learn incrementally from a stream of
incoming data.
• Types:
– Batch learning
– Online learning
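A minimal sketch of the online style, assuming a one-feature linear model updated by one stochastic gradient descent step per incoming example. The class name, the `learn_one` method, the learning rate, and the synthetic data stream are all invented for illustration. A batch learner would instead be retrained from scratch on the full dataset whenever new data arrives.

```python
class OnlineLinearModel:
    """Fits y ≈ w * x + b one example at a time using stochastic gradient descent."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        """Take a single gradient step on the squared error for one (x, y) pair."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error

# Hypothetical stream generated by y = 2x + 1, with x cycling through 0..4.
model = OnlineLinearModel()
for step in range(2000):
    x = step % 5
    model.learn_one(x, 2 * x + 1)  # each example is used once, then discarded

print(round(model.w, 3), round(model.b, 3))  # close to 2.0 and 1.0
```

Because each example is discarded after its update, memory use stays constant no matter how long the stream runs.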
Instance-Based Versus Model-Based Learning
• Instance-based learning
• Model-based learning
Main Challenges of Machine Learning
• Two things can go wrong: “bad algorithm” and “bad data.”
• BAD DATA
– Insufficient Quantity of Training Data
– Nonrepresentative Training Data
– Poor-Quality Data
– Irrelevant Features
• BAD ALGORITHM
– Overfitting the Training Data
– Underfitting the Training Data
Insufficient Quantity of Training Data
The Unreasonable Effectiveness of Data
Nonrepresentative Training Data
Overfitting the Training Data
Cont..
