0% found this document useful (0 votes)

24 views41 pages

Module 1

Uploaded by

gaya3devi.2003b

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

24 views41 pages

Module 1

Uploaded by

gaya3devi.2003b

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

Module 1

Department of Computer Science & www.cambridge.edu.in

Engineering
Department of Computer Science & www.cambridge.edu.in
Engineering
Why Use Machine Learning?

HAM and SPAM E-Mails

Department of Computer Science & www.cambridge.edu.in

Engineering
Automatically adapting to change

Eg: 4U/For U

Machine Learning Approach

Department of Computer Science & www.cambridge.edu.in

Engineering
Machine Learning can help humans learn

Department of Computer Science & www.cambridge.edu.in

Engineering
To summarize, Machine Learning is great for:
• Problems for which existing solutions require a lot of hand-tuning or
long lists of rules: one Machine Learning algorithm can often simplify
code and perform better.
• Complex problems for which there is no good solution at all using a
traditional approach: the best Machine Learning techniques can find a
solution.
• Fluctuating environments: a Machine Learning system can adapt to
new data.
• Getting insights about complex problems and large amounts of data.

Department of Computer Science & www.cambridge.edu.in

Engineering
Types of Machine Learning Systems

There are so many different types of Machine Learning systems that it is

useful to classify them in broad categories based on:
• Whether or not they are trained with human supervision (supervised,
unsupervised, semi supervised, and Reinforcement Learning)
• Whether or not they can learn incrementally on the fly (online versus
batch learning)
• Whether they work by simply comparing new data points to known data
points, or instead detect patterns in the training data and build a predictive
model, much like scientists do (instance-based versus model-based
learning)

Department of Computer Science & www.cambridge.edu.in

Engineering
Supervised/Unsupervised Learning

Machine Learning systems can be classified according to the

amount and type of supervision they get during training.
There are four major categories:
• Supervised learning
• Unsupervised learning
• Semi supervised learning
• Reinforcement Learning

Department of Computer Science & www.cambridge.edu.in

Engineering
Supervised learning

In supervised learning, the

training data you feed to the
algorithm includes the desired
solutions, called labels

Department of Computer Science & www.cambridge.edu.in

Engineering
• A typical supervised learning task is classification.
• The spam filter is a good example of this: it is
trained with many example emails along with their
class (spam or ham),
• and it must learn how to classify new emails.
• Another typical task is to predict a target numeric
Here are some of the most important value, such as the price of a car, given a set of
supervised learning algorithms (covered in this features (mileage, age, brand, etc.) called
book): predictors. This sort of task is called regression
• k-Nearest Neighbors
• Linear Regression • To train the system, you need to give it many
• Logistic Regression examples of cars, including both their predictors
• Support Vector Machines (SVMs) and their labels (i.e., their prices).
• Decision Trees and Random Forests
• Neural networks2

Department of Computer Science & www.cambridge.edu.in

Engineering
Machine Learning Regression is a technique for investigating the relationship
between independent variables or features and a dependent variable or
outcome. It's used as a method for predictive modelling in machine learning, in which
an algorithm is used to predict continuous outcomes

Department of Computer Science & www.cambridge.edu.in

Engineering
Unsupervised learning

In unsupervised learning, as you might guess, the training

data is unlabeled
The system tries to learn without a teacher.

Department of Computer Science & www.cambridge.edu.in

Engineering
Some of the most important unsupervised learning algorithms
• Clustering
—K-Means
—DBSCAN
—Hierarchical Cluster Analysis (HCA)
• Anomaly detection and novelty detection
—One-class SVM
—Isolation Forest
• Visualization and dimensionality reduction
—Principal Component Analysis (PCA)
—Kernel PCA
—Locally-Linear Embedding (LLE)
—t-distributed Stochastic Neighbor Embedding (t-SNE)
• Association rule learning
—Apriori
—Eclat

Department of Computer Science & www.cambridge.edu.in

Engineering
For example, say you have a lot of data about your blog’s visitors.
You may want to run a clustering algorithm to try to
detect groups of similar visitors (Figure 1-8). At no point do you
tell the algorithm which group a visitor belongs to: it finds those
connections without your help. For example, it might notice that
40% of your visitors are males who love comic books and
generally read your blog in the evening, while 20% are young sci-
fi lovers who visit during the weekends, and so on. If you use a
hierarchical clustering algorithm, it may also subdivide each
group into smaller groups. This may help you target your posts for
each group.

Department of Computer Science & www.cambridge.edu.in

Engineering
Visualization algorithms are also good
examples of unsupervised learning algorithms:
you feed them a lot of complex and unlabeled data, and
they output a 2D or 3D representation of your data that
can easily be plotted (Figure 1-9). These algorithms try
to preserve as much structure as they can (e.g., trying to
keep separate clusters in the input space from
overlapping in the visualization), so you can understand
how the data is organized and perhaps identify
unsuspected patterns.

Department of Computer Science & www.cambridge.edu.in

Engineering
A related task is dimensionality
reduction, in which the goal is to simplify
the data without losing too much information.
One way to do this is to merge several
correlated features into one. For example, a
car’s mileage may be very correlated with its
age, so the dimensionality reduction algorithm
will merge them into one feature that represents
the car’s wear and tear. This is called feature
extraction
Department of Computer Science & www.cambridge.edu.in
Engineering
Yet another important unsupervised task is anomaly
detection—for example, detecting unusual credit card
transactions to prevent fraud, catching manufacturing defects, or
automatically removing outliers from a dataset before feeding it
to another learning algorithm. The system is shown mostly
normal instances during training, so it learns to recognize them
and when it sees a new instance it can tell whether it looks like a
normal one or whether it is likely an anomaly. A very similar task
is novelty detection: the difference is that novelty detection
algorithms expect to see only normal data during training, while
anomaly detection algorithms are usually more tolerant, they can
often perform well even with a small percentage of outliers in the
training set

Department of Computer Science & www.cambridge.edu.in

Engineering
Finally, another common unsupervised task is
association rule learning, in which the
goal is to dig into large amounts of data and discover
interesting relations between attributes. For example,
suppose you own a supermarket. Running an association
rule on your sales logs may reveal that people who
purchase barbecue sauce and potato chips also tend to buy
steak. Thus, you may want to place these items close to
each other.

Department of Computer Science & www.cambridge.edu.in

Engineering
Semisupervised learning

Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit
of labeled data. This is called semisupervised learning

Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all
your family photos to the service, it automatically recognizes that the same person A shows up in
photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised
part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are.
Just one label per person,4 and it is able to name everyone in every photo, which is useful for searching
photos.

Department of Computer Science & www.cambridge.edu.in

Engineering
Department of Computer Science & www.cambridge.edu.in
Engineering
Reinforcement Learning

The learning system, called an agent in this context,

can observe the environment, select and perform
actions, and get rewards in return (or penalties in the
form of negative rewards). It must then learn by itself
what is the best strategy, called a policy, to get the
most reward over time. A policy defines what action
the agent should choose when it is in a given
situation.

For example, many robots implement Reinforcement

Learning algorithms to learn how to walk.

Department of Computer Science & www.cambridge.edu.in

Engineering
Department of Computer Science & www.cambridge.edu.in
Engineering
Batch and Online Learning
Another criterion used to classify Machine Learning systems is
whether or not the system can learn incrementally from a stream of
incoming data.
In batch learning, the system is incapable of learning incrementally: it must be
trained using all the available data. This will generally take a lot of time and
Batch learning computing resources, so it is typically done offline. First the system is trained,
and then it is launched into production and runs without learning anymore; it
just applies what it has learned. This is called offline learning.

If you want a batch learning system to know about new data (such as a new type of spam), you
need to train a new version of the system from scratch on the full dataset (not just the new data,
but also the old data), then stop the old system and replace it with the new one.

Department of Computer Science & www.cambridge.edu.in

Engineering
Drawbacks of batch processing system

• training using the full set of data can take many hours
• training on the full set of data requires a lot of computing resources (CPU, memory
space, disk space, disk I/O, network I/O, etc.). If you have a lot of data and you
automate your system to train from scratch every day, it will end up costing you a lot
of money. If the amount of data is huge, it may even be impossible to use a batch
learning algorithm
• Finally, if your system needs to be able to learn autonomously and it has limited
resources (e.g., a smartphone application or a rover on Mars), then carrying around
large amounts of training data and taking up a lot of resources to train for hours
every day is a showstopper

Fortunately, a better option in all these cases is to use algorithms that are capable of
learning incrementally.

Department of Computer Science & www.cambridge.edu.in

Engineering
Online learning
In online learning, you train the system incrementally by feeding it data instances
sequentially, either individually or by small groups called mini-batches. Each learning step
is fast and cheap, so the system can learn about new data on the fly, as it arrives

Department of Computer Science & www.cambridge.edu.in

Engineering
Department of Computer Science & www.cambridge.edu.in
Engineering
Online learning is great for systems that receive data as a continuous flow (e.g., stock
prices) and need to adapt to change rapidly or autonomously.
It is also a good option if you have limited computing resources: once an online learning
system has learned about new data instances, it does not need them anymore, so you can
discard them (unless you want to be able to roll back to a previous state and “replay” the data).
This can save a huge amount of space.

Online learning algorithms can also be used to train

systems on huge datasets that cannot fit in one
machine’s main memory (this is called out-of-core
learning). The algorithm loads part of the data, runs a
training step on that data, and repeats the process until it
has run on all of the data

Department of Computer Science & www.cambridge.edu.in

Engineering
Department of Computer Science & www.cambridge.edu.in
Engineering
One important parameter of online learning systems is how fast they should adapt to
changing data: this is called the learning rate. If you set a high learning rate, then your
system will rapidly adapt to new data, but it will also tend to quickly forget the old data
(you don’t want a spam filter to flag only the latest kinds of spam it was shown).
Conversely, if you set a low learning rate, the system will have more inertia; that is, it
will learn more slowly, but it will also be less sensitive to noise in the new data or to
sequences of nonrepresentative data points (outliers).

A big challenge with online learning is that if bad data is fed to the system, the system’s performance will
gradually decline. If we are talking about a live system, your clients will notice. For example, bad data could
come from a malfunctioning sensor on a robot, or from someone spamming a search engine to try to rank high in
search results. To reduce this risk, you need to monitor your system closely and promptly switch learning off (and
possibly revert to a previously working state) if you detect a drop in performance. You may also want to monitor
the input data and react to abnormal data

Department of Computer Science & www.cambridge.edu.in

Engineering
Instance-Based Versus Model-Based Learning

• One more way to categorize Machine Learning systems is by how they

generalize
• Most Machine Learning tasks are about making predictions. This means that
given a number of training examples, the system needs to be able to
generalize to examples it has never seen before
• There are two main approaches to generalization: instance-based learning
and model-based learning

Department of Computer Science & www.cambridge.edu.in

Engineering
Instance-based learning

Instead of just flagging emails that are identical to known spam emails, your
spam filter could be programmed to also flag emails that are very similar to
known spam emails.
This requires a measure of similarity between two emails.
A (very basic) similarity measure between two emails could be to count the
number of words they have in common.
The system would flag an email as spam if it has many words in common with
a known spam email.
This is called instance-based learning: the system learns the examples by
heart, then generalizes to new cases by comparing them to the learned
examples (or a subset of them), using a similarity measure.

Department of Computer Science & www.cambridge.edu.in

Engineering
Department of Computer Science & www.cambridge.edu.in
Engineering
Model-based learning
Another way to generalize from a set of examples is to build a model of these
examples, then use that model to make predictions. This is called model-based
learning

Department of Computer Science & www.cambridge.edu.in

Engineering
Life satisfaction goes up more or less
linearly as the country’s GDP per capita
increases.
So you decide to model life satisfaction as
a linear function of GDP per capita.
This step is called model selection: you
selected a linear model of life satisfaction
with just one attribute, GDP per capita

Department of Computer Science & www.cambridge.edu.in

Engineering
Department of Computer Science & www.cambridge.edu.in
Engineering
Department of Computer Science & www.cambridge.edu.in
Engineering
The linear model that fits the training data best

Department of Computer Science & www.cambridge.edu.in

Engineering
Department of Computer Science & www.cambridge.edu.in
Engineering
Department of Computer Science & www.cambridge.edu.in
Engineering
In summary:
• You studied the data.
• You selected a model.
• You trained it on the training data (i.e., the learning
algorithm searched for the model parameter values that
minimize a cost function).
• Finally, you applied the model to make predictions on
new cases (this is called inference), hoping that this model
will generalize well.

Department of Computer Science & www.cambridge.edu.in

Engineering
Main Challenges of Machine Learning

• Insufficient Quantity of Training Data

• Nonrepresentative Training Data
• Poor-Quality Data
• Irrelevant Features
• Overfitting the Training Data
• Underfitting the Training Data

Department of Computer Science & www.cambridge.edu.in

Engineering

Introduction to Machine Learning Basics
No ratings yet
Introduction to Machine Learning Basics
9 pages
Machine-Learning NOTE2025 2
No ratings yet
Machine-Learning NOTE2025 2
331 pages
The Machine Learning Landscape
No ratings yet
The Machine Learning Landscape
25 pages
ML L1 PDF
No ratings yet
ML L1 PDF
43 pages
Module 1 Notes
No ratings yet
Module 1 Notes
38 pages
Machine Learning Basics & Types
No ratings yet
Machine Learning Basics & Types
56 pages
What Is Machine Learning
No ratings yet
What Is Machine Learning
6 pages
Module1 ML
No ratings yet
Module1 ML
114 pages
ML Unit 1
No ratings yet
ML Unit 1
19 pages
ML Unit 1
No ratings yet
ML Unit 1
42 pages
Full Notes
No ratings yet
Full Notes
37 pages
ML Unit 1
No ratings yet
ML Unit 1
37 pages
ML Unit 1
100% (1)
ML Unit 1
42 pages
ML Unit 1
No ratings yet
ML Unit 1
42 pages
UNIT IV Learning
No ratings yet
UNIT IV Learning
34 pages
Unit5 ML Introduction
No ratings yet
Unit5 ML Introduction
32 pages
Ch4 Machine Learning
No ratings yet
Ch4 Machine Learning
21 pages
ML Unit 1
100% (1)
ML Unit 1
71 pages
Machine Learning
No ratings yet
Machine Learning
33 pages
Unit-5 Machine Learning
No ratings yet
Unit-5 Machine Learning
25 pages
Python UNIT-5
100% (1)
Python UNIT-5
67 pages
AI Unit-4
No ratings yet
AI Unit-4
58 pages
Unit 1
No ratings yet
Unit 1
52 pages
Ml-Unit 1
No ratings yet
Ml-Unit 1
53 pages
Machine Learning - Data
No ratings yet
Machine Learning - Data
11 pages
MACHINE LEARNING ALGORITHM - Unit-1-1
100% (1)
MACHINE LEARNING ALGORITHM - Unit-1-1
78 pages
4.introduction To Learning - Unit 2
No ratings yet
4.introduction To Learning - Unit 2
8 pages
AI Learning: Types and Applications
No ratings yet
AI Learning: Types and Applications
18 pages
Machine Learning: Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya
No ratings yet
Machine Learning: Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya
333 pages
DS&ML 1
No ratings yet
DS&ML 1
9 pages
1 ML Landscape, ML Categories
No ratings yet
1 ML Landscape, ML Categories
3 pages
Ad8552 ML Unit I
No ratings yet
Ad8552 ML Unit I
31 pages
Machine Learning
No ratings yet
Machine Learning
7 pages
Datascience Notes
No ratings yet
Datascience Notes
16 pages
ML Notes
No ratings yet
ML Notes
113 pages
Introduction to Machine Learning
No ratings yet
Introduction to Machine Learning
14 pages
Machine Learning Course Overview
No ratings yet
Machine Learning Course Overview
225 pages
Module 1
No ratings yet
Module 1
47 pages
Intro to Machine Learning Basics
No ratings yet
Intro to Machine Learning Basics
10 pages
Ai Unit 4
No ratings yet
Ai Unit 4
32 pages
Machine Learning
No ratings yet
Machine Learning
20 pages
Machine Learning
No ratings yet
Machine Learning
17 pages
What Is Machine Learning - Saagie
No ratings yet
What Is Machine Learning - Saagie
7 pages
Introduction to Machine Learning Concepts
No ratings yet
Introduction to Machine Learning Concepts
21 pages
Machine Learning: BE Sixth Semester 20CS610
No ratings yet
Machine Learning: BE Sixth Semester 20CS610
211 pages
Machine L
No ratings yet
Machine L
29 pages
Chapter 1 Introduction
No ratings yet
Chapter 1 Introduction
21 pages
Intro to Machine Learning Basics
100% (3)
Intro to Machine Learning Basics
24 pages
ML - Unit 1
No ratings yet
ML - Unit 1
87 pages
Unit 5
No ratings yet
Unit 5
30 pages
UNIT 1ML Removed Removed
No ratings yet
UNIT 1ML Removed Removed
123 pages
Supervised and Unsupervised Machine Learning
No ratings yet
Supervised and Unsupervised Machine Learning
3 pages
Intro - Types of Machine Learning
No ratings yet
Intro - Types of Machine Learning
24 pages
Unit 5
No ratings yet
Unit 5
132 pages
Report
No ratings yet
Report
27 pages
Machine Learning Types and Algorithms
No ratings yet
Machine Learning Types and Algorithms
11 pages
FAM Unit5
No ratings yet
FAM Unit5
47 pages
Introduction To AI
No ratings yet
Introduction To AI
17 pages
Machine Learning Is Fun PDF
0% (1)
Machine Learning Is Fun PDF
16 pages
Chapter 6 Validity
No ratings yet
Chapter 6 Validity
28 pages
Impact of Motivation On Students' Academic Performance A Case Study of University Sultan Zainal Abidin Stuents
No ratings yet
Impact of Motivation On Students' Academic Performance A Case Study of University Sultan Zainal Abidin Stuents
7 pages
Types of Machine Learning Models Explained
No ratings yet
Types of Machine Learning Models Explained
14 pages
Retirement Planning's Impact in Nigeria
No ratings yet
Retirement Planning's Impact in Nigeria
8 pages
Biw First Torsional Mode ML Prediction
No ratings yet
Biw First Torsional Mode ML Prediction
13 pages
Econometrics: Linear Regression Analysis
No ratings yet
Econometrics: Linear Regression Analysis
20 pages
Why Are We Using Logistic Regression To Analyze Employee Attrition?
No ratings yet
Why Are We Using Logistic Regression To Analyze Employee Attrition?
4 pages
Correlation
No ratings yet
Correlation
9 pages
2.15 2.16 Linear Regression
No ratings yet
2.15 2.16 Linear Regression
27 pages
Abalone Age Prediction Using ML
No ratings yet
Abalone Age Prediction Using ML
60 pages
Term Paper On "Examining The Influence of Firm Characteristics On The Capital Structure of Banking Industry in Bangladesh"
No ratings yet
Term Paper On "Examining The Influence of Firm Characteristics On The Capital Structure of Banking Industry in Bangladesh"
31 pages
1.1.1 Simple Linear Regression
No ratings yet
1.1.1 Simple Linear Regression
4 pages
Factors Influencing Student Performance
No ratings yet
Factors Influencing Student Performance
4 pages
Sample Final
No ratings yet
Sample Final
10 pages
Presentation4 - Bivariate Analysis and Simple Linear Regression
No ratings yet
Presentation4 - Bivariate Analysis and Simple Linear Regression
31 pages
Chapter II
No ratings yet
Chapter II
71 pages
Regression Analysis of Unpaid Tax
No ratings yet
Regression Analysis of Unpaid Tax
3 pages
Majorship Area: English Focus: Language and Literature Research LET Competencies
No ratings yet
Majorship Area: English Focus: Language and Literature Research LET Competencies
13 pages
Assessing The Efficacy of Very Early Motor Rehabilitation in Children With Down Syndrome PDF
No ratings yet
Assessing The Efficacy of Very Early Motor Rehabilitation in Children With Down Syndrome PDF
6 pages
Ovarian Cancer Trends by Age
No ratings yet
Ovarian Cancer Trends by Age
20 pages
The Effect of Online Customer Experience Towards Repurchase Intention
No ratings yet
The Effect of Online Customer Experience Towards Repurchase Intention
11 pages
Shreya Bansal - 250418 - 153433
No ratings yet
Shreya Bansal - 250418 - 153433
971 pages
Excel Business Analytics Guide
No ratings yet
Excel Business Analytics Guide
100 pages
Real Estate Value Analysis in Artvin
No ratings yet
Real Estate Value Analysis in Artvin
18 pages
Statistics for IT Students
No ratings yet
Statistics for IT Students
1 page
Sas/Stat 14.3 User's Guide: The GEE Procedure
No ratings yet
Sas/Stat 14.3 User's Guide: The GEE Procedure
63 pages
A Comparison of Rural Australian First Nations and Non-First Nations Survey Responses To Covid-19 Risks and Imapcts - Implications For Health Comunications
No ratings yet
A Comparison of Rural Australian First Nations and Non-First Nations Survey Responses To Covid-19 Risks and Imapcts - Implications For Health Comunications
10 pages
MODULE 4 The Use of Variables in Rseearch
No ratings yet
MODULE 4 The Use of Variables in Rseearch
8 pages
Ensayo Sobre La Teoría de La Desorganización Social
100% (1)
Ensayo Sobre La Teoría de La Desorganización Social
8 pages
Regression Analysis Overview
No ratings yet
Regression Analysis Overview
5 pages

Module 1

Uploaded by

Module 1

Uploaded by

Module 1

Department of Computer Science & www.cambridge.edu.in

HAM and SPAM E-Mails

Department of Computer Science & www.cambridge.edu.in

Machine Learning Approach

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

There are so many different types of Machine Learning systems that it is

Department of Computer Science & www.cambridge.edu.in

Machine Learning systems can be classified according to the

Department of Computer Science & www.cambridge.edu.in

In supervised learning, the

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

In unsupervised learning, as you might guess, the training

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

The learning system, called an agent in this context,

For example, many robots implement Reinforcement

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Online learning algorithms can also be used to train

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

• One more way to categorize Machine Learning systems is by how they

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

Department of Computer Science & www.cambridge.edu.in

• Insufficient Quantity of Training Data

Department of Computer Science & www.cambridge.edu.in

You might also like