
Chapter 1. The Machine Learning Landscape

What Is Machine Learning?
Machine Learning is the science (and art) of programming computers so they can learn from data.

Why Use Machine Learning?
Consider how you would write a spam filter using traditional programming techniques (Figure 1-1):

1. First you would consider what spam typically looks like. You might notice that some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”) tend to come up a lot in the subject line. Perhaps you would also notice a few other patterns in the sender’s name, the email’s body, and other parts of the email.
2. You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns were detected.
3. You would test your program and repeat steps 1 and 2 until it was good enough to launch.

Figure 1-1. The traditional approach

Since the problem is difficult, your program will likely become a long list of complex rules that is hard to maintain.

In contrast, a spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples (Figure 1-2). The program is much shorter, easier to maintain, and most likely more accurate.

Figure 1-2. The Machine Learning approach
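To make the contrast concrete, here is a minimal sketch of the Machine Learning approach to spam filtering, assuming scikit-learn is available; the tiny inline dataset is invented for illustration:

# Learn spam predictors from examples instead of hand-writing rules.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "Free credit card offer 4U, amazing deal",  # spam
    "Meeting notes from Tuesday attached",      # ham
    "You won a free prize, click now",          # spam
    "Lunch tomorrow at noon?",                  # ham
]
labels = ["spam", "ham", "spam", "ham"]

# The pipeline counts words and learns which ones predict spam vs. ham.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)
print(model.predict(["Amazing free offer just 4U"]))  # likely ['spam']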
Examples of Applications
Let’s look at some concrete examples of Machine Learning tasks, along with the techniques that can tackle them:

Analyzing images of products on a production line to automatically classify them
• This is image classification, typically performed using convolutional neural networks (CNNs).

Detecting tumors in brain scans
• This is semantic segmentation, where each pixel in the image is classified (as we want to determine the exact location and shape of tumors), typically using CNNs as well.

Automatically classifying news articles
• This is natural language processing (NLP), and more specifically text classification, which can be tackled using recurrent neural networks (RNNs), CNNs, or Transformers.

Automatically flagging offensive comments on discussion forums
• This is also text classification, using the same NLP tools.

Summarizing long documents automatically
• This is a branch of NLP called text summarization, again using the same tools.

Creating a chatbot or a personal assistant
• This involves many NLP components, including natural language understanding (NLU) and question-answering modules.

Forecasting your company’s revenue next year, based on many performance metrics
• This is a regression task (i.e., predicting values) that may be tackled using any regression model, such as a Linear Regression or Polynomial Regression model, a regression SVM, a regression Random Forest, or an artificial neural network. If you want to take into account sequences of past performance metrics, you may want to use RNNs, CNNs, or Transformers.

Making your app react to voice commands
• This is speech recognition, which requires processing audio samples: since they are long and complex sequences, they are typically processed using RNNs, CNNs, or Transformers.

Detecting credit card fraud
• This is anomaly detection.

Segmenting clients based on their purchases so that you can design a different marketing strategy for each segment
• This is clustering (see the sketch after this list).

Representing a complex, high-dimensional dataset in a clear and insightful diagram
• This is data visualization, often involving dimensionality reduction techniques.

Recommending a product that a client may be interested in, based on past purchases
• This is a recommender system. One approach is to feed past purchases (and other information about the client) to an artificial neural network and get it to output the most likely next purchase. This neural net would typically be trained on past sequences of purchases across all clients.

Building an intelligent bot for a game
• This is often tackled using Reinforcement Learning (RL), which is a branch of Machine Learning that trains agents (such as bots) to pick the actions that will maximize their rewards over time (e.g., a bot may get a reward every time the player loses some life points), within a given environment (such as the game). The famous AlphaGo program that beat the world champion at the game of Go was built using RL.

This list could go on and on, but hopefully it gives you a sense of the incredible breadth and complexity of the tasks that Machine Learning can tackle, and the types of techniques that you would use for each task.
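As a taste of one of these tasks, here is a minimal sketch of clustering for customer segmentation (the sketch referenced from the clustering item above), assuming scikit-learn; the purchase features are invented for illustration:

import numpy as np
from sklearn.cluster import KMeans

# Each row is one client: [annual spend, number of purchases].
X = np.array([[120, 4], [150, 5], [900, 30], [950, 32], [60, 2], [880, 28]])

# KMeans groups similar clients together; no labels are needed.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
segments = kmeans.fit_predict(X)
print(segments)  # e.g., one marketing strategy per segment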
Types of Machine Learning Systems
There are so many different types of Machine Learning systems that it is useful to classify them in broad categories, based on the following criteria:

• Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and Reinforcement Learning)

Supervised/Unsupervised Learning
Machine Learning systems can be classified according to the amount and type of supervision they get during training. There are four major categories: supervised learning, unsupervised learning, semisupervised learning, and Reinforcement Learning.
Supervised learning
In supervised learning, the training set you feed to the algorithm includes the desired solutions, called labels (Figure 1-5).

Figure 1-5. A labeled training set for spam classification (an example of supervised learning)

A typical supervised learning task is classification. The spam filter is a good example of this: it is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails.

Another typical task is to predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors. This sort of task is called regression (Figure 1-6). To train the system, you need to give it many examples of cars, including both their predictors and their labels (i.e., their prices).
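Here is a minimal sketch of such a regression task, assuming scikit-learn; the car data is invented for illustration:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Predictors: [mileage in km, age in years]; labels: prices in dollars.
X_train = np.array([[20_000, 1], [60_000, 3], [120_000, 7], [90_000, 5]])
y_train = np.array([24_000, 18_000, 9_000, 13_000])

# The training set includes the desired solutions (the labels).
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
print(model.predict([[45_000, 2]]))  # predicted price for an unseen car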
Unsupervised learning
In unsupervised learning, as you might guess, the training data is unlabeled (Figure 1-7). The system tries to learn without a teacher.

Figure 1-7. An unlabeled training set for unsupervised learning
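Here is a minimal sketch of learning without a teacher, assuming scikit-learn: PCA is given only a feature matrix, with no labels at all; the data is randomly generated for illustration:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))  # 100 unlabeled instances, 4 features each

# fit_transform uses X alone: there are no labels to supervise it.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)  # (100, 2): ready for a 2-D visualization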
Semisupervised learning
Since labeling data is usually time-consuming and costly, you will often have plenty of unlabeled instances and few labeled instances. Some algorithms can deal with data that’s partially labeled. This is called semisupervised learning (Figure 1-11).

Figure 1-11. Semisupervised learning with two classes (triangles and squares): the unlabeled examples (circles) help classify a new instance (the cross) into the triangle class rather than the square class, even though it is closer to the labeled squares
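Here is a minimal sketch of semisupervised learning, assuming scikit-learn, where instances labeled -1 are treated as unlabeled; the toy data is invented for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X = np.array([[1.0], [1.2], [0.9], [5.0], [5.2], [4.8]])
y = np.array([0, -1, -1, 1, -1, -1])  # only two instances are labeled

# The classifier labels its most confident unlabeled points, retrains,
# and repeats, so the plentiful unlabeled data still helps.
model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y)
print(model.predict([[1.1], [5.1]]))  # likely [0 1]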
Instance-Based Versus Model-Based Learning
One more way to categorize Machine Learning systems is by how they generalize. Most Machine Learning tasks are about making predictions. This means that given a few training examples, the system needs to be able to make good predictions for (generalize to) examples it has never seen before. Having a good performance measure on the training data is good, but insufficient; the true goal is to perform well on new instances.

Model-based learning
Another way to generalize from a set of examples is to build a model of these examples and then use that model to make predictions. This is called model-based learning (Figure 1-16).

Figure 1-16. Model-based learning

For example, suppose you want to know if money makes people happy, so you download the Better Life Index data from the OECD’s website and stats about gross domestic product (GDP) per capita from the IMF’s website. Then you join the tables and sort by GDP per capita. Table 1-1 shows an excerpt of what you get.

Table 1-1. Does money make people happier?

Let’s plot the data for these countries (Figure 1-17).

Figure 1-17

There does seem to be a trend here! Although the data is noisy (i.e., partly random), it looks like life satisfaction goes up more or less linearly as the country’s GDP per capita increases. So you decide to model life satisfaction as a linear function of GDP per capita. This step is called model selection: you selected a linear model of life satisfaction with just one attribute, GDP per capita (Equation 1-1).

Equation 1-1. A simple linear model
life_satisfaction = θ0 + θ1 × GDP_per_capita

This model has two model parameters, θ0 and θ1. By tweaking these parameters, you can make your model represent any linear function, as shown in Figure 1-18.

Figure 1-18. A few possible linear models

Before you can use your model, you need to define the parameter values θ0 and θ1. How can you know which values will make your model perform best? To answer this question, you need to specify a performance measure. You can either define a utility function (or fitness function) that measures how good your model is, or you can define a cost function that measures how bad it is. For Linear Regression problems, people typically use a cost function that measures the distance between the linear model’s predictions and the training examples; the objective is to minimize this distance.

This is where the Linear Regression algorithm comes in: you feed it your training examples, and it finds the parameters that make the linear model fit best to your data. This is called training the model. In our case, the algorithm finds that the optimal parameter values are θ0 = 4.85 and θ1 = 4.91 × 10⁻⁵.

WARNING
Model selection consists in choosing the type of model and fully specifying its architecture. Training a model means running an algorithm to find the model parameters that will make it best fit the training data (and hopefully make good predictions on new data).

Now the model fits the training data as closely as possible (for a linear model), as you can see in Figure 1-19.

Figure 1-19. The linear model that fits the training data best

You are finally ready to run the model to make predictions. For example, say you want to know how happy Cypriots are, and the OECD data does not have the answer. Fortunately, you can use your model to make a good prediction: you look up Cyprus’s GDP per capita, find $22,587, and then apply your model and find that life satisfaction is likely to be somewhere around 4.85 + 22,587 × 4.91 × 10⁻⁵ = 5.96.
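Here is a minimal sketch of this whole workflow, assuming scikit-learn; the (GDP per capita, life satisfaction) pairs below are illustrative stand-ins for the joined OECD/IMF data, not the actual Table 1-1 values:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[12_240], [27_195], [37_675], [50_962]])  # GDP per capita
y = np.array([4.9, 5.8, 6.5, 7.3])                      # life satisfaction

# Training finds the intercept (theta0) and the slope (theta1).
model = LinearRegression()
model.fit(X, y)

X_new = [[22_587]]           # Cyprus's GDP per capita
print(model.predict(X_new))  # roughly comparable to the 5.96 above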

If all went well, your model will make good predictions. If not, you may need to use more attributes (employment rate, health, air pollution, etc.), get more or better-quality training data, or perhaps select a more powerful model (e.g., a Polynomial Regression model).

In summary:

• You studied the data.
• You selected a model.
• You trained it on the training data (i.e., the learning algorithm searched for the model parameter values that minimize a cost function).
• Finally, you applied the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.

Main Challenges of Machine Learning
In short, since your main task is to select a learning algorithm and train it on some data, the two things that can go wrong are “bad algorithm” and “bad data.” Let’s start with examples of bad data.

Nonrepresentative Training Data
In order to generalize well, it is crucial that your training data be representative of the new cases you want to generalize to. This is true whether you use instance-based learning or model-based learning.

For example, the set of countries we used earlier for training the linear model was not perfectly representative; a few countries were missing. Figure 1-21 shows what the data looks like when you add the missing countries.

Figure 1-21. A more representative training sample

If you train a linear model on this data, you get the solid line, while the old model is represented by the dotted line. As you can see, not only does adding a few missing countries significantly alter the model, but it makes it clear that such a simple linear model is probably never going to work well. It seems that very rich countries are not happier than moderately rich countries (in fact, they seem unhappier), and conversely some poor countries seem happier than many rich countries.

By using a nonrepresentative training set, we trained a model that is unlikely to make accurate predictions, especially for very poor and very rich countries.

It is crucial to use a training set that is representative of the cases you want to generalize to. This is often harder than it sounds: if the sample is too small, you will have sampling noise (i.e., nonrepresentative data because of chance), but even very large samples can be nonrepresentative if the sampling method is flawed. This is called sampling bias.

Poor-Quality Data
Obviously, if your training data is full of errors, outliers, and noise (e.g., due to poor-quality measurements), it will make it harder for the system to detect the underlying patterns, so your system is less likely to perform well. It is often well worth the effort to spend time cleaning up your training data. The truth is, most data scientists spend a significant part of their time doing just that. The following are a couple of examples of when you’d want to clean up training data (a sketch follows the list):

• If some instances are clearly outliers, it may help to simply discard them or try to fix the errors manually.
• If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you must decide whether you want to ignore this attribute altogether, ignore these instances, fill in the missing values (e.g., with the median age), or train one model with the feature and one model without it.
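Here is a minimal sketch of both cleanups, assuming pandas; the customer table is invented for illustration:

import pandas as pd

df = pd.DataFrame({
    "age":   [25, 31, None, 45, 38, 230],  # None = missing, 230 = outlier
    "spend": [120, 150, 90, 200, 170, 160],
})

# Discard the clear outlier, but keep rows whose age is merely missing.
df = df[df["age"].isna() | (df["age"] < 120)]

# Fill the missing ages with the median age.
df["age"] = df["age"].fillna(df["age"].median())
print(df)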
Overfitting the Training Data
Say you are visiting a foreign country and the taxi driver rips you off. You might be tempted to say that all taxi drivers in that country are thieves. In Machine Learning this is called overfitting: it means that the model performs well on the training data, but it does not generalize well.
Figure 1-22 shows an example of a high-degree polynomial life satisfaction model that strongly overfits the training data. Even though it performs much better on the training data than the simple linear model, would you really trust its predictions?

Figure 1-22. Overfitting the training data

Complex models such as deep neural networks can detect subtle patterns in the data, but if the training set is noisy, or if it is too small (which introduces sampling noise), then the model is likely to detect patterns in the noise itself.
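Here is a minimal sketch of such overfitting, assuming scikit-learn: a degree-9 polynomial fitted to ten noisy, randomly generated points tracks the training data almost perfectly but gives wild answers elsewhere:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 3, size=(10, 1)), axis=0)
y = X.ravel() + rng.normal(scale=0.3, size=10)  # noisy linear data

# With as many coefficients as points, the curve can chase the noise.
model = make_pipeline(PolynomialFeatures(degree=9), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5], [3.5]]))  # 3.5 lies outside the training range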
WARNING
Overfitting happens when the model is too complex relative to the amount and noisiness of the training data. Here are possible solutions:

• Simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data, or by constraining the model.
• Gather more training data.
• Reduce the noise in the training data (e.g., fix data errors and remove outliers).

Constraining a model to make it simpler and reduce the risk of overfitting is called regularization. For example, the linear model we defined earlier has two parameters, θ0 and θ1. This gives the learning algorithm two degrees of freedom to adapt the model to the training data: it can tweak both the height (θ0) and the slope (θ1) of the line. If we forced θ1 = 0, the algorithm would have only one degree of freedom and would have a much harder time fitting the data properly: all it could do is move the line up or down to get as close as possible to the training instances, so it would end up around the mean. A very simple model indeed! If we allow the algorithm to modify θ1 but we force it to keep it small, then the learning algorithm will effectively have somewhere in between one and two degrees of freedom. It will produce a model that’s simpler than one with two degrees of freedom, but more complex than one with just one. You want to find the right balance between fitting the training data perfectly and keeping the model simple enough to ensure that it will generalize well.
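Here is a minimal sketch of regularization, assuming scikit-learn: Ridge regression lets the model keep its slope but penalizes large values, which is the “keep it small” option described above (the data is invented):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.2, 1.9, 3.2, 3.8])

unconstrained = LinearRegression().fit(X, y)  # two full degrees of freedom
regularized = Ridge(alpha=10.0).fit(X, y)     # the slope is penalized

print(unconstrained.coef_, regularized.coef_)  # the Ridge slope is smaller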

Underfitting the Training Data
As you might guess, underfitting is the opposite of overfitting: it occurs when your model is too simple to learn the underlying structure of the data. For example, a linear model of life satisfaction is prone to underfit; reality is just more complex than the model, so its predictions are bound to be inaccurate, even on the training examples.

Here are the main options for fixing this problem:

• Select a more powerful model, with more parameters.
• Feed better features to the learning algorithm (feature engineering).
• Reduce the constraints on the model (e.g., reduce the regularization hyperparameter).

Testing and Validating
The only way to know how well a model will generalize to new cases is to actually try it out on new cases. One way to do that is to put your model in production and monitor how well it performs. This works well, but if your model is horribly bad, your users will complain, which is not the best idea.

A better option is to split your data into two sets: the training set and the test set. As these names imply, you train your model using the training set, and you test it using the test set. The error rate on new cases is called the generalization error (or out-of-sample error), and by evaluating your model on the test set, you get an estimate of this error. This value tells you how well your model will perform on instances it has never seen before.

If the training error is low (i.e., your model makes few mistakes on the training set) but the generalization error is high, it means that your model is overfitting the training data.

TIP
It is common to use 80% of the data for training and hold out 20% for testing. However, this depends on the size of the dataset: if it contains 10 million instances, then holding out 1% means your test set will contain 100,000 instances, probably more than enough to get a good estimate of the generalization error.
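Here is a minimal sketch of this train/test workflow, assuming scikit-learn; the dataset is randomly generated for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + rng.normal(scale=1.0, size=200)

# Hold out 20% of the data as a test set the model never trains on.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
train_error = mean_squared_error(y_train, model.predict(X_train))
test_error = mean_squared_error(y_test, model.predict(X_test))
print(train_error, test_error)  # the test error estimates the generalization error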
