
Unit – 2

Machine Learning Basics

1. What is Machine Learning?
2. Why Use Machine Learning?
3. Types of Machine Learning Systems
   3.1 Batch and Online Learning
   3.2 Instance-Based Versus Model-Based Learning
4. Challenges of Machine Learning
5. Testing and Validating dataset
   Hyperparameter Tuning and Model Selection
   Data Mismatch
6. Working with classification (MNIST)
7. Training a Binary Classifier
8. Performance Measures
   8.1 Measuring Accuracy Using Cross-Validation
   8.2 Confusion Matrix
   8.3 Precision and Recall
   8.4 The ROC Curve

1. What is Machine Learning?
A subset of artificial intelligence known as machine learning focuses
primarily on the creation of algorithms that enable a computer to
independently learn from data and previous experiences.

Arthur Samuel first used the term "machine learning" in 1959. The idea can
be summarized as follows:

Machine learning enables a machine to learn automatically from data,
improve its performance with experience, and make predictions, without
being explicitly programmed.

A machine learning system builds prediction models: it learns from
previous data and predicts the output for new data whenever it receives
it. The more data available, the better the model and the more accurate
its predictions.

Let's say we have a complex problem in which we need to make
predictions. Instead of writing code by hand, we just feed the data to
generic algorithms, which build the logic based on the data and predict
the output.

Our perspective on the issue has changed as a result of machine learning.


The Machine Learning algorithm's operation is depicted in the following
block diagram:

Features of Machine Learning:

• Machine learning uses data to detect various patterns in a given
  dataset.
• It can learn from past data and improve automatically.
• It is a data-driven technology.
• Machine learning is closely related to data mining, since both deal with
  large amounts of data.

Needs of Machine Learning:

The demand for machine learning is steadily rising, because it can
perform tasks that are too complex for humans.

The following key points show the importance of machine learning:

• Rapid increase in the production of data
• Solving complex problems that are difficult for humans
• Decision making in various sectors, including finance
• Finding hidden patterns and extracting useful information from data

2. Why Use Machine Learning?
Machine learning is a technique where computers learn patterns from
data and use those patterns to make decisions or predictions.

Following are the three main types of Machine Learning algorithms:

1. Supervised Learning: Uses labeled data to train algorithms that predict
   outcomes based on input features. Example: predicting house prices
   based on features like size and location.
2. Unsupervised Learning: Finds patterns and structures in data without
   labeled outcomes. Example: clustering similar customer groups based
   on purchasing behavior.
3. Reinforcement Learning: Teaches algorithms to make sequences of
   decisions by rewarding them for good choices. Example: training AI
   agents to play games by learning from trial and error.

Why is machine learning used in today's world? Let's understand it in
detail:

• Using the Traditional Technique:

Consider how you would write a spam filter using traditional
programming techniques (Figure 1-1):

To write a spam filter using traditional programming techniques, you
would:

1. Identify common characteristics of spam, such as frequent words and
   phrases like "4U", "credit card", "free", and "amazing", and patterns in
   the sender's name or email body.

2. Create detection algorithms for these patterns, flagging emails as spam
   if they match.

3. Test and refine your program by repeating the first two steps until it
   performs well.

• Using the Machine Learning Technique:


Traditional spam filters rely on complex, hard-to-maintain rules. In
contrast, a Machine Learning-based filter learns which words and phrases
indicate spam by analyzing patterns in spam versus non-spam emails.

This makes the program shorter, easier to maintain, and more accurate.
Let’s see it in (Figure 1-2):

Traditional spam filters need constant rule updates to catch new spam
tricks, like changing "4U" to "For U." In contrast, Machine Learning-based
filters automatically learn new spam patterns, like "For U," without
needing manual updates.

Let’s Understand it with (Figure 1-3):

Machine Learning is really good at solving problems that are too complex
for traditional approaches.

For example, speech recognition is difficult for traditional approaches:
teaching a computer to understand words like "one" and "two" by writing
rules is tricky. Instead of writing rules, we let the computer learn by
listening to many examples, which makes it better at understanding
words.

Finally, Machine Learning can help humans learn. For example, after
training a spam filter with lots of spam, we can inspect which words it
considers the best indicators of spam.

This helps us find new patterns and understand the problem better. We
can see it in (Figure 1-4):

In summary, Machine Learning is great for:

• Problems that need lots of manual adjustments or long lists of rules;
  Machine Learning can simplify code and work better.
• Complex problems that traditional approaches can't solve well or at all;
  Machine Learning can find solutions.
• Changing situations; Machine Learning systems can adjust to new
  information.
• Understanding complex problems and large amounts of data to gain
  insights.

3. Types of Machine Learning Systems
Machine learning systems can be categorized based on various criteria:

1. Human Supervision:

• Supervised Learning: Trained with labeled data.
• Unsupervised Learning: Trained with unlabeled data.
• Semisupervised Learning: Combination of labeled and unlabeled data.
• Reinforcement Learning: Learns through rewards and punishments.

2. Learning Method:

• Online Learning: Learns incrementally from new data.
• Batch Learning: Learns from a fixed dataset all at once.

3. Approach to Learning:

• Instance-Based Learning: Compares new data to known data points.
• Model-Based Learning: Detects patterns in training data and builds a
  predictive model.

These methods can be mixed together. For example, a spam filter might
use a model that learns continuously from new emails (online learning),
recognizes patterns (model-based learning), and is trained with labeled
examples of spam and not-spam emails (supervised learning).

Let’s explore it in detail:

1. Supervised Learning:
Supervised learning is when a model is trained on a "Labelled Dataset".
Labelled datasets have both input and output parameters.

We can say that first we train the machine with inputs and their
corresponding outputs, and then we ask the machine to predict the
output for a test dataset.

The two main tasks of supervised learning are:

1. Classification: Predicting categorical labels (e.g., classifying emails as
   "spam" or "not spam").

A common supervised learning task is classification. For example, a spam
filter is trained with many example emails that are labeled as either spam
or not spam (ham). The algorithm learns from these labeled examples how
to classify new, incoming emails into either spam or not spam categories.

Let's understand it with Figure 1-5:

2. Regression: Predicting continuous numerical values (e.g., Predicting the
price of a car based on features such as mileage, age, brand, and other
predictors).

The image shows how regression can predict a car's price based on
features like mileage and age:

• Feature 1 (x-axis): Represents a car feature, such as mileage.
• Value (y-axis): Represents the car price.
• Data Points: Light brown circles are cars with known mileage and prices.
• New Instance (X): A car with a specific mileage and age for which we
  want to predict the price.
• Value?: The predicted price for the new car.

By analyzing the pattern of existing car prices and their features, the
regression model estimates the price for the new car.

Here are some of the most important supervised learning algorithms:
• k-Nearest Neighbors
• Linear Regression
• Logistic Regression
• Support Vector Machines (SVMs)
• Decision Trees and Random Forests

2. Unsupervised Learning:
Unsupervised learning is different from the Supervised learning technique;
as its name suggests, there is no need for supervision.

It means that in unsupervised machine learning, the machine is trained
using an unlabeled dataset, and the machine predicts the output without
any supervision.

Let's take an example to understand it more precisely: suppose there are
some fruit images, and we input them into the machine learning model.
The images are totally unknown to the model, and the task of the machine
is to find the patterns and categories of the objects.

So the machine will discover patterns and differences on its own, such as
differences in colour and shape, and predict the output when it is tested
with the test dataset.

Here are some of the most important unsupervised learning algorithms:

1. Clustering:
• K-Means
• DBSCAN
• Hierarchical Cluster Analysis (HCA)
2. Anomaly detection and novelty detection:
• One-class SVM
• Isolation Forest
3. Visualization and dimensionality reduction:
• Principal Component Analysis (PCA)
• Kernel PCA
• Locally-Linear Embedding (LLE)
• t-distributed Stochastic Neighbor Embedding (t-SNE)
4. Association rule learning:
• Apriori
• Eclat
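As a small sketch of the fruit-style clustering idea above, K-Means can group unlabeled points by their features; the feature values here are invented for illustration:

from sklearn.cluster import KMeans

# Hypothetical unlabeled data: [colour value, roundness]
X = [[0.90, 0.80], [0.85, 0.90], [0.20, 0.30], [0.25, 0.25], [0.15, 0.35]]

kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
kmeans.fit(X)                  # no labels are given

print(kmeans.labels_)          # cluster assigned to each item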

3. Semisupervised Learning:
Semisupervised learning involves algorithms that handle training data with
some labeled data and a large amount of unlabeled data. For example,
Google Photos groups similar faces together without knowing who they
are (unsupervised learning). Once you label one photo, it can recognize
and label that person in all your photos, making it easier to find pictures of
that person.

This figure shows how semisupervised learning works:

1. Shapes and Colors:

• Green Triangles: These are labeled data points of one type.
• Yellow Squares: These are labeled data points of another type.
• Gray Dots: These are data points without labels (unlabeled).

2. Question Mark (Black Cross):

The black cross with "Class?" is an unlabeled data point that we want to
classify.

In semisupervised learning, the algorithm uses the few labeled points
(green triangles and yellow squares) to help understand and classify the
many unlabeled points (gray dots).

The goal is to figure out whether the black cross should be a green triangle
or a yellow square based on its location among all the points.

4. Reinforcement Learning:
Reinforcement learning involves an agent that interacts with its
environment by performing actions and receiving rewards or penalties.
The agent learns by itself to develop a strategy, or policy, to maximize its
rewards over time. A policy tells the agent which action to take in different
situations. Let’s understand it with following figure:

3.1 Batch and Online Learning
Another way to classify Machine Learning systems is based on their ability
to learn incrementally from a continuous stream of incoming data.

Batch Learning:
In batch learning, the system is incapable of learning incrementally: it
must be trained using all the available data. This will generally take a lot of
time and computing resources, so it is typically done offline.

First the system is trained, and then it is launched into production and
runs without learning anymore; it just applies what it has learned. This is
called offline learning.

If you want a batch learning system to know about new data (such as a
new type of spam), you need to train a new version of the system from
scratch on the full dataset (not just the new data, but also the old data),
then stop the old system and replace it with the new one.

You can automate the steps of training, checking, and launching a Machine
Learning system. This means you can easily update the system by
refreshing the data and retraining it whenever needed, even if it's a batch
learning system.

This solution is simple and often works fine, but training on the full set
of data can take many hours, so it is usually done once a day or once a
week. For systems that need to adjust quickly to changing data, like
predicting stock prices, you need a more responsive solution.

Training on all the data requires a lot of computing resources (CPU,
memory, disk space, disk I/O, network I/O, etc.) and can be very costly.
Automating daily training with huge data may be too expensive or even
impractical.

When to Use:

• You have a static or fixed dataset.
• You don't need frequent updates or real-time learning.
• You can afford the time and resources to train the model on the entire
  dataset at once.

Compared to batch learning, a better solution is to use algorithms that
can learn incrementally, requiring fewer resources and less data at a time.

Online Learning:
In online learning, you train the system incrementally by feeding it data
examples sequentially, either individually or by small groups called mini-
batches. Each learning step is fast and cheap, so the system can learn
about new data on the fly.

Online learning is perfect for systems that receive data continuously, like
stock prices, and need to adapt quickly. It's also good if you have limited
computing power, because the system can discard each piece of data
after learning from it, saving a lot of space.

Online learning algorithms can train systems on huge datasets that don't
fit in a single machine's memory. This process, called out-of-core learning,
involves loading and training on parts of the data repeatedly until all data
is used.
Let’s understand it with figure 1-14:

This figure explains the concept of out-of-core learning (also known as
incremental learning) with a flowchart. Here:

1. Lots of Data: The process starts with a large dataset that is too big to fit
into a single machine's memory.

2. Chop into Pieces: The dataset is divided into smaller, manageable
chunks.

3. Train Online ML Algorithm: Each chunk is sequentially fed into the
online machine learning algorithm. This allows the algorithm to learn
incrementally, updating its model with each new chunk of data.

4. Evaluate Solution: After training on all chunks, the solution is evaluated.
If the solution is satisfactory (smiley face), it can be launched. If not (sad
face), further analysis is required.

5. Analyze Errors: If the solution is not satisfactory, errors are analyzed to
understand the problems.

6. Study the Problem: The problem is studied in more detail based on the
error analysis.

7. Feedback Loop: The insights from studying the problem feed back into
improving the online learning algorithm, and the process repeats.

In online learning systems, the learning rate is an important setting. It
controls how fast the system adapts to new data.

A high learning rate means the system learns quickly but can forget old
information easily.

A low learning rate means the system learns slowly but remembers old
information better and handles noisy or unusual data more effectively.

A big problem with online learning is that bad data can make the system
worse over time. This is especially noticeable in live systems. Bad data can
come from broken sensors or spam. To prevent this, watch the system
carefully, stop learning if it gets worse, and check data for any issues using
special tools.
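Here is a minimal sketch of incremental learning, assuming scikit-learn's SGDClassifier and a made-up stream of mini-batches (the data below is random and purely illustrative):

import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(random_state=42)

# Hypothetical stream of mini-batches; each batch is seen once and
# can then be discarded, which is what saves memory (out-of-core style).
for _ in range(100):
    X_batch = np.random.rand(32, 4)                    # 32 examples, 4 features
    y_batch = (X_batch.sum(axis=1) > 2).astype(int)    # made-up labels
    clf.partial_fit(X_batch, y_batch, classes=[0, 1])  # one incremental step

print(clf.predict(np.random.rand(3, 4)))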

3.2 Instance-Based Versus Model-Based Learning
Machine Learning systems can also be categorized by how well they
generalize. The goal is to make accurate predictions on new, unseen
examples, not just perform well on the training data.

There are two main approaches to generalization: instance-based learning
and model-based learning.

Instance-Based Learning:
Instance-based learning, also known as lazy learning or memory-based
learning, compares new instances to similar instances in the training data
that have been stored in memory.

Instance-based learning works by remembering specific examples and
using them to make decisions.

Steps of the working process:

• The system stores a collection of previous examples.
• When a new example comes in, the system compares it to the stored
  examples.
• The decision is made based on the similarity between the new example
  and the stored ones.

Spam Example:
Imagine you have an email account and you manually mark certain emails
as spam. In an instance-based learning system, the filter remembers each
of these spam emails.

1. Stored Examples: You mark some emails as spam. The system
remembers these emails, including their words and sender addresses.

2. New Email: When a new email arrives, the system checks if it looks like
the spam emails you marked before. For example, if the old spam emails
had the words "Win a prize," and the new email also has these words, it
might flag the new email.

3. Decision: The new email is marked as spam if it looks similar to the
emails you marked as spam before.

Let’s see the figure1-16 to better understand instance-based learning:

• Axes (Feature 1 and Feature 2):
  The graph has two axes, labeled "Feature 1" and "Feature 2." These
  features represent characteristics or properties of the data points.

• Training Instances:
  The image shows many training instances, which are represented by
  triangles and squares. These are the examples the system has already
  seen and learned from.

• New Instance:
  The black "X" marks a new instance that the system needs to classify.
  This new instance is the data point we want to predict or categorize.

• Nearest Neighbors:
  The image shows arrows pointing from the new instance to its nearest
  neighbors (closest points in the training data). In this case, the nearest
  neighbors are two green triangles and one yellow square.

• Classification:
  The new instance is classified based on the majority of its nearest
  neighbors. Since the nearest neighbors include two triangles and one
  square, the new instance will likely be classified as a triangle.

This method is simple but might not catch new types of spam that don't
resemble the stored examples.
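A minimal sketch of instance-based learning with scikit-learn's k-Nearest Neighbors classifier, using invented points that mirror the figure (two features, triangles vs. squares):

from sklearn.neighbors import KNeighborsClassifier

# Stored examples: [feature 1, feature 2]; labels 0 = triangle, 1 = square
X_train = [[1, 2], [2, 1], [1.5, 1.8], [6, 5], [7, 6], [6.5, 5.5]]
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)          # essentially memorizes the examples

# Classify a new instance by majority vote of its 3 nearest neighbors
print(knn.predict([[2, 2]]))       # likely class 0 (triangle)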

Model-Based Learning:
Model-Based Learning involves creating a model that captures patterns
from data and uses these patterns to make predictions.

This method involves learning a mathematical model that maps inputs to
outputs, such as a linear model, decision tree, or neural network. The
model uses patterns from the training data to predict new data.

1. Training Data: The system looks at past data to find patterns.

2. Creating the Model: The system builds a model (a set of rules or a
formula) based on these patterns.

3. Making Predictions: The model uses the rules to predict future
outcomes.

Prediction Example (Weather Forecast):

1. Training Data: Imagine you have information from previous days, like
temperature, humidity, wind speed, and whether it rained or not.

2. Creating the Model: The system finds patterns in this data. For instance,
it might learn that high humidity and low temperature often mean rain. It
creates a rule like: "If humidity is above 80% and temperature is below
20°C, it will probably rain."

3. Making Predictions: When you give today's weather data (like humidity
and temperature), the model uses the rule it learned to predict if it will
rain tomorrow.

Let’s see the figure1-16 to better understand model-based learning:

• Axes (Feature 1 and Feature 2):
  The graph has two axes labeled "Feature 1" and "Feature 2," which
  represent characteristics or properties of the data points.

• Training Instances:
  The image shows many training instances represented by green
  triangles and yellow squares. These are the examples the system has
  already seen and learned from.

• Model:
  The dashed line represents the model. This model separates the space
  into regions based on the patterns it learned from the training data. It
  tries to draw a boundary that best separates the green triangles from
  the yellow squares.

• New Instance:
  The black "X" marks a new instance that the system needs to classify.
  This new instance is the data point we want to predict or categorize.

• Classification:
  The new instance falls on the side of the boundary where most yellow
  squares are located. Therefore, according to the model, the new
  instance will be classified as a yellow square.

This way, model-based learning helps make predictions by understanding
and applying patterns from past data.
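A small sketch of the weather example, with a logistic regression model standing in for the learned rule; the humidity and temperature values are invented for illustration:

from sklearn.linear_model import LogisticRegression

# Hypothetical past days: [humidity %, temperature °C] -> rained (1) or not (0)
X_train = [[85, 18], [90, 15], [60, 25], [55, 30], [88, 19], [50, 28]]
y_train = [1, 1, 0, 0, 1, 0]

model = LogisticRegression()
model.fit(X_train, y_train)        # learns a boundary, not the examples

# Predict rain for today's weather
print(model.predict([[82, 17]]))   # likely 1 (rain)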

Summary:
Instance-Based Learning is like remembering specific past examples and
comparing new situations to them. For example, a spam filter checks if
new emails are similar to previously marked spam emails.

Model-Based Learning creates a general model based on patterns in the
data. For example, a weather model uses rules learned from past weather
data to predict future weather.

4. Challenges of Machine Learning
When training a learning algorithm, the main issues are using a "bad
algorithm" or having "bad data." Let's start by discussing examples of bad
data.

1. Insufficient Quantity of Training Data:

Teaching a toddler (a young child who is learning to walk and talk) to
recognize an apple is easy: they just need to see it and hear the word
"apple" a few times. Machine Learning isn't that simple. It often requires a
lot of data: thousands of examples for simple problems and millions for
complex ones like image or speech recognition.

2. Non-representative Training Data:

To generalize well, your training data must accurately represent the new
cases you want to predict, whether you are using instance-based or
model-based learning.

Example:

Imagine you want to train a model to recognize action music videos. If you
only collect videos from a YouTube search for "action music," you might
end up with mostly popular tracks or videos from specific regions. This
means your training data might not include all types of action music.
Because of this, your model might not recognize other action music styles
correctly.

Using incomplete or biased data can lead to poor predictions. To make
accurate predictions, it's important to use data that truly represents what
you want the model to recognize. However, this can be challenging due to
potential errors or poor data collection methods.

3. Poor-Quality Data:
If your training data is full of errors, outliers, and noise (e.g., due to poor-
quality measurements), it will make it harder for the system to detect the
underlying patterns, so your system is less likely to perform well.

To improve model performance, it's crucial to invest time in cleaning the
data. This might involve:

Removing Outliers: If some instances are clearly outliers, it may help to
simply discard them or try to fix the errors manually.

Handling Missing Information: Decide what to do if some data is missing,
like ignoring it, removing those data points, filling in the missing values
(e.g., using an average), or using different models with and without that
missing information.

Data scientists often spend a lot of time on these cleaning tasks to make
their models more accurate.

4. Irrelevant Features:
The saying "garbage in, garbage out" means that a machine learning
system's performance depends on the quality of the training data. To learn
effectively, the training data must have enough relevant features and not
too many irrelevant ones. A critical part of a successful machine learning
project is creating a good set of features for training, known as feature
engineering. This involves:

• Feature selection: Choosing the most useful features from the existing
  ones.
• Feature extraction: Combining existing features to produce a more
  useful one (dimensionality reduction algorithms can help here).
• Creating new features: Gathering new data to generate additional
  features.

5. Overfitting the Training Data:

Overfitting happens when a machine learning model works well on
training data but doesn't do well on new data. It's like judging all taxi
drivers based on one bad experience. For example, a complex model might
fit training data perfectly but give poor results on new data.

Overfitting can be addressed by:

• Simplifying the model (e.g., using fewer parameters or attributes).
• Gathering more training data.
• Reducing noise in the training data (e.g., fixing errors and removing
  outliers).

Regularization is a way to prevent overfitting by making the model
simpler. For example, a simpler linear model might not fit the training data
as well but will perform better on new data.

6. Underfitting the Training Data:

As you might guess, underfitting is the opposite of overfitting: it occurs
when your model is too simple to learn the underlying structure of the
data.

For example, a simple linear model for predicting life happiness is too
basic and won't give accurate results because it doesn't match the
complexity of real life.

To fix underfitting, you can:

• Use a more powerful model with more parameters.
• Provide better features through feature engineering.
• Reduce constraints on the model, like lowering regularization.

7. Stepping Back:
We have learned a lot about Machine Learning, and it might feel
overwhelming. Let's take a step back and review the main points.

Machine Learning: It's about improving tasks by learning from data, rather
than coding rules manually.

Types of ML Systems: There are various types, including supervised,
unsupervised, batch, online, instance-based, and model-based.

ML Project Steps: Collect data, use it to train an algorithm. Model-based
algorithms adjust parameters to fit the data, while instance-based
algorithms learn by comparing new data to stored examples.

Performance Issues: A model won’t work well if the training data is too
small, unrepresentative, noisy, or irrelevant. Also, the model shouldn’t be
too simple (underfitting) or too complex (overfitting).

Finally, after training a model, it's crucial to evaluate and fine-tune it to
ensure it generalizes well to new cases.

5. Testing and Validating dataset
The only way to know if a model works well on new data is to test it with
new examples. One way to do this is to use the model in real life and see
how it performs. But, if the model is very bad, users will complain, so this
method can be risky.

A better way to test a model is to divide your data into two sets: the
training set and the test set. Train your model on the training set and test
it on the test set. The error rate on the test set, called the generalization
error, estimates how well your model will perform on new, unseen data.

If your model does well on the training data but poorly on new data, it
means the model is overfitting.

Usually, 80% of the data is used for training and 20% for testing. However,
if the dataset is very large, say 10 million examples, even using just 1%
(100,000 examples) for testing can give you a good idea of how well your
model works.

Imagine you're building a model to recognize different types of pets in
photos: cats, dogs, and birds.

Example:
1. Splitting the Data:

• Training Set (60%): 600 photos
• Validation Set (20%): 200 photos
• Test Set (20%): 200 photos

2. Training: You use the 600 photos in the training set to teach your model
to recognize cats, dogs, and birds. The model learns from these examples.

3. Validating: While training, you check the model's performance with the
200 photos in the validation set. If it struggles to correctly identify a cat or
dog in these photos, you make adjustments to improve accuracy.

4. Testing: Once you’re satisfied with the training and validation results,
you use the 200 photos in the test set to see how well the model can
identify pets it hasn’t seen before. This gives you an idea of how well the
model will perform with new pet photos in real life.

This way, you confirm that your model isn’t just memorizing the training
photos but can correctly identify new pet photos too.
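A sketch of this 60/20/20 split with scikit-learn's train_test_split, assuming X holds the 1,000 photos and y their labels (train_test_split only produces two parts at a time, so it is applied twice):

from sklearn.model_selection import train_test_split

# First split off the 20% test set, then carve the validation set out of
# the remainder (0.25 of the remaining 80% = 20% of the total).
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42)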

There are two important processes in testing and validating:

1. Hyperparameter Tuning and Model Selection
2. Data Mismatch

Hyperparameter Tuning and Model Selection
Imagine you have two models: one is a simple model (like a straight line)
and the other is more complex (like a curve). You want to see which model
is better at predicting future outcomes. So, you train both models on some
data and then test them on a new set of data to see which one performs
better.

The model that performs best on this test data is usually the one that can
make better predictions in the future. This is called generalization: how
well the model works on new, unseen data.

Suppose you have a good simple model, but you want to make it even
better by preventing it from fitting the training data too closely. You add
something called regularization to stop the model from focusing too much
on the training data.

To find the best regularization, you try different options. But if you keep
changing the model based on the same test data, it might get too
"attached" to that data. This means the model might work really well on
that test data but not as well on new data.

This happens because the model learned the test data too perfectly
instead of learning general patterns that work for any data.

The key is to avoid over-tuning your model to the test data so it stays
flexible enough to handle new, unseen data in the real world.

A common solution to this problem is called holdout validation:

1. Split Your Data: You divide your dataset into two parts: a training set
   and a test set. Sometimes, a third part called a validation set is also
   used.
2. Train the Model: You use the training set to train your model. This is
   where the model learns from the data.
3. Evaluate the Model: After training, you test the model on the test set
   to see how well it performs on new, unseen data. This gives you an idea
   of how the model might work in real-world situations.
4. Use a Validation Set (Optional): If you have a validation set, you can try
   different settings (hyperparameters) for your model and choose the
   best one based on its performance on this set.
5. Final Testing: After selecting the best model, you can re-train it using
   the full training set and then evaluate its performance again on the test
   set.

So, holdout validation helps you understand how well your model
generalizes to new data by separating your data into training and testing
sets, and optionally using a validation set to tune the model.

Holdout validation works well, but if the validation set is too small, you
might pick a less effective model. If it's too large, the remaining training
data might be too small.

To fix this, you can use repeated cross-validation. This involves using many
small validation sets to test your model. It gives a better idea of how well
your model performs, but it takes more time to train the model multiple
times.
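As a sketch of how this tuning loop is often automated in scikit-learn, GridSearchCV tries each hyperparameter value and scores it with cross-validation. GridSearchCV is not mentioned in the original text, the parameter grid below is illustrative, and X_train / y_train are assumed to hold your training data:

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Try several regularization strengths (alpha); each candidate is scored
# with 5-fold cross-validation instead of one fixed validation set.
param_grid = {"alpha": [0.0001, 0.001, 0.01, 0.1]}
search = GridSearchCV(SGDClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)    # X_train, y_train assumed to exist

print(search.best_params_)      # the best regularization setting
print(search.best_score_)       # its cross-validated score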

Data Mismatch

In machine learning, data mismatch occurs when the data used to train a
model is different from the data it will encounter in real-world situations.

This can lead to poor model performance because the model hasn't
learned to generalize well to the actual data it will see during deployment.

For example, if you train a model on high-quality images of flowers taken
from the internet, but in real life the model is used to identify flowers in
dimly lit, blurry images from a smartphone, the model might struggle
because it wasn't trained on data that looks like the real-world images.

To identify whether a model's poor performance is due to data mismatch
or overfitting, you can set aside part of the training data (called the
"train-dev set"). After training the model, test it on the train-dev set. If
the model performs well on this set but poorly on the validation set, the
issue is likely data mismatch.

You can address this by making the training data more similar to the real-
world data. If the model performs poorly on the train-dev set, it's probably
overfitting, so you should simplify the model, add more data, or clean the
existing data.

6. Working with classification (MNIST)
The MNIST dataset is a collection of 70,000 small images of handwritten
digits, created by high school students and US Census Bureau employees.
Each image is labeled with the corresponding digit.

Due to its widespread use and familiarity in the field, the MNIST dataset is
often called the "Hello World" of Machine Learning. It's commonly used to
test new classification algorithms and is a popular choice for machine
learning beginners.

Scikit-Learn provides many helper functions to download popular
datasets, and MNIST is one of them. The following code fetches the MNIST
dataset:
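The snippet itself is not reproduced in this copy, so here is the standard form it refers to (the as_frame=False argument, which keeps the data as NumPy arrays in recent Scikit-Learn versions, is an addition beyond the original text):

from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist["data"], mnist["target"]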

The code snippet shows how to load the MNIST dataset using Scikit-
Learn's fetch_openml function. The dataset is stored as a dictionary with
keys such as:

DESCR: Provides a description of the dataset.

data: Contains an array where each row represents an instance (image)
and each column represents a feature (pixel value).

target: Contains the labels for each instance (the digit each image
represents).

Let’s look at these arrays:

There are 70,000 images, and each image has 784 features. This is because
each image is 28×28 pixels, and each feature simply represents one pixel’s
intensity, from 0 (white) to 255 (black).

Let’s take a look at one digit from the dataset. All you need to do is grab
an instance’s feature vector, reshape it to a 28×28 array, and display it
using Matplotlib’s imshow() function:
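A sketch of these steps (the array shapes first, then the plot), assuming the X and y arrays loaded above:

import matplotlib.pyplot as plt

print(X.shape)                     # (70000, 784)
print(y.shape)                     # (70000,)

some_digit = X[0]                  # the first instance's feature vector
some_digit_image = some_digit.reshape(28, 28)

plt.imshow(some_digit_image, cmap="binary")
plt.axis("off")
plt.show()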

This looks like a 5; now let's see what the label tells us:
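A sketch of the label check, assuming the arrays above:

print(y[0])   # '5' (note that the labels are strings, not integers)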

Figure 3-1 shows several images from the MNIST dataset, to give a feel for
the complexity involved in classifying handwritten digits.

Before analyzing the data, it's important to set aside a test set. The MNIST
dataset is already divided into a training set of 60,000 images and a test
set of 10,000 images.

7. Training a Binary Classifier
Training a binary classifier involves teaching a machine to recognize and
classify data into one of two categories (e.g., yes or no, cat or dog, spam or
not spam).

Let’s simplify the problem for now and only try to identify one digit for
example, the number 5.

1. Splitting the Data:

Training Data: We’ll use a portion of the images (e.g., 80%) to teach the
classifier what a "5" looks like versus what other digits look like.

Testing Data: The remaining images (e.g., 20%) will be used to test the
classifier’s performance.

This "5-detector" will be an example of a binary classifier, capable of
differentiating between just two classes, 5 and not-5. After splitting the
data, let's create the target vectors for this classification task:
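A sketch of the split and the target vectors, following the standard 60,000/10,000 MNIST ordering mentioned earlier (the code is reconstructed, not reproduced from the original):

X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

y_train_5 = (y_train == '5')   # True for all 5s, False for all other digits
y_test_5 = (y_test == '5')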

2. Choosing the Model:

We can use a model like SGDClassifier (Stochastic Gradient Descent
Classifier) for this task. It's a good choice for large datasets and works well
for binary classification.

This classifier has the advantage of being capable of handling very large
datasets efficiently.

3. Training the Model:

The classifier looks at the pixel values of each image. Pixels that are part of
the number (dark) have different values from pixels that are part of the
background (light).

Learning: The model learns the patterns that typically represent the digit 5
by adjusting its internal settings to minimize errors when guessing
whether an image is a 5 or not.
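The training code referred to below is not reproduced in this copy; in its standard form it looks like this:

from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)   # learn to separate 5s from non-5s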

Let’s explain how SGD and random_state work in above the code:

SGD (Stochastic Gradient Descent):

Think of SGD like a teacher that helps the computer learn step by step. It
looks at one example at a time (like one image of a digit), and it tries to
guess if it’s a 5 or not.

Learning Process: If the guess is wrong, SGD corrects itself a little bit. Then,
it moves on to the next example and repeats this process. Over time, it
gets better and better at making the right guesses.

random_state=42:

What it does: random_state is like setting the rules of a game. By using
random_state=42, you make sure the game starts the same way every
time you play.

This helps in getting consistent results. So, whenever you train the model
again with the same data, you'll get the same outcomes every time.

4. Making Predictions:

After training, the model can look at new, unseen images and predict
whether the digit is a 5 or not based on what it has learned.
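For example (a sketch, assuming some_digit is the image of the 5 displayed earlier):

print(sgd_clf.predict([some_digit]))   # e.g. [ True]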

The classifier guesses that this image represents a 5 (True). Looks like it
guessed right in this particular case! Now, let’s evaluate this model’s
performance.

8. Performance Measures
In machine learning (ML), performance measures help evaluate how well
a model is working.

Evaluating a classifier is harder than evaluating a regressor, so we will
focus on it. There are many ways to measure how well a classifier works,
so get ready to learn new terms and ideas.

8.1 Measuring Accuracy Using Cross-Validation

Measuring accuracy using cross-validation in machine learning helps us
see how well a model works on data it hasn't seen before. Let's make it
simple:

What is Accuracy?

Accuracy tells us how often our model makes correct predictions. For
example, if a model predicts correctly 80 out of 100 times, its accuracy is
80%.

It’s the ratio of the number of correct predictions to the total number of
predictions made by the model. Mathematically, it’s expressed as:

What is Cross-Validation?

When you train a machine learning model, it’s important to evaluate how
well it performs not just on the data it has seen (training data) but also on
data it hasn't seen before (test data).

However, splitting the data into a single training and test set (e.g., 80/20
split) might not always provide a reliable measure of the model’s
performance, especially if the dataset is small or imbalanced.

Cross-validation helps solve this by using multiple subsets of the data to
train and test the model.

A simple example of applying cross-validation:

from sklearn.model_selection import cross_val_score

# 3-fold cross-validation, scoring each fold by accuracy
cross_val_accuracy = cross_val_score(sgd_clf, X_train, y_train_5, cv=3,
                                     scoring="accuracy")
print(cross_val_accuracy)

Cross-Validation Setup: The cross_val_score function is used to perform
3-fold cross-validation. In 3-fold cross-validation, the dataset is divided
into three equal parts (or folds). The model is trained on two folds and
tested on the third fold. This process is repeated three times, each time
using a different fold as the test set.

Accuracy as the Scoring Metric: The scoring="accuracy" parameter is
used, which measures the proportion of correct predictions (both True
Positives and True Negatives) out of all predictions.

Output: The cross_val_accuracy variable contains the accuracy scores for
each of the three folds. For example, you might get an output like
[0.95035, 0.96035, 0.9604], which represents the accuracy scores for the
three folds.

There are several types of cross-validation methods, each with its own
approach to splitting the data and testing the model. Here are the most
common types:

1. k-Fold Cross-Validation
k-Fold Cross-Validation is a method used to evaluate how well a machine
learning model will perform on new, unseen data.

How does it work?

The idea is to split your data into k equal parts, called folds.

You train the model k times, each time using a different fold as the testing
set and the remaining k-1 folds as the training set.

• Split the Data: Divide your data into k folds. For example, if you choose
  k=5, your data is split into 5 parts.
• Train and Test: Train the model on 4 folds and test it on the remaining 1
  fold. Repeat this process 5 times, each time using a different fold as the
  test data.
• Average the Results: After all iterations, calculate the accuracy (or any
  performance metric) for each test set, then take the average of these
  results to get the final performance.

Why use it?

It gives a better idea of how the model will perform on different parts of
the data.

It uses all the data for both training and testing, making the evaluation
more reliable.

2. Stratified k-Fold Cross-Validation:
Stratified k-Fold Cross-Validation is a variation of k-Fold Cross-Validation,
but it ensures that each fold has the same distribution of class labels (e.g.,
categories) as the original dataset.

How does it work?

The process is similar to regular k-Fold Cross-Validation, but with an added
step to ensure the class distribution is preserved in each fold. This is
particularly useful when you have an imbalanced dataset, where some
classes have more examples than others.

• Split the Data with Stratification: Just like in k-fold, divide your data
  into k folds. But now, ensure that each fold has a similar percentage of
  classes (e.g., if 70% of your data belongs to class A and 30% to class B,
  each fold will have this same ratio).
• Train and Test: Train the model on k-1 folds and test it on the
  remaining fold, just like in regular k-fold.
• Average the Results: After k iterations, average the accuracy results to
  get the final performance.

Why use it?

It’s essential when your dataset is imbalanced, meaning one class is more
frequent than others. Stratified k-Fold ensures that each fold represents
the overall distribution of classes, leading to more reliable performance
metrics.
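A sketch of stratified k-fold applied by hand, reusing the sgd_clf classifier and the y_train_5 labels from the earlier sections (this manual loop is one way to do what cross_val_score automates):

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold

skfolds = StratifiedKFold(n_splits=3)

for train_idx, test_idx in skfolds.split(X_train, y_train_5):
    clone_clf = clone(sgd_clf)                       # fresh copy per fold
    clone_clf.fit(X_train[train_idx], y_train_5[train_idx])
    y_pred = clone_clf.predict(X_train[test_idx])
    print(np.mean(y_pred == y_train_5[test_idx]))    # accuracy on this fold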

8.2 Confusion Matrix
A Confusion Matrix is a table used to evaluate the performance of a
classification model in machine learning.

It helps you understand how well your model is predicting different
classes.

A confusion matrix is typically a 2x2 table for binary classification (but can
be larger for multi-class classification).

A simple example of applying a confusion matrix:


from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

# Cross-validation predictions
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)

# Confusion Matrix
conf_matrix = confusion_matrix(y_train_5, y_train_pred)
print(conf_matrix)

sgd_clf: This is your classifier (e.g., Stochastic Gradient Descent classifier).

X_train: The training data (features).

y_train_5: The labels you're predicting (whether the digit is 5 or not, for
example).

cv=3: This means 3-fold cross-validation. The data is split into 3 parts, and
the model is trained on two parts while tested on the third. This process
repeats three times.

Here, the cross_val_predict function performs cross-validation and gives
you predictions for each instance in the training data.

Now, in Confusion Matrix:

confusion_matrix(y_train_5, y_train_pred): This creates the confusion
matrix by comparing the true labels (y_train_5) with the predicted labels
(y_train_pred).

In a binary classification problem (like determining if a digit is 5 or not),
the confusion matrix will look something like this:

[[ TN, FP ],
 [ FN, TP ]]

Here's how it looks:

1. True Positives (TP):
   • The model correctly predicts the positive class.
   • Example: The model predicts a '5', and the image really is a '5'.
2. True Negatives (TN):
   • The model correctly predicts the negative class.
   • Example: The model predicts the image is not a '5', and it really isn't
     a '5'.

3. False Positives (FP) (Type I Error):
   • The model incorrectly predicts the positive class when it's actually
     negative.
   • Example: The model predicts a '5', but the image is actually a '3'. This
     is a mistake.
4. False Negatives (FN) (Type II Error):
   • The model incorrectly predicts the negative class when it's actually
     positive.
   • Example: The model predicts that an image isn't a '5', but it actually
     is a '5'. This is also a mistake.

Key Metrics Derived from the Confusion Matrix:

1. Accuracy:
What it means: Accuracy tells us how many predictions the model got
right overall.

Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)

Example: If the model correctly predicts '5' and correctly identifies images
that are not '5', and does this correctly for 90 out of 100 images, the
accuracy is 90%.

2. Precision:
What it means: Precision tells us how many of the images predicted as '5'
were actually '5'.

Formula: Precision = TP / (TP + FP)

Example: If the model predicts 10 images as '5', but only 8 of those are
actually '5', then the precision is 80%.
3. Recall (Sensitivity or True Positive Rate):
What it means: Recall tells us how many of the actual '5' images were
correctly identified by the model.

Formula: Recall = TP / (TP + FN)

Example: If there are 20 actual '5' images, and the model correctly
predicts 18 of them, then the recall is 90%.

These metrics help assess how well your model is performing, especially
when making predictions about specific classes like the digit '5'.

This gives a clear picture of how well the classifier is performing, showing
both the correct and incorrect predictions.

8.3 Precision and Recall
Scikit-Learn provides several functions to compute classifier metrics,
including precision and recall:

A simple example of computing precision and recall:
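The code this section analyzes is not reproduced in this copy; a sketch in the standard Scikit-Learn form, reusing y_train_pred from the confusion-matrix example:

from sklearn.metrics import precision_score, recall_score

print(precision_score(y_train_5, y_train_pred))   # e.g. 0.8371
print(recall_score(y_train_5, y_train_pred))      # e.g. 0.6512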

According to the code above:

1. Precision:
Formula used: Precision = TP / (TP + FP) = 3530 / (3530 + 687) ≈ 0.8371

• True Positives (TP): 3530 (These are the images correctly predicted as
  "5s" by the classifier.)
• False Positives (FP): 687 (These are the wrong predictions where the
  classifier said "5," but it wasn't actually a "5.")

Explanation:

• The precision score is 0.8371, or 83.7%. This means that when the
  classifier predicts an image as a "5," it is correct 83.7% of the time. In
  other words, out of all the times the classifier says "This is a 5," it's right
  most of the time, but it does make mistakes sometimes.
• Imagine you predicted 100 images as "5." Precision tells you that about
  84 of those predictions are correct, and 16 are incorrect.

2. Recall:
Formula used: Recall = TP / (TP + FN) = 3530 / (3530 + 1891) ≈ 0.6512

• True Positives (TP): 3530 (Again, these are the images correctly
  predicted as "5s" by the classifier.)
• False Negatives (FN): 1891 (These are the times when the classifier
  missed a "5," meaning it was actually a "5," but the classifier didn't
  recognize it.)

Explanation:

• The recall score is 0.6512, or 65.1%. This means that out of all the actual
  "5s" in the dataset, the classifier correctly identified 65.1% of them. So,
  it found most of the "5s," but missed some.
• Imagine there are 100 actual "5s" in your dataset. Recall tells you that
  65 of them were correctly found, but 35 were missed.

Summary:

• Precision is about how accurate the classifier is when it predicts a "5."
• Recall is about how well the classifier finds all the actual "5s."

This helps you see how good your classifier is at identifying "5s" and how
careful it is when making predictions.

3. F1 Score:
It is often convenient to combine precision and recall into a single metric
called the F1 score, in particular if you need a simple way to compare two
classifiers.

The F1 score is the harmonic mean of precision and recall.

While the regular mean treats all values equally, the harmonic mean gives
much more weight to low values.

As a result, the classifier will only get a high F1 score if both recall and
precision are high.

A simple example of computing the F1 score:

F1 Score Formula:

    F1 = 2 × (precision × recall) / (precision + recall)
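The corresponding code is not reproduced in this copy; a sketch, reusing the labels and predictions from the previous sections:

from sklearn.metrics import f1_score

print(f1_score(y_train_5, y_train_pred))   # e.g. 0.7325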

In our case:

• Precision: 0.8371
• Recall: 0.6512

So, the F1 score is:

    F1 = 2 × (0.8371 × 0.6512) / (0.8371 + 0.6512) ≈ 0.7325

Explanation:

• The F1 score of 0.7325 tells you that your classifier achieves a good
  balance between precision and recall.
• While precision is higher (83.7%), meaning the classifier is usually
  correct when it predicts "5", the recall is lower (65.1%), indicating that
  it misses some "5s".
• The F1 score provides a single number that reflects both of these
  aspects.

In summary, the F1 score of 0.7325 shows that your model has a good,
balanced performance when considering both how often it is correct when
predicting a "5" (precision) and how well it finds all the "5s" (recall).

8.4 The ROC Curve.
ROC stands for Receiver Operating Characteristics, and the ROC curve is
the graphical representation of the effectiveness of the binary
classification model. It plots the true positive rate (TPR) vs the false
positive rate (FPR) at different classification thresholds.

The ROC curve shows the trade-off between two important metrics:

• True Positive Rate (TPR), which is also known as recall.
• False Positive Rate (FPR), which is the probability of falsely identifying a
  negative instance as positive.

The ROC curve is a plot of TPR (y-axis) vs FPR (x-axis) at different
thresholds.

Now what are thresholds?

• When a model makes predictions, it often outputs a probability score
  (e.g., 0.8 for an 80% chance that this is a 5).
• A threshold is a decision point. For example, if the threshold is 0.5, any
  prediction with a probability greater than 0.5 is classified as a positive
  (5), and anything lower is classified as a negative (not a 5).
• By adjusting the threshold, you change the balance between true
  positives and false positives.

Area Under the ROC Curve (AUC)

AUC is a numerical value that represents the area under the ROC curve. It
provides a single value to summarize the performance of the model across
all possible thresholds.

It measures the ability of the model to differentiate between the positive
and negative classes.

Key Points About AUC

• Range of AUC:
  • The AUC value ranges from 0 to 1.
  • An AUC of 0.5 indicates a model that performs no better than
    random chance.
  • An AUC closer to 1 indicates a model with excellent performance.
• Interpretation of AUC values:
  • 0.9 - 1.0: Excellent
  • 0.8 - 0.9: Good
  • 0.7 - 0.8: Fair
  • 0.6 - 0.7: Poor
  • 0.5 - 0.6: Fail

A simple example of plotting the AUC-ROC curve:

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Get the decision scores for the ROC curve
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3,
                             method="decision_function")

# Calculate the ROC curve
fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)

# Plot the ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, linewidth=2)
plt.plot([0, 1], [0, 1], 'k--')   # Diagonal line (random model)
plt.axis([0, 1, 0, 1])
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (Recall)')
plt.title('ROC Curve')
plt.grid(True)
plt.show()

# Calculate the AUC score
roc_auc = roc_auc_score(y_train_5, y_scores)
print(f"ROC AUC Score: {roc_auc}")

According to the output:

• The blue line represents the ROC curve, which plots the True Positive
  Rate (Recall) against the False Positive Rate at various classification
  thresholds.
• The dotted diagonal line is the line of no judgement (also known as the
  baseline or random-guess curve).
• It represents a model that makes random predictions, where the True
  Positive Rate equals the False Positive Rate.
• Any model whose ROC curve is close to this diagonal line is performing
  no better than random guessing.
• The blue curve stays close to the top-left corner of the plot, which
  means the model has a high True Positive Rate and a low False Positive
  Rate across various thresholds. This indicates that the model performs
  very well.
• Since the curve stays far from the diagonal line and covers a large area,
  the model is effective at distinguishing between the two classes.
• AUC = 0.9605 is close to 1, meaning the model has an excellent ability
  to distinguish between positive and negative classes.

Thank You
: : Any Query : :
Contact: Ruparel Education Pvt. Ltd.
Mobile No: 7600044051
