Machine Learning Notes


Thanks to machine learning, exciting changes are happening across all areas of society, including:

 Recent advancements in industries such as autonomous vehicles.

 Accurate and rapid translation of text into hundreds of languages.

 AI assistants you might find in your home.

 Worker safety improvements.

 Quicker pharmaceutical design and development.

Machine learning is a complex subject area. Our goal in this lesson is to introduce you to some of the most common
terms and ideas used in machine learning. We will then walk you through the different steps involved in machine
learning and finish with a series of examples that use machine learning to solve real-world problems.

Let's look at the outline for this lesson.

Machine learning is part of the broader field of artificial intelligence. This field is concerned with the capability of
machines to perform activities using human-like intelligence. Within machine learning there are several different
kinds of tasks or techniques:

 In supervised learning, every training sample from the dataset has a corresponding label or output value
associated with it. As a result, the algorithm learns to predict labels or output values. We will explore this in-
depth in this lesson.

 In unsupervised learning, there are no labels for the training data. A machine learning algorithm tries to learn
the underlying patterns or distributions that govern the data. We will explore this in-depth in this lesson.

 In reinforcement learning, the algorithm figures out which actions to take in a situation to maximize a reward
(in the form of a number) on the way to reaching a specific goal. This is a completely different approach than
supervised and unsupervised learning. We will dive deep into this in the next lesson.

In traditional problem-solving with software, a person analyzes a problem and engineers a solution in code to solve
that problem. For many real-world problems, this process can be laborious (or even impossible) because a correct
solution would need to take a vast number of edge cases into consideration.
Imagine, for example, the challenging task of writing a program that can detect if a cat is present in an image. Solving
this in the traditional way would require careful attention to details like varying lighting conditions, different types of
cats, and various poses a cat might be in.

In machine learning, the problem solver abstracts away part of their solution as a flexible component called a model,
and uses a special program called a model training algorithm to adjust that model to real-world data. The result is a
trained model which can be used to predict outcomes that are not part of the dataset used to train it.

In a way, machine learning automates some of the statistical reasoning and pattern-matching the problem solver
would traditionally do.

The overall goal is to use a model created by a model-training algorithm to generate predictions or find patterns in
data that can be used to solve a problem.
Machine learning is a new field created at the intersection of statistics, applied math, and computer science. Because
of the rapid and recent growth of machine learning, each of these fields might use slightly different formal definitions
of the same terms.

Terminology

Machine learning, or ML, is a modern software development technique that enables computers to solve problems by
using examples of real-world data.

In supervised learning, every training sample from the dataset has a corresponding label or output value associated
with it. As a result, the algorithm learns to predict labels or output values.

In reinforcement learning, the algorithm figures out which actions to take in a situation to maximize a reward (in the
form of a number) on the way to reaching a specific goal.

In unsupervised learning, there are no labels for the training data. A machine learning algorithm tries to learn the
underlying patterns or distributions that govern the data.

Step 1: Define the problem

Is it possible to find clusters of similar books based on the presence of common words in the book descriptions?

You do editorial work for a book recommendation company, and you want to write an article on the largest book
trends of the year. You believe that a trend called "micro-genres" exists, and you have confidence that you can use
the book description text to identify these micro-genres.

By using an unsupervised machine learning technique called clustering, you can test your hypothesis that the book
description text can be used to identify these "hidden" micro-genres.

Identify the machine learning task you could use

By using an unsupervised machine learning technique called clustering, you can test your hypothesis that the book
description text can be used to identify these "hidden" micro-genres.

Earlier in this lesson, you were introduced to the idea of unsupervised learning. This machine learning task is
especially useful when your data is not labeled.
Step 2: Build your dataset

To test the hypothesis, you gather book description text for 800 romance books published in the current year. You
plan to use this text as your dataset.

Data exploration, cleaning, and preprocessing

In the lesson about building your dataset, you learned that sometimes it is necessary to change the format of the data that you want to use. In this case study, we need to use a process called vectorization. Vectorization is a
process whereby words are converted into numbers.

Data cleaning and exploration


For this project, you believe capitalization and verb tense will not matter, and therefore you remove capitals and
convert all verbs to the same tense using a Python library built for processing human language. You also remove
punctuation and words you don’t think have useful meaning, like 'a' and 'the'. The machine learning community
refers to these words as stop words.
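
To make this step more concrete, here is a minimal cleaning sketch. It assumes the NLTK library (the lesson only says "a Python library built for processing human language"), approximates "converting verbs to the same tense" with lemmatization, and uses a hypothetical list name book_descriptions for the 800 gathered texts.

# A minimal cleaning sketch, assuming NLTK; the exact library and steps may differ.
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def clean(description):
    # Lowercase the text and strip punctuation.
    text = description.lower().translate(str.maketrans("", "", string.punctuation))
    # Drop stop words like 'a' and 'the', and normalize verb forms (lemmatization).
    words = [lemmatizer.lemmatize(w, pos="v") for w in text.split() if w not in stop_words]
    return " ".join(words)

cleaned_descriptions = [clean(d) for d in book_descriptions]  # book_descriptions is hypothetical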

Data preprocessing
Before you can train the model, you need to do a type of data preprocessing called data vectorization, which is used
to convert text into numbers.
As shown in the following image, you transform this book description text into what is called a bag of words
representation, so that it is understandable by machine learning models.

How the bag of words representation works is beyond the scope of this lesson. If you are interested in learning more,
see the What's next section at the end of this chapter.
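
Although the inner workings of bag of words are beyond the scope of this lesson, the sketch below shows roughly what the transformation could look like in code. It assumes scikit-learn's CountVectorizer and the cleaned_descriptions list from the previous step; neither is prescribed by the lesson.

# A minimal bag-of-words sketch, assuming scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
bag_of_words = vectorizer.fit_transform(cleaned_descriptions)

print(vectorizer.get_feature_names_out())  # the vocabulary of words found in the descriptions
print(bag_of_words.toarray()[0])           # word counts for the first book description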
Step 3: Train the model

Now you are ready to train your model.

You pick a common cluster-finding model called k-means. In this model, you can change a model parameter, k, to be
equal to how many clusters the model will try to find in your dataset.

Your data is unlabeled and you don't know how many micro-genres might exist. So, you train your model multiple times
using different values for k each time.

What does this even mean? In the following graphs, you can see examples of when k=2 and when k=3.
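
As a rough illustration, training the model for several values of k might look like the following sketch, assuming scikit-learn's KMeans and the bag_of_words matrix built during preprocessing.

# A minimal k-means training sketch, assuming scikit-learn.
from sklearn.cluster import KMeans

models = {}
for k in range(2, 21):                        # try several candidate values of k
    model = KMeans(n_clusters=k, random_state=42)
    model.fit(bag_of_words)                   # assign each book description to one of k clusters
    models[k] = model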

During the model evaluation phase, you plan on using a metric to find which value for k is the most appropriate.

Step 4: Model evaluation

In machine learning, numerous statistical metrics or methods are available to evaluate a model. In this use case,
the silhouette coefficient is a good choice. This metric describes how well your data was clustered by the model. To
find the optimal number of clusters, you plot the silhouette coefficient as shown in the following image. You
find the optimal value is when k=19.
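
A minimal sketch of this evaluation, assuming scikit-learn's silhouette_score, matplotlib, and the models dictionary trained above, might look like this:

# A minimal silhouette-coefficient sketch, assuming scikit-learn and matplotlib.
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

scores = {k: silhouette_score(bag_of_words, m.labels_) for k, m in models.items()}

plt.plot(list(scores.keys()), list(scores.values()))
plt.xlabel("k (number of clusters)")
plt.ylabel("silhouette coefficient")
plt.show()

best_k = max(scores, key=scores.get)  # in this case study, the best value turns out to be k=19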
Often, machine learning practitioners do a manual evaluation of the model's findings.

You find one cluster that contains a large collection of books that you can categorize as "paranormal teen romance."
This trend is known in your industry, and therefore you feel somewhat confident in your machine learning approach.
You don’t know if every cluster is going to be as cohesive as this, but you decide to use this model to see if you can
find anything interesting about which to write an article.

Step 5: Model inference

As you inspect the different clusters found when k=19, you find a surprisingly large cluster of books. Here's an
example from fictionalized cluster #7.

As you inspect the preceding table, you can see that most of these text snippets indicate that the characters are in
some kind of long-distance relationship. You see a few other self-consistent clusters and feel you now have enough
useful data to begin writing an article on unexpected modern romance micro-genres.
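
Inspecting a single cluster could be as simple as the following sketch, assuming the k=19 model trained earlier and the original (uncleaned) book_descriptions list; cluster 7 here is the fictionalized cluster from the example.

# A minimal cluster-inspection sketch.
import numpy as np

labels = models[19].labels_
descriptions = np.array(book_descriptions, dtype=object)

for snippet in descriptions[labels == 7][:5]:   # print a few descriptions from cluster #7
    print(snippet[:120], "...")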

Wrap-up

In this example, you saw how you can use machine learning to help find micro-genres in books by using the text
found on the back of the book. Here is a summary of key moments from the lesson you just finished.

One
For some applications of machine learning, you need to not only clean and preprocess the data but also convert the
data into a format that is machine readable. In this example, the words were converted into numbers through a
process called data vectorization.
Two
Solving problems in machine learning requires iteration. In this example, you saw how it was necessary to train the model multiple times for different values of k. After training your model over multiple iterations, you saw how the silhouette coefficient could be used to determine the optimal value for k.

Three
During model inference, you continued to inspect the clusters for accuracy to ensure that your model was generating useful predictions.

Terminology

 Bag of words: A technique used to extract features from text. It counts how many times a word appears in a
document (corpus), and then transforms that information into a dataset.

 Data vectorization: A process that converts non-numeric data into a numerical format so that it can be used
by a machine learning model.

 Silhouette coefficient: A score from -1 to 1 describing the clusters found during modeling. A score near zero
indicates overlapping clusters, and scores less than zero indicate data points assigned to incorrect clusters. A
score approaching 1 indicates successful identification of discrete non-overlapping clusters.

 Stop words: A list of words removed by natural language processing tools when building your dataset. There
is no single universal list of stop words used by all natural language processing tools.

Step 1: Defining the problem


Imagine you run a company that offers specialized on-site janitorial services.
One client - an industrial chemical plant - requires a fast response for spills
and other health hazards. You realize if you could automatically detect spills
using the plant's surveillance system, you could mobilize your janitorial team
faster.

Machine learning could be a valuable tool to solve this problem.


Choosing a model
As shown in the image above, your goal will be to predict if each image
belongs to one of the following classes:
 Contains spill
 Does not contain spill

Step 2: Building a dataset


Collecting
 Using historical data, as well as safely staged spills, quickly build a
collection of images that contain both spills and non-spills in multiple
lighting conditions and environments.
Exploring and cleaning
 Go through all of the photos to ensure that the spill is clearly in the shot.
There are Python tools and other techniques available to improve image
quality, which you can use later if you determine that you need to
iterate.
Data vectorization (converting to numbers)
 Many models require numerical data, so all of your image data must be transformed into a numerical format. Python tools can help you do this automatically (see the sketch after this list).
 In the following image, you can see how each pixel in the image
immediately below can be represented in the image beneath it using a
number between 0 and 1, with 0 being completely black and 1 being
completely white.
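
Here is a minimal sketch of that conversion, assuming the Pillow and NumPy libraries (the lesson only says "Python tools") and a hypothetical image file name.

# A minimal image-vectorization sketch, assuming Pillow and NumPy.
from PIL import Image
import numpy as np

image = Image.open("camera_frame_001.jpg").convert("L")   # hypothetical file; "L" = grayscale
pixels = np.asarray(image) / 255.0    # each pixel becomes a number between 0 (black) and 1 (white)
print(pixels.shape, pixels.min(), pixels.max())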

Split the data


 Split your image data into a training dataset and a test dataset.
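
A minimal sketch of this split, assuming scikit-learn and arrays named images and labels built from the vectorized photos above (both names are assumptions):

# A minimal train/test split sketch, assuming scikit-learn.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, random_state=42, stratify=labels
)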

Step 3: Model training


 Traditionally, solving this problem would require hand-engineering features on top of the underlying
pixels (for example, locations of prominent edges and corners in the image), and then training a
model on these features.
Today, deep neural networks are the most common tool used for solving this kind of problem. Many
deep neural network models are structured to learn the features on top of the underlying pixels so
you don’t have to engineer them yourself. You’ll have a chance to take a deeper look at this in the next lesson, so
we’ll keep things high-level for now.
 CNN (convolutional neural network)
 Neural networks are beyond the scope of this lesson, but you can think of them as a collection of
very simple models connected together. These simple models are called neurons, and the
connections between these models are trainable model parameters called weights.
Convolutional neural networks are a special type of neural network that is particularly good at
processing images.
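
To make this slightly more concrete, here is a minimal sketch of a small convolutional neural network, assuming Keras and 128x128 grayscale images. The lesson does not prescribe a framework or an architecture, so treat this as one possible setup rather than the solution.

# A minimal CNN sketch for "contains spill" vs. "does not contain spill", assuming Keras.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(128, 128, 1)),                      # 128x128 grayscale images (an assumption)
    layers.Conv2D(16, kernel_size=3, activation="relu"),    # learns low-level features from the pixels
    layers.MaxPooling2D(),
    layers.Conv2D(32, kernel_size=3, activation="relu"),    # learns higher-level features
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),                   # probability that the image contains a spill
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=10, validation_split=0.1)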

Step 4: Model evaluation


As you saw in the last example, there are many different statistical metrics
that you can use to evaluate your model. As you gain more experience in
machine learning, you will learn how to research which metrics can help you
evaluate your model most effectively. Here's a list of common metrics:
 Accuracy
 Confusion matrix
 F1 score
 False positive rate
 False negative rate
 Log loss
 Negative predictive value
 Precision
 Recall
 ROC Curve
 Specificity
In cases such as this, accuracy might not be the best evaluation mechanism.

Why not? The model will see the 'does not contain spill' class almost all the
time, so any model that just predicts 'no spill' most of the time will seem pretty
accurate.
What you really care about is an evaluation tool that rarely misses a real spill.
After doing some internet sleuthing, you realize this is a common problem and
that precision and recall will be effective. Think of precision as answering the
question, "Of all predictions of a spill, how many were right?" and recall as
answering the question, "Of all actual spills, how many did we detect?"
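A minimal sketch of computing these two metrics on the test set, assuming scikit-learn and the Keras model trained in the previous step:

# A minimal precision/recall sketch, assuming scikit-learn.
from sklearn.metrics import precision_score, recall_score

y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()  # threshold the predicted probabilities

print("Precision:", precision_score(y_test, y_pred))  # of all predicted spills, how many were right?
print("Recall:   ", recall_score(y_test, y_pred))      # of all actual spills, how many did we detect?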
Manual evaluation plays an important role. If you are unsure if your staged
spills are sufficiently realistic compared to actual spills, you can get a better
sense of how well your model performs on actual spills by finding additional
examples from historical records. This allows you to confirm that your model
is performing satisfactorily.
Step 5: Model inference
The model can be deployed on a system that enables you to run machine
learning workloads such as AWS Panorama.

Thankfully, most of the time, the results will be from the 'does not contain spill' class.

But when the 'contains spill' class is detected, a simple paging system could
alert the team to respond.
Wrap-up
In this example, you saw how you can use machine learning to help detect
spills in a work environment. This example also used a modern machine
learning technique called a convolutional neural network (CNN).

Here is a summary of key moments from the lesson that you just finished.
One
For some applications of machine learning, you need to use more complicated
techniques to solve the problem. While modern neural networks are a powerful
tool, don’t forget that they come at a cost: they are not easily explained.

Two
High-quality data was once again very important to the success of this
application, to the point where even staging some fake data was required.
Data vectorization was also required: the images had to be converted into
numbers so that they could be used by the neural network.

Three
During model inference, you continued to inspect the predictions for accuracy.
This is especially important in this case because you created some fake data to
use when training your model.
Terminology
Neural networks: a collection of very simple models connected together.
 These simple models are called neurons.
 The connections between these models are trainable model parameters
called weights.
Convolutional neural networks (CNN): a special type of neural network
particularly good at processing images.
