
Lesson 3 - Machine Learning Workflow

Introduction to Artificial Intelligence

Lesson 3: Machine Learning Workflow

© Simplilearn. All rights reserved.


Learning Objectives

Describe the seven steps of machine learning workflow


Machine Learning Workflow

Topic 1: Machine Learning Process


The Machine Learning Workflow

It is essential for all technical and non-technical stakeholders to understand the machine learning workflow in order to appreciate:
• The job of a data scientist
• The processes a data scientist follows to provide feedback to decision-makers
• The machine learning process in a business environment
The Machine Learning Process

[Flowchart: the seven steps below, connected by Yes/No decision branches]
1. Get more data
2. Ask a sharp question
3. Add the data to the table
4. Check for quality
5. Transform features
6. Answer the question
7. Use the answer
Step 1: Get More Data

• The data collected is used to investigate a business challenge.

• The quality of the predictive model depends upon the quantity and quality of the data gathered.

• The data can be collected in different formats.


Data Format Examples

Names
• Type: Sonia Tran
• Variety: Caramel latte
• Id: Air Force One
• Model number: R2-D2
• Category: Chocolate
• Text: Best. Show. Ever.

Numbers
• Money: $300m
• Count: 69 pizzas
• Pixel brightness: 232/255
• Temperature: 30 degrees F
• Sound intensity: 0.64

Names that look like numbers
• Zip code: 95126
• Social security number: 602-47-1899
• Serial number: 100000023987
• Credit card number: 5467-3345-2122-5508

Names that can be turned into numbers
• Place: First, second, third
• Time zone: Pacific, mountain, central, eastern
• Train stops: Diridon, San Francisco, Sunnyvale, Menlo Park
• Side: Left, middle, right
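The difference between the last two groups can be sketched in a few lines of Python. This is only an illustration: the PLACE_ORDER mapping and the encode_place helper are hypothetical names, and the second zip code is a made-up value used to show why such identifiers are usually kept as text.

```python
# Names that can be turned into numbers: an ordered category gets an
# explicit rank. PLACE_ORDER and encode_place are hypothetical names.
PLACE_ORDER = {"first": 1, "second": 2, "third": 3}


def encode_place(place):
    """Map an ordinal label such as 'Second' to its rank."""
    return PLACE_ORDER[place.lower()]


places = ["Third", "First", "Second"]
encoded_places = [encode_place(p) for p in places]

# Names that look like numbers: a zip code is an identifier, so it is
# kept as a string. Converting it to int can silently change it.
zip_codes = ["95126", "02134"]  # second value is a made-up example
round_tripped = [str(int(z)) for z in zip_codes]
zip_codes_survive_int = round_tripped == zip_codes
```

Here the ranks keep the order first < second < third, while the round trip through int drops the leading zero of the second zip code, so zip_codes_survive_int ends up False.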
Goals of Machine Learning Workflow

The goals of data science and machine learning are to:

• Derive answers to business challenges

• Derive meaningful conclusions from complicated issues

• Identify actionable steps given a wide set of variables

To be able to do so, you need to ask a sharp question as opposed to a vague one.
Step 2: Ask a Sharp Question

Need for a sharp question:
• It helps you get clear answers to the questions.
• It is direct and specific.
• It focuses on a single topic.
• It focuses on the exact need and requirement.


Vague vs. Sharp Questions

Vague questions
1. What should you do?
2. How should you live your life?
3. Which career path should you take?
4. Which data can tell you about your business?

Sharp questions
1. Which route will get you to work fastest?
2. How many times will a user use the new product features?
3. How can the revenue be maximized?
Sharp Question Example

Goal
• Your ultimate goal is to analyze your historical data and predict the stock price at a future date.
• You have to study all the different tables of data in your database and analyze how your company is doing month by month in terms of sales.
• This will ultimately help you understand how the company is doing in terms of its market share.

Sharp Question
• What will be your company’s stock price next week?
Sharp Question Example

The company’s database is divided into the following tables:


Step 3: Add Data to the Table

01 A data analyst arranges data in database tables in a systematic manner.

02 Systematic arrangement of data helps in detailed analysis.

03 Data is stored in the table in the form of columns and rows.

04 Table columns represent data of a single type and rows represent records pertaining to one entity.

05 The final step is to aggregate, distribute, compute, or measure to derive a data analysis.
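The row-and-column idea can be sketched as follows. This is a minimal illustration, not the table from the slides: the SaleRecord class, its field names, and every value in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SaleRecord:
    """One row: a single record describing one entity (here, one sale)."""
    date: str            # every value in this column is an ISO date string
    region: str          # every value in this column is a region name
    amount_musd: float   # every value in this column is a sales amount


# Hypothetical rows, invented for illustration only.
table = [
    SaleRecord("2016-01-15", "West", 1.2),
    SaleRecord("2016-01-20", "East", 0.8),
    SaleRecord("2016-02-03", "West", 1.5),
]

# A column is the same field read across all rows.
amount_column = [row.amount_musd for row in table]

# Aggregating a column is the simplest form of data analysis.
total_amount = sum(amount_column)
```

Keeping each column to a single type is what makes the final aggregation step safe to run.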
Data Analysis in Machine Learning

• Data analysis is the process of deriving new findings from the historical data.

• It mainly focuses on aggregating table data to find the answers to various business problems.

• It is one of the essential steps performed by data analysts to build a machine learning algorithm.
Example: Add Data to the Table

• Each table row represents observations across given attributes.


• The stock price column shows the stock value across different dates.
Example: Data Analysis

Aggregate and distribute the data as shown here:

Quarter    Total Sales
2015Q4     119.2M
2016Q1     221.0M
2016Q2     215.9M
2016Q3     189.3M
2016Q4     211.2M
…          …

Month      Total Sales
2016/01    43.0M
2016/02    60.1M
2016/03    55.5M
Example: Aggregate

• You can aggregate the data in the table to derive answers.


• This process is called data analysis and involves counting total observations in a table or
combining data from multiple tables.
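As a rough sketch of aggregation, the snippet below rolls monthly totals up into quarterly totals. The quarter_of helper is a hypothetical name, and the monthly figures are invented for illustration; they are not the values from the slides.

```python
from collections import defaultdict


def quarter_of(month_key):
    """Convert a key such as '2016/02' into its quarter label, '2016Q1'."""
    year, month = month_key.split("/")
    return f"{year}Q{(int(month) - 1) // 3 + 1}"


# Hypothetical monthly totals in millions, invented for illustration.
monthly_sales = {
    "2016/01": 10.0,
    "2016/02": 12.5,
    "2016/03": 11.0,
    "2016/04": 9.5,
}

# Aggregate: add every monthly observation into its quarter.
quarterly_sales = defaultdict(float)
for month_key, amount in monthly_sales.items():
    quarterly_sales[quarter_of(month_key)] += amount
```

The same grouping pattern applies when the observations come from several tables: bring the rows together first, then total them by the key of interest.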
Example: Aggregate and Distribute

• You can focus on all observations for a particular column or feature and total it.
Example: Distribute, Compute, and Measure

• This is an example of performing aggregate, distribute, compute, and measure operations on data in tables.
• Each feature and its observations are distributed across the table and then combined.
Example: Estimate

• The market share column shows the estimated stock price values of the company that are
derived from the previous steps.
Step 4: Check for Quality

Quality check determines if the data is acceptable for further investigation.

For an algorithm to work, the data in a column should be in a consistent format.

It involves computation and analysis of the data derived from previous steps.
Check for Quality: Example

• The Birth year column in the table has data format inconsistencies.
• The date in this column needs to be converted to a consistent format to make it readable for
the ML algorithm.
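One way such a column might be made consistent is sketched below. The raw values are hypothetical, since the original table is not reproduced here, and normalize_birth_year is an illustrative helper name. The sketch assumes every usable entry contains a four-digit year somewhere in the text.

```python
import re
from typing import Optional


def normalize_birth_year(raw: str) -> Optional[int]:
    """Return the four-digit year found in raw, or None if there is none."""
    # Match exactly four digits that are not part of a longer digit run.
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", raw)
    if match:
        return int(match.group(1))
    return None


# Hypothetical inconsistent entries, invented for illustration.
birth_years = ["1985", "03/12/1990", "1978-07-04", "born 2001", "unknown"]
cleaned_years = [normalize_birth_year(value) for value in birth_years]
```

Entries that cannot be converted come back as None, which keeps them visible for follow-up instead of silently guessing a value.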

Step 5: Transform Features

• This step includes Feature Engineering.


• Each characteristic of a data element is known as a feature.
• Feature engineering enables you to make sense out of the data, especially when
there are multiple features.
• Some features may not give useful information for the model, whereas some features
may be combined to derive meaningful information.
• Feature engineering helps you overcome such challenges.
Tricks of Feature Engineering

Data-specific
• Scale Invariant Feature Transform (SIFT): Images
• Term Frequency-Inverse Document Frequency (TF-IDF): Text

Domain-specific
• Econometric, technological, agricultural, and sociological data engineering

Deep Learning
• Images, text, and audio data engineering

Transform Features: Example

• There are 3 columns and 65,670 rows.


• Features 0 and 1 have similar values.
• The numbers are meaningless and scattered.
Transform Features: Example (contd.)

• The values of feature column 0 are multiplied with the observations in feature column 1.


• These values are plotted in image 2.

Image 1 Image 2
Transform Features: Example (contd.)
• When we subtract feature 0 from feature 1 and plot it, we get a curve.
• This curve is a normal (Gaussian) distribution, also known as a bell-shaped curve.
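The effect can be imitated with synthetic data. The sketch below does not use the 65,670-row data set from the slides. It generates hypothetical features in which feature 1 equals feature 0 plus Gaussian noise, so the bell shape of the difference is built in by assumption. The point is only to show how combining two columns can expose structure that neither column shows on its own.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable
n_rows = 10_000

# Hypothetical features: feature_1 tracks feature_0 plus Gaussian noise.
feature_0 = [random.gauss(50.0, 5.0) for _ in range(n_rows)]
feature_1 = [f0 + random.gauss(2.0, 1.0) for f0 in feature_0]

# Two candidate engineered features built from the raw columns.
product = [a * b for a, b in zip(feature_0, feature_1)]
difference = [b - a for a, b in zip(feature_0, feature_1)]

# Summarize the difference: a bell curve is described by mean and spread.
diff_mean = statistics.mean(difference)
diff_stdev = statistics.stdev(difference)

# For a normal distribution, about 68% of values lie within one
# standard deviation of the mean.
within_one_sd = sum(
    abs(d - diff_mean) <= diff_stdev for d in difference
) / n_rows
```

Checking the share of values within one standard deviation is a quick numeric stand-in for plotting the histogram and looking for a bell shape.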
Step 6: Answer the Questions

• This step helps analyze if the obtained answers are clear.


• These questions include:

1 How much or how many?

2 Which category?

3 Which group?

4 Does this look strange?

5 Which action?
Answer the Questions: Type 1

How much or how many?

1 What will be the temperature this Friday?

2 How many people will like my post?

3 What will be my product sales next month?
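"How much or how many?" questions ask for a number, which is commonly treated as a regression problem. The sketch below fits a straight line by ordinary least squares and extends it one month ahead. The fit_line helper and the sales figures are hypothetical, and a straight-line trend is an assumption, not something the slides prescribe.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical product sales for months 1 to 5, invented for illustration.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)

# Answer "what will be my product sales next month?" for month 6.
forecast_next_month = slope * 6 + intercept
```

With these made-up figures sales grow by 2 per month, so the line forecasts 20 for month 6. Real data would not fit this neatly, and the forecast should be read with that in mind.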
Answer the Questions: Type 2

Which category?

1 Is this an image of a dog?

2 What is the topic of this news article?

3 Which hotel in my area offers free Wi-Fi?
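"Which category?" questions ask for a label, which is commonly treated as a classification problem. As a minimal sketch, the nearest-neighbor rule below gives a new sample the label of the closest known example. The nearest_neighbor_label helper, the two measurements (weight in kg, height in cm), and all the data points are hypothetical.

```python
def nearest_neighbor_label(sample, labeled_points):
    """Return the label of the labeled point closest to sample."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(
        labeled_points,
        key=lambda point: squared_distance(sample, point[0]),
    )
    return label


# Hypothetical labeled examples: ((weight_kg, height_cm), label).
training_examples = [
    ((30.0, 25.0), "dog"),
    ((35.0, 30.0), "dog"),
    ((4.0, 23.0), "cat"),
    ((5.0, 25.0), "cat"),
]

predicted_label = nearest_neighbor_label((6.0, 24.0), training_examples)
```

The new sample lies closest to one of the "cat" examples, so it is labeled "cat". A question such as "Is this an image of a dog?" follows the same pattern, with image features in place of the two measurements.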
Answer the Questions: Type 3

Which group?

1 Which shoppers purchase similar products?

2 Which group of viewers like horror movies?

3 How best can you divide this book into ten topics?
Answer the Questions: Type 4

Does this look strange?

1 Is this internet message typical?

2 Is this heartbeat reading abnormal?

3 Do these transactions look unusual compared with the customer’s usual credit card transactions?
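"Does this look strange?" questions are commonly treated as anomaly detection. One simple approach, sketched below, flags values that lie far from the mean in units of standard deviation (a z-score rule). The flag_unusual helper, the threshold of 2.0, and the transaction amounts are all hypothetical choices made for illustration.

```python
import statistics


def flag_unusual(amounts, threshold=2.0):
    """Return the amounts whose z-score magnitude exceeds threshold."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [a for a in amounts if abs(a - mean) / stdev > threshold]


# Hypothetical credit card transactions, invented for illustration.
transactions = [42.0, 38.5, 45.0, 40.0, 41.5, 39.0, 43.0, 950.0]
unusual_transactions = flag_unusual(transactions)
```

Only the 950.0 transaction is flagged here. On very small samples a single extreme value inflates the standard deviation, so the threshold needs to be chosen with the sample size in mind.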
Answer the Questions: Type 5

Which action?

1 Should I vacuum again or should I not?

2 Should I run the red light?

3 Should I raise or lower the temperature?
Step 7: Use the Answer

There are plenty of ways to use the answer derived from the previous step.

1 For making a decision

2 For proposing the price of an item

3 For publishing the results obtained as part of a research paper

4 For constructing a dashboard on Power BI

5 For making changes to product features

Note: Power BI is a business analytics tool by Microsoft.


Demo
Machine Learning Workflow

A demo on how a buyer decides which property he can purchase.


Key Takeaways

The machine learning workflow involves seven steps.

Step one is to get more data, that is, to gather the relevant data needed to answer business questions.

The next step is to ask sharp questions and avoid vague ones, so that each question gets a clear answer.

The third step is to arrange the raw data in tables so that it can be analyzed more easily.

In the fourth step, data quality is checked to ensure data consistency.

In the fifth step, features are transformed to make the machine learning model more efficient.

In the sixth step, answers are derived from the data model to help you answer the business questions.

In the seventh step, the answer is put to use, for example in production or in an ML algorithm.
Quiz
QUIZ 1
What are the different kinds of data?

a. Data as numbers only

b. Data can only be names that can be changed into numbers

c. Data includes names, numbers, and names that can be turned into numbers, but it excludes names that look like numbers

d. Data includes names, numbers, names that can be turned into numbers, and names that look like numbers

The correct answer is D

Data can be names, numbers, names that look like numbers, and names that can be turned into numbers
QUIZ 2
What are the different ways to ensure data quality?

a. Data quality is due to business unit malfunction or due to providing incomplete data

b. Data quality can be handled through communicating with business unit(s), handling missing numbers, removing outliers, plotting the values in a column, and fitting to a distribution

c. Once missing values in a column are removed, every column has value/observations and data quality reaches close to 100%

d. Data quality is the job of data analysts and Database Administrators (DBA)

The correct answer is B

Data can be made consistent by handling missing numbers, plotting the column values, fitting them to distributions,
and removing outliers.
This concludes “Machine Learning Workflow.”
The next lesson is “Performance Metrics.”
