Lesson 3 - Machine Learning Workflow
Workflow diagram (seven steps connected by Yes/No decision branches):
1. Get more data
2. Ask a sharp question
3. Add the data to the table
4. Check for quality
5. Transform features
6. Answer the question
7. Use the answer
Step 1: Get More Data
• The quality of the predictive model depends upon the quantity and quality of the data gathered.
• Kinds of data: names and numbers.
To be able to do so, you need to ask a sharp question as opposed to a vague one.
Step 2: Ask a Sharp Question
Need for a sharp question: it focuses on a single topic.
Vague questions:
1. What should you do?
2. How should you live your life?
3. Which career path should you take?
4. Which data can tell you about your business?
Sharp questions:
1. Which route will get you to work fastest?
2. How many times will a user use the new product features?
3. How can the revenue be maximized?
Sharp Question Example
Goal
• Your ultimate goal is to analyze your historical data and predict the stock price at a future date.
• You have to study all the tables of data in your database and analyze how your company is doing month by month in terms of sales.
• This will ultimately lead you to understand how the company is doing in terms of its market share.
Sharp Question
• What will be your company’s stock price next week?
Step 3: Add the Data to the Table
• Data analysis is the process of deriving new findings from the historical data.
• It mainly focuses on aggregating table data to find the answers to various business problems.
• It is one of the essential steps data analysts perform before building a machine learning algorithm.
Example: Add Data to the Table
2015Q4 119.2M
2016Q1 221.0M
… … 2016/03 55.5M
Example: Aggregate
• You can take all the observations in a particular column (feature) and total them.
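As a minimal sketch of aggregation, the snippet below totals one column across all observations. The rows, the column names (`quarter`, `units_sold`), and the values are hypothetical and serve only as illustration.

```python
# Hypothetical table: each dict is one observation (row).
rows = [
    {"quarter": "Q1", "units_sold": 120},
    {"quarter": "Q2", "units_sold": 95},
    {"quarter": "Q3", "units_sold": 140},
]


def total_column(table, column):
    """Sum every observation in the given column."""
    return sum(row[column] for row in table)


total_units = total_column(rows, "units_sold")
print(total_units)
```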
Example: Distribute, Compute, and Measure
• The market share column shows the estimated stock price values of the company that are
derived from the previous steps.
Step 4: Check for Quality
It involves computation and analysis of the data derived from previous steps.
Check for Quality: Example
• The Birth year column in the table has data format inconsistencies.
• The date in this column needs to be converted to a consistent format to make it readable for
the ML algorithm.
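One way to make such a column consistent is sketched below. The raw values are hypothetical, and the sketch assumes that every valid entry contains a four-digit year between 1900 and 2099; anything else is marked as missing rather than guessed.

```python
import re

# Hypothetical raw values showing mixed formats in a "Birth year" column.
raw_birth_years = ["1985", " 1990 ", "1978-05-12", "12/03/1969", "unknown"]

# Assumption: a valid year is a standalone four-digit number in 1900-2099.
YEAR_PATTERN = re.compile(r"(?<!\d)(19|20)\d{2}(?!\d)")


def normalize_birth_year(value):
    """Return the four-digit year as an int, or None if none is found."""
    match = YEAR_PATTERN.search(value)
    return int(match.group(0)) if match else None


clean_birth_years = [normalize_birth_year(v) for v in raw_birth_years]
print(clean_birth_years)
```

Entries that come back as `None` still need a decision, such as asking the business unit for the correct value or treating them as missing data.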
Transform Features: Example (contd.)
• When we subtract feature 0 from feature 1 and plot it, we get a curve.
• This curve is a normal (Gaussian) distribution, also known as a bell-shaped curve.
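The subtraction described above can be sketched as follows. The two features here are hypothetical, randomly generated values, and the sketch assumes both are normally distributed, in which case their difference is also bell-shaped. The text histogram is only a rough stand-in for the plot.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: two noisy features, 1000 observations each.
feature_0 = [random.gauss(5.0, 1.0) for _ in range(1000)]
feature_1 = [random.gauss(8.0, 1.0) for _ in range(1000)]

# Transformed feature: feature 1 minus feature 0, row by row.
difference = [f1 - f0 for f0, f1 in zip(feature_0, feature_1)]

# Crude text histogram: bucket each value to the nearest integer.
histogram = Counter(round(d) for d in difference)
for bucket in sorted(histogram):
    print(f"{bucket:>3} {'#' * (histogram[bucket] // 10)}")
```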
Step 6: Answer the Questions
2 Which category?
3 Which group?
5 Which action?
Answer the Questions: Type 1
Which category?
Which group?
Which action?
There are plenty of ways to use the answer derived from the previous step.
Step one involves getting more data, which is the process of deriving relevant data
to answer business questions.
The second step is to ask sharp questions rather than vague ones, so that the
data can give a specific answer.
The third step is to arrange the raw data in tables so that it can be analyzed more easily.
In the fifth step, transforming features helps make the machine learning
model more efficient.
In the sixth step, answers are derived from the data model to help you answer the
business questions.
In the seventh step, the answer is put to use, for example in production or in an ML algorithm.
Quiz
1. What are the different kinds of data?
c. Data includes names, numbers, and names that can be turned into numbers, but it excludes names that look like numbers.
d. Data includes names, numbers, names that can be turned into numbers, and names that look like numbers.
Answer: d. Data can be names, numbers, names that look like numbers, and names that can be turned into numbers.
2. What are the different ways to ensure data quality?
a. Data quality is due to business unit malfunction or due to providing incomplete data
b. Data quality can be handled through communicating with business unit(s), handling missing
numbers, removing outliers, plotting the values in a column, and fitting to a distribution
c. Once missing values in a column are removed, every column has value/observations and
data quality reaches close to 100%
d. Data quality is the job of data analysts and Database Administrators (DBA)
Answer: b. Data can be made consistent by handling missing numbers, plotting the column values, fitting them to distributions,
and removing outliers.
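Two of the techniques named in this answer, handling missing numbers and removing outliers, can be sketched as follows. The column values are hypothetical. Filling gaps with the median and dropping values outside 1.5 times the interquartile range are assumptions chosen for illustration; the lesson does not prescribe a specific method.

```python
import statistics

# Hypothetical column with missing entries (None) and one extreme value.
column = [12.0, 15.0, None, 14.0, 13.0, 250.0, None, 16.0]


def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


filled = fill_missing(column)
cleaned = remove_outliers(filled)
print(cleaned)
```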
This concludes “Machine Learning Workflow.”
The next lesson is “Performance Metrics.”