Module 2
Anusha J
Assistant Professor
Dept of AIML
JSSATEB
End-to-End Machine Learning Project:
• In this chapter, you will go through an example project end to end, pretending to be a
recently hired data scientist in a real estate company. Here are the main steps you will go
through:
1. Look at the big picture.
2. Get the data.
3. Discover and visualize the data to gain insights.
4. Prepare the data for Machine Learning algorithms.
5. Select a model and train it.
6. Fine-tune your model.
7. Present your solution.
8. Launch, monitor, and maintain your system.
Working with Real Data
Question:
• What exactly is the business objective? Building a model is
probably not the end goal. How does the company
expect to use and benefit from this model?
• This is important because it will determine how you frame
the problem, what algorithms you will select, what
performance measure you will use to evaluate your
model, and how much effort you should spend tweaking it.
• Why Understanding the Business Objective is Crucial
– Building a model is only a means to an end. You need to understand the business
objective the company has for the model.
– This objective will influence various aspects of your project, such as:
– Problem Framing: How you define the problem (supervised learning, regression, etc.)
depends on the desired outcome.
– Algorithm Selection: The best algorithms for your model depend on the problem type.
– Performance Measure: You'll choose a metric (like RMSE) that reflects how well the
model meets the business goal.
– Development Effort: The amount of time spent refining the model depends on its
impact on the business.
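To make the performance-measure point concrete, here is a minimal sketch of how RMSE (root mean squared error) could be computed for a price-prediction model. The function name `rmse` and the three price values are hypothetical and chosen only for illustration; they do not come from the housing dataset.

```python
import math

def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(targets) == 0:
        raise ValueError("need at least one sample")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical median house prices (in dollars) for three districts.
actual = [200_000, 150_000, 320_000]
predicted = [210_000, 140_000, 300_000]

error = rmse(predicted, actual)
print(f"RMSE: {error:,.0f} dollars")
```

Because the errors are squared before averaging, RMSE penalizes large errors more heavily than small ones, which may or may not match what the business cares about.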
The Business Objective in this Scenario
• The company's goal is to decide on real estate
investments.
• Your housing price prediction model will be fed into
another Machine Learning system. This system likely
analyzes various factors (signals) beyond housing price.
• By predicting housing prices accurately, the overall
system can make better investment decisions, directly
impacting the company's revenue.
Frame the Problem
Current Solution and Expected Improvement
• Currently, housing prices are estimated manually by
experts using complex rules. This is expensive, time-
consuming, and inaccurate (estimates can be off by more
than 20%).
• The company expects a Machine Learning model to be
more efficient and accurate in predicting housing prices.
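As a rough illustration of what "off by more than 20%" means, the sketch below computes the relative error of a single estimate. The helper `relative_error` and the dollar amounts are hypothetical, used only to show the arithmetic.

```python
def relative_error(estimate, actual):
    """Absolute error expressed as a fraction of the actual value."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(estimate - actual) / abs(actual)

# Hypothetical case: an expert estimates $250,000 for a district
# whose true median price is $200,000.
err = relative_error(250_000, 200_000)
print(f"Relative error: {err:.0%}")  # prints "Relative error: 25%"
```

An estimate like this one would fall into the "off by more than 20%" category described above.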
Understanding the Data Pipeline (Optional
Background)
• Data pipelines are very common in Machine Learning
systems.
• They involve a sequence of components that process and
transform data.
• These components can operate independently and
communicate only through a shared data store.
• This modular design makes the system scalable and
easier to manage.
Pipelines
– A data pipeline is a series of connected components that process and
transform data.
– Imagine an assembly line in a factory where, instead of physical objects, data
flows through stations that each perform a specific task.
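The pipeline idea can be sketched in a few lines of Python. This is a toy illustration, not a production design: the dictionary `data_store`, the component names (`load_raw`, `clean`, `add_features`), and the sample records are all assumptions made for this example. Each component reads its input from the store and writes its output back, so no component calls another directly.

```python
# A toy data store: every component reads from it and writes to it.
data_store = {}

def load_raw(store):
    # Hypothetical raw records: (district, median_income, median_price).
    store["raw"] = [
        ("A", 3.5, 200_000),
        ("B", None, 150_000),  # missing income value
        ("C", 8.2, 320_000),
    ]

def clean(store):
    # Drop records whose income is missing.
    store["clean"] = [r for r in store["raw"] if r[1] is not None]

def add_features(store):
    # Derive a new attribute from the cleaned records.
    store["features"] = [
        {"district": d, "income": inc, "price": p, "price_per_income": p / inc}
        for d, inc, p in store["clean"]
    ]

# The pipeline is just an ordered list of components.
pipeline = [load_raw, clean, add_features]
for component in pipeline:
    component(data_store)

print([row["district"] for row in data_store["features"]])
```

Since the only interface between components is the data store, one stage can be replaced or rerun without touching the others, which is the modularity described above.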