
End-to-End Machine Learning Project:

A Step-by-Step Guide
Julia Wieczorek
August 1, 2024

1 Overview
In this tutorial, we will walk through a complete machine learning project using the
California housing dataset. The objective is to predict housing prices based on various
features. This project involves the following key steps:

1. Frame the Problem

2. Get the Data

3. Explore the Data

4. Prepare the Data for Machine Learning Algorithms

5. Select and Train a Model

6. Fine-tune the Model

7. Present the Solution

8. Launch, Monitor, and Maintain the System

Let’s dive into each step with detailed instructions.

2 Frame the Problem


Objective: Predict median housing prices in California districts using various features.

2.1 Instructions
1. Define the Objective:

• Explain the business problem: predict housing prices to support better decision-making.
• Identify the target variable: median house value.

2. Performance Measure:

• Use Root Mean Square Error (RMSE) as the performance metric; its definition is given after this list.

3. Assumptions:

• Highlight any assumptions about the data or project scope.
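For reference, RMSE measures the typical size of the prediction error in the target's own units (here, dollars). For a dataset X of m districts with predictions h(x^{(i)}) and true labels y^{(i)}:

RMSE(X, h) = \sqrt{ \frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^{2} }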

3 Get the Data


Objective: Access and load the California housing dataset.

3.1 Instructions
1. Import Libraries:
import os
import tarfile
import urllib.request
import pandas as pd

Listing 1: Import necessary libraries

2. Fetch the Data:

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    """Download housing.tgz and extract housing.csv into housing_path."""
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)

fetch_housing_data()  # download and extract once before loading

Listing 2: Fetch the housing data

3. Load the Data:

def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)

housing = load_housing_data()

Listing 3: Load the housing data

4 Explore the Data


Objective: Understand the structure and characteristics of the data.

4.1 Instructions
1. Examine the Data:

housing.head()

Listing 4: View the first few rows of the data

• Discuss the meaning of each attribute (longitude, latitude, housing_median_age, total_rooms, total_bedrooms, population, households, median_income, median_house_value, and ocean_proximity).

2. Check for Missing Values:


housing.info()

Listing 5: Check for missing values

3. View Numerical Attribute Summary:


housing.describe()

Listing 6: Summary statistics for numerical attributes

4. Visualize the Data:

(a) Histograms:
import matplotlib.pyplot as plt

housing.hist(bins=50, figsize=(20, 15))
plt.show()

Listing 7: Plot histograms for numerical attributes

(b) Geographical Scatter Plot:


housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
             s=housing["population"] / 100, label="population", figsize=(10, 7),
             c="median_house_value", cmap="jet", colorbar=True)
plt.legend()

Listing 8: Visualize geographical data with a scatter plot
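It also helps to quantify how each numerical attribute relates to the target before modeling. A short sketch (assuming a pandas version that supports the numeric_only flag, so the text column ocean_proximity is skipped):

corr_matrix = housing.corr(numeric_only=True)
print(corr_matrix["median_house_value"].sort_values(ascending=False))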

5 Prepare the Data for Machine Learning Algorithms


Objective: Clean and transform the data to make it suitable for machine learning
algorithms.

5.1 Instructions
1. Create a Test Set:

import numpy as np

def split_train_test(data, test_ratio):
    np.random.seed(42)  # fixed seed so the split is reproducible across runs
    shuffled_indices = np.random.permutation(len(data))
    test_set_size = int(len(data) * test_ratio)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    return data.iloc[train_indices], data.iloc[test_indices]

train_set, test_set = split_train_test(housing, 0.2)

Listing 9: Split the data into train and test sets

2. Stratified Sampling Based on Income Category:

# Bucket median_income into five categories so sampling can be stratified on it
housing["income_cat"] = pd.cut(housing["median_income"],
                               bins=[0., 1.5, 3.0, 4.5, 6., np.inf],
                               labels=[1, 2, 3, 4, 5])

from sklearn.model_selection import StratifiedShuffleSplit

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(housing, housing["income_cat"]):
    strat_train_set = housing.loc[train_index]
    strat_test_set = housing.loc[test_index]

Listing 10: Perform stratified sampling based on income

3. Visualize Stratified Data:


strat_test_set["income_cat"].value_counts() / len(strat_test_set)

Listing 11: Check the distribution of stratified data

4. Drop the Income Category:


for set_ in (strat_train_set, strat_test_set):
    set_.drop("income_cat", axis=1, inplace=True)

Listing 12: Remove the income category column

5. Data Cleaning:

• Handle missing values:

from sklearn.impute import SimpleImputer

# Separate the predictors from the labels so the target cannot leak
# into the prepared feature matrix used for training.
housing = strat_train_set.drop("median_house_value", axis=1)
housing_labels = strat_train_set["median_house_value"].copy()

imputer = SimpleImputer(strategy="median")
housing_num = housing.drop("ocean_proximity", axis=1)  # numerical columns only
imputer.fit(housing_num)
X = imputer.transform(housing_num)
housing_tr = pd.DataFrame(X, columns=housing_num.columns, index=housing_num.index)

Listing 13: Handle missing data using SimpleImputer

6. Handle Text and Categorical Attributes:


housing_cat = housing[["ocean_proximity"]]
# Optional: factorize() maps each category to an integer code
housing_cat_encoded, housing_categories = housing_cat["ocean_proximity"].factorize()

from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder()
housing_cat_1hot = encoder.fit_transform(housing_cat)  # SciPy sparse matrix

Listing 14: Encode categorical attributes

7. Feature Scaling:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
housing_tr_scaled = scaler.fit_transform(housing_tr)

Listing 15: Scale the numerical features

8. Custom Transformers (Optional):


from sklearn.base import BaseEstimator, TransformerMixin

# Column indices in the numerical feature array
rooms_ix, bedrooms_ix, population_ix, household_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):  # no *args or **kwargs
        self.add_bedrooms_per_room = add_bedrooms_per_room

    def fit(self, X, y=None):
        return self  # nothing else to do

    def transform(self, X):
        rooms_per_household = X[:, rooms_ix] / X[:, household_ix]
        population_per_household = X[:, population_ix] / X[:, household_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]

attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
housing_extra_attribs = attr_adder.transform(housing.values)

Listing 16: Create custom transformers for additional features

9. Transformation Pipelines:
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer

num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
])

# Fit on the training predictors only; the labels stay in housing_labels
housing_prepared = full_pipeline.fit_transform(housing)

Listing 17: Create a data transformation pipeline

6 Select and Train a Model


Objective: Choose an appropriate machine learning model and train it.

6.1 Instructions
1. Train a Linear Regression Model:
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(housing_prepared, housing_labels)

Listing 18: Train a Linear Regression model

2. Evaluate the Model:


from sklearn.metrics import mean_squared_error

housing_predictions = lin_reg.predict(housing_prepared)
lin_mse = mean_squared_error(housing_labels, housing_predictions)
lin_rmse = np.sqrt(lin_mse)
print("Linear Regression RMSE:", lin_rmse)

Listing 19: Evaluate the Linear Regression model

3. Train a Decision Tree Model:


from sklearn.tree import DecisionTreeRegressor

tree_reg = DecisionTreeRegressor(random_state=42)
tree_reg.fit(housing_prepared, housing_labels)

Listing 20: Train a Decision Tree model

4. Evaluate the Decision Tree Model:


housing_predictions = tree_reg.predict(housing_prepared)
tree_mse = mean_squared_error(housing_labels, housing_predictions)
tree_rmse = np.sqrt(tree_mse)
print("Decision Tree RMSE:", tree_rmse)

Listing 21: Evaluate the Decision Tree model

A training RMSE at or near zero here does not mean the tree is perfect: it means the model has badly overfit the training data, which is why the next step evaluates with cross-validation instead.

5. Cross-Validation for Better Evaluation:


from sklearn.model_selection import cross_val_score

scores = cross_val_score(tree_reg, housing_prepared, housing_labels,
                         scoring="neg_mean_squared_error", cv=10)
tree_rmse_scores = np.sqrt(-scores)  # scores are negated MSEs, so flip the sign

def display_scores(scores):
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())

display_scores(tree_rmse_scores)

Listing 22: Use cross-validation for model evaluation
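For a fair comparison, run the same cross-validation on the linear model. A short sketch reusing the names defined above:

lin_scores = cross_val_score(lin_reg, housing_prepared, housing_labels,
                             scoring="neg_mean_squared_error", cv=10)
lin_rmse_scores = np.sqrt(-lin_scores)
display_scores(lin_rmse_scores)

If the tree's cross-validated RMSE turns out worse than the linear model's, that confirms the tree was overfitting the training set.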

7 Fine-tune the Model
Objective: Optimize model performance through hyperparameter tuning.

7.1 Instructions
1. Grid Search:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]

forest_reg = RandomForestRegressor(random_state=42)
grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                           scoring='neg_mean_squared_error',
                           return_train_score=True)
grid_search.fit(housing_prepared, housing_labels)

Listing 23: Use Grid Search for hyperparameter tuning

2. Analyze the Best Parameters and Scores:


print(grid_search.best_params_)
print(grid_search.best_estimator_)

Listing 24: Analyze the best parameters and scores from Grid Search

3. Evaluate Feature Importance:


feature_importances = grid_search.best_estimator_.feature_importances_
extra_attribs = ["rooms_per_hhold", "pop_per_hhold", "bedrooms_per_room"]
cat_encoder = full_pipeline.named_transformers_["cat"]
cat_one_hot_attribs = list(cat_encoder.categories_[0])
attributes = num_attribs + extra_attribs + cat_one_hot_attribs
sorted(zip(feature_importances, attributes), reverse=True)

Listing 25: Evaluate the importance of each feature

8 Present the Solution


Objective: Prepare the model for presentation and deployment.

8.1 Instructions
1. Evaluate the Model on Test Set:
final_model = grid_search.best_estimator_

X_test = strat_test_set.drop("median_house_value", axis=1)
y_test = strat_test_set["median_house_value"].copy()

# Use transform(), never fit_transform(), so the pipeline is not re-fit on test data
X_test_prepared = full_pipeline.transform(X_test)
final_predictions = final_model.predict(X_test_prepared)

final_mse = mean_squared_error(y_test, final_predictions)
final_rmse = np.sqrt(final_mse)
print("Final RMSE on Test Set:", final_rmse)

Listing 26: Evaluate the final model on the test set
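To convey how precise this test-set estimate is, you can attach a 95% confidence interval to the generalization error. A short sketch, assuming scipy is available:

from scipy import stats

confidence = 0.95
squared_errors = (final_predictions - y_test) ** 2
interval = np.sqrt(stats.t.interval(confidence, len(squared_errors) - 1,
                                    loc=squared_errors.mean(),
                                    scale=stats.sem(squared_errors)))
print("95% confidence interval for the RMSE:", interval)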

2. Document the Results:

• Prepare a report with key findings, model performance, and next steps.

3. Create Visualizations (if applicable):


import matplotlib.pyplot as plt

plt.scatter(y_test, final_predictions)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Predicted vs Actual Values")
plt.plot([0, 500000], [0, 500000], color="red", linewidth=2)  # ideal y = x line
plt.show()

Listing 27: Create visualizations to present model results

9 Launch, Monitor, and Maintain the System


Objective: Deploy the model and ensure its performance in a production environment.

9.1 Instructions
1. Deployment:

• Explain how to deploy the model using Flask, FastAPI, or a cloud platform; a minimal serving sketch follows this list.
• Discuss the importance of monitoring performance and retraining as needed.

2. Monitoring:

• Implement logging and monitoring to track model performance.

3. Maintenance:

• Schedule regular maintenance checks to ensure data integrity and model accuracy.
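As one concrete (hypothetical) example, the sketch below serves predictions over HTTP with Flask and logs each prediction as a simple monitoring hook. It assumes the fitted full_pipeline and final_model were saved with joblib under the placeholder filenames full_pipeline.pkl and final_model.pkl, and that custom transformers such as CombinedAttributesAdder are importable wherever the model is loaded; adapt freely for FastAPI or a cloud platform.

import joblib
import pandas as pd
from flask import Flask, request, jsonify

# Persist once after training, e.g.:
#   joblib.dump(full_pipeline, "full_pipeline.pkl")
#   joblib.dump(final_model, "final_model.pkl")
# (the filenames are placeholders for this sketch)
pipeline = joblib.load("full_pipeline.pkl")
model = joblib.load("final_model.pkl")

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # One district's raw attributes arrive as JSON; wrap in a one-row DataFrame
    record = pd.DataFrame([request.get_json()])
    prepared = pipeline.transform(record)
    prediction = float(model.predict(prepared)[0])
    app.logger.info("prediction=%.1f", prediction)  # basic monitoring hook
    return jsonify({"median_house_value": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)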

10 Conclusion
This guide has walked you through the entire process of building a machine learning
model from scratch, including data preparation, model selection, training, evaluation,
and deployment. By following these steps, you can develop a robust machine learning
solution for predicting California housing prices.

