Machine Learning Life Cycle Report

Uploaded by Lamia Altayeb

1. Data Acquisition:
The California housing prices dataset was obtained for analysis and model
development. The dataset contains features such as the number of rooms, median
income, median house values, and other district-level variables.
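A minimal sketch of this step: in the actual project the dataset would be read from a local CSV file (the file name "housing.csv" and the sample rows below are assumptions, not details from this report); here a tiny in-memory sample stands in for the real file so the snippet is self-contained.

```python
import io

import pandas as pd

# Two illustrative district rows standing in for the real housing.csv file.
csv_sample = io.StringIO(
    "longitude,latitude,median_income,median_house_value,ocean_proximity\n"
    "-122.23,37.88,8.3252,452600.0,NEAR BAY\n"
    "-122.22,37.86,8.3014,358500.0,NEAR BAY\n"
)

# In the real project this would be pd.read_csv("housing.csv").
housing = pd.read_csv(csv_sample)
print(housing.shape)  # (2, 5)
```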
2. Data Exploration and Visualization:
a) Top Five Rows: The head() method was used to examine the first five rows of the
dataset, providing an initial understanding of the data structure and variables.
b) Data Description: The info() method was employed to obtain a quick description of
the data, including the number of instances, attribute types, and any missing values.
c) Analysis of "ocean_proximity": The value_counts() method was used to determine the
number of districts belonging to each category in the "ocean_proximity" variable.
d) Summary of Numerical Attributes: The describe() method was utilized to generate a
statistical summary of the numerical attributes, including count, mean, standard
deviation, minimum, quartiles, and maximum values.
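The four inspection calls above can be sketched on a hypothetical miniature frame (the rows below are illustrative, not taken from the real dataset):

```python
import pandas as pd

# Miniature stand-in for the housing dataframe.
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 3.1, 2.5],
    "median_house_value": [452600.0, 358500.0, 178100.0, 120400.0],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND"],
})

print(housing.head())                              # first five rows
housing.info()                                     # dtypes, non-null counts
print(housing["ocean_proximity"].value_counts())   # districts per category
print(housing.describe())                          # count, mean, std, min, quartiles, max
```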
e) Data Visualization: Various visualizations were created to gain insights into the
dataset, including:

 Histogram Plot: Histograms were generated to visualize the distributions of
the numerical attributes in the housing dataframe.
 Scatter Plot: A scatter plot was created between the "longitude" and "latitude"
variables, with the alpha parameter set to 0.1. The size of each circle represented
the district's population, and the color represented the price.
 Correlation Analysis: The correlation matrix was computed using the .corr()
method to explore the relationships between all continuous numeric variables. A
heatmap plot was generated using the seaborn library to visualize the
correlations.
 Scatter Matrix: The pandas scatter_matrix() function was used to examine
pairwise correlations between attributes. A seaborn pairplot, color-coded by
the "ocean_proximity" category, was also produced.
 Scatter Plot: A scatter plot was created between the "median_income" and
"median_house_value" variables to explore their relationship.
 Box Plot: A box plot was generated to show the relationship between the
"median_house_value" and the categorical feature "ocean_proximity".
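Two of the plots above, the geographic scatter plot and the correlation heatmap, can be sketched as follows. The data is a tiny made-up frame, and the headless matplotlib backend is used only so the sketch runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative values, not from the real dataset.
housing = pd.DataFrame({
    "longitude": [-122.2, -122.1, -118.3, -117.9],
    "latitude": [37.9, 37.8, 34.1, 33.9],
    "population": [322.0, 2401.0, 496.0, 558.0],
    "median_income": [8.3, 7.2, 3.1, 2.5],
    "median_house_value": [452600.0, 358500.0, 178100.0, 120400.0],
})

# Geographic scatter: circle size ~ population, color ~ price, alpha=0.1.
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1,
             s=housing["population"] / 10, c="median_house_value",
             cmap="jet", colorbar=True)

# Correlation matrix over the continuous numeric variables, as a heatmap.
corr = housing.corr()
sns.heatmap(corr, annot=True)
plt.close("all")
```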
3. Data Preprocessing:
a) Data Cleaning: The dataset was examined for missing values, and it was found that
the "total_bedrooms" attribute had some missing values. The missing values were filled
with the median value using the fillna() method.
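The median imputation described above can be sketched as follows (the bedroom counts are hypothetical):

```python
import numpy as np
import pandas as pd

# Stand-in column with two missing entries.
housing = pd.DataFrame({"total_bedrooms": [129.0, np.nan, 190.0, np.nan]})

# Fill missing values with the column median.
median = housing["total_bedrooms"].median()  # 159.5 here
housing["total_bedrooms"] = housing["total_bedrooms"].fillna(median)
print(housing["total_bedrooms"].isna().sum())  # 0
```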
b) Handling Zeros: It was verified that the dataset contained no zero values
that might represent disguised missing data.
c) Attribute Combinations: New attributes were created by combining existing ones,
namely "rooms_per_household" (derived from "total_rooms" and "households"),
"bedrooms_per_room" (derived from "total_bedrooms" and "total_rooms"), and
"population_per_household" (derived from "population" and "households").
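The three derived attributes are simple ratios of existing columns; a sketch on two hypothetical districts:

```python
import pandas as pd

# Two illustrative districts.
housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0],
    "total_bedrooms": [129.0, 1106.0],
    "population": [322.0, 2401.0],
    "households": [126.0, 1138.0],
})

# Combined attributes as described in the report.
housing["rooms_per_household"] = housing["total_rooms"] / housing["households"]
housing["bedrooms_per_room"] = housing["total_bedrooms"] / housing["total_rooms"]
housing["population_per_household"] = housing["population"] / housing["households"]
```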
d) Handling Text and Categorical Attributes: The categorical feature "ocean_proximity"
was handled by creating a separate variable called "housing_cat" and using the
OneHotEncoder from sklearn to encode the categorical values.
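One-hot encoding of the category column can be sketched as follows (the three sample categories are illustrative; the encoder returns a sparse matrix by default, converted to a dense array here for inspection):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Stand-in for the "housing_cat" variable.
housing_cat = pd.DataFrame({"ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY"]})

encoder = OneHotEncoder()
housing_cat_1hot = encoder.fit_transform(housing_cat)  # sparse matrix
dense = housing_cat_1hot.toarray()                     # one column per category
print(dense.shape)  # (3, 2)
```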
e) Feature Scaling: Numerical values were scaled using the StandardScaler from sklearn.
The numerical attributes were stored in a variable called "housing_num".
f) Custom Transformers: A custom transformer class called "CombinedAttributesAdder"
was created to add the combined attributes discussed earlier. The transformer was
instantiated as "attr_reader", and the housing values were transformed and saved in a
variable called "housing_extra_attribs".
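The custom transformer can be sketched as below. The column indices are assumptions about the attribute order in the feature array, and the add_bedrooms_per_room switch follows the common pattern for such transformers rather than a detail stated in this report:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# Assumed positions of the source columns in the feature array.
rooms_ix, bedrooms_ix, population_ix, households_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    """Append the combined ratio attributes to a NumPy feature array."""

    def __init__(self, add_bedrooms_per_room=True):
        self.add_bedrooms_per_room = add_bedrooms_per_room

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        rooms_per_household = X[:, rooms_ix] / X[:, households_ix]
        population_per_household = X[:, population_ix] / X[:, households_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        return np.c_[X, rooms_per_household, population_per_household]

# One illustrative district row (values are made up).
X = np.array([[-122.23, 37.88, 8.3252, 880.0, 129.0, 322.0, 126.0]])
housing_extra_attribs = CombinedAttributesAdder().fit_transform(X)
print(housing_extra_attribs.shape)  # (1, 10)
```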
g) Pipeline Creation: Two pipelines were created - "num_pipeline" for numerical
attributes and "full_pipeline" for both numerical and categorical attributes. The
"num_pipeline" included the SimpleImputer, CombinedAttributesAdder, and
StandardScaler transformers.
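The two pipelines can be sketched as follows. The custom CombinedAttributesAdder step is omitted here for brevity, and the use of a ColumnTransformer to join the numerical and categorical branches is an assumption about how "full_pipeline" was built:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Miniature stand-in frame with one missing numeric value.
housing = pd.DataFrame({
    "median_income": [8.3, np.nan, 3.1],
    "total_rooms": [880.0, 7099.0, 1467.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY"],
})
num_attribs = ["median_income", "total_rooms"]
cat_attribs = ["ocean_proximity"]

# Numerical pipeline: impute medians, then standardize.
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()),
])

# Full pipeline: numerical branch plus one-hot-encoded categorical branch.
full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
])
housing_prepared = full_pipeline.fit_transform(housing)
print(housing_prepared.shape)  # 2 scaled numeric columns + 2 one-hot columns
```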
4. Train-Test Split:
The data was split into training and testing sets using the train_test_split function from
sklearn.model_selection. The random_state parameter was set to 42 to ensure
reproducibility. The predictors and labels were separated into "housing" and
"housing_labels" variables, respectively.
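A sketch of the split on a small hypothetical frame, using the same random_state=42 noted above:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten illustrative districts.
housing_full = pd.DataFrame({
    "median_income": np.arange(10, dtype=float),
    "median_house_value": np.arange(10, dtype=float) * 50000,
})

# 80/20 split, reproducible via random_state=42.
train_set, test_set = train_test_split(housing_full, test_size=0.2,
                                       random_state=42)

# Separate predictors from labels on the training set.
housing = train_set.drop("median_house_value", axis=1)
housing_labels = train_set["median_house_value"].copy()
print(len(train_set), len(test_set))  # 8 2
```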

The machine learning life cycle involves several additional steps beyond the scope of
this report, such as model selection, training, evaluation, optimization, deployment, and
maintenance. These steps would typically be followed to develop and deploy a machine
learning model based on the given dataset.
