
California Housing Project

Get the Data

This project uses the following Python libraries:

- Pandas [1]: a library for data analysis and manipulation. It provides data structures and operations for working with labeled data, such as DataFrames.
- NumPy [2]: a library for numerical computing. It provides powerful array and matrix operations.
- Matplotlib [3]: a library for creating static, animated, and interactive visualizations.
- Seaborn [4]: a library built on top of Matplotlib that provides a high-level interface for making statistical graphics.
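
As a quick orientation, here is one conventional way to import these libraries (the aliases pd, np, plt, and sns are common community conventions, not requirements of this document):

import pandas as pd              # data analysis and manipulation (DataFrames)
import numpy as np               # numerical computing (arrays, matrices)
import matplotlib.pyplot as plt  # plotting
import seaborn as sns            # statistical graphics on top of Matplotlib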

Download the Data


Here is the function to fetch the data:
import os
import tarfile                   # used to work with tar archives (compressed files)
from six.moves import urllib     # enables downloading files from the internet

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"  # base URL for the data archive
HOUSING_PATH = os.path.join("datasets", "housing")  # local directory where the downloaded archive will be stored
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)                         # creates the directory and any necessary subdirectories
    tgz_path = os.path.join(housing_path, "housing.tgz")  # full path for the archive file within housing_path
    urllib.request.urlretrieve(housing_url, tgz_path)     # handles the download
    housing_tgz = tarfile.open(tgz_path)                  # opens the downloaded archive
    housing_tgz.extractall(path=housing_path)             # extracts all contents into housing_path (decompression)
    housing_tgz.close()
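
The discussion below assumes the archive has been fetched and the extracted housing.csv has been loaded into a DataFrame named housing. A minimal loading sketch (the helper name load_housing_data and the file name housing.csv are assumptions here, following the same pattern as the fetch function):

import pandas as pd

def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")  # assumption: housing.csv is the extracted data file
    return pd.read_csv(csv_path)                          # loads the CSV into a pandas DataFrame

fetch_housing_data()             # download and extract the archive
housing = load_housing_data()    # load the CSV into a DataFrame
housing.info()                   # shows each attribute's dtype and non-null count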
All attributes are numerical except the ocean_proximity field. Its type is object, so it could hold any kind of Python object, but since you loaded this data from a CSV file you know it must be a text attribute. When you looked at the top five rows, you probably noticed that the values in the ocean_proximity column were repetitive, which suggests it is a categorical attribute.
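
To confirm, you can count how many districts fall into each category (assuming the DataFrame is named housing, as above):

housing["ocean_proximity"].value_counts()   # number of districts per category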
Another quick way to get a feel for the type of data you are dealing with is to plot a histogram for each numerical attribute.
%matplotlib inline   # only in a Jupyter notebook
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20, 15))
plt.show()
There are a few things to notice in these histograms:
1. First, the median income attribute does not look like it is expressed in US dollars (USD). After checking with the team that collected the data, you are told that the data has been scaled and capped at 15 (actually 15.0001) for higher median incomes, and at 0.5 (actually 0.4999) for lower median incomes. The numbers represent roughly tens of thousands of dollars (e.g., 3 actually means about $30,000). Working with preprocessed attributes is common in Machine Learning, and it is not necessarily a problem, but you should try to understand how the data was computed.

2. The housing median age and the median house value were also capped. The latter may be a serious problem since it is your target attribute (your labels). Your Machine Learning algorithms may learn that prices never go beyond that limit. You need to check with your client team (the team that will use your system's output) to see whether this is a problem or not. If they tell you that they need precise predictions even beyond $500,000, then you have mainly two options:
a. Collect proper labels for the districts whose labels were capped.
b. Remove those districts from the training set (and also from the test set, since your system should not be evaluated poorly if it predicts values beyond $500,000); a filtering sketch follows this list.
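
As a hedged sketch of option (b), assuming the observed maximum of median_house_value is the cap value (in this dataset it is roughly $500,001):

cap = housing["median_house_value"].max()        # assumption: the observed maximum is the cap
capped = housing["median_house_value"] >= cap    # boolean mask of capped districts
housing = housing[~capped]                       # drop them before splitting into training and test sets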

Create a Test Set
