Closed-Form Linear Regression on the California Housing Dataset

This document shows code for loading and preprocessing the California housing dataset using scikit-learn. It splits the data into training and test sets, adds an intercept term, fits a linear regression model using the closed-form solution, and calculates the mean squared error on the test set. Key steps:

1. Load the California housing dataset and separate features (X) and target (y)
2. Standardize the features
3. Split into training and test sets
4. Add an intercept term to the training and test features
5. Compute the weights using the closed-form linear regression solution (shown below)
6. Calculate the mean squared error on the test set
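The closed-form solution referenced in step 5 is the ordinary least squares normal equation, w = (X^T X)^(-1) X^T y, which the closed_form function later in the notebook implements directly as inv(X.T @ X) @ X.T @ y.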


In [6]: import pandas as pd
        import numpy as np
        from sklearn import datasets

        housing = datasets.fetch_california_housing()
        housing

Out[6]: {'data': array([[   8.3252    ,   41.        ,    6.98412698, ...,    2.55555556,
                   37.88      , -122.23      ],
                 [   8.3014    ,   21.        ,    6.23813708, ...,    2.10984183,
                   37.86      , -122.22      ],
                 [   7.2574    ,   52.        ,    8.28813559, ...,    2.80225989,
                   37.85      , -122.24      ],
                 ...,
                 [   1.7       ,   17.        ,    5.20554273, ...,    2.3256351 ,
                   39.43      , -121.22      ],
                 [   1.8672    ,   18.        ,    5.32951289, ...,    2.12320917,
                   39.43      , -121.32      ],
                 [   2.3886    ,   16.        ,    5.25471698, ...,    2.61698113,
                   39.37      , -121.24      ]]),
         'target': array([4.526, 3.585, 3.521, ..., 0.923, 0.847, 0.894]),
         'frame': None,
         'target_names': ['MedHouseVal'],
         'feature_names': ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms',
                           'Population', 'AveOccup', 'Latitude', 'Longitude'],
         'DESCR': '.. _california_housing_dataset:\n\nCalifornia Housing dataset\n
         --------------------------\n\n**Data Set Characteristics:**\n\n
         :Number of Instances: 20640\n\n
         :Number of Attributes: 8 numeric, predictive attributes and the target\n\n
         :Attribute Information:\n
             - MedInc        median income in block group\n
             - HouseAge      median house age in block group\n
             - AveRooms      average number of rooms per household\n
             - AveBedrms     average number of bedrooms per household\n
             - Population    block group population\n
             - AveOccup      average number of household members\n
             - Latitude      block group latitude\n
             - Longitude     block group longitude\n\n
         :Missing Attribute Values: None\n\n
         This dataset was obtained from the StatLib repository.\n
         https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html\n\n
         The target variable is the median house value for California districts,\n
         expressed in hundreds of thousands of dollars ($100,000).\n\n
         This dataset was derived from the 1990 U.S. census, using one row per census\n
         block group. A block group is the smallest geographical unit for which the U.S.\n
         Census Bureau publishes sample data (a block group typically has a population\n
         of 600 to 3,000 people).\n\n
         An household is a group of people residing within a home. Since the average\n
         number of rooms and bedrooms in this dataset are provided per household, these\n
         columns may take surpinsingly large values for block groups with few households\n
         and many empty houses, such as vacation resorts.\n\n
         It can be downloaded/loaded using the\n
         :func:`sklearn.datasets.fetch_california_housing` function.\n\n
         .. topic:: References\n\n
             - Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,\n
               Statistics and Probability Letters, 33 (1997) 291-297\n'}
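The object returned by fetch_california_housing is a scikit-learn Bunch, so its fields can be read either as dictionary keys (housing['data']) or as attributes (housing.data), which is why the next cell uses attribute access.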

In [11]: X = housing.data
         y = housing.target
         X.shape, y.shape

Out[11]: ((20640, 8), (20640,))
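pandas is imported in the first cell but never used. If a DataFrame is preferred, fetch_california_housing accepts an as_frame=True flag in recent scikit-learn releases; a minimal sketch:

In [ ]: # optional: the same data as a pandas DataFrame (scikit-learn >= 0.23)
        housing_df = datasets.fetch_california_housing(as_frame=True)
        housing_df.frame.head()   # 8 feature columns plus the MedHouseVal target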

In [12]: from sklearn.preprocessing import StandardScaler

         scaler = StandardScaler()
         X = scaler.fit_transform(X)   # standardize: zero mean, unit variance per feature
         X

Out[12]: array([[ 2.34476576,  0.98214266,  0.62855945, ..., -0.04959654,
                  1.05254828, -1.32783522],
                [ 2.33223796, -0.60701891,  0.32704136, ..., -0.09251223,
                  1.04318455, -1.32284391],
                [ 1.7826994 ,  1.85618152,  1.15562047, ..., -0.02584253,
                  1.03850269, -1.33282653],
                ...,
                [-1.14259331, -0.92485123, -0.09031802, ..., -0.0717345 ,
                  1.77823747, -0.8237132 ],
                [-1.05458292, -0.84539315, -0.04021111, ..., -0.09122515,
                  1.77823747, -0.87362627],
                [-0.78012947, -1.00430931, -0.07044252, ..., -0.04368215,
                  1.75014627, -0.83369581]])

In [13]: from sklearn.model_selection import train_test_split

         # note: no random_state is set, so the split (and the final MSE) varies between runs
         X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

         print(len(X_train))
         print(len(X_test))

15480
5160
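One caveat: the scaler above was fit on all 20,640 rows before the split, so test-set statistics leak into the preprocessing. A leakage-free variant (a sketch of the usual practice, not what this notebook ran) fits the scaler on the training split only:

In [ ]: # leakage-free ordering: split first, then fit the scaler on train only
        X_raw, y_raw = housing.data, housing.target
        X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, test_size=0.25)
        scaler = StandardScaler().fit(X_tr)          # statistics from train only
        X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)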

In [14]: # Add intercept
         #
         # The closed-form formula expects X with shape (m, n+1) -- one row per
         # sample, with a leading column of ones so that w0 acts as the intercept.
         # The weight vector w then has shape (n+1,), e.g.:
         #
         #   X @ w = [1  2  3]   [w0]
         #           [1  4  6] @ [w1]
         #           [1  9  1]   [w2]
         #           [1 10  2]

         # column of ones with shape (m, 1) for the training set
         intercept = np.ones((X_train.shape[0], 1))
         # concatenate along axis=1 (as a new first column)
         X_train = np.concatenate((intercept, X_train), axis=1)

         # same for the test set (see the helper sketched after this cell)
         intercept = np.ones((X_test.shape[0], 1))
         X_test = np.concatenate((intercept, X_test), axis=1)
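The two concatenations differ only in the array they act on; a small helper (hypothetical, not part of the original notebook) removes the duplication:

In [ ]: # hypothetical helper: prepend a bias column of ones to a design matrix
        def add_intercept(X):
            return np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)

        # equivalent to the two blocks above:
        # X_train = add_intercept(X_train)
        # X_test  = add_intercept(X_test)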

In [17]: X_train

Out[17]: array([[ 1.        , -0.51588775, -1.00430931, ..., -0.1006962 ,
                 -1.30243016,  1.33752281],
                [ 1.        ,  0.53960528,  1.61780729, ..., -0.05983722,
                 -0.74060628,  0.59381804],
                [ 1.        , -0.14247524,  1.14105882, ..., -0.02441623,
                  0.95891097, -1.27792215],
                ...,
                [ 1.        ,  1.44860733, -0.68647699, ...,  0.02829722,
                  0.83718246, -1.14315686],
                [ 1.        , -1.12969705, -0.60701891, ..., -0.03598869,
                  1.55350791, -0.18981719],
                [ 1.        ,  0.40464198, -0.52756083, ..., -0.00465543,
                  1.41773381, -0.74884359]])

In [18]: from numpy.linalg import inv

         # Matrix multiplication is associative, so the grouping of the products
         # below does not matter -- but operand order does (keep y last; don't
         # move it before X.T, for example).
         def closed_form(X, y):
             return inv(X.T @ X) @ X.T @ y
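inv(X.T @ X) can be numerically fragile when X^T X is ill-conditioned; more stable solvers for the same least-squares problem (a sketch, equivalent in exact arithmetic) are np.linalg.lstsq and the pseudoinverse:

In [ ]: # numerically stabler ways to solve the same least-squares problem
        theta_lstsq, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
        theta_pinv = np.linalg.pinv(X_train) @ y_train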

In [19]: # use closed_form to find theta
         theta = closed_form(X_train, y_train)
         theta   # <-- this is our model

Out[19]: array([ 2.06922803,  0.83307095,  0.11525569, -0.28134176,  0.30252723,
                -0.00705287, -0.04216411, -0.88801746, -0.85760284])

In [20]: # compute predictions on the test set
         yhat = X_test @ theta   # X (m, n+1) @ w (n+1,) ==> yhat (m,)

         # yhat and y_test must have the same shape before they can be compared
         assert y_test.shape == yhat.shape

In [21]: # get the mse
         mse = ((y_test - yhat) ** 2).sum() / X_test.shape[0]
         print("Mean squared error:", mse)

Mean squared error: 0.5289323658169676
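The manual sum-and-divide above matches np.mean and scikit-learn's built-in metric; a quick sanity check (a sketch):

In [ ]: # equivalent computations of the same MSE
        from sklearn.metrics import mean_squared_error
        assert np.isclose(mse, np.mean((y_test - yhat) ** 2))
        assert np.isclose(mse, mean_squared_error(y_test, yhat))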
