
Least Squares Method

The document outlines a Python script for analyzing the California Housing dataset using libraries such as pandas, numpy, and scikit-learn. It includes steps for exploratory data analysis, data visualization, and the implementation of a linear regression model to predict median house values. The script also evaluates the model's performance using mean squared error and R-squared metrics.
Copyright © All Rights Reserved

# Import necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the California Housing dataset
housing_data = fetch_california_housing()
X = housing_data.data  # Features (8 numeric columns)
y = housing_data.target  # Target: median house value, in units of $100,000
feature_names = housing_data.feature_names

# Create a DataFrame from the data and feature names
df = pd.DataFrame(X, columns=feature_names)
df['Target'] = y

# Perform basic EDA (exploratory data analysis)

# Display the first few rows
print("First few rows of the dataset:")
print(df.head())

# Display summary statistics
print("\nSummary statistics:")
print(df.describe())

# Check for missing values
print("\nMissing values:")
print(df.isnull().sum())

# Data types of each column
print("\nData types:")
print(df.dtypes)

# Histograms of features
df.hist(figsize=(12, 10), bins=20)
plt.suptitle('Histogram of Features')
plt.show()

# Scatter plot of a feature vs. the target
feature = 'MedInc'  # Median income, used here as an example feature
32
B.Tech / M.Tech (Integrated) Programmes-Regulations 2021-Volume-11-CSE-Higher Semester Syllabi-Control Copy
plt.figure(figsize=(8, 6))
plt.scatter(df[feature], df['Target'], alpha=0.5)
plt.title(f'Scatter Plot: {feature} vs. Target')
plt.xlabel(feature)
plt.ylabel('Target (Median House Value)')
plt.grid(True)
plt.show()

# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
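As a rough illustration of what an 80/20 split does, the sketch below shuffles the row indices with a seeded NumPy generator and holds out 20% of the rows for testing. `simple_split` is a hypothetical helper, not scikit-learn's implementation: `train_test_split` uses its own shuffling, so the rows it selects for `random_state=42` will differ from these.

```python
import numpy as np

def simple_split(X, y, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle the rows, then hold out a test fraction."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    indices = rng.permutation(n_samples)            # random order of row indices
    n_test = int(np.ceil(test_size * n_samples))    # test count, rounded up here
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # The same index arrays are applied to X and y, so rows stay paired
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 10 rows, 2 features
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10, dtype=float)
X_tr, X_te, y_tr, y_te = simple_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)
```

Fixing the seed makes the split reproducible, which is the same purpose `random_state=42` serves in the script.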

# Create a linear regression model (fitted by ordinary least squares)
model = LinearRegression()
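The least squares method in the title chooses the regression coefficients that minimize the sum of squared differences between observed and predicted values. In standard notation (introduced here, not used in the script), with $n$ samples, target vector $y$, and a matrix $X$ that, unlike the script's `X`, carries a leading column of ones so that $\beta_0$ acts as the intercept:

$$
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2 = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2
$$

Setting the gradient $-2X^{\top}(y - X\beta)$ to zero gives the normal equations and, when $X^{\top}X$ is invertible, the closed-form solution:

$$
X^{\top}X\,\hat{\beta} = X^{\top}y \quad\Longrightarrow\quad \hat{\beta} = \left(X^{\top}X\right)^{-1}X^{\top}y
$$

This is the ordinary least squares problem that scikit-learn's `LinearRegression` fits; numerical solvers generally avoid forming the inverse explicitly.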

# Train the model on the training data
model.fit(X_train, y_train)
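To connect the fitting step with the least squares method, the sketch below solves ordinary least squares directly with NumPy on synthetic data. The values `true_coef` and `true_intercept` are made up for illustration. A column of ones is prepended for the intercept, and `np.linalg.lstsq` minimizes the squared residuals; for a full-rank feature matrix the solution is unique, so the same procedure on the script's `X_train` should match `model.intercept_` and `model.coef_` up to floating-point error (not run here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 features, known coefficients plus small noise
true_coef = np.array([2.0, -1.0, 0.5])
true_intercept = 3.0
X_syn = rng.normal(size=(200, 3))
y_syn = X_syn @ true_coef + true_intercept + rng.normal(scale=0.1, size=200)

# Prepend a column of ones so the first fitted parameter is the intercept
X_design = np.column_stack([np.ones(len(X_syn)), X_syn])

# Least squares solution of X_design @ beta ≈ y_syn
beta, residuals, rank, singular_values = np.linalg.lstsq(X_design, y_syn, rcond=None)
intercept_hat, coef_hat = beta[0], beta[1:]

# Predictions: intercept plus a linear combination of the features
y_hat = intercept_hat + X_syn @ coef_hat

print("Estimated intercept:", round(intercept_hat, 3))
print("Estimated coefficients:", np.round(coef_hat, 3))
```

With this little noise the estimates land close to the made-up true values, and the residuals `y_syn - y_hat` are orthogonal to every column of `X_design`, which is exactly what the normal equations state.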

# Make predictions on the test data
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
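Writing the two metrics out by hand makes their meaning explicit: MSE is the average squared prediction error, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The functions below are a sketch of those standard definitions; `mse_manual` and `r2_manual` are hypothetical names, and for ordinary inputs they should agree with `mean_squared_error` and `r2_score`.

```python
import numpy as np

def mse_manual(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.5, 5.5, 7.0, 8.0]
print(mse_manual(y_true_demo, y_pred_demo))  # 0.375
print(r2_manual(y_true_demo, y_pred_demo))   # 0.925
```

An R² of 1 means perfect predictions, while always predicting the mean of the true values gives 0, so the score measures improvement over that constant baseline.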

# Print the results
print(f"\nMean Squared Error: {mse}")
print(f"R-squared: {r2}")

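# Plot actual vs. predicted values against MedInc (column 0 of X_test).
# The model uses all 8 features, so its predictions do not fall on one
# straight line in this 2D view; draw them as points, not a connected line.
plt.figure(figsize=(8, 6))
plt.scatter(X_test[:, 0], y_test, color='blue', alpha=0.5, label='Actual')
plt.scatter(X_test[:, 0], y_pred, color='red', alpha=0.5, label='Predicted')
plt.title('Actual vs. Predicted Values (Feature: MedInc)')
plt.xlabel('MedInc')
plt.ylabel('Target (Median House Value)')
plt.legend()
plt.grid(True)
plt.show()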
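If a genuine straight regression line for MedInc alone is wanted, one option is a separate one-feature least squares fit. For a single predictor the solution has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The sketch below applies it to synthetic points; `fit_line` is a hypothetical helper, and with the script's variables it could be called as `fit_line(X_train[:, 0], y_train)` before plotting the line over sorted MedInc values.

```python
import numpy as np

def fit_line(x, y):
    """Hypothetical helper: closed-form least squares fit of y ≈ intercept + slope * x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope is the covariance of x and y divided by the variance of x
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Points lying exactly on y = 2x + 1, so the fit should recover slope 2, intercept 1
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = 2.0 * x_demo + 1.0
slope, intercept = fit_line(x_demo, y_demo)
print(slope, intercept)  # 2.0 1.0
```

Because this single-feature model ignores the other seven columns, its MSE and R² will generally differ from those of the full model above.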

