Housing Main

The document discusses analyzing a housing dataset using Python. It shows importing the dataset, checking for missing values and data types, then creating pie charts and histograms to visualize the distribution of rooms, bathrooms, locations, miles from school, and rent prices. Scatter plots are used to examine correlations between rent price and distance from school and other variables while differentiating locations.

Uploaded by

hamburgerhenry13

Homework 1

B10702053, Department of Accounting, third year, 黃少凱

1. Housing Dataset
**Q1. What steps will you take upon receiving this dataset before commencing data analysis?**

First, import the dataset as a pandas DataFrame.

In [ ]: import pandas as pd
import numpy as np

In [ ]: housing_df = pd.read_csv('housing_data.csv')
print(housing_df.head())

   Area  No. of Rooms  No. of Bathrooms Location  \
0  1360             1                 1    Rural
1  1794             3                 1   Suburb
2  1630             2                 1   Suburb
3  1595             1                 1   Suburb
4  2138             1                 1   Suburb

   Miles (dist. between school and house)  Rent Price per Month  Sell Price
0                                     463                  7401    74446632
1                                     210                  9259    76199794
2                                     157                 16469    16249579
3                                     133                 18096    24291317
4                                      10                  9923    50273384

Next, check the data types in the dataset and see whether there are any missing values.

In [ ]: print(housing_df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Area 1000 non-null int64
1 No. of Rooms 1000 non-null int64
2 No. of Bathrooms 1000 non-null int64
3 Location 1000 non-null object
4 Miles (dist. between school and house) 1000 non-null int64
5 Rent Price per Month 1000 non-null int64
6 Sell Price 1000 non-null int64
dtypes: int64(6), object(1)
memory usage: 54.8+ KB
None

In [ ]: print(housing_df.describe())
              Area  No. of Rooms  No. of Bathrooms  \
count  1000.000000   1000.000000            1000.0
mean   1763.241000      1.974000               1.0
std     704.717323      0.814855               0.0
min     501.000000      1.000000               1.0
25%    1170.000000      1.000000               1.0
50%    1753.000000      2.000000               1.0
75%    2366.250000      3.000000               1.0
max    2997.000000      3.000000               1.0

       Miles (dist. between school and house)  Rent Price per Month  \
count                             1000.000000           1000.000000
mean                               255.405000          13133.528000
std                                142.346449           4106.514878
min                                 10.000000           6018.000000
25%                                133.000000           9600.250000
50%                                259.500000          13210.000000
75%                                378.250000          16844.750000
max                                498.000000          19993.000000

         Sell Price
count  1.000000e+03
mean   4.207750e+07
std    2.164932e+07
min    6.113936e+06
25%    2.343184e+07
50%    4.284373e+07
75%    6.118787e+07
max    7.998578e+07

In [ ]: # look at any missing values

print(housing_df.isnull().sum())

Area 0
No. of Rooms 0
No. of Bathrooms 0
Location 0
Miles (dist. between school and house) 0
Rent Price per Month 0
Sell Price 0
dtype: int64
After confirming that there are no missing values, we can proceed to data analysis, using matplotlib.pyplot and
seaborn for data visualization.
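
The check for missing values can also be written as an explicit assertion, so that a later change to the data fails loudly. The following is a minimal sketch, assuming a small stand-in frame named sample_df (a hypothetical name, with only three of the dataset's columns); it also counts duplicate rows, which the steps so far do not cover.

```python
import pandas as pd

# Stand-in for housing_df: a few rows in the shape of the dataset (illustrative only).
sample_df = pd.DataFrame({
    "Area": [1360, 1794, 1630],
    "No. of Rooms": [1, 3, 2],
    "Location": ["Rural", "Suburb", "Suburb"],
})

# Total number of missing cells across all columns.
n_missing = int(sample_df.isnull().sum().sum())

# Number of rows that exactly repeat an earlier row.
n_duplicates = int(sample_df.duplicated().sum())

# Stop here if the "no missing values" assumption does not hold.
assert n_missing == 0, f"{n_missing} missing values found"
print(n_missing, n_duplicates)
```

On the real data, the same two lines would be applied to housing_df instead of sample_df.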

To begin with, we focus on the columns No. of Rooms , No. of Bathrooms , and Location . The pie charts show that
every residential property has 1 to 3 rooms and exactly 1 bathroom, and that both the room counts and the three
locations (city center, suburb, and rural area) are roughly evenly represented.

In [ ]: # draw a pie chart for the number of rooms (1 to 3)

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 3, figsize=(15, 5))

rooms = housing_df['No. of Rooms']

keys = rooms.value_counts().keys().tolist()
keys = [f"{i} rooms" for i in keys]
values = rooms.value_counts().tolist()

custom_colors = ['lightgreen','lightskyblue','lightcoral', 'gold']

ax[0].pie(values, labels=['']*len(keys), autopct='%1.1f%%',

pctdistance=0.8, textprops={'color':'w', 'weight':'bold', 'size':10},
wedgeprops=dict(width=0.4, edgecolor='w'), colors=custom_colors)

# show the legends

ax[0].legend(keys, loc='lower left')

# title
ax[0].set_title('No. of Rooms', fontsize=12
, fontweight='bold', color='navy')

bathrooms = housing_df['No. of Bathrooms']

keys = bathrooms.value_counts().keys().tolist()
keys = [f"{i} bathrooms" for i in keys]
values = bathrooms.value_counts().tolist()

ax[1].pie(values, labels=['']*len(keys), autopct='%1.1f%%',

pctdistance=0.8, textprops={'color':'w', 'weight':'bold', 'size':10},
wedgeprops=dict(width=0.4, edgecolor='w'), colors=custom_colors)

# show the legends

ax[1].legend(keys, loc='lower left')

# title
ax[1].set_title('No. of Bathrooms', fontsize=12
, fontweight='bold', color='navy')

# show the categories of location

location = housing_df['Location']
keys = location.value_counts().keys().tolist()
values = location.value_counts().tolist()

ax[2].pie(values, labels=['']*len(keys), autopct='%1.1f%%',

pctdistance=0.8, textprops={'color':'w', 'weight':'bold', 'size':10},
wedgeprops=dict(width=0.4, edgecolor='w'), colors=custom_colors)

# show the legends

ax[2].legend(keys, loc='lower left')

# title
ax[2].set_title('Location', fontsize=12
, fontweight='bold', color='navy')

plt.show()
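
The shares drawn in the pie charts can also be read off numerically. This is a small sketch, assuming a hypothetical eight-value series in place of the real No. of Rooms column; value_counts(normalize=True) returns the fraction of rows per category.

```python
import pandas as pd

# Hypothetical room counts standing in for housing_df['No. of Rooms'].
rooms = pd.Series([1, 3, 2, 1, 1, 2, 3, 3], name="No. of Rooms")

# Fraction of rows per room count, ordered by room count.
shares = rooms.value_counts(normalize=True).sort_index()

# Same numbers as percentages, as the pie chart labels show them.
print((shares * 100).round(1))
```

Comparing these fractions directly is usually easier than judging wedge sizes by eye.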
For further analysis, we examine the distribution of the column Miles (dist. between school and house) with a bar chart of
binned counts, which shows that the distances between the residential properties and the school are roughly evenly distributed
between 0 and 500 miles. A similar pattern appears in the column Rent Price per Month , where the rent prices are roughly evenly
distributed between 6,000 and 20,000 dollars per month.

In [ ]: # draw a bar chart for the miles from the school
# from 0-50, 50-100, 100-150, 150-200, 200-250, 250-300, 300-350, 350-400, 400-450, 450-500
fig, ax = plt.subplots(figsize=(15, 5))

miles = housing_df['Miles (dist. between school and house)']

miles_categories = pd.cut(miles, bins=[0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]).value_counts().sort_index()
keys = miles_categories.keys().tolist()
values = miles_categories.tolist()

keys = [f"{i.left}-{i.right} miles" for i in keys]

ax.bar(keys, values, width=0.4, color=plt.cm.Set3(np.arange(len(keys))))

# get rid of the top and right spines

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# title
ax.set_title('Miles from Schools', fontsize=18
, fontweight='bold', color='navy')

plt.show()
In [ ]: # draw a bar chart for the rent price
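# in bins of 1000: 6000-7000, 7000-8000, ..., 19000-20000
# (the bin list was cut off after 17000 in the source; the remaining edges and the
# value_counts().sort_index() step are assumed from the 6000-20000 range and the miles cell)

fig, ax = plt.subplots(figsize=(20, 5))

rent = housing_df['Rent Price per Month']

rent_categories = pd.cut(rent, bins=[6000, 7000, 8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000]).value_counts().sort_index()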

keys = rent_categories.keys().tolist()
values = rent_categories.tolist()

keys = [f"${int(i.left / 1000)}K-${int(i.right / 1000)}K" for i in keys]

ax.bar(keys, values, width=0.4, color=plt.cm.tab20c(np.arange(len(keys))))

# get rid of the top and right spines

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# title
ax.set_title('Rent Price per Month', fontsize=12
, fontweight='bold', color='navy')
plt.show()
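
The binning used for the two bar charts can be written more compactly by generating the bin edges instead of listing them. A minimal sketch, assuming the five rent values printed by head() as sample data and an assumed upper edge of 20000:

```python
import numpy as np
import pandas as pd

# The five rent values shown in the head() output, used here as sample data.
rent = pd.Series([7401, 9259, 16469, 18096, 9923], name="Rent Price per Month")

# Bin edges 6000, 7000, ..., 20000 (the upper edge is an assumption).
edges = np.arange(6000, 21000, 1000)

# Count the values per bin; pd.cut uses right-closed intervals such as (6000, 7000].
counts = pd.cut(rent, bins=edges).value_counts().sort_index()

# Short axis labels such as "$6K-$7K", built from the interval bounds.
labels = [f"${int(iv.left // 1000)}K-${int(iv.right // 1000)}K" for iv in counts.index]

print(dict(zip(labels, counts.tolist())))
```

Because the intervals are right-closed, a value exactly equal to the lowest edge would fall outside every bin; passing include_lowest=True to pd.cut avoids that.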

To further analyze the relationship between the rent price and the distance from the school, we use a scatter plot to
observe the correlation between the two variables. The points are colored to differentiate the three locations: city center,
suburb, and rural area. Note that rows whose No. of Rooms is not equal to 1 are filtered out, so that the number of rooms
does not influence the rent price.

As shown below, there is no strong correlation between the rent price and the distance from the school, and the rent prices are
spread evenly across the three locations. A similar conclusion can be drawn from the scatter plot of the selling price against
the distance from the school.

In [ ]: # draw a scatter plot between the rent price and the miles from the school
color_map = {
"City Center": "lightcoral",
"Suburb": "lightgreen",
"Rural": "lightblue"
}

fig, ax = plt.subplots(figsize=(10, 5))

one_room = housing_df[(housing_df['No. of Rooms'] == 1) &
(housing_df['No. of Bathrooms'] == 1)]

# ax.scatter(one_room['Miles (dist. between school and house)'], one_room['Rent Price per Month'], color='lightcoral
scatter = ax.scatter(one_room['Miles (dist. between school and house)'],
one_room['Rent Price per Month'],
c=one_room["Location"].map(color_map))

# show the legends

# Create legend handles and labels
legend_handles = [plt.Line2D([0], [0], marker='o', color='w',
markerfacecolor=color, label=label)
for label, color in color_map.items()]

# Add legend
ax.legend(handles=legend_handles, title="Location",bbox_to_anchor=(1.01, 1), loc='upper left')

# Title and labels

ax.set_title('Rent Price per Month vs. Miles from Schools', fontsize=12, fontweight='bold', color='navy')
ax.set_xlabel('Miles from Schools')
ax.set_ylabel('Rent Price per Month')

plt.show()
In [ ]: # draw a scatter plot between the sell price and the miles from the school
color_map = {
"City Center": "lightcoral",
"Suburb": "lightgreen",
"Rural": "lightblue"
}

fig, ax = plt.subplots(figsize=(10, 5))

one_room = housing_df[(housing_df['No. of Rooms'] == 1) &
(housing_df['No. of Bathrooms'] == 1)]

# ax.scatter(one_room['Miles (dist. between school and house)'], one_room['Rent Price per Month'], color='lightcoral
scatter = ax.scatter(one_room['Miles (dist. between school and house)'],
one_room['Sell Price'],
c=one_room["Location"].map(color_map))

# show the legends

# Create legend handles and labels
legend_handles = [plt.Line2D([0], [0], marker='o', color='w',
markerfacecolor=color, label=label)
for label, color in color_map.items()]

# Add legend
ax.legend(handles=legend_handles, title="Location", bbox_to_anchor=(1.01, 1), loc='upper left')

# Title and labels

ax.set_title('Sell Price vs. Miles from Schools', fontsize=12, fontweight='bold', color='navy')
ax.set_xlabel('Miles from Schools')
ax.set_ylabel('Sell Price')

plt.show()
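
The visual impression from the two scatter plots can be backed by a number. The sketch below assumes synthetic, independently generated columns named Miles and Rent (hypothetical names and data, not the housing dataset) and computes the Pearson correlation coefficient with Series.corr.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: two columns drawn independently of each other.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "Miles": rng.integers(10, 500, size=n),
    "Rent": rng.integers(6000, 20000, size=n),
})

# Series.corr computes the Pearson coefficient by default.
r = demo["Miles"].corr(demo["Rent"])
print(round(r, 3))
```

A coefficient near 0 indicates no linear relationship; it does not by itself rule out a nonlinear one.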
In [ ]: # show the relationship between the sell price and the location
# as the type of box plots

fig, ax = plt.subplots(1, 3, figsize=(15, 5))

locations = housing_df['Location'].unique()

for i, loc in enumerate(locations):
    # keep only the rows for the current location
    now_df = housing_df[housing_df['Location'] == loc]
    ax[i].boxplot(now_df['Sell Price'], patch_artist=True,
                  boxprops=dict(facecolor='lightblue'))

    ax[i].set_title(f'Sell Price in {loc}', fontsize=12, fontweight='bold', color='navy')

    ax[i].set_xticklabels([''])

plt.show()

To quantify the relationship between each pair of columns in the dataset, we create a heatmap of the correlation matrix.
Surprisingly, all pairs of features appear to be essentially uncorrelated, as the correlation coefficients are all close to 0.

Given this conclusion, I decided not to take any further preparatory steps on this dataset before commencing data
analysis.
In [ ]: # draw a heatmap for the correlation matrix
import seaborn as sns

# convert the location to numerical values

housing_new = housing_df.copy()
housing_new['Location'] = housing_new['Location'].map({'City Center': 0, 'Suburb': 1, 'Rural': 2})

fig, ax = plt.subplots(figsize=(10, 5))

corr = housing_new.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', ax=ax)

plt.show()
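
Mapping Location to 0, 1, 2 treats it as an ordered quantity. An alternative, shown here only as a sketch and not as the method used above, is one-hot encoding with pd.get_dummies, which gives each location its own 0/1 column before the correlation is computed. The four rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the dataset.
demo = pd.DataFrame({
    "Location": ["Rural", "Suburb", "Suburb", "City Center"],
    "Sell Price": [74446632, 76199794, 16249579, 50273384],
})

# One 0/1 indicator column per location, replacing the original text column.
encoded = pd.get_dummies(demo, columns=["Location"], dtype=int)

# Correlation matrix over the numeric and indicator columns.
corr = encoded.corr()
print(corr.round(2))
```

With this encoding, each location's correlation with the other columns is reported separately instead of being folded into a single ordinal score.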
**Q2. If you are to inquire about Q1 from ChatGPT or Bing, what responses will you receive? Do you find them reasonable? If not, how will you rectify it?**
Below is the response of ChatGPT regarding steps to perform when conducting a data analysis:

Before commencing data analysis on the housing dataset, it's essential to perform several steps to ensure data quality and
understand the characteristics of the dataset. Here are the steps I would take:

1. Data Cleaning:

   - Check for missing values: Look for any missing or null values in the dataset and decide on how to handle them. Depending
     on the amount of missing data, options include imputation, deletion, or leaving them as is if they are negligible.
   - Check for duplicates: Identify and remove any duplicate entries in the dataset, if present.
   - Check for outliers: Examine the distribution of each numerical variable to identify any outliers that may skew the analysis.
     Decide on whether to remove or transform outliers based on domain knowledge and statistical analysis.

2. Data Exploration:

   - Summary statistics: Calculate descriptive statistics such as mean, median, standard deviation, minimum, and maximum
     values for numerical variables. This provides an initial understanding of the data distribution.
   - Visualizations: Create visualizations such as histograms, box plots, or scatter plots to explore the distribution and
     relationships between variables. For example, scatter plots can help visualize the relationship between sell price and other
     variables like area, number of rooms, and number of bathrooms.

3. Feature Engineering:

   - Create new features if necessary: Derive new features that might be useful for analysis or modeling. For example, calculate
     the price per square foot/meter by dividing sell price by the area.
   - Encode categorical variables: If there are categorical variables like 'Location', encode them into numerical values using
     techniques such as one-hot encoding or label encoding.

4. Data Transformation:

   - Normalize or standardize numerical variables if needed: Depending on the algorithm chosen for analysis, scaling numerical
     variables might be necessary to ensure all variables contribute equally to the analysis.
   - Transform skewed variables: If any numerical variables are heavily skewed, consider applying transformations such as
     logarithmic or square root transformations to make the distribution more symmetrical.

5. Data Splitting:

   - If the dataset is intended for predictive modeling, split the data into training and testing sets to evaluate the performance
     of the model accurately.

6. Documentation:

   - Document any changes made to the dataset during the cleaning, exploration, and feature engineering stages. This
     documentation helps maintain transparency and reproducibility of the analysis.

By following these steps, we can ensure that the dataset is well-prepared for analysis, leading to more accurate insights and reliable
modeling results.

I consider ChatGPT's response reasonable, as it covers the essential steps required to prepare a dataset for analysis.
However, the response is general and covers a broader range of data analysis tasks than this dataset requires. For the specific
housing dataset provided, the steps I outlined in my initial response are sufficient for the analysis. If I were to rectify it, I would
provide a more concise and focused response tailored to the housing dataset.
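
One item from the ChatGPT list that Q1 does not cover is the derived price-per-area feature. A minimal sketch, assuming the first two rows printed by head() and a hypothetical column name Sell Price per Area (the unit of Area is not stated in the dataset):

```python
import pandas as pd

# First two rows from the head() output; the unit of Area is unknown.
demo = pd.DataFrame({
    "Area": [1360, 1794],
    "Sell Price": [74446632, 76199794],
})

# Derived feature: selling price per unit of area (hypothetical column name).
demo["Sell Price per Area"] = demo["Sell Price"] / demo["Area"]
print(demo)
```

Such a feature makes properties of different sizes comparable, which the raw Sell Price column does not.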

**Q3. If you are restricted to renting a house, which one or ones will you select, and why?**

When renting a house, I would weigh several factors: location, rent price, distance from the school, and the
number of rooms and bathrooms. Based on the dataset, I would apply the following criteria:

1. Location: I would prefer a house located in the city center due to the convenience of access to amenities, public
transportation, and proximity to schools and workplaces.

2. Miles from School: I would prefer a house that is 10 miles or less from the school to minimize commuting time and
transportation costs.

3. Rent Price per Month: I would select the house with minimal rent price per month, as it would be more cost-effective and
allow for better budget management.

4. Number of Rooms and Bathrooms: I would prefer the house with as many rooms and bathrooms as needed for my family
size and lifestyle.

In [ ]: # draw a scatter plot between the rent price and the miles from the school
color_map = {
3: "lightcoral",
2: "lightgreen",
1: "lightblue"
}

fig, ax = plt.subplots(figsize=(10, 5))

housing_new = housing_df[(housing_df['Location'] == "City Center") & (housing_df['Miles (dist. between school and hou

# ax.scatter(one_room['Miles (dist. between school and house)'], one_room['Rent Price per Month'], color='lightcoral
scatter = ax.scatter(housing_new['Miles (dist. between school and house)'],
housing_new['Rent Price per Month'],
c=housing_new["No. of Rooms"].map(color_map))

# show the legends

# Create legend handles and labels
legend_handles = [plt.Line2D([0], [0], marker='o', color='w',
markerfacecolor=color, label=label)
for label, color in color_map.items()]

# Add legend
ax.legend(handles=legend_handles, title="No. of Rooms",bbox_to_anchor=(1.01, 1), loc='upper left')

# change the point with index 785 to gold

plt.scatter(housing_new['Miles (dist. between school and house)'][785],
housing_new['Rent Price per Month'][785],
color='gold', s=100, edgecolor='black')

# Title and labels

ax.set_title('Rent Price per Month vs. Miles from Schools', fontsize=12, fontweight='bold', color='navy')
ax.set_xlabel('Miles from Schools')
ax.set_ylabel('Rent Price per Month')

plt.show()
After making trade-offs between the rent price and the distance from the school, I chose the point highlighted in gold in
the scatter plot, with a rent of around 9,000 dollars per month and a distance of around 19 miles from the school. This property is
located in the city center and has 2 rooms and 1 bathroom, which suits me best.

In [ ]: housing_new[(housing_new["Rent Price per Month"] <= 10000) & (housing_new["Miles (dist. between school and house)"] <

Out[ ]:      Area  No. of Rooms  No. of Bathrooms     Location  Miles (dist. between school and house)  Rent Price per Month  Sell Price
        785  2041             2                 1  City Center                                      19                  8912    27709264
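
The selection can be expressed as a filter followed by a sort. In this sketch, the row with index 785 uses the values shown in the output; the other three rows are hypothetical, and the 20-mile limit is an assumed threshold, since the distance condition is cut off in the source. The 10000 rent limit matches the filter shown in the cell.

```python
import pandas as pd

MILES = "Miles (dist. between school and house)"
RENT = "Rent Price per Month"

# Row 785 is taken from the output; the other rows are hypothetical.
demo = pd.DataFrame(
    {
        "Location": ["City Center", "City Center", "Suburb", "City Center"],
        MILES: [19, 300, 15, 12],
        RENT: [8912, 7000, 6500, 15000],
    },
    index=[785, 1, 2, 3],
)

max_miles = 20    # assumed distance threshold
max_rent = 10000  # rent limit used in the filter

# Keep city-center homes within both limits, cheapest first.
shortlist = demo[
    (demo["Location"] == "City Center")
    & (demo[MILES] <= max_miles)
    & (demo[RENT] <= max_rent)
].sort_values(RENT)
print(shortlist)
```

Sorting by rent puts the cheapest qualifying home first, which makes the trade-off between price and distance explicit.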

**Q4. Assuming you have enough funds to purchase a house, will you opt to continue renting or proceed with a purchase? If renting, which one will you choose? If buying, which one will you select? Why?**

To evaluate the decision between renting and purchasing a house, I create a new column Sell Rent Ratio by dividing the Sell
Price by the annual rent, that is, the Rent Price per Month multiplied by 12. The Sell Rent Ratio represents the number of years
of rent payments it would take to equal the purchase price of the house. A lower Sell Rent Ratio indicates a better investment
opportunity.

After calculating the Sell Rent Ratio , I filtered the data with the criteria below:

1. Sell Rent Ratio less than 50

2. Location is Suburb
3. Miles from School as close to 0 as possible

These criteria are chosen because I would prefer to purchase a house that is a good investment opportunity, is located in the
suburb, and is close to the school.

In [ ]: # create a new column indicating selling price / rent price / 12

housing_new = housing_df.copy()
housing_new["Sell Rent Ratio"] = housing_new["Sell Price"] / (housing_new["Rent Price per Month"] * 12)

# draw a scatter plot between the sell rent ratio and the miles from the school
color_map = {
3: "lightcoral",
2: "lightgreen",
1: "lightblue"
}

fig, ax = plt.subplots(figsize=(10, 5))

housing_new = housing_new[(housing_new['Location'] == "Suburb") & (housing_new['Sell Rent Ratio'] <= 50)]

# ax.scatter(one_room['Miles (dist. between school and house)'], one_room['Rent Price per Month'], color='lightcoral
scatter = ax.scatter(housing_new['Miles (dist. between school and house)'],
housing_new['Sell Rent Ratio'],
c=housing_new["No. of Rooms"].map(color_map))

# show the legends

# Create legend handles and labels
legend_handles = [plt.Line2D([0], [0], marker='o', color='w',
markerfacecolor=color, label=label)
for label, color in color_map.items()]

# change the point with index 928 to gold

plt.scatter(housing_new['Miles (dist. between school and house)'][928],
housing_new['Sell Rent Ratio'][928],
color='gold', s=100, edgecolor='black')

# Add legend
ax.legend(handles=legend_handles, title="No. of Rooms",bbox_to_anchor=(1.01, 1), loc='upper left')

# title and labels

ax.set_title('Sell Rent Ratio vs. Miles from Schools', fontsize=12, fontweight='bold', color='navy')

ax.set_xlabel('Miles from Schools')

ax.set_ylabel('Sell Rent Ratio')

plt.show()
In [ ]: # print the house with index 928
housing_df.iloc[928]

Out[ ]: Area 2158

No. of Rooms 3
No. of Bathrooms 1
Location Suburb
Miles (dist. between school and house) 25
Rent Price per Month 16445
Sell Price 7196311
Name: 928, dtype: object
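
As a check on the arithmetic, the ratio can be recomputed by hand for the two houses discussed so far, using the prices printed in the outputs. The helper name sell_rent_ratio is hypothetical; the formula is the one used for the Sell Rent Ratio column.

```python
def sell_rent_ratio(sell_price, rent_per_month):
    """Years of rent needed to equal the selling price."""
    return sell_price / (rent_per_month * 12)

# House 928 (Q4 choice): Sell Price 7196311, Rent Price per Month 16445.
ratio_928 = sell_rent_ratio(7196311, 16445)

# House 785 (Q3 choice): Sell Price 27709264, Rent Price per Month 8912.
ratio_785 = sell_rent_ratio(27709264, 8912)

print(round(ratio_928, 2), round(ratio_785, 2))
```

House 928 comes out at about 36.5 years, below the limit of 50, while house 785 comes out at about 259 years, so by this measure house 785 is far more attractive to rent than to buy.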

**Q5. Are there any properties with rent or selling prices that

seem unusually high or low? Why?**

To identify properties with unusually high or low rent or selling prices, I examine the distributions of rent and selling prices
using box plots. Under the interquartile range (IQR) rule, any data point that falls below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR
is considered an outlier. With showfliers=True , the box plot displays such outliers as individual points.

As the box plots below show, neither the rent price nor the selling price has any outliers: no data point falls
below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.

In [ ]: # show the distributions of the rent price and the sell price
# as box plots

fig, ax = plt.subplots(1, 2, figsize=(15, 5))

ax[0].boxplot(housing_df['Rent Price per Month'], patch_artist=True,

boxprops=dict(facecolor='lightblue'), showfliers=True)

ax[0].set_title(f'Rent Price per Month', fontsize=12, fontweight='bold', color='navy')

ax[0].set_xticklabels([''])

ax[1].boxplot(housing_df['Sell Price'], patch_artist=True,

boxprops=dict(facecolor='lightblue'), showfliers=True)
ax[1].set_title(f'Sell Price', fontsize=12, fontweight='bold', color='navy')
ax[1].set_xticklabels([''])

plt.show()
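
The same conclusion can be checked without a plot by computing the IQR fences from the quartiles reported by describe(). A minimal sketch, assuming the rounded quartiles, minima, and maxima printed earlier; the helper name iqr_fences is hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences under the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Rent Price per Month: quartiles, minimum, and maximum from describe().
rent_low, rent_high = iqr_fences(9600.25, 16844.75)
rent_has_outliers = 6018 < rent_low or 19993 > rent_high

# Sell Price: values as printed by describe() (rounded to 7 significant digits).
sell_low, sell_high = iqr_fences(2.343184e7, 6.118787e7)
sell_has_outliers = 6.113936e6 < sell_low or 7.998578e7 > sell_high

print(rent_low, rent_high, rent_has_outliers)
print(sell_low, sell_high, sell_has_outliers)
```

For both columns the minimum and maximum lie inside the fences, which agrees with the box plots showing no outlier points.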
