MiniProject BI

Uploaded by

Anushka Shingade
Copyright
© All Rights Reserved

Savitribai Phule Pune University

PUNE INSTITUTE OF COMPUTER TECHNOLOGY


2024-2025
DEPARTMENT OF COMPUTER ENGINEERING

A REPORT ON

EVALUATION OF HOUSING PRICES IN CALIFORNIA

B.E. (COMPUTER ENGINEERING)


BUSINESS INTELLIGENCE

SUBMITTED BY

Manasi Mahadev Raut


Roll No: 41468

UNDER THE GUIDANCE OF


Prof. R. R. JADHAV
[Figure: plot with y-axis label 'Count']

In [ ]:

missing

Out[ ]:

total_bedrooms 207
ocean_proximity 0
median_house_value 0
median_income 0
households 0
population 0
total_rooms 0
housing_median_age 0
latitude 0
longitude 0
dtype: int64
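
The cell that builds `missing` is not shown in this report. A minimal sketch of how such a per-column count could be produced, assuming `data` is the pandas DataFrame used throughout. The tiny frame below is a made-up stand-in so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing DataFrame `data` (values are made up)
data = pd.DataFrame({
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan],
    "households": [126, 1138, 177, 219],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND"],
})

# Count null entries per column, largest count first
missing = data.isnull().sum().sort_values(ascending=False)
print(missing)
```

Sorting in descending order puts the columns that need attention at the top, which matches the layout of the output above.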

In the output above, we can clearly see that only one column (total_bedrooms) has null values, 207 in all. So
we have to fill them with a statistical value.

Filling Missing Values

In [ ]:

data[['total_bedrooms']].describe(include='all')


Out[ ]:

       total_bedrooms
count    20433.000000
mean       537.870553
std        421.385070
min          1.000000
25%        296.000000
50%        435.000000
75%        647.000000
max       6445.000000
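
The fill step itself is not shown here. One common option, offered only as an assumption about what could follow, is to replace the nulls with the column median, which is less sensitive than the mean to very large values such as the maximum in the summary. The frame below is a made-up stand-in for `data`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`; values are made up
data = pd.DataFrame({"total_bedrooms": [296.0, np.nan, 435.0, 647.0, np.nan]})

# Median of the non-null values (NaNs are skipped by default)
median_bedrooms = data["total_bedrooms"].median()

# Assumed strategy: fill every null with that median
data["total_bedrooms"] = data["total_bedrooms"].fillna(median_bedrooms)

print(data["total_bedrooms"].isnull().sum())
```

The mean would work the same way by swapping `.median()` for `.mean()`. Which statistic fits best depends on how skewed the column is.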

As we can see, only one feature (ocean_proximity) holds categorical values; all the other features are numerical.

Data Visualisation

In [ ]:

import plotly.express as ex
ex.pie(data, names='ocean_proximity',
       title='Proportion of Locations of the house w.r.t ocean/sea')

We can see here that the majority of the houses are close to the sea rather than inland.
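
The shares shown in the pie chart can also be read off numerically. A small sketch, assuming `ocean_proximity` holds category labels as in the chart. The toy frame is made up:

```python
import pandas as pd

# Hypothetical stand-in for `data`; the row counts are made up
data = pd.DataFrame({
    "ocean_proximity": ["<1H OCEAN", "<1H OCEAN", "INLAND", "NEAR OCEAN", "NEAR BAY"]
})

# Fraction of rows in each category, the same quantity a pie chart displays
shares = data["ocean_proximity"].value_counts(normalize=True)
print(shares)
```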

In [ ]:

from pandas.plotting import scatter_matrix

sct_features = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(data[sct_features], figsize=(12, 8))

Out[ ]:

[Figure: 4 x 4 scatter matrix of median_house_value, median_income, total_rooms and housing_median_age]

This visualisation gives us an idea of how median_house_value relates to the other attributes.
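
To put numbers on what the scatter matrix suggests, the pairwise Pearson correlations with the target can be computed. A sketch using made-up values in place of `data`:

```python
import pandas as pd

# Hypothetical stand-in for `data`; values are made up
data = pd.DataFrame({
    "median_house_value": [100000, 150000, 200000, 300000, 500000],
    "median_income": [2.0, 3.0, 4.0, 5.5, 9.0],
    "total_rooms": [900, 2500, 1200, 3000, 1500],
    "housing_median_age": [40, 20, 35, 15, 30],
})

sct_features = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]

# Pearson correlation of every selected feature with median_house_value
corr = data[sct_features].corr()["median_house_value"].sort_values(ascending=False)
print(corr)
```

A value near 1 or -1 indicates a strong linear relationship. Values near 0 suggest the scatter plot for that pair will look like an unstructured cloud.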

In [ ]:

from plotly.subplots import make_subplots
import plotly.graph_objs as go

# One row, two columns: histogram on the left, box plot on the right
fig = make_subplots(rows=1, cols=2)
fig.add_trace(go.Histogram(x=data['median_house_value']), row=1, col=1)
fig.add_trace(go.Box(y=data['median_house_value'], boxpoints='all', line_color='orange'),
              row=1, col=2)
fig.update_layout(height=500, showlegend=False,
                  title_text="Median house value distribution and Box Plot")
# Display the Plotly figure
fig.show()

In the scatter plot below, a low alpha (opacity) helps reveal density: where an area has many houses, the
overlapping circles build up into a darker region, while areas with few houses stay faint because each circle is
nearly transparent.
Besides alpha, other parameters of the plot can help you discover more patterns.

In [ ]:

data.plot(kind="scatter", x="longitude", y="latitude", alpha=0.05)


Out[ ]:

[Figure: scatter plot of longitude (x-axis) against latitude (y-axis)]
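
Beyond alpha, marker size and colour can encode further columns. The sketch below is an assumption about how that might look, not a cell from the original notebook: it sizes points by `population` and colours them by `median_house_value`, using made-up rows in place of `data`:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for `data`; values are made up
data = pd.DataFrame({
    "longitude": [-122.2, -118.3, -121.9, -117.1],
    "latitude": [37.8, 34.1, 37.3, 32.7],
    "population": [322, 2401, 1206, 1551],
    "median_house_value": [452600, 358500, 226700, 140000],
})

fig, ax = plt.subplots(figsize=(8, 6))
points = ax.scatter(
    data["longitude"], data["latitude"],
    s=data["population"] / 20,        # marker area scales with population
    c=data["median_house_value"],     # colour encodes house value
    cmap="viridis", alpha=0.4,
)
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
fig.colorbar(points, ax=ax, label="median_house_value")
```

The divisor 20 is an arbitrary scaling choice to keep the markers readable and would need tuning on the full dataset.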

In [ ]:
From the above plot we can see that both features follow a multimodal distribution, meaning we have
underlying groups in our data.

Conclusion
The analysis gives us a detailed visualisation of the median housing prices in California. We noticed that median
house prices in the major cities differ greatly from those in the inland counties. Most of the high-priced
properties are located in proximity to the ocean. The age of the houses is also striking: the analysis finds that
the median age hovers around 30-40 years.
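
Claims like these can be checked with a grouped summary. A minimal sketch, assuming the column names used earlier and made-up rows in place of `data`:

```python
import pandas as pd

# Hypothetical stand-in for `data`; values are made up
data = pd.DataFrame({
    "ocean_proximity": ["NEAR OCEAN", "NEAR OCEAN", "INLAND", "INLAND", "NEAR BAY"],
    "median_house_value": [400000, 300000, 120000, 100000, 330000],
    "housing_median_age": [35, 30, 20, 25, 45],
})

# Median house value per location category, highest first
value_by_location = (
    data.groupby("ocean_proximity")["median_house_value"]
    .median()
    .sort_values(ascending=False)
)

# Overall median age of the housing stock
overall_median_age = data["housing_median_age"].median()

print(value_by_location)
print(overall_median_age)
```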

This detailed analysis will help advertisers to deliver targeted ads.
