MiniProject BI
A REPORT ON
SUBMITTED BY
In [ ]:
missing
Out[ ]:
total_bedrooms 207
ocean_proximity 0
median_house_value 0
median_income 0
households 0
population 0
total_rooms 0
housing_median_age 0
latitude 0
longitude 0
dtype: int64
In the above output, we can clearly see that only one column (total_bedrooms) has null values, 207 in total. So
we have to fill them with a statistical value.
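A minimal sketch of how such a missing-value count could be produced and the gaps filled. The small `data` frame is a hypothetical stand-in for the housing data, and choosing the median as the fill value is an assumption; the report only states that a statistical value is used.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing DataFrame used in this report.
data = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, 235.0],
})

# Count null values per column, largest count first.
missing = data.isnull().sum().sort_values(ascending=False)
print(missing)

# Assumption: fill the gaps with the column median.
median_bedrooms = data["total_bedrooms"].median()
data["total_bedrooms"] = data["total_bedrooms"].fillna(median_bedrooms)
```

The summary statistics in this report give a mean of 537.87 against a median of 435 for total_bedrooms, which suggests a right-skewed column. In that situation the median may be a more typical fill value than the mean.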
In [ ]:
total_bedrooms
count 20433.000000
mean 537.870553
std 421.385070
min 1.000000
25% 296.000000
50% 435.000000
75% 647.000000
max 6445.000000
As we can see, only one feature (ocean_proximity) is categorical; all the other features are numerical.
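One way to check which features are categorical is to split the columns by dtype. The sketch below uses a few hypothetical rows; only the column names come from this report, and treating every non-numeric column as categorical is an assumption.

```python
import pandas as pd

# Hypothetical stand-in rows; only the column names come from the report.
data = pd.DataFrame({
    "median_income": [8.3252, 8.3014, 7.2574],
    "housing_median_age": [41.0, 21.0, 52.0],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND"],
})

# Numeric columns versus everything else (treated as categorical here).
numerical_cols = data.select_dtypes(include="number").columns.tolist()
categorical_cols = data.select_dtypes(exclude="number").columns.tolist()

print("numerical:", numerical_cols)
print("categorical:", categorical_cols)
```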
Data Visualisation
In [ ]:
import plotly.express as ex
ex.pie(data, names='ocean_proximity', title='Proportion of Locations of the house w.r.t ocean/sea')
We can see here that the majority of the houses are close to the sea.
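The proportions shown in a pie chart can also be read off numerically. This is a sketch on hypothetical labels, not the real data: the category names are assumed, and counting every category other than INLAND as "close to the sea" is an assumption as well.

```python
import pandas as pd

# Hypothetical sample of ocean_proximity labels, not the real data.
data = pd.DataFrame({
    "ocean_proximity": [
        "<1H OCEAN", "INLAND", "NEAR OCEAN", "<1H OCEAN",
        "NEAR BAY", "INLAND", "<1H OCEAN", "NEAR OCEAN",
    ]
})

# Fraction of rows in each category (the same numbers a pie chart displays).
shares = data["ocean_proximity"].value_counts(normalize=True)
print(shares)

# Assumption: every category other than INLAND counts as close to the sea.
near_water_share = 1.0 - shares.get("INLAND", 0.0)
print(near_water_share)
```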
In [ ]:
In [ ]:
A low opacity (alpha) helps you see the density: where an area has a lot of houses, the overlapping circles appear darker,
while areas with a small number of houses still appear, only fainter, because the opacity of each circle is low.
Besides alpha, other parameters of the graph can help you discover more patterns.
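The effect of opacity can be sketched with a plain matplotlib scatter plot. The coordinates below are randomly generated, hypothetical points (one dense cluster plus sparse background), and the value `alpha=0.1` is an assumption; the cell that produced the original plot is not shown in this report.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coordinates: one dense cluster plus sparse scattered points.
longitude = np.concatenate([rng.normal(-118.2, 0.2, 500), rng.uniform(-124, -114, 100)])
latitude = np.concatenate([rng.normal(34.0, 0.2, 500), rng.uniform(32, 42, 100)])

fig, ax = plt.subplots()
# Low alpha: overlapping markers in the dense cluster add up to a darker patch.
points = ax.scatter(longitude, latitude, alpha=0.1)
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
ax.set_title("Density via marker opacity (hypothetical data)")
```

Other `scatter` parameters that can be used in the same way are the marker size `s` and the marker color `c`, each of which can be mapped to a further column.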
In [ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fa417531b50>
In [ ]:
From the above plot we can see that both features follow a multimodal distribution, meaning we have
underlying groups in our data.
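What a multimodal distribution looks like in a histogram can be sketched on synthetic data. The feature below is hypothetical (two normal groups centered at 0 and 8), since the report does not show which columns were plotted; the point is only that two high-count regions separated by a low-count valley indicate separate groups.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical feature built from two groups with different centers.
feature = np.concatenate([rng.normal(0.0, 1.0, 5000), rng.normal(8.0, 1.0, 5000)])

# Histogram with 16 unit-width bins covering -4 to 12.
counts, edges = np.histogram(feature, bins=16, range=(-4.0, 12.0))

left_mode = counts[4]    # bin [0, 1), next to the first center
valley = counts[8]       # bin [4, 5), halfway between the centers
right_mode = counts[12]  # bin [8, 9), next to the second center
print(left_mode, valley, right_mode)
```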
Conclusion
The analysis gives us a detailed visualisation of the Median Housing Prices in California. We noticed that median
house prices in the major cities of California differ greatly from those in the inland counties. Most
of the high-priced properties are located in proximity to the ocean. The age of the houses is also striking:
the analysis concludes that the median age hovers around 30-40 years.