
Listing Property

This document outlines steps for analyzing a property listing dataset to help a company maximize profits. It begins with understanding the business problem, then cleaning the data by removing outliers and irrelevant values. Descriptive statistics on the price and size columns show distributions across properties. Exploratory data analysis will examine patterns in columns for luxury properties and popular areas to identify which may sell in large quantities and yield high profits.

Uploaded by

ramadhan lazrd
Copyright © All Rights Reserved

LISTING PROPERTY
BY RAMADHAN DWI YANUAR
TABLE OF CONTENTS
01 Business Understanding: We will look at the business problem first.
02 Data Cleaning: Followed by cleaning irrelevant data.
03 Descriptive Statistics: Then we look at the results of the descriptive statistics.
04 EDA: Next, we do an Exploratory Data Analysis (EDA).
05 Statistical Measurements: Finally, we look at the results of statistical measurements using correlation and regression.
06 Additional Point: Just a few extra points.
BUSINESS UNDERSTANDING
Core Business Problem:
Mrs. Wang, our Head of Data at ABC Company, has given us the task of performing an end-to-end analysis of property listings in Malaysia. The goal is to help customers find the best fit and to maximize company profit under a 20% profit-sharing mechanism.
Under this mechanism, the higher the price of a property, the higher the company's profit. However, this is not absolute: in reality, high-priced properties will not sell as often as properties with relatively lower prices. If we play with quantity, the company may earn more profit by selling moderately priced properties in large quantities. So here we will find out which properties are likely to be purchased by many customers.
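To make the price-versus-quantity trade-off concrete, here is a minimal Python sketch. It assumes that the 20% profit-sharing mechanism means the company receives 20% of each sale price. The average prices are the category averages reported later in the EDA (RM 5.3 million for luxury, RM 1 million for affordable); the unit counts are hypothetical.

```python
def company_profit(avg_price_rm: float, units_sold: int, share_pct: int = 20) -> float:
    """Company's cut, assuming it receives share_pct percent of every sale price."""
    return avg_price_rm * units_sold * share_pct / 100


# Hypothetical volumes: a few luxury sales versus many affordable sales
luxury = company_profit(5_300_000, units_sold=2)
affordable = company_profit(1_000_000, units_sold=15)
print(f"Luxury: RM {luxury:,.0f}, Affordable: RM {affordable:,.0f}")
# Luxury: RM 2,120,000, Affordable: RM 3,000,000
```

With these made-up volumes the affordable segment earns more, which is the intuition the analysis sets out to test; real sales volumes could change the conclusion.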

CHECK THE DATASET
Before we dive into the Analysis Process, it's best
to understand the original dataset first so you can
better understand the context.
You can click the link below.

Original Dataset

HOW THE DATA PROVIDED COULD HELP ANSWER THE BUSINESS PROBLEM

• Column A (Location): by analyzing this column, we can find out which location has the highest average price and estimate how much profit the company could make there.
• Column B (Price): by analyzing this column, we can find out the overall average price and estimate the company's profit.
• Columns C and D (Rooms and Bathrooms): by analyzing these columns, we can find out which properties suit customers depending on the number of their family members.
• Column E (Car Parks): by analyzing this column, we can find out which properties suit customers depending on the number of vehicles they have.
• Columns F and G (Property Type and Property Character): by analyzing these columns, we can find out which property type and property character have the highest value.
• Column H (Size): by analyzing this column, we can find out which properties have the most suitable size and price for the customer.
• Column I (Furnishing): by analyzing this column, we can find out which properties are already furnished and which are not, so we can match customer needs.
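As a minimal sketch of the first two points, the average price per location can be computed with a pandas group-by. The column names `Location` and `Price` are assumed to match the cleaned dataset, and the sample rows are illustrative only, not taken from the real data.

```python
import pandas as pd


def average_price_by(df: pd.DataFrame, group_col: str, price_col: str = "Price") -> pd.Series:
    """Mean price per group, highest first."""
    return df.groupby(group_col)[price_col].mean().sort_values(ascending=False)


# Illustrative rows only
sample = pd.DataFrame({
    "Location": ["KLCC", "KLCC", "Cheras", "Bangsar"],
    "Price": [3_000_000, 2_000_000, 600_000, 1_800_000],
})
print(average_price_by(sample, "Location"))
```

The same function works for `Property Type` or `Furnishing` by changing `group_col`.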

STATISTICS THAT CAN ANSWER THE BUSINESS PROBLEM
• We use measures of central tendency to see which properties suit each customer, based on the features they request.
• We use the minimum and maximum values to see which properties yield the lowest and highest profit for the company, adjusted for the level of sales so that the results are more reliable.

DATA CLEANING
Once we understand the business problem, we clean up data that would interfere with the rest of the analysis. The cleaning process includes formatting, deleting unnecessary values, filling in blank values, replacing incorrect values, removing outliers, and so on.

DATA CLEANING STEPS
01 Delete rows that have a null value in the Price column.
02 Remove unnecessary words in the Location and Price columns.
03 Remove outliers in the Price column.
04 Replace null values in the Rooms column with 0, and replace the value Studio with 1.
05 Replace null values in the Bathrooms and Car Parks columns with 0.
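Steps 01 to 05 can be sketched in pandas as below. This is not the author's code (the deck shows none): the column names are assumed, the Location clean-up assumes the unnecessary words are a suffix after a comma (for example ", Kuala Lumpur"), and because the deck does not state its outlier rule, step 03 simply keeps prices inside caller-supplied bounds.

```python
import pandas as pd


def clean_price_and_rooms(df: pd.DataFrame, price_bounds=None) -> pd.DataFrame:
    """Sketch of cleaning steps 01-05; column names are assumptions."""
    out = df.copy()

    # 01: drop rows with no price at all
    out = out.dropna(subset=["Price"])

    # 02: keep only digits and dots in Price (e.g. "RM 1,250,000" -> 1250000)
    out["Price"] = pd.to_numeric(
        out["Price"].astype(str).str.replace(r"[^0-9.]", "", regex=True),
        errors="coerce",
    )
    out = out.dropna(subset=["Price"])
    # 02: hypothetical Location clean-up, keep the part before the first comma
    out["Location"] = out["Location"].str.split(",").str[0].str.strip()

    # 03: outlier rule not stated in the deck; keep prices inside the given bounds
    if price_bounds is not None:
        low, high = price_bounds
        out = out[out["Price"].between(low, high)].copy()

    # 04: "Studio" counts as 1 room, missing room counts become 0
    out["Rooms"] = pd.to_numeric(
        out["Rooms"].replace("Studio", 1).fillna(0), errors="coerce"
    )

    # 05: missing bathroom and car park counts become 0
    out[["Bathrooms", "Car Parks"]] = out[["Bathrooms", "Car Parks"]].fillna(0)
    return out
```

Filling missing counts with 0 follows the deck's choice; note that it treats "unknown" as "none", which lowers the averages for those columns.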
DATA CLEANING STEPS (CONTINUED)
06 Delete rows that have a null value or a sq. m value in the Property Character column.
07 Delete rows that have a null value in the Property Type column.
08 Delete unnecessary words in the Property Type column.
09 Delete rows that have irrelevant values in the Size column.
10 Convert values that are not in sqft to sqft so they match the others, then format all numeric data with a thousands separator.
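Steps 09 and 10 can be sketched with a small parser. The size formats it accepts ("1,200 sq. ft.", "150 sq. m") are assumptions about how the raw Size column looks; anything it cannot parse is returned as None and treated as an irrelevant value. The conversion factor, 1 m² ≈ 10.7639 ft², is standard.

```python
import re
from typing import Optional

SQFT_PER_SQM = 10.7639  # 1 square metre is about 10.7639 square feet


def size_to_sqft(raw: str) -> Optional[float]:
    """Parse a size string such as '1,200 sq. ft.' or '150 sq. m' into square feet.

    Returns None when no number-plus-unit pattern is found, i.e. an
    irrelevant value that step 09 would drop.
    """
    match = re.search(
        r"(\d[\d,]*(?:\.\d+)?)\s*(sq\.?\s*ft|sq\.?\s*m)", raw, flags=re.IGNORECASE
    )
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    unit = re.sub(r"[\s.]", "", match.group(2).lower())  # "sqft" or "sqm"
    return value if unit == "sqft" else value * SQFT_PER_SQM


def with_thousands_separator(x: float) -> str:
    """Step 10 display formatting, e.g. 246898 -> '246,898'."""
    return f"{x:,.0f}"
```

Formatting with a separator is best kept for display only; the numeric column itself should stay numeric so later statistics still work.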
DESCRIPTIVE STATISTICS
In this analysis, the key variables to examine are the Price column and the Size column. Referring back to the business problem, the company wants to know which properties will provide high profits for the company and which properties best fit its customers, and the Price and Size columns answer both questions.
STATISTICAL MEASUREMENTS OF DATA DISTRIBUTION (PRICE COLUMN)

The smallest (cheapest) value is RM 1,150 and the largest (most expensive) is RM 130 million. This wide price range makes it easier for customers to choose a property that fits their budget and needs. The average price across all data is RM 2.1 million, while the most common price (the mode) is RM 1.2 million.
The distribution is positively skewed: the median and mode are both smaller than the mean, which shows that there are more properties with relatively low prices than properties with high prices.
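The measures quoted here (and for the Size column on the next slide) can be computed with the minimal pandas sketch below. The deck infers positive skew from the median and mode lying below the mean; the sketch reproduces that check and also reports pandas' sample skewness as a cross-check. Applied to the real data it would be called as `describe_distribution(df["Price"])`, where `df` is an assumed name for the cleaned frame.

```python
import pandas as pd


def describe_distribution(values: pd.Series) -> dict:
    """Range, central tendency and skewness for one numeric column."""
    return {
        "min": values.min(),
        "max": values.max(),
        "mean": values.mean(),
        "median": values.median(),
        "mode": values.mode().iloc[0],  # first mode if there are ties
        "skew": values.skew(),  # > 0 means a long right tail
    }


def is_right_skewed(stats: dict) -> bool:
    """The deck's check: median and mode both below the mean."""
    return stats["median"] < stats["mean"] and stats["mode"] < stats["mean"]
```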

STATISTICAL MEASUREMENTS OF DATA DISTRIBUTION (SIZE COLUMN)

The smallest value is 17 sqft and the largest is 790,000 sqft. This wide variety of sizes makes it easier for customers to choose properties according to their needs. The average size across all data is 2,849 sqft, while the most common size (the mode) is 1,650 sqft.
The distribution is positively skewed: the median and mode are both smaller than the mean, which shows that there are more properties with relatively small sizes than properties with large sizes.

EXPLORATORY DATA ANALYSIS
After running descriptive statistics on the key columns, we continue by looking for patterns, anomalies, or other interesting things in the other columns through Exploratory Data Analysis (EDA). Our EDA is divided into two parts: properties in the Luxury category and properties in the Affordable category.

QUARTILE
To divide the data into these two categories, we use the Price column. Properties priced from Q3 up to Q4 (the maximum) are considered Luxury Properties, and properties priced from Q1 up to Q2 (the median) are considered Affordable Properties.
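The quartile split can be sketched in pandas as follows. It reads "Q3-Q4" as "from the third quartile up to the maximum" and "Q1-Q2" as "from the first quartile up to the median", with both boundaries inclusive; that reading is an interpretation, consistent with the later slides placing the cheapest luxury unit at Q3 and the cheapest affordable unit at Q1. Prices below Q1 and between Q2 and Q3 fall into neither category.

```python
import pandas as pd


def split_by_price_quartile(df: pd.DataFrame, price_col: str = "Price"):
    """Luxury = Q3 up to the maximum; Affordable = Q1 up to the median (Q2)."""
    q1, q2, q3 = df[price_col].quantile([0.25, 0.50, 0.75])
    luxury = df[df[price_col] >= q3]
    affordable = df[df[price_col].between(q1, q2)]  # inclusive on both ends
    return luxury, affordable
```

With prices 1 to 8, pandas' default linear interpolation gives Q1 = 2.75, Q2 = 4.5 and Q3 = 6.25, so luxury holds 7 and 8 while affordable holds 3 and 4.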

Next, we look at some interesting insights from each column for the two categories above, so that we can see the characteristics of each category.

LUXURY PROPERTY

From the cumulative totals per location, the top 5 locations, which together make up more than 80% of the luxury category, are KLCC, Mont Kiara, Desa ParkCity, Bangsar, and Damansara Heights.

The location with the most luxury units is Mont Kiara, with 210 units at an average price of RM 4 million.
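The "top categories covering 80%" reading can be sketched as a cumulative-share (Pareto) calculation. This sketch assumes the share is measured by number of listings; if the deck's chart accumulated prices instead, `value_counts()` would be replaced by a group-by sum of `Price`. The same function applies to the property-type breakdown with `col="Property Type"`.

```python
import pandas as pd


def top_categories_by_cumulative_share(
    df: pd.DataFrame, col: str, threshold: float = 0.80
) -> pd.Series:
    """Most frequent categories, up to the one where the cumulative share of
    units first reaches threshold."""
    counts = df[col].value_counts()  # unit count per category, largest first
    cum_share = counts.cumsum() / counts.sum()
    n_needed = int((cum_share < threshold).sum()) + 1
    return counts.head(n_needed)
```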

From the cumulative sum, 80% of the properties in the luxury category are of the Condominium, Bungalow, Serviced Residence, and Semi-Detached House types.

Condominiums are the most common property type in the luxury category, with a total of 363 units and an average price of RM 4.2 million. Residential Land is interesting because its average price, RM 17 million, is very high compared to other types. The least common type is Townhouse, with only 2 units at an average price of RM 3.2 million.

LUXURY PROPERTY MIN PRICE
These are some of the locations that have luxury properties at relatively low prices (right at Q3).

The table above shows 20 locations and 6 property types where prices sit at the bottom of the luxury category, at RM 2.5 million.

LUXURY PROPERTY MAX PRICE
Here we can see the locations and property types in the luxury category with the highest prices.

From the table above, the most expensive units are located in Pantai and Brickfields and are of the Residential Land and Bungalow types. The most striking point is that the highest price, RM 130 million, is far above the lowest price in the luxury category, RM 2.5 million.
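The min-price and max-price tables list the rows at a category's price boundary. A minimal sketch, assuming columns named `Location` and `Property Type`: counting distinct values among those rows is one way figures like "20 locations and 6 types" could be reproduced, though the deck does not show its own method. It applies unchanged to the affordable category.

```python
import pandas as pd


def rows_at_price_extreme(df: pd.DataFrame, which: str = "max", price_col: str = "Price"):
    """Rows exactly at the min or max price, plus the number of distinct
    locations and property types among them."""
    target = df[price_col].max() if which == "max" else df[price_col].min()
    rows = df[df[price_col] == target]
    return rows, rows["Location"].nunique(), rows["Property Type"].nunique()
```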

AFFORDABLE PROPERTY
From the cumulative totals per location, the top 5 locations, which together make up more than 80% of the affordable category, are KLCC, Mont Kiara, Desa ParkCity, Bukit Jalil, and Cheras.

The location with the most affordable units is KLCC, with 272 units at an average price of RM 1 million.

From the cumulative sum, 80% of the properties in the Affordable category are of the Condominium, Serviced Residence, and Detached House types.

Condominiums are the most common property type in the Affordable category, with a total of 1,652 units and an average price of RM 1 million. Condominiums alone make up more than half of all units in this category, a very large share compared to the other types, some of which have only 4 units (0.16% of the category).

AFFORDABLE PROPERTY MIN PRICE
These are some of the locations with the cheapest unit prices in this category (right at Q1). The unit types with the lowest prices (right at Q1) are Condominium, Serviced Residence, and Terrace.

From the two tables above, the units with the lowest price in the affordable category, RM 720 thousand, are found in 12 locations and 3 unit types.

AFFORDABLE PROPERTY MAX PRICE
Here we can see the locations and property types in the affordable category with the highest prices.

From the table above, the most expensive affordable units are found in the 16 locations and 5 property types shown. The highest price in this category is RM 1.3 million.

LUXURY PROPERTY VS AFFORDABLE PROPERTY
CHARACTERISTICS

Comparing the two tables above, we can see the different characteristics of Luxury Property and Affordable Property:
• Average price: RM 5.3 million for luxury versus RM 1 million for affordable.
• Average number of rooms: 5 for luxury versus 3 for affordable.
• Average number of bathrooms: 4 for luxury versus only 2 for affordable.
• Average number of car parks: no significant difference, as both categories average 1 car park.
• Average size: 5,883 sq.ft for luxury versus only 1,533 sq.ft for affordable.
Luxury Property:
• Price between RM 2.5 million and RM 130 million.
• Size between 815 sq.ft and 246,898 sq.ft.
• An average of 5 rooms, 4 bathrooms, and 1 car park.
Affordable Property:
• Price between RM 720,000 and RM 1.3 million.
• Size between 473 sq.ft and 100,000 sq.ft.
• An average of 3 rooms, 2 bathrooms, and 1 car park.
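The side-by-side summary can be produced with one pandas group-by using named aggregation. This is a sketch assuming the two category frames come from the quartile split and use the column names shown; the deck's averages are rounded, so the raw means would need rounding to match.

```python
import pandas as pd


def compare_categories(luxury: pd.DataFrame, affordable: pd.DataFrame) -> pd.DataFrame:
    """Price/size ranges and average counts for each category, side by side."""
    both = pd.concat(
        [luxury.assign(Category="Luxury"), affordable.assign(Category="Affordable")]
    )
    return both.groupby("Category").agg(
        min_price=("Price", "min"),
        max_price=("Price", "max"),
        min_size=("Size", "min"),
        max_size=("Size", "max"),
        avg_rooms=("Rooms", "mean"),
        avg_bathrooms=("Bathrooms", "mean"),
        avg_car_parks=("Car Parks", "mean"),
    )
```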
STATISTICAL MEASUREMENTS
After finding some interesting patterns in the EDA process and deriving the characteristics of each property category, which will be useful when recommending suitable units to our customers, we now carry out some more advanced statistical calculations that will be useful for the company, followed by unit selection according to the request of one of our customers. In this process we do not use all of the data, only the property data for Desa ParkCity.
CORRELATION ANALYSIS

• In the correlation matrix, Size has the strongest relationship with property price (a strong positive correlation of 0.79).
• Car Parks has the weakest relationship with property price of all the variables, although it is still a strong positive correlation.
• Multicollinearity is present, because some pairs of independent variables have a strong correlation above 0.7: Bathrooms and Rooms (0.82), and Bathrooms and Size (0.71). Therefore, one variable from each such pair needs to be left out of the model, which regression analysis can help decide.
• In this case, we chose to perform a stepwise regression analysis, starting with all independent variables included.
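A minimal pandas sketch of the multicollinearity check: it builds a correlation matrix (the deck does not name the method; Pearson, the pandas default, is assumed) and lists every pair of independent variables above the 0.7 threshold used here. The toy data in the test are made up.

```python
import itertools

import pandas as pd


def collinear_pairs(df: pd.DataFrame, features: list, threshold: float = 0.7) -> list:
    """Pairs of independent variables whose |Pearson r| exceeds threshold."""
    corr = df[features].corr()  # Pearson correlation matrix
    return [
        (a, b, round(float(corr.loc[a, b]), 2))
        for a, b in itertools.combinations(features, 2)
        if abs(corr.loc[a, b]) > threshold
    ]
```

Each variable's correlation with the target can be read from the same matrix, for example `df[features + ["Price"]].corr()["Price"]`.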
REGRESSION ANALYSIS
Hypothesis Testing
• H0: There is no linear relationship between the dependent and independent variable.
• H1: There is a linear relationship between the dependent and independent variable.
Criteria of Testing
Reject H0 if p-value <= 0.05; do not reject H0 if p-value > 0.05.
Partial Test
• p-value < 0.05: Bathrooms, Car Parks, and Size (reject H0).
• p-value > 0.05: Rooms (do not reject H0).

Summary Output
• The adjusted R-square of the linear regression model is 95.42%, which indicates a good regression model.
• The model has a Significance F of 0 (< 5%), but the Rooms variable has a p-value > 0.05, which means it does not have a significant linear relationship with price. Therefore, we remove Rooms and continue the stepwise regression.
• The constant (intercept) is set to zero because some independent variables have a p-value > 5%.
• Equation: Price = 0 + 36314.99144*Rooms + 169673.9105*Bathrooms + 121598.7239*Car Parks + 456.7218771*Size

REGRESSION ANALYSIS (AFTER DROPPING ROOMS)
Summary Output
• The adjusted R-square increased slightly to 95.43%, which means the regression model is still good.
• The model has a Significance F of 0 (< 5%). The constant (intercept) is set to zero because the model makes more sense with the independent variables alone.
• Bathrooms has the biggest impact on the price of the property.
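Both regressions above were fitted with the constant forced to zero. A minimal NumPy sketch of that kind of fit is below: it returns the coefficients and the uncentered R-square that is conventional for a through-the-origin model. It does not reproduce the p-values or the stepwise procedure; dropping Rooms corresponds to calling it again without that column. The synthetic data in the test reuse the deck's final coefficients only to check that the fit recovers them.

```python
import numpy as np


def fit_through_origin(X, y):
    """Least-squares fit of y = X @ b with the intercept fixed at zero.

    Returns (coefficients, uncentered R-squared).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # With no constant term, R^2 is measured against zero rather than the mean of y
    r2 = 1.0 - float(resid @ resid) / float(y @ y)
    return coef, r2
```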

ASSUMPTION CHECK
Here we check some of the regression output to see whether the model we created can be considered effective for our analysis.

R-Squared: 0.96
Significance F: 0
P-values: Bathrooms 0.00, Car Parks 0.00, Size 0.00

Looking at the R-Square and Significance F values above, we can assume the model is good: the 3 independent variables explain 96% of the variation in price. In addition, Significance F = 0, which is below alpha = 0.05 (5%), further supports that the model is good.
Based on the p-values above, all three independent variables are significant for the dependent variable, which means Bathrooms, Car Parks and Size have a significant effect on the increase or decrease in property price.
Price Offer for Customer and Recommendation
Having found a sufficiently strong regression model, we now calculate the price of a unit based on the wishes of one of our customers, and then recommend which unit is suitable for the customer to buy.

Price offer for XX
A customer needs a property with 3 rooms, 4 bathrooms, 3 car parks, and a size of 2,200 sq ft, so we input these values into the formula below.

Price = 0 + 190019.0*Bathrooms + 139184.8*Car Parks + 469.8*Size
Price = 0 + 190019.0*4 + 139184.8*3 + 469.8*2200
Price = 2211190.4

Note: I do not include the Rooms variable here, because this calculation uses the final model, from which Rooms was dropped.

The coefficients used above come from the coefficient table on the left, which reflects the final results of our regression model.
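The arithmetic of the price offer can be checked with a few lines of Python. The coefficients are the ones on this slide; the function and constant names are illustrative.

```python
# Final-model coefficients quoted on this slide (intercept fixed at zero)
COEFFICIENTS = {"Bathrooms": 190019.0, "Car Parks": 139184.8, "Size": 469.8}


def offer_price(bathrooms: int, car_parks: int, size_sqft: float) -> float:
    """Price = 0 + b1*Bathrooms + b2*Car Parks + b3*Size (Rooms was dropped)."""
    return (
        COEFFICIENTS["Bathrooms"] * bathrooms
        + COEFFICIENTS["Car Parks"] * car_parks
        + COEFFICIENTS["Size"] * size_sqft
    )


price = offer_price(bathrooms=4, car_parks=3, size_sqft=2200)
print(f"RM {price:,.1f}")  # RM 2,211,190.4
```

The three terms are RM 760,076.0 for bathrooms, RM 417,554.4 for car parks and RM 1,033,560.0 for size.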
Recommendation
Based on the customer's needs and the results of our regression formula, we recommend property number 30 in the dataset (Desa ParkCity), whose values are shown below.

Price (RM)   Rooms   Bathrooms   Car Parks   Size (sq ft)
2,200,000    5       4           3           2,200

This is the unit that suits the customer's request, as calculated with the regression model we created.
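The deck does not spell out how unit number 30 was picked, so the sketch below uses a hypothetical rule: keep units that meet at least the requested rooms, bathrooms and car parks, then choose the one closest to the requested size and to the model's offer price (each measured as a relative difference). Column names are assumed to match the cleaned Desa ParkCity data; the sample units are made up.

```python
import pandas as pd


def recommend_unit(units: pd.DataFrame, rooms: int, bathrooms: int, car_parks: int,
                   size: float, target_price: float) -> pd.DataFrame:
    """Hypothetical rule: meet the minimum counts, then minimise the relative
    distance to the requested size plus the relative distance to the model price."""
    ok = units[
        (units["Rooms"] >= rooms)
        & (units["Bathrooms"] >= bathrooms)
        & (units["Car Parks"] >= car_parks)
    ]
    score = (ok["Size"] - size).abs() / size + (ok["Price"] - target_price).abs() / target_price
    return ok.assign(score=score).sort_values("score").head(1)
```

An empty result means no unit satisfies the minimum counts, in which case the requirements would need relaxing.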

ADDITIONAL POINT
Referring back to the company's business problem, namely the profit-sharing mechanism, our recommendation to the company is to use the regression model we created to select units that match customer requests, which should help maximize company profits. It is much better to sell affordable units in large quantities and continuously than to sell luxury units only a few times.

Final Data sheet


REAL ESTATE LISTINGS PRESENTATION
Thanks
If you have any questions, feel free
to contact me!

EMAIL ADDRESS
[email protected]
