Listing Property
BY RAMADHAN DWI YANUAR
TABLE OF CONTENTS
01 Business Understanding: We will look at the business problem first.
02 Data Cleaning: Followed by cleaning irrelevant data.
03 Descriptive Statistics: Then we see the results of its descriptive statistics.
04 EDA: Next, we do an Exploratory Data Analysis (EDA).
05 Statistical Measurements: Finally, we look at the results of statistical measurements using correlation and regression.
06 Additional Point: Just a few extra points.
BUSINESS UNDERSTANDING
Core Business Problem:
Mrs. Wang, our Head of Data at ABC Company, has asked us to perform an
end-to-end analysis of property listings in Malaysia. The goal is to help
customers find the best-fit property and to maximize company profit
through a 20% profit-sharing mechanism.
Under that mechanism, the higher the price of a property, the higher the
company's profit on it. This is not absolute, however, because in practice
high-priced properties sell less often than relatively lower-priced ones.
In terms of volume, the company can earn more by selling moderately
priced properties in large quantities, so here we will find out which
properties are likely to be purchased by many customers.
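The price-versus-volume trade-off can be made concrete with a small sketch. It assumes the company earns 20% of each sale price, which is only our reading of the profit-sharing mechanism; the prices and unit counts are made up for illustration.

```python
# Hypothetical illustration of the price-versus-volume trade-off.
# Assumption: the company earns 20% of each sale price; the prices and
# unit counts below are invented for the example.
PROFIT_SHARE = 0.20


def company_profit(price_rm: float, units_sold: int) -> float:
    """Company profit in RM for a number of units sold at one price."""
    return PROFIT_SHARE * price_rm * units_sold


few_expensive = company_profit(price_rm=5_000_000, units_sold=2)
many_cheaper = company_profit(price_rm=1_000_000, units_sold=15)

print(f"2 units at RM 5.0 million : RM {few_expensive:,.0f}")
print(f"15 units at RM 1.0 million: RM {many_cheaper:,.0f}")
```

With these invented numbers the lower-priced, higher-volume case earns more, which is the situation the analysis is looking for.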
CHECK THE DATASET
Before we dive into the analysis process, it is best
to look at the original dataset first so that the
context is clear.
You can click the link below.
Original Dataset
HOW THE PROVIDED DATA COULD HELP ANSWER THE BUSINESS PROBLEM
Column A has Location. By analyzing this column, we can find out which location has the
highest average price to determine how much our company's profit is.
Column B has Prices. By analyzing this column, we can find out the overall average price to
determine how much our company's profit is.
Columns C and D have Rooms and Bathrooms. By analyzing these columns, we can find out
which properties suit customers depending on the number of their family members.
Column E has Car Parks. By analyzing this column, we can find out which properties suit
customers depending on the number of vehicles they have.
Columns F and G have Property Type and Property Character. By analyzing these
columns, we can find out which property type and property character have the highest value.
Column H has Size. By analyzing this column, we can find out which properties
have the most suitable area and price for the customer.
Column I has Furnishing. By analyzing this column, we can find out which properties already
have furniture and which do not, so we can adjust to customer needs.
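As a minimal sketch of the per-column analysis described above, the snippet below groups listings by one column and averages the price. The rows and the field names `location` and `price` are hypothetical stand-ins, not values from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the field names mirror the Location and
# Price columns described above.
listings = [
    {"location": "KLCC", "price": 1_200_000},
    {"location": "KLCC", "price": 2_800_000},
    {"location": "Mont Kiara", "price": 1_500_000},
    {"location": "Mont Kiara", "price": 900_000},
    {"location": "Bangsar", "price": 3_000_000},
]


def average_price_by(rows, key):
    """Group rows by `key` and return {group: mean price}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["price"])
    return {group: mean(prices) for group, prices in groups.items()}


avg_by_location = average_price_by(listings, "location")
ranked = sorted(avg_by_location.items(), key=lambda kv: kv[1], reverse=True)
for location, avg in ranked:
    print(f"{location:<12} RM {avg:,.0f}")
```

The same helper works for any categorical column, for example property type or furnishing, by changing the `key` argument.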
STATISTICS THAT CAN ANSWER BUSINESS PROBLEMS
DATA CLEANING
Now that we know the business problem, we clean
up the data that would interfere with the rest of
the analysis. The cleaning process includes
formatting, deleting unnecessary values, filling in
blank values, replacing incorrect values, removing
outliers, etc.
DATA CLEANING STEPS
01 Remove unnecessary words in the Location and Price columns.
02 Remove outliers in the Price column.
03 Delete data that has a null value in the Price column.
04 Replace null values in the Rooms column with the number 0, and replace the value Studio with 1.
05 Replace null values in the Bathrooms and Car Parks columns with 0.
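The cleaning rules for Price, Rooms, Bathrooms, and Car Parks can be sketched as below. The raw text formats (such as `RM 1,250,000`) are assumptions about the dataset, and the 1.5 * IQR rule is a common outlier rule swapped in for illustration; the slides do not say which outlier method was actually used.

```python
import re
from statistics import quantiles


def parse_price(raw):
    """'RM 1,250,000' -> 1250000; None or text without digits -> None."""
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)  # assumes whole-number prices
    return int(digits) if digits else None


def parse_rooms(raw):
    """Null -> 0, 'Studio' -> 1, otherwise the plain integer in the cell."""
    if raw is None:
        return 0
    if raw.strip().lower() == "studio":
        return 1
    return int(raw)


def zero_if_null(raw):
    """Null Bathrooms / Car Parks values become 0."""
    return 0 if raw is None else int(raw)


def remove_price_outliers(rows):
    """Keep rows within 1.5 * IQR of the price quartiles (assumed rule)."""
    q1, _, q3 = quantiles([r["price"] for r in rows], n=4)
    spread = 1.5 * (q3 - q1)
    return [r for r in rows if q1 - spread <= r["price"] <= q3 + spread]


# Hypothetical raw rows: (price, rooms, bathrooms, car parks)
raw_rows = [
    ("RM 800,000", "2", "1", None),
    ("RM 900,000", "Studio", None, "1"),
    ("RM 980,000", None, "2", "1"),
    (None, "3", "2", "2"),              # null price: dropped
    ("RM 1,100,000", "3", "2", "1"),
    ("RM 1,250,000", "4", "3", "2"),
    ("RM 50,000,000", "6", "6", "4"),   # far from the rest: outlier
]

parsed = [
    {"price": parse_price(p), "rooms": parse_rooms(r),
     "bathrooms": zero_if_null(b), "car_parks": zero_if_null(c)}
    for p, r, b, c in raw_rows
]
with_price = [row for row in parsed if row["price"] is not None]
cleaned = remove_price_outliers(with_price)
print(len(raw_rows), "raw rows ->", len(cleaned), "cleaned rows")
```

Rows with a null price have to be dropped before the quartiles are computed, since the outlier rule needs numeric prices.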
06 Delete data that has a null value in the Property Type column.
07 Delete unnecessary words in the Property Type column.
08 Delete data that has a null value or a sq. m value in the Property Character column.
09 Delete data that has irrelevant values in the Size column.
10 Convert data that has a non-sqft value to sqft so that it is the same as the others, then format all numeric data with a thousands separator.
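A minimal sketch of the Size clean-up follows. The raw strings (`1,335 sq. ft.`, `100 sq. m.`) are assumed formats, and the helper names are hypothetical; the conversion uses the standard factor of about 10.7639 sq. ft. per square metre.

```python
import re

SQFT_PER_SQM = 10.7639  # 1 square metre is roughly 10.7639 square feet

# A number (optionally with thousands separators), then "sq ft" or "sq m".
SIZE_PATTERN = re.compile(
    r"(\d[\d,]*(?:\.\d+)?)\s*sq\.?\s*(ft|m)\b", re.IGNORECASE
)


def parse_size_sqft(raw):
    """Return the size in sq. ft., or None when the text is not usable."""
    if raw is None:
        return None
    match = SIZE_PATTERN.search(raw)
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    unit = match.group(2).lower()
    return value * SQFT_PER_SQM if unit == "m" else value


def with_thousands_separator(value):
    """Format a number with a thousands separator, e.g. 1335.0 -> '1,335'."""
    return f"{value:,.0f}"


for raw in ["1,335 sq. ft.", "100 sq. m.", "N/A"]:
    size = parse_size_sqft(raw)
    shown = "dropped" if size is None else with_thousands_separator(size)
    print(raw, "->", shown)
```

Values that do not match the pattern come back as `None`, so the caller can delete those rows as irrelevant.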
DESCRIPTIVE
STATISTICS
In this analysis, the key variables that must be
examined are the Price column and the Size column.
Referring back to the business problem, the
company wants to know which properties will
provide high profits and which properties are the
best fit for customers, so the Price column and
Size column will be the answer.
STATISTICAL MEASUREMENT TO KNOW DATA DISTRIBUTION - (PRICE COLUMN)
STATISTICAL MEASUREMENT TO KNOW DATA DISTRIBUTION - (SIZE COLUMN)
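The distribution measures for a numeric column such as Price or Size can be sketched with Python's standard library. The price values here are hypothetical, and the exact set of measures shown on the slides is not known, so this is only one reasonable selection.

```python
from statistics import mean, median, quantiles, stdev


def describe(values):
    """Summary statistics for one numeric column."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "std": stdev(values),   # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical prices in RM
prices = [720_000, 850_000, 1_000_000, 1_300_000, 2_500_000]
summary = describe(prices)
for name, value in summary.items():
    print(f"{name:<6} {value:,.0f}")
```

A mean well above the median, as in this made-up sample, is a quick sign that a few expensive units are pulling the distribution to the right.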
EXPLORATORY DATA
ANALYSIS
After some descriptive statistics on the key
columns, we continue by looking for patterns,
anomalies, or other interesting things in the
other columns through Exploratory Data
Analysis (EDA). This time, our EDA is divided
into two parts: properties in the Luxury
category and properties in the Affordable
category.
Quartile
Next, we look at some interesting insights from each column for the two
categories above, so we can see the characteristics of each category.
From the cumulative results of property prices per location, the top 5 locations, which together
account for more than 80% of all properties, are KLCC, Mont Kiara, Desa ParkCity, Bangsar, and
Damansara Heights.
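A cumulative share like this can be computed as in the sketch below. The location names and figures are hypothetical placeholders, and the same function works whether the measure is a unit count or a summed price.

```python
from itertools import accumulate


def cumulative_share(totals):
    """Return [(name, value, cumulative share)], largest value first."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    grand_total = sum(totals.values())
    running = accumulate(value for _, value in ranked)
    return [
        (name, value, cum / grand_total)
        for (name, value), cum in zip(ranked, running)
    ]


# Hypothetical totals per location
totals = {
    "Location A": 50,
    "Location B": 30,
    "Location C": 10,
    "Location D": 6,
    "Location E": 4,
}

shares = cumulative_share(totals)
for name, value, share in shares:
    print(f"{name:<11} {value:>3} {share:6.0%}")
```

Reading down the last column shows how many of the top locations are needed to pass a given share, such as 80%.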
From here we can also see that the location with the highest number of properties is Mont
Kiara, with 210 units at an average price of RM 4 million.
From this we can also see that Condominiums are the most common property type in the luxury category, with
a total of 363 units and an average price of RM 4.2 million. What is interesting about the Residential
Land type is its very high average price compared to other types, at RM 17 million. The least common
type is the Townhouse, with only 2 units at an average price of RM 3.2 million.
We can see from the table above that the lowest-priced units that still fall in the luxury
category, at RM 2.5 million, are found in 20 locations and 6 property types.
From the table above we can see that the most expensive units are located in Pantai and
Brickfields and are of the Residential Land and Bungalow types. Most interesting is that the
highest price is RM 130 million, which is very far from the lowest price in this luxury
category, RM 2.5 million.
From here we can also see that the location with the highest number of properties is KLCC, with 272
units at an average price of RM 1 million.
From this we can also see that Condominiums are the most common property type in the
Affordable category, with a total of 1,652 units and an average price of RM 1 million.
Condominiums alone reach 50% of the total units, which is very large compared to other
types; some types have only 4 units, or 0.16% of the total of this category.
We can see from the two tables above that the lowest-priced units in the affordable category,
at RM 720 thousand, are found in 12 locations and 3 property types.
From the table above we can see that the most expensive units are found in the 16 locations
and 5 types listed above. The highest price in this category is RM 1.3 million.
CHARACTERISTICS
[Tables: Luxury Property, Affordable Property]
If we look at the two tables above, we can see how the characteristics of Luxury Property and
Affordable Property differ:
• Average price: RM 5.3 million for luxury, RM 1 million for affordable.
• Average number of rooms: 5 for luxury, 3 for affordable.
• Average number of bathrooms: 4 for luxury, only 2 for affordable.
• Average number of car parks: no significant difference, as both categories average 1.
• Average size: 5,883 sq.ft for luxury, only 1,533 sq.ft for affordable.
LUXURY PROPERTY
• Has a price between RM 2.5 Million and RM 130 Million.
• Has a size between 815 sq.ft and 246,898 sq.ft.
• Has an average of 5 rooms, 4 bathrooms, and 1 car park.
AFFORDABLE PROPERTY
• Has a price between RM 720,000 and RM 1.3 Million.
• Has a size between 473 sq.ft and 100,000 sq.ft.
• Has an average of 3 rooms, 2 bathrooms, and 1 car park.
STATISTICAL
MEASUREMENTS
After finding some interesting things in the previous EDA process and
producing the characteristics of each property category, which will be
useful when recommending suitable units to our customers, we now
carry out some more advanced statistical calculations that will be
useful for the company, and then continue with unit selection
according to the request of one of our customers. In this process we
do not use all the data, only the property data for Desa ParkCity.
CORRELATION
ANALYSIS
• In the correlation matrix table, Size has the strongest relationship with property
price (strong positive correlation, 0.79).
• Car Parks has the weakest relationship with property price compared to the other
variables, although it is still a strong positive correlation.
• Multicollinearity occurs because some independent variables are strongly correlated
with each other (more than 0.7), namely Bathrooms and Rooms (0.82) and Bathrooms
and Size (0.71). One variable from each such pair would therefore normally be left
out of the regression model.
• In this case, however, we chose to perform a stepwise regression analysis, including
all independent variables first.
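The multicollinearity check can be sketched as a pairwise Pearson correlation over the independent variables, flagging any pair above the 0.7 threshold. The column values below are hypothetical and will not reproduce the correlations reported for the Desa ParkCity data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


THRESHOLD = 0.7  # pairs above this are treated as strongly correlated

# Hypothetical predictor columns
predictors = {
    "rooms": [3, 4, 5, 4, 6],
    "bathrooms": [2, 3, 4, 3, 5],
    "car_parks": [1, 2, 1, 2, 1],
}

flagged = []
for a, b in combinations(predictors, 2):
    r = pearson(predictors[a], predictors[b])
    if abs(r) > THRESHOLD:
        flagged.append((a, b, round(r, 2)))

print(flagged)
```

Each flagged pair is a candidate for dropping one of its two variables before fitting the regression.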
REGRESSION
ANALYSIS
Insight:
Criteria of Testing
Reject H0 if p-value <= 0.05, fail to reject H0 if p-value > 0.05
Hypothesis Testing
• H0: No linear relationship between dependent and independent variable
• H1: Linear relationship between dependent and independent variable
Partial Test
p-value < 0.05 = Bathrooms, Car Parks, and Size (Reject H0)
p-value > 0.05 = Rooms (Fail to reject H0)
Summary output
• The adjusted R-square of the linear regression model is 95.42%, which means we have a good regression model.
• The regression model has a Significance F (0) < 5%, but the Rooms variable has a p-value > 0.05, which means it does not have a
significant linear relationship with price. Therefore, we need to remove the independent variable Rooms and perform stepwise regression.
• The constant (intercept) is set to zero because there are independent variables that have a p-value > 5%.
• Equation: Price = 0 + (36314.99144 * rooms) + (169673.9105 * bathrooms) + (121598.7239 * car parks) + (456.7218771 * size)
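The equation above can be evaluated with a few lines of code. This sketch uses the coefficients of this first model, which still includes Rooms; the coefficients of the later model without Rooms are not given in the slides. The input values are illustrative.

```python
# Coefficients copied from the first regression equation (intercept = 0).
COEFFICIENTS = {
    "rooms": 36314.99144,
    "bathrooms": 169673.9105,
    "car_parks": 121598.7239,
    "size": 456.7218771,   # per sq. ft.
}
INTERCEPT = 0.0


def predict_price(rooms, bathrooms, car_parks, size):
    """Estimated price in RM from the linear regression equation."""
    features = {
        "rooms": rooms,
        "bathrooms": bathrooms,
        "car_parks": car_parks,
        "size": size,
    }
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )


# Illustrative inputs: 3 rooms, 4 bathrooms, 3 car parks, 2,200 sq. ft.
price = predict_price(rooms=3, bathrooms=4, car_parks=3, size=2200)
print(f"Estimated price: RM {price:,.0f}")
```

For these inputs the estimate comes to roughly RM 2.16 million, with the size term contributing the largest share.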
Insight:
The criteria of testing and the hypotheses are the same as in the first regression. With Rooms removed, the partial test
gives p-value < 0.05 for Bathrooms, Car Parks, and Size (Reject H0).
Summary output
• The adjusted R-square value increased slightly to 95.43%, which means we still have a good regression model.
• The regression model has a Significance F (0) < 5%.
• The constant (intercept) is set to zero because the independent variables in the model make more sense that way.
• Bathrooms has the biggest impact on the price of the property.
ASSUMPTION CHECK
Here we check some of the regression output to see whether the regression model we have created is
effective enough for our analysis.
R-Squared: 0.96
Significance F: 0
P-value: Bathrooms 0.00, Car Parks 0.00, Size 0.00
Looking at the R-Square and Significance F values above, we can conclude that the model is good:
with the 3 independent variables, we can explain 96% of the variation in the price variable. The
Significance F of 0, which is below alpha = 0.05 (5%), further supports that our model is good.
Based on the p-values above, we can conclude that all three independent variables are significant
for the dependent variable, which means bathrooms, car parks, and size have a significant effect on
the increase or decrease in the price of the property.
Price Offer for Customer and Recommendation
Price offer for XX
A customer needs a property with the following requirements: 3 rooms, 4 bathrooms, 3 car parks, and a size of
2,200 sq. ft., so we can input these values into the formula below.
2200000 5 4 3 2200
ADDITIONAL POINT
Referring back to our company's business problem,
namely the profit-sharing mechanism, our
recommendation is that the company use the
regression model we have created to select units
according to customer requests, because the model
has been adjusted to help maximize company
profits. It would be much better to sell affordable
units continuously and in large quantities than to
sell luxury units only a few times.