Image-Based Appraisal of Real Estate Properties


IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 19, NO. 12, DECEMBER 2017

Quanzeng You, Ran Pang, Liangliang Cao, and Jiebo Luo, Fellow, IEEE

Abstract—Real estate appraisal, which is the process of
estimating the price for real estate properties, is crucial for both
buyers and sellers as the basis for negotiation and transaction.
Traditionally, the repeat sales model has been widely adopted
to estimate real estate prices. However, it depends on the design
and calculation of a complex economic-related index, which is
challenging to estimate accurately. Today, real estate brokers
provide easy access to detailed online information on real estate
properties to their clients. We are interested in estimating the
real estate price from these large amounts of easily accessed data.
In particular, we analyze the predictive power of online house
pictures, which are one of the key factors when online users decide
whether to visit a property. The development of robust computer
vision algorithms makes the analysis of visual content possible.
In this paper, we employ a recurrent neural network to predict
real estate prices using the state-of-the-art visual features. The
experimental results indicate that our model outperforms several
other state-of-the-art baseline algorithms in terms of both mean
absolute error and mean absolute percentage error.
Index Terms—Deep neural networks, real estate, visual content
analysis.

Manuscript received March 28, 2016; revised February 26, 2017 and April 18, 2017; accepted May 15, 2017. Date of publication June 1, 2017; date of current version November 15, 2017. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Benoit Huet. (Corresponding author: Quanzeng You.)
Q. You and J. Luo are with the Department of Computer Science, University of Rochester, Rochester, NY 14623 USA (e-mail: [email protected]; [email protected]).
R. Pang is with PayPal, San Jose, CA 95131 USA (e-mail: pangrr89@gmail.com).
L. Cao is with the Electrical Engineering and Computer Sciences Department, Columbia University, New York, NY 10013 USA, and also with customerserviceAI, New York, NY 10013 USA (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMM.2017.2710804
1520-9210 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

REAL estate appraisal, which is the process of estimating the price for real estate properties, is crucial for both buyers and sellers as the basis for negotiation and transaction. Real estate plays a vital role in all aspects of our contemporary society. In a report published by the European Public Real Estate Association (EPRA, http://alturl.com/7snxx), it was shown that real estate in all its forms accounts for nearly 20% of economic activity. Therefore, accurate prediction of real estate prices, or of the trends of real estate prices, helps governments and companies make informed decisions. On the other hand, for most of the working class, housing has been one of the largest expenses. A right decision on a house, which heavily depends on the buyers' judgement of the value of the property, can help them save money or even profit from their investment in their homes. From this perspective, real estate appraisal is also closely related to people's lives.

Current research from both the real estate industry and academia has reached the conclusion that real estate value is closely related to property infrastructure [1], traffic [2], online user reviews [3], and so on. Generally speaking, there are several different types of appraisal values. In particular, we are interested in the market value, which refers to the trade price in a competitive Walrasian auction setting [4]. Today, people are likely to trade through real estate brokers, who provide easily accessible websites for browsing real estate properties in an interactive and convenient way. Fig. 1 shows an example of a house listing from Realtor (http://www.realtor.com/), which is the largest real estate broker in North America. From the figure, we see that a typical listing for a real estate property presents the infrastructure data of the house in text, along with some pictures of the house. Typically, a buyer will look at those pictures to obtain a general idea of the overall property in a selected area before making their next move.

Fig. 1. Example of homes for sale from Realtor.

Traditionally, both real estate industry professionals and researchers have relied on a number of factors, such as economic index, house age, trade history, and neighborhood environment [5], to estimate the price. Indeed, these factors have been proved to be related to the house price, which is quite difficult to estimate and sensitive to many different human activities. Therefore, researchers have devoted much effort in

Authorized licensed use limited to: UNIVERSITY OF ROCHESTER. Downloaded on March 17,2020 at 20:53:35 UTC from IEEE Xplore. Restrictions apply.

building a robust house price index [6]–[9]. In addition, quantitative features including Area, Year, Storeys, Rooms and Centre [10], [11] are also employed to build neural network models for estimating house prices. However, pictures, which are probably the most important factor in a buyer's initial decision-making process [12], have been ignored in this process. This is partially due to the fact that visual content is much more difficult for computers to interpret or quantify than it is for human beings.

A picture is worth a thousand words. One advantage of images and videos is that they act like universal languages. People with different backgrounds can easily understand the main content of an image or video. In the real estate industry, pictures can easily tell people exactly what the house looks like, which is very difficult to convey fully in words. From the given house pictures, people can easily form an overall impression of the house, e.g., the overall construction style and what the neighboring environment looks like. These high-level attributes are difficult to describe quantitatively. On the other hand, today's computational infrastructure is much cheaper and more powerful, which makes computationally intensive visual content analysis feasible. Indeed, there are existing works focusing on the analysis of visual content for tasks such as prediction [13], [14] and online user profiling [15]. Owing to recent developments in deep learning, computers have become smart enough to interpret visual content in a way similar to human beings.

Recently, deep learning has enabled robust and accurate feature learning, which in turn produces state-of-the-art performance on many computer vision tasks, e.g., digit recognition [16], [17], image classification [18], [19], aesthetics estimation [20], and scene recognition [21]. These systems suggest that deep learning is very effective in learning robust features in a supervised or unsupervised fashion. Even though deep neural networks may be trapped in local optima [22], [23], using different optimization techniques, one can achieve state-of-the-art performance on many of the challenging tasks mentioned above.

Inspired by the recent successes of deep learning, in this work we are interested in solving the challenging real estate appraisal problem using deep visual features. In particular, for image-related tasks, Convolutional Neural Networks (CNNs) are widely used because of their convolutional layers, which take into consideration the locations and neighbors of image pixels. Both are important for capturing useful features in visual tasks. Convolutional Neural Networks [18], [19], [24] have proved very powerful in solving computer vision tasks.

We intend to employ pictures for the task of real estate price estimation. We want to know whether visual features, which are a reflection of a real estate property, can help estimate the real estate price. Intuitively, if visual features can characterize a property in a way similar to human beings, we should be able to quantify the house features using those visual responses. Meanwhile, real estate properties are closely related to their neighborhood. In this work, we develop algorithms that rely only on: 1) the neighbor information and 2) the attributes from pictures to estimate real estate property prices.

To preserve the local relation among properties, we employ a novel approach that uses random walks to generate house sequences. In building the random walk graph, only the locations of houses are utilized. In this way, the problem of real estate appraisal is transformed into a sequence learning problem. The Recurrent Neural Network (RNN) is designed specifically to solve sequence-related problems. Recently, RNNs have been successfully applied to challenging tasks including machine translation [25], image captioning [26], and speech recognition [27]. Inspired by the success of RNNs, we deploy an RNN to learn regression models on the transformed problem.

The main contributions of our work are as follows.
1) To the best of our knowledge, we are the first to quantify the impact of visual content on real estate price estimation. We attribute the possibility of our work to the newly designed computer vision algorithms, in particular Convolutional Neural Networks (CNNs).
2) We employ random walks to generate house sequences according to the locations of each house. In this way, we are able to transform the problem into a novel sequence prediction problem, which is able to preserve the relation among houses.
3) We employ the novel Recurrent Neural Networks (RNNs) to predict real estate prices and achieve accurate results.

II. RELATED WORK

Real estate appraisal has been studied by both real estate industry professionals and academic researchers. Earlier work focused on building price indexes for real properties. The seminal work in [6] built a price index according to the repeat prices of the same property at different times. The authors employed regression analysis to build the price index, which shows good performance. Another widely used regression model, Hedonic regression, is developed on the assumption that the characteristics of a house can predict its price [7], [8]. However, it is argued that the Hedonic regression model requires more assumptions in terms of explaining its target [28]. The same work also mentioned that, for the repeat sales model, the main problem is lack of data, which may lead to failure of the model. Recent work in [9] employed locations and sale price series to build an autoregressive component. Their model is able to use both single-sale homes and repeat-sales homes, which can offer a more robust sale price index.

More studies have been conducted on employing feed-forward neural networks for real estate appraisal [29]–[32]. However, their results suggest that neural network models are unstable even when the same package is used with different run times [29]. The performance of neural networks is closely related to the features and the data size [32]. Recently, Kontrimas and Verikas [33] empirically studied several different models on 12 selected features, e.g., type of the house, size, and construction year. Their results show that linear regression outperforms the neural network on their selected 100 houses.

More recent studies in [1] propose a ranking objective, which takes geographical individual, peer and zone dependencies into


consideration. Their method is able to use various estate-related data, which helps improve their ranking results based on properties' investment values. Furthermore, the work in [3] studied online users' reviews and mobile users' moving behaviors on the problem of real estate ranking. Their proposed sparsity-regularized learning model demonstrated competitive performance.

In contrast, we are trying to solve this problem using the attributes reflected in the visual appearances of houses. In particular, our model does not use the metadata of a house (e.g., size, number of rooms, and construction year). We intend to utilize the location information in a novel way such that our model is able to use state-of-the-art deep learning for feature extraction (Convolutional Neural Network) and model learning (Recurrent Neural Network).

III. RECURRENT NEURAL NETWORK FOR REAL ESTATE PRICE ESTIMATION

In this section, we present the main components of our framework. We describe how to transform the problem into one that can be solved by the Recurrent Neural Network. The architecture of our model is also presented.

A. Random Walks

One main feature of a real estate property is its location. In particular, houses in the same neighborhood tend to have similar extrinsic features, including traffic, schools, and so on. We build an undirected graph $G$ for all the houses collected, where each node $v_i$ represents the $i$-th house in our data set. The similarity $s_{ij}$ between house $h_i$ and house $h_j$ is defined using the Gaussian kernel function, which is a widely used similarity measure¹:

$$ s_{ij} = \exp\left(-\frac{\mathrm{dist}(h_i, h_j)}{2\sigma^2}\right) \tag{1} $$

where $\mathrm{dist}(h_i, h_j)$ is the geodesic distance between house $h_i$ and house $h_j$, and $\sigma$ is the hyper-parameter that controls how quickly the similarity decays as the distance increases. In all of our experiments, we set $\sigma$ to 0.5 miles so that houses within 1.5 miles (within $3\sigma$) will have a relatively larger similarity. The ε-neighborhood graph [34] is employed to build $G$ in our implementation. We assign the weight of each edge $e_{ij}$ as the similarity $s_{ij}$ between house $h_i$ and house $h_j$.

¹ [Online]. Available: http://en.wikipedia.org/wiki/Radial_basis_function_kernel

Given this graph $G$, we can then employ random walks to generate sequences. In particular, each time we randomly choose one node $v_i$ as the root node and then jump to one of its neighboring nodes $v_j$ with probability proportional to the weights between $v_i$ and its neighbors. The probability of jumping to node $v_j$ is defined as

$$ p_j = \frac{e_{ji}}{\sum_{k \in N(i)} e_{ki}} \tag{2} $$

where $N(i)$ is the set of neighbor nodes of $v_i$. We continue this process until we generate a sequence of the desired length. The employment of random walks is mainly motivated by the recently proposed DeepWalk [35], which learns feature representations for graph nodes. It has been shown that random walks can capture the local structure of graphs. In this way, we can keep the local location structure of houses and build sequences for houses in the graph. Algorithm 1 summarizes the detailed steps for generating sequences from a similarity graph.

Algorithm 1: Random Walks
Require: H = {h_1, h_2, ..., h_n} geo-coordinates of n houses
         σ hyper-parameter for Gaussian kernel
         t threshold for distance
         M total number of desired sequences
 1: Calculate the Vincenty distance between any pair of houses
 2: Calculate the similarity between houses according to the Gaussian kernel function (see (1)).
 3: repeat
 4:   Initialize s_c = {}
 5:   Randomly pick one node h_i and add h_i to s_c
 6:   set h_c = h_i
 7:   while size(s_c) < L do
 8:     Pick h_c's neighbor node h_j with probability p_j defined in (2)
 9:     add h_j to s_c
10:     set h_c = h_j
11:   end while
      add s_c to S
12: until size(S) = M
13: return The set of sequences S

We have generated sequences by employing random walks. Each sequence contains a number of houses that are related in terms of their locations. Since we build the graph on top of house locations, the houses within the same sequence are very likely to be close to each other. In other words, the prices of houses in the same sequence are related to each other. We can employ this context for estimating real estate property prices, a problem that can be solved by the recurrent neural network discussed in the following sections.

B. Recurrent Neural Network

With a Recurrent Neural Network (RNN), we are trying to predict the output sequence $\{y_1, y_2, \ldots, y_T\}$ given the input sequence $\{x_1, x_2, \ldots, x_T\}$. Between the input layer and the output layer, there is a hidden layer, which is usually estimated as in

$$ h_t = \Delta(W_{hi} h_{t-1} + W_x x_t + b_h). \tag{3} $$

$\Delta$ represents some selected activation function or other complex architecture employed to process the input $x_t$ and $h_{t-1}$. One of the most widely deployed architectures is the Long Short-Term Memory (LSTM) cell [36], which can overcome the vanishing and exploding gradient problem [37] when training an RNN with gradient descent. Fig. 2 shows the details of a single Long Short-Term Memory (LSTM) block [38]. Each LSTM cell contains an input gate, an output gate, and a forget gate; the cell is also called a memory cell in that it is able to remember the error in


the error propagation stage [39]. In this way, LSTM is capable of modeling longer-range dependencies than conventional RNNs.

Fig. 2. Illustration of a single long short-term memory (LSTM) cell.

For completeness, we give the detailed calculation of $h_t$ given input $x_t$ and $h_{t-1}$ in the following equations. Let $W_{\cdot i}$, $W_{\cdot f}$, and $W_{\cdot o}$ represent the parameters related to the input, forget, and output gates, respectively. $\odot$ denotes the element-wise multiplication between two vectors. $\phi$ and $\psi$ are some selected activation functions and $\sigma$ is the fixed logistic sigmoid function. Following [27], [38], [40], we employ tanh for both $\phi$ in (6) and $\psi$ in (8):

$$ i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \tag{4} $$
$$ f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \tag{5} $$
$$ c_t = f_t \odot c_{t-1} + i_t \odot \phi(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{6} $$
$$ o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o) \tag{7} $$
$$ h_t = o_t \odot \psi(c_t). \tag{8} $$

C. Multilayer Bidirectional LSTM

In previous sections, we have discussed the generation of sequences as well as the Recurrent Neural Network. Recall that we have built an undirected graph in generating the sequences, which indicates that the price of one house is related to all the houses in the same sequence, including those in the later part. The Bidirectional Recurrent Neural Network (BRNN) [41] has been proposed to enable the usage of both earlier and future contexts. In a bidirectional recurrent neural network, there is an additional backward hidden layer iterating from the last element of the sequence to the first. The output layer is calculated by employing both the forward and the backward hidden layers.

Bidirectional-LSTM (B-LSTM) is a particular type of BRNN, where each hidden node is calculated by the long short-term memory as shown in Fig. 2. Graves et al. [40] have employed Bidirectional-LSTM for speech recognition. Fig. 3 shows the architecture of the bidirectional recurrent neural network. We have two Bidirectional-LSTM layers. During the forward pass of the network, we calculate the response of both the forward and the backward hidden layers in the 1st-LSTM and 2nd-LSTM layer, respectively. Next, the output of each house (in our problem, the output is the price of each house) is calculated using the output of the 2nd-LSTM layer as input to the output layer.

The objective function for training the Multi-Layer Bidirectional LSTM is defined as follows:

$$ L(W) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j} \left\| \hat{y}_{ij} - y_{ij} \right\|^2 \tag{9} $$

where $W$ is the set of all the weights between different layers, $y_{ij}$ is the actual trade price for the $j$-th house in the generated $i$-th sequence, and $\hat{y}_{ij}$ is the corresponding estimated price for this house.

When training our Multi-Layer B-LSTM model, we employ the RMSProp [42] optimizer, which is an adaptive method for automatically adjusting the learning rates. In particular, it normalizes the gradients by the average of their recent magnitude.

We conduct the back propagation in a mini-batch approach. Algorithm 2 summarizes the main steps of our proposed algorithm.

Algorithm 2: Training Multi-Layer B-LSTM
Require: H = {h_1, h_2, ..., h_n} geo-coordinates of n houses
         X = {x_1, x_2, ..., x_n} features of the n houses
         Y = {y_1, y_2, ..., y_n} prices of the n houses
1: S = RandomWalks (see Algorithm 1)
2: Split S into mini-batches
3: repeat
4:   Calculate the gradient of L in (9) and update the parameters using RMSProp.
5: until Convergence
6: return The learned model M

D. Prediction

In the prediction stage, the first step is also to generate sequences. For each testing house, we add it as a new node to the similarity graph previously built on the training data. Next, we add edges between the testing nodes and the training nodes. We use the same settings when adding edges to the new ε-neighborhood graph. Given the new graph $G'$, we randomly generate sequences and keep those sequences that contain one and only one testing node. In this way, for each house, we are able to generate many different sequences that contain this house. Fig. 4 shows the idea. Each testing sequence has only one testing house. The remaining nodes in the sequence are the known training houses.

a) Average: The above strategy implies that we are able to build many different sequences for each testing house. To obtain the final prediction price for each testing house, one simple strategy is to average the prediction results from different sequences and report the average price as the final prediction price.

IV. EXPERIMENTAL RESULTS

In this section, we discuss how to collect data and evaluate the proposed framework as well as several state-of-the-art


approaches. In this work, all the data are collected from Realtor (http://www.realtor.com/), which is the largest realtor association in North America. We collect data from San Jose, CA, one of the most active cities in the U.S., and Rochester, NY, one of the least active cities in the U.S., over a period of one year. In the next section, we will discuss the details of how to preprocess the data for further experiments.

Fig. 3. Multilayer BRNN architecture for real estate price estimation. There are two bidirectional recurrent layers in this architecture. For real estate price estimation, the price of each house is related to all houses in the same sequence, which is the main motivation to employ bidirectional recurrent layers.

Fig. 4. Testing sequence h_1 → h_2 → ··· → h_T. In each testing sequence, there is one and only one testing node. The remaining nodes all come from training data.

A. Data Preparation

The data collected from Realtor contain a description, school information, and possibly pictures of each real property, as shown in Fig. 1. We are particularly interested in employing the pictures of each house to conduct the price estimation. We filter out those houses without images in our data set. Since houses located in the same neighborhood seem to have similar prices, the location is another important feature in our data set. However, after an inspection of the data, we notice that some of the house prices are abnormal. Thus, we preprocess the data by filtering out houses with extremely high or low prices compared with their neighborhood.

Table I shows the overall statistics of our dataset after filtering. Overall, the city of San Jose has more houses than Rochester on the market (as expected for one of the hottest markets in the country). The house prices in the two cities also have significant differences. Fig. 5 shows some example house pictures from the two cities, respectively. From these pictures, we observe that houses whose prices are above average typically have larger yards and better curb appeal, and vice versa. The same can be observed among house interior pictures (examples not shown due to space).

TABLE I
AVERAGE PRICE PER SQUARE FOOT AND THE STANDARD DEVIATION (STD) OF THE PRICE OF THE TWO STUDIED CITIES

City        # of Houses   Avg Price   Std of Price
San Jose    3064          454.2       132.1
Rochester   1500          76.4        21.2

Realtor does not provide the exact geo-location for each house. However, geo-location is important for us to build the ε-neighborhood graph for random walks. We employ the Microsoft Bing Map API (https://msdn.microsoft.com/en-us/library/ff701715.aspx) to obtain the latitude and longitude for each house given its collected address. Fig. 6 shows some of the houses in our collected data from San Jose and Rochester using the geo-locations returned by the Bing Map API.

According to these coordinates, we are able to calculate the distance between any pair of houses. In particular, we employ the Vincenty distance (https://en.wikipedia.org/wiki/Vincenty's_formulae) to calculate the geodesic distances from the coordinates. Fig. 7 shows the distribution of the distance between any pair of houses in our data set. The distance is less than 4 miles for most randomly picked pairs of houses. In building our ε-neighborhood graph, we assign an edge between any pair of houses whose distance is smaller than 5 miles (ε = 5 miles).
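
The graph construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it substitutes the simpler haversine great-circle distance for the Vincenty distance, it uses the unsquared distance inside the Gaussian kernel, and the function names, the Earth-radius constant, and the sample coordinates are all hypothetical. Only σ = 0.5 miles and ε = 5 miles come from the text.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees.

    Used here as a simple stand-in for the Vincenty distance named in the text.
    """
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def gaussian_similarity(dist, sigma=0.5):
    """Gaussian-kernel similarity that decays as the distance grows."""
    return math.exp(-dist / (2 * sigma ** 2))


def build_epsilon_graph(coords, epsilon=5.0, sigma=0.5):
    """Return {i: {j: weight}} with an edge for every pair closer than epsilon miles."""
    graph = {i: {} for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = haversine_miles(coords[i], coords[j])
            if d < epsilon:
                w = gaussian_similarity(d, sigma)
                graph[i][j] = w  # undirected graph: store both directions
                graph[j][i] = w
    return graph


# Three hypothetical houses: two close together in San Jose, one in Rochester.
houses = [(37.3382, -121.8863), (37.3400, -121.8900), (43.1566, -77.6088)]
graph = build_epsilon_graph(houses)
```

In this toy example the two San Jose houses are linked with a symmetric weight, while the Rochester house has no neighbors because it lies far beyond ε. The pairwise loop is quadratic in the number of houses, which is acceptable for data sets of a few thousand listings.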


Fig. 5. Examples of house pictures of the two cities, respectively. Top row: houses whose prices (per square foot) are above the average of their neighborhood. Bottom row: houses whose prices (per square foot) are below the average of their neighborhood. (a) Rochester. (b) San Jose.

Fig. 6. Distribution of the houses in our collected data for both San Jose and Rochester according to their geo-locations. (a) San Jose, CA, USA. (b) Rochester, NY, USA.

Fig. 7. Distribution of distances between different pairs of houses.

B. Feature Extraction and Baseline Algorithms

In our implementation, we experimented with the GoogLeNet model [43], which is one of the state-of-the-art deep neural architectures. In particular, we use the response from the last average-pooling layer as the visual features for each image. In this way, we obtain a 1,024-dimensional feature vector for each image. Each house may have several pictures taken from different angles of the same property. We average the features of all the images of the same house (also known as average-pooling)² to obtain the feature representation of the house.

² We also tried max-pooling. However, the results are not as good as average-pooling. In the following experiments, we report the results using average-pooling.

We compare the proposed framework with the following algorithms.

1) Regression Model (LASSO): Regression models have been employed to analyze real estate price indexes [6]. Recently, the results in Fu et al. [3] show that sparse regularization can obtain better performance in real estate ranking. Thus, we choose to use LASSO (http://statweb.stanford.edu/~tibs/lasso.html), which is an l1-constrained regression model, as one of our baseline algorithms.

2) DeepWalk: DeepWalk [35] is another way of employing random walks for unsupervised feature learning of graphs. The main approach is inspired by distributed word representation learning. In using DeepWalk, we also use an ε-neighborhood graph with the same settings as the graph we built for generating sequences for B-LSTM. The learned features are also fed into a LASSO model for learning the regression weights. Indeed, DeepWalk can be thought of as a simpler version of our algorithm, where only the graph structure is employed to learn features. Our framework can employ both the graph structure and other features, i.e., visual attributes, for building the regression model.

C. Training a Multilayer B-LSTM Model

With the above-mentioned similarity graph, we are able to generate sequences using random walks following the steps described in Algorithm 1. For each city, we randomly split the houses into a training set (80%) and a testing set (20%). Next, we generate sequences using random walks on the training houses only to build our training sequences for the Multi-layer B-LSTM.
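
The sequence generation step referred to here (Algorithm 1, with the jump probability of (2)) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names and the toy graph are hypothetical, the walk is allowed to revisit houses because Algorithm 1 does not forbid it, and starting only from nodes that have at least one neighbor is a choice made for this sketch.

```python
import random


def transition_probs(graph, current):
    """Normalize the edge weights of `current` into jump probabilities, as in (2)."""
    neighbors = graph[current]
    total = sum(neighbors.values())
    return {j: w / total for j, w in neighbors.items()}


def random_walk(graph, length, rng):
    """Generate one sequence of `length` nodes by a weighted random walk."""
    # Only start from nodes with at least one neighbor (an assumption of this sketch).
    starts = [node for node, nbrs in graph.items() if nbrs]
    current = rng.choice(starts)
    sequence = [current]
    while len(sequence) < length:
        probs = transition_probs(graph, current)
        nodes = list(probs)
        current = rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]
        sequence.append(current)
    return sequence


def generate_sequences(graph, num_sequences, length, seed=0):
    """Repeat the walk until the desired number of sequences has been collected."""
    rng = random.Random(seed)
    return [random_walk(graph, length, rng) for _ in range(num_sequences)]


# Hypothetical toy similarity graph over four training houses.
toy_graph = {
    0: {1: 0.9, 2: 0.1},
    1: {0: 0.9, 2: 0.5},
    2: {0: 0.1, 1: 0.5, 3: 0.3},
    3: {2: 0.3},
}
sequences = generate_sequences(toy_graph, num_sequences=5, length=10)
```

Each generated sequence lists house indices in visiting order, and every consecutive pair is connected by an edge in the graph, so neighboring positions in a sequence correspond to nearby houses.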


TABLE II
PREDICTION DEVIATION OF DIFFERENT MODELS FROM THE ACTUAL SALE PRICES

City        LASSO            DeepWalk         RNN-best         RNN-avg
            MAE     MAPE     MAE     MAPE     MAE     MAPE     MAE     MAPE
San Jose    70.79   16.92%   68.05   16.12%   17.98   4.58%    66.3    16.11%
Rochester   14.19   24.83%   13.68   23.28%   5.21    9.94%    13.32   22.69%

Note that RNN-best is the upper-bound performance of the RNN-based model proposed in this work.

For both cities, we build 200,000 sequences for training, each with a length of 10. Similarly, we also generate testing sequences, where each sequence contains one and only one testing house (see Fig. 4). On average, we randomly generate 100 sequences for each testing house. The B-LSTM model is trained with a batch size of 1024. In our experimental settings, we set the size of the first hidden layer to 400 and the size of the second hidden layer to 200.

The evaluation metrics employed are mean absolute error (MAE) and mean absolute percentage error (MAPE). Both are popular measures for evaluating the accuracy of prediction models. Equations (10) and (11) give the definitions of these two metrics, where $p_i$ is the predicted value and $t_i$ is the true value for the $i$-th instance.
$$ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |t_i - p_i| \tag{10} $$

$$ \mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{t_i - p_i}{t_i} \right| \tag{11} $$

Fig. 8. Performance of B-LSTM-avg in different groups. All the testing houses are grouped by the predicted standard deviation. (a) MAE. (b) MAPE.
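
The two metrics in (10) and (11) translate directly into code. The sketch below is a plain-Python illustration; the function names and the sample prices are hypothetical and are not taken from the paper's data.

```python
def mean_absolute_error(true_values, predictions):
    """MAE as in (10): the average of |t_i - p_i|."""
    n = len(true_values)
    return sum(abs(t - p) for t, p in zip(true_values, predictions)) / n


def mean_absolute_percentage_error(true_values, predictions):
    """MAPE as in (11): the average of |(t_i - p_i) / t_i|, returned as a fraction."""
    n = len(true_values)
    return sum(abs((t - p) / t) for t, p in zip(true_values, predictions)) / n


# Hypothetical prices per square foot for three houses.
true_prices = [100.0, 200.0, 400.0]
predicted_prices = [110.0, 180.0, 400.0]

mae = mean_absolute_error(true_prices, predicted_prices)              # 10.0
mape = mean_absolute_percentage_error(true_prices, predicted_prices)  # about 0.0667
```

Multiplying the MAPE by 100 gives the percentage form used in Table II. Because (11) divides by the true value, the metric is undefined for a true price of zero, which does not arise for sale prices.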
We use the same training and testing split to evaluate all the approaches. Table II shows the regression results for all the different approaches in the two selected cities. For each testing house, we generate about 100 sequences. In Table II, we report both the best and the average of the predicted prices. For Rochester, the average standard deviation of the predicted prices over all the houses is 5.6, which is 7.33% of the average price in Rochester (see Table I). Comparably, the average standard deviation for San Jose is 34.64, which is 7.63% of the average price in San Jose. The best is the price closest to the true price among all the available sequences for each house.³ Overall, our B-LSTM model outperforms the other two baseline algorithms in both cities. All of the evaluated approaches perform better in San Jose than in Rochester in terms of MAPE. This is possibly due to the availability of more training data in the city of San Jose. DeepWalk shows slightly better performance than LASSO, which suggests that location is relatively more important than the visual features in the realtor business. This is expected.

³ This is the upper bound of the prediction results. We choose the closest price using the ground truth price as reference.

D. Confidence Level

For each testing house, the proposed model can give a group of predictions. We want to know whether or not the proposed model can distinguish the confidence level of its prediction. In particular, we group the testing houses evenly into three groups for each city. The first group has the smallest standard deviation of the predicted prices. The second group is the middle one and the last group is the one with the largest standard deviation. Fig. 8 shows the MAE and MAPE for the different groups. The results show that the standard deviation can be viewed as a rough measure of the confidence level of the proposed model on the current testing house. A small standard deviation tends to indicate a high confidence of the model, and overall it also suggests a smaller prediction error.

V. CONCLUSION

In this work, we propose a novel framework for real estate appraisal. In particular, the proposed framework is able to take both the location and the visual attributes into consideration. The evaluation of the proposed model on two selected cities suggests the effectiveness and flexibility of the model. Indeed, our work has also offered new approaches for applying deep neural networks to graph-structured data. We hope our model can not only give insights into real estate appraisal, but also inspire others to employ deep neural networks on graph-structured data.


Quanzeng You received the B.E. and M.E. degrees from the Dalian University of Technology, Dalian, China, in 2009 and 2012, respectively, and is currently working toward the Ph.D. degree in computer science at the University of Rochester, Rochester, NY, USA. His advisor is Prof. J. Luo. His research focuses on social multimedia, social networks, and data mining. He is interested in developing effective machine learning algorithms that can help us understand the data. His recent research is high-level visual understanding, including image captioning and visual sentiment analysis.


Ran Pang is currently working toward the M.S. degree in computer science at the University of Rochester, Rochester, NY, USA. He is interested in artificial intelligence and his research focuses on social multimedia and data mining.

Liangliang Cao received the B.E. degree from the University of Science and Technology of China, Hefei, China, in 2003, the M.E. degree from The Chinese University of Hong Kong, Hong Kong, China, in 2005, and the Ph.D. degree from the University of Illinois at Urbana-Champaign, Urbana, IL, USA, in 2011. He is currently a Senior Research Scientist at Yahoo! Laboratories, Sunnyvale, CA, USA, and an Adjunct Faculty at Columbia University, New York, NY, USA. He has authored or coauthored more than 40 papers in top conferences and journals, including the International Conference on Computer Vision, the Computer Vision and Pattern Recognition Conference, the European Conference on Computer Vision, the Conference on Neural Information Processing Systems, the ACM Multimedia, the International World Wide Web Conference, the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, and the PROCEEDINGS OF THE IEEE. His research interests include the intersection of computer vision, multimedia, and big data analytics. Mr. Cao was the General Chair of the Greater New York Area Multimedia and Vision Meeting in 2012 and 2013. He was an Area Chair of WACV 2014 and ACM Multimedia 2012. He was a Guest Editor of the ACM Transactions on Multimedia Computing, Communications, and Applications, and the Computer Vision and Image Understanding journal.

Jiebo Luo (S’93–M’96–SM’99–F’09) received the B.S. and M.S. degrees in electrical engineering from the University of Science and Technology of China, Hefei, China, in 1989 and 1992, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Rochester, Rochester, NY, USA, in 1995. He joined the University of Rochester, Rochester, NY, USA, in fall 2011, after more than 15 years at Kodak Research Laboratories, Rochester, NY, USA, where he was a Senior Principal Scientist leading research and advanced development. Prof. Luo is a Fellow of the International Society for Optics and Photonics, and the International Association for Pattern Recognition. He has been involved in numerous technical conferences, and served as the Program Co-Chair of ACM Multimedia 2010 and the IEEE CVPR 2012. He is the Editor-in-Chief of the Journal of Multimedia, and has served on the Editorial Boards of the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, the IEEE TRANSACTIONS ON MULTIMEDIA, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, Pattern Recognition, Machine Vision and Applications, and the Journal of Electronic Imaging.
