Research Article: Urban Planning and Design Layout Generation Based On Artificial Intelligence

Mathematical Problems in Engineering

Volume 2022, Article ID 8976943, 10 pages


Ting Wan and Yuhang Ma

School of Garden, Northeast Forestry University, Harbin, Heilongjiang, China

Correspondence should be addressed to Ting Wan; [email protected]

Received 16 May 2022; Revised 7 June 2022; Accepted 9 June 2022; Published 28 June 2022

Academic Editor: Lianhui Li

Copyright © 2022 Ting Wan and Yuhang Ma. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Cities and their spatial layouts are becoming increasingly complex, and the number of factors that urban construction must take into account keeps growing. Traditional urban planning and design methods therefore face new challenges. Taking the urban “mesoscale” as its perspective, this study applies artificial intelligence technology in the early stage of urban planning and design to predict the positioning of design land from its surrounding environment. The aim is to move beyond the limitations of manual decision-making, explore the spatial layout problem from the perspective of the machine, find correlations in land data, and generate results with reference value to assist decision-making. After delimiting the research area and collecting and processing data, we trained an artificial neural network model and selected three different areas for model testing. The test results verify the feasibility and effectiveness of the method.

1. Introduction

Human beings have created cities for survival and development. The most important and primary role of cities is to serve as gathering places for the people living in them. Human activities connect people with urban space [1–4]. After the rapid development of urban construction, many urban plots have changed towards simplification and homogenization, urban vitality has been weakened, and various problems have emerged in urban development. In fact, reasonable “mixing” is the form in which the city should exist and the ideal state for its sustainable development. Even if the initial planning is relatively simple, the city still evolves towards mixing through people’s actual use until it reaches a more balanced state [5, 6].

In recent years, with the increase of population and the enrichment of human activities, people have put forward new demands for urban construction [7]. The mixed development of cities is not only the current situation but also a sustainable trend. In this context, the factors to be considered in urban construction will be more complex, comprehensive, and diversified [8–12]. The traditional experience-based urban planning and design methods have certain limitations, and the schemes that designers put forward based on practical experience are sometimes not comprehensive or in-depth enough [13]. Today, in the digital era, computer technology is widely used in various fields [14–19], including architecture and urban planning. Concepts such as parametric architectural design [20, 21], computational urban design [22, 23], urban big data analysis [24, 25], and the smart city [26] have emerged one after another.

Although the design field is more subjective than other science and engineering disciplines, it is not without rules. When designers study excellent cases and apply them to their own schemes, they are in effect summarizing the common laws and patterns of “good design.” Designers can find such laws in schemes. Similarly, if data is input into a computer, the computer can also mine the laws in the data. This kind of computer technology is called machine learning [27, 28], a subset of artificial intelligence technology. It aims to “learn” the correlation between the characteristics of input data and label data from a large amount of data, so as to make more accurate predictions. In urban research, machine learning technology can mine the potential laws in urban spatial layout data and try to generate urban spatial layout schemes from the perspective of the computer to assist the existing manual planning and design.

Data input is an important part of a machine learning model. The first law of geography reveals the close relationship between adjacent things. The surrounding environment is obviously one of the key factors to be considered in the design of urban spatial layout; a scheme designed in isolation from its environment will be very abrupt. Therefore, if the surrounding environment data of the design land is used as the input, the machine learning model can learn the relationship between the design land and the surrounding environment. The input data will cover multiple dimensions related to urban spatial layout, so as to give full play to the advantages of the computer, put the city in a more complete quantitative environment for analysis, and comprehensively consider multiple attribute data of the plot to generate results. To sum up, in urban planning and design, how to use artificial intelligence technology to generate hybrid urban space layout schemes and create richer urban space that better meets the needs of residents is of great significance for exploring new directions of urban research in the digital context.

2. Urban Planning and Design Layout Generation

With the help of artificial intelligence technology, we describe the construction process of the urban spatial layout generation method explicitly. The necessary steps of method construction are presented in a general way to form a standardized process [3, 5]. The premise of method construction is to clarify the basic research unit of urban spatial layout. This study selected the artificial neural network as the main model algorithm. The relevant data of urban spatial structure is input into the model as a numerical matrix. This study handles these data on a grid and takes the data grid obtained by dividing the research area as the most basic research unit.

2.1. Overall Framework. This study explores how to construct a method for generating urban spatial layout based on artificial intelligence technology. The method consists of data preparation and algorithm development. In practical application, the design land may occupy multiple divided grid units, so algorithm development is subdivided into two parts: single-grid algorithm development and multigrid algorithm development. In the case of unknown grids, the multigrid algorithm expands the scope of application. The method construction is therefore divided into three parts: data preparation, single-grid essential algorithm development, and multigrid expansion algorithm development.

The data preparation part aims to provide data input for the algorithm development part. After data acquisition, data preprocessing, and dataset generation, three-dimensional data is generated for each grid. In addition, when there is a severe imbalance in the number of classes in the dataset, additional processing methods are required to reconstruct the dataset so that the classes are relatively balanced.

The essential algorithm development part aims to realize the spatial layout generation of a single research grid, including the steps of artificial neural network construction, model training and optimization, and result generation and display. The expansion algorithm development part starts from two ideas to generate multigrid results: the solution space optimization search idea based on Monte-Carlo Tree Search (MCTS) [29, 30] and the iterative elimination idea based on adjacent grid filling. The MCTS algorithm uses the single-grid model with a complete neighborhood grid generated by the basic algorithm. The adjacent grid filling algorithm additionally needs several single-grid models trained with incomplete neighborhood grids. These incomplete models require sample augmentation of the original dataset to ensure sufficient data volume.

The above three parts of the work constitute the entire method, and the framework flow is shown in Figure 1.

2.2. Data Preparation

2.2.1. Data Collection. This research aims to use machine learning technology to provide a reference for the generation of urban spatial layout. After selecting a target city, to avoid generated results that are too uniform, the study tries to delineate areas of the city with more mature development and more mixed conditions as the research scope. Data acquisition is carried out in these areas so that the data can cover more complex situations and enhance the generality of the method. For a target city, after the training is completed, the model can learn the layout logic of the city; that is, when it is practically applied, the generated reference layout scheme will present a “style” or “feature” similar to that of the city.

After determining the research scope, to describe the concept of urban spatial layout more comprehensively, this research selected three elements that are closely related to spatial layout and have a substantial impact on it: land use function, urban morphology, and traffic connections. The required data is collected from these three aspects as the input to the machine learning algorithm.

(1) Functional Elements of Land Use. Function is one of the most intuitive elements for describing the content of urban land, and it guides different human activities. The nature of land use generally refers to the functional use of a piece of land at the planning level, which can be directly obtained from planning data or indirectly obtained through the classification of planned or completed building functions on the plot.

Figure 1: Overall framework. (Flowchart: data acquisition over the urban built-up area in three dimensions, namely land use function, urban form, and traffic connection; data processing; dataset reconstruction for category balance; final dataset; ANN algorithm producing a complete neighborhood model and an incomplete neighborhood model; multigrid expansion based on Monte Carlo tree search and multigrid expansion based on adjacent grid filling; generation model of urban spatial layout.)

The land use properties in the traditional sense are obtained from the urban land use status map. However, the drawing of these maps lags significantly and does not fully conform to the actual status, and the classification is not precise enough to show the mixed state of a region. In contrast, Point of Interest (POI) data is another way of describing land use functions. Each point contains information such as name, latitude and longitude coordinates, and functional classification; the classification is detailed and updated quickly. However, POI data also has a major problem: for functions of a single nature but a large area, such as schools and industrial regions, one point obviously cannot cover the actual range, and the error is significant. This study integrates land use status data and various crawled real-time POI data to generate more accurate data on land use properties. Land use status data and POI data are obtained using open map APIs such as Baidu Maps and Tencent Maps.

(2) Functional Elements of Urban Morphology. Urban morphology is another key element in describing urban space. Applying machine learning requires a quantitative, discretized representation of urban form so that this element can be fed into the model in numerical form. At the macroscale, common quantitative expression methods of urban form include space syntax, Forntax, Morpho, LCZ theory, etc. Among them, LCZ theory covers more comprehensive quantitative indicators of urban conditions than other methods. While establishing qualitative and quantitative description indicators, it also provides a more applicable method for the quantitative description and study of urban form.

The LCZ theory divides urban forms into 17 classes. The built environment includes 10 classes: high-density high-rise, high-density mid-rise, high-density low-rise, low-density high-rise, low-density mid-rise, low-density low-rise, light-weight low-rise, mass low-rise, scattered buildings, and industrial plants. The natural environment includes 7 classes: dense trees, sparse trees, bushes, low vegetation, hard paving, bare land, and water. Since some classes are rare or inapplicable in China’s urban areas, many studies have proposed revisions to the original classes. For example, high-density mid-rise buildings, scattered buildings, dense trees, and bushes were deleted in some studies, and 13 classes were retained. In other research, 11 classes were kept after reduction and integration, including 7 built classes and 4 natural classes. Considering that this study focuses on urban built-up areas, too detailed a morphological classification of the natural environment would lead to too few samples per class, affecting the model’s training effect. Therefore, based on the 11-class classification, this study further simplified the natural environment classes, and only the vegetation class and the water surface class are retained.

To obtain LCZ data, it is necessary to download satellite remote sensing image data such as Landsat-8 from the official website of the United States Geological Survey (USGS) and then process it with the WUDAPT method. The WUDAPT method uses the band values of the remote sensing image as the classification basis for water, vegetation, nonvegetation, and other classes; each class is then sampled manually in Google Earth, and the sample boundary positions are input into the GIS software and compared with the satellite image to obtain samples of each class. Based on these training samples, the random forest algorithm is used to further classify the grids in the study area, supplemented by appropriate manual error correction, to obtain the final LCZ classification results.

(3) Functional Elements of Traffic Connection. In addition to the “content” of the plots, the relationship between plots, that is, traffic, needs to be described in urban design. Many factors affect the traffic connection in urban land use. This study selects point data for bus, subway, and other stations and line data related to road networks, such as road grade and road integration. After comprehensive calculation, these are used to characterize the traffic connection strength value of the grid. The station data is obtained through POI crawling, and the road network data is downloaded from the OpenStreetMap (OSM) open-source map website.

In summary, the data used in this study include five types: functional POI, land use status, Landsat-8 satellite remote sensing map, traffic station POI, and urban road network. They correspond to the three dimensions of land use, LCZ,
and traffic intensity of the dataset. Among them, the Landsat-8 satellite images and road network data are downloaded from the corresponding websites; the rest of the geographic data is obtained from open map APIs using web crawling technology (Table 1).

2.2.2. Data Division and Processing. After obtaining the raw data, the following processing steps are performed to generate the dataset.

(1) Grid Division. The study area is divided into grids in the GIS software, and the data of each dimension is superimposed on the grid at the corresponding location using the overlay analysis tool of GIS.

Two criteria determine the grid size: it should match the grid size of the LCZ data as closely as possible, and it should conform to the common size of urban block plots. The grid size of LCZ data was generally between 200 m and 500 m in early studies, and 100 m has become more common in recent years. The common size of urban blocks is between 50 m and 200 m. The grid size used in this study is determined according to the specific conditions of different urban instances.

(2) Data Processing. To ensure the effectiveness of the machine learning model, the data of the three dimensions need to be normalized.

(a) Land use data. This study divides land use properties into five basic classes: business, industrial, public, residential, and landscape, covering most functional land use classes. See Table 2 for specific descriptions and the code abbreviations used in the text. The crawled functional POIs are integrated into these five classes and stacked on the grid. Considering the hybridity of functions, and in order not to lose too much information, this study calculates the proportions of the five types of land in the grid to represent the land use properties of the grid area, rather than taking only the maximum value to obtain a single dominant class. On this basis, the relative proportions of the five classes of POIs are calculated for each grid. The continuous proportion data are discretized into 6 levels: 0%, 20%, 40%, 60%, 80%, and 100%. For blank areas not covered by POIs, the current land use data is used as a supplement to reduce the number of blank grids as much as possible. Finally, a one-dimensional array of length 5 is generated for each grid, and the numeric type is float.

(b) LCZ data. The 11 classes of LCZ data generated by the WUDAPT method were integrated into the 9 classes defined for this study and superimposed onto the grid. The LCZ data of each grid was stored in the form of one-hot encoding, with integer values. One-hot encoding is an effective encoding method for discrete categorical features. For this study, it means initializing a one-dimensional all-zero array of length 9, [0, 0, 0, 0, 0, 0, 0, 0, 0], and changing the value at the position of the corresponding LCZ type to 1; for example, LCZ 3 can be expressed as [0, 0, 1, 0, 0, 0, 0, 0, 0].

(c) Traffic intensity data. This study defines the concept of “traffic intensity,” which describes the traffic situation of a grid by integrating four factors: subway station distance, bus station distance, road level, and road integration degree. The distance data for the two classes of stations are calculated from the corresponding POIs with the Euclidean distance tool in GIS. The road level is classified by the road property identification in the OSM road network data, which is divided into 4 levels; the integration degree is calculated from the road network using the DepthMapX tool. The latter two are converted from line data to raster data by kernel density weighted calculation.

The four indicators are reclassified according to certain standards, and the traffic intensity level of each grid is obtained after weighted summation. The specific calculation method and standard are shown in Table 3. The subway and bus station distances are reclassified by radiation distance intervals, and the intervals of the subway station are slightly larger than those of the bus station to reflect the more significant influence of the subway. The road level and road integration degree are reclassified with the natural breakpoint classification method. In the weighted summation, the station indexes and the road indexes each account for 50% of the weight. The weight of the subway station is slightly larger than that of the bus station (30% and 20%, respectively), and the weight of the road level is slightly larger than that of the integration degree (also 30% and 20%, respectively).

Then:

$$x = x_{\text{Subway}} \cdot 0.3 + x_{\text{Bus}} \cdot 0.2 + x_{\text{Road level}} \cdot 0.3 + x_{\text{Road integrity degree}} \cdot 0.2. \tag{1}$$

In formula (1), $x$ is the comprehensive traffic intensity value, and $x_{\text{Subway}}$, $x_{\text{Bus}}$, $x_{\text{Road level}}$, and $x_{\text{Road integrity degree}}$ represent the index values of the subway station distance, bus station distance, road level, and road integration degree after reclassification.

The summation results are divided into 4 levels and normalized to the 0-1 interval (Table 4). The final traffic intensity level is represented by a one-dimensional array of length 1 with a floating-point value.

2.2.3. Dataset Generation. Based on the first law of geography, this study attempts to explore the correlation between unknown grids and their surrounding known grids. Hence, a neighborhood range needs to be determined before data input. According to the definition of Moore’s Neighborhood 99, 3 × 3 is the smallest neighborhood and is suitable as the basic unit of research. In contrast, the 5 × 5 neighborhood is one more grid distance than the 3 × 3 neighborhood, and it is

Table 1: Data explanation.

| Influencing factors | Corresponding data | Data content | Class of data | Data sources | Method of obtaining |
|---|---|---|---|---|---|
| Land use | Land use | Various functions POI | Latitude and longitude coordinates of the point | Open map API | Web crawler |
| Land use | Land use | Status of land use | Latitude and longitude coordinates of each endpoint | Open map API | Web crawler |
| City morphology | LCZ | Landsat-8 satellite remote sensing | Raster image | The official website of the United States Geological Survey (USGS) | Download |
| Traffic connection | Traffic intensity | Traffic stop POI | Latitude and longitude coordinates of the point | Open map API | Web crawler |
| Traffic connection | Traffic intensity | Urban road network | The latitude and longitude coordinates of each endpoint of the line | OpenStreetMap (OSM) open-source map | Download |

Table 2: Classification of land use.

| ID | Abbreviation | Land use | Description |
|---|---|---|---|
| 0 | B | Business | Entertainment, dining, shopping, hotel, business, etc. |
| 1 | M | Industrial | Campus, factories, etc. |
| 2 | A | Public | Medical, education, culture, sport, etc. |
| 3 | R | Residential | Residential area, etc. |
| 4 | G | Landscape | Park, attractions, city squares, etc. |
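
As a minimal sketch of the land use encoding described in Section 2.2.2 (a), the following Python snippet turns POI counts per class into the length-5 proportion array, using the class order of Table 2 (B, M, A, R, G). The function name, the input format (a dictionary of POI counts), and the rule of rounding each proportion to the nearest 20% step are assumptions made for illustration; the paper does not state how the six levels are assigned.

```python
import math

# Class order follows Table 2: Business, Industrial, Public, Residential, Landscape.
LAND_USE_CLASSES = ["B", "M", "A", "R", "G"]


def land_use_vector(poi_counts):
    """Return discretized land use proportions for one grid (hypothetical helper).

    poi_counts maps a class abbreviation to the number of POIs of that class
    in the grid. Each proportion is rounded to the nearest 0.2 (assumed rule).
    """
    total = sum(poi_counts.get(c, 0) for c in LAND_USE_CLASSES)
    if total == 0:
        # Blank grid: the paper supplements such grids with land use status data.
        return None
    vector = []
    for c in LAND_USE_CLASSES:
        proportion = poi_counts.get(c, 0) / total
        level = math.floor(proportion * 5 + 0.5)  # 0..5, i.e. 0%..100%
        vector.append(level / 5)
    return vector


example = land_use_vector({"B": 2, "R": 8})
print(example)  # [0.2, 0.0, 0.0, 0.8, 0.0]
```

With this assumed rounding rule the levels need not sum to 100% (three equally frequent classes would each round to 40%), so a real pipeline would need a tie-breaking rule that the paper does not describe.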

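Table 3: Traffic intensity level.

| Original data | Subway POI | Bus POI | Urban road network | Urban road network |
|---|---|---|---|---|
| Corresponding indicators | Subway station distance | Bus stop distance | Road level | Road integrity degree |
| Quantitative method | Euclidean distance analysis | Euclidean distance analysis | Road level (3: main roads, highways; 2: secondary road; 1: branch road; 0: sidewalk, etc.), then kernel density weighted analysis | DepthMapX calculation, then kernel density weighted analysis |
| Reclassification criteria, level 3 | Within 500 m | Within 300 m | Natural breakpoint classification (Jenks) | Natural breakpoint classification (Jenks) |
| Reclassification criteria, level 2 | 500 m–800 m | 300 m–500 m | Natural breakpoint classification (Jenks) | Natural breakpoint classification (Jenks) |
| Reclassification criteria, level 1 | 800 m–1200 m | 500 m–800 m | Natural breakpoint classification (Jenks) | Natural breakpoint classification (Jenks) |
| Reclassification criteria, level 0 | Beyond 1200 m | Beyond 800 m | Natural breakpoint classification (Jenks) | Natural breakpoint classification (Jenks) |
| Weights | 0.3 | 0.2 | 0.3 | 0.2 |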
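
The weighted summation of formula (1), together with the reclassification criteria of Table 3 and the final bins of Table 4, can be sketched in Python as follows. The function and constant names are hypothetical. The distance thresholds and weights are taken from Table 3; treating each threshold as inclusive is an assumption, and the road level and road integration inputs are assumed to be already reclassified to 0–3 (the Jenks step is not reproduced here).

```python
SUBWAY_THRESHOLDS = (500, 800, 1200)  # meters, from Table 3
BUS_THRESHOLDS = (300, 500, 800)      # meters, from Table 3


def distance_level(distance_m, thresholds):
    """Map a station distance in meters to a level from 3 (near) to 0 (far)."""
    for level, limit in zip((3, 2, 1), thresholds):
        if distance_m <= limit:  # inclusive bound is an assumption
            return level
    return 0


def traffic_intensity(subway_m, bus_m, road_level, road_integration):
    """Weighted sum of formula (1), then the 0-1 reclassification of Table 4."""
    x = (distance_level(subway_m, SUBWAY_THRESHOLDS) * 0.3
         + distance_level(bus_m, BUS_THRESHOLDS) * 0.2
         + road_level * 0.3
         + road_integration * 0.2)
    if x > 2.25:
        return 1.0
    if x > 1.5:
        return 0.67
    if x > 0.75:
        return 0.33
    return 0.0


# A grid 600 m from a subway station and 200 m from a bus stop,
# with road level 2 and road integration level 2: x is about 2.2.
print(traffic_intensity(600, 200, 2, 2))  # 0.67
```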

also a more comfortable theoretical walking distance, worthy of study.

Table 4: Standardization method of calculation results.

| Calculated value x | Reclassification |
|---|---|
| 2.25 < x ≤ 3 | 1 |
| 1.5 < x ≤ 2.25 | 0.67 |
| 0.75 < x ≤ 1.5 | 0.33 |
| 0 ≤ x ≤ 0.75 | 0 |

This study chooses to “slice” the original grid based on the two neighborhood ranges of 3 × 3 and 5 × 5. It defines the 8 grids in the 3 × 3 neighborhood of a grid as the “inner circle”; the 16 grids added by the 5 × 5 neighborhood are called the “outer circle.”

After combining the data of the above three dimensions, the spatial layout properties of each grid are expressed as a one-dimensional array of length 15. For example, the array [0.2, 0, 0, 0.8, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0.67] means that 80% of the grid plot is residential land and 20% is commercial land, that the buildings are mainly high-density mid-rise, and that the traffic intensity is high (Table 5).

We slice the matrix grid data into 3 × 3 and 5 × 5 ranges using Python and remove the slices that contain blank grids without data to obtain a slice set that can be used for training. Extract the data of 8 (3 × 3 neighborhood) or 24
(5 × 5 neighborhood) grids around the slice, and connect them into a one-dimensional array as learning data; extract the data of the 1 grid in the middle as label data, to be compared with the learning results to evaluate training accuracy. For the 5 × 5 neighborhood, the data of the 8 grids in the inner circle must be extracted and moved to the front to be consistent with the input order of the 3 × 3 neighborhood. Each slice is transformed into a 1D array of length 15 × 9 = 135 (3 × 3 neighborhood) or 15 × 25 = 375 (5 × 5 neighborhood); the first 120 (3 × 3 neighborhood) or 360 (5 × 5 neighborhood) values are learning data, and the last 15 are labels. The slices are processed in turn, and the final datasets for the 3 × 3 neighborhood and the 5 × 5 neighborhood are obtained from the slice set.

Table 5: Grid space layout data.

| B | M | A | R | G | LCZ 1 | LCZ 2 | LCZ 3 | LCZ 4 | LCZ 5 | LCZ 6 | LCZ 7 | LCZ 8 | LCZ 9 | Traffic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 | 0 | 0 | 0.8 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.67 |

2.2.4. Analysis and Handling of Dataset Imbalance. The scope of data collection selected in this study is urban built-up areas, which inevitably leads to unbalanced dataset categories. For example, the proportion of business, public, and residential land use is obviously higher than that of industrial and landscape land use. The proportion of medium- and high-rise building types corresponding to these types of land will also be higher than that of low-rise buildings, vegetation, and water surfaces.

Dataset imbalance is a common problem in machine learning. When the class sizes do not differ much, it can be ignored, but if the gap is relatively large, such as class A : class B = 100 : 1, it leads to poor model training. The neural network then tends to predict almost every outcome as class A. Such a model looks very accurate, with accuracy even reaching 99%, which formally meets the requirement of a high score, but in fact it completely ignores the minority classes and is almost ineffective.

To solve this problem, on the one hand, it is necessary to choose a new evaluation criterion that analyzes the training and prediction of each class of samples accurately, to determine whether the high accuracy of a model is the result of being “cheated” by an imbalanced dataset. The confusion matrix was used as an additional evaluation criterion in this study. A confusion matrix is an effective tool for evaluating the accuracy of multiclassification problems in supervised learning, and multiple evaluation metrics suitable for multiclassification problems can be calculated from it.

On the other hand, once it is determined that a model has indeed been affected by the problem, action needs to be taken. The treatment of dataset imbalance is generally considered from two aspects: algorithm and data. The former refers to using optimization methods in the machine learning algorithm to incorporate the difference in class sizes into the parameter updates as much as possible. The latter refers to trying to change the imbalance in the dataset itself. This study chose to approach the question from the data perspective.

Without a fundamental adjustment of the research scope and classification method, the proportions of the original categories of the dataset cannot be changed. It is therefore necessary to adjust the number of samples of each class through resampling to reconstruct the dataset. Resampling methods include undersampling (reducing the number of samples) for large classes and oversampling (increasing the number of samples) for small classes. In general, however, undersampling loses more information for the affected classes. Therefore, this study only adopts oversampling to expand the number of small-class samples, narrowing the gap between classes and mitigating the imbalance problem. Of course, this method changes the distribution of the original categories and, further, forcefully reverses the preferences of the original model, which will undoubtedly cause new problems. As a compromise, this study does not resample to a total balance of the number of each class when reconstructing the dataset, but to a relative balance, so that the proportion gap between the classes is not too large.

Taking an incomplete LCZ model required by the adjacent grid filling algorithm as an example, the reconstruction process of the dataset is briefly explained; the sample data is shown in Table 6. As an incomplete model, its dataset needs to be expanded in quantity to ensure that the total number of samples is not too small. Assume that the original numbers of the 9 classes of LCZ samples are unbalanced, that the initial total number of samples is 2475 (the sum of the 9 classes), and that the original expansion ratio is 6 times; the total number of samples after expansion is then 14850. To achieve absolute equality of the sample sizes of all classes, the target value for each class would be 1650 (14850 divided by 9). The resampling ratio before correction is obtained by dividing the target value by the original number. This group of ratios is then manually corrected with reference to the original class proportions, so that the resampled quantities still differ to some extent while the classes become relatively balanced. After resampling, the total number of samples is 12850, which is not much different from the total number after the original augmentation.

As for the source of the additional samples, a sample slice with a complete neighborhood grid can generate new samples through 90° rotation, mirroring, and so on; one sample can yield at most 8 samples, that is, the maximum magnification is 8 times. For slices with incomplete neighborhood grids, the room for expansion depends on the degree of “incompleteness.” For example, a slice lacking any one of the 8 grids in the inner circle corresponds to $C_8^1 = 8$ possibilities, a slice lacking any two corresponds to $C_8^2 = 28$ possibilities, and so on. Different incomplete cases have different expandable magnifications, and the minimum is 8 times. When the dataset is unbalanced, this study sets the corresponding resampling ratio for each dataset based on the above process, reconstructing the distribution of the dataset to reduce the negative impact of imbalance and to obtain a better model through training.
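
The rotation and mirroring step for slices with a complete neighborhood can be sketched in Python as below. A slice is represented as a square list of lists whose cells stand for the per-grid data; the helper names are hypothetical, and the paper does not describe its own implementation. The four 90° rotations of a slice and of its mirror image give at most 8 distinct samples, which matches the maximum magnification of 8 stated in the text.

```python
def rotate90(grid):
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def mirror(grid):
    """Mirror a grid left to right."""
    return [row[::-1] for row in grid]


def augment(grid):
    """Return the distinct rotations and mirrored rotations of a slice."""
    variants = []
    for start in (grid, mirror(grid)):
        current = [list(row) for row in start]
        for _ in range(4):
            if current not in variants:  # symmetric slices yield fewer samples
                variants.append(current)
            current = rotate90(current)
    return variants


# Cells are labeled 0..8 here; in practice each cell would hold a length-15 vector.
slice_3x3 = [[0, 1, 2],
             [3, 4, 5],
             [6, 7, 8]]
samples = augment(slice_3x3)
print(len(samples))  # 8
```

The center cell, which serves as the label, stays in place under every transformation, so only the arrangement of the neighborhood changes between augmented samples.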

Table 6: Dataset reconstruction examples.

| | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7 | Class 8 | Class 9 |
|---|---|---|---|---|---|---|---|---|---|
| Number of original samples | 100 | 1000 | 100 | 400 | 400 | 200 | 50 | 200 | 25 |
| Resampling ratio before correction | 16.5 | 1.65 | 16.5 | 4.125 | 4.125 | 8.25 | 33 | 8.25 | 66 |
| Corrected resampling ratio | 13 | 3 | 13 | 4 | 5 | 6 | 25 | 8 | 40 |
| Number of resampling | 1300 | 3000 | 1300 | 1600 | 2000 | 1600 | 1250 | 1600 | 1000 |
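
The “resampling ratio before correction” row of Table 6 follows directly from the numbers given in the text: the 2475 original samples are expanded 6 times to 14850, and an equal split over 9 classes gives a target of 1650 per class. A short Python check, with variable names chosen here for illustration, reproduces that row. The manual correction step is not reproduced, since the paper does not give it as a formula.

```python
# Number of original samples per class (Table 6, first row).
original = [100, 1000, 100, 400, 400, 200, 50, 200, 25]
expansion = 6  # original expansion ratio stated in the text

total_after_expansion = sum(original) * expansion          # 2475 * 6
target_per_class = total_after_expansion / len(original)   # equal split over 9 classes

# Ratio before correction: target value divided by the original number.
ratio_before_correction = [target_per_class / n for n in original]

print(sum(original))            # 2475
print(total_after_expansion)    # 14850
print(target_per_class)         # 1650.0
print(ratio_before_correction)  # [16.5, 1.65, 16.5, 4.125, 4.125, 8.25, 33.0, 8.25, 66.0]
```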

2.3. Development of Single-Grid Basic Algorithm. An ANN is generally composed of input, hidden, and output layers; each layer contains several neuron nodes, and directed weighted arcs connect the nodes. An ANN model is built and the processed dataset is loaded for training. The input is the three-dimensional data (land property, LCZ, traffic intensity) of the grids in the neighborhood of the unknown grid, and the output and its corresponding label are the two-dimensional data (land property, LCZ) of the unknown grid. In reality, road traffic often exists before urban land is developed, and subsequent urban design generally does not change the traffic conditions around the design land, so traffic intensity is used only as an input and not as an output. In addition, to keep the output from becoming too complicated to parse, this study separates the land use output from the LCZ output and trains two types of models. The two datasets, for the 3 × 3 and 5 × 5 neighborhoods, therefore correspond to four models.

For convenience, datasets are numbered as neighborhood range_number of grids in the inner circle(_number of grids in the outer circle), here and below. A dataset for the 3 × 3 neighborhood uses the first two groups of numbers: neighborhood range_number of grids in the inner circle. For example, 3_8 means that, in the 3 × 3 neighborhood, the data of the 8 grids in the inner circle are used as the input. A dataset for the 5 × 5 neighborhood uses three groups of numbers: neighborhood range_number of grids in the inner circle_number of grids in the outer circle. For example, 5_8_16 means that, in the 5 × 5 neighborhood, the data of the 8 grids in the inner circle plus the 16 grids in the outer circle, 24 grids in total, are used as the input.

In training, the dataset is randomly divided into a training set and a test set in the ratio 7 : 3, and the training set is fed to the ANN for a certain number of epochs (10^3). The training set serves as the known data from which the network learns, and the test set is used to evaluate the model's performance on unknown data. Through repeated training on the known data, the network gradually adjusts and optimizes the weights of the connections between nodes and learns the relationship between input and output. In addition to the weight parameters, there are hyperparameters that must be defined manually, such as the number of hidden layers and the number of neuron nodes in each layer. It is often impossible to find, when the network is first built, the settings that give the best performance. The actual performance of the network on the dataset must therefore be evaluated through the evaluation function. On this basis, the parameter settings of the network are repeatedly adjusted and optimized until a model is obtained that reflects the internal laws of the urban spatial layout in the area covered by the dataset.

The evaluation function generally refers to the loss function and the accuracy function of the model. The loss is calculated automatically, and the worse the weights of the neural network, the worse its performance and the larger the loss. In this study, the cross-entropy error is used as the loss function:

$$E = -\sum_{i} t_i \log y_i. \quad (2)$$

In formula (2), $E$ is the cross-entropy error, $y_i$ is the neural network output for the ith sample, and $t_i$ is the correct label of the ith sample; $y_i$ and $t_i$ are represented by one-hot coding.

Accuracy shows the quality of network performance more intuitively. Different tasks define different accuracy functions. For simple problems, accuracy can be calculated directly by checking whether the generated result is equal to the label; this study calls this the “equivalence” evaluation. When the result is more complex and a simple “right” or “wrong” would distort the judgment of the original result, a “deviation” evaluation can be obtained by calculating the difference between the generated result and the label data. This study combines the equivalence and deviation standards to score the two output dimensions.

3. Analysis and Discussion

One of the main motivations of this study is to explore, from the perspective of the computer, the laws present in urban spatial layout data. Because an artificial neural network is the main model in this study, exploring the layout rules is equivalent to exploring the connection between the input and output data, that is, analyzing the possible relationships between the current situation data of the surrounding environment that a sample takes as input and the prediction result for the central grid that it produces as output. Before conducting this analysis, three questions need to be clarified: What is the sample to be analyzed? What does surrounding data mean? What is the forecast result?

For question 1, the model’s performance on the unknown test set reflects the real learning situation better than its performance on the training set, so the rule analysis is mainly based on the test set samples of the 90 models generated by the training. Moreover, whether the score is high or low, the model’s prediction for a sample follows certain internal logic and laws. Still, the laws embodied by the

[Figure 2 flowchart. Sample: the best performing test set sample (training set samples; primary, secondary, and tertiary samples). Input: surrounding current situation data (land property data, LCZ data, traffic intensity data). Output: forecast results (land use nature results, LCZ results). Result: law of urban spatial]

Figure 2: Law analysis process.

Table 7: Proportional distribution of test set results of the model.

                            3 × 3 neighborhood                  5 × 5 neighborhood
          Score   Level   Number of samples  Proportion (%)   Number of samples  Proportion (%)
Land use  ≥80     A                    7856           58.88              169678           77.51
          60–80   B                    2234           16.67               26088           11.47
          40–60   C                    1955           14.45               15045            6.77
          20–40   D                    1109            8.34                7266            3.81
          ≤20     E                     224            1.66                1076            0.44
LCZ       100     A                    9322           47.11              168789           77.33
          0       B                   10489           52.89               49672           22.67
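
The evaluation quantities behind Table 7 can be sketched briefly. The example below assumes that formula (2) is evaluated for one sample over its class probabilities, that scores lie in the range 0 to 100, and that the band boundaries 60 and 40 are assigned to the higher level (Table 7 states only ≥80 and ≤20 explicitly). The helper names `cross_entropy`, `grade_land_use`, and `grade_lcz` are hypothetical and are not taken from the study.

```python
import numpy as np

def cross_entropy(y, t, eps=1e-12):
    """Formula (2): E = -sum_i t_i * log(y_i) for one sample.

    y: predicted class probabilities; t: one-hot label.
    eps only guards against log(0) and is an implementation assumption.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return float(-np.sum(t * np.log(y + eps)))

def grade_land_use(score):
    """Map a land use score (0 to 100) to the levels A to E of Table 7."""
    if score >= 80:
        return "A"
    if score >= 60:   # assumption: 60 belongs to level B
        return "B"
    if score >= 40:   # assumption: 40 belongs to level C
        return "C"
    if score > 20:
        return "D"
    return "E"        # Table 7: scores of 20 or below

def grade_lcz(score):
    """LCZ has two levels: A for a correct prediction (100), B otherwise (0)."""
    return "A" if score == 100 else "B"

loss = cross_entropy([0.1, 0.7, 0.2], [0, 1, 0])   # equals -log(0.7)
levels = [grade_land_use(s) for s in (85, 60, 45, 21, 20)]
```

With a one-hot label, only the probability assigned to the correct class contributes to the loss, so a confident correct prediction gives a loss close to zero.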

samples with high scores conform to the real situation, while those of the samples with low scores do not. To explore the realistic part of the rules found by the model, this study divides the test set samples into levels according to the scores of their prediction results and performs the in-depth analysis on the best performing samples.

For question 2, the surrounding data refer to the land use, LCZ, and traffic intensity data of the grids in the neighborhood. For question 3, the prediction results refer to the land use property and LCZ predictions for the unknown grid in the middle. The land use property results can be subdivided into 16 classes: only one dominant class (5 classes), two dominant classes (10 classes), and no obvious dominant class (1 class). The LCZ prediction results can be subdivided into 9 classes, corresponding to the 9 LCZ classes. Figure 2 shows the relationship between the above three answers and briefly illustrates the steps of the law analysis.

After the analysis object is defined, the analysis proceeds in the following steps: (1) divide the test set samples into grades according to the prediction score and briefly describe the general situation of the samples at each grade; (2) observe whether the samples with good performance share common neighborhood characteristics; (3) take the best performing grade A samples and briefly describe the general situation of the prediction results in each subdivision category; (4) for land use property and (5) for LCZ, analyze in depth the relationship between the surrounding current situation and the prediction result for the central grid, and summarize the possible laws in the urban spatial layout.

The test set samples are divided into grades according to the result score: the land property results are divided into five grades, and the LCZ results have only two grades, correct or wrong prediction. Table 7 shows the number and proportion of samples at each level.

As can be seen in Table 7, the prediction performance for the 5 × 5 neighborhood is better than for the 3 × 3 neighborhood, and the prediction performance for land property is better than for LCZ, which is in line with the conclusions obtained from the analysis above.

Comparing the conclusions for the land use property model and the LCZ model, it is found that the samples with good prediction performance are the same, but some characteristics of their surrounding environment are opposite. For example, as the proportion of landscape land increases, the prediction of land use property becomes worse, but the prediction of LCZ becomes better. This shows that the model follows very different prediction laws for land use property and LCZ.

4. Conclusions

This study comprehensively analyzes and evaluates the artificial intelligence generation method of urban spatial layout from two aspects: algorithm and result. In the algorithm analysis, the 90 ANN models of the example city are first analyzed and evaluated based on various scoring indicators, including an analysis of the overall performance of the models and a comparative analysis of the land use property model and the LCZ model. The analysis

shows that the ANN algorithm performs well overall and that the land use property model performs better than the LCZ model, and several factors affecting model performance are summarized. The results also show that the adjacent lattice filling algorithm performs better than the MCTS algorithm. In the result analysis, the three multigrid test results generated with the adjacent lattice filling algorithm in the example application are briefly analyzed and interpreted, and then the test set sample data of the 90 ANN models are discussed in depth. The relationship between the surrounding current situation and the prediction result for the central grid is analyzed, from which a series of reasonable urban spatial layout laws for the research area are summarized. This shows that the research method has practical value.

Data Availability

The raw data supporting the conclusions of this article can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.

[8] J. Wan and H. Shi, “Research on urban renewal public space design based on convolutional neural network model,” Security and Communication Networks, vol. 2021, Article ID 9504188, 9 pages, 2021.
[9] X. Zhang, W. Fan, and X. Guo, “Urban landscape design based on data fusion and computer virtual reality technology,” Wireless Communications and Mobile Computing, vol. 2022, Article ID 7207585, 14 pages, 2022.
[10] L. Liu, “Urban complex public space design method based on support vector machine,” Mathematical Problems in Engineering, vol. 2022, Article ID 9812223, 13 pages, 2022.
[11] J. He, “Landscape design method of urban wetland park using the building information model,” Wireless Communications and Mobile Computing, vol. 2022, Article ID 6228513, 10 pages, 2022.
[12] X. Ma, J. Li, and X. Zhang, “Data model of key indicators of urban architectural design based on long-and short-term convolutional memory network,” Mathematical Problems in Engineering, vol. 2022, Article ID 7607928, 13 pages,
[13] S. Liu, “Application of big data technology in urban greenway design,” Security and Communication Networks, vol. 2022, Article ID 4826523, 10 pages, 2022.
[14] L. Li, B. Lei, and C. Mao, “Digital twin in smart manufacturing,” Journal of Industrial Information Integration, vol. 26, no. 9, Article ID 100289, 2022.
[15] A. Naseem, M. A. Rehman, and J. Younis, “A new root-finding algorithm for solving real-world problems and its complex dynamics via computer technology,” Complexity, vol. 2021, Article ID 6369466, 10 pages, 2021.
