
Swimming Pool Detection and Classification Using Deep Learning

This document discusses using deep learning to detect and classify swimming pools in aerial imagery. The authors labeled around 2,000 pools in Southern California imagery to create training data and trained a single shot multibox detector (SSD) model on it. Initial results using RGB bands from NAIP imagery were poor, partly because of problems in how the training chips were prepared. Fixing the data preparation (introducing asymmetry in the image chips and including truncated bounding boxes) was necessary, but the largest gains came from switching to NAIP color infrared bands and heavy data augmentation. The authors demonstrate that detecting pools can help update property assessment records, while classifying pools as clean or neglected can aid mosquito control efforts.


Swimming pool detection and classification using deep learning

By Divyansh Jha and Rohit Singh

Published in GeoAI on Medium

Object detection is one of the most important tasks in the field of computer vision. Locating a specific object in an image is a trivial task for humans, but it can be quite challenging for machines. The field has recently witnessed groundbreaking research with state-of-the-art results, but taking this research to the field and solving real-world problems is still a challenge. Integrating the latest research in AI with ArcGIS, the industry-leading GIS, opens up a world of opportunities ranging from feature identification and land cover classification to creating maps straight out of imagery.

At the plenary session of the 2018 Esri User Conference, we showcased one such integration: detecting swimming pools in aerial imagery. We went a step further and were even able to identify which pools are in a state of neglect and might need inspection by health inspectors to prevent the spread of vector-borne diseases.
Presentation at Esri UC 2018 plenary
The Problem

Tax assessors at local government agencies often have to rely on planimetric mapping services to create tax assessment rolls. Such surveys are expensive and infrequent, which leads to inaccurate tax assessments. When assessing property taxes, pools are typically added to assessment records because they affect the value of the property: a home improvement such as adding a swimming pool increases the property’s value and therefore its property taxes. Finding pools that are missing from the assessment roll is valuable to the assessor, because each one adds incremental assessed value and ultimately means additional revenue for the community.

ArcGIS Pro showing residential parcels with pools highlighted in blue. Note
that some parcels with pools are missing, indicating outdated data. The goal
of this project was to identify all such parcels.
Doing this with GIS and AI would greatly reduce the expensive manual labor involved in updating the records through field visits to each property.

Additionally, the downturn and slow recovery of the residential real estate market has left many homes across the country with neglected pools that can become breeding grounds for mosquitoes. The sheer number of affected properties in warmer climates makes detecting these risky properties challenging for many organizations. Public health and mosquito control agencies are responsible for providing the highest level of protection from vectors and vector-borne diseases. The problem with mosquitoes isn’t just their annoying nature and itchy bites: the spread of mosquito-borne viruses such as West Nile and chikungunya is of grave concern to many agencies. To help with remediation, these agencies need a simple solution that helps them locate neglected pools (green pools) in imagery and then use this intelligence to drive field activity and mitigation efforts. This solution also ties in nicely with existing mosquito control solutions.
Clean and Green pools in a neighborhood

This post is a story of successive failures and triumphs during this project. It describes our approach, what worked and what didn’t, and how integration with ArcGIS made the whole process much easier. Our notebooks and script files are published on GitHub.

Creating training data


Deep learning models require a large number of training examples to produce good results. The golden rule is: the more data, the better the results. Keeping this in mind, we started searching the Internet for data, but after only a few hours we learned that there was no openly available labeled dataset for detecting swimming pools in satellite imagery. We then labeled around 2,000 swimming pools in cities in Southern California ourselves. Surprisingly, it didn’t take that long.

ArcGIS Pro includes tools for labeling and exporting training data

We chose ArcGIS Pro to label sample swimming pool locations. It provides access to a host of aerial, satellite and drone imagery from Esri and its partners; we used the Esri World Imagery basemap for labeling. The software includes an easy-to-use interface for labeling data as well as advanced GIS functionality, including tools for reviewing data to manage its quality. It also includes geoprocessing tools to create buffers and bounding boxes around labeled pool locations, and the ‘Export Training Data for Deep Learning’ tool, which creates the labeled image chips needed to train a deep learning model.
Another option for creating image chips is the ArcGIS API for Python, which has methods for exporting images from imagery layers (e.g., NAIP imagery layers) as well as tile layers (such as the Esri World Imagery layer). We first created a shapefile containing the labeled pool locations using ArcGIS Pro, then chipped out 224x224 pixel images from aerial/satellite imagery at the locations in the shapefile. The GitHub repository includes a notebook demonstrating this approach.
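A minimal sketch of this chipping step in plain Python, assuming a north-up raster described by a GDAL-style six-element geotransform. The function names are hypothetical and not taken from the authors’ notebook. The sketch converts a labeled pool location to pixel indices and returns a 224x224 window centered on it:

```python
import math

CHIP_SIZE = 224  # chip size in pixels, as in the post


def map_to_pixel(x, y, geotransform):
    """Convert map coordinates (x, y) to (col, row) pixel indices.

    geotransform is assumed to follow the GDAL layout
    (origin_x, pixel_width, 0, origin_y, 0, -pixel_height) for a north-up raster.
    """
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    col = math.floor((x - origin_x) / pixel_w)
    row = math.floor((y - origin_y) / pixel_h)  # pixel_h is negative, so rows grow southward
    return col, row


def centered_chip_window(x, y, geotransform, size=CHIP_SIZE):
    """Return (col_off, row_off, width, height) of a chip centered on the point (x, y)."""
    col, row = map_to_pixel(x, y, geotransform)
    half = size // 2
    return col - half, row - half, size, size
```

Every chip produced this way has its pool exactly at the center, a detail that turns out to matter later in the project.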

Training deep learning models


For training the object detector, we used an architecture inspired by the Single Shot MultiBox Detector (SSD), trained with focal loss, as explained in lesson 9 of the fast.ai course. The fast.ai library (by Jeremy Howard) is built on top of PyTorch and helps create state-of-the-art models quickly and train them efficiently using some slick optimizations. It provides a high-level API for creating and training deep learning models, while also allowing fine-grained control to customize everything.
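Focal loss reweights the standard cross-entropy so that the many easy background anchors do not swamp the few anchors that actually overlap a pool. Below is a minimal per-anchor sketch in plain Python. It assumes the commonly used values α = 0.25 and γ = 2, since the post does not state which values were used:

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one anchor.

    p: predicted probability that the anchor contains a pool; y: 1 for pool, 0 for background.
    """
    p = min(max(p, eps), 1.0 - eps)          # keep log() finite
    p_t = p if y == 1 else 1.0 - p           # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of confidently correct (easy) anchors
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy, for comparison."""
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p if y == 1 else 1.0 - p)
```

An easy background anchor such as focal_loss(0.01, 0) contributes roughly ten thousand times less than its cross-entropy. A hard positive such as focal_loss(0.1, 1) still yields a substantial loss. With gamma=0 and alpha=0.5 the function reduces to half the ordinary cross-entropy.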

We used ResNet-34 as the base model and added a Single Shot MultiBox Detector head on top of it using PyTorch. ResNet-34 is an image classification model that was trained on over 1 million images from the ImageNet visual recognition challenge. For visual intuition, the SSD architecture is shown below.
SSD architecture (Source: Wei Liu)
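An SSD head predicts a class score and four box offsets for every default (anchor) box laid over a few coarse grids. The sketch below shows one way to lay out such anchors. It assumes 4x4, 2x2 and 1x1 grids with three zooms and three aspect ratios; these values are illustrative and not necessarily the ones the authors used:

```python
def make_anchors(grid_sizes=(4, 2, 1),
                 zooms=(0.7, 1.0, 1.3),
                 ratios=((1.0, 1.0), (1.0, 0.5), (0.5, 1.0))):
    """Build SSD-style default boxes as (cx, cy, w, h) in normalized [0, 1] image coordinates.

    Each cell of each grid gets one box per (zoom, aspect ratio) combination.
    """
    anchors = []
    for g in grid_sizes:
        cell = 1.0 / g
        for row in range(g):
            for col in range(g):
                cx, cy = (col + 0.5) * cell, (row + 0.5) * cell  # cell center
                for zoom in zooms:
                    for rw, rh in ratios:
                        anchors.append((cx, cy, cell * zoom * rw, cell * zoom * rh))
    return anchors


anchors = make_anchors()
# (16 + 4 + 1) cells x 3 zooms x 3 ratios = 189 default boxes
```

For each of these 189 anchors, the head outputs class scores plus four offsets that adjust the anchor into a predicted pool box.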

For training, we used the Adam optimizer with a one-cycle learning rate schedule. We also used discriminative learning rates while fine-tuning the model. All of these techniques are provided by the fast.ai library.
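Both ideas can be sketched in plain Python. The schedule shape (linear warm-up followed by cosine annealing) and all numbers below are illustrative assumptions, not the exact settings used by fast.ai or by the authors:

```python
import math


def one_cycle_lr(step, total_steps, max_lr=1e-3, pct_warmup=0.25,
                 div_start=25.0, div_end=1e4):
    """Learning rate at `step` for a one-cycle schedule.

    Linear warm-up from max_lr / div_start to max_lr, then cosine annealing
    down to max_lr / div_end over the remaining steps.
    """
    warmup = max(1, int(total_steps * pct_warmup))
    start_lr, end_lr = max_lr / div_start, max_lr / div_end
    if step < warmup:
        return start_lr + (max_lr - start_lr) * step / warmup
    t = (step - warmup) / max(1, total_steps - warmup)  # annealing progress in [0, 1]
    return end_lr + (max_lr - end_lr) * (1 + math.cos(math.pi * t)) / 2


def discriminative_lrs(base_lr, n_groups=3, factor=3.0):
    """Per-layer-group learning rates: the earliest (most generic) layers get the smallest rate."""
    return [base_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]
```

In practice, each rate from discriminative_lrs would be assigned to one parameter group of the optimizer (for example, early ResNet-34 blocks, later blocks, and the new SSD head), and the one-cycle schedule would scale all groups over the course of training.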

Imagery
An important consideration for training deep learning models is choosing the imagery. Using the most current and spatially accurate imagery is important, and the resolution at which training and inference are performed, as well as which bands are used, can be critical.

The ArcGIS platform provides access to a large collection of aerial, satellite and drone imagery through the Living Atlas. This includes imagery from Landsat, Sentinel and NAIP (National Agriculture Imagery Program). High-resolution imagery, including 7 cm imagery from Nearmap and Vexcel, is also available through partners.
NAIP imagery is acquired during the agricultural growing seasons throughout the continental US. It has a resolution of 1 meter and is collected every two years for a given area. Counties can use this imagery to detect pools for free. Additionally, several counties in the US collect their own orthoimagery (aerial imagery that has been geometrically corrected) every one or two years, which could be used as well. We chose NAIP imagery for detecting pools because it is free and available throughout the US.

Discriminating between clean and green pools, on the other hand, requires more recent imagery at a higher resolution to derive actionable insight. Nearmap and Vexcel imagery is collected much more often and has a much higher resolution. We chose Nearmap imagery for classifying pools as clean or green because it is included in the Esri World Imagery basemap for the Redlands area.
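A quick calculation shows why resolution matters here. It assumes the same 224x224 pixel chips and a hypothetical 10 m long pool:

```python
def chip_footprint_m(chip_px, gsd_m):
    """Ground distance covered by one side of a square chip, in meters."""
    return chip_px * gsd_m


def object_size_px(object_m, gsd_m):
    """Approximate size of an object in pixels at a given ground sample distance."""
    return object_m / gsd_m


for name, gsd in [("NAIP", 1.0), ("Nearmap", 0.07)]:
    print(f"{name}: 224 px chip covers {chip_footprint_m(224, gsd):.2f} m; "
          f"a 10 m pool spans about {object_size_px(10, gsd):.0f} px")
```

This prints a 224 m footprint with a pool about 10 pixels across for NAIP, versus a 15.68 m footprint with the pool about 143 pixels across for 7 cm imagery. Those extra pixels are what make it possible to judge water color and clarity.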

Which Bands to Use?

Satellite imagery often includes bands beyond the visible spectrum. It might seem obvious to use all available bands for training the model. However, there are certain advantages to using just three bands, and this worked quite well for us in practice.

First, RGB bands are available no matter which satellite or sensor is used. In theory, we could train a model on imagery from one satellite or sensor and deploy it on another. Imagery from different sensors could also be used for data augmentation and test time augmentation to further improve model performance.

Second, and perhaps more importantly, we could use transfer learning. Even though satellite images are quite different from photographs of everyday objects, they share features such as edges, shadows, curves and textures. These are the lower-level features that convolutional neural networks (CNNs) learn to recognize first. A network pre-trained on over 1 million images from the ImageNet corpus already knows how to extract such features, and fine-tuning it works better than training a new network from scratch on just a small number of satellite images.

We initially fine-tuned the model using RGB bands from NAIP imagery. However, the results were not good: our object detector predicted an object at the center of every image, and it also missed many pools. After inspecting the training data, we found a problem in the way we were chipping out the images using Python. Because each chip was cut around a location from the shapefile, the labeled pool always lay at the center of the image. This caused the model to overfit to that center position, so it kept predicting a pool there in the test results. We changed this strategy and introduced asymmetry while chipping out the images, so that pools appear at different positions within the chip. Also, to avoid missing pools on the edge of a chip, we included the truncated bounding boxes that we had been ignoring before. After these additional data processing steps, we rechecked our results but again weren’t very impressed. The validation loss was around 40.88, which was very high compared with what we achieved in later models.
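A minimal sketch of both fixes, working in pixel coordinates. The margin and the min_visible threshold for deciding when a truncated box is still worth keeping are hypothetical; the post does not say how truncated boxes were filtered:

```python
import random


def random_chip_origin(col, row, size=224, margin=16, rng=random):
    """Top-left corner of a chip that contains pixel (col, row) at a random position.

    The pool can land anywhere except within `margin` pixels of the chip border,
    so the model no longer learns that pools always sit at the center.
    """
    off_x = rng.randint(margin, size - margin - 1)
    off_y = rng.randint(margin, size - margin - 1)
    return col - off_x, row - off_y


def to_chip_coords(box, origin):
    """Shift a full-image pixel box (xmin, ymin, xmax, ymax) into the frame of a chip at `origin`."""
    ox, oy = origin
    xmin, ymin, xmax, ymax = box
    return xmin - ox, ymin - oy, xmax - ox, ymax - oy


def clip_box(box, size=224, min_visible=0.3):
    """Clip a box given in chip pixels to the chip bounds.

    Truncated boxes are kept rather than discarded, unless less than
    `min_visible` of the box area remains inside the chip.
    """
    xmin, ymin, xmax, ymax = box
    cx0, cy0 = max(0, xmin), max(0, ymin)
    cx1, cy1 = min(size, xmax), min(size, ymax)
    if cx1 <= cx0 or cy1 <= cy0:
        return None  # box lies entirely outside the chip
    visible = (cx1 - cx0) * (cy1 - cy0) / ((xmax - xmin) * (ymax - ymin))
    return (cx0, cy0, cx1, cy1) if visible >= min_visible else None
```

For each training chip, every labeled box would go through to_chip_coords and then clip_box, and only the non-None results would be written as labels.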

Results of initial model trained on RGB bands from NAIP imagery.

Next, we used colorized NDVI NAIP imagery from the Living Atlas. The normalized difference vegetation index (NDVI) is computed from the red and near-infrared bands and is often used to gauge the health of vegetation. On visual inspection of this layer, the swimming pools stand out in a bright red color, and we assumed that the network would be able to take advantage of this. However, the validation loss was around 30 and the results weren’t as good as expected. In retrospect this makes sense: NDVI collapses the red and near-infrared bands into a single index value per pixel and ignores the green and blue bands, so the neural network has much less information to work with.
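For reference, NDVI is defined per pixel as (NIR - Red) / (NIR + Red). The sketch below, using hypothetical reflectance values, shows how two bands collapse into one number per pixel. Healthy vegetation scores high and water scores negative:

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def ndvi_band(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally sized bands given as lists of rows."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]


vegetation = ndvi(0.50, 0.08)  # about 0.72
water = ndvi(0.02, 0.05)       # about -0.43
```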
Results of the model trained on NDVI bands from NAIP imagery.

The next logical step was to use three bands, but with a band combination other than RGB. The USA NAIP Imagery: Color Infrared layer uses the near-infrared, red and green bands. Pools stand out clearly in it because water strongly absorbs near-infrared light, while the surrounding vegetation reflects it. An example of NAIP Color Infrared imagery (a false color composite) is below.

NAIP Color Infrared Imagery of Redlands
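The band mapping behind a color infrared composite is simple. The sketch below assumes the usual NAIP 4-band order of red, green, blue and near-infrared, and uses hypothetical 8-bit pixel values:

```python
def color_infrared(pixel):
    """Map a 4-band NAIP pixel (red, green, blue, nir) to a false color (R, G, B) triple.

    Display red <- near-infrared, display green <- red, display blue <- green.
    """
    red, green, blue, nir = pixel
    return (nir, red, green)


def cir_chip(chip):
    """Apply color_infrared() to every pixel of a chip given as rows of 4-band tuples."""
    return [[color_infrared(px) for px in row] for row in chip]


vegetation = color_infrared((40, 70, 40, 200))  # (200, 40, 70): rendered red
water = color_infrared((30, 60, 90, 10))        # (10, 30, 60): rendered dark blue
```

Because vegetation reflects near-infrared strongly and water absorbs it, vegetation is rendered red and pools show up as the blue patches visible in the image above.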

We can easily locate the blue patches where the swimming pools are. We then chipped out images from the NAIP infrared imagery and trained our model, finally seeing improved results. We got decent results with around 2,000 NAIP infrared images, but the model still failed to detect some pools. At this point, we recalled the golden rule and applied heavy data augmentation by taking 50 random jitters around each pool location. Using this technique, we turned those 2,000 images into 100,000 images. After training the complete model again, the validation loss went down to around 18. We tried training further, but the model started to overfit after that. Here are some of the results after training on NAIP infrared imagery.
Results of the model trained on infrared bands from NAIP imagery.
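A minimal sketch of that jittering. It assumes a maximum shift of 80 m, a hypothetical value chosen so that a pool always stays inside a 224 m wide NAIP chip:

```python
import random


def jittered_centers(x, y, n=50, max_shift_m=80.0, seed=None):
    """Return n chip centers randomly shifted around a pool located at map coordinates (x, y).

    With 1 m NAIP pixels a 224x224 chip spans 224 m, so shifts of up to 80 m
    keep the pool inside every chip while varying its position.
    """
    rng = random.Random(seed)
    return [(x + rng.uniform(-max_shift_m, max_shift_m),
             y + rng.uniform(-max_shift_m, max_shift_m))
            for _ in range(n)]
```

Exporting one chip per jittered center turns 2,000 labeled pools into 2,000 x 50 = 100,000 training chips. Each pool’s box then has to be re-expressed relative to its chip, as in the earlier chipping fix.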

Inferencing

Once the model was fully trained and giving good results, we wanted to
test it out on a larger area than just the small image chips used for
training and validation. We created a script to export a larger area of
the NAIP imagery and find all pools within it. This was done by further
splitting the larger image into smaller sized chips which the model
requires. All these chips were simultaneously passed as a batch to the
model and the predictions were gathered, combined and visualized.
Below is the result of that visualization.

Pool detection in 700m x 700m area of Redlands
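Splitting a larger export into model-sized chips can be sketched as follows. This is a plain-Python illustration, not the authors’ script. It assumes 1 m NAIP pixels, so the 700 m x 700 m area above is about 700 x 700 pixels:

```python
def window_starts(length, chip=224, stride=224):
    """Start offsets along one axis; the last window is shifted inward to end at the border."""
    if length <= chip:
        return [0]  # image smaller than a chip: the caller would need to pad it
    starts = list(range(0, length - chip + 1, stride))
    if starts[-1] != length - chip:
        starts.append(length - chip)
    return starts


def tile_windows(width, height, chip=224, stride=224):
    """Top-left (col, row) corners of chip windows covering a width x height image."""
    return [(c, r) for r in window_starts(height, chip, stride)
            for c in window_starts(width, chip, stride)]


def batches(items, batch_size):
    """Yield consecutive batches of items, e.g. chips to send to the model together."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

A 700 x 700 pixel image yields 16 windows. Passing a stride smaller than the chip size produces overlapping windows, the same setting adjusted for test time augmentation later on.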

The whole point of this project was to work at scale, so we decided to run our model on an entire city using the capabilities of the ArcGIS API for Python. We took the extent of the City of Redlands and exported NAIP images from that area. We then used the simple pipeline described above to collect predictions on all chips within each exported image. The predictions were then converted into a feature layer by transforming them from image coordinates to geographic coordinates. A feature layer is a grouping of similar geographic features, such as pools, that can later be visualized on a basemap using the ArcGIS platform.
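The image-to-geographic step is an affine transform. The minimal sketch below assumes the exported image’s georeferencing is available as a GDAL-style geotransform; the post does not describe the exact mechanism used:

```python
def pixel_box_to_map(box, chip_origin, geotransform):
    """Convert a box predicted in chip pixels to map coordinates (xmin, ymin, xmax, ymax).

    chip_origin is the chip's (col, row) offset in the exported image, and geotransform
    is assumed to be GDAL-style (origin_x, pixel_width, 0, origin_y, 0, -pixel_height)
    with no rotation terms.
    """
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    col_off, row_off = chip_origin
    xmin_px, ymin_px, xmax_px, ymax_px = box
    # chip pixels -> image pixels -> map coordinates
    x1 = origin_x + (col_off + xmin_px) * pixel_w
    x2 = origin_x + (col_off + xmax_px) * pixel_w
    y1 = origin_y + (row_off + ymin_px) * pixel_h  # pixel_h < 0: larger rows are further south
    y2 = origin_y + (row_off + ymax_px) * pixel_h
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def box_center(box):
    """Center point of a map-coordinate box, e.g. to store each detected pool as a point."""
    xmin, ymin, xmax, ymax = box
    return (xmin + xmax) / 2, (ymin + ymax) / 2
```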

Test Time Augmentation

On observing the visualizations carefully, we found that a number of pools were still missing. We also noticed a strange trend: the missing pools tended to lie along horizontal or vertical lines. After some more analysis, we found that the missing pools were at the edges of the chips. To overcome this issue, we performed test time augmentation. The idea of test time augmentation is that if we show our model a few slightly altered versions of the same image, it will hopefully do better overall than if it saw just a single image. Our main strategy was to move the pools on the edges into the center so that they would be detected as well. First, we reduced the stride when chipping out the imagery, so that neighboring chips overlap and no pool is left only at the edge of a chip.
Source: TensorFlow website

Second, we ran predictions twice: first on the actual chip, and then on a center crop from the original chip. We selected the center crop so that pools positioned at the edges of the smaller chips now appeared at the center. This simple strategy allowed the missing pools to be detected. Extending this approach to five different center crops enabled us to increase recall (the number of correctly detected pools divided by the total number of actual pools) without hurting precision (the number of correctly detected pools divided by the number of all detected pools).
The mechanism of inner cropping.
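The bookkeeping behind inner cropping can be sketched as follows. The detector runs on the full chip and on several center crops, each crop is resized to the model’s input size, and every predicted box is mapped back into the chip’s pixel frame. The crop sizes and the predict callable are hypothetical placeholders:

```python
def center_crop(chip_size, crop_size):
    """(offset_x, offset_y, size) of a square crop centered in a chip."""
    off = (chip_size - crop_size) // 2
    return off, off, crop_size


def box_from_crop(box, crop, model_size=224):
    """Map a box predicted on a crop (resized to model_size) back to chip pixels."""
    off_x, off_y, size = crop
    s = size / model_size  # crop pixels per model-input pixel
    xmin, ymin, xmax, ymax = box
    return (off_x + xmin * s, off_y + ymin * s, off_x + xmax * s, off_y + ymax * s)


def predict_with_tta(predict, chip_size=224, crop_sizes=(192, 160, 128, 112, 96)):
    """Run a detector on the full chip plus several center crops; return boxes in chip pixels.

    predict is a hypothetical callable taking a crop (off_x, off_y, size) and
    returning [(box, score), ...] in model-input pixels.
    """
    results = []
    for size in (chip_size, *crop_sizes):
        crop = center_crop(chip_size, size)
        for box, score in predict(crop):
            results.append((box_from_crop(box, crop), score))
    return results
```

The same pool is typically found in several of these views, which produces duplicate detections that still have to be merged.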

Non-Max Suppression on Maps

The above augmentations at test time created multiple predictions for the same swimming pool when we visualized them on the map. We wrote a non-maximum suppression function that selects the pool with the highest score when several detected pools lie within a specified distance of each other: all pools within k meters of the highest-scoring pool are suppressed. We tuned the hyperparameter k and got the best results with k = 15 m. Now we could run test time augmentation as many times as we liked, and this algorithm would retain just the best detection. The results of this step are visualized below.
Effect of non-max suppression visualized on Esri World Imagery Basemap
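Below is a minimal greedy version of this distance-based suppression. It is a plain-Python illustration rather than the authors’ function, and it assumes detections have already been reduced to (x, y, score) points in a projected coordinate system with meter units:

```python
import math


def distance_nms(detections, k=15.0):
    """Greedy non-max suppression for point detections on a map.

    detections: list of (x, y, score) in a projected coordinate system measured in meters.
    The highest-scoring detection is kept, and any detection within k meters of an
    already kept one is suppressed.
    """
    kept = []
    for x, y, score in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(math.hypot(x - kx, y - ky) > k for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept
```

For example, with detections at (0, 0, 0.9), (5, 5, 0.8), (100, 100, 0.7) and (104, 100, 0.95), only (104, 100, 0.95) and (0, 0, 0.9) survive at k = 15.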

Using GIS to suppress false positives

Deep learning is great at what it does, but it can still make silly mistakes at times. For instance, we occasionally got false positives for pools on freeways as well as on hills! Many of the false positives had low confidence and were filtered out, but some were high-confidence false positives, perhaps as a result of overfitting. There were false positives specifically over large water bodies, which actually contained water and appeared blue in the NAIP imagery. We considered several options to remove these false positives, such as training the network to detect water bodies as a second class, but that seemed like overkill for the problem.

Since we are looking for pools in residential parcels, an easy way to discard the false positives is to overlay the detected pools with the residential parcels layer and throw away the pools that don’t intersect with it. ArcGIS Online provides analysis tools to do just that. This strategy gave even better results, shown below.
Detected pools within residential parcels
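The authors used ArcGIS Online’s analysis tools for this overlay. Purely to illustrate the underlying test, here is a minimal ray-casting sketch that keeps only pools falling inside a residential parcel. It assumes simple polygons without holes, with pools and parcels in the same projected coordinate system:

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) is inside the polygon ring [(x0, y0), (x1, y1), ...]."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pools_in_parcels(pools, parcels):
    """Keep pools (x, y, score) that fall inside at least one parcel ring."""
    return [p for p in pools
            if any(point_in_polygon(p[0], p[1], ring) for ring in parcels)]
```

With many parcels, a GIS tool or a spatial index would avoid testing every pool against every parcel, which this sketch does.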

Identifying parcels with unassessed pools

Now that we had a good pool detector, we wanted to find those parcels
containing swimming pools which are not being assessed correctly.
The Join Features tool in ArcGIS Online came in handy and we were
able to create information products like feature layers of the
unassessed pools as well as web maps for visualizing the results.
Surprisingly, we were able to identify approximately 600 new pools
that were not marked correctly in the database.
The red parcels are the ones that are not being correctly assessed for having a
pool, based on our data
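Once a spatial join has counted detected pools per parcel, finding unassessed pools reduces to comparing that count with the assessment records. The sketch below uses hypothetical parcel IDs and a hypothetical field layout; the actual analysis used the Join Features tool:

```python
def unassessed_parcels(pool_counts, assessed_has_pool):
    """Parcel IDs where a pool was detected but the assessment roll does not record one.

    pool_counts: dict of parcel ID -> number of detected pools (e.g. from a spatial join)
    assessed_has_pool: dict of parcel ID -> bool taken from the assessment records
    """
    return sorted(pid for pid, count in pool_counts.items()
                  if count > 0 and not assessed_has_pool.get(pid, False))
```

Parcels missing from the assessment records entirely are treated as having no recorded pool, so they are flagged too.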

A web map containing the results of this analysis is at http://arcg.is/0r0HKP, and a couple of results are shown below:

Results in the web map


Clean or Green?

Once we got good results for detecting the pools, we took a step further
to classify them as clean or green (i.e. neglected pools, sometimes also
referred to as ‘zombie pools’). Green pools often contain algae and can
be breeding grounds for mosquitoes and other insects. Mosquito
Control agencies need a simple solution that helps them locate such
pools and drive field activity and mitigation efforts.

Below are some examples of clean and green pools.

We exported about half of the detected pools using recent high-resolution Nearmap imagery (7 cm resolution) and manually labeled them as clean or green. The dataset was highly imbalanced: for every green pool, we had about 100 clean pools, which made training difficult. We augmented the data by jittering around each detected green pool, creating 100 images at different positions around it, like the ones shown below.

Augmented green pools

This technique balanced our data. We then fine-tuned a ResNet-34 classifier on it and achieved an excellent F1-score of 97.6%. Including this classifier in our inference pipeline gave us good detections of these so-called zombie pools, as seen in this web map.
Detected green pools.
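F1 is the harmonic mean of precision and recall, which makes it far more informative than plain accuracy on data as imbalanced as this. A short sketch with made-up counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 (their harmonic mean) for the positive (green pool) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# A classifier that always answers "clean" on 100 clean pools and 1 green pool
# is right 100 times out of 101, yet never finds the green pool:
accuracy = 100 / 101                                           # about 0.99
_, _, f1_always_clean = precision_recall_f1(tp=0, fp=0, fn=1)  # 0.0
```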

Distributed inferencing

One thing that bothered us was how long inference took. It took us approximately 10 minutes on Google Cloud Platform to detect pools across the entire City of Redlands, which could be a problem for a live demo. We then got our hands dirty with distributed computing on GPUs. We wrote an inference script that used Python’s subprocess module to launch separate processes on different GPUs and run inference on pre-downloaded chips. On a single p2.16xlarge AWS instance, we were able to run inference on the entire city within 50 seconds. At this speed, we can detect pools in San Bernardino County, the largest county in the contiguous US, in under an hour.
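The sketch below shows this pattern using only the standard library, with one worker process per GPU selected through the CUDA_VISIBLE_DEVICES environment variable. The worker script and its flags are hypothetical placeholders for the authors’ inference code:

```python
import os
import subprocess
import sys


def shard(items, n_shards):
    """Round-robin split of items (e.g. chip file paths) into n_shards lists."""
    return [items[i::n_shards] for i in range(n_shards)]


def build_commands(n_gpus, worker_script="infer_worker.py"):
    """One command plus GPU assignment per worker process.

    infer_worker.py and its --shard/--num-shards flags are hypothetical: each worker
    would load the model, take its share of chips via shard(), and write predictions.
    """
    return [([sys.executable, worker_script, "--shard", str(gpu), "--num-shards", str(n_gpus)],
             {"CUDA_VISIBLE_DEVICES": str(gpu)})  # each process sees exactly one GPU
            for gpu in range(n_gpus)]


def launch(n_gpus, worker_script="infer_worker.py"):
    """Start all workers in parallel and wait for them; returns their exit codes."""
    procs = [subprocess.Popen(cmd, env={**os.environ, **extra_env})
             for cmd, extra_env in build_commands(n_gpus, worker_script)]
    return [p.wait() for p in procs]
```

On a p2.16xlarge, which has 16 GPUs, this would be launch(16). Passing shard indices rather than chip paths on the command line keeps the argument lists short even for tens of thousands of chips.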

Deployment
A primary goal of this project was to apply the latest research in deep learning to solve real-world problems, be it updating outdated county records or galvanizing mosquito abatement drives.

The ArcGIS platform includes a host of capabilities, ranging from online mapping, analysis and collaboration to field mobility, that help achieve these goals.

Once we had obtained the locations of the detected swimming pools, it was relatively easy to use the analysis tools in ArcGIS Online to identify parcels that were not being assessed correctly. The Spatial DataFrame in the ArcGIS API for Python provided an easy-to-use, pandas-like way to generate reports on such properties and to create information products such as GIS layers that could be visualized in web maps or used for further analysis. The map widget for Jupyter notebooks not only enabled visualization of the detected pools and residential parcels, but also provided renderers and symbology that make the generated maps easy to understand. The maps could then be saved as web maps and shared with collaborators.
Web map with results of pool detection. See http://arcg.is/0r0HKP for an interactive version.

Esri has also recently introduced (as a beta) an Image Visit configurable app template that lets image analysts visually inspect the results of an object detection workflow and categorize them as correct detections or errors. A live demo of the configured web app is available. This information could then be fed back into training, or used to filter the results and prioritize field activities.
Image Visit app to enable visual inspection of neglected pools detected by
deep learning model.

That’s where the field mobility capabilities of the ArcGIS platform can be put to use. Workforce for ArcGIS allows for the creation of assignments for mobile workers, such as inspectors in mosquito control agencies, to drive field activity. We used the recently introduced apps module in the ArcGIS API for Python to automate the creation of Workforce assignments for field workers. These assignments make it easy for field workers to stay organized, report progress and remain productive while conducting mosquito abatement drives based on the results of the neglected pool detection analysis.
Pool inspection assignments for field workers in Workforce for ArcGIS

We have only scratched the surface of what the integration of deep learning and GIS can do. This is just one application, and there are countless others waiting to be powered by these amazing technologies. Stay tuned for more exciting stuff coming out soon!
