Abstract—Machine learning frameworks in conjunction with hardware advancements have allowed IoT applications to support more data-intensive tasks. The proximity of resources not only garners benefits in terms of latency and bandwidth, but also allows developers to address security concerns around sensitive data by reducing the risks associated with transporting data to remote servers (e.g. bandwidth monitoring and malicious database administrators). This is especially beneficial for home and office surveillance systems, whose current configurations place their subjects at risk by streaming all recordings of them directly to a cloud-based service for processing. We propose a method for decreasing the amount of data sent to the cloud in this context, specifically for the task of intruder detection, in order to preserve the security of video data, as well as to benefit from the decreased latency of processing data on the edge. By utilizing on-device caching techniques, in conjunction with Convolutional Neural Network (CNN) abstractions that minimize the data stored per individual, we present a proof-of-concept built on currently available commercial hardware to experiment with and evaluate our design. Our results show that facial recognition done on the edge is a viable solution to the security concerns of current surveillance networks, and that it also reduces the latency of facial recognition tasks in an IoT context.

I. INTRODUCTION

There are increased risks of leaking sensitive information associated with cloud computing, especially in the context of the Internet of Things (IoT). The increase in the number of paths over which data is transported (i.e. data channels from edge sensors to the cloud) and the concentration of millions of users' data into a small number of databases (clouds) have provided hotspots for malicious attacks, such as bandwidth monitoring and risky administrative access. The past few years have seen some disastrous examples of IoT hacking and vulnerabilities that arise from insecure firmware and insecure internet connectivity. In 2016, the largest DDoS attack ever was launched on service provider Dyn via an IoT botnet that used the Mirai malware to have computers continually search the internet for vulnerable IoT devices. That same year, the St. Jude Hospital's implantable cardiac devices were discovered to be vulnerable because the transmitter that sends data from the device to the cloud was insecure.

One solution to this problem is to shift more computation onto the data sources at the edge, thus reducing the amount of data being streamed to the cloud. Until recently, real-time decision making in IoT systems was bottle-necked by the cost, latency, and power consumption of the vast majority of IoT device hardware, which did not have enough processing capability to perform significant computation on the edge. These devices instead streamed all data from the device to the cloud and carried out computation there. However, recent developments in the hardware of edge devices that augment IoT processing capabilities mean that edge computing is experiencing a resurgence of popularity in both the academic and consumer device spaces. Furthermore, the optimization of machine learning algorithms and the increasing capability of running them on the edge have created a space for inference on the edge, thus reducing the amount of data that needs to be sent to and stored in the cloud. Putting the computation closer to the data sources also has the potential to significantly reduce the latency of common tasks associated with the edge, like object detection in self-driving cars or face detection in surveillance systems, where near-instant decision-making based on real-time data is extremely critical.

On the consumer side, some especially promising examples of recent advances in edge hardware and ML on the edge include the Apple A12 Bionic announced in September of this year, the 64-bit ARM-based system on a chip (SoC) successor to the Apple A11 Bionic, featured in the iPhone XS, XS Max and XR. The chip includes dedicated neural network hardware called the "Neural Engine" that can perform up to 5 trillion 8-bit operations per second. It is used for Face ID, Animoji and other on-device machine learning tasks that utilize the iPhone's cameras for computer vision-related work. The Neural Engine allows Apple to implement neural networks and machine learning in a more energy-efficient manner than using either the main CPU or the GPU [26]. Similarly, at the Cloud Next '18 conference this year, Google announced their Edge TPU hardware chip for the deployment of ML models to the edge. They also announced their Cloud IoT Edge service, which extends Google Cloud's data-processing and machine learning capabilities to the edge by allowing users to export models trained in Google Cloud to the edge [22]. This announcement followed shortly after Google's earlier release of their commercial Voice Kit and Vision Kit for performing natural language
processing and computer vision tasks, respectively, on the edge [21]. However, only the Voice Kit was packaged with Google Cloud support, leaving the Vision Kit as a standalone device without cloud connectivity. This meant putting all of the ML model training and inference entirely on the Vision Kit's Vision Bonnet, which was designed by Google and features the Intel Movidius MA2450 vision processing unit (VPU) [27]. Earlier this year, Intel also announced support for Windows ML on their Movidius VPU [28]. Similarly, Microsoft Research is working on resource-efficient machine learning for IoT devices, including their ProtoNN and Bonsai algorithms for classification, regression, and other common IoT tasks. These models are trained in the cloud, then can make predictions on device without the need for cloud connectivity [41]. Another example of consumer efforts to advance the state of computer vision at the edge are the Net Runner and TensorIO open-source projects from medical AI company doc.ai. Net Runner is doc.ai's environment for prototyping and evaluating computer vision machine learning models for IoT devices, and ships with MobileNet image classification models [13]. TensorIO is doc.ai's Objective-C wrapper for TensorFlow Lite, designed for the development and deployment of ML models on iOS devices [14].

Finally, this year marked the public release of Amazon Web Services' (AWS) DeepLens, a deep-learning enabled camera for developers [46]. The device features a 4-megapixel camera with up to 1080p video as well as an Intel Atom processor, and ships with device-optimized versions of MXNet and Intel's clDNN library, with support for other deep learning frameworks. With the release of the DeepLens coinciding with the increasing popularity of consumer IoT home surveillance systems like the Nest [35], Ring [30] and Wink [51] smart cameras, it will be interesting to see whether these devices trend towards the direction of the DeepLens in the future. This is especially true for Ring, a company owned by Amazon that has received consumer criticism for having cameras that are too slow because the device relies entirely on a cloud connection via WiFi. These exclusively-cloud IoT systems also raise concerns regarding the privacy of video and image data stored in the cloud. This month (December 2018), Facebook Research presented a paper detailing their work on ML inference at the edge [6]. Besides minimizing users' network bandwidth, inference at the edge is used by Facebook in certain Instagram features for real-time machine learning at image capture time.

With these recent hardware and software trends in mind, in this paper we focus specifically on face detection and face recognition in edge-based smart camera surveillance systems. Our main objective is to reduce the risks associated with sending and processing confidential information in a cloud-dependent IoT surveillance system, whilst ensuring low latency and reduced bandwidth usage by pushing more computation onto the edge. A secondary goal is to further expand on ideas concerning machine learning for IoT on edge networks.

The rest of the paper is structured as follows. In Section 2, we discuss the current state of surveillance systems, machine learning at the edge, privacy concerns and caching at the edge. Section 3 describes the design of our system and Section 4 goes into the details of the implementation. We evaluate the implemented prototype in Section 5 and look at scalability in Section 6. Finally, we discuss the direction of our future work in Section 7.

II. RELATED WORK

A. Current Surveillance Systems

The number of surveillance cameras is projected to increase by 20% worldwide every year for the next five years [32]. The British Security Industry Association estimates that in 2015 there were between 4 and 5.9 million cameras in the UK alone [4]. In 2016, the global video analytics market was valued at around USD 1.6 billion, and it is trending to grow as concerns for the safety and security of the public increase [40]. However, more cameras do not necessarily mean increased efficiency, as issues of latency and large amounts of video data now come into play. There are also many flavors of video surveillance systems on the market in terms of number of cameras and range of mobility, such as one-camera systems, many-camera systems and moving-camera systems. In conjunction with the applications of categorization, such as object tracking, ID re-identification and behavior analysis, there exist many combinations of how video surveillance systems are used [48].

One of the oldest and most widely used types of camera network is the Closed-Circuit Television (CCTV) surveillance system. Because of limited storage volume, the majority of these cameras use either 15 FPS or 7.5 FPS [48]. With the introduction of IP-based cameras, the resolution and frame rate became adjustable, as compression algorithms can be used to reduce storage volume. These cameras also make it possible to view footage remotely and perform remote analytics on the video data. We compare mainly with IP cameras in our evaluation. Although our evaluation also focuses primarily on images, video surveillance networks can also carry sound and GPS data [48]. For example, the Radar Video Surveillance System offered by Honeywell is able to detect intruders in user-defined Alarm Zones using Automatic Identification System (AIS) and GPS filtering [25].

Often, the installation and hardware of the camera network is separate from the analytics. This is the case for IP-based cameras, where the collection of video data occurs locally and the analytics occur remotely, allowing businesses to quickly convert their security systems to utilize the newest applications available for analyzing video data using video management software [18]. For instance, Axis Communications offers different flavors of video management software depending on the use case of the customer [10]. With the growing trend of combining both analytics and hardware as a product, companies are now looking into faster querying of video data for object detection/tracking, face recognition, and ID re-identification. Panasonic is set to release FacePRO, its facial recognition system that can be paired with the cameras it offers to present a service for face matching and search over video data [39]. However, the video data is still
transported to the cloud for processing, which is what we hope to eliminate with our design.

B. Machine Learning on Cameras at the Edge

In part thanks to some of the aforementioned hardware advancements, optimizing machine learning algorithms for the edge has received recent interest in academia. The main areas of interest are performing inference at the edge in real-time, and creating 'smart' edge networks that learn how to adapt to parameters that are often in flux, such as network bandwidth, as well as efficiently distributing tasks across multiple devices in a network.

The topic of self-adaptive, distributed smart cameras (DSCs) that perform computer vision has long been investigated, particularly in the scope of video transmission to the cloud or across local networks [42]. Today, novel camera networks are expected to perform adaptive algorithms to account for changes in the scene and objects of varying 'importance' to the task at hand. In their 2015 paper, Zhang, Chowdhery et al. presented a distributed wireless video surveillance system called Vigil that leverages edge computing for real-time tracking and surveillance in enterprise campuses and retail stores [53]. A similar architecture to Vigil was explored in earlier research prototypes like IrisNet [38], Bolt [23], and IBM's S3 [47], which attempted to coordinate cameras at scale. Vigil intelligently partitions frame processing across the edge and cloud via novel video frame prioritization techniques based on the level of activity in a frame. Their preliminary results showed that reducing the amount of data being streamed over the wireless network (i.e. pushing more computation onto the edge) meant that the surveillance system they studied could support a wider geographical area of coverage, between five and 200 times greater than what was possible with streaming video to the cloud alone. The authors report that such a drastic increase in scalability is made possible by the fact that, in most real-world video footage, nothing of interest is actually happening in a scene. Thus, it makes sense to utilize object detection or face recognition, depending on the surveillance task at hand, to limit the number of unimportant frames being sent to the cloud and wasting bandwidth in the process. Vigil's edge compute nodes (ECNs) locally process each camera's video feed with lightweight, stateless vision models like motion and object detection. The Vigil algorithm also relies on geometry and the known locations of the ECNs to detect redundant viewpoints without actually exchanging the redundant frames, and performs object re-identification to eliminate redundant views of the same object across multiple camera frames. However, Vigil does not store a persistent repository of unique face encodings (tied to identities) on the device, and is therefore not suited for facial recognition or intruder detection tasks. Rather, it uses facial re-identification solely to identify in real-time whether two faces from two different frames are the same person or not.

In November of this year (2018), researchers at North China University of Technology built an extended version of person re-identification across surveillance camera frames that utilizes recent advancements in deep convolutional neural networks (CNNs) for person detection [54]. Researchers at the University of California, Berkeley's RISELab are currently working on ReXCam, a system built to enable at-scale video inference in the context of cross-camera analytics at the edge [29]. ReXCam builds a cross-camera correlation model that encodes the locality observed in historical traffic patterns. Again, though, the motivation here is to achieve person tracking across frames and not necessarily person identification, which is the focus of our work.

C. Privacy in Cameras at the Edge

With storing faces on edge IoT devices come concerns regarding the accessibility of those images by potentially malicious actors, thus threatening the privacy of the individuals in the footage. The past few years have seen different approaches to enabling video privacy on IoT devices. Recently, Yu, Lim et al. published their work on their Pinto system for producing privacy-protected, forgery-proof videos using low-end IoT cameras [52]. Pinto operates by blurring footage in real-time, thereby obscuring faces and potentially sensitive scene elements, before the footage is sent to the cloud. Although readily implementable in IoT devices today, this kind of video post-processing limits the accuracy of facial recognition that can be achieved in the cloud once the video is sent over. Earlier attempts at preserving the privacy of video footage include Chattopadhyay and Boult's PrivacyCam, which uses AES encryption to encrypt and selectively blur detected areas of interest in a video frame [8]. However, since this is a significantly earlier paper, PrivacyCam's face detection algorithms are potentially less accurate than what would be possible with today's CNN-based methods.

This year, Wang et al. at Carnegie Mellon University (CMU) published OpenFace, a new open-source face recognition system nearing state-of-the-art accuracy [50]. In fact, we almost used their system in our work but opted for another Python library. Building on OpenFace, RTFace is a mechanism designed by Wang et al. that integrates OpenFace with inter-frame tracking to achieve real-time video denaturing of edge device video streams. Video privacy is achieved by selectively blurring faces in real-time, in a manner akin to the Pinto system, but according to previously specified policies that can be fine-tuned. Unlike in our work, where all video processing happens on the edge device itself, RTFace operates by first streaming video data to a local cloudlet and performing the denaturing algorithm there. The term 'cloudlet' refers to small data-centers located near IoT devices. As Wang et al. discuss in their paper, prior studies have shown that denaturing on cloudlets (i.e. nearer to the devices themselves) offers greater privacy assurances than denaturing in the cloud [50][12]. This bodes well for our work, where we carry out all video denaturing on the edge device itself, side-stepping the need to first transport video to an intermediary cloudlet. Furthermore, we extract and store face encodings (fixed-length arrays) that are potentially more secure than images of blurred faces. In the implementation and evaluation sections of this paper, we
go into further detail on how we process video streams on the edge devices and extract face encodings.

D. Caching at the Edge

Besides concerns around the privacy of footage being stored on device, another key aspect of face recognition on the edge is the latency associated with real-time face detection. The need that exists in current state-of-the-art systems to offload both requests and image data to a local cloudlet or the cloud itself adds extra latency on top of the time it takes to perform inference on an image. For real-time scenarios such as face recognition, and even other tasks like people detection in self-driving cars, it is extremely critical that requests be served with as low user-perceived latency as possible. Since those inference results are then used to make real-time decisions, late results are useless and potentially dangerous.

Due to the general unsuitability of the cloud for serving real-time information back to the edge device, given network connection dependencies, leveraging resources closer to the device is key to reducing the response latency of requests. A promising approach that is receiving recent interest is on-device content-level caching [7] [34] to store data or inference results that may be useful (e.g. the identity of a person), much like how a browser might cache often-visited web pages. Caching data on the edge is not a new idea [3]; however, caching inference results for recognition applications is novel. Drolia et al. have recently presented one of the first works on modelling edge servers as caches for compute-intensive recognition applications [17]. They have since built Cachier, a system for face recognition that applies adaptive load-balancing between the cloud and a purpose-built edge server cache [16]. Cachier utilizes load-balancing by leveraging the spatiotemporal locality of requests and online analysis of network conditions. Their initial results show a 3x speedup in responsiveness for face recognition tasks on the edge, as compared to non-caching systems. Drolia et al. extend the Cachier idea to build a system called PreCog, which pre-fetches image recognition classifiers from edge servers to perform inference on-device [15]. This kind of pre-model fetching is especially applicable in a camera scenario where multiple different kinds of objects are of interest besides just faces. PreCog uses Markov chains to make predictions about what data to prefetch onto the device. Markov chains are also used in Sadeghi et al.'s reinforcement-learning-based scheme for caching content in base-station units [43], although clearly that domain is slightly different from IoT cameras. PreCog's initial results are promising, with latency reduced by up to 5x and increased recognition accuracy. However, both PreCog and Cachier operate on a best-effort basis, meaning that they do not provide any real-time guarantees. This is addressed by Hnetynka et al. in their paper on the importance of guaranteed-latency applications in edge-cloud environments [24].

In our work, we deviate from the Cachier and PreCog concepts by side-stepping the dedicated edge server and instead building an in-software cache onto the edge camera itself. Unlike in the PreCog scenario, where multiple different types of objects might be of interest (dog, cat, person, car) and therefore different specialized classifiers are required, we are for now only concerned with guaranteeing highly accurate face recognition. We are therefore able to store the inference model on device, because face recognition only requires one dedicated model to carry out the task. In our setup, pre-fetching specialized models from either the cloud or a nearby edge server would be overkill for the task we are carrying out and would add unnecessary overhead. The vast majority of today's smart cameras, including all the devices mentioned in the introduction of this paper, ship with enough on-device storage and memory to store a single pre-trained model on device.

Glimpse is another real-time object recognition system for camera-equipped mobile devices (i.e. not surveillance systems specifically) that uses an 'active cache' of video frames on the mobile device [9]. However, they cache entire frames, as their focus is on object tracking, which is a significantly different task from face recognition. Venugopal et al. at IBM recently put forth a proof-of-concept for an edge caching system that is similar to ours [49] (performing inference and caching feature-level identifiers on-device); however, their work addresses object detection more broadly, whereas we focus on caching in the context of face recognition specifically.

III. DESIGN

A. State-of-the-art workflows for face recognition at the edge

We chose to build our system on the Amazon Web Services (AWS) DeepLens. The DeepLens hardware comes with 8 GB of RAM, with about 6 GB free under light load, meaning that we have plenty of space to build our cache in memory. The AWS DeepLens also comes with a 16 GB micro-SD card out of the box; our device is left with 7 GB free, which we intend to utilize as our on-device back-up database of face encodings. The Intel Atom E3930, which has 2 cores clocked at 1.30 GHz, is used for processing, along with integrated Intel graphics (exposed through the i915 driver) capable of roughly 100 GFLOPS (100 billion floating-point operations per second). The camera used for capturing video data is 4 megapixels with MJPEG, and able to record at up to 1080p resolution. The AWS DeepLens also supports Wi-Fi connectivity on both 2.4 GHz and 5 GHz with standard dual-band networking.

Our decision to use the AWS DeepLens stems from its integration with AWS, which allows for fast prototyping. Unlike the Google Vision Kit, the DeepLens has built-in cloud connectivity with access to the Amazon Rekognition service [44], a state-of-the-art facial recognition application that operates in the cloud. It boasts high performance in the cloud in terms of speed and accuracy, so we concluded that it would serve as a worthy comparison to our on-device system. It is likely that home surveillance systems like the Ring, also owned by Amazon, will trend towards smart cameras like the DeepLens as the hardware becomes cheaper to manufacture and more available to the public. This is why we decided to evaluate our design using this type of device, as our targeted use case is office and private area settings where a fully-fledged network of smart cameras is feasible.
Fig. 1. Current surveillance systems (left) and our proposed setup (right).

Currently, the AWS workflow for doing ML on edge devices involves either storing images on device or sending images to the cloud. Both of these options are expensive and bad for privacy. Furthermore, AWS currently only supports face detection on device (i.e. True or False as to whether a face exists in the frame), and does not support face recognition on device. Face recognition is a significantly different task, as it involves identifying whether two faces belong to the same person, not just whether or not a face exists in the frame. In order to do face recognition with the pre-existing AWS services, you must make a request from the edge device to the AWS cloud to use their Rekognition face recognition service, which involves sending the image frame containing a face to the cloud.

B. Other Benefits

The AWS face detection model that runs on the DeepLens edge device requires images to be saved to the device, which can quickly become expensive as either the images increase in resolution or the device sees more faces over time. This is also bad from a privacy and security perspective, as the faces are not encoded, so it would be relatively easy to hack into the device and steal the face frames.

The AWS face detection and face recognition models that run in the cloud require images of faces to be sent to and from the device to the cloud, which has clear privacy concerns (streaming images over WiFi with little to no encryption). Furthermore, as discussed in the section below, Amazon's Rekognition service, which operates on images of faces, quickly becomes expensive (i.e. latency increases dramatically) as more faces are requested from the service in succession. For example, even the task of having Rekognition identify four faces in a frame takes considerably longer than several seconds (up to 10 s), which is simply not fast enough for the kinds of situations in which quick facial recognition via an edge device would be required. Latency is critical in applications like surveillance and home security cameras, where it is critical to identify intruders as fast as possible.

C. Modifications to perform ML at the Edge

We use the Amazon DeepLens hardware and its AWS cloud connection to simulate cloud independence and dependence. We eliminate the sending of confidential data to the cloud by instead computing the task of intruder detection at the edge. Confidential data is defined as real-time information about the individuals that the security system is meant to protect (i.e. video streams of residents and workers in home and office settings, respectively). Cloud independence is made possible by making the edge smart: we run a face recognition machine learning framework, the face_recognition module powered by dlib, on the cameras generating the surveillance data. This data-heavy task is made possible through the following two techniques:

1) Creating an on-device, software cache in memory to store the most recently seen people (or most frequently seen, depending on the cache policy).
2) Storing face encodings in the cache, rather than image frames of faces.

Technique 1 helps alleviate latency concerns, as the edge device no longer has to make an expensive call to a face recognition service in the cloud every single time it detects a face in the image. Now, if a face is detected, at most one call to the cloud is needed, in the case where the face is one that the device does not yet have stored in the cache. Once the face is stored in the cache, face recognition can happen entirely on the device, which is orders of magnitude faster. For example, in situations where a security camera or system of cameras is stationed in the same place and sees many of the same faces throughout the day, caching these faces on device significantly reduces both the latency to detect a face and the bandwidth that would otherwise have been spent servicing requests to and from the cloud throughout the day.

Technique 1 also helps alleviate privacy concerns, as the overall rate and number of images of faces that are sent to and from the cloud is reduced significantly.

Technique 2 helps alleviate latency concerns, because storing just the encodings, as opposed to images, means that checking whether a face exists in the cache is orders of magnitude faster than having to re-calculate encodings every time a face is detected in a frame.

Technique 2 also helps alleviate privacy concerns, because it is impossible to reverse-engineer an array of encodings back into the original face (in the case of someone attacking a device in an attempt to steal its cache contents).
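As a concrete illustration of Technique 2, the short sketch below (our own illustrative code, not the deployed Lambda function; the file name and frame size are placeholders) extracts a dlib encoding from a frame with the face_recognition module and compares its memory footprint to that of the raw frame it came from.

```python
# Illustrative sketch only: why caching encodings is cheaper than caching frames.
# Assumes the open-source face_recognition module (dlib-backed); the file name
# and frame size are placeholders, not values from our deployment.
import face_recognition

frame = face_recognition.load_image_file("sample_300x300.jpg")   # RGB ndarray

# One 128-float encoding per detected face, regardless of input image size.
locations = face_recognition.face_locations(frame)
encodings = face_recognition.face_encodings(frame, known_face_locations=locations)

for encoding in encodings:
    print(encoding.shape)      # (128,)
    print(encoding.nbytes)     # 128 * 8 bytes = 1024 bytes as float64
print(frame.nbytes)            # 300 * 300 * 3 = 270,000 bytes uncompressed
```

Under these assumptions, a cached identity costs roughly a kilobyte per stored encoding, which is what makes the per-person encoding lists described in Section IV affordable in memory.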
Fig. 2. Edge-cloud face recognition workflow.

IV. IMPLEMENTATION

A. Edge-meets-cloud workflow

See Figure 2 for an overview of the workflow. A single AWS Lambda function runs on the DeepLens device and handles the capture and display of frames, the image processing of those frames to extract facial encodings, and the caching of those encodings. This Lambda function also facilitates communication with the cloud when needed, via the boto3 library [45], which is the AWS SDK for Python. To make face recognition as fast as possible, we have the capture and display of image frames occur in their own thread, separate from the image processing.

For face recognition on the edge device, we utilize the face_recognition Python module, which recognizes and manipulates faces and is built on dlib's state-of-the-art deep-learning face recognition. The model has an accuracy of 99.38% on the Labeled Faces in the Wild (LFW) benchmark. The module does support GPU acceleration, but it requires NVIDIA's CUDA library, which cannot be installed on the DeepLens due to insufficient hardware. When installing the face_recognition module on the DeepLens device, we drew inspiration from the OneEye DeepLens hackathon project built by medcv [33]. The face_recognition module utilizes a pre-trained CNN to extract face-level features from an image of a face, which are then stored in the form of 128-length arrays of floats. These feature arrays are called 'encodings' or 'embeddings'. The CNN will always output feature arrays of length 128, irrespective of input image size. We store these face encodings on device, instead of storing entire image frames. If a match cannot be found for a face encoding that the device has just extracted from a scene, a request is sent to the cloud to identify the face.

For face recognition in the cloud, we utilize Amazon's cloud-based Rekognition service, which operates on images of faces. When learning how to make requests to AWS from the DeepLens, we referenced GitHub user darwaishx's tutorial, for which we are grateful [11]. The Rekognition Collection that is used to identify familiar faces is linked to an S3 bucket we created that stores known faces as JPEG files. So, in order to make a request to Amazon Rekognition, we have to send the image of the unknown face to the cloud. On the device, we resize the image frame from its full size to a considerably smaller 300x300 pixels, as this is the frame size that the face_recognition model operates on on-device. To keep things consistent, this is also the size of the image that gets sent to the cloud (as opposed to the original frame size captured by the device). An image of this size, at the lowest resolution on the DeepLens (480p), is on average around 30 KB. In comparison, the size of one face encoding array stored on-device is around 1120 bytes, which is orders of magnitude smaller than the image size. The pseudocode for our algorithm is below:
Algorithm 1 On-device recognition algorithm pseudocode
 1: procedure RUNINFINITE                        ▷ Recognizes faces
 2:     cache ← Cache(evictionPolicy is LFU or LRU)
 3:     while True do                            ▷ While camera capturing
 4:         locations ← getFaceLocations(frame)
 5:         encodings ← getFaceEncodings(locations)
 6:         for face in encodings do
 7:             if face in cache then
 8:                 retrieve face
 9:             else
10:                 Send face to cloud
11:                 if face not in cloud then
12:                     databaseInCloud ← add face
13:                 cache ← add face

Fig. 3. The parameters for the cache are described above.
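For readers who prefer working code to pseudocode, the following is a minimal Python sketch of how Algorithm 1 could be realized with the libraries named in this section; the collection ID, the crop_and_encode() and key_for() helpers, and the cache interface are our own illustrative assumptions rather than the exact deployed Lambda code.

```python
# Illustrative rendering of Algorithm 1 in Python (not the deployed Lambda).
# Assumes the face_recognition module on-device and boto3 for the cloud path;
# COLLECTION_ID, crop_and_encode(), key_for() and the cache interface are
# placeholders for this sketch.
import boto3
import face_recognition

rekognition = boto3.client("rekognition")
COLLECTION_ID = "known-faces"     # hypothetical Rekognition collection name

def process_frame(frame, cache):
    """Identify every face in one captured frame, falling back to the cloud on a miss."""
    locations = face_recognition.face_locations(frame)
    encodings = face_recognition.face_encodings(frame, known_face_locations=locations)
    for location, encoding in zip(locations, encodings):
        if cache.lookup(encoding) is not None:
            continue                                   # cache hit: recognized on-device
        # Cache miss: send only the cropped, resized face image to the cloud.
        jpeg_bytes = crop_and_encode(frame, location, size=(300, 300))  # assumed helper
        response = rekognition.search_faces_by_image(
            CollectionId=COLLECTION_ID,
            Image={"Bytes": jpeg_bytes},
            FaceMatchThreshold=90,
        )
        matches = response.get("FaceMatches", [])
        if not matches:
            # Unknown to the cloud as well: add it to the cloud-side database.
            rekognition.index_faces(CollectionId=COLLECTION_ID, Image={"Bytes": jpeg_bytes})
        cache.insert(key_for(matches), encoding)       # cache the face either way
```

The cache object referenced here follows the structure described in the next subsection, where a sketch of it is also given.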
B. Creating the cache

Our cache is a software cache that is stored in memory alongside the code for the fastest access. The default capacity of the cache is 10 entries, for 10 unique people. A cache entry consists of a {key: value} pair that represents a unique person, where the key is the hashed name of the person (if known) and the value is a list of face encodings for that person that the device has seen over its lifetime. We store multiple encodings per person to account for the various parameters that might change across images (lighting, angle, position of the face), so as to best capture the different potential presentations of a single person. The more encodings per person we store, the more likely it is that the cache will score a hit for any given unknown encoding we are attempting to match. However, the length of the encoding lists is capped to ensure that we do not store an encoding for every single frame in which the same person is detected, as such a list would quickly grow too large and be very slow to iterate over. We experimented with different values for the optimal encodings list size, and settled on somewhere between 10 and 20 encodings per person in a cache with 10 entries (for 10 distinct people). This way, we are able to strike a balance between the advantage of having cached results stored close to the code, and the time it takes to linearly iterate over all of the face encodings stored in the cache and call the face_recognition.compare_faces() function for each encoding.

We implemented and tested two caching schemes, least-recently-used (LRU) and least-frequently-used (LFU). We chose these as they made the most sense given our context (surveillance at the edge). In certain situations, like homes, offices and academic settings, it may be the case that many faces pass by an area but that there is a consistent set of faces that appear frequently. In this case, LFU caching makes sense. In other situations, like cafes and shopping malls, there is no concept of 'favorite faces' that continually reappear; rather, the same person may linger in the same area for some time before moving on to another area and never returning to the previous area. In these kinds of situations, LRU caching makes sense, because once a camera no longer sees a face, after a certain amount of time it is unlikely to ever see it again. Although we implemented the LRU cache ourselves in order to fine-tune certain parameters, for our LFU cache we used the lfu_cache Python module from Laurent Luce, which supports O(1) deletion and insertion [31].

Searching for a matching face in the cache consists of comparing the currently unknown face encoding, grabbed from the current frame, against each list of known encodings per cache item. Currently, we do this linearly by iterating over each item in the cache. However, given the relatively small cache capacity (10-20 items) and the limit on the number of encodings saved per face (3-20), this process is actually very quick. To compare an unknown encoding to a known encoding, we use the face_recognition.compare_faces function, which returns True if the encodings correspond to the same person, or False if they do not.
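A compact sketch of this cache is given below, assuming an LRU policy built on collections.OrderedDict; the class name, method names, and default limits mirror the description above but are our own illustrative choices, not the exact production code.

```python
# Sketch of the in-memory cache from this subsection with an LRU eviction policy.
# Capacities mirror the values discussed in the text (10 people, up to 20 encodings
# each); the class and method names are illustrative assumptions.
from collections import OrderedDict
import face_recognition

class EncodingCache:
    def __init__(self, capacity=10, max_encodings_per_person=20):
        self.capacity = capacity
        self.max_encodings = max_encodings_per_person
        self.entries = OrderedDict()           # hashed name -> list of 128-float encodings

    def lookup(self, unknown_encoding):
        """Linear scan over every cached person; returns the matching key or None."""
        for key, known_encodings in self.entries.items():
            if any(face_recognition.compare_faces(known_encodings, unknown_encoding)):
                self.entries.move_to_end(key)  # mark as most recently used
                return key
        return None

    def insert(self, key, encoding):
        """Add an encoding for `key`, evicting the least-recently-used person if full."""
        if key not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the LRU entry
        bucket = self.entries.setdefault(key, [])
        if len(bucket) < self.max_encodings:   # cap per-person list growth
            bucket.append(encoding)
        self.entries.move_to_end(key)
```

Swapping the eviction policy for LFU would change only the bookkeeping around move_to_end and popitem; the linear compare_faces scan stays the same.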
C. Security

In addition to storing the encodings of each face instead of the image, the corresponding name of each individual is cryptographically hashed using a SHA-256 hash in order to minimize the leaking of sensitive information under the attack of a malicious actor. In order for our caching-encoding system to operate, the edge device itself does not 'care' whether the names associated with encodings have any semantic meaning. All that matters is that the key values of items in the cache are unique, so that our cache can be fully associative (moderate search speed, but the best hit rate). Hashing not only ensures that each key is distinct, but also that individuals' names are protected in the instance of an information leak.
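The key derivation can be as small as the following sketch, assuming Python's standard hashlib; the sample name is of course a placeholder.

```python
# Sketch of the cache-key derivation: keys are SHA-256 digests of a person's
# name, so a leaked cache exposes only opaque hex strings. The example name
# is a placeholder.
import hashlib

def hash_name(name: str) -> str:
    return hashlib.sha256(name.encode("utf-8")).hexdigest()

cache_key = hash_name("Alice Example")   # 64-character hex digest; only the digest is stored
```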
D. Failure Tolerance

The DeepLens comes with a 32 GB SD card for additional storage once mounted. Initially, we were going to use this as a 'secondary cache' to store even more face encodings. However, due to the linear fashion in which we are currently iterating over the cache to find potential matches, it became clear via some basic experimentation that using persistent storage as a secondary cache would be too slow.

Instead, we opted to use the ample persistent storage to store 'backups' of the face encodings. Since the cache is emptied and reset every time the device boots up, we would otherwise lose all the face encodings we have ever seen whenever the device has to restart. The benefit of storing the encodings in persistent storage is that, in the case of device failure, we can quickly recover all those encodings rather than having to wait for the same person to come back into frame again. This way, we do not lose potentially vital information about who was in a given frame at a given time. We also do not have to send a batch of requests to the cloud in order to restore the cache to its previous state.
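A minimal version of this backup path is sketched below; the file location and the use of pickle are assumptions for illustration, not the exact format used on our device.

```python
# Sketch of the persistent backup described above: the cache contents are flushed
# to the SD card and reloaded at boot so that a restart does not empty the cache.
# The backup path and the pickle format are illustrative assumptions.
import os
import pickle

BACKUP_PATH = "/media/sdcard/face_encodings_backup.pkl"   # hypothetical SD-card path

def save_backup(entries):
    """entries: dict mapping hashed name -> list of 128-float encodings."""
    with open(BACKUP_PATH, "wb") as f:
        pickle.dump(entries, f)

def load_backup():
    """Restore the cache contents after a reboot, or start empty if no backup exists."""
    if not os.path.exists(BACKUP_PATH):
        return {}
    with open(BACKUP_PATH, "rb") as f:
        return pickle.load(f)
```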
Fig. 4. The above plot compares a pure cloud set-up camera system with two edge-compute models.

Fig. 5. Number of faces gained from only storing encodings instead of images, given 7 GB of free memory (this number was chosen based on the amount of memory we had left after loading the model and code onto our device).

V. EVALUATION

As Figure 4 shows, the scaling behavior of a pure cloud model is not only unsustainable for a real-life application, but also heavily dependent on network bandwidth. The experiment was conducted on UC Berkeley's campus network, but the relative performance between cloud and edge devices should behave similarly under different network conditions. Figure 3 shows the two different cache set-ups utilized in order to determine the optimal configuration. The main concern with having too many encodings per person was the delay incurred by the linear comparisons done for each face. The on-device storage is used as a backup for the database of faces in case of power or other failures. Figure 5 highlights that by only storing the encodings, we are able to store around 6 million more unique encodings than images of faces at a given resolution of 480p at 300x300 pixels (with 7 GB free, roughly 7 GB / 1120 bytes ≈ 6.2 million encodings versus 7 GB / 30 KB ≈ 230,000 images).

A. Increase in Privacy

By reducing the amount of traffic sent to the cloud for processing, multiple security concerns are addressed. Consider the case where a constant stream of video data is sent to the cloud for facial recognition processing. From network traffic, the current location of individuals appearing in the data can be easily accessed. Although this might be desirable for CCTV video surveillance networks, our targeted use case of intruder detection assumes that the set of people with granted access is known, and that the whereabouts of these individuals should not be trivially revealed regardless of intruders in the system. As we also choose to store the face encodings of individuals instead of pictures, the identity of each permitted person can also be obfuscated, as faces cannot be reverse-engineered from encodings.

B. Other Benefits

Our approach also addresses computing concerns, namely the latency and failure tolerance of the system. By storing the list of encodings for permitted individuals in persistent storage, intruder detection is able to continue in the event of network failure or the powering off of a device. This is especially important in rural areas where network connectivity is limited. Our results have shown that although it is slower to go to persistent storage first before the cloud, it is nonetheless useful to keep an on-device list of encodings to improve failure tolerance.

VI. SCALABILITY AND MULTI-CAMERA NETWORKS

So far we have evaluated our system based on the behavior of one device. In this section we present ideas on how our design might scale through the addition of more cameras, and potentially optimal configurations.

A. Parallelizing for Performance

As the task of intruder detection through face recognition grows in terms of the absolute number of individuals accounted for by the system, both in the cache and from incoming faces, a network of communicating cameras becomes necessary to keep performance at current standards. As previously discussed, current systems have multiple cameras, with each typically set to stream data to the cloud independent of the activities of adjacent cameras. Given the linear behavior of how the facial recognition program is implemented (every new face is compared to each cached encoding), we propose utilizing the multitude of hardware (i.e. cameras or edge servers) to parallelize our task, reducing time based on the number of cameras.

We envision multiple sub-networks of cameras that share the workload of computing, dividing and partitioning the encodings among the cameras in the same room. How this data is partitioned, however, is greatly dependent upon the current number of cameras in a room and the field of view that each camera has. We explore two simple types of potential existing camera topology and provide our recommended architecture for optimal performance.

B. Topology and Configurations

The two types of topologies we explore are as follows, per room:
1) x cameras overseeing the whole room with equal coverage.
2) one camera overseeing 70 percent of the room, with x − 1 adjacent cameras covering hidden angles.
For the first case, we propose that the network follow a general distributed scheme, where encodings are equally distributed among the cameras. Given n individuals stored in the system, each camera will be responsible for testing incoming frames against n/x encoding sets (a set being the m encodings stored per person). Given a positive match on one of the cameras, the camera with the matching encoding will send said encoding to all the other cameras to be cached in memory for the remainder of the day. This reduces the number of encodings that are known by each camera to purely the encodings matched that day. The time component is flexible and needs to be investigated further for optimization purposes.
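As an illustration of this first topology, the sketch below partitions hashed identities across the x cameras in a room and broadcasts a matched encoding to the other cameras; the helper names and the peer interface are our own assumptions, not an implemented protocol.

```python
# Sketch of the first topology: identities are split roughly n/x per camera, and a
# positive match is pushed to the rest of the room for same-day caching.
# Keys are assumed to be the SHA-256 hex digests used as cache keys; the peer
# objects (with .camera_id and .cache) are illustrative assumptions.
def partition_identities(hashed_names, num_cameras):
    """Deterministically assign each identity to exactly one camera in the room."""
    shards = [[] for _ in range(num_cameras)]
    for key in sorted(hashed_names):
        shards[int(key, 16) % num_cameras].append(key)
    return shards

def broadcast_match(camera_id, key, encoding, peers):
    """After a hit on one camera, let every other camera cache the encoding for the day."""
    for peer in peers:
        if peer.camera_id != camera_id:
            peer.cache.insert(key, encoding)   # same cache interface as the Section IV-B sketch
```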
For the latter case, we propose a more centralized system, where the camera with the best coverage (preferably with sight of the entrance to the space) utilizes the processing capabilities of the adjacent cameras, sending requests to those adjacent cameras as and when those backup cameras are needed. This leader camera, which sees the majority of traffic in a room, might distribute its face recognition tasks to the secondary cameras whilst they are not picking up any frames of interest, i.e. are idle. This leader camera might also be the sole facilitator of communication between the network of edge devices and the cloud. Further testing needs to be done to determine whether this leader camera should store face encodings of its own, based on the latency of camera-to-camera communication (i.e. whether it is efficient for the leader to make recognition requests to the other cameras), or whether it is faster for the leader camera to perform face recognition as well. A reminder that this is only centralized at the single-room scale; the edge system is still distributed in terms of covering the building at large.

C. Locality

For each target, we can consider how locality of reference can be exploited in order to predict where individuals are located. If we assume the same model of a sub-network of cameras, we can examine how spatial locality influences our system. Given the appearance of an individual at Camera A, we can further decrease requests to the cloud if the name-encoding pair is sent to cameras within close proximity of Camera A. This preemptive action can decrease the requests sent to the cloud per individual, and thus minimize traffic within potentially untrusted networks.

D. Architecture Evaluation based on Untrusted vs Trusted Networks

Per use case, the architecture and topology of our system changes depending on whether the network can be trusted or not. In this paper, we focus primarily on a trusted camera network with an untrusted cloud network. However, if both were non-malicious, we could simplify our design, as we would not have to worry about requests sent to the cloud from a privacy perspective. For an untrusted cloud network with a trusted camera network, we envision dedicated nodes whose sole purpose is to communicate with untrusted networks. These nodes will be reinforced with additional security precautions, as they serve as the entry point into our trusted network. For instance, these nodes could have hardware-assisted Trusted Execution Environments (TEEs) that would ensure isolation of the executed environment even if software on the device is compromised [36]. As an example usage, an application running with Intel's Software Guard Extensions (SGX) allows for communication between trusted and untrusted components through ECalls and OCalls.

E. Cost

The Amazon DeepLens currently retails for $249, with AWS services charged separately depending on usage. For our experiment, we used Amazon S3 and AWS Lambda. For the first 50 TB per month, S3 charges $0.023 per GB of storage [2]. AWS Lambda offers an initial free tier of up to one million requests per month, which translates to 400,000 GB-seconds of compute time. After the free request tier expires, each subsequent one million requests is charged $0.20 [1]. For this experiment, approximately $20 was spent to perform our evaluations. A typical video security system for a business can run around $300 per camera, with monthly monitoring fees of around $30 [19]. For a professionally monitored system, additional labor costs may also increase the total price. A more targeted system with facial recognition capabilities, such as the Panasonic FacePRO WV-ASF950, is estimated to cost upwards of $1000 simply for the software license [39] [5].

F. Maintainability

As the network of cameras grows, the maintainability of our system should remain the same, factoring out hardware installations. In order to update software on each edge device, AWS Lambda helps facilitate this process by automatically distributing the code to each connected machine. We envision a central hub that would evaluate the status of each camera through a simple heartbeat protocol.

VII. FUTURE WORK

Our work shows that face recognition can be run on a simple edge device set-up. What remains to be accounted for are the other factors involving scalability, to completely ensure the feasibility of our system.

A. Intra-Camera Connectivity

A large concern with the current evaluation is how it will scale as more cameras and individuals are added. We theorize that one way of ensuring an efficient system is introducing camera-to-camera communication. This can not only help with decreasing traffic to the cloud, as mentioned in part C of Section VI, but also help in terms of performance.

By introducing camera connectivity, we can also introduce edge server nodes with the sole purpose of communicating with the cloud. These nodes would be designed with more security precautions in mind than camera nodes within a trusted network, as they serve as the entry points for our system. In order to evaluate such a system, it would be useful to examine different scheduling options so as not to overload a single server node as the system scales. However,
we foresee that communication with the cloud should decrease overall since, other than the initial sync operation for initializing a new camera, intra-camera connectivity should be used instead to identify unknown targets upon a miss in any given camera's cache.

B. Energy Consumption Evaluations

Our current evaluation of our system does not look at energy consumption and the costs associated with it. The Amazon DeepLens consumes 20 W and requires 5 V and 4 A to run. We would like to measure the power consumption under the workload of facial recognition and evaluate how this scales in terms of the number of cameras, the number of individuals in a given time range, and the monetary cost of these operations. The Amazon DeepLens currently does not have an external battery, so if power is cut off, there is no way of continuing intruder detection. This introduces a major security concern in our system, as intruders can thus cut the power to avoid detection. We would also like to analyze and compare the performance of other edge devices that are configurable with batteries, or try to provide external sources of power to prevent complete loss of functionality of the Amazon DeepLens and observe how this scales.

C. Accuracy Analysis

The library used for facial recognition boasts an accuracy of 99.38% on the Labeled Faces in the Wild benchmark [20]. However, depending on the different environments in which our system might be implemented, there might be a need to tune the sensitivity of the neural net. We would like to evaluate the accuracy of the library on different sets of faces compiled based on varying degrees of similar or distinct characteristics of facial features. For instance, in populations where facial features are more homogeneous, the tolerance parameter should be adjusted accordingly. Based on these different populations, we aim to show that the number of false positives should decrease after adjusting for irregularities in our known data set. More specifically, as the list of people assigned access to an area is generally always known in an office or campus scenario, the recognition program can always be tuned for optimal performance during installation.

D. Other Specialized Hardware

Although the DeepLens does ship with a GPU, it does not support the NVIDIA CUDA toolkit [37], and so we could not enable GPU acceleration for our model. Furthermore, the DeepLens does not come with any specialized vision processing units (VPUs) either, so we were limited to the CPU in terms of processing power. We attempted to implement our system on the Google Vision Kit's Intel Movidius VPU; however, it turns out that the C++ dlib library that powers the face_recognition Python module is not compatible with the Movidius hardware. In the future, we would like to explore other options in terms of porting our system to VPUs with specialized ML model support.

VIII. CONCLUSION

We introduce the current state of surveillance systems, as well as look into trends in machine learning, hardware, privacy and caching, all in the context of edge computing. We also present our implementation and evaluation of facial recognition at the edge using caching and reduced data storage through encodings generated by the dlib library. By reducing dependence on the cloud, we address the main concerns of security and privacy by reducing the sensitive data sent to untrusted networks. We are also able to decrease latency through our design of caching, and to improve storage efficiency by saving encodings instead of images. Finally, we present a theoretical analysis of this system at scale in terms of performance, intra-camera connectivity, and cost.

IX. ACKNOWLEDGEMENTS

We would like to thank Joey Gonzalez and Marcel Neuhausler for introducing us to the issues of privacy in regards to security camera streaming and to the concepts around AI at the edge. We would also like to thank Paul Golding for his contributions of the Amazon DeepLens and AWS account resources. Finally, we would like to thank John Kubiatowicz for teaching us advanced topics in computer systems this semester.

REFERENCES

[1] Amazon Web Services, AWS Lambda Pricing, 2018. https://aws.amazon.com/lambda/pricing/.
[2] Amazon Web Services, AWS S3 Pricing, 2018. https://aws.amazon.com/s3/pricing/.
[3] Ejder Bastug, Mehdi Bennis, and Merouane Debbah, Living on the edge: The role of proactive caching in 5G wireless networks. https://ieeexplore.ieee.org/document/6871674.
[4] BBC News, CCTV: Too many cameras useless, warns surveillance watchdog Tony Porter, 2015. https://www.bbc.com/news/uk-30978995/.
[5] B&H, Panasonic FacePRO 1-Channel Expansion License, 2018. https://www.bhphotovideo.com/c/product/1395460-REG/panasonic_wv_asfe901w_facepro_1_channel_expansion_license.html.
[6] Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, Tommer Leyvand, Hao Lu, Yang Lu, Lin Qiao, Brandon Reagen, Joe Spisak, Fei Sun, Andrew Tulloch, Peter Vajda, Xiaodong Wang, Yanghan Wang, Bram Wasti, Yiming Wu, Ran Xian, Sungjoo Yoo, and Peizhao Zhang (Facebook Inc.), Machine learning at Facebook: Understanding inference at the edge, 2018. https://research.fb.com/publications/machine-learning-at-facebook-understanding-inference-at-the-edge/.
[7] Zheng Chang, Lei Lei, Zhenyu Zhou, Shiwen Mao, and Tapani Ristaniemi, Learn to cache: Machine learning for network edge caching in the big data era. http://www.eng.auburn.edu/~szm0001/papers/Chang_Cache18.pdf.
[8] Ankur Chattopadhyay and Terrance Boult, PrivacyCam: A privacy preserving camera using uCLinux on the Blackfin DSP, 2007.
[9] Tiffany Yu-Han Chen, Lenin S. Ravindranath, Shuo Deng, Paramvir Victor Bahl, and Hari Balakrishnan, Glimpse: Continuous, real-time object recognition on mobile devices, 13th ACM Conference on Embedded Networked Sensor Systems (SenSys), 2015.
[10] Axis Communications, Video management software. https://www.axis.com/products/video-management-software.
[11] darwaishx, Deep Learning With Deep Lens GitHub tutorial. https://github.com/darwaishx/Deep-Learning-With-Deep-Lens.
[12] Nigel Davies, Nina Taft, Mahadev Satyanarayanan, Sarah Clinch, and Brandon Amos, Privacy mediators: Helping IoT cross the chasm, 2016, pp. 39–44.
[13] doc.ai, Net Runner by doc.ai: Machine learning on the edge, 2018. https://itunes.apple.com/us/app/net-runner-by-doc-ai/id1435828634?ls=1&mt=8.
[14] doc.ai, TensorIO: Objective-C wrapper for TensorFlow Lite, 2018. https://github.com/doc-ai/TensorIO.
[15] Utsav Drolia, Katherine Guo, and Priya Narasimhan, Precog: Prefetching for image recognition applications at the edge, 2017, pp. 1–13.
[16] Utsav Drolia, Katherine Guo, Jiaqi Tan, Rajeev Gandhi, and Priya Narasimhan, Cachier: Edge-caching for recognition applications, 2017, pp. 276–286.
[17] Utsav Drolia, Katherine Guo, Jiaqi Tan, Rajeev Gandhi, and Priya Narasimhan, Towards edge-caching for image recognition, 2017. https://ieeexplore.ieee.org/document/7917629.
[18] Patrick Dugan, When worlds collide: IP-based video surveillance on an IT network, 2014. https://er.educause.edu/articles/2014/8/when-worlds-collide-ipbased-video-surveillance-on-an-it-network.
[19] Fixr, Install Video Surveillance Cameras Cost, 2018. [Online; accessed 11-December-2018].
[20] Adam Geitgey, face_recognition, GitHub, 2017. https://github.com/ageitgey/face_recognition.
[21] Google, Google Vision Kit: Do-it-yourself intelligent camera. Experiment with image recognition using neural networks, 2017. https://aiyprojects.withgoogle.com/vision/.
[22] Google, Cloud IoT Edge: Deliver Google AI capabilities at the edge, 2018. https://cloud.google.com/iot-edge/.
[23] Trinabh Gupta, Rayman Preet Singh, Amar Phanishayee, Jaeyeon Jung, and Ratul Mahajan, Bolt: Data management for connected homes, NSDI, 2014.
[24] Petr Hnetynka, Petr Kubat, Rima Al-Ali, Ilias Gerostathopoulos, and Danylo Khalyeyev, Guaranteed latency applications in edge-cloud environment, 2018, pp. 1–4.
[25] Honeywell, Radar Video Surveillance (RVS) system. https://www.honeywellintegrated.com/products/integrated-security/video/97630.html.
[26] Apple Inc., A12 Bionic: The smartest, most powerful chip in a smartphone, 2018. https://www.apple.com/iphone-xs/a12-bionic/.
[27] Intel, Intel Movidius Myriad 2 VPU: A class-defining processor, 2017. https://www.movidius.com/myriad2.
[28] Intel, Intel and Microsoft enable AI inference at the edge with Intel Movidius Vision Processing Units on Windows ML, 2018. https://www.movidius.com/news/intel-and-microsoft-enable-ai-inference-at-the-edge-with-intel-movidius-vis.
[29] Samvit Jain, Junchen Jiang, Yuanchao Shu, Ganesh Ananthanarayanan, and Joseph Gonzalez, ReXCam: Resource-efficient, cross-camera video analytics at enterprise scale, 2018.
[30] Ring Labs, Ring: Video doorbells and security cameras for your smartphone, 2018. https://ring.com/.
[31] Laurent Luce, lfu_cache Python module. https://github.com/laurentluce/lfu-cache.
[32] Michael Martinez and Lori Cameron, Real-time video analytics: The killer app for edge computing, Computer, 2017. https://publications.computer.org/computer-magazine/2017/11/14/real-time-video-analytics-for-camera-surveillance-in-edge-computing/.
[33] medcv, OneEye DeepLens hackathon project. https://github.com/medcv/OneEyeFaceDetection.
[34] Anselme Ndikumana, Nguyen H. Tran, and Choong Seon Hong, Deep learning based caching for self-driving car in multi-access edge computing, CoRR abs/1810.01548, 2018.
[35] Nest, Nest: Create a connected home, 2018. https://nest.com/.
[36] Zhenyu Ning, Jinghui Liao, Fengwei Zhang, and Weisong Shi, Preliminary study of trusted execution environments on heterogeneous edge platforms, 2018.
[37] NVIDIA, CUDA Toolkit 10.0 download. https://developer.nvidia.com/cuda-downloads.
[38] P. B. Gibbons, B. Karp, Y. Ke, S. Nath, and S. Seshan, IrisNet: An architecture for a worldwide sensor web, 2003.
[39] Panasonic, FacePRO: Panasonic facial recognition system, 2018. https://security.panasonic.com/Face_Recognition/.
[40] Grand View Research, Video Analytics Market Size Report By Type (Software, Hardware), By Deployment (Cloud, On-premise), By Application (Intrusion Detection, Crowd Management, Facial Recognition), By End Use, And Segment Forecasts, 2018 - 2025, 2018. https://www.grandviewresearch.com/industry-analysis/video-analytics-market.
[41] Microsoft Research, Resource-efficient ML for edge and endpoint IoT devices, 2018. https://www.microsoft.com/en-us/research/project/resource-efficient-ml-for-the-edge-and-endpoint-iot-devices/.
[42] Bernhard Rinner and Wayne Wolf, An introduction to distributed smart cameras, 2008.
[43] Alireza Sadeghi, Fatemeh Sheikholeslami, and Georgios B. Giannakis, Optimal and scalable caching for 5G using reinforcement learning of space-time popularities, IEEE Journal of Selected Topics in Signal Processing, 2017.
[44] Amazon Web Services, Amazon Rekognition: Easily add intelligent image and video analysis to your applications. https://aws.amazon.com/rekognition/.
[45] Amazon Web Services, boto3: Python package for the AWS SDK. https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html.
[46] Amazon Web Services, AWS DeepLens, 2018. https://aws.amazon.com/deeplens/.
[47] Ying-li Tian, Lisa M. G. Brown, Arun Hampapur, Max Lu, Andrew Senior, and Chiao-Fe Shu, IBM smart surveillance system (S3): Event based video surveillance system with an open and extensible framework, Machine Vision and Applications 19 (2008), 315–327.
[48] Vassilios Tsakanikas and Tasos Dagiuklas, Video surveillance systems: Current status and future trends, Computers & Electrical Engineering 70 (2018), 736–753.
[49] Srikumar Venugopal, Michele Gazzetti, Yiannis Gkoufas, and Kostas Katrinis, Shadow puppets: Cloud-level accurate AI inference at the speed and economy of edge, USENIX Workshop on Hot Topics in Edge Computing (HotEdge 18), 2018.
[50] Junjue Wang, Brandon Amos, Anupam Das, Padmanabhan Pillai, Norman Sadeh, and Mahadev Satyanarayanan, Enabling live video analytics with a scalable and privacy-aware framework, ACM Transactions on Multimedia Computing, Communications, and Applications 14 (2018), 1–24.
[51] Wink Labs Inc., Wink: A simpler, smarter home, 2018. https://www.wink.com/products/.
[52] Hyunwoo Yu, Jaemin Lim, Kiyeon Kim, and Suk-Bok Lee, Pinto: Enabling video privacy for commodity IoT cameras, 2018, pp. 1089–1101.
[53] Tan Zhang, Aakanksha Chowdhery, Paramvir (Victor) Bahl, Kyle Jamieson, and Suman Banerjee, The design and implementation of a wireless video surveillance system, 2015. https://www.cs.princeton.edu/~kylej/papers/com287-zhang.pdf.
[54] Shilin Zhang and Hangbin Yu, Person re-identification by multi-camera networks for Internet of Things in smart cities, IEEE Access, 2018.