
YolactEdge: Real-time Instance Segmentation on the Edge

Haotian Liu∗, Rafael A. Rivera Soto∗, Fanyi Xiao, and Yong Jae Lee

arXiv:2012.12259v2 [cs.CV] 1 Apr 2021

Abstract— We propose YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds. Specifically, YolactEdge runs at up to 30.8 FPS on a Jetson AGX Xavier (and 172.7 FPS on an RTX 2080 Ti) with a ResNet-101 backbone on 550x550 resolution images. To achieve this, we make two improvements to the state-of-the-art image-based real-time method YOLACT [1]: (1) applying TensorRT optimization while carefully trading off speed and accuracy, and (2) a novel feature warping module to exploit temporal redundancy in videos. Experiments on the YouTube VIS and MS COCO datasets demonstrate that YolactEdge produces a 3-5x speed up over existing real-time methods while producing competitive mask and box detection accuracy. We also conduct ablation studies to dissect our design choices and modules. Code and models are available at https://github.com/haotian-liu/yolact_edge.

(Fanyi Xiao is with Amazon Web Services, Inc.; the rest are with the University of California, Davis. {lhtliu, riverasoto, fyxiao, yongjaelee}@ucdavis.edu. * Haotian Liu and Rafael A. Rivera Soto are co-first authors.)

I. INTRODUCTION

Instance segmentation is a challenging problem that requires the correct detection and segmentation of each object instance in an image. A fast and accurate instance segmenter would have many useful applications in robotics, autonomous driving, image/video retrieval, healthcare, security, and others. In particular, a real-time instance segmenter that can operate on small edge devices is necessary for many real-world scenarios. For example, in safety critical applications in complex environments, robots, drones, and other autonomous machines may need to perceive objects and humans in real-time on device – without having access to the cloud, and in resource constrained settings where bulky and power hungry GPUs (e.g., Titan Xp) are impractical. However, while there has been great progress in real-time instance segmentation research [1], [2], [3], [4], [5], [6], [7], thus far, there is no method that can run accurately at real-time speeds on small edge devices like the Jetson AGX Xavier.

In this paper, we present YolactEdge, a novel real-time instance segmentation approach that runs accurately on edge devices at real-time speeds. Specifically, with a ResNet-101 backbone, YolactEdge runs at up to 30.8 FPS on a Jetson AGX Xavier (and 172.7 FPS on an RTX 2080 Ti GPU), which is 3-5x faster than existing state-of-the-art real-time methods, while being competitive in accuracy.

In order to perform inference at real-time speeds on edge devices, we build upon the state-of-the-art image-based real-time instance segmentation method, YOLACT [1], and make two fundamental improvements, one at the system-level and the other at the algorithm-level: (1) we apply NVIDIA's TensorRT inference engine [8] to quantize the network parameters to fewer bits while systematically balancing any tradeoff in accuracy, and (2) we leverage temporal redundancy in video (i.e., temporally nearby frames are highly correlated), and learn to transform and propagate features over time so that the deep network's expensive backbone feature computation does not need to be fully computed on every frame.

The proposed shift to video from static image processing makes sense from a practical standpoint, as the real-time aspect matters much more for video applications that require low latency and real-time response than for image applications; e.g., for real-time control in robotics and autonomous driving, or real-time object/activity detection in security and augmented reality, where the system must process a stream of video frames and generate instance segmentation outputs in real-time. Importantly, all existing real-time instance segmentation methods (including YOLACT) are static image-based, which makes YolactEdge the first video-dedicated real-time instance segmentation method.

In sum, our contributions are: (1) we apply TensorRT optimization while carefully trading off speed and accuracy, (2) we propose a novel feature warping module to exploit temporal redundancy in videos, (3) we perform experiments on the benchmark image MS COCO [9] and video YouTube VIS [10] datasets, demonstrating a 3-5x faster speed compared to existing real-time instance segmentation methods while being competitive in accuracy, and (4) we publicly release our code and models to facilitate progress in robotics applications that require on-device real-time instance segmentation.

II. RELATED WORK

Real-time instance segmentation in images. YOLACT [1] is the first real-time instance segmentation method to achieve competitive accuracy on the challenging MS COCO [9] dataset. Recently, CenterMask [2], BlendMask [5], and SOLOv2 [3] have improved accuracy in part by leveraging more accurate object detectors (e.g., FCOS [11]). All existing real-time instance segmentation approaches [1], [2], [5], [6], [3] are image-based and require bulky GPUs like the Titan Xp / RTX 2080 Ti to achieve real-time speeds. In contrast, we propose the first video-based real-time instance segmentation approach that can run on small edge devices like the Jetson AGX Xavier.

Feature propagation in videos has been used to improve speed and accuracy for video classification and video object detection [12], [13], [14]. These methods use off-the-shelf optical flow networks [15] to estimate pixel-level object motion and warp feature maps from frame to frame. However, even the most lightweight flow networks [15], [16] require non-negligible memory and compute, which are obstacles for real-time speeds on edge devices.
In contrast, our model estimates object motion and performs feature warping directly at the feature level (as opposed to the input pixel level), which enables real-time speeds.

Improving model efficiency. Designing lightweight yet performant backbones and feature pyramids has been one of the main thrusts in improving deep network efficiency. MobileNetv2 [17] introduces depth-wise convolutions and inverted residuals to design a lightweight architecture for mobile devices. MobileNetv3 [18], NAS-FPN [19], and EfficientNet [20] use neural architecture search to automatically find efficient architectures. Others utilize knowledge distillation [21], [22], [23], model compression [24], [25], or binary networks [26], [27]. The CVPR Low Power Computer Vision Challenge participants have used TensorRT [8], a deep learning inference optimizer, to quantize and speed up object detectors such as Faster-RCNN on the NVIDIA Jetson TX2 [28]. In contrast to most of these approaches, YolactEdge retains large expressive backbones, and exploits temporal redundancy in video together with a TensorRT optimization for fast and accurate instance segmentation.

III. APPROACH

Our goal is to create an instance segmentation model, YolactEdge, that can achieve real-time (>30 FPS) speeds on edge devices. To this end, we make two improvements to the image-based real-time instance segmentation approach YOLACT [1]: (1) applying TensorRT optimization, and (2) exploiting temporal redundancy in video.

A. TensorRT Optimization

The edge device that we develop our model on is the NVIDIA Jetson AGX Xavier. The Xavier is equipped with an integrated Volta GPU with Tensor Cores, dual deep learning accelerator, 32GB of memory, and reaches up to 32 TeraOPS at a cost of $699. Importantly, the Xavier is the only architecture from the NVIDIA Jetson series that supports both FP16 and INT8 Tensor Cores, which are needed for TensorRT [29] optimization.

TensorRT is NVIDIA's deep learning inference optimizer that provides mixed-precision support, optimal tensor layout, fusing of network layers, and kernel specializations [8]. A major component of accelerating models using TensorRT is the quantization of model weights to INT8 or FP16 precision. Since FP16 has a wider range of precision than INT8, it yields better accuracy at the cost of more computational time. Given that the weights of different deep network components (backbone, prediction module, etc.) have different ranges, this speed-accuracy trade-off varies from component to component. Therefore, we convert each model component to TensorRT independently and explore the optimal mix between INT8 and FP16 weights that maximizes FPS while preserving accuracy.

Table I shows this analysis for YOLACT [1], which is the baseline model that YolactEdge directly builds upon. Briefly, YOLACT can be divided into 4 components: (1) a feature backbone, (2) a feature pyramid network [30] (FPN), (3) a ProtoNet, and (4) a Prediction Head; see Fig. 1 (right) for the network architecture. (More details on YOLACT will be provided in Sec. III-B.) The second row in Table I represents YOLACT, with all components in FP32 (i.e., no TensorRT optimization), and results in only 6.6 FPS on the Jetson AGX Xavier with a ResNet-101 backbone. From there, INT8 or FP16 conversion on different model components leads to various improvements in speed and changes in accuracy. Notably, conversion of the Prediction Head to INT8 (last four rows) always results in a large loss of instance segmentation accuracy. We hypothesize that this is because the final box and mask predictions require more than 2^8 = 256 bins to be encoded without loss in the final representation. Converting every component to INT8 except for the Prediction Head and FPN (row highlighted in gray) achieves the highest FPS with little mAP degradation. Thus, this is the final configuration we go with for our model in our experiments, but different configurations can easily be chosen based on need.

In order to quantize model components to INT8 precision, a calibration step is necessary: TensorRT collects histograms of activations for each layer, generates several quantized distributions with different thresholds, and compares each of them to the reference distribution using KL Divergence [31]. This step ensures that the model loses as little performance as possible when converted to INT8 precision. Table VIa shows the effect of the calibration dataset size. We observe that calibration is necessary for accuracy, and generally a larger calibration set provides a better speed-accuracy trade-off.

Backbone  FPN   ProtoNet  PredHead  TensorRT  mAP   FPS
FP32      FP32  FP32      FP32      N         29.8  6.4
FP16      FP16  FP16      FP16      N         29.7  12.1
FP32      FP32  FP32      FP32      Y         29.6  19.1
FP16      FP16  FP16      FP16      Y         29.7  21.9
INT8      FP16  FP16      FP16      Y         29.9  26.3
INT8      FP16  INT8      FP16      Y         29.9  26.5
INT8      INT8  FP16      FP16      Y         29.7  27.7
INT8      INT8  INT8      FP16      Y         29.8  27.4
INT8      FP16  FP16      INT8      Y         25.4  26.2
INT8      FP16  INT8      INT8      Y         25.4  25.9
INT8      INT8  FP16      INT8      Y         25.2  26.9
INT8      INT8  INT8      INT8      Y         25.2  26.5

TABLE I: Effect of Mixed Precision on YOLACT [1] with a ResNet-101 backbone on the MS COCO val2017 dataset with a Jetson AGX Xavier using 100 calibration images. Mixing precision across the modules results in different instance segmentation mean Average Precision (mAP) and FPS for each instantiation of YOLACT. All results are averaged over 5 runs, with a standard deviation less than 0.6 FPS.
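To make the component-wise precision search summarized in Table I concrete, here is a minimal Python sketch of the search space. It is not the authors' code: the component names and the build/benchmark steps are illustrative assumptions, and an actual conversion would go through a TensorRT wrapper (the paper converts each YOLACT component independently and calibrates INT8 components with 100 images).

```python
# A minimal sketch of the mixed-precision search space explored in Table I.
# The build/benchmark steps are left as comments because they depend on the
# deployment stack (e.g., a TensorRT wrapper); nothing here is the authors' API.
from itertools import product

COMPONENTS = ("backbone", "fpn", "protonet", "pred_head")   # YOLACT's 4 parts
PRECISIONS = ("fp16", "int8")                               # candidates per part

def candidate_configs():
    """Yield every per-component precision assignment (2^4 = 16 of them)."""
    for combo in product(PRECISIONS, repeat=len(COMPONENTS)):
        yield dict(zip(COMPONENTS, combo))

for cfg in candidate_configs():
    # For each cfg one would:
    #   1. build a TensorRT engine per component at cfg[name] precision,
    #      feeding ~100 calibration images for any INT8 component so TensorRT
    #      can pick per-layer thresholds (KL-divergence calibration, Sec. III-A);
    #   2. measure mAP on COCO val2017 and FPS on the Jetson AGX Xavier;
    #   3. keep the fastest configuration whose mAP stays close to the FP32
    #      baseline (the paper ends up with INT8 everywhere except the
    #      Prediction Head and FPN, which stay in FP16).
    print(cfg)
```

The exhaustive search is cheap here because there are only four components and two candidate precisions, so at most 16 engines need to be built and benchmarked.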
[Fig. 1: network diagram. Blocks: Feature Backbone → Feature Pyramid → ProtoNet and Prediction Head → Post-process. On the previous keyframe all blocks are computed; on the current non-key frame some pyramid features are transformed rather than computed. Legend: computed / transformed / not computed.]

Fig. 1: YolactEdge extends YOLACT [1] to video by transforming a subset of the features from keyframes (left) to non-keyframes (right), to reduce expensive backbone computation. Specifically, on non-keyframes, we compute C3 features, which are cheap to compute yet crucial for mask prediction given their high resolution. This largely accelerates our method while retaining accuracy on non-keyframes. We use blue, orange, and grey to indicate computed, transformed, and skipped blocks, respectively.

B. Exploiting Temporal Redundancy in Video

The TensorRT optimization leads to a ∼4x improvement in speed, and when dealing with static images, this is the version of YolactEdge one should use. However, when dealing with video, we can exploit temporal redundancy to make YolactEdge even faster, as we describe next.

Given an input video as a sequence of frames {I_i}, we aim to predict masks for each object instance in each frame {y_i = N(I_i)}, in a fast and accurate manner.

For our video instance segmentation network N, we largely follow the YOLACT [1] design for its simplicity and impressive speed-accuracy tradeoff. Specifically, on each frame, we perform two parallel tasks: (1) generating a set of prototype masks, and (2) predicting per-instance mask coefficients. Then, the final masks are assembled through linearly combining the prototypes with the mask coefficients.

For clarity of presentation, we decompose N into N_feat and N_pred, where N_feat denotes the feature backbone stage and N_pred is the rest (i.e., prediction heads for class, box, and mask coefficients, and ProtoNet for generating prototype masks), which takes the output of N_feat and makes instance segmentation predictions. We selectively divide frames in a video into two groups: keyframes I^k and non-keyframes I^n; the behavior of our model on these two groups of frames only varies in the backbone stage:

y^k = N_pred(N_feat(I^k))    (1)
y^n = N_pred(Ñ_feat(I^n))    (2)

For keyframes I^k, our model computes all backbone and pyramid features (C1–C5 and P3–P7 in Fig. 1), whereas for non-keyframes I^n, we compute only a subset of the features and transform the rest from the temporally closest previous keyframe using the mechanism that we elaborate on next. This way, we strike a balance between producing accurate predictions and maintaining a fast runtime.

Partial Feature Transform. Transforming (i.e., warping) features from neighboring keyframes was shown in [12] to be an effective strategy for reducing backbone computation and yielding fast video bounding box object detectors. Specifically, [12] transforms all the backbone features using an off-the-shelf optical flow network [15]. However, due to inevitable errors in optical flow estimation, we find that it fails to provide sufficiently accurate features required for pixel-level tasks like instance segmentation. In this work, we propose to perform partial feature transforms to improve the quality of the transformed features while still maintaining a fast runtime. Specifically, unlike [12], which transforms all features (P3^k, P4^k, P5^k in our case) from a keyframe I^k to a non-keyframe I^n, our method computes the backbone features for a non-keyframe only up through the high-resolution C3^n level (i.e., skipping C4^n, C5^n and consequently P4^n, P5^n computation), and only transforms the lower-resolution P4^k/P5^k features from the previous keyframe to approximate P4^n/P5^n (denoted as W4^n/W5^n) in the current non-keyframe, as shown in Fig. 1 (right). It computes P6^n/P7^n by downsampling W5^n in the same way as YOLACT. With the computed C3^n features and transformed W4^n features, it then generates P3^n as P3^n = C3^n + up(W4^n), where up(·) denotes upsampling. Finally, we use the P3^n features to generate pixel-accurate prototypes. This way, in contrast to [12], we can preserve high-resolution details for generating the mask prototypes, as the high-resolution C3 features are computed instead of transformed and thus are immune to errors in flow estimation.

Importantly, although we compute the C1–C3 backbone features for every frame (i.e., both key and non-keyframes), we avoid computing the most expensive part of the backbone, as the computational costs in different stages of pyramid-like networks are highly imbalanced. As shown in Table II, more than 66% of the computation cost of ResNet-101 lies in C4, while more than half of the inference time is occupied by backbone computation. By computing only lower layers of the feature pyramid and transforming the rest, we can largely accelerate our method to reach real-time performance.

In summary, our partial feature transform design produces the higher quality feature maps required for instance segmentation, while also enabling real-time speeds.
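As a concrete illustration of Eqs. (1)-(2) and the partial feature transform, the following PyTorch-style sketch shows one way the keyframe/non-keyframe dispatch and the warping step could be wired together. It is a sketch under stated assumptions, not the released YolactEdge implementation: the module names (backbone, backbone_c1_c3, fpn, featflownet, lateral_c3, predict), the fixed keyframe interval of 5 (taken from the implementation details in Sec. IV), and the downsampling used for P6/P7 are all illustrative stand-ins.

```python
# Hedged sketch of Eqs. (1)-(2) and the partial feature transform (Sec. III-B).
# Not the released YolactEdge code: every module/attribute name is assumed.
import torch
import torch.nn.functional as F

def warp(feat, flow):
    """Inverse-warp feat (1,C,H,W) with a pixel-space flow field (1,2,h,w).

    Each output location x samples feat at x + flow(x) by bilinear
    interpolation, mirroring F^{k->n}(x) = sum_u theta(u, x+dx) F^k(u).
    The flow is resized (and its displacements rescaled) to feat's resolution.
    """
    _, _, h, w = feat.shape
    sx, sy = w / flow.shape[-1], h / flow.shape[-2]
    flow = F.interpolate(flow, size=(h, w), mode="bilinear", align_corners=False)
    flow = torch.stack((flow[:, 0] * sx, flow[:, 1] * sy), dim=1)
    ys, xs = torch.meshgrid(torch.arange(h, device=feat.device),
                            torch.arange(w, device=feat.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float() + flow[0]     # batch size 1 assumed
    gx = 2.0 * grid[0] / max(w - 1, 1) - 1.0                  # normalize to [-1, 1]
    gy = 2.0 * grid[1] / max(h - 1, 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1).unsqueeze(0),
                         mode="bilinear", align_corners=True)

def forward_video(model, frames, keyframe_interval=5):
    """Keyframes run the full network (Eq. 1); non-keyframes compute only
    C1-C3 and reuse warped P4/P5 from the last keyframe (Eq. 2)."""
    cache, outputs = None, []
    for idx, img in enumerate(frames):
        if idx % keyframe_interval == 0:                   # keyframe I^k
            c3, c4, c5 = model.backbone(img)               # full backbone
            p3, p4, p5, p6, p7 = model.fpn(c3, c4, c5)
            cache = {"c3": c3, "p4": p4, "p5": p5}
        else:                                              # non-keyframe I^n
            c3 = model.backbone_c1_c3(img)                 # skip C4/C5
            flow = model.featflownet(cache["c3"], c3)      # motion M(I^k, I^n)
            w4, w5 = warp(cache["p4"], flow), warp(cache["p5"], flow)
            lat3 = model.lateral_c3(c3)                    # 1x1 conv to FPN width (assumed)
            p3 = lat3 + F.interpolate(w4, size=lat3.shape[-2:], mode="bilinear",
                                      align_corners=False)  # P3 = C3 + up(W4)
            p4, p5 = w4, w5
            p6 = F.max_pool2d(p5, kernel_size=1, stride=2)  # cheap downsampling stand-in
            p7 = F.max_pool2d(p6, kernel_size=1, stride=2)
        outputs.append(model.predict(p3, p4, p5, p6, p7))   # ProtoNet + heads + assembly
    return outputs
```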
[Fig. 2: flow-estimation diagrams. Both networks consist of convs → prediction → refinement → flow; (a) FlowNetS starts from raw frames with its own backbone, while (b) FeatFlowNet reuses backbone features and uses fewer convs.]

Fig. 2: Flow estimation. Illustration of the difference between FlowNetS [15] (a) and our FeatFlowNet (b).

(a) ResNet-101 Backbone
            C1    C2    C3    C4    C5
# of convs   1     9    12    69     9
TFLOPS      0.1   0.7   1.0   5.2   0.8
%           1.5   8.7  13.2  66.2  10.3

(b) YOLACT
Stage      %     Stage   %
Backbone  54.7   FPN     6.4
ProtoNet   7.8   Pred   10.6
Detect     6.6   Other  13.1

TABLE II: Computational cost breakdown for different stages of (a) ResNet-101 backbone, and (b) YOLACT.
[Fig. 3: qualitative mask comparisons; rows show Mask R-CNN, YOLACT, and Ours.]

Fig. 3: Mask quality. Our masks are as high quality as YOLACT even on non-keyframes, and are typically higher quality than those of Mask R-CNN [32].

Efficient Motion Estimation. In this section, we describe how we efficiently compute flow between a keyframe and non-keyframe. Given a non-keyframe I^n and its preceding keyframe I^k, our model first encodes object motion between them as a 2-D flow field M(I^k, I^n). It then uses the flow field to transform the features F^k = {P4^k, P5^k} from frame I^k to align with frame I^n to produce the warped features F̃^n = {W4^n, W5^n} = T(F^k, M(I^k, I^n)).

In order to perform fast feature transformation, we need to estimate object motion efficiently. Existing frameworks [12], [13] that perform flow-guided feature transform directly adopt off-the-shelf pixel-level optical flow networks for motion estimation. FlowNetS [15] (Fig. 2a), for example, performs flow estimation in three stages: it first takes in raw RGB frames as input and computes a stack of features; it then refines a subset of the features by recursively upsampling and concatenating feature maps to generate coarse-to-fine features that carry both high-level (large motion) and fine local information (small motion); finally, it uses those features to predict the final flow map.

In our case, to save computation costs, instead of taking an off-the-shelf flow network that processes raw RGB frames, we reuse the features computed by our model's backbone network, which already produces a set of semantically rich features. To this end, we propose FeatFlowNet (Fig. 2b), which generally follows the FlowNetS architecture, but in the first stage, instead of computing feature stacks from raw RGB image inputs, we re-use features from the ResNet backbone (C3) and use fewer convolution layers. As we demonstrate in our experiments, our flow estimation network is much faster while being equally effective.
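A hedged sketch of what a FeatFlowNet-style module could look like: it reuses the C3 features of the keyframe and the current frame, reduces their channels (the paper's ablation uses a 1/4 reduction), and predicts a 2-channel flow field with a small stack of convolutions. Layer counts and channel widths are illustrative assumptions, not the released architecture.

```python
# Hedged sketch of FeatFlowNet (Fig. 2b): reuse backbone C3 features from both
# frames, reduce channels, and predict a coarse flow field with a few convs.
# Layer counts and channel sizes are illustrative, not the released model.
import torch
import torch.nn as nn

class FeatFlowNet(nn.Module):
    def __init__(self, c3_channels=512, reduced=128):
        super().__init__()
        # channel reduction before flow estimation (Table VIc reduces channels
        # to 1/4 with only a slight AP drop)
        self.reduce = nn.Conv2d(c3_channels, reduced, kernel_size=1)
        self.encoder = nn.Sequential(
            nn.Conv2d(2 * reduced, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.predict_flow = nn.Conv2d(64, 2, kernel_size=3, padding=1)
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, c3_key, c3_cur):
        # concatenate reduced C3 features of the keyframe and the current frame,
        # then predict a 2-channel flow field M(I^k, I^n) at C3 resolution
        x = torch.cat([self.reduce(c3_key), self.reduce(c3_cur)], dim=1)
        return self.upsample(self.predict_flow(self.encoder(x)))

# usage: flow = FeatFlowNet()(c3_of_keyframe, c3_of_current_frame)
```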
Feature Warping. We use FeatFlowNet to estimate the flow map M(I^k, I^n) between the previous keyframe I^k and the current non-keyframe I^n, and then transform the features from I^k to I^n via inverse warping: by projecting each pixel x in I^n to I^k as x + δx, where δx = M_x(I^k, I^n). The pixel value is then computed via bilinear interpolation, F^{k→n}(x) = Σ_u θ(u, x + δx) F^k(u), where θ is the bilinear interpolation weight at different spatial locations.

Loss Functions. For the instance segmentation task, we use the same losses as YOLACT [1] to train our model: classification loss L_cls, box regression loss L_box, mask loss L_mask, and auxiliary semantic segmentation loss L_aux. For flow estimation network pre-training, like [15], we use the endpoint error (EPE).
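For reference, a compact sketch of how these objectives could be combined in training code; the loss terms are assumed to be computed by YOLACT-style heads, and the weighting coefficients are placeholders rather than values from the paper.

```python
# Hedged sketch of the training objectives above; weights are placeholders.
import torch

def total_loss(l_cls, l_box, l_mask, l_aux, w_box=1.0, w_mask=1.0, w_aux=1.0):
    """Instance segmentation loss: L = L_cls + w_box*L_box + w_mask*L_mask + w_aux*L_aux."""
    return l_cls + w_box * l_box + w_mask * l_mask + w_aux * l_aux

def endpoint_error(flow_pred, flow_gt):
    """EPE used to pre-train the flow network (as in FlowNetS [15]): mean L2
    distance between predicted and ground-truth flow fields of shape (N,2,H,W)."""
    return torch.norm(flow_pred - flow_gt, p=2, dim=1).mean()
```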
Fig. 4: YolactEdge results on YouTube VIS on non-keyframes whose subset of features are warped from a keyframe 4
frames away (farthest in sampling window). Our mask predictions can tightly fit the objects, due to partial feature transform.

IV. RESULTS

In this section, we analyze YolactEdge's instance segmentation accuracy and speed on the Jetson AGX Xavier and RTX 2080 Ti. We compare to state-of-the-art real-time instance segmentation methods, and perform ablation studies to dissect our various design choices and modules.

Implementation details. We train with a batch size of 32 on 4 GPUs using ImageNet pre-trained weights. We leave the pre-trained batchnorm (bn) unfrozen and do not add any extra bn layers. We first pre-train YOLACT with SGD for 500k iterations with a 5 × 10^-4 initial learning rate. Then, we freeze the YOLACT weights and train FeatFlowNet on FlyingChairs [33] with a 2 × 10^-4 initial learning rate. Finally, we fine-tune all weights except the ResNet backbone for 200k iterations with a 2 × 10^-4 initial learning rate. When pre-training YOLACT, we apply all data augmentations used in YOLACT; during fine-tuning, we disable random expand to allow the warping module to model larger motions. For all training stages, we use a cosine learning rate decay schedule, with weight decay 5 × 10^-4 and momentum 0.9. We pick the first of every 5 frames as the keyframes. We use 100 images from the training set to calibrate our INT8 model components (backbone, prototype, FeatFlowNet) for TensorRT, and the remaining components (prediction head, FPN) are converted to FP16. We do not convert the warping module to TensorRT, as the conversion of the sampling function (needed for inverse warp) is not natively supported, and it is also not a bottleneck for our feature propagation to be fast. We limit the output resolution to a maximum of 640x480 while preserving the aspect ratio.

Datasets. YouTube VIS [10] is a video instance segmentation dataset for detection, segmentation, and tracking of object instances in videos. It contains 2883 high-resolution YouTube videos of 40 common objects such as people, animals, and vehicles, at a frame rate of 30 FPS. The train, validation, and test sets contain 2238, 302, and 343 videos, respectively. Every 5th frame of each video is annotated with pixel-level instance segmentation ground-truth masks. Since we only perform instance segmentation (without tracking), we cannot directly use the validation server of YouTube VIS to evaluate our method. Instead, we further divide the training split into two train-val splits with an 85%-15% ratio (1904 and 334 videos). To demonstrate the validity of our own train-val split, we created two more splits, configured so that any two splits have a video overlap of less than 18%. We evaluated Mask R-CNN, YOLACT, and YolactEdge on all three splits; the AP variance is within ±2.0.

We also evaluate our approach on the MS COCO [9] dataset, which is an image instance segmentation benchmark, using the standard metrics. We train on the train2017 set and evaluate on the val2017 and test-dev sets.

Method                Backbone    mask AP  box AP  RTX FPS
Mask R-CNN [32]       R-101-FPN   43.1     47.3    14.1
CenterMask-Lite [2]   V-39-FPN    41.6     45.9    34.4
BlendMask-RT [5]      R-50-FPN    44.0     47.9    49.3
SOLOv2-Light [3]      R-50-FPN    46.3     –       43.9
YOLACT [1]            R-50-FPN    44.7     46.2    59.8
YOLACT [1]            R-101-FPN   47.3     48.9    42.6
Ours:
YolactEdge (w/o TRT)  R-50-FPN    44.2     45.2    67.0
YolactEdge (w/o TRT)  R-101-FPN   46.9     47.8    61.2
YolactEdge            R-50-FPN    44.0     45.1    177.6
YolactEdge            R-101-FPN   46.2     47.1    172.7

TABLE III: Comparison to state-of-the-art real-time methods on YouTube VIS. We use our sub-training and sub-validation splits for YouTube VIS and perform joint training with COCO using a 1:1 data sampling ratio. (Box AP is not evaluated in the authors' code base of SOLOv2.)

Method                  Backbone      mask AP  box AP  AGX FPS  RTX FPS
YOLACT [1]              MobileNet-V2  22.1     23.3    15.0     35.7
YolactEdge (w/o video)  MobileNet-V2  20.8     22.7    35.7     161.4
YOLACT [1]              R-50-FPN      28.2     30.3    9.1      45.0
YolactEdge (w/o video)  R-50-FPN      27.0     30.1    30.7     140.3
YOLACT [1]              R-101-FPN     29.8     32.3    6.6      36.5
YolactEdge (w/o video)  R-101-FPN     29.5     32.1    27.3     124.8

TABLE IV: YolactEdge (w/o video) comparison to YOLACT on the MS COCO [9] test-dev split. AGX: Jetson AGX Xavier; RTX: RTX 2080 Ti.

Method                  Backbone   mask AP  box AP  AGX FPS  RTX FPS
YOLACT [1]              R-50-FPN   44.7     46.2    8.5      59.8
YolactEdge (w/o TRT)    R-50-FPN   44.2     45.2    10.5     67.0
YolactEdge (w/o video)  R-50-FPN   44.5     46.0    32.0     185.7
YolactEdge              R-50-FPN   44.0     45.1    32.4     177.6
YOLACT [1]              R-101-FPN  47.3     48.9    5.9      42.6
YolactEdge (w/o TRT)    R-101-FPN  46.9     47.8    9.5      61.2
YolactEdge (w/o video)  R-101-FPN  46.9     48.4    27.9     158.2
YolactEdge              R-101-FPN  46.2     47.1    30.8     172.7

TABLE V: YolactEdge ablation results on YouTube VIS.

A. Instance Segmentation Results

We first compare YolactEdge to state-of-the-art real-time methods on YouTube VIS using the RTX 2080 Ti GPU in Table III. YOLACT [1] with a R101 backbone produces the highest box detection and instance segmentation accuracy over all competing methods. Our approach, YolactEdge, offers competitive accuracy to YOLACT, while running at a much faster speed (177.6 FPS with a R50 backbone). Even without the TensorRT optimization, it still achieves over 60 FPS for both R50 and R101 backbones, demonstrating the contribution of our partial feature transform design, which allows the model to skip a large amount of redundant computation in video.

In terms of mask quality, because YOLACT/YolactEdge produce a final mask of size 138x138 directly from the feature maps without repooling (which can potentially misalign the features), their masks for large objects are noticeably higher quality than those of Mask R-CNN. For instance, in Fig. 3, both YOLACT and YolactEdge produce masks that follow the boundary of the feet of the lizard and zebra, while those of Mask R-CNN have more artifacts.
(a) INT8 calibration: effect of the number of calibration images.
#Calib. Img.  mAP   FPS
0             24.4  –
5             29.6  27.4
50            29.8  27.4
100           29.7  27.5

(b) Partial feature transform: we warp P4 & P5 as it is both fast and accurate.
Warp layers  mAP   FPS
C4, C5       39.2  59.7
P4, P5       39.2  63.2
C3, C4, C5   37.8  59.1
P3, P4, P5   38.0  64.1

(c) FeatFlowNet: we reduce channels for an accuracy/speed tradeoff.
Channels  mAP   FPS
1x        47.0  48.3
1/2x      46.9  53.6
1/4x      46.9  61.2
1/8x      –     62.2

(d) FeatFlowNet is faster and equally effective compared to FlowNetS.
Method       mAP   FPS
w/o flow     31.8  72.5
FlowNetS     39.2  43.3
FeatFlowNet  39.2  61.2

TABLE VI: Ablations. (a) is on COCO val2017 using YOLACT with a R101 backbone. (b-d) are YolactEdge (w/o TRT) on our YouTube VIS sub-train/sub-val split ((b) & (d) without COCO joint training). We highlight our design choices in gray.

This also explains YOLACT/YolactEdge's stronger quantitative performance over Mask R-CNN on YouTube VIS, which has many large objects. Moreover, our proposed partial feature transform allows the network to take the computed high-resolution C3 features to help generate prototypes. In this way, our method is less prone to artifacts brought by misalignment compared to warping all features (as in [12]) and thus can maintain similar accuracy to YOLACT, which processes all frames independently. See Fig. 4 for more qualitative results.

We next compare YolactEdge to YOLACT on the MS COCO [9] dataset in Table IV. Here YolactEdge is without video optimization since MS COCO is an image dataset. We compare three backbones: MobileNetv2, ResNet-50, and ResNet-101. Every YolactEdge configuration results in a loss of AP when compared to YOLACT due to the quantization of network parameters performed by TensorRT. This quantization, however, comes with an immense gain in FPS on the Jetson AGX and RTX 2080 Ti. For example, using ResNet-101 as a backbone results in a loss of 0.3 mask mAP from the unquantized model but yields a 20.7/88.3 FPS improvement on the AGX/RTX. We note that the MobileNetv2 backbone has the fastest speed (35.7 FPS on AGX) but a very low mAP of 20.8 when compared to the other configurations.

Finally, Table V shows ablations of YolactEdge. Starting from YOLACT, which is equivalent to YolactEdge without TensorRT and video optimization, we see that with a ResNet-101 backbone, both our video and TensorRT optimizations lead to significant improvements in speed with a bit of degradation in mask/box mAP. The speed improvements for instantiations with a ResNet-50 backbone are not as prominent, because the video optimization mainly exploits the redundancy of computation in the backbone stage and its effect diminishes with smaller backbones.

B. Which feature layers should we warp?

As shown in Table VIb, computing C3/P3 features (rows 2-3) yields 1.2-1.4 higher AP than warping C3/P3 features (rows 4-5). We choose to perform the partial feature transform over P instead of C features, as there is no obvious difference in accuracy while it is much faster to warp P features.

C. FeatFlowNet

To encode pixel motion, FeatFlowNet takes as input C3 features from the ResNet backbone. As shown in Table VIc, we choose to reduce the channels to 1/4 before they enter FeatFlowNet, as the AP only drops slightly while being much faster. If we further decrease the channels to 1/8, the FPS does not increase by a large margin, and flow pre-training does not converge well. As shown in Table VId, accurate flow maps are crucial for transforming features across frames. Notably, our FeatFlowNet is equally effective for mask prediction as FlowNetS [15], while being faster as it reuses C3 features for pixel motion estimation (whereas FlowNetS computes flow starting from raw RGB pixels).

D. Temporal Stability

Finally, although YolactEdge does not perform explicit temporal smoothing, it produces temporally stable masks.¹ In particular, we observe less mask jittering than with YOLACT. We believe this is because YOLACT only trains on static images, whereas YolactEdge utilizes temporal information in videos both during training and testing. Specifically, when producing prototypes, our partial feature transform implicitly aggregates information from both the previous keyframe and the current non-keyframe, and thus "averages out" noise to produce stable segmentation masks.

V. DISCUSSION OF LIMITATIONS

Despite YolactEdge's competitiveness, it still falls behind YOLACT in mask mAP. We discuss two potential causes.

a) Motion blur: We believe part of the reason lies in the feature transform procedure – although our partial feature transform corrects certain errors caused by imperfect flow maps (Table VIb), there can still be errors caused by motion blur which lead to mis-localized detections. Specifically, for non-keyframes, P4 and P5 features are derived by transforming features of previous keyframes. It is not guaranteed that the randomly selected keyframes are free from motion blur. A smart way to select keyframes would be interesting future work.

b) Mixed-precision conversion: The accuracy gap can also be attributed to mixed-precision conversion – even with the optimal conversion and calibration configuration (Tables I and VIa), the precision gap between training (FP32) and inference (FP16/INT8) is not fully addressed. An interesting direction is to explore training with mixed precision, with which the model could potentially learn to compensate for the precision loss and adapt better during inference.

Acknowledgements. This work was supported in part by NSF IIS-1751206, IIS-1812850, and an AWS ML research award. We thank Joohyung Kim for helpful discussions.

¹ See supplementary video: https://youtu.be/GBCK9SrcCLM.
REFERENCES

[1] Daniel Bolya, Chong Zhou, Fanyi Xiao, and Yong Jae Lee. Yolact: Real-time instance segmentation. In ICCV, 2019.
[2] Youngwan Lee and Jongyoul Park. CenterMask: Real-time anchor-free instance segmentation. arXiv preprint arXiv:1911.06667, 2019.
[3] Xinlong Wang, Rufeng Zhang, Tao Kong, Lei Li, and Chunhua Shen. SOLOv2: Dynamic, faster and stronger. arXiv preprint arXiv:2003.10152, 2020.
[4] Rufeng Zhang, Zhi Tian, Chunhua Shen, Mingyu You, and Youliang Yan. Mask encoding for single shot instance segmentation. arXiv preprint arXiv:2003.11712, 2020.
[5] Hao Chen, Kunyang Sun, Zhi Tian, Chunhua Shen, Yongming Huang, and Youliang Yan. BlendMask: Top-down meets bottom-up for instance segmentation. arXiv preprint arXiv:2001.00309, 2020.
[6] Daniel Bolya, Chong Zhou, Fanyi Xiao, and Yong Jae Lee. Yolact++: Better real-time instance segmentation. TPAMI, 2020.
[7] Sida Peng, Wen Jiang, Huaijin Pi, Hujun Bao, and Xiaowei Zhou. Deep snake for real-time instance segmentation. arXiv preprint arXiv:2001.01629, 2020.
[8] NVIDIA TensorRT. https://developer.nvidia.com/tensorrt. Accessed: 2020.
[9] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[10] Linjie Yang, Yuchen Fan, and Ning Xu. Video instance segmentation. In ICCV, 2019.
[11] Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. FCOS: Fully convolutional one-stage object detection. In ICCV, 2019.
[12] Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, and Yichen Wei. Deep feature flow for video recognition. In CVPR, 2017.
[13] Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, and Yichen Wei. Flow-guided feature aggregation for video object detection. In ICCV, 2017.
[14] Xizhou Zhu, Jifeng Dai, Lu Yuan, and Yichen Wei. Towards high performance video object detection. In CVPR, 2018.
[15] Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick Van Der Smagt, Daniel Cremers, and Thomas Brox. FlowNet: Learning optical flow with convolutional networks. In ICCV, 2015.
[16] Deqing Sun, Xiaodong Yang, Ming-Yu Liu, and Jan Kautz. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In CVPR, 2018.
[17] Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. CoRR, abs/1801.04381, 2018.
[18] Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, and Hartwig Adam. Searching for MobileNetV3. CoRR, abs/1905.02244, 2019.
[19] Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, and Quoc V. Le. NAS-FPN: Learning scalable feature pyramid architecture for object detection. CoRR, abs/1904.07392, 2019.
[20] Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. CoRR, abs/1905.11946, 2019.
[21] Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, 2015.
[22] Antonio Polino, Razvan Pascanu, and Dan Alistarh. Model compression via distillation and quantization. CoRR, abs/1802.05668, 2018.
[23] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv, abs/1910.01108, 2019.
[24] Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. ICLR, 2016.
[25] Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360, 2016.
[26] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. XNOR-Net: ImageNet classification using binary convolutional neural networks. In ECCV, 2016.
[27] Adrian Bulat and Georgios Tzimiropoulos. XNOR-Net++: Improved binary neural networks. In BMVC, 2019.
[28] Sergei Alyamkin, Matthew Ardi, Alexander C. Berg, Achille Brighton, Bo Chen, Yiran Chen, Hsin-Pai Cheng, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Abhinav Goel, Alexander Goncharenko, Xuyang Guo, Soonhoi Ha, Andrew Howard, Xiao Hu, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Jong-gook Ko, Alexander Kondratyev, Junhyeok Lee, Seungjae Lee, Suwoong Lee, Zichao Li, Zhiyu Liang, Juzheng Liu, Xin Liu, Yang Lu, Yung-Hsiang Lu, Deeptanshu Malik, Hong Hanh Nguyen, Eunbyung Park, Denis Repin, Liang Shen, Tao Sheng, Fei Sun, David Svitov, George K. Thiruvathukal, Baiwu Zhang, Jingchi Zhang, Xiaopeng Zhang, and Shaojie Zhuo. Low-power computer vision: Status, challenges, opportunities. CoRR, abs/1904.07714, 2019.
[29] TensorRT hardware support matrix. https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix. Accessed: 2020.
[30] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In CVPR, 2017.
[31] TensorRT INT8 calibration. https://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf. Accessed: 2020.
[32] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In ICCV, 2017.
[33] A. Dosovitskiy, P. Fischer, E. Ilg, P. Häusser, C. Hazırbaş, V. Golkov, P. v.d. Smagt, D. Cremers, and T. Brox. FlowNet: Learning optical flow with convolutional networks. In ICCV, 2015.
