# CondLaneNet: A Top-to-Down Lane Detection Framework Based on Conditional Convolution
Figure 2. The structure of our CondLaneNet framework. The backbone adopts standard ResNet [8] and FPN [23] for multi-scale feature extraction. The transformer encoder module [37] is added for more efficient context feature extraction. The proposal head is responsible for detecting the proposal points, which are located at the start points of the lines. Meanwhile, a parameter map that contains the dynamic convolution kernels is predicted. The conditional shape head predicts the row-wise location, the vertical range, and the offset map to describe the shape of each line. To address the cases of dense lines and fork lines, the RIM is designed.
lines expressed by a curve equation. PolyLaneNet [31] first proposed using a deep network to regress the lane curve equation. LSTR [25] introduced the transformer [37] to the lane detection task and reached a detection speed of 420 FPS. However, the parametric prediction methods have not surpassed other methods in terms of accuracy.

[Figure 3: panel a, conditional instance segmentation; panel b, conditional lane detection. Each panel shows Step 1 (instance detection) and Step 2 (shape prediction).]
Figure 3. The difference between conditional instance segmentation and the proposed conditional lane detection strategy. Our CondLaneNet detects the start points of the lane lines to detect the instances and uses the row-wise formulation instead of the mask to describe the line shape. The overlapping lines can be distinguished based on the proposed RIM, which is detailed in Section 3.2.

3. Methods

Given an input image $I \in \mathbb{R}^{C \times H \times W}$, the goal of our CondLaneNet is to predict a collection of lanes $L = \{l_1, l_2, ..., l_N\}$, where $N$ is the total number of lanes. Generally, each lane $l_k$ is represented by an ordered set of coordinates as follows:

$$l_k = [(x_{k1}, y_{k1}), (x_{k2}, y_{k2}), ..., (x_{kN_k}, y_{kN_k})] \quad (1)$$

where $k$ is the index of the lane and $N_k$ is the maximum number of sample points of the $k$-th lane.

The overall structure of our CondLaneNet is shown in Figure 2. This section first presents the conditional lane detection strategy, then introduces the RIM (Recurrent Instance Module), and finally details the framework design.
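For concreteness, the ordered-coordinate representation of Eq. (1) maps directly to a simple type. This is a minimal illustrative sketch; the `Lane` alias and the `make_lane` helper are assumed names, not part of the paper's code.

```python
from typing import List, Tuple

# A lane l_k as in Eq. (1): an ordered list of (x, y) sample points,
# aggregated from the bottom of the image to the top.
Lane = List[Tuple[float, float]]

def make_lane(xs: List[float], ys: List[float]) -> Lane:
    """Pair row-wise x locations with their y rows (illustrative helper)."""
    assert len(xs) == len(ys), "one x coordinate per sampled row"
    return list(zip(xs, ys))
```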
3.1. Conditional Lane Detection
Focusing on the instance-level discrimination ability, we propose the conditional lane detection strategy based on conditional convolution, a convolution operation with dynamic kernel parameters [14, 40]. The conditional detection process [35, 38] has two steps: instance detection and shape prediction, as shown in Figure 3. The instance detection step predicts the object instance and regresses a set of dynamic kernel parameters for each instance. In the shape prediction step, conditional convolutions are applied to specify the instance shape. This process is conditioned on the dynamic kernel parameters. Since each instance corresponds to a set of dynamic kernel parameters, the shapes can be predicted instance-wisely.

This strategy has achieved impressive performance on instance segmentation tasks [35, 38]. However, directly applying the conditional instance segmentation strategy to lane detection is blunt and inappropriate. On the one hand, the segmentation-based shape description is inefficient for lane lines due to the excessively high degree of freedom [30]. On the other hand, the instance detection strategy for general objects is not suitable for slender and curved objects due to the inconspicuous visual characteristics of the border and the center. Our conditional lane detection strategy improves shape prediction and instance detection to address the above problems.
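To make the mechanism concrete, the following is a minimal PyTorch sketch of conditional convolution in the spirit of [14, 35, 38, 40]: a shared feature map is convolved with dynamic kernels regressed per instance. The 1×1 kernel shape, the tensor sizes, and the `conditional_conv` name are assumptions for illustration, not the paper's exact head design.

```python
import torch
import torch.nn.functional as F

def conditional_conv(feature, kernel_params, out_channels=1):
    """Apply per-instance dynamic convolutions to a shared feature map.

    feature:       (C, H, W) shared feature map.
    kernel_params: (N, C * out_channels) dynamic 1x1 kernel weights,
                   one row per detected instance.
    Returns:       (N, out_channels, H, W) instance-wise outputs.
    """
    C, H, W = feature.shape
    N = kernel_params.shape[0]
    # Reshape each instance's regressed parameters into a 1x1 conv kernel.
    kernels = kernel_params.view(N * out_channels, C, 1, 1)
    # A single conv call evaluates all instances on the shared feature.
    out = F.conv2d(feature.unsqueeze(0), kernels)  # (1, N*out_channels, H, W)
    return out.view(N, out_channels, H, W)

# Toy usage: 2 detected instances on a 64-channel feature map.
feat = torch.randn(64, 40, 100)
params = torch.randn(2, 64)  # regressed by the instance-detection step
print(conditional_conv(feat, params).shape)  # torch.Size([2, 1, 40, 100])
```

Because the kernels are data-dependent, each instance effectively gets its own shape decoder at roughly the cost of a single convolution over the shared feature.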
3.1.1 Shape Prediction

We improve the row-wise formulation [30] to predict the line shape based on our conditional shape head, as shown in Figure 2. In the row-wise formulation, we predict the lane location on each row and then aggregate the locations to get the lane line in the order from bottom to top, based on the prior of the line shape. Our row-wise formulation has three components: the row-wise location, the vertical range, and the offset map. The first two outputs are basic elements for most row-wise detection methods [30, 41]. Besides, we predict an offset map as the third output for further refinement.

In the training phase, the L1 loss is applied:

$$\ell_{row} = \frac{1}{N_v} \sum_{i \in V} |E(\hat{x}_i) - x_i| \quad (4)$$

where $V$ represents the vertical range of the labeled line and $N_v$ is the number of valid rows.

**Vertical Range** The vertical lane range is determined by row-wisely predicting whether the lane line passes through the current row, as shown in Figure 4. We add a linear layer and perform binary classification row by row. We use the feature vector of each row in the location map as the input. The softmax cross-entropy loss is adopted to guide the training process:

$$\ell_{range} = \sum_i \left( -y^i_{gt} \log(v_i) - (1 - y^i_{gt}) \log(1 - v_i) \right) \quad (5)$$

where $v_i$ is the predicted probability that the line passes through row $i$ and $y^i_{gt}$ is the corresponding ground-truth label.

[Figure 4: the vertical range head; a linear layer classifies each row of the location map as positive or negative.]
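As a hedged illustration of these training targets, the sketch below computes the expected row-wise location from a location map and the two losses above. The tensor layouts, the `row_wise_losses` name, and the use of a plain softmax expectation for $E(\hat{x}_i)$ are assumptions for the sketch, not the released implementation.

```python
import torch
import torch.nn.functional as F

def row_wise_losses(location_map, range_logits, x_gt, valid_rows):
    """Sketch of the shape-head losses, Eqs. (4) and (5).

    location_map: (H, W) per-row logits over the W column positions.
    range_logits: (H, 2) per-row binary classification logits.
    x_gt:         (H,) ground-truth column coordinate per row.
    valid_rows:   (H,) bool, True where the labeled line passes the row.
    """
    H, W = location_map.shape
    # Expected x-coordinate per row: softmax over columns, then E[x].
    probs = F.softmax(location_map, dim=1)             # (H, W)
    cols = torch.arange(W, dtype=probs.dtype)          # (W,)
    x_exp = (probs * cols).sum(dim=1)                  # (H,)

    # Eq. (4): L1 loss averaged over the N_v valid rows in V.
    loss_row = (x_exp[valid_rows] - x_gt[valid_rows]).abs().mean()

    # Eq. (5): softmax cross-entropy over the per-row binary labels.
    loss_range = F.cross_entropy(range_logits, valid_rows.long())
    return loss_row, loss_range

# Toy usage with random tensors.
H, W = 40, 100
loc, rng = torch.randn(H, W), torch.randn(H, 2)
x_gt = torch.randint(0, W, (H,)).float()
valid = torch.arange(H) >= 10   # the line occupies the lower rows
print(row_wise_losses(loc, rng, x_gt, valid))
```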
Figure 7. Visualization results on the CurveLanes (first row), CULane (middle row), and TuSimple (last row) datasets. Different lane instances are represented by different colors.
SOTA). Since our model can deal with cases of fork and dense lane lines, there is a significant improvement in the recall indicator. Correspondingly, false-positive results will increase, resulting in a decrease in the precision indicator.

**CULane** The results of our CondLaneNet and other state-of-the-art methods on CULane are shown in Table 4. Our method achieves a new state-of-the-art result of a 79.48 F1 score, an increase of 3.19%. Moreover, our method achieves the best performance in eight of nine scenarios, showing robustness to different scenarios. For some hard cases such as curve and night, our method has obvious advantages. Besides, the small version of our CondLaneNet gets a 78.14 F1 score at 220 FPS, 1.12 higher and 8.5× faster than LaneATT-L. Compared with LaneATT-S, CondLaneNet-S achieves a 4.01% F1 score improvement with similar efficiency. In most scenarios of CULane, the small version of our CondLaneNet exceeds all previous methods in the F1 measure.

**TuSimple** The results on TuSimple are shown in Table 5. Relatively, the gap between different methods on this dataset is smaller, due to the smaller amount of data and the more uniform scenes. Our method achieves a new state-of-the-art F1 score of 97.24. Besides, the small version of our method gets a 97.01 F1 score at 220 FPS.
| Method | Total | Normal | Crowded | Dazzle | Shadow | No line | Arrow | Curve | Cross | Night | FPS | GFlops(G) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SCNN [28] | 71.60 | 90.60 | 69.70 | 58.50 | 66.90 | 43.40 | 84.10 | 64.40 | 1990 | 66.10 | 7.5 | 328.4 |
| ERFNet-E2E [41] | 74.00 | 91.00 | 73.10 | 64.50 | 74.10 | 46.60 | 85.80 | 71.90 | 2022 | 67.90 | | |
| FastDraw [29] | | 85.90 | 63.60 | 57.00 | 69.90 | 40.60 | 79.40 | 65.20 | 7013 | 57.80 | 90.3 | |
| ENet-SAD [12] | 70.80 | 90.10 | 68.80 | 60.20 | 65.90 | 41.60 | 84.00 | 65.70 | 1998 | 66.00 | 75 | 3.9 |
| UFAST-ResNet34 [30] | 72.30 | 90.70 | 70.20 | 59.50 | 69.30 | 44.40 | 85.70 | 69.50 | 2037 | 66.70 | 175.0 | |
| UFAST-ResNet18 [30] | 68.40 | 87.70 | 66.00 | 58.40 | 62.80 | 40.20 | 81.00 | 57.90 | 1743 | 62.10 | 322.5 | |
| ERFNet-IntRA-KD [11] | 72.40 | | | | | | | | | | 100.0 | |
| CurveLanes-NAS-S [39] | 71.40 | 88.30 | 68.60 | 63.20 | 68.00 | 47.90 | 82.50 | 66.00 | 2817 | 66.20 | | 9.0 |
| CurveLanes-NAS-M [39] | 73.50 | 90.20 | 70.50 | 65.90 | 69.30 | 48.80 | 85.70 | 67.50 | 2359 | 68.20 | | 35.7 |
| CurveLanes-NAS-L [39] | 74.80 | 90.70 | 72.30 | 67.70 | 70.10 | 49.40 | 85.80 | 68.40 | 1746 | 68.90 | | 86.5 |
| LaneATT-Small [32] | 75.13 | 91.17 | 72.71 | 65.82 | 68.03 | 49.13 | 87.82 | 63.75 | 1020 | 68.58 | 250 | 9.3 |
| LaneATT-Medium [32] | 76.68 | 92.14 | 75.03 | 66.47 | 78.15 | 49.39 | 88.38 | 67.72 | 1330 | 70.72 | 171 | 18.0 |
| LaneATT-Large [32] | 77.02 | 91.74 | 76.16 | 69.47 | 76.31 | 50.46 | 86.29 | 64.05 | 1264 | 70.81 | 26 | 70.5 |
| CondLaneNet-Small | 78.14 | 92.87 | 75.79 | 70.72 | 80.01 | 52.39 | 89.37 | 72.40 | 1364 | 73.23 | 220 | 10.2 |
| CondLaneNet-Medium | 78.74 | 93.38 | 77.14 | 71.17 | 79.93 | 51.85 | 89.89 | 73.88 | 1387 | 73.92 | 152 | 19.6 |
| CondLaneNet-Large | 79.48 | 93.47 | 77.44 | 70.93 | 80.91 | 54.13 | 90.16 | 75.21 | 1201 | 74.80 | 58 | 44.8 |

Table 4. Comparison of different methods on CULane. All scenario columns except Cross are F1 scores in percent; for the Cross scenario, only the number of false positives is reported.
| Method | F1 | Acc | FP | FN | FPS | GFLOPS |
|---|---|---|---|---|---|---|
| SCNN [28] | 95.97 | 96.53 | 6.17 | 1.80 | 7.5 | |
| EL-GAN [6] | 96.26 | 94.90 | 4.12 | 3.36 | 10.0 | |
| PINet [19] | 97.21 | 96.70 | 2.94 | 2.63 | | |
| LineCNN [22] | 96.79 | 96.87 | 4.42 | 1.97 | 30.0 | |
| PointLaneNet [2] | 95.07 | 96.34 | 4.67 | 5.18 | 71.0 | |
| ENet-SAD [12] | 95.92 | 96.64 | 6.02 | 2.05 | 75.0 | |
| ERF-E2E [41] | 96.25 | 96.02 | 3.21 | 4.28 | | |
| FastDraw [29] | 93.92 | 95.20 | 7.60 | 4.50 | 90.3 | |
| UFAST-ResNet34 [30] | 88.02 | 95.86 | 18.91 | 3.75 | 169.5 | |
| UFAST-ResNet18 [30] | 87.87 | 95.82 | 19.05 | 3.92 | 312.5 | |
| PolyLaneNet [31] | 90.62 | 93.36 | 9.42 | 9.33 | 115.0 | 0.9 |
| LSTR [25] | 96.86 | 96.18 | 2.91 | 3.38 | 420 | 0.3 |
| LaneATT-ResNet18 [32] | 96.71 | 95.57 | 3.56 | 3.01 | 250 | 9.3 |
| LaneATT-ResNet34 [32] | 96.77 | 95.63 | 3.53 | 2.92 | 171 | 18.0 |
| LaneATT-ResNet122 [32] | 96.06 | 96.10 | 5.64 | 2.17 | 26 | 70.5 |
| CondLaneNet-S | 97.01 | 95.48 | 2.18 | 3.80 | 220 | 10.2 |
| CondLaneNet-M | 96.98 | 95.37 | 2.20 | 3.82 | 154 | 19.6 |
| CondLaneNet-L | 97.24 | 96.54 | 2.01 | 3.50 | 58 | 44.8 |

Table 5. Comparison of different methods on TuSimple. F1, Acc, FP, and FN are given in percent.
4.3. Ablation Study of Improvement Strategies

We performed ablation experiments on the CurveLanes dataset based on the small version of our CondLaneNet. The results are shown in Table 6. We take the lane detection model based on the original conditional instance segmentation strategy [35, 38] (as shown in Figure 3a) as the baseline. The first row shows the results of the baseline. In the second row, the proposed conditional lane detection strategy is applied and the lane mask expression is replaced by the row-wise formulation (as shown in Figure 3b). In the third row, the offset map for post-refinement is added. In the fourth row, the transformer encoder is added and the offset map is removed. The fifth row presents the result of the model with the row-wise formulation, the offset map, and the transformer encoder. In the last row, RIM is added.
| Baseline | Row-wise | Offset | Encoder | RIM | F1 score |
|:---:|:---:|:---:|:---:|:---:|:---:|
| ✓ | | | | | 72.19 |
| ✓ | ✓ | | | | 80.09 (+7.90) |
| ✓ | ✓ | ✓ | | | 81.24 (+9.05) |
| ✓ | ✓ | | ✓ | | 81.85 (+9.66) |
| ✓ | ✓ | ✓ | ✓ | | 83.41 (+11.22) |
| ✓ | ✓ | ✓ | ✓ | ✓ | 85.09 (+12.90) |

Table 6. Ablation study of the improvement strategies on CurveLanes based on the small version of our CondLaneNet.
Comparing the first two rows, we can see that the proposed conditional lane detection strategy significantly improves the performance. Comparing the results of the 2nd and the 3rd rows, and of the 4th and the 5th rows, we can see the positive effect of the offset map. Moreover, the transformer encoder plays a vital role in our framework, as indicated by comparing the 2nd and the 4th rows, and the 3rd and the 5th rows. Besides, RIM, designed for fork lines and dense lines, also improves the accuracy.
4.4. Ablation Study of Transformer Encoder

This section further analyzes the function of the transformer encoder, which plays a vital role in the previous experiments. Our method first detects instances by detecting the proposal points and then predicts the shape for each instance. The accuracy of the proposal points greatly affects the final accuracy of the lane lines. We design different control groups to compare the accuracy of the proposal points and lane lines on CurveLanes. We define the proposal points that lie in the eight-neighborhood of the ground-truth points as true-positive samples. Considering the function of RIM, a proposal point corresponding to multiple lines is regarded as multiple different proposal points. We report the F1 scores of the proposal points and lane lines, as shown in Table 7; a short sketch of this matching rule follows the table.

| Target | P. point (Small) | Line (Small) | P. point (Medium) | Line (Medium) | P. point (Large) | Line (Large) |
|---|---|---|---|---|---|---|
| Standard | 88.35 | 85.09 | 88.99 | 85.92 | 89.54 | 86.10 |
| S. w/o encoder | 85.51 | 82.97 | 88.68 | 85.91 | 89.33 | 85.98 |
| Hacked | 88.05 | 84.39 | 88.90 | 85.93 | 89.37 | 85.99 |

Table 7. Ablation study of the transformer encoder module on CurveLanes.
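As a small sketch of the matching criterion described above (assuming integer heatmap coordinates; the function name and point format are illustrative), a predicted proposal point counts as a true positive when it lies in the eight-neighborhood of a ground-truth point:

```python
def is_true_positive(pred, gts):
    """Return True if the predicted proposal point (row, col) lies within
    the eight-neighborhood (Chebyshev distance <= 1) of any ground-truth
    point. Illustrative sketch of the matching rule described above."""
    return any(abs(pred[0] - r) <= 1 and abs(pred[1] - c) <= 1
               for r, c in gts)

# Toy usage: one hit on a diagonal neighbor, one miss.
print(is_true_positive((5, 7), [(6, 8), (20, 30)]))  # True
print(is_true_positive((5, 7), [(9, 9)]))            # False
```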
The first row shows the results of the small, medium, and large versions of the standard CondLaneNet. In the second row, the transformer encoder is removed. In the third row, we hack the inference process of the second row by replacing the proposal heatmap with the proposal heatmap output by the standard model (the first row). For the small version, removing the encoder leads to a significant drop for both proposal points and lanes. However, using the proposal heatmap of the standard model, the results in the third row are close to those in the first row.

The above results prove that the function of the encoder is mainly to improve the detection of the proposal points, which relies on contextual features and global information. Besides, the contextual features can be more fully refined in deeper networks. Therefore, for the medium and large versions, the improvement from the encoder is far smaller than for the small version.

5. Conclusion

In this work, we proposed CondLaneNet, a novel top-to-down lane detection framework that detects the lane instances first and then predicts the shapes instance-wisely. Aiming to resolve the instance-level discrimination problem, we proposed the conditional lane detection strategy based on conditional convolution and the row-wise formulation. Moreover, we designed RIM to cope with complex lane line topologies such as dense lines and fork lines. Our CondLaneNet framework refreshed the state-of-the-art performance on CULane, CurveLanes, and TuSimple. Moreover, on CULane and CurveLanes, the small version of our CondLaneNet not only surpassed other methods in accuracy but also presented real-time efficiency.
References

[1] Amol Borkar, Monson Hayes, and Mark T Smith. Robust lane detection and tracking with ransac and kalman filter. In Proceedings of the IEEE International Conference on Image Processing (ICIP), pages 3261–3264, 2009.
[2] Zhenpeng Chen, Qianfei Liu, and Chenfan Lian. Pointlanenet: Efficient end-to-end cnns for accurate real-time lane detection. In IEEE Intelligent Vehicles Symposium (IV), pages 2563–2568, 2019.
[3] Shriyash Chougule, Nora Koznek, Asad Ismail, Ganesh Adam, Vikram Narayan, and Matthias Schulze. Reliable multilane detection and classification by utilizing cnn as a regression network. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
[4] Bert De Brabandere, Davy Neven, and Luc Van Gool. Semantic instance segmentation with a discriminative loss function. arXiv preprint arXiv:1708.02551, 2017.
[5] Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and Qi Tian. Centernet: Keypoint triplets for object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 6569–6578, 2019.
[6] Mohsen Ghafoorian, Cedric Nugteren, Nóra Baka, Olaf Booij, and Michael Hofmann. El-gan: Embedding loss driven generative adversarial networks for lane detection. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
[7] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[9] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[10] Namdar Homayounfar, Wei-Chiu Ma, Justin Liang, Xinyu Wu, Jack Fan, and Raquel Urtasun. Dagmapper: Learning to map by discovering lane topology. In Proceedings of the IEEE International Conference on Computer Vision, pages 2911–2920, 2019.
[11] Yuenan Hou, Zheng Ma, Chunxiao Liu, Tak-Wai Hui, and Chen Change Loy. Inter-region affinity distillation for road marking segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 12486–12495, 2020.
[12] Yuenan Hou, Zheng Ma, Chunxiao Liu, and Chen Change Loy. Learning lightweight lane detection cnns by self attention distillation. In Proceedings of the IEEE International Conference on Computer Vision, pages 1013–1021, 2019.
[13] Junhwa Hur, Seung-Nam Kang, and Seung-Woo Seo. Multi-lane detection in urban driving environments using conditional random fields. In IEEE Intelligent Vehicles Symposium (IV), pages 1297–1302, 2013.
[14] Xu Jia, Bert De Brabandere, Tinne Tuytelaars, and Luc Van Gool. Dynamic filter networks. In Advances in Neural Information Processing Systems, pages 667–675, 2016.
[15] Ruyi Jiang, Reinhard Klette, Tobi Vaudrey, and Shigang Wang. New lane model and distance transform for lane detection and tracking. In Proceedings of the International Conference on Computer Analysis of Images and Patterns, pages 1044–1052, 2009.
[16] Yan Jiang, Feng Gao, and Guoyan Xu. Computer vision-based multiple-lane detection on straight road and in a curve. In Proceedings of the International Conference on Image Analysis and Signal Processing, pages 114–117, 2010.
[17] ZuWhan Kim. Robust lane detection and tracking in challenging scenarios. IEEE Transactions on Intelligent Transportation Systems, 9(1):16–26, 2008.
[18] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
[19] Yeongmin Ko, Jiwon Jun, Donghwuy Ko, and Moongu Jeon. Key points estimation and point instance segmentation approach for lane detection. arXiv preprint arXiv:2002.06604, 2020.
[20] Hei Law and Jia Deng. Cornernet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), pages 734–750, 2018.
[21] Seokju Lee, Junsik Kim, Jae Shin Yoon, Seunghak Shin, Oleksandr Bailo, Namil Kim, Tae-Hee Lee, Hyun Seok Hong, Seung-Hoon Han, and In So Kweon. Vpgnet: Vanishing point guided network for lane and road marking detection and recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 1947–1955, 2017.
[22] Xiang Li, Jun Li, Xiaolin Hu, and Jian Yang. Line-cnn: End-to-end traffic line detection with line proposal unit. IEEE Transactions on Intelligent Transportation Systems, 21(1):248–258, 2019.
[23] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.
[24] Guoliang Liu, Florentin Wörgötter, and Irene Markelić. Combining statistical hough transform and particle filter for robust lane detection and tracking. In IEEE Intelligent Vehicles Symposium (IV), pages 993–997, 2010.
[25] Ruijin Liu, Zejian Yuan, Tie Liu, and Zhiliang Xiong. End-to-end lane shape prediction with transformers. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, pages 3694–3702, 2021.
[26] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In Proceedings of the International Conference on Learning Representations (ICLR), 2019.
[27] Davy Neven, Bert De Brabandere, Stamatios Georgoulis, Marc Proesmans, and Luc Van Gool. Towards end-to-end lane detection: an instance segmentation approach. In IEEE Intelligent Vehicles Symposium (IV), pages 286–291, 2018.
[28] Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Spatial as deep: Spatial cnn for traffic scene understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
[29] Jonah Philion. Fastdraw: Addressing the long tail of lane detection by adapting a sequential prediction network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11582–11591, 2019.
[30] Zequn Qin, Huanyu Wang, and Xi Li. Ultra fast structure-aware deep lane detection. In Proceedings of the European Conference on Computer Vision (ECCV), pages 276–291, 2020.
[31] Lucas Tabelini, Rodrigo Berriel, Thiago M Paixão, Claudine Badue, Alberto F De Souza, and Thiago Oliveira-Santos. Polylanenet: Lane estimation via deep polynomial regression. In Proceedings of the International Conference on Pattern Recognition, 2020.
[32] Lucas Tabelini, Rodrigo Berriel, Thiago M Paixão, Claudine Badue, Alberto F De Souza, and Thiago Oliveira-Santos. Keep your eyes on the lane: Attention-guided lane detection. arXiv preprint arXiv:2010.12035, 2020.
[33] Huachun Tan, Yang Zhou, Yong Zhu, Danya Yao, and Keqiang Li. A novel curve lane detection based on improved river flow and RANSA. In Proceedings of the International IEEE Conference on Intelligent Transportation Systems, pages 133–138, 2014.
[34] Jigang Tang, Songbin Li, and Peng Liu. A review of lane detection methods based on deep learning. Pattern Recognition, 2020.
[35] Zhi Tian, Chunhua Shen, and Hao Chen. Conditional convolutions for instance segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
[36] TuSimple. Tusimple lane detection benchmark, 2017. https://github.com/TuSimple/tusimple-benchmark.
[37] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017.
[38] Xinlong Wang, Rufeng Zhang, Tao Kong, Lei Li, and Chunhua Shen. SOLOv2: Dynamic and fast instance segmentation. In Advances in Neural Information Processing Systems, pages 17721–17732, 2020.
[39] Hang Xu, Shaoju Wang, Xinyue Cai, Wei Zhang, Xiaodan Liang, and Zhenguo Li. Curvelane-nas: Unifying lane-sensitive architecture search and adaptive point blending. In Proceedings of the European Conference on Computer Vision (ECCV), pages 689–704, 2020.
[40] Brandon Yang, Gabriel Bender, Quoc V Le, and Jiquan Ngiam. Condconv: Conditionally parameterized convolutions for efficient inference. In Advances in Neural Information Processing Systems, 2019.
[41] Seungwoo Yoo, Hee Seok Lee, Heesoo Myeong, Sungrack Yun, Hyoungwoo Park, Janghoon Cho, and Duck Hoon Kim. End-to-end lane marker detection via row-wise classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1006–1007, 2020.
[42] Ekim Yurtsever, Jacob Lambert, Alexander Carballo, and Kazuya Takeda. A survey of autonomous driving: Common practices and emerging technologies. IEEE Access, 8:58443–58469, 2020.
[43] Shengyan Zhou, Yanhua Jiang, Junqiang Xi, Jianwei Gong, Guangming Xiong, and Huiyan Chen. A novel lane detection based on geometrical model and gabor filter. In IEEE Intelligent Vehicles Symposium (IV), pages 59–64, 2010.