ECE 484 MP1
1 Introduction
In this assignment, you will apply neural networks and computer vision techniques for lane detection.
In Section 2, you will independently answer questions related to neural networks and vision. Then,
in Section 3, you and your group will implement a deep-learning-based lane detection module. Your
code should be in the files train.py and test_lane_detection.py. Your group will also write a
brief report mp1_<groupname>.pdf answering the questions in Section 3. You will have to use functions
from the PyTorch library. This document gives you the first steps to get started on implementing the lane
detection module. You will have to take advantage of tutorials and documentation available online. Cite
all resources in your report. All the regulations for academic integrity and plagiarism spelled out in the
student code apply.
Learning objectives
• Gradients for neural network models
• Perspective transforms
• Semantic segmentation mask
• Embedding mask
System requirements
• Ubuntu 20.04
• Mambaforge
2 Homework Problems
Write the proofs yourself, and do not copy from the Internet or from peers. Proving these on your own is good practice for quizzes and midterms.
As an aside, it is typically quite easy to tell when students copy proofs. We are happy to help over Campuswire, during office
hours, etc.
Problem 1 [Individual] (10 points) This is a review problem related to safety and verification. The fol-
lowing figure shows four different automata with identical states Q = {q0 , . . . , q7 }. Q0 = {q0 } for all the
automata. We attach a red or green color to each state, denoted by qi .col, for the ease of writing require-
ments.
(b) How many executions does A2 have?
(c) Consider the following 4 requirements:
• Always red: R1 = {α | ∀i, αi.col = red}.
• Never green: Unsafe = {q | q.col = green}.
• Eventually green: R2 = {α | ∃i, αi.col = green}.
• Never red: R3 = {α | ∀i, αi.col ≠ red}.
For each of the 4 automata A1, ..., A4 and each of the 4 requirements, say whether the automaton satisfies
the requirement or give a counter-example. You can present the answer in the form of a 4 × 4 table.
Problem 2 [Individual] (15 points) Consider a neural network with one hidden layer:
∂ŷ/∂W(2), ∂ŷ/∂b(2), ∂ŷ/∂W(1), ∂ŷ/∂b(1).
Compute the output ŷ and all gradients derived above. Identify which weight in W(1) has the largest
influence on ŷ.
(c) Explain how replacing σ with ReLU would affect the gradient magnitudes. Discuss the practical impli-
cations for training this network.
Problem 3 [Individual] (25 points) Consider a neural network with two hidden layers:
h(1) = f(W(1) x + b(1)),   h(2) = f(W(2) h(1) + b(2)),   ŷ = W(3) h(2) + b(3),
(a) Derive symbolic expressions for the following gradients:
∂ŷ/∂W(3), ∂ŷ/∂b(3), ∂ŷ/∂W(2), ∂ŷ/∂b(2), ∂ŷ/∂W(1), ∂ŷ/∂b(1).
(b) Using the given network parameters:
W(1) = [[2, −1], [−3, 1]],   W(2) = [[0.5, 0], [1, −2]],   W(3) = [1, −1],
b(1) = [1, −1]ᵀ,   b(2) = [0, 0]ᵀ,   b(3) = [0.2],   x = [1, −1]ᵀ.
Problem 5 [Individual] (15 points) You are given the pixel coordinates (xi , yi ) = (400, 300) of a point
observed by a camera, along with the camera’s intrinsic matrix
K = [[800, 0, 320], [0, 800, 240], [0, 0, 1]]   and   [R|t] = [[0.6, −0.8, 0, 1], [0.8, 0.6, 0, 2], [0, 0, 1, 3]].
Recall that [R|t] is the extrinsic matrix. Assuming that the depth of the point in the camera coordinate
system, z̃, is 5m, calculate the 3D world coordinates Xw , Yw , Zw of the point.
3 Programming Assignment: Lane detection using ENet
3.1 Introduction
The goal of this programming assignment is to detect lanes from images using a neural network (NN) with
something called a dual-head architecture (we don’t expect you to know what this is; we provide details in
Sec. 3.2.1). Once the NN is properly trained, the NN will produce 2 outputs for each pixel. Each head of the
NN produces one output.
The first output is the probability that the pixel belongs to a lane (or to another entity such as a vehicle,
a tree, etc.). This is called segmentation. The second output assigns a vector value to each pixel that
captures a notion of distance useful for determining whether two pixels belong to the same lane. This
second output is called an embedding. Intuitively, you can think of an embedding as a vector representation
of some other data type, whether that be a string of alphanumeric characters, an audio waveform, or an
entire image.
You are going to use a neural network architecture called ENet. In this assignment, you will implement
various components essential for training and evaluating the model. Specifically, you will load and pre-
process the dataset, initialize the optimizer, set up the training loop, implement validation code to assess
model performance, and apply a perspective transform to convert the image from the camera view to a
bird’s-eye view. By the end of this assignment, you will have a fully trained lane detection model and a
visualization of detected lanes in both the original and transformed perspectives.
[Figure 1 diagram: Input Image → ENet → {Segmentation, Pixel-Wise Embeddings} → DBSCAN → Perspective Transform → Output Image]
Figure 1: System Overview. Given an input image, the ENet model processes the image and generates two
outputs: segmentation and pixel-wise embeddings. A clustering algorithm, DBSCAN, is then applied to
group pixels and extract lane information from the ENet outputs. Finally, a perspective transformation is
performed to convert the image into a bird's-eye view.
3.2.1 Dual-Head Architecture
Head 1: Semantic Segmentation Mask This output predicts a mask that classifies each pixel as either part
of a lane or background. It provides a high-level lane detection output by labeling lane pixels.
Head 2: Pixel-Wise Embeddings This output assigns a feature vector to each pixel, capturing detailed
lane instance information. These embeddings help differentiate between different lane markings by group-
ing pixels that belong to the same lane.
3.2.2 Loss Function
The model is trained using two types of loss functions: (1) segmentation loss for accurate pixel classifica-
tion, and (2) discriminative loss for clustering lane pixels correctly.
Segmentation Loss To ensure accurate lane classification, we use Cross-Entropy Loss, a standard function
for classification tasks. Cross-Entropy Loss measures how well the model’s predicted probability for each
pixel matches the actual label. It penalizes incorrect predictions, guiding the model to improve its classifi-
cation over time. For each pixel in the image:
1. The model predicts a probability between 0 and 1, representing how likely the pixel belongs to a lane.
2. The correct label (ground truth) is either 1 (lane pixel) or 0 (background).
3. Cross-Entropy Loss, Lseg , calculates a penalty based on the difference between the predicted proba-
bility and the actual label. Mathematically, we can compute this loss as
Lseg = −(1/N) Σ_{i=1}^{N} [ yi log(ŷi) + (1 − yi) log(1 − ŷi) ]
where N is the total number of pixels, yi is the ground-truth label for the i-th pixel (1 for lane, 0 for
background), and ŷi is the model's predicted probability for the i-th pixel.
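To make the formula concrete, here is a minimal PyTorch sketch of such a per-pixel binary cross-entropy. The tensor names and shapes are assumptions for illustration only, not the starter code's compute_loss (which may instead use a two-channel output with nn.CrossEntropyLoss).

import torch
import torch.nn.functional as F

def segmentation_loss_sketch(pred_probs, binary_labels):
    """Hypothetical sketch of Lseg as written above.

    Assumed shapes (placeholders, not taken from the starter code):
      pred_probs    : (B, H, W) predicted lane probabilities ŷi in (0, 1)
      binary_labels : (B, H, W) ground-truth labels yi in {0, 1}
    """
    # Binary cross-entropy averages the per-pixel penalty over all N pixels.
    return F.binary_cross_entropy(pred_probs, binary_labels.float())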
Discriminative Loss While segmentation loss classifies lane pixels, it does not ensure that pixels belong-
ing to different lanes are well-separated. This is where lane clustering comes in. Instead of treating lane
pixels independently, we group them into distinct clusters, where each cluster represents a lane. To achieve
this, we use discriminative loss, which encourages pixels within the same lane to be close together while
pushing different lane clusters apart. To aid your intuition, a visualization of this is included in Fig. 2.
Let C be the set of lane clusters, where each cluster c ∈ C contains Nc pixels. The loss function consists
of three components:
• Variance Loss (Lvar ): This term ensures that pixels within the same lane remain close to their cluster
center, applying an intra-cluster pull force.
• Distance Loss (Ldist ): This term pushes different lane clusters away from each other to avoid overlap,
acting as an inter-cluster repulsion force.
• Regularization Loss (Lreg ): This term prevents cluster centers from drifting too far from the origin,
stabilizing activations.
Each of these terms is mathematically defined as follows:
Lvar = (1/|C|) Σ_{c∈C} (1/Nc) Σ_{i∈c} max(∥xi − µc∥ − δv, 0)²    (1)

Ldist = (1/(|C|(|C|−1))) Σ_{c1≠c2} max(2δd − ∥µc1 − µc2∥, 0)²    (2)

Lreg = (1/|C|) Σ_{c∈C} ∥µc∥²    (3)
where xi represents a pixel embedding, µc is the cluster center for lane c, and δv , δd are margin parame-
ters that control intra- and inter-cluster distances, respectively.
The total discriminative loss is a weighted sum of all three components:
Ldisc = α · Lvar + β · Ldist + γ · Lreg (4)
The weighting factors α, β, and γ control the relative importance of these terms.
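To connect the equations to code, a simplified per-image PyTorch sketch of the three terms is given below. The function name, the per-lane grouping of embeddings, and the default margins and weights are assumptions for illustration, not the assignment's provided implementation.

import torch

def discriminative_loss_sketch(embeddings_per_lane, delta_v=0.5, delta_d=3.0,
                               alpha=1.0, beta=1.0, gamma=0.001):
    """Hypothetical sketch of Ldisc for one image (assumes at least one lane cluster).

    embeddings_per_lane: list with one tensor per lane cluster c, each of shape (Nc, D),
    where D is the embedding dimension. All names and defaults are assumptions.
    """
    centers = [e.mean(dim=0) for e in embeddings_per_lane]    # cluster centers µc
    C = len(centers)

    # Variance term (Eq. 1): pull each embedding toward its own cluster center.
    l_var = sum(torch.clamp(torch.norm(e - mu, dim=1) - delta_v, min=0).pow(2).mean()
                for e, mu in zip(embeddings_per_lane, centers)) / C

    # Distance term (Eq. 2): push distinct cluster centers at least 2*delta_d apart.
    l_dist = torch.zeros(())
    if C > 1:
        for i in range(C):
            for j in range(C):
                if i != j:
                    gap = torch.clamp(2 * delta_d - torch.norm(centers[i] - centers[j]), min=0)
                    l_dist = l_dist + gap.pow(2)
        l_dist = l_dist / (C * (C - 1))

    # Regularization term (Eq. 3): keep cluster centers close to the origin.
    l_reg = sum(torch.norm(mu).pow(2) for mu in centers) / C

    return alpha * l_var + beta * l_dist + gamma * l_reg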
Figure 2: Visualization of the discriminative loss from the original paper. The intra-cluster pulling force
moves embeddings toward their respective cluster centers, while the inter-cluster repelling force pushes
cluster centers apart. These forces are only active within a certain distance, determined by the margins δv
and δd , represented by the dotted circles.
Total Loss The final loss function is computed as the sum of the segmentation loss and the discriminative
loss:

Ltotal = Lseg + Ldisc    (5)
4 Development Instructions
To begin, clone the repository to your desired location with:
All the Python scripts you will need to modify are train.py and test_lane_detection.py, which
are located in the following path:
The machines in the lab already have the necessary packages, including the TUSimple dataset, installed.
If you are setting up the environment on your own computer, please refer to Section 4.1 for instructions
before continuing. If this is your first time using Weights & Biases (wandb), register your account. Now
activate the Conda environment:
You will be in the (base) environment of Conda. Now, install the necessary packages from the Conda
package manager with the following command. This will also create another environment called (tusimple)
inside your (base) Conda environment.
After the installation has finished, log in to wandb. The prompt will ask you for a user-specific API key
generated when you created your account above. You are now all set!
$ wandb login
|- /checkpoints              # Generated after checkpointing
|- /datasets                 # Code for loading and preprocessing datasets
|- /models                   # Model architectures for lane detection
|- /utils                    # Utility functions (e.g., visualization)
|- train.py                  # Training the model
|- eval.py                   # Evaluating the model
|- test_lane_detection.py    # Testing and visualizing lane detection
|- conda_environment.yml     # yml for package installation
/opt                         # Shared directory
|- /data
   |- /TUSimple              # TUSimple dataset
4.3 Preparing the Dataset and Dataloader
To train and validate the model, you will need to load and preprocess the dataset. Modify the code below
to use the LaneDataset class from datasets/lane_dataset.py along with PyTorch’s DataLoader
for efficient batch processing. Refer to the PyTorch tutorial on data loading for guidance: PyTorch Data
Loading Tutorial.
################################################################################
# train_dataset = ...
# train_loader = DataLoader(...)
# val_dataset = ...
# val_loader = DataLoader(...)
################################################################################
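One possible way to fill in this block is sketched below; the LaneDataset constructor arguments and the batch size are assumptions for illustration and may differ from the actual class in datasets/lane_dataset.py.

from torch.utils.data import DataLoader
from datasets.lane_dataset import LaneDataset

# Hypothetical sketch -- the LaneDataset arguments below are assumptions.
train_dataset = LaneDataset(dataset_path='/opt/data/TUSimple', train=True)
val_dataset = LaneDataset(dataset_path='/opt/data/TUSimple', train=False)

# Shuffle only the training split; batch size is a tunable hyperparameter.
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=8, shuffle=False, num_workers=4)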
################################################################################
# optimizer = ...
################################################################################
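A common choice is the Adam optimizer, as in the sketch below; the learning rate and weight decay are placeholders you should tune (see Problem 7), and enet_model is assumed to be the instantiated ENet.

import torch

# Hypothetical sketch: Adam over all ENet parameters; lr is a placeholder to tune.
optimizer = torch.optim.Adam(enet_model.parameters(), lr=1e-3, weight_decay=1e-4)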
################################################################################
# TODO: Complete the training step for a single batch.
################################################################################
# Hint:
# 1. Move `images`, `binary_labels`, and `instance_labels` to the correct device.
# 2. Perform a forward pass using `enet_model` to get predictions.
# 3. Compute the binary and instance losses using `compute_loss`.
# 4. Sum the losses (`loss = binary_loss + instance_loss`) for backpropagation.
# 5. Zero the optimizer gradients, backpropagate the loss, and take an optimizer step.
################################################################################
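Following the hints, a single-batch training step could look roughly like the sketch below; the outputs of enet_model, the signature of compute_loss, and the device variable are assumptions, so adapt the names to the starter code.

# Hypothetical sketch of one training step -- adapt names/signatures to the starter code.
images = images.to(device)
binary_labels = binary_labels.to(device)
instance_labels = instance_labels.to(device)

# Forward pass: assumed to return segmentation logits and pixel-wise embeddings.
binary_logits, instance_embeddings = enet_model(images)

# Assumed compute_loss signature returning the two loss terms.
binary_loss, instance_loss = compute_loss(binary_logits, instance_embeddings,
                                          binary_labels, instance_labels)
loss = binary_loss + instance_loss

optimizer.zero_grad()   # clear gradients from the previous step
loss.backward()         # backpropagate the combined loss
optimizer.step()        # update the model parameters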
################################################################################
# TODO: Perform validation after each epoch.
################################################################################
# Hint:
# Call the `validate` function, passing the model and validation data loader.
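A minimal sketch of the per-epoch validation step, under the assumption that validate returns a single scalar loss (its actual return values may differ):

# Hypothetical sketch -- assumes `validate` returns a scalar validation loss and
# that `best_val_loss` was initialized to float('inf') before the epoch loop.
val_loss = validate(enet_model, val_loader)
if val_loss < best_val_loss:
    best_val_loss = val_loss
    # Keep the best checkpoint at the path required in Section 5.1.
    torch.save(enet_model.state_dict(), 'checkpoints/enet_checkpoint_epoch_best.pth')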
def perspective_transform(image):
"""
Get bird’s eye view from input image
"""
return transformed_image
During the perspective transform we wish to preserve collinearity (i.e., all points lying on a line initially
still lie on a line after transformation). The perspective transformation requires a 3-by-3 transformation
matrix. Here (x, y) and (u, v) are the coordinates of the same point in the coordinate systems of the original
perspective and new perspective.
[t·u, t·v, t]ᵀ = [[a, b, c], [d, e, f], [g, h, 1]] · [x, y, 1]ᵀ    (6)
To find the matrix, we need to find the locations of 4 points on the original image and map the same 4
points onto the bird's-eye view. No 3 of those 4 points should lie on the same line. Pass those two groups
of points to cv2.getPerspectiveTransform(); the output will be the transformation matrix. Pass the matrix
into cv2.warpPerspective() to obtain the warped bird's-eye-view image. Hint: The following website
might be useful to find the warp points of the image.
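A minimal sketch of the OpenCV calls is shown below; the source and destination points are placeholders (expressed as fractions of the image size), not values from the assignment, and you will need to pick points appropriate for your camera view.

import cv2
import numpy as np

def perspective_transform_sketch(image):
    """Hypothetical sketch: warp a camera-view image to a bird's-eye view."""
    h, w = image.shape[:2]
    # Placeholder source points (trapezoid around the lane region) -- pick your own.
    src = np.float32([[w * 0.45, h * 0.6], [w * 0.55, h * 0.6],
                      [w * 0.9,  h],       [w * 0.1,  h]])
    # Corresponding destination points forming a rectangle in the warped image.
    dst = np.float32([[w * 0.2, 0], [w * 0.8, 0],
                      [w * 0.8, h], [w * 0.2, h]])
    M = cv2.getPerspectiveTransform(src, dst)          # 3x3 transformation matrix
    transformed_image = cv2.warpPerspective(image, M, (w, h))
    return transformed_image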
def visualize_lanes_row(images, instances_maps, alpha):
"""
Visualize lane predictions for multiple images in a single row
"""
plt.tight_layout()
plt.show()
Figure 3: Visualized output without (top) and with (bottom) the transformation
5 Code Evaluation
5.1 Checkpointing
Save the best model checkpoint in the following location:
checkpoints/enet_checkpoint_epoch_best.pth
5.2 Evaluation
Evaluate and visualize your model’s performance using the provided scripts:
• Run eval.py to evaluate the model on the validation dataset.
• Run test_lane_detection.py to visualize lane segmentation and clustering results.
6 Report
Each group should upload a short report with the following questions answered.
Problem 7 [Group] (30 points) Explain how you selected the following hyperparameters:
• BATCH_SIZE: Justify your choice and discuss its impact on training speed and model convergence.
• Learning Rate (LR): How did you determine the appropriate learning rate?
• Number of Epochs: How did you decide on the total number of training epochs? What criteria did
you use to determine whether training should continue or stop?
How do these parameters influence the training performance and final accuracy of your model?
Problem 8 [Group] (20 points) Provide qualitative and quantitative results of your trained model. Include
the following visualizations:
• The original input image.
• The ground truth segmentation mask.
Problem 9 [Group] (30 points) Conduct an ablation study to analyze the impact of different loss compo-
nents on model performance. Specifically:
• Train the model with a high weighting on segmentation loss (Lseg ), while keeping the contribution
of discriminative loss minimal. Evaluate its performance on lane detection.
• Train the model with a high weighting on discriminative loss (Ldisc ), reducing the influence of seg-
mentation loss. Analyze its effect on clustering and lane differentiation.
• Compare these results with a model trained using a balanced loss function (Ltotal ), where both seg-
mentation and discriminative losses contribute meaningfully.
Discuss how adjusting the relative importance of these loss terms affects the model’s ability to:
• Accurately segment lane pixels.
• Differentiate between distinct lane instances.
Problem 10 [Group] (10 points) Visualize and analyze the transformed output from test_lane_detection.py.
Demo (10 points) You will need to demo your solution on both scenarios to the TAs during the lab demo.
There may be an autograding component, using an error metric calculation between your solution and a
golden solution.
7 Submission Instructions
7.1 Individual Submission (Homework 1 - Problems 1-6)
• Write your solutions in a PDF file named hw1_<netid>.pdf and upload it to Canvas.
• Include your name and netid in the document.
• You may discuss the problems with others, but you must write your own answers.
7.2 Group Submission (MP1 - Problems 7-10)
• One member from each group should submit the report as mp1_<groupname>.pdf on Canvas.
• Include the names and netids of all group members in the report.
• Upload your code (train.py, test_lane_detection.py, and any other relevant files) to a cloud
storage service (Google Drive, Dropbox, or Box).
• Provide a sharable link to the uploaded code in your report.