PhD Interview Preparation - Machine Learning-Based Video & Point-Cloud Compression
Test:
• Sorting (Theory): Common comparison sorts include Merge Sort and Quick Sort, both divide-and-conquer.
Merge sort runs in O(n log n) time in all cases 2 and uses O(n) extra space. Quick sort averages
O(n log n) but degrades to O(n²) when pivots are consistently poor 3 .
Coding Problem: Implement QuickSort in Python.
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

# Example:
print(quicksort([3,6,2,7,5,8,1])) # Output: [1, 2, 3, 5, 6, 7, 8]
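Explanation: Average-case complexity is O(n log n) 3 ; worst case O(n²). This version partitions
into new lists around a middle-element pivot and recurses on each side, so it uses O(n) extra space
rather than sorting in place.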
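Merge sort, named in the theory bullet alongside quicksort, can be sketched in the same style. This is a minimal top-down version that returns a new list; the function name merge_sort and the example input are illustrative choices, not taken from the original notes.

```python
def merge_sort(arr):
    """Return a new sorted list; the input list is left unchanged."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])

    # Merge the two sorted halves in linear time
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([3, 6, 2, 7, 5, 8, 1]))
```

Each level of recursion does O(n) merge work and there are about log n levels, which is where the O(n log n) bound and the O(n) extra space come from.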
• Recursion (Theory): A recursive function calls itself with a smaller input. It must have a base case to
stop recursion 4 5 . Its time complexity depends on the recurrence (e.g., T(n)=T(n−1)+O(1) yields
O(n)). For example, computing n! recursively takes O(n) time (each call does constant work).
Coding Problem: Implement factorial using recursion.
def factorial(n):
    if n <= 1:
        return 1 # base case
    else:
        return n * factorial(n-1)

print(factorial(5)) # Output: 120
Answer: The recursion has a base case (n≤1) to stop, and time complexity O(n) 4 . Without a proper
base case, recursion causes stack overflow 6 .
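To make the T(n)=T(n−1)+O(1) recurrence concrete, the sketch below counts how many calls a recursive factorial makes. The name factorial_counted and the calls counter are illustrative additions, not part of the original solution.

```python
def factorial_counted(n, calls=None):
    """Recursive factorial that also reports how many calls were made."""
    if calls is None:
        calls = [0]  # mutable counter shared down the recursion
    calls[0] += 1
    if n <= 1:  # base case
        return 1, calls[0]
    result, total = factorial_counted(n - 1, calls)
    return n * result, total

value, n_calls = factorial_counted(5)
print(value, n_calls)  # 120 5
```

The call count equals n for n ≥ 1, matching the O(n) bound stated above.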
• Breadth-First Search (BFS): Explores a graph layer by layer using a queue. Time complexity is
O(|V|+|E|) 7 , where |V| and |E| are the numbers of vertices and edges. BFS finds shortest paths
(fewest edges) in unweighted graphs.
• Depth-First Search (DFS): Explores as deep as possible along each branch before backtracking. Also
O(|V|+|E|) time 8 . Useful for topological sort, detecting cycles, etc.
Coding Problem: Implement BFS and DFS on an adjacency-list graph.
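from collections import deque

def bfs(graph, start):
    # Setup lines are reconstructed around the surviving loop body (assumed)
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

# Example graph
graph = {
    'A': ['B','C'], 'B': ['A','D'], 'C': ['A','D'],
    'D': ['B','C','E'], 'E': ['D']
}
print(bfs(graph, 'A')) # e.g., ['A','B','C','D','E']
print(dfs(graph, 'A')) # requires a dfs(graph, start) definition; e.g., ['A','B','D','C','E']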
Analysis: Both BFS and DFS visit each vertex and edge once in the worst case, so O(|V|+|E|) 7 8 .
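The problem also asks for DFS. A minimal recursive sketch follows, assuming the same adjacency-list dictionary as the example graph; the graph is repeated so the block runs on its own, and the visited/order helper parameters are an assumed design, not the original author's code.

```python
def dfs(graph, node, visited=None, order=None):
    """Recursive preorder DFS; returns vertices in first-visit order."""
    if visited is None:
        visited, order = set(), []
    visited.add(node)
    order.append(node)
    for neighbor in graph[node]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited, order)
    return order

graph = {
    'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'],
    'D': ['B', 'C', 'E'], 'E': ['D'],
}
print(dfs(graph, 'A'))  # ['A', 'B', 'D', 'C', 'E']
```

For very deep graphs an explicit stack avoids Python's recursion limit; the recursive form is shown because it mirrors the definition most directly.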
• Complexity Analysis:
• Big-O notation gives an asymptotic upper bound on runtime growth and is most often quoted for the
worst case. For example, linear search is O(n), binary search is O(log n) 1 , and DFS/BFS are O(V+E) 7 8 .
• Use Master Theorem or recurrence solving for recursive algorithms. E.g., merge sort satisfies
T(n)=2T(n/2)+O(n), leading to O(n log n) 9 .
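As a concrete instance of the O(log n) bound quoted for binary search, the sketch below counts probes while searching a sorted list. The steps counter and the function name binary_search are illustrative additions.

```python
def binary_search(sorted_items, target):
    """Return (index, steps); index is -1 if target is absent."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1  # discard the lower half
        else:
            hi = mid - 1  # discard the upper half
    return -1, steps

data = list(range(1024))
idx, steps = binary_search(data, 1023)
print(idx, steps)
```

Because the search interval halves on every probe, a list of 1024 elements needs at most 11 probes, whereas a linear scan may need 1024.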
import numpy as np
from scipy.fftpack import dct, idct
signal = np.array([1, 2, 3, 4, 5], dtype=float)
coeffs = dct(signal, norm='ortho') # Discrete Cosine Transform
recon = idct(coeffs, norm='ortho') # Inverse DCT
print("DCT Coeffs:", coeffs)
print("Reconstructed:", np.round(recon))
• Discrete Fourier Transform (DFT): The DFT converts a discrete time-domain signal into its
frequency-domain representation 11 . Formally, the DFT of x[n] is
$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}$. In
practice, the Fast Fourier Transform (FFT) algorithm computes this efficiently in O(N log N) instead of
O(N²) 12 .
Theoretical Q: Why use an FFT?
Answer: A naive DFT is O(N²) (double sum); the FFT factorizes the calculation to achieve O(N log N)
12 , making spectral analysis of long signals feasible.
import numpy as np
# Create a sample signal: sum of two sinusoids
N = 64
t = np.arange(N)
signal = np.sin(2*np.pi*5*t/N) + 0.5*np.sin(2*np.pi*10*t/N)
# Compute FFT; the two sinusoids fall in bins 5 and 10, so print enough bins to see both peaks
spectrum = np.fft.fft(signal)
print("Magnitude Spectrum:", np.round(np.abs(spectrum)[:12], 2))
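For contrast with the FFT call, the O(N²) definition of the DFT can be written directly as a double loop. This is a sketch using only the standard library; the function name naive_dft is hypothetical, and the test signal reuses the two-sinusoid example with N = 64.

```python
import cmath
import math

def naive_dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

N = 64
signal = [
    math.sin(2 * math.pi * 5 * n / N) + 0.5 * math.sin(2 * math.pi * 10 * n / N)
    for n in range(N)
]
magnitudes = [abs(c) for c in naive_dft(signal)]
print(round(magnitudes[5], 3), round(magnitudes[10], 3))  # 32.0 16.0
```

A real sinusoid of amplitude A that completes a whole number of cycles in N samples produces a peak of magnitude A·N/2 in its bin, hence 32 and 16 here.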
import numpy as np

def quantize(arr, levels):
    # Map arr to 'levels' discrete values between min and max
    mn, mx = arr.min(), arr.max()
    quantized = np.round((arr - mn) / (mx - mn) * (levels - 1))
    return quantized.astype(int)

data = np.random.randn(10)
q = quantize(data, 4)
print("Original:", data)
print("Quantized (4 levels):", q)
• Entropy Coding (Huffman): Huffman coding is an optimal prefix code for known symbol
frequencies 14 . More frequent symbols get shorter codes. It is a lossless entropy coding method.
Q: How does Huffman coding work?
A: Build a binary tree by repeatedly merging the two least frequent symbols until one tree remains
14 . The tree yields variable-length prefix codes.
import heapq
from collections import Counter

class Node:
    def __init__(self, freq, symbol=None, left=None, right=None):
        self.freq = freq
        self.symbol = symbol
        self.left = left
        self.right = right

    def __lt__(self, other):
        return self.freq < other.freq

def huffman_codes(s):
    # Build frequency heap
    heap = [Node(freq, sym) for sym, freq in Counter(s).items()]
    heapq.heapify(heap)
    # Build Huffman tree
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, Node(a.freq+b.freq, None, a, b))

    # Traverse tree to get codes
    def traverse(node, prefix=''):
        if node is None:
            return {}
        if node.symbol is not None:
            return {node.symbol: prefix or '0'}
        codes = {}
        codes.update(traverse(node.left, prefix+'0'))
        codes.update(traverse(node.right, prefix+'1'))
        return codes

    root = heap[0] if heap else None
    return traverse(root)

sample = "aaabbc"
codes = huffman_codes(sample)
print("Huffman Codes:", codes)
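Note: This outputs a code map such as {'a': '0', 'c': '10', 'b': '11'} . Which of 'b' and 'c'
receives '10' depends on tie-breaking in the heap, but the code lengths (1, 2, 2 bits) do not.
Huffman coding yields optimal prefix codes 14 .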
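One way to see what "optimal" means here is to compare the average codeword length with the Shannon entropy of the symbol frequencies. The sketch below does this for the sample string; the helper names entropy_bits and average_code_length are hypothetical, and the code map is hard-coded with the 1, 2, 2 bit lengths from the note rather than recomputed.

```python
import math
from collections import Counter

def entropy_bits(s):
    """Shannon entropy of the empirical symbol distribution, in bits/symbol."""
    counts = Counter(s)
    total = len(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def average_code_length(s, codes):
    """Mean codeword length in bits/symbol for string s under the code map."""
    return sum(len(codes[ch]) for ch in s) / len(s)

sample = "aaabbc"
codes = {'a': '0', 'b': '10', 'c': '11'}  # a prefix code with lengths 1, 2, 2
H = entropy_bits(sample)
L = average_code_length(sample, codes)
print(f"entropy = {H:.3f} bits, average length = {L:.3f} bits")
```

The average length (1.5 bits) sits just above the entropy (about 1.459 bits), consistent with the general bound H ≤ L < H + 1 for Huffman codes.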
• Point-Cloud Preprocessing (Theory & Code): Raw 3D point-clouds often require preprocessing:
• Normalization: Shift/scale points to zero mean or unit cube for numerical stability.
• Filtering: Remove noise/outliers (e.g. using voxel downsampling or statistical filters).
• Voxelization: Convert to a fixed 3D grid (voxels) for certain CNNs.
Q: Why normalize point clouds?
A: To put data in a consistent scale/position, improving convergence in training (e.g. subtract
centroid, scale to unit sphere).
Coding Task: Normalize a point cloud in NumPy.
import numpy as np

def normalize_pointcloud(pc):
    # pc: Nx3 array
    centroid = np.mean(pc, axis=0)
    pc_centered = pc - centroid
    max_dist = np.max(np.sqrt((pc_centered**2).sum(axis=1)))
    return pc_centered / max_dist
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid()
        )

    def forward(self, x):
        z = self.encoder(x)
        out = self.decoder(z)
        return out

model = Autoencoder()
x = torch.rand(1, 784)
recon = model(x) # forward pass
import tensorflow as tf
from tensorflow.keras import layers

class TF_Autoencoder(tf.keras.Model):
    def __init__(self, input_dim=784, hidden_dim=128):
        super().__init__()
        self.encoder = tf.keras.Sequential([
            layers.Input(shape=(input_dim,)),
            layers.Dense(hidden_dim, activation='relu')
        ])
        self.decoder = tf.keras.Sequential([
            layers.Dense(input_dim, activation='sigmoid')
        ])

    def call(self, x):
        z = self.encoder(x)
        return self.decoder(z)

tf_model = TF_Autoencoder()
x_tf = tf.random.uniform((1,784))
recon_tf = tf_model(x_tf) # forward pass
Answer: Both models use a single hidden layer (fully connected) for encoding and decoding.
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
target = x.clone() # using input as target for autoencoder
optimizer.zero_grad() # clear gradients left over from any earlier step
loss = criterion(recon, target)
loss.backward() # backpropagate gradients
optimizer.step() # update weights
This uses PyTorch’s autograd, a reverse-mode differentiation system 15 . PyTorch records operations
on tensors to build a compute graph, then applies the chain rule in backward pass 15 .
TensorFlow: In eager execution, use tf.GradientTape :
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
with tf.GradientTape() as tape:
    recon = tf_model(x_tf)
    loss_tf = tf.reduce_mean(tf.square(recon - x_tf))
grads = tape.gradient(loss_tf, tf_model.trainable_variables)
optimizer.apply_gradients(zip(grads, tf_model.trainable_variables))
class WeightedMSE(nn.Module):
    def __init__(self, weight=1.0):
        super().__init__()
        self.weight = weight

    def forward(self, input, target):
        return self.weight * torch.mean((input - target)**2)

# Use like: criterion = WeightedMSE(weight=0.5)
• Training Loop:
PyTorch:
TensorFlow:
Answer: PointNet processes point clouds directly by applying shared MLPs (pointwise) followed by a
symmetric max-pooling to respect permutation invariance 16 . It maps each point individually to
features, then aggregates a global feature. This avoids voxelization and is robust to point order.
Coding Idea (PyTorch): Sketch of a simple PointNet-like model:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplePointNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(3, 64)
        self.fc2 = nn.Linear(64, 128)
        self.fc3 = nn.Linear(128, 1024)
        self.fc_cls = nn.Linear(1024, 10) # e.g., 10 classes

    def forward(self, x): # x is [B, N, 3]
        x = F.relu(self.fc1(x)) # nn.Linear acts on the last dim, so weights are shared across points
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x)) # now [B, N, 1024]
        x = torch.max(x, dim=1)[0] # symmetric max-pool -> [B, 1024]
        out = self.fc_cls(x) # classification output
        return out
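The permutation invariance that the max-pool provides can be checked without any deep learning framework. The sketch below uses plain Python lists and made-up per-point feature values; max_pool_points is a hypothetical helper standing in for the torch.max(x, dim=1) step.

```python
import random

def max_pool_points(features):
    """Column-wise max over a list of per-point feature vectors."""
    return [max(column) for column in zip(*features)]

# Hypothetical per-point features: 4 points, 3 feature channels each
features = [
    [0.1, 0.9, 0.3],
    [0.7, 0.2, 0.5],
    [0.4, 0.4, 0.8],
    [0.6, 0.1, 0.2],
]
shuffled = features[:]
random.shuffle(shuffled)  # reorder the points

print(max_pool_points(features))  # [0.7, 0.9, 0.8]
print(max_pool_points(features) == max_pool_points(shuffled))  # True
```

Any symmetric function (max, sum, mean) has this property; PointNet uses max 16 .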
Voxel-based Networks: One can also convert points to a 3D occupancy grid (voxels) and apply 3D
CNNs. For example, bin points into a fixed 3D grid and use nn.Conv3d . This is more memory-
intensive but straightforward.
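The binning step can be sketched in a few lines of plain Python. This assumes points are (x, y, z) tuples and a cubic voxel of side voxel_size; the function name voxelize and the sample coordinates are hypothetical. A dense occupancy tensor for nn.Conv3d would then be filled from these indices.

```python
import math

def voxelize(points, voxel_size):
    """Map (x, y, z) points to the set of occupied integer voxel indices."""
    occupied = set()
    for x, y, z in points:
        occupied.add((
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        ))
    return occupied

points = [(0.1, 0.1, 0.1), (0.2, 0.05, 0.15), (0.6, 0.3, 0.9)]
print(sorted(voxelize(points, voxel_size=0.25)))  # [(0, 0, 0), (2, 1, 3)]
```

The first two points fall in the same voxel, so three points yield only two occupied cells; this loss of detail is the price of the fixed grid.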
import cv2

cap = cv2.VideoCapture('input_video.mp4')
frames = []
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    frames.append(gray)
cap.release()
print(f"Loaded {len(frames)} frames.")
This reads each frame until the video ends. You can then preprocess frames (resize, normalize with
NumPy, etc).
• Image Loading (NumPy/OpenCV): Use cv2.imread() or PIL.Image to load images; convert to
NumPy arrays for processing.
Coding:
import cv2
img = cv2.imread('frame.png', cv2.IMREAD_COLOR)
img = cv2.resize(img, (224,224))
img = img.astype('float32') / 255.0 # normalize to [0,1]
• Point-Cloud File Parsing: Common formats include PLY, PCD, LAS. For ASCII PLY (Stanford triangle
format), you can read and parse manually or use libraries like open3d .
Coding Task: Read a simple ASCII PLY file (with vertices only).
import numpy as np

def read_ply_xyz(filename):
    pts = []
    with open(filename, 'r') as f:
        header = True
        for line in f:
            if header:
                # Skip everything up to and including the end_header marker
                if line.strip() == "end_header":
                    header = False
                continue
            if not line.strip():
                continue # ignore blank lines
            x, y, z = map(float, line.split()[:3])
            pts.append((x, y, z))
    return np.array(pts)

pointcloud = read_ply_xyz('cloud.ply')
print(f"Loaded {pointcloud.shape[0]} points.")
This reads the vertex coordinates after the PLY header. For large point clouds, consider binary
readers or libraries.
• Covariance Matrix: In statistics, the covariance matrix of a random vector contains covariances
between each pair of variables 17 . For data matrix X (rows are samples), one can compute:
$\mathrm{Cov}(X) = \frac{1}{n-1} X^\top X$ (after zero-mean centering).
Coding:
import numpy as np
X = np.random.rand(5,3) # 5 samples, 3 features
X_centered = X - X.mean(axis=0)
cov = np.dot(X_centered.T, X_centered) / (X_centered.shape[0] - 1)
print("Covariance matrix:\n", cov)
# Or simply:
print("np.cov", np.cov(X, rowvar=False))
• Least Squares (Normal Equation): To solve $\min_w \|Xw - y\|^2$, the normal equation is
$X^\top X w = X^\top y$.
Coding: Use NumPy:
X = np.random.rand(10,3)
true_w = np.array([1.5, -2.0, 0.5])
y = X.dot(true_w) + 0.1*np.random.randn(10)
w_est = np.linalg.lstsq(X, y, rcond=None)[0]
print("Estimated w:", w_est)
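With a single feature and no intercept, the normal equation reduces to a scalar division, which makes the formula easy to check by hand. The sketch below assumes plain Python lists; least_squares_1d and the sample data are illustrative.

```python
def least_squares_1d(xs, ys):
    """Solve (x^T x) w = x^T y for one feature with no intercept."""
    xtx = sum(x * x for x in xs)               # x^T x
    xty = sum(x * y for x, y in zip(xs, ys))   # x^T y
    return xty / xtx

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # exactly y = 2x, so the fit should recover w = 2
print(least_squares_1d(xs, ys))  # 2.0
```

In the multi-feature case np.linalg.lstsq is generally preferred over forming X^T X explicitly, since it is better conditioned numerically.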
• Gradient Descent (One Step): Gradient descent updates parameters in the direction opposite to the
gradient: $w_{\text{new}} = w_{\text{old}} - \alpha \nabla L$.
Example: For a linear model $y_{\text{pred}} = wx$ with loss $L = (y_{\text{pred}} - y)^2$, one gradient step is
$\Delta w = -\alpha \cdot 2x(xw - y)$.
Coding:
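A minimal sketch of the update for the scalar example, in plain Python. The starting weight, data point, and learning rate are illustrative values chosen so that the optimum is w = 2; gradient_step is a hypothetical helper name.

```python
def gradient_step(w, x, y, alpha):
    """One update w <- w - alpha * dL/dw for L = (w*x - y)**2."""
    grad = 2 * x * (w * x - y)
    return w - alpha * grad

w, x, y, alpha = 0.0, 2.0, 4.0, 0.1  # illustrative values; optimum is w = 2
for step in range(5):
    w = gradient_step(w, x, y, alpha)
    print(step, round(w, 4))
```

With these values each step shrinks the error w − 2 by a factor of 1 − 2αx² = 0.2, so w moves 0 → 1.6 → 1.92 and so on toward 2. A learning rate above 1/x² would make that factor exceed 1 in magnitude and the iteration would diverge.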
• Coding Practices: Write clear, modular code; comment and document functions; use tools like
pytest for testing; follow PEP8 style (Python).
Each of these topics can be expanded with more examples as needed. The above provides a comprehensive
baseline of theory and practice, including Python and deep learning implementations, with references to
authoritative sources 1 7 20 10 13 14 15 2 12 19 .
16 [1612.00593] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
https://arxiv.org/abs/1612.00593
19 Git - Wikipedia
https://en.wikipedia.org/wiki/Git