
PhD Interview Preparation: Machine Learning-Based Video & Point-Cloud Compression


This guide covers key theory questions and coding tasks (with Python solutions) across the following
areas. Each section includes succinct Q&A and example problems. Relevant facts are cited from reputable
sources.

Core Algorithms and Data Structures


• Binary Search (Theory): Binary search finds a target in a sorted array by repeatedly halving the
search range. Its time complexity is O(log n) 1 .
Example: Q: What is the worst-case complexity of binary search? A: O(log n) 1 .
Python Implementation:

def binary_search(arr, target):
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

Test:

print(binary_search([1,2,3,4,5], 4)) # Output: 3

• Sorting (Theory): Common sorts include Merge Sort (divide-and-conquer). Merge sort runs in
O(n log n) time in all cases 2 and uses O(n) extra space. Quick Sort has average O(n log n) but worst
O(n²) when pivots are poor 3 .
Coding Problem: Implement QuickSort in Python.

def quicksort(arr):
if len(arr) <= 1:
return arr
pivot = arr[len(arr) // 2]

left = [x for x in arr if x < pivot]
middle = [x for x in arr if x == pivot]
right = [x for x in arr if x > pivot]
return quicksort(left) + middle + quicksort(right)
# Example:
print(quicksort([3,6,2,7,5,8,1]))

Explanation: Average-case complexity is O(n log n) 3 ; worst case O(n²). Quick sort uses recursion
and partitioning.

• Recursion (Theory): A recursive function calls itself with a smaller input. It must have a base case to
stop recursion 4 5 . Its time complexity depends on the recurrence (e.g., T(n)=T(n−1)+O(1) yields
O(n)). For example, computing n! recursively takes O(n) time (each call does constant work). Coding
Problem: Implement factorial using recursion.

def factorial(n):
if n <= 1:
return 1 # base case
else:
return n * factorial(n-1)
print(factorial(5)) # Output: 120

Answer: The recursion has a base case (n≤1) to stop, and time complexity O(n) 4 . Without a proper
base case, recursion causes stack overflow 6 .

• Graph Traversal (Theory & Code):

• Breadth-First Search (BFS): Explores a graph layer by layer using a queue. Time complexity is O(|V|+|E|) 7 , where |V| and |E| are the numbers of vertices and edges. BFS is optimal for shortest paths in unweighted graphs.

• Depth-First Search (DFS): Explores as deep as possible along each branch before backtracking. Also
O(|V|+|E|) time 8 . Useful for topological sort, detecting cycles, etc.
Coding Problem: Implement BFS and DFS on an adjacency-list graph.

from collections import deque

def bfs(graph, start):
    visited, queue = set(), deque([start])
    visited.add(start)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

def dfs(graph, start, visited=None, order=None):
    if visited is None:
        visited, order = set(), []
    visited.add(start)
    order.append(start)
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited, order)
    return order

# Example graph
graph = {
'A': ['B','C'], 'B': ['A','D'], 'C': ['A','D'],
'D': ['B','C','E'], 'E': ['D']
}
print(bfs(graph, 'A')) # e.g., ['A','B','C','D','E']
print(dfs(graph, 'A')) # e.g., ['A','B','D','C','E']

Analysis: Both BFS and DFS visit each vertex and edge once in the worst case, so O(|V|+|E|) 7 8 .

• Complexity Analysis:

• Big-O notation describes worst-case runtime growth. For example, linear search is O(n), binary search
is O(log n) 1 , and an algorithm like DFS/BFS is O(V+E) 7 8 .
• Use Master Theorem or recurrence solving for recursive algorithms. E.g., merge sort satisfies
T(n)=2T(n/2)+O(n), leading to O(n log n) 9 .
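Illustrative sketch: a minimal (non-in-place) merge sort that makes the T(n)=2T(n/2)+O(n) recurrence concrete; the two recursive calls handle the halves and the merge does the O(n) work at each of the O(log n) levels.

def merge_sort(arr):
    # T(n) = 2T(n/2) + O(n): two recursive halves plus a linear merge
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # linear merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([3, 6, 2, 7, 5, 8, 1]))  # [1, 2, 3, 5, 6, 7, 8]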

Signal Processing and Compression


• Discrete Cosine Transform (DCT): The DCT represents data as a sum of cosine functions
(frequencies) 10 . It is widely used in image/video compression (e.g. JPEG, MPEG) because it
concentrates signal energy into a few coefficients 10 .
Theoretical Q: What does a 2D DCT do on an image block?
Answer: It transforms a block of pixel intensities into a set of frequency-domain coefficients (DC and
AC terms). High-frequency coefficients tend to be small and can be quantized more coarsely.
Coding Task: Compute DCT using SciPy/NumPy.

import numpy as np
from scipy.fftpack import dct, idct

# Example: 1D DCT of a signal

signal = np.array([1, 2, 3, 4, 5], dtype=float)
coeffs = dct(signal, norm='ortho') # Discrete Cosine Transform
recon = idct(coeffs, norm='ortho') # Inverse DCT
print("DCT Coeffs:", coeffs)
print("Reconstructed:", np.round(recon))
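Optional extension (illustrative): a 2D DCT of an 8x8 block, matching the theoretical Q&A above. It uses SciPy's 1D dct along each axis, exploiting separability; the sample block values are arbitrary stand-ins for pixel intensities.

import numpy as np
from scipy.fftpack import dct, idct

# 2D DCT of an 8x8 block via separability: 1D DCT along rows, then columns
block = np.arange(64, dtype=float).reshape(8, 8)   # stand-in for pixel intensities
dct2 = dct(dct(block, norm='ortho', axis=0), norm='ortho', axis=1)
idct2 = idct(idct(dct2, norm='ortho', axis=1), norm='ortho', axis=0)
print("DC coefficient:", dct2[0, 0])               # largest coefficient for smooth blocks
print("Max reconstruction error:", np.abs(idct2 - block).max())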

• Discrete Fourier Transform (DFT): The DFT converts a discrete time-domain signal into its
frequency-domain representation 11 . Formally, the DFT of x[n] is X[k] = Σ_{n=0}^{N−1} x[n] e^(−j2πkn/N) for k = 0, …, N−1. In
practice, the Fast Fourier Transform (FFT) algorithm computes this efficiently in O(N log N) instead of
O(N²) 12 .
Theoretical Q: Why use an FFT?
Answer: A naive DFT is O(N²) (double sum); the FFT factorizes the calculation to achieve O(N log N)
12 , making spectral analysis of long signals feasible.

Coding Task: Use NumPy’s FFT.

import numpy as np
# Create a sample signal: sum of two sinusoids
N = 64
t = np.arange(N)
signal = np.sin(2*np.pi*5*t/N) + 0.5*np.sin(2*np.pi*10*t/N)
# Compute FFT
spectrum = np.fft.fft(signal)
print("Magnitude Spectrum:", np.abs(spectrum)[:5])
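Optional check (illustrative, self-contained restatement of the signal above): the two sinusoids at 5 and 10 cycles per window should dominate the one-sided magnitude spectrum.

import numpy as np
N = 64
t = np.arange(N)
signal = np.sin(2*np.pi*5*t/N) + 0.5*np.sin(2*np.pi*10*t/N)
spectrum = np.fft.fft(signal)
half = np.abs(spectrum[:N//2])              # one-sided magnitude spectrum
top_bins = np.sort(np.argsort(half)[-2:])   # two strongest frequency bins
print("Dominant bins:", top_bins)           # expected: [ 5 10 ] (cycles per window)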

• Quantization (Theory): Quantization maps continuous or high-resolution values to a smaller discrete set, introducing quantization error 13 . In compression, we often use uniform quantization (rounding to the nearest level) or non-uniform quantization. Quantization is at the core of lossy compression 13 .
Q: What is quantization noise?
A: The difference between the original and quantized values 13 . Quantization is inherently lossy.
Coding Task: Uniform quantization of a signal:

import numpy as np
def quantize(arr, levels):
# Map arr to 'levels' discrete values between min and max
mn, mx = arr.min(), arr.max()
quantized = np.round((arr - mn) / (mx - mn) * (levels - 1))
return quantized.astype(int)
data = np.random.randn(10)
q = quantize(data, 4)
print("Original:", data)
print("Quantized (4 levels):", q)
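Optional extension (illustrative): dequantize back to the original range and measure the quantization noise discussed above. The dequantize helper is not part of the original task; it simply inverts the uniform mapping used by quantize.

import numpy as np

def dequantize(q, levels, mn, mx):
    # Map integer levels back to the original value range
    return q / (levels - 1) * (mx - mn) + mn

data = np.random.randn(10)
levels = 4
mn, mx = data.min(), data.max()
q = np.round((data - mn) / (mx - mn) * (levels - 1)).astype(int)  # same uniform quantizer as above
recon = dequantize(q, levels, mn, mx)
print("Quantization noise (MSE):", np.mean((data - recon) ** 2))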

• Entropy Coding (Huffman): Huffman coding is an optimal prefix code for known symbol
frequencies 14 . More frequent symbols get shorter codes. It is a lossless entropy coding method.
Q: How does Huffman coding work?
A: Build a binary tree by repeatedly merging the two least frequent symbols until one tree remains
14 . The tree yields variable-length prefix codes.

Coding Task: Implement Huffman encoding. (Simplified example:)

import heapq
from collections import Counter

class Node:
def __init__(self, freq, symbol=None, left=None, right=None):
self.freq = freq
self.symbol = symbol
self.left = left
self.right = right
def __lt__(self, other):
return self.freq < other.freq

def huffman_codes(s):
# Build frequency heap
heap = [Node(freq, sym) for sym, freq in Counter(s).items()]
heapq.heapify(heap)
# Build Huffman tree
while len(heap) > 1:
a = heapq.heappop(heap)
b = heapq.heappop(heap)
heapq.heappush(heap, Node(a.freq+b.freq, None, a, b))
# Traverse tree to get codes
def traverse(node, prefix=''):
if node is None: return {}
if node.symbol is not None:
return {node.symbol: prefix or '0'}
codes = {}
codes.update(traverse(node.left, prefix+'0'))
codes.update(traverse(node.right, prefix+'1'))
return codes
root = heap[0] if heap else None
return traverse(root)
sample = "aaabbc"
codes = huffman_codes(sample)
print("Huffman Codes:", codes)

Note: This outputs a code map like {'a': '0', 'b': '10', 'c': '11'} (the exact 0/1 assignment depends on tie-breaking). Huffman coding yields
optimal prefix codes 14 .
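Optional check (illustrative; reuses sample and codes from the block above): compare the average code length against the entropy lower bound, which is what "optimal prefix code" means in practice.

import math
from collections import Counter

# Assumes `sample` and `codes` from the Huffman block above are in scope
freq = Counter(sample)
total = sum(freq.values())
avg_len = sum(freq[s] * len(codes[s]) for s in freq) / total
entropy = -sum((f / total) * math.log2(f / total) for f in freq.values())
print(f"Average code length: {avg_len:.3f} bits/symbol")
print(f"Entropy lower bound:  {entropy:.3f} bits/symbol")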

• Point-Cloud Preprocessing (Theory & Code): Raw 3D point-clouds often require preprocessing:

• Normalization: Shift/scale points to zero mean or unit cube for numerical stability.
• Filtering: Remove noise/outliers (e.g. using voxel downsampling or statistical filters).
• Voxelization: Convert to a fixed 3D grid (voxels) for certain CNNs.
Q: Why normalize point clouds?
A: To put data in a consistent scale/position, improving convergence in training (e.g. subtract
centroid, scale to unit sphere).
Coding Task: Normalize a point cloud in NumPy.

import numpy as np
def normalize_pointcloud(pc):
# pc: Nx3 array
centroid = np.mean(pc, axis=0)
pc_centered = pc - centroid
max_dist = np.max(np.sqrt((pc_centered**2).sum(axis=1)))
return pc_centered / max_dist

pc = np.random.rand(100, 3)  # example point cloud
pc_norm = normalize_pointcloud(pc)
print(np.min(pc_norm, axis=0), np.max(pc_norm, axis=0))  # roughly in [-1, 1]
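Optional sketch for the filtering/voxelization bullets above (illustrative; it keeps one representative point per occupied voxel rather than the voxel centroid, and the voxel size is an arbitrary choice):

import numpy as np

def voxel_downsample(pc, voxel_size=0.05):
    # Snap each point to its voxel index, keep one point per occupied voxel
    idx = np.floor(pc / voxel_size).astype(int)
    _, keep = np.unique(idx, axis=0, return_index=True)   # one index per voxel
    return pc[np.sort(keep)]

pc = np.random.rand(1000, 3)
down = voxel_downsample(pc, voxel_size=0.1)
print(pc.shape, "->", down.shape)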

Deep Learning Framework Tasks


• Autoencoder Definition (PyTorch & TensorFlow): Autoencoders compress data into a lower-dimensional code and reconstruct it.
PyTorch Example:

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
def __init__(self, input_dim=784, hidden_dim=128):
super().__init__()
self.encoder = nn.Sequential(
nn.Linear(input_dim, hidden_dim),
nn.ReLU()
)
self.decoder = nn.Sequential(
nn.Linear(hidden_dim, input_dim),
nn.Sigmoid()
)
def forward(self, x):
z = self.encoder(x)
out = self.decoder(z)

return out

model = Autoencoder()
x = torch.rand(1, 784)
recon = model(x) # forward pass

TensorFlow (Keras) Equivalent:

import tensorflow as tf
from tensorflow.keras import layers

class TF_Autoencoder(tf.keras.Model):
def __init__(self, input_dim=784, hidden_dim=128):
super().__init__()
self.encoder = tf.keras.Sequential([
layers.Input(shape=(input_dim,)),
layers.Dense(hidden_dim, activation='relu')
])
self.decoder = tf.keras.Sequential([
layers.Dense(input_dim, activation='sigmoid')
])
def call(self, x):
z = self.encoder(x)
return self.decoder(z)

tf_model = TF_Autoencoder()
x_tf = tf.random.uniform((1,784))
recon_tf = tf_model(x_tf) # forward pass

Answer: Both models use a single hidden layer (fully connected) for encoding and decoding.

• Forward/Backward Pass & Custom Loss:


PyTorch: After computing the forward pass, call .backward() on a scalar loss to compute
gradients. Example:

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
target = x.clone() # using input as target for autoencoder
loss = criterion(recon, target)
loss.backward() # backpropagate gradients
optimizer.step() # update weights

This uses PyTorch’s autograd, a reverse-mode differentiation system 15 . PyTorch records operations
on tensors to build a compute graph, then applies the chain rule in the backward pass 15 .

TensorFlow: In eager execution, use tf.GradientTape :

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
with tf.GradientTape() as tape:
    recon = tf_model(x_tf)
    loss_tf = tf.reduce_mean(tf.square(recon - x_tf))
grads = tape.gradient(loss_tf, tf_model.trainable_variables)
optimizer.apply_gradients(zip(grads, tf_model.trainable_variables))

Custom Loss Function: Example: Weighted MSE in PyTorch.

class WeightedMSE(nn.Module):
def __init__(self, weight=1.0):
super().__init__()
self.weight = weight
def forward(self, input, target):
return self.weight * torch.mean((input - target)**2)
# Use like: criterion = WeightedMSE(weight=0.5)

• Training Loop:
PyTorch:

for epoch in range(5):
    for batch in data_loader:
        optimizer.zero_grad()
        outputs = model(batch)
        loss = criterion(outputs, batch)
        loss.backward()
        optimizer.step()

TensorFlow:

for epoch in range(5):
    for batch in tf_data:
        with tf.GradientTape() as tape:
            outputs = tf_model(batch)
            loss = tf.reduce_mean(tf.square(outputs - batch))
        grads = tape.gradient(loss, tf_model.trainable_variables)
        optimizer.apply_gradients(zip(grads, tf_model.trainable_variables))

• Point-Cloud Networks (PointNet/Voxel):


Question: Describe the PointNet architecture.

Answer: PointNet processes point clouds directly by applying shared MLPs (pointwise) followed by a
symmetric max-pooling to respect permutation invariance 16 . It maps each point individually to
features, then aggregates a global feature. This avoids voxelization and is robust to point order.
Coding Idea (PyTorch): Sketch of a simple PointNet-like model:

import torch.nn.functional as F   # needed for F.relu; torch and nn are imported above

class SimplePointNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(3, 64)        # shared pointwise MLP layers
        self.fc2 = nn.Linear(64, 128)
        self.fc3 = nn.Linear(128, 1024)
        self.fc_cls = nn.Linear(1024, 10)  # e.g., 10 classes
    def forward(self, x):                  # x is [B, N, 3]
        x = F.relu(self.fc1(x))            # nn.Linear acts on the last dim, so weights are shared across points
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))            # now [B, N, 1024]
        x = torch.max(x, dim=1)[0]         # symmetric max-pool over points -> [B, 1024]
        out = self.fc_cls(x)               # classification output
        return out

Voxel-based Networks: One can also convert points to a 3D occupancy grid (voxels) and apply 3D
CNNs. For example, bin points into a fixed 3D grid and use nn.Conv3d . This is more memory-
intensive but straightforward.
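Illustrative sketch (not from the cited sources): bin a normalized cloud into a 16³ occupancy grid and classify it with a tiny 3D CNN; the grid size, channel counts, and class count are arbitrary choices.

import torch
import torch.nn as nn

def voxelize(pc, grid=16):
    # pc: [N, 3] roughly in [-1, 1]; returns a [1, grid, grid, grid] occupancy volume
    idx = ((pc + 1.0) / 2.0 * (grid - 1)).round().long().clamp(0, grid - 1)
    vol = torch.zeros(1, grid, grid, grid)
    vol[0, idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vol

class TinyVoxelNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.fc = nn.Linear(16 * 4 * 4 * 4, num_classes)  # grid 16 -> 8 -> 4 after two pools
    def forward(self, x):                   # x: [B, 1, 16, 16, 16]
        x = self.conv(x)
        return self.fc(x.flatten(1))

pc = torch.rand(500, 3) * 2 - 1             # toy normalized cloud
vol = voxelize(pc).unsqueeze(0)             # [1, 1, 16, 16, 16]
print(TinyVoxelNet()(vol).shape)            # torch.Size([1, 10])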

Computer Vision and Data Handling


• Video Frame Loading (OpenCV):
Q: How to read frames from a video file in Python?
A: Use OpenCV’s cv2.VideoCapture .

import cv2
cap = cv2.VideoCapture('input_video.mp4')
frames = []
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
frames.append(gray)
cap.release()
print(f"Loaded {len(frames)} frames.")

This reads each frame until the video ends. You can then preprocess frames (resize, normalize with
NumPy, etc).

• Image Loading (NumPy/OpenCV): Use cv2.imread() or PIL.Image to load images; convert to
NumPy arrays for processing.
Coding:

import cv2
img = cv2.imread('frame.png', cv2.IMREAD_COLOR)
img = cv2.resize(img, (224,224))
img = img.astype('float32') / 255.0 # normalize to [0,1]

• Point-Cloud File Parsing: Common formats include PLY, PCD, LAS. For ASCII PLY (Stanford triangle
format), you can read and parse manually or use libraries like open3d .
Coding Task: Read a simple ASCII PLY file (with vertices only).

import numpy as np

def read_ply_xyz(filename):
pts = []
with open(filename, 'r') as f:
header = True
for line in f:
if header:
if line.strip() == "end_header":
header = False
continue
x,y,z = map(float, line.split()[:3])
pts.append((x,y,z))
return np.array(pts)

pointcloud = read_ply_xyz('cloud.ply')
print(f"Loaded {pointcloud.shape[0]} points.")

This reads the vertex coordinates after the PLY header. For large point clouds, consider binary
readers or libraries.
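Optional alternative (a minimal sketch, assuming the open3d package is installed and 'cloud.ply' exists):

import open3d as o3d
import numpy as np

pcd = o3d.io.read_point_cloud('cloud.ply')   # handles ASCII and binary PLY/PCD
pts = np.asarray(pcd.points)                 # Nx3 NumPy array
print(pts.shape)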

Math and Optimization Routines


• Gradient Derivation:
Question: Compute the gradient of the mean squared error loss L = ½(y − wx)² with respect to w.
Answer: ∂L/∂w = (y − wx)(−x) = −x(y − wx). This comes from the power rule and chain rule. In
general, backpropagation applies the chain rule over layers.
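Optional sanity check (illustrative; the sample values of x, y, w are arbitrary): compare the analytic gradient with a central finite difference.

import numpy as np

x, y, w = 2.0, 1.0, 0.3
loss = lambda w: 0.5 * (y - w * x) ** 2
analytic = -x * (y - w * x)                            # derived above
eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)  # central difference
print(analytic, numeric)                               # should agree to ~1e-6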

• Covariance Matrix: In statistics, the covariance matrix of a random vector contains the covariances
between each pair of variables 17 . For a data matrix X (rows are samples), one can compute
Cov(X) = (1/(n−1)) XᵀX after zero-mean centering.

Coding:

import numpy as np
X = np.random.rand(5,3) # 5 samples, 3 features
X_centered = X - X.mean(axis=0)
cov = np.dot(X_centered.T, X_centered) / (X_centered.shape[0] - 1)
print("Covariance matrix:\n", cov)
# Or simply:
print("np.cov", np.cov(X, rowvar=False))

• Least Squares (Normal Equation): To solve min_w ‖Xw − y‖², the normal equation is XᵀXw = Xᵀy.
Coding: Use NumPy:

X = np.random.rand(10,3)
true_w = np.array([1.5, -2.0, 0.5])
y = X.dot(true_w) + 0.1*np.random.randn(10)
w_est = np.linalg.lstsq(X, y, rcond=None)[0]
print("Estimated w:", w_est)
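Optional comparison (illustrative; reuses X and y from the snippet above): solve the normal equation directly. np.linalg.lstsq is generally preferred numerically, but the direct solve shows the formula in action.

# Direct normal-equation solve (same w as lstsq up to numerical error)
w_ne = np.linalg.solve(X.T @ X, X.T @ y)
print("Normal-equation w:", w_ne)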

• Gradient Descent (One Step): Gradient descent updates parameters in the direction opposite to the gradient: w_new = w_old − α∇L.
Example: For a linear model y = wx with loss L = (y_pred − y)², one gradient step is Δw = −α · 2x(xw − y).
Coding:

# One step of gradient descent for f(w) = w^2 (min at 0)
w = 5.0
lr = 0.1
grad = 2 * w           # derivative of w^2
w_new = w - lr * grad
print(w_new)           # should be closer to 0

(Optional) General ML and Software Practices


• Overfitting vs. Underfitting: Overfitting occurs when a model memorizes training data (low train
error, high test error); underfitting is when a model is too simple (high error on both) 18 . Mitigation
includes regularization, cross-validation, and more data.
• Cross-Validation: Used to estimate generalization performance and detect overfitting. For instance, k-fold CV splits the data into k folds, trains on k−1 folds, validates on the held-out fold, and averages performance (a short sketch follows this list).
• Reproducibility: Use version control (e.g. Git, the de-facto standard SCM 19 ) to track code;
document dependencies; write unit tests for modules.

• Coding Practices: Write clear, modular code; comment and document functions; use tools like
pytest for testing; follow PEP8 style (Python).
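Illustrative k-fold cross-validation sketch (assumes scikit-learn is installed; the linear-regression model and synthetic data are arbitrary stand-ins):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

X = np.random.rand(100, 3)
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * np.random.randn(100)

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))   # R^2 on the held-out fold
print("Mean CV R^2:", np.mean(scores))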

Each of these topics can be expanded with more examples as needed. The above provides a comprehensive
baseline of theory and practice, including Python and deep learning implementations, with references to
authoritative sources 1 7 20 10 13 14 15 2 12 19 .

1 Binary search - Wikipedia
https://en.wikipedia.org/wiki/Binary_search

2 9 Time and Space Complexity Analysis of Merge Sort | GeeksforGeeks
https://www.geeksforgeeks.org/time-and-space-complexity-analysis-of-merge-sort/

3 Time and Space Complexity Analysis of Quick Sort | GeeksforGeeks
https://www.geeksforgeeks.org/time-and-space-complexity-analysis-of-quick-sort/

4 5 6 Algorithmic Concepts: Recursion Cheatsheet | Codecademy
https://www.codecademy.com/learn/algorithmic-concepts-java/modules/recursion-java/cheatsheet

7 Breadth-first search - Wikipedia
https://en.wikipedia.org/wiki/Breadth-first_search

8 20 Depth-first search - Wikipedia
https://en.wikipedia.org/wiki/Depth-first_search

10 Discrete cosine transform - Wikipedia
https://en.wikipedia.org/wiki/Discrete_cosine_transform

11 Discrete Fourier transform - Wikipedia
https://en.wikipedia.org/wiki/Discrete_Fourier_transform

12 Fast Fourier transform - Wikipedia
https://en.wikipedia.org/wiki/Fast_Fourier_transform

13 Quantization (signal processing) - Wikipedia
https://en.wikipedia.org/wiki/Quantization_(signal_processing)

14 Huffman coding - Wikipedia
https://en.wikipedia.org/wiki/Huffman_coding

15 Autograd mechanics — PyTorch 2.7 documentation
https://docs.pytorch.org/docs/stable/notes/autograd.html

16 [1612.00593] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
https://arxiv.org/abs/1612.00593

17 Covariance matrix - Wikipedia
https://en.wikipedia.org/wiki/Covariance_matrix

18 What Is Overfitting vs. Underfitting? | IBM
https://www.ibm.com/think/topics/overfitting-vs-underfitting

19 Git - Wikipedia
https://en.wikipedia.org/wiki/Git
