ML Syllabus

This document provides a comprehensive checklist for preparing for machine learning interviews at MAANG companies, covering essential topics, resources, and question types. It emphasizes the importance of mastering fundamental concepts, practicing on real-world datasets, and understanding advanced machine learning techniques. Additionally, it outlines common pitfalls and strategies for success in interviews, including theoretical, coding, and design-based questions.

Uploaded by

Prerna Bhandari
Copyright
© All Rights Reserved
Machine Learning Interview Preparation Checklist (MAANG Level)

CLASSES
https://www.youtube.com/playlist?list=PLKnIA16_Rmvbr7zKYQuBfsVkjoLcJgxHH

STATSQUEST
https://www.youtube.com/playlist?list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF

Andrew Ng
https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU
https://www.youtube.com/playlist?list=PLkDaE6sCZn6FNC6YRfRQc_FbeQrF8BwGI

STANFORD
https://www.youtube.com/playlist?list=PLoROMvodv4rNyWOpJg_Yh4NSqI4Z4vOYy
https://www.youtube.com/playlist?list=PLoROMvodv4rNH7qL6-efu_q2_bPuy0adh

Krish Naik
https://www.youtube.com/playlist?list=PLTDARY42LDV7WGmlzZtY-w9pemyPrKNUZ
https://www.youtube.com/playlist?list=PLZoTAELRMXVPBTrWtJkn3wWQxZkmTXGwe

MIT
https://openlearninglibrary.mit.edu/courses/course-v1:MITx+6.036+1T2019/course/

REVISION
https://www.youtube.com/playlist?list=PLfFghEzKVmjsNtIRwErklMAN8nJmebB0I

SIMPLILEARN
https://www.youtube.com/playlist?list=PLEiEAq2VkUULYYgj13YHUWmRePqiu8Ddy

NOTES
https://www.geeksforgeeks.org/100-days-of-machine-learning/?ref=shm

QUESTIONS
https://www.geeksforgeeks.org/machine-learning-interview-questions/?ref=shm

1. Machine Learning Fundamentals


✅ Topics:
 Supervised vs. Unsupervised Learning
 Types of Machine Learning Algorithms
 Bias-Variance Tradeoff
 Curse of Dimensionality
 Overfitting & Underfitting
 Model Interpretability (SHAP, LIME, Feature Importance)
✅ Question Types:
 Explain key ML concepts theoretically
 Given a dataset, which ML approach would you choose and why?
 Coding problems (train a model using Scikit-Learn, tune hyperparameters)
 Solve a bias-variance tradeoff numerical problem
 Compare overfitting vs. underfitting in a given scenario

✅ Depth: Intermediate to Advanced


✅ Common Pitfalls: Misidentifying when a model is overfitting or underfitting,
misunderstanding interpretability methods

I. Fundamentals of Machine Learning


 Supervised Learning (Regression, Classification)
 Unsupervised Learning (Clustering, Dimensionality Reduction)
 Semi-supervised and Self-supervised Learning
 Reinforcement Learning Basics
II. Statistical Foundations
 Probability Distributions & Bayes Theorem
 Hypothesis Testing & Confidence Intervals
 Maximum Likelihood Estimation (MLE) & MAP Estimation
 Bias-Variance Tradeoff
 Overfitting, Underfitting, and Regularization
III. Model Evaluation & Metrics
 Precision, Recall, F1-Score, ROC-AUC, Log-loss
 Cross-validation Techniques (k-fold, Stratified, LOOCV)
 Bootstrap Sampling
 A/B Testing
IV. Feature Engineering & Data Preprocessing
 Handling Missing Data, Outliers
 Feature Scaling, Encoding Categorical Variables
 Feature Selection & Dimensionality Reduction (PCA, LDA, t-SNE)
 Feature Engineering for Text & Time-Series Data

2. Linear Regression
✅ Topics:
 Ordinary Least Squares (OLS)
 Assumptions of Linear Regression
 Multicollinearity, Autocorrelation
 R-squared & Adjusted R-squared
 Regularization (Ridge & Lasso)
✅ Question Types:
 Theoretical: Explain assumptions, impact of violating assumptions
 Numerical: Compute OLS manually, interpret coefficients
 Coding: Fit a regression model, handle multicollinearity, use L1/L2 regularization
 Application: How does multicollinearity affect feature importance?
✅ Depth: Advanced
✅ Common Pitfalls: Assuming linearity in all data, incorrect handling of categorical
variables
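
The "compute OLS manually" question type can be rehearsed with a short sketch like the one below. It assumes a single feature and uses made-up data; the function names ols_simple and r_squared are illustrative, not taken from any library.

```python
def ols_simple(xs, ys):
    """Closed-form OLS for one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """R-squared = 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data generated from y = 1 + 2x with no noise
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = ols_simple(xs, ys)
r2 = r_squared(xs, ys, intercept, slope)
```

On this noise-free data the slope is 2 and the intercept is 1: each unit increase in x raises the prediction by 2.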

3. Logistic Regression & Classification Metrics


✅ Topics:
 Sigmoid Function & Decision Boundary
 Assumptions of Logistic Regression
 ROC Curve, AUC, Precision-Recall, F1-score
 Handling Imbalanced Data (SMOTE, Class Weights)
✅ Question Types:
 Explain log odds and probability interpretation
 Calculate precision, recall, F1-score for given confusion matrix
 Coding: Implement logistic regression, tune hyperparameters, handle class
imbalance
✅ Depth: Advanced
✅ Common Pitfalls: Incorrect metric choice for imbalanced classification tasks
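
A minimal sketch of the confusion-matrix calculation mentioned above. The counts are invented to mimic an imbalanced dataset, and classification_metrics is a hypothetical helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical imbalanced case: 20 positives among 1000 samples
m = classification_metrics(tp=8, fp=2, fn=12, tn=978)
```

Here accuracy is 0.986 while recall is only 0.4, which illustrates why accuracy alone is a poor choice for imbalanced classification.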

4. Decision Trees & Ensemble Methods


✅ Topics:
 Gini Index, Entropy, Information Gain
 Overfitting in Decision Trees, Pruning
 Bagging, Boosting (Random Forest, XGBoost, LightGBM, CatBoost)
 Feature Importance & SHAP Values
✅ Question Types:
 Explain Gini vs. Entropy
 Manually compute splits for a decision tree
 Implement and optimize Random Forest/XGBoost in Python
✅ Depth: Advanced
✅ Common Pitfalls: Ignoring feature selection in tree-based models
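
For the manual split computation, a small sketch along these lines may help. It assumes a binary split and toy labels; the function names are illustrative.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical split of 10 samples (5 "yes", 5 "no") into two children
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4
gain = information_gain(parent, left, right)
```

The balanced parent has Gini 0.5 and entropy 1 bit; this split yields an information gain of roughly 0.278 bits.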

5. Support Vector Machines (SVM)


✅ Topics:
 Hyperplanes, Margin Maximization
 Kernel Trick (Polynomial, RBF)
 SVM vs. Logistic Regression
✅ Question Types:
 Explain kernel trick and why it’s needed
 Compute margin width given support vectors
 Coding: Implement SVM with different kernels
✅ Depth: Intermediate to Advanced
✅ Common Pitfalls: Misunderstanding kernel functions and their impact on computational
cost
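
A rough sketch of two quantities that come up in SVM questions. It assumes a hard-margin linear SVM in canonical form (w·x + b = ±1 on the support vectors), where the margin width is 2/||w||, and shows the RBF kernel as a plain function. The weight vector and gamma are made-up values.

```python
import math


def margin_width(w):
    """Margin width 2 / ||w|| for a canonical hard-margin linear SVM."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


w = [3.0, 4.0]                                 # hypothetical learned weights, ||w|| = 5
width = margin_width(w)                        # 2 / 5
k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)   # identical points
k_far = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)    # squared distance 2
```

Identical points give a kernel value of 1, and the value decays toward 0 as points move apart. Computing a full kernel matrix costs one such evaluation per pair of training points, which is where the computational cost in the pitfall above comes from.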

6. k-Nearest Neighbors (kNN) & Clustering


✅ Topics:
 Distance Metrics (Euclidean, Manhattan, Cosine)
 Choosing the Right k
 DBSCAN, Hierarchical Clustering
 Silhouette Score, Elbow Method
✅ Question Types:
 Compute distance metrics manually
 Apply kNN for classification in Python
 Explain when k-means fails and how to fix it
✅ Depth: Intermediate
✅ Common Pitfalls: Not normalizing data before applying kNN
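
The three distance metrics listed above can be computed by hand and checked against a sketch like this one. The vectors are arbitrary examples.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
d_euc = euclidean(a, b)          # sqrt(9 + 16 + 0) = 5
d_man = manhattan(a, b)          # 3 + 4 + 0 = 7
sim = cosine_similarity(a, b)    # 25 / sqrt(14 * 61)
```

Euclidean and Manhattan distances depend on feature scale, which is why kNN usually needs normalized inputs; cosine similarity only depends on direction.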

7. Principal Component Analysis (PCA) & Dimensionality Reduction


✅ Topics:
 Eigenvalues & Eigenvectors
 Explained Variance Ratio
 When to use PCA vs. t-SNE vs. UMAP
✅ Question Types:
 Manually compute PCA transformation
 Implement PCA in Python & interpret explained variance
 Compare PCA vs. t-SNE in different scenarios
✅ Depth: Advanced
✅ Common Pitfalls: Applying PCA blindly without considering feature interpretability
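
As a sketch of the manual PCA computation, the two-feature case can be solved in closed form because the covariance matrix is 2x2. The data points are invented, pca_2d is a hypothetical name, and the sample covariance (dividing by n - 1) is an assumption.

```python
import math


def pca_2d(points):
    """PCA for 2-D data via the closed-form eigenvalues of the 2x2 covariance matrix.

    Returns ((lam1, lam2), first_component, explained_variance_ratio_of_lam1).
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix
    mid = (sxx + syy) / 2
    half_gap = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = mid + half_gap, mid - half_gap
    # Eigenvector for lam1; fall back to an axis when the features are uncorrelated
    if sxy != 0:
        v = (sxy, lam1 - sxx)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    direction = (v[0] / norm, v[1] / norm)
    return (lam1, lam2), direction, lam1 / (lam1 + lam2)


points = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
(lam1, lam2), direction, ratio = pca_2d(points)
```

For these points the eigenvalues are 8/3 and 2/3, so the first component (pointing along the diagonal) explains 80% of the variance.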

8. Probability & Bayesian Methods in ML


✅ Topics:
 Naïve Bayes Classifier
 Maximum Likelihood Estimation (MLE)
 Bayesian Inference & Bayesian Optimization
✅ Question Types:
 Explain Naïve Bayes assumptions
 Compute posterior probability manually
 Implement Naïve Bayes in Python
✅ Depth: Intermediate to Advanced
✅ Common Pitfalls: Misunderstanding conditional independence assumption
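
A posterior probability question usually reduces to Bayes' theorem. The sketch below uses an invented diagnostic-test scenario; the parameter names are illustrative.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
```

With these numbers the posterior is about 0.16: even after a positive test, the hypothesis is still unlikely because the prior is so low.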

9. Neural Networks & Deep Learning Basics


✅ Topics:
 Perceptron, Activation Functions
 Backpropagation, Gradient Descent (SGD, Adam)
 CNNs vs. RNNs vs. Transformers
✅ Question Types:
 Explain forward & backward propagation
 Implement a simple MLP in TensorFlow/PyTorch
 Compute gradient updates manually
✅ Depth: Advanced
✅ Common Pitfalls: Choosing incorrect learning rate, overfitting due to excessive
parameters
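
To practice computing gradient updates manually, one option is a single sigmoid neuron trained with binary cross-entropy, where the gradient with respect to the pre-activation simplifies to (p - y). The sketch below assumes that setup; the inputs, initial weights and learning rate are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(w, b, x, y, lr):
    """One SGD step for a sigmoid neuron with binary cross-entropy loss."""
    # Forward pass
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    # Backward pass: dL/dz = p - y, dL/dw_i = (p - y) * x_i, dL/db = p - y
    dz = p - y
    new_w = [wi - lr * dz * xi for wi, xi in zip(w, x)]
    new_b = b - lr * dz
    return new_w, new_b, p


# Hypothetical starting point: zero weights, one positive example
new_w, new_b, p = sgd_step(w=[0.0, 0.0], b=0.0, x=[1.0, 2.0], y=1, lr=0.1)
```

Starting from zero weights the prediction is 0.5, so dz = -0.5 and the update moves the weights to [0.05, 0.1] and the bias to 0.05.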

10. Model Evaluation & Hyperparameter Tuning


✅ Topics:
 Cross-Validation (K-Fold, Stratified)
 Grid Search vs. Random Search vs. Bayesian Optimization
 Performance Metrics for Regression & Classification
✅ Question Types:
 Compute k-fold CV manually
 Implement hyperparameter tuning in Python
 Explain when to use F1-score vs. ROC-AUC
✅ Depth: Advanced
✅ Common Pitfalls: Ignoring cross-validation, misunderstanding metric trade-offs
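
The index bookkeeping behind k-fold cross-validation can be sketched as follows. This version assumes contiguous folds without shuffling or stratification, and k_fold_indices is a hypothetical helper rather than a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
val_sizes = [len(val) for _, val in folds]
```

With 10 samples and 3 folds the validation sizes are 4, 3 and 3, and every sample is used for validation exactly once. The cross-validated score is then the average of the per-fold scores.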

11. Feature Engineering & Data Preprocessing


✅ Topics:
 One-Hot Encoding vs. Label Encoding
 Handling Missing Values (Mean Imputation, KNN Imputation)
 Feature Scaling (Min-Max, Standardization)
✅ Question Types:
 Coding: Preprocess raw dataset for ML model
 Theoretical: How does feature scaling affect distance-based models?
✅ Depth: Advanced
✅ Common Pitfalls: Incorrect encoding of categorical variables leading to data leakage
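
One way to see how preprocessing can leak information is to fit scaling statistics on the training split only and reuse them on the test split. The sketch below assumes a single numeric, non-constant feature with invented values; the helper names are illustrative.

```python
import math


def fit_standardizer(train):
    """Mean and (population) standard deviation of the training values only."""
    mean = sum(train) / len(train)
    std = math.sqrt(sum((v - mean) ** 2 for v in train) / len(train))
    return mean, std


def standardize(values, mean, std):
    return [(v - mean) / std for v in values]


def fit_min_max(train):
    return min(train), max(train)


def min_max_scale(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]


train = [10.0, 20.0, 30.0, 40.0]
test = [50.0]

mean, std = fit_standardizer(train)      # statistics come from train only
test_std = standardize(test, mean, std)

lo, hi = fit_min_max(train)
test_mm = min_max_scale(test, lo, hi)    # may fall outside [0, 1]
```

The test value scales to about 1.33 under min-max, outside the [0, 1] range seen in training. That is expected; refitting the scaler on the combined data to avoid it would leak test information into training.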

Strategies & Tips for Success


✅ Prioritize Depth Over Breadth: MAANG companies expect in-depth knowledge. Master
fundamental concepts deeply rather than superficially covering everything.
✅ Practice on Real-world Datasets: Kaggle competitions & real-world projects are the
best way to gain hands-on experience.
✅ Improve Speed in Manual Calculations: Expect questions where you must calculate
probabilities, entropy, or regression coefficients manually.
✅ Optimize Code for Efficiency: Focus on writing clean, efficient Python code using
NumPy, Pandas, and Scikit-Learn.
✅ Mock Interviews & Whiteboarding: Simulate real interview conditions by solving
problems on a whiteboard or explaining your thought process aloud.
✅ Stay Updated with Research: Read ML papers, follow top researchers, and stay up-to-
date with advancements in ML techniques.

Advanced Machine Learning Curriculum for MAANG Interviews

1. Core Topics
A structured breakdown of key machine learning concepts with Python:
V. Machine Learning Algorithms
 Linear & Logistic Regression
 Decision Trees & Random Forests
 Support Vector Machines (SVMs)
 k-Nearest Neighbors (k-NN)
 Gradient Boosting (XGBoost, LightGBM, CatBoost)
 Neural Networks & Deep Learning Fundamentals
 Bayesian Methods & Probabilistic Graphical Models
VI. Advanced Topics
 Generative Models (GANs, VAEs)
 Transformer-based Architectures (BERT, GPT)
 Reinforcement Learning (DQN, PPO, Actor-Critic)
 AutoML & Hyperparameter Optimization
 Explainable AI (SHAP, LIME, Feature Importance)

2. Question Types
For each topic, expect questions in the following categories:
 Theoretical Questions (Explain concepts, trade-offs, comparisons)
 Conceptual Problem Solving (Derive equations, intuitive explanations)
 Coding Challenges (Implement models from scratch, optimize algorithms)
 Debugging Questions (Identify issues in ML models and pipelines)
 Design Patterns (ML system design, model scalability)
 Simulation-based Questions (Analyze hypothetical scenarios)
 Best Practices & Trade-offs (Optimize performance, handle real-world challenges)

3. Depth & Real-World Application Examples


Each topic requires different levels of understanding:
 Basic: Implementing models in Python using sklearn, numpy, pandas
 Intermediate: Hyperparameter tuning, improving model performance
 Advanced: End-to-end ML systems, deployment, distributed training
Examples:
 Bias-Variance Tradeoff → Tuning regularization in Ridge Regression
 Model Evaluation → Choosing metrics for imbalanced datasets
 Feature Engineering → Creating domain-specific features for NLP

4. Common Pitfalls
 Confusing correlation with causation
 Over-reliance on accuracy as a metric
 Ignoring data leakage
 Misinterpreting bias-variance tradeoff
 Improper handling of class imbalance
 Poor feature selection leading to model degradation

5. Advanced Applications
 NLP: Transformer-based models, sentiment analysis
 Computer Vision: CNN architectures, object detection
 Time Series: Forecasting with LSTMs, Prophet
 Recommendation Systems: Collaborative filtering, matrix factorization
 AutoML: Automating model selection & hyperparameter tuning
 ML in Production: MLOps, model deployment, monitoring drift

6. Practice & Resources


 Books:
o "Pattern Recognition and Machine Learning" - Christopher Bishop
o "The Elements of Statistical Learning" - Hastie, Tibshirani, Friedman
o "Deep Learning" - Ian Goodfellow, Yoshua Bengio, Aaron Courville
o "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" -
Géron
 Courses:
o Andrew Ng’s ML & DL Courses (Coursera)
o Stanford CS229 (ML), CS231n (CV), CS224n (NLP)
 Coding Platforms:
o LeetCode (ML & SQL problems)
o Kaggle (Competitions & Datasets)
o Google Colab (Python Notebooks)
 Research Papers:
o "Attention Is All You Need" (Transformers)
o "ImageNet Classification with Deep Convolutional Neural Networks" (AlexNet)
o "Auto-Encoding Variational Bayes" (VAEs)

7. Interview Strategy
Structuring Answers
 Theoretical: Define → Explain → Example → Use Cases
 Coding: Clarify → Plan → Implement → Optimize → Test
 ML System Design: Problem Breakdown → Data Pipeline → Model Choice →
Deployment Plan → Scaling
Common Tricks & Best Practices
 Think aloud to demonstrate reasoning
 Trade-offs: When to use complex vs. simple models
 Debugging: Find model errors via loss curves, feature importance
 Edge cases: Handle extreme values, unseen categories
Time Management
 Theoretical Qs: Answer in 2-3 min
 Coding Qs: Allocate time (15 min for approach, 30 min for coding, 15 min for
debugging & optimization)

1. Types of Questions in MAANG ML Interviews


1.1 Theoretical Questions
 What is the difference between supervised, unsupervised, and reinforcement
learning?
 Explain overfitting and underfitting. How can you prevent them?
 What is bias-variance tradeoff?
 What are precision, recall, F1-score, and accuracy?
 Explain feature selection techniques.
 What are different types of distance metrics (Euclidean, Manhattan, Cosine
Similarity)?
 Explain different types of activation functions in neural networks.
 What are loss functions and how do you choose them?
1.2 Conceptual Questions
 How do decision trees work? What are entropy and information gain?
 When should you use a support vector machine vs. a neural network?
 How does gradient descent work, and what are different variants like SGD, Adam,
RMSprop?
 How do convolutional neural networks (CNNs) work?
 Explain Principal Component Analysis (PCA) and when to use it.
 What is transfer learning, and why is it useful?
1.3 Numerical Problems
 Given a dataset, manually calculate Gini impurity and entropy for a decision tree.
 Compute precision, recall, and F1-score from a confusion matrix.
 Calculate the number of parameters in a neural network given layer dimensions.
 Compute eigenvalues and eigenvectors for PCA manually.
 Given a probability distribution, calculate expected value and variance.
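
As an example of the parameter-count problem above, the sketch below assumes a fully connected network where every layer has a bias term. The layer sizes are a made-up example.

```python
def dense_param_count(layer_sizes):
    """Parameters in a fully connected network: weights (n_in * n_out) plus biases (n_out)."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical network: 784 inputs, one hidden layer of 128 units, 10 outputs
total = dense_param_count([784, 128, 10])
# (784 * 128 + 128) + (128 * 10 + 10) = 100480 + 1290 = 101770
```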
1.4 Coding-Based Questions
 Implement logistic regression from scratch.
 Write a function to calculate the Euclidean distance between two points.
 Implement k-means clustering from scratch.
 Build a simple neural network using NumPy only (no deep learning libraries).
 Write code to preprocess missing values in a dataset.
 Implement gradient descent for linear regression.
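
A minimal sketch of the last item, batch gradient descent for one-feature linear regression with mean squared error. The data, learning rate and epoch count are assumptions chosen so that the loop converges on this toy problem.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w * x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data generated from y = 1 + 2x with no noise
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
```

On this data the result approaches w = 2 and b = 1. A learning rate that is too large makes the updates diverge, which is a common follow-up question.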
1.5 Simulation-Based Questions
 How would you detect fraudulent transactions using machine learning?
 Design a recommendation system for an e-commerce website.
 How would you use ML to classify spam emails?
 Predict stock prices using machine learning models.
 Apply clustering for customer segmentation in a marketing campaign.
1.6 Design-Based Questions
 How would you design a system that can predict demand for an online grocery
store?
 How would you build a scalable recommendation system for Netflix?
 How do you design an ML pipeline for real-time fraud detection?
 How do you handle concept drift in a production ML model?
1.7 Debugging Questions
 Why is your deep learning model not converging?
 How do you fix a random forest that is overfitting?
 Your logistic regression model has low accuracy; what steps would you take?
 Why is your gradient descent algorithm getting stuck in local minima?
1.8 Pattern-Based Questions
 Identify edge cases for an anomaly detection model.
 How do you handle unbalanced datasets in classification?
 What are common pitfalls in hyperparameter tuning?
 How do you choose the right ML model for different types of data?
1.9 Optimization Problems
 How would you reduce the computational cost of training deep learning models?
 How can you optimize hyperparameters efficiently?
 How do you speed up inference time for a large-scale ML model?
 How do you optimize memory usage when working with large datasets?
1.10 Application-Based Questions
 How is ML used in self-driving cars?
 How does YouTube recommend videos?
 How does Google Search use ML?
 How does ML improve cybersecurity threat detection?
2. Common Pitfalls in ML Interviews
 Failing to explain why a specific model is chosen over others.
 Ignoring feature engineering and assuming raw data is good enough.
 Not considering edge cases and failure scenarios in real-world applications.
 Struggling with debugging when a model doesn’t perform well.
 Over-reliance on deep learning when simpler models work better.
 Not understanding time complexity and scalability of ML algorithms.
 Poorly explaining trade-offs in model selection.

3. Best Resources for Mastering ML


3.1 Books
 "Pattern Recognition and Machine Learning" – Christopher Bishop
 "The Elements of Statistical Learning" – Hastie, Tibshirani, Friedman
 "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" – Aurélien
Géron
 "Deep Learning" – Ian Goodfellow
3.2 Videos & Courses
 Andrew Ng's Machine Learning Course (Coursera)
 Fast.ai Practical Deep Learning for Coders
 MIT OpenCourseWare on ML
3.3 PPTs and Notes
 Stanford CS229 ML Notes
 MIT Deep Learning Lecture Slides
3.4 Question Banks
 Machine Learning Interviews Book by Chip Huyen
 Interview Query
3.5 Coding Platforms
 LeetCode (ML Section)
 Kaggle ML Competitions
 HackerRank ML Challenges
3.6 Research Papers
 "Attention is All You Need" – Vaswani et al. (Transformers)
 "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" – Ioffe and Szegedy
 "Gradient-Based Optimization of Hyperparameters"

4. Practice Strategy
4.1 Learning Approach
 Start with theoretical understanding: Watch Andrew Ng’s ML course.
 Follow with hands-on coding: Implement algorithms in Python from scratch.
 Work on real-world datasets: Use Kaggle and UCI ML Repository.
4.2 Balancing Theory and Coding
 Morning: Read ML concepts from books/resources.
 Afternoon: Solve coding problems on LeetCode/Kaggle.
 Evening: Work on ML projects or case studies.
4.3 Handling Edge Cases & Complexity
 Identify corner cases in classification problems (e.g., class imbalance).
 Optimize models for both accuracy and efficiency.
 Learn parallelization and distributed computing (Spark, Dask).
4.4 Preparing for Follow-Up Questions
 Be ready to explain how to improve an ML model step-by-step.
 Justify why you selected a model based on trade-offs.
 Discuss challenges faced in real-world ML deployment.

Final Words
By following this structured approach, you'll be well-prepared for any ML-related interview
question at MAANG companies. Focus on deep conceptual understanding, hands-on
coding, and real-world application to gain confidence in handling diverse ML problems.
