Data Science Roadmap
Day 5 (Saturday, July 05, 2025)
Topic: Python data structures (dictionaries, sets)
Key Concepts: Dictionary key-value pairs, accessing/updating, set operations (union, intersection)
Tools/Libraries Introduced: --
Outcome by EOD: Use dict and set data structures to organize data and remove duplicates.
Hands-On Exercise: Count word frequencies in a string using a dict; demonstrate set operations.
Deliverable: Notebook with dictionary and set examples (word count, set uniqueness).
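Example Sketch: a minimal version of the exercise (the sample sentence and sets are arbitrary):

    # Count word frequencies with a dict
    text = "the quick brown fox jumps over the lazy dog the fox"
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    print(counts)                 # {'the': 3, 'fox': 2, ...}

    # Set operations and de-duplication
    a = {1, 2, 3, 4}
    b = {3, 4, 5, 6}
    print(a | b)                  # union: {1, 2, 3, 4, 5, 6}
    print(a & b)                  # intersection: {3, 4}
    print(set(text.split()))      # unique words only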
Week 1 Project
Title: Python Fundamentals Practice
Summary: Build a basic Python script or notebook that demonstrates use of variables, loops, and data
structures on a small data processing task (e.g., text word count or simple calculation).
Tools Used: Python, Jupyter Notebook
Expected Deliverable: A GitHub repository or notebook with code, showing Python fundamentals in action.
… push to GitHub.
Deliverable: GitHub repository with initial commits of project code.
Day 13 (Sunday, July 13, 2025)
Topic: Data wrangling with Pandas
Key Concepts: DataFrame indexing, selection, filtering, handling missing values, adding/removing columns
Tools/Libraries Introduced: Pandas
Outcome by EOD: Manipulate DataFrames by selecting subsets, handling nulls, and creating new features.
Hands-On Exercise: Use Pandas to filter rows based on condition, fill or drop missing data, and create a new
column (e.g., total from components).
Deliverable: Notebook demonstrating Pandas data wrangling tasks.
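Example Sketch: one way to cover the wrangling steps, assuming a toy DataFrame with hypothetical price/quantity columns:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "item": ["a", "b", "c", "d"],
        "price": [10.0, np.nan, 7.5, 3.0],
        "quantity": [2, 5, np.nan, 4],
    })

    # Filter rows on a condition
    cheap = df[df["price"] < 8]

    # Handle missing values: fill one column, drop rows missing the other
    df["price"] = df["price"].fillna(df["price"].mean())
    df = df.dropna(subset=["quantity"])

    # Create a new column from existing ones
    df["total"] = df["price"] * df["quantity"]
    print(df)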
Week 2 Project
Title: Git and Data Exploration Project
Summary: Create a GitHub repo, add a dataset (e.g., Iris or Titanic CSV), commit code that loads data with
Pandas, cleans it, and produces initial plots (e.g., histograms or scatter plots).
Tools Used: Python, Pandas, Matplotlib/Seaborn, Git
Expected Deliverable: GitHub repo with code and visualizations showing data loading, cleaning, and basic
plots.
Outcome by EOD: Create pair plots and heatmaps to explore relationships; visualize categories.
Hands-On Exercise: Generate a pairplot of numeric features, a correlation heatmap, and boxplots for
categorical vs numeric data.
Deliverable: Notebook with visual exploration plots (pairplot, heatmap, boxplots).
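Example Sketch: the three plot types on seaborn's built-in tips dataset as a stand-in (the dataset is fetched over the network on first use):

    import matplotlib.pyplot as plt
    import seaborn as sns

    tips = sns.load_dataset("tips")

    # Pairplot of numeric features
    sns.pairplot(tips)
    plt.show()

    # Correlation heatmap of numeric columns
    sns.heatmap(tips.corr(numeric_only=True), annot=True, cmap="coolwarm")
    plt.show()

    # Boxplot: categorical vs numeric
    sns.boxplot(data=tips, x="day", y="total_bill")
    plt.show()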
… groups.
Deliverable: Notebook with descriptive stats and t-test results.
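Example Sketch: a possible descriptive-stats and t-test step on two hypothetical groups (the values are made up):

    import numpy as np
    from scipy import stats

    group_a = np.array([5.1, 4.9, 6.2, 5.8, 5.5, 6.0])
    group_b = np.array([6.5, 6.8, 7.1, 6.2, 6.9, 7.3])

    # Descriptive statistics
    print(group_a.mean(), group_a.std(ddof=1))
    print(group_b.mean(), group_b.std(ddof=1))

    # Independent two-sample t-test
    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")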
Week 3 Project
Title: Exploratory Data Analysis Project
Summary: Conduct an end-to-end exploratory analysis on a chosen dataset, including data cleaning,
visualization, and basic statistical insights.
Tools Used: Python, Pandas, Matplotlib/Seaborn
Expected Deliverable: A well-documented Jupyter notebook (on GitHub) demonstrating the EDA process
with visuals and observations.
Problem Statement: Analyze the Titanic passenger dataset to understand factors that influenced survival
rates.
Skills Integrated: Python fundamentals, data cleaning, Pandas operations, data visualization, basic statistics
Deliverables: A comprehensive notebook (or report) with data loading, cleaning steps, visualizations
(survival by class, gender, age distribution) and interpretations.
Bonus/Stretch Goal: Implement a simple logistic regression model to predict survival and evaluate its
accuracy.
Hands-On Exercise: Apply LogisticRegression to a sample classification dataset (e.g., a binary subset of Iris or the prepared Titanic data).
Deliverable: Notebook with code training logistic regression and predicting classes.
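Example Sketch: logistic regression on a binary subset of the built-in Iris data:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X, y = X[y < 2], y[y < 2]          # keep two classes for a binary problem

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))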
Week 5 Project
Title: Regression and Classification Models
Summary: Develop a regression model (predicting numeric target) and a classification model (predicting
categorical target) using scikit-learn. Include data splitting, model training, and evaluation.
Tools Used: Python, scikit-learn, Pandas, Matplotlib
Expected Deliverable: A notebook containing data preprocessing, model training, and performance
evaluation for both a regression and a classification task.
Day 31 (Thursday, July 31, 2025)
Topic: Feature Engineering – Encoding and Scaling
Key Concepts: One-hot encoding for categorical data, label encoding, feature scaling (standardization,
normalization)
Tools/Libraries Introduced: scikit-learn (OneHotEncoder, StandardScaler)
Outcome by EOD: Transform categorical features and scale numeric features appropriately.
Hands-On Exercise: Use pandas and sklearn to one-hot encode a categorical column and scale numeric
columns of a dataset.
Deliverable: Notebook showing encoded and scaled features ready for modeling.
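Example Sketch: encoding and scaling on a toy DataFrame (the column names are hypothetical; assumes scikit-learn >= 1.2 for the sparse_output argument):

    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    df = pd.DataFrame({
        "city": ["NY", "SF", "NY", "LA"],
        "age": [25, 32, 47, 51],
        "income": [40000, 85000, 62000, 58000],
    })

    # One-hot encode the categorical column
    encoder = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
    city = pd.DataFrame(encoder.fit_transform(df[["city"]]),
                        columns=encoder.get_feature_names_out(["city"]))

    # Standardize the numeric columns
    scaler = StandardScaler()
    nums = pd.DataFrame(scaler.fit_transform(df[["age", "income"]]),
                        columns=["age", "income"])

    features = pd.concat([city, nums], axis=1)
    print(features)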
Day 35 (Monday, August 04, 2025)
Topic: Ensemble Methods – Random Forest
Key Concepts: Bagging, random forests, out-of-bag error, feature importance
Tools/Libraries Introduced: scikit-learn (RandomForestClassifier/Regressor)
Outcome by EOD: Build a random forest ensemble and interpret its feature importances.
Hands-On Exercise: Train a RandomForest on a dataset, extract feature importances, and compare
performance to a single decision tree.
Deliverable: Notebook with random forest results and feature importance plot.
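Example Sketch: random forest vs a single tree on the built-in breast-cancer dataset:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    data = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, random_state=42)

    tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
    forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                    random_state=42).fit(X_train, y_train)

    print("Tree accuracy:  ", tree.score(X_test, y_test))
    print("Forest accuracy:", forest.score(X_test, y_test))
    print("OOB score:      ", forest.oob_score_)

    # Top features by importance
    ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                    key=lambda t: t[1], reverse=True)
    print(ranked[:5])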
Week 6 Project
Title: ML Pipeline with Ensemble Model
Summary: Integrate feature engineering, pipeline creation, and an ensemble model (random forest or
boosted trees) on a real dataset. Include hyperparameter tuning and evaluation.
Tools Used: Python, pandas, scikit-learn
Expected Deliverable: A notebook or script on GitHub demonstrating the full ML pipeline and ensemble
model performance with commentary.
Day 38 (Thursday, August 07, 2025)
Topic: Ensemble Methods – XGBoost and Gradient Boosting
Key Concepts: Gradient Boosting Machines, XGBoost algorithm, tuning boosting parameters
Tools/Libraries Introduced: XGBoost or LightGBM (scikit-learn-compatible interface)
Outcome by EOD: Train an XGBoost model and understand its performance improvements.
Hands-On Exercise: Install xgboost and train an XGBClassifier on a dataset, then compare its performance with a Random Forest.
Deliverable: Notebook with XGBoost model training and performance comparison.
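Example Sketch: XGBoost vs random forest, assuming the xgboost package is installed (pip install xgboost):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=3,
                        eval_metric="logloss")
    xgb.fit(X_train, y_train)

    rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

    print("XGBoost accuracy:      ", xgb.score(X_test, y_test))
    print("Random Forest accuracy:", rf.score(X_test, y_test))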
Outcome by EOD: Visualize high-dimensional data structure in 2D.
Hands-On Exercise: Use t-SNE on a dataset (e.g., MNIST or CIFAR reduced features) and plot clusters in 2D.
Deliverable: Notebook with t-SNE/UMAP scatter plot of data clusters.
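Example Sketch: a 2D t-SNE projection of scikit-learn's small digits dataset, used here as a lighter stand-in for MNIST:

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    digits = load_digits()

    # Project the 64-dimensional digit images to 2D
    emb = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(digits.data)

    plt.scatter(emb[:, 0], emb[:, 1], c=digits.target, cmap="tab10", s=8)
    plt.colorbar(label="digit")
    plt.title("t-SNE projection of the digits dataset")
    plt.show()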
Week 7 Project
Title: Clustering and Dimensionality Reduction Project
Summary: Use K-means, hierarchical clustering, and PCA/t-SNE on a real dataset to segment the data and
visualize clusters. Optionally include anomaly detection.
Tools Used: Python, scikit-learn, matplotlib/seaborn
Expected Deliverable: A Jupyter notebook showing data clusters, plots of reduced dimensions, and
interpretation of clusters.
Day 46 (Friday, August 15, 2025)
Topic: Review – Data Manipulation and Visualization
Key Concepts: Pandas DataFrame operations, Matplotlib/Seaborn plot types
Tools/Libraries Introduced: --
Outcome by EOD: Solidify understanding of data wrangling and visualization techniques.
Hands-On Exercise: Summarize how to handle missing data, and create one example plot of each type
learned.
Deliverable: Notebook or notes with sample visualizations and data cleaning steps.
Day 50 (Tuesday, August 19, 2025)
Topic: Review – Key Tools and Libraries
Key Concepts: Summary of packages used, best practices
Tools/Libraries Introduced: --
Outcome by EOD: Consolidate familiarity with tools.
Hands-On Exercise: Create flashcards or brief notes summarizing the purpose of libraries used (pandas,
numpy, sklearn, etc.) and commands learned.
Deliverable: Collection of flashcards or summary notes.
Week 8 Project
Title: Integrated Data Analysis Challenge
Summary: Conduct a full data analysis cycle on a dataset, combining data cleaning, visualization, and a
basic model, showcasing everything learned so far.
Tools Used: Python, Pandas, Scikit-learn, Matplotlib/Seaborn
Expected Deliverable: A notebook/report demonstrating the entire workflow from raw data to initial insights
and model.
Day 61 (Saturday, August 30, 2025)
Topic: Introduction to Neural Networks and Deep Learning
Key Concepts: Perceptron, neurons, activation functions, architecture of neural networks
Tools/Libraries Introduced: TensorFlow and PyTorch installation
Outcome by EOD: Understand perceptron model and set up deep learning frameworks.
Hands-On Exercise: Implement a simple perceptron in Python; install TensorFlow and PyTorch libraries.
Deliverable: Notebook with a perceptron example and proof of installation (versions printed).
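Example Sketch: a perceptron learning the AND function, plus the version check (assumes TensorFlow and PyTorch are installed):

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])           # AND function

    w, b, lr = np.zeros(2), 0.0, 0.1
    for _ in range(20):                   # perceptron learning rule
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (target - pred) * xi
            b += lr * (target - pred)

    print([1 if xi @ w + b > 0 else 0 for xi in X])   # expect [0, 0, 0, 1]

    # Proof of installation
    import tensorflow as tf
    import torch
    print("TensorFlow:", tf.__version__)
    print("PyTorch:", torch.__version__)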
Hands-On Exercise: Choose a dataset (e.g., MNIST or Fashion-MNIST), build a simple neural network in
TensorFlow or PyTorch, train it, and evaluate accuracy.
Deliverable: Notebook with neural network training and test accuracy on the chosen dataset.
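Example Sketch: a small feedforward network on MNIST with Keras (the dataset downloads on first run):

    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    model.fit(x_train, y_train, epochs=5, validation_split=0.1)
    print("Test accuracy:", model.evaluate(x_test, y_test)[1])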
Week 9 Project
Title: Handwritten Digit Recognition with Neural Network
Summary: Train a simple feedforward neural network on the MNIST dataset to classify handwritten digits.
Tools Used: Python, TensorFlow (or PyTorch)
Expected Deliverable: A trained neural network model and a report of its performance (accuracy, confusion
matrix).
Week 10 Project
Title: CNN for Image Classification
Summary: Implement and train a convolutional neural network on an image classification dataset, and
report performance improvements over a simple MLP.
Tools Used: Python, Keras (TensorFlow) or PyTorch, Matplotlib
Expected Deliverable: Trained CNN model and evaluation metrics in a notebook.
Hands-On Exercise: Use an RNN (e.g., SimpleRNN in Keras) on a toy sequence dataset (e.g., binary sequence
classification).
Deliverable: Notebook with RNN model and results on the sequence task.
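Example Sketch: a SimpleRNN on a toy task (classify whether a binary sequence contains more ones than zeros; the task and sizes are arbitrary choices):

    import numpy as np
    import tensorflow as tf

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(2000, 10, 1)).astype("float32")
    y = (X.sum(axis=(1, 2)) > 5).astype("float32")   # label: more ones than zeros

    model = tf.keras.Sequential([
        tf.keras.layers.SimpleRNN(16, input_shape=(10, 1)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=10, validation_split=0.2, verbose=0)
    print(model.evaluate(X, y, verbose=0))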
Week 11 Project
Title: Sentiment Analysis with LSTM
Summary: Train an LSTM network on a text classification task (e.g., IMDB sentiment analysis) and evaluate its performance.
Tools Used: Python, Keras or PyTorch, NLTK or similar for text preprocessing
Expected Deliverable: A trained LSTM model and a notebook showing test accuracy and confusion matrix.
Day 72 (Wednesday, September 10, 2025)
Topic: Advanced NLP – Using Pretrained Models
Key Concepts: Tokenization, embeddings, fine-tuning pre-trained models (BERT/GPT)
Tools/Libraries Introduced: Hugging Face Transformers (BertTokenizer, model classes)
Outcome by EOD: Tokenize text data and fine-tune a BERT model for classification.
Hands-On Exercise: Fine-tune BERT on a small text classification problem (e.g., sentiment or topic
classification) and evaluate.
Deliverable: Notebook with tokenization, model fine-tuning code, and evaluation metrics.
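Example Sketch: a pared-down fine-tuning loop with the Trainer API, assuming the transformers and datasets packages are installed and using a small IMDB subset to keep the run short:

    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    dataset = load_dataset("imdb")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=128)

    train_ds = dataset["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)
    eval_ds = dataset["test"].shuffle(seed=42).select(range(500)).map(tokenize, batched=True)

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    args = TrainingArguments(output_dir="bert-sentiment", num_train_epochs=1,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
    trainer.train()
    print(trainer.evaluate())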
Week 12 Project
Title: Advanced Neural Network Project
Summary: Demonstrate mastery of advanced deep learning by implementing either a transformer-based
NLP model or a generative model (GAN/VAE) and analyzing results.
Tools Used: Python, TensorFlow/PyTorch, Hugging Face Transformers
Expected Deliverable: A notebook showing the use of the chosen advanced model, training process, and
evaluation or generated samples.
Week 13 Project
Title: Time Series Forecasting Challenge
Summary: Forecast a real-world time series (e.g., temperature, sales) using ARIMA and LSTM; compare their
performance.
Tools Used: Python, statsmodels, TensorFlow/Keras
Expected Deliverable: A notebook with forecasts, plots of actual vs predicted, and evaluation metrics for
each method.
… training on GPU.
Deliverable: Screenshot or log of training on a cloud GPU environment.
Day 88 (Friday, September 26, 2025)
Topic: Final Project Preparation
Key Concepts: Integrating skills, project planning
Tools/Libraries Introduced: --
Outcome by EOD: Plan the capstone deliverables and prepare any last-minute needs.
Hands-On Exercise: Review all project requirements and prepare final materials; double-check environment
and code.
Deliverable: Updated README with final capstone plan and dataset links.
Day 92 (Tuesday, September 30, 2025)
Topic: Fine-Tuning Pretrained Models
Key Concepts: Retraining last layers, freezing base layers
Tools/Libraries Introduced: Keras functional API for fine-tuning
Outcome by EOD: Fine-tune a pre-trained network on a new dataset.
Hands-On Exercise: Freeze the convolutional base of a pretrained model and train new dense layers on a small dataset.
Deliverable: Notebook showing fine-tuning process and training results.
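Example Sketch: freezing a pretrained convolutional base and adding a new head with the Keras functional API (MobileNetV2 and the binary head are illustrative choices; train_ds/val_ds would come from, e.g., image_dataset_from_directory):

    import tensorflow as tf

    # Pretrained base without its top classifier, frozen
    base = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                             include_top=False, weights="imagenet")
    base.trainable = False

    inputs = tf.keras.Input(shape=(160, 160, 3))
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # new binary head
    model = tf.keras.Model(inputs, outputs)

    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(train_ds, validation_data=val_ds, epochs=5)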
Outcome by EOD: Visualize what features influence model predictions.
Hands-On Exercise: Use SHAP to explain predictions of a trained model (e.g., tree or neural net) on sample
data.
Deliverable: Notebook with SHAP plots and explanations.
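Example Sketch: SHAP on a tree model, assuming the shap package is installed and using the built-in diabetes data with a random forest regressor as a simple stand-in:

    import shap
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor

    data = load_diabetes(as_frame=True)
    X, y = data.data, data.target
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

    # TreeExplainer is designed for tree-based models
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    # Summary plot: which features push predictions up or down
    shap.summary_plot(shap_values, X)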
Week 14 Project
Title: Image Classification with Transfer Learning
Summary: Use a pretrained CNN to build an image classifier on a custom dataset (e.g., cats vs dogs), fine-
tune it, and evaluate performance.
Tools Used: Python, TensorFlow/Keras (or PyTorch)
Expected Deliverable: A trained model and a notebook documenting the transfer learning process and
results.
Day 100 (Wednesday, October 08, 2025)
Topic: GPT for Text Generation
Key Concepts: Transformer-based text generation, GPT-2/3
Tools/Libraries Introduced: Hugging Face (GPT-2), OpenAI API (optional)
Outcome by EOD: Generate coherent text using a pre-trained GPT model.
Hands-On Exercise: Use Hugging Face to generate text continuations from a prompt using GPT-2.
Deliverable: Notebook with prompts and generated text samples.
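Example Sketch: text generation with the Hugging Face pipeline (the model downloads on first run; the prompt is arbitrary):

    from transformers import pipeline, set_seed

    generator = pipeline("text-generation", model="gpt2")
    set_seed(42)

    outputs = generator("Data science is", max_length=50, num_return_sequences=2)
    for out in outputs:
        print(out["generated_text"])
        print("---")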
Hands-On Exercise: Use a pre-trained BERT/GPT model for a new NLP task (e.g., text summarization) or train
an RL agent on a custom environment.
Deliverable: Notebook showing results of the advanced model on the chosen task.
Week 15 Project
Title: NLP/AI Integration Project
Summary: Use an advanced model (e.g., GPT-2 for story generation or an RL agent for a game) and analyze
its outputs.
Tools Used: Python, Hugging Face Transformers or RL library
Expected Deliverable: A demonstration of the model's output on a complex task and discussion of its
performance.
Day 108 (Thursday, October 16, 2025)
Topic: Meta-Learning and AutoML (Overview)
Key Concepts: Learning to learn, automated model selection
Tools/Libraries Introduced: AutoKeras or Google AutoML (conceptual)
Outcome by EOD: Explore AutoML tools briefly.
Hands-On Exercise: Try an AutoML library (e.g., AutoKeras) on a small dataset.
Deliverable: Notebook showing AutoML usage and results.
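Example Sketch: a quick AutoKeras run on a built-in tabular dataset, assuming the autokeras package is installed:

    import autokeras as ak
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # AutoKeras searches over candidate architectures automatically
    clf = ak.StructuredDataClassifier(max_trials=3, overwrite=True)
    clf.fit(X_train, y_train, epochs=10)
    print(clf.evaluate(X_test, y_test))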
Week 16 Project
Title: Advanced AI Project
Summary: Use one of the latest AI technologies or models (e.g., DQN agent, AutoML, GPT) on a suitable
task, demonstrating its potential.
Tools Used: Python, relevant libraries (e.g., Stable Baselines, Hugging Face)
Expected Deliverable: A report or notebook showing the application of the advanced AI method and
outcome analysis.
Day 116 (Friday, October 24, 2025)
Topic: Interview Question Practice
Key Concepts: Common ML/AI interview scenarios
Tools/Libraries Introduced: --
Outcome by EOD: Prepare answers for potential questions.
Hands-On Exercise: Write down answers or give a mock response to 3 typical interview questions (e.g.,
explain bias-variance, describe a project).
Deliverable: Document or recorded answers to the questions.
Day 120 (Tuesday, October 28, 2025)
Topic: Month 4 Capstone Project – Advanced AI Application
Problem Statement: Integrate multiple advanced AI techniques into a comprehensive solution (e.g., object
detection with a deep CNN, or a language model application).
Skills Integrated: Deep learning, transfer learning, model optimization, deployment considerations
Deliverables: A final project notebook or report implementing the solution, including code, visualizations,
and analysis of results.
Bonus/Stretch Goal: Deploy the solution as a live demo or incorporate an additional feature (e.g., real-time
inference, user interface).
Outcome by EOD: Deploy a trained model as a REST API in a Docker container.
Hands-On Exercise: Take a small trained model (e.g., an sklearn classifier), write a Flask app that serves predictions, and containerize it.
Deliverable: Dockerfile for the model API and demo of it responding to a request.
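Example Sketch: a minimal Flask prediction service (app.py), assuming a scikit-learn model was saved beforehand as model.joblib; a Dockerfile would then copy app.py and model.joblib into a Python base image, install the dependencies, and run the app:

    from flask import Flask, jsonify, request
    import joblib

    app = Flask(__name__)
    model = joblib.load("model.joblib")   # hypothetical pre-trained model

    @app.route("/predict", methods=["POST"])
    def predict():
        features = request.get_json()["features"]   # e.g. {"features": [[5.1, 3.5, 1.4, 0.2]]}
        return jsonify({"prediction": model.predict(features).tolist()})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)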
Week 17 Project
Title: ML Model in Docker
Summary: Containerize a machine learning model by wrapping it in a Flask or FastAPI web service inside
Docker.
Tools Used: Python, Flask or FastAPI, Docker
Expected Deliverable: A Docker image that exposes a prediction endpoint, with usage instructions.
Day 128 (Wednesday, November 05, 2025)
Topic: CI/CD Concepts
Key Concepts: Continuous Integration/Continuous Deployment pipelines, automation
Tools/Libraries Introduced: GitHub Actions (or Travis CI)
Outcome by EOD: Understand how code changes trigger automated pipelines.
Hands-On Exercise: Write a basic CI workflow (e.g., GitHub Actions YAML) that runs tests or lints the code.
Deliverable: Workflow file and passing build for a sample repo.
Hands-On Exercise: (Conceptual) Describe how you would run a distributed training job on Kubernetes.
Deliverable: Diagram or description of a training job setup.
Week 18 Project
Title: Continuous Integration for ML Project
Summary: Create a CI workflow for an ML repo that automatically builds and tests code changes.
Tools Used: GitHub Actions (or Travis/Jenkins)
Expected Deliverable: A working CI pipeline definition and documentation on how it works.
Outcome by EOD: Create a simple interactive app.
Hands-On Exercise: Build a Streamlit app that takes user input and shows a model prediction or data
visualization.
Deliverable: Streamlit app code and a link to deployed app or screenshot.
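Example Sketch: a small Streamlit app (streamlit_app.py, run with "streamlit run streamlit_app.py"), assuming a saved model.joblib that predicts from four numeric features:

    import joblib
    import streamlit as st

    st.title("Simple prediction demo")

    model = joblib.load("model.joblib")   # hypothetical pre-trained model

    # Collect user input
    values = [st.number_input(f"Feature {i + 1}", value=0.0) for i in range(4)]

    if st.button("Predict"):
        st.write("Model prediction:", model.predict([values])[0])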
Week 19 Project
Title: ML Model Web App
Summary: Deploy an existing trained model as a web app or API (using Streamlit, Gradio, or Flask) for
interactive use.
Tools Used: Python, chosen web framework, deployed on local server or cloud
Expected Deliverable: A hosted app or API endpoint with usage instructions.
Day 140 (Monday, November 17, 2025)
Topic: A/B Testing for Models
Key Concepts: Comparing multiple model versions with live traffic
Tools/Libraries Introduced: --
Outcome by EOD: Learn how to evaluate models in production.
Hands-On Exercise: Design an A/B test plan for two model versions (describe how you'd split traffic and which metrics you'd measure).
Deliverable: Written A/B testing strategy outline.
Week 20 Project
Title: End-to-End ML Deployment
Summary: Demonstrate an end-to-end ML system from model training to deployment and monitoring.
Include an example of how the model is served and how its performance is tracked over time.
Tools Used: Python, Docker/Cloud, monitoring tools (Prometheus/Grafana)
Expected Deliverable: Documentation of the complete pipeline and any dashboards or logs produced.
Hands-On Exercise: Create a set of 5 quiz questions covering Docker, Kubernetes, CI/CD, and cloud.
Deliverable: Quiz questions and answers in a text file.
Day 148 (Tuesday, November 25, 2025)
No specific task (buffer for capstone work)
Outcome by EOD: Prepare portfolio materials.
Hands-On Exercise: Write a blog post summarizing one of your projects; ensure code is organized in a
GitHub repo.
Deliverable: Completed blog post markdown and updated GitHub repository links.
Day 158 (Friday, December 05, 2025)
No specific task (final presentations and submission)