Roadmap to Crack Data Science/ML Interviews
This roadmap offers a detailed, step-by-step plan to guide you on your journey to becoming a data scientist.
Anwar Haq
Sr. Data Scientist @ Cisco
Ex-Meta | Ex-Wells Fargo
Introduction
Entering the World of Data Science
Data Scientist: Analyses data, builds predictive models, and communicates insights that inform business decisions.
Data Architect:
Designs and manages the organization's data,
ensuring data accessibility and well-organised
metadata.
Research and Networking
Laying the Groundwork
Identifying Target Companies
Identify Key Skills and Qualifications: Recurring technical skills (Python, SQL, machine
learning) and soft skills (communication, problem-solving).
Understand Role Responsibilities: What does the daily work entail? Does it align with
your interests and skills?
Analyse Job Titles: Different titles for similar roles ("Data Analyst" vs. "Business
Analyst").
Evaluate Job Requirements: Tailor your resume and cover letter accordingly.
Note Industry-Specific Requirements: E.g., healthcare data science roles may require
knowledge of HIPAA regulations.
Building Your Network
Networking is vital for your job search and
career development.
Building Foundational Skills
Statistics and Mathematics
1. Descriptive Statistics
Summarizing and describing data using measures such as the mean, median, mode, variance, and standard deviation.
2. Inferential Statistics
Making inferences and predictions about a population based on a sample.
Key concepts include hypothesis testing, confidence intervals, and p-values.
3. Statistical Tests
Common tests include t-tests, chi-square tests, and ANOVA. The depth of your statistical knowledge will depend on your chosen data science role.
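To make these ideas concrete, here is a minimal Python sketch (the data and group names are made up) that computes descriptive statistics with pandas and runs a two-sample t-test with SciPy:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical example: page-load times (seconds) for two site versions
rng = np.random.default_rng(42)
group_a = pd.Series(rng.normal(loc=30, scale=5, size=200), name="version_a")
group_b = pd.Series(rng.normal(loc=28, scale=5, size=200), name="version_b")

# Descriptive statistics: count, mean, standard deviation, quartiles
print(group_a.describe())

# Inferential statistics: two-sample t-test comparing the group means
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t-statistic = {t_stat:.3f}, p-value = {p_value:.4f}")
```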
Resources
Books: "Practical Statistics for Data Scientists" provides a beginner-friendly introduction
with Python examples.
Online Courses: Platforms like Coursera and edX offer statistics courses tailored for data
science, such as "Statistics with R" and "Inferential Statistics."
YouTube Channels: StatQuest with Josh Starmer offers clear and engaging explanations
of statistical concepts.
Time allocation:
Allocate approximately 3 weeks to build a solid foundation in statistics.
Building Foundational Skills
Exploratory Data Analysis (EDA)
1. Data Cleaning
Handling Missing Values: Imputing missing values or removing rows with missing data (see the pandas sketch later in this section).
Detecting and Treating Outliers: Identifying and addressing extreme values that can
skew your analysis.
Data Transformation: Converting data types, scaling variables, and creating new
features.
2. Univariate Analysis
Exploring individual variables using histograms, box plots, and summary statistics.
3. Bivariate Analysis
Examining relationships between pairs of variables using scatter plots, correlation coefficients, and cross-tabulations.
4. Multivariate Analysis
Investigating interactions among multiple variables using techniques such as correlation matrices, pair plots, and dimensionality reduction (e.g., PCA).
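The sketch below (with synthetic data standing in for a real dataset; the column names are assumptions) shows roughly how these steps look in pandas:

```python
import numpy as np
import pandas as pd

# Synthetic customer data (column names and values are made up)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=500).astype(float),
    "income": rng.normal(50_000, 15_000, size=500),
    "spend": rng.normal(2_000, 600, size=500),
})
df.loc[df.sample(20, random_state=0).index, "age"] = np.nan  # inject missing values

# 1. Data cleaning: impute missing values and remove IQR-based outliers
df["age"] = df["age"].fillna(df["age"].median())
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# 2. Univariate analysis: summary statistics for a single variable
print(df["income"].describe())

# 3. Bivariate / multivariate analysis: correlations between variables
print(df.corr())
```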
Time allocation:
Allocate 3 weeks to develop proficiency in EDA.
Building Foundational Skills
Data Visualization
Data visualization is crucial for communicating insights and findings to both technical and
non-technical audiences. You will need to:
Bar charts and line graphs are suitable for comparing categories or showing trends over
time.
Scatter plots are useful for visualizing relationships between two variables.
Heatmaps are effective for displaying patterns in large datasets.
Tell a story with your data: Use visualizations to support a narrative, highlighting key
insights and trends.
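As an illustrative sketch (using Seaborn's bundled tips dataset), the snippet below produces each of the chart types mentioned above:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Sample dataset bundled with Seaborn
tips = sns.load_dataset("tips")

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Bar chart: compare average tip across days
sns.barplot(data=tips, x="day", y="tip", ax=axes[0])
axes[0].set_title("Average tip by day")

# Scatter plot: relationship between bill size and tip
sns.scatterplot(data=tips, x="total_bill", y="tip", ax=axes[1])
axes[1].set_title("Tip vs. total bill")

# Heatmap: correlations among numeric columns
sns.heatmap(tips.select_dtypes("number").corr(), annot=True, ax=axes[2])
axes[2].set_title("Correlation heatmap")

plt.tight_layout()
plt.show()
```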
Resources
Python Libraries: Matplotlib and Seaborn offer powerful tools for creating static and
interactive visualizations.
Data Visualisation Tools: Tableau, Power BI, and even Excel can be used for creating
interactive dashboards and reports.
Time allocation:
Allocate 3 weeks to develop your data visualisation skills, focusing on both technical skills and design principles.
Acquiring Essential Coding Skills
Python
Python is the dominant language in data science due to its versatility, simplicity, and extensive libraries.
1. Core Python
Data types, control flow, functions, and core data structures such as lists and dictionaries.
2. Essential Libraries
NumPy and Pandas for numerical computing and data manipulation.
3. Advanced Techniques
List comprehensions, generators, object-oriented programming, and writing clean, reusable code.
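For a flavour of what this looks like in practice, here is a small illustrative snippet (the data is made up) combining core Python with NumPy and Pandas:

```python
import numpy as np
import pandas as pd

# Core Python: functions, list comprehensions, and dictionaries
def normalize(values):
    """Scale a list of numbers to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

scores = {"alice": 72, "bob": 88, "carol": 95}
print(normalize(list(scores.values())))

# Essential libraries: NumPy arrays and pandas DataFrames
arr = np.array(list(scores.values()))
print("Mean score:", arr.mean())

df = pd.DataFrame({"name": list(scores), "score": arr})
print(df.sort_values("score", ascending=False))
```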
Resources
Tutorials: The official Python documentation and Real Python offer comprehensive
resources.
Online Courses: Coursera's "Python for Everybody" and Udemy's "Complete Python
Bootcamp" are popular choices.
Time allocation:
Allocate approximately 3 weeks to gain proficiency in the basics of Python and an
additional 7 weeks to enhance your skills while working on projects.
Acquiring Essential
Coding Skills
SQL
SQL (Structured Query Language) is essential for interacting with databases, enabling
you to extract, manipulate, and analyse data.
1. Fundamental Concepts
SELECT statements, filtering with WHERE, JOINs, GROUP BY, and aggregate functions.
2. Advanced Techniques
Subqueries, common table expressions (CTEs), and window functions.
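To keep all examples in Python, here is a self-contained sketch (using the built-in sqlite3 module and a made-up orders table) that exercises both the fundamentals and a window function:

```python
import sqlite3

# In-memory database with a made-up orders table (schema is an assumption)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, region TEXT);
    INSERT INTO orders VALUES
        (1, 'alice', 120.0, 'west'),
        (2, 'bob',    80.0, 'east'),
        (3, 'alice',  45.5, 'west'),
        (4, 'carol', 200.0, 'east');
""")

# Fundamental concepts: SELECT, WHERE, GROUP BY, and aggregates
query = """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total_amount
    FROM orders
    WHERE amount > 50
    GROUP BY region
    ORDER BY total_amount DESC;
"""
for row in conn.execute(query):
    print(row)

# Advanced technique: a window function ranking orders within each region
# (requires SQLite 3.25+, bundled with recent Python versions)
ranked = """
    SELECT customer, region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
    FROM orders;
"""
for row in conn.execute(ranked):
    print(row)
```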
Resources
Tutorials: W3Schools SQL Tutorial and Mode Analytics SQL Tutorial offer practical
lessons.
Online Courses: Coursera's "SQL for Data Science" and DataCamp's "Introduction to SQL"
are recommended.
Time allocation:
Allocate approximately 3 weeks to achieve job readiness in SQL.
Machine Learning
The Heart of Data Science
1. Supervised Learning
Training models on labelled data, where the outcome is known. Key techniques include linear and logistic regression, decision trees, random forests, and gradient boosting.
2. Unsupervised Learning
Exploring unlabeled data to discover patterns. Key techniques include k-means clustering, hierarchical clustering, and principal component analysis (PCA).
3. Deep Learning
Using artificial neural networks to analyse complex data and solve challenging tasks. Key concepts include network architectures, activation functions, backpropagation, and specialized networks such as CNNs and RNNs.
4. AI Feedback Loops
Incorporating feedback mechanisms so models can learn from their outputs. Examples include recommendation systems that retrain on user interactions and models refined with human feedback.
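A minimal supervised-learning sketch with scikit-learn (using its bundled iris dataset; a real project would need more careful validation and feature work):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Labelled data: features X and known outcomes y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a supervised model and evaluate it on held-out data
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, preds))
```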
Resources
Books: "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" provides a
practical guide.
YouTube Channels: StatQuest with Josh Starmer and 3Blue1Brown offer intuitive
explanations of machine learning concepts.
Time allocation:
Allocate 3 weeks for learning fundamental concepts in machine learning and an additional
7 weeks for delving deeper into different areas and building portfolio projects.
Essential Tools
Jupyter Notebook: An interactive environment for running code, visualizing data, and creating documentation.
Google Colab: A cloud-based platform similar to Jupyter Notebook, offering free access to
GPUs for deep learning.
Integrated Development Environments (IDEs): Specialized software for code editing,
debugging, and project management. Popular IDEs include:
PyCharm
VS Code
Spyder
Git: A distributed version control system for tracking changes in code and collaborating
with others.
GitHub: A popular platform for hosting Git repositories and collaborating on projects.
Key Concepts:
Branching
Merging
Pull Requests
Amazon Web Services (AWS): A comprehensive cloud platform offering various services
for storage, computing, and machine learning.
Microsoft Azure: A cloud platform with services for data storage, analytics, and AI.
Google Cloud Platform (GCP): A suite of cloud computing services for data storage,
analysis, and machine learning.
Time allocation:
You can incorporate these tools into your workflow as you progress through the other
steps, allocating dedicated time as needed.
Building Your Portfolio
A well-crafted portfolio showcases your skills and practical experience, making you a more competitive candidate.
1. Project Selection
Choose projects that cover diverse aspects of data science, demonstrating a broad skill set:
Data Cleaning and Preprocessing: Showcase your ability to handle real-world data
with missing values, inconsistencies, and outliers.
Exploratory Data Analysis (EDA): Demonstrate your skills in data exploration,
visualization, and identifying patterns.
Machine Learning Model Building: Include projects involving various algorithms and
techniques, such as classification, regression, clustering, and deep learning.
Domain-Specific Projects: If you have a specific industry in mind, build projects that
address problems in that domain. For example, you could create a project on customer
churn prediction for the telecom industry or a fraud detection model for the financial
sector.
2. Project Documentation
Clearly document each project, making it easy for others to understand your work:
Introduction: Describe the problem you addressed and the project's objectives.
Data: Explain the datasets used, their source, and any preprocessing steps taken.
Methodology: Detail the techniques and algorithms used for analysis and model
building.
Results: Present your findings using clear visualizations and insightful interpretations.
Conclusion: Summarize your key takeaways and any limitations of your approach.
Code: Include well-commented code snippets to demonstrate your coding skills.
3. Sharing Your Portfolio
GitHub: A popular platform for sharing code and collaborating on projects. Create a well-organized repository for each project, including your code, documentation, and any supporting files.
Personal Website: Build a website to showcase your projects and skills professionally.
Include a portfolio section with links to your GitHub repositories or detailed project
descriptions.
Across your projects, aim to demonstrate data preprocessing, statistical analysis, machine learning, and data visualization, and note the tools and technologies used (Python, R, SQL, TensorFlow, Tableau).
Resources
Books: "Data Science Projects with Python" by Stephen Klosterman and "Python for Data
Analysis" by Wes McKinney offer guidance on building practical projects.
Portfolio Examples: Explore portfolios of experienced data scientists for inspiration and
ideas.
Time allocation:
Allocate approximately 7 weeks to develop 7 portfolio projects, working on documentation
and website creation alongside your project development.