EXA Data Roadmap: based on MIT Applied Data Science Program
Stay connected for updates, industry insights, and career advice on LinkedIn and
YouTube.
Prerequisites
No prior programming or math experience is required. We'll start from the basics and
guide you through the entire journey, from understanding data to building complex
machine-learning models.
Module 1: Foundations
The first module of the Applied Data Science program covers the foundations:
Python and statistics.
Part 1: Python
● Descriptive Statistics is a set of methods for describing and summarizing data
sets, which is the starting point of most data analysis. For example, the data
● A Distribution is a statistical function that describes all the possible values a
random variable can take within a certain range and how likely each value is.
○ Classes:
■ Data Analysis with Python on Coursera
■ Associate Data Analyst in SQL by DataCamp (paid)
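To make the two definitions above concrete, here is a minimal sketch that uses only Python's standard library. The sample values (late payments for ten customers) are invented for illustration and do not come from the program materials.

```python
import statistics
from collections import Counter

# Hypothetical sample: number of late payments for ten customers.
late_payments = [0, 1, 0, 2, 0, 1, 3, 0, 1, 2]

# Descriptive statistics summarize the data set in a few numbers.
summary = {
    "mean": statistics.mean(late_payments),
    "median": statistics.median(late_payments),
    "mode": statistics.mode(late_payments),
    "stdev": statistics.stdev(late_payments),  # sample standard deviation
}

# Empirical distribution: each observed value mapped to its relative frequency.
counts = Counter(late_payments)
distribution = {
    value: count / len(late_payments) for value, count in sorted(counts.items())
}

print(summary)
print(distribution)
```

The summary condenses the sample into a handful of numbers, while the distribution shows how often each value occurs. Its relative frequencies always add up to 1.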
If you're more interested in roles like data analyst or data engineer, you might not need
to dive deep into these topics. To learn more about different data roles and their
requirements, check out my newsletter, “Demystifying Data Careers: Your Guide to Data
Analyst vs Scientist vs Data Engineer vs ML Engineer,” where I've outlined the various
career paths in data science.
Utilizing ChatGPT
ChatGPT can be a valuable tool for creating detailed project plans.
○ Debt-to-income ratio
○ Credit utilization rate
○ Payment history (e.g., number of late payments, missed
payments)
○ Employment stability
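As a rough illustration, the features listed above could be derived from raw applicant fields as follows. The `Applicant` class, its field names, and the choice of years at the current job as a proxy for employment stability are all assumptions made for this sketch, not part of the original plan.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # All field names are hypothetical, chosen for this illustration.
    monthly_debt_payments: float
    gross_monthly_income: float
    credit_balance: float
    credit_limit: float
    late_payments: int
    missed_payments: int
    years_at_current_job: float


def engineer_features(a: Applicant) -> dict:
    """Turn raw applicant fields into model-ready features."""
    if a.gross_monthly_income <= 0:
        raise ValueError("gross_monthly_income must be positive")
    return {
        # Share of income already committed to debt payments.
        "debt_to_income": a.monthly_debt_payments / a.gross_monthly_income,
        # Share of available revolving credit in use (0.0 if there is no limit).
        "credit_utilization": (
            a.credit_balance / a.credit_limit if a.credit_limit > 0 else 0.0
        ),
        "late_payments": a.late_payments,
        "missed_payments": a.missed_payments,
        # Assumed proxy: longer tenure is treated as more stable employment.
        "employment_stability": a.years_at_current_job,
    }


applicant = Applicant(1500.0, 5000.0, 2000.0, 8000.0, 2, 0, 3.5)
features = engineer_features(applicant)
print(features)
```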
● Train the Model: Use the training data to train the selected model.
● Hyperparameter Tuning: Optimize the model's performance by
tuning hyperparameters like learning rate, number of trees, and
maximum depth.
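The training and tuning steps can be sketched as a small grid search. To keep the example dependency-free, it swaps in a hand-written logistic regression for the tree-based model the plan implies, so only the learning rate and the number of epochs are tuned; number of trees and maximum depth would be searched the same way with a tree ensemble. The data is synthetic, and the function names are illustrative.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(w, x):
    # w[0] is the bias; the remaining weights pair with the features.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train(X, y, learning_rate, epochs):
    """Fit logistic regression weights with batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - learning_rate * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def log_loss(w, X, y):
    """Average negative log-likelihood; lower is better."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(w, xi), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


# Synthetic applicants: [debt_to_income, credit_utilization] -> default (1) or not (0).
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    dti, util = rng.uniform(0.0, 0.8), rng.uniform(0.0, 1.0)
    p_default = sigmoid(6.0 * dti + 3.0 * util - 4.5)
    X.append([dti, util])
    y.append(1 if rng.random() < p_default else 0)

# Hold out the last 50 rows for validation.
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

# Grid search: train once per combination and score on the validation set.
results = {}
for lr in (0.01, 0.1, 1.0):
    for epochs in (50, 200):
        w = train(X_train, y_train, lr, epochs)
        results[(lr, epochs)] = log_loss(w, X_val, y_val)

best_params = min(results, key=results.get)
print("best (learning_rate, epochs):", best_params)
print("validation log loss:", round(results[best_params], 4))
```

Scoring each combination on held-out data rather than on the training data is the key design choice: it favors settings that generalize instead of settings that merely memorize the training set.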
5. Model Evaluation:
● Deploy the Model: Integrate the model into the bank's credit
decisioning system to predict default probabilities for new loan
applications.
● Coursera:
○ Machine Learning by Andrew Ng
○ Data Science Specialization by UC Irvine
● edX:
○ MicroMasters Program in Statistics and Data Science
● Kaggle: Explore datasets and notebooks related to loan default
prediction.
● YouTube:
○ StatQuest, Sentdex, 3Blue1Brown
Python Libraries:
● Identify your interests: What areas of AI and ML interest you the most? This will
help you narrow down your project topic.
● Define your project's scope: Clearly define your project's goals and objectives.
What do you want to achieve? What are the key questions you want to answer?
● Gather data: Collect the data you'll need to train and evaluate your AI or ML
model. While using pre-prepared datasets (such as those on Kaggle) is fine, it's
important to explore and analyze the data, including preprocessing and error
analysis, to fully understand the problem.
● Train and evaluate your model: Train it on your data and assess its performance
using appropriate metrics.
● Iterate and improve: If your model is not performing as well as you'd like, iterate
on your approach and make improvements.
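For a binary classification project, "appropriate metrics" often means accuracy, precision, recall, and F1. Which ones matter most depends on the problem. The sketch below computes them from scratch so the definitions are visible; the labels and predictions are made up for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Compute common binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

When classes are imbalanced, accuracy alone can look good while the model misses most positive cases, which is why precision and recall are usually reported alongside it.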
Ultimately, the best project for you will depend on your specific interests and career
goals. Consider your previous coursework, your strengths and weaknesses, and the
areas of AI that excite you the most.
Additional Tips:
Remember, everyone learns at their own pace. Keep practicing, and you'll improve!
Good luck!