
Action Plan/Journaling

The document outlines a comprehensive action plan for learning data science and machine learning, detailing various resources, courses, and tasks to be completed over several weeks. It emphasizes foundational topics such as Python programming, data manipulation with Pandas, data visualization, and machine learning algorithms, including regression and classification techniques. Additionally, it includes a section on recommendation systems, highlighting their importance and the methodologies for building them.

Uploaded by

unkown21
Copyright
© All Rights Reserved

Action Plan/Journaling

Saturday, 14 December 2024 12:49 PM

# Building the recommendation system


https://github.com/PacktPublishing/Building-Recommender-Systems-with-Machine-Learning-and-AI/tree/master

# Updating the LinkedIn profile to align with AI/ML

Learning Path -
https://learn.365datascience.com/my-learning-path/
How I'd learn ML in 2025 (if I could start over)

https://www.scaler.com/topics/course/python-for-data-science/ --- for catching up on Probability and Statistics topics

YT20STL1KAS

Main Topic 1

https://app.datacamp.com/learn/career-tracks/data-scientist-in-python

Main Topic 2
https://app.datacamp.com/learn/career-tracks/associate-data-scientist-in-python
https://app.datacamp.com/learn/courses/introduction-to-tensorflow-in-python --- Started [In progress]
https://app.datacamp.com/learn/courses/introduction-to-statistics-in-python [TBD]
An explicit tutorial needs to be done for each of these topics:
Pandas [https://app.datacamp.com/learn/courses/data-manipulation-with-pandas]

NumPy ---> Scientific Calculations [TBD]


TensorFlow ---> Computational Calculations [https://app.datacamp.com/learn/courses/introduction-to-tensorflow-in-python]
Matplotlib ---> Visual representation [Done, needs revision]
https://app.datacamp.com/learn/courses/introduction-to-statistics-in-python
https://app.datacamp.com/learn/courses/foundations-of-probability-in-python
--------
Machine Learning
https://app.datacamp.com/learn/skill-tracks/machine-learning-fundamentals-with-python [Basic]

List comprehension
Class Inheritance
https://app.datacamp.com/learn/courses/end-to-end-machine-learning [Advanced]
----------
Maths Important Topics
Ø How to do derivatives and integrals
Ø What vectors and matrices are and how their basic operations work
Ø The basic concepts behind probability theory
Ø Some basic rules of summation and logarithm
Main Topic 3
https://app.datacamp.com/learn/career-tracks/associate-python-developer
Subtopic
https://app.datacamp.com/learn/courses/introduction-to-python-for-developers -- Completed
https://app.datacamp.com/learn/courses/intermediate-python-for-developers --- Started [In progress]

Data Science Specific Python

https://app.datacamp.com/learn/courses/intermediate-python -- Next Python course for DS


https://app.datacamp.com/learn/courses/intro-to-python-for-data-science -- Next Python course for DS

Python for Data Science -
https://www.coursera.org/learn/programming-for-data-science

Pandas for Data Science
https://www.coursera.org/learn/pandas-data-science

=================================================
Machine Learning in Python - this is more related to the M.Tech course

https://app.datacamp.com/learn/courses/supervised-learning-with-scikit-learn

================
https://www.coursera.org/learn/numpy-data-science [Data science with NumPy, Sets, and Dictionaries]
https://www.coursera.org/learn/machine-learning-with-python [This will give a basic understanding of machine learning - Machine learning in Python via IBM]

Pandas
NumPy ---> Scientific Calculations
TensorFlow ---> Computational Calculations
Matplotlib ---> Visual representation

=================================================
Week 1: Foundations of Data Science
Day 1-2: Introduction to Data Science
• Learn: What is Data Science? Roles, tools, and applications.
• Resources:
○ Articles: Harvard Data Science Overview.
○ Video: "What is Data Science?" (Kaggle YouTube).
• Task: Write a one-page summary of Data Science.
Day 3-4: Learn Python Basics
• Topics: Variables, data types, loops, conditional statements.
• Resources:
○ Codecademy’s Python Course.
○ "Python for Data Science" playlist on YouTube.
• Task: Write a Python program to calculate the mean and median of a list of numbers.
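A minimal sketch of the Day 3-4 task, assuming plain Python lists and no external libraries. The function names and the sample numbers are illustrative only.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted list (average of the two middle values if even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


numbers = [4, 1, 7, 2, 10]   # hypothetical sample data
print(mean(numbers))         # 4.8
print(median(numbers))       # 4
```

The standard library's statistics module offers the same calculations (statistics.mean, statistics.median); writing them by hand is only useful as loop and list practice.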
Day 5-7: Data Manipulation with Pandas
• Topics: DataFrames, filtering, sorting, grouping, missing data handling.
• Resources:
○ Kaggle’s Pandas micro-course.
○ Book: "Python for Data Analysis" by Wes McKinney.
• Task: Analyze a sample dataset (e.g., Titanic dataset) using Pandas.

Week 2: Building Data Analysis Skills


Day 8-9: Data Visualization
• Topics: Creating graphs with Matplotlib and Seaborn.
• Resources:
○ YouTube: Corey Schafer’s "Matplotlib" tutorials.
○ Documentation: Seaborn library.
• Task: Visualize patterns in the Iris dataset.
Day 10-11: Probability and Statistics Basics
• Topics: Mean, variance, probability distributions, hypothesis testing.
• Resources:
○ Khan Academy: Probability & Statistics.
○ Book: "Practical Statistics for Data Scientists."
• Task: Calculate and interpret statistics for a small dataset.
Day 12-14: Exploratory Data Analysis (EDA)
• Topics: Data cleaning, feature engineering, outlier detection.
• Resources:
○ Kaggle EDA examples.
○ Article: "A Beginner’s Guide to EDA" (Towards Data Science).
• Task: Perform EDA on a new dataset from Kaggle.

Week 3: Machine Learning Basics


Day 15-16: Introduction to Machine Learning
• Topics: Supervised vs. unsupervised learning, key ML algorithms.
• Resources:
○ Coursera: Andrew Ng’s ML course (Week 1).
○ Video: "Machine Learning for Everyone" by Google Developers.
• Task: Define use cases for supervised and unsupervised learning.
Day 17-18: Linear Regression
• Topics: Concept, implementation in Scikit-learn.
• Resources:
○ Scikit-learn documentation.
○ Kaggle tutorials on regression.
• Task: Build a linear regression model to predict house prices.
Day 19-20: Classification with Logistic Regression
• Topics: Logistic regression, accuracy, precision, recall.
• Resources:
○ YouTube: Simplilearn’s "Logistic Regression" tutorial.
○ Dataset: Titanic (Survival Prediction).
• Task: Train a logistic regression model and evaluate its performance.
Day 21: Evaluation Metrics
• Topics: Confusion matrix, ROC curve, F1-score.
• Resources:
○ Scikit-learn metrics documentation.
○ Article: "Evaluation Metrics for Machine Learning" (Analytics Vidhya).
• Task: Evaluate the Titanic survival model using different metrics.
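The confusion matrix, precision, recall, and F1-score from Day 21 can be sketched in plain Python as below. The labels are a made-up toy example, not Titanic results, and the helper names are hypothetical; in practice the Scikit-learn metrics module listed above would normally be used. The ROC curve is not covered here.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0   # correct share of predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # found share of actual positives
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Hypothetical labels: 1 = survived, 0 = did not survive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
```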

Week 4: Advanced Topics and Projects


Day 22-23: Clustering
• Topics: K-Means, Hierarchical Clustering.
• Resources:
○ Kaggle micro-course on unsupervised learning.
○ Video: K-Means clustering example on YouTube.
• Task: Perform clustering on the Iris dataset.
Day 24-25: Decision Trees and Random Forests
• Topics: Tree-based models, feature importance.
• Resources:
○ Scikit-learn documentation.
○ Video: Decision Trees explained by StatQuest.
• Task: Build a random forest model for a Kaggle dataset.
Day 26-27: Introduction to Deep Learning
• Topics: Basics of Neural Networks.
• Resources:
○ TensorFlow/Keras tutorials.
○ Coursera: Andrew Ng’s Deep Learning specialization (Week 1).
• Task: Build a simple neural network using Keras.
Day 28: Deployment Basics
• Topics: Save and deploy models using Flask or Streamlit.
• Resources:
○ YouTube: "Deploy a ML Model with Flask."
○ Documentation: Streamlit.
• Task: Deploy a model for public access.

Day 29-30: Build and Present a Project


• Goal: Consolidate all learning into a capstone project.
• Project Ideas:
○ Predict house prices.
○ Analyze customer churn for a business.
○ Perform sentiment analysis on tweets.
• Presentation: Prepare slides or a report summarizing your project.

Tips for Success:


1. Dedicate at least 2 hours daily for learning and coding.
2. Use GitHub to track and showcase your progress.
3. Participate in Kaggle competitions for practical experience.
4. Join Data Science communities for networking and advice.

================================
Knowledge on recommendation system -

https://app.datacamp.com/learn/courses/building-recommendation-engines-in-python

Week 11: Recommendation Systems


As organizations lean increasingly on data-driven approaches, an understanding of
recommendation systems helps not only data science experts but also professionals in
other areas, such as marketing, who are now expected to be data literate too. This week
covers why recommendation systems are now everywhere and what it takes to build a
suitable one, including the statistical modeling and algorithms involved.

Ø Recommendations and Ranking


Put simply, recommendation system algorithms suggest relevant items to users. This
explains their widespread use across industries and their central role in revenue
generation.

• What does a recommendation system do?

As the name indicates, a recommendation system predicts a user's future preference
for a product and recommends the best-suited items to that user.

In this chapter, you will learn how to use a recommendation system to choose the best
products for users.

• So what is the recommendation prediction problem? And what data do we have?

The recommendation prediction problem is the task of predicting whether an individual
or a business will like a product (a classification problem) or what review or rating
they will give it (a regression problem).

• Using population averages

Here, you will understand the procedure for using population averages.

• Using population comparisons and ranking
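The population-average and ranking ideas listed above can be illustrated with a small, non-personalized recommender. This is only a sketch under stated assumptions: the ratings are invented toy data, and the function names are hypothetical rather than taken from the course material.

```python
from collections import defaultdict

# Hypothetical toy data: (user, item, rating on a 1-5 scale).
ratings = [
    ("ann", "book", 5), ("ann", "film", 3),
    ("bob", "book", 4), ("bob", "game", 2),
    ("cat", "film", 4), ("cat", "game", 5), ("cat", "book", 3),
]


def item_averages(ratings):
    """Population average rating per item, ignoring who gave the rating."""
    totals = defaultdict(lambda: [0.0, 0])
    for _, item, rating in ratings:
        totals[item][0] += rating
        totals[item][1] += 1
    return {item: total / count for item, (total, count) in totals.items()}


def recommend(user, ratings, k=2):
    """Rank items by population average and return the top k the user has not rated."""
    seen = {item for u, item, _ in ratings if u == user}
    averages = item_averages(ratings)
    ranked = sorted(averages, key=averages.get, reverse=True)
    return [item for item in ranked if item not in seen][:k]


print(item_averages(ratings))
print(recommend("ann", ratings))
```

Every user gets the same ranking here, minus the items they have already rated, which is what makes this a baseline rather than a personalized method.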

-----------------------------

Ø Collaborative Filtering
Collaborative filtering is the part of recommendation systems that we interact with
most often. It collects data on the preferences of many users and uses it to predict
the choices of a particular user.

• Personalization using collaborative filtering using similar users

Here, you will understand the procedure to use collaborative filtering with the help of
similar users.

• Personalization using collaborative filtering using similar items

Here, you will understand the procedure to use collaborative filtering with the help of
similar items.

• Personalization using collaborative filtering using similar users and items

Here, you will understand the procedure to use collaborative filtering with the help of
similar users and items.
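As a rough illustration of collaborative filtering with similar users, the sketch below predicts a missing rating as a similarity-weighted average of other users' ratings. The toy ratings, the choice of cosine similarity over co-rated items, and the function names are all assumptions made for this example; the item-based variant would apply the same idea with the roles of users and items swapped.

```python
import math

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"book": 5, "film": 3, "game": 4},
    "bob": {"book": 5, "film": 3, "song": 5},
    "cat": {"book": 1, "film": 5, "song": 2},
}


def cosine(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = cosine(ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[item]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None


# ann rated like bob, so the prediction leans toward bob's rating of "song".
print(predict("ann", "song", ratings))
```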

-------------------------------------------

Ø Personalized Recommendations
As the name suggests, personalized recommendations select the recommendations that
are personally relevant for a user, based on signals such as their browsing
trends.

• Personalization using comparisons, rankings, and user items

Here, you will learn how to use personalized recommendations with the help of
comparisons, rankings, and user items.

• Hidden Markov Model / Neural Nets, Bipartite graph, and graphical model

The Hidden Markov Model (HMM) is a statistical Markov model in which the system
being modeled is regarded as a Markov process with hidden/unobserved states.

• Using side information


This chapter will familiarize you with the procedure to use side information with the
assistance of Meta-Prod2Vec.

• Building a system: Algorithmic and system challenges

This chapter will familiarize you with the procedure to build a system while accounting
for algorithmic and system challenges.

----------------------------------------------
MIT 12-week Data Science program for industry adoption

https://idss-gl.mit.edu/dsml-program-preview?enc_e_lid=W3MRxJJU14xooTB6Qy0FM9wrOIO9Nh14XiFBxPaXCD%2Blu8iESCXaChC7Tgbl07eUBci7QGHwHV30TV60%2B8aOa1PUyKc%2B5qjSMnGb%2FY%2Fb%2BGuB156k85zIA1%2Fgwa8lJlDVh%2FrX1twtFmAGe4%2FAwBFY2mcXvaU2sMWA0io%3D--pZEXEPDNdVspASCo--%2BwBolkyIkFfGhYtNCpom8w%3D%3D

Python for Data Science

Python is a lingua franca for Data Scientists and Machine Learning specialists. To
strengthen your Python foundations, this module focuses on NumPy, Pandas, and Data
Visualization.

• NumPy

NumPy is a Python package for scientific computing that enables one to work with
multi-dimensional arrays and matrices.

• Pandas

Pandas is an open-source and powerful library in Python that is used to analyze and
manipulate data.

• Data Visualization

Data Visualization is the graphic representation of data. It helps generate insights
from data using libraries such as matplotlib and seaborn.
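As a small illustration of the NumPy arrays and matrices described above, the sketch below assumes NumPy is installed; the values are arbitrary. Pandas and the plotting libraries are left out to keep the example short.

```python
import numpy as np

# A 2x2 matrix and a length-2 vector with arbitrary illustrative values.
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
vector = np.array([10.0, 20.0])

product = matrix @ vector             # matrix-vector product
column_means = matrix.mean(axis=0)    # mean of each column
scaled = matrix * 2                   # element-wise scaling, no explicit loop

print(product)        # [ 50. 110.]
print(column_means)   # [2. 3.]
print(scaled.shape)   # (2, 2)
```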

===========================

Statistics for Data Science


This week’s chapter will help you understand the role of statistics in helping organizations
make effective decisions, learn its most widely-used tools, and learn to solve business
problems using analysis, data interpretation, and experiments. It will cover the following
topics:

• Descriptive Statistics

Descriptive statistics gives you the fundamental summary measures of the data, such as
the mean, median, and variance.

• Inferential Statistics

It will explore the areas of distributions and parameter estimation, ultimately allowing you
to make inferences from the data.
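A minimal sketch of both ideas using only Python's standard statistics module. The sample values are hypothetical, and the standard error is shown as just one simple example of a quantity used for inference about the population mean.

```python
import statistics

# Hypothetical sample, e.g. daily orders observed over one week.
sample = [12, 15, 11, 18, 14, 15, 13]

# Descriptive statistics: summarize the sample itself.
summary = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "stdev": statistics.stdev(sample),   # sample standard deviation (n - 1)
}

# Inferential step: the standard error estimates how much the sample mean
# would vary from sample to sample.
standard_error = summary["stdev"] / len(sample) ** 0.5

print(summary)
print(round(standard_error, 3))
```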

===================

Week 6: Regression and Prediction

Classical Linear and Nonlinear Regression and Extensions


Here, you will learn about linear and nonlinear regression together with their extensions,
including the crucial case of logistic regression for binary classification and causal
inference, where the goal is to understand the effects of actively manipulating a variable
as opposed to passively measuring it.
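For the one-variable case, ordinary least squares has a closed form that can be sketched in a few lines of plain Python. The data below is made up so that it lies exactly on the line y = 2x + 1, and the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)          # 2.0 1.0
print(slope * 6 + intercept)     # prediction for x = 6 -> 13.0
```

With several variables the same idea is usually solved with matrix methods, for example through Scikit-learn's LinearRegression.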

• Linear regression with one and several variables

Here, you will understand the procedure to implement linear regression with one and
several variables.

• Linear regression for prediction

This chapter will familiarize you with the procedure to implement linear regression for
predictive analysis.

• Linear regression for causal inference

This chapter will familiarize you with the procedure to implement linear regression for causal
inference.

• Logistic and other types of nonlinear regression

Logistic regression is a simple classification algorithm in Machine Learning that predicts a
categorical dependent variable from one or more independent variables.

This chapter will familiarize you with all the fundamentals of Logistic Regression and other types
of nonlinear regression in Machine Learning.
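To make the logistic regression description above concrete, here is a rough sketch with one feature, trained by plain gradient descent on the log loss. The data (hours studied versus pass/fail), the learning rate, and the number of epochs are assumptions chosen for illustration; a library such as Scikit-learn would normally be used instead.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, lr=0.1, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(w * x + b) - y   # gradient of the log loss w.r.t. the score
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print(sigmoid(w * 0.5 + b))   # low probability of passing
print(sigmoid(w * 4.5 + b))   # high probability of passing
```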

-----------------------------

Modern Regression with High-Dimensional Data


In the next module of this Data Science for working professionals course, you will learn about
modern regression with high-dimensional data, or finding a needle in a haystack. For large
datasets, it becomes necessary to sort out which variables are relevant for prediction and
which are not. Recent years have witnessed the development of new statistical techniques,
such as Lasso or Random Forests, that are computationally well suited to large datasets and
automatically select relevant variables.

• Making good predictions with high-dimensional data

This chapter will teach you the process of making good predictions with high-dimensional data.

• Avoiding overfitting by validation and cross-validation

Overfitting occurs when a model fits the training data too closely. In layman's terms, if a model
learns the detail and noise within the training data, what it has learned will negatively
affect its performance on new data.

This chapter will teach you the process of avoiding overfitting through validation and
cross-validation techniques.
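The k-fold cross-validation idea can be sketched as follows: split the data into k folds, train on k - 1 of them, measure the error on the held-out fold, and average over all folds. The helper names, the toy data, and the trivial "predict the training mean" model are assumptions for illustration; real use would usually shuffle the data first and plug in an actual model.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size


def cross_validated_mse(xs, ys, fit, predict, k=3):
    """Average validation mean squared error over k folds."""
    fold_errors = []
    for train, validation in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in validation]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


# Baseline "model" for the demo: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


xs = [1, 2, 3, 4, 5, 6]                    # hypothetical inputs
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]       # hypothetical targets
print(cross_validated_mse(xs, ys, fit_mean, predict_mean, k=3))
```

Because each validation fold is never seen during fitting, the averaged error estimates performance on new data rather than on the training data.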

• Regularization by Lasso, Ridge, and their modification

Here, you will understand regularization by Lasso, Ridge, and their modification.
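For a single centered predictor, both penalties have simple closed forms, which makes the difference between them easy to see. The sketch below assumes the objectives stated in the comments, with the intercept left unpenalized; the data and function names are hypothetical.

```python
def centered_sums(xs, ys):
    """Return (sxy, sxx) computed on mean-centered data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy, sxx


def ridge_slope(xs, ys, lam):
    """Minimizes 0.5 * sum((y - w * x)^2) + 0.5 * lam * w^2 on centered data."""
    sxy, sxx = centered_sums(xs, ys)
    return sxy / (sxx + lam)


def lasso_slope(xs, ys, lam):
    """Minimizes 0.5 * sum((y - w * x)^2) + lam * |w| on centered data."""
    sxy, sxx = centered_sums(xs, ys)
    shrunk = max(abs(sxy) - lam, 0.0)            # soft-thresholding
    return (shrunk if sxy > 0 else -shrunk) / sxx


xs = [1, 2, 3, 4, 5]       # hypothetical data on the line y = 2x + 1
ys = [3, 5, 7, 9, 11]

print(ridge_slope(xs, ys, 0))    # 2.0, no penalty recovers least squares
print(ridge_slope(xs, ys, 10))   # 1.0, shrunk toward zero but not zero
print(lasso_slope(xs, ys, 5))    # 1.5
print(lasso_slope(xs, ys, 25))   # 0.0, a large penalty removes the variable
```

Ridge shrinks the coefficient smoothly, while Lasso can set it exactly to zero, which is why Lasso also acts as a variable selection method.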

• Regression Trees, Random Forest, Boosted Trees

Regression Trees are built using binary recursive partitioning, an iterative process that splits the
data into partitions or branches and then keeps splitting each partition into smaller groups as the
process moves down every branch.

Random Forest is a widely used supervised Machine Learning algorithm that builds numerous
decision trees on different subsets of a dataset and then averages their predictions to improve
predictive accuracy.

Boosting is a meta-algorithm in Machine Learning that combines several weak classifiers into a
strong classifier. Two common variants are Gradient Boosting and Adaptive Boosting (AdaBoost).
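Binary recursive partitioning can be sketched for a single feature as below. This is a simplified illustration, assuming squared error as the split criterion and a fixed maximum depth; the data and function names are hypothetical. A random forest would average many such trees, each grown on a different random subset of the data.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) of the best split x < threshold, or None."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


def build_tree(xs, ys, depth=2):
    """Recursively partition the data; a leaf stores the mean target value."""
    split = best_split(xs, ys) if depth > 0 else None
    if split is None:
        return sum(ys) / len(ys)
    threshold = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    return (
        threshold,
        build_tree([x for x, _ in left], [y for _, y in left], depth - 1),
        build_tree([x for x, _ in right], [y for _, y in right], depth - 1),
    )


def predict(tree, x):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


xs = [1, 2, 3, 10, 11, 12]               # hypothetical feature values
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]      # hypothetical targets in two clear groups
tree = build_tree(xs, ys, depth=1)
print(tree[0])             # 10, the chosen split threshold
print(predict(tree, 2))    # close to 1.0
print(predict(tree, 11))   # close to 5.0
```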

-------------------------

The Use of Modern Regression for Causal Inference


This part will cover regression and causal inference to explain why “correlation does not imply
causation” and how we can overcome this intrinsic limitation of regression by resorting to
randomized control studies or controlling for confounding.

• Randomized Control Trials

This chapter will teach you the process of identifying and working with Randomized Control
Trials.

• Observational Studies with Confounding

Confounding is a common hazard of observational clinical research, as opposed to randomized
experiments. It can easily pass unrecognized, yet recognizing it is essential for
meaningfully interpreting causal relationships, such as when evaluating treatment effects.
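A small worked example of why confounding matters, using invented counts in which disease severity affects both who gets treated and who recovers. The numbers and names below are purely hypothetical. The naive comparison suggests the treatment is harmful, while comparing within each severity group and then averaging by group size (stratification) shows a benefit.

```python
# Hypothetical counts: (severity, treated) -> (recovered, total).
# Severe patients are treated more often and recover less often.
counts = {
    ("mild", True): (18, 20),
    ("mild", False): (64, 80),
    ("severe", True): (40, 80),
    ("severe", False): (8, 20),
}


def rate(recovered, total):
    return recovered / total


def naive_difference(counts):
    """Recovery rate of all treated minus all untreated, ignoring severity."""
    def pooled(treated):
        recovered = sum(r for (_, t), (r, _) in counts.items() if t == treated)
        total = sum(n for (_, t), (_, n) in counts.items() if t == treated)
        return recovered / total
    return pooled(True) - pooled(False)


def adjusted_difference(counts):
    """Average of within-stratum differences, weighted by stratum size."""
    strata = sorted({severity for severity, _ in counts})
    grand_total = sum(total for _, total in counts.values())
    effect = 0.0
    for severity in strata:
        size = counts[(severity, True)][1] + counts[(severity, False)][1]
        diff = rate(*counts[(severity, True)]) - rate(*counts[(severity, False)])
        effect += diff * size / grand_total
    return effect


print(round(naive_difference(counts), 2))      # -0.14
print(round(adjusted_difference(counts), 2))   # 0.1
```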

---------------