
Important Links in Data Science

The document provides a summary of important links to understand the workflow in data science projects. It begins by noting that many people have algorithm knowledge but lack understanding of data science workflows. It then lists links on how to do a data science project from scratch, how machine learning requires iteration, and a scikit-learn user guide to learn machine learning in Python. Further links are provided on mastering data preparation, including exploratory data analysis, handling missing data, outliers removal, imbalanced data, data transformations, categorical variables, feature engineering, and how not to misuse principal component analysis.

Uploaded by

Rishabh Agrawal

Important Links in Data Science

I have often seen that many people know various Machine Learning algorithms and have gone
through various courses on the Internet, but they lack a basic understanding of the workflow
in Data Science projects. And that is completely fine; it comes with practice. From my own
practice, I have collected various articles that have really helped me understand Data Science
in depth. Hope it helps you!

Rishabh Agrawal

Some Essentials
1. How to do a Data Science Project from Scratch:
a. https://www.freecodecamp.org/news/how-to-build-a-data-science-project-from-scratch-dc4f096a62a1/

2. Machine Learning is Complete Iteration:
a. https://elitedatascience.com/machine-learning-iteration
b. https://blog.insightdatascience.com/how-to-deliver-on-machine-learning-projects-c8d82ce642b0

3. Additional: To learn Machine Learning in Python thoroughly, I also suggest you go through
this User Guide completely. You don’t need any other course to actually learn scikit-learn. I’ll be
referring to many articles from this User Guide below too!
a. https://scikit-learn.org/stable/user_guide.html
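
To make the "iteration" idea in item 2 concrete, here is a minimal sketch using scikit-learn. The dataset (iris), the model (logistic regression), and the candidate values of C are assumptions chosen only for illustration; they do not come from the linked articles.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy dataset bundled with scikit-learn (an assumption for illustration).
X, y = load_iris(return_X_y=True)

# Iterate over a few candidate settings and compare cross-validated accuracy.
results = {}
for C in (0.01, 0.1, 1.0, 10.0):
    model = LogisticRegression(C=C, max_iter=1000)
    results[C] = cross_val_score(model, X, y, cv=5).mean()

# Keep the setting with the best mean score, then repeat with new ideas.
best_C = max(results, key=results.get)
print(results, best_C)
```

In a real project the loop would also revisit the data and the features, not only the model settings.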

Complete Data Preparation

Mastering Data Preparation in Python: Read this 2-page blog post and its related links to learn the
basic workflow involved in any Data Science project.

a. https://www.kdnuggets.com/2019/06/7-steps-mastering-data-preparation-python.html
b. https://www.kdnuggets.com/2017/06/7-steps-mastering-data-preparation-python.html/2

Based on the 2-page blog post above, here are additional links for each step involved:

1. Exploratory Data Analysis:
a. https://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/
b. https://www.kaggle.com/ekami66/detailed-exploratory-data-analysis-with-python (this
article is based on the ProbStats course from Stanford. You’ll learn how to actually use
Statistics in real life. ProbStats course link:
https://lagunita.stanford.edu/login?next=/courses/course-v1%3AOLI%2BProbStat%2BOpen_Jan2017/course/)
c. http://seaborn.pydata.org/tutorial/distributions.html

2. Working with Missing Data:
a. http://pandas.pydata.org/pandas-docs/stable/missing_data.html
b. https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4
c. https://clevertap.com/blog/how-to-treat-missing-values-in-your-data-part-i/
d. https://clevertap.com/blog/how-to-treat-missing-values-in-your-data-part-ii

3. Outlier Removal:
a. https://www.kdnuggets.com/2017/06/7-steps-mastering-data-preparation-python.html/2
(go through all the links given in the Outliers section of this article)
b. https://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/#three
c. https://scikit-learn.org/stable/modules/outlier_detection.html

4. Imbalanced Data Learning:
a. https://www.kdnuggets.com/2016/08/learning-from-imbalanced-classes.html
b. https://www.kdnuggets.com/2017/06/7-techniques-handle-imbalanced-data.html
c. https://www.analyticsvidhya.com/blog/2017/03/imbalanced-classification-problem/
d. https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
e. https://www.analyticsvidhya.com/blog/2016/09/this-machine-learning-project-on-imbalanced-data-can-add-value-to-your-resume/

5. Data Transformations: The best resource is in the scikit-learn User Guide itself.
a. http://scikit-learn.org/stable/modules/preprocessing.html
6. Dealing with Categorical Variables:
a. https://www.analyticsvidhya.com/blog/2015/11/easy-methods-deal-categorical-variables-predictive-modeling/
b. https://towardsdatascience.com/understanding-feature-engineering-part-2-categorical-data-f54324193e63

7. Feature Engineering:
a. https://www.freecodecamp.org/news/how-to-build-a-data-science-project-from-scratch-dc4f096a62a1/
b. https://www.analyticsvidhya.com/blog/2016/12/introduction-to-feature-selection-methods-with-an-example-or-how-to-select-the-right-variables/

8. How Not to Use PCA:
a. https://medium.com/data-design/how-to-not-be-dumb-at-applying-principal-component-analysis-pca-6c14de5b3c9d
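
A few of the preparation steps listed above can be strung together in a short sketch with pandas and scikit-learn. Everything in it is an assumption made for illustration: the toy DataFrame, its column names, median imputation, the 1.5 * IQR outlier rule, and scaling before PCA. The linked articles discuss other options for each step, and imbalanced data and feature engineering are not covered here.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; column names and values are made up.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 29.0, 38.0, 27.0, 300.0],
    "income": [40.0, 52.0, 48.0, np.nan, 45.0, 61.0, 43.0, 50.0],
    "city": ["A", "B", "A", "C", "B", "A", "C", "B"],
})
num_cols = ["age", "income"]

# Step 2 (missing data): fill numeric gaps with each column's median.
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Step 3 (outliers): keep rows whose "age" lies within 1.5 * IQR of the quartiles.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[mask].reset_index(drop=True)

# Step 6 (categorical variables): one-hot encode the "city" column.
encoded = pd.get_dummies(clean, columns=["city"])

# Step 5 (transformations): standardize numeric columns to zero mean, unit variance.
scaled = StandardScaler().fit_transform(encoded[num_cols])

# Step 8 (PCA): apply PCA to the scaled values, since PCA is sensitive to scale.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)
print(len(clean), components.shape, pca.explained_variance_ratio_)
```

In practice, the imputation values, outlier bounds, and scaler should be fitted on the training split only and then applied to the test split.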
