KDnuggets The Complete Collection of Data Science Cheatsheets v3
Table of Contents
1. SQL
2. Web Scraping
4. Data Analytics
5. Business Intelligence
6. Big Data
8. Machine Learning
9. Deep Learning
11. Generative AI
SQL
Most technical interviews and assessment tests include some type of SQL
question, so it is worth preparing with this collection of SQL cheat sheets.
The cheat sheets will also help you get better at creating and managing
databases and at understanding complex SQL queries.
• SQL Basics
• SQL Expert
• SQL Window Functions Cheat Sheet
• SQL Joins Cheat Sheet
• SQL – Data Analysis
• PostgreSQL
• SQL for the Job Interview
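As a small illustration of the kind of query these cheat sheets cover, the
sketch below runs an inner join with an aggregate through Python's built-in
sqlite3 module. The customers and orders tables and their rows are
hypothetical and exist only for this example.

```python
import sqlite3


def total_spent_per_customer():
    """Return (name, total) rows from a hypothetical customers/orders schema."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5);
    """)
    # Inner join: customers without any orders (here 'Linus') are dropped.
    rows = conn.execute("""
        SELECT c.name, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY total DESC
    """).fetchall()
    conn.close()
    return rows
```

Switching JOIN to LEFT JOIN would keep customers with no orders, with a NULL
total, which is a distinction interview questions often probe.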
Web Scraping
Web scraping is an essential part of data science, as it is used for gathering
data, conducting market research, and maintaining data pipelines. Beautiful
Soup is a popular Python library for parsing HTML and XML documents and
extracting their contents into a structured, readable form. This section
covers tools for scraping and parsing web pages in Python and R.
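To make the parsing step concrete, here is a minimal sketch that pulls link
text and URLs out of an HTML snippet. It deliberately uses the standard
library's html.parser as a stand-in, not Beautiful Soup itself, so that it
runs without extra packages. The LinkExtractor class, the extract_links
helper, and the sample HTML are all hypothetical names made up for this
example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Anchors without an href are ignored.
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample page fragment.
SAMPLE_HTML = """
<ul>
  <li><a href="/sql">SQL Basics</a></li>
  <li><a href="/scraping">Web <b>Scraping</b></a></li>
</ul>
"""
```

Calling extract_links(SAMPLE_HTML) returns the two links as (text, href)
tuples. A dedicated library such as Beautiful Soup offers a higher-level
interface for the same task and copes better with malformed markup.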
Python
The list covers the most widely used Python packages for data ingestion,
manipulation, and visualization. NumPy and pandas are the most popular
tools in the data community for scientific computing and data
manipulation.
Julia
• Julia Basics
• Data Wrangling with DataFrames.jl
• Plots.jl
• MATLAB Vs. Python Vs. Julia
• Pluto.jl
• Make.jl Examples
• Julia & IJulia
Business Intelligence
No-code applications for Business Intelligence are becoming industry
standards. They help you create analytical reports, dashboards, and
immersive visualizations, and they help businesses make data-driven
decisions. The most popular tools are MS Excel, Power BI, and Tableau.
• Hadoop
• Scala
• Spark
• Hive Functions
• Spark with sparklyr
Data Structures & Algorithms
The most common technical interview questions are about data structures
and algorithms. If you are a software engineer or data scientist, you
should know the common data structure types and their operations, as well
as searching and sorting algorithms. The list was created to help you
understand complex sorting functions and algorithms.
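Two staples of this topic, binary search and merge sort, can be sketched in a
few lines. This is a minimal illustration of the kind of material the cheat
sheets summarize, not code taken from them. The function names are chosen
here for the example.

```python
def binary_search(items, target):
    """Return an index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


def merge_sort(items):
    """Return a new sorted list using top-down merge sort, O(n log n)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    # Repeatedly take the smaller head element; <= keeps the sort stable.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Binary search needs sorted input, so a typical use is
binary_search(merge_sort(data), value). In everyday Python code the built-in
sorted function and the bisect module cover the same ground.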
• Supervised learning
• Statistics & Mathematics for Machine Learning
• Unsupervised learning
• Scikit-Learn: Python Machine Learning
• Scikit-Learn: Machine Learning Algorithm Selection
• Machine Learning Models
• Time Series with R
• Machine Learning Algorithms & Usecases
• Caret: Modeling and machine learning in R
• Machine Learning Modeling with R
• Deep Learning
• PyTorch
• Neural Network Architectures
• Neural Network Graphs
• Neural Network Cells
• Neural Network Type with Diagram
• Keras: Neural Networks in Python
• Deep learning with Keras in R
• TensorFlow
Natural Language Processing
Natural Language Processing (NLP) is used for processing and cleaning text,
audio, and image data so that useful information can be extracted. NLP has
a wide range of applications, including language translation, transcription,
conversational AI, question answering, generative technology, classification,
and named entity recognition. The collection of cheat sheets contains
bite-size information about the most popular NLP tools and algorithms.
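The text-cleaning step mentioned above can be sketched with the standard
library alone. This is a simplified illustration under stated assumptions:
the stopword list is a tiny made-up sample, and the helper names
clean_and_tokenize and top_terms are hypothetical. Dedicated NLP libraries
provide far more complete tokenizers and stopword lists.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "is", "and", "of", "to"}


def clean_and_tokenize(text):
    """Lowercase the text, drop punctuation, and remove stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def top_terms(text, n=3):
    """Return the n most frequent remaining tokens with their counts."""
    return Counter(clean_and_tokenize(text)).most_common(n)
```

For example, top_terms("The model translates text, and the model answers
questions!", n=1) reports "model" with a count of 2, since "the" and "and"
are filtered out before counting.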
About the Author
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who
loves building and deploying machine learning models. Currently, he is
focusing on content creation and writing technical blogs on language
models and data science. Abid holds a master’s degree in Technology
Management and a bachelor’s degree in Telecommunication Engineering.
His vision is to build an AI product using a graph neural network for students
struggling with mental illness.