Data Science with Python Workflow
If you want to learn Python, then join our course: Python for
Data Science Automation (DS4B 101-P).
Click the links for documentation. CS = Cheat Sheet.
Import: Pandas (I/O tools), SQLAlchemy
Tidy: Pandas (data structures, group by, joins & merge, reshape (pivot))
Transform: Pandas (text, time series, categorical (CS), missing), Numpy
Visualize: matplotlib, plotnine, seaborn, plotly (CS), Pandas
Model: Scikit-Learn, Pycaret, TensorFlow, Statsmodels, Keras
Communicate: Dash, Streamlit, JupyterLab, Papermill, Flask, FastAPI
Editors and IDEs: Pycharm, Jupyter, RStudio, VSCode, Spyder
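
The Pandas entries "joins & merge", "group by", and "reshape (pivot)" in the workflow can be sketched in a few lines. This is only a minimal illustration: the two tables, their column names, and their values are hypothetical toy data, not taken from the course.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "amount": [100.0, 50.0, 75.0, 20.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["East", "West", "East"],
})

# joins & merge: attach each customer's region to their orders
merged = orders.merge(customers, on="customer_id", how="left")

# group by: total order amount per region
region_totals = merged.groupby("region", as_index=False)["amount"].sum()

# reshape (pivot): one row per region, one column per month
wide = merged.pivot_table(
    index="region",
    columns="month",
    values="amount",
    aggfunc="sum",
    fill_value=0,
)

print(region_totals)
print(wide)
```

A left merge keeps every order even if a customer were missing from the lookup table, which is usually the safer default when tidying data.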
Important Resources
Anaconda Distribution: https://www.anaconda.com/download/
Python Documentation: https://docs.python.org/
Python Standard Library: https://docs.python.org/3/library
Business Science University
Join the Python for Data Science Automation Course university.business-science.io
version: 2.0
Data Science with Python Special Topics

Text Analysis & NLP
NLTK - Text Tokenization & Modeling
spaCy - NLP using Cython for Speed
fuzzywuzzy - Fuzzy String Matching

Time Series Forecasting
sktime - Scikit-Learn Extension for Time Series
statsmodels - Time Series Analysis
GluonTS - MXNet/Gluon Deep Learning for Time Series

Time Series Features
TSFresh - Time Series Feature Engineering
tslearn - Time Series Features
Pandas Time Series
Arrow - Human-Friendly Time

EDA
pandas-profiling, SweetViz, lux

Web
beautifulsoup - Extract data from HTML
requests-html - HTML Parsing
Scrapy - Web crawling

MS Office & PDF
XlsxWriter - Create Excel Workbooks
pyexcel - Read/Write Excel
xlwings - Call Python from Excel
python-docx - Word Documents
python-pptx - PowerPoint Documents
pdfminer - Text extraction from PDF
textract - Extract text from any document
PyPDF2 - Create PDF documents
gspread - Google Sheets

Recommendation Systems
Annoy - Approximate Nearest Neighbors
LightFM - Popular recommendation algorithms

Apps & APIs
FastAPI - Web framework for building APIs in Python
Flask - Web Development
Dash & Streamlit - DS Web Frameworks

MLOps
Pycaret MLFlow Integration
MLFlow - Machine Learning Lifecycle, Tracking, Deployment
MetaFlow - Scalable AWS Jobs for Data Scientists

Cloud
boto3 (AWS) - AWS Python SDK
Google Cloud - GCP Python SDK
Azure - Azure Python SDK

ETL & Automations
Airflow - Workflow Scheduling & Monitoring
Luigi - Batch Job Tool, Scheduling, Monitoring
Ansible - Deployment Automation
JobLib - Run Python jobs

Machine Learning
Scikit-Learn - ML in Python
H2O - Scalable & AutoML
TPOT - Automated ML Tool
PyCaret - Low Code ML
Dask ML - Scalable ML with Dask
ML Packages: XGBoost, LightGBM, CatBoost

Feature Engineering
Sklearn Data Transformations
sklearn-pandas - Sklearn Extension for Pandas
Featuretools - Automated Feature Engineering
category_encoders - Categorical Encoding
imbalanced-learn - Resampling for Imbalanced Data
fancyimpute - Extended imputation strategies

Deep Learning
TensorFlow & Keras
PyTorch
MXNet, Gluon, & GluonTS
OpenAI Gym - Reinforcement Learning

Image & Comp Vision
OpenCV - Open Source Computer Vision
Scikit Image - Image Processing
Pillow - Python Imaging Library

Speed & Scale
datatable - C++ Speed Up
Dask (CS) - Parallel Pandas & Scikit Learn
RAPIDS (CS) - GPU Accelerated Pandas
PySpark - Spark Clusters
Optimus - PySpark Extension for Humans

Coming from R?
R-to-Pandas Comparison
siuba & plydata - dplyr/tidyr ports
datatable - data.table port
plotnine - ggplot2 port
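
The "fuzzywuzzy - Fuzzy String Matching" entry is about scoring how similar two strings are. A rough sketch of that idea follows, using only the standard library's difflib as a stand-in rather than fuzzywuzzy's own API. The helper name similarity and the sample strings are hypothetical, chosen for illustration.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1.

    Case and surrounding whitespace are ignored before comparing.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


# Hypothetical candidate names to match a messy input against
choices = ["Scikit-Learn", "Statsmodels", "TensorFlow"]

# Pick the single closest candidate with a ratio of at least 0.6
best = get_close_matches("scikit learn", choices, n=1, cutoff=0.6)

print(similarity("Pandas", " pandas "))
print(best)
```

Dedicated fuzzy matching packages add more scoring strategies (for example token-based comparisons), so this sketch only covers the basic ratio case.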