Data Science With Python Workflow: Click The Links For Documentation
Python Workflow
If you are an R-User and want to learn Python, then join our
waitlist: R / Python Teams Course Waitlist.
CS = Cheat Sheet
Visualize: matplotlib, plotnine, seaborn, plotly (CS)
Transform: Pandas (CS), Numpy
Pandas topics: text, time series, categorical, missing
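The four Pandas topics named in the Transform step (text, time series, categorical, missing) can be sketched in a few lines. This is a minimal illustration, not part of the original sheet: the DataFrame and its column names are hypothetical toy data, and filling with the mean is just one of several reasonable choices for missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "name": [" alice ", "BOB", "carol"],
    "signup": ["2021-01-05", "2021-02-10", "2021-02-20"],
    "plan": ["basic", "pro", "basic"],
    "spend": [10.0, np.nan, 30.0],
})

# text: vectorized string methods via the .str accessor
df["name"] = df["name"].str.strip().str.title()

# time series: parse strings into datetimes, then use the .dt accessor
df["signup"] = pd.to_datetime(df["signup"])
df["signup_month"] = df["signup"].dt.month

# categorical: store repeated labels with the category dtype
df["plan"] = df["plan"].astype("category")

# missing: fill gaps with the column mean (one option among many)
df["spend"] = df["spend"].fillna(df["spend"].mean())
```

Each step goes through a dedicated accessor or method (`.str`, `.dt`, `astype("category")`, `fillna`), which is why the sheet links to a separate Pandas documentation page per topic.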
Important Resources
Anaconda Distribution: https://www.anaconda.com/download/
Python Documentation: https://docs.python.org/
Python Standard Library: https://docs.python.org/3/library
version: 1.0
Data Science With Python: Special Topics
Text Analysis & NLP
NLTK - Text Tokenization & Modeling
spaCy - NLP using Cython for Speed
fuzzywuzzy - Fuzzy String Matching
Machine Learning
Scikit-Learn - ML in Python
H2O - Scalable & AutoML
TPOT - Automated ML Tool
PyCaret - Low Code ML
Dask ML - Scalable ML with Dask
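As a rough orientation for the Scikit-Learn entry, the sketch below shows the usual split, fit, score pattern. The dataset (the bundled iris data), the choice of a scaled logistic regression, and the variable names are assumptions made for illustration; the sheet itself does not prescribe any model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small dataset that ships with scikit-learn (an illustrative choice).
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for evaluation; stratify keeps class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain preprocessing and the estimator so both are fit on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# Mean accuracy on the held-out rows.
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the classifier in one pipeline keeps test data out of the preprocessing fit, a pattern that many of the other listed ML tools also follow through the same `fit` / `predict` interface.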
ML Packages
XGBoost
LightGBM
CatBoost
Recommendation Systems
Annoy - Approximate Nearest Neighbors
LightFM - Popular recommendation algorithms
Feature Engineering
Featuretools - Automated Feature Engineering
sklearn-pandas - Sklearn Extension for Pandas
category_encoders - Categorical Encoding
imbalanced-learn - Resampling for Imbalanced Data
Time Series Forecasting
sktime - Scikit-Learn Extension for Time Series
statsmodels - Time Series Analysis
GluonTS - MXNet/Gluon Deep Learning for Time Series
Time Series Features
TSFresh - Time Series Feature Engineering
tslearn - Time Series Features
Pandas Time Series
Arrow - Human-Friendly Time
Apps & APIs
FastAPI - Web framework for building APIs in Python
Flask - Web Development
Dash & Streamlit - DS Web Frameworks
Deep Learning
TensorFlow & Keras
PyTorch
MXNet, Gluon, & GluonTS
OpenAI Gym - Reinforcement Learning
Web
beautifulsoup - Extract data from HTML
requests-html - HTML Parsing
Scrapy - Web crawling
Image & Comp Vision
OpenCV - Open Source Computer Vision
Scikit Image - Image Processing
Pillow - Python Imaging Library
MLOps
MLFlow - Machine Learning Lifecycle, Tracking, Deployment
MetaFlow - Scalable AWS Jobs for Data Scientists
Speed & Scale
datatable - C++ Speed Up
Dask (CS) - Parallel Pandas
RAPIDS (CS) - GPU Accelerated Pandas
PySpark - Spark Clusters
Optimus - PySpark Extension for Humans
Cloud
boto3 (AWS) - AWS Python SDK
Google Cloud - GCP Python SDK
Azure - Azure Python SDK
MS Office & PDF
XlsxWriter - Create Excel Workbooks
pyexcel - Read/Write Excel
xlwings - Call Python from Excel
python-docx - Word Documents
python-pptx - PowerPoint Documents
pdfminer - Text extraction from PDF
textract - Extract text from any document
PyPDF2 - Create PDF documents
gspread - Google Sheets
ETL & Automations
Airflow - Workflow Scheduling & Monitoring
Luigi - Batch Job Tool, Scheduling, Monitoring
Ansible - Deployment Automation
JobLib - Run Python jobs
Coming from R?
R-to-Pandas Comparison
siuba & plydata - dplyr/tidyr ports
datatable - data.table port
plotnine - ggplot2 port
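For readers coming from R, a short side-by-side may help place the R-to-Pandas comparison. The sketch below is an assumption-laden illustration, not taken from the sheet: the data frame and column names are hypothetical, and the dplyr pipeline in the comment is only an approximate equivalent of the pandas method chain.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "mass": [1.0, 3.0, 10.0, 20.0],
    "year": [2020, 2021, 2020, 2021],
})

# Approximate dplyr equivalent:
#   df %>%
#     filter(year >= 2020) %>%
#     group_by(species) %>%
#     summarise(mean_mass = mean(mass)) %>%
#     arrange(desc(mean_mass))
summary = (
    df.query("year >= 2020")                      # filter()
      .groupby("species", as_index=False)         # group_by()
      .agg(mean_mass=("mass", "mean"))            # summarise()
      .sort_values("mean_mass", ascending=False)  # arrange(desc())
      .reset_index(drop=True)
)
print(summary)
```

Method chaining inside parentheses plays the role of the `%>%` pipe; siuba and plydata, listed above, offer syntax that sits closer to dplyr for those who prefer it.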