
Data Science With Python Workflow

This document provides an overview of the Python for Data Science Automation (DS4B 101-P) course. It lists important Python data science resources and tools for data ingestion, wrangling, visualization, modeling, deployment, and more. It encourages the reader to join the course to learn Python for data science automation.

Uploaded by Philip Raymond
Copyright © All Rights Reserved

Data Science with Python Workflow
If you want to learn Python, then join our course: Python for Data Science Automation (DS4B 101-P).

Click the links for documentation. CS = Cheat Sheet.
Import: Pandas I/O tools, SQLAlchemy
Tidy: Pandas data structures, group by, joins & merge, reshape (pivot)
Transform: Pandas (CS): text, time series, categorical, missing; NumPy
Visualize: matplotlib, plotnine, seaborn, plotly (CS)
Model: Scikit-Learn, Statsmodels, PyCaret, TensorFlow, Keras
Communicate: JupyterLab, Papermill, Dash, Streamlit, Flask, FastAPI
IDEs: PyCharm, Jupyter, RStudio, VS Code, Spyder
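The Import, Tidy, and Transform stages can be sketched with pandas alone: read a CSV, reshape it to long form, join a lookup table, group by, and pivot back. This is a minimal sketch, not the course's method; an inline CSV stands in for a real file or SQLAlchemy connection, and the store/quarter/region data are hypothetical.

```python
import io

import pandas as pd

# Import: read an inline CSV (stand-in for a file path or a database query).
# The data below are hypothetical.
raw_csv = io.StringIO(
    "store,q1,q2\n"
    "A,10,12\n"
    "B,7,9\n"
    "C,5,8\n"
)
wide = pd.read_csv(raw_csv)

# Tidy: reshape wide quarter columns into one row per (store, quarter).
tidy = wide.melt(id_vars="store", var_name="quarter", value_name="sales")

# Transform: join a lookup table, then aggregate with group by.
regions = pd.DataFrame({"store": ["A", "B", "C"], "region": ["East", "West", "East"]})
joined = tidy.merge(regions, on="store", how="left")
by_region = joined.groupby("region", as_index=False)["sales"].sum()

# Reshape back to wide form: regions as rows, quarters as columns.
pivot = joined.pivot_table(index="region", columns="quarter", values="sales", aggfunc="sum")

print(by_region)
print(pivot)
```

Replacing the `io.StringIO(...)` input with a file path, or calling `pd.read_sql(query, engine)` with a SQLAlchemy engine, changes only the Import step; the rest of the pipeline stays the same.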

Important Resources
Anaconda Distribution: https://www.anaconda.com/download/
Python Documentation: https://docs.python.org/
Python Standard Library: https://docs.python.org/3/library

Business Science University
Join the Python for Data Science Automation Course: university.business-science.io

version: 2.0
Data Science with Python Special Topics

Text Analysis & NLP
NLTK - Text Tokenization & Modeling
spaCy - NLP using Cython for Speed
fuzzywuzzy - Fuzzy String Matching

Machine Learning
Scikit-Learn - ML in Python
H2O - Scalable & AutoML
TPOT - Automated ML Tool
PyCaret - Low-Code ML
Dask ML - Scalable ML with Dask
ML Packages: XGBoost, LightGBM, CatBoost

Time Series Forecasting
sktime - Scikit-Learn Extension for Time Series
statsmodels - Time Series Analysis
GluonTS - MXNet/Gluon Deep Learning for Time Series

Recommendation Systems
Annoy - Approximate Nearest Neighbors
LightFM - Popular recommendation algorithms

Feature Engineering
Sklearn - Data Transformations
sklearn-pandas - Sklearn Extension for Pandas
Featuretools - Automated Feature Engineering
category_encoders - Categorical Encoding
imbalanced-learn - Resampling for Imbalanced Data
fancyimpute - Extended imputation strategies

Time Series Features
TSFresh - Time Series Feature Engineering
tslearn - Time Series Features
Pandas Time Series
Arrow - Human-Friendly Time

Deep Learning
TensorFlow & Keras
PyTorch
MXNet, Gluon, & GluonTS
OpenAI Gym - Reinforcement Learning

Apps & APIs
FastAPI - Web framework for building APIs in Python
Flask - Web Development
Dash & Streamlit - DS Web Frameworks

EDA
pandas-profiling, SweetViz, lux

Image & Comp Vision
OpenCV - Open Source Computer Vision
scikit-image - Image Processing
Pillow - Python Imaging Library

MLOps
PyCaret MLflow Integration
MLflow - Machine Learning Lifecycle, Tracking, Deployment
Metaflow - Scalable AWS Jobs for Data Scientists

Web
beautifulsoup - Extract data from HTML
requests-html - HTML Parsing
Scrapy - Web crawling
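The Scikit-Learn entries (ML in Python and data transformations) fit together as a preprocessing-plus-model pipeline. A minimal sketch, assuming scikit-learn and pandas are installed; the `size`/`color` toy data are hypothetical, and scikit-learn's built-in `OneHotEncoder` stands in for the wider set of encoders offered by category_encoders.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric and one categorical feature.
X = pd.DataFrame({
    "size": [1, 2, 3, 10, 11, 12],
    "color": ["red", "blue", "green", "red", "blue", "green"],
})
y = [0, 0, 0, 1, 1, 1]

# Feature engineering: scale the numeric column, one-hot encode the categorical one.
prep = ColumnTransformer([
    ("num", StandardScaler(), ["size"]),
    ("cat", OneHotEncoder(), ["color"]),
])

# Chain preprocessing and the model so both are fit (and applied) together.
pipe = Pipeline([
    ("prep", prep),
    ("model", LogisticRegression()),
])
pipe.fit(X, y)

# New rows go through the same scaling and encoding before prediction.
new = pd.DataFrame({"size": [0, 20], "color": ["red", "green"]})
print(pipe.predict(new))
print(pipe.score(X, y))
```

Keeping the transformations inside the `Pipeline` means the scaler and encoder are fit only on training data, which avoids leaking information when the pipeline is later cross-validated.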
Speed & Scale
datatable - C++ Speed Up
Dask (CS) - Parallel Pandas & Scikit-Learn
RAPIDS (CS) - GPU Accelerated Pandas
PySpark - Spark Clusters
Optimus - PySpark Extension for Humans

Cloud
boto3 (AWS) - AWS Python SDK
Google Cloud - GCP Python SDK
Azure - Azure Python SDK

MS Office & PDF
XlsxWriter - Create Excel Workbooks
pyexcel - Read/Write Excel
xlwings - Call Python from Excel
python-docx - Word Documents
python-pptx - PowerPoint Documents
pdfminer - Text extraction from PDF
textract - Extract text from any document
PyPDF2 - Create PDF documents
gspread - Google Sheets

ETL & Automations
Airflow - Workflow Scheduling & Monitoring
Luigi - Batch Job Tool, Scheduling, Monitoring
Ansible - Deployment Automation
Joblib - Run Python jobs

Coming from R?
R-to-Pandas Comparison
siuba & plydata - dplyr/tidyr ports
datatable - data.table port
plotnine - ggplot2 port
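Joblib, listed under ETL & Automations, is commonly used to run independent Python function calls in parallel. A minimal sketch; `slow_square_root` is a hypothetical stand-in for real work, and `prefer="threads"` is chosen only so the example runs without spawning worker processes.

```python
import math

from joblib import Parallel, delayed


def slow_square_root(x: float) -> float:
    # Hypothetical stand-in for an expensive, independent task.
    return math.sqrt(x)


# Run the calls on 2 workers; results come back in input order.
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(slow_square_root)(i * i) for i in range(5)
)
print(results)  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

For CPU-bound pure-Python work, dropping `prefer="threads"` lets joblib use its default process-based backend instead.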
