Data Science With Python Workflow: Click The Links For Documentation

This document provides an overview of the Python data science workflow and important Python libraries for each step of the workflow. It outlines libraries for data ingestion with Pandas I/O tools, data transformation with Pandas and Numpy, data visualization with Matplotlib and Plotly, machine learning with Scikit-Learn and TensorFlow, natural language processing with NLTK and spaCy, deep learning with TensorFlow and PyTorch, and deploying models with MLFlow. It also lists resources for time series analysis, recommendations, computer vision, and scaling workflows across CPUs and GPUs.

Uploaded by

Aditya Pisupati
Copyright
© All Rights Reserved

Data Science with Python Workflow
If you are an R-User and want to learn Python, then join our
waitlist: R / Python Teams Course Waitlist.

Click the links for Documentation

CS = Cheat Sheet

Import
Pandas - I/O tools

Tidy
Pandas - data structures, group by, joins & merge, reshape (pivot)

Transform
Pandas (CS) - text, time series, categorical, missing
Numpy

Visualize
matplotlib
seaborn
plotnine
plotly (CS)

Model
Scikit-Learn
Statsmodels
TensorFlow
Keras

Communicate
JupyterLab
Dash
Streamlit
Flask

Jupyter | PyCharm | VSCode
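
The Import, Tidy, and Transform stages above can be sketched in a few lines of Pandas and Numpy. This is a minimal illustration, not part of the original cheat sheet: the inline CSV text, the column names, and the variable names are all hypothetical stand-ins for real data files.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical inline CSV data standing in for real files on disk.
SALES_CSV = """order_id,region,product,amount
1,East,widget,10.0
2,West,widget,
3,East,gadget,7.5
4,West,gadget,12.5
"""
REGIONS_CSV = """region,manager
East,Ana
West,Ben
"""

# Import: Pandas I/O tools parse CSV text into DataFrames.
sales = pd.read_csv(io.StringIO(SALES_CSV))
regions = pd.read_csv(io.StringIO(REGIONS_CSV))

# Transform: fill the missing amount with 0, then add a log-scaled column with Numpy.
sales["amount"] = sales["amount"].fillna(0.0)
sales["log_amount"] = np.log1p(sales["amount"])

# Tidy: join the two tables, aggregate with group by, and reshape with a pivot.
merged = sales.merge(regions, on="region", how="left")
totals = merged.groupby("region", as_index=False)["amount"].sum()
wide = merged.pivot_table(
    index="region", columns="product", values="amount", aggfunc="sum"
)

print(totals)
print(wide)
```

Filling the missing amount with 0 is only one possible choice; dropping the row or imputing a mean may suit real data better.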

Important Resources
Anaconda Distribution: https://www.anaconda.com/download/
Python Documentation: https://docs.python.org/
Python Standard Library: https://docs.python.org/3/library

Business Science University


Join the R/Python Teams Course Waitlist university.business-science.io

version: 1.0
Data Science with Python Workflow: Special Topics

Text Analysis & NLP
NLTK - Text Tokenization & Modeling
spaCy - NLP using Cython for Speed
fuzzywuzzy - Fuzzy String Matching

Machine Learning
Scikit-Learn - ML in Python
H2O - Scalable & AutoML
TPOT - Automated ML Tool
PyCaret - Low Code ML
Dask ML - Scalable ML with Dask

ML Packages
XGBoost
LightGBM
CatBoost

Time Series Forecasting
sktime - Scikit-Learn Extension for Time Series
statsmodels - Time Series Analysis
GluonTS - MXNet/Gluon Deep Learning for Time Series

Time Series Features
TSFresh - Time Series Feature Engineering
tslearn - Time Series Features
Pandas Time Series
Arrow - Human-Friendly Time

Recommendation Systems
Annoy - Approximate Nearest Neighbors
LightFM - Popular Recommendation Algorithms

Feature Engineering
Featuretools - Automated Feature Engineering
sklearn-pandas - Sklearn Extension for Pandas
category_encoders - Categorical Encoding
imbalanced-learn - Resampling for Imbalanced Data

Deep Learning
TensorFlow & Keras
PyTorch
MXNet, Gluon, & GluonTS
OpenAI Gym - Reinforcement Learning

Image & Comp Vision
OpenCV - Open Source Computer Vision
Scikit Image - Image Processing
Pillow - Python Imaging Library

Apps & APIs
FastAPI - Web Framework for Building APIs in Python
Flask - Web Development
Dash & Streamlit - DS Web Frameworks

Web
beautifulsoup - Extract Data from HTML
requests-html - HTML Parsing
Scrapy - Web Crawling

MLOps
MLFlow - Machine Learning Lifecycle, Tracking, Deployment
MetaFlow - Scalable AWS Jobs for Data Scientists

Speed & Scale
datatable - C++ Speed Up
Dask (CS) - Parallel Pandas
RAPIDS (CS) - GPU Accelerated Pandas
PySpark - Spark Clusters
Optimus - PySpark Extension for Humans

Cloud
boto3 (AWS) - AWS Python SDK
Google Cloud - GCP Python SDK
Azure - Azure Python SDK

MS Office & PDF
XlsxWriter - Create Excel Workbooks
pyexcel - Read/Write Excel
xlwings - Call Python from Excel
python-docx - Word Documents
python-pptx - PowerPoint Documents
pdfminer - Text Extraction from PDF
textract - Extract Text from Any Document
PyPDF2 - Create PDF Documents
gspread - Google Sheets

ETL & Automations
Airflow - Workflow Scheduling & Monitoring
Luigi - Batch Job Tool, Scheduling, Monitoring
Ansible - Deployment Automation
JobLib - Run Python Jobs

Coming from R?
R-to-Pandas Comparison
siuba & plydata - dplyr/tidyr ports
datatable - data.table port
plotnine - ggplot2 port
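
To make the Machine Learning entries above concrete, here is a minimal Scikit-Learn sketch. It is an illustration under assumptions, not something the cheat sheet prescribes: the bundled iris dataset, the logistic regression model, and the 25% hold-out split are arbitrary choices made only to keep the example self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small dataset that ships with Scikit-Learn (an arbitrary choice for this sketch).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the score reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain scaling and the classifier so both are fit on the training rows only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Mean accuracy on the held-out rows.
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the model in one pipeline keeps preprocessing statistics from leaking out of the test split, and the same fit/score pattern applies to most Scikit-Learn estimators.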
