Model Usage
Here’s a table listing various popular machine learning models and frameworks, along with their
primary usage in fields such as audio, vision, language processing, and more:
YOLO (You Only Look Once)    Real-time object detection    Vision (Computer Vision)
Key Insights:
● Audio Models: Whisper, DeepSpeech, Wav2Vec 2.0, and Tacotron are widely used for
tasks involving speech recognition, transcription, and synthesis.
● Vision Models: YOLO, ResNet, ViT, and StyleGAN dominate in object detection,
classification, and image generation tasks.
● Language Models: GPT, BERT, and T5 focus on text generation, understanding, and
summarization.
● Multi-modal Models: CLIP, DALL·E, and Stable Diffusion work across both text and
vision domains, handling tasks such as image generation from text or aligning images
and text.
These models are designed for specialized tasks, but some of them, like GPT or CLIP, have a
broader range of applications across multiple domains.
https://www.nitinkapse.com/ https://nichethyself.com/
Key Additions:
Here’s a list of models and frameworks designed for reading and extracting tabular data from
PDFs, images, or scanned documents. These models utilize a combination of OCR (Optical
Character Recognition) and deep learning techniques for parsing structured data like tables.
Tesseract OCR                       OCR for extracting text and simple tables from images/PDFs                   OCR for Images & PDFs
PaddleOCR                           OCR for table and text extraction, supports multi-language                   OCR for Images & PDFs
DocTR (Document Text Recognition)   OCR for detecting and recognizing structured text like tables in documents   OCR & Document Analysis
Adobe PDF Extract API               Extracting structured data including tables from PDFs                        PDF/Table Extraction
Keras-OCR                           OCR for detecting and extracting text and tables from images                 OCR for Images
Amazon Textract                     Automated text and table extraction from documents                           OCR for PDFs & Images
Google Cloud Vision API             OCR with table detection capabilities for scanned images                     OCR for Images & PDFs
1. Camelot, Tabula, pdfplumber: Focus on extracting tables from PDFs and converting
them into structured formats like CSV or Excel.
2. Tesseract OCR, PaddleOCR: Used for general OCR tasks like reading text and simple
tables from images or scanned documents.
3. TableNet, DeepDeSRT: Specifically designed to detect and extract tabular structures
in scanned documents or images.
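As a rough illustration of the first group, here is a minimal sketch that pulls tables out of a PDF with pdfplumber and writes each one to a CSV file. It assumes pdfplumber is installed and that the PDF contains text-based (not scanned) tables. The helper names rows_to_csv and pdf_tables_to_csv, and the output file naming, are hypothetical and only for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a table (a list of rows, each a list of cells) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Extraction tools may return None for empty cells; write them as "".
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


def pdf_tables_to_csv(pdf_path, out_prefix):
    """Write every table found in a PDF to its own CSV file (hypothetical helper)."""
    # Third-party dependency, imported lazily so rows_to_csv works without it.
    import pdfplumber

    written = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            # extract_tables() returns a list of tables, each a list of rows.
            for table_index, table in enumerate(page.extract_tables(), start=1):
                out_path = f"{out_prefix}_p{page_number}_t{table_index}.csv"
                with open(out_path, "w", newline="", encoding="utf-8") as handle:
                    handle.write(rows_to_csv(table))
                written.append(out_path)
    return written
```

A call such as pdf_tables_to_csv("report.pdf", "report") would, under these assumptions, produce one CSV per detected table. Scanned PDFs usually need an OCR step first, which is where the tools in the second and third groups come in.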
These tools and models provide capabilities for converting unstructured data (like tables in
PDFs or images) into structured formats, making it easier to analyze and process the data
programmatically.
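To make the "unstructured to structured" step concrete, here is a small sketch that turns OCR-style plain text into records using only the Python standard library. It assumes, purely for illustration, that the OCR output keeps columns separated by two or more spaces and that the first non-empty line is the header. Real OCR output is often messier, and the function names split_columns and to_records are hypothetical.

```python
import re

# Assumption: a run of two or more whitespace characters marks a column boundary.
COLUMN_GAP = re.compile(r"\s{2,}")


def split_columns(text):
    """Split each non-empty line of OCR text into a list of cell strings."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(COLUMN_GAP.split(line))
    return rows


def to_records(rows):
    """Pair each data row with the header row, giving one dict per data row."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]
```

Once the rows are dictionaries keyed by column name, they can be filtered, written out with csv.DictWriter, or loaded into a data frame for analysis.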
https://forms.gle/PrzkmvYh5yvEWUKZ6