Model Usage

Uploaded by rahul wankhade
https://www.nitinkapse.com/ https://nichethyself.com/

Here’s a table listing various popular machine learning models and frameworks, along with their
primary usage in fields such as audio, vision, language processing, and more:

| Model/Framework | Primary Usage | Domain |
| --- | --- | --- |
| Whisper | Speech recognition, transcription | Audio (Speech-to-Text) |
| CLIP | Image and text alignment, zero-shot learning | Vision & Language |
| GPT (Generative Pre-trained Transformer) | Text generation, language understanding | Language Processing |
| BERT | Text classification, question answering | Language Processing |
| DALL·E | Image generation from text descriptions | Vision & Text |
| ViT (Vision Transformer) | Image classification, object detection | Vision |
| YOLO (You Only Look Once) | Real-time object detection | Vision (Computer Vision) |
| VQ-VAE-2 | Image generation, compression | Vision |
| StyleGAN | High-quality image generation | Vision (Image Synthesis) |
| Stable Diffusion | Text-to-image generation, artistic creation | Vision & Text |
| Wav2Vec 2.0 | Speech recognition, audio processing | Audio (Speech-to-Text) |
| DeepSpeech | Automatic speech recognition | Audio |
| T5 (Text-to-Text Transfer Transformer) | Text generation, summarization, translation | Language Processing |
| PaLM | Text generation, understanding, multilingual tasks | Language Processing |
| OpenAI Codex | Code generation, code completion | Programming/Code |
| Tacotron | Speech synthesis (Text-to-Speech) | Audio (Speech Synthesis) |
| WavLM | Speech enhancement, speech recognition | Audio |
| LLaMA | Language generation and comprehension | Language Processing |
| OPT (Open Pretrained Transformer) | Language tasks, text generation | Language Processing |
| DeepLab | Image segmentation | Vision (Computer Vision) |
| ResNet | Image classification, object detection | Vision |
| VGG | Image classification | Vision |
| CycleGAN | Image-to-image translation (e.g., style transfer) | Vision |
| BART | Text summarization, machine translation | Language Processing |
| Swin Transformer | Image classification, object detection | Vision |
| TransUNet | Medical image segmentation | Vision (Medical Imaging) |
| BigGAN | High-resolution image synthesis | Vision |
| OpenAI CLIP | Multi-modal learning (image and text) | Vision & Text |
| FastSpeech | Text-to-Speech synthesis | Audio (Speech Synthesis) |
| Reformer | Efficient Transformer for long text generation | Language Processing |
| SAM (Segment Anything Model) | Object segmentation in images | Vision (Object Segmentation) |
| SEER | Self-supervised image learning, classification | Vision |

Key Insights:

● Audio Models: Whisper, DeepSpeech, Wav2Vec 2.0, and Tacotron are widely used for
tasks involving speech recognition, transcription, and synthesis.
● Vision Models: YOLO, ResNet, ViT, and StyleGAN dominate in object detection,
classification, and image generation tasks.
● Language Models: GPT, BERT, and T5 focus on text generation, understanding, and
summarization.
● Multi-modal Models: CLIP, DALL·E, and Stable Diffusion work across both text and
vision domains, handling tasks such as image generation from text or aligning images
and text.

These models are designed for specialized tasks, but some of them, like GPT or CLIP, have a
broader range of applications across multiple domains.
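The zero-shot learning attributed to CLIP above works by embedding an image and a set of candidate text labels into a shared vector space, then picking the label whose embedding is most similar to the image's. The sketch below shows only that scoring step, under stated assumptions: the embeddings are hand-made toy vectors standing in for the outputs of CLIP's image and text encoders, and the function names and the `temperature` value are illustrative choices, not part of any CLIP library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; subtracting the max keeps exp() stable."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, temperature=0.01):
    """Score one image embedding against one text embedding per candidate label.

    label_embeddings maps a text prompt (e.g. "a photo of a dog") to its embedding.
    The temperature is an illustrative assumption; a low value sharpens the distribution.
    Returns (best_label, {label: probability}).
    """
    labels = list(label_embeddings)
    sims = [cosine_similarity(image_embedding, label_embeddings[l]) for l in labels]
    probs = softmax(sims, temperature)
    best = labels[probs.index(max(probs))]
    return best, dict(zip(labels, probs))


if __name__ == "__main__":
    # Toy embeddings: assumptions standing in for real encoder outputs.
    image = [0.9, 0.1, 0.0]
    prompts = {
        "a photo of a dog": [1.0, 0.0, 0.0],
        "a photo of a cat": [0.0, 1.0, 0.0],
        "a photo of a car": [0.0, 0.0, 1.0],
    }
    best, probs = zero_shot_classify(image, prompts)
    print(best)  # a photo of a dog
```

Because the labels are plain text prompts, new classes can be added without retraining, which is what makes the approach "zero-shot".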

| Model/Framework | Primary Usage | Domain |
| --- | --- | --- |
| Whisper | Speech recognition, transcription | Audio (Speech-to-Text) |
| CLIP | Image and text alignment, zero-shot learning | Vision & Language |
| GPT (Generative Pre-trained Transformer) | Text generation, language understanding | Language Processing |
| Claude 1 | Conversational AI, safe language generation | Language Processing |
| Claude 2 | Advanced conversational AI, text understanding | Language Processing |
| Databricks Dolly | Fine-tuned language model for enterprise applications | Language Processing |
| BERT | Text classification, question answering | Language Processing |
| DALL·E | Image generation from text descriptions | Vision & Text |
| ViT (Vision Transformer) | Image classification, object detection | Vision |
| YOLO (You Only Look Once) | Real-time object detection | Vision (Computer Vision) |
| VQ-VAE-2 | Image generation, compression | Vision |
| StyleGAN | High-quality image generation | Vision (Image Synthesis) |
| Stable Diffusion | Text-to-image generation, artistic creation | Vision & Text |
| Wav2Vec 2.0 | Speech recognition, audio processing | Audio (Speech-to-Text) |
| DeepSpeech | Automatic speech recognition | Audio |
| T5 (Text-to-Text Transfer Transformer) | Text generation, summarization, translation | Language Processing |
| PaLM | Text generation, understanding, multilingual tasks | Language Processing |
| OpenAI Codex | Code generation, code completion | Programming/Code |
| Tacotron | Speech synthesis (Text-to-Speech) | Audio (Speech Synthesis) |
| WavLM | Speech enhancement, speech recognition | Audio |
| LLaMA | Language generation and comprehension | Language Processing |
| OPT (Open Pretrained Transformer) | Language tasks, text generation | Language Processing |
| DeepLab | Image segmentation | Vision (Computer Vision) |
| ResNet | Image classification, object detection | Vision |
| VGG | Image classification | Vision |
| CycleGAN | Image-to-image translation (e.g., style transfer) | Vision |
| BART | Text summarization, machine translation | Language Processing |
| Swin Transformer | Image classification, object detection | Vision |
| TransUNet | Medical image segmentation | Vision (Medical Imaging) |
| BigGAN | High-resolution image synthesis | Vision |
| OpenAI CLIP | Multi-modal learning (image and text) | Vision & Text |
| FastSpeech | Text-to-Speech synthesis | Audio (Speech Synthesis) |
| Reformer | Efficient Transformer for long text generation | Language Processing |
| SAM (Segment Anything Model) | Object segmentation in images | Vision (Object Segmentation) |
| SEER | Self-supervised image learning, classification | Vision |
| Databricks Lakehouse AI | AI and machine learning for enterprise data lakehouse | Enterprise AI |

Key Additions:

● Claude models, developed by Anthropic, focus on conversational AI with an emphasis
on safety and steerable language generation.
● Databricks Dolly is an instruction-tuned language model from Databricks, aimed at
enterprise applications built on the Databricks platform.
● Databricks Lakehouse AI provides AI and machine learning tooling for enterprise-level
workloads, integrated with the Lakehouse architecture for handling large-scale data.

Here’s a list of models and frameworks for reading and extracting tabular data from PDFs,
images, or scanned documents. Tools aimed at images and scans combine OCR (Optical
Character Recognition) with deep learning techniques to parse structured data like tables,
while PDF parsers such as Camelot, Tabula, and pdfplumber read the text layer of digital
PDFs directly.

| Model/Framework | Primary Usage | Domain |
| --- | --- | --- |
| TabNet | Interpretable deep learning model for tabular data | Tabular Data |
| Camelot | Extracting tables from PDFs | PDF/Table Extraction |
| pdfplumber | Parsing and extracting tables and text from PDFs | PDF/Table Extraction |
| Tesseract OCR | OCR for extracting text and simple tables from images/PDFs | OCR for Images & PDFs |
| PaddleOCR | OCR for table and text extraction, supports multi-language | OCR for Images & PDFs |
| TableNet | Extracting tabular data from document images | Table Detection in Images |
| DeepDeSRT | Detecting and recognizing table structures in scanned documents | Table Detection in PDFs/Images |
| DocTR (Document Text Recognition) | OCR for detecting and recognizing structured text like tables in documents | OCR & Document Analysis |
| Adobe PDF Extract API | Extracting structured data including tables from PDFs | PDF/Table Extraction |
| PyMuPDF (Fitz) | Extracting content (text, tables) from PDF documents | PDF Parsing |
| Tabula | Extracting tables from PDFs into CSV/Excel | PDF/Table Extraction |
| Keras-OCR | OCR for detecting and extracting text and tables from images | OCR for Images |
| LayoutLM | Pre-trained model for reading and extracting structured data from scanned documents | Document Understanding/OCR |
| TrOCR (Transformer OCR) | OCR model based on Transformer architecture for extracting text and tables | OCR for Documents |
| Amazon Textract | Automated text and table extraction from documents | OCR for PDFs & Images |
| Google Cloud Vision API | OCR with table detection capabilities for scanned images | OCR for Images & PDFs |

Overview of Popular Models:

1. Camelot, Tabula, pdfplumber: Focus on extracting tables from PDFs and converting
them into structured formats like CSV or Excel.
2. Tesseract OCR, PaddleOCR: Used for general OCR tasks like reading text and simple
tables from images or scanned documents.
3. TableNet, DeepDeSRT: Specifically designed to detect and extract tabular structures
in scanned documents or images.

4. LayoutLM: Pre-trained language model focused on document understanding, useful
for recognizing structured data like tables in scanned documents.
5. Amazon Textract, Google Cloud Vision API: Cloud-based APIs for extracting text,
tables, and forms from documents.
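General-purpose OCR engines such as Tesseract return individual words with bounding boxes rather than table cells, so recovering a simple table means grouping those boxes into rows and columns. The following is a minimal heuristic sketch, not how Tesseract, PaddleOCR, TableNet, or DeepDeSRT work internally: it assumes a clean, axis-aligned table, and the `WordBox` type, the function names, and the pixel thresholds are hypothetical. The `from_tesseract_dict` helper assumes input shaped like the dictionary pytesseract's `image_to_data(..., output_type=Output.DICT)` returns (parallel `text`, `left`, `top`, `width`, `height` lists).

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """One recognized word and its bounding box in pixels (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int

    @property
    def center_y(self) -> float:
        return self.top + self.height / 2


def from_tesseract_dict(data):
    """Build WordBox objects from a dict with parallel text/left/top/width/height lists."""
    return [
        WordBox(t, l, tp, w, h)
        for t, l, tp, w, h in zip(
            data["text"], data["left"], data["top"], data["width"], data["height"]
        )
    ]


def group_into_rows(words, y_tolerance=10):
    """Cluster words into rows: a word joins the current row if its vertical
    center is within y_tolerance pixels of the row's first word."""
    rows = []
    for word in sorted(words, key=lambda w: w.center_y):
        if not word.text.strip():
            continue  # OCR output often contains empty boxes
        if rows and abs(word.center_y - rows[-1][0].center_y) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return rows


def split_into_cells(row, gap_threshold=25):
    """Within one row, words separated by a small horizontal gap stay in the same
    cell; a gap wider than gap_threshold pixels starts a new cell."""
    row = sorted(row, key=lambda w: w.left)
    cells = [[row[0]]]
    for prev, word in zip(row, row[1:]):
        gap = word.left - (prev.left + prev.width)
        if gap > gap_threshold:
            cells.append([word])
        else:
            cells[-1].append(word)
    return [" ".join(w.text for w in cell) for cell in cells]


def boxes_to_table(words, y_tolerance=10, gap_threshold=25):
    """Convert a flat list of word boxes into a list of rows of cell strings."""
    return [split_into_cells(r, gap_threshold) for r in group_into_rows(words, y_tolerance)]


if __name__ == "__main__":
    # Hypothetical OCR output for a tiny two-column table.
    words = [
        WordBox("Model", 10, 10, 50, 12), WordBox("Domain", 200, 11, 60, 12),
        WordBox("Whisper", 10, 40, 60, 12), WordBox("Audio", 200, 41, 45, 12),
        WordBox("Stable", 10, 70, 50, 12), WordBox("Diffusion", 65, 70, 70, 12),
        WordBox("Vision", 200, 71, 50, 12),
    ]
    for row in boxes_to_table(words):
        print(row)
```

Fixed thresholds like these break on skewed scans, merged cells, or multi-line cells, which is the gap that learned table-structure models such as TableNet and DeepDeSRT aim to close.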

These tools and models provide capabilities for converting unstructured data (like tables in
PDFs or images) into structured formats, making it easier to analyze and process the data
programmatically.
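Once a tool has produced a table as a list of rows, writing it out as CSV is straightforward. The sketch below assumes rows shaped like the output of pdfplumber's `Page.extract_tables()` (lists of cell strings, with `None` for empty cells); the `clean_cell` and `table_to_csv` helpers are hypothetical names, and the normalization rules (collapsing whitespace, padding short rows) are illustrative choices. Camelot and Tabula provide their own CSV export, so a step like this is mainly useful for pdfplumber or OCR output.

```python
import csv
import io


def clean_cell(cell):
    """Normalize one extracted cell: None becomes '', and internal newlines and
    repeated whitespace collapse to single spaces."""
    if cell is None:
        return ""
    return " ".join(str(cell).split())


def table_to_csv(rows):
    """Serialize a list of rows (lists of cells) to CSV text, padding short rows
    with empty cells so every row has the same number of columns."""
    width = max((len(r) for r in rows), default=0)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        cleaned = [clean_cell(c) for c in row]
        cleaned += [""] * (width - len(cleaned))
        writer.writerow(cleaned)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical rows in the shape pdfplumber returns. With pdfplumber installed
    # (not required for this sketch), real rows could come from:
    #   import pdfplumber
    #   with pdfplumber.open("report.pdf") as pdf:  # "report.pdf" is hypothetical
    #       tables = pdf.pages[0].extract_tables()
    rows = [["Model", "Domain"], ["Whisper", None], ["Stable\nDiffusion", "Vision & Text"]]
    print(table_to_csv(rows), end="")
```

The `csv` module handles quoting, so cells that contain commas or quotes (common in extracted text) stay intact in the output.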

Please click the link below to register for the Generative AI workshop.

https://forms.gle/PrzkmvYh5yvEWUKZ6
