Adding support for HuggingFace vision Transformers

Hi,

Thanks for this great work!

In 🤗 Transformers, we support the [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit) - among many other models like [MAE](https://huggingface.co/docs/transformers/model_doc/vit_mae), [BEiT](https://huggingface.co/docs/transformers/model_doc/beit), ConvNeXt, Swin Transformer, Swin Transformer v2, etc. Recent additions also include Transformer-based video models, like [VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae) and [X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip).

As can be seen on the [hub](https://huggingface.co/models?other=vision), the 2 most popular ViT models had over 500k and over 300k downloads, respectively, in the last month. It would be great if people could leverage this speedup! An increase in throughput would be very beneficial for people putting these models in production.

As models in the Transformers library are implemented very independently (we duplicate code rather than inheriting, for the sake of readability and independence among the models), we could add ToMe as a separate model in the library.

Let me know your thoughts :) 

Best,

Niels
ML Engineer @ HuggingFace
