multimodal

Use PEFT or full-parameter training to fine-tune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)

Updated Oct 18, 2024
Python

rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.

visualization python rust computer-vision cpp robotics multimodal

Updated Oct 18, 2024
Rust

enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.

ui beam agi openai gpt mistral multimodal groq openai-api gpt-4 large-language-models stable-diffusion generative-ai chatgpt chatgpt-ui gpt-5 anthropic

Updated Oct 18, 2024
TypeScript

TIGER-AI-Lab / MEGA-Bench

This repo contains the code and data for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks".

benchmark evaluation multimodal

Updated Oct 18, 2024
Python

Kazooki123 / LunarDB

LunarDB is a key-value cache database written in C++.

database cpp nosql cache cache-storage key-value-store multimodal

Updated Oct 18, 2024
C++

PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox. Equipped with high performance and flexibility.

Updated Oct 18, 2024
Python

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation tts speech-synthesis neural-networks deeplearning speaker-recognition asr multimodal speech-translation large-language-models speaker-diariazation generative-ai

Updated Oct 18, 2024
Python

UbiquitousLearning / mllm

Fast Multimodal LLM on Mobile Devices

llama multimodal large-language-models

Updated Oct 18, 2024
C++

RobotecAI / rai

RAI is a multi-vendor agent framework for robotics, utilizing Langchain and ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more.

ai robotics ros2 vlm multimodal embodied-artificial-intelligence embodied-agent embodied-ai o3de llm generative-ai ai-agents-framework embodied-agents robotec

Updated Oct 18, 2024
Python

bentoml / BentoML

The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering ai-inference llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated Oct 18, 2024
Python

zorin-egor / ripedotnet

Sample app for the ripe.net service

android plugins jetpack compose multimodal build-logic ripedotnet

Updated Oct 18, 2024
Kotlin

showlab / Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

multimodal diffusion-models large-language-models

Updated Oct 18, 2024
Python

smsharma / PAPERCLIP-Hubble

Semantic alignment of astronomical data with natural language using multi-modal models. (Jax) Code associated with https://arxiv.org/abs/2403.08851 (COLM 2023).

astronomy paperclip astrophysics hst clip hubble multimodal jax foundation-models