This repo contains the code for the 1D tokenizer and generator
A framework to enable multimodal models to operate a computer
Witness the aha moment of VLM with less than $3
Reference PyTorch implementation and models for DINOv3
Official implementation of Watermark Anything with Localized Messages
LTX-Video Support for ComfyUI
The most powerful Android RPA agent framework
Generating Immersive, Explorable, and Interactive 3D Worlds
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Static Analyzer for Solidity
Unified Multimodal Understanding and Generation Models
The library to build & auto-optimize LLM applications
PDF to Markdown with vision models
Taming Stable Diffusion for Lip Sync
SAPIEN Manipulation Skill Framework
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Python inference and LoRA trainer package for the LTX-2 audio–video
Virtual AI anchor that combines state-of-the-art technology
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Powerful framework for controlling Android and iOS devices
PaddlePaddle End-to-End Development Toolkit
Modular quant framework
Gemma open-weight LLM library, from Google DeepMind
ICLR 2024 Spotlight: curation/training code, metadata, distribution