mindspore-lab/mindone

MindSpore ONE

This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.

ONE is short for "ONE for all"

News

  • [2025.12.24] We release v0.5.0, which adds compatibility with 🤗 Transformers v4.57.1 (70+ new models) and 🤗 Diffusers v0.35.2, plus previews of v0.36 pipelines such as Flux2, QwenImageEditPlus, Lucy and Kandinsky5. It also introduces initial ComfyUI integration. Happy exploring!
  • [2025.11.02] v0.4.0 is released, with 280+ transformers models and 70+ diffusers pipelines supported. See here
  • [2025.04.10] We release v0.3.0. More than 15 SoTA generative models are added, including Flux, CogView4, OpenSora2.0, Movie Gen 30B, CogVideoX 5B~30B. Have fun!
  • [2025.02.21] We support DeepSeek Janus-Pro, a SoTA multimodal understanding and generation model. See here
  • [2024.11.06] v0.2.0 is released

Quick tour

To install v0.5.0, first install MindSpore (any version from 2.6.0 to 2.7.1), then run pip install mindone

Alternatively, to install the latest version from the master branch, please run:

git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
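
Because v0.5.0 targets MindSpore 2.6.0 to 2.7.1, it can help to check the installed version before running anything heavy. The sketch below is one way to do that. The helper names (`parse_version`, `is_supported_mindspore`, `installed_mindspore_version`) are hypothetical and not part of mindone, and the parsing assumes a version string that starts with `major.minor.patch`.

```python
import re
from importlib import metadata

# Supported range for mindone v0.5.0, taken from the install note above.
MIN_VERSION = (2, 6, 0)
MAX_VERSION = (2, 7, 1)


def parse_version(version):
    """Turn a string such as '2.7.0' or '2.7.0rc1' into the tuple (2, 7, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(group) for group in match.groups())


def is_supported_mindspore(version):
    """Return True if `version` falls inside the supported range (inclusive)."""
    return MIN_VERSION <= parse_version(version) <= MAX_VERSION


def installed_mindspore_version():
    """Return the installed MindSpore version string, or None if it is absent."""
    try:
        return metadata.version("mindspore")
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_mindspore_version()` and passing a non-None result to `is_supported_mindspore` gives a quick yes/no answer without importing MindSpore itself.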

We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using Stable Diffusion 3 as an example.

Hello MindSpore from Stable Diffusion 3!

import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
image = pipe(prompt)[0][0]
image.save("sd3.png")
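
The pipeline call can also take generation arguments. As a sketch, the wrapper below assumes that `mindone.diffusers` mirrors the Hugging Face Diffusers call signature for `num_inference_steps` and `guidance_scale`; the function name `generate_sd3` and its default values are illustrative and not part of the library. The imports sit inside the function so that defining it does not load MindSpore or the model.

```python
def generate_sd3(prompt, out_path="sd3.png", num_inference_steps=28, guidance_scale=7.0):
    """Generate one image with Stable Diffusion 3 and save it to `out_path`."""
    # Imported lazily: loading MindSpore and the pipeline is expensive.
    import mindspore
    from mindone.diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",
        mindspore_dtype=mindspore.float16,
    )
    # Assumption: these keyword arguments match the Hugging Face Diffusers API.
    output = pipe(
        prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
    )
    # As in the example above, the first element of the output is the image list.
    image = output[0][0]
    image.save(out_path)
    return out_path
```

Fewer inference steps usually trade image quality for speed, so it is worth trying a few values on your own prompts.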

run hf diffusers on mindspore

  • mindone diffusers is under active development; most tasks were tested with MindSpore 2.6.0-2.7.1 on Ascend Atlas 800T A2 machines
  • compatible with 🤗 diffusers v0.35.2, with preview support for SoTA v0.36 pipelines; see the support list
  • 18+ training examples - controlnet, dreambooth, lora and more

run hf transformers on mindspore

  • mindone transformers is under active development; most tasks were tested with MindSpore 2.6.0-2.7.1 on Ascend Atlas 800T A2 machines
  • compatible with 🤗 transformers v4.57.1
  • provides 350+ state-of-the-art machine learning models for inference across text, computer vision, audio, video, and multimodal tasks; see the support list

supported models under mindone/examples

| task | model | inference | finetune | pretrain | institute |
|---|---|---|---|---|---|
| Text/Image-to-Video | wan2.1 🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text/Image-to-Video | wan2.2 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Audio/Image-Text-to-Text | qwen2_5_omni 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Image/Video-Text-to-Text | qwen2_5_vl 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Any-to-Any | qwen3_omni_moe 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Image-Text-to-Text | qwen3_vl/qwen3_vl_moe 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text-to-Image | qwen_image 🔥🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Text-to-Text | minicpm 🔥🔥 | ✅ | ✖️ | ✖️ | OpenBMB |
| Any-to-Any | janus | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | emu3 | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | var | ✅ | ✅ | ✅ | ByteDance |
| Text-to-Image | omnigen2 🔥 | ✅ | ✅ | ✖️ | VectorSpaceLab |
| Text/Image-to-Video | hpcai open sora 1.2/2.0 | ✅ | ✅ | ✅ | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B | ✅ | ✅ | ✅ | Zhipu |
| Image/Text-to-Text | glm4v 🔥 | ✅ | ✖️ | ✖️ | Zhipu |
| Text-to-Video | open sora plan 1.3 | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | hunyuanvideo | ✅ | ✅ | ✅ | Tencent |
| Image-to-Video | hunyuanvideo-i2v 🔥 | ✅ | ✖️ | ✖️ | Tencent |
| Text-to-Video | movie gen 30B | ✅ | ✅ | ✅ | Meta |
| Segmentation | lang_sam 🔥 | ✅ | ✖️ | ✖️ | Meta |
| Segmentation | sam2 | ✅ | ✖️ | ✖️ | Meta |
| Text-to-Video | step_video_t2v | ✅ | ✖️ | ✖️ | StepFun |
| Text-to-Speech | sparktts | ✅ | ✖️ | ✖️ | Spark Audio |
| Text-to-Image | flux | ✅ | ✅ | ✖️ | Black Forest Lab |
| Text-to-Image | stable diffusion 3 | ✅ | ✅ | ✖️ | Stability AI |

supported captioner

| task | model | inference | finetune | pretrain | features |
|---|---|---|---|---|---|
| Image-Text-to-Text | pllava | ✅ | ✖️ | ✖️ | support video and image captioning |

training-free acceleration

We provide training-free DiT inference acceleration methods: DiTCache, PromptGate, and FBCache with TaylorSeer, tested on sd3 and flux.1.
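
To give an intuition for cache-based acceleration, here is a toy sketch of the first-block-cache idea: run the first transformer block at every denoising step, and if its output barely changed since the last fully computed step, reuse the cached contribution of the remaining blocks instead of recomputing them. This is not mindone's implementation. The class `FirstBlockCache`, the `threshold` parameter, and the use of plain Python lists as features are all assumptions made for illustration.

```python
def relative_l1(current, previous):
    """Summed absolute change, relative to the summed magnitude of `previous`."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    norm = sum(abs(p) for p in previous)
    return diff / norm if norm > 0 else float("inf")


class FirstBlockCache:
    """Toy first-block cache around a two-stage model (hypothetical helper)."""

    def __init__(self, first_block, remaining_blocks, threshold=0.1):
        self.first_block = first_block            # cheap: always executed
        self.remaining_blocks = remaining_blocks  # expensive: skipped on a cache hit
        self.threshold = threshold
        self.prev_first_residual = None   # first-block residual at last full step
        self.cached_rest_residual = None  # remaining-blocks residual at that step
        self.skipped_steps = 0

    def __call__(self, x):
        h = self.first_block(x)
        first_residual = [a - b for a, b in zip(h, x)]

        can_reuse = (
            self.prev_first_residual is not None
            and relative_l1(first_residual, self.prev_first_residual) < self.threshold
        )
        if can_reuse:
            # Cache hit: skip the expensive blocks and reuse their residual.
            self.skipped_steps += 1
            rest_residual = self.cached_rest_residual
        else:
            # Cache miss: run everything and refresh both cached residuals.
            out = self.remaining_blocks(h)
            rest_residual = [a - b for a, b in zip(out, h)]
            self.cached_rest_residual = rest_residual
            self.prev_first_residual = first_residual

        return [a + r for a, r in zip(h, rest_residual)]
```

In this sketch the reference residual is refreshed only on a cache miss, so each comparison is made against the last step that was fully computed. A larger `threshold` skips more steps at the cost of a coarser approximation.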

About

one for all, Optimal generator with No Exception
