Starred repositories

Since the emergence of ChatGPT in 2022, accelerating large language models (LLMs) has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…

172 7 Updated Nov 5, 2024

Collective communications library with various primitives for multi-machine training.

C++ 1,222 304 Updated Nov 9, 2024

Repository containing short articles about several computer science and software development topics, with the goal of providing quick-to-read insights and link resources to get a deeper knowledge a…

2 Updated Jan 20, 2019

A collection of excellent books and websites related to computer science and technology

1 Updated Jun 27, 2024

VPTQ, a flexible and extreme low-bit quantization algorithm

Python 498 28 Updated Nov 6, 2024

Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization

Python 59 5 Updated Nov 7, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 4 2 Updated Sep 3, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 6 2 Updated Nov 10, 2024
Python 1,021 94 Updated Jan 4, 2024

Examples for using ONNX Runtime for machine learning inferencing.

C++ 1,205 336 Updated Nov 4, 2024

Generative AI extensions for onnxruntime

C++ 504 127 Updated Nov 12, 2024

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 6,313 566 Updated May 31, 2024

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …

Python 5,351 433 Updated Nov 12, 2024

Dynamic Memory Management for Serving LLMs without PagedAttention

C 228 14 Updated Nov 6, 2024

PyTensor allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays.

Python 360 107 Updated Nov 11, 2024

Theano was a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It is being continued as PyTensor: www.github.…

Python 9,901 2,486 Updated Jan 15, 2024

A code generator for array-based code on CPUs and GPUs

Python 588 73 Updated Nov 11, 2024

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 666 54 Updated Nov 12, 2024

Emscripten: An LLVM-to-WebAssembly Compiler

C++ 25,813 3,308 Updated Nov 12, 2024

Curated list of awesome material on optimization techniques to make artificial intelligence faster and more efficient 🚀

112 11 Updated Oct 8, 2023

Inference Model Manager for Kubernetes

Python 47 8 Updated Apr 10, 2019

This repository compiles prescriptive guidance and code samples demonstrating how to operationalize AlphaFold batch inference using Vertex AI Pipelines.

Python 64 28 Updated Oct 25, 2024

LLM deployment project based on ONNX.

C++ 26 4 Updated Oct 9, 2024

A toolkit to help optimize ONNX models

Python 75 9 Updated Nov 11, 2024

Large language model ONNX inference framework

Python 24 2 Updated Oct 15, 2024

llm-export can export LLM models to ONNX.

Python 226 27 Updated Nov 5, 2024

Stable Diffusion web UI

Python 142,678 26,909 Updated Nov 6, 2024

OneDiff: An out-of-the-box acceleration library for diffusion models.

Jupyter Notebook 1,683 100 Updated Nov 8, 2024

KaHIP -- Karlsruhe HIGH Quality Partitioning.

C++ 396 97 Updated Jul 1, 2024

Graph Partitioning for Large-scale Graph Datasets

C++ 89 13 Updated Dec 14, 2021