milinxiaobo

linxiaobo milinxiaobo

6 followers · 42 following

Starred repositories

galeselee / Awesome_LLM_System-PaperList

Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…

172 7 Updated Nov 5, 2024

facebookincubator / gloo

Collective communications library with various primitives for multi-machine training.

C++ 1,222 304 Updated Nov 9, 2024

AlexGascon / awesome-learnings

Repository containing short articles about several computer science and software development topics, with the goal of providing quick-to-read insights and link resources to get a deeper knowledge a…

2 Updated Jan 20, 2019

dyarthur / awesomeKnowledge

A collection of excellent books and websites related to computer science and technology

1 Updated Jun 27, 2024

microsoft / VPTQ

VPTQ, a flexible and extreme low-bit quantization algorithm

Python 498 28 Updated Nov 6, 2024

ruikangliu / FlatQuant

Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization

Python 59 5 Updated Nov 7, 2024

aashaka / vllm-oss

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 4 2 Updated Sep 3, 2024

KuntaiDu / vllm

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 6 2 Updated Nov 10, 2024

microsoft / Llama-2-Onnx

Python 1,021 94 Updated Jan 4, 2024

microsoft / onnxruntime-inference-examples

Examples for using ONNX Runtime for machine learning inferencing.

C++ 1,205 336 Updated Nov 4, 2024

microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

C++ 504 127 Updated Nov 12, 2024

facebookresearch / DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 6,313 566 Updated May 31, 2024

xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …

Python 5,351 433 Updated Nov 12, 2024

microsoft / vattention

Dynamic Memory Management for Serving LLMs without PagedAttention

C 228 14 Updated Nov 6, 2024

pymc-devs / pytensor

PyTensor allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays.

Python 360 107 Updated Nov 11, 2024

Theano / Theano

Theano was a Python library that allowed you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. It is being continued as PyTensor: www.github.…

Python 9,901 2,486 Updated Jan 15, 2024

inducer / loopy

A code generator for array-based code on CPUs and GPUs

Python 588 73 Updated Nov 11, 2024

vllm-project / llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 666 54 Updated Nov 12, 2024

emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

C++ 25,813 3,308 Updated Nov 12, 2024

nebuly-ai / exploring-AI-optimization

Curated list of awesome material on optimization techniques to make artificial intelligence faster and more efficient 🚀

112 11 Updated Oct 8, 2023

intel / inference-model-manager

Inference Model Manager for Kubernetes

Python 47 8 Updated Apr 10, 2019

GoogleCloudPlatform / vertex-ai-alphafold-inference-pipeline

This repository compiles prescriptive guidance and code samples demonstrating how to operationalize AlphaFold batch inference using Vertex AI Pipelines.

Python 64 28 Updated Oct 25, 2024