Starred repositories
Since the emergence of ChatGPT in 2022, accelerating large language models (LLMs) has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…
Collective communications library with various primitives for multi-machine training.
Repository containing short articles about several computer science and software development topics, with the goal of providing quick-to-read insights and link resources to get a deeper knowledge a…
Collect excellent books and websites related to computer science and technology
VPTQ, A Flexible and Extreme low-bit quantization algorithm
Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization
aashaka / vllm-oss
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
KuntaiDu / vllm
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Examples for using ONNX Runtime for machine learning inferencing.
Generative AI extensions for onnxruntime
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …
Dynamic Memory Management for Serving LLMs without PagedAttention
PyTensor allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays.
Theano was a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It is being continued as PyTensor: www.github.…
A code generator for array-based code on CPUs and GPUs
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Emscripten: An LLVM-to-WebAssembly Compiler
Curated list of awesome material on optimization techniques to make artificial intelligence faster and more efficient 🚀
Inference Model Manager for Kubernetes
This repository compiles prescriptive guidance and code samples demonstrating how to operationalize AlphaFold batch inference using Vertex AI Pipelines.
Stable Diffusion web UI
OneDiff: An out-of-the-box acceleration library for diffusion models.
Graph Partitioning for Large-scale Graph Datasets