Stars
Compilation of BIOSes for various emulation platforms
Faster Whisper transcription with CTranslate2
Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media (AAAI 2024)
Expose the contents of .docx files without leaving your terminal. Fast, safe, and smart — no Office required!
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
A simple CLI for tracking your working time.
[Mirror] Self-hosted abuse detection and rule enforcement against low-effort mass AI scraping and bots.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A whirlwind tour of Common Crawl's data using Python
Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
A web front end for an Elasticsearch cluster
Weighs the soul of incoming HTTP requests to stop AI crawlers
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
Introduction to WebGraphs - Workshop at the IIPC Web Archiving Conference 2025
Watch (parts of) webpages and get notified when something changes via e-mail, on your phone or via other means. Highly configurable.
A list of AI agents and robots to block.
A polite and user-friendly downloader for Common Crawl data
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
The Unofficial TikTok API Wrapper In Python
A simple module to collect video, text, and metadata from TikTok.
A Lit web-component for viewing a Whisper JSON transcript file
A Python program that turns an LLM running on Ollama into an automated researcher that, from a single query, determines focus areas to investigate, runs web searches, and scrapes content from vari…