Papers With Code v2
Sentence Embeddings
120. CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society
lightaime/camel • 31 Mar 2023
To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-
playing.
Language Modelling
121. SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
winfredy/sadtalker • CVPR 2023
We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly
modulates a novel 3D-aware face render for talking head generation.
Talking Head Generation
122. Aggregated Contextual Transformations for High-Resolution Image Inpainting
zyddnys/manga-image-translator • 3 Apr 2021
For improving texture synthesis, we enhance the discriminator of AOT-GAN by training it with a tailored mask-prediction task.
Ranked #6 on Image Inpainting on Places2
Image Inpainting Texture Synthesis +1
123. Recognize Anything: A Strong Image Tagging Model
xinyu1205/Recognize_Anything-Tag2Text • 6 Jun 2023
We are releasing the RAM at https://recognize-anything.github.io/ to foster the advancements of large models in computer
vision.
Semantic Parsing
124. Tree of Thoughts: Deliberate Problem Solving with Large Language Models
ysymyth/tree-of-thought-llm • 17 May 2023
Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to
token-level, left-to-right decision-making processes during inference.
Decision Making Language Modelling
125. Memory Transformer
lucidrains/x-transformers • 20 Jun 2020
Adding trainable memory to selectively store local as well as global representations of a sequence is a promising direction to
improve the Transformer model.
Language Modelling Machine Translation +4
126. One Embedder, Any Task: Instruction-Finetuned Text Embeddings
shibing624/text2vec • 19 Dec 2022
Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge
of training a single model on diverse datasets.
Information Retrieval Learning Word Embeddings +3
127. Factuality Enhanced Language Models for Open-Ended Text Generation
NVIDIA/FasterTransformer • 9 Jun 2022
In this work, we measure and improve the factual accuracy of large-scale LMs for open-ended text generation.
Misconceptions Sentence Completion +1
128. PoissonNet: Resolution-Agnostic 3D Shape Reconstruction using Fourier Neural Operators
arsenal9971/poissonnet • 3 Aug 2023
Furthermore, we demonstrate that the Poisson surface reconstruction problem is well-posed in the limit case by showing a
universal approximation theorem for the solution operator of the Poisson equation with distributional data utilizing the Fourier
Neural Operator, which provides a theoretical foundation for our numerical results.
3D Shape Reconstruction Super-Resolution +1
129. Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation
fedenoce/s2l-s2d • 2 Jun 2023
This paper presents a novel approach for generating 3D talking heads from raw audio inputs.
3D Face Animation Talking Head Generation
130. AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
mit-han-lab/llm-awq • 1 Jun 2023
Large language models (LLMs) have shown excellent performance on various tasks, but the astronomical model size raises the
hardware barrier for serving (memory size) and slows down token generation (memory bandwidth).
Common Sense Reasoning Language Modelling +1
131. ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases
pku-yuangroup/chatlaw • 28 Jun 2023
Furthermore, we propose a self-attention method to enhance the ability of large models to overcome errors present in reference
data, further optimizing the issue of model hallucinations at the model level and improving the problem-solving capabilities of large
models.
Language Modelling Retrieval
132. BAA-NGP: Bundle-Adjusting Accelerated Neural Graphics Primitives
IntelLabs/baa-ngp • 7 Jun 2023
Implicit neural representation has emerged as a powerful method for reconstructing 3D scenes from 2D images.
3D Scene Reconstruction Novel View Synthesis +2
133. Gentopia: A Collaborative Platform for Tool-Augmented LLMs
gentopia-ai/gentopia • 8 Aug 2023
We present gentopia, an ALM framework enabling flexible customization of agents through simple configurations, seamlessly
integrating various language models, task formats, prompting modules, and plugins into a unified paradigm.
134. Voyager: An Open-Ended Embodied Agent with Large Language Models
MineDojo/Voyager • 25 May 2023
We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world,
acquires diverse skills, and makes novel discoveries without human intervention.
135. Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
guochengqian/magic123 • 30 Jun 2023
We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D meshes generation from a single unposed
image in the wild using both 2D and 3D priors.
Image to 3D
136. Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
suno-ai/bark • 5 Jan 2023
In addition, we find Vall-E could preserve the speaker's emotion and acoustic environment of the acoustic prompt in synthesis.
Language Modelling Speech Synthesis +1
137. One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization
One-2-3-45/One-2-3-45 • 29 Jun 2023
Single image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world.
3D Reconstruction Image to 3D +2
138. Exploring Predicate Visual Context in Detecting of Human-Object Interactions
fredzzhang/pvic • 11 Aug 2023
Recently, the DETR framework has emerged as the dominant approach for human-object interaction (HOI) research.
Ranked #1 on Human-Object Interaction Detection on HICO-DET
Human-Object Interaction Detection
139. Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
charactr-platform/vocos • 1 Jun 2023
Recent advancements in neural vocoding are predominantly driven by Generative Adversarial Networks (GANs) operating in the
time-domain.
Inductive Bias
140. UniVTG: Towards Unified Video-Language Temporal Grounding
showlab/univtg • 31 Jul 2023
Most methods in this direction develop task-specific models that are trained with type-specific labels, such as moment retrieval
(time interval) and highlight detection (worthiness curve), which limits their abilities to generalize to various VTG tasks and labels.
Ranked #1 on Highlight Detection on QVHighlights (using extra training data)
Highlight Detection Moment Retrieval +2
141. OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
mlfoundations/open_flamingo • 2 Aug 2023
We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters.
142. AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities
automatic1111/stable-diffusion-webui • 12 Nov 2022
In this work, we present a conceptually simple and effective method to train a strong bilingual/multilingual multimodal
representation model.
Ranked #1 on Zero-Shot Transfer Image Classification on CN-ImageNet-Sketch
Contrastive Learning Cross-Modal Retrieval +10
143. Turning Whisper into Real-Time Transcription System
ufal/whisper_streaming • 27 Jul 2023
Whisper is one of the recent state-of-the-art multilingual speech recognition and translation models; however, it is not designed for
real-time transcription.
speech-recognition Speech Recognition +1
144. DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
Yujun-Shi/DragDiffusion • 26 Jun 2023
In this work, we extend such an editing framework to diffusion models and propose DragDiffusion.
145. SkiROS2: A skill-based Robot Control Platform for ROS
rvmi/skiros2 • 29 Jun 2023
The need for autonomous robot systems in both the service and the industrial domain is larger than ever.
Scheduling
146. PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs
manycore-research/PlankAssembly • 10 Aug 2023
In this paper, we develop a new method to automatically convert 2D line drawings from three orthographic views into 3D CAD models.
3D Reconstruction
147. A Survey on Evaluation of Large Language Models
mlgroupjlu/llm-eval-survey • 6 Jul 2023
Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented
performance in various applications.
Ethics
148. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
lm-sys/fastchat • 9 Jun 2023
Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of
existing benchmarks in measuring human preferences.
Chatbot Language Modelling
149. Multi-scale Multi-band DenseNets for Audio Source Separation
Anjok07/ultimatevocalremovergui • 29 Jun 2017
This paper deals with the problem of audio source separation.
Audio Source Separation Music Source Separation
150. Retentive Network: A Successor to Transformer for Large Language Models
microsoft/torchscale • 17 Jul 2023
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously
achieving training parallelism, low-cost inference, and good performance.
Language Modelling
151. AgentBench: Evaluating LLMs as Agents
thudm/agentbench • 7 Aug 2023
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond
traditional NLP tasks.
Decision Making