Showing 1–50 of 50 results for author: Shu, R

  1. arXiv:2407.10098  [pdf, other]

    cs.OS cs.AR cs.DC cs.NI cs.PF

    Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild

    Authors: Jiechen Zhao, Ran Shu, Katie Lim, Zewen Fan, Thomas Anderson, Mingyu Gao, Natalie Enright Jerger

    Abstract: I/O devices in public clouds have integrated increasing numbers of hardware accelerators, e.g., AWS Nitro, Azure FPGA and Nvidia BlueField. However, such specialized compute (1) is not explicitly accessible to cloud users with performance guarantee, (2) cannot be leveraged simultaneously by both providers and users, unlike general-purpose compute (e.g., CPUs). Through ten observations, we present… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  2. arXiv:2404.13206  [pdf, other]

    cs.RO

    Wheelchair Maneuvering with a Single-Spherical-Wheeled Balancing Mobile Manipulator

    Authors: Cunxi Dai, Xiaohan Liu, Roberto Shu, Ralph Hollis

    Abstract: In this work, we present a control framework to effectively maneuver wheelchairs with a dynamically stable mobile manipulator. Wheelchairs are a type of nonholonomic cart system; maneuvering such systems with mobile manipulators (MM) is challenging mostly due to the following reasons: 1) These systems feature nonholonomic constraints and considerably varying inertial parameters that require online… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  3. arXiv:2403.18702  [pdf, other]

    cs.AR

    Toward CXL-Native Memory Tiering via Device-Side Profiling

    Authors: Zhe Zhou, Yiqi Chen, Tao Zhang, Yang Wang, Ran Shu, Shuotao Xu, Peng Cheng, Lei Qu, Yongqiang Xiong, Guangyu Sun

    Abstract: The Compute Express Link (CXL) interconnect has provided the ability to integrate diverse memory types into servers via byte-addressable SerDes links. Harnessing the full potential of such heterogeneous memory systems requires efficient memory tiering. However, existing research in this domain has been constrained by low-resolution and high-overhead memory access profiling techniques. To address t… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  4. arXiv:2402.01791  [pdf, other]

    quant-ph cs.AI cs.ET cs.LG

    Variational Quantum Circuits Enhanced Generative Adversarial Network

    Authors: Runqiu Shu, Xusheng Xu, Man-Hong Yung, Wei Cui

    Abstract: Generative adversarial network (GAN) is one of the widely-adopted machine-learning frameworks for a wide range of applications such as generating high-quality images, video, and audio contents. However, training a GAN could become computationally expensive for large neural networks. In this work, we propose a hybrid quantum-classical architecture for improving GAN (denoted as QC-GAN). The performa… ▽ More

    Submitted 1 February, 2024; originally announced February 2024.

  5. arXiv:2401.12999  [pdf, other]

    physics.chem-ph cs.AI cs.LG

    Quantum-Inspired Machine Learning for Molecular Docking

    Authors: Runqiu Shu, Bowen Liu, Zhaoping Xiong, Xiaopeng Cui, Yunting Li, Wei Cui, Man-Hong Yung, Nan Qiao

    Abstract: Molecular docking is an important tool for structure-based drug design, accelerating the efficiency of drug development. Complex and dynamic binding processes between proteins and small molecules require searching and sampling over a wide spatial range. Traditional docking by searching for possible binding sites and conformations is computationally complex and results poorly under blind docking. Q… ▽ More

    Submitted 21 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  6. arXiv:2312.11871  [pdf, other]

    cs.NI cs.DC

    Meili: Enabling SmartNIC as a Service in the Cloud

    Authors: Qiang Su, Shaofeng Wu, Zhixiong Niu, Ran Shu, Peng Cheng, Yongqiang Xiong, Zaoxing Liu, Hong Xu

    Abstract: SmartNICs are touted as an attractive substrate for network application offloading, offering benefits in programmability, host resource saving, and energy efficiency. The current usage restricts offloading to local hosts and confines SmartNIC ownership to individual application teams, resulting in poor resource efficiency and scalability. This paper presents Meili, a novel system that realizes Sma… ▽ More

    Submitted 30 July, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  7. arXiv:2309.13233  [pdf, other]

    cs.CL

    User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue

    Authors: Sam Davidson, Salvatore Romeo, Raphael Shu, James Gung, Arshit Gupta, Saab Mansour, Yi Zhang

    Abstract: One of the major impediments to the development of new task-oriented dialogue (TOD) systems is the need for human evaluation at multiple stages and iterations of the development process. In an effort to move toward automated evaluation of TOD, we propose a novel user simulator built using recently developed large pretrained language models (LLMs). In order to increase the linguistic diversity of o… ▽ More

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: 13 pages

  8. arXiv:2308.00878  [pdf, other]

    cs.CL

    DiactTOD: Learning Generalizable Latent Dialogue Acts for Controllable Task-Oriented Dialogue Systems

    Authors: Qingyang Wu, James Gung, Raphael Shu, Yi Zhang

    Abstract: Dialogue act annotations are important to improve response generation quality in task-oriented dialogue systems. However, it can be challenging to use dialogue acts to control response generation in a generalizable way because different datasets and tasks may have incompatible annotations. While alternative methods that utilize latent action spaces or reinforcement learning do not require explicit… ▽ More

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: SIGDial 2023

  9. arXiv:2305.14827  [pdf, other]

    cs.CL

    Pre-training Intent-Aware Encoders for Zero- and Few-Shot Intent Classification

    Authors: Mujeen Sung, James Gung, Elman Mansimov, Nikolaos Pappas, Raphael Shu, Salvatore Romeo, Yi Zhang, Vittorio Castelli

    Abstract: Intent classification (IC) plays an important role in task-oriented dialogue systems. However, IC models often generalize poorly when training without sufficient annotated examples for each user intent. We propose a novel pre-training method for text encoders that uses contrastive learning with intent pseudo-labels to produce embeddings that are well-suited for IC tasks, reducing the need for manu… ▽ More

    Submitted 13 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  10. arXiv:2304.12982  [pdf, other]

    cs.CL

    Intent Induction from Conversations for Task-Oriented Dialogue Track at DSTC 11

    Authors: James Gung, Raphael Shu, Emily Moeng, Wesley Rose, Salvatore Romeo, Yassine Benajiba, Arshit Gupta, Saab Mansour, Yi Zhang

    Abstract: With increasing demand for and adoption of virtual assistants, recent work has investigated ways to accelerate bot schema design through the automatic induction of intents or the induction of slots and dialogue states. However, a lack of dedicated benchmarks and standardized evaluation has made progress difficult to track and comparisons between systems difficult to make. This challenge track, hel… ▽ More

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: 18 pages, 1 figure. Accepted at the DSTC 11 Workshop to be located at SIGDIAL 2023

  11. arXiv:2302.08362  [pdf, other]

    cs.CL

    Conversation Style Transfer using Few-Shot Learning

    Authors: Shamik Roy, Raphael Shu, Nikolaos Pappas, Elman Mansimov, Yi Zhang, Saab Mansour, Dan Roth

    Abstract: Conventional text style transfer approaches focus on sentence-level style transfer without considering contextual information, and the style is described with attributes (e.g., formality). When applying style transfer in conversations such as task-oriented dialogues, existing approaches suffer from these limitations as context can play an important role and the style attributes are often difficult… ▽ More

    Submitted 21 September, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: IJCNLP-AACL'2023 Camera Ready Version

  12. arXiv:2212.09946  [pdf, other]

    cs.CL

    Dialog2API: Task-Oriented Dialogue with API Description and Example Programs

    Authors: Raphael Shu, Elman Mansimov, Tamer Alkhouli, Nikolaos Pappas, Salvatore Romeo, Arshit Gupta, Saab Mansour, Yi Zhang, Dan Roth

    Abstract: Functionality and dialogue experience are two important factors of task-oriented dialogue systems. Conventional approaches with closed schema (e.g., conversational semantic parsing) often fail as both the functionality and dialogue experience are strongly constrained by the underlying schema. We introduce a new paradigm for task-oriented dialogue - Dialog2API - to greatly expand the functionality… ▽ More

    Submitted 19 December, 2022; originally announced December 2022.

  13. arXiv:2211.16677  [pdf, other]

    cs.CV cs.AI cs.GR

    3D Neural Field Generation using Triplane Diffusion

    Authors: J. Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, Gordon Wetzstein

    Abstract: Diffusion models have emerged as the state-of-the-art for image generation, among other tasks. Here, we present an efficient diffusion-based model for 3D-aware generation of neural fields. Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields and factoring them into a set of axis-aligned triplane feature representations. Thus, our 3D t… ▽ More

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: Project page: https://jryanshue.com/nfd

  14. arXiv:2208.01595  [pdf, other]

    cs.SE cs.CR

    Do I really need all this work to find vulnerabilities? An empirical case study comparing vulnerability detection techniques on a Java application

    Authors: Sarah Elder, Nusrat Zahan, Rui Shu, Monica Metro, Valeri Kozarev, Tim Menzies, Laurie Williams

    Abstract: CONTEXT: Applying vulnerability detection techniques is one of many tasks using the limited resources of a software project. OBJECTIVE: The goal of this research is to assist managers and other decision-makers in making informed choices about the use of software vulnerability detection techniques through an empirical study of the efficiency and effectiveness of four techniques on a Java-based we… ▽ More

    Submitted 2 August, 2022; originally announced August 2022.

    ACM Class: D.2.5

  15. arXiv:2205.13921  [pdf, other]

    cs.LG cs.AI

    Federated Semi-Supervised Learning with Prototypical Networks

    Authors: Woojung Kim, Keondo Park, Kihyuk Sohn, Raphael Shu, Hyung-Sin Kim

    Abstract: With the increasing computing power of edge devices, Federated Learning (FL) emerges to enable model training without privacy concerns. The majority of existing studies assume the data are fully labeled on the client side. In practice, however, the amount of labeled data is often limited. Recently, federated semi-supervised learning (FSSL) is explored as a way to effectively utilize unlabeled data… ▽ More

    Submitted 30 May, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

  16. arXiv:2205.00665  [pdf, other]

    cs.CR cs.SE

    Reducing the Cost of Training Security Classifier (via Optimized Semi-Supervised Learning)

    Authors: Rui Shu, Tianpei Xia, Huy Tu, Laurie Williams, Tim Menzies

    Abstract: Background: Most of the existing machine learning models for security tasks, such as spam detection, malware detection, or network intrusion detection, are built on supervised machine learning algorithms. In such a paradigm, models need a large amount of labeled data to learn the useful relationships between selected features and the target class. However, such labeled data can be scarce and expen… ▽ More

    Submitted 2 May, 2022; originally announced May 2022.

  17. arXiv:2203.11410  [pdf, other]

    cs.CR cs.LG cs.SE

    Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue

    Authors: Rui Shu, Tianpei Xia, Laurie Williams, Tim Menzies

    Abstract: Background: Machine learning techniques have been widely used and demonstrate promising performance in many software security tasks such as software vulnerability prediction. However, the class ratio within software vulnerability datasets is often highly imbalanced (since the percentage of observed vulnerability is usually very low). Goal: To help security practitioners address software security d… ▽ More

    Submitted 2 May, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

  18. arXiv:2111.01869  [pdf, other]

    cs.RO

    Towards Very Low-Cost Iterative Prototyping for Fully Printable Dexterous Soft Robotic Hands

    Authors: Dominik Bauer, Cornelia Bauer, Arjun Lakshmipathy, Roberto Shu, Nancy S. Pollard

    Abstract: The design and fabrication of soft robot hands is still a time-consuming and difficult process. Advances in rapid prototyping have accelerated the fabrication process significantly while introducing new complexities into the design process. In this work, we present an approach that utilizes novel low-cost fabrication techniques in conjunction with design tools helping soft hand designers to system… ▽ More

    Submitted 16 April, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

  19. arXiv:2106.07156  [pdf, other]

    cs.LG cs.AI

    Temporal Predictive Coding For Model-Based Planning In Latent Space

    Authors: Tung Nguyen, Rui Shu, Tuan Pham, Hung Bui, Stefano Ermon

    Abstract: High-dimensional observations are a major challenge in the application of model-based reinforcement learning (MBRL) to real-world environments. To handle high-dimensional sensory inputs, existing approaches use representation learning to map high-dimensional observations into a lower-dimensional latent space that is more amenable to dynamics estimation and planning. In this work, we present an inf… ▽ More

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: International Conference on Machine Learning

  20. arXiv:2104.07541  [pdf, other]

    cs.CL cs.LG

    Reward Optimization for Neural Machine Translation with Learned Metrics

    Authors: Raphael Shu, Kang Min Yoo, Jung-Woo Ha

    Abstract: Neural machine translation (NMT) models are conventionally trained with token-level negative log-likelihood (NLL), which does not guarantee that the generated translations will be optimized for a selected sequence-level evaluation metric. Multiple approaches are proposed to train NMT with BLEU as the reward, in order to directly improve the metric. However, it was reported that the gain in BLEU do… ▽ More

    Submitted 15 April, 2021; originally announced April 2021.

  21. arXiv:2103.05088  [pdf, other]

    cs.SE cs.CR

    Structuring a Comprehensive Software Security Course Around the OWASP Application Security Verification Standard

    Authors: Sarah Elder, Nusrat Zahan, Val Kozarev, Rui Shu, Tim Menzies, Laurie Williams

    Abstract: Lack of security expertise among software practitioners is a problem with many implications. First, there is a deficit of security professionals to meet current needs. Additionally, even practitioners who do not plan to work in security may benefit from increased understanding of security. The goal of this paper is to aid software engineering educators in designing a comprehensive software securit… ▽ More

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: 10 pages, 5 figures, 1 table, submitted to International Conference on Software Engineering: Joint Track on Software Engineering Education and Training (ICSE-JSEET)

    ACM Class: K.3.0; D.2.0; K.6.5

  22. arXiv:2102.11495  [pdf, other]

    cs.LG

    Anytime Sampling for Autoregressive Models via Ordered Autoencoding

    Authors: Yilun Xu, Yang Song, Sahaj Garg, Linyuan Gong, Rui Shu, Aditya Grover, Stefano Ermon

    Abstract: Autoregressive models are widely used for tasks such as image and audio generation. The sampling process of these models, however, does not allow interruptions and cannot adapt to real-time computational resources. This challenge impedes the deployment of powerful autoregressive models, which involve a slow sampling process that is sequential in nature and typically scales linearly with respect to… ▽ More

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: Accepted by ICLR 2021

  23. arXiv:2102.02977  [pdf, other]

    cs.CL

    GraphPlan: Story Generation by Planning with Event Graph

    Authors: Hong Chen, Raphael Shu, Hiroya Takamura, Hideki Nakayama

    Abstract: Story generation is a task that aims to automatically produce multiple sentences to make up a meaningful story. This task is challenging because it requires high-level understanding of semantic meaning of sentences and causality of story events. Naive sequence-to-sequence models generally fail to acquire such knowledge, as the logical correctness can hardly be guaranteed in a text generation model… ▽ More

    Submitted 4 February, 2021; originally announced February 2021.

  24. arXiv:2011.12720  [pdf, other]

    cs.CR cs.LG

    Omni: Automated Ensemble with Unexpected Models against Adversarial Evasion Attack

    Authors: Rui Shu, Tianpei Xia, Laurie Williams, Tim Menzies

    Abstract: Background: Machine learning-based security detection models have become prevalent in modern malware and intrusion detection systems. However, previous studies show that such models are susceptible to adversarial evasion attacks. In this type of attack, inputs (i.e., adversarial examples) are specially crafted by intelligent malicious adversaries, with the aim of being misclassified by existing st… ▽ More

    Submitted 12 October, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: Submitted to EMSE

  25. arXiv:2009.07177  [pdf, other]

    cs.CL

    Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation

    Authors: Jason Lee, Raphael Shu, Kyunghyun Cho

    Abstract: We propose an efficient inference procedure for non-autoregressive machine translation that iteratively refines translation purely in the continuous space. Given a continuous latent variable model for machine translation (Shu et al., 2020), we train an inference network to approximate the gradient of the marginal log probability of the target sentence, using only the latent variable as input. This… ▽ More

    Submitted 15 September, 2020; originally announced September 2020.

    Comments: Accepted to EMNLP 2020

  26. arXiv:2006.07240  [pdf, other]

    cs.SE

    Predicting Health Indicators for Open Source Projects (using Hyperparameter Optimization)

    Authors: Tianpei Xia, Wei Fu, Rui Shu, Rishabh Agrawal, Tim Menzies

    Abstract: Software developed on public platform is a source of data that can be used to make predictions about those projects. While the individual developing activity may be random and hard to predict, the developing behavior on project level can be predicted with good accuracy when large groups of developers work together on software projects. To demonstrate this, we use 64,181 months of data from 1,159… ▽ More

    Submitted 17 March, 2022; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: Accepted to EMSE 2022

  27. arXiv:2003.01086  [pdf, other]

    cs.LG eess.SY stat.ML

    Predictive Coding for Locally-Linear Control

    Authors: Rui Shu, Tung Nguyen, Yinlam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung H. Bui

    Abstract: High-dimensional observations and unknown dynamics are major challenges when applying optimal control to many real-world decision making tasks. The Learning Controllable Embedding (LCE) framework addresses these challenges by embedding the observations into a lower dimensional latent space, estimating the latent dynamics, and then performing control directly in the latent space. To ensure the lear… ▽ More

    Submitted 2 March, 2020; originally announced March 2020.

  28. arXiv:1912.04189  [pdf, other]

    cs.SE

    Sequential Model Optimization for Software Process Control

    Authors: Tianpei Xia, Rui Shu, Xipeng Shen, Tim Menzies

    Abstract: Many methods have been proposed to estimate how much effort is required to build and maintain software. Much of that research assumes a "classic" waterfall-based approach rather than contemporary projects (where the developing process may be more iterative than linear in nature). Also, much of that work tries to recommend a single method-- an approach that makes the dubious assumption that one m… ▽ More

    Submitted 17 February, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

  29. arXiv:1911.02476  [pdf, other]

    cs.SE

    How to Better Distinguish Security Bug Reports (using Dual Hyperparameter Optimization)

    Authors: Rui Shu, Tianpei Xia, Jianfeng Chen, Laurie Williams, Tim Menzies

    Abstract: Background: In order that the general public is not vulnerable to hackers, security bug reports need to be handled by small groups of engineers before being widely discussed. But learning how to distinguish the security bug reports from other bug reports is challenging since they may occur rarely. Data mining methods that can find such scarce targets require extensive optimization effort. Goal:… ▽ More

    Submitted 17 March, 2021; v1 submitted 4 November, 2019; originally announced November 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1905.06872

  30. arXiv:1910.12008  [pdf, other]

    cs.LG cs.CV stat.ML

    Fair Generative Modeling via Weak Supervision

    Authors: Kristy Choi, Aditya Grover, Trisha Singh, Rui Shu, Stefano Ermon

    Abstract: Real-world datasets are often biased with respect to key demographic factors such as race and gender. Due to the latent nature of the underlying factors, detecting and mitigating bias is especially challenging for unsupervised machine learning. We present a weakly supervised algorithm for overcoming dataset bias for deep generative models. Our approach requires access to an additional small, unlab… ▽ More

    Submitted 30 June, 2020; v1 submitted 26 October, 2019; originally announced October 2019.

    Comments: First two authors contributed equally

  31. arXiv:1910.09772  [pdf, other]

    cs.LG stat.ML

    Weakly Supervised Disentanglement with Guarantees

    Authors: Rui Shu, Yining Chen, Abhishek Kumar, Stefano Ermon, Ben Poole

    Abstract: Learning disentangled representations that correspond to factors of variation in real-world data is critical to interpretable and human-controllable machine learning. Recently, concerns about the viability of learning disentangled representations in a purely unsupervised manner have spurred a shift toward the incorporation of weak supervision. However, there is currently no formalism that identifie… ▽ More

    Submitted 10 April, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: ICLR 2020

  32. arXiv:1909.01506  [pdf, other]

    cs.LG stat.ML

    Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control

    Authors: Nir Levine, Yinlam Chow, Rui Shu, Ang Li, Mohammad Ghavamzadeh, Hung Bui

    Abstract: Many real-world sequential decision-making problems can be formulated as optimal control with high-dimensional observations and unknown dynamics. A promising approach is to embed the high-dimensional observations into a lower-dimensional latent representation space, estimate the latent dynamics model, then utilize this model for control in the latent space. An important open question is how to lea… ▽ More

    Submitted 10 February, 2020; v1 submitted 3 September, 2019; originally announced September 2019.

  33. arXiv:1908.07181  [pdf, other]

    cs.CL cs.LG

    Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference Using a Delta Posterior

    Authors: Raphael Shu, Jason Lee, Hideki Nakayama, Kyunghyun Cho

    Abstract: Although neural machine translation models reached high translation quality, the autoregressive nature makes inference difficult to parallelize and leads to high translation latency. Inspired by recent refinement-based approaches, we propose LaNMT, a latent-variable non-autoregressive model with continuous latent variables and deterministic inference procedure. In contrast to existing approaches,… ▽ More

    Submitted 21 November, 2019; v1 submitted 20 August, 2019; originally announced August 2019.

    Comments: This paper was accepted to AAAI 2020, the copyright is transferred to AAAI

  34. arXiv:1905.12892  [pdf, other]

    cs.LG cs.NE stat.ML

    AlignFlow: Cycle Consistent Learning from Multiple Domains via Normalizing Flows

    Authors: Aditya Grover, Christopher Chute, Rui Shu, Zhangjie Cao, Stefano Ermon

    Abstract: Given datasets from multiple domains, a key challenge is to efficiently exploit these data sources for modeling a target domain. Variants of this problem have been studied in many contexts, such as cross-domain translation and domain adaptation. We propose AlignFlow, a generative modeling framework that models each domain via a normalizing flow. The use of normalizing flows allows for a) flexibili… ▽ More

    Submitted 21 December, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: AAAI 2020

  35. arXiv:1905.06872  [pdf, other]

    cs.SE

    Better Security Bug Report Classification via Hyperparameter Optimization

    Authors: Rui Shu, Tianpei Xia, Laurie Williams, Tim Menzies

    Abstract: When security bugs are detected, they should be (a) discussed privately by security software engineers; and (b) not mentioned to the general public until security patches are available. Software engineers usually report bugs to bug tracking system, and label them as security bug reports (SBRs) or not-security bug reports (NSBRs), while SBRs have a higher priority to be fixed before exploited by at… ▽ More

    Submitted 16 May, 2019; originally announced May 2019.

    Comments: 12 pages, 1 figure, submitted to 34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019)

  36. arXiv:1902.10294  [pdf, other]

    stat.ML cs.LG

    Training Variational Autoencoders with Buffered Stochastic Variational Inference

    Authors: Rui Shu, Hung H. Bui, Jay Whang, Stefano Ermon

    Abstract: The recognition network in deep latent variable models such as variational autoencoders (VAEs) relies on amortized inference for efficient posterior approximation that can scale up to large datasets. However, this technique has also been demonstrated to select suboptimal variational parameters, often resulting in considerable additional error called the amortization gap. To close the amortization… ▽ More

    Submitted 26 February, 2019; originally announced February 2019.

    Comments: AISTATS 2019

  37. arXiv:1810.09309  [pdf, other]

    cs.CL cs.LG

    Real-time Neural-based Input Method

    Authors: Jiali Yao, Raphael Shu, Xinjian Li, Katsutoshi Ohtsuki, Hideki Nakayama

    Abstract: The input method is an essential service on every mobile and desktop device that provides text suggestions. It converts sequential keyboard inputs to the characters in its target language, which is indispensable for Japanese and Chinese users. Due to critical resource constraints and limited network bandwidth of the target devices, applying neural models to input method is not well explored. In t… ▽ More

    Submitted 19 October, 2018; originally announced October 2018.

  38. arXiv:1808.04525  [pdf, other]

    cs.CL

    Discrete Structural Planning for Neural Machine Translation

    Authors: Raphael Shu, Hideki Nakayama

    Abstract: Structural planning is important for producing long sentences, which is a missing part in current language generation models. In this work, we add a planning phase in neural machine translation to control the coarse structure of output sentences. The model first generates some planner codes, then predicts real output words conditioned on them. The codes are learned to capture the coarse structure… ▽ More

    Submitted 14 August, 2018; originally announced August 2018.

  39. arXiv:1805.08913  [pdf, other]

    stat.ML cs.AI cs.LG

    Amortized Inference Regularization

    Authors: Rui Shu, Hung H. Bui, Shengjia Zhao, Mykel J. Kochenderfer, Stefano Ermon

    Abstract: The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests to prefer an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly-expressive inference model can be detrimental to the test set performance of both the amortized posterior app… ▽ More

    Submitted 9 January, 2019; v1 submitted 22 May, 2018; originally announced May 2018.

    Comments: NeurIPS 2018

  40. arXiv:1805.07894  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV stat.ML

    Constructing Unrestricted Adversarial Examples with Generative Models

    Authors: Yang Song, Rui Shu, Nate Kushman, Stefano Ermon

    Abstract: Adversarial examples are typically constructed by perturbing an existing data point within a small matrix norm, and current defense methods are focused on guarding against this type of attack. In this paper, we propose unrestricted adversarial examples, a new threat model where the attackers are not restricted to small norm-bounded perturbations. Different from perturbation-based attacks, we propo… ▽ More

    Submitted 2 December, 2018; v1 submitted 21 May, 2018; originally announced May 2018.

    Comments: Neural Information Processing Systems (NeurIPS 2018)

  41. arXiv:1802.08735  [pdf, other]

    stat.ML cs.CV cs.LG

    A DIRT-T Approach to Unsupervised Domain Adaptation

    Authors: Rui Shu, Hung H. Bui, Hirokazu Narui, Stefano Ermon

    Abstract: Domain adaptation refers to the problem of leveraging labeled data in a source domain to learn an accurate model in a target domain where labels are scarce or unavailable. A recent approach for finding a common representation of the two domains is via domain adversarial training (Ganin & Lempitsky, 2015), which attempts to induce a feature extractor that matches the source and target feature distr… ▽ More

    Submitted 19 March, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: ICLR 2018

  42. arXiv:1711.01068  [pdf, other]

    cs.CL

    Compressing Word Embeddings via Deep Compositional Code Learning

    Authors: Raphael Shu, Hideki Nakayama

    Abstract: Natural language processing (NLP) models often require a massive number of parameters for word embeddings, resulting in a large storage or memory footprint. Deploying neural NLP models to mobile devices requires compressing the word embeddings without any significant sacrifices in performance. For this purpose, we propose to construct the embeddings with few basis vectors. For each word, the compo… ▽ More

    Submitted 17 November, 2017; v1 submitted 3 November, 2017; originally announced November 2017.

  43. arXiv:1710.05373  [pdf, other]

    cs.LG

    Robust Locally-Linear Controllable Embedding

    Authors: Ershad Banijamali, Rui Shu, Mohammad Ghavamzadeh, Hung Bui, Ali Ghodsi

    Abstract: Embed-to-control (E2C) is a model for solving high-dimensional optimal control problems by combining variational auto-encoders with locally-optimal controllers. However, the E2C model suffers from two major drawbacks: 1) its objective function does not correspond to the likelihood of the data sequence and 2) the variational encoder used for embedding typically has large variational approximation e… ▽ More

    Submitted 21 February, 2018; v1 submitted 15 October, 2017; originally announced October 2017.

    Comments: 13 pages

  44. arXiv:1707.01830  [pdf, other]

    cs.CL

    Single-Queue Decoding for Neural Machine Translation

    Authors: Raphael Shu, Hideki Nakayama

    Abstract: Neural machine translation models rely on the beam search algorithm for decoding. In practice, we found that the quality of hypotheses in the search space is negatively affected owing to the fixed beam size. To mitigate this problem, we store all hypotheses in a single priority queue and use a universal score function for hypothesis selection. The proposed algorithm is more flexible as the discard…

    Submitted 8 July, 2017; v1 submitted 6 July, 2017; originally announced July 2017.

  45. arXiv:1704.03169  [pdf, other]

    cs.CL

    Later-stage Minimum Bayes-Risk Decoding for Neural Machine Translation

    Authors: Raphael Shu, Hideki Nakayama

    Abstract: For a long time, sequence generation models have relied on the beam search algorithm to generate output sequences. However, the correctness of beam search degrades when a model is over-confident about a suboptimal prediction. In this paper, we propose to perform minimum Bayes-risk (MBR) decoding for some extra steps at a later stage. In order to speed up MBR decoding, we compute the Bayes ris…

    Submitted 8 June, 2017; v1 submitted 11 April, 2017; originally announced April 2017.

  46. arXiv:1612.06043  [pdf, other]

    cs.CL cs.AI

    An Empirical Study of Adequate Vision Span for Attention-Based Neural Machine Translation

    Authors: Raphael Shu, Hideki Nakayama

    Abstract: Recently, the attention mechanism has played a key role in achieving high performance in Neural Machine Translation models. However, as it computes a score function for the encoder states in all positions at each decoding step, the attention model greatly increases the computational complexity. In this paper, we investigate the adequate vision span of attention models in the context of machine translati…

    Submitted 8 June, 2017; v1 submitted 18 December, 2016; originally announced December 2016.

  47. arXiv:1611.08568  [pdf, other]

    stat.ML cs.LG

    Bottleneck Conditional Density Estimation

    Authors: Rui Shu, Hung H. Bui, Mohammad Ghavamzadeh

    Abstract: We introduce a new framework for training deep generative models for high-dimensional conditional density estimation. The Bottleneck Conditional Density Estimator (BCDE) is a variant of the conditional variational autoencoder (CVAE) that employs layer(s) of stochastic variables as the bottleneck between the input $x$ and target $y$, where both are high-dimensional. Crucially, we propose a new hybr…

    Submitted 30 June, 2017; v1 submitted 25 November, 2016; originally announced November 2016.

  48. arXiv:1605.07732  [pdf, other]

    cs.NI

    Isolating Mice and Elephant in Data Centers

    Authors: Wenxue Cheng, Fengyuan Ren, Wanchun Jiang, Kun Qian, Tong Zhang, Ran Shu

    Abstract: Data center traffic is composed of numerous latency-sensitive "mice" flows, each of which consists of only several packets, and a few throughput-sensitive "elephant" flows, which occupy more than 80% of the overall load. Generally, the short-lived "mice" flows induce transient congestion and the long-lived "elephant" flows cause persistent congestion. Network congestion is a major performance inhibit…

    Submitted 31 May, 2016; v1 submitted 25 May, 2016; originally announced May 2016.

  49. arXiv:1604.07621  [pdf, other]

    cs.NI

    Micro-burst in Data Centers: Observations, Implications, and Applications

    Authors: Danfeng Shan, Fengyuan Ren, Peng Cheng, Ran Shu

    Abstract: Micro-burst traffic is not uncommon in data centers. It can cause packet dropping, which results in serious performance degradation (e.g., the Incast problem). However, current solutions that attempt to suppress micro-burst traffic are extrinsic and ad hoc, since they lack a comprehensive and essential understanding of micro-burst's root cause and dynamic behavior. On the other hand, traditional stu…

    Submitted 26 April, 2016; originally announced April 2016.

    Comments: 14 pages, 18 figures

  50. arXiv:1405.0616  [pdf, other]

    cs.CL cs.DL stat.ML

    Automated Attribution and Intertextual Analysis

    Authors: James Brofos, Ajay Kannan, Rui Shu

    Abstract: In this work, we employ quantitative methods from the realm of statistics and machine learning to develop novel methodologies for author attribution and textual analysis. In particular, we develop techniques and software suitable for applications to Classical study, and we illustrate the efficacy of our approach in several interesting open questions in the field. We apply our numerical analysis te…

    Submitted 3 May, 2014; originally announced May 2014.

    Comments: 10 pages, 4 tables, 4 figures