Showing 1–8 of 8 results for author: Nathan, V

Searching in archive cs.
  1. arXiv:2403.02286

    cs.DB

    Stage: Query Execution Time Prediction in Amazon Redshift

    Authors: Ziniu Wu, Ryan Marcus, Zhengchun Liu, Parimarjan Negi, Vikram Nathan, Pascal Pfeil, Gaurav Saxena, Mohammad Rahman, Balakrishnan Narayanaswamy, Tim Kraska

    Abstract: Query performance (e.g., execution time) prediction is a critical component of modern DBMSes. As a pioneering cloud data warehouse, Amazon Redshift relies on an accurate execution time prediction for many downstream tasks, ranging from high-level optimizations, such as automatically creating materialized views, to low-level tasks on the critical path of query execution, such as admission, scheduli…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 15 pages

  2. arXiv:2312.15922

    cs.CL

    Towards Probing Contact Center Large Language Models

    Authors: Varun Nathan, Ayush Kumar, Digvijay Ingle, Jithendra Vepa

    Abstract: Fine-tuning large language models (LLMs) with domain-specific instructions has emerged as an effective method to enhance their domain-specific understanding. Yet, there is limited work that examines the core characteristics acquired during this process. In this study, we benchmark the fundamental characteristics learned by contact-center (CC) specific instruction fine-tuned LLMs with out-of-the-bo…

    Submitted 26 December, 2023; originally announced December 2023.

  3. arXiv:2012.06683

    cs.DB cs.IR

    Cortex: Harnessing Correlations to Boost Query Performance

    Authors: Vikram Nathan, Jialin Ding, Tim Kraska, Mohammad Alizadeh

    Abstract: Databases employ indexes to filter out irrelevant records, which reduces scan overhead and speeds up query execution. However, this optimization is only available to queries that filter on the indexed attribute. To extend these speedups to queries on other attributes, database systems have turned to secondary and multi-dimensional indexes. Unfortunately, these approaches are restrictive: secondary…

    Submitted 11 December, 2020; originally announced December 2020.

    Comments: 13 pages, including references. Under submission

  4. arXiv:2006.13282

    cs.DB cs.LG

    Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads

    Authors: Jialin Ding, Vikram Nathan, Mohammad Alizadeh, Tim Kraska

    Abstract: Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse. Techniques to accelerate the execution of filter expressions include clustered indexes, specialized sort orders (e.g., Z-order), multi-dimensional indexes, and, for high selectivity queries, secondary indexes. However, these schemes are hard to tune and their performance is inconsistent. Rec…

    Submitted 23 June, 2020; originally announced June 2020.

  5. arXiv:1912.01668

    cs.DB cs.DS cs.LG

    Learning Multi-dimensional Indexes

    Authors: Vikram Nathan, Jialin Ding, Mohammad Alizadeh, Tim Kraska

    Abstract: Scanning and filtering over multi-dimensional tables are key operations in modern analytical database engines. To optimize the performance of these operations, databases often create clustered indexes over a single dimension or multi-dimensional indexes such as R-trees, or use complex sort orders (e.g., Z-ordering). However, these schemes are often hard to tune and their performance is inconsisten…

    Submitted 3 December, 2019; originally announced December 2019.

  6. arXiv:1910.04728

    cs.DB cs.DS cs.LG

    LISA: Towards Learned DNA Sequence Search

    Authors: Darryl Ho, Jialin Ding, Sanchit Misra, Nesime Tatbul, Vikram Nathan, Vasimuddin Md, Tim Kraska

    Abstract: Next-generation sequencing (NGS) technologies have enabled affordable sequencing of billions of short DNA fragments at high throughput, paving the way for population-scale genomics. Genomics data analytics at this scale requires overcoming performance bottlenecks, such as searching for short DNA sequences over long reference sequences. In this paper, we introduce LISA (Learned Indexes for Sequence…

    Submitted 10 October, 2019; originally announced October 2019.

  7. arXiv:1412.2485

    cs.LG

    Accurate Streaming Support Vector Machines

    Authors: Vikram Nathan, Sharath Raghvendra

    Abstract: A widely-used tool for binary classification is the Support Vector Machine (SVM), a supervised learning technique that finds the "maximum margin" linear separator between the two classes. While SVMs have been well studied in the batch (offline) setting, there is considerably less work on the streaming (online) setting, which requires only a single pass over the data using sub-linear space. Existin…

    Submitted 8 December, 2014; originally announced December 2014.

    Comments: 2 figures, 8 pages

  8. arXiv:1410.5169

    cs.CC

    Hardness of Peeling with Stashes

    Authors: Michael Mitzenmacher, Vikram Nathan

    Abstract: The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than k and their adjacent edges are removed until no vertices of degree less than k are left. Often the question is whether the remaining hypergraph, the k-core, is empty or not. In some settings, it may be possible to remove either vertices or edges from the…

    Submitted 1 June, 2016; v1 submitted 20 October, 2014; originally announced October 2014.

    Comments: 12 pages (including title/abstract), 6 JPEG black/white figures