Showing 1–22 of 22 results for author: Tirthapura, S

  1. arXiv:2003.06508  [pdf, other]

    cs.LG stat.ML

    DriftSurf: A Risk-competitive Learning Algorithm under Concept Drift

    Authors: Ashraf Tahmasbi, Ellango Jothimurugesan, Srikanta Tirthapura, Phillip B. Gibbons

    Abstract: When learning from streaming data, a change in the data distribution, also known as concept drift, can render a previously-learned model inaccurate and require training a new model. We present an adaptive learning algorithm that extends previous drift-detection-based methods by incorporating drift detection into a broader stable-state/reactive-state process. The advantage of our approach is that w…

    Submitted 2 August, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

    Comments: 32 pages, 12 figures. Submitted to NeurIPS 2020. Replaced to include revision of Lemma 2 and additional experimental results

    ACM Class: I.2.6

  2. arXiv:2001.11433  [pdf, other]

    cs.DC

    Shared-Memory Parallel Maximal Clique Enumeration from Static and Dynamic Graphs

    Authors: Apurba Das, Seyed-Vahid Sanei-Mehri, Srikanta Tirthapura

    Abstract: Maximal Clique Enumeration (MCE) is a fundamental graph mining problem, and is useful as a primitive in identifying dense structures in a graph. Due to the high computational cost of MCE, parallel methods are imperative for dealing with large graphs. We present shared-memory parallel algorithms for MCE, with the following properties: (1) the parallel algorithms are provably work-efficient relative…

    Submitted 30 January, 2020; originally announced January 2020.

    Comments: This paper is accepted in ACM Transactions on Parallel Computing (TOPC). A preliminary version [arXiv:1807.09417] of this work appeared in the proceedings of the 25th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 2018

  3. arXiv:1909.02629  [pdf, other]

    cs.DB cs.DS

    Random Sampling for Group-By Queries

    Authors: Trong Duc Nguyen, Ming-Hung Shih, Sai Sree Parvathaneni, Bojian Xu, Divesh Srivastava, Srikanta Tirthapura

    Abstract: Random sampling has been widely used in approximate query processing on large databases, due to its potential to significantly reduce resource usage and response times, at the cost of a small approximation error. We consider random sampling for answering the ubiquitous class of group-by queries, which first group data according to one or more attributes, and then aggregate within each group after…

    Submitted 12 September, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

  4. arXiv:1906.04120  [pdf, other]

    cs.DS

    Parallel Streaming Random Sampling

    Authors: Kanat Tangwongsan, Srikanta Tirthapura

    Abstract: This paper investigates parallel random sampling from a potentially-unending data stream whose elements are revealed in a series of element sequences (minibatches). While sampling from a stream was extensively studied sequentially, not much has been explored in the parallel context, with prior parallel random-sampling algorithms focusing on the static batch model. We present parallel algorithms fo…

    Submitted 10 June, 2019; originally announced June 2019.

  5. arXiv:1904.04126  [pdf, ps, other]

    cs.DS

    Weighted Reservoir Sampling from Distributed Streams

    Authors: Rajesh Jayaram, Gokarna Sharma, Srikanta Tirthapura, David P. Woodruff

    Abstract: We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights are equal, is well studied, and admits tight upper and lower bounds on message complexity. For weighted sampling with replacement, there is a simple reduction t…

    Submitted 8 April, 2019; originally announced April 2019.

    Comments: To appear in PODS 2019

  6. arXiv:1903.12065  [pdf, ps, other]

    cs.DC

    Optimal Random Sampling from Distributed Streams Revisited

    Authors: Srikanta Tirthapura, David P. Woodruff

    Abstract: We give an improved algorithm for drawing a random sample from a large data stream when the input elements are distributed across multiple sites which communicate via a central coordinator. At any point in time the set of elements held by the coordinator represent a uniform random sample from the set of all the elements observed so far. When compared with prior work, our algorithms asymptotically…

    Submitted 28 March, 2019; originally announced March 2019.

    Comments: This writeup is a revised version of a paper with the same title and authors, which appeared in the Proceedings of the International Conference on Distributed Computing (DISC) 2011

    Journal ref: DISC 2011: 283-297

  7. FLEET: Butterfly Estimation from a Bipartite Graph Stream

    Authors: Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet Erdem Sariyuce, Srikanta Tirthapura

    Abstract: We consider space-efficient single-pass estimation of the number of butterflies, a fundamental bipartite graph motif, from a massive bipartite graph stream where each edge represents a connection between entities in two different partitions. We present a space lower bound for any streaming algorithm that can estimate the number of butterflies accurately, as well as FLEET, a suite of algorithms for…

    Submitted 28 August, 2019; v1 submitted 8 December, 2018; originally announced December 2018.

    Comments: This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a Bipartite Graph Stream". The 28th ACM International Conference on Information and Knowledge Management

  8. Enumerating Top-k Quasi-Cliques

    Authors: Seyed-Vahid Sanei-Mehri, Apurba Das, Srikanta Tirthapura

    Abstract: Quasi-cliques are dense incomplete subgraphs of a graph that generalize the notion of cliques. Enumerating quasi-cliques from a graph is a robust way to detect densely connected structures with applications to bio-informatics and social network analysis. However, enumerating quasi-cliques in a graph is a challenging problem, even harder than the problem of enumerating cliques. We consider the enum…

    Submitted 28 August, 2018; originally announced August 2018.

    Comments: 10 pages

    Journal ref: 2018 IEEE International Conference on Big Data (Big Data)

  9. arXiv:1807.09417  [pdf, other]

    cs.DS

    Shared-Memory Parallel Maximal Clique Enumeration

    Authors: Apurba Das, Seyed-Vahid Sanei-Mehri, Srikanta Tirthapura

    Abstract: We present shared-memory parallel methods for Maximal Clique Enumeration (MCE) from a graph. MCE is a fundamental and well-studied graph analytics task, and is a widely used primitive for identifying dense structures in a graph. Due to its computationally intensive nature, parallel methods are imperative for dealing with large graphs. However, surprisingly, there do not yet exist scalable and para…

    Submitted 24 July, 2018; originally announced July 2018.

    Comments: 10 pages, 3 figures, proceedings of the 25th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 2018

  10. arXiv:1801.09039  [pdf, other]

    cs.DB

    Variance-Optimal Offline and Streaming Stratified Random Sampling

    Authors: Trong Duc Nguyen, Ming-Hung Shih, Divesh Srivastava, Srikanta Tirthapura, Bojian Xu

    Abstract: Stratified random sampling (SRS) is a fundamental sampling technique that provides accurate estimates for aggregate queries using a small size sample, and has been used widely for approximate query processing. A key question in SRS is how to partition a target sample size among different strata. While Neyman allocation provides a solution that minimizes the variance of an estimate using this sampl…

    Submitted 20 February, 2018; v1 submitted 27 January, 2018; originally announced January 2018.

  11. arXiv:1801.07399  [pdf, other]

    cs.CG cs.DC cs.DS

    Onion Curve: A Space Filling Curve with Near-Optimal Clustering

    Authors: Pan Xu, Cuong Nguyen, Srikanta Tirthapura

    Abstract: Space filling curves (SFCs) are widely used in the design of indexes for spatial and temporal data. Clustering is a key metric for an SFC, that measures how well the curve preserves locality in moving from higher dimensions to a single dimension. We present the {\em onion curve}, an SFC whose clustering performance is provably close to optimal for the cube and near-cube shaped query sets, irrespec…

    Submitted 3 June, 2018; v1 submitted 23 January, 2018; originally announced January 2018.

    Comments: The short version is published in ICDE 18

  12. arXiv:1801.00338  [pdf, other]

    cs.DM

    Butterfly Counting in Bipartite Networks

    Authors: Seyed-Vahid Sanei-Mehri, Ahmet Erdem Sariyuce, Srikanta Tirthapura

    Abstract: We consider the problem of counting motifs in bipartite affiliation networks, such as author-paper, user-product, and actor-movie relations. We focus on counting the number of occurrences of a "butterfly", a complete $2 \times 2$ biclique, the simplest cohesive higher-order structure in a bipartite graph. Our main contribution is a suite of randomized algorithms that can quickly approximate the nu…

    Submitted 15 March, 2018; v1 submitted 31 December, 2017; originally announced January 2018.

    Comments: 28 pages, 5 tables, 6 figures

  13. arXiv:1710.02103  [pdf, other]

    cs.AI cs.LG stat.ML

    Learning Graphical Models from a Distributed Stream

    Authors: Yu Zhang, Srikanta Tirthapura, Graham Cormode

    Abstract: A current challenge for data management systems is to support the construction and maintenance of machine learning models over data that is large, multi-dimensional, and evolving. While systems that could support these tasks are emerging, the need to scale to distributed, streaming data requires new models and algorithms. In this setting, as well as computational scalability and model accuracy, we…

    Submitted 5 October, 2017; originally announced October 2017.

  14. arXiv:1707.08272  [pdf, other]

    cs.DS cs.DB

    A Change-Sensitive Algorithm for Maintaining Maximal Bicliques in a Dynamic Bipartite Graph

    Authors: Apurba Das, Srikanta Tirthapura

    Abstract: We consider the maintenance of maximal bicliques from a dynamic bipartite graph that changes over time due to the addition or deletion of edges. When the set of edges in a graph changes, we are interested in knowing the change in the set of maximal bicliques (the "change"), rather than in knowing the set of maximal bicliques that remain unaffected. The challenge in an efficient algorithm is to enu…

    Submitted 25 July, 2017; originally announced July 2017.

    Comments: 12 pages, 9 figures

  15. arXiv:1701.03826  [pdf, other]

    cs.DS cs.SE

    Streaming k-Means Clustering with Fast Queries

    Authors: Yu Zhang, Kanat Tangwongsan, Srikanta Tirthapura

    Abstract: We present methods for k-means clustering on a stream with a focus on providing fast responses to clustering queries. Compared to the current state-of-the-art, our methods provide substantial improvement in the query time for cluster centers while retaining the desirable properties of provably small approximation error and low space usage. Our algorithms rely on a novel idea of "coreset caching" t…

    Submitted 6 December, 2018; v1 submitted 13 January, 2017; originally announced January 2017.

  16. arXiv:1602.05232  [pdf, other]

    cs.DS cs.DC

    Work-Efficient Parallel and Incremental Graph Connectivity

    Authors: Natcha Simsiri, Kanat Tangwongsan, Srikanta Tirthapura, Kun-Lung Wu

    Abstract: On an evolving graph that is continuously updated by a high-velocity stream of edges, how can one efficiently maintain if two vertices are connected? This is the connectivity problem, a fundamental and widely studied problem on graphs. We present the first shared-memory parallel algorithm for incremental graph connectivity that is both provably work-efficient and has polylogarithmic parallel depth…

    Submitted 16 February, 2016; originally announced February 2016.

    Comments: 18 pages

  17. arXiv:1601.06311  [pdf, other]

    cs.DS cs.DB

    Incremental Maintenance of Maximal Cliques in a Dynamic Graph

    Authors: Apurba Das, Michael Svendsen, Srikanta Tirthapura

    Abstract: We consider the maintenance of the set of all maximal cliques in a dynamic graph that is changing through the addition or deletion of edges. We present nearly tight bounds on the magnitude of change in the set of maximal cliques, as well as the first change-sensitive algorithms for clique maintenance, whose runtime is proportional to the magnitude of the change in the set of maximal cliques. We pr…

    Submitted 17 March, 2018; v1 submitted 23 January, 2016; originally announced January 2016.

    Comments: 18 pages, 8 figures

  18. arXiv:1404.4910  [pdf, ps, other]

    cs.DC

    Enumerating Maximal Bicliques from a Large Graph using MapReduce

    Authors: Arko Provo Mukherjee, Srikanta Tirthapura

    Abstract: We consider the enumeration of maximal bipartite cliques (bicliques) from a large graph, a task central to many practical data mining problems in social network analysis and bioinformatics. We present novel parallel algorithms for the MapReduce platform, and an experimental evaluation using Hadoop MapReduce. Our algorithm is based on clustering the input graph into smaller sized subgraphs, followe…

    Submitted 18 April, 2014; originally announced April 2014.

    Comments: A preliminary version of the paper was accepted at the Proceedings of the 3rd IEEE International Congress on Big Data 2014

  19. arXiv:1310.6780  [pdf, ps, other]

    cs.DS cs.DB

    Mining Maximal Cliques from an Uncertain Graph

    Authors: Arko Provo Mukherjee, Pan Xu, Srikanta Tirthapura

    Abstract: We consider mining dense substructures (maximal cliques) from an uncertain graph, which is a probability distribution on a set of deterministic graphs. For parameter 0 < α < 1, we present a precise definition of an α-maximal clique in an uncertain graph. We present matching upper and lower bounds on the number of α-maximal cliques possible within an uncertain graph. We present an algorithm to enum…

    Submitted 22 October, 2014; v1 submitted 24 October, 2013; originally announced October 2013.

    Comments: ICDE 2015

  20. arXiv:1310.1161  [pdf, ps, other]

    cs.DB

    Identifying Correlated Heavy-Hitters in a Two-Dimensional Data Stream

    Authors: Bibudh Lahiri, Arko Provo Mukherjee, Srikanta Tirthapura

    Abstract: We consider online mining of correlated heavy-hitters from a data stream. Given a stream of two-dimensional data, a correlated aggregate query first extracts a substream by applying a predicate along a primary dimension, and then computes an aggregate along a secondary dimension. Prior work on identifying heavy-hitters in streams has almost exclusively focused on identifying heavy-hitters on a sin…

    Submitted 3 October, 2013; originally announced October 2013.

  21. arXiv:1308.2166  [pdf, other]

    cs.DB cs.DC cs.DS cs.SI

    Parallel Triangle Counting in Massive Streaming Graphs

    Authors: Kanat Tangwongsan, A. Pavan, Srikanta Tirthapura

    Abstract: The number of triangles in a graph is a fundamental metric, used in social network analysis, link classification and recommendation, and more. Driven by these applications and the trend that modern graph datasets are both large and dynamic, we present the design and implementation of a fast and cache-efficient parallel algorithm for estimating the number of triangles in a massive undirected graph…

    Submitted 9 August, 2013; originally announced August 2013.

  22. arXiv:1004.1569   

    cs.DS cs.DB

    A Streaming Approximation Algorithm for Klee's Measure Problem

    Authors: Gokarna Sharma, Costas Busch, Srikanta Tirthapura

    Abstract: The efficient estimation of frequency moments of a data stream in one pass using limited space and time per item is one of the most fundamental problems in data stream processing. An especially important estimation is to find the number of distinct elements in a data stream, which is generally referred to as the zeroth frequency moment and denoted by $F_0$. In this paper, we consider streams of rec…

    Submitted 28 October, 2010; v1 submitted 9 April, 2010; originally announced April 2010.

    Comments: This paper has been withdrawn by the author due to a small technical error in Algorithms 3 and 4