Kartik Kulkarni

Redwood City, California, United States
4K followers · 500+ connections

Experience

  • Stealth

    Redwood City, California, United States

  • Redwood City, CA, USA

  • New York, United States

  • Redwood City

  • Redwood City, California

  • Bangalore, India

  • Bangalore/Hubli

Education

  • Carnegie Mellon University

    Parallel Data Lab (PDL) research alumnus.
    Nvidia prize for parallel computing project.

  • B.V.Bhoomaraddi College of Engineering

    Best Outgoing Student - 2009 batch;
    Gold Medal, sponsored by Tata Consultancy Services

Volunteer Experience

  • Plenary Speaker

    IEEE Global Humanitarian Technology Conference (GHTC)

    1 month

    Science and Technology

    Plenary Session on Building a Locally-focused Community of Engineers for Global Development; SIGHT Workshop Presentation on Deep Dive into Creating and Sustaining Local Impact.
    Link: https://tinyurl.com/2s35d4vw (Shared lessons learned from our community interventions and how this community is becoming relevant as a partner to global initiatives and programs that aim to accomplish specific goals/missions, such as People-Centered Internet, IEEE Smart Village, and others.)

Publications

  • Distributed Architecture of Oracle Database In-memory

    VLDB 2015: 41st International Conference on Very Large Data Bases, Kohala Coast, Hawai'i, USA

    The Oracle RDBMS In-memory Option (DBIM) is an industry-first distributed dual-format architecture that allows a database object to be stored in main memory in a columnar format highly optimized to break performance barriers in analytic query workloads, while simultaneously maintaining transactional consistency with the corresponding OLTP-optimized row-major format persisted in storage and accessed through the database buffer cache.

    In this paper, we present the distributed, highly available, and fault-tolerant architecture of the Oracle DBIM that enables the RDBMS to transparently scale out in a database cluster, both in terms of memory capacity and query processing throughput. We believe that the architecture is unique among all mainstream in-memory databases. It allows complete application-transparent, extremely scalable and automated distribution of Oracle RDBMS objects in-memory across a cluster, as well as across multiple NUMA nodes within a single server. It seamlessly provides distribution awareness to the Oracle SQL execution framework through affinitized fault-tolerant parallel execution within and across servers, without explicit optimizer plan changes or query rewrites.

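    The dual-format design described above can be illustrated with a minimal sketch (all names hypothetical; this is not Oracle's implementation): a row store remains the transactional source of truth, a columnar copy serves scans, and DML invalidates the affected columnar entries so analytic reads fall back to the row store until a background repopulation catches up.

        # Minimal dual-format sketch: the row store is the source of truth;
        # a columnar copy accelerates scans and is invalidated by DML.
        class DualFormatTable:
            def __init__(self, columns):
                self.columns = columns
                self.rows = {}                              # rowid -> row-major tuple
                self.col_store = {c: {} for c in columns}   # column -> {rowid: value}
                self.invalid = set()                        # rowids with stale columnar copies

            def dml_upsert(self, rowid, values):
                self.rows[rowid] = values                   # OLTP path: update row store
                self.invalid.add(rowid)                     # columnar copy is now stale

            def repopulate(self):
                for rowid in self.invalid:                  # background refresh of columns
                    for c, v in zip(self.columns, self.rows[rowid]):
                        self.col_store[c][rowid] = v
                self.invalid.clear()

            def scan_column(self, column):
                # Analytic path: read from the column store, falling back to
                # the row store for rows whose columnar copy is invalid.
                i = self.columns.index(column)
                for rowid in self.rows:
                    if rowid in self.invalid:
                        yield self.rows[rowid][i]
                    else:
                        yield self.col_store[column][rowid]

        t = DualFormatTable(["id", "amount"])
        t.dml_upsert(1, (1, 100)); t.repopulate()
        t.dml_upsert(2, (2, 250))                           # not yet repopulated
        print(sum(t.scan_column("amount")))                 # 350, consistent with row store
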
  • Local Content Development Framework and Methodology for Knowledge and Skill Development: IEEE Madras Section SIGHT Case Study

    IEEE Xplore / IEEE Global Humanitarian Technology Conference (GHTC) 2014

  • Real-World proficiency augmentation among learners through merger of Project Based Learning (PBL) and Student Social Responsibility (SSR)

    IEEE Xplore / IEEE Global Humanitarian Technology Conference (GHTC) 2014, Silicon Valley, USA

  • Building a High-Performance Metadata Service by Reusing Scalable I/O Bandwidth

    SOSP 2013: The 24th ACM Symposium on Operating Systems Principles, Farmington, Pennsylvania, USA

    Modern parallel and cluster file systems provide highly scalable I/O bandwidth by enabling highly parallel access to file data. Unfortunately, metadata access does not benefit from parallel data transfer, so metadata performance scaling is less common. To support metadata-intensive workloads, we offer a middleware design that layers on top of existing cluster file systems and adds support for load-balanced, high-performance metadata operations without sacrificing data bandwidth. The core idea is to integrate a distributed indexing mechanism with a metadata-optimized on-disk Log-Structured Merge tree layout. The integration requires several optimizations, including cross-server split operations with minimal data migration and decoupling of the data and metadata paths. To demonstrate the feasibility of our approach, we implemented a prototype middleware layer, GIGA+TableFS, and evaluated it with a Panasas parallel file system. GIGA+TableFS improves metadata performance of PanFS by as much as an order of magnitude, while still performing comparably on data-intensive workloads.

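    The routing half of the middleware idea can be sketched in a few lines (hypothetical names; the real GIGA+ index splits directory partitions incrementally and without synchronization, rather than using a static modulo): metadata operations are spread across metadata servers by hashing the directory entry, while file data I/O continues to go directly to the underlying cluster file system.

        import hashlib

        # Route a metadata op to a metadata server by hashing the dirent.
        # A static modulo keeps the sketch short; GIGA+ itself grows the
        # partition set incrementally as directories get large.
        def mds_for(parent_dir, name, num_servers):
            h = hashlib.sha1(f"{parent_dir}/{name}".encode()).hexdigest()
            return int(h, 16) % num_servers

        # Metadata ops fan out across servers; data I/O bypasses this layer
        # and hits the cluster file system (e.g. PanFS) directly.
        for name in ["a.txt", "b.txt", "c.txt", "d.txt"]:
            print(name, "-> mds", mds_for("/scratch/job42", name, 4))
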
  • Giga+TableFS on PanFS: Scaling Metadata Performance on Cluster File Systems

    Parallel Data Lab, Carnegie Mellon University

    Modern file systems provide scalable performance for large file data management. For metadata management, however, the usual approach is to have a single or a few points of metadata service (MDS). Today, file systems are challenged by unique needs such as managing exponentially growing numbers of files, using the file system as a key-value store, and checkpointing, which are highly metadata-intensive and are usually bottlenecked by centralized MDS schemes.

    To overcome this metadata bottleneck, we evaluate a scalable MDS layer for existing cluster file systems using GIGA+, a high-performance distributed index free of synchronization and serialization, and TableFS, a file system with an embedded NoSQL database using the modern key-value store LevelDB. We take a layered approach to scaling metadata performance that requires no hardware infrastructure upgrade in existing storage clusters. In addition to providing scalable, severalfold-increased metadata performance, avoiding metadata hotspots, and packing small files, our MDS layer adds little or no overhead to the data throughput and resource utilization of the underlying cluster.

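    The TableFS half of the design stores file-system metadata as key-value records in an LSM-tree store (LevelDB in the report). A toy sketch, with a plain dict standing in for the LSM tree and a hypothetical key layout of (parent inode, name):

        # Toy metadata-as-key-value sketch: keys encode (parent inode, name)
        # and values hold inode attributes. Sorting keys at scan time mimics
        # the ordered iteration an LSM tree provides on disk.
        store = {}

        def create(parent_ino, name, attrs):
            store[(parent_ino, name)] = attrs       # one small KV write per create

        def lookup(parent_ino, name):
            return store.get((parent_ino, name))

        def readdir(parent_ino):
            # The ordered key layout clusters a directory's entries, so
            # readdir becomes a contiguous range scan in the LSM tree.
            for (p, name) in sorted(store):
                if p == parent_ino:
                    yield name

        create(1, "etc", {"ino": 2, "mode": 0o755})
        create(1, "bin", {"ino": 3, "mode": 0o755})
        print(list(readdir(1)))                     # ['bin', 'etc']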

Patents

  • Flexible in-memory column store placement

    Filed US20180096010A1

    Techniques are described herein for distributing distinct portions of a database object across volatile memories of selected nodes of a plurality of nodes in a clustered database system. The techniques involve storing a unit-to-service mapping that associates a unit (a database object or portion thereof) with one or more database services. The one or more database services are mapped to one or more nodes. The nodes to which a service is mapped may include nodes in disjoint database systems, so long as those database systems have access to a replica of the unit. The database object is treated as in-memory enabled by nodes that are associated with the service, and as not in-memory enabled by nodes that are not associated with the service.

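    The two-level mapping the abstract describes (unit to services, services to nodes) can be sketched as a pair of lookup tables; all names below are hypothetical:

        # Hypothetical sketch of the two-level mapping: a unit (a database
        # object or a portion of one) maps to services, and services map to
        # nodes; a node treats the unit as in-memory enabled only if it
        # hosts one of those services.
        unit_to_services = {"SALES_2015": {"reporting"}}
        service_to_nodes = {"reporting": {"node1", "node3"}}

        def inmemory_enabled(unit, node):
            services = unit_to_services.get(unit, set())
            return any(node in service_to_nodes.get(s, set()) for s in services)

        print(inmemory_enabled("SALES_2015", "node1"))  # True: hosts a columnar copy
        print(inmemory_enabled("SALES_2015", "node2"))  # False: serves from the row format
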
  • Space management for transactional consistency of in-memory objects on a standby database

    Filed US20180165324A1

    Embodiments store transaction metadata in dedicated pools of allocated memory chunks. Portions of the pools of allocated memory chunks are dedicated to the respective apply slave processes that mine and process change records. Also, the pools of allocated memory chunks are anchored within the structure of a transaction log such that buffering and application of metadata for one transaction does not block required buffering and application of metadata for other transactions. The standby database system pre-processes transaction metadata in preparation for applying the metadata to invalidate appropriate portions of MF data. Further, embodiments divide the work of pre-processing invalidation records among the many apply slave processes that record the invalidation records. A garbage collection algorithm selects memory chunks for collection in reverse order of how the chunks were allocated. Also, a deduplication algorithm ensures that typically only a single invalidation message per block is applied to invalidate MF data.

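    Two of the mechanisms above lend themselves to a short sketch (names hypothetical): deduplicating invalidation records so only one message per block is applied, and releasing memory chunks in reverse order of allocation.

        # (1) Deduplicate invalidations: the first record per block wins,
        #     so a block's MF data is invalidated at most once.
        def dedupe_invalidations(records):               # records: (block_no, txn_id)
            seen, out = set(), []
            for block, txn in records:
                if block not in seen:
                    seen.add(block)
                    out.append((block, txn))
            return out

        # (2) Free chunks in reverse allocation order, as the abstract's
        #     garbage collector does.
        def free_chunks(allocated):
            while allocated:
                print("freeing chunk", allocated.pop())

        print(dedupe_invalidations([(7, "t1"), (9, "t1"), (7, "t2")]))  # block 7 kept once
        free_chunks(["c1", "c2", "c3"])                  # c3, c2, c1
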
  • Distribution of an object in volatile memory across a multi-node cluster

    Issued US 9875259

    Techniques are described herein for distributing distinct portions of a database object across the volatile memories of a plurality of nodes in a clustered database system. The techniques involve establishing a single database server instance located on a node in a multi-node cluster as a load-operation master for a particular data set. The load-operation master determines how the data set may be separated into chunks using a hash function. The load-operation master then broadcasts a small payload of consistency information to the other database servers, so that each database server may independently execute the hash function and load its respectively assigned chunks of data.

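    The key property in this patent is that every server, given the same small broadcast payload, computes the same chunk-to-node assignment independently. Rendezvous hashing has that property and is used below purely as an illustration; the patent does not name a specific hash scheme.

        import hashlib

        def owner(chunk_id, nodes):
            # Pure function of the broadcast payload (chunk id + membership):
            # every node evaluates it locally and agrees on ownership
            # without further coordination.
            return max(nodes, key=lambda n: hashlib.sha1(f"{n}:{chunk_id}".encode()).digest())

        nodes = ["node1", "node2", "node3"]
        for chunk in range(6):
            print("chunk", chunk, "-> loaded by", owner(chunk, nodes))
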
  • Efficient determination of committed changes

    Filed US20180060377A1

    A minimum value (MV) is computed for start timestamps that each correspond to an uncommitted transaction. In an embodiment, the MV is computed for a pluggable database that is open on at least first and second instances of a database. The MV is computed for the first instance as of a first current timestamp (CT). The MV and the first CT are communicated to a second instance that has a second CT. If the first and second CTs are equal, the second instance stores the MV. If the first CT is bigger, the second CT also becomes equal to the first CT. If the first CT is smaller, the MV is discarded, and the first CT becomes equal to the second CT. In an embodiment, if the MV remains unchanged for a predetermined time period, the start timestamp corresponding to the MV is advanced to a current or future timestamp.

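    The merge rule in the abstract is concrete enough to sketch directly as a pure function over (MV, CT) pairs; the structure below is hypothetical:

        def merge_mv(local, incoming):
            """Merge an incoming (MV, CT) pair into local state: equal CTs ->
            store the MV; incoming CT bigger -> adopt the MV and catch up;
            incoming CT smaller -> discard the stale MV (the sender then
            catches up to the local CT)."""
            mv_in, ct_in = incoming
            mv_loc, ct_loc = local
            if ct_in == ct_loc:
                return (mv_in, ct_loc)
            if ct_in > ct_loc:
                return (mv_in, ct_in)
            return (mv_loc, ct_loc)

        print(merge_mv((None, 100), (95, 100)))   # (95, 100): same snapshot
        print(merge_mv((None, 100), (98, 105)))   # (98, 105): catch up
        print(merge_mv((90, 100), (80, 97)))      # (90, 100): stale MV discarded
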
  • Method and mechanism for efficient re-distribution of in-memory columnar units in a clustered rdbms on topology change

    Filed US20170212939A1

    Techniques are described herein for executing queries on distinct portions of a database object that has been separated into chunks and distributed across the volatile memories of a plurality of nodes in a clustered database system. The techniques involve redistributing the in-memory database object portions on changes to the clustered database system. Each node may maintain a mapping indicating which nodes in the clustered database system store which chunks, and timestamps indicating when each mapping entry was created or updated. A query coordinator may use the timestamps to select a database server instance with local in-memory access to the data required by a portion of a query to process that portion of the query.

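    The coordinator-side selection can be sketched as picking, per chunk, the mapping entry with the newest timestamp (all structures hypothetical):

        # Each node maintains (chunk -> node, timestamp) entries; after a
        # topology change the query coordinator trusts the newest entry per
        # chunk when routing work to a local in-memory copy.
        mapping = [
            ("chunk0", "node1", 100),       # (chunk, hosting node, entry timestamp)
            ("chunk0", "node3", 140),       # node3 took over chunk0 after a change
            ("chunk1", "node2", 120),
        ]

        def host_for(chunk):
            entries = [(ts, node) for c, node, ts in mapping if c == chunk]
            return max(entries)[1]          # newest entry wins

        print(host_for("chunk0"))           # node3
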
  • Memory-aware joins based in a database cluster

    Issued US 14/806,411

    Techniques are described herein for distributing data from one or more partitioned tables across the volatile memories of a cluster. In-memory copies of data from partitioned tables are grouped based on the data falling within the same partition criteria. These groups are used for assigning data from corresponding partitions to the same node when distributing data from partitioned tables across the volatile memories of a multi-node cluster. When a query requires a join between rows of partitioned tables, the work for the join query is divided into work granules that correspond to partition-wise join operations. Those partition-wise join operations are assigned to nodes by a query coordinator based on the partition-to-node mapping located in the node of the query coordinator.

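    The co-location idea can be sketched as follows (hypothetical names, with a simple round-robin placement): partitions of two tables that share partition criteria are pinned to the same node, so each partition-wise join granule runs entirely on one node.

        # Partitions sharing a partitioning-key range are placed together,
        # so a partition-wise join granule never crosses nodes.
        nodes = ["node1", "node2"]
        ranges = ["2015Q1", "2015Q2", "2015Q3", "2015Q4"]   # shared partition criteria

        # Co-locate the sales and returns partitions for each range.
        placement = {r: nodes[i % len(nodes)] for i, r in enumerate(ranges)}

        granules = [(f"sales.{r}", f"returns.{r}", placement[r]) for r in ranges]
        for left, right, node in granules:
            print(f"join granule {left} x {right} on {node}")
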
  • Framework for volatile memory query execution in a multi-node cluster

    Filed US 14/805,949

    Techniques are described herein for executing queries on distinct portions of a database object that has been separated into chunks and distributed across the volatile memories of a plurality of nodes in a clustered database system. The techniques involve receiving a query that requires work to be performed on data that resides in a plurality of on-disk extents. A parallel query coordinator that is aware of the in-memory distribution divides the work into granules that align with the in-memory separation. The parallel query coordinator then sends each granule to the database server instance with local in-memory access to the data required by the granule and aggregates the results to respond to the query.

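    The coordinator's alignment step can be sketched as cutting each on-disk extent wherever it crosses an in-memory chunk boundary, so every granule is wholly local to one instance (structures hypothetical):

        # In-memory chunks with their owning instances: (start, end, owner).
        chunks = [(0, 1000, "node1"), (1000, 2000, "node2")]

        def granules_for(extent_start, extent_end):
            # Split the extent along chunk boundaries so each granule maps
            # to exactly one instance's in-memory data.
            for start, end, owner in chunks:
                lo, hi = max(extent_start, start), min(extent_end, end)
                if lo < hi:
                    yield (lo, hi, owner)

        print(list(granules_for(700, 1500)))
        # [(700, 1000, 'node1'), (1000, 1500, 'node2')]
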
  • Query execution against an in-memory standby database

    US20170116252A1

    Techniques related to query execution against an in-memory standby database are disclosed. A first database includes PF data stored on persistent storage in a persistent format. The first database is accessible to a first database server that converts the PF data to a mirror format to produce MF data that is stored within volatile memory. The first database server receives, from a second database server, one or more change records indicating one or more transactions performed against a second database. The one or more change records are applied to the PF data, and a reference timestamp is advanced from a first to a second timestamp. The first database server invalidates any MF data that is changed by a subset of the one or more transactions that committed between the first and second timestamps.

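    The invalidation step reduces to a filter over commit timestamps (hypothetical structures): once the reference timestamp advances from t1 to t2, MF data touched by transactions that committed in the window (t1, t2] is invalidated.

        # Invalidate the in-memory (MF) blocks changed by any transaction
        # that committed after the old reference timestamp t1 and at or
        # before the new reference timestamp t2.
        def invalidate_mf(txns, t1, t2, mf_valid_blocks):
            for commit_ts, blocks in txns:              # (commit ts, blocks touched)
                if t1 < commit_ts <= t2:
                    mf_valid_blocks -= blocks           # those columnar copies are stale
            return mf_valid_blocks

        valid = {1, 2, 3, 4}
        txns = [(95, {1}), (101, {2, 3}), (110, {4})]
        print(invalidate_mf(txns, 100, 105, valid))     # {1, 4}: blocks 2 and 3 invalidated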

Honors & Awards

  • Theodore Hissey Outstanding Young Professional Award

    IEEE Awards Board

    Citation - “For contributions to the technical fields of transactions and in-memory databases, as well as for enabling young professionals working on technologies for sustainable development.”

  • New Faces of Engineering 2015

    DiscoverE Foundation - http://www.discovere.org/

    http://www.discovere.org/content/discovere-announces-2015-new-faces-engineering-honorees

    “This year’s 13 honorees personify one of the core tenets of our profession -- engineering is more than just a job, it’s a way of improving the world through continuous innovation and commitment,” states Leslie Collins, Executive Director of DiscoverE.

  • Gold Medal

    Academic Excellence

  • Best Outgoing Student

    B.V.Bhoomaraddi College of Engineering

  • Distinguished Student Humanitarian

    IEEE Presidents' Change the World Competition

    Project: "Electronic/Computing aids for physically handicapped children"
    Awarded at IEEE Honors Ceremony, Los Angeles
