Samarth Aggarwal

Palo Alto, California, United States
2K followers 500+ connections

Experience

  • Glean

    Palo Alto, California, United States

  • -

    Palo Alto, California, United States

  • -

    Menlo Park, California, United States

  • -

    Gurugram, Haryana, India

  • -

    Bengaluru Area, India

  • -

    Gurgaon, India

Education

  • University of Illinois Urbana-Champaign

    Teaching Assistantships:
    CS425: Distributed Systems, Prof. Indranil (Indy) Gupta
    CS441: Applied Machine Learning, Prof. Marco Morales Aguirre

  • Student mentor to 7 freshmen.
    Teaching Assistant for Intro to Computer Science.

Volunteer Experience

  • Mentor

    Sewa Bharti - India

    - 3 months

    Education

    Tutored underprivileged students on the basics of computers and software.

  • Student Mentor

    National Service Scheme, IITD

    - 1 year 2 months

    Environment

    Dedicated more than 100 hours to NSS IITD across these projects:
    1) Arohan - Taught and mentored JEE aspirants.
    2) Apna Parivaar - Raised funds for the orphanage 'Apna Parivaar', planted saplings in its garden, and organized cultural and educational activities for the children.
    3) Toys from Trash - Made toys from trash that each demonstrate a scientific principle, and showed students at different schools how to build them and the principle behind them.

Publications

  • Goal-Driven Command Recommendation for Analysts

    Proceedings of the 2020 ACM Conference on Recommender Systems (RecSys)

    Recent times have seen data analytics software applications become an integral part of the decision-making process of analysts. The users of these software applications generate a vast amount of unstructured log data. These logs contain clues to the user intentions, which traditional recommender systems may find difficult to model implicitly from the log data. With this assumption, we would like to assist the analytics process of a user through command recommendations. We categorize the commands into software and data categories based on their purpose to fulfill the task at hand. On the premise that the sequence of commands leading up to a data command is a good predictor of the latter, we design, develop, and validate various sequence modeling techniques. In this paper, we propose a framework to provide intent-driven data command recommendations to the user by leveraging unstructured logs. We use the log data of a web-based analytics software to train our neural network models and quantify their performance, in comparison to relevant and competitive baselines. We propose a custom loss function to tailor the recommended data commands according to the intent provided exogenously. We also propose an evaluation metric that captures the degree of intent orientation of the recommendations. We demonstrate the promise of our approach by evaluating the models with the proposed metric and showcasing the robustness of our models in the case of adversarial examples, where the selected intent is misaligned with user activity, through offline evaluation.

  • IMoJIE : Iterative Memory based Joint Open Information Extraction

    Association for Computational Linguistics (ACL) 2020

    While traditional systems for Open Information Extraction were statistical and rule-based, recently neural models have been introduced for the task. Our work builds upon CopyAttention, a sequence generation OpenIE model (Cui et al., 2018). Our analysis reveals that CopyAttention produces a constant number of extractions per sentence, and its extracted tuples often express redundant information.
    We present IMoJIE, an extension to CopyAttention, which produces the next extraction conditioned on all previously extracted tuples. This approach overcomes both shortcomings of CopyAttention, resulting in a variable number of diverse extractions per sentence. We train IMoJIE on training data bootstrapped from extractions of several non-neural systems, which have been automatically filtered to reduce redundancy and noise. IMoJIE outperforms CopyAttention by about 18 F1 pts, and a BERT-based strong baseline by 2 F1 pts, establishing a new state of the art for the task.

  • CaRB: A Crowdsourced Benchmark for OpenIE

    Empirical Methods in Natural Language Processing (EMNLP) 2019

    Open Information Extraction (Open IE) systems have been traditionally evaluated via manual annotation. Recently, an automated evaluator with a benchmark dataset (OIE2016) was released – it scores Open IE systems automatically by matching system predictions with predictions in the benchmark dataset. Unfortunately, our analysis reveals that its data is rather noisy, and the tuple matching in the evaluator has issues, making the results of automated comparisons less trustworthy.

    We contribute CaRB, an improved dataset and framework for testing Open IE systems. To the best of our knowledge, CaRB is the first crowdsourced Open IE dataset and it also makes substantive changes in the matching code and metrics. NLP experts annotate CaRB’s dataset to be more accurate than OIE2016. Moreover, we find that on one pair of Open IE systems, CaRB framework provides contradictory results to OIE2016. Human assessment verifies that CaRB’s ranking of the two systems is the accurate ranking. We release the CaRB framework along with its crowdsourced dataset.

  • OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction

    Empirical Methods in Natural Language Processing (EMNLP) 2020

    A recent state-of-the-art neural open information extraction (OpenIE) system generates extractions iteratively, requiring repeated encoding of partial outputs. This comes at a significant computational cost. On the other hand, sequence labeling approaches for OpenIE are much faster, but worse in extraction quality. In this paper, we bridge this trade-off by presenting an iterative labeling-based system that establishes a new state of the art for OpenIE, while extracting 10x faster. This is achieved through a novel Iterative Grid Labeling (IGL) architecture, which treats OpenIE as a 2-D grid labeling task. We improve its performance further by applying coverage (soft) constraints on the grid at training time.
    Moreover, on observing that the best OpenIE systems falter at handling coordination structures, our OpenIE system also incorporates a new coordination analyzer built with the same IGL architecture. This IGL based coordination analyzer helps our OpenIE system handle complicated coordination structures, while also establishing a new state of the art on the task of coordination analysis, with a 12.3 pts improvement in F1 over previous analyzers. Our OpenIE system, OpenIE6, beats the previous systems by as much as 4 pts in F1, while being much faster.

Languages

  • English

    Native or bilingual proficiency

  • Hindi

    Native or bilingual proficiency

  • French

    Elementary proficiency
