
Showing 1–6 of 6 results for author: Daniels, N M

Searching in archive cs.
  1. arXiv:2309.05491

    cs.DS

    Let them have CAKES: A Cutting-Edge Algorithm for Scalable, Efficient, and Exact Search on Big Data

    Authors: Morgan E. Prior, Thomas J. Howard III, Oliver McLaughlin, Terrence Ferguson, Najib Ishaq, Noah M. Daniels

    Abstract: The ongoing Big Data explosion has created a demand for efficient and scalable algorithms for similarity search. Most recent work has focused on approximate k-NN search, and while this may be sufficient for some applications, exact k-NN search would be ideal for many applications. We present CAKES, a set of three novel, exact algorithms for k-NN search. CAKES's algori…

    Submitted 9 January, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: As submitted to ACM TALG

    ACM Class: E.1; F.2.1; H.3.3

  2. arXiv:2204.09610

    cs.DL cs.DB q-bio.QM

    MEDFORD: A human and machine readable metadata markup language

    Authors: Polina Shpilker, John Freeman, Hailey McKelvie, Jill Ashey, Jay-Miguel Fonticella, Hollie Putnam, Jane Greenberg, Lenore J. Cowen, Alva Couch, Noah M. Daniels

    Abstract: Reproducibility of research is essential for science. However, in the way modern computational biology research is done, it is easy to lose track of small, but extremely critical, details. Key details, such as the specific version of software used or the iteration of a genome, can easily be lost in the shuffle, or perhaps not noted at all. Much work is being done on the database and storage side of t…

    Submitted 16 June, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

    Comments: 10 pages, no figures

    ACM Class: H.3.7; J.3; H.3.3

  3. arXiv:2103.11774

    cs.LG stat.ML

    Clustered Hierarchical Anomaly and Outlier Detection Algorithms

    Authors: Najib Ishaq, Thomas J. Howard III, Noah M. Daniels

    Abstract: Anomaly and outlier detection is a long-standing problem in machine learning. In some cases, anomaly detection is easy, such as when data are drawn from well-characterized distributions such as the Gaussian. However, when data occupy high-dimensional spaces, anomaly detection becomes more difficult. We present CLAM (Clustered Learning of Approximate Manifolds), a manifold mapping technique in any…

    Submitted 21 November, 2021; v1 submitted 9 February, 2021; originally announced March 2021.

    Comments: As published in IEEE Big Data 2021

  4. arXiv:1908.08551

    cs.DS cs.CE

    Clustered Hierarchical Entropy-Scaling Search of Astronomical and Biological Data

    Authors: Najib Ishaq, George Student, Noah M. Daniels

    Abstract: Both astronomy and biology are experiencing explosive growth of data, resulting in a "big data" problem that stands in the way of a "big data" opportunity for discovery. One common question asked of such data is that of approximate search (ρ-nearest neighbors search). We present a hierarchical search algorithm for such data sets that takes advantage of particular geometric properties apparent in…

    Submitted 10 November, 2019; v1 submitted 22 August, 2019; originally announced August 2019.

    Comments: Accepted to IEEE Big Data 2019

  5. Entropy-scaling search of massive biological data

    Authors: Y. William Yu, Noah M. Daniels, David Christian Danko, Bonnie Berger

    Abstract: Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimensio…

    Submitted 21 September, 2015; v1 submitted 18 March, 2015; originally announced March 2015.

    Comments: Including supplement: 41 pages, 6 figures, 4 tables, 1 box

    Journal ref: Cell Systems, Volume 1, Issue 2, 130-140, 2015

  6. Remote Homology Detection in Proteins Using Graphical Models

    Authors: Noah M. Daniels

    Abstract: Given the amino acid sequence of a protein, researchers often infer its structure and function by finding homologous, or evolutionarily-related, proteins of known structure and function. Since structure is typically more conserved than sequence over long evolutionary distances, recognizing remote protein homologs from their sequence poses a challenge. We first consider all proteins of known thre…

    Submitted 23 April, 2013; originally announced April 2013.

    Comments: Doctoral dissertation
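Two of the abstracts above name distinct similarity-search queries: exact k-NN search (result 1) and ρ-nearest neighbors search (result 4). As a hedged illustration of what each query returns, the sketch below answers both with a naive linear scan under Euclidean distance. It is not the method of any listed paper, all of which aim to avoid scanning every point; the function names `knn_search` and `rho_nn_search` and the choice of Euclidean distance are assumptions made for this example.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance (assumed metric; any metric could be substituted)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(data: List[Point], query: Point, k: int) -> List[Tuple[float, int]]:
    """Exact k-NN by linear scan (hypothetical helper).

    Returns (distance, index) pairs for the k points closest to the query,
    nearest first. heapq.nsmallest keeps only the k best candidates while
    visiting every point exactly once.
    """
    return heapq.nsmallest(k, ((euclidean(p, query), i) for i, p in enumerate(data)))


def rho_nn_search(data: List[Point], query: Point, rho: float) -> List[Tuple[float, int]]:
    """rho-nearest neighbors (range) search by linear scan (hypothetical helper).

    Returns every (distance, index) pair whose distance to the query is at
    most rho, sorted nearest first. Unlike k-NN, the result size depends on
    the data rather than on a fixed k.
    """
    hits = [(euclidean(p, query), i) for i, p in enumerate(data)]
    return sorted(h for h in hits if h[0] <= rho)


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
    print(knn_search(points, (0.0, 0.0), k=2))
    print(rho_nn_search(points, (0.0, 0.0), rho=2.0))
```

A linear scan costs one distance computation per point per query, which is the baseline that hierarchical, clustered approaches such as those in the titles above try to beat on large data sets.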