Showing 1–16 of 16 results for author: Kipnis, A

  1. arXiv:2407.12844  [pdf, other]

    cs.CL cs.LG stat.ML

    $\texttt{metabench}$ -- A Sparse Benchmark to Measure General Ability in Large Language Models

    Authors: Alex Kipnis, Konstantinos Voudouris, Luca M. Schulze Buschoff, Eric Schulz

    Abstract: Large Language Models (LLMs) vary in their abilities on a range of tasks. Initiatives such as the $\texttt{Open LLM Leaderboard}$ aim to quantify these differences with several large benchmarks (sets of test items to which an LLM can respond either correctly or incorrectly). However, high correlations within and between benchmark scores suggest that (1) there exists a small set of common underlyin…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: LLMs, benchmarking, IRT, information, compression

  2. arXiv:2308.13399  [pdf, ps, other]

    cs.CL cs.IT cs.LG

    EntropyRank: Unsupervised Keyphrase Extraction via Side-Information Optimization for Language Model-based Text Compression

    Authors: Alexander Tsvetkov, Alon Kipnis

    Abstract: We propose an unsupervised method to extract keywords and keyphrases from texts based on a pre-trained language model (LM) and Shannon's information maximization. Specifically, our method extracts phrases having the highest conditional entropy under the LM. The resulting set of keyphrases turns out to solve a relevant information-theoretic problem: if provided as side information, it leads to the…

    Submitted 29 August, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

    Journal ref: ICML 2023 Workshop Neural Compression: From Information Theory to Applications

  3. arXiv:2308.12747  [pdf, other]

    cs.IT cs.AI stat.AP

    Separating the Human Touch from AI-Generated Text using Higher Criticism: An Information-Theoretic Approach

    Authors: Alon Kipnis

    Abstract: We propose a method to determine whether a given article was entirely written by a generative language model versus an alternative situation in which the article includes some significant edits by a different author, possibly a human. Our process involves many perplexity tests for the origin of individual sentences or other text atoms, combining these multiple tests using Higher Criticism (HC). As…

    Submitted 24 August, 2023; originally announced August 2023.

  4. arXiv:2305.18111  [pdf, other]

    math.ST cs.IT stat.ML

    The Minimax Risk in Testing Uniformity of Poisson Data under Missing Ball Alternatives within a Hypercube

    Authors: Alon Kipnis

    Abstract: We study the problem of testing the goodness of fit of occurrences of items from many categories to an identical Poisson distribution over the categories. As a class of alternative hypotheses, we consider the removal of an $\ell_p$ ball, $p \leq 2$, of radius $ε$ from a hypercube around the sequence of uniform Poisson rates. When the expected number of samples $n$ and number of categories $N$ go t…

    Submitted 17 July, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

    MSC Class: 62G10; 62C20 ACM Class: G.3

  5. Gaussian Approximation of Quantization Error for Estimation from Compressed Data

    Authors: Alon Kipnis, Galen Reeves

    Abstract: We consider the distributional connection between the lossy compressed representation of a high-dimensional signal $X$ using a random spherical code and the observation of $X$ under an additive white Gaussian noise (AWGN). We show that the Wasserstein distance between a bitrate-$R$ compressed version of $X$ and its observation under an AWGN-channel of signal-to-noise ratio $2^{2R}-1$ is sub-linear…

    Submitted 12 December, 2021; v1 submitted 9 January, 2020; originally announced January 2020.

    Journal ref: IEEE Transactions on Information Theory (Volume: 67, Issue: 8, Aug. 2021)

  6. arXiv:1911.01208  [pdf, other]

    cs.CL cs.LG stat.CO stat.ML

    Higher Criticism for Discriminating Word-Frequency Tables and Testing Authorship

    Authors: Alon Kipnis

    Abstract: We adapt the Higher Criticism (HC) goodness-of-fit test to measure the closeness between word-frequency tables. We apply this measure to authorship attribution challenges, where the goal is to identify the author of a document using other documents whose authorship is known. The method is simple yet performs well without handcrafting and tuning; reporting accuracy at the state of the art level in…

    Submitted 21 June, 2022; v1 submitted 30 October, 2019; originally announced November 2019.

    MSC Class: 62G; 62P ACM Class: J.5

    Journal ref: The Annals of Applied Statistics 16, no. 2 (2022): 1236-1252

  7. Mean Estimation from One-Bit Measurements

    Authors: Alon Kipnis, John C. Duchi

    Abstract: We consider the problem of estimating the mean of a symmetric log-concave distribution under the constraint that only a single bit per sample from this distribution is available to the estimator. We study the mean squared error as a function of the sample size (and hence the number of bits). We consider three settings: first, a centralized setting, where an encoder may release $n$ bits given a sam…

    Submitted 9 May, 2022; v1 submitted 10 January, 2019; originally announced January 2019.

    Comments: Accepted for publication in the IEEE Transactions on Information Theory

    Journal ref: IEEE Transactions on Information Theory (Volume: 68, Issue: 9, September 2022)

  8. Analog-to-Digital Compression: A New Paradigm for Converting Signals to Bits

    Authors: Alon Kipnis, Yonina C. Eldar, Andrea J. Goldsmith

    Abstract: Processing, storing and communicating information that originates as an analog signal involves conversion of this information to bits. This conversion can be described by the combined effect of sampling and quantization, as illustrated in Fig. 1. The digital representation is achieved by first sampling the analog signal so as to represent it by a set of discrete-time samples and then quantizing th…

    Submitted 20 January, 2018; originally announced January 2018.

    Comments: to appear in "Signal Processing Magazine"

  9. arXiv:1707.00420  [pdf, other]

    cs.IT

    Compress-and-Estimate Source Coding for a Vector Gaussian Source

    Authors: Ruiyang Song, Stefano Rini, Alon Kipnis, Andrea Goldsmith

    Abstract: We consider the remote vector source coding problem in which a vector Gaussian source is to be estimated from noisy linear measurements. For this problem, we derive the performance of the compress-and-estimate (CE) coding scheme and compare it to the optimal performance. In the CE coding scheme, the remote encoder compresses the noisy source observations so as to minimize the local distortion meas…

    Submitted 3 July, 2017; originally announced July 2017.

  10. arXiv:1608.04679  [pdf, ps, other]

    cs.IT

    The Distortion-Rate Function of Sampled Wiener Processes

    Authors: Alon Kipnis, Andrea J. Goldsmith, Yonina C. Eldar

    Abstract: We consider the recovery of a continuous-time Wiener process from a quantized or lossy compressed version of its uniform samples under limited bitrate and sampling rate. We derive a closed form expression for the optimal tradeoff among sampling rate, bitrate, and quadratic distortion in this setting. This expression is given in terms of a reverse waterfilling formula over the asymptotic spectral d…

    Submitted 26 July, 2018; v1 submitted 16 August, 2016; originally announced August 2016.

    Comments: Under review. An extended version of a work presented in ISIT 2016 under the title "Information rates of sampled Wiener processes"

  11. arXiv:1605.03755  [pdf, ps, other]

    cs.IT

    Optimal Rate Allocation in Mismatched Multiterminal Source Coding

    Authors: Ruiyang Song, Stefano Rini, Alon Kipnis, Andrea J. Goldsmith

    Abstract: We consider a multiterminal source coding problem in which a source is estimated at a central processing unit from lossy-compressed remote observations. Each lossy-encoded observation is produced by a remote sensor which obtains a noisy version of the source and compresses this observation minimizing a local distortion measure which depends only on the marginal distribution of its observation. The…

    Submitted 12 May, 2016; originally announced May 2016.

  12. arXiv:1602.02201  [pdf, ps, other]

    cs.IT

    The Rate-Distortion Risk in Estimation from Compressed Data

    Authors: Alon Kipnis, Stefano Rini, Andrea J. Goldsmith

    Abstract: Consider the problem of estimating a latent signal from a lossy compressed version of the data when the compressor is agnostic to the relation between the signal and the data. This situation arises in a host of modern applications when data is transmitted or stored prior to determining the downstream inference task. Given a bitrate constraint and a distortion measure between the data and its compr…

    Submitted 10 January, 2021; v1 submitted 5 February, 2016; originally announced February 2016.

    Comments: Second revision. IEEE Transactions on Information Theory

  13. arXiv:1601.06421  [pdf, ps, other]

    cs.IT

    Fundamental Distortion Limits of Analog-to-Digital Compression

    Authors: Alon Kipnis, Yonina C. Eldar, Andrea J. Goldsmith

    Abstract: Representing a continuous-time signal by a set of samples is a classical problem in signal processing. We study this problem under the additional constraint that the samples are quantized or compressed in a lossy manner under a limited bitrate budget. To this end, we consider a combined sampling and source coding problem in which an analog stationary Gaussian signal is reconstructed from its encod…

    Submitted 10 April, 2018; v1 submitted 24 January, 2016; originally announced January 2016.

    Comments: 20 pages, 14 figures

  14. arXiv:1505.05586  [pdf, ps, other]

    cs.IT

    The Distortion Rate Function of Cyclostationary Gaussian Processes

    Authors: Alon Kipnis, Andrea J. Goldsmith, Yonina C. Eldar

    Abstract: A general expression for the distortion rate function (DRF) of cyclostationary Gaussian processes in terms of their spectral properties is derived. This expression can be seen as the result of orthogonalization over the different components in the polyphase decomposition of the process. We use this expression to derive, in a closed form, the DRF of several cyclostationary processes arising in prac…

    Submitted 10 August, 2016; v1 submitted 20 May, 2015; originally announced May 2015.

    Comments: First revision for the IEEE Transactions on Information Theory

  15. arXiv:1505.04875  [pdf, ps, other]

    cs.IT stat.CO

    Indirect Rate-Distortion Function of a Binary i.i.d Source

    Authors: Alon Kipnis, Stefano Rini, Andrea J. Goldsmith

    Abstract: The indirect source-coding problem in which a Bernoulli process is compressed in a lossy manner from its noisy observations is considered. These noisy observations are obtained by passing the source sequence through a…

    Submitted 3 June, 2015; v1 submitted 19 May, 2015; originally announced May 2015.

  16. Distortion-Rate Function of Sub-Nyquist Sampled Gaussian Sources

    Authors: Alon Kipnis, Andrea J. Goldsmith, Yonina C. Eldar, Tsachy Weissman

    Abstract: The amount of information lost in sub-Nyquist sampling of a continuous-time Gaussian stationary process is quantified. We consider a combined source coding and sub-Nyquist reconstruction problem in which the input to the encoder is a noisy sub-Nyquist sampled version of the analog source. We first derive an expression for the mean squared error in the reconstruction of the process from a noisy and…

    Submitted 6 November, 2015; v1 submitted 21 May, 2014; originally announced May 2014.

    Comments: Accepted for publication in the IEEE Transactions on Information Theory

    Journal ref: Information Theory, IEEE Transactions on, vol. 62, no. 1, pp. 401-429, Jan. 2016