Showing 1–41 of 41 results for author: Simon, N

  1. arXiv:2311.12726  [pdf, other]

    stat.ME stat.AP

    Assessing variable importance in survival analysis using machine learning

    Authors: Charles J. Wolock, Peter B. Gilbert, Noah Simon, Marco Carone

    Abstract: Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. For example, in HIV vaccine trials, participant baseline characteristics are used to predict the probability of HIV acquisition over the intended follow-up period, and investigators may wish to understand ho…

    Submitted 12 August, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: 98 total pages (37 main text, 61 supplementary)

  2. arXiv:2308.01470  [pdf, other]

    math.ST stat.ME

    Improved convergence rates of nonparametric penalized regression under misspecified total variation

    Authors: Marlena S. Bannick, Noah Simon

    Abstract: Penalties that induce smoothness are common in nonparametric regression. In many settings, the amount of smoothness in the data generating function will not be known. Simon and Shojaie (2021) derived convergence rates for nonparametric estimators under misspecified smoothness. We show that their theoretical convergence rates can be improved by working with convenient approximating functions. Prope…

    Submitted 2 August, 2023; originally announced August 2023.

  3. A framework for leveraging machine learning tools to estimate personalized survival curves

    Authors: Charles J. Wolock, Peter B. Gilbert, Noah Simon, Marco Carone

    Abstract: The conditional survival function of a time-to-event outcome subject to censoring and truncation is a common target of estimation in survival analysis. This parameter may be of scientific interest and also often appears as a nuisance in nonparametric and semiparametric problems. In addition to classical parametric and semiparametric methods (e.g., based on the Cox proportional hazards model), flex…

    Submitted 31 October, 2023; v1 submitted 6 November, 2022; originally announced November 2022.

    Comments: 52 pages, 13 figures

    Journal ref: Journal of Computational and Graphical Statistics 33(3) 1098-1108 (2024)

  4. arXiv:2206.12393  [pdf, other]

    stat.ME

    Accounting for Inconsistent Use of Covariate Adjustment in Group Sequential Trials

    Authors: Marlena S. Bannick, Sonya L. Heltshe, Noah Simon

    Abstract: Group sequential designs in clinical trials allow for interim efficacy and futility monitoring. Adjustment for baseline covariates can increase power and precision of estimated effects. However, inconsistently applying covariate adjustment throughout the stages of a group sequential trial can result in inflation of type I error, biased point estimates, and anti-conservative confidence intervals. W…

    Submitted 9 August, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

  5. arXiv:2206.02994  [pdf, other]

    stat.ME math.ST

    Regression in Tensor Product Spaces by the Method of Sieves

    Authors: Tianyu Zhang, Noah Simon

    Abstract: Estimation of a conditional mean (linking a set of features to an outcome of interest) is a fundamental statistical task. While there is an appeal to flexible nonparametric procedures, effective estimation in many classical nonparametric function spaces (e.g., multivariate Sobolev spaces) can be prohibitively difficult -- both statistically and computationally -- especially when the number of feat…

    Submitted 7 June, 2022; originally announced June 2022.

  6. arXiv:2112.03428  [pdf, other]

    stat.ME stat.CO stat.ML

    Mesh-Based Solutions for Nonparametric Penalized Regression

    Authors: Brayan Ortiz, Noah Simon

    Abstract: It is often of interest to estimate regression functions non-parametrically. Penalized regression (PR) is one statistically-effective, well-studied solution to this problem. Unfortunately, in many cases, finding exact solutions to PR problems is computationally intractable. In this manuscript, we propose a mesh-based approximate solution (MBS) for those scenarios. MBS transforms the complicated fu…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: 29 pages, 4 figures

    MSC Class: 62G08; 62J07 (Primary); 62G20 (Secondary) ACM Class: G.3

  7. arXiv:2107.08787  [pdf]

    stat.AP cs.LG

    The Future will be Different than Today: Model Evaluation Considerations when Developing Translational Clinical Biomarker

    Authors: Yichen Lu, Jane Fridlyand, Tiffany Tang, Ting Qi, Noah Simon, Ning Leng

    Abstract: Finding translational biomarkers stands center stage of the future of personalized medicine in healthcare. We observed notable challenges in identifying robust biomarkers as some with great performance in one scenario often fail to perform well in new trials (e.g. different population, indications). With rapid development in the clinical trial world (e.g. assay, disease definition), new trials ver…

    Submitted 13 July, 2021; originally announced July 2021.

    Comments: Paper has 4 pages, 2 figures. Appendix are supplementary at the end

  8. arXiv:2105.01874  [pdf, other]

    math.ST stat.ME stat.ML

    On the Optimality of Nuclear-norm-based Matrix Completion for Problems with Smooth Non-linear Structure

    Authors: Yunhua Xiang, Tianyu Zhang, Xu Wang, Ali Shojaie, Noah Simon

    Abstract: Originally developed for imputing missing entries in low rank, or approximately low rank matrices, matrix completion has proven widely effective in many problems where there is no reason to assume low-dimensional linear structure in the underlying matrix, as would be imposed by rank constraints. In this manuscript, we build some theoretical intuition for this behavior. We consider matrices which a…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: 47 pages, 1 figure

  9. arXiv:2104.00846  [pdf, other]

    math.ST stat.ME

    A Sieve Stochastic Gradient Descent Estimator for Online Nonparametric Regression in Sobolev ellipsoids

    Authors: Tianyu Zhang, Noah Simon

    Abstract: The goal of regression is to recover an unknown underlying function that best links a set of predictors to an outcome from noisy observations. In nonparametric regression, one assumes that the regression function belongs to a pre-specified infinite-dimensional function space (the hypothesis space). In the online setting, when the observations come in a stream, it is computationally-preferable to i…

    Submitted 6 January, 2022; v1 submitted 1 April, 2021; originally announced April 2021.

  10. arXiv:2104.00780  [pdf, other]

    stat.ME

    An Online Projection Estimator for Nonparametric Regression in Reproducing Kernel Hilbert Spaces

    Authors: Tianyu Zhang, Noah Simon

    Abstract: The goal of nonparametric regression is to recover an underlying regression function from noisy observations, under the assumption that the regression function belongs to a pre-specified infinite dimensional function space. In the online setting, when the observations come in a stream, it is generally computationally infeasible to refit the whole model repeatedly. There are as of yet no methods th…

    Submitted 1 April, 2021; originally announced April 2021.

  11. arXiv:2010.00718  [pdf, other]

    stat.ML cs.LG stat.CO

    When to Impute? Imputation before and during cross-validation

    Authors: Byron C. Jaeger, Nicholas J. Tierney, Noah R. Simon

    Abstract: Cross-validation (CV) is a technique used to estimate generalization error for prediction models. For pipeline modeling algorithms (i.e. modeling procedures with multiple steps), it has been recommended the entire sequence of steps be carried out during each replicate of CV to mimic the application of the entire pipeline to an external testing set. While theoretically sound, following this recomme…

    Submitted 1 October, 2020; originally announced October 2020.

    Comments: 11 pages (main text, not including references), 6 tables, and 4 figures. Code to replicate manuscript available at https://github.com/bcjaeger/Imputation-and-CV

  12. arXiv:2005.04834  [pdf, other]

    stat.ML cs.LG stat.ME

    Ensembled sparse-input hierarchical networks for high-dimensional datasets

    Authors: Jean Feng, Noah Simon

    Abstract: Neural networks have seen limited use in prediction for high-dimensional data with small sample sizes, because they tend to overfit and require tuning many more hyperparameters than existing off-the-shelf machine learning methods. With small modifications to the network architecture and training procedure, we show that dense neural networks can be a practical data analysis tool in these settings.…

    Submitted 10 May, 2020; originally announced May 2020.

  13. Spatial Matrix Completion for Spatially-Misaligned and High-Dimensional Air Pollution Data

    Authors: Phuong T. Vu, Adam A. Szpiro, Noah Simon

    Abstract: In health-pollution cohort studies, accurate predictions of pollutant concentrations at new locations are needed, since the locations of fixed monitoring sites and study participants are often spatially misaligned. For multi-pollution data, principal component analysis (PCA) is often incorporated to obtain low-rank (LR) structure of the data prior to spatial prediction. Recently developed predicti…

    Submitted 21 January, 2022; v1 submitted 11 April, 2020; originally announced April 2020.

    Comments: 26 pages, 5 figures, 5 tables, 1 supplemental file (available upon request). This v2 is a pre peer-reviewed version that was submitted to Environmetrics. A final version with minor revisions was accepted for publication by Environmetrics on Dec 13, 2021, and will be linked to this version once published

  14. arXiv:2004.03683  [pdf, other]

    stat.ME math.ST stat.ML

    A general framework for inference on algorithm-agnostic variable importance

    Authors: Brian D. Williamson, Peter B. Gilbert, Noah R. Simon, Marco Carone

    Abstract: In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response -- in other words, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment…

    Submitted 13 September, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: 69 total pages (35 in the main document, 34 supplementary), 23 figures (4 in the main document, 19 supplementary)

  15. arXiv:2003.00401  [pdf, other]

    stat.AP

    A flexible Bayesian framework to estimate age- and cause-specific child mortality over time from sample registration data

    Authors: Austin E Schumacher, Tyler H McCormick, Jon Wakefield, Yue Chu, Jamie Perin, Francisco Villavicencio, Noah Simon, Li Liu

    Abstract: In order to implement disease-specific interventions in young age groups, policy makers in low- and middle-income countries require timely and accurate estimates of age- and cause-specific child mortality. High quality data is not available in settings where these interventions are most needed, but there is a push to create sample registration systems that collect detailed mortality information. C…

    Submitted 18 May, 2021; v1 submitted 29 February, 2020; originally announced March 2020.

    Comments: 16 pages, 4 figures, submitted to The Annals of Applied Statistics

    MSC Class: 62P99

  16. arXiv:1912.12413  [pdf, other]

    stat.ML cs.LG

    Approval policies for modifications to Machine Learning-Based Software as a Medical Device: A study of bio-creep

    Authors: Jean Feng, Scott Emerson, Noah Simon

    Abstract: Successful deployment of machine learning algorithms in healthcare requires careful assessments of their performance and safety. To date, the FDA approves locked algorithms prior to marketing and requires future updates to undergo separate premarket reviews. However, this negates a key feature of machine learning--the ability to learn from a growing dataset and improve over time. This paper frames…

    Submitted 28 December, 2019; originally announced December 2019.

  17. arXiv:1906.05473  [pdf, other]

    stat.ML cs.LG

    Selective prediction-set models with coverage guarantees

    Authors: Jean Feng, Arjun Sondhi, Jessica Perry, Noah Simon

    Abstract: Though black-box predictors are state-of-the-art for many complex tasks, they often fail to properly quantify predictive uncertainty and may provide inappropriate predictions for unfamiliar data. Instead, we can learn more reliable models by letting them either output a prediction set or abstain when the uncertainty is high. We propose training these selective prediction-set models using an uncert…

    Submitted 10 December, 2021; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: Published at Biometrics

  18. arXiv:1905.12768  [pdf, other]

    stat.ME

    Using Propensity Scores to Develop and Evaluate Treatment Rules with Observational Data

    Authors: Jeremy Roth, Noah Simon

    Abstract: In this paper, we outline a principled approach to estimate an individualized treatment rule that is appropriate for data from observational studies where, in addition to treatment assignment not being independent of individual characteristics, some characteristics may affect treatment assignment in the current study but not be available in future clinical settings where the estimated rule would b…

    Submitted 3 June, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

  19. arXiv:1904.00117  [pdf, other]

    q-bio.QM stat.AP

    Estimation of cell lineage trees by maximum-likelihood phylogenetics

    Authors: Jean Feng, William S DeWitt III, Aaron McKenna, Noah Simon, Amy Willis, Frederick A Matsen IV

    Abstract: CRISPR technology has enabled large-scale cell lineage tracing for complex multicellular organisms by mutating synthetic genomic barcodes during organismal development. However, these sophisticated biological tools currently use ad-hoc and outmoded computational methods to reconstruct the cell lineage tree from the mutated barcodes. Because these methods are agnostic to the biological mechanism, t…

    Submitted 29 March, 2019; originally announced April 2019.

  20. An analysis of the cost of hyper-parameter selection via split-sample validation, with applications to penalized regression

    Authors: Jean Feng, Noah Simon

    Abstract: In the regression setting, given a set of hyper-parameters, a model-estimation procedure constructs a model from training data. The optimal hyper-parameters that minimize generalization error of the model are usually unknown. In practice they are often estimated using split-sample validation. Up to now, there is an open question regarding how the generalization error of the selected model grows wi…

    Submitted 28 March, 2019; originally announced March 2019.

  21. arXiv:1903.04641  [pdf, other]

    stat.ME math.ST stat.ML

    Generalized Sparse Additive Models

    Authors: Asad Haris, Noah Simon, Ali Shojaie

    Abstract: We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm for this class is presented that easily scales to thousands of observations and features. We prove minimax optimal convergence bounds for this cl…

    Submitted 11 March, 2019; originally announced March 2019.

  22. arXiv:1903.04631  [pdf, other]

    stat.ML cs.LG

    Wavelet regression and additive models for irregularly spaced data

    Authors: Asad Haris, Noah Simon, Ali Shojaie

    Abstract: We present a novel approach for nonparametric regression using wavelet basis functions. Our proposal, $\texttt{waveMesh}$, can be applied to non-equispaced data with sample size not necessarily a power of 2. We develop an efficient proximal gradient descent algorithm for computing the estimator and establish adaptive minimax convergence rates. The main appeal of our approach is that it naturally e…

    Submitted 11 March, 2019; originally announced March 2019.

    Journal ref: Advances in Neural Information Processing Systems 2018, 8987-8997

  23. arXiv:1711.07592  [pdf, other]

    stat.ME stat.ML

    Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification

    Authors: Jean Feng, Noah Simon

    Abstract: Neural networks are usually not the tool of choice for nonparametric high-dimensional problems where the number of input features is much larger than the number of observations. Though neural networks can approximate complex multivariate functions, they generally require a large number of training observations to obtain reasonable fits, unless one can learn the appropriate network structure. In th…

    Submitted 21 June, 2019; v1 submitted 20 November, 2017; originally announced November 2017.

  24. arXiv:1711.04057  [pdf, other]

    q-bio.PE stat.AP

    Survival analysis of DNA mutation motifs with penalized proportional hazards

    Authors: Jean Feng, David A. Shaw, Vladimir N. Minin, Noah Simon, Frederick A. Matsen IV

    Abstract: Antibodies, an essential part of our immune system, develop through an intricate process to bind a wide array of pathogens. This process involves randomly mutating DNA sequences encoding these antibodies to find variants with improved binding, though mutations are not distributed uniformly across sequence sites. Immunologists observe this nonuniformity to be consistent with "mutation motifs", whic…

    Submitted 21 September, 2018; v1 submitted 10 November, 2017; originally announced November 2017.

  25. arXiv:1703.09813  [pdf, other]

    stat.ML

    Gradient-based Regularization Parameter Selection for Problems with Non-smooth Penalty Functions

    Authors: Jean Feng, Noah Simon

    Abstract: In high-dimensional and/or non-parametric regression problems, regularization (or penalization) is used to control model complexity and induce desired structure. Each penalty has a weight parameter that indicates how strongly the structure corresponding to that penalty should be enforced. Typically the parameters are chosen to minimize the error on a separate validation set using a simple grid sea…

    Submitted 28 March, 2017; originally announced March 2017.

  26. arXiv:1703.06946  [pdf, other]

    stat.AP q-bio.NC

    SCALPEL: Extracting Neurons from Calcium Imaging Data

    Authors: Ashley Petersen, Noah Simon, Daniela Witten

    Abstract: In the past few years, new technologies in the field of neuroscience have made it possible to simultaneously image activity in large populations of neurons at cellular resolution in behaving animals. In mid-2016, a huge repository of this so-called "calcium imaging" data was made publicly-available. The availability of this large-scale data resource opens the door to a host of scientific questions…

    Submitted 20 March, 2017; originally announced March 2017.

  27. arXiv:1702.06986  [pdf, other]

    stat.ME

    Rank conditional coverage and confidence intervals in high dimensional problems

    Authors: Jean Morrison, Noah Simon

    Abstract: Confidence interval procedures used in low dimensional settings are often inappropriate for high dimensional applications. When a large number of parameters are estimated, marginal confidence intervals associated with the most significant estimates have very low coverage rates: They are too small and centered at biased estimates. The problem of forming confidence intervals in high dimensional sett…

    Submitted 22 February, 2017; originally announced February 2017.

  28. arXiv:1611.09972  [pdf, ps, other]

    stat.ME math.ST stat.ML

    Nonparametric Regression with Adaptive Truncation via a Convex Hierarchical Penalty

    Authors: Asad Haris, Ali Shojaie, Noah Simon

    Abstract: We consider the problem of non-parametric regression with a potentially large number of covariates. We propose a convex, penalized estimation framework that is particularly well-suited for high-dimensional sparse additive models. The proposed approach combines appealing features of finite basis representation and smoothing penalties for non-parametric estimation. In particular, in the case of addi…

    Submitted 18 June, 2019; v1 submitted 29 November, 2016; originally announced November 2016.

    Journal ref: Biometrika 2018, Vol. 106, No. 1, 87-107

  29. Simultaneous detection and estimation of trait associations with genomic phenotypes

    Authors: Jean Morrison, Noah Simon, Daniela Witten

    Abstract: Genomic phenotypes, such as DNA methylation and chromatin accessibility, can be used to characterize the transcriptional and regulatory activity of DNA within a cell. Recent technological advances have made it possible to measure such phenotypes very densely. This density often results in spatial structure, in the sense that measurements at nearby sites are very similar. In this paper, we consid…

    Submitted 14 November, 2016; originally announced November 2016.

    Comments: In press in Biostatistics (2016)

  30. Graphical Models for Zero-Inflated Single Cell Gene Expression

    Authors: Andrew McDavid, Raphael Gottardo, Noah Simon, Mathias Drton

    Abstract: Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. This study of cell-to-cell variation reveals that individual cells lack detectable expression of transcripts that appear abundant on a population level, giving rise to zero-in…

    Submitted 14 March, 2018; v1 submitted 18 October, 2016; originally announced October 2016.

    Comments: Fixed error in software URL

    Journal ref: Ann. Appl. Stat., Volume 13, Number 2 (2019), 848-873

  31. arXiv:1609.05551  [pdf, other]

    math.ST stat.OT

    Graphical Models for Discrete and Continuous Data

    Authors: Rui Zhuang, Noah Simon, Johannes Lederer

    Abstract: We introduce a general framework for undirected graphical models. It generalizes Gaussian graphical models to a wide range of continuous, discrete, and combinations of different types of data. The models in the framework, called exponential trace models, are amenable to estimation based on maximum likelihood. We introduce a sampling-based approximation algorithm for computing the maximum likelihoo…

    Submitted 15 June, 2019; v1 submitted 18 September, 2016; originally announced September 2016.

  32. Convex Modeling of Interactions with Strong Heredity

    Authors: Asad Haris, Daniela Witten, Noah Simon

    Abstract: We consider the task of fitting a regression model involving interactions among a potentially large set of covariates, in which we wish to enforce strong heredity. We propose FAMILY, a very general framework for this task. Our proposal is a generalization of several existing methods, such as VANISH [Radchenko and James, 2010], hierNet [Bien et al., 2013], the all-pairs lasso, and the lasso using o…

    Submitted 3 October, 2015; v1 submitted 13 October, 2014; originally announced October 2014.

    Comments: Final version accepted for publication in JCGS

    Journal ref: Journal of Computational and Graphical Statistics 2016, Vol. 25, No. 4, 981-1004

  33. arXiv:1409.5391  [pdf, other]

    stat.ME stat.ML

    Fused Lasso Additive Model

    Authors: Ashley Petersen, Daniela Witten, Noah Simon

    Abstract: We consider the problem of predicting an outcome variable using $p$ covariates that are measured on $n$ independent observations, in the setting in which flexible and interpretable fits are desirable. We propose the fused lasso additive model (FLAM), in which each additive function is estimated to be piecewise constant with a small number of adaptively-chosen knots. FLAM is the solution to a conve…

    Submitted 18 September, 2014; originally announced September 2014.

  34. arXiv:1405.4251  [pdf, other]

    stat.ME stat.AP stat.ML

    Selection Bias Correction and Effect Size Estimation under Dependence

    Authors: Kean Ming Tan, Noah Simon, Daniela Witten

    Abstract: We consider large-scale studies in which it is of interest to test a very large number of hypotheses, and then to estimate the effect sizes corresponding to the rejected hypotheses. For instance, this setting arises in the analysis of gene expression or DNA sequencing data. However, naive estimates of the effect sizes suffer from selection bias, i.e., some of the largest naive estimates are large…

    Submitted 28 March, 2015; v1 submitted 16 May, 2014; originally announced May 2014.

    Comments: 21 pages, 2 figures

  35. arXiv:1401.7645  [pdf, other]

    stat.ME

    Comment on "Detecting Novel Associations In Large Data Sets" by Reshef Et Al, Science Dec 16, 2011

    Authors: Noah Simon, Robert Tibshirani

    Abstract: The proposal of Reshef et al. (2011) is an interesting new approach for discovering non-linear dependencies among pairs of measurements in exploratory data mining. However, it has a potentially serious drawback. The authors laud the fact that MIC has no preference for some alternatives over others, but as the authors know, there is no free lunch in Statistics: tests which strive to have high power…

    Submitted 29 January, 2014; originally announced January 2014.

    Comments: 3 pages, 1 figure

  36. arXiv:1311.6529  [pdf, ps, other]

    stat.CO stat.ML

    A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

    Authors: Noah Simon, Jerome Friedman, Trevor Hastie

    Abstract: In this paper we propose a blockwise descent algorithm for group-penalized multiresponse regression. Using a quasi-Newton framework we extend this to group-penalized multinomial regression. We give a publicly available implementation for these in R, and compare the speed of this algorithm to a competing algorithm --- we show that our implementation is an order of magnitude faster than its competit…

    Submitted 25 November, 2013; originally announced November 2013.

  37. arXiv:1311.3709  [pdf, other]

    stat.ML stat.ME

    On Estimating Many Means, Selection Bias, and the Bootstrap

    Authors: Noah Simon, Richard Simon

    Abstract: With recent advances in high throughput technology, researchers often find themselves running a large number of hypothesis tests (thousands+) and estimating a large number of effect-sizes. Generally there is particular interest in those effects estimated to be most extreme. Unfortunately naive estimates of these effect-sizes (even after potentially accounting for multiplicity in a testing proced…

    Submitted 14 November, 2013; originally announced November 2013.

  38. Convex hierarchical testing of interactions

    Authors: Jacob Bien, Noah Simon, Robert Tibshirani

    Abstract: We consider the testing of all pairwise interactions in a two-class problem with many features. We devise a hierarchical testing framework that considers an interaction only when one or more of its constituent features has a nonzero main effect. The test is based on a convex optimization framework that seamlessly considers main effects and interactions together. We show - both in simulation and on…

    Submitted 2 June, 2015; v1 submitted 6 November, 2012; originally announced November 2012.

    Comments: Published at http://dx.doi.org/10.1214/14-AOAS758 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS758

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 1, 27-42

  39. arXiv:1206.6519  [pdf, other]

    stat.ML stat.CO stat.ME

    A Permutation Approach to Testing Interactions in Many Dimensions

    Authors: Noah Simon, Robert Tibshirani

    Abstract: To date, testing interactions in high dimensions has been a challenging task. Existing methods often have issues with sensitivity to modeling assumptions and heavily asymptotic nominal p-values. To help alleviate these issues, we propose a permutation-based method for testing marginal interactions with a binary response. Our method searches for pairwise correlations which differ between classes. I…

    Submitted 27 June, 2012; originally announced June 2012.

  40. arXiv:1111.1687  [pdf, other]

    stat.ML stat.CO stat.ME

    Discriminant Analysis with Adaptively Pooled Covariance

    Authors: Noah Simon, Rob Tibshirani

    Abstract: Linear and Quadratic Discriminant analysis (LDA/QDA) are common tools for classification problems. For these methods we assume observations are normally distributed within group. We estimate a mean and covariance matrix for each group and classify using Bayes theorem. With LDA, we estimate a single, pooled covariance matrix, while for QDA we estimate a separate covariance matrix for each group. Ra…

    Submitted 6 December, 2011; v1 submitted 7 November, 2011; originally announced November 2011.

  41. arXiv:1011.2234  [pdf, ps, other]

    math.ST stat.ML

    Strong rules for discarding predictors in lasso-type problems

    Authors: Robert Tibshirani, Jacob Bien, Jerome Friedman, Trevor Hastie, Noah Simon, Jonathan Taylor, Ryan J. Tibshirani

    Abstract: We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui et al (2010) propose "SAFE" rules that guarantee that a coefficient will be zero in the solution, based on the inner products of each predictor with the outcome. In this paper we propose strong rules that are not foolproof but rarely fail in practice. These can be complemen…

    Submitted 24 November, 2010; v1 submitted 9 November, 2010; originally announced November 2010.

    Comments: 5

    MSC Class: 62J07 62G08