
Showing 1–50 of 229 results for author: Chen, C

Searching in archive stat.

  1. arXiv:2407.01770  [pdf, other]

    stat.ME

    Exploring causal effects of hormone- and radio-treatments in an observational study of breast cancer using copula-based semi-competing risks models

    Authors: Tonghui Yu, Mengjiao Peng, Yifan Cui, Elynn Chen, Chixiang Chen

    Abstract: Breast cancer patients may experience relapse or death after surgery during the follow-up period, leading to dependent censoring of relapse. This phenomenon, known as semi-competing risk, imposes challenges in analyzing treatment effects on breast cancer and necessitates advanced statistical tools for unbiased analysis. Despite progress in estimation and inference within semi-competing risks regre…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Contact: [email protected]

  2. arXiv:2407.00561  [pdf, ps, other]

    stat.ME stat.AP

    Advancing Information Integration through Empirical Likelihood: Selective Reviews and a New Idea

    Authors: Chixiang Chen, Jia Liang, Elynn Chen, Ming Wang

    Abstract: Information integration plays a pivotal role in biomedical studies by facilitating the combination and analysis of independent datasets from multiple studies, thereby uncovering valuable insights that might otherwise remain obscured due to the limited sample size in individual studies. However, sharing raw data from independent studies presents significant challenges, primarily due to the need to…

    Submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2406.10778  [pdf, other]

    cs.CE stat.AP

    Heterogeneous Entity Representation for Medicinal Synergy Prediction

    Authors: Jiawei Wu, Jun Wen, Mingyuan Yan, Anqi Dong, Can Chen

    Abstract: Medicinal synergy prediction is a powerful tool in drug discovery and development that harnesses the principles of combination therapy to enhance therapeutic outcomes by improving efficacy, reducing toxicity, and preventing drug resistance. While a myriad of computational methods has emerged for predicting synergistic drug combinations, a large portion of them may overlook the intricate, yet criti…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures

    MSC Class: 92C50; 05C65; 68T07

  4. arXiv:2404.10884  [pdf, other]

    stat.ME

    Modeling Interconnected Modules in Multivariate Outcomes: Evaluating the Impact of Alcohol Intake on Plasma Metabolomics

    Authors: Yifan Yang, Chixiang Chen, Hwiyoung Lee, Ming Wang, Shuo Chen

    Abstract: Alcohol consumption has been shown to influence cardiovascular mechanisms in humans, leading to observable alterations in the plasma metabolomic profile. Regression models are commonly employed to investigate these effects, treating metabolomics features as the outcomes and alcohol intake as the exposure. Given the latent dependence structure among the numerous metabolomic features (e.g., co-expre…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 25 pages, 5 figures

  5. arXiv:2403.15291  [pdf, other]

    stat.AP physics.soc-ph q-bio.PE

    Wastewater-based Epidemiology for COVID-19 Surveillance: A Survey

    Authors: Chen Chen, Gursharn Kaur, Aniruddha Adiga, Baltazar Espinoza, Srinivasan Venkatramanan, Andrew Warren, Bryan Lewis, Justin Crow, Rekha Singh, Alexandra Lorentz, Denise Toney, Madhav Marathe

    Abstract: The pandemic of COVID-19 has imposed tremendous pressure on public health systems and social economic ecosystems over the past years. To alleviate its social impact, it is important to proactively track the prevalence of COVID-19 within communities. The traditional way to estimate the disease prevalence is to estimate from reported clinical test data or surveys. However, the coverage of clinical t…

    Submitted 22 March, 2024; originally announced March 2024.

  6. arXiv:2403.15025  [pdf, other]

    cs.LG stat.ML

    Robust Conformal Prediction under Distribution Shift via Physics-Informed Structural Causal Model

    Authors: Rui Xu, Yue Sun, Chao Chen, Parv Venkitasubramaniam, Sihong Xie

    Abstract: Uncertainty is critical to reliable decision-making with machine learning. Conformal prediction (CP) handles uncertainty by predicting a set on a test input, hoping the set to cover the true label with at least $(1-α)$ confidence. This coverage can be guaranteed on test data even if the marginal distributions $P_X$ differ between calibration and test datasets. However, as it is common in practice,…

    Submitted 22 March, 2024; originally announced March 2024.

  7. arXiv:2402.12655  [pdf, other]

    cs.SI stat.AP

    Ego Group Partition: A Novel Framework for Improving Ego Experiments in Social Networks

    Authors: Lu Deng, JingJing Zhang, Yong Wang, Chuan Chen

    Abstract: Estimating the average treatment effect in social networks is challenging due to individuals influencing each other. One approach to address interference is ego cluster experiments, where each cluster consists of a central individual (ego) and its peers (alters). Clusters are randomized, and only the effects on egos are measured. In this work, we propose an improved framework for ego cluster exper…

    Submitted 19 February, 2024; originally announced February 2024.

  8. arXiv:2402.12653  [pdf, other]

    cs.SI stat.AP

    Unbiased Estimation for Total Treatment Effect Under Interference Using Aggregated Dyadic Data

    Authors: Lu Deng, Yilin Li, JingJing Zhang, Yong Wang, Chuan Chen

    Abstract: In social media platforms, user behavior is often influenced by interactions with other users, complicating the accurate estimation of causal effects in traditional A/B experiments. This study investigates situations where an individual's outcome can be broken down into the sum of multiple pairwise outcomes, a reflection of user interactions. These outcomes, referred to as dyadic data, are prevale…

    Submitted 19 February, 2024; originally announced February 2024.

  9. arXiv:2402.10062  [pdf, other]

    cs.LG stat.ML

    Optimal Parameter and Neuron Pruning for Out-of-Distribution Detection

    Authors: Chao Chen, Zhihang Fu, Kai Liu, Ze Chen, Mingyuan Tao, Jieping Ye

    Abstract: For a machine learning model deployed in real world scenarios, the ability of detecting out-of-distribution (OOD) samples is indispensable and challenging. Most existing OOD detection methods focused on exploring advanced training skills or training-free tricks to prevent the model from yielding overconfident confidence score for unknown samples. The training-based methods require expensive traini…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted by NeurIPS 2023. 19 pages

    Journal ref: NeurIPS 2023

  10. arXiv:2402.07134  [pdf, other]

    q-fin.RM stat.AP

    Tail risk forecasting with semi-parametric regression models by incorporating overnight information

    Authors: Cathy W. S. Chen, Takaaki Koike, Wei-Hsuan Shau

    Abstract: This research incorporates realized volatility and overnight information into risk models, wherein the overnight return often contributes significantly to the total return volatility. Extending a semi-parametric regression model based on asymmetric Laplace distribution, we propose a family of RES-CAViaR-oc models by adding overnight return and realized measures as a nowcasting technique for simult…

    Submitted 11 February, 2024; originally announced February 2024.

  11. arXiv:2402.01112  [pdf]

    stat.ME stat.AP stat.OT

    Gerontologic Biostatistics 2.0: Developments over 10+ years in the age of data science

    Authors: Chixiang Chen, Michelle Shardell, Jaime Lynn Speiser, Karen Bandeen-Roche, Heather Allore, Thomas G Travison, Michael Griswold, Terrence E. Murphy

    Abstract: Background: Introduced in 2010, the sub-discipline of gerontologic biostatistics (GBS) was conceptualized to address the specific challenges in analyzing data from research studies involving older adults. However, the evolving technological landscape has catalyzed data science and statistical advancements since the original GBS publication, greatly expanding the scope of gerontologic research. The…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: Corresponding Author: Michelle Shardell, PhD (Email: [email protected])

  12. arXiv:2311.17867  [pdf, other]

    stat.ME

    A Class of Directed Acyclic Graphs with Mixed Data Types in Mediation Analysis

    Authors: Wei Hao, Canyi Chen, Peter X. -K. Song

    Abstract: We propose a unified class of generalized structural equation models (GSEMs) with data of mixed types in mediation analysis, including continuous, categorical, and count variables. Such models extend substantially the classical linear structural equation model to accommodate many data types arising from the application of mediation analysis. Invoking the hierarchical modeling approach, we specify…

    Submitted 4 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: 33 pages, 3 figures, 3 tables

  13. arXiv:2310.18527  [pdf, other]

    stat.ME stat.AP stat.CO

    Multiple Imputation Method for High-Dimensional Neuroimaging Data

    Authors: Tong Lu, Chixiang Chen, Hsin-Hsiung Huang, Peter Kochunov, Elliot Hong, Shuo Chen

    Abstract: Missingness is a common issue for neuroimaging data, and neglecting it in downstream statistical analysis can introduce bias and lead to misguided inferential conclusions. It is therefore crucial to conduct appropriate statistical methods to address this issue. While multiple imputation is a popular technique for handling missing data, its application to neuroimaging data is hindered by high dimen…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 13 pages, 5 figures

  14. arXiv:2308.08217  [pdf, other]

    stat.AP

    Matching with multiple criteria and its application to health disparities research

    Authors: Chang Chen, Zhiyu Qian, Bo Zhang

    Abstract: Matching is a popular nonparametric covariate adjustment strategy in empirical health services research. Matching helps construct two groups comparable in many baseline covariates but different in some key aspects under investigation. In health disparities research, it is desirable to understand the contributions of various modifiable factors, like income and insurance type, to the observed dispar…

    Submitted 16 August, 2023; originally announced August 2023.

  15. A Look into Causal Effects under Entangled Treatment in Graphs: Investigating the Impact of Contact on MRSA Infection

    Authors: Jing Ma, Chen Chen, Anil Vullikanti, Ritwick Mishra, Gregory Madden, Daniel Borrajo, Jundong Li

    Abstract: Methicillin-resistant Staphylococcus aureus (MRSA) is a type of bacteria resistant to certain antibiotics, making it difficult to prevent MRSA infections. Among decades of efforts to conquer infectious diseases caused by MRSA, many studies have been proposed to estimate the causal effects of close contact (treatment) on MRSA infection (outcome) from observational data. In this problem, the treatme…

    Submitted 17 July, 2023; originally announced July 2023.

  16. arXiv:2305.19947  [pdf, other]

    cs.CV cs.LG stat.ML

    A Geometric Perspective on Diffusion Models

    Authors: Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang

    Abstract: Recent years have witnessed significant progress in developing effective training and fast sampling techniques for diffusion models. A remarkable advancement is the use of stochastic differential equations (SDEs) and their marginal-preserving ordinary differential equations (ODEs) to describe data perturbation and generative modeling in a unified framework. In this paper, we carefully inspect the…

    Submitted 30 September, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: 38 pages

  17. arXiv:2303.03520  [pdf, other]

    stat.ME

    The Effect of Alcohol Consumption on Brain Ageing: A New Causal Inference Framework for Incomplete and Massive Phenomic Data

    Authors: Chixiang Chen, Shuo Chen, Zhenyao Ye, Xu Shi, Tianzhou Ma

    Abstract: Although substance use, such as alcohol consumption, is known to be associated with cognitive decline during ageing, its direct influence on the central nervous system remains unclear. In this study, we aim to investigate the potential influence of alcohol intake frequency on accelerated brain ageing by estimating the mean potential brain-age gap (BAG) index, the difference between brain age and a…

    Submitted 4 March, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Contact: [email protected]

  18. arXiv:2303.03512  [pdf, other]

    stat.ME

    An Efficient Data Integration Scheme for Synthesizing Information from Multiple Secondary Datasets for the Parameter Inference of the Main Analysis

    Authors: Chixiang Chen, Ming Wang, Shuo Chen

    Abstract: Many observational studies and clinical trials collect various secondary outcomes that may be highly correlated with the primary endpoint. These secondary outcomes are often analyzed in secondary analyses separately from the main data analysis. However, these secondary outcomes can be used to improve the estimation precision in the main analysis. We propose a method called Multiple Information Bor…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: Contact Email: [email protected]

  19. Analyzing Risk Factors for Post-Acute Recovery in Older Adults with Alzheimer's Disease and Related Dementia: A New Semi-Parametric Model for Large-Scale Medicare Claims

    Authors: Biyi Shen, Haoyu Ren, Michelle Shardell, Jason Falvey, Chixiang Chen

    Abstract: Nearly 300,000 older adults experience a hip fracture every year, the majority of which occur following a fall. Unfortunately, recovery after fall-related trauma such as hip fracture is poor, where older adults diagnosed with Alzheimer's Disease and Related Dementia (ADRD) spend a particularly long time in hospitals or rehabilitation facilities during the post-operative recuperation period. Becaus…

    Submitted 1 February, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Published on Statistics in Medicine. Contact Emails: [email protected]

  20. arXiv:2303.03497  [pdf, other]

    stat.ME

    Integrative data analysis where partial covariates have complex non-linear effects by using summary information from an external data

    Authors: Jia Liang, Shuo Chen, Peter Kochunov, L Elliot Hong, Chixiang Chen

    Abstract: A full parametric and linear specification may be insufficient to capture complicated patterns in studies exploring complex features, such as those investigating age-related changes in brain functional abilities. Alternatively, a partially linear model (PLM) consisting of both parametric and non-parametric elements may have a better fit. This model has been widely applied in economics, environment…

    Submitted 5 February, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Contact Email: chixiang.chen [at] som [dot] umaryland [dot]edu

  21. arXiv:2302.01861  [pdf, other]

    stat.ME

    Covariance Matrix Estimation for High-Throughput Biomedical Data with Interconnected Communities

    Authors: Yifan Yang, Chixiang Chen, Shuo Chen

    Abstract: Estimating a covariance matrix is central to high-dimensional data analysis. Empirical analyses of high-dimensional biomedical data, including genomics, proteomics, microbiome, and neuroimaging, among others, consistently reveal strong modularity in the dependence patterns. In these analyses, intercorrelated high-dimensional biomedical features often form communities or modules that can be interco…

    Submitted 15 November, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: 24 pages, 3 figures

  22. arXiv:2302.00239  [pdf, other]

    cs.LG cs.CL stat.ML

    Filtering Context Mitigates Scarcity and Selection Bias in Political Ideology Prediction

    Authors: Chen Chen, Dylan Walker, Venkatesh Saligrama

    Abstract: We propose a novel supervised learning approach for political ideology prediction (PIP) that is capable of predicting out-of-distribution inputs. This problem is motivated by the fact that manual data-labeling is expensive, while self-reported labels are often scarce and exhibit significant selection bias. We propose a novel statistical model that decomposes the document embeddings into a linear s…

    Submitted 31 January, 2023; originally announced February 2023.

  23. arXiv:2211.15849  [pdf, other]

    stat.AP

    Association between author metadata and acceptance: A feature-rich, matched observational study of a corpus of ICLR submissions between 2017-2022

    Authors: Chang Chen, Jiayao Zhang, Dan Roth, Ting Ye, Bo Zhang

    Abstract: Many recent studies have probed status bias in the peer-review process of academic journals and conferences. In this article, we investigated the association between author metadata and area chairs' final decisions (Accept/Reject) using our compiled database of 5,313 borderline submissions to the International Conference on Learning Representations (ICLR) from 2017 to 2022. We carefully defined el…

    Submitted 28 November, 2022; originally announced November 2022.

  24. arXiv:2211.15770  [pdf, other]

    cs.LG math.OC stat.ML

    Accelerated Nonnegative Tensor Completion via Integer Programming

    Authors: Wenhao Pan, Anil Aswani, Chen Chen

    Abstract: The problem of tensor completion has applications in healthcare, computer vision, and other domains. However, past approaches to tensor completion have faced a tension in that they either have polynomial-time computation but require exponentially more samples than the information-theoretic rate, or they use fewer samples but require solving NP-hard problems for which there are no known practical a…

    Submitted 4 February, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: 23 pages. Abstract accepted by Frontiers in Applied Mathematics and Statistics. Full manuscript submitted and under review

  25. arXiv:2210.05797  [pdf, other]

    stat.AP

    Joint Modeling for Geometry and Functionality of Cerebral Cortical Surface Images

    Authors: Jingjing Zou, Chi-Hua Chen, John A. D. Aston

    Abstract: We propose a framework for jointly modeling the geometry and functionality in high dimensional functional surfaces. The proposed mixed effects model characterizes effects of subject-specific covariates and exogenous stimuli on functional surfaces while accounting for potential mutual-influence of their geometry and functionality. This is achieved through a computationally efficient estimation meth…

    Submitted 9 February, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

  26. arXiv:2207.05945  [pdf, other]

    cs.LG cs.DS stat.ML

    Online Active Regression

    Authors: Cheng Chen, Yi Li, Yiming Sun

    Abstract: Active regression considers a linear regression problem where the learner receives a large number of data points but can only observe a small number of labels. Since online algorithms can deal with incremental training data and take advantage of low computational cost, we consider an online extension of the active regression problem: the learner receives data points one by one and immediately deci…

    Submitted 30 August, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

    Comments: A preliminary version appeared in the Proceedings of the 39th International Conference on Machine Learning (ICML 2022), PMLR 162, pp 3320--3335, 2022. v2: optimal dependence on $ε$ in query complexity

  27. arXiv:2206.04993  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    The Symmetric Generalized Eigenvalue Problem as a Nash Equilibrium

    Authors: Ian Gemp, Charlie Chen, Brian McWilliams

    Abstract: The symmetric generalized eigenvalue problem (SGEP) is a fundamental concept in numerical linear algebra. It captures the solution of many classical machine learning problems such as canonical correlation analysis, independent components analysis, partial least squares, linear discriminant analysis, principal components and others. Despite this, most general solvers are prohibitively expensive whe…

    Submitted 25 April, 2023; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: Published in ICLR 2023 (JAX code available as part of github.com/deepmind/eigengame)

  28. arXiv:2206.03914  [pdf, other]

    stat.AP

    Spatio-temporal Downscaling Emulator for Regional Climate Models: a Comparative Study

    Authors: Luis A. Barboza, Shu Wei Chou Chen, Marcela Alfaro Córdoba, Eric J. Alfaro, Hugo G. Hidalgo

    Abstract: Regional Climate Models (RCM) describe the meso scale global atmospheric and oceanic dynamics and serve as dynamical downscaling models. In other words, RCMs use atmospheric and oceanic climate output from General Circulation Models (GCM) to develop a higher resolution climate output. They are computationally demanding and, depending on the application, require several orders of magnitude of compu…

    Submitted 13 March, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

    MSC Class: 62P12

  29. arXiv:2206.02946  [pdf, other]

    cs.LG cs.CG stat.ML

    On the Convergence of Optimizing Persistent-Homology-Based Losses

    Authors: Yikai Zhang, Jiachen Yao, Yusu Wang, Chao Chen

    Abstract: Topological loss based on persistent homology has shown promise in various applications. A topological loss enforces the model to achieve certain desired topological property. Despite its empirical success, less is known about the optimization behavior of the loss. In fact, the topological loss involves combinatorial configurations that may oscillate during optimization. In this paper, we introduc…

    Submitted 11 June, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

  30. arXiv:2205.12243  [pdf, other]

    stat.ML cs.LG

    EBM Life Cycle: MCMC Strategies for Synthesis, Defense, and Density Modeling

    Authors: Mitch Hill, Jonathan Mitchell, Chu Chen, Yuan Du, Mubarak Shah, Song-Chun Zhu

    Abstract: This work presents strategies to learn an Energy-Based Model (EBM) according to the desired length of its MCMC sampling trajectories. MCMC trajectories of different lengths correspond to models with different purposes. Our experiments cover three different trajectory magnitudes and learning outcomes: 1) shortrun sampling for image generation; 2) midrun sampling for classifier-agnostic adversarial…

    Submitted 24 May, 2022; originally announced May 2022.

  31. arXiv:2204.09125  [pdf]

    stat.CO math.NA

    Mobility Analysis Workflow (MAW): An accessible, interoperable, and reproducible container system for processing raw mobile data

    Authors: Xiangyang Guan, Cynthia Chen, Ian Ren, Ka Yee Yeung, Ling-Hong Hung, Wes J. Lloyd

    Abstract: Mobility analysis, or understanding and modeling of people's mobility patterns in terms of when, where, and how people move from one place to another, is fundamentally important as such information is the basis for large-scale investment decisions on the nation's multi-modal transportation infrastructure. Recent rise of using passively generated mobile data from mobile devices have raised question…

    Submitted 19 April, 2022; originally announced April 2022.

    MSC Class: 91C20

    ACM Class: J.6

  32. arXiv:2203.14206  [pdf, other]

    cs.LG stat.ML

    Denoising Likelihood Score Matching for Conditional Score-based Data Generation

    Authors: Chen-Hao Chao, Wei-Fang Sun, Bo-Wun Cheng, Yi-Chen Lo, Chia-Che Chang, Yu-Lun Liu, Yu-Lin Chang, Chia-Ping Chen, Chun-Yi Lee

    Abstract: Many existing conditional score-based data generation methods utilize Bayes' theorem to decompose the gradients of a log posterior density into a mixture of scores. These methods facilitate the training procedure of conditional score models, as a mixture of scores can be separately estimated using a score model and a classifier. However, our analysis indicates that the training objectives for the…

    Submitted 27 March, 2022; originally announced March 2022.

    Comments: ICLR 2022

  33. arXiv:2202.08695  [pdf, other]

    cs.DL stat.AP

    Article's Scientific Prestige: measuring the impact of individual articles in the Web of Science

    Authors: Ying Chen, Thorsten Koch, Nazgul Zakiyeva, Kailiang Liu, Zhitong Xu, Chun-houh Chen, Junji Nakano, Keisuke Honda

    Abstract: We performed a citation analysis on the Web of Science publications consisting of more than 63 million articles and 1.45 billion citations on 254 subjects from 1981 to 2020. We proposed the Article's Scientific Prestige (ASP) metric and compared this metric to number of citations (#Cit) and journal grade in measuring the scientific impact of individual articles in the large-scale hierarchical and…

    Submitted 17 February, 2022; originally announced February 2022.

  34. arXiv:2202.01034  [pdf, other]

    cs.LG cs.CY stat.ML

    Diagnosing failures of fairness transfer across distribution shift in real-world medical settings

    Authors: Jessica Schrouff, Natalie Harris, Oluwasanmi Koyejo, Ibrahim Alabdulmohsin, Eva Schnider, Krista Opsahl-Ong, Alex Brown, Subhrajit Roy, Diana Mincu, Christina Chen, Awa Dieng, Yuan Liu, Vivek Natarajan, Alan Karthikesalingam, Katherine Heller, Silvia Chiappa, Alexander D'Amour

    Abstract: Diagnosing and mitigating changes in model fairness under distribution shift is an important component of the safe deployment of machine learning in healthcare settings. Importantly, the success of any mitigation strategy strongly depends on the structure of the shift. Despite this, there has been little discussion of how to empirically assess the structure of a distribution shift that one is enco…

    Submitted 10 February, 2023; v1 submitted 2 February, 2022; originally announced February 2022.

    Journal ref: Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

  35. arXiv:2201.09644  [pdf, other]

    cs.LG stat.ML

    Multiscale Generative Models: Improving Performance of a Generative Model Using Feedback from Other Dependent Generative Models

    Authors: Changyu Chen, Avinandan Bose, Shih-Fen Cheng, Arunesh Sinha

    Abstract: Realistic fine-grained multi-agent simulation of real-world complex systems is crucial for many downstream tasks such as reinforcement learning. Recent work has used generative models (GANs in particular) for providing high-fidelity simulation of real-world systems. However, such generative models are often monolithic and miss out on modeling the interaction in multi-agent systems. In this work, w…

    Submitted 24 February, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

  36. arXiv:2112.09086  [pdf, ps, other]

    stat.ML cs.LG math.NA

    A new locally linear embedding scheme in light of Hessian eigenmap

    Authors: Liren Lin, Chih-Wei Chen

    Abstract: We provide a new interpretation of Hessian locally linear embedding (HLLE), revealing that it is essentially a variant way to implement the same idea of locally linear embedding (LLE). Based on the new interpretation, a substantial simplification can be made, in which the idea of "Hessian" is replaced by rather arbitrary weights. Moreover, we show by numerical examples that HLLE may produce projec…

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: 13 pages

    MSC Class: 62-07

  37. arXiv:2111.10178  [pdf, other]

    stat.ML cs.LG

    Understanding Training-Data Leakage from Gradients in Neural Networks for Image Classification

    Authors: Cangxiong Chen, Neill D. F. Campbell

    Abstract: Federated learning of deep learning models for supervised tasks, e.g. image classification and segmentation, has found many applications: for example in human-in-the-loop tasks such as film post-production where it enables sharing of domain expertise of human artists in an efficient and effective fashion. In many such applications, we need to protect the training data from being leaked when gradie…

    Submitted 19 November, 2021; originally announced November 2021.

  38. arXiv:2111.08906  [pdf, other]

    cs.CL stat.AP

    Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees

    Authors: Yaman Kumar Singla, Sriram Krishna, Rajiv Ratn Shah, Changyou Chen

    Abstract: Automated Scoring (AS), the natural language processing task of scoring essays and speeches in an educational testing setting, is growing in popularity and being deployed across contexts from government examinations to companies providing language proficiency services. However, existing systems either forgo human raters entirely, thus harming the reliability of the test, or score every response by…

    Submitted 17 November, 2021; originally announced November 2021.

  39. arXiv:2111.07052   

    physics.ao-ph stat.AP

    Distribution and Determinants of Correlation between PM2.5 and O3 in China Mainland: Dynamitic simil-Hu Lines

    Authors: Chenru Chen, Miaoqing Xu, Shuyi Liu, Dehai Zhu, Jianyu Yang, Bingbo Gao, Ziyue Chen

    Abstract: In recent years, China has made great efforts to control air pollution. During the governance process, it is found that fine particulate matter (PM2.5) and ozone (O3) change in the same trend among some areas and the opposite in others, which brings some difficulties to take measures in a planned way. Therefore, this study adopted multi-year and large-scale air quality data to explore the distribu…

    Submitted 30 September, 2022; v1 submitted 13 November, 2021; originally announced November 2021.

    Comments: Our research group have decided to withdraw this preprint

  40. arXiv:2111.04580  [pdf, other]

    cs.LG math.OC stat.ML

    Nonnegative Tensor Completion via Integer Optimization

    Authors: Caleb Bugg, Chen Chen, Anil Aswani

    Abstract: Unlike matrix completion, tensor completion does not have an algorithm that is known to achieve the information-theoretic sample complexity rate. This paper develops a new algorithm for the special case of completion for nonnegative tensors. We prove that our algorithm converges in a linear (in numerical tolerance) number of oracle steps, while achieving the information-theoretic rate. Our approac…

    Submitted 23 May, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

  41. arXiv:2110.09360  [pdf, other]

    stat.ML cs.LG

    Prediction of liquid fuel properties using machine learning models with Gaussian processes and probabilistic conditional generative learning

    Authors: Rodolfo S. M. Freitas, Ágatha P. F. Lima, Cheng Chen, Fernando A. Rochinha, Daniel Mira, Xi Jiang

    Abstract: Accurate determination of fuel properties of complex mixtures over a wide range of pressure and temperature conditions is essential to utilizing alternative fuels. The present work aims to construct cheap-to-compute machine learning (ML) models to act as closure equations for predicting the physical properties of alternative fuels. Those models can be trained using the database from MD simulations…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: 22 pages, 13 figures

  42. arXiv:2109.10957  [pdf, other]

    cs.RO stat.AP

    Real Robot Challenge: A Robotics Competition in the Cloud

    Authors: Stefan Bauer, Felix Widmaier, Manuel Wüthrich, Annika Buchholz, Sebastian Stark, Anirudh Goyal, Thomas Steinbrenner, Joel Akpo, Shruti Joshi, Vincent Berenz, Vaibhav Agrawal, Niklas Funk, Julen Urain De Jesus, Jan Peters, Joe Watson, Claire Chen, Krishnan Srinivasan, Junwu Zhang, Jeffrey Zhang, Matthew R. Walter, Rishabh Madan, Charles Schaff, Takahiro Maeda, Takuma Yoneda, Denis Yarats, et al. (17 additional authors not shown)

    Abstract: Dexterous manipulation remains an open problem in robotics. To coordinate efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at MPI for Intelligent Systems and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able…

    Submitted 10 June, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

  43. arXiv:2108.07301  [pdf, other]

    cs.LG stat.AP

    Understanding the factors driving the opioid epidemic using machine learning

    Authors: Sachin Gavali, Chuming Chen, Julie Cowart, Xi Peng, Shanshan Ding, Cathy Wu, Tammy Anderson

    Abstract: In recent years, the US has experienced an opioid epidemic with an unprecedented number of drug overdose deaths. Research finds such overdose deaths are linked to neighborhood-level traits, thus providing opportunity to identify effective interventions. Typically, techniques such as Ordinary Least Squares (OLS) or Maximum Likelihood Estimation (MLE) are used to document neighborhood-level factors…

    Submitted 6 December, 2021; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Accepted to IEEE International Conference on Bioinformatics & Biomedicine 2021

  44. arXiv:2104.13417  [pdf, other]

    cs.CV cs.LG stat.ML

    Towards Fair Federated Learning with Zero-Shot Data Augmentation

    Authors: Weituo Hao, Mostafa El-Khamy, Jungwon Lee, Jianyi Zhang, Kevin J Liang, Changyou Chen, Lawrence Carin

    Abstract: Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data. Although it is recognized that statistical heterogeneity of the client local data yields slower global model convergence, it is less commonly recognized that it also yields a biased federated global model w…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted by IEEE CVPR Workshop on Fair, Data Efficient And Trusted Computer Vision

  45. arXiv:2103.11251  [pdf, other]

    cs.LG stat.ML

    Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges

    Authors: Cynthia Rudin, Chaofan Chen, Zhi Chen, Haiyang Huang, Lesia Semenova, Chudi Zhong

    Abstract: Interpretability in machine learning (ML) is crucial for high stakes decisions and troubleshooting. In this work, we provide fundamental principles for interpretable ML, and dispel common misunderstandings that dilute the importance of this crucial topic. We also identify 10 technical challenge areas in interpretable machine learning and provide history and background on each problem. Some of thes…

    Submitted 9 July, 2021; v1 submitted 20 March, 2021; originally announced March 2021.

    MSC Class: 68T01; ACM Class: I.2.6

    Journal ref: Statistics Surveys, 2021

  46. arXiv:2103.07756  [pdf, other]

    cs.LG cs.CV stat.AP stat.ML

    Learning with Feature-Dependent Label Noise: A Progressive Approach

    Authors: Yikai Zhang, Songzhu Zheng, Pengxiang Wu, Mayank Goswami, Chao Chen

    Abstract: Label noise is frequently observed in real-world large-scale datasets. The noise is introduced due to a variety of reasons; it is heterogeneous and feature-dependent. Most existing approaches to handling noisy labels fall into two categories: they either assume an ideal feature-independent noise, or remain heuristic without theoretical guarantees. In this paper, we propose to target a new family o…

    Submitted 27 March, 2021; v1 submitted 13 March, 2021; originally announced March 2021.

    Comments: ICLR 2021 (Spotlight)

  47. arXiv:2103.02156  [pdf, other]

    stat.ME

    Ridge-penalized adaptive Mantel test and its application in imaging genetics

    Authors: Dustin Pluta, Tong Shen, Gui Xue, Chuansheng Chen, Hernando Ombao, Zhaoxia Yu

    Abstract: We propose a ridge-penalized adaptive Mantel test (AdaMant) for evaluating the association of two high-dimensional sets of features. By introducing a ridge penalty, AdaMant tests the association across many metrics simultaneously. We demonstrate how ridge penalization bridges Euclidean and Mahalanobis distances and their corresponding linear models from the perspective of association measurement a…

    Submitted 20 March, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

  48. arXiv:2102.05274  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Stability of SGD: Tightness Analysis and Improved Bounds

    Authors: Yikai Zhang, Wenjia Zhang, Sammy Bald, Vamsi Pingali, Chao Chen, Mayank Goswami

    Abstract: Stochastic Gradient Descent (SGD) based methods have been widely used for training large-scale machine learning models that also generalize well in practice. Several explanations have been offered for this generalization performance, a prominent one being algorithmic stability [18]. However, there are no known examples of smooth loss functions for which the analysis can be shown to be tight. Furth…

    Submitted 10 February, 2021; originally announced February 2021.

    ACM Class: I.2.6; G.1.6

  49. arXiv:2102.04124  [pdf, other]

    stat.AP

    SONIC: SOcial Network with Influencers and Communities

    Authors: Cathy Yi-Hsuan Chen, Wolfgang Karl Härdle, Yegor Klochkov

    Abstract: The integration of social media characteristics into an econometric framework requires modeling a high dimensional dynamic network with dimensions of parameter typically much larger than the number of observations. To cope with this problem, we introduce SONIC, a new high-dimensional network model that assumes that (1) only few influencers drive the network dynamics; (2) the community structure of…

    Submitted 8 February, 2021; originally announced February 2021.

  50. arXiv:2101.02280  [pdf]

    stat.AP

    Independent Action Models and Prediction of Combination Treatment Effects for Response Rate, Duration of Response and Tumor Size Change in Oncology Drug Development

    Authors: Linda Z. Sun, Cai Wu, Xiaoyun Li, Cong Chen, Emmett V. Schmidt

    Abstract: An unprecedented number of new cancer targets are in development, and most are being developed in combination therapies. Early oncology development is strategically challenged in choosing the best combinations to move forward to late stage development. The most common early endpoints to be assessed in such decision-making include objective response rate, duration of response and tumor size change.…

    Submitted 6 January, 2021; originally announced January 2021.