Showing 1–50 of 88 results for author: Shi, X

  1. arXiv:2408.11003  [pdf, other]

    stat.ME

    DEEPEAST technique to enhance power in two-sample tests via the same-attraction function

    Authors: Yiting Chen, Min Gao, Wei Lin, Andrew Jirasek, Kirsty Milligan, Xiaoping Shi

    Abstract: Data depth has emerged as an invaluable nonparametric measure for the ranking of multivariate samples. The main contribution of depth-based two-sample comparisons is the introduction of the Q statistic (Liu and Singh, 1993), a quality index. Unlike traditional methods, data depth does not require the assumption of normal distributions and adheres to four fundamental properties. Many existing two-s…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2407.01856  [pdf, other]

    cs.LG stat.ML

    Adaptive RKHS Fourier Features for Compositional Gaussian Process Models

    Authors: Xinxing Shi, Thomas Baldwin-McDonald, Mauricio A. Álvarez

    Abstract: Deep Gaussian Processes (DGPs) leverage a compositional structure to model non-stationary processes. DGPs typically rely on local inducing point approximations across intermediate GP layers. Recent advances in DGP inference have shown that incorporating global Fourier features from Reproducing Kernel Hilbert Space (RKHS) can enhance the DGPs' capability to capture complex non-stationary patterns.…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2406.19597  [pdf, other]

    stat.ME

    What's the Weight? Estimating Controlled Outcome Differences in Complex Surveys for Health Disparities Research

    Authors: Stephen Salerno, Emily K. Roberts, Belinda L. Needham, Tyler H. McCormick, Bhramar Mukherjee, Xu Shi

    Abstract: A basic descriptive question in statistics often asks whether there are differences in mean outcomes between groups based on levels of a discrete covariate (e.g., racial disparities in health outcomes). However, when this categorical covariate of interest is correlated with other factors related to the outcome, direct comparisons may lead to biased estimates and invalid inferential conclusions wit…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.00322  [pdf, other]

    stat.ME stat.AP

    Adaptive Penalized Likelihood method for Markov Chains

    Authors: Yining Zhou, Ming Gao, Yiting Chen, Xiaoping Shi

    Abstract: Maximum Likelihood Estimation (MLE) and Likelihood Ratio Test (LRT) are widely used methods for estimating the transition probability matrix in Markov chains and identifying significant relationships between transitions, such as equality. However, the estimated transition probability matrix derived from MLE lacks accuracy compared to the real one, and LRT is inefficient in high-dimensional Markov…

    Submitted 1 June, 2024; originally announced June 2024.

  5. arXiv:2405.18518  [pdf, other]

    cs.LG stat.ME stat.ML

    Modeling Long Sequences in Bladder Cancer Recurrence: A Comparative Evaluation of LSTM, Transformer, and Mamba

    Authors: Runquan Zhang, Jiawen Jiang, Xiaoping Shi

    Abstract: Traditional survival analysis methods often struggle with complex time-dependent data, failing to capture and interpret dynamic characteristics adequately. This study aims to evaluate the performance of three long-sequence models, LSTM, Transformer, and Mamba, in analyzing recurrence event data and integrating them with the Cox proportional hazards model. This study integrates the advantages of deep lear…

    Submitted 19 July, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  6. arXiv:2403.18039  [pdf, other]

    stat.ME

    Doubly robust causal inference through penalized bias-reduced estimation: combining non-probability samples with designed surveys

    Authors: Jiacong Du, Xu Shi, Donglin Zeng, Bhramar Mukherjee

    Abstract: Causal inference on the average treatment effect (ATE) using non-probability samples, such as electronic health records (EHR), faces challenges from sample selection bias and high-dimensional covariates. This requires considering a selection model alongside treatment and outcome models that are typical ingredients in causal inference. This paper considers integrating large non-probability samples…

    Submitted 26 March, 2024; originally announced March 2024.

  7. arXiv:2403.03388  [pdf, other]

    stat.AP

    Is a Recent Surge in Global Warming Detectable?

    Authors: Claudie Beaulieu, Colin Gallagher, Rebecca Killick, Robert Lund, Xueheng Shi

    Abstract: The global mean surface temperature is widely studied to monitor climate change. A current debate centers around whether there has been a recent (post-1970s) surge/acceleration in the warming rate. This paper addresses whether an acceleration in the warming rate is detectable from a statistical perspective. We use changepoint models, which are statistical techniques specifically designed for ident…

    Submitted 5 March, 2024; originally announced March 2024.

  8. arXiv:2401.00521  [pdf, other]

    cs.LG cs.AI stat.AP

    Multi-spatial Multi-temporal Air Quality Forecasting with Integrated Monitoring and Reanalysis Data

    Authors: Yuxiao Hu, Qian Li, Xiaodan Shi, Jinyue Yan, Yuntian Chen

    Abstract: Accurate air quality forecasting is crucial for public health, environmental monitoring and protection, and urban planning. However, existing methods fail to effectively utilize multi-scale information, both spatially and temporally. Spatially, there is a lack of integration between individual monitoring stations and city-wide scales. Temporally, the periodic nature of air quality variations is of…

    Submitted 31 December, 2023; originally announced January 2024.

  9. arXiv:2312.03967  [pdf, other]

    stat.ME

    Test-negative designs with various reasons for testing: statistical bias and solution

    Authors: Mengxin Yu, Kendrick Qijun Li, Nicholas Jewell, Eric Tchetgen Tchetgen, Dylan Small, Xu Shi, Bingkai Wang

    Abstract: Test-negative designs are widely used for post-market evaluation of vaccine effectiveness, particularly in cases where randomization is not feasible. Differing from classical test-negative designs where only healthcare-seekers with symptoms are included, recent test-negative designs have involved individuals with various reasons for testing, especially in an outbreak setting. While including these…

    Submitted 21 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

  10. Impacts of Riding Comfort on the Attitudes of Riders, Drivers, and Pedestrians Toward Autonomous Shuttles

    Authors: Keke Long, Xiaowei Shi, Zhiwei Chen, Yuan Wang, Xiaopeng Li

    Abstract: Automated vehicle (AV) shuttles are emerging mobility technologies that have been widely piloted and deployed. Public attitude is critical to the deployment progress and the overall social benefits of automated vehicle (AV) technologies. The AV shuttle demonstration was regarded as a good way for possible attitude improvements. However, not all existing AV shuttle technologies are mature and relia…

    Submitted 11 October, 2023; originally announced October 2023.

  11. arXiv:2306.17704  [pdf, other]

    stat.ME math.ST

    Top-Two Thompson Sampling for Contextual Top-mc Selection Problems

    Authors: Xinbo Shi, Yijie Peng, Gongbo Zhang

    Abstract: We aim to efficiently allocate a fixed simulation budget to identify the top-mc designs for each context among a finite number of contexts. The performance of each design under a context is measured by an identifiable statistical characteristic, possibly with the existence of nuisance parameters. Under a Bayesian framework, we extend the top-two Thompson sampling method designed for selecting the…

    Submitted 30 June, 2023; originally announced June 2023.

    MSC Class: 62F07 (Primary) 62C10; 62L10 (Secondary)

  12. arXiv:2304.04652  [pdf, other]

    stat.ME

    A Framework for Understanding Selection Bias in Real-World Healthcare Data

    Authors: Ritoban Kundu, Xu Shi, Jean Morrison, Jessica Barrett, Bhramar Mukherjee

    Abstract: Using administrative patient-care data such as Electronic Health Records (EHR) and medical/pharmaceutical claims for population-based scientific research has become increasingly common. With vast sample sizes leading to very small standard errors, researchers need to pay more attention to potential biases in the estimates of association parameters of interest, specifically to biases that do not d…

    Submitted 17 August, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

  13. arXiv:2303.03520  [pdf, other]

    stat.ME

    The Effect of Alcohol Consumption on Brain Ageing: A New Causal Inference Framework for Incomplete and Massive Phenomic Data

    Authors: Chixiang Chen, Shuo Chen, Zhenyao Ye, Xu Shi, Tianzhou Ma

    Abstract: Although substance use, such as alcohol consumption, is known to be associated with cognitive decline during ageing, its direct influence on the central nervous system remains unclear. In this study, we aim to investigate the potential influence of alcohol intake frequency on accelerated brain ageing by estimating the mean potential brain-age gap (BAG) index, the difference between brain age and a…

    Submitted 4 March, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

  14. arXiv:2302.05549  [pdf, other]

    stat.ME cs.DC

    Balancing Approach for Causal Inference at Scale

    Authors: Sicheng Lin, Meng Xu, Xi Zhang, Shih-Kang Chao, Ying-Kai Huang, Xiaolin Shi

    Abstract: With the modern software and online platforms to collect massive amount of data, there is an increasing demand of applying causal inference methods at large scale when randomized experimentation is not viable. Weighting methods that directly incorporate covariate balancing have recently gained popularity for estimating causal effects in observational studies. These methods reduce the manual effort…

    Submitted 3 August, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

    Comments: KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

  15. arXiv:2212.04081  [pdf, other]

    stat.ME

    Changepoint Methods in Climatology

    Authors: Robert B. Lund, Xueheng Shi

    Abstract: Changepoint methods have multiple uses in climatology, including stationary checks and record homogenization. There are still many open problems in the area, especially in the multiple changepoint setting, and statisticians are needed to help develop the methods and analyze the data.

    Submitted 8 December, 2022; originally announced December 2022.

  16. arXiv:2212.02674  [pdf, other]

    stat.AP

    Good Practices and Common Pitfalls in Climate Time Series Changepoint Techniques: A Review

    Authors: Robert B. Lund, Claudie Beaulieu, Rebecca Killick, Qiqi Lu, Xueheng Shi

    Abstract: Climate changepoint (homogenization) methods abound today, with a myriad of techniques existing in both the climate and statistics literature. Unfortunately, the appropriate changepoint technique to use remains unclear to many. Further complicating issues, changepoint conclusions are not robust to small perturbations in assumptions; for example, allowing for a trend or correlation in the series ca…

    Submitted 5 December, 2022; originally announced December 2022.

  17. arXiv:2210.02014  [pdf, other]

    stat.ME

    Doubly Robust Proximal Synthetic Controls

    Authors: Hongxiang Qiu, Xu Shi, Wang Miao, Edgar Dobriban, Eric Tchetgen Tchetgen

    Abstract: To infer the treatment effect for a single treated unit using panel data, synthetic control methods construct a linear combination of control units' outcomes that mimics the treated unit's pre-treatment outcome trajectory. This linear combination is subsequently used to impute the counterfactual outcomes of the treated unit had it not been treated in the post-treatment period, and used to estimate…

    Submitted 6 May, 2024; v1 submitted 5 October, 2022; originally announced October 2022.

  18. arXiv:2210.00528  [pdf, other]

    stat.ME

    Data-driven Automated Negative Control Estimation (DANCE): Search for, Validation of, and Causal Inference with Negative Controls

    Authors: Erich Kummerfeld, Jaewon Lim, Xu Shi

    Abstract: Negative control variables are increasingly used to adjust for unmeasured confounding bias in causal inference using observational data. They are typically identified by subject matter knowledge and there is currently a severe lack of data-driven methods to find negative controls. In this paper, we present a statistical test for discovering negative controls of a special type -- disconnected negat…

    Submitted 2 October, 2022; originally announced October 2022.

  19. arXiv:2208.01237  [pdf, ps, other]

    stat.ME

    Doubly Robust Proximal Causal Inference under Confounded Outcome-Dependent Sampling

    Authors: Kendrick Qijun Li, Xu Shi, Wang Miao, Eric Tchetgen Tchetgen

    Abstract: Unmeasured confounding and selection bias are often of concern in observational studies and may invalidate a causal analysis if not appropriately accounted for. Under outcome-dependent sampling, a latent factor that has causal effects on the treatment, outcome, and sample selection process may cause both unmeasured confounding and selection bias, rendering standard causal parameters unidentifiable…

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: 43 pages, 1 figure

  20. arXiv:2204.00857  [pdf, other]

    stat.ME

    Collaborative causal inference with a distributed data-sharing management

    Authors: Mengtong Hu, Xu Shi, Peter X. -K. Song

    Abstract: Data sharing barriers are paramount challenges arising from multicenter clinical trials where multiple data sources are stored in a distributed fashion at different local study sites. Merging such data sources into a common data storage for a centralized statistical analysis requires a data use agreement, which is often time-consuming. Data merging may become more burdensome when causal inference…

    Submitted 2 April, 2022; originally announced April 2022.

  21. arXiv:2203.12509  [pdf, other]

    stat.ME

    Double Negative Control Inference in Test-Negative Design Studies of Vaccine Effectiveness

    Authors: Kendrick Qijun Li, Xu Shi, Wang Miao, Eric Tchetgen Tchetgen

    Abstract: The test-negative design (TND) has become a standard approach to evaluate vaccine effectiveness against the risk of acquiring infectious diseases in real-world settings, such as Influenza, Rotavirus, Dengue fever, and more recently COVID-19. In a TND study, individuals who experience symptoms and seek care are recruited and tested for the infectious disease which defines cases and controls. Despit…

    Submitted 8 March, 2023; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: 78 pages, 4 figures, 5 tables

  22. arXiv:2111.13775  [pdf, other]

    stat.ME

    Online Causal Inference with Application to Near Real-Time Post-Market Vaccine Safety Surveillance

    Authors: Xu Shi, Lan Luo

    Abstract: Streaming data routinely generated by mobile phones, social networks, e-commerce, and electronic health records present new opportunities for near real-time surveillance of the impact of an intervention on an outcome of interest via causal inference methods. However, as data grow rapidly in volume and velocity, storing and combing data become increasingly challenging. The amount of time and effort…

    Submitted 26 November, 2021; originally announced November 2021.

  23. arXiv:2111.08851  [pdf, other]

    cs.LG cs.CV stat.ML

    Deep Neural Networks for Rank-Consistent Ordinal Regression Based On Conditional Probabilities

    Authors: Xintong Shi, Wenzhi Cao, Sebastian Raschka

    Abstract: In recent times, deep neural networks achieved outstanding predictive performance on various classification and pattern recognition tasks. However, many real-world prediction problems have ordinal response variables, and this ordering information is ignored by conventional classification losses such as the multi-category cross-entropy. Ordinal regression methods for deep neural networks address th…

    Submitted 31 May, 2023; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: Accepted for publication in Pattern Analysis and Applications

    Journal ref: Pattern Analysis and Applications 2023

  24. arXiv:2111.07407  [pdf, other]

    cs.LG stat.AP stat.ML

    A Machine Learning Approach for Recruitment Prediction in Clinical Trial Design

    Authors: Jingshu Liu, Patricia J Allen, Luke Benz, Daniel Blickstein, Evon Okidi, Xiao Shi

    Abstract: Significant advancements have been made in recent years to optimize patient recruitment for clinical trials, however, improved methods for patient recruitment prediction are needed to support trial site selection and to estimate appropriate enrollment timelines in the trial design stage. In this paper, using data from thousands of historical clinical trials, we explore machine learning methods to…

    Submitted 14 November, 2021; originally announced November 2021.

    Comments: Machine Learning for Health (ML4H) - Extended Abstract

  25. arXiv:2111.02705  [pdf, other]

    cs.LG cs.CL stat.ML

    Benchmarking Multimodal AutoML for Tabular Data with Text Fields

    Authors: Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, Alexander J. Smola

    Abstract: We consider the use of automated supervised learning systems for data tables that not only contain numeric/categorical columns, but one or more text fields as well. Here we assemble 18 multimodal data tables that each contain some text fields and stem from a real business application. Our publicly-available benchmark enables researchers to comprehensively evaluate their own methods for supervised…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: Proceedings of the Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks 2021

  26. arXiv:2110.01106  [pdf, ps, other]

    stat.ME

    Data Integration in Causal Inference

    Authors: Xu Shi, Ziyang Pan, Wang Miao

    Abstract: Integrating data from multiple heterogeneous sources has become increasingly popular to achieve a large sample size and diverse study population. This paper reviews development in causal inference methods that combine multiple datasets collected by potentially different designs from potentially heterogeneous populations. We summarize recent advances on combining randomized clinical trial with ext…

    Submitted 3 October, 2021; originally announced October 2021.

  27. arXiv:2109.07030  [pdf, other]

    stat.ME math.ST

    Proximal Causal Inference for Complex Longitudinal Studies

    Authors: Andrew Ying, Wang Miao, Xu Shi, Eric J. Tchetgen Tchetgen

    Abstract: A standard assumption for causal inference about the joint effects of time-varying treatment is that one has measured sufficient covariates to ensure that within covariate strata, subjects are exchangeable across observed treatment values, also known as "sequential randomization assumption (SRA)". SRA is often criticized as it requires one to accurately measure all confounders. Realistically, meas…

    Submitted 3 August, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

  28. arXiv:2108.13935  [pdf, other]

    stat.ME

    Theory for identification and Inference with Synthetic Controls: A Proximal Causal Inference Framework

    Authors: Xu Shi, Kendrick Li, Wang Miao, Mengtong Hu, Eric Tchetgen Tchetgen

    Abstract: Synthetic control (SC) methods are commonly used to estimate the treatment effect on a single treated unit in panel data settings. An SC is a weighted average of control units built to match the treated unit, with weights typically estimated by regressing (summaries of) pre-treatment outcomes and measured covariates of the treated unit to those of the control units. However, it has been establishe…

    Submitted 18 February, 2023; v1 submitted 31 August, 2021; originally announced August 2021.

    Comments: 37 pages, 3 figures. The Supplementary Materials are attached

  29. arXiv:2107.00728  [pdf, other]

    stat.ME

    Two edge-count tests and relevance analysis in k high-dimensional samples

    Authors: Xiaoping Shi

    Abstract: For the task of relevance analysis, the conventional Tukey's test may be applied to the set of all pairwise comparisons. However, there were few studies that discuss both nonparametric k-sample comparisons and relevance analysis in high dimensions. Our aim is to capture the degree of relevance between combined samples and provide additional insights and advantages in high-dimensional k-sample comp…

    Submitted 1 July, 2021; originally announced July 2021.

  30. Changepoint Detection: An Analysis of the Central England Temperature Series

    Authors: Xueheng Shi, Claudie Beaulieu, Rebecca Killick, Robert Lund

    Abstract: This paper presents a statistical analysis of structural changes in the Central England temperature series, one of the longest surface temperature records available. A changepoint analysis is performed to detect abrupt changes, which can be regarded as a preliminary step before further analysis is conducted to identify the causes of the changes (e.g., artificial, human-induced or natural variabili…

    Submitted 13 June, 2022; v1 submitted 23 June, 2021; originally announced June 2021.

  31. arXiv:2102.13287  [pdf, other]

    stat.ME stat.AP stat.CO

    Exploring the space-time pattern of log-transformed infectious count of COVID-19: a clustering-segmented autoregressive sigmoid model

    Authors: Xiaoping Shi, Meiqian Chen, Yucheng Dong

    Abstract: At the end of April 20, 2020, there were only a few new COVID-19 cases remaining in China, whereas the rest of the world had shown increases in the number of new cases. It is of extreme importance to develop an efficient statistical model of COVID-19 spread, which could help in the global fight against the virus. We propose a clustering-segmented autoregressive sigmoid (CSAS) model to explore the…

    Submitted 25 February, 2021; originally announced February 2021.

    Comments: 29 pages and 9 figures

    MSC Class: 62H30 (Primary) 65D10 (Secondary)

  32. arXiv:2102.10669  [pdf, other]

    math.ST stat.ME

    Autocovariance Estimation in the Presence of Changepoints

    Authors: Colin Gallagher, Rebecca Killick, Robert Lund, Xueheng Shi

    Abstract: This article studies estimation of a stationary autocovariance structure in the presence of an unknown number of mean shifts. Here, a Yule-Walker moment estimator for the autoregressive parameters in a dependent time series contaminated by mean shift changepoints is proposed and studied. The estimator is based on first order differences of the series and is proven consistent and asymptotically nor…

    Submitted 24 February, 2021; v1 submitted 21 February, 2021; originally announced February 2021.

  33. arXiv:2101.01960  [pdf, other]

    stat.ME stat.CO

    A Comparison of Single and Multiple Changepoint Techniques for Time Series Data

    Authors: Xueheng Shi, Colin Gallagher, Robert Lund, Rebecca Killick

    Abstract: This paper describes and compares several prominent single and multiple changepoint techniques for time series data. Due to their importance in inferential matters, changepoint research on correlated data has accelerated recently. Unfortunately, small perturbations in model assumptions can drastically alter changepoint conclusions; for example, heavy positive correlation in a time series can be mi…

    Submitted 6 January, 2021; originally announced January 2021.

  34. arXiv:2011.14437  [pdf, other]

    stat.AP

    How to Measure Your App: A Couple of Pitfalls and Remedies in Measuring App Performance in Online Controlled Experiments

    Authors: Yuxiang Xie, Meng Xu, Evan Chow, Xiaolin Shi

    Abstract: Effectively measuring, understanding, and improving mobile app performance is of paramount importance for mobile app developers. Across the mobile Internet landscape, companies run online controlled experiments (A/B tests) with thousands of performance metrics in order to understand how app performance causally impacts user retention and to guard against service or app regressions that degrade use…

    Submitted 29 November, 2020; originally announced November 2020.

    Comments: WSDM '21: Proceedings of the 14th International Conference on Web Search and Data Mining

  35. arXiv:2011.08411  [pdf, other]

    stat.ME math.ST

    Semiparametric proximal causal inference

    Authors: Yifan Cui, Hongming Pu, Xu Shi, Wang Miao, Eric Tchetgen Tchetgen

    Abstract: Skepticism about the assumption of no unmeasured confounding, also known as exchangeability, is often warranted in making causal inferences from observational data; because exchangeability hinges on an investigator's ability to accurately measure covariates that capture all potential sources of confounding. In practice, the most one can hope for is that covariate measurements are at best proxies o…

    Submitted 21 February, 2023; v1 submitted 16 November, 2020; originally announced November 2020.

  36. arXiv:2011.06663  [pdf, other]

    stat.ME

    Patient Recruitment Using Electronic Health Records Under Selection Bias: a Two-phase Sampling Framework

    Authors: Guanghao Zhang, Lauren J. Beesley, Bhramar Mukherjee, Xu Shi

    Abstract: Electronic health records (EHRs) are increasingly recognized as a cost-effective resource for patient recruitment in clinical research. However, how to optimally select a cohort from millions of individuals to answer a scientific question of interest remains unclear. Consider a study to estimate the mean or mean difference of an expensive outcome. Inexpensive auxiliary covariates predictive of the…

    Submitted 13 December, 2023; v1 submitted 12 November, 2020; originally announced November 2020.

  37. arXiv:2009.10982  [pdf, other]

    stat.ME

    An Introduction to Proximal Causal Learning

    Authors: Eric J Tchetgen Tchetgen, Andrew Ying, Yifan Cui, Xu Shi, Wang Miao

    Abstract: A standard assumption for causal inference from observational data is that one has measured a sufficiently rich set of covariates to ensure that within covariate strata, subjects are exchangeable across observed treatment values. Skepticism about the exchangeability assumption in observational studies is often warranted because it hinges on investigators' ability to accurately measure covariates c…

    Submitted 23 September, 2020; originally announced September 2020.

    Comments: This paper was originally presented by the first author at the 2020 Myrto Lefkopoulou Distinguished Lectureship at the Harvard T. H. Chan School of Public Health on September 17th 2020

    MSC Class: 62A01

  38. arXiv:2009.09842  [pdf, other]

    cs.LG cs.MA stat.ML

    Energy-based Surprise Minimization for Multi-Agent Value Factorization

    Authors: Karush Suri, Xiao Qi Shi, Konstantinos Plataniotis, Yuri Lawryshyn

    Abstract: Multi-Agent Reinforcement Learning (MARL) has demonstrated significant success in training decentralised policies in a centralised manner by making use of value factorization methods. However, addressing surprise across spurious states and approximation bias remain open problems for multi-agent settings. Towards this goal, we introduce the Energy-based MIXer (EMIX), an algorithm which minimizes su…

    Submitted 17 January, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: Preprint, Under Review

  39. arXiv:2009.05641  [pdf, ps, other]

    stat.ME

    A Selective Review of Negative Control Methods in Epidemiology

    Authors: Xu Shi, Wang Miao, Eric Tchetgen Tchetgen

    Abstract: Purpose of Review: Negative controls are a powerful tool to detect and adjust for bias in epidemiological research. This paper introduces negative controls to a broader audience and provides guidance on principled design and causal analysis based on a formal negative control framework. Recent Findings: We review and summarize causal and statistical assumptions, practical strategies, and validati…

    Submitted 19 July, 2022; v1 submitted 11 September, 2020; originally announced September 2020.

  40. arXiv:2007.13690  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Maximum Mutation Reinforcement Learning for Scalable Control

    Authors: Karush Suri, Xiao Qi Shi, Konstantinos N. Plataniotis, Yuri A. Lawryshyn

    Abstract: Advances in Reinforcement Learning (RL) have demonstrated data efficiency and optimal control over large state spaces at the cost of scalable performance. Genetic methods, on the other hand, provide scalability but depict hyperparameter sensitivity towards evolutionary operations. However, a combination of the two methods has recently demonstrated success in scaling RL agents to high-dimensional a…

    Submitted 16 January, 2021; v1 submitted 24 July, 2020; originally announced July 2020.

    Comments: 10+3 Pages

  41. arXiv:2006.10845  [pdf, ps, other]

    stat.ME

    Short Communication: Detecting Possibly Frequent Change-points: Wild Binary Segmentation 2

    Authors: Robert Lund, Xueheng Shi

    Abstract: This article comments on the new version of wild binary segmentation 2. Wild Binary Segmentation 2 and Steepest-drop Model Selection has made improvements on changepoint analysis especially on reducing the computational cost. However, WBS2 tends to overestimate as WBS and the threshold does not work appropriately on short sequences without changepoints.

    Submitted 18 June, 2020; originally announced June 2020.

  42. arXiv:2006.06427  [pdf, other]

    cs.SI cs.LG stat.ML

    Knowing your FATE: Friendship, Action and Temporal Explanations for User Engagement Prediction on Social Apps

    Authors: Xianfeng Tang, Yozen Liu, Neil Shah, Xiaolin Shi, Prasenjit Mitra, Suhang Wang

    Abstract: With the rapid growth and prevalence of social network applications (Apps) in recent years, understanding user engagement has become increasingly important, to provide useful insights for future App design and development. While several promising neural modeling approaches were recently pioneered for accurate user engagement prediction, their black-box designs are unfortunately limited in model ex…

    Submitted 15 June, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: Accepted to KDD 2020 Applied Data Science Track

  43. arXiv:2004.09845  [pdf, other]

    cs.LG cs.CV stat.ML

    LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition

    Authors: Xueying Shi, Yueming Jin, Qi Dou, Pheng-Ann Heng

    Abstract: Automatic surgical workflow recognition in video is an essentially fundamental yet challenging problem for developing computer-assisted and robotic-assisted surgery. Existing approaches with deep learning have achieved remarkable performance on analysis of surgical videos, however, heavily relying on large-scale labelled datasets. Unfortunately, the annotation is not often available in abundance,…

    Submitted 23 April, 2020; v1 submitted 21 April, 2020; originally announced April 2020.

    Comments: Accepted as a conference paper in IPCAI 2020

  44. Real-time Data-driven Quality Assessment for Continuous Manufacturing of Carbon Nanotube Buckypaper

    Authors: Xinran Shi, Xiaowei Yue, Zhiyong Liang, Jianjun Shi

    Abstract: Carbon nanotube (CNT) thin sheet, or buckypaper, has shown great potential as a multifunctional platform material due to its desirable properties, including its lightweight nature, high mechanical properties, and good conductivity. However, their mass adoption and applications by industry have run into significant bottlenecks because of large variability and uncertainty in quality during fabricati…

    Submitted 19 April, 2020; originally announced April 2020.

  45. arXiv:1912.06060  [pdf, ps, other]

    cs.LG stat.ML

    Sublinear Time Numerical Linear Algebra for Structured Matrices

    Authors: Xiaofei Shi, David P. Woodruff

    Abstract: We show how to solve a number of problems in numerical linear algebra, such as least squares regression, $\ell_p$-regression for any $p \geq 1$, low rank approximation, and kernel regression, in time $T(A) \poly(\log(nd))$, where for a given input matrix $A \in \mathbb{R}^{n \times d}$, $T(A)$ is the time needed to compute $A\cdot y$ for an arbitrary vector $y \in \mathbb{R}^d$. Since…

    Submitted 12 December, 2019; originally announced December 2019.

  46. arXiv:1911.06487  [pdf, other]

    cs.CV cs.LG cs.RO stat.ML

    OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning

    Authors: Qi She, Fan Feng, Xinyue Hao, Qihan Yang, Chuanlin Lan, Vincenzo Lomonaco, Xuesong Shi, Zhengwei Wang, Yao Guo, Yimin Zhang, Fei Qiao, Rosa H. M. Chan

    Abstract: The recent breakthroughs in computer vision have benefited from the availability of large representative datasets (e.g. ImageNet and COCO) for training. Yet, robotic vision poses unique challenges for applying visual algorithms developed from these standard computer vision datasets due to their implicit assumption over non-varying distributions for a fixed set of tasks. Fully retraining models eac…

    Submitted 6 March, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

    Comments: 7 pages, 7 figures, 4 tables

  47. arXiv:1910.12163  [pdf, other]

    stat.ML cs.CR cs.LG

    Understanding and Quantifying Adversarial Examples Existence in Linear Classification

    Authors: Xupeng Shi, A. Adam Ding

    Abstract: State-of-art deep neural networks (DNN) are vulnerable to attacks by adversarial examples: a carefully designed small perturbation to the input, that is imperceptible to human, can mislead DNN. To understand the root cause of adversarial examples, we quantify the probability of adversarial example existence for linear classifiers. Previous mathematical definition of adversarial examples only invol…

    Submitted 26 October, 2019; originally announced October 2019.

  48. arXiv:1910.02396  [pdf, other]

    stat.ME

    Estimating Unknown Cycles in Geophysical data

    Authors: Xueheng Shi, Colin Gallagher

    Abstract: Examples of cyclic (periodic) behavior in geophysical data abound. In many cases the primary period is known, such as in daily measurements of rain, temperature, and sea level. However, many time series of measurements contain cycles of unknown or varying length. We consider the problem of estimating the unknown period in a time series. We review the basic methods, compare their performance through…

    Submitted 6 October, 2019; originally announced October 2019.

  49. arXiv:1909.02344  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    An Active Learning Approach for Reducing Annotation Cost in Skin Lesion Analysis

    Authors: Xueying Shi, Qi Dou, Cheng Xue, Jing Qin, Hao Chen, Pheng-Ann Heng

    Abstract: Automated skin lesion analysis is very crucial in clinical practice, as skin cancer is among the most common human malignancy. Existing approaches with deep learning have achieved remarkable performance on this challenging task, however, heavily relying on large-scale labelled datasets. In this paper, we present a novel active learning framework for cost-effective skin lesion analysis. The goal is…

    Submitted 5 September, 2019; originally announced September 2019.

    Comments: Accepted by MIML2019

  50. arXiv:1907.10485  [pdf, other]

    eess.SP stat.AP

    Early Anomaly Detection in Power Systems Based on Random Matrix Theory

    Authors: Xin Shi, Robert Qiu

    Abstract: It is important for detecting the anomaly in power systems before it expands and causes serious faults such as power failures or system blackout. With the deployments of phasor measurement units (PMUs), massive amounts of synchrophasor measurements are collected, which makes it possible for the real-time situation awareness of the entire system. In this paper, based on random matrix theory (RMT),…

    Submitted 21 July, 2019; originally announced July 2019.

    Comments: 8 pages, IEEE Trans on Smart Grid, submitted. arXiv admin note: substantial text overlap with arXiv:1801.01669; text overlap with arXiv:1810.08962