Showing 1–49 of 49 results for author: He, S

  1. arXiv:2403.12624  [pdf, other]

    stat.ME stat.AP

    Large-scale metric objects filtering for binary classification with application to abnormal brain connectivity detection

    Authors: Shuaida He, Jiaqi Li, Xin Chen

    Abstract: The classification of random objects within metric spaces without a vector structure has attracted increasing attention. However, the complexity inherent in such non-Euclidean data often restricts existing models to handle only a limited number of features, leaving a gap in real-world applications. To address this, we propose a data-adaptive filtering procedure to identify informative features fro… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  2. arXiv:2402.05438  [pdf, other]

    math.ST stat.ME

    Penalized spline estimation of principal components for sparse functional data: rates of convergence

    Authors: Shiyuan He, Jianhua Z. Huang, Kejun He

    Abstract: This paper gives a comprehensive treatment of the convergence rates of penalized spline estimators for simultaneously estimating several leading principal component functions, when the functional data is sparsely observed. The penalized spline estimators are defined as the solution of a penalized empirical risk minimization problem, where the loss function belongs to a general class of loss functi… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

  3. arXiv:2401.04723  [pdf, other]

    stat.ME

    Spatio-temporal data fusion for the analysis of in situ and remote sensing data using the INLA-SPDE approach

    Authors: Shiyu He, Samuel W. K. Wong

    Abstract: We propose a Bayesian hierarchical model to address the challenge of spatial misalignment in spatio-temporal data obtained from in situ and satellite sources. The model is fit using the INLA-SPDE approach, which provides efficient computation. Our methodology combines the different data sources in a "fusion" model via the construction of projection matrices in both spatial and temporal domains. T… ▽ More

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 23 pages, 7 figures

  4. arXiv:2401.00667  [pdf, other]

    stat.ME stat.CO

    Channelling Multimodality Through a Unimodalizing Transport: Warp-U Sampler and Stochastic Bridge Sampling

    Authors: Fei Ding, David E. Jones, Shiyuan He, Xiao-Li Meng

    Abstract: Monte Carlo integration is fundamental in scientific and statistical computation, but requires reliable samples from the target distribution, which poses a substantial challenge in the case of multi-modal distributions. Existing methods often involve time-consuming tuning, and typically lack tailored estimators for efficient use of the samples. This paper adapts the Warp-U transformation [Wang et… ▽ More

    Submitted 1 January, 2024; originally announced January 2024.

  5. arXiv:2312.13875  [pdf, other]

    stat.ML cs.LG stat.ME

    Best Arm Identification in Batched Multi-armed Bandit Problems

    Authors: Shengyu Cao, Simai He, Ruoqing Jiang, Jin Xu, Hongsong Yuan

    Abstract: Recently multi-armed bandit problem arises in many real-life scenarios where arms must be sampled in batches, due to limited time the agent can wait for the feedback. Such applications include biological experimentation and online marketing. The problem is further complicated when the number of arms is large and the number of batches is small. We consider pure exploration in a batched multi-armed… ▽ More

    Submitted 21 December, 2023; originally announced December 2023.

  6. arXiv:2311.03497  [pdf, other]

    stat.AP

    Understanding the Impact of Seasonal Climate Change on Canada's Economy by Region and Sector

    Authors: Shiyu He, Trang Bui, Yuying Huang, Wenling Zhang, Jie Jian, Samuel W. K. Wong, Tony S. Wirjanto

    Abstract: To assess the impact of climate change on the Canadian economy, we investigate and model the relationship between seasonal climate variables and economic growth across provinces and economic sectors. We further provide projections of climate change impacts up to the year 2050, taking into account the diverse climate change patterns and economic conditions across Canada. Our results indicate that r… ▽ More

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 25 pages, 7 figures

  7. arXiv:2306.06324  [pdf, other]

    stat.ME stat.ML

    Differentially private sliced inverse regression in the federated paradigm

    Authors: Shuaida He, Jiarui Zhang, Xin Chen

    Abstract: Sliced inverse regression (SIR), which includes linear discriminant analysis (LDA) as a special case, is a popular and powerful dimension reduction tool. In this article, we extend SIR to address the challenges of decentralized data, prioritizing privacy and communication efficiency. Our approach, named as federated sliced inverse regression (FSIR), facilitates collaborative estimation of the suff… ▽ More

    Submitted 10 August, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  8. Bayesian Nonlinear Tensor Regression with Functional Fused Elastic Net Prior

    Authors: Shuoli Chen, Kejun He, Shiyuan He, Yang Ni, Raymond K. W. Wong

    Abstract: Tensor regression methods have been widely used to predict a scalar response from covariates in the form of a multiway array. In many applications, the regions of tensor covariates used for prediction are often spatially connected with unknown shapes and discontinuous jumps on the boundaries. Moreover, the relationship between the response and the tensor covariates can be nonlinear. In this articl… ▽ More

    Submitted 16 February, 2023; originally announced February 2023.

    Journal ref: Technometrics, 65:4, 524-536 (2023)

  9. arXiv:2211.04874  [pdf, other]

    math.ST stat.ML

    A Unified Analysis of Multi-task Functional Linear Regression Models with Manifold Constraint and Composite Quadratic Penalty

    Authors: Shiyuan He, Hanxuan Ye, Kejun He

    Abstract: This work studies the multi-task functional linear regression models where both the covariates and the unknown regression coefficients (called slope functions) are curves. For slope function estimation, we employ penalized splines to balance bias, variance, and computational complexity. The power of multi-task learning is brought in by imposing additional structures over the slope functions. We pr… ▽ More

    Submitted 31 July, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  10. arXiv:2211.04784  [pdf, other]

    stat.ME stat.CO

    Spline Estimation of Functional Principal Components via Manifold Conjugate Gradient Algorithm

    Authors: Shiyuan He, Hanxuan Ye, Kejun He

    Abstract: Functional principal component analysis has become the most important dimension reduction technique in functional data analysis. Based on B-spline approximation, functional principal components (FPCs) can be efficiently estimated by the expectation-maximization (EM) and the geometric restricted maximum likelihood (REML) algorithms under the strong assumption of Gaussianity on the principal compone… ▽ More

    Submitted 9 November, 2022; originally announced November 2022.

  11. arXiv:2210.10887  [pdf, other]

    math.OC cs.RO stat.AP

    Data-Driven Distributionally Robust Electric Vehicle Balancing for Mobility-on-Demand Systems under Demand and Supply Uncertainties

    Authors: Sihong He, Lynn Pepin, Guang Wang, Desheng Zhang, Fei Miao

    Abstract: As electric vehicle (EV) technologies become mature, EV has been rapidly adopted in modern transportation systems, and is expected to provide future autonomous mobility-on-demand (AMoD) service with economic and societal benefits. However, EVs require frequent recharges due to their limited and unpredictable cruising ranges, and they have to be managed efficiently given the dynamic charging proces… ▽ More

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: This paper has been published in IROS2020

  12. arXiv:2206.05891  [pdf, other]

    cs.LG cs.DC stat.ML

    Anchor Sampling for Federated Learning with Partial Client Participation

    Authors: Feijie Wu, Song Guo, Zhihao Qu, Shiqi He, Ziming Liu, Jing Gao

    Abstract: Compared with full client participation, partial client participation is a more practical scenario in federated learning, but it may amplify some challenges in federated learning, such as data heterogeneity. The lack of inactive clients' updates in partial client participation makes it more likely for the model aggregation to deviate from the aggregation based on full client participation. Trainin… ▽ More

    Submitted 28 May, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: ICML 2023

  13. arXiv:2111.08269  [pdf, other]

    math.OC stat.ME

    Data-Driven Inpatient Bed Assignment Using the P Model

    Authors: Shasha Han, Shuangchi He, Hong Choon Oh

    Abstract: Problem definition: Emergency department (ED) boarding refers to the practice of holding patients in the ED after they have been admitted to hospital wards, usually resulting from insufficient inpatient resources. Boarded patients may compete with new patients for medical resources in the ED, compromising the quality of emergency care. A common expedient for mitigating boarding is patient overflow… ▽ More

    Submitted 16 November, 2021; originally announced November 2021.

  14. arXiv:2111.06859  [pdf, other]

    math.ST math.PR stat.ME

    Higher-Order Coverage Errors of Batching Methods via Edgeworth Expansions on $t$-Statistics

    Authors: Shengyi He, Henry Lam

    Abstract: While batching methods have been widely used in simulation and statistics, it is open regarding their higher-order coverage behaviors and whether one variant is better than the others in this regard. We develop techniques to obtain higher-order coverage errors for batching methods by building Edgeworth-type expansions on $t$-statistics. The coefficients in these expansions are intricate analytical… ▽ More

    Submitted 12 November, 2021; originally announced November 2021.

  15. arXiv:2111.02204  [pdf, other]

    stat.ME stat.ML

    Certifiable Deep Importance Sampling for Rare-Event Simulation of Black-Box Systems

    Authors: Mansur Arief, Yuanlu Bai, Wenhao Ding, Shengyi He, Zhiyuan Huang, Henry Lam, Ding Zhao

    Abstract: Rare-event simulation techniques, such as importance sampling (IS), constitute powerful tools to speed up challenging estimation of rare catastrophic events. These techniques often leverage the knowledge and analysis on underlying system structures to endow desirable efficiency guarantees. However, black-box problems, especially those arising from recent safety-critical applications of AI-driven p… ▽ More

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: The conference version of this paper has appeared in AISTATS 2021 (arXiv:2006.15722)

  16. arXiv:2110.07531  [pdf]

    stat.ML cs.LG physics.bio-ph q-bio.BM

    Deep learning models for predicting RNA degradation via dual crowdsourcing

    Authors: Hannah K. Wayment-Steele, Wipapat Kladwang, Andrew M. Watkins, Do Soon Kim, Bojan Tunguz, Walter Reade, Maggie Demkin, Jonathan Romano, Roger Wellington-Oguri, John J. Nicol, Jiayang Gao, Kazuki Onodera, Kazuki Fujikawa, Hanfei Mao, Gilles Vandewiele, Michele Tinti, Bram Steenwinckel, Takuya Ito, Taiga Noumi, Shujun He, Keiichiro Ishi, Youhan Lee, Fatih Öztürk, Anthony Chiu, Emin Öztürk, et al. (4 additional authors not shown)

    Abstract: Messenger RNA-based medicines hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a ke… ▽ More

    Submitted 22 April, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

  17. arXiv:2110.04232  [pdf, other]

    stat.ML cs.IR cs.LG stat.ME

    Learning Topic Models: Identifiability and Finite-Sample Analysis

    Authors: Yinyin Chen, Shishuang He, Yun Yang, Feng Liang

    Abstract: Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, lacking in the literature is a formal theoretical investigation of the statistical identifiability and accuracy of latent topic estimation. In this paper, we propose a maximum likelihood estimator (ML… ▽ More

    Submitted 10 August, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

  18. arXiv:2109.03397  [pdf, other]

    stat.CO

    Functional Principal Subspace Sampling for Large Scale Functional Data Analysis

    Authors: Shiyuan He, Xiaomeng Yan

    Abstract: Functional data analysis (FDA) methods have computational and theoretical appeals for some high dimensional data, but lack the scalability to modern large sample datasets. To tackle the challenge, we develop randomized algorithms for two important FDA methods: functional principal component analysis (FPCA) and functional linear regression (FLR) with scalar response. The two methods are connected a… ▽ More

    Submitted 8 April, 2022; v1 submitted 7 September, 2021; originally announced September 2021.

  19. arXiv:2109.01993  [pdf, ps, other]

    stat.AP

    Statistical computation methods for microbiome compositional data network inference

    Authors: Liang Chen, Qiuyan He, Hui Wan, Shun He, Minghua Deng

    Abstract: Microbes can affect processes from food production to human health. Such microbes are not isolated, but rather interact with each other and establish connections with their living environments. Understanding these interactions is essential to an understanding of the organization and complex interplay of microbial communities, as well as the structure and dynamics of various ecosystems. A common an… ▽ More

    Submitted 5 September, 2021; originally announced September 2021.

  20. arXiv:2108.05908  [pdf, ps, other]

    math.OC math.PR stat.ME

    Higher-Order Expansion and Bartlett Correctability of Distributionally Robust Optimization

    Authors: Shengyi He, Henry Lam

    Abstract: Distributionally robust optimization (DRO) is a worst-case framework for stochastic optimization under uncertainty that has drawn fast-growing studies in recent years. When the underlying probability distribution is unknown and observed from data, DRO suggests to compute the worst-case distribution within a so-called uncertainty set that captures the involved statistical uncertainty. In particular… ▽ More

    Submitted 11 August, 2021; originally announced August 2021.

  21. arXiv:2107.08288  [pdf, other]

    stat.ME stat.AP

    A Reproducing Kernel Hilbert Space Approach to Functional Calibration of Computer Models

    Authors: Rui Tuo, Shiyuan He, Arash Pourhabib, Yu Ding, Jianhua Z. Huang

    Abstract: This paper develops a frequentist solution to the functional calibration problem, where the value of a calibration parameter in a computer model is allowed to vary with the value of control variables in the physical system. The need of functional calibration is motivated by engineering applications where using a constant calibration parameter results in a significant mismatch between outputs from… ▽ More

    Submitted 17 July, 2021; originally announced July 2021.

    MSC Class: 62G05; 62P30; 62F15

  22. arXiv:2102.10631  [pdf, other]

    stat.ME math.OC math.PR

    Adaptive Importance Sampling for Efficient Stochastic Root Finding and Quantile Estimation

    Authors: Shengyi He, Guangxin Jiang, Henry Lam, Michael C. Fu

    Abstract: In solving simulation-based stochastic root-finding or optimization problems that involve rare events, such as in extreme quantile estimation, running crude Monte Carlo can be prohibitively inefficient. To address this issue, importance sampling can be employed to drive down the sampling error to a desirable level. However, selecting a good importance sampler requires knowledge of the solution to… ▽ More

    Submitted 21 February, 2021; originally announced February 2021.

  23. arXiv:2101.02938  [pdf, other]

    stat.AP astro-ph.GA astro-ph.SR

    Simultaneous inference of periods and period-luminosity relations for Mira variable stars

    Authors: Shiyuan He, Zhenfeng Lin, Wenlong Yuan, Lucas M. Macri, Jianhua Z. Huang

    Abstract: The Period--Luminosity relation (PLR) of Mira variable stars is an important tool to determine astronomical distances. The common approach of estimating the PLR is a two-step procedure that first estimates the Mira periods and then runs a linear regression of magnitude on log period. When the light curves are sparse and noisy, the accuracy of period estimation decreases and can suffer from aliasin… ▽ More

    Submitted 8 January, 2021; originally announced January 2021.

  24. arXiv:2101.02304  [pdf, other]

    stat.AP q-bio.BM

    Statistical challenges in the analysis of sequence and structure data for the COVID-19 spike protein

    Authors: Shiyu He, Samuel W. K. Wong

    Abstract: As the major target of many vaccines and neutralizing antibodies against SARS-CoV-2, the spike (S) protein is observed to mutate over time. In this paper, we present statistical approaches to tackle some challenges associated with the analysis of S-protein data. We build a Bayesian hierarchical model to study the temporal and spatial evolution of S-protein sequences, after grouping the sequences i… ▽ More

    Submitted 30 January, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

    Comments: 21 pages, 5 figures

  25. arXiv:2008.07318  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards Dynamic Urban Bike Usage Prediction for Station Network Reconfiguration

    Authors: Xi Yang, Suining He

    Abstract: Bike sharing has become one of the major choices of transportation for residents in metropolitan cities worldwide. A station-based bike sharing system is usually operated in the way that a user picks up a bike from one station, and drops it off at another. Bike stations are, however, not static, as the bike stations are often reconfigured to accommodate changing demands or city urbanization over t… ▽ More

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: 9 pages, UrbComp 2020

  26. arXiv:2006.15722  [pdf, other]

    cs.LG stat.ML

    Deep Probabilistic Accelerated Evaluation: A Robust Certifiable Rare-Event Simulation Methodology for Black-Box Safety-Critical Systems

    Authors: Mansur Arief, Zhiyuan Huang, Guru Koushik Senthil Kumar, Yuanlu Bai, Shengyi He, Wenhao Ding, Henry Lam, Ding Zhao

    Abstract: Evaluating the reliability of intelligent physical systems against rare safety-critical events poses a huge testing burden for real-world applications. Simulation provides a useful platform to evaluate the extremal risks of these systems before their deployments. Importance Sampling (IS), while proven to be powerful for rare-event simulation, faces challenges in handling these learning-based syste… ▽ More

    Submitted 8 March, 2021; v1 submitted 28 June, 2020; originally announced June 2020.

  27. arXiv:2006.09449  [pdf, other]

    cs.LG cs.SI stat.ML

    Network Diffusions via Neural Mean-Field Dynamics

    Authors: Shushan He, Hongyuan Zha, Xiaojing Ye

    Abstract: We propose a novel learning framework based on neural mean-field dynamics for inference and estimation problems of diffusion on networks. Our new framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities, which renders a delay differential equation with memory integral approximated by learnable time convolution operators, resulting in a h… ▽ More

    Submitted 19 January, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: Accepted by NIPS2020, 21 pages, 5 figures

  28. arXiv:2006.07972  [pdf, other]

    cs.LG stat.ML

    Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances

    Authors: Sijie He, Xinyan Li, Timothy DelSole, Pradeep Ravikumar, Arindam Banerjee

    Abstract: Sub-seasonal climate forecasting (SSF) focuses on predicting key climate variables such as temperature and precipitation in the 2-week to 2-month time scales. Skillful SSF would have immense societal value, in areas such as agricultural productivity, water resource management, transportation and aviation systems, and emergency planning for extreme weather events. However, SSF is considered more ch… ▽ More

    Submitted 24 June, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

  29. arXiv:2004.08113  [pdf, other]

    cs.LG stat.ML

    Incorporating Multiple Cluster Centers for Multi-Label Learning

    Authors: Senlin Shu, Fengmao Lv, Yan Yan, Li Li, Shuo He, Jun He

    Abstract: Multi-label learning deals with the problem that each instance is associated with multiple labels simultaneously. Most of the existing approaches aim to improve the performance of multi-label learning by exploiting label correlations. Although the data augmentation technique is widely used in many machine learning tasks, it is still unclear whether data augmentation is helpful to multi-label learn… ▽ More

    Submitted 16 January, 2022; v1 submitted 17 April, 2020; originally announced April 2020.

    Comments: 19 pages with 4 figures and 4 tables

  30. arXiv:2003.06769  [pdf]

    cs.GT stat.ML

    Multi-AI competing and winning against humans in iterated Rock-Paper-Scissors game

    Authors: Lei Wang, Wenbin Huang, Yuanpeng Li, Julian Evans, Sailing He

    Abstract: Predicting and modeling human behavior and finding trends within human decision-making processes is a major problem of social science. Rock Paper Scissors (RPS) is the fundamental strategic question in many game theory problems and real-world competitions. Finding the right approach to beat a particular human opponent is challenging. Here we use an AI (artificial intelligence) algorithm based on M… ▽ More

    Submitted 22 November, 2020; v1 submitted 15 March, 2020; originally announced March 2020.

  31. arXiv:2003.00359  [pdf, other]

    cs.LG cs.AI stat.ML

    Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests

    Authors: Xiao Xu, Fang Dong, Yanghua Li, Shaojian He, Xin Li

    Abstract: A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. Two models with disjoint and hybrid payoffs are considered to characterize the phenomenon that users' preferences towards different items vary differently over time. In the disjoint payoff model, the reward of playing an arm i… ▽ More

    Submitted 29 February, 2020; originally announced March 2020.

    Comments: Accepted by AAAI 20

  32. arXiv:2002.03763  [pdf, other]

    cs.CV cs.LG stat.ML

    Learning Numerical Observers using Unsupervised Domain Adaptation

    Authors: Shenghua He, Weimin Zhou, Hua Li, Mark A. Anastasio

    Abstract: Medical imaging systems are commonly assessed by use of objective image quality measures. Supervised deep learning methods have been investigated to implement numerical observers for task-based image quality assessment. However, labeling large amounts of experimental data to train deep neural networks is tedious, expensive, and prone to subjective errors. Computer-simulated image data can potentia… ▽ More

    Submitted 22 February, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

    Comments: SPIE Medical Imaging 2020 (Oral)

  33. arXiv:2001.11425  [pdf, other]

    stat.ME stat.AP stat.CO

    Functional PCA with Covariate Dependent Mean and Covariance Structure

    Authors: Fei Ding, Shiyuan He, David E. Jones, Jianhua Z. Huang

    Abstract: Incorporating covariates into functional principal component analysis (PCA) can substantially improve the representation efficiency of the principal components and predictive performance. However, many existing functional PCA methods do not make use of covariates, and those that do often have high computational cost or make overly simplistic assumptions that are violated in practice. In this artic… ▽ More

    Submitted 19 August, 2023; v1 submitted 30 January, 2020; originally announced January 2020.

    Comments: 28 pages, 3 figures

    Journal ref: Technometrics, 64(3), 335-345 (2022)

  34. arXiv:1910.07899  [pdf, other]

    cs.LG stat.ML

    Design, Benchmarking and Explainability Analysis of a Game-Theoretic Framework towards Energy Efficiency in Smart Infrastructure

    Authors: Ioannis C. Konstantakopoulos, Hari Prasanna Das, Andrew R. Barkan, Shiying He, Tanya Veeravalli, Huihan Liu, Aummul Baneen Manasawala, Yu-Wen Lin, Costas J. Spanos

    Abstract: In this paper, we propose a gamification approach as a novel framework for smart building infrastructure with the goal of motivating human occupants to reconsider personal energy usage and to have positive effects on their environment. Human interaction in the context of cyber-physical systems is a core component and consideration in the implementation of any smart building technology. Research ha… ▽ More

    Submitted 16 October, 2019; originally announced October 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1809.05142, arXiv:1810.10533

  35. arXiv:1902.03047  [pdf, other]

    cs.LG stat.ML

    Collaboration based Multi-Label Learning

    Authors: Lei Feng, Bo An, Shuo He

    Abstract: It is well-known that exploiting label correlations is crucially important to multi-label learning. Most of the existing approaches take label correlations as prior knowledge, which may not correctly characterize the real relationships among labels. Besides, label correlations are normally used to regularize the hypothesis space, while the final predictions are not explicitly correlated. In this p… ▽ More

    Submitted 8 February, 2019; originally announced February 2019.

    Comments: Accepted by AAAI-19

  36. arXiv:1810.00867  [pdf, other]

    cs.LG stat.ML

    Domain-Adversarial Multi-Task Framework for Novel Therapeutic Property Prediction of Compounds

    Authors: Lingwei Xie, Song He, Shu Yang, Boyuan Feng, Kun Wan, Zhongnan Zhang, Xiaochen Bo, Yufei Ding

    Abstract: With the rapid development of high-throughput technologies, parallel acquisition of large-scale drug-informatics data provides huge opportunities to improve pharmaceutical research and development. One significant application is the purpose prediction of small molecule compounds, aiming to specify therapeutic properties of extensive purpose-unknown compounds and to repurpose novel therapeutic prop… ▽ More

    Submitted 28 September, 2018; originally announced October 2018.

    Comments: 9 pages, 6 figures

  37. arXiv:1809.05142  [pdf, other]

    cs.LG stat.ML

    A Deep Learning and Gamification Approach to Energy Conservation at Nanyang Technological University

    Authors: Ioannis C. Konstantakopoulos, Andrew R. Barkan, Shiying He, Tanya Veeravalli, Huihan Liu, Costas Spanos

    Abstract: The implementation of smart building technology in the form of smart infrastructure applications has great potential to improve sustainability and energy efficiency by leveraging humans-in-the-loop strategy. However, human preference in regard to living conditions is usually unknown and heterogeneous in its manifestation as control inputs to a building. Furthermore, the occupants of a building typ… ▽ More

    Submitted 25 September, 2018; v1 submitted 13 September, 2018; originally announced September 2018.

    Comments: 16 double pages, shorter version submitted to Applied Energy Journal

  38. arXiv:1807.11158  [pdf, other]

    cs.LG cs.CV stat.ML

    Robust Student Network Learning

    Authors: Tianyu Guo, Chang Xu, Shiyi He, Boxin Shi, Chao Xu, Dacheng Tao

    Abstract: Deep neural networks bring in impressive accuracy in various applications, but the success often relies on the heavy network architecture. Taking well-trained heavy networks as teachers, classical teacher-student learning paradigm aims to learn a student network that is lightweight yet accurate. In this way, a portable student network with significantly fewer parameters can achieve a considerable… ▽ More

    Submitted 30 July, 2018; v1 submitted 29 July, 2018; originally announced July 2018.

  39. arXiv:1805.09023  [pdf, other]

    cs.IR cs.LG stat.ML

    Addressing the Item Cold-start Problem by Attribute-driven Active Learning

    Authors: Yu Zhu, Jinhao Lin, Shibi He, Beidou Wang, Ziyu Guan, Haifeng Liu, Deng Cai

    Abstract: In recommender systems, cold-start issues are situations where no previous events, e.g. ratings, are known for certain users or items. In this paper, we focus on the item cold-start problem. Both content information (e.g. item attributes) and initial user ratings are valuable for seizing users' preferences on a new item. However, previous methods for the item cold-start problem either 1) incorpora… ▽ More

    Submitted 23 May, 2018; originally announced May 2018.

    Comments: 14 pages, 7 figures, 9 tables. Submitted to TKDE

    ACM Class: H.3.3

  40. arXiv:1805.00159  [pdf, other]

    astro-ph.CO stat.AP

    Detecting Galaxy-Filament Alignments in the Sloan Digital Sky Survey III

    Authors: Yen-Chi Chen, Shirley Ho, Jonathan Blazek, Siyu He, Rachel Mandelbaum, Peter Melchior, Sukhdeep Singh

    Abstract: Previous studies have shown the filamentary structures in the cosmic web influence the alignments of nearby galaxies. We study this effect in the LOWZ sample of the Sloan Digital Sky Survey using the "Cosmic Web Reconstruction" filament catalogue. We find that LOWZ galaxies exhibit a small but statistically significant alignment in the direction parallel to the orientation of nearby filaments. Thi… ▽ More

    Submitted 21 February, 2019; v1 submitted 30 April, 2018; originally announced May 2018.

    Comments: 14 pages, 13 figures. Accepted to the MNRAS

  41. arXiv:1709.03891  [pdf, other]

    cs.LG stat.ML

    High-Dimensional Dependency Structure Learning for Physical Processes

    Authors: Jamal Golmohammadi, Imme Ebert-Uphoff, Sijie He, Yi Deng, Arindam Banerjee

    Abstract: In this paper, we consider the use of structure learning methods for probabilistic graphical models to identify statistical dependencies in high-dimensional physical processes. Such processes are often synthetically characterized using PDEs (partial differential equations) and are observed in a variety of natural phenomena, including geoscience data capturing atmospheric and hydrological phenomena… ▽ More

    Submitted 12 September, 2017; originally announced September 2017.

    Comments: 21 pages, 8 figures, International Conference on Data Mining 2017

  42. arXiv:1708.04742  [pdf, ps, other]

    astro-ph.SR stat.AP

    Large Magellanic Cloud Near-Infrared Synoptic Survey. V. Period-Luminosity Relations of Miras

    Authors: Wenlong Yuan, Lucas M. Macri, Shiyuan He, Jianhua Z. Huang, Shashi M. Kanbur, Chow-Choong Ngeow

    Abstract: We study the near-infrared properties of 690 Mira candidates in the central region of the Large Magellanic Cloud, based on time-series observations at JHKs. We use densely-sampled I-band observations from the OGLE project to generate template light curves in the near infrared and derive robust mean magnitudes at those wavelengths. We obtain near-infrared Period-Luminosity relations for Oxygen-rich… ▽ More

    Submitted 15 August, 2017; originally announced August 2017.

    Comments: Accepted for publication in The Astronomical Journal

  43. arXiv:1708.02079  [pdf, other]

    math.OC stat.CO

    The discrete moment problem with nonconvex shape constraints

    Authors: Xi Chen, Simai He, Bo Jiang, Christopher Thomas Ryan, Teng Zhang

    Abstract: The discrete moment problem is a foundational problem in distribution-free robust optimization, where the goal is to find a worst-case distribution that satisfies a given set of moments. This paper studies the discrete moment problems with additional "shape constraints" that guarantee the worst case distribution is either log-concave or has an increasing failure rate. These classes of shape constr…

    Submitted 7 August, 2017; originally announced August 2017.

    Comments: 31 pages, 3 figures

  44. arXiv:1707.06635  [pdf]

    q-fin.GN stat.AP

    Impact of the Global Crisis on SME Internal vs. External Financing in China

    Authors: ShiXue He, Marcel Ausloos

    Abstract: Changes in the capital structure before and after the global financial crisis for SMEs are studied, emphasizing their financing problems, distinguishing between internal financing and external financing determinants. The empirical research bears upon 158 small and medium-sized firms listed on Shenzhen and Shanghai Stock Exchanges in China over the period of 2004-2014. A regression analysis, along…

    Submitted 1 July, 2017; originally announced July 2017.

    Comments: 17 pages, 6 tables, 43 references

    Journal ref: Banking and Finance Review, 9(1) (2017) 1-17

  45. arXiv:1703.01000  [pdf, ps, other]

    astro-ph.SR stat.AP

    The M33 Synoptic Stellar Survey. II. Mira Variables

    Authors: Wenlong Yuan, Shiyuan He, Lucas M. Macri, James Long, Jianhua Z. Huang

    Abstract: We present the discovery of 1847 Mira candidates in the Local Group galaxy M33 using a novel semi-parametric periodogram technique coupled with a Random Forest classifier. The algorithms were applied to ~2.4x10^5 I-band light curves previously obtained by the M33 Synoptic Stellar Survey. We derive preliminary Period-Luminosity relations at optical, near- & mid-infrared wavelengths and compare them…

    Submitted 24 March, 2017; v1 submitted 2 March, 2017; originally announced March 2017.

    Comments: Includes small corrections to match the published version

    Journal ref: The Astronomical Journal, 153:170 (2017)

  46. arXiv:1611.01606  [pdf, other]

    cs.LG stat.ML

    Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

    Authors: Frank S. He, Yang Liu, Alexander G. Schwing, Jian Peng

    Abstract: We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the chall…

    Submitted 5 November, 2016; originally announced November 2016.

  47. arXiv:1609.06680  [pdf, ps, other]

    astro-ph.SR astro-ph.IM stat.AP

    Period estimation for sparsely-sampled quasi-periodic light curves applied to Miras

    Authors: Shiyuan He, Wenlong Yuan, Jianhua Z. Huang, James Long, Lucas M. Macri

    Abstract: We develop a non-linear semi-parametric Gaussian process model to estimate periods of Miras with sparsely-sampled light curves. The model uses a sinusoidal basis for the periodic variation and a Gaussian process for the stochastic changes. We use maximum likelihood to estimate the period and the parameters of the Gaussian process, while integrating out the effects of other nuisance parameters in t…

    Submitted 17 November, 2016; v1 submitted 21 September, 2016; originally announced September 2016.

    Comments: Changes in v3: minor edits to match the published version. Software package and test data set available at http://github.com/shiyuanhe/varStar

    Journal ref: The Astronomical Journal, 152 (6), 164 (2016)

  48. arXiv:1501.00537  [pdf]

    stat.AP math.ST

    A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics

    Authors: Kun He, Yan Fu, Wen-Feng Zeng, Lan Luo, Hao Chi, Chao Liu, Lai-Yun Qing, Rui-Xiang Sun, Si-Min He

    Abstract: Motivation: Target-decoy search (TDS) is currently the most popular strategy for estimating and controlling the false discovery rate (FDR) of peptide identifications in mass spectrometry-based shotgun proteomics. While this strategy is very useful in practice and has been intensively studied empirically, its theoretical foundation has not yet been well established. Result: In this work, we systema…

    Submitted 3 January, 2015; originally announced January 2015.

    Comments: 7 pages, 2 figures

  49. arXiv:1407.8382  [pdf, ps, other]

    stat.AP q-bio.PE

    Detection boundary and Higher Criticism approach for rare and weak genetic effects

    Authors: Zheyang Wu, Yiming Sun, Shiquan He, Judy Cho, Hongyu Zhao, Jiashun Jin

    Abstract: Genome-wide association studies (GWAS) have identified many genetic factors underlying complex human traits. However, these factors have explained only a small fraction of these traits' genetic heritability. It is argued that many more genetic factors remain undiscovered. These genetic factors likely are weakly associated at the population level and sparsely distributed across the genome. In this…

    Submitted 31 July, 2014; originally announced July 2014.

    Comments: Published at http://dx.doi.org/10.1214/14-AOAS724 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS724

    Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 2, 824-851