Search | arXiv e-print repository

Copula-Based Estimation of Causal Effects in Multiple Linear and Path Analysis Models

Authors: Alam Ali, Ashok Kumar Pathak, Mohd Arshad, Ayyub Sheikhi

Abstract: Regression analysis is one of the most widely used statistical techniques, but it measures only the direct effect of independent variables on the dependent variable. Path analysis looks for both direct and indirect effects of independent variables and may overcome several hurdles associated with regression models. It utilizes one or more structural regression equations in the model, which are used to estimate the unknown parameters. The aim of this work is to study path analysis models when the endogenous (dependent) variable and exogenous (independent) variables are linked through elliptical copulas. Using well-organized numerical schemes, we investigate the performance of path models when direct and indirect effects are estimated by classical ordinary least squares and copula-based regression approaches in different scenarios. Finally, two real data applications are presented to demonstrate the performance of path analysis using the copula approach.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 23 pages, 3 figures, 11 tables

MSC Class: 62H05; 62J05; 62F10

arXiv:2405.11624 [pdf, other]

On Generalized Transmuted Lifetime Distribution

Authors: Alok Kumar Pandey, Alam Ali, Ashok Kumar Pathak

Abstract: This article presents a new class of generalized transmuted lifetime distributions which includes a large number of lifetime distributions as sub-families. Several important mathematical quantities such as the density function, distribution function, quantile function, moments, moment generating function, stress-strength reliability function, order statistics, Rényi and q-entropy, residual and reversed residual life functions, and cumulative information generating function are obtained. The methods of maximum likelihood, ordinary least squares, weighted least squares, Cramér-von Mises, Anderson-Darling, and right-tail Anderson-Darling are considered to estimate the model parameters in a general way. Further, well-organized Monte Carlo simulation experiments have been performed to observe the behavior of the estimators. Finally, two real data sets have been analyzed to demonstrate the effectiveness of the proposed distribution in real-life modeling.

Submitted 19 May, 2024; originally announced May 2024.

Comments: 26 pages, 8 figures

MSC Class: 60E05; 62F10; 62E15; 65C05; 33B20

arXiv:2401.12667 [pdf, ps, other]

Feature Selection via Robust Weighted Score for High Dimensional Binary Class-Imbalanced Gene Expression Data

Authors: Zardad Khan, Amjad Ali, Saeed Aldahmani

Abstract: In this paper, a robust weighted score for unbalanced data (ROWSU) is proposed for selecting the most discriminative features for high dimensional gene expression binary classification with a class-imbalance problem. The method addresses one of the most challenging problems of highly skewed class distributions in gene expression datasets that adversely affect the performance of classification algorithms. First, the training dataset is balanced by synthetically generating data points from minority class observations. Second, a minimum subset of genes is selected using a greedy search approach. Third, a novel weighted robust score, where the weights are computed by support vectors, is introduced to obtain a refined set of genes. The highest-scoring genes based on this approach are combined with the minimum subset of genes selected by the greedy search approach to form the final set of genes. The novel method ensures the selection of the most discriminative genes, even in the presence of skewed class distribution, thus improving the performance of the classifiers. The performance of the proposed ROWSU method is evaluated on $6$ gene expression datasets. Classification accuracy and sensitivity are used as performance metrics to compare the proposed ROWSU algorithm with several other state-of-the-art methods. Boxplots and stability plots are also constructed for a better understanding of the results. The results show that the proposed method outperforms the existing feature selection procedures based on classification performance from k nearest neighbours (kNN) and random forest (RF) classifiers.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 25 pages

MSC Class: 14J60

arXiv:2401.05591 [pdf]

Time Series of Magnetic Field Parameters of Merged MDI and HMI Space-Weather Active Region Patches as Potential Tool for Solar Flare Forecasting

Authors: Paul A. Kosovich, Viacheslav M. Sadykov, Alexander G. Kosovichev, Spiridon Kasapis, Irina N. Kitiashvili, Patrick M. O'Keefe, Aatiya Ali, Vincent Oria, Samuel Granovsky, Chun Jie Chong, Gelu M. Nita

Abstract: Solar flare prediction studies have recently been conducted with the use of Space-Weather MDI (Michelson Doppler Imager onboard Solar and Heliospheric Observatory) Active Region Patches (SMARP) and Space-Weather HMI (Helioseismic and Magnetic Imager onboard Solar Dynamics Observatory) Active Region Patches (SHARP), which are two currently available data products containing magnetic field characteristics of solar active regions. The present work is an effort to combine them into one data product, and perform some initial statistical analyses in order to further expand their application in space weather forecasting. The combined data are derived by filtering, rescaling, and merging the SMARP with SHARP parameters, which can then be spatially reduced to create uniform multivariate time series. The resulting combined MDI-HMI dataset currently spans the period between April 4, 1996, and December 13, 2022, and may be extended to a more recent date. This provides an opportunity to correlate and compare it with other space weather time series, such as the daily solar flare index or the statistical properties of the soft X-ray flux measured by the Geostationary Operational Environmental Satellites (GOES). Time-lagged cross-correlation indicates that a relationship may exist, where some magnetic field properties of active regions lead the flare index in time. Applying the rolling window technique makes it possible to see how this leader-follower dynamic varies with time. Preliminary results indicate that areas of high correlation generally correspond to increased flare activity during the peak solar cycle.

Submitted 22 February, 2024; v1 submitted 10 January, 2024; originally announced January 2024.
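
The time-lagged cross-correlation and rolling-window steps mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function names (`lagged_corr`, `best_lag`, `rolling_best_lag`), the use of Pearson correlation, and the synthetic sine series are all assumptions made for illustration, and both series are assumed to have equal length and uniform sampling.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def lagged_corr(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag]; lag > 0 means leader leads."""
    if lag >= 0:
        a, b = leader[:len(leader) - lag], follower[lag:]
    else:
        a, b = leader[-lag:], follower[:len(follower) + lag]
    return pearson(a, b)


def best_lag(leader, follower, max_lag):
    """Return (lag with highest correlation, dict of all lag -> correlation)."""
    scores = {lag: lagged_corr(leader, follower, lag)
              for lag in range(-max_lag, max_lag + 1)}
    return max(scores, key=scores.get), scores


def rolling_best_lag(leader, follower, window, max_lag):
    """Best lag inside each sliding window, to see how lead/lag drifts in time."""
    return [
        best_lag(leader[s:s + window], follower[s:s + window], max_lag)[0]
        for s in range(len(leader) - window + 1)
    ]


# Synthetic check: the follower is the leader delayed by three samples.
leader = [math.sin(0.3 * t) for t in range(200)]
follower = [0.0] * 3 + leader[:-3]
lag, _ = best_lag(leader, follower, max_lag=5)
print("estimated lead (samples):", lag)
```

A positive best lag indicates that the first series tends to move before the second; the rolling variant returns one such estimate per window position.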

arXiv:2303.12210 [pdf, ps, other]

A Random Projection k Nearest Neighbours Ensemble for Classification via Extended Neighbourhood Rule

Authors: Amjad Ali, Muhammad Hamraz, Dost Muhammad Khan, Wajdan Deebani, Zardad Khan

Abstract: Ensembles based on k nearest neighbours (kNN) combine a large number of base learners, each constructed on a sample taken from the given training data. Typical kNN based ensembles determine the k closest observations in the training data bounded to a test sample point by a spherical region to predict its class. In this paper, a novel random projection extended neighbourhood rule (RPExNRule) ensemble is proposed where bootstrap samples from the given training data are randomly projected into lower dimensions for additional randomness in the base models and to preserve feature information. It uses the extended neighbourhood rule (ExNRule) to fit kNN as base learners on randomly projected bootstrap samples.

Submitted 21 March, 2023; originally announced March 2023.

Comments: 23 pages, 8 diagrams, 69 references

ACM Class: F.2.2

arXiv:2211.11278 [pdf, ps, other]

Optimal Extended Neighbourhood Rule $k$ Nearest Neighbours Ensemble

Authors: Amjad Ali, Zardad Khan, Dost Muhammad Khan, Saeed Aldahmani

Abstract: The traditional k nearest neighbor (kNN) approach uses a distance formula within a spherical region to determine the k closest training observations to a test sample point. However, this approach may not work well when the test point is located outside this region. Moreover, aggregating many base kNN learners can result in poor ensemble performance due to high classification errors. To address these issues, a new optimal extended neighborhood rule based ensemble method is proposed in this paper. This rule determines neighbors in k steps starting from the closest sample point to the unseen observation and selecting subsequent nearest data points until the required number of observations is reached. Each base model is constructed on a bootstrap sample with a random subset of features, and optimal models are selected based on out-of-bag performance after building a sufficient number of models. The proposed ensemble is compared with state-of-the-art methods on 17 benchmark datasets using accuracy, Cohen's kappa, and Brier score (BS). The performance of the proposed method is also assessed by adding contrived features to the original data.

Submitted 15 February, 2024; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: This manuscript has been submitted for publication in the esteemed journal Pattern Recognition Letters

MSC Class: 14J60
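
The stepwise neighbour search described in the abstract above can be sketched as follows. This is one reading of that description, not the authors' implementation: each step is assumed to pick the unused training point closest to the previously selected point (starting from the test point), the class is assumed to be decided by majority vote, and the names `exn_neighbours` and `exn_predict` are hypothetical. The bootstrap sampling, feature subsetting and out-of-bag model selection from the paper are not shown.

```python
import math
from collections import Counter


def exn_neighbours(train_X, query, k):
    """Pick k training indices as a chain: nearest to the query, then nearest
    to the point just selected, and so on (assumed reading of the rule)."""
    remaining = list(range(len(train_X)))
    current = query
    chosen = []
    for _ in range(min(k, len(train_X))):
        nxt = min(remaining, key=lambda i: math.dist(train_X[i], current))
        chosen.append(nxt)
        remaining.remove(nxt)
        current = train_X[nxt]  # the next step searches around this point
    return chosen


def exn_predict(train_X, train_y, query, k):
    """Majority vote over the chained neighbours."""
    idx = exn_neighbours(train_X, query, k)
    return Counter(train_y[i] for i in idx).most_common(1)[0][0]


# Toy data on a line: the chain walks away from the query along the points.
train_X = [(1.0, 0.0), (2.5, 0.0), (-1.8, 0.0), (3.5, 0.0)]
train_y = ["x", "y", "x", "y"]
print(exn_neighbours(train_X, (0.0, 0.0), k=3))
print(exn_predict(train_X, train_y, (0.0, 0.0), k=3))
```

In this toy example the chained rule selects a different neighbour set than a plain 3-nearest-neighbour search around the query would, which is the behaviour the extended rule is meant to allow.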

arXiv:2202.04166 [pdf, other]

The Lifecycle of a Statistical Model: Model Failure Detection, Identification, and Refitting

Authors: Alnur Ali, Maxime Cauchois, John C. Duchi

Abstract: The statistical machine learning community has demonstrated considerable resourcefulness over the years in developing highly expressive tools for estimation, prediction, and inference. The bedrock assumptions underlying these developments are that the data comes from a fixed population and displays little heterogeneity. But reality is significantly more complex: statistical models now routinely fail when released into real-world systems and scientific applications, where such assumptions rarely hold. Consequently, we pursue a different path in this paper vis-a-vis the well-worn trail of developing new methodology for estimation and prediction. In this paper, we develop tools and theory for detecting and identifying regions of the covariate space (subpopulations) where model performance has begun to degrade, and study intervening to fix these failures through refitting. We present empirical results with three real-world data sets -- including a time series involving forecasting the incidence of COVID-19 -- showing that our methodology generates interpretable results, is useful for tracking model performance, and can boost model performance through refitting. We complement these empirical results with theory proving that our methodology is minimax optimal for recovering anomalous subpopulations as well as refitting to improve accuracy in a structured normal means setting.

Submitted 8 February, 2022; originally announced February 2022.

arXiv:2201.08315 [pdf, other]

Predictive Inference with Weak Supervision

Authors: Maxime Cauchois, Suyash Gupta, Alnur Ali, John Duchi

Abstract: The expense of acquiring labels in large-scale statistical machine learning makes partially and weakly-labeled data attractive, though it is not always apparent how to leverage such data for model fitting or validation. We present a methodology to bridge the gap between partial supervision and validation, developing a conformal prediction framework to provide valid predictive confidence sets -- sets that cover a true label with a prescribed probability, independent of the underlying distribution -- using weakly labeled data. To do so, we introduce a (necessary) new notion of coverage and predictive validity, then develop several application scenarios, providing efficient algorithms for classification and several large-scale structured prediction problems. We corroborate the hypothesis that the new coverage definition allows for tighter and more informative (but valid) confidence sets through several experiments.

Submitted 9 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

arXiv:2201.08311 [pdf, other]

Accelerated Gradient Flow: Risk, Stability, and Implicit Regularization

Authors: Yue Sheng, Alnur Ali

Abstract: Acceleration and momentum are the de facto standard in modern applications of machine learning and optimization, yet the bulk of the work on implicit regularization focuses instead on unaccelerated methods. In this paper, we study the statistical risk of the iterates generated by Nesterov's accelerated gradient method and Polyak's heavy ball method, when applied to least squares regression, drawing several connections to explicit penalization. We carry out our analyses in continuous-time, allowing us to make sharper statements than in prior work, and revealing complex interactions between early stopping, stability, and the curvature of the loss function.

Submitted 20 January, 2022; originally announced January 2022.

arXiv:2103.02559 [pdf, other]

Minimum-Distortion Embedding

Authors: Akshay Agrawal, Alnur Ali, Stephen Boyd

Abstract: We consider the vector embedding problem. We are given a finite set of items, with the goal of assigning a representative vector to each one, possibly under some constraints (such as the collection of vectors being standardized, i.e., having zero mean and unit covariance). We are given data indicating that some pairs of items are similar, and optionally, some other pairs are dissimilar. For pairs of similar items, we want the corresponding vectors to be near each other, and for dissimilar pairs, we want the corresponding vectors to not be near each other, measured in Euclidean distance. We formalize this by introducing distortion functions, defined for some pairs of the items. Our goal is to choose an embedding that minimizes the total distortion, subject to the constraints. We call this the minimum-distortion embedding (MDE) problem. The MDE framework is simple but general. It includes a wide variety of embedding methods, such as spectral embedding, principal component analysis, multidimensional scaling, dimensionality reduction methods (like Isomap and UMAP), force-directed layout, and others. It also includes new embeddings, and provides principled ways of validating historical and new embeddings alike. We develop a projected quasi-Newton method that approximately solves MDE problems and scales to large data sets. We implement this method in PyMDE, an open-source Python package. In PyMDE, users can select from a library of distortion functions and constraints or specify custom ones, making it easy to rapidly experiment with different embeddings. Our software scales to data sets with millions of items and tens of millions of distortion functions. To demonstrate our method, we compute embeddings for several real-world data sets, including images, an academic co-author network, US county demographic data, and single-cell mRNA transcriptomes.

Submitted 24 August, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

arXiv:2011.03668 [pdf, other]

Confidence bands for a log-concave density

Authors: Guenther Walther, Alnur Ali, Xinyue Shen, Stephen Boyd

Abstract: We present a new approach for inference about a log-concave distribution: Instead of using the method of maximum likelihood, we propose to incorporate the log-concavity constraint in an appropriate nonparametric confidence set for the cdf $F$. This approach has the advantage that it automatically provides a measure of statistical uncertainty and it thus overcomes a marked limitation of the maximum likelihood estimate. In particular, we show how to construct confidence bands for the density that have a finite sample guaranteed confidence level. The nonparametric confidence set for $F$ which we introduce here has attractive computational and statistical properties: It makes it possible to bring modern tools from optimization to bear on this problem via difference of convex programming, and it results in optimal statistical inference. We show that the width of the resulting confidence bands converges at nearly the parametric $n^{-\frac{1}{2}}$ rate when the log density is $k$-affine.

Submitted 6 May, 2022; v1 submitted 6 November, 2020; originally announced November 2020.

Comments: Added a discussion section, minor changes

arXiv:2008.04267 [pdf, other]

doi 10.1080/01621459.2023.2298037

Robust Validation: Confident Predictions Even When Distributions Shift

Authors: Maxime Cauchois, Suyash Gupta, Alnur Ali, John C. Duchi

Abstract: While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy -- coming from robust statistics and optimization -- is thus to build a model robust to distributional perturbations. In this paper, we take a different approach to describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions. We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population. The method, based on conformal inference, achieves (nearly) valid coverage in finite samples, under only the condition that the training data be exchangeable. An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it; we develop estimators and prove their consistency for protection and validity of uncertainty estimates under shifts. By experimenting on several large-scale benchmark datasets, including Recht et al.'s CIFAR-v4 and ImageNet-V2 datasets, we provide complementary empirical results that highlight the importance of robust predictive validity.

Submitted 4 July, 2024; v1 submitted 10 August, 2020; originally announced August 2020.

Comments: Published in the Journal of the American Statistical Association (JASA 2024)
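
For orientation, the conformal inference that the abstract above builds on can be sketched in its standard split-conformal form for regression. This sketch covers only the usual exchangeable setting; it does not reproduce the paper's $f$-divergence-robust procedure or its shift estimators. The function names and the use of absolute residuals as nonconformity scores are assumptions made for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample quantile used by split conformal prediction: the
    ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[rank - 1]


def prediction_interval(point_prediction, qhat):
    """Symmetric interval around a point prediction for absolute-residual scores."""
    return (point_prediction - qhat, point_prediction + qhat)


# Calibration scores: |y_i - model(x_i)| on held-out data (made-up values).
calibration_scores = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.4]
qhat = conformal_quantile(calibration_scores, alpha=0.25)
print(prediction_interval(2.0, qhat))
```

Under exchangeability of calibration and test points, an interval built this way covers the true response with probability at least 1 - alpha; that guarantee is what no longer holds automatically when the test distribution shifts.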

arXiv:2006.07187 [pdf, other]

doi 10.3390/info11060318

HMIC: Hierarchical Medical Image Classification, A Deep Learning Approach

Authors: Kamran Kowsari, Rasoul Sali, Lubaina Ehsan, William Adorno, Asad Ali, Sean Moore, Beatrice Amadi, Paul Kelly, Sana Syed, Donald Brown

Abstract: Image classification is central to the big data revolution in medicine. Improved information processing methods for diagnosis and classification of digital medical images have been shown to be successful via deep learning approaches. As this field is explored, there are limitations to the performance of traditional supervised classifiers. This paper outlines an approach that is different from the current medical image classification tasks that view the issue as multi-class classification. We performed a hierarchical classification using our Hierarchical Medical Image classification (HMIC) approach. HMIC uses stacks of deep learning models to give particular comprehension at each level of the clinical picture hierarchy. For testing our performance, we use small bowel biopsy images that contain three categories in the parent level (Celiac Disease, Environmental Enteropathy, and histologically normal controls). For the child level, Celiac Disease Severity is classified into 4 classes (I, IIIa, IIIb, and IIIC).

Submitted 23 June, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

Journal ref: Information 11, no. 6 (2020): 318

arXiv:2005.03868 [pdf, other]

Hierarchical Deep Convolutional Neural Networks for Multi-category Diagnosis of Gastrointestinal Disorders on Histopathological Images

Authors: Rasoul Sali, Sodiq Adewole, Lubaina Ehsan, Lee A. Denson, Paul Kelly, Beatrice C. Amadi, Lori Holtz, Syed Asad Ali, Sean R. Moore, Sana Syed, Donald E. Brown

Abstract: Deep convolutional neural networks (CNNs) have been successful for a wide range of computer vision tasks, including image classification. A specific area of application lies in digital pathology for pattern recognition in the tissue-based diagnosis of gastrointestinal (GI) diseases. This domain can utilize CNNs to translate histopathological images into precise diagnostics. This is challenging since these complex biopsies are heterogeneous and require multiple levels of assessment. This is mainly due to structural similarities in different parts of the GI tract and shared features among different gut diseases. Addressing this problem with a flat model that assumes all classes (parts of the gut and their diseases) are equally difficult to distinguish leads to an inadequate assessment of each class. Since the hierarchical model restricts classification error to each sub-class, it leads to a more informative model than a flat model. In this paper, we propose to apply the hierarchical classification of biopsy images from different parts of the GI tract and the receptive diseases within each. We embedded a class hierarchy into the plain VGGNet to take advantage of its layers' hierarchical structure. The proposed model was evaluated using an independent set of image patches from 373 whole slide images. The results indicate that the hierarchical model can achieve better results than the flat model for multi-category diagnosis of GI disorders using histopathological images.

Submitted 6 August, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

Comments: accepted at IEEE International Conference on Healthcare Informatics (ICHI 2020)

arXiv:2003.09018 [pdf, other]

Human Activity Recognition from Wearable Sensor Data Using Self-Attention

Authors: Saif Mahmud, M Tanjid Hasan Tonmoy, Kishor Kumar Bhaumik, A K M Mahbubur Rahman, M Ashraful Amin, Mohammad Shoyaib, Muhammad Asif Hossain Khan, Amin Ahsan Ali

Abstract: Human Activity Recognition from body-worn sensor data poses an inherent challenge in capturing spatial and temporal dependencies of time-series signals. In this regard, the existing recurrent or convolutional or their hybrid models for activity recognition struggle to capture spatio-temporal context from the feature space of sensor reading sequence. To address this complex problem, we propose a self-attention based neural network model that forgoes recurrent architectures and utilizes different types of attention mechanisms to generate higher dimensional feature representation used for classification. We performed extensive experiments on four popular publicly available HAR datasets: PAMAP2, Opportunity, Skoda and USC-HAD. Our model achieves significant performance improvement over recent state-of-the-art models in both benchmark test subjects and Leave-one-subject-out evaluation. We also observe that the sensor attention maps produced by our model are able to capture the importance of the modality and placement of the sensors in predicting the different activity classes.

Submitted 17 March, 2020; originally announced March 2020.

Comments: Accepted for publication at the 24th European Conference on Artificial Intelligence (ECAI-2020); 8 pages, 4 figures

arXiv:2003.07802 [pdf, other]

The Implicit Regularization of Stochastic Gradient Flow for Least Squares

Authors: Alnur Ali, Edgar Dobriban, Ryan J. Tibshirani

Abstract: We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the entire optimization path (not just at convergence).

Submitted 19 June, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

Comments: ICML 2020
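
To give some intuition for the pairing of time $t$ with ridge parameter $\lambda = 1/t$ in the abstract above, the sketch below compares spectral shrinkage factors in the simpler noiseless (full-batch) gradient flow setting. This is a simplification and not the stochastic analysis of the paper. It assumes gradient flow on $\frac{1}{2n}\|y - X\beta\|^2$ started at $\beta = 0$, ridge written as $(X^\top X/n + \lambda I)^{-1} X^\top y / n$, and an eigenvalue $s > 0$ of $X^\top X / n$; the function names are hypothetical.

```python
import math


def gradient_flow_filter(s, t):
    """Shrinkage applied by gradient flow at time t along an eigen-direction
    with eigenvalue s > 0: (1 - exp(-t * s)) / s."""
    return -math.expm1(-t * s) / s


def ridge_filter(s, lam):
    """Shrinkage applied by ridge regression along the same direction."""
    return 1.0 / (s + lam)


def filter_ratio(s, t):
    """Gradient flow filter at time t divided by the ridge filter at lam = 1/t."""
    return gradient_flow_filter(s, t) / ridge_filter(s, 1.0 / t)


# The ratio depends only on the product t * s and stays close to 1.
for ts in [0.01, 0.1, 1.0, 1.8, 10.0, 100.0]:
    print(f"t*s = {ts:>6}: ratio = {filter_ratio(ts, 1.0):.4f}")
```

Under these assumptions the ratio equals $(1 - e^{-x})(1 + 1/x)$ with $x = ts$, which is at least 1 and peaks at roughly 1.3, so the two estimators shrink each eigen-direction by comparable amounts along the whole path.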

arXiv:2001.09249 [pdf, other]

TiFL: A Tier-based Federated Learning System

Authors: Zheng Chai, Ahsan Ali, Syed Zawad, Stacey Truex, Ali Anwar, Nathalie Baracaldo, Yi Zhou, Heiko Ludwig, Feng Yan, Yue Cheng

Abstract: Federated Learning (FL) enables learning a shared model across many clients without violating the privacy requirements. One of the key attributes in FL is the heterogeneity that exists in both resource and data due to the differences in computation and communication capacity, as well as the quantity and content of data among different clients. We conduct a case study to show that heterogeneity in resource and data has a significant impact on training time and model accuracy in conventional FL systems. To this end, we propose TiFL, a Tier-based Federated Learning System, which divides clients into tiers based on their training performance and selects clients from the same tier in each training round to mitigate the straggler problem caused by heterogeneity in resource and data quantity. To further tame the heterogeneity caused by non-IID (Independent and Identical Distribution) data and resources, TiFL employs an adaptive tier selection approach to update the tiering on-the-fly based on the observed training performance and accuracy over time. We prototype TiFL in an FL testbed following Google's FL architecture and evaluate it using popular benchmarks and the state-of-the-art FL benchmark LEAF. Experimental evaluation shows that TiFL outperforms the conventional FL in various heterogeneous conditions. With the proposed adaptive tier selection policy, we demonstrate that TiFL achieves much faster training performance while keeping the same (and in some cases - better) test accuracy across the board.

Submitted 24 January, 2020; originally announced January 2020.

arXiv:2001.09001 [pdf, other]

doi 10.1109/ICRA40945.2020.9196846

MagNet: Discovering Multi-agent Interaction Dynamics using Neural Network

Authors: Priyabrata Saha, Arslan Ali, Burhan A. Mudassar, Yun Long, Saibal Mukhopadhyay

Abstract: We present MagNet, a neural network-based multi-agent interaction model to discover the governing dynamics and predict the evolution of a complex multi-agent system from observations. We formulate a multi-agent system as a coupled non-linear network with a generic ordinary differential equation (ODE) based state evolution, and develop a neural network-based realization of its time-discretized model. MagNet is trained to discover the core dynamics of a multi-agent system from observations, and tuned on-line to learn agent-specific parameters of the dynamics to ensure accurate prediction even when physical or relational attributes of agents, or the number of agents, change. We evaluate MagNet on a point-mass system in two-dimensional space, Kuramoto phase synchronization dynamics, and predator-swarm interaction dynamics, demonstrating orders of magnitude improvement in prediction accuracy over traditional deep learning models.

Submitted 3 March, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

Comments: Accepted manuscript by ICRA 2020

Journal ref: ICRA 2020, pp. 8158-8164

arXiv:1909.04525 [pdf, other]

Skin cancer detection based on deep learning and entropy to detect outlier samples

Authors: Andre G. C. Pacheco, Abder-Rahman Ali, Thomas Trappenberg

Abstract: We describe our methods that achieved the 3rd and 4th places in tasks 1 and 2, respectively, at ISIC challenge 2019. The goal of this challenge is to provide a diagnosis of skin cancer using images and meta-data. There are nine classes in the dataset; nonetheless, one of them is an outlier and is not present in it. To tackle the challenge, we apply an ensemble of classifiers, which has 13 convolutional neural networks (CNN), we develop two approaches to handle the outlier class, and we propose a straightforward method to use the meta-data along with the images. Throughout this report, we detail each methodology and its parameters to make it easy to replicate our work. The results obtained are in accordance with the previous challenges, and the approaches to detect the outlier class and to address the meta-data seem to work properly.

Submitted 5 January, 2020; v1 submitted 10 September, 2019; originally announced September 2019.

Comments: 3rd and 4th places in tasks 1 and 2 respectively, at ISIC challenge 2019 @ MICCAI workshop 2019

arXiv:1904.05773 [pdf, other]

Diagnosis of Celiac Disease and Environmental Enteropathy on Biopsy Images Using Color Balancing on Convolutional Neural Networks

Authors: Kamran Kowsari, Rasoul Sali, Marium N. Khan, William Adorno, S. Asad Ali, Sean R. Moore, Beatrice C. Amadi, Paul Kelly, Sana Syed, Donald E. Brown

Abstract: Celiac Disease (CD) and Environmental Enteropathy (EE) are common causes of malnutrition and adversely impact normal childhood development. CD is an autoimmune disorder that is prevalent worldwide and is caused by an increased sensitivity to gluten. Gluten exposure damages the small intestinal epithelial barrier, resulting in nutrient mal-absorption and childhood under-nutrition. EE also results in barrier dysfunction but is thought to be caused by an increased vulnerability to infections. EE has been implicated as the predominant cause of under-nutrition, oral vaccine failure, and impaired cognitive development in low- and middle-income countries. Both conditions require a tissue biopsy for diagnosis, and a major challenge of interpreting clinical biopsy images to differentiate between these gastrointestinal diseases is the striking histopathologic overlap between them. In the current study, we propose a convolutional neural network (CNN) to classify duodenal biopsy images from subjects with CD, EE, and healthy controls. We evaluated the performance of our proposed model using a large cohort containing 1000 biopsy images. Our evaluations show that the proposed model achieves an area under ROC of 0.99, 1.00, and 0.97 for CD, EE, and healthy controls, respectively. These results demonstrate the discriminative power of the proposed model in duodenal biopsies classification.

Submitted 9 October, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

arXiv:1902.07855 [pdf, other]

Stacking with Neural network for Cryptocurrency investment

Authors: Avinash Barnwal, Hari Pad Bharti, Aasim Ali, Vishal Singh

Abstract: Predicting the direction of assets has been an active area of study and a difficult task. Machine learning models have been used to build robust models for this task. Ensemble methods are among them, showing results better than a single supervised method. In this paper, we have used generative and discriminative classifiers to create the stack, particularly 3 generative and 6 discriminative classifiers, optimized over a one-layer neural network, to model the direction of cryptocurrency prices. The features used are technical indicators, including but not limited to trend, momentum, volume, and volatility indicators; sentiment analysis has also been used to gain useful insight in combination with these features. For cross-validation, purged walk-forward cross-validation has been used. In terms of accuracy, we have done a comparative analysis of the performance of the ensemble method with stacking and the ensemble method with blending. We have also developed a methodology for combined feature importance for the stacked model. Important indicators are also identified based on feature importance.

Submitted 22 February, 2019; v1 submitted 20 February, 2019; originally announced February 2019.

Comments: 20 pages, 7 figures

arXiv:1810.10082 [pdf, other]

A Continuous-Time View of Early Stopping for Least Squares

Authors: Alnur Ali, J. Zico Kolter, Ryan J. Tibshirani

Abstract: We study the statistical properties of the iterates generated by gradient descent, applied to the fundamental problem of least squares regression. We take a continuous-time view, i.e., consider infinitesimal step sizes in gradient descent, in which case the iterates form a trajectory called gradient flow. Our primary focus is to compare the risk of gradient flow to that of ridge regression. Under the calibration $t=1/λ$, where $t$ is the time parameter in gradient flow and $λ$ the tuning parameter in ridge regression, we prove that the risk of gradient flow is no more than 1.69 times that of ridge, along the entire path (for all $t \geq 0$). This holds in finite samples with very weak assumptions on the data model (in particular, with no assumptions on the features $X$). We prove that the same relative risk bound holds for prediction risk, in an average sense over the underlying signal $β_0$. Finally, we examine limiting risk expressions (under standard Marchenko-Pastur asymptotics), and give supporting numerical experiments.

Submitted 23 February, 2019; v1 submitted 23 October, 2018; originally announced October 2018.

arXiv:1810.05041 [pdf, other]

doi 10.3390/e21080741

A General Framework for Fair Regression

Authors: Jack Fitzsimons, AbdulRahman Al Ali, Michael Osborne, Stephen Roberts

Abstract: Fairness, through its many forms and definitions, has become an important issue facing the machine learning community. In this work, we consider how to incorporate group fairness constraints in kernel regression methods, applicable to Gaussian processes, support vector machines, neural network regression and decision tree regression. Further, we focus on examining the effect of incorporating these constraints in decision tree regression, with direct applications to random forests and boosted trees amongst other widespread popular inference techniques. We show that the order of complexity of memory and computation is preserved for such models and tightly bound the expected perturbations to the model in terms of the number of leaves of the trees. Importantly, the approach works on trained models and hence can be easily applied to models in current use and group labels are only required on training data.

Submitted 2 February, 2019; v1 submitted 10 October, 2018; originally announced October 2018.

Comments: 8 pages, 4 figures, 2 pages references

arXiv:1710.10769 [pdf, other]

Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation

Authors: Penporn Koanantakool, Alnur Ali, Ariful Azad, Aydin Buluc, Dmitriy Morozov, Leonid Oliker, Katherine Yelick, Sang-Yun Oh

Abstract: Across a variety of scientific disciplines, sparse inverse covariance estimation is a popular tool for capturing the underlying dependency relationships in multivariate data. Unfortunately, most estimators are not scalable enough to handle the sizes of modern high-dimensional data sets (often on the order of terabytes), and assume Gaussian samples. To address these deficiencies, we introduce HP-CONCORD, a highly scalable optimization method for estimating a sparse inverse covariance matrix based on a regularized pseudolikelihood framework, without assuming Gaussianity. Our parallel proximal gradient method uses a novel communication-avoiding linear algebra algorithm and runs across a multi-node cluster with up to 1k nodes (24k cores), achieving parallel scalability on problems with up to ~819 billion parameters (1.28 million dimensions); even on a single node, HP-CONCORD demonstrates scalability, outperforming a state-of-the-art method. We also use HP-CONCORD to estimate the underlying dependency structure of the brain from fMRI data, and use the result to identify functional regions automatically. The results show good agreement with a clustering from the neuroscience literature.

Submitted 8 April, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

Comments: Main paper: 15 pages, appendix: 24 pages

Journal ref: Artificial Intelligence and Statistics vol. 84 1376-1386 (2018)

arXiv:1607.00515 [pdf, other]

The Multiple Quantile Graphical Model

Authors: Alnur Ali, J. Zico Kolter, Ryan J. Tibshirani

Abstract: We introduce the Multiple Quantile Graphical Model (MQGM), which extends the neighborhood selection approach of Meinshausen and Buhlmann for learning sparse graphical models. The latter is defined by the basic subproblem of modeling the conditional mean of one variable as a sparse function of all others. Our approach models a set of conditional quantiles of one variable as a sparse function of all others, and hence offers a much richer, more expressive class of conditional distribution estimates. We establish that, under suitable regularity conditions, the MQGM identifies the exact conditional independencies with probability tending to one as the problem size grows, even outside of the usual homoskedastic Gaussian data model. We develop an efficient algorithm for fitting the MQGM using the alternating direction method of multipliers. We also describe a strategy for sampling from the joint distribution that underlies the MQGM estimate. Lastly, we present detailed experiments that demonstrate the flexibility and effectiveness of the MQGM in modeling heteroskedastic non-Gaussian data.

Submitted 27 October, 2016; v1 submitted 2 July, 2016; originally announced July 2016.

arXiv:1606.00033 [pdf, other]

Generalized Pseudolikelihood Methods for Inverse Covariance Estimation

Authors: Alnur Ali, Kshitij Khare, Sang-Yun Oh, Bala Rajaratnam

Abstract: We introduce PseudoNet, a new pseudolikelihood-based estimator of the inverse covariance matrix, that has a number of useful statistical and computational properties. We show, through detailed experiments with synthetic and also real-world finance as well as wind power data, that PseudoNet outperforms related methods in terms of estimation error and support recovery, making it well-suited for use in a downstream application, where obtaining low estimation error can be important. We also show, under regularity conditions, that PseudoNet is consistent. Our proof assumes the existence of accurate estimates of the diagonal entries of the underlying inverse covariance matrix; we additionally provide a two-step method to obtain these estimates, even in a high-dimensional setting, going beyond the proofs for related methods. Unlike other pseudolikelihood-based methods, we also show that PseudoNet does not saturate, i.e., in high dimensions, there is no hard limit on the number of nonzero entries in the PseudoNet estimate. We present a fast algorithm as well as screening rules that make computing the PseudoNet estimate over a range of tuning parameters tractable.

Submitted 14 October, 2016; v1 submitted 31 May, 2016; originally announced June 2016.

arXiv:1506.09060 [pdf, other]

Nonlinear Distortion Reduction in OFDM from Reliable Perturbations in Data Carriers

Authors: Ebrahim B. Al-Safadi, Tareq Y. Al-Naffouri, Mudassir Masood, Anum Ali

Abstract: A novel method for correcting the effect of nonlinear distortion in orthogonal frequency division multiplexing signals is proposed. The method depends on adaptively selecting the distortion over a subset of the data carriers, and then using tools from compressed sensing and sparse Bayesian recovery to estimate the distortion over the other carriers. Central to this method is the fact that carriers (or tones) are decoded with different levels of confidence, depending on a coupled function of the magnitude and phase of the distortion over each carrier, in addition to the respective channel strength. Moreover, as no pilots are required by this method, a significant improvement in terms of achievable rate can be achieved relative to previous work.

Submitted 30 June, 2015; originally announced June 2015.

Comments: 27 pages, 11 Figures

arXiv:1412.6137 [pdf, ps, other]

Narrowband Interference Mitigation in SC-FDMA Using Bayesian Sparse Recovery

Authors: Anum Ali, Mudassir Masood, Muhammad S. Sohail, Samir Al-Ghadhban, Tareq Y. Al-Naffouri

Abstract: This paper presents a novel narrowband interference (NBI) mitigation scheme for SC-FDMA systems. The proposed NBI cancellation scheme exploits the frequency domain sparsity of the unknown signal and adopts a low complexity Bayesian sparse recovery procedure. At the transmitter, a few randomly chosen sub-carriers are kept data free to sense the NBI signal at the receiver. Further, it is noted that in practice, the sparsity of the NBI signal is destroyed by a grid mismatch between NBI sources and the system under consideration. Towards this end, first an accurate grid mismatch model is presented that is capable of assuming independent offsets for multiple NBI sources. Secondly, prior to NBI reconstruction, the sparsity of the unknown signal is restored by employing a sparsifying transform. To improve the spectral efficiency of the proposed scheme, a data-aided NBI recovery procedure is outlined that relies on adaptively selecting a subset of data carriers and uses them as additional measurements to enhance the NBI estimation. Finally, the proposed scheme is extended to single-input multi-output systems by performing a collaborative NBI support search over all antennas. Numerical results are presented that depict the suitability of the proposed scheme for NBI mitigation.

Submitted 8 October, 2014; originally announced December 2014.

arXiv:1410.2457 [pdf, other]

Receiver-based Recovery of Clipped OFDM Signals for PAPR Reduction: A Bayesian Approach

Authors: Anum Ali, Abdullatif Al-Rabah, Mudassir Masood, Tareq Y. Al-Naffouri

Abstract: Clipping is one of the simplest peak-to-average power ratio (PAPR) reduction schemes for orthogonal frequency division multiplexing (OFDM). Deliberately clipping the transmission signal degrades system performance, and clipping mitigation is required at the receiver for information restoration. In this work, we acknowledge the sparse nature of the clipping signal and propose a low-complexity Bayesian clipping estimation scheme. The proposed scheme utilizes a priori information about the sparsity rate and noise variance for enhanced recovery. At the same time, the proposed scheme is robust against inaccurate estimates of the clipping signal statistics. The undistorted phase property of the clipped signal, as well as the clipping likelihood, is utilized for enhanced reconstruction. Further, motivated by the nature of modern OFDM-based communication systems, we extend our clipping reconstruction approach to multiple antenna receivers, and multi-user OFDM. We also address the problem of channel estimation from pilots contaminated by the clipping distortion. Numerical findings are presented, that depict favourable results for the proposed scheme compared to the established sparse reconstruction schemes.

Submitted 21 October, 2014; v1 submitted 8 October, 2014; originally announced October 2014.

arXiv:1301.0550 [pdf]

Markov Equivalence Classes for Maximal Ancestral Graphs

Authors: Ayesha R. Ali, Thomas S. Richardson

Abstract: Ancestral graphs are a class of graphs that encode conditional independence relations arising in DAG models with latent and selection variables, corresponding to marginalization and conditioning. However, for any ancestral graph, there may be several other graphs to which it is Markov equivalent. We introduce a simple representation of a Markov equivalence class of ancestral graphs, thereby facilitating model search. More specifically, we define a join operation on ancestral graphs which will associate a unique graph with a Markov equivalence class. We also extend the separation criterion for ancestral graphs (which is an extension of d-separation) and provide a proof of the pairwise Markov property for joined ancestral graphs.

Submitted 12 December, 2012; originally announced January 2013.

Comments: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002)

Report number: UAI-P-2002-PG-1-9

arXiv:1207.1365 [pdf]

Towards Characterizing Markov Equivalence Classes for Directed Acyclic Graphs with Latent Variables

Authors: Ayesha R. Ali, Thomas S. Richardson, Peter L. Spirtes, Jiji Zhang

Abstract: It is well known that there may be many causal explanations that are consistent with a given set of data. Recent work has been done to represent the common aspects of these explanations into one representation. In this paper, we address what is less well known: how do the relationships common to every causal explanation among the observed variables of some DAG process change in the presence of latent variables? Ancestral graphs provide a class of graphs that can encode conditional independence relations that arise in DAG models with latent and selection variables. In this paper we present a set of orientation rules that construct the Markov equivalence class representative for ancestral graphs, given a member of the equivalence class. These rules are sound and complete. We also show that when the equivalence class includes a DAG, the equivalence class representative is the essential graph for the said DAG.

Submitted 4 July, 2012; originally announced July 2012.

Comments: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)

Report number: UAI-P-2005-PG-10-17

arXiv:0905.1540 [pdf, ps, other]

Supplementary material for Markov equivalence for ancestral graphs

Authors: R. A. Ali, T. Richardson, P. Spirtes

Abstract: We prove that the criterion for Markov equivalence provided by Zhao et al. (2005) may involve a set of features of a graph that is exponential in the number of vertices.

Submitted 11 May, 2009; originally announced May 2009.

Comments: 2 pages, 1 figure, supplement to paper to appear in the Annals of Statistics

Showing 1–32 of 32 results for author: Ali, A