-
Identifying and Exploiting Sparse Branch Correlations for Optimizing Branch Prediction
Authors:
Anastasios Zouzias,
Kleovoulos Kalaitzidis,
Konstantin Berestizshevsky,
Renzo Andri,
Leeor Peled,
Zhe Wang
Abstract:
Branch prediction is arguably one of the most important speculative mechanisms within a high-performance processor architecture. A common approach to improve branch prediction accuracy is to employ lengthy history records of previously seen branch directions to capture distant correlations between branches. The larger the history, the richer the information that the predictor can exploit for discovering predictive patterns. However, without appropriate filtering, such an approach may also heavily disorganize the predictor's internal mechanisms, leading to diminishing returns. This paper studies a fundamental control-flow property: the sparsity in the correlation between branches and recent history. First, we show that sparse branch correlations exist in standard applications and, more importantly, such correlations can be computed efficiently using sparse modeling methods. Second, we introduce a sparsity-aware branch prediction mechanism that can compactly encode and store sparse models to unlock essential performance opportunities. We evaluated our approach for various design parameters demonstrating MPKI improvements of up to 42% (2.3% on average) with 2KB of additional storage overhead. Our circuit-level evaluation of the design showed that it can operate within accepted branch prediction latencies, and under reasonable power and area limitations.
Submitted 28 July, 2022;
originally announced July 2022.
-
Branch Prediction as a Reinforcement Learning Problem: Why, How and Case Studies
Authors:
Anastasios Zouzias,
Kleovoulos Kalaitzidis,
Boris Grot
Abstract:
Recent years have seen stagnating improvements to branch predictor (BP) efficacy and a dearth of fresh ideas in branch predictor design, calling for fresh thinking in this area. This paper argues that looking at BP from the viewpoint of Reinforcement Learning (RL) facilitates systematic reasoning about, and exploration of, BP designs. We describe how to apply the RL formulation to branch predictors, show that existing predictors can be succinctly expressed in this formulation, and study two RL-based variants of conventional BPs.
Submitted 25 June, 2021;
originally announced June 2021.
-
Team voyTECH: User Activity Modeling with Boosting Trees
Authors:
Immanuel Bayer,
Anastasios Zouzias
Abstract:
This paper describes our winning solution for the ECML-PKDD ChAT Discovery Challenge 2020. We show that whether or not a Twitch user has subscribed to a channel can be well predicted by modeling user activity with boosting trees. We introduce the connection between target-encodings and boosting trees in the context of high cardinality categoricals and find that modeling user activity is more powerful than direct modeling of content when encoded properly and combined with a suitable optimization approach.
Submitted 6 August, 2020; v1 submitted 3 July, 2020;
originally announced July 2020.
-
A Randomized Algorithm for Approximating the Log Determinant of a Symmetric Positive Definite Matrix
Authors:
Christos Boutsidis,
Petros Drineas,
Prabhanjan Kambadur,
Eugenia-Maria Kontopoulou,
Anastasios Zouzias
Abstract:
We introduce a novel algorithm for approximating the logarithm of the determinant of a symmetric positive definite (SPD) matrix. The algorithm is randomized and approximates the traces of a small number of matrix powers of a specially constructed matrix, using the method of Avron and Toledo~\cite{AT11}. From a theoretical perspective, we present additive and relative error bounds for our algorithm. Our additive error bound works for any SPD matrix, whereas our relative error bound works for SPD matrices whose eigenvalues lie in the interval $(\theta_1,1)$, with $0<\theta_1<1$; the latter setting was proposed in~\cite{icml2015_hana15}. From an empirical perspective, we demonstrate that a C++ implementation of our algorithm can approximate the logarithm of the determinant of large matrices very accurately in a matter of seconds.
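The idea of estimating a log determinant from traces of matrix powers can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the paper's algorithm: it uses the identity $\log\det(A) = n\log α - \sum_{k\ge 1} \mathrm{tr}(C^k)/k$ with $C = I - A/α$, truncates the series after `num_terms` terms, and estimates each trace with random sign probe vectors. The function name `logdet_estimate`, the choice of $α$ (the largest absolute row sum, which upper-bounds the largest eigenvalue), and all parameter defaults are hypothetical.

```python
import math
import random


def logdet_estimate(A, num_terms=60, num_probes=30, seed=0):
    """Estimate log det(A) for an SPD matrix A given as a list of rows.

    Illustrative sketch only: alpha, num_terms and num_probes are assumptions.
    """
    rng = random.Random(seed)
    n = len(A)
    # Largest absolute row sum: an upper bound on the largest eigenvalue.
    alpha = max(sum(abs(v) for v in row) for row in A)

    def apply_C(v):
        # Computes C v = v - (A v) / alpha without forming C explicitly.
        return [v[i] - sum(A[i][j] * v[j] for j in range(n)) / alpha
                for i in range(n)]

    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # random sign probe
        v = z
        for k in range(1, num_terms + 1):
            v = apply_C(v)  # v now holds C^k z
            total += sum(zi * vi for zi, vi in zip(z, v)) / k
    # Average over probes estimates sum_k tr(C^k) / k.
    return n * math.log(alpha) - total / num_probes
```

For a diagonal input the probe estimates are exact, so `logdet_estimate([[1.0, 0, 0], [0, 2.0, 0], [0, 0, 4.0]])` differs from $\log 8$ only by the series truncation error. For general matrices the result is random and its accuracy depends on the number of probes and terms.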
Submitted 31 August, 2016; v1 submitted 1 March, 2015;
originally announced March 2015.
-
Approximate Matrix Multiplication with Application to Linear Embeddings
Authors:
Anastasios Kyrillidis,
Michail Vlachos,
Anastasios Zouzias
Abstract:
In this paper, we study the problem of approximately computing the product of two real matrices. In particular, we analyze a dimensionality-reduction-based approximation algorithm due to Sarlos [1], introducing the notion of nuclear rank as the ratio of the nuclear norm over the spectral norm. The presented bound has improved dependence with respect to the approximation error (as compared to previous approaches), whereas the subspace -- on which we project the input matrices -- has dimension proportional to the maximum of their nuclear ranks and is independent of the input dimensions. In addition, we provide an application of this result to linear low-dimensional embeddings. Namely, we show that any Euclidean point-set with bounded nuclear rank is amenable to projection onto a number of dimensions that is independent of the input dimensionality, while achieving additive error guarantees.
Submitted 29 March, 2014;
originally announced March 2014.
-
Non-uniform Feature Sampling for Decision Tree Ensembles
Authors:
Anastasios Kyrillidis,
Anastasios Zouzias
Abstract:
We study the effectiveness of non-uniform randomized feature selection in decision tree classification. We experimentally evaluate two feature selection methodologies, based on information extracted from the provided dataset: $(i)$ \emph{leverage scores-based} and $(ii)$ \emph{norm-based} feature selection. Experimental evaluation of the proposed feature selection techniques indicates that such approaches might be more effective than naive uniform feature selection, while having performance comparable to the random forest algorithm [3].
Submitted 24 March, 2014;
originally announced March 2014.
-
Hidden cliques and the certification of the restricted isometry property
Authors:
Pascal Koiran,
Anastasios Zouzias
Abstract:
Compressed sensing is a technique for finding sparse solutions to underdetermined linear systems. This technique relies on properties of the sensing matrix such as the restricted isometry property. Sensing matrices that satisfy this property with optimal parameters are mainly obtained via probabilistic arguments. Deciding whether a given matrix satisfies the restricted isometry property is a non-trivial computational problem. Indeed, we show in this paper that restricted isometry parameters cannot be approximated in polynomial time within any constant factor under the assumption that the hidden clique problem is hard. Moreover, on the positive side we propose an improvement on the brute-force enumeration algorithm for checking the restricted isometry property.
Submitted 4 November, 2012;
originally announced November 2012.
-
Efficient Dimensionality Reduction for Canonical Correlation Analysis
Authors:
Haim Avron,
Christos Boutsidis,
Sivan Toledo,
Anastasios Zouzias
Abstract:
We present a fast algorithm for approximate Canonical Correlation Analysis (CCA). Given a pair of tall-and-thin matrices, the proposed algorithm first employs a randomized dimensionality reduction transform to reduce the size of the input matrices, and then applies any CCA algorithm to the new pair of matrices. The algorithm computes an approximate CCA of the original pair of matrices with provable guarantees, while requiring asymptotically fewer operations than the state-of-the-art exact algorithms.
Submitted 1 May, 2013; v1 submitted 10 September, 2012;
originally announced September 2012.
-
Randomized Extended Kaczmarz for Solving Least-Squares
Authors:
Anastasios Zouzias,
Nikolaos Freris
Abstract:
We present a randomized iterative algorithm that exponentially converges in expectation to the minimum Euclidean norm least squares solution of a given linear system of equations. The expected number of arithmetic operations required to obtain an estimate of given accuracy is proportional to the squared condition number of the system multiplied by the number of non-zero entries of the input matrix. The proposed algorithm is an extension of the randomized Kaczmarz method that was analyzed by Strohmer and Vershynin.
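A minimal sketch of an extended Kaczmarz iteration in this spirit is given below. It is an illustration under stated assumptions rather than the paper's reference implementation: rows and columns are sampled with probability proportional to their squared norms, an auxiliary vector `z` tracks the component of `b` outside the column space of `A`, and the iterate `x` is updated by a Kaczmarz projection on the corrected right-hand side `b - z`. The function name, the fixed iteration count, and the absence of a stopping rule are choices made for this example.

```python
import random


def randomized_extended_kaczmarz(A, b, iters=5000, seed=0):
    """Approximate the least squares solution of A x = b.

    A is a list of m rows (each a list of n floats); b has length m.
    Illustrative sketch only; the iteration count is an assumption.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    row_w = [sum(v * v for v in row) for row in A]     # squared row norms
    col_w = [sum(v * v for v in col) for col in cols]  # squared column norms
    x = [0.0] * n
    z = list(b)
    for _ in range(iters):
        # Column step: remove from z its component along a random column.
        j = rng.choices(range(n), weights=col_w)[0]
        c = sum(cols[j][i] * z[i] for i in range(m)) / col_w[j]
        for i in range(m):
            z[i] -= c * cols[j][i]
        # Row step: Kaczmarz projection for the system A x = b - z.
        i = rng.choices(range(m), weights=row_w)[0]
        r = (b[i] - z[i] - sum(A[i][k] * x[k] for k in range(n))) / row_w[i]
        for k in range(n):
            x[k] += r * A[i][k]
    return x
```

On the inconsistent system with rows `[1, 0]`, `[0, 1]`, `[1, 1]` and right-hand side `[1, 2, 4]`, the normal equations give the least squares solution `[4/3, 7/3]`, which the iterate approaches.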
Submitted 5 January, 2013; v1 submitted 25 May, 2012;
originally announced May 2012.
-
Randomized Dimensionality Reduction for k-means Clustering
Authors:
Christos Boutsidis,
Anastasios Zouzias,
Michael W. Mahoney,
Petros Drineas
Abstract:
We study the topic of dimensionality reduction for $k$-means clustering. Dimensionality reduction encompasses the union of two approaches: \emph{feature selection} and \emph{feature extraction}. A feature selection based algorithm for $k$-means clustering selects a small subset of the input features and then applies $k$-means clustering on the selected features. A feature extraction based algorithm for $k$-means clustering constructs a small set of new artificial features and then applies $k$-means clustering on the constructed features. Despite the significance of $k$-means clustering as well as the wealth of heuristic methods addressing it, provably accurate feature selection methods for $k$-means clustering are not known. On the other hand, two provably accurate feature extraction methods for $k$-means clustering are known in the literature; one is based on random projections and the other is based on the singular value decomposition (SVD).
This paper makes further progress towards a better understanding of dimensionality reduction for $k$-means clustering. Namely, we present the first provably accurate feature selection method for $k$-means clustering and, in addition, we present two feature extraction methods. The first feature extraction method is based on random projections and it improves upon the existing results in terms of time complexity and number of features needed to be extracted. The second feature extraction method is based on fast approximate SVD factorizations and it also improves upon the existing results in terms of time complexity. The proposed algorithms are randomized and provide constant-factor approximation guarantees with respect to the optimal $k$-means objective value.
Submitted 4 November, 2014; v1 submitted 13 October, 2011;
originally announced October 2011.
-
On the Certification of the Restricted Isometry Property
Authors:
Pascal Koiran,
Anastasios Zouzias
Abstract:
Compressed sensing is a technique for finding sparse solutions to underdetermined linear systems. This technique relies on properties of the sensing matrix such as the restricted isometry property. Sensing matrices that satisfy the restricted isometry property with optimal parameters are mainly obtained via probabilistic arguments. Given any matrix, deciding whether it satisfies the restricted isometry property is a non-trivial computational problem. In this paper, we give reductions from dense subgraph problems to the certification of the restricted isometry property. This gives evidence that certifying the restricted isometry property is unlikely to be feasible in polynomial-time. Moreover, on the positive side we propose an improvement on the brute-force enumeration algorithm for checking the restricted isometry property.
Another contribution of independent interest is a spectral algorithm for certifying that a random graph does not contain any dense k-subgraph. This "skewed spectral algorithm" performs better than the basic spectral algorithm in a certain range of parameters.
Submitted 17 October, 2011; v1 submitted 25 March, 2011;
originally announced March 2011.
-
A Matrix Hyperbolic Cosine Algorithm and Applications
Authors:
Anastasios Zouzias
Abstract:
In this paper, we generalize Spencer's hyperbolic cosine algorithm to the matrix-valued setting. We apply the proposed algorithm to several problems by analyzing its computational efficiency under two special cases of matrices: one in which the matrices have a group structure and another in which they have rank one. As an application of the former case, we present a deterministic algorithm that, given the multiplication table of a finite group of size $n$, constructs an expanding Cayley graph of logarithmic degree in near-optimal $O(n^2 \log^3 n)$ time. For the latter case, we present a fast deterministic algorithm for spectral sparsification of positive semi-definite matrices, which implies an improved deterministic algorithm for spectral graph sparsification of dense graphs. In addition, we give an elementary connection between spectral sparsification of positive semi-definite matrices and element-wise matrix sparsification. As a consequence, we obtain improved element-wise sparsification algorithms for diagonally dominant-like matrices.
Submitted 25 February, 2012; v1 submitted 14 March, 2011;
originally announced March 2011.
-
Random Projections for $k$-means Clustering
Authors:
Christos Boutsidis,
Anastasios Zouzias,
Petros Drineas
Abstract:
This paper discusses the topic of dimensionality reduction for $k$-means clustering. We prove that any set of $n$ points in $d$ dimensions (rows in a matrix $A \in \RR^{n \times d}$) can be projected into $t = \Omega(k / \eps^2)$ dimensions, for any $\eps \in (0,1/3)$, in $O(n d \lceil \eps^{-2} k/ \log(d) \rceil )$ time, such that with constant probability the optimal $k$-partition of the point set is preserved within a factor of $2+\eps$. The projection is done by post-multiplying $A$ with a $d \times t$ random matrix $R$ having entries $+1/\sqrt{t}$ or $-1/\sqrt{t}$ with equal probability. A numerical implementation of our technique and experiments on a large dataset of face images verify the speed and the accuracy of our theoretical results.
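The projection step described in the abstract can be sketched in a few lines. This is an illustrative sketch that multiplies naively in $O(ndt)$ time; it does not attempt the faster multiplication behind the stated running time, and the function name `random_sign_projection` and the seeding are assumptions of this example.

```python
import math
import random


def random_sign_projection(A, t, seed=0):
    """Return A R, where R is d x t with entries +1/sqrt(t) or -1/sqrt(t).

    A is a list of n rows, each a list of d floats. Illustrative sketch only.
    """
    rng = random.Random(seed)
    d = len(A[0])
    s = 1.0 / math.sqrt(t)
    # Each entry of R takes either sign with equal probability.
    R = [[s if rng.random() < 0.5 else -s for _ in range(t)] for _ in range(d)]
    # Naive dense product: row i of the result is row i of A times R.
    return [[sum(row[k] * R[k][j] for k in range(d)) for j in range(t)]
            for row in A]
```

The projected points would then be passed to any $k$-means solver in place of the original rows of $A$. As a sanity check, every standard basis vector maps to a row of $R$, whose squared norm is exactly 1.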
Submitted 20 November, 2010;
originally announced November 2010.
-
A Note on Element-wise Matrix Sparsification via a Matrix-valued Bernstein Inequality
Authors:
Petros Drineas,
Anastasios Zouzias
Abstract:
Given an n x n matrix A, we present a simple, element-wise sparsification algorithm that zeroes out all sufficiently small elements of A and then retains some of the remaining elements with probabilities proportional to the square of their magnitudes. We analyze the approximation accuracy of the proposed algorithm using a recent, elegant non-commutative Bernstein inequality, and compare our bounds with all existing (to the best of our knowledge) element-wise matrix sparsification algorithms.
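The two-stage scheme in the abstract (drop small entries, then sample the rest with probability proportional to squared magnitude) might be sketched as follows. The threshold value, the number of samples `s`, sampling with replacement, and the rescaling of each sampled entry by $1/(s\,p_{ij})$ so that the output equals the thresholded matrix in expectation are all assumptions of this sketch; the abstract does not specify them.

```python
import random


def sparsify(A, threshold, s, seed=0):
    """Element-wise sparsification sketch for a matrix given as a list of rows.

    Entries with |A[i][j]| < threshold are zeroed; s samples are then drawn
    from the survivors with probability proportional to A[i][j] ** 2.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(A), len(A[0])
    kept = [(i, j, A[i][j])
            for i in range(n_rows) for j in range(n_cols)
            if abs(A[i][j]) >= threshold and A[i][j] != 0.0]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    if not kept:
        return out
    weights = [v * v for _, _, v in kept]
    total = sum(weights)
    for i, j, v in rng.choices(kept, weights=weights, k=s):
        p = v * v / total       # sampling probability of entry (i, j)
        out[i][j] += v / (s * p)  # rescale so the expectation matches
    return out
```

The output has at most `s` non-zero entries, all located where the input was at least `threshold` in magnitude.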
Submitted 13 January, 2011; v1 submitted 2 June, 2010;
originally announced June 2010.
-
Low Rank Matrix-Valued Chernoff Bounds and Approximate Matrix Multiplication
Authors:
Avner Magen,
Anastasios Zouzias
Abstract:
In this paper we develop algorithms for approximating matrix multiplication with respect to the spectral norm. Let A\in{\RR^{n\times m}} and B\in\RR^{n \times p} be two matrices and \eps>0. We approximate the product A^\top B using two down-sampled sketches, \tilde{A}\in\RR^{t\times m} and \tilde{B}\in\RR^{t\times p}, where t\ll n such that \norm{\tilde{A}^\top \tilde{B} - A^\top B} \leq \eps \norm{A}\norm{B} with high probability. We use two different sampling procedures for constructing \tilde{A} and \tilde{B}; one of them is done by i.i.d. non-uniform sampling of rows from A and B and the other is done by taking random linear combinations of their rows. We prove bounds that depend only on the intrinsic dimensionality of A and B, that is, their rank and their stable rank, the latter being the squared ratio between their Frobenius and operator norms. For achieving bounds that depend on rank we employ standard tools from high-dimensional geometry such as concentration of measure arguments combined with elaborate \eps-net constructions. For bounds that depend on the smaller parameter of stable rank this technology itself seems weak. However, we show that, in combination with a simple truncation argument, it is amenable to providing such bounds. To handle similar bounds for row sampling, we develop a novel matrix-valued Chernoff bound inequality, which we call the low rank matrix-valued Chernoff bound. Thanks to this inequality, we are able to give bounds that depend only on the stable rank of the input matrices...
Submitted 27 October, 2010; v1 submitted 16 May, 2010;
originally announced May 2010.
-
Low Dimensional Euclidean Volume Preserving Embeddings
Authors:
Anastasios Zouzias
Abstract:
Let $\mathcal{P}$ be an $n$-point subset of Euclidean space and $d\geq 3$ be an integer. In this paper we study the following question: What is the smallest (normalized) relative change of the volume of subsets of $\mathcal{P}$ when it is projected into $\RR^d$? We prove that there exists a linear mapping $f:\mathcal{P} \mapsto \RR^d$ that relatively preserves the volume of all subsets of size up to $\lfloor d/2\rfloor$ within at most a factor of $O(n^{2/d}\sqrt{\log n \log\log n})$.
Submitted 2 March, 2010;
originally announced March 2010.