Uniform $\mathcal{C}^k$ Approximation of $G$-Invariant and Antisymmetric Functions, Embedding Dimensions, and Polynomial Representations
Abstract
For any subgroup $G$ of the symmetric group $S_n$ on $n$ symbols, we present results for the uniform $\mathcal{C}^k$ approximation of $G$-invariant functions by $G$-invariant polynomials. For the case of totally symmetric functions ($G = S_n$), we show that this gives rise to the sum-decomposition Deep Sets ansatz of Zaheer et al. (2018), where both the inner and outer functions can be chosen to be smooth and, moreover, the inner function can be chosen to be independent of the target function being approximated. In particular, we show that the embedding dimension required is independent of the regularity of the target function, the accuracy of the desired approximation, and $k$. Next, we show that a similar procedure yields a uniform $\mathcal{C}^k$ approximation of antisymmetric functions as a sum of $K$ terms, where each term is a product of a smooth totally symmetric function and a smooth antisymmetric homogeneous polynomial of degree at most $\binom{n}{2}$. We also provide upper and lower bounds on $K$ and show that $K$ is independent of the regularity of the target function, the desired approximation accuracy, and $k$.
Keywords: Universal approximation, Embedding dimension, Polynomial representations, Symmetric and antisymmetric functions.
1 Introduction
Deep learning is now widely used with great success in many applications, which have been organized into five broad categories: classification, localization, detection, segmentation, and registration (Alzubaidi et al., 2021). These applications include image processing (Krizhevsky et al., 2012; Tian et al., 2020), audio and speech recognition (Deng et al., 2013; Tjandra et al., 2017; Khurana et al., 2021), natural language processing (Deng and Liu, 2018; Otter et al., 2021; Lauriola et al., 2022), autonomous vehicles (Wan et al., 2021; Zhu et al., 2022), and many others (Goodfellow et al., 2014; Silver et al., 2017; Soffer et al., 2019; Adeel et al., 2020; Muhammad et al., 2021; Zeleznik et al., 2021). The common approach in most of these designs is to build large neural networks (NNs) with many layers; unfortunately, this often leads to one of the biggest computational challenges in deep learning, known as the curse of dimensionality. In large designs, the dimension of the data parameters increases, which causes an exponential increase in the number of data samples that the model needs in order to learn the dataset properly. As a consequence, the computational complexity also increases exponentially.
How, then, does deep learning tackle the curse of dimensionality? Many developing theories and methods mitigate its effects, but overcoming the curse remains an open problem; it is also commonly assumed that the problem cannot be eliminated entirely because of the nature of neural networks, which are ubiquitous in deep learning. Current theories that suggest a substantial moderation of the curse include automated feature extraction (Laird and Saul, 1994) and the manifold hypothesis (Cayton, 2004). Regularization methods such as dropout, batch normalization, and weight decay (Garbin et al., 2020; Loshchilov and Hutter, 2019; Xie et al., 2022), which serve to avert overfitting, can indirectly alleviate the curse by preventing the model from learning noise in the data. Another approach is to exploit structure in the model function, such as locality (Espi et al., 2015; Zhang and Zhang, 2021) and/or symmetries (Zaheer et al., 2018; Qi et al., 2017a, b). The latter is the main focus of our work; we propose theories that are relevant to deep learning problems with symmetries, specifically those involving invariance and antisymmetry. In fact, it has recently been conjectured that, under certain circumstances, permutation invariant NNs have no barrier in the linear interpolation of stochastic gradient descent solutions (Entezari et al., 2022).
In this paper, we study approximations and polynomial representations of $G$-invariant functions (for some [Lie] group $G$) and antisymmetric functions; note that $G$-invariant functions are symmetric (or permutation invariant) when $G$ is the symmetric group on the domain of the function. Intuitively, these symmetries allow neural networks to learn from the inputs regardless of transformations by the action of $G$ for $G$-invariance, or of transformations up to a sign for antisymmetry. Our theories offer two main advantages. First, the approximations give high accuracy, and they are useful for applications that require higher-order derivatives, e.g. solving many-electron Schrödinger equations (Han et al., 2019; Pfau et al., 2020; Choo et al., 2020). Second, polynomial representations are prevalent in STEM, and our results avoid the additional cost of obtaining polynomial representations when the learning model is trained only for some (continuous) functions. Our work is a specific contribution to the long-standing result known as the Universal Approximation Theorem, which informally states that any function can be approximated using neural networks. The result was originally proven by Cybenko (1989) for sigmoidal activation functions; Hornik (1991) later proved it for any nonlinear activation function, with a proof similar to that of Barron's Theorem on approximations of nonlinear, continuous functions. Because of this theorem, it is always sufficient to represent $G$-invariant and antisymmetric functions approximately rather than exactly.
Furthermore, we specifically study approximations of $G$-invariant, symmetric, and antisymmetric functions because of their ubiquity in science and technology. Neural networks that are inherently invariant under a group action include -dimensional Convolutional Neural Networks (CNN) under translation, Spherical CNN under rotation, intrinsic CNN under the isometry group of the domain, and so on (Bronstein et al., 2021). These architectures are currently used in signal identification, object detection, image classification and segmentation, face recognition, etc. (Li et al., 2022). Similarly, architectures with permutation invariance include Graph Neural Networks (GNN), Deep Sets (Zaheer et al., 2018), and Transformers. Applications of GNNs include traffic forecasting, molecular optimization, rumor detection, node and graph classification, and many others (Asif et al., 2021); applications of Deep Sets include point cloud prediction and bounding boxes (Soelch et al., 2019); and applications of Transformers include answer selection (Shao et al., 2019) and stock volatility (Ramos-Pérez et al., 2021). Lastly, neural networks can also learn solutions of systems that exhibit antisymmetry; in the physical sciences, there is considerable interest in approximating solutions of many-fermion or many-boson systems (Choo et al., 2020; Han et al., 2019; Hermann et al., 2020; Klus et al., 2021; Luo and Clark, 2019; Pfau et al., 2020; Stokes et al., 2020). In particular, Pfau et al. (2020) developed FermiNet as an ansatz on top of the variational Monte Carlo (VMC) model to approximate the solutions of many-electron systems; the ansatz is based on the notion of generalized Slater determinants (Hutter, 2020). The method gave large improvements over the VMC model for many atoms and small molecules, which opens the door to solving previously intractable many-electron systems.
Finally, this study builds on the foundational work of Zaheer et al. (2018), who designed models for machine learning tasks defined on sets; their work covers permutation invariant and equivariant tasks. It demonstrates strong qualitative and quantitative results in experiments on statistic estimation, point cloud classification, set expansion, and outlier detection. Many studies followed their work, including PointNet (Qi et al., 2017a, b), Deep Potential (Zhang et al., 2018a, b), Set Aggregation Networks (Maziarka et al., 2019), and so on. The technical and historical aspects that place our work in the context of Deep Sets are deferred to Section 1.1.
1.1 Background
The main theorems of this paper concern the approximations and representations of -invariant functions, totally symmetric functions, and -antisymmetric functions. Let us define them here. First for convenience, call the space dimension and the multivariate dimension, and let . By convention, , so we write non-negative numbers as . Throughout this study, we assume and unless stated otherwise. We also denote points in using boldface letters such as , and its components as .
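Definition 1 ($G$-invariant function)
Let $X \subseteq (\mathbb{R}^d)^n$ and $G$ be a group which acts on $X$. A function $f$ defined on $X$ is $G$-invariant if
$$f(g \cdot \mathbf{x}) = f(\mathbf{x}) \tag{1}$$
for all $g \in G$ and $\mathbf{x} \in X$.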
Definition 2 (Totally symmetric function)
A $G$-invariant function is called totally symmetric when $G = S_n$, the symmetric group.
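Definition 3 ($n$-antisymmetric function)
Let $X \subseteq (\mathbb{R}^d)^n$. A function $f$ defined on $X$ is called $n$-antisymmetric if
$$f(\sigma \cdot \mathbf{x}) = \operatorname{sgn}(\sigma)\, f(\mathbf{x}) \tag{2}$$
for all $\sigma \in S_n$ and $\mathbf{x} \in X$, where $\operatorname{sgn}(\sigma)$ is the sign of the permutation $\sigma$.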
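The defining properties of totally symmetric and antisymmetric functions can be checked numerically on a small example. The following is a minimal sketch, assuming the group acts by permuting the $n$ points of a configuration; the helper names (`sign`, `act`, `is_totally_symmetric`, `is_antisymmetric`) and the two example functions are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation (tuple of indices), computed from its inversion count."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def act(perm, points):
    """Permute the n points (each a tuple in R^d) according to perm."""
    return tuple(points[i] for i in perm)


def is_totally_symmetric(f, points, tol=1e-9):
    """Check f(sigma . x) == f(x) for every permutation sigma of the points."""
    return all(
        abs(f(act(p, points)) - f(points)) <= tol
        for p in permutations(range(len(points)))
    )


def is_antisymmetric(f, points, tol=1e-9):
    """Check f(sigma . x) == sgn(sigma) * f(x) for every permutation sigma."""
    return all(
        abs(f(act(p, points)) - sign(p) * f(points)) <= tol
        for p in permutations(range(len(points)))
    )


def symmetric_example(points):
    """A sum over points is unchanged by reordering them."""
    return sum(x * x + y for x, y in points)


def antisymmetric_example(points):
    """Product of pairwise differences of first coordinates changes sign under swaps."""
    xs = [p[0] for p in points]
    out = 1.0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            out *= xs[i] - xs[j]
    return out


# Three points in R^2 (n = 3, d = 2), chosen arbitrarily.
sample = ((0.5, 1.0), (-1.2, 0.3), (2.0, -0.7))
```

A check at a single configuration can only refute a symmetry, not prove it, so this is a sanity test rather than a verification.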
As mentioned in the last section, representing totally symmetric, $G$-invariant or $n$-antisymmetric functions using simpler such functions can significantly mitigate the curse of dimensionality. Some results represent these functions exactly, and there are also many approximate representation theorems (some are mentioned below). From a mathematical viewpoint, an exact representation is desirable, but because of the intrinsically approximate nature of neural networks (Universal Approximation Theorem), an approximate model works just as well for all practical purposes. In fact, leaving room for approximation often allows for a much simpler representation, an example of which can be found in the present work. As mentioned before, this line of research grew rapidly after a viable architecture for the exact representation of continuous, totally symmetric functions was proposed in the Deep Sets paper by Zaheer et al. (2018). In that work, symmetric functions were interpreted as functions on sets, because the order of the elements does not matter in a set. The concept of functions on sets was extended to functions on multisets in Xu et al. (2019). The main idea behind Deep Sets is to process individual set elements in parallel using a shared encoding function and then combine them using a symmetric "pooling" function such as summation, averaging, or max-pooling. This idea was generalized considerably by the work on $k$-ary Janossy pooling by Murphy et al. (2019). A rigorous theoretical understanding of the latent dimensions of the Deep Sets architecture and the Janossy pooling paradigm was given in Wagstaff et al. (2021). The result of Zaheer et al. (2018) was recently corroborated by Chen et al. (2023), who proved that any totally symmetric continuous function can be expressed as a composition of two continuous functions.
To understand it better, we should introduce some terminologies in the following (Jegelka, 2022):
Definition 4 (Inner, Outer Functions, Embedding dimension)
If a real valued, totally symmetric function evaluated at a set can be expressed as where is independent of the , then is called the outer function and mapping to , is called the inner function. The dimension of the range of the inner function ( here) is called embedding dimension.
The Deep Sets ansatz states that a continuous totally symmetric function for a compact , can be expressed as a composition of continuous inner and outer functions in the form . For $d = 1$, the proof of this ansatz can be found in Zaheer et al. (2018), and for $d > 1$ it has been shown in Chen et al. (2023) that any continuous totally symmetric function can be written as where is the collection of all generators of totally symmetric polynomials. Hence the number of such generators is the embedding dimension here, i.e. . More about this can be found in Section 3.1.
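To make the sum-decomposition concrete, here is a minimal sketch for scalar inputs and three points, assuming the inner function is the vector of monomials $(x, x^2, x^3)$, so that the summed embedding consists of the power sums $(p_1, p_2, p_3)$. The function names are hypothetical; the outer function uses Newton's identity $e_3 = (p_1^3 - 3 p_1 p_2 + 2 p_3)/6$ to recover the symmetric polynomial $x_1 x_2 x_3$.

```python
def inner(x, n):
    """Hypothetical inner function: maps a scalar x to (x, x^2, ..., x^n)."""
    return tuple(x ** k for k in range(1, n + 1))


def embed(xs):
    """Sum the inner features over the set; the result is the vector of power sums."""
    n = len(xs)
    return tuple(sum(column) for column in zip(*(inner(x, n) for x in xs)))


def outer_e3(p):
    """Outer function recovering x1*x2*x3 from (p1, p2, p3) via Newton's identities."""
    p1, p2, p3 = p
    return (p1 ** 3 - 3 * p1 * p2 + 2 * p3) / 6


xs = (1.0, 2.0, 3.0)
value = outer_e3(embed(xs))                        # product of the three inputs
value_permuted = outer_e3(embed((3.0, 1.0, 2.0)))  # same set, different order
```

The inner function here does not depend on the target: only `outer_e3` changes when a different symmetric polynomial is represented.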
There were not many results available about how to represent -antisymmetric functions effectively, especially when . There are some broad schemes of doing it, named as Backflow, Jastrow and Slater determinant ansatz (Zweig and Bruna, 2023). We now describe them briefly in the following. For complex valued functions from we define as . With this notation one can define the antisymmetric projection of tensor product of functions as
Up to some rescaling these projections are called the Slater determinant of the functions . The functional form of Backflow ansatz (with a single term) is
where is a totally symmetric function. Similarly the functional form of Jastrow ansatz (with a single term) can be written as
where is an equivariant function i.e. (by . we mean group action here). In the end the functional form of Slater determinant ansatz with terms is
It can be seen in Zweig and Bruna (2023) that the Jastrow ansatz is a special case of the Backflow ansatz and the Slater determinant ansatz is a special case of the Jastrow ansatz. Some very interesting theoretical work on these ansatzes involving Slater determinants can be found in Hutter (2020) and Abrahamsen and Lin (2023).
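The antisymmetric projection underlying these ansatzes can be sketched directly as a signed sum over permutations, which coincides with the determinant of the matrix of single-particle function values. This is an illustrative sketch only: the normalization (rescaling) is omitted, and the three single-particle functions below are hypothetical choices, not those of any cited work.

```python
from itertools import permutations
from math import prod


def sign(perm):
    """Sign of a permutation (tuple of indices), computed from its inversion count."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def slater(orbitals, points):
    """Unnormalized Slater determinant: sum over sigma of sgn(sigma) * prod_i f_i(x_sigma(i))."""
    n = len(points)
    return sum(
        sign(p) * prod(orbitals[i](points[p[i]]) for i in range(n))
        for p in permutations(range(n))
    )


# Hypothetical single-particle functions on R^2: 1, first coordinate, second coordinate.
orbitals = [lambda r: 1.0, lambda r: r[0], lambda r: r[1]]
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
swapped = [points[1], points[0], points[2]]       # exchange the first two points
coincident = [points[0], points[0], points[2]]    # two identical points
```

Exchanging two points flips the sign of the result, and coincident points give zero, which is the behavior expected of an antisymmetric function. The explicit sum has $n!$ terms, so it is only practical for small $n$.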
In a very recent work, the problem of exact representation of continuous $n$-antisymmetric functions was solved in Chen and Lu (2023). However, we provide an example here showing that their antisymmetric representation theorem cannot be directly used to give exact representations for or in general functions i.e. such a function cannot be written as where has the same regularity as . A similar example in the case of totally symmetric functions was constructed in Chen et al. (2023, page 7) where the function is yet cannot be so at a particular point. Before we present the counterexample, let us recall their recent work. According to Chen and Lu (2023), given and , a function satisfies assumption A if
(i) is $n$-antisymmetric and continuous for each ,
(ii) if and only if for some with ,
(iii) If then there exists a permutation such that .
Then the following theorem was proved in their paper:
Theorem 5
Given and compact set, if satisfy assumption A then for any -antisymmetric, continuous function , there exists a unique that is continuous and odd, satisfying
where is equipped with the topology induced from .
To construct a counterexample of this theorem in the category, let us take and . First we construct where satisfy assumption A above. Let us take for defined by , for . Then clearly s are antisymmetric under the action of , continuous and if then . Conversely, if then it implies i.e. . Also if then from above we can note that and . Then using the fact and writing explicitly, we get and . This implies and or in other words and . Now let us take as . We see is a continuous function that is antisymmetric and at the origin. Then according to Theorem 5 above there exists that is continuous and odd and satisfies for all . However, we now show that this function cannot be (or even differentiable) at . Let us take . We investigate the differentiability of at along as . We note as is odd. If is differentiable at with derivative of being at , we must have
(3) |
But by definition which gives us for small , whereas showing that the limit in equation (3) does not exist.
One should note that the exact representation theorems by Chen et al. above do not strictly fall under the Backflow ansatz mentioned earlier. Similarly, Han et al. (2022) modified some results in the Deep Sets paper and showed that arbitrarily accurate uniform approximation of symmetric or antisymmetric functions defined on a compact subset of some Euclidean space is possible, but the latent number of variables (embedding dimension) depends on the gradient of the function being approximated and the order of approximation, along with the number $n$ and the dimension $d$ of the input variables. We state their exact results in the following:
Theorem 6
Let be a continuously differentiable, totally symmetric function, where is compact. If , then there exists , such that for any , we have
where with .
Theorem 7
Let be a continuously differentiable, $n$-antisymmetric function, where is compact. Then there exist permutation equivariant mappings and permutation invariant functions for such that for , we have
where and for each there exists , with such that for any ,
1.2 Our contributions
In this paper, we prove results that can be grouped as two broad contributions:
(I) Let be compact and be a compact (Lie) group that acts on . If is $G$-invariant and , then there exists a $G$-invariant polynomial that is arbitrarily close to under the $\mathcal{C}^k$-uniform norm on . This result remains true for permutation invariant , i.e. , which can be arbitrarily approximated by totally symmetric polynomials. Furthermore, since totally symmetric polynomials are finitely generated by totally symmetric power sums, its exact number of generators (embedding dimension) is .
(II) Let be compact. If is $n$-antisymmetric and , then there exists an $n$-antisymmetric polynomial that is arbitrarily close to under the $\mathcal{C}^k$-uniform norm on . Furthermore, since $n$-antisymmetric polynomials form a finitely generated module over the totally symmetric polynomials, the exact number of module generators is stated in the following cases: When and , the number is 1; when and , the number is ; when and , the number is the $n$-th Catalan number (Garsia and Haiman, 1996). For the general case, the lower and upper bounds for the minimum number of module generators are stated as and , respectively, where for some .
Essentially, the representation of the approximation in the totally symmetric case follows the same sum-decomposition as Deep Sets. Both the inner and outer functions can be chosen to be polynomials, and the inner function is chosen independently of the target function. In addition, the embedding dimension is also independent of the $\epsilon$-accuracy of the approximation, the target function, and the $\mathcal{C}^k$-norm. This independence is in contrast to Theorem 6 in the work of Han et al. (2022); specifically, there both the inner and outer functions can only be made continuous, and the inner function is dependent on . The embedding dimension there depends on the $\epsilon$-accuracy of the approximation, the target function, and the gradient of the target function.
For the $n$-antisymmetric case, the representation of the approximation is a sum of $K$ terms, where each term is a product of a smooth totally symmetric polynomial and a smooth $n$-antisymmetric, homogeneous polynomial of degree at most $\binom{n}{2}$, which does not depend on the target function; in contrast, the functions in Theorem 7 depend on the target function. Similarly, the number $K$ is also independent of the $\epsilon$-accuracy of the approximation, the target function, and the gradient of the target function, unlike the one presented in Han et al. (2022). The approximation results in these contributions are presented in Section 2, and the results for the representations are shown in Section 3.
1.3 Structure of the paper
This paper is organized as follows. In Section 2, we prove the theorems on uniform $\mathcal{C}^k$ approximation, to arbitrary accuracy, of $G$-invariant functions for some (Lie) group $G$, symmetric functions, and $n$-antisymmetric functions by polynomials of the same type. Common notations are introduced in Section 2.1. Section 2.2 contains the necessary results and main proofs for the uniform $\mathcal{C}^k$ approximation of $G$-invariant and totally symmetric functions. A similar proof is given for $n$-antisymmetric functions in Section 2.3.
Section 3 concerns the representations of totally symmetric and $n$-antisymmetric polynomials, which are used for the uniform $\mathcal{C}^k$ approximations in Section 2. We discuss the embedding dimension needed for totally symmetric polynomials, which are generated as an $\mathbb{R}$-algebra, following the work of Chen et al. (2023). The $n$-antisymmetric polynomials are shown to form a finitely generated module over the totally symmetric polynomials in Section 3.2, and bounds for the minimal number of generators are also given. Finally, Table 2 summarizes our results, which show a considerable improvement in the representation of $n$-antisymmetric polynomials over the existing literature.
Appendix A provides the necessary background on commutative algebra, which is used in Appendix B. Lastly, a detailed exposition of the representation theory of $G$-invariant polynomials for general finite $G$ is given in Appendix B, along with the necessary results on $n$-antisymmetric polynomials; these are used to draw conclusions about the minimal number of generators in Section 3.2.
2 $\mathcal{C}^k$ approximation of $G$-invariant and $n$-antisymmetric functions
The approximation theory presented in this section applies to polynomials with coefficients in either $\mathbb{R}$ or $\mathbb{C}$. We focus on the proof for real polynomials here. For complex-valued polynomials the same results hold; we simply need to apply our results to the real and imaginary parts of the polynomial separately.
2.1 Notation
The approximations for -invariant functions will be shown for a compact Lie group , which has an action on ; the results for totally symmetric and -antisymmetric functions will then actually follow quite closely from the proof of -invariant functions. Notably, given such a compact Lie group , is a bijection on , so the action of is simply a permutation in . Therefore, one of the essences of proving our theorems is understanding how the permutations behave on the indices of and their involvement in the differentiation for the approximation. The other essence is to consider an appropriate function space that will eventually guarantee the existence of the approximation, which we will describe next.
The following $\mathbb{R}$-algebra (see Definition 21) and topology are needed to describe a subalgebra that will be crucial for our initial approximation of these functions. Suppose $\Omega$ is an open set. Define $\mathcal{C}^k(\Omega, \mathbb{R})$ as the $\mathbb{R}$-algebra of all $k$ times continuously differentiable real valued functions on $\Omega$; it is abbreviated to $\mathcal{C}^k(\Omega)$, as the codomain of these functions will always be $\mathbb{R}$. This space is endowed with the compact-open topology of order $k$; in other words, the topology of uniform convergence of the functions and all their partial derivatives up to order $k$ on compact subsets of $\Omega$. Next, notice that if $X$ is compact, then there always exists an open set $\Omega$ such that $X \subset \Omega$. Pick any such $\Omega$; we may then define the vector space $\mathcal{C}^k(X)$ of $k$-times continuously differentiable functions on $X$, which also forms an $\mathbb{R}$-algebra. We equip this space with the $\mathcal{C}^k$-uniform norm, defined as the maximum over the continuous derivatives up to order $k$, restricted to the compact set $X$:
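Definition 8 ($\mathcal{C}^k$ norm)
Let $X$ be compact, and suppose $f \in \mathcal{C}^k(X)$. Then we define
$$\|f\|_{\mathcal{C}^k(X)} := \max_{|\alpha| \le k} \, \sup_{\mathbf{x} \in X} \left| \partial^{\alpha} f(\mathbf{x}) \right| \tag{4}$$
where $\alpha = (\alpha_1, \dots, \alpha_{nd})$ is an $nd$-dimensional multi-index, $|\alpha| = \alpha_1 + \dots + \alpha_{nd}$, and
$$\partial^{\alpha} f := \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \cdots \partial x_{nd}^{\alpha_{nd}}} \tag{5}$$
With the norm $\|\cdot\|_{\mathcal{C}^k(X)}$, the vector space $\mathcal{C}^k(X)$ turns into a Banach space, and is independent of the choice of the open set $\Omega$.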
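As a rough illustration of this kind of norm, the sketch below estimates it for a univariate polynomial on an interval, assuming the norm is the maximum, over derivative orders $0, \dots, k$, of the supremum of the absolute value of the derivative. The one-variable setting, the grid sampling, and all function names are simplifying assumptions made for the example; sampling on a grid only gives a lower estimate of each supremum.

```python
def derivative(coeffs):
    """Coefficients of the derivative of sum(c_i * x**i), lowest degree first."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Horner evaluation of the polynomial with the given coefficients."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def ck_norm_on_grid(coeffs, k, a, b, samples=1001):
    """Grid estimate of max over orders 0..k of sup over [a, b] of |derivative|."""
    grid = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    norm = 0.0
    current = list(coeffs)
    for _ in range(k + 1):
        norm = max(norm, max(abs(evaluate(current, x)) for x in grid))
        current = derivative(current)
    return norm


# f(x) = x^3 on [-1, 1]: sup|f| = 1, sup|f'| = 3, sup|f''| = 6.
cubic_norm = ck_norm_on_grid([0.0, 0.0, 0.0, 1.0], k=2, a=-1.0, b=1.0)
```

For this example every supremum is attained at an endpoint of the interval, which the grid contains, so the estimate coincides with the true value.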
Let us introduce some notations to describe the permutations and multi-indices that are involved for the general case . Recall that for , may be written as using the standard basis. Then for , may be viewed as a -dimensional vector, which can be written similarly. In particular, a basis is needed, but for practicality, we introduce
(6) |
where and . Hence for any , we may write . This representation for the general case is relevant below in showing that the action can be described as an action on the -indices of the basis , instead of the indices of the vector .
2.2 $\mathcal{C}^k$ approximation via $G$-invariant and totally symmetric polynomials
In this section, we are interested in the $\mathcal{C}^k$ approximation of totally symmetric functions and, more generally, $G$-invariant functions for some compact (Lie) group $G$ which has an action on . We first make new observations about how permutations act on the indices of and then examine how this behavior interacts with differentiation. Subsequently, proofs for the approximations are given.
The main theorem of this section is the $\mathcal{C}^k$ approximation of $G$-invariant functions, where $G$ is a compact (Lie) group. This result covers the case $G = S_n$, because $S_n$ is a finite group. We have the main theorem and its corollary:
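Theorem 9 ($\mathcal{C}^k$ approximation: $G$-invariant function)
Let $X$ be compact, and assume $G$ is a compact (Lie) group that acts on $X$. Let $f$ be $G$-invariant and $f \in \mathcal{C}^k(X)$. Then for every $\epsilon > 0$, there exists a $G$-invariant polynomial $P$ such that
$$\|f - P\|_{\mathcal{C}^k(X)} < \epsilon. \tag{7}$$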
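Corollary 10 ($\mathcal{C}^k$ approximation: totally symmetric function)
Suppose $X$ is compact. Let $f$ be totally symmetric and $f \in \mathcal{C}^k(X)$. Then for every $\epsilon > 0$ there exists a totally symmetric polynomial $P$ such that
$$\|f - P\|_{\mathcal{C}^k(X)} < \epsilon. \tag{8}$$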
Note that for any group acting on , is -invariant, and this is true by constructing to be the -product of . In fact, the assumptions in the results above (and later results) can be looser. The approximation results are true for some general compact set because can always be made -invariant by the following argument: suppose is a compact (Lie) group that acts on ; then for any , is a bijection on , so
for some subgroup of . Note that is compact since is continuous on the compact set , and since is finite, is compact as it is the finite union of compact sets. Now, it is clear that is -invariant, so it may replace in the assumptions of our results.
Now, our proof begins with Nachbin's characterization (Nachbin, 1949) that the set of polynomials of real coefficients with -variables (a subalgebra) is -dense in . The next theorem is a consequence of Nachbin's Theorem.
Theorem 11 (Theorem 2.2, Prolla and Guerreiro (1976))
Suppose is open and is endowed with the topology . If is a polynomial algebra, then is -dense in if and only if the following conditions are satisfied:
(a) For any with , there exists such that .
(b) For any , there exists such that .
(c) For any and with , there exists such that .
It is easy to see that our subalgebra, the set of polynomials with real coefficients, is -dense in the set , so Theorem 11 will be integral to our proof later. Now, we observe how acts on and behaves under differentiation for some :
Lemma 12
Suppose and . Let , then
(9) |
Proof. It suffices to show this for transpositions in . Without loss of generality, let .
This completes our proof.
The next lemma will be important for bridging the connection between and in the proof of Theorem 9. A general version of this lemma can be found in Kane (2001, page 261), but we present a more elementary proof for the easier case here. Even though the lemma is proven for , it can be applied to any group that has an action on because any such element induces a permutation of .
Lemma 13
Suppose is smooth. Let be a multi-index and , then
(10) |
Proof. We prove this for the general space dimension , by induction on the order of . For the base case , we have , so the statement above holds trivially. For the base case , we take for and . We compute , which involves the chain rule and the term . Hence, let us compute this first:
where the second equality follows from Lemma 12. Now, we write
since . This concludes the base cases.
Assume that the statement is true for and we want to show that it is also true for . Let , then
because . The second equality follows from the induction hypothesis.
Proof of Theorem 9.
First, we approximate $f$ up to its $k$-th order derivatives by a polynomial using Theorem 11. This allows us to assume that
(11) |
Now consider the symmetrized polynomial defined by
where is the Haar probability measure associated with the group . Similarly, since is -invariant, it can be written in the same form: .
In the above equation, follows by symmetrization, and follows from Lemma 13. We now briefly explain the reasoning for the other steps. follows by noting that for any , is always bounded on because is compact. Hence, letting for all , and since , the differentiation can be passed through the integral. To obtain , we note that for any , and thus integrating over the group retains the inequality
follows because we are taking the maximum. follows because for a fixed and all the such that , is a bijection on the list of $\alpha$'s, so it is the same list of $\alpha$'s.
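The key step of this argument, that averaging an approximating polynomial over the group does not increase the error against an invariant target, can be illustrated numerically. The sketch below is restricted to the sup norm of the function values (order zero, no derivatives) for $S_2$ acting on two scalar variables; the target `f`, the non-symmetric approximation `approx`, and the grid are hypothetical choices made only for the example.

```python
from itertools import permutations, product
from math import factorial


def symmetrize(poly, n):
    """Average poly over all permutations of its n arguments (finite-group Haar average)."""
    def averaged(x):
        total = sum(poly(tuple(x[i] for i in perm)) for perm in permutations(range(n)))
        return total / factorial(n)
    return averaged


def f(x):
    """A totally symmetric target function of two scalar variables."""
    return x[0] * x[1] + x[0] + x[1]


def approx(x):
    """A hypothetical non-symmetric polynomial close to f."""
    return f(x) + 0.01 * (x[0] - x[1]) + 0.002 * x[0] ** 2


def sup_error(g, h, grid):
    """Largest pointwise gap between g and h over the sample grid."""
    return max(abs(g(x) - h(x)) for x in grid)


axis = [i / 10 - 1 for i in range(21)]
grid = list(product(axis, repeat=2))  # this grid is mapped to itself by swapping coordinates

approx_sym = symmetrize(approx, 2)
error_before = sup_error(f, approx, grid)
error_after = sup_error(f, approx_sym, grid)
```

Because `f` equals its own average, the difference `f - approx_sym` is the average of permuted copies of `f - approx`, so its supremum over a permutation-invariant set cannot exceed that of `f - approx`.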
2.3 $\mathcal{C}^k$ approximation via $n$-antisymmetric polynomials
In this section, we prove a similar $\mathcal{C}^k$ approximation for $n$-antisymmetric functions. The formalism in this section mirrors that of Section 2.2. There is a notion of "skew-symmetric" polynomials that generalizes the notion of antisymmetry, but it requires the group to be generated by reflections or pseudo-reflections. For such compact groups acting on , our results below hold true. However, we only work with $S_n$ in our study here. Interested readers may find the notion of skew-symmetric polynomials in Bergeron (2009, page 93).
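Theorem 15 ($\mathcal{C}^k$ approximation: $n$-antisymmetric function)
Let $X$ be compact. Suppose that $f$ is $n$-antisymmetric and $f \in \mathcal{C}^k(X)$. Then for every $\epsilon > 0$ there exists an $n$-antisymmetric polynomial $P$ such that
$$\|f - P\|_{\mathcal{C}^k(X)} < \epsilon. \tag{12}$$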
Proof. Using Theorem 11, we approximate $f$ up to its $k$-th order derivatives by a polynomial . Hence, let
(13) |
Now consider the -antisymmetrized polynomial defined by
Similarly, since is -antisymmetric.
We prove the approximation in a similar manner as in the proof of TheoremĀ 9.
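The antisymmetrization used in this proof can be sketched as a signed average over $S_n$. The code below is an illustration under the assumption that the operator is $(1/n!) \sum_{\sigma} \operatorname{sgn}(\sigma)\, P(\sigma \cdot \mathbf{x})$; the monomial and the function names are hypothetical.

```python
from itertools import permutations
from math import factorial


def sign(perm):
    """Sign of a permutation (tuple of indices), computed from its inversion count."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def antisymmetrize(poly, n):
    """Signed average of poly over all permutations of its n arguments."""
    def projected(x):
        total = sum(
            sign(perm) * poly(tuple(x[i] for i in perm))
            for perm in permutations(range(n))
        )
        return total / factorial(n)
    return projected


def monomial(x):
    """The monomial x1^2 * x2 in three scalar variables; it has no symmetry on its own."""
    return x[0] ** 2 * x[1]


anti = antisymmetrize(monomial, 3)
point = (0.4, -1.3, 2.2)
```

Applying the operator twice gives the same function as applying it once, so an antisymmetric target is left unchanged; this is what allows the target and its approximation to be compared after antisymmetrization.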
3 Representations of totally symmetric and $n$-antisymmetric polynomials
The representation theory presented in this section applies to polynomials with coefficients in any field of characteristic zero. However, we mainly show the results for real polynomials.
3.1 Representations of the totally symmetric polynomials
In Section 2.2, a totally symmetric function is shown to be uniformly approximated by a totally symmetric polynomial over a compact set. In this section, we infer more about this totally symmetric polynomial.
For consistency, we use the notation of Chen et al. (2023), which follows Briand (2004, page 359, Theorem 3). Let us denote as the $\mathbb{R}$-algebra consisting of all multi-symmetric polynomials with real coefficients, i.e. real totally symmetric polynomials in . Furthermore, is generated as an $\mathbb{R}$-algebra by multi-symmetric power sums:
(14) | ||||
where and . One can define a total ordering on these indices and enumerate them as , with from above for all . One can easily check that
(15) |
Then can be written using the generators of the algebra:
(16) |
where . Now, based on the structures of and , we can further simplify our representation. First, encode the information of in an matrix , whose entries are
(17) |
for and . Also, note that the -th column of the matrix can be written as
By (16), is a function of ; then from (14), is a function of , where the sum is the so-called inner function. This functional dependence is clearly smooth. The dimension of the range of this function is the "embedding dimension", which is . This is a considerable improvement compared to Han et al. (2022), where the upper bound on the embedding dimension depends on and the norm of the gradient of the approximated function.
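The inner function built from multi-symmetric power sums can be sketched as follows. This is an illustration under an assumption that is not spelled out in the text above: the generators are taken to be the power sums indexed by multi-indices $\alpha \in \mathbb{N}_0^d$ with $1 \le |\alpha| \le n$, which is the classical generating set in characteristic zero. Under that assumption the number of generators is $\binom{n+d}{d} - 1$. All function names are hypothetical.

```python
from itertools import product
from math import comb, prod


def multi_indices(n, d):
    """All multi-indices alpha in N_0^d with 1 <= |alpha| <= n (assumed index set)."""
    return [a for a in product(range(n + 1), repeat=d) if 1 <= sum(a) <= n]


def power_sum_embedding(points):
    """Sum over the points of the monomials x^alpha, one entry per multi-index alpha."""
    n, d = len(points), len(points[0])
    return tuple(
        sum(prod(x[j] ** a[j] for j in range(d)) for x in points)
        for a in multi_indices(n, d)
    )


# Three points in R^2 (n = 3, d = 2), chosen arbitrarily.
points = [(1.0, 2.0), (3.0, -1.0), (0.5, 0.25)]
reordered = [points[2], points[0], points[1]]
embedding = power_sum_embedding(points)
embedding_reordered = power_sum_embedding(reordered)
```

The length of the embedding depends only on `n` and `d`, not on the function being represented or on any accuracy parameter, which mirrors the independence claimed for the embedding dimension.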
3.2 Representations of the $n$-antisymmetric polynomials
This section utilizes a fair amount of commutative algebra. The relevant definitions and results can be found in Appendices A and B. Given the polynomial ring where , the representation of $n$-antisymmetric polynomials is a more subtle problem than that of totally symmetric ones. Unlike symmetric polynomials, $n$-antisymmetric polynomials do not form an $\mathbb{R}$-algebra because closure under multiplication fails in general: the product of two $n$-antisymmetric polynomials is totally symmetric. Hence, we cannot speak of generators of such an algebra. However, we shall find representations of $n$-antisymmetric polynomials with respect to the totally symmetric polynomials, whose representation has been discussed in Section 3.1.
3.2.1 Finite generation of $n$-antisymmetric polynomials and the case
In fact, the $n$-antisymmetric polynomials form a finitely generated module over the ring of totally symmetric polynomials (Appendix B, Lemma 28). Consequently, we want to determine the minimum number of module generators of the $n$-antisymmetric polynomials over the totally symmetric polynomial ring; this analysis is required because, unlike bases of a vector space over a field, minimal generating sets of a finitely generated module need not all have the same cardinality. Determining the exact minimum number of module generators for general $n$ and $d$ is very difficult, so instead we establish upper and lower bounds on this minimum. This is the main result of the section, which is discussed in Section 3.2.2. In addition, we establish the exact minimum for specific cases here.
In the case and , the minimum number of generators is 1, which is well known, and goes back to Cauchy (1905). In particular, any -antisymmetric polynomial can be written as a product of a symmetric polynomial and Vandermonde determinant . In other words, if is -antisymmetric then there exists a symmetric polynomial such that . Since for any , and , the result of the action is . We make a useful observation that for any transposition where , = - implies that vanishes on the hyperplane ; this means is divisible by .
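The factorization described above can be checked numerically. The following is a minimal sketch, assuming the single-variable-per-point setting and a hypothetical seed monomial chosen only for illustration; it antisymmetrizes the monomial, divides by the Vandermonde product, and confirms that the quotient takes the same value on every permutation of a sample point.

```python
from fractions import Fraction
from itertools import permutations


def sign(perm):
    """Signature of a permutation given as a tuple of 0-based images."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def vandermonde(x):
    """Product of (x[j] - x[i]) over all pairs i < j."""
    result = Fraction(1)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            result *= x[j] - x[i]
    return result


def antisymmetrize(f, n):
    """Alternating sum of f over all permutations of its n arguments."""
    def alt(x):
        return sum(sign(p) * f([x[i] for i in p]) for p in permutations(range(n)))
    return alt


def monomial(x):
    """Hypothetical seed polynomial x_1^4 * x_2^2 * x_3 (not from the source)."""
    return x[0] ** 4 * x[1] ** 2 * x[2]


f = antisymmetrize(monomial, 3)


def quotient(x):
    """Candidate symmetric factor f / V, evaluated where the x[i] are distinct."""
    return f(x) / vandermonde(x)


point = [Fraction(2), Fraction(-1), Fraction(5)]
values = {quotient([point[i] for i in p]) for p in permutations(range(3))}
print(len(values))  # one distinct value: the quotient is permutation invariant
```

A numerical check at one point is of course not a proof; it only illustrates that the quotient behaves like a symmetric function.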
For and , the solution takes inspiration from the observation above, which will be used in the proof. First, let us consider Vandermonde determinants of all the scalar variables coming from ; in other words, let where . The result is proven in the following lemma:
Lemma 16
Let and . Any -antisymmetric polynomial can be written in the following form:
(18) |
where are totally symmetric polynomials.
Proof In this particular case, where , and in this proof, we write . Let be the permutation transforming , so the -antisymmetry of can be written as . The -transformation on is identified with the matrix
(21) |
where is the identity matrix and is the zero matrix. To arrive at the proposed form, we observe the 2-antisymmetry in a different coordinate via a linear transformation and factor out terms. First, let and for . This matrix of transformation is
(24) |
so that . If , which is also a polynomial, the condition translates in the new coordinates as . One can easily compute and see
(25) |
Now, if is expressed as an -linear combination of monomials, then this last condition is also valid for each monomial term. We see this by writing where
Furthermore, and are monomials with the form where is odd for monomials and even for monomials . From equation 25, we must have
and from the conditions on the monomials, . This means . Also, , so each monomial obeys equation (25). Hence, for any index , which is written explicitly as . This means must be odd, so at least one of the must be odd. Let us now consider the case where is odd for some . Here, we can factor out one from , so where . Because must be an even number, we see that has the following symmetry:
(26) |
In the original coordinate , it simply means that (expressed in coordinate) will be a totally symmetric monomial. Noting that , we see , when expressed in coordinate, can be factored as times a totally symmetric monomial. This factorization of one of can be done for each monomial , so this concludes our proof.
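To make the statement of Lemma 16 concrete, here is a small sketch for two points in the plane. The antisymmetric polynomial (a 2x2 determinant) and the names `g1`, `g2` are hypothetical choices for illustration, not taken from the proof; the check confirms that twice the polynomial equals a sum of coordinate differences times symmetric coefficients.

```python
import random


def f(x1, x2):
    """Hypothetical antisymmetric polynomial for two points in the plane."""
    return x1[0] * x2[1] - x2[0] * x1[1]


def g1(x1, x2):
    """Symmetric coefficient paired with the first coordinate difference."""
    return x1[1] + x2[1]


def g2(x1, x2):
    """Symmetric coefficient paired with the second coordinate difference."""
    return -(x1[0] + x2[0])


def decomposition(x1, x2):
    """(x_{1,1} - x_{2,1}) * g1 + (x_{1,2} - x_{2,2}) * g2, which should equal 2 * f."""
    return (x1[0] - x2[0]) * g1(x1, x2) + (x1[1] - x2[1]) * g2(x1, x2)


rng = random.Random(0)
samples = [
    ((rng.randint(-9, 9), rng.randint(-9, 9)), (rng.randint(-9, 9), rng.randint(-9, 9)))
    for _ in range(100)
]
identity_holds = all(2 * f(a, b) == decomposition(a, b) for a, b in samples)
coefficients_symmetric = all(
    g1(a, b) == g1(b, a) and g2(a, b) == g2(b, a) for a, b in samples
)
print(identity_holds, coefficients_symmetric)
```

The factor of 2 is kept on the left-hand side only to stay in integer arithmetic; dividing both coefficients by 2 gives the form stated in the lemma.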
3.2.2 Bounds for minimum module generators of -antisymmetric polynomials for
When we are representing -antisymmetric polynomials, the use of Vandermonde determinants seems intuitive. We may try to mirror Lemma 16 for the general case and ; however, this will not hold true. One immediate reason is the degree of some -antisymmetric functions. Recall the Vandermonde determinants . Their degrees are , so some -antisymmetric functions of the same or lower degree cannot be represented this way. For example, when and , one such function is where .
This brings us to the realm of representation theory, where finding the minimal number of generators for -antisymmetric polynomials over the ring of symmetric polynomials has been tackled by many eminent mathematicians over the years. For and , the exact minimal number of such generators was found through a deep result of Haiman, as a solution of the Haiman-Garsia conjecture. (See Garsia and Haiman (1996), Haiman (2003), Haiman (1994), Bergeron (2009) for background. For an explicit reference, see Haglund et al. (2004), page 2.) The solution is the -th Catalan number given by . For the record, when this solution agrees with our Lemma 16.
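As a quick illustration, the Catalan numbers can be computed from the standard closed form, the central binomial coefficient divided by one more than the index. The sketch below assumes this standard formula (the displayed expression is not legible in this copy) and prints the values for indices 2 through 8; the values for 3 through 8 coincide with the "Exact" column of Table 1.

```python
from math import comb


def catalan(n):
    """n-th Catalan number via the standard formula comb(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


catalan_values = [catalan(n) for n in range(2, 9)]
print(catalan_values)  # [2, 5, 14, 42, 132, 429, 1430]
```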
Let and , and let us follow the notation of Appendix B from here onwards. The exact minimal number of -antisymmetric generators is given by the dimension of the vector space , which is given by Lemma 36. As stated earlier, we aim to determine upper and lower bounds on this minimal number of generators; substantial work in this direction has been done by Wallach (2021), and we draw heavily on it to find these bounds. For convenience, -antisymmetric polynomials are from now on called "alternants," which is the same terminology used in Wallach (2021).
Let us determine the lower bound first. Note that the set of minimal degree alternants forms a vector space over , and let this minimal degree be . We will use this later on. In Wallach (2021, Theorem 1), the lower bound of the number of module generators for alternants over symmetric polynomials is given by the dimension of the vector space spanned by this set of minimal degree alternants. We provide an argument here, relating these two numbers: Suppose is the minimal set of alternants that generate the -antisymmetric polynomials as a module over the totally symmetric polynomials. Now, recall that any polynomial can be written as a sum of homogeneous polynomials, i.e. , where is the homogeneous part with degree . By Lemma 29, if is an alternant, then are also alternants. Now, suppose , which is the minimal degree in the set of minimal degree alternants, then must be homogeneous because of the minimality assumption. Consequently, for some totally symmetric polynomials . Furthermore, for any , , where are nonnegative integers. In order to satisfy the minimality assumption for the degree , every index where and the corresponding . Hence, where for , and so is represented by the dimension of the vector space spanned by degree alternants. (Also note that the minimal generating set of -antisymmetric polynomials can be chosen to consist of homogeneous polynomials; see Theorem 37 in Appendix B.)
Before stating the lower bound for the number of generators, recall a standard combinatorial fact: Given , can be written uniquely as with and . From Wallach (2021, Theorem 1), the minimum degree of the minimal generating set of alternants is ; in the proof, the dimension of the vector space spanned by the set of minimal degree alternants, which gives the lower bound, is
Now, we want to establish the upper bound for the minimum number of generators for the alternants over the totally symmetric polynomials. According to Wallach (2021), the maximum degree of the minimal generating set of alternants is , and this fact is proven in Lemma 31, Appendix B. Now, mirroring Wallach (2021), define a graded lexicographic order on the elements of : For any , if or if , then if being the first index where , then . For convenience, let us define where , and let be the variables. Using a shorthand notation, we define a monomial as
Now, define on any function with the variables by
where, . Then, a basis for the alternants as a vector space over is the set
according to Wallach (2021, Proposition 4); of course, this is not the set of interest in our study. However, scalars in are also totally symmetric polynomials, so with the additional condition of maximal degree of , i.e. , becomes a set of generators for the alternants over the totally symmetric polynomials, see Theorem 37. Let us call this set , and so its cardinality is a loose upper bound on the minimum number of generators. Finding this cardinality is a very difficult combinatorial problem, which might not have a closed-form expression; this is mainly due to this ordering where are vectors in . Hence, for simplicity, we compute the number of non-negative integral solutions of , which is the cardinality of a superset of without the ordering condition. The number of non-negative integral solutions gives an upper bound
on the minimum number of generators of -antisymmetric polynomials as a module over the totally symmetric polynomials.
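The counting step can be sketched as follows. This is an illustration under stated assumptions rather than the paper's formula, which is not legible in this copy: we assume the degree cap is the number of pairs of points, that there is one exponent per scalar variable, and we use the standard stars-and-bars count of non-negative integer solutions of a bounded sum. The function names are hypothetical. With three and four points in dimension two, this reproduces the upper bounds 84 and 3003 listed in Table 1.

```python
from itertools import product
from math import comb


def upper_bound(n, d):
    """Stars-and-bars count of exponent vectors with total degree at most n*(n-1)/2.

    Assumption: n points with d coordinates each, so n*d scalar variables.
    """
    num_vars = n * d
    max_degree = n * (n - 1) // 2
    return comb(num_vars + max_degree, num_vars)


def brute_force_count(num_vars, max_degree):
    """Directly enumerate exponent vectors whose entries sum to at most max_degree."""
    return sum(
        1
        for exps in product(range(max_degree + 1), repeat=num_vars)
        if sum(exps) <= max_degree
    )


print(upper_bound(3, 2), upper_bound(4, 2))  # 84 3003
print(brute_force_count(6, 3))               # 84
```

The brute-force enumeration is only feasible for very small cases; it is included to cross-check the closed-form count.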
We tested our bounds in Table 1, which shows the exact numbers of generators for and falling (quite crudely) between the lower and upper bound. We also summarize our findings of Section 3 in Table 2. We emphasize here again that the exact number or the bounds on the generators (of algebra or module) that we obtain here are all independent of any function that we are approximating and the degree of approximation as we are simply dealing with polynomials now.
Table 1: Bounds and Exact Minimal Number of Generators for
 | | | Exact | Lower bound | Upper bound |
3 | 3 | 0 | 5 | 1 | 84 |
4 | 3 | 1 | 14 | 3 | 3003 |
5 | 3 | 2 | 42 | 3 | |
6 | 4 | 0 | 132 | 1 | |
7 | 4 | 1 | 429 | 4 | |
8 | 4 | 2 | 1430 | 6 | |
Table 2: Summary of Results
Polynomial Type | Exact | Lower bound | Upper bound |
– | – | ||||
– | – | ||||
– |
4 Acknowledgments and disclosure of funding
S.G. and K.T. are grateful to their peers N. Ramachandran, H. Bhatia, and S. Bhattacharya, in the Dept. of Mathematics, UCSD for enlightening conversations about many algebraic facts. S.G. is thankful to S. Chhabra for introducing him to the broad area of mathematical deep learning. The authors are grateful to Prof. Steven Sam and Prof. Brendon Rhoades for illuminating conversations on commutative algebra and representation theory. R.S. would like to thank the Institute for Pure and Applied Mathematics, Stanford for being generous hosts, and for providing a great environment during the period when most of this work was completed. The authors would also like to thank Prof. Nolan Wallach, for communicating many of the results and proofs related to invariant and representation theory, which have been used extensively in Appendix B. The authors declare that they have no competing interests.
Appendix A Necessary background for commutative algebra
In this section, some basic definitions and theorems from commutative algebra are given, which are useful in understanding the results in Section 3.2 and Appendix B. For more details, the reader is referred to any standard textbook such as Atiyah and Macdonald (1969), and the results will be referenced. We assume the reader has a basic understanding of groups, rings, and vector spaces over fields. Below the operations of ring addition and multiplication are denoted using the symbols and respectively. We will also use these same symbols later to denote the group addition and scalar multiplication operations for a module (defined below in Definition 18), but the distinction will always be clear from context, and no confusion should arise.
If is a ring, for any , we will frequently write to mean , when there is no chance for confusion. A ring is unital if multiplication has an identity element (denoted as ). A ring is called commutative if for all . Throughout this paper, we will assume our rings to be unital, commutative rings. A nonzero commutative ring in which every nonzero element has a multiplicative inverse is called a field, e.g. and . A nonzero commutative ring is called an integral domain if , for , implies either or is the zero element of . For example, if , then the set of polynomials , form a ring when equipped with the usual addition and multiplication operations for polynomials. This polynomial ring plays a key role in this paper. In fact, is also an integral domain (Atiyah and Macdonald, 1969, Page 2). Any subring of inherits the same addition and multiplication operations from . A notion that we will repeatedly encounter is that of an ideal of a ring:
Definition 17 (Ideal of a ring)
An ideal of a commutative ring is a subset of such that is a subgroup of , and for every and , the product .
Another important class of objects that we will also encounter frequently is the notion of a module over a ring, which generalizes the concept of a vector space over a field.
Definition 18 (Module over a ring)
Given a ring , a set is called an -module if is an abelian group, equipped with an operation satisfying , , , , for all and . The operation is called scalar multiplication.
For a -module , the group identity will also be denoted as , and again the distinction from the ring unit will be clear from context. If is an -module and is a subgroup of , then is a -submodule if for any and any , . Recall that if is an ideal of a ring , then is naturally an -module. An -module is finitely generated if there exist finitely many elements such that every element of is a linear combination of with coefficients from .
We next discuss Noetherian rings and Noetherian modules which play a key role in Section 3.2, but we will not give the axiomatic definition of these objects (which depends on ascending chain conditions). Alternate and equivalent definitions are given below because they are more relevant to the material in this paper.
Definition 19 (Noetherian ring)
A ring is called Noetherian if every ideal of is finitely generated as a -module.
Definition 20 (Noetherian module)
A -module is called Noetherian if every submodule of is finitely generated over .
For the characterization of Noetherian modules that we gave above in Definition 20, the reader is referred to Atiyah and Macdonald (1969, Proposition 6.2). Finally, we define the notion of an associative algebra over a ring, which is also extensively used in this paper.
Definition 21 (Associative algebra over a ring)
Let be a ring, and let be a -module. Then is called an -algebra (or an algebra over ), if also forms a ring such that the ring addition is the same operation as module addition, and module scalar multiplication satisfies for all and , where denotes ring multiplication in .
When the associative algebra ring multiplication is also commutative, we say that it is a commutative algebra. All the associative algebras that we will encounter in this paper are commutative algebras, and hence for simplicity we will simply refer to them as algebras going forward. An -algebra is finitely generated if there exists a finite set of elements such that any element of can be written as a finite linear combination of terms, with coefficients in , where each term is a finite product of the elements in . For example, the polynomial ring introduced previously, forms an -algebra. It is in fact finitely generated by the set .
Another notion that is needed in Section 3.2 is that of integral elements in a ring, over a subring. Next we define this notion, and then state an important result involving integral elements (Proposition 23 below, the proof of which can be found in Atiyah and Macdonald (1969, Proposition 5.1)).
Definition 22 (Integral element over subring)
Let be a ring and be a subring. An element is integral over , if is a root of a monic polynomial with coefficients in . Here monic polynomial means that the polynomial is univariate and the coefficient of the highest degree term of the polynomial is . We say that is integral over if every element is integral over .
Proposition 23
Let be a ring and be a subring. Then is integral over if and only if is a finitely generated -module, where is the subring generated by .
We also need the following three results, the proofs of which can be found in Atiyah and Macdonald (1969, Corollary 7.6), Matsumura (1989, Theorem 3.7.i), and Atiyah and Macdonald (1969, Proposition 6.5) respectively. Of these, Hilbert's basis theorem is a famous theorem in commutative algebra.
Theorem 24 (Hilbert's basis theorem)
If is a Noetherian ring then is also a Noetherian (polynomial) ring.
Theorem 25 (Eakin-Nagata theorem)
If is a Noetherian ring and is a subring such that is finitely generated as a -module, then is also a Noetherian ring.
Proposition 26
If is a Noetherian ring, and is a finitely generated -module, then is a Noetherian -module.
Appendix B Necessary results from representation theory
In this section, we establish some facts in representation theory, which are used in Section 3.2 to prove the upper and lower bounds of the minimum number of module generators for the set of -antisymmetric polynomials (as a module) over the totally symmetric polynomials. In particular, these polynomials belong to the ring where . We are grateful to Prof. Nolan Wallach for communicating and discussing these results with us. The results are included here for completeness and may help readers, especially those from the machine learning community, to understand the main results of our paper. Finally, the results are discussed here for the field ; however, they remain true for polynomials over any field of characteristic zero. For a more detailed discussion of the topics presented here, the reader can refer to Wallach (2017, Section 3.7.6).
This section is structured as follows: In Section B.1, we introduce the -algebra of polynomials equipped with an inner-product, identify several modules and subalgebras associated to a group action on the space of polynomials, and state some of their properties. Section B.2 is dedicated to proving the finite-dimensionality of the -vector space , which is an important subspace of the vector space of polynomials that plays a key role in the analysis. Then in Section B.3, we prove several structural results for this subspace, eventually culminating with Lemma 36, which is the main tool for quantifying the minimum number of module generators for the space of -antisymmetric polynomials. In these subsections, we first prove the results for a general group action , then state the results for the special case . The precise definition of the group action relevant for us is provided below in Section B.1.
B.1 Polynomial algebras and modules induced by a group action
We begin by defining the notion of representation of a group:
Definition 27 (Representation of a group)
A representation of a group on a vector space over a field is a group homomorphism from to , the general linear group on . This means satisfies for all . The dimension of is called the dimension of the representation.
Here, we will only deal with finite-dimensional representations. From this definition, it is clear that representations can also be interpreted as group actions. Now since elements of can be represented by invertible matrices, we can take their trace, and the resulting quantity defines the character of the representation, denoted as . Thus for all , if is a representation of a group . A subspace of is called -invariant if for all and . A -representation is called irreducible if and the only -invariant subspaces of are and itself. Interested readers are referred to Sagan (2001, Chapter 1) for more information.
Let be any finite dimensional inner product space over where . Now, we will define a few polynomial -algebras and -modules over , equipped with an inner-product, that we will use extensively. Let be an orthonormal basis of . This allows us to define coordinates in by expressing any vector as , where for all . Now, any can be denoted by its coordinates . We define an -algebra of polynomials on with respect to these coordinates, which we will denote as , i.e. . We also define a positive definite inner product on as
which turns into an inner-product space. We also introduce a useful notation: If are subsets of then we define as the set of all finite sums of the form , where each and . Obviously, . We note a special case where is a subring of , and is a subset of , then is the -module generated by and is a -submodule of .
Next, let be a finite subgroup of acting on as for any , , and . Let us denote as the -algebra of polynomials invariant under action, i.e. , and it is easy to check that it is indeed an -algebra. We will call elements of as invariants in this section (where the group will be specified and understood from context) to avoid conflict with the definition of -invariant functions in Definition 1. Denote as the subspace of invariants that vanish at . One can then define to be the orthogonal complement of in , where can be easily shown to be the smallest ideal of containing . The space is called the space of -harmonic polynomials, and it is a known result that it is equal to the set of all polynomials annihilated by all -invariant constant coefficient differential operators on (Wallach, 2017, Lemma 3.105). Finally, if is a character of an irreducible representation of , then we can define .
Now let us bring the context of our work into these definitions and notations. For Section 3.2, we can take and , which acts on the first factor of the tensor product. We identify with , and recall the notation of an element in , denoted as for each . We maintain the same ordering of coordinates under the identification , so that gives coordinates on . In this case, is precisely the set of totally symmetric polynomials, and is the set of polynomials in that are annihilated by all operators of the form where and . Similarly, if we take to be the character of the sign representation of , the set of polynomials , denoted as , are the -antisymmetric polynomials that we defined previously in Section 2.1. (The sign representation of is the one-dimensional representation of defined by , where is the signature of the permutation .) We note that forms an -module.
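The objects in this paragraph can be illustrated with a short sketch. It assumes that a permutation acts by reordering the rows of an array with one row per point, and the two sample polynomials are hypothetical examples. The code checks that the signature is multiplicative (so it defines a one-dimensional representation), that a symmetric polynomial is unchanged by the action, and that an antisymmetric one picks up the signature.

```python
from itertools import permutations


def sign(perm):
    """Signature of a permutation given as a tuple of 0-based images."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def compose(s, t):
    """Composition of permutations: apply t first, then s."""
    return tuple(s[t[i]] for i in range(len(t)))


def act(perm, rows):
    """Permute the rows (the points) of an array with one row per point."""
    return [rows[i] for i in perm]


def symmetric_example(rows):
    """Hypothetical totally symmetric polynomial: sum of x_{i,1} * x_{i,2}."""
    return sum(r[0] * r[1] for r in rows)


def antisymmetric_example(rows):
    """Hypothetical antisymmetric polynomial: Vandermonde product of first coordinates."""
    result = 1
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            result *= rows[j][0] - rows[i][0]
    return result


perms = list(permutations(range(3)))
points = [(1, 4), (2, -3), (5, 7)]

sign_is_multiplicative = all(
    sign(compose(s, t)) == sign(s) * sign(t) for s in perms for t in perms
)
invariant_ok = all(
    symmetric_example(act(p, points)) == symmetric_example(points) for p in perms
)
alternating_ok = all(
    antisymmetric_example(act(p, points)) == sign(p) * antisymmetric_example(points)
    for p in perms
)
print(sign_is_multiplicative, invariant_ok, alternating_ok)
```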
We now prove an important property of in the next lemma. One should note that a similar result holds in the general case of arbitrary , , and , and the proof changes slightly.
Lemma 28
For , , and , forms a finitely generated module over .
Proof Recalling Definition 22, we can show that is integral over (for a proof, see Atiyah and Macdonald (1969, Chapter 5, Exercise 12)). In fact, we observe that for any , is a root of the following univariate polynomial in variable : . This polynomial is monic in and the coefficients of in this polynomial all belong to . Now clearly is a finitely generated -algebra, with a generating set given by . We just showed above that each is integral over . Then letting and in Proposition 23, we get that the subring of generated by and is a finitely generated -module. However, the subring generated by is , and so we have proved that is a finitely generated -module.
Next, we note that is a commutative, Noetherian ring, and this is a direct consequence of Theorem 24. Since is a commutative ring (by virtue of being an -algebra), and is a finitely generated -module, it then follows that is a Noetherian ring by Theorem 25. Finally, by Proposition 26, we can conclude that is a Noetherian module over , and thus every -submodule of is finitely generated. Since the -antisymmetric polynomials , form an -submodule of , the proof is complete.
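The integrality step in the proof can be illustrated numerically. The sketch below assumes the usual choice of monic polynomial, namely the product of linear factors whose roots are the values of one fixed coordinate across all points (the displayed formula is not legible in this copy). It expands that product for sample values and confirms that the coefficients are, up to sign, elementary symmetric polynomials of those values, and that each value is a root.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(values, k):
    """k-th elementary symmetric polynomial evaluated at the given values."""
    return sum(prod(c) for c in combinations(values, k))


def expand_monic(values):
    """Coefficients of the product of (t - a) over a in values, highest degree first."""
    coeffs = [1]
    for a in values:
        shifted = coeffs + [0]                   # multiply the current polynomial by t
        scaled = [0] + [-a * c for c in coeffs]  # multiply the current polynomial by -a
        coeffs = [s + m for s, m in zip(shifted, scaled)]
    return coeffs


def evaluate(coeffs, t):
    """Horner evaluation of a polynomial given highest-degree-first coefficients."""
    result = 0
    for c in coeffs:
        result = result * t + c
    return result


sample = [3, -2, 7]  # hypothetical values of one coordinate across three points
coeffs = expand_monic(sample)
signed_elementary = [
    (-1) ** k * elementary_symmetric(sample, k) for k in range(len(sample) + 1)
]
print(coeffs)  # [1, -8, 1, 42]
```

Because the coefficients depend on the sample only through symmetric expressions, reordering the sample leaves them unchanged, which is the property the proof relies on.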
B.2 Finite-dimensionality of
Our next goal is to prove that introduced in the previous subsection is a finite-dimensional -vector space. To proceed, we need to introduce some more terminology, and let us work more generally for arbitrary and finite group of . As before in Section 3.2.2, let denote the homogeneous component of of degree . Then for any polynomial of degree , we can write . If is a subspace of such that implies for all , then we say that is a homogeneous subspace. Given a homogeneous subspace of , we can write , with . Let us first prove some properties of homogeneous subspaces, and identify a few that are important for us in the next lemma:
Lemma 29
Let be a subring of that is also a homogeneous subspace, and let be a finite subgroup of acting on . Then we have the following:
(a) If is a homogeneous subspace of , then is also a homogeneous subspace.
(b) Suppose are subspaces of , and . Then if any two of them are homogeneous subspaces of , then so is the third.
(c) , , , , are homogeneous subspaces of .
Proof (a) Every element is of the form , where each and . Then for any , we have
Since each term in the summation is in , as are homogeneous, we get that .
(b) First assume that and are homogeneous subspaces, and let . Then for any integer , , and , we have . From this, we conclude that , as both and , by homogeneity. The other cases follow by a similar argument.
(c) If two polynomials are equal, then so are each of their homogeneous components. For , if we write the group action as , then for , for all . This observation proves that , , are homogeneous subspaces. Then by (a) we see that is a homogeneous subspace since is clearly homogeneous. By (b), we conclude that is a homogeneous subspace because .
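The observation in part (c), that a linear group action preserves degree and therefore acts on each homogeneous component separately, can be sketched for a toy case. The representation of a polynomial as a dictionary from exponent tuples to coefficients, and the two-variable example, are assumptions made only for this illustration.

```python
from collections import defaultdict


def homogeneous_components(poly):
    """Split a polynomial {exponent tuple: coefficient} by total degree."""
    parts = defaultdict(dict)
    for exps, coeff in poly.items():
        parts[sum(exps)][exps] = coeff
    return dict(parts)


def swap_variables(poly):
    """Action of the transposition exchanging the two variables."""
    return {(e2, e1): coeff for (e1, e2), coeff in poly.items()}


def negate(poly):
    """Multiply every coefficient by -1."""
    return {exps: -coeff for exps, coeff in poly.items()}


# Hypothetical antisymmetric polynomial x1 - x2 + x1^3 - x2^3.
f = {(1, 0): 1, (0, 1): -1, (3, 0): 1, (0, 3): -1}
components = homogeneous_components(f)

whole_is_antisymmetric = swap_variables(f) == negate(f)
parts_are_antisymmetric = all(
    swap_variables(part) == negate(part) for part in components.values()
)
print(sorted(components), whole_is_antisymmetric, parts_are_antisymmetric)
```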
We may now specialize to the context of our work, and assume that and . First let and . We define
along with the shorthand notations and . We also let be the lexicographic order on the monomials such that . Then the ideal, denoted as , of symmetric polynomials of positive degree is generated by for . A Gröbner basis for is given by the polynomials
(27) |
More specifically, these polynomials are in the ideal . Readers can refer to Mora and Sala (2003, PropositionĀ 2.1) for this fact.
Now can be identified as a vector space of real valued matrices . Let be identified as permutation matrices, which acts from the left by multiplication. Consequently, we need to define a lexicographic order on : . We note this lexicographic ordering of the variables also defines a total ordering of the monomials of degree in these variables, for every integer . Let us define the family of polynomials for every and . Consequently, given , its action is defined as . We make the following claim:
Claim B.1
Let and . Then for every , the ideal contains the set of polynomials
(28) |
Proof Let us fix an arbitrary and note that the ring of real polynomials in is a subring of . Also, every non-constant -symmetric polynomial in is in particular a non-constant, totally symmetric polynomial in , i.e. it is in . This makes the ideal generated by the symmetric polynomials of positive degree in the ring a subset of . Then the observations from Mora and Sala (2003), mentioned above, complete the proof.
The set of polynomials defined in (28) can be used to define another important family of polynomials , where is a -tuple of non-negative integers satisfying . They are obtained by the expansion of (using the definition of ):
(29) |
In particular, are homogeneous polynomials of degree ; we also infer that each is in the ideal , which will be proven using Lemma 30. First, for a fixed , the set has cardinality , which is the number of polynomials in the expression . Let us enumerate them as . Since we can choose freely from , we also have the following lemma:
Lemma 30
There exist points such that the matrix , whose entries are given by , is invertible.
Proof Let us fix a , which will be determined later. Define and for all . From these choices, is a Vandermonde matrix, and so . This vanishes if and only if for some , and since we want to be invertible, we need to choose such that . It is sufficient to choose such that is not identically the zero polynomial in .
The polynomial can be identically zero if and only if at least one of its factors is identically the zero polynomial, i.e. (as polynomials) for some . (Recall that is an integral domain (see Appendix A). This means that for some if and only if or .) However, given that all the in the enumeration are distinct from each other, we must have (as polynomials), whenever . It means that there is at least one choice of such that .
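The key fact used in this proof is that a Vandermonde matrix is invertible exactly when its nodes are pairwise distinct. The sketch below checks this with exact rational arithmetic; the node values are hypothetical, and the determinant routine is a generic Gaussian elimination rather than anything specific to the paper.

```python
from fractions import Fraction


def vandermonde_matrix(nodes):
    """Rows are (1, t, t^2, ...) for each node t."""
    return [[Fraction(t) ** k for k in range(len(nodes))] for t in nodes]


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def product_formula(nodes):
    """Product of (t_j - t_i) over all pairs i < j."""
    result = Fraction(1)
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            result *= Fraction(nodes[j]) - Fraction(nodes[i])
    return result


distinct_nodes = [1, 2, 4, 7]
repeated_nodes = [1, 2, 2]
print(determinant(vandermonde_matrix(distinct_nodes)))  # 540
print(determinant(vandermonde_matrix(repeated_nodes)))  # 0
```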
Let us choose and fix values of to get an invertible matrix as in Lemma 30. Then for fixed and using (29) we obtain a system of equations, written as , where is the vector formed by the polynomials , and is the vector formed by the polynomials . Since every entry of is in the ideal (by Claim B.1), then every entry of the vector is also in the ideal . This finishes the proof that each .
Next, notice that the leading monomial of based on our lexicographic ordering is , with a coefficient
This is true because the leading monomial requires , which implies that for . Finally, we can now prove our intended result:
Lemma 31
Let . If is a homogeneous polynomial in of degree , then .
Proof We prove this by contradiction. Assume that there is at least one homogeneous polynomial of degree which is not in . Among all such homogeneous polynomials of degree , choose such that its leading monomial is minimal with respect to our lexicographic ordering. Suppose the leading monomial of is
where , and . We note that there must exist a first such that (otherwise, if for all , then , contradicting our initial assumption on ). We fix this and write , for some with . Finally, let us consider the following polynomial:
where the leading monomial of is .
Thus the leading monomial of is the leading monomial of . Since , we have implying . However, according to this construction, the leading monomial of is less than the leading monomial of by the lexicographic order. This contradicts the minimality for the leading monomial of .
Lemma 31 allows us to establish an upper bound on the maximum degree allowed for any polynomial in . This is because where , and is a homogeneous subspace by Lemma 29(c). This immediately implies that is finite dimensional as a -vector space where every subspace is spanned by monomials of degree for . This result is recorded in Lemma 32. A couple of remarks about Lemma 31 are in order:
(i) The maximum degree of a polynomial in does not depend on .
(ii) Lemma 31 also holds for any polynomial whose lowest-degree monomial has degree greater than . This can be shown by expressing in terms of its homogeneous components.
Lemma 32
is a finite dimensional -vector space. The maximum degree of a polynomial in is at most .
B.3 Module structure of -antisymmetric polynomials in terms of
In this section, we establish a connection between the set of -antisymmetric polynomials as a -module and the subspace . This will allow us to quantify the minimum number of module generators of . We prove the first three results (Lemma 33, Proposition 34, and Lemma 35) in a general setting, where is an arbitrary vector space, of is a finite group, and is a character of a representation of . When , , and the character of the sign representation of , Lemma 35 yields the main result of this subsection.
Lemma 33
For every , . Thus, .
Proof We will use strong induction on for the proof. If , then the assertion is true since the constant polynomials are contained in by definition. Assume the assertion holds for all . Since , by comparing homogeneous parts of degree on both sides, we can say . Now, as is homogeneous by Lemma 29(c), so we need to show that . Now clearly if since the minimum degree of any polynomial in is one. Therefore we conclude
In the equation above, with . Hence, for all , because and are both homogeneous subspaces by Lemma 29(c). This explains the inclusion . Then follows from the inductive hypothesis. Hence, we are done.
Proposition 34
as an -vector space has a homogeneous polynomial basis.
Proof By Lemma 29(c), both and are homogeneous subspaces of , which means is also homogeneous. Then if and only if every homogeneous component of is in the intersection. Since every vector space has a basis, let (for some index set ) be a basis of , and let be the set of all homogeneous components of all the elements in , i.e. . Then it is clear that is a spanning set for . Since every spanning set of a vector space contains a basis, our proposition is proven.
Lemma 35
If is the character of a one-dimensional, real representation of , then . In other words, generates as an -module.
Proof Let be the linear operator given by
Our first goal is to prove that is a projection operator. To make sure maps into , we need to be the character of a one-dimensional, real representation of , because then we have for all . Given that is a finite group, any must have finite order, and hence must be a real root of unity. So we must have . In that case, for any , we have for all ,
Next, if , then . The last equality follows as is a real root of unity, giving us . This completes the proof for being a projection operator.
Next, commutes with any -invariant, constant coefficient differential operator on , so . By Lemma 33, any can be written as , where and for all . Applying , we have for all ,
which completes the proof.
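The projection argument can be illustrated for the sign character. The sketch below assumes the usual averaging form of the operator, the signed sum over all permutations divided by the group order (the displayed definition is not legible in this copy), and uses hypothetical sample polynomials. It checks that the operator is idempotent at a sample point, that its output is antisymmetric, and that it commutes with multiplication by a symmetric polynomial.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def sign(perm):
    """Signature of a permutation given as a tuple of 0-based images."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def project(f, n):
    """Assumed averaging operator: (1/n!) * sum over perms of sign(p) * f(permuted x)."""
    def pf(x):
        total = sum(sign(p) * f([x[i] for i in p]) for p in permutations(range(n)))
        return Fraction(total, factorial(n))
    return pf


def f(x):
    """Hypothetical polynomial x_{1,1} * x_{2,2}^2 on three points in the plane."""
    return x[0][0] * x[1][1] ** 2


def h(x):
    """Hypothetical totally symmetric polynomial: sum of x_{i,1} * x_{i,2}."""
    return sum(p[0] * p[1] for p in x)


pf = project(f, 3)
ppf = project(pf, 3)
phf = project(lambda x: h(x) * f(x), 3)

point = [(1, 2), (3, -1), (0, 5)]
swapped = [point[1], point[0], point[2]]
print(pf(point) == ppf(point), pf(swapped) == -pf(point), phf(point) == h(point) * pf(point))
```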
It is worth noting that Lemma 35 is also true when is the character of a one-dimensional, complex representation of and is . In that case, we define as . Similarly, we will have , and for , we get .
Let us specialize the setting to our context: , , and . In Lemma 28, we prove that is a finitely generated -module. Then in Lemma 35, we show that generates as an -module. Thus, any basis of (as a -vector space) can be taken as a set of module generators of , and this basis can be taken to be homogeneous by Proposition 34. Furthermore, Lemma 32 shows that this basis is finite, which agrees with Lemma 28. Thus, the dimension of the -vector space gives us an upper bound on the minimum number of module generators for as an -module. In fact, as the following lemma shows, it is exactly equal to the minimum number of module generators.
Lemma 36
Let , and . Then the minimum number of generators needed to generate as a module over is .
Proof. If is a minimum-size set of generators for as an -module, then by the minimality assumption, they must be linearly independent. Let be their linear span. We will first show that . Suppose . Then , where for every . We write each of the in terms of its homogeneous components as . Hence, we get ; in the expression, the first term is in , and the second term is in , because for all , and by Lemma 29(c). One can easily verify that . Thus,
(30)
We next claim that
(31)
We prove this claim at the end, but notice that this claim implies the result. This is true because from (30) and (31), we have ; however, also by the minimality assumption on , we get from the comments immediately preceding this lemma, that .
It remains to prove the claim (31). We first note that is a subspace of . Let be the subspace of orthogonal to with respect to the inner product , so that we have the direct sum decomposition . We will show that . Let us make two important observations:
(i) For any , let us denote as the constant coefficient differential operator obtained by replacing each in by for all , . Then for all , we have for all (cf. Wallach, 2017, Lemma 3.105).
(ii) By Lemma 13, for any and , .
Now, for all and , using observations (i) and (ii); hence, , or equivalently . This shows . For the converse, assume that , and we want to show . Take any , i.e. it can be expressed as a finite sum , where and for every . By observation (i),
where the last equality follows because implies that is an -invariant, constant coefficient differential operator that annihilates polynomials in . This proves .
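The adjointness property in observation (i) can be checked numerically on small examples. The sketch below assumes the inner product of two polynomials p and q is the constant term of p(∂)q, a standard choice for which multiplication by f is adjoint to the differential operator f(∂). The paper's own definition is not reproduced here, so this normalization and the helper names `multiply`, `apply_operator`, and `inner` are assumptions.

```python
from math import factorial


def multiply(p, q):
    """Product of two polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for a, ca in p.items():
        for b, cb in q.items():
            e = tuple(i + j for i, j in zip(a, b))
            out[e] = out.get(e, 0) + ca * cb
    return {e: c for e, c in out.items() if c != 0}


def apply_operator(f, h):
    """Apply f(d), i.e. f with each variable replaced by its partial derivative, to h."""
    out = {}
    for a, ca in f.items():
        for b, cb in h.items():
            # d^a x^b vanishes unless a <= b in every coordinate.
            if all(i <= j for i, j in zip(a, b)):
                falling = 1
                for i, j in zip(a, b):
                    falling *= factorial(j) // factorial(j - i)
                e = tuple(j - i for i, j in zip(a, b))
                out[e] = out.get(e, 0) + ca * cb * falling
    return {e: c for e, c in out.items() if c != 0}


def inner(p, q):
    """Assumed inner product: constant term of p(d) q."""
    return sum(c for e, c in apply_operator(p, q).items() if not any(e))


# Hypothetical polynomials in two variables.
f = {(1, 0): 1, (0, 1): 1}               # x1 + x2
g = {(1, 0): 1, (0, 1): -1}              # x1 - x2
h = {(2, 0): 1, (0, 2): -1, (1, 1): 5}   # x1^2 - x2^2 + 5*x1*x2

lhs = inner(multiply(f, g), h)           # <f*g, h>
rhs = inner(g, apply_operator(f, h))     # <g, f(d) h>
```

Both sides evaluate to the same number for these inputs, as the adjointness identity predicts under the assumed inner product.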
Finally, we establish the main theorem using the results in this appendix:
Theorem 37
Let , , and be the sign representation of . Then , i.e. generates as an -module. The maximum degree of any polynomial in is . The minimum number of module generators of is equal to , as a -vector space, and the module generators can be chosen to be a homogeneous polynomial basis of .
References
- Abrahamsen and Lin (2023) Nilin Abrahamsen and Lin Lin. Anti-symmetric barron functions and their approximation with sums of determinants, 2023.
- Adeel et al. (2020) Ahsan Adeel, Mandar Gogate, and Amir Hussain. Contextual deep learning-based audio-visual switching for speech enhancement in real-world environments. Information Fusion, 59:163–170, 2020. ISSN 1566-2535. doi: https://doi.org/10.1016/j.inffus.2019.08.008. URL https://www.sciencedirect.com/science/article/pii/S1566253518306018.
- Alzubaidi et al. (2021) Laith Alzubaidi, Jinglan Zhang, Amjad J. Humaidi, Ayad Al-Dujaili, Ye Duan, Omran Al-Shamma, J. Santamaría, Mohammed A. Fadhel, Muthana Al-Amidie, and Laith Farhan. Review of deep learning: concepts, cnn architectures, challenges, applications, future directions. Journal of Big Data, 8(1):53, Mar 2021. ISSN 2196-1115. doi: 10.1186/s40537-021-00444-8. URL https://doi.org/10.1186/s40537-021-00444-8.
- Asif et al. (2021) Nurul A. Asif, Yeahia Sarker, Ripon K. Chakrabortty, Michael J. Ryan, Md. Hafiz Ahamed, Dip K. Saha, Faisal R. Badal, Sajal K. Das, Md. Firoz Ali, Sumaya I. Moyeen, Md. Robiul Islam, and Zinat Tasneem. Graph neural network: A comprehensive review on non-euclidean space. IEEE Access, 9:60588–60606, 2021. doi: 10.1109/ACCESS.2021.3071274.
- Atiyah and Macdonald (1969) M. F. Atiyah and I. G. Macdonald. Introduction to commutative algebra. Addison-Wesley Publishing Co., Reading, Mass.-London-Don Mills, Ont., 1969.
- Barron (1993) A.R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, 1993. doi: 10.1109/18.256500.
- Bergeron (2009) François Bergeron. Algebraic combinatorics and coinvariant spaces. CMS Treatises in Mathematics. Canadian Mathematical Society, Ottawa, ON; A K Peters, Ltd., Wellesley, MA, 2009. ISBN 978-1-56881-324-0. doi: 10.1201/b10583. URL https://doi.org/10.1201/b10583.
- Briand (2004) Emmanuel Briand. When is the algebra of multisymmetric polynomials generated by the elementary multisymmetric polynomials? Beiträge Algebra Geom., 45(2):353–368, 2004. ISSN 0138-4821.
- Bronstein et al. (2021) Michael M Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478, 2021.
- Cauchy (1905) Augustin Louis Cauchy. Œuvres complètes. Series 2. Volume 1. Cambridge Library Collection. Cambridge University Press, Cambridge, 1905. ISBN 978-1-108-00290-5. Reprint of the 1905 original.
- Cayton (2004) Lawrence Cayton. Algorithms for manifold learning. Technical Report CS2008-0923, UCSD, 2004.
- Chen et al. (2023) Chongyao Chen, Ziang Chen, and Jianfeng Lu. Representation theorem for multivariable totally symmetric functions, 2023.
- Chen and Lu (2023) Ziang Chen and Jianfeng Lu. Exact and efficient representation of totally anti-symmetric functions, 2023.
- Choo et al. (2020) Kenny Choo, Antonio Mezzacapo, and Giuseppe Carleo. Fermionic neural-network states for ab-initio electronic structure. Nature Communications, 11(1), May 2020. ISSN 2041-1723. doi: 10.1038/s41467-020-15724-9. URL http://dx.doi.org/10.1038/s41467-020-15724-9.
- Cybenko (1989) George V. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2:303–314, 1989. URL https://api.semanticscholar.org/CorpusID:3958369.
- Deng and Liu (2018) Li Deng and Yang Liu. A Joint Introduction to Natural Language Processing and to Deep Learning. Springer Singapore, Singapore, 2018. ISBN 978-981-10-5209-5. doi: 10.1007/978-981-10-5209-5. URL https://doi.org/10.1007/978-981-10-5209-5.
- Deng et al. (2013) Li Deng, Jinyu Li, Jui-Ting Huang, Kaisheng Yao, Dong Yu, Frank Seide, Michael Seltzer, Geoff Zweig, Xiaodong He, Jason Williams, Yifan Gong, and Alex Acero. Recent advances in deep learning for speech research at microsoft. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 8604–8608, 2013. doi: 10.1109/ICASSP.2013.6639345.
- Entezari et al. (2022) Rahim Entezari, Hanie Sedghi, Olga Saukh, and Behnam Neyshabur. The role of permutation invariance in linear mode connectivity of neural networks, 2022.
- Espi et al. (2015) Miquel Espi, Masakiyo Fujimoto, Keisuke Kinoshita, and Tomohiro Nakatani. Exploiting spectro-temporal locality in deep learning based acoustic event detection. EURASIP Journal on Audio, Speech, and Music Processing, 2015(1):26, Sep 2015. ISSN 1687-4722. doi: 10.1186/s13636-015-0069-2. URL https://doi.org/10.1186/s13636-015-0069-2.
- Garbin et al. (2020) Christian Garbin, Xingquan Zhu, and Oge Marques. Dropout vs. batch normalization: an empirical study of their impact to deep learning. Multimedia Tools and Applications, 79(19):12777–12815, May 2020. ISSN 1573-7721. doi: 10.1007/s11042-019-08453-9. URL https://doi.org/10.1007/s11042-019-08453-9.
- Garsia and Haiman (1996) A. M. Garsia and M. Haiman. A remarkable q,t-Catalan sequence and q-Lagrange inversion. J. Algebraic Combin., 5(3):191–244, 1996. ISSN 0925-9899,1572-9192. doi: 10.1023/A:1022476211638. URL https://doi.org/10.1023/A:1022476211638.
- Goodfellow et al. (2014) Ian Goodfellow, Yaroslav Bulatov, Julian Ibarz, Sacha Arnoud, and Vinay Shet. Multi-digit number recognition from street view imagery using deep convolutional neural networks. In ICLR2014, 2014.
- Haglund et al. (2004) J. Haglund, M. Haiman, N. Loehr, J. B. Remmel, and A. Ulyanov. A combinatorial formula for the character of the diagonal coinvariants, 2004.
- Haiman (2003) Mark Haiman. Combinatorics, symmetric functions, and Hilbert schemes. In Current developments in mathematics, 2002, pages 39–111. Int. Press, Somerville, MA, 2003. ISBN 1-57146-102-7.
- Haiman (1994) Mark D. Haiman. Conjectures on the quotient ring by diagonal invariants. Journal of Algebraic Combinatorics, 3:17–76, 1994. URL https://api.semanticscholar.org/CorpusID:16526954.
- Han et al. (2019) Jiequn Han, Linfeng Zhang, and Weinan E. Solving many-electron schrödinger equation using deep neural networks. Journal of Computational Physics, 399:108929, December 2019. ISSN 0021-9991. doi: 10.1016/j.jcp.2019.108929. URL http://dx.doi.org/10.1016/j.jcp.2019.108929.
- Han et al. (2022) Jiequn Han, Yingzhou Li, Lin Lin, Jianfeng Lu, Jiefu Zhang, and Linfeng Zhang. Universal approximation of symmetric and anti-symmetric functions, 2022.
- Hermann et al. (2020) Jan Hermann, Zeno Schätzle, and Frank Noé. Deep-neural-network solution of the electronic schrödinger equation. Nature Chemistry, 12(10):891–897, September 2020. ISSN 1755-4349. doi: 10.1038/s41557-020-0544-y. URL http://dx.doi.org/10.1038/s41557-020-0544-y.
- Hornik (1991) Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991. ISSN 0893-6080. doi: https://doi.org/10.1016/0893-6080(91)90009-T. URL https://www.sciencedirect.com/science/article/pii/089360809190009T.
- Hutter (2020) Marcus Hutter. On representing (anti)symmetric functions, 2020.
- Jegelka (2022) Stefanie Jegelka. Theory of graph neural networks: Representation and learning, 2022.
- Kane (2001) Richard Kane. Reflection groups and invariant theory, volume 5 of CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer-Verlag, New York, 2001. ISBN 0-387-98979-X. doi: 10.1007/978-1-4757-3542-0. URL https://doi.org/10.1007/978-1-4757-3542-0.
- Khurana et al. (2021) Lokesh Khurana, Arun Chauhan, Mohd Naved, and Prabhishek Singh. Speech recognition with deep learning. Journal of Physics: Conference Series, 1854(1):012047, apr 2021. doi: 10.1088/1742-6596/1854/1/012047. URL https://dx.doi.org/10.1088/1742-6596/1854/1/012047.
- Klus et al. (2021) Stefan Klus, Patrick Gelß, Feliks Nüske, and Frank Noé. Symmetric and antisymmetric kernels for machine learning problems in quantum physics and chemistry. Machine Learning: Science and Technology, 2(4):045016, August 2021. ISSN 2632-2153. doi: 10.1088/2632-2153/ac14ad. URL http://dx.doi.org/10.1088/2632-2153/ac14ad.
- Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012. URL https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
- Laird and Saul (1994) P. Laird and R. Saul. Automated feature extraction for supervised learning. In Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE World Congress on Computational Intelligence, pages 674–679 vol.2, 1994. doi: 10.1109/ICEC.1994.349977.
- Lauriola et al. (2022) Ivano Lauriola, Alberto Lavelli, and Fabio Aiolli. An introduction to deep learning in natural language processing: Models, techniques, and tools. Neurocomputing, 470:443–456, 2022. ISSN 0925-2312. doi: https://doi.org/10.1016/j.neucom.2021.05.103. URL https://www.sciencedirect.com/science/article/pii/S0925231221010997.
- Li et al. (2022) Zewen Li, Fan Liu, Wenjie Yang, Shouheng Peng, and Jun Zhou. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning Systems, 33(12):6999–7019, 2022. doi: 10.1109/TNNLS.2021.3084827.
- Loshchilov and Hutter (2019) Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7.
- Luo and Clark (2019) Di Luo and Bryan K. Clark. Backflow transformations via neural networks for quantum many-body wave functions. Physical Review Letters, 122(22), June 2019. ISSN 1079-7114. doi: 10.1103/physrevlett.122.226401. URL http://dx.doi.org/10.1103/PhysRevLett.122.226401.
- Matsumura (1989) Hideyuki Matsumura. Commutative ring theory, volume 8 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, second edition, 1989. ISBN 0-521-36764-6. Translated from the Japanese by M. Reid.
- Maziarka et al. (2019) Łukasz Maziarka, Marek Śmieja, Aleksandra Nowak, Jacek Tabor, Łukasz Struski, and Przemysław Spurek. Set Aggregation Network as a Trainable Pooling Layer, page 419–431. Springer International Publishing, 2019. ISBN 9783030367114. doi: 10.1007/978-3-030-36711-4_35. URL http://dx.doi.org/10.1007/978-3-030-36711-4_35.
- Mora and Sala (2003) Teo Mora and Massimiliano Sala. On the gröbner bases of some symmetric systems and their application to coding theory. Journal of Symbolic Computation, 35(2):177–194, 2003. ISSN 0747-7171. doi: https://doi.org/10.1016/S0747-7171(02)00131-1. URL https://www.sciencedirect.com/science/article/pii/S0747717102001311.
- Muhammad et al. (2021) Khan Muhammad, Salman Khan, Javier Del Ser, and Victor Hugo C. de Albuquerque. Deep learning for multigrade brain tumor classification in smart healthcare systems: A prospective survey. IEEE Transactions on Neural Networks and Learning Systems, 32(2):507–522, 2021. doi: 10.1109/TNNLS.2020.2995800.
- Murphy et al. (2019) Ryan L. Murphy, Balasubramaniam Srinivasan, Vinayak Rao, and Bruno Ribeiro. Janossy pooling: Learning deep permutation-invariant functions for variable-size inputs, 2019.
- Nachbin (1949) Leopoldo Nachbin. Sur les algebres denses de fonctions différentiables sur une variété. Comptes Rendus de l'Académie des Sciences de Paris, 228:1549–1551, 1949.
- Otter et al. (2021) Daniel W. Otter, Julian R. Medina, and Jugal K. Kalita. A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems, 32(2):604–624, 2021. doi: 10.1109/TNNLS.2020.2979670.
- Pfau et al. (2020) David Pfau, James S. Spencer, Alexander G. D. G. Matthews, and W. M. C. Foulkes. Ab initio solution of the many-electron schrödinger equation with deep neural networks. Physical Review Research, 2(3), September 2020. ISSN 2643-1564. doi: 10.1103/physrevresearch.2.033429. URL http://dx.doi.org/10.1103/PhysRevResearch.2.033429.
- Prolla and Guerreiro (1976) João B. Prolla and Claudia S. Guerreiro. An extension of nachbin's theorem to differentiable functions on banach spaces with the approximation property. Arkiv för Matematik, 14:251–258, 1976. URL https://api.semanticscholar.org/CorpusID:120704067.
- Qi et al. (2017a) Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation, 2017a.
- Qi et al. (2017b) Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space, 2017b.
- Ramos-Pérez et al. (2021) Eduardo Ramos-Pérez, Pablo J. Alonso-González, and José Javier Núñez-Velázquez. Multi-transformer: A new neural network-based architecture for forecasting s&p volatility. Mathematics, 9(15), 2021. ISSN 2227-7390. doi: 10.3390/math9151794. URL https://www.mdpi.com/2227-7390/9/15/1794.
- Sagan (2001) Bruce E. Sagan. The symmetric group, volume 203 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 2001. ISBN 0-387-95067-2. doi: 10.1007/978-1-4757-6804-6. URL https://doi.org/10.1007/978-1-4757-6804-6. Representations, combinatorial algorithms, and symmetric functions.
- Shao et al. (2019) Taihua Shao, Yupu Guo, Honghui Chen, and Zepeng Hao. Transformer-based neural network for answer selection in question answering. IEEE Access, 7:26146–26156, 2019. doi: 10.1109/ACCESS.2019.2900753.
- Silver et al. (2017) David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of go without human knowledge. Nature, 550(7676):354–359, Oct 2017. ISSN 1476-4687. doi: 10.1038/nature24270. URL https://doi.org/10.1038/nature24270.
- Soelch et al. (2019) Maximilian Soelch, Adnan Akhundov, Patrick van der Smagt, and Justin Bayer. On deep set learning and the choice of aggregations. In Igor V. Tetko, Věra Kůrková, Pavel Karpov, and Fabian Theis, editors, Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation, pages 444–457, Cham, 2019. Springer International Publishing. ISBN 978-3-030-30487-4.
- Soffer et al. (2019) Shelly Soffer, Avi Ben-Cohen, Orit Shimon, Michal Marianne Amitai, Hayit Greenspan, and Eyal Klang. Convolutional neural networks for radiologic images: A radiologist's guide. Radiology, 290(3):590–606, 2019. doi: 10.1148/radiol.2018180547. URL https://doi.org/10.1148/radiol.2018180547. PMID: 30694159.
- Stokes et al. (2020) James Stokes, Javier Robledo Moreno, Eftychios A. Pnevmatikakis, and Giuseppe Carleo. Phases of two-dimensional spinless lattice fermions with first-quantized deep neural-network quantum states. Physical Review B, 102(20), November 2020. ISSN 2469-9969. doi: 10.1103/physrevb.102.205122. URL http://dx.doi.org/10.1103/PhysRevB.102.205122.
- Tian et al. (2020) Haiman Tian, Shu-Ching Chen, and Mei-Ling Shyu. Evolutionary programming based deep learning feature selection and network construction for visual data classification. Information Systems Frontiers, 22(5):1053–1066, Oct 2020. ISSN 1572-9419. doi: 10.1007/s10796-020-10023-6. URL https://doi.org/10.1007/s10796-020-10023-6.
- Tjandra et al. (2017) Andros Tjandra, Sakriani Sakti, and Satoshi Nakamura. Listening while speaking: Speech chain by deep learning. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 301–308, 2017. doi: 10.1109/ASRU.2017.8268950.
- Wagstaff et al. (2021) Edward Wagstaff, Fabian B. Fuchs, Martin Engelcke, Michael A. Osborne, and Ingmar Posner. Universal approximation of functions on sets, 2021.
- Wallach (2021) Nolan Wallach. The representation of GL(k) on the alternants of minimal degree for the diagonal action of Sn on k copies of the permutation representation. 09 2021. URL https://www.researchgate.net/publication/354389796_The_representation_of_GLk_on_the_alternants_of_minimal_degree_for_the_diagonal_action_of_S_n_on_k_copies_of_the_the_permutation_representation.
- Wallach (2017) Nolan R. Wallach. Geometric invariant theory. Universitext. Springer, Cham, 2017. ISBN 978-3-319-65905-3; 978-3-319-65907-7. doi: 10.1007/978-3-319-65907-7. URL https://doi.org/10.1007/978-3-319-65907-7. Over the real and complex numbers.
- Wan et al. (2021) Liangtian Wan, Yuchen Sun, Lu Sun, Zhaolong Ning, and Joel J. P. C. Rodrigues. Deep learning based autonomous vehicle super resolution doa estimation for safety driving. IEEE Transactions on Intelligent Transportation Systems, 22(7):4301–4315, 2021. doi: 10.1109/TITS.2020.3009223.
- Xie et al. (2022) Zeke Xie, Issei Sato, and Masashi Sugiyama. Understanding and scheduling weight decay, 2022. URL https://openreview.net/forum?id=J7V_4aauV6B.
- Xu et al. (2019) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks?, 2019.
- Zaheer et al. (2018) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, and Alexander Smola. Deep sets, 2018.
- Zeleznik et al. (2021) Roman Zeleznik, Borek Foldyna, Parastou Eslami, Jakob Weiss, Ivanov Alexander, Jana Taron, Chintan Parmar, Raza M. Alvi, Dahlia Banerji, Mio Uno, Yasuka Kikuchi, Julia Karady, Lili Zhang, Jan-Erik Scholtz, Thomas Mayrhofer, Asya Lyass, Taylor F. Mahoney, Joseph M. Massaro, Ramachandran S. Vasan, Pamela S. Douglas, Udo Hoffmann, Michael T. Lu, and Hugo J. W. L. Aerts. Deep convolutional neural networks to predict cardiovascular risk from computed tomography. Nature Communications, 12(1):715, Jan 2021. ISSN 2041-1723. doi: 10.1038/s41467-021-20966-2. URL https://doi.org/10.1038/s41467-021-20966-2.
- Zhang and Zhang (2021) Jie-Fang Zhang and Zhengya Zhang. Point-x: A spatial-locality-aware architecture for energy-efficient graph-based point-cloud deep learning. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '21, page 1078–1090, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450385572. doi: 10.1145/3466752.3480081. URL https://doi.org/10.1145/3466752.3480081.
- Zhang et al. (2018a) Linfeng Zhang, Jiequn Han, Han Wang, Roberto Car, and Weinan E. Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics. Physical Review Letters, 120(14), April 2018a. ISSN 1079-7114. doi: 10.1103/physrevlett.120.143001. URL http://dx.doi.org/10.1103/PhysRevLett.120.143001.
- Zhang et al. (2018b) Linfeng Zhang, Jiequn Han, Han Wang, Wissam A. Saidi, Roberto Car, and Weinan E. End-to-end symmetry preserving inter-atomic potential energy model for finite and extended systems, 2018b.
- Zhu et al. (2022) Zijiang Zhu, Zhenlong Hu, Weihuang Dai, Hang Chen, and Zhihan Lv. Deep learning for autonomous vehicle and pedestrian interaction safety. Safety Science, 145:105479, 2022. ISSN 0925-7535. doi: https://doi.org/10.1016/j.ssci.2021.105479. URL https://www.sciencedirect.com/science/article/pii/S0925753521003222.
- Zweig and Bruna (2023) Aaron Zweig and Joan Bruna. Towards antisymmetric neural ansatz separation, 2023.