Uniform $\mathcal{C}^{k}$ Approximation of $G$ -Invariant and Antisymmetric Functions, Embedding Dimensions, and Polynomial Representations

\nameSoumya Ganguly \email[email protected]
\nameKhoa Tran \email[email protected]
\addrDepartment of Mathematics
University of California San Diego,
San Diego, CA 92093, USA \AND\nameRahul Sarkar \email[email protected]
\addrInstitute for Computational & Mathematical Engineering
Stanford University,
Stanford, CA 94306, USA

Abstract

For any subgroup $G$ of the symmetric group $\mathcal{S}_{n}$ on $n$ symbols, we present results for the uniform $\mathcal{C}^{k}$ approximation of $G$ -invariant functions by $G$ -invariant polynomials. For the case of totally symmetric functions ( $G=\mathcal{S}_{n}$ ), we show that this gives rise to the sum-decomposition Deep Sets ansatz of Zaheer et al. (2018), where both the inner and outer functions can be chosen to be smooth, and moreover, the inner function can be chosen to be independent of the target function being approximated. In particular, we show that the embedding dimension required is independent of the regularity of the target function, the accuracy of the desired approximation, as well as $k$ . Next, we show that a similar procedure allows us to obtain a uniform $\mathcal{C}^{k}$ approximation of antisymmetric functions as a sum of $K$ terms, where each term is a product of a smooth totally symmetric function and a smooth antisymmetric homogeneous polynomial of degree at most $\binom{n}{2}$ . We also provide upper and lower bounds on $K$ and show that $K$ is independent of the regularity of the target function, the desired approximation accuracy, and $k$ .

Keywords: Universal $\mathcal{C}^{k}$ approximation, Embedding dimension, Polynomial representations, Symmetric and antisymmetric functions.

1 Introduction

Deep learning is now used with great success in many applications, which have been organized into five broad categories: classification, localization, detection, segmentation, and registration (Alzubaidi et al., 2021). These applications include image processing (Krizhevsky et al., 2012; Tian et al., 2020), audio and speech recognition (Deng et al., 2013; Tjandra et al., 2017; Khurana et al., 2021), natural language processing (Deng and Liu, 2018; Otter et al., 2021; Lauriola et al., 2022), autonomous vehicles (Wan et al., 2021; Zhu et al., 2022), and many others (Goodfellow et al., 2014; Silver et al., 2017; Soffer et al., 2019; Adeel et al., 2020; Muhammad et al., 2021; Zeleznik et al., 2021). The common approach in most of these designs is to build large neural networks (NNs) with many layers; unfortunately, this often leads to one of the biggest computational challenges in deep learning, known as the curse of dimensionality. With large designs, the dimension of the data parameters increases; this causes an exponential increase in the number of data samples that the model needs in order to learn the dataset properly. As a consequence, the computational complexity also increases exponentially.

How, then, does deep learning tackle the curse of dimensionality? Many theories and methods that mitigate the curse are under development, but overcoming it remains an open problem; it is also commonly assumed that the problem cannot be eliminated entirely due to the nature of neural networks, which are ubiquitous in deep learning. Current theories that suggest a substantial moderation of the curse include automated feature extraction (Laird and Saul, 1994) and the manifold hypothesis (Cayton, 2004). Regularization methods such as dropout, batch normalization, and weight decay (Garbin et al., 2020; Loshchilov and Hutter, 2019; Xie et al., 2022), which serve to avert overfitting, can indirectly alleviate the curse by keeping the model from learning noise in the data. Another approach is to exploit structure in the model function, such as locality (Espi et al., 2015; Zhang and Zhang, 2021) and/or symmetries (Zaheer et al., 2018; Qi et al., 2017a, b). The latter is the main focus of our work; we propose theories that are relevant and favorable to deep learning problems with symmetries, specifically those involving invariance and antisymmetry. In fact, it has recently been conjectured that, under certain circumstances, permutation invariant NNs have no barrier in the linear interpolation of stochastic gradient descent solutions (Entezari et al., 2022).

In this paper, we study $\mathcal{C}^{k}$ approximations and polynomial representations of $G$-invariant functions (for some (Lie) group $G$) and antisymmetric functions; note that $G$-invariant functions are symmetric (or permutation invariant) when $G$ is the symmetric group acting on the arguments of the function. Intuitively, these symmetries allow neural networks to learn from the inputs regardless of transformations by the action of $G$ (for $G$-invariance) or of transformations up to a sign (for antisymmetry). Our theories offer two main advantages. First, $\mathcal{C}^{k}$ approximations give high accuracy and are useful for applications that require higher-order derivatives, e.g. solving many-electron Schrödinger equations (Han et al., 2019; Pfau et al., 2020; Choo et al., 2020). Second, polynomial representations are prevalent in STEM, and obtaining them directly avoids the additional cost of constructing a polynomial representation when the learning model is only trained to produce some (continuous) function. In fact, our work is a specific contribution to the long-standing result known as the Universal Approximation Theorem, which informally states that any continuous function on a compact set can be approximated arbitrarily well by neural networks. The result was originally proven by Cybenko (1989) for sigmoidal activation functions; Hornik (1991) later extended it to more general nonlinear activation functions, with a proof similar to that of Barron’s Theorem on approximations of nonlinear, continuous functions. Because of this theorem, it suffices in practice to represent $G$-invariant and antisymmetric functions approximately rather than exactly.

Furthermore, we specifically study approximations of $G$-invariant, symmetric, and antisymmetric functions because of their ubiquity in science and technology. Neural networks that are inherently invariant under the action of a group $G$ include $n$-dimensional Convolutional Neural Networks (CNNs) under translation, Spherical CNNs under rotation, intrinsic CNNs under the isometry group of the domain, and so on (Bronstein et al., 2021). These architectures are currently used in signal identification, object detection, image classification and segmentation, face recognition, etc. (Li et al., 2022). Similarly, architectures with permutation invariance include Graph Neural Networks (GNNs), Deep Sets (Zaheer et al., 2018), and Transformers. Applications of GNNs include traffic forecasting, molecular optimization, rumor detection, node and graph classification, and many others (Asif et al., 2021); applications of Deep Sets include point cloud prediction and bounding boxes (Soelch et al., 2019); and applications of Transformers include answer selection (Shao et al., 2019) and stock volatility (Ramos-Pérez et al., 2021). Lastly, neural networks can also learn solutions of systems that exhibit antisymmetry; in the physical sciences, there is considerable interest in approximating solutions of many-fermion or many-boson systems (Choo et al., 2020; Han et al., 2019; Hermann et al., 2020; Klus et al., 2021; Luo and Clark, 2019; Pfau et al., 2020; Stokes et al., 2020). In particular, Pfau et al. (2020) developed FermiNet as an ansatz on top of the variational Monte Carlo (VMC) model to approximate the solutions of many-electron systems; the ansatz is based on the notion of generalized Slater determinants (Hutter, 2020). The method gave large improvements over the VMC model for many atoms and small molecules; this essentially opens the door to solving previously intractable many-electron systems.

Finally, this study builds on the foundational work of Zaheer et al. (2018), who designed models for machine learning tasks defined on sets, covering both permutation invariant and equivariant tasks. Their work demonstrates strong qualitative and quantitative results in experiments on statistic estimation, point cloud classification, set expansion, and outlier detection. Many studies have followed their work, including PointNet (Qi et al., 2017a, b), Deep Potential (Zhang et al., 2018a, b), Set Aggregation Networks (Maziarka et al., 2019), and so on. The technical and historical context relating our work to Deep Sets is deferred to Section 1.1.

1.1 Background

The main theorems of this paper concern the $\mathcal{C}^{k}$ approximations and representations of $G$-invariant functions, totally symmetric functions, and $n$-antisymmetric functions. Let us define them here. First, for convenience, call $d$ the space dimension and $n$ the multivariate dimension, and let $[n]=\{1,2,\ldots,n\}$. By convention, $\mathbb{N}=\{1,2,\ldots\}$, so we write the non-negative integers as $\mathbb{Z}_{\geq 0}=\mathbb{N}\cup\{0\}$. Throughout this study, we assume $n\geq 2$ and $d\geq 1$ unless stated otherwise. We also denote points in $\mathbb{R}^{d}$ using boldface letters such as $\bm{x}_{i}$, and their components as $(x_{i1},x_{i2},\ldots,x_{id})$.

Definition 1 ( $G$ -invariant function)

Let $\Omega\subset\mathbb{R}^{d}$ and $G$ be a group which acts on $[n]$ . A function $f:\Omega^{n}\to\mathbb{R}$ is $G$ -invariant if

f(\sigma.\bm{x})\coloneqq f(\bm{x}_{\sigma(1)},\bm{x}_{\sigma(2)},\ldots,\bm{x}_{\sigma(n)})=f(\bm{x}_{1},\bm{x}_{2},\ldots,\bm{x}_{n})\coloneqq f(\bm{x}),

(1)

for all $\sigma\in G$ and $\{\bm{x}_{i}\}_{i=1}^{n}\subset\Omega$ .

Definition 2 (Totally symmetric function)

A $G$ -invariant function $f$ is called totally symmetric when $G=\mathcal{S}_{n}$ , the symmetric group.

Definition 3 ( $n$ -antisymmetric function)

Let $\Omega\subset\mathbb{R}^{d}$ . A function $f:\Omega^{n}\to\mathbb{R}$ is called $n$ -antisymmetric if

f(\sigma.\bm{x})=\text{sgn}(\sigma)f(\bm{x}),

(2)

for all $\sigma\in\mathcal{S}_{n}$ and $\{\bm{x}_{i}\}_{i=1}^{n}\subset\Omega$ , where $\text{sgn}(\sigma)$ is the sign of the permutation $\sigma$ .
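
Definitions 1 to 3 can be checked numerically on small examples. The following is a minimal sketch, assuming permutations are stored as tuples of 0-based images and points of $\Omega^{n}$ as tuples of $d$-tuples; the helper names (`sign`, `act`, `is_invariant`, `is_antisymmetric`) and the three test functions are illustrative choices and do not come from the paper.

```python
from itertools import permutations
from math import isclose


def sign(sigma):
    """Sign of a permutation given as a tuple of 0-based images."""
    inversions = sum(
        1
        for a in range(len(sigma))
        for b in range(a + 1, len(sigma))
        if sigma[a] > sigma[b]
    )
    return -1 if inversions % 2 else 1


def act(sigma, x):
    """Return sigma.x = (x_{sigma(1)}, ..., x_{sigma(n)})."""
    return tuple(x[s] for s in sigma)


def is_invariant(f, x, group):
    """Check f(sigma.x) == f(x) for every sigma in the given group, at the point x."""
    return all(isclose(f(act(s, x)), f(x), abs_tol=1e-12) for s in group)


def is_antisymmetric(f, x):
    """Check f(sigma.x) == sgn(sigma) f(x) for every sigma in S_n, at the point x."""
    n = len(x)
    return all(
        isclose(f(act(s, x)), sign(s) * f(x), abs_tol=1e-12)
        for s in permutations(range(n))
    )


def total_sym(x):
    # Sum of squared norms: invariant under all of S_n.
    return sum(c * c for p in x for c in p)


def cyclic_only(x):
    # Invariant under cyclic shifts, but not under every permutation.
    n = len(x)
    return sum(x[i][0] * x[(i + 1) % n][1] for i in range(n))


def vandermonde_first_coord(x):
    # Product over i < j of (x_{j1} - x_{i1}): antisymmetric.
    n = len(x)
    out = 1.0
    for i in range(n):
        for j in range(i + 1, n):
            out *= x[j][0] - x[i][0]
    return out


# n = 3 points in R^2 (d = 2).
x = ((0.3, -1.0), (1.5, 0.2), (-0.7, 0.9))
S3 = list(permutations(range(3)))
C3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]  # cyclic subgroup of S_3
```

A check at a single point can only refute a symmetry, not prove it; here it is meant to illustrate that `cyclic_only` is $G$-invariant for the cyclic subgroup but not totally symmetric.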

As mentioned in the last section, representing totally symmetric, $G$-invariant, or $n$-antisymmetric functions in terms of simpler functions of the same type can significantly reduce the curse of dimensionality. Some results represent these functions exactly, and there are also many approximate representation theorems (some mentioned below). From a mathematical viewpoint, an exact representation is desirable, but because neural networks are intrinsically approximate (Universal Approximation Theorem), an approximate model works just as well for all practical purposes. In fact, leaving room for approximation often yields a much simpler representation, an example of which can be found in the present work. As mentioned before, this line of research boomed after a viable architecture for the exact representation of continuous, totally symmetric functions was proposed in the Deep Sets paper by Zaheer et al. (2018). There, symmetric functions were interpreted as functions on sets, because the order of the elements of a set does not matter. The concept of functions on sets was extended to functions on multisets in Xu et al. (2019). The main idea behind Deep Sets is to process the individual set elements in parallel using a shared encoding function and then combine the results using a symmetric ‘pooling’ function such as summation, averaging, or max-pooling. This idea was generalized considerably by the work on $k$-ary Janossy pooling by Murphy et al. (2019). A rigorous theoretical treatment of the latent dimensions of the Deep Sets architecture and the Janossy pooling paradigm was given in Wagstaff et al. (2021). The result of Zaheer et al. (2018) was recently corroborated by Chen et al. (2023), who proved that any totally symmetric continuous function can be expressed as a composition of two continuous functions. To make this precise, we introduce the following terminology (Jegelka, 2022):

Definition 4 (Inner, Outer Functions, Embedding dimension)

If a real-valued, totally symmetric function $f$ evaluated at a set $S$ can be expressed as $f(S)=\rho(\sum_{s\in S}\phi(s))$, where $\phi$ is independent of $f$, then $\rho:\mathbb{R}^{d_{1}}\to\mathbb{R}$ is called the outer function and $\phi$, which maps into $\mathbb{R}^{d_{1}}$, is called the inner function. The dimension $d_{1}$ of the codomain of the inner function is called the embedding dimension.

The Deep Sets ansatz states that a continuous totally symmetric function $f:\Omega^{n}\to\mathbb{R}$, for compact $\Omega\subset\mathbb{R}^{d}$, can be expressed as a composition of continuous inner and outer functions in the form $f(\bm{x})=\rho(\sum_{i=1}^{n}\phi(\bm{x}_{i}))$. For $d=1$, the proof can be found in Zaheer et al. (2018); for $d>1$, Chen et al. (2023) showed that any continuous totally symmetric function $f$ can be written as $g\circ\bm{\eta}$, where $\bm{\eta}=(\eta_{1},\ldots,\eta_{M})$ is the collection of all generators of the totally symmetric polynomials. Hence the embedding dimension here is the number $M$ of such generators. More about this can be found in Section 3.1.
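
To make the sum-decomposition concrete, here is a minimal sketch for $d=1$, assuming the inner function is the power-sum map $\phi(x)=(x,x^{2},\ldots,x^{n})$ and taking the product $f(\bm{x})=x_{1}\cdots x_{n}$ as an illustrative target. The outer function recovers $f$ from the pooled features via Newton's identities. The function names are hypothetical, and this is one possible choice of $\phi$ and $\rho$, not necessarily the construction used in the cited works.

```python
from math import isclose, prod


def phi(x, n):
    """Inner function: power features (x, x^2, ..., x^n); it does not depend on f."""
    return [x ** k for k in range(1, n + 1)]


def pooled(xs):
    """Sum-pool the inner features over all elements, giving the power sums p_1..p_n."""
    n = len(xs)
    feats = [phi(x, n) for x in xs]
    return [sum(col) for col in zip(*feats)]


def rho_product(p):
    """Outer function for f(x) = x_1 * ... * x_n.

    Newton's identities, k*e_k = sum_{i=1}^k (-1)^(i-1) * e_{k-i} * p_i,
    recover the elementary symmetric polynomial e_n from the power sums.
    """
    n = len(p)
    e = [1.0] + [0.0] * n
    for k in range(1, n + 1):
        acc = 0.0
        for i in range(1, k + 1):
            acc += (-1) ** (i - 1) * e[k - i] * p[i - 1]
        e[k] = acc / k
    return e[n]


xs = [0.5, -1.2, 2.0, 0.3]
value = rho_product(pooled(xs))            # rho(sum_i phi(x_i))
value_shuffled = rho_product(pooled(xs[::-1]))
```

Because the pooling is a sum, reordering the inputs leaves the features unchanged, which is why `value` and `value_shuffled` agree and both match the direct product.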

Until recently, few results were available on how to represent $n$-antisymmetric functions effectively, especially when $d>1$. There are some broad schemes for doing so, known as the Backflow, Jastrow, and Slater determinant ansätze (Zweig and Bruna, 2023), which we now describe briefly. For complex-valued functions $f,g$ on $\Omega\subset\mathbb{R}^{n}$, we define $(f\otimes g):\Omega^{2}\to\mathbb{C}$ by $(f\otimes g)(x,y)=f(x)g(y)$. With this notation, one can define the antisymmetric projection of a tensor product of functions as

\displaystyle\mathcal{A}(\phi_{1}\otimes\ldots\otimes\phi_{n})=\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}(-1)^{\sigma}\phi_{\sigma(1)}\otimes\ldots\otimes\phi_{\sigma(n)}.

Up to rescaling, these projections are the Slater determinants of the functions $\phi_{1},\ldots,\phi_{n}$. The functional form of the Jastrow ansatz (with a single term), which multiplies a Slater determinant by a symmetric factor, is

\displaystyle p(\bm{x})\mathcal{A}(g_{1}\otimes\ldots\otimes g_{n})(\bm{x}),

where $p$ is a totally symmetric function. Similarly, the functional form of the Backflow ansatz (with a single term), which evaluates a Slater determinant at transformed coordinates, can be written as

\displaystyle\mathcal{A}(g_{1}\otimes\ldots\otimes g_{n})(\Phi(\bm{x})),

where $\Phi:\mathbb{R}^{n}\to\mathbb{R}^{n}$ is an equivariant function, i.e. $\Phi(\sigma.\bm{x})=\sigma.\Phi(\bm{x})$ (here . denotes the group action). Finally, the functional form of the Slater determinant ansatz with $L$ terms is

\displaystyle\sum_{\ell=1}^{L}\mathcal{A}(g_{1}^{\ell}\otimes\ldots\otimes g_{n}^{\ell})(\bm{x}).

As shown in Zweig and Bruna (2023), the Jastrow ansatz is a special case of the Backflow ansatz, and the Slater determinant ansatz is a special case of the Jastrow ansatz. Some very interesting theoretical work on these ansätze involving Slater determinants can be found in Hutter (2020) and Abrahamsen and Lin (2023).
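
The antisymmetric projection $\mathcal{A}$ and the two single-term forms built from it can be sketched directly from the formulas above. The following is an illustrative sketch for $d=1$ and $n=3$, assuming real-valued orbitals $1,t,t^{2}$ (for which $n!\,\mathcal{A}$ is the Vandermonde determinant), a Gaussian symmetric factor $p$, and a simple equivariant map $\Phi$; all of these concrete choices and function names are assumptions made for the example only.

```python
from itertools import permutations
from math import exp, factorial, isclose, prod


def sign(sigma):
    """Sign of a permutation given as a tuple of 0-based images."""
    n = len(sigma)
    inv = sum(1 for a in range(n) for b in range(a + 1, n) if sigma[a] > sigma[b])
    return -1 if inv % 2 else 1


def antisymmetrize(phis):
    """Return x -> (1/n!) * sum_sigma sgn(sigma) * prod_i phi_{sigma(i)}(x_i)."""
    n = len(phis)

    def projected(x):
        total = 0.0
        for sigma in permutations(range(n)):
            total += sign(sigma) * prod(phis[sigma[i]](x[i]) for i in range(n))
        return total / factorial(n)

    return projected


# Illustrative orbitals 1, t, t^2.
A = antisymmetrize([lambda t: 1.0, lambda t: t, lambda t: t * t])


def p(x):
    # A totally symmetric factor (assumed Gaussian for the example).
    return exp(-sum(t * t for t in x))


def Phi(x):
    # An equivariant map: every coordinate is scaled by the same symmetric factor.
    s = 1.0 + sum(t * t for t in x)
    return tuple(s * t for t in x)


def symmetric_times_projection(x):
    # Single-term form p(x) * A(g_1 ⊗ ... ⊗ g_n)(x).
    return p(x) * A(x)


def projection_after_equivariant_map(x):
    # Single-term form A(g_1 ⊗ ... ⊗ g_n)(Phi(x)).
    return A(Phi(x))


x = (0.4, -1.0, 2.5)
x_swapped = (x[1], x[0], x[2])  # transposition of the first two arguments
```

Both single-term forms change sign under the transposition, as they must for $n$-antisymmetric functions. The sum over all $n!$ permutations is only practical for small $n$; a determinant evaluation would be used in practice.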

In very recent work, Chen and Lu (2023) solved the problem of exact representation of continuous $n$-antisymmetric functions. However, we provide an example showing that their antisymmetric representation theorem cannot be used directly to give exact representations within the class of $\mathcal{C}^{1}$, or more generally $\mathcal{C}^{k}$, functions; that is, such a function $f$ cannot always be written as $f=g\circ\bm{\eta}$ with $g$ of the same regularity as $f$. A similar example for totally symmetric functions was constructed in Chen et al. (2023, page 7), where the function $f$ is $\mathcal{C}^{1}$ yet $g$ fails to be so at a particular point. Before presenting the counterexample, let us recall their result. According to Chen and Lu (2023), given $d\geq 1$ and $n\geq 1$, a function $\bm{\eta}=(\eta_{1},\eta_{2},\ldots,\eta_{m}):(\mathbb{R}^{d})^{n}\to\mathbb{R}^{m}$ satisfies assumption A if

(i)

$\eta_{k}:(\mathbb{R}^{d})^{n}\to\mathbb{R}$ is $n$-antisymmetric and continuous for each $k\in[m]$,
(ii)

$\bm{\eta}(\bm{x}_{1},\ldots,\bm{x}_{n})=\bm{0}$ if and only if $\bm{x}_{i}=\bm{x}_{j}$ for some $i,j\in[n]$ with $i\neq j$ ,
(iii)

If $\bm{\eta}(\bm{x}_{1},\ldots,\bm{x}_{n})=\bm{\eta}(\bm{x}_{1}^{\prime},\ldots,\bm{x}_{n}^{\prime})\neq\bm{0}$ then there exists a permutation $\sigma\in\mathcal{S}_{n}$ such that $(\bm{x}_{1}^{\prime},\ldots,\bm{x}_{n}^{\prime})=(\bm{x}_{\sigma(1)},\ldots,\bm{x}_{\sigma(n)})$.

Then the following theorem was proved in their paper:

Theorem 5

Given $d,n\geq 1$ and a compact set $\Omega\subset\mathbb{R}^{d}$, if $\bm{\eta}:\Omega^{n}\to\mathbb{R}^{m}$ satisfies assumption A, then for any $n$-antisymmetric, continuous function $f:\Omega^{n}\to\mathbb{R}$ there exists a unique $g:\bm{\eta}(\Omega^{n})\to\mathbb{R}$ that is continuous and odd, satisfying

\displaystyle f(\bm{x}_{1},\ldots,\bm{x}_{n})=g(\bm{\eta}(\bm{x}_{1},\ldots,\bm{x}_{n})),\ \text{for all}\ (\bm{x}_{1},\ldots,\bm{x}_{n})\in\Omega^{n},

where $\bm{\eta}(\Omega^{n})$ is equipped with the topology induced from $\mathbb{R}^{m}$ .

To construct a counterexample to the $\mathcal{C}^{1}$ analogue of this theorem, let us take $n=2$ and $d=1$. First we construct $\bm{\eta}=(\eta_{1},\ldots,\eta_{m})$, with $\eta_{i}:\mathbb{R}^{2}\to\mathbb{R}$, satisfying assumption A above. For $m\geq 4$, define $\eta_{i}(x_{1},x_{2})=(x_{1}^{3}-x_{2}^{3})(x_{1}^{i-1}+x_{2}^{i-1})$ for $i\in[m]$. Clearly the $\eta_{i}$ are continuous and antisymmetric under the action of $\mathcal{S}_{2}$, and if $x_{1}=x_{2}$ then $\bm{\eta}(x_{1},x_{2})=\bm{0}$. Conversely, if $\bm{\eta}(x_{1},x_{2})=\bm{0}$, then $\eta_{1}(x_{1},x_{2})=2(x_{1}^{3}-x_{2}^{3})=0$, i.e. $x_{1}=x_{2}$. Also, if $\bm{\eta}(x_{1},x_{2})=\bm{\eta}(y_{1},y_{2})\neq\bm{0}$, then by the above $x_{1}\neq x_{2}$ and $y_{1}\neq y_{2}$. Using $m\geq 4$ and writing out the components $\eta_{1}$ and $\eta_{4}$ explicitly, we get $x_{1}^{3}-x_{2}^{3}=y_{1}^{3}-y_{2}^{3}\neq 0$ and $(x_{1}^{3}-x_{2}^{3})(x_{1}^{3}+x_{2}^{3})=(y_{1}^{3}-y_{2}^{3})(y_{1}^{3}+y_{2}^{3})$. Dividing the second identity by the first gives $x_{1}^{3}+x_{2}^{3}=y_{1}^{3}+y_{2}^{3}$, so $x_{1}^{3}=y_{1}^{3}$ and $x_{2}^{3}=y_{2}^{3}$, or in other words $x_{1}=y_{1}$ and $x_{2}=y_{2}$. Hence $\bm{\eta}$ satisfies assumption A. Now take $f:\mathbb{R}^{2}\to\mathbb{R}$ given by $f(x_{1},x_{2})=x_{1}^{4/3}-x_{2}^{4/3}$. Then $f$ is a continuous, antisymmetric function that is $\mathcal{C}^{1}$ at the origin. According to Theorem 5 above, there exists $g:\bm{\eta}(\mathbb{R}^{2})\to\mathbb{R}$ that is continuous and odd and satisfies $f(\bm{x})=g(\bm{\eta}(\bm{x}))$ for all $\bm{x}\in\mathbb{R}^{2}$. However, we now show that this function $g$ cannot be $\mathcal{C}^{1}$ (or even differentiable) at $\bm{\eta}(\bm{0})=\bm{0}$. Let $\bm{x}(\epsilon)=(x_{1},x_{2})=(\epsilon,0)$. We investigate the differentiability of $g$ at $\bm{0}$ along $\bm{\eta}(\bm{x}(\epsilon))=(2\epsilon^{3},\epsilon^{4},\ldots,\epsilon^{m+2})$ as $\epsilon\to 0$. Note that $g(\bm{0})=0$ since $g$ is odd. If $g$ were differentiable at $\bm{0}$ with derivative $D_{g}(\bm{0})$, we would have

\displaystyle\lim_{\epsilon\to 0}\frac{||g(\bm{\eta}(\bm{x}(\epsilon)))-g(\bm{0})-D_{g}(\bm{0})\cdot\bm{\eta}(\bm{x}(\epsilon))||}{||\bm{\eta}(\bm{x}(\epsilon))||}=0.

(3)

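But $g(\bm{\eta}(\bm{x}(\epsilon)))=f(\bm{x}(\epsilon))=\epsilon^{4/3}$ by definition, while $D_{g}(\bm{0})\cdot\bm{\eta}(\bm{x}(\epsilon))=O(\epsilon^{3})$. Hence, for small $\epsilon>0$, the numerator $||g(\bm{\eta}(\bm{x}(\epsilon)))-g(\bm{0})-D_{g}(\bm{0})\cdot\bm{\eta}(\bm{x}(\epsilon))||=||f(\bm{x}(\epsilon))-D_{g}(\bm{0})\cdot\bm{\eta}(\bm{x}(\epsilon))||$ is of exact order $\epsilon^{4/3}$, whereas the denominator $||\bm{\eta}(\bm{x}(\epsilon))||$ is of exact order $\epsilon^{3}$. The quotient therefore grows like $\epsilon^{-5/3}$ as $\epsilon\to 0$, so the limit in equation (3) cannot be $0$, and $g$ is not differentiable at $\bm{0}$.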
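
The blow-up in the counterexample can also be observed numerically. The sketch below evaluates $|f(\bm{x}(\epsilon))|/\|\bm{\eta}(\bm{x}(\epsilon))\|$ for decreasing $\epsilon$, with $m=4$; if $g$ were differentiable at $\bm{0}$ (with $g(\bm{0})=0$), this quotient would have to stay bounded. The function names and the choice of sample values of $\epsilon$ are illustrative assumptions.

```python
from math import sqrt


def f(x1, x2):
    """f(x1, x2) = x1^(4/3) - x2^(4/3), using t^(4/3) = |t|^(4/3) for real t."""
    return abs(x1) ** (4.0 / 3.0) - abs(x2) ** (4.0 / 3.0)


def eta(x1, x2, m=4):
    """eta_i(x1, x2) = (x1^3 - x2^3) * (x1^(i-1) + x2^(i-1)) for i = 1..m."""
    diff = x1 ** 3 - x2 ** 3
    return [diff * (x1 ** (i - 1) + x2 ** (i - 1)) for i in range(1, m + 1)]


def quotient(eps, m=4):
    """|g(eta(x(eps)))| / ||eta(x(eps))|| along x(eps) = (eps, 0), using g∘eta = f."""
    vec = eta(eps, 0.0, m)
    norm = sqrt(sum(v * v for v in vec))
    return abs(f(eps, 0.0)) / norm


eps_values = [1e-1, 1e-2, 1e-3]
quotients = [quotient(e) for e in eps_values]
```

The computed quotients increase roughly like $\epsilon^{-5/3}/2$ as $\epsilon$ shrinks, in line with the order estimate in the text. Note also that `eta(eps, 0.0)[0]` equals $2\epsilon^{3}$, since $x^{0}=1$ for both arguments.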

One should note that the exact representation theorems of Chen et al. above do not strictly fall under the Backflow ansatz mentioned earlier. Han et al. (2022) modified some results of the Deep Sets paper and showed that arbitrarily accurate uniform approximation of symmetric or antisymmetric functions defined on a compact subset of a Euclidean space is possible, but the number of latent variables (embedding dimension) depends on the gradient of the function being approximated and on the accuracy of the approximation, as well as on the number $n$ and the dimension $d$ of the input variables. We state their results precisely in the following:

Theorem 6

Let $f:\Omega^{n}\to\mathbb{R}$ be a continuously differentiable, totally symmetric function, where $\Omega\subset\mathbb{R}^{d}$ is compact. If $0<\epsilon<||\nabla f||_{2}\sqrt{nd}(n^{-1/d})$, then there exist $\phi:\mathbb{R}^{d}\to\mathbb{R}^{M}$ and $g:\mathbb{R}^{M}\to\mathbb{R}$ such that for any $\bm{x}=(\bm{x}_{1},\ldots,\bm{x}_{n})\in\Omega^{n}$, we have

\displaystyle\big{|}f(\bm{x})-g(\sum_{j=1}^{n}\phi(\bm{x}_{j}))\big{|}\leq\epsilon,

where $M\leq 2^{n}(||\nabla f||_{2}^{2}nd)^{nd/2}/(\epsilon^{nd}n!)$ with $||\nabla f||_{2}=\max_{\bm{x}}||\nabla f(\bm{x})||_{2}$.

Theorem 7

Let $f:\Omega^{n}\to\mathbb{R}$ be a continuously differentiable, $n$-antisymmetric function, where $\Omega\subset\mathbb{R}^{d}$ is compact. Then there exist $K$ permutation equivariant mappings $Y^{k}:(\mathbb{R}^{d})^{n}\to\mathbb{R}^{n}$ and permutation invariant functions $U^{k}:(\mathbb{R}^{d})^{n}\to\mathbb{R}$ for $k\in[K]$ such that for $\bm{x}=(\bm{x}_{1},\ldots,\bm{x}_{n})\in\Omega^{n}$, we have

\displaystyle\big{|}f(\bm{x})-\sum_{k=1}^{K}U^{k}(\bm{x})\prod_{i<j}(y_{i}^{k}(\bm{x})-y_{j}^{k}(\bm{x}))\big{|}\leq\epsilon,

where $K\leq(||\nabla f||_{2}^{2}nd)^{nd/2}/(\epsilon^{nd}n!)$, and for each $U^{k}$ there exist $g^{k}:\mathbb{R}^{d}\to\mathbb{R}^{m}$ and $\phi^{k}:\mathbb{R}^{m}\to\mathbb{R}$ with $m\leq 2^{n}$ such that for any $\bm{x}=(\bm{x}_{1},\ldots,\bm{x}_{n})\in\Omega^{n}$,

\displaystyle U^{k}(\bm{x})=\phi^{k}(\sum_{j=1}^{n}g^{k}(\bm{x}_{j})).

1.2 Our contributions

In this paper, we prove results that can be grouped as two broad contributions:

(I)

Let $\Omega\subset\mathbb{R}^{d}$ be compact and $G$ be a compact (Lie) group that acts on $[n]$. If $f:\Omega^{n}\to\mathbb{R}$ is $G$-invariant and $\mathcal{C}^{k}$, then there exists a $G$-invariant polynomial $P$ that is arbitrarily close to $f$ in the $\mathcal{C}^{k}$-uniform norm on $\Omega^{n}$. In particular, this holds for permutation invariant $f$, i.e. $G=\mathcal{S}_{n}$, which can be $\mathcal{C}^{k}$ approximated to arbitrary accuracy by totally symmetric polynomials. Furthermore, since the totally symmetric polynomials are finitely generated by the totally symmetric power sums, the number of generators (embedding dimension) is exactly $\binom{n+d}{d}$.
(II)

Let $\Omega\subset\mathbb{R}^{d}$ be compact. If $f:\Omega^{n}\to\mathbb{R}$ is $n$-antisymmetric and $\mathcal{C}^{k}$, then there exists an $n$-antisymmetric polynomial $P$ that is arbitrarily close to $f$ in the $\mathcal{C}^{k}$-uniform norm on $\Omega^{n}$. Furthermore, since the $n$-antisymmetric polynomials form a finitely generated module over the totally symmetric polynomials, we state the exact number of module generators in the following cases: when $n\geq 2$ and $d=1$, the number is 1; when $n=2$ and $d\geq 1$, the number is $d$; when $n>2$ and $d=2$, the number is the $n$-th Catalan number $\frac{(2n)!}{(n+1)!n!}$ (Garsia and Haiman, 1996). For the general case, lower and upper bounds on the minimum number of module generators are $\dbinom{\binom{r}{d-1}}{j}$ and $\dbinom{\binom{n}{2}+dn}{dn}$, respectively, where $n=\binom{r}{d}+j$ for some $0\leq j<\binom{r}{d-1}$.
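
The counts stated in (I) and (II) are straightforward to tabulate. The following sketch evaluates the formulas exactly as written above; the function names are hypothetical, and the rule used to find $r$ and $j$ (the largest $r$ with $\binom{r}{d}\leq n$) is our reading of the decomposition $n=\binom{r}{d}+j$, $0\leq j<\binom{r}{d-1}$, justified by Pascal's rule $\binom{r+1}{d}=\binom{r}{d}+\binom{r}{d-1}$.

```python
from math import comb


def embedding_dimension(n, d):
    """Number of generators stated in (I): binom(n + d, d)."""
    return comb(n + d, d)


def catalan(n):
    """n-th Catalan number (2n)! / ((n+1)! n!), the exact count for n > 2, d = 2."""
    return comb(2 * n, n) // (n + 1)


def generator_upper_bound(n, d):
    """Upper bound from (II): binom(binom(n, 2) + d*n, d*n)."""
    return comb(comb(n, 2) + d * n, d * n)


def generator_lower_bound(n, d):
    """Lower bound from (II): binom(binom(r, d-1), j) with n = binom(r, d) + j."""
    r = d  # binom(d, d) = 1 <= n, so this is a valid starting point
    while comb(r + 1, d) <= n:
        r += 1
    j = n - comb(r, d)
    assert 0 <= j < comb(r, d - 1)  # guaranteed by Pascal's rule
    return comb(comb(r, d - 1), j)


# Tabulate a few small cases as (n, d, lower bound, upper bound).
table = [
    (n, d, generator_lower_bound(n, d), generator_upper_bound(n, d))
    for n in (2, 3, 4)
    for d in (1, 2, 3)
]
```

As a consistency check on this reading, for $n=2$ the lower bound evaluates to $d$, and for $d=1$ it evaluates to 1, matching the exact counts quoted in (II) for those cases.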

Essentially, the representation of the approximation in the totally symmetric case follows the same sum-decomposition as Deep Sets. Both the inner and outer functions can be chosen to be polynomials, and the inner function is chosen independently of $f$. In addition, the embedding dimension is independent of the $\epsilon$-accuracy of the approximation, the target function $f$, and the order $k$ of the $\mathcal{C}^{k}$-norm. This independence is in contrast to Theorem 6 from the work of Han et al. (2022), where both the inner and outer functions can only be made continuous, the inner function $\phi$ depends on $f$, and the embedding dimension $M$ depends on the $\epsilon$-accuracy of the approximation, the target function $f$, and the gradient of $f$.

For the $n$-antisymmetric case, the representation of the approximation is a sum of $K$ terms, where each term is a product of a smooth totally symmetric polynomial and a smooth $n$-antisymmetric, homogeneous polynomial of degree at most $\binom{n}{2}$ that does not depend on the function $f$; in contrast, the functions $\{U^{k}\}_{k=1}^{K}$ in Theorem 7 depend on the target function. Similarly, the number $K$ is independent of the $\epsilon$-accuracy of the approximation, the target function $f$, and the gradient of $f$, unlike the bound presented in Han et al. (2022). The approximation results in these contributions are presented in Section 2, and the results for the representations are shown in Section 3.

1.3 Structure of the paper

This paper is organized as follows. In Section 2, we prove theorems on the uniform $\mathcal{C}^{k}$ approximation, to arbitrary accuracy, of $G$-invariant functions for some (Lie) group $G$, symmetric functions, and $n$-antisymmetric functions by polynomials of the same type. Common notation is introduced in Section 2.1. Section 2.2 contains the necessary results and main proofs for the uniform $\mathcal{C}^{k}$ approximation of $G$-invariant and totally symmetric functions. A similar proof is given for $n$-antisymmetric functions in Section 2.3.

Section 3 is for the representations of totally symmetric and $n$ -antisymmetric polynomials, which are used for uniform, $\mathcal{C}^{k}$ approximations in Section 2. We discuss the embedding dimension needed for totally symmetric polynomials, which are generated as an $\mathbb{R}$ -algebra following the work of Chen et al. (2023). The $n$ -antisymmetric polynomials are shown to form a finitely generated module over the totally symmetric functions in Section 3.2, and the bounds for the minimal number of generators are also given. In the end, Table 2 summarizes our results, which shows considerable improvement in representation of $n$ -antisymmetric polynomials over existing literature.

Appendix A provides the necessary background on commutative algebra, which will be used in Appendix B. Lastly, a detailed exposition of the representation theory of general $G$-invariant polynomials for finite $G$ is given in Appendix B, along with the necessary results on $n$-antisymmetric polynomials; these are used to derive the minimal number of generators in Section 3.2.

2 $\mathcal{C}^{k}$ approximation of $G$ -invariant and $n$ -antisymmetric functions

The approximation theory presented in this section applies to polynomials with coefficients in either $\mathbb{R}$ or $\mathbb{C}$. We will focus on the proof for real polynomials here. For complex-valued polynomials, the same results are true, and we simply need to consider our results for the real and imaginary parts of the polynomial separately.

2.1 Notation

The $\mathcal{C}^{k}$ approximations for $G$-invariant functions will be shown for a compact Lie group $G$ that acts on $[n]=\{1,2,\ldots,n\}$; the results for totally symmetric and $n$-antisymmetric functions will then follow quite closely from the proof for $G$-invariant functions. Notably, given such a compact Lie group $G$, each $\sigma\in G$ acts as a bijection on $[n]$, so the action of $\sigma$ is simply a permutation in $\mathcal{S}_{n}$. Therefore, one key ingredient in proving our theorems is understanding how the permutations act on the indices of $\bm{x}=(\bm{x}_{1},\ldots,\bm{x}_{n})\in(\mathbb{R}^{d})^{n}$ and how they interact with the differentiation needed for the $\mathcal{C}^{k}$ approximation. The other key ingredient is an appropriate function space that will eventually guarantee the existence of the $\mathcal{C}^{k}$ approximation, which we describe next.

The following $\mathbb{R}$-algebra (see Definition 21) and topology are needed to describe a subalgebra that will be crucial for our initial $\mathcal{C}^{k}$ approximation of these functions. Suppose $U\subset(\mathbb{R}^{d})^{n}$ is an open set. Define $\mathcal{C}^{k}(U;\mathbb{R})$ as the $\mathbb{R}$-algebra of all $k$ times continuously differentiable real-valued functions on $U$; we abbreviate it to $\mathcal{C}^{k}(U)$, as the codomain of these functions will always be $\mathbb{R}$. This space is endowed with the topology $\tau_{u}^{k}$, the compact-open topology of order $k$; in other words, it is the topology of uniform convergence of the functions and all their partial derivatives up to order $k$ on compact subsets of $U$. Next, notice that if $V\subset(\mathbb{R}^{d})^{n}$ is compact, then there always exists an open set $U$ such that $V\subset U\subseteq(\mathbb{R}^{d})^{n}$. Pick any such $U$; we may then define the vector space $\mathcal{C}^{k}(V):=\{f|_{V}:f\in\mathcal{C}^{k}(U)\}$ of $k$-times continuously differentiable functions on $V$, which also forms an $\mathbb{R}$-algebra. We equip this space with the $\mathcal{C}^{k}$-uniform norm, defined as the sum, over all partial derivatives of order at most $k$, of their maximum absolute values on the compact set $V$:

Definition 8 ( $\mathcal{C}^{k}$ norm)

Let $V\subset(\mathbb{R}^{d})^{n}$ be compact, and suppose $p\in\mathcal{C}^{k}(V)$ . Then we define

\|p\|_{\mathcal{C}^{k}(V)}\coloneqq\sum_{|\alpha|\leq k}\max_{\bm{x}\in V}\left|(D^{\alpha}p)(\bm{x})\right|,

(4)

where $\alpha\coloneqq(\alpha_{1},\ldots,\alpha_{dn})\in\mathbb{Z}_{\geq 0}^{dn}$ is a $dn$ -dimensional multi-index, $|\alpha|:=\alpha_{1}+\ldots+\alpha_{dn}$ , and

D^{\alpha}p:=\frac{\partial^{\alpha_{1}}}{\partial x_{1}^{\alpha_{1}}}\cdots\frac{\partial^{\alpha_{dn}}}{\partial x_{dn}^{\alpha_{dn}}}p.

(5)

With the norm $\lVert\cdot\rVert_{\mathcal{C}^{k}(V)}$, the vector space $\mathcal{C}^{k}(V)$ becomes a Banach space, and this space is independent of the choice of the open set $U$.
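
For polynomials, the norm in Definition 8 can be estimated by differentiating symbolically and taking maxima over a finite grid. The sketch below assumes a polynomial is stored as a dictionary mapping exponent tuples to coefficients, and that the grid is fine enough for the maxima of interest; in general a grid maximum is only a lower bound on the true maximum over $V$. All names are illustrative.

```python
from itertools import product


def differentiate(poly, var):
    """Partial derivative of a polynomial {exponents: coeff} in the variable `var`."""
    out = {}
    for exps, c in poly.items():
        if exps[var] > 0:
            new = list(exps)
            new[var] -= 1
            key = tuple(new)
            out[key] = out.get(key, 0.0) + c * exps[var]
    return out


def evaluate(poly, x):
    """Evaluate the polynomial at the point x (the empty polynomial evaluates to 0)."""
    total = 0.0
    for exps, c in poly.items():
        term = c
        for xi, e in zip(x, exps):
            term *= xi ** e
        total += term
    return total


def multi_indices(num_vars, k):
    """All multi-indices alpha in Z_{>=0}^{num_vars} with |alpha| <= k."""
    return [a for a in product(range(k + 1), repeat=num_vars) if sum(a) <= k]


def ck_norm_on_grid(poly, num_vars, k, grid):
    """Sum over |alpha| <= k of the grid maximum of |D^alpha p| (grid estimate of (4))."""
    norm = 0.0
    for alpha in multi_indices(num_vars, k):
        q = poly
        for var, order in enumerate(alpha):
            for _ in range(order):
                q = differentiate(q, var)
        norm += max(abs(evaluate(q, x)) for x in grid)
    return norm


# p(x1, x2) = x1 * x2 on V = [-1, 1]^2, i.e. n = 2 and d = 1.
p = {(1, 1): 1.0}
axis = [-1.0, -0.5, 0.0, 0.5, 1.0]
grid = list(product(axis, repeat=2))
norm_k1 = ck_norm_on_grid(p, 2, 1, grid)
norm_k2 = ck_norm_on_grid(p, 2, 2, grid)
```

For this example the maxima are attained at the corners of the square, which lie on the grid, so the estimate is exact: the three terms with $|\alpha|\leq 1$ each contribute 1, and the mixed second derivative contributes one more for $k=2$.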

Let us introduce some notation to describe the permutations and multi-indices that are involved in the general case $d\geq 1$. Recall that for $d=1$, $\bm{x}\in\Omega^{n}$ may be written as $\bm{x}=x_{1}\bm{e}_{1}+\ldots+x_{n}\bm{e}_{n}$ using the standard basis. For $d\geq 1$, $\bm{x}\in\Omega^{n}$ may be viewed as a $dn$-dimensional vector, which can be written similarly. In particular, a basis $\{\bm{e}_{\ell}\}_{\ell=1}^{dn}$ is needed, but for convenience we index it by pairs and introduce

\bm{e}_{i,j}=\bm{e}_{d(i-1)+j},

(6)

where $i\in[n]$ and $j\in[d]$ . Hence for any $\bm{x}=(\bm{x}_{1},\ldots,\bm{x}_{n})\in\Omega^{n}$ , we may write $\bm{x}=\sum_{i=1,j=1}^{n,d}x_{ij}\bm{e}_{i,j}$ . This representation for the general case is relevant below in showing that the action $\sigma.\bm{x}$ can be described as an action on the $i$ -indices of the basis $\{\bm{e}_{i,j}\}_{i=1,j=1}^{n,d}$ , instead of the indices of the vector $(\bm{x}_{1},\ldots,\bm{x}_{n})$ .
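
The double-index convention (6) and the statement that $\sigma$ only moves the $i$-index can be sketched as follows. Indices are kept 1-based to match the text, permutations are tuples $(\sigma(1),\ldots,\sigma(n))$, and the helper names are illustrative assumptions.

```python
def flat_index(i, j, d):
    """1-based position of e_{i,j} in the dn-dimensional basis: d*(i-1) + j."""
    return d * (i - 1) + j


def flatten(x):
    """View x = (x_1, ..., x_n), with x_i in R^d, as a dn-dimensional vector."""
    return [c for point in x for c in point]


def act_on_points(sigma, x):
    """sigma.x = (x_{sigma(1)}, ..., x_{sigma(n)})."""
    return [x[s - 1] for s in sigma]


def act_on_flat(sigma, flat, n, d):
    """The same action on the flattened vector: the (i, j) entry becomes the
    (sigma(i), j) entry of the input, so only the i-index is permuted."""
    out = [0.0] * (n * d)
    for i in range(1, n + 1):
        for j in range(1, d + 1):
            out[flat_index(i, j, d) - 1] = flat[flat_index(sigma[i - 1], j, d) - 1]
    return out


n, d = 3, 2
x = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
sigma = (2, 3, 1)
via_points = flatten(act_on_points(sigma, x))
via_flat = act_on_flat(sigma, flatten(x), n, d)
```

Both routes give the same $dn$-vector, which illustrates that permuting the blocks $\bm{x}_{i}$ is the same as permuting the first index of the coordinates $x_{ij}$ while leaving $j$ fixed.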

2.2 $\mathcal{C}^{k}$ approximation via $G$ -invariant and totally symmetric polynomials

In this section, we are interested in the $\mathcal{C}^{k}$ approximation of totally symmetric functions and, more generally, of $G$-invariant functions for some compact (Lie) group $G$ that acts on $[n]$. We first make some observations on how permutations act on the indices of $\bm{x}\in\Omega^{n}$ and how this action interacts with differentiation. Subsequently, proofs of the $\mathcal{C}^{k}$ approximations are given.

The main theorem of this section is the approximation of $G$ -invariant $f\in\mathcal{C}^{k}$ where $G$ is a compact (Lie) group. This result covers the case $G=\mathcal{S}_{n}$ because $\mathcal{S}_{n}$ is a finite, hence compact, group. We have the main theorem and its corollary:

Theorem 9 ( $\mathcal{C}^{k}$ approximation: $G$ -invariant function)

Let $\Omega\subset\mathbb{R}^{d}$ be compact, and assume $G$ is a compact (Lie) group that acts on $[n]$ . Let $f:\Omega^{n}\to\mathbb{R}$ be $G$ -invariant and $\mathcal{C}^{k}$ . Then for every $\epsilon>0$ , there exists a $G$ -invariant polynomial $P$ such that

\left\|f-P\right\|_{\mathcal{C}^{k}}<\epsilon.

(7)

Corollary 10 ( $\mathcal{C}^{k}$ approximation: totally symmetric function)

Suppose $\Omega\subset\mathbb{R}^{d}$ is compact. Let $f:\Omega^{n}\to\mathbb{R}$ be totally symmetric and $\mathcal{C}^{k}$ . Then for every $\epsilon>0$ there exists a totally symmetric polynomial $P$ such that

\left\|f-P\right\|_{\mathcal{C}^{k}}<\epsilon.

(8)

Note that for any group $G$ acting on $[n]$ , $\Omega^{n}$ is $G$ -invariant, and this is true by constructing $\Omega^{n}$ to be the $n$ -product of $\Omega$ . In fact, the assumptions in the results above (and later results) can be looser. The approximation results are true for some general compact set $K\subset(\mathbb{R}^{d})^{n}$ because $K$ can always be made $G$ -invariant by the following argument: suppose $G$ is a compact (Lie) group that acts on $[n]$ ; then for any $\sigma\in G$ , $\sigma$ is a bijection on $[n]$ , so

K^{\prime}=\bigcup_{\sigma\in G}\sigma(K)=\bigcup_{\sigma\in H\subseteq\mathcal{S}_{n}}\sigma(K),

for some subgroup $H$ of $\mathcal{S}_{n}$ . Note that $\sigma(K)$ is compact since $\sigma$ is continuous on the compact set $K$ , and since $\left|H\right|\leq\left|\mathcal{S}_{n}\right|=n!$ is finite, $K^{\prime}$ is compact as it is the finite union of compact sets. Now, it is clear that $K^{\prime}$ is $G$ -invariant, so it may replace $\Omega^{n}$ in the assumptions of our results.

Now, our proof actually begins with Nachbin’s characterization (Nachbin, 1949) that the set of polynomials of real coefficients with $dn$ -variables (a subalgebra) is $\tau_{u}^{k}$ -dense in $\mathcal{C}^{k}$ . The next theorem is a consequence of Nachbin’s Theorem.

Theorem 11 (Theorem 2.2, Prolla and Guerreiro (1976))

Suppose $U\subset\mathbb{R}^{m}$ is open and $\mathcal{C}^{k}(U)$ is endowed with the topology $\tau_{u}^{k}$ . If $\mathcal{A}\subset\mathcal{C}^{k}(U)$ is a polynomial algebra, then $\mathcal{A}$ is $\tau_{u}^{k}$ -dense in $\mathcal{C}^{k}(U)$ if and only if the following conditions are satisfied:

(a)

For any $x,y\in U$ with $x\neq y$ , there exists $f\in\mathcal{A}$ such that $f(x)\neq f(y)$ .
(b)

For any $x\in U$ , there exists $f\in\mathcal{A}$ such that $f(x)\neq 0$ .
(c)

For any $x\in U$ and $u\in\mathbb{R}^{m}$ with $u\neq 0$ , there exists $f\in\mathcal{A}$ such that $D_{u}f(x)\neq 0$ .

It is easy to see that our subalgebra, the set of polynomials with real coefficients, satisfies conditions (a)-(c) and is therefore $\tau_{u}^{k}$ -dense in the set $\mathcal{C}^{k}$ , so Theorem 11 will be integral to our proof later. Now, we observe how a permutation $\sigma\in\mathcal{S}_{n}$ acts on $\bm{x}$ and how it behaves under differentiation:

Lemma 12

Suppose $d\geq 1$ and $\bm{x}\in\Omega^{n}$ . Let $\sigma\in\mathcal{S}_{n}$ , then

\sigma.\bm{x}=\sum_{i=1,j=1}^{n,d}x_{ij}\bm{e}_{\sigma^{-1}(i),j}

(9)

Proof It suffices to show this for transpositions in $\mathcal{S}_{n}$ . Without loss of generality, let $\sigma=(1\;2)$ .

	$\displaystyle\sigma.\bm{x}=\sum_{j=1}^{d}x_{\sigma(1)j}\bm{e}_{1,j}+x_{\sigma(2)j}\bm{e}_{2,j}+\ldots+x_{\sigma(n)j}\bm{e}_{n,j}=\sum_{j=1}^{d}x_{2j}\bm{e}_{1,j}+x_{1j}\bm{e}_{2,j}+\ldots+x_{nj}\bm{e}_{n,j}$
	$\displaystyle=\sum_{j=1}^{d}x_{1j}\bm{e}_{2,j}+x_{2j}\bm{e}_{1,j}+\ldots+x_{nj}\bm{e}_{n,j}=\sum_{j=1}^{d}x_{1j}\bm{e}_{\sigma^{-1}(1),j}+x_{2j}\bm{e}_{\sigma^{-1}(2),j}+\ldots+x_{nj}\bm{e}_{\sigma^{-1}(n),j}.$

This completes our proof.
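As an informal sanity check of Lemma 12 (not part of the proof), the following Python sketch stores $\bm{x}$ as a list of $n$ points of $\mathbb{R}^{d}$ and compares the action $\sigma.\bm{x}=(\bm{x}_{\sigma(1)},\ldots,\bm{x}_{\sigma(n)})$ with the basis expansion in (9). The helper names `act` and `basis_expansion` and the 0-based indexing are our own illustrative assumptions.

```python
from itertools import permutations

def act(sigma, x):
    """sigma.x = (x_{sigma(1)}, ..., x_{sigma(n)}), written with 0-based indices."""
    return [x[sigma[i]] for i in range(len(x))]

def basis_expansion(sigma, x):
    """Sum_{i,j} x_{ij} e_{sigma^{-1}(i), j}: block i is placed at position sigma^{-1}(i)."""
    n = len(x)
    inverse = [0] * n
    for i, s in enumerate(sigma):
        inverse[s] = i  # inverse[sigma(i)] = i
    out = [None] * n
    for i in range(n):
        out[inverse[i]] = x[i]
    return out

# n = 3 points in R^2 (arbitrary illustrative values)
x = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
lemma_12_holds = all(
    act(sigma, x) == basis_expansion(sigma, x)
    for sigma in permutations(range(3))
)
```

The check runs over all of $\mathcal{S}_{3}$ rather than only transpositions, which is affordable for such a small $n$.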

The next lemma will be important for us to bridge that connection between $D^{\alpha}(P\circ\sigma)(\bm{x})$ and $D^{\alpha}P(\bm{x})$ in the proof of Theorem 9. A general version of this lemma can be found in Kane (2001, page 261), but we present a more elementary proof for the easier case here. Even though the lemma is proven for $\mathcal{S}_{n}$ , this can be applied for any group $G$ that has an action on $[n]$ because any such element $g\in G$ induces a permutation of $[n]$ .

Lemma 13

Suppose $p:U\subseteq(\mathbb{R}^{d})^{n}\to\mathbb{R}$ is smooth. Let $\alpha\in\mathbb{Z}_{\geq 0}^{dn}$ be a multi-index and $\sigma\in\mathcal{S}_{n}$ , then

D^{\alpha}(p\circ\sigma)(\bm{x})=D^{\sigma.\alpha}p(\sigma(\bm{x}))

(10)

Proof We prove this for the general space dimension $d\geq 1$ , and we will prove this statement using induction on the order of $|\alpha|$ . For base case $|\alpha|=0$ , we have $D^{\alpha}p=p$ , so the statement above is true trivially. For base case $|\alpha|=1$ , we take $\alpha=\bm{e}_{i,j}=\bm{e}_{d(i-1)+j}\in\mathbb{Z}_{\geq 0}^{dn}$ for $1\leq i\leq n$ and $1\leq j\leq d$ . We compute $\frac{\partial}{\partial y_{ij}}(p\circ\sigma)(\bm{y})\lvert_{\bm{y}=\bm{x}}$ , which involves the chain rule and the term $\frac{\partial\sigma}{\partial y_{ij}}(\bm{y})$ . Hence, let us compute this first:

	$\displaystyle\frac{\partial\sigma}{\partial y_{ij}}(\bm{y})$	$\displaystyle=\frac{\partial}{\partial y_{ij}}\left(\sum_{\ell=1}^{d}y_{\sigma(1)\ell}\bm{e}_{1,\ell}+\ldots+y_{\sigma(i)\ell}\bm{e}_{i,\ell}+\ldots+y_{\sigma(n)\ell}\bm{e}_{n,\ell}\right)$
		$\displaystyle=\frac{\partial}{\partial y_{ij}}\left(\sum_{\ell=1}^{d}y_{1\ell}\bm{e}_{\sigma^{-1}(1),\ell}+\ldots+y_{i\ell}\bm{e}_{\sigma^{-1}(i),\ell}+\ldots+y_{n\ell}\bm{e}_{\sigma^{-1}(n),\ell}\right)=\bm{e}_{\sigma^{-1}(i),j},$

where the second equality follows from Lemma 12. Now, we write

	$\displaystyle D^{\alpha}(p\circ\sigma)(\bm{y})\bigg{\lvert}_{\bm{y}=\bm{x}}=\frac{\partial}{\partial y_{ij}}(p\circ\sigma)(\bm{y})\bigg{\lvert}_{\bm{y}=\bm{x}}=\nabla_{\bm{y}}p(\sigma(\bm{y}))\cdot\frac{\partial\sigma}{\partial y_{ij}}(\bm{y})\bigg{\lvert}_{\bm{y}=\bm{x}}$
	$\displaystyle=\left(\frac{\partial p}{\partial y_{11}}(\sigma(\bm{y})),\ldots,\frac{\partial p}{\partial y_{nd}}(\sigma(\bm{y}))\right)\cdot\bm{e}_{\sigma^{-1}(i),j}\bigg{\lvert}_{\bm{y}=\bm{x}}=\frac{\partial p}{\partial y_{\sigma^{-1}(i)j}}(\sigma(\bm{y}))\bigg{\lvert}_{\bm{y}=\bm{x}}=D^{\sigma.\alpha}p(\sigma(\bm{x})),$

since $\sigma.\alpha=\sigma.\bm{e}_{i,j}=\bm{e}_{\sigma^{-1}(i),j}$ . This concludes the base cases.

Assume that the statement is true for $|\alpha|=k$ and we want to show that it is also true for $|\beta|=k+1$ . Let $\beta=\alpha+\bm{e}_{i,j}$ , then

	$\displaystyle D^{\beta}(p\circ\sigma)(\bm{y})\bigg{\lvert}_{\bm{y}=\bm{x}}$	$\displaystyle=\frac{\partial}{\partial y_{ij}}(D^{\alpha}(p\circ\sigma))(\bm{y})\bigg{\lvert}_{\bm{y}=\bm{x}}=\frac{\partial}{\partial y_{ij}}(D^{\sigma.\alpha}p)(\sigma(\bm{y}))\bigg{\lvert}_{\bm{y}=\bm{x}}$
		$\displaystyle=\nabla_{\bm{y}}(D^{\sigma.\alpha}p)(\sigma(\bm{y}))\cdot\bm{e}_{\sigma^{-1}(i),j}\bigg{\lvert}_{\bm{y}=\bm{x}}=\frac{\partial D^{\sigma.\alpha}p}{\partial y_{\sigma^{-1}(i)j}}(\sigma(\bm{y}))\bigg{\lvert}_{\bm{y}=\bm{x}}=D^{\sigma.\beta}p(\sigma(\bm{x}))$

because $\sigma.\beta=\sigma.(\alpha+\bm{e}_{i,j})=\sigma.\alpha+\bm{e}_{\sigma^{-1}(i),j}$ . The second equality follows from the induction hypothesis.
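The identity (10) can also be checked symbolically on a small example. The sketch below is only an illustration under our own conventions: a polynomial in $dn$ variables is stored as a dictionary mapping exponent tuples to integer coefficients, variables are indexed from 0, and the helper names `deriv`, `permute_blocks`, and `compose` are hypothetical.

```python
from itertools import permutations, product

def deriv(p, alpha):
    """Apply D^alpha to a polynomial stored as {exponent tuple: coefficient}."""
    out = {}
    for expo, c in p.items():
        if any(e < a for e, a in zip(expo, alpha)):
            continue  # this monomial is annihilated by D^alpha
        coef = c
        for e, a in zip(expo, alpha):
            for t in range(a):
                coef *= e - t  # falling factorial e (e - 1) ... (e - a + 1)
        new = tuple(e - a for e, a in zip(expo, alpha))
        out[new] = out.get(new, 0) + coef
    return {m: c for m, c in out.items() if c != 0}

def permute_blocks(v, sigma, d):
    """Block k of sigma.v is block sigma(k) of v, matching e_{i,j} -> e_{sigma^{-1}(i),j}."""
    return tuple(v[d * sigma[k] + j] for k in range(len(sigma)) for j in range(d))

def compose(p, sigma, d):
    """p o sigma: substitute z_{kj} = x_{sigma(k) j} into every monomial of p."""
    n = len(sigma)
    out = {}
    for expo, c in p.items():
        new = [0] * (n * d)
        for k in range(n):
            for j in range(d):
                new[d * sigma[k] + j] = expo[d * k + j]
        out[tuple(new)] = out.get(tuple(new), 0) + c
    return out

n, d = 3, 2
# An arbitrary test polynomial in the 6 variables x_{11}, x_{12}, ..., x_{32}.
p = {(2, 1, 0, 3, 1, 0): 5, (0, 0, 1, 1, 0, 2): -3, (1, 0, 0, 0, 0, 0): 7}
lemma_13_holds = all(
    deriv(compose(p, sigma, d), alpha)
    == compose(deriv(p, permute_blocks(alpha, sigma, d)), sigma, d)
    for sigma in permutations(range(n))
    for alpha in product(range(3), repeat=n * d)
)
```

Here the left-hand side is $D^{\alpha}(p\circ\sigma)$ and the right-hand side is $(D^{\sigma.\alpha}p)\circ\sigma$, compared as polynomials for every $\sigma\in\mathcal{S}_{3}$ and every multi-index with entries in $\{0,1,2\}$.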

Proof of Theorem 9.

First, we approximate $f\in\mathcal{C}^{k}$ up to the $k$ -order derivatives by a polynomial $\hat{P}$ using Theorem 11. This allows us to assume that

\|f-\hat{P}\|_{\mathcal{C}^{k}}=\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}|D^{\alpha}(f-\hat{P})|(\bm{x})<\epsilon.

(11)

Now consider the symmetrized polynomial $P$ defined by

P(\bm{x})=\int_{G}(\hat{P}\circ\sigma)(\bm{x})\,d\mu(\sigma),

where $d\mu(\sigma)$ is the Haar probability measure associated with the group $G$ . Similarly, since $f$ is $G$ -invariant, it can be written in the same form: $f(\bm{x})=\int_{G}\left(f\circ\sigma\right)(\bm{x})\,d\mu(\sigma)$ .

\begin{split}&\left\|f-P\right\|_{\mathcal{C}^{k}}=\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|D^{\alpha}\left(f-P\right)\right|(\bm{x})\overset{(a)}{=}\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|D^{\alpha}\int_{G}((f-\hat{P})\circ\sigma)(\bm{x})\,d\mu(\sigma)\right|\\
&\overset{(b)}{=}\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|\int_{G}D^{\alpha}((f-\hat{P})\circ\sigma)(\bm{x})\,d\mu(\sigma)\right|\overset{(c)}{=}\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|\int_{G}D^{\sigma.\alpha}(f-\hat{P})(\sigma(\bm{x}))\,d\mu(\sigma)\right|\\
&\leq\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\int_{G}|D^{\sigma.\alpha}(f-\hat{P})|(\sigma(\bm{x}))\,d\mu(\sigma)\overset{(d)}{\leq}\sum_{|\alpha|\leq k}\int_{G}\max_{\bm{x}\in\Omega^{n}}|D^{\sigma.\alpha}(f-\hat{P})|(\sigma(\bm{x}))\,d\mu(\sigma)\\
&\overset{(e)}{=}\sum_{|\alpha|\leq k}\int_{G}\max_{\bm{x}\in\Omega^{n}}|D^{\sigma.\alpha}(f-\hat{P})|(\bm{x})\,d\mu(\sigma)=\int_{G}\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}|D^{\sigma.\alpha}(f-\hat{P})|(\bm{x})\,d\mu(\sigma)\\
&\overset{(f)}{=}\int_{G}\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}|D^{\alpha}(f-\hat{P})|(\bm{x})\,d\mu(\sigma)<\epsilon.\end{split}

In the above equation, $(a)$ follows by symmetrization, and $(c)$ follows from Lemma 13. We now briefly explain the reasoning for the other steps. $(b)$ follows by noting that for any $|\alpha|\leq k$ , $|D^{\alpha}(f-\hat{P})|$ is always bounded on $\Omega^{n}$ because $\Omega^{n}$ is compact. Hence, letting $|D^{\alpha}(f-\hat{P})|\leq M$ for all $\alpha$ , and since $M\in L^{1}(\mu)$ , the differentiation can be passed through the integral. To obtain $(d)$ , we note that $|D^{\sigma.\alpha}(f-\hat{P})|(\sigma(\bm{x}))\leq\max_{\bm{x}\in\Omega^{n}}|D^{\sigma.\alpha}(f-\hat{P})|(\sigma(\bm{x}))$ for any $\sigma\in G$ , and thus integrating over the group retains the inequality

\int_{G}|D^{\sigma.\alpha}(f-\hat{P})|(\sigma(\bm{x}))\,d\mu(\sigma)\leq\int_{G}\max_{\bm{x}\in\Omega^{n}}|D^{\sigma.\alpha}(f-\hat{P})|(\sigma(\bm{x}))\,d\mu(\sigma).

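$(e)$ follows because $\Omega^{n}$ is $G$ -invariant: each $\sigma\in G$ maps $\Omega^{n}$ bijectively onto itself, so the maximum over $\bm{x}\in\Omega^{n}$ of a function evaluated at $\sigma(\bm{x})$ equals its maximum over $\bm{x}\in\Omega^{n}$ . $(f)$ follows because, for a fixed $\sigma\in G$ and each order $l\leq k$ , the map $\alpha\mapsto\sigma.\alpha$ is a bijection on the set of multi-indices with $|\alpha|=l$ , so the sum over $\sigma.\alpha$ runs over the same list of multi-indices as the sum over $\alpha$ . The final inequality uses (11) together with the fact that $\mu$ is a probability measure.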

Remark 14

The proof of Corollary 10 follows the proof for Theorem 9 closely with a small difference. Given $G=\mathcal{S}_{n}$ , $G$ is now finite and discrete. In this case, the Haar measure becomes a normalized counting measure where we replace integration by a summation with a normalization factor $\frac{1}{n!}$ because $|\mathcal{S}_{n}|=n!$ .
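For a finite group the symmetrization step is just an average, which can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the case $d=1$, $n=3$, the cyclic subgroup `C3` of $\mathcal{S}_{3}$, the test polynomial `p_hat`, and the helper names are all hypothetical choices, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def act(sigma, x):
    """sigma.x = (x_{sigma(1)}, ..., x_{sigma(n)}), with 0-based indices."""
    return tuple(x[s] for s in sigma)

def symmetrize(f, group):
    """Return the map x -> (1/|G|) * sum_{sigma in G} f(sigma.x)."""
    def P(x):
        return sum(Fraction(f(act(s, x))) for s in group) / len(group)
    return P

# Cyclic subgroup of S_3 generated by a 3-cycle (a proper subgroup G of S_n).
C3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]

def p_hat(x):
    """A polynomial that is not C3-invariant (d = 1, n = 3)."""
    return x[0] ** 2 * x[1] + 3 * x[2]

P = symmetrize(p_hat, C3)
x = (2, 5, 7)
values = [P(act(s, x)) for s in C3]  # P evaluated along the C3-orbit of x
```

All entries of `values` coincide, reflecting the $G$ -invariance of $P$, while `p_hat` itself takes three different values on the same orbit.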

2.3 $\mathcal{C}^{k}$ approximation via $n$ -antisymmetric polynomials

In this section, we prove a similar $\mathcal{C}^{k}$ approximation for $n$ -antisymmetric functions. The formalism in this section will mirror that which was presented in Section 2.2. There is a notion of ‘skew-symmetric’ polynomials that generalizes the notion of antisymmetry, but it requires the group $G$ to be generated by reflections or pseudo-reflections. For such compact groups $G$ acting on $[n]$ , our results below will hold true. However, we will only work with $\mathcal{S}_{n}$ in our study here. Interested readers may find the notion of skew-symmetric polynomials in Bergeron (2009, page 93).

Theorem 15 ( $\mathcal{C}^{k}$ approximation: $n$ -antisymmetric function)

Let $\Omega\subset\mathbb{R}^{d}$ be compact. Suppose that $f:\Omega^{n}\to\mathbb{R}$ is $n$ -antisymmetric and $\mathcal{C}^{k}$ . Then for every $\epsilon>0$ there exists an $n$ -antisymmetric polynomial $P$ such that

\left\|f-P\right\|_{\mathcal{C}^{k}}<\epsilon.

(12)

Proof Using Theorem 11, we approximate $f\in\mathcal{C}^{k}$ up to the $k$ -order derivatives by a polynomial $\hat{P}$ . Hence, let

\|f-\hat{P}\|_{\mathcal{C}^{k}}=\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}|D^{\alpha}(f-\hat{P})|(\bm{x})<\epsilon.

(13)

Now consider the $n$ -antisymmetrized polynomial $P$ defined by

P(\bm{x})=\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\text{sgn}(\sigma)(\hat{P}\circ\sigma)(\bm{x}).

Similarly, since $f$ is $n$ -antisymmetric, each term satisfies $\text{sgn}(\sigma)(f\circ\sigma)=f$ , and therefore $f(\bm{x})=\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\text{sgn}(\sigma)\left(f\circ\sigma\right)(\bm{x})$ .

We prove the approximation in a similar manner as in the proof of Theorem 9.

	$\displaystyle\left\|f-P\right\|_{\mathcal{C}^{k}}=\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|D^{\alpha}\left(f-P\right)\right|(\bm{x})$
	$\displaystyle=\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|D^{\alpha}\left(\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\text{sgn}(\sigma)((f-\hat{P})\circ\sigma)(\bm{x})\right)\,\right|$
	$\displaystyle=\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\text{sgn}(\sigma)D^{\alpha}((f-\hat{P})\circ\sigma)(\bm{x})\,\right|$
	$\displaystyle=\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\text{sgn}(\sigma)D^{\sigma.\alpha}(f-\hat{P})(\sigma(\bm{x}))\,\right|$
	$\displaystyle\leq\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}|D^{\sigma.\alpha}(f-\hat{P})|(\sigma(\bm{x}))\,\leq\sum_{|\alpha|\leq k}\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\max_{\bm{x}\in\Omega^{n}}|D^{\sigma.\alpha}(f-\hat{P})|(\sigma(\bm{x}))\,$
	$\displaystyle=\sum_{|\alpha|\leq k}\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\max_{\bm{x}\in\Omega^{n}}|D^{\sigma.\alpha}(f-\hat{P})|(\bm{x})\,=\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}|D^{\sigma.\alpha}(f-\hat{P})|(\bm{x})\,$
	$\displaystyle=\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}|D^{\alpha}(f-\hat{P})|(\bm{x})\ <\epsilon.$
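The antisymmetrization used in this proof can be illustrated numerically. The Python sketch below assumes $d=1$ and $n=3$ for brevity; the monomial `p_hat`, the evaluation point, and the helper names are our own illustrative choices. It checks that the antisymmetrized polynomial changes sign under a transposition, and that an already antisymmetric function (here the Vandermonde determinant) is left unchanged, which is why the factor $\frac{1}{n!}$ is needed.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sgn(sigma):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(
        1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
        if sigma[i] > sigma[j]
    )
    return -1 if inversions % 2 else 1

def act(sigma, x):
    """sigma.x = (x_{sigma(1)}, ..., x_{sigma(n)}), with 0-based indices."""
    return tuple(x[s] for s in sigma)

def antisymmetrize(f, n):
    """Return x -> (1/n!) * sum_{sigma in S_n} sgn(sigma) f(sigma.x)."""
    def P(x):
        total = sum(sgn(s) * Fraction(f(act(s, x))) for s in permutations(range(n)))
        return total / factorial(n)
    return P

def p_hat(x):
    """A monomial with no symmetry (d = 1, n = 3)."""
    return x[0] ** 2 * x[1]

def vandermonde(x):
    """prod_{i < j} (x_i - x_j), which is already antisymmetric."""
    out = 1
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            out *= x[i] - x[j]
    return out

x = (1, 2, 4)
P = antisymmetrize(p_hat, 3)
swapped = act((1, 0, 2), x)  # exchange the first two particles
```

With these choices `P(swapped)` equals `-P(x)`, and `antisymmetrize(vandermonde, 3)(x)` reproduces `vandermonde(x)`.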

3 Representations of totally symmetric and $n$ -antisymmetric polynomials

The representation theory presented in this section applies to polynomials with coefficients in any field of characteristic zero. However, we will mainly show the results for real polynomials.

3.1 Representations of the totally symmetric polynomials

In Section 2.2, a totally symmetric $f\in\mathcal{C}^{k}$ is shown to be uniformly approximated by a totally symmetric polynomial $P$ over $\Omega^{n}$ , where $\Omega\subset\mathbb{R}^{d}$ is compact. In this section, we infer more about this totally symmetric polynomial $P$ .

For consistency, we use the notations of Chen et al. (2023), which follow Briand (2004, page 359, Theorem 3). Let us denote by $\mathcal{P}^{d,n}_{\text{sum}}(\mathbb{R})$ the $\mathbb{R}$ -algebra consisting of all multi-symmetric polynomials with real coefficients, i.e. real totally symmetric polynomials in $\bm{x}=(\bm{x}_{1},\ldots,\bm{x}_{n})\in(\mathbb{R}^{d})^{n}$ . Furthermore, $\mathcal{P}^{d,n}_{\text{sum}}(\mathbb{R})$ is generated as an $\mathbb{R}$ -algebra by the multi-symmetric power sums:

	$\displaystyle\eta_{\bm{s}}(\bm{x}_{1},\ldots,\bm{x}_{n})$	$\displaystyle\coloneqq\sum_{i=1}^{n}x_{i1}^{s_{1}}x_{i2}^{s_{2}}\cdots x_{id}^{s_{d}}=\sum_{i=1}^{n}h_{\bm{s}}(\bm{x}_{i}),\ \ 0\leq s_{1}+s_{2}+\cdots+s_{d}\leq n,$		(14)

where $\bm{s}=(s_{1},s_{2},\ldots,s_{d})$ and $\bm{x}_{i}\coloneqq(x_{i1},\ldots,x_{id})$ . One can define a total ordering on these $\bm{s}$ indices and enumerate them as $\eta_{1},\eta_{2},\ldots,\eta_{M}$ , with $\eta_{j}(\bm{x}_{1},\ldots,\bm{x}_{n})=\sum_{i=1}^{n}h_{j}(\bm{x}_{i})$ from above for all $j\in[M]=\{1,2,\ldots,M\}$ . One can easily check that

M=\sum_{i=0}^{n}\binom{i+d-1}{d-1}=\sum_{i=0}^{n}\binom{i+d-1}{i}=\binom{n+1+d-1}{n}=\binom{n+d}{n}.

(15)
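The count in (15) is easy to confirm by direct enumeration for small $n$ and $d$. The following Python sketch (with hypothetical helper names of our own) lists the exponent vectors $\bm{s}$ allowed in (14) and compares their number with both sides of (15).

```python
from itertools import product
from math import comb

def power_sum_exponents(n, d):
    """All s in Z_{>=0}^d with 0 <= s_1 + ... + s_d <= n, as allowed in (14)."""
    return [s for s in product(range(n + 1), repeat=d) if sum(s) <= n]

def embedding_dimension(n, d):
    """Closed form M = C(n + d, n) from (15)."""
    return comb(n + d, n)

counts_match = all(
    len(power_sum_exponents(n, d))
    == sum(comb(i + d - 1, d - 1) for i in range(n + 1))
    == embedding_dimension(n, d)
    for n in range(1, 6)
    for d in range(1, 4)
)
```

Note that the enumeration includes $\bm{s}=\bm{0}$, whose power sum is the constant $n$, in line with the range $0\leq s_{1}+\cdots+s_{d}$ in (14).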

Then $P$ can be written using the generators of the algebra:

\displaystyle P=\sum_{k=1}^{p}C_{k}\prod_{i=1}^{M}\eta_{i}^{k,i},

(16)

where $C_{k}\in\mathbb{R}$ . Now, based on the structures of $\{\eta_{j}\}_{j=1}^{M}$ and $P$ , we can further simplify our representation. First, encode the information of $\{\eta_{j}\}_{j=1}^{M}$ in an $(M\times n)$ matrix $\Theta$ , whose entries are

(\Theta)_{k\ell}=h_{k}(\bm{x}_{\ell})

(17)

for $k\in[M]$ and $\ell\in[n]$ . Also, note that the $\ell$ -th column of the matrix $\Theta$ can be written as

\displaystyle\Theta_{\ell}=\phi(\bm{x}_{\ell})\coloneqq(h_{1}(\bm{x}_{\ell}),h_{2}(\bm{x}_{\ell}),\ldots,h_{M}(\bm{x}_{\ell}))^{T}.

By (16), $P$ is a function of $\{\eta_{i}\}_{i=1}^{M}$ ; then from (14), $P$ is a function of $\sum_{i=1}^{n}\phi(\bm{x}_{i})$ , where $\phi$ plays the role of the so-called inner function. This functional dependence is clearly smooth. The dimension of the range of $\phi$ is the ‘embedding dimension’, which is $M=\binom{n+d}{n}$ . This is a considerable improvement compared to Han et al. (2022), where the upper bound on the embedding dimension depends on $\epsilon$ and the norm of the gradient of the approximated function.
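To make the sum-decomposition concrete, here is a minimal Python sketch for $n=3$ and $d=2$. Everything in it beyond the formulas (14) and (15) is an illustrative assumption of ours: the helper names, the sample points, and the example polynomial $P(\bm{x})=\sum_{i\neq k}x_{i1}x_{k2}$, which can be written through the pooled features as $\eta_{(1,0)}\eta_{(0,1)}-\eta_{(1,1)}$.

```python
from itertools import permutations, product

def exponents(n, d):
    """Exponent vectors s with 0 <= s_1 + ... + s_d <= n."""
    return [s for s in product(range(n + 1), repeat=d) if sum(s) <= n]

def phi(point, exps):
    """Inner map: evaluate every monomial h_s at a single point of R^d."""
    feats = []
    for s in exps:
        val = 1
        for coord, e in zip(point, s):
            val *= coord ** e
        feats.append(val)
    return feats

def pooled(points, exps):
    """Sum-pooling of the inner map: eta_s(x) = sum_i h_s(x_i)."""
    cols = [phi(p, exps) for p in points]
    return [sum(col[k] for col in cols) for k in range(len(exps))]

n, d = 3, 2
exps = exponents(n, d)
x = [(1, 2), (3, 5), (-2, 4)]  # three points in R^2
eta = dict(zip(exps, pooled(x, exps)))

# Outer function applied to the pooled features versus direct evaluation of P.
outer = eta[(1, 0)] * eta[(0, 1)] - eta[(1, 1)]
direct = sum(x[i][0] * x[k][1] for i in range(n) for k in range(n) if i != k)

# The pooled features do not depend on the ordering of the points.
invariant = all(
    pooled([x[i] for i in sigma], exps) == pooled(x, exps)
    for sigma in permutations(range(n))
)
```

The inner map `phi` depends only on $n$ and $d$, not on the polynomial being represented; only the outer function changes from one $P$ to another.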

3.2 Representations of the $n$ -antisymmetric polynomials

This section utilizes a fair amount of commutative algebra. The relevant definitions and results for this section can be found in Appendices A and B. Given the polynomial ring $\mathbb{R}[\bm{x}_{1},\ldots,\bm{x}_{n}]$ where $\{\bm{x}_{i}\}_{i=1}^{n}\subset\mathbb{R}^{d}$ , the representation of $n$ -antisymmetric polynomials is a more subtle problem than that of totally symmetric ones. Unlike symmetric polynomials, $n$ -antisymmetric polynomials do not form an $\mathbb{R}$ -algebra because closure under multiplication fails: the product of two $n$ -antisymmetric polynomials is totally symmetric. Hence, we cannot talk about generators of such an algebra. However, we shall find representations for $n$ -antisymmetric polynomials with respect to the totally symmetric polynomials, where the representation of the latter has been discussed in Section 3.1.

3.2.1 Finite generation of $n$ -antisymmetric polynomials and the case $n=2$

In fact, the $n$ -antisymmetric polynomials form a finitely generated module over the ring of totally symmetric polynomials (Appendix B, Lemma 28). Consequently, we want to determine the minimum number of module generators of $n$ -antisymmetric polynomials over the totally symmetric polynomial ring; this analysis is required because, unlike bases of a vector space over a field, irredundant generating sets of a finitely generated module need not all have the same number of elements. Determining the exact minimum number of module generators for general $n\geq 2$ and $d\geq 1$ is very difficult, so instead we establish upper and lower bounds on it. This is the main result of the section, which is discussed in Section 3.2.2. In addition, we establish the exact minimum for specific cases here.

In the case $n\geq 2$ and $d=1$ , the minimum number of generators is 1, which is well known, and goes back to Cauchy (1905). In particular, any $n$ -antisymmetric polynomial can be written as a product of a symmetric polynomial and the Vandermonde determinant $D(\bm{x})=\prod_{1\leq i<j\leq n}(x_{i}-x_{j})$ . In other words, if $f$ is $n$ -antisymmetric then there exists a symmetric polynomial $g$ such that $f(\bm{x})=D(\bm{x})g(\bm{x})$ . Since for any $\sigma\in\mathcal{S}_{n}$ , $D(\sigma.\bm{x})=\text{sgn}(\sigma)D(\bm{x})$ and $g(\sigma.\bm{x})=g(\bm{x})$ , the result of the action is $f(\sigma.\bm{x})=\text{sgn}(\sigma)f(\bm{x})$ . We make a useful observation that for any transposition $\sigma=(i\;j)\in\mathcal{S}_{n}$ where $i\neq j$ , $f(\sigma.\bm{x})=-f(\bm{x})$ implies that $f$ vanishes on the hyperplane $x_{i}=x_{j}$ ; this means $f$ is divisible by $(x_{i}-x_{j})$ .

For $n=2$ and $d>1$ , the solution takes the inspiration from the observation above, which will be used in its proof. First, let us consider Vandermonde determinants of all the $n$ scalar variables coming from $\bm{x}_{1},\ldots,\bm{x}_{n}\in\mathbb{R}^{d}$ ; in other words, let $D_{l}(\bm{x})=\prod_{1\leq i<j\leq n}(x_{il}-x_{jl})$ where $l\in\{1,2,\ldots,d\}$ . The result is proven in the following lemma:

Lemma 16

Let $n=2$ and $d\geq 1$ . Any $n$ -antisymmetric polynomial $f$ can be written in the following form:

\displaystyle f(\bm{x})=\sum_{l=1}^{d}D_{l}(\bm{x})g_{l}(\bm{x}),

(18)

where $g_{1},\ldots,g_{d}$ are totally symmetric polynomials.

Proof In this particular case, $D_{l}(\bm{x})=(x_{1l}-x_{2l})$ where $l\in\{1,\ldots,d\}$ , and in this proof, we write $\bm{x}=(\bm{x}_{1},\bm{x}_{2})\in(\mathbb{R}^{d})^{2}$ . Let $\sigma$ be the permutation transforming $\bm{x}\mapsto(\bm{x}_{2},\bm{x}_{1})$ , so the $2$ -antisymmetry of $f$ can be written as $-f(\bm{x})=-f(\bm{x}_{1},\bm{x}_{2})=f(\bm{x}_{2},\bm{x}_{1})=f(\sigma.\bm{x})$ . The $\sigma$ -transformation on $\bm{x}$ is identified with the matrix

\displaystyle\Pi=\begin{pmatrix}0_{d}&I_{d}\\ I_{d}&0_{d}\end{pmatrix},

(21)

where $I_{d}$ is the $d\times d$ identity matrix and $0_{d}$ is the $d\times d$ zero matrix. To arrive at the proposed form, we observe the 2-antisymmetry in a different coordinate via a linear transformation and factor out $D_{l}(\bm{x})$ terms. First, let $y_{1i}=x_{1i}-x_{2i}$ and $y_{2i}=x_{1i}+x_{2i}$ for $i\in\{1,\ldots,d\}$ . This matrix of transformation is

\displaystyle L=\begin{pmatrix}I_{d}&-I_{d}\\ I_{d}&I_{d}\end{pmatrix},

(24)

so that $\bm{y}=(\bm{y}_{1},\bm{y}_{2})=L(\bm{x}_{1},\bm{x}_{2})=L\bm{x}$ . If $P=f\circ L^{-1}$ , which is also a polynomial, the condition $-f(\bm{x})=f(\sigma.\bm{x})$ translates in the new coordinates as $-P(\bm{y})=P(L\Pi L^{-1}\bm{y})$ . One can easily compute $L\Pi L^{-1}$ and see

\displaystyle-P(\bm{y}_{1},\bm{y}_{2})=P(-\bm{y}_{1},\bm{y}_{2}).

(25)

Now, if $P$ is expressed as an $\mathbb{R}$ -linear combination of monomials, then this last condition is also valid for each monomial term. We see this by writing $P(\bm{y})=P_{1}(\bm{y})+P_{2}(\bm{y})$ where

\displaystyle P_{1}(\bm{y}_{1},\bm{y}_{2})\coloneqq\sum_{\alpha}c_{\alpha}m_{\alpha}(\bm{y}_{1},\bm{y}_{2})\quad\mathrm{and}\quad P_{2}(\bm{y}_{1},\bm{y}_{2})\coloneqq\sum_{\alpha}d_{\alpha}n_{\alpha}(\bm{y}_{1},\bm{y}_{2}).

Furthermore, $m_{\alpha}$ and $n_{\alpha}$ are monomials of the form $(\prod_{i=1}^{d}y_{1i}^{e_{i}})\cdot(\prod_{i=1}^{d}y_{2i}^{f_{i}})$ where $\sum_{i=1}^{d}e_{i}$ is odd for the monomials $m_{\alpha}$ and even for the monomials $n_{\alpha}$ . Comparing the parts of equation (25) that are odd and even in $\bm{y}_{1}$ , we must have

\displaystyle-P_{1}(\bm{y}_{1},\bm{y}_{2})=P_{1}(-\bm{y}_{1},\bm{y}_{2})\quad\mathrm{and}\quad-P_{2}(\bm{y}_{1},\bm{y}_{2})=P_{2}(-\bm{y}_{1},\bm{y}_{2}),

and from the conditions on the monomials, $P_{2}(\bm{y}_{1},\bm{y}_{2})+P_{2}(-\bm{y}_{1},\bm{y}_{2})=2P_{2}(\bm{y})=0$ . This means $P_{2}=0$ . Also, $P=P_{1}$ , so each monomial $m_{\alpha}$ obeys equation (25). Hence, $-m_{\alpha}(\bm{y}_{1},\bm{y}_{2})=m_{\alpha}(-\bm{y}_{1},\bm{y}_{2})$ for any index $\alpha$ , which is written explicitly as $-(\prod_{i=1}^{d}y_{1i}^{e_{i}})\cdot(\prod_{i=1}^{d}y_{2i}^{f_{i}})=(-1)^{\sum_{i=1}^{d}e_{i}}\cdot(\prod_{i=1}^{d}y_{1i}^{e_{i}})\cdot(\prod_{i=1}^{d}y_{2i}^{f_{i}})$ . This means $\sum_{i=1}^{d}e_{i}$ must be odd, so at least one of the $\{e_{i}\}_{i=1}^{d}$ must be odd. Let us now consider the case where $e_{j}$ is odd for some $j\in\{1,\ldots,d\}$ . Here, we can factor out one $y_{1j}$ from $m_{\alpha}$ , so $m_{\alpha}(\bm{y})=y_{1j}q_{\alpha}(\bm{y})$ where $q_{\alpha}(\bm{y})=(\prod_{i=1,i\neq j}^{d}y_{1i}^{e_{i}})\cdot y_{1j}^{e_{j}-1}\cdot(\prod_{i=1}^{d}y_{2i}^{f_{i}})$ . Because $\sum_{i=1}^{d}e_{i}-1$ must be an even number, we see that $q_{\alpha}$ has the following symmetry:

\displaystyle q_{\alpha}(\bm{y}_{1},\bm{y}_{2})=q_{\alpha}(-\bm{y}_{1},\bm{y}_{2}).

(26)

In the original coordinate $\bm{x}$ , it simply means that $q_{\alpha}$ (expressed in the $\bm{x}$ coordinates) is a totally symmetric polynomial. Noting that $y_{1j}=x_{1j}-x_{2j}=D_{j}(\bm{x})$ , we see that $m_{\alpha}$ , when expressed in the $\bm{x}$ coordinates, can be factored as $D_{j}(\bm{x})$ times a totally symmetric polynomial. This factorization by one of $\{D_{i}(\bm{x})\}_{i=1}^{d}$ can be done for each monomial $m_{\alpha}$ ; grouping the resulting terms by the index $j$ gives the form (18), which concludes our proof.
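As a worked instance of Lemma 16, take $n=d=2$ and the $2$ -antisymmetric polynomial $f(\bm{x})=x_{11}x_{22}-x_{21}x_{12}$ (our own illustrative choice, not an example from the references). Following the change of coordinates in the proof, one finds $f=D_{1}g_{1}+D_{2}g_{2}$ with $g_{1}=\frac{1}{2}(x_{12}+x_{22})$ and $g_{2}=-\frac{1}{2}(x_{11}+x_{21})$ . The Python sketch below checks this decomposition exactly on a small integer grid.

```python
from fractions import Fraction
from itertools import product

def f(x1, x2):
    """A 2-antisymmetric polynomial for n = 2, d = 2 (illustrative choice)."""
    return x1[0] * x2[1] - x2[0] * x1[1]

def D(l, x1, x2):
    """D_l(x) = x_{1l} - x_{2l}, with l counted from 0."""
    return x1[l] - x2[l]

def g1(x1, x2):
    """Totally symmetric coefficient of D_1."""
    return Fraction(x1[1] + x2[1], 2)

def g2(x1, x2):
    """Totally symmetric coefficient of D_2."""
    return -Fraction(x1[0] + x2[0], 2)

grid = list(product(range(-2, 3), repeat=2))  # sample points of R^2
decomposition_holds = all(
    f(a, b) == D(0, a, b) * g1(a, b) + D(1, a, b) * g2(a, b)
    for a in grid for b in grid
)
coefficients_symmetric = all(
    g1(a, b) == g1(b, a) and g2(a, b) == g2(b, a)
    for a in grid for b in grid
)
```

A grid check is of course not a proof; the identity itself follows by expanding the right-hand side by hand.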

3.2.2 Bounds for minimum module generators of $n$ -antisymmetric polynomials for $n>2$

When we are representing $n$ -antisymmetric polynomials, the use of Vandermonde determinants seems intuitive. One might try to mirror Lemma 16 for the general case $n>2$ and $d\geq 2$ ; however, this will not hold true. One immediate reason is the degree of some $n$ -antisymmetric functions. Recall the Vandermonde determinants $D_{l}(\bm{x})=\prod_{1\leq i<j\leq n}(x_{il}-x_{jl})$ . Their degrees are $\binom{n}{2}$ , so some $n$ -antisymmetric functions of the same or lower degree cannot be represented this way. For example, when $n=3$ and $d=2$ , one such function is $f(\bm{x}_{1},\bm{x}_{2},\bm{x}_{3})=(x_{22}-x_{12})(x_{31}-x_{11})+(x_{12}-x_{32})(x_{21}-x_{11})$ where $\deg(f)=2<3=\binom{3}{2}$ .
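The antisymmetry of this degree-2 example can be verified numerically. The Python sketch below evaluates $f$ at one sample point (an arbitrary choice of ours) and checks $f(\sigma.\bm{x})=\text{sgn}(\sigma)f(\bm{x})$ for every $\sigma\in\mathcal{S}_{3}$ ; the helper `sgn` is a hypothetical name for an inversion-counting sign function.

```python
from itertools import permutations

def sgn(sigma):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(
        1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
        if sigma[i] > sigma[j]
    )
    return -1 if inversions % 2 else 1

def f(x):
    """The degree-2 alternant from the text, for n = 3 and d = 2."""
    (x11, x12), (x21, x22), (x31, x32) = x
    return (x22 - x12) * (x31 - x11) + (x12 - x32) * (x21 - x11)

x = ((1, 2), (3, 5), (-2, 4))  # three sample points in R^2
alternating = all(
    f(tuple(x[s] for s in sigma)) == sgn(sigma) * f(x)
    for sigma in permutations(range(3))
)
```

Since $f(\bm{x})\neq 0$ at this point, $f$ is a nonzero alternant of degree below $\binom{3}{2}$ , so it cannot be a combination of the $D_{l}$ with symmetric polynomial coefficients.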

This brings us to the realm of hardcore representation theory, where the problem of finding the minimal number of generators for $n$ -antisymmetric polynomials over the symmetric polynomial ring has been tackled by many eminent mathematicians over the years. For $n\geq 2$ and $d=2$ , the exact minimal number of such generators follows from a deep result of Haiman, obtained as a solution of the Haiman-Garsia conjecture. (See Garsia and Haiman (1996), Haiman (2003), Haiman (1994), Bergeron (2009) for background. For an explicit reference, see Haglund et al. (2004), page 2.) This number is the $n$ -th Catalan number, given by $\frac{(2n)!}{(n+1)!n!}$ . For the record, when $n=d=2$ this gives $\frac{4!}{3!2!}=2$ , which agrees with our Lemma 16.
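For reference, the Catalan numbers quoted here are straightforward to tabulate. The short Python sketch below (the function name `catalan` is our own) evaluates the formula $\frac{(2n)!}{(n+1)!n!}$ for $2\leq n\leq 8$ .

```python
from math import factorial

def catalan(n):
    """n-th Catalan number (2n)! / ((n + 1)! n!)."""
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))

# Exact minimal number of generators for d = 2, indexed by n.
exact_counts = {n: catalan(n) for n in range(2, 9)}
```

The entry for $n=2$ is $2$ , matching the two generators $D_{1},D_{2}$ of Lemma 16 when $d=2$ .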

Let $n\geq 2$ and $d\geq 1$ , and let us follow the notations from Appendix B from here onwards. The exact minimal number of $n$ -antisymmetric generators is given by the dimension of the vector space $H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})\cap\mathcal{O}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})_{\text{sgn}}$ , which is given by Lemma 36. As stated earlier, we aim to determine upper and lower bounds on this minimal number of generators; some great work has been done by Wallach (2021), and we will draw much from this work to find these bounds. For convenience, $n$ -antisymmetric polynomials are often called “alternants” from here on, which is the same terminology used in Wallach (2021).

Let us determine the lower bound first. Note that the set of minimal degree alternants forms a vector space over $\mathbb{R}$ , and let this minimal degree be $r$ . We will use this later on. In Wallach (2021, Theorem 1), the lower bound of the number of module generators for alternants over symmetric polynomials is given by the dimension of the vector space spanned by this set of minimal degree alternants. We provide an argument here, relating these two numbers: Suppose $\{f_{1},f_{2},\ldots,f_{k}\}$ is a minimal set of alternants that generates the $n$ -antisymmetric polynomials as a module over the totally symmetric polynomials. Now, recall that any polynomial $g$ can be written as a sum of homogeneous polynomials, i.e. $g=\sum_{q}g^{(q)}$ , where $g^{(q)}$ is the homogeneous part of degree $q$ . By Lemma 29, if $g$ is an alternant, then each $g^{(q)}$ is also an alternant. Now, suppose $g$ is an alternant with $\deg(g)=r$ ; then $g$ must be homogeneous because of the minimality of $r$ . Consequently, $g=\sum_{j=1}^{k}u_{j}f_{j}=\sum_{j=1}^{k}(u_{j}f_{j})^{(r)}$ for some totally symmetric polynomials $\{u_{j}\}_{j=1}^{k}$ . Furthermore, for any $j\in[k]$ , $(u_{j}f_{j})^{(r)}=\sum_{a+b=r}u_{j}^{(a)}f_{j}^{(b)}$ , where $a,b$ are nonnegative integers. Since each $f_{j}^{(b)}$ is an alternant, the minimality of $r$ forces $f_{j}^{(b)}=0$ for $b<r$ , so only the terms with $b=r$ and $a=0$ survive. Hence, $g=\sum_{j=1}^{k}u_{j}^{(0)}f_{j}^{(r)}$ where $u_{j}^{(0)}\in\mathbb{R}$ , and so every minimal degree alternant lies in the $\mathbb{R}$ -span of $\{f_{j}^{(r)}\}_{j=1}^{k}$ ; the dimension of the vector space spanned by the degree $r$ alternants is therefore at most $k$ . (Also note that the minimal generating set of $n$ -antisymmetric polynomials can be chosen to consist of homogeneous polynomials, see Theorem 37 in Appendix B.)

Before stating the lower bound for the number of generators, recall a standard combinatorial fact: Given $d,n\in\mathbb{N}$ , $n$ can be written uniquely as $n=\binom{r}{d}+j$ with $0\leq j<\binom{r}{d-1}$ and $r\in\mathbb{Z}_{\geq 0}$ . From Wallach (2021, Theorem 1), the minimum degree of the minimal generating set of alternants is $d\binom{r}{d+1}+j(r-d+1)$ ; in the proof, the dimension of the vector space spanned by the set of minimal degree alternants, which gives the lower bound, is shown to be

\binom{\binom{r}{d-1}}{j}\ \text{where}\ n=\binom{r}{d}+j\ \text{with}\ 0\leq j<\binom{r}{d-1}.
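The decomposition $n=\binom{r}{d}+j$ and the resulting lower bound are simple to compute. The Python sketch below is a minimal illustration with hypothetical function names of our own; it assumes $n\geq 1$ , in which case $r\geq d$ .

```python
from math import comb

def decompose(n, d):
    """Return (r, j) with n = C(r, d) + j and 0 <= j < C(r, d - 1), for n >= 1."""
    r = d  # C(d, d) = 1 <= n, so the search can start at r = d
    while comb(r + 1, d) <= n:
        r += 1
    return r, n - comb(r, d)

def lower_bound(n, d):
    """Dimension C(C(r, d - 1), j) of the space of minimal degree alternants."""
    r, j = decompose(n, d)
    return comb(comb(r, d - 1), j)

def minimal_degree(n, d):
    """Minimal degree d * C(r, d + 1) + j * (r - d + 1) of an alternant."""
    r, j = decompose(n, d)
    return d * comb(r, d + 1) + j * (r - d + 1)

# Rows (n, r, j, lower bound) for d = 2 and n = 3, ..., 8.
rows = [(n, *decompose(n, 2), lower_bound(n, 2)) for n in range(3, 9)]
```

For $d=1$ the decomposition gives $r=n$ and $j=0$ , so `minimal_degree(n, 1)` returns $\binom{n}{2}$ , the degree of the Vandermonde determinant.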

Now, we want to establish the upper bound for the minimum number of generators for the alternants over the totally symmetric polynomials. According to Wallach (2021), the maximum degree of the minimal generating set of alternants is $\binom{n}{2}$ , and this fact is proven in Lemma 31, Appendix B. Now, mirroring Wallach (2021), define a graded lexicographic order on the elements of $\mathbb{Z}_{\geq 0}^{d}$ : For any $\bm{b},\bm{c}\in\mathbb{Z}_{\geq 0}^{d}$ , $\bm{b}<\bm{c}$ if either $\sum_{i=1}^{d}b_{i}<\sum_{i=1}^{d}c_{i}$ , or $\sum_{i=1}^{d}b_{i}=\sum_{i=1}^{d}c_{i}$ and $b_{j}<c_{j}$ , where $j$ is the first index with $b_{j}\neq c_{j}$ . For convenience, let $A$ be a $d\times n$ matrix with entries $(A)_{ij}=a_{ij}\in\mathbb{Z}_{\geq 0}$ , and let $\bm{x}=(\bm{x}_{1},\ldots,\bm{x}_{n})\in(\mathbb{R}^{d})^{n}$ be the variables. Using a shorthand notation, we define a monomial as

\bm{x}^{A}\coloneqq\prod_{i,j}^{d,n}x_{ij}^{a_{ij}}.

Now, define $\text{Alt}(\cdot)$ on any function with the variables $\bm{x}\in(\mathbb{R}^{d})^{n}$ by

\text{Alt}(f)(\bm{x})\coloneqq\frac{1}{|\mathcal{S}_{n}|}\sum_{\sigma\in\mathcal{S}_{n}}\text{sgn}(\sigma)f(\sigma.\bm{x}),

where $\sigma.\bm{x}=(\bm{x}_{\sigma(1)},\ldots,\bm{x}_{\sigma(n)})$ . Then, a basis for the alternants as a vector space over $\mathbb{R}$ is the set

\Lambda=\left\{\text{Alt}(\bm{x}^{A}):A=(\bm{a}_{1},\ldots,\bm{a}_{n})\in(\mathbb{Z}_{\geq 0}^{d})^{n}\text{ and }\bm{a}_{1}<\bm{a}_{2}<\ldots<\bm{a}_{n}\right\}

according to Wallach (2021, Proposition 4); of course, this is not the set of interest in our study. However, scalars in $\mathbb{R}$ are also totally symmetric polynomials, so $\Lambda$ with the additional condition of maximal degree $\binom{n}{2}$ , i.e. $\sum_{i=1,j=1}^{d,n}(A)_{ij}\leq\binom{n}{2}$ , becomes a set of generators for the alternants over the totally symmetric polynomials, see Theorem 37. Let us call this set $\Lambda_{n,d}$ ; its cardinality is a loose upper bound on the minimum number of generators. Finding this cardinality is a very difficult combinatorial problem, which might not have a closed-form expression; this is mainly due to the ordering $\bm{a}_{1}<\bm{a}_{2}<\ldots<\bm{a}_{n}$ where $\{\bm{a}_{i}\}_{i=1}^{n}$ are vectors in $\mathbb{Z}_{\geq 0}^{d}$ . Hence, for simplicity, we compute the number of non-negative integral solutions of $\sum_{i=1,j=1}^{d,n}(A)_{ij}\leq\binom{n}{2}$ , which is the cardinality of a superset of $\Lambda_{n,d}$ without the ordering condition. The number of non-negative integral solutions gives an upper bound

\left|\Lambda_{n,d}\right|\leq\binom{\binom{n}{2}+dn}{dn}

on the minimum number of generators of $n$ -antisymmetric polynomials as a module over the totally symmetric polynomials.
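The closed form of this upper bound can be cross-checked against a brute-force count for small $n$ and $d$ . In the Python sketch below, `upper_bound` and `brute_force_count` are hypothetical helper names; the brute-force routine enumerates all nonnegative integer exponent arrays with $dn$ entries and keeps those of total degree at most $\binom{n}{2}$ , ignoring the ordering condition exactly as in the text.

```python
from itertools import product
from math import comb

def upper_bound(n, d):
    """Closed form C(C(n, 2) + dn, dn) for the number of admissible exponent arrays."""
    return comb(comb(n, 2) + d * n, d * n)

def brute_force_count(n, d):
    """Count nonnegative integer arrays with d*n entries and total degree <= C(n, 2)."""
    cap = comb(n, 2)
    return sum(
        1 for a in product(range(cap + 1), repeat=d * n) if sum(a) <= cap
    )

small_cases_agree = all(
    brute_force_count(n, d) == upper_bound(n, d)
    for (n, d) in [(2, 1), (2, 2), (3, 1), (3, 2)]
)
```

The brute-force count grows quickly and is only meant for very small cases; for larger $n$ the closed form should be used directly.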

We tested our bounds in Table 1, which shows the exact numbers of generators for $3\leq n\leq 8$ and $d=2$ falling (quite crudely) between the lower and upper bounds. We also summarize our findings of Section 3 in Table 2. We emphasize here again that the exact numbers of, or the bounds on, the generators (of algebra or module) that we obtain here are all independent of any function that we are approximating and of the degree of approximation, as we are simply dealing with polynomials now.

Bounds and Exact Minimal Number of Generators for $d=2$
$n$	$r$	$j$	Exact: $\displaystyle\frac{(2n)!}{(n+1)!n!}$	Lower bound: $\displaystyle\binom{r}{j}$	Upper bound: $\displaystyle\binom{\binom{n}{2}+2n}{2n}$
3	3	0	5	1	84
4	3	1	14	3	3003
5	3	2	42	3	$1.85\times 10^{5}$
6	4	0	132	1	$1.74\times 10^{7}$
7	4	1	429	4	$2.32\times 10^{9}$
8	4	2	1430	6	$4.17\times 10^{11}$

Table 1: The exact minimal number of generators falls between the derived bounds for $d=2$ . Note that $n=\binom{r}{2}+j$ , where $j\in\{0,1,\ldots,r-1\}$ .
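The entries of Table 1 can be recomputed from the formulas in its header. The sketch below is a minimal illustration under that reading of the table; the helper names ( `decompose` , `exact_d2` , `lower_bound` , `upper_bound` , `table_rows` ) are ours.

```python
from math import comb, factorial


def decompose(n: int, d: int = 2):
    """Return (r, j) with n = C(r, d) + j and 0 <= j < C(r, d - 1)."""
    r = d
    while comb(r + 1, d) <= n:
        r += 1
    return r, n - comb(r, d)


def exact_d2(n: int) -> int:
    """Exact column of the table for d = 2: (2n)! / ((n + 1)! n!)."""
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))


def lower_bound(n: int, d: int = 2) -> int:
    """Lower bound C(C(r, d - 1), j), which reduces to C(r, j) for d = 2."""
    r, j = decompose(n, d)
    return comb(comb(r, d - 1), j)


def upper_bound(n: int, d: int = 2) -> int:
    """Upper bound C(C(n, 2) + d*n, d*n)."""
    return comb(comb(n, 2) + d * n, d * n)


def table_rows(n_values=range(3, 9)):
    """Rows (n, r, j, exact, lower, upper) for d = 2."""
    rows = []
    for n in n_values:
        r, j = decompose(n)
        rows.append((n, r, j, exact_d2(n), lower_bound(n), upper_bound(n)))
    return rows


if __name__ == "__main__":
    for row in table_rows():
        print(*row, sep="\t")
```

The upper bounds are printed as exact integers here, whereas the table rounds the larger ones to three significant figures.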

Summary of Results
Polynomial Type	$n^{*}$	$d$	Exact	Lower Bound	Upper Bound
$\text{Totally Symmetric}^{\dagger}$	$\geq 2$	$\geq 1$	$\displaystyle\binom{n+d}{d}$	–	–
$n\text{-Antisymmetric}^{\ddagger}$	$\geq 2$	$1$	$1$	–	–
	$2$	$\geq 1$	$d$	$\displaystyle\binom{\binom{r}{d-1}}{j}$	$\displaystyle\binom{\binom{n}{2}+dn}{dn}$
	$>2$	$2$	$\displaystyle\frac{(2n)!}{(n+1)!n!}$
	$\geq 2$	$\geq 2$	–

Table 2: ${}^{\dagger}$ Totally symmetric polynomials form an $\mathbb{R}$ -algebra, so the number of algebra generators is given. ${}^{\ddagger}$ $n$ -Antisymmetric polynomials form a finitely generated module over the totally symmetric polynomials, so lower and upper bounds are given for the minimum number of module generators. ${}^{*}$ In the antisymmetric case, $n=\binom{r}{d}+j$ for $0\leq j<\binom{r}{d-1}$ .

4 Acknowledgments and disclosure of funding

S.G. and K.T. are grateful to their peers N. Ramachandran, H. Bhatia, and S. Bhattacharya, in the Dept. of Mathematics, UCSD for enlightening conversations about many algebraic facts. S.G. is thankful to S. Chhabra for introducing him to the broad area of mathematical deep learning. The authors are grateful to Prof. Steven Sam and Prof. Brendon Rhoades for illuminating conversations on commutative algebra and representation theory. R.S. would like to thank the Institute for Pure and Applied Mathematics, Stanford for being generous hosts, and for providing a great environment during the period when most of this work was completed. The authors would also like to thank Prof. Nolan Wallach, for communicating many of the results and proofs related to invariant and representation theory, which have been used extensively in Appendix B. The authors declare that they have no competing interests.

Appendix A Necessary background for commutative algebra

In this section, some basic definitions and theorems from commutative algebra are given, which are useful in understanding the results in Section 3.2 and Appendix B. For more details, the reader is referred to any standard textbook such as Atiyah and Macdonald (1969); a specific reference is given for each result stated below. We assume the reader has a basic understanding of groups, rings, and vector spaces over fields. Below the operations of ring addition and multiplication are denoted using the symbols $+$ and $\cdot$ respectively. We will also use these same symbols later to denote the group addition and scalar multiplication operations for a module (defined below in Definition 18), but the distinction will always be clear from context, and no confusion should arise.

If $R$ is a ring, for any $r,s\in R$ , we will frequently write $rs$ to mean $r\cdot s$ , when there is no chance for confusion. A ring is unital if multiplication has an identity element (denoted as $1$ ). A ring is called commutative if $r\cdot s=s\cdot r$ for all $r,s\in R$ . Throughout this paper, we will assume our rings to be unital, commutative rings. A nonzero commutative ring in which every nonzero element has a multiplicative inverse is called a field, e.g. $\mathbb{R}$ and $\mathbb{C}$ . A nonzero commutative ring $R$ is called an integral domain if $x\cdot y=0$ , for $x,y\in R$ , implies either $x$ or $y$ is the zero element of $R$ . For example, if $\{\bm{x}_{i}\}_{i=1}^{n}\subset\mathbb{R}^{d}$ , then the set of polynomials $\mathbb{R}[\bm{x}_{1},\ldots,\bm{x}_{n}]$ forms a ring when equipped with the usual addition and multiplication operations for polynomials. This polynomial ring plays a key role in this paper. In fact, $\mathbb{R}[\bm{x}_{1},\ldots,\bm{x}_{n}]$ is also an integral domain (Atiyah and Macdonald, 1969, Page 2). Any subring of $\mathbb{R}[\bm{x}_{1},\ldots,\bm{x}_{n}]$ inherits the same addition and multiplication operations from $\mathbb{R}[\bm{x}_{1},\ldots,\bm{x}_{n}]$ . A notion that we will repeatedly encounter is that of an ideal of a ring:

Definition 17 (Ideal of a ring)

An ideal $I$ of a commutative ring $(R,+,\cdot)$ is a subset of $R$ such that $(I,+)$ is a subgroup of $(R,+)$ , and for every $r\in R$ and $x\in I$ , the product $r\cdot x\in I$ .

Another important class of objects that we will also encounter frequently is the notion of a module over a ring, which generalizes the concept of a vector space over a field.

Definition 18 (Module over a ring)

Given a ring $R$ , a set $M$ is called an $R$ -module if $(M,+)$ is an abelian group, equipped with an operation $\cdot:R\times M\to M$ satisfying $r\cdot(x+y)=r\cdot x+r\cdot y$ , $(r+s)\cdot x=r\cdot x+s\cdot x$ , $(rs)\cdot x=r\cdot(s\cdot x)$ , $1\cdot x=x$ , for all $r,s\in R$ and $x,y\in M$ . The operation $\cdot$ is called scalar multiplication.

For an $R$ -module $M$ , the identity of the group $(M,+)$ will be denoted as $0$ , the same symbol as the zero element of the ring, and again the distinction will be clear from context. If $M$ is an $R$ -module and $N$ is a subgroup of $M$ , then $N$ is an $R$ -submodule if for any $n\in N$ and any $r\in R$ , $r\cdot n\in N$ . Recall that if $I$ is an ideal of a ring $R$ , then $I$ is naturally an $R$ -module. An $R$ -module $M$ is finitely generated if there exist finitely many elements $\{y_{i}\}_{i=1}^{n}\subset M$ such that every element of $M$ is a linear combination of $\{y_{i}\}_{i=1}^{n}$ with coefficients from $R$ .

We next discuss Noetherian rings and Noetherian modules which play a key role in Section 3.2, but we will not give the axiomatic definition of these objects (which depends on ascending chain conditions). Alternate and equivalent definitions are given below because they are more relevant to the material in this paper.

Definition 19 (Noetherian ring)

A ring $R$ is called Noetherian if every ideal of $R$ is finitely generated as an $R$ -module.

Definition 20 (Noetherian module)

An $R$ -module $M$ is called Noetherian if every submodule of $M$ is finitely generated over $R$ .

For the characterization of Noetherian modules that we gave above in Definition 20, the reader is referred to Atiyah and Macdonald (1969, Proposition 6.2). Finally, we define the notion of an associative algebra over a ring, which is also extensively used in this paper.

Definition 21 (Associative algebra over a ring)

Let $R$ be a ring, and let $\mathcal{A}$ be an $R$ -module. Then $\mathcal{A}$ is called an $R$ -algebra (or an algebra over $R$ ), if $\mathcal{A}$ also forms a ring such that the ring addition is the same operation as module addition, and module scalar multiplication $\cdot$ satisfies $r\cdot(x\ast y)=(r\cdot x)\ast y=x\ast(r\cdot y)$ for all $r\in R$ and $x,y\in\mathcal{A}$ , where $\ast$ denotes ring multiplication in $\mathcal{A}$ .

When the ring multiplication $\ast$ of an associative algebra is also commutative, we say that it is a commutative algebra. All the associative algebras that we will encounter in this paper are commutative algebras, and hence for simplicity we will refer to them as algebras going forward. An $R$ -algebra $\mathcal{A}$ is finitely generated if there exists a finite set of elements $\{z_{i}\}_{i=1}^{m}\subset\mathcal{A}$ such that any element of $\mathcal{A}$ can be written as a finite linear combination of terms, with coefficients in $R$ , where each term is a finite product of the elements in $\{z_{i}\}_{i=1}^{m}$ . For example, the polynomial ring $\mathbb{R}[\bm{x}_{1},\ldots,\bm{x}_{n}]$ introduced previously forms an $\mathbb{R}$ -algebra. It is in fact finitely generated by the set $\{x_{ij}\}_{i=1,j=1}^{n,d}$ .

Another notion that is needed in Section 3.2 is that of integral elements in a ring, over a subring. Next we define this notion, and then state an important result involving integral elements (Proposition 23 below, the proof of which can be found in Atiyah and Macdonald (1969, Proposition 5.1)).

Definition 22 (Integral element over subring)

Let $R$ be a ring and $S\subset R$ be a subring. An element $x\in R$ is integral over $S$ , if $x$ is a root of a monic polynomial with coefficients in $S$ . Here monic polynomial means that the polynomial is univariate and the coefficient of the highest degree term of the polynomial is $1$ . We say that $R$ is integral over $S$ if every element $x\in R$ is integral over $S$ .

Proposition 23

Let $R$ be a ring and $S\subset R$ be a subring. Then $x\in R$ is integral over $S$ if and only if $S[x]$ is a finitely generated $S$ -module, where $S[x]$ is the subring generated by $S\cup\{x\}$ .

We also need the following three results, the proofs of which can be found in Atiyah and Macdonald (1969, Corollary 7.6), Matsumura (1989, Theorem 3.7.i), and Atiyah and Macdonald (1969, Proposition 6.5) respectively. Of these, Hilbert’s basis theorem is a famous theorem in commutative algebra.

Theorem 24 (Hilbert’s basis theorem)

If $R$ is a Noetherian ring then $R[x_{1},\ldots,x_{n}]$ is also a Noetherian (polynomial) ring.

Theorem 25 (Eakin-Nagata theorem)

If $R$ is a Noetherian ring and $S$ is a subring such that $R$ is finitely generated as an $S$ -module, then $S$ is also a Noetherian ring.

Proposition 26

If $R$ is a Noetherian ring, and $M$ is a finitely generated $R$ -module, then $M$ is a Noetherian $R$ -module.

Appendix B Necessary results from representation theory

In this section, we establish some facts in representation theory, which are used in Section 3.2 to prove the upper and lower bounds of the minimum number of module generators for the set of $n$ -antisymmetric polynomials (as a module) over the totally symmetric polynomials. In particular, these polynomials belong to the ring $\mathbb{R}[\bm{x}_{1},\ldots,\bm{x}_{n}]$ where $\{\bm{x}_{i}\}_{i=1}^{n}\subset\mathbb{R}^{d}$ . We are grateful to Prof. Nolan Wallach for communicating and discussing these results with us. The results are included here for completeness and may help readers, especially those from the machine learning community, understand the main results of our paper. Finally, the results are discussed here for the field $\mathbb{R}$ ; however, they remain true for polynomials over any field of characteristic zero. For a more detailed discussion of the topics presented here, the reader can refer to Wallach (2017, Section 3.7.6).

This section is structured as follows: In Section B.1, we introduce the $\mathbb{R}$ -algebra of polynomials equipped with an inner-product, identify several modules and subalgebras associated to a group action on the space of polynomials, and state some of their properties. Section B.2 is dedicated to proving the finite-dimensionality of the $\mathbb{R}$ -vector space $H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})$ , which is an important subspace of the vector space of polynomials $\mathbb{R}[\bm{x}_{1},\dots,\bm{x}_{n}]$ that plays a key role in the analysis. Then in Section B.3, we prove several structural results for this subspace, eventually culminating with Lemma 36, which is the main tool for quantifying the minimum number of module generators for the space of $n$ -antisymmetric polynomials. In these subsections, we first prove the results for a general group action $G$ , then state the results for the special case $G=\mathcal{S}_{n}$ . The precise definition of the group action $\mathcal{S}_{n}$ relevant for us is provided below in Section B.1.

B.1 Polynomial algebras and modules induced by a group action

We begin by defining the notion of representation of a group:

Definition 27 (Representation of a group)

A representation of a group $G$ on a vector space $Y$ over a field $\mathbb{K}$ is a group homomorphism $\rho$ from $G$ to $GL(Y)$ , the general linear group on $Y$ . This means $\rho:G\to GL(Y)$ satisfies $\rho(g_{1}g_{2})=\rho(g_{1})\rho(g_{2})$ for all $g_{1},g_{2}\in G$ . The dimension of $Y$ is called the dimension of the representation.

Here, we will only deal with finite-dimensional representations. From this definition, it is clear that representations can also be interpreted as group actions. Now since elements of $GL(Y)$ can be represented by invertible matrices, we can take their trace, and the resulting quantity defines the character of the representation, denoted as $\chi$ . Thus $\chi(g)\coloneqq\text{tr}(\rho(g))$ for all $g\in G$ , if $\rho$ is a representation of a group $G$ . A subspace $W$ of $Y$ is called $G$ -invariant if $\rho(g)w\in W$ for all $g\in G$ and $w\in W$ . A $G$ -representation is called irreducible if $\text{dim}(Y)\neq 0$ and the only $G$ -invariant subspaces of $Y$ are $\{0\}$ and $Y$ itself. Interested readers are referred to Sagan (2001, Chapter 1) for more information.

Let $V$ be any finite dimensional inner product space over $\mathbb{R}$ where $\text{dim}(V)=m$ . Now, we will define a few polynomial $\mathbb{R}$ -algebras and $\mathbb{R}$ -modules over $V$ , equipped with an inner-product, that we will use extensively. Let $\{v_{i}\}_{i=1}^{m}$ be an orthonormal basis of $V$ . This allows us to define coordinates in $V$ by expressing any vector $v\in V$ as $v=\sum_{i=1}^{m}z_{i}v_{i}$ , where $z_{i}\in\mathbb{R}$ for all $i\in[m]=\{1,2,\ldots,m\}$ . Now, any $v\in V$ can be denoted by its coordinates $\bm{z}=(z_{1},\ldots,z_{m})$ . We define an $\mathbb{R}$ -algebra of polynomials on $V$ with respect to these coordinates, which we will denote as $\mathcal{O}(V)$ , i.e. $\mathcal{O}(V):=\mathbb{R}[\bm{z}]$ . We also define a positive definite inner product on $\mathcal{O}(V)$ as

\displaystyle\langle f,g\rangle_{\mathcal{O}(V)}=f\left(\frac{\partial}{\partial z_{1}},\ldots,\frac{\partial}{\partial z_{m}}\right)g(z_{1},\ldots,z_{m})\bigg{\lvert}_{\bm{z}=0},\;\;\forall f,g\in\mathcal{O}(V),

which turns $\mathcal{O}(V)$ into an inner-product space. We also introduce a useful notation: If $W,W^{\prime}$ are subsets of $\mathcal{O}(V)$ then we define $WW^{\prime}$ as the set of all finite sums of the form $\sum_{i=1}a_{i}b_{i}$ , where each $a_{i}\in W$ and $b_{i}\in W^{\prime}$ . Obviously, $WW^{\prime}=W^{\prime}W$ . We note a special case where $W$ is a subring of $\mathcal{O}(V)$ , and $W^{\prime}$ is a subset of $\mathcal{O}(V)$ , then $WW^{\prime}$ is the $W$ -module generated by $W^{\prime}$ and is a $W$ -submodule of $\mathcal{O}(V)$ .
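To make the inner product on $\mathcal{O}(V)$ concrete, the following minimal Python sketch evaluates it on polynomials stored as dictionaries mapping exponent tuples to coefficients. This representation and the function name `inner` are our own choices for illustration. The sketch uses the fact that $\partial^{\bm{a}}\bm{z}^{\bm{b}}$ evaluated at $\bm{z}=0$ equals $a_{1}!\cdots a_{m}!$ when $\bm{a}=\bm{b}$ and vanishes otherwise, so distinct monomials are orthogonal.

```python
from math import factorial, prod


def inner(f: dict, g: dict):
    """<f, g> = f(d/dz_1, ..., d/dz_m) g(z) evaluated at z = 0.

    Polynomials are dicts {exponent tuple: coefficient}. Only monomials
    present in both f and g contribute, each with weight a_1! * ... * a_m!.
    """
    return sum(
        coeff * g[exps] * prod(factorial(a) for a in exps)
        for exps, coeff in f.items()
        if exps in g
    )


# Illustrative inputs in m = 2 variables:
# f = z1^2 + 3 z1 z2 and g = 2 z1^2 - z2.
f = {(2, 0): 1, (1, 1): 3}
g = {(2, 0): 2, (0, 1): -1}

if __name__ == "__main__":
    print(inner(f, g), inner(g, f), inner(f, f))
```

In this form the symmetry and positive definiteness of the inner product are visible directly, since $\langle f,f\rangle$ is a sum of squared coefficients with positive weights.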

Next, let $G$ be a finite subgroup of $GL(V)$ acting on $\mathcal{O}(V)$ as $s.f(\bm{z})\coloneqq s.f(z_{1},\ldots,z_{m})=f(s^{-1}\bm{z})$ for any $f\in\mathcal{O}(V)$ , $\bm{z}\in V$ , and $s\in G$ . Let us denote by $\mathcal{O}(V)^{G}$ the $\mathbb{R}$ -algebra of polynomials invariant under the $G$ action, i.e. $\{f\in\mathcal{O}(V):s.f(\bm{z})=f(\bm{z}),\;\forall\ s\in G\}$ ; it is easy to check that it is indeed an $\mathbb{R}$ -algebra. We will call elements of $\mathcal{O}(V)^{G}$ invariants in this section (where the group $G$ will be specified and understood from context) to avoid conflict with the definition of $G$ -invariant functions in Definition 1. Denote by $\mathcal{O}_{+}(V)^{G}$ the subspace of invariants that vanish at $\bm{z}=0$ . One can then define $H_{G}(V)$ to be the orthogonal complement of $\mathcal{O}(V)\mathcal{O}_{+}(V)^{G}$ in $\mathcal{O}(V)$ , where $\mathcal{O}(V)\mathcal{O}_{+}(V)^{G}$ can be easily shown to be the smallest ideal of $\mathcal{O}(V)$ containing $\mathcal{O}_{+}(V)^{G}$ . The space $H_{G}(V)$ is called the space of $G$ -harmonic polynomials, and it is a known result that it is equal to the set of all polynomials annihilated by all $G$ -invariant constant coefficient differential operators on $V$ (Wallach, 2017, Lemma 3.105). Finally, if $\chi$ is a character of an irreducible representation of $G$ , then we can define $\mathcal{O}(V)_{\chi}\coloneqq\{f\in\mathcal{O}(V):f(g.v)=\chi(g)f(v),\;\;\forall v\in V,g\in G\}$ .

Now let us bring the context of our work into these definitions and notations. For Section 3.2, we can take $V=\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ and $G=\mathcal{S}_{n}$ , which acts on the first factor of the tensor product. We identify $\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ with $(\mathbb{R}^{d})^{n}$ , and recall that an element of $(\mathbb{R}^{d})^{n}$ is denoted as $\bm{x}=(\bm{x}_{1},\dots,\bm{x}_{n})$ with each $\bm{x}_{i}\in\mathbb{R}^{d}$ . We maintain the same ordering of coordinates under the identification $\mathbb{R}^{n}\otimes\mathbb{R}^{d}\cong(\mathbb{R}^{d})^{n}$ , so that $\bm{x}$ gives coordinates on $\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ . In this case, $\mathcal{O}(V)^{\mathcal{S}_{n}}$ is precisely the set of totally symmetric polynomials, and $H_{\mathcal{S}_{n}}(V)$ is the set of polynomials in $\{x_{ij}:i\in[n],j\in[d]\}$ that are annihilated by all operators of the form $\sum_{i=1}^{n}\partial_{x_{i1}}^{a_{1}}\cdots\partial_{x_{id}}^{a_{d}}$ where $\partial_{x}^{a}=\frac{\partial^{a}}{\partial x^{a}}$ and $0\neq(a_{1},\dots,a_{d})\in\mathbb{N}^{d}$ . Similarly, if we take $\chi$ to be the character of the sign representation of $\mathcal{S}_{n}$ , the set of polynomials $\mathcal{O}(V)_{\chi}$ , denoted as $\mathcal{O}(V)_{\text{sgn}}$ , consists of the $n$ -antisymmetric polynomials that we defined previously in Section 2.1. (The sign representation of $\mathcal{S}_{n}$ is the one-dimensional representation of $\mathcal{S}_{n}$ defined by $\rho(\sigma)=\text{sgn}(\sigma)$ , where $\text{sgn}(\sigma)$ is the signature of the permutation $\sigma\in\mathcal{S}_{n}$ .) We note that $\mathcal{O}(V)_{\text{sgn}}$ forms an $\mathcal{O}(V)^{\mathcal{S}_{n}}$ -module.
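As a small illustration of the sign representation and of an element of $\mathcal{O}(V)_{\text{sgn}}$ , the Python sketch below checks numerically, for $d=1$ , that $\text{sgn}$ is multiplicative and that the Vandermonde product $\prod_{i<j}(x_{i}-x_{j})$ changes by $\text{sgn}(\sigma)$ under $\sigma.\bm{x}=(x_{\sigma(1)},\ldots,x_{\sigma(n)})$ . The Vandermonde product is used here only as a familiar example of an antisymmetric polynomial; the function names are ours, and permutations are written 0-based.

```python
from itertools import permutations


def sign(perm):
    """Signature of a permutation of {0, ..., n-1}, given as a tuple of images,
    computed from the parity of its inversion count."""
    n = len(perm)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def compose(s, t):
    """Composition (s o t)(i) = s[t[i]]."""
    return tuple(s[i] for i in t)


def act(perm, x):
    """sigma.x = (x_{sigma(1)}, ..., x_{sigma(n)}), written 0-based."""
    return tuple(x[i] for i in perm)


def vandermonde(x):
    """prod_{i<j} (x_i - x_j), an n-antisymmetric polynomial for d = 1."""
    out = 1
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            out *= x[i] - x[j]
    return out


if __name__ == "__main__":
    x = (2, 3, 5, 7)
    for s in permutations(range(4)):
        print(s, sign(s), vandermonde(act(s, x)) == sign(s) * vandermonde(x))
```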

We now prove an important property of $\mathcal{O}(V)_{\text{sgn}}$ in the next lemma. One should note that a similar result holds in the general case of arbitrary $V$ , $G$ , and $\chi$ , and the proof changes slightly.

Lemma 28

For $V=\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ , $G=\mathcal{S}_{n}$ , and $\mathcal{O}(V)=\mathbb{R}[\bm{x}_{1},\ldots,\bm{x}_{n}]$ , $\mathcal{O}(V)_{\text{sgn}}$ forms a finitely generated module over $\mathcal{O}(V)^{\mathcal{S}_{n}}$ .

Proof Recalling Definition 22, we can show that $\mathcal{O}(V)$ is integral over $\mathcal{O}(V)^{\mathcal{S}_{n}}$ (for a proof, see Atiyah and Macdonald (1969, Chapter 5, Exercise 12)). In fact, we observe that for any $f\in\mathcal{O}(V)$ , $f$ is a root of the following univariate polynomial in the variable $t$ : $\prod_{\sigma\in G}(t-\sigma.f)$ . This polynomial is monic in $t$ , and the coefficients of the powers of $t$ in this polynomial all belong to $\mathcal{O}(V)^{\mathcal{S}_{n}}$ . Now clearly $\mathcal{O}(V)$ is a finitely generated $\mathbb{R}$ -algebra, with a generating set given by $\{x_{ij}:i\in[n],\;j\in[d]\}$ . We just showed above that each $x_{ij}$ is integral over $\mathcal{O}(V)^{\mathcal{S}_{n}}$ . Then letting $S=\mathcal{O}(V)^{\mathcal{S}_{n}}$ and $R=\mathcal{O}(V)$ in Proposition 23, and adjoining the generators $x_{ij}$ one at a time, we get that the subring of $\mathcal{O}(V)$ generated by $\mathcal{O}(V)^{\mathcal{S}_{n}}$ and $\{x_{ij}:i\in[n],\;j\in[d]\}$ is a finitely generated $\mathcal{O}(V)^{\mathcal{S}_{n}}$ -module. However, the subring generated by $\mathcal{O}(V)^{\mathcal{S}_{n}}\cup\{x_{ij}:i\in[n],\;j\in[d]\}$ is $\mathcal{O}(V)$ , and so we have proved that $\mathcal{O}(V)$ is a finitely generated $\mathcal{O}(V)^{\mathcal{S}_{n}}$ -module.

Next, we note that $\mathcal{O}(V)$ is a commutative, Noetherian ring, and this is a direct consequence of Theorem 24. Since $\mathcal{O}(V)^{\mathcal{S}_{n}}\subset\mathcal{O}(V)$ is a commutative ring (by virtue of being an $\mathbb{R}$ -algebra), and $\mathcal{O}(V)$ is a finitely generated $\mathcal{O}(V)^{\mathcal{S}_{n}}$ -module, it then follows that $\mathcal{O}(V)^{\mathcal{S}_{n}}$ is a Noetherian ring by Theorem 25. Finally, by Proposition 26, we can conclude that $\mathcal{O}(V)$ is a Noetherian module over $\mathcal{O}(V)^{\mathcal{S}_{n}}$ , and thus every $\mathcal{O}(V)^{\mathcal{S}_{n}}$ -submodule of $\mathcal{O}(V)$ is finitely generated. Since the $n$ -antisymmetric polynomials $\mathcal{O}(V)_{\text{sgn}}$ , form an $\mathcal{O}(V)^{\mathcal{S}_{n}}$ -submodule of $\mathcal{O}(V)$ , the proof is complete.
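The integrality step in the proof of Lemma 28 can be checked on a small case. The Python sketch below builds $\prod_{\sigma\in\mathcal{S}_{n}}(t-\sigma.f)$ for a polynomial $f$ stored as a dictionary from exponent tuples (the $nd$ exponents of the $x_{ij}$ in row-major order, followed by the exponent of $t$ ) to coefficients, and verifies that it is monic in $t$ , invariant under row permutations, and vanishes at $t=f$ . The representation and all function names are our own assumptions for illustration.

```python
from itertools import permutations
from math import prod


def mul(p, q):
    """Product of two polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for ea, ca in p.items():
        for eb, cb in q.items():
            e = tuple(a + b for a, b in zip(ea, eb))
            out[e] = out.get(e, 0) + ca * cb
    return {e: c for e, c in out.items() if c != 0}


def permute_rows(p, perm, n, d):
    """Permute the n blocks (x_i1, ..., x_id); the last exponent (of t) is kept."""
    out = {}
    for e, c in p.items():
        blocks = [e[i * d:(i + 1) * d] for i in range(n)]
        new = tuple(v for i in range(n) for v in blocks[perm[i]]) + e[n * d:]
        out[new] = c
    return out


def integrality_polynomial(f, n, d):
    """prod over sigma in S_n of (t - sigma.f), as a polynomial in (x, t)."""
    t = (0,) * (n * d) + (1,)
    result = {(0,) * (n * d + 1): 1}
    for perm in permutations(range(n)):
        factor = {t: 1}
        for e, c in permute_rows(f, perm, n, d).items():
            factor[e] = factor.get(e, 0) - c
        result = mul(result, factor)
    return result


def evaluate(p, point):
    """Evaluate p at a point listing values for (x_11, ..., x_nd, t)."""
    return sum(c * prod(v ** a for v, a in zip(point, e)) for e, c in p.items())


if __name__ == "__main__":
    f = {(1, 0, 0, 0): 1}  # f = x_1 with n = 3, d = 1
    P = integrality_polynomial(f, 3, 1)
    print(P[(0, 0, 0, 6)], evaluate(P, (2, 3, 5, 2)))
```

For $f=x_{1}$ with $n=3$ and $d=1$ , the product equals $\big((t-x_{1})(t-x_{2})(t-x_{3})\big)^{2}$ , whose coefficients are symmetric in $x_{1},x_{2},x_{3}$ .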

B.2 Finite-dimensionality of $H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})$

Our next goal is to prove that $H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})$ introduced in the previous subsection is a finite-dimensional $\mathbb{R}$ -vector space. To proceed, we need to introduce some more terminology, working more generally with an arbitrary $V$ and a finite subgroup $G$ of $GL(V)$ . As in Section 3.2.2, let $f^{(j)}$ denote the homogeneous component of $f\in\mathcal{O}(V)$ of degree $j$ . Then for any polynomial $f\in\mathcal{O}(V)$ of degree $k$ , we can write $f=\sum_{j=0}^{k}f^{(j)}$ . If $W$ is a subspace of $\mathcal{O}(V)$ such that $f\in W$ implies $f^{(j)}\in W$ for all $j$ , then we say that $W$ is a homogeneous subspace. Given a homogeneous subspace $W$ of $\mathcal{O}(V)$ , we can write $W=\bigoplus_{j}W^{(j)}$ , with $W^{(j)}:=\{f^{(j)}:f\in W\}\cup\{0\}$ . In the next lemma, we prove some properties of homogeneous subspaces and identify a few that are important for us:

Lemma 29

Let $W$ be a subring of $\mathcal{O}(V)$ that is also a homogeneous subspace, and let $G$ be a finite subgroup of $GL(V)$ acting on $\mathcal{O}(V)$ . Then we have the following:

(a)

If $W^{\prime}$ is a homogeneous subspace of $\mathcal{O}(V)$ , then $WW^{\prime}$ is also a homogeneous subspace.
(b)

Suppose $A_{1},A_{2},A_{3}$ are subspaces of $\mathcal{O}(V)$ , and $A_{1}=A_{2}+A_{3}$ . Then if any two of them are homogeneous subspaces of $\mathcal{O}(V)$ , then so is the third.
(c)

$\mathcal{O}(V)^{G}$ , $\mathcal{O}(V)^{G}_{+}$ , $\mathcal{O}(V)_{\chi}$ , $\mathcal{O}(V)\mathcal{O}(V)^{G}_{+}$ , $H_{G}(V)$ are homogeneous subspaces of $\mathcal{O}(V)$ .

Proof (a) Every element $f\in WW^{\prime}$ is of the form $\sum_{i=1}a_{i}b_{i}$ , where each $a_{i}\in W$ and $b_{i}\in W^{\prime}$ . Then for any $p\in\mathbb{N}$ , we have

\displaystyle f^{(p)}=\sum_{i=1}\sum_{\begin{subarray}{c}j,k\geq 0\\ j+k=p\end{subarray}}a_{i}^{(j)}b_{i}^{(k)}.

Since each term in the summation is in $WW^{\prime}$ , as $W,W^{\prime}$ are homogeneous, we get that $f^{(p)}\in WW^{\prime}$ .

(b) First assume that $A_{2}$ and $A_{3}$ are homogeneous subspaces, and let $f\in A_{1}$ . Then $f=f_{2}+f_{3}$ for some $f_{2}\in A_{2}$ and $f_{3}\in A_{3}$ , and for any integer $j\geq 0$ we have $f^{(j)}=f_{2}^{(j)}+f_{3}^{(j)}$ . From this, we conclude that $f^{(j)}\in A_{1}$ , as both $f_{2}^{(j)}\in A_{2}$ and $f_{3}^{(j)}\in A_{3}$ , by homogeneity. The other cases follow by a similar argument.

(c) If two polynomials are equal, then so is each of their homogeneous components. For $s\in G$ , if we write the group action as $s.f(\bm{z})=f(s^{-1}.\bm{z})$ , then for $f=\sum_{j}f^{(j)}$ , $(s.f)^{(j)}=s.(f^{(j)})$ for all $j$ . This observation proves that $\mathcal{O}(V)^{G}$ , $\mathcal{O}(V)^{G}_{+}$ , $\mathcal{O}(V)_{\chi}$ are homogeneous subspaces. Then by (a) we see that $\mathcal{O}(V)\mathcal{O}(V)^{G}_{+}$ is a homogeneous subspace since $\mathcal{O}(V)$ is clearly homogeneous. By (b), we conclude that $H_{G}(V)$ is a homogeneous subspace because $\mathcal{O}(V)=\mathcal{O}(V)\mathcal{O}_{+}(V)^{G}\oplus H_{G}(V)$ .

We may now specialize to the context of our work, and assume that $V=\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ and $G=\mathcal{S}_{n}$ . First let $\bm{y}=(y_{1},\ldots,y_{m})\in\mathbb{R}^{m}$ and $\bm{a}=(a_{1},\ldots,a_{m})\in\mathbb{Z}_{\geq 0}^{m}$ . We define

h_{k,m}(\bm{y})\coloneqq\sum_{a_{1}+\ldots+a_{m}=k}y_{1}^{a_{1}}y_{2}^{a_{2}}\cdots y_{m}^{a_{m}}=\sum_{|\bm{a}|=k}\bm{y}^{\bm{a}}

along with the shorthand notations $|\bm{a}|=a_{1}+\ldots+a_{m}$ and $\bm{y}^{\bm{a}}=y_{1}^{a_{1}}y_{2}^{a_{2}}\cdots y_{m}^{a_{m}}$ . We also let $>_{\text{lex}}$ be the lexicographic order on the monomials $\{y_{j}\}_{j=1}^{m}$ such that $y_{1}>y_{2}>\ldots>y_{m}$ . Then the ideal, denoted as $\mathcal{I}\subset\mathbb{R}[y_{1},\ldots,y_{m}]$ , of symmetric polynomials of positive degree is generated by $\eta_{k,m}(\bm{y})=\sum_{i=1}^{m}y_{i}^{k}$ for $k\in[m]$ . A Gröbner basis for $\mathcal{I}$ is given by the polynomials

\{u_{k,m}(\bm{y})=h_{k,m-k+1}(y_{k},\ldots,y_{m})\}_{k=1}^{m}.

(27)

In particular, these polynomials belong to the ideal $\mathcal{I}$ ; see Mora and Sala (2003, Proposition 2.1) for this fact.
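For a concrete view of the complete homogeneous polynomials $h_{k,m}$ and of the polynomials in (27), the Python sketch below constructs them as dictionaries from exponent tuples to coefficients and verifies, for $m=2$ , an explicit membership certificate $y_{2}^{2}=y_{2}\,\eta_{1,2}-\tfrac{1}{2}\eta_{1,2}^{2}+\tfrac{1}{2}\eta_{2,2}$ . This certificate is a direct computation of ours for this small case and is not taken from Mora and Sala (2003); the function names are likewise ours.

```python
from fractions import Fraction
from itertools import product


def h(k, indices, m):
    """Complete homogeneous polynomial of degree k in the variables y_i,
    i in indices (0-based), as {exponent tuple of length m: coefficient}."""
    out = {}
    for a in product(range(k + 1), repeat=len(indices)):
        if sum(a) == k:
            e = [0] * m
            for i, power in zip(indices, a):
                e[i] = power
            out[tuple(e)] = 1
    return out


def u(k, m):
    """u_{k,m} = h_{k,m-k+1}(y_k, ..., y_m), with k counted from 1."""
    return h(k, range(k - 1, m), m)


def mul(p, q):
    """Product of two polynomials."""
    out = {}
    for ea, ca in p.items():
        for eb, cb in q.items():
            e = tuple(a + b for a, b in zip(ea, eb))
            out[e] = out.get(e, 0) + ca * cb
    return {e: c for e, c in out.items() if c != 0}


def add(p, q, scale=1):
    """Return p + scale * q."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + scale * c
    return {e: c for e, c in out.items() if c != 0}


def certificate_m2():
    """Right-hand side y_2 * eta_1 - (1/2) eta_1^2 + (1/2) eta_2 for m = 2."""
    eta1 = {(1, 0): 1, (0, 1): 1}
    eta2 = {(2, 0): 1, (0, 2): 1}
    y2 = {(0, 1): 1}
    rhs = add(mul(y2, eta1), mul(eta1, eta1), Fraction(-1, 2))
    return add(rhs, eta2, Fraction(1, 2))


if __name__ == "__main__":
    print(u(1, 2), u(2, 2), certificate_m2() == u(2, 2))
```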

Now $V=\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ can be identified as a vector space of real valued $n\times d$ matrices $M_{n\times d}(\mathbb{R})$ . Let $\mathcal{S}_{n}$ be identified as $n\times n$ permutation matrices, which acts from the left by multiplication. Consequently, we need to define a lexicographic order $>_{\text{lex}}$ on $\{x_{ij}:i\in[n],j\in[d]\}$ : $x_{11}>x_{12}>\ldots>x_{1d}>\ldots>x_{n1}>x_{n2}>\ldots>x_{nd}$ . We note this lexicographic ordering of the variables also defines a total ordering of the monomials of degree $k$ in these variables, for every integer $k$ . Let us define the family of polynomials $y_{\ell,\bm{z}}(\bm{x}):=\sum_{j=1}^{d}z_{j}x_{\ell j}$ for every $\bm{z}=(z_{1},\ldots,z_{d})\in\mathbb{R}^{d}$ and $\ell\in[n]$ . Consequently, given $\sigma\in\mathcal{S}_{n}$ , its action is defined as $\sigma.y_{\ell,\bm{z}}(\bm{x})\coloneqq y_{\sigma^{-1}(\ell),\bm{z}}(\bm{x})$ . We make the following claim:

Claim B.1

Let $V=\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ and $G=\mathcal{S}_{n}$ . Then for every $\bm{z}\in\mathbb{R}^{d}$ , the ideal $\mathcal{I}_{n,d}\coloneqq\mathcal{O}(V)\mathcal{O}_{+}(V)^{\mathcal{S}_{n}}$ contains the set of polynomials

\left\{v_{k,n,\bm{z}}:=h_{k,n-k+1}(y_{k,\bm{z}},y_{k+1,\bm{z}},\ldots,y_{n,\bm{z}})\right\}_{k=1}^{n}.

(28)

Proof Let us fix an arbitrary $\bm{z}\in\mathbb{R}^{d}$ and note that the ring of real polynomials in $\{y_{1,\bm{z}},\dots,y_{n,\bm{z}}\}$ is a subring of $\mathcal{O}(V)$ . Also, every $\mathcal{S}_{n}$ -symmetric polynomial in $\{y_{1,\bm{z}},\dots,y_{n,\bm{z}}\}$ with zero constant term is in particular a totally symmetric polynomial in $\mathcal{O}(V)$ that vanishes at $\bm{x}=0$ , i.e. it is in $\mathcal{O}_{+}(V)^{\mathcal{S}_{n}}$ . This makes the ideal generated by the symmetric polynomials of positive degree in the ring $\mathbb{R}[y_{1,\bm{z}},\dots,y_{n,\bm{z}}]$ a subset of $\mathcal{I}_{n,d}$ . Then the observations from Mora and Sala (2003), mentioned above, complete the proof.

The set of polynomials defined in (28) can be used to define another important family of polynomials $p_{k,n,\bm{b}}\in\mathbb{R}[\bm{x}_{1},\dots,\bm{x}_{n}]$ , where $\bm{b}=(b_{1},\dots,b_{d})$ is a $d$ -tuple of non-negative integers satisfying $\sum_{\ell=1}^{d}b_{\ell}=k$ . They are obtained by the expansion of $v_{k,n,\bm{z}}$ (using the definition of $h_{k,n-k+1}$ ):

\begin{split}v_{k,n,\bm{z}}(\bm{x}_{k},\ldots,\bm{x}_{n})&=\sum_{a_{k}+\ldots+a_{n}=k}\left(\sum_{j=1}^{d}z_{j}x_{kj}\right)^{a_{k}}\left(\sum_{j=1}^{d}z_{j}x_{(k+1)j}\right)^{a_{k+1}}\cdots\left(\sum_{j=1}^{d}z_{j}x_{nj}\right)^{a_{n}}\\ &=\sum_{b_{1}+\ldots+b_{d}=k}z_{1}^{b_{1}}z_{2}^{b_{2}}\cdots z_{d}^{b_{d}}p_{k,n,\bm{b}}(\bm{x}_{k},\ldots,\bm{x}_{n}).\end{split}

(29)

In particular, $p_{k,n,\bm{b}}$ are homogeneous polynomials of degree $k$ ; we also infer that each $p_{k,n,\bm{b}}$ is in the ideal $\mathcal{I}_{n,d}$ , which will be proven using Lemma 30. First, for a fixed $k\in[n]$ , the set $\{\bm{b}=(b_{1},b_{2},\ldots,b_{d})\in\mathbb{Z}_{\geq 0}^{d}:|\bm{b}|=k\}$ has cardinality $N=\binom{k+d-1}{d-1}$ , which is the number of polynomials $p_{k,n,\bm{b}}$ in the expression $v_{k,n,\bm{z}}$ . Let us enumerate them as $\bm{b}_{1},\bm{b}_{2},\ldots,\bm{b}_{N}$ . Since we can choose $\bm{z}$ freely from $\mathbb{R}^{d}$ , we also have the following lemma:

Lemma 30

There exist points $\bm{z}_{1},\bm{z}_{2},\ldots,\bm{z}_{N}\in\mathbb{R}^{d}$ such that the $N\times N$ matrix $A_{N}$ , whose entries are given by $(A_{N})_{ij}:=\bm{z}_{i}^{\bm{b}_{j}}$ , is invertible.

Proof Let us fix a point $\bm{z}=(p_{1},\dots,p_{d})\in\mathbb{R}^{d}$ , which will be determined later. Define $\bm{z}_{i}=(p_{1}^{i-1},\ldots,p_{d}^{i-1})$ and $q_{i}=\bm{z}^{\bm{b}_{i}}$ for all $i\in[N]$ . With these choices, $(A_{N})_{ij}=q_{j}^{i-1}$ , so $A_{N}^{\top}$ is a Vandermonde matrix, and $(\det A_{N})(\bm{z})=\prod_{1\leq i<j\leq N}(q_{j}-q_{i})$ . This vanishes if and only if $q_{i}=q_{j}$ for some $i\neq j$ , and since we want $A_{N}$ to be invertible, we need to choose $\bm{z}$ such that $\prod_{1\leq i<j\leq N}(q_{j}-q_{i})\neq 0$ . Such a choice exists provided that $\prod_{1\leq i<j\leq N}(q_{j}-q_{i})$ is not identically the zero polynomial in $\mathbb{R}[\bm{z}]$ , since a nonzero real polynomial cannot vanish at every point of $\mathbb{R}^{d}$ .

The polynomial $(\det A_{N})(\bm{z})$ is identically zero if and only if at least one of its factors is the zero polynomial, i.e. $q_{j}=q_{i}$ (as polynomials) for some $i\neq j$ . (Recall that $\mathbb{R}[\bm{z}]$ is an integral domain (see Appendix A). This means that $fg=0$ for some $f,g\in\mathbb{R}[\bm{z}]$ if and only if $f=0$ or $g=0$ .) However, since the $\bm{b}_{i}$ in the enumeration are pairwise distinct, we must have $q_{i}=\bm{z}^{\bm{b}_{i}}\neq\bm{z}^{\bm{b}_{j}}=q_{j}$ (as polynomials) whenever $i\neq j$ . Hence there is at least one choice of $\bm{z}\in\mathbb{R}^{d}$ such that $(\det A_{N})(\bm{z})\neq 0$ .

Let us choose and fix values of $\bm{z}_{1},\bm{z}_{2},\ldots,\bm{z}_{N}$ to get an invertible matrix $A_{N}$ as in Lemma 30. Then for fixed $k$ and $n$ , using (29) we obtain a system of equations, written as $Y=A_{N}X$ , where $Y$ is the vector formed by the polynomials $\{v_{k,n,\bm{z}_{i}}\}_{i=1}^{N}$ , and $X$ is the vector formed by the polynomials $\{p_{k,n,\bm{b}_{i}}\}_{i=1}^{N}$ . Since every entry of $Y$ is in the ideal $\mathcal{I}_{n,d}$ (by Claim B.1) and $A_{N}^{-1}$ has real entries, every entry of the vector $X=A_{N}^{-1}Y$ is a real linear combination of entries of $Y$ and is therefore also in the ideal $\mathcal{I}_{n,d}$ . This finishes the proof that each $p_{k,n,\bm{b}}\in\mathcal{I}_{n,d}$ .
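The construction in the proof of Lemma 30 can be tried out numerically. The Python sketch below builds $A_{N}$ from a chosen point $(p_{1},\ldots,p_{d})$ , computes its determinant exactly over the rationals, and compares it with the Vandermonde product $\prod_{i<j}(q_{j}-q_{i})$ . The sample points and all function names are our own choices; any point whose monomial values $q_{i}$ are pairwise distinct would serve equally well.

```python
from fractions import Fraction
from itertools import product


def exponents(k, d):
    """All b in Z_{>=0}^d with |b| = k, in a fixed enumeration."""
    return [b for b in product(range(k + 1), repeat=d) if sum(b) == k]


def monomial(z, b):
    """z^b = z_1^{b_1} * ... * z_d^{b_d}."""
    out = 1
    for zi, bi in zip(z, b):
        out *= zi ** bi
    return out


def build_matrix(p, k, d):
    """(A_N)_{ij} = z_i^{b_j} with z_i = (p_1^{i-1}, ..., p_d^{i-1}), i = 1..N."""
    bs = exponents(k, d)
    zs = [tuple(pl ** i for pl in p) for i in range(len(bs))]
    return [[Fraction(monomial(z, b)) for b in bs] for z in zs]


def det(matrix):
    """Determinant by exact Gaussian elimination over the rationals."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def vandermonde_product(p, k, d):
    """prod_{i<j} (q_j - q_i) with q_i = p^{b_i}."""
    qs = [monomial(p, b) for b in exponents(k, d)]
    out = 1
    for j in range(len(qs)):
        for i in range(j):
            out *= qs[j] - qs[i]
    return out


if __name__ == "__main__":
    print(det(build_matrix((2, 3), 2, 2)), vandermonde_product((2, 3), 2, 2))
    print(det(build_matrix((1, 1), 2, 2)))  # a bad choice: all q_i coincide
```

Choosing distinct primes for the $p_{\ell}$ is one convenient way to guarantee distinct $q_{i}$ , by unique factorization; this is a convenience of the sketch rather than a requirement of the lemma.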

Next, notice that the leading monomial of $p_{k,n,\bm{b}}$ based on our lexicographic ordering is $\bm{x}_{k}^{\bm{b}}=x_{k1}^{b_{1}}x_{k2}^{b_{2}}\cdots x_{kd}^{b_{d}}$ , with a coefficient

C_{k,\bm{b}}=\frac{k!}{b_{1}!\cdots b_{d}!}.

This is true because the leading monomial requires $a_{k}=k$ , which implies that $a_{j}=0$ for $k<j\leq n$ . Finally, we can now prove our intended result:

Lemma 31

Let $V=\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ . If $f$ is a homogeneous polynomial in $\mathcal{O}(V)$ of degree $r>\binom{n}{2}$ , then $f\in\mathcal{O}(V)\mathcal{O}_{+}(V)^{\mathcal{S}_{n}}$ .

Proof We prove this by contradiction. Assume that there is at least one homogeneous polynomial of degree $r>\binom{n}{2}$ which is not in $\mathcal{I}_{n,d}=\mathcal{O}(V)\mathcal{O}_{+}(V)^{\mathcal{S}_{n}}$ . Among all such homogeneous polynomials of degree $r$ , choose $f$ such that its leading monomial is minimal with respect to our lexicographic ordering. Suppose the leading monomial of $f$ is

c\cdot\bm{x}_{1}^{\bm{a}_{1}}\bm{x}_{2}^{\bm{a}_{2}}\cdots\bm{x}_{n}^{\bm{a}_{n}},

where $|\bm{a}_{1}|+\ldots+|\bm{a}_{n}|=r=\text{deg}(f)$ , and $c\in\mathbb{R}$ . We note that there must exist a first $k\in[n]$ such that $|\bm{a}_{k}|\geq k$ (otherwise, if $|\bm{a}_{j}|<j$ for all $j\in[n]$ , then $r=|\bm{a}_{1}|+\ldots+|\bm{a}_{n}|\leq\sum_{j=1}^{n}(j-1)=\binom{n}{2}$ , contradicting our initial assumption on $\text{deg}(f)$ ). We fix this $k\in[n]$ and write $\bm{x}_{k}^{\bm{a}_{k}}=\bm{x}_{k}^{\bm{\alpha}}\bm{x}_{k}^{\bm{\beta}}$ , for some $\bm{\alpha},\bm{\beta}\in\mathbb{Z}_{\geq 0}^{d}$ with $|\bm{\beta}|=k$ . Finally, let us consider the following polynomial:

g=\frac{c\cdot\bm{x}_{1}^{\bm{a}_{1}}\bm{x}_{2}^{\bm{a}_{2}}\cdots\bm{x}_{n}^{\bm{a}_{n}}}{C_{k,\bm{\beta}}\cdot\bm{x}_{k}^{\bm{\alpha}}\bm{x}_{k}^{\bm{\beta}}}\bm{x}_{k}^{\bm{\alpha}}p_{k,n,\bm{\beta}},

where the leading monomial of $\bm{x}_{k}^{\bm{\alpha}}p_{k,n,\bm{\beta}}$ is $C_{k,\bm{\beta}}\cdot\bm{x}_{k}^{\bm{\alpha}}\bm{x}_{k}^{\bm{\beta}}$ . Thus the leading monomial of $g$ is the leading monomial of $f$ . Since $p_{k,n,\bm{\beta}}\in\mathcal{I}_{n,d}$ , we have $g\in\mathcal{I}_{n,d}$ implying $0\neq f-g\notin\mathcal{I}_{n,d}$ . However, according to this construction, the leading monomial of $f-g$ is less than the leading monomial of $f$ by the lexicographic order. This contradicts the minimality for the leading monomial of $f$ .
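The counting step in this proof can be checked on a small instance. The sketch below is our own illustration and takes $n=3$ and $d=1$ , values that are not singled out in the text; it lists the largest exponents that avoid the condition $|\bm{a}_{k}|\geq k$ and then locates the index $k$ for one monomial of degree $4$ .

```latex
% Illustrative instance only (assumed values: n = 3, d = 1, so each a_j is a single integer).
% Exponents with a_j < j for every j satisfy a_1 <= 0, a_2 <= 1, a_3 <= 2, hence
\[
  a_{1}+a_{2}+a_{3}\;\leq\;0+1+2\;=\;3\;=\;\binom{3}{2}.
\]
% So every monomial of degree r = 4 > 3 has some index k with a_k >= k. For example,
\[
  x_{1}^{0}x_{2}^{1}x_{3}^{3}:\qquad a_{1}=0<1,\qquad a_{2}=1<2,\qquad a_{3}=3\geq 3,
\]
% so the first such index is k = 3, and x_3^3 = x_3^{\alpha} x_3^{\beta} with \alpha = 0 and \beta = 3.
```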

Lemma 31 allows us to establish an upper bound on the maximum degree of any polynomial in $H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})$ . This is because $\mathcal{O}(V)=\mathcal{O}(V)\mathcal{O}_{+}(V)^{\mathcal{S}_{n}}\oplus H_{\mathcal{S}_{n}}(V)$ where $V=\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ , and $H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})$ is a homogeneous subspace by Lemma 29(c). A homogeneous element of $H_{\mathcal{S}_{n}}(V)$ of degree greater than $\binom{n}{2}$ would lie in $\mathcal{O}(V)\mathcal{O}_{+}(V)^{\mathcal{S}_{n}}$ by Lemma 31, and hence must be zero because the sum is direct. This immediately implies that $H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})$ is finite dimensional as an $\mathbb{R}$ -vector space, where every subspace $H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})^{(j)}$ is contained in the span of the monomials of degree $j$ and vanishes for $j>\binom{n}{2}$ . This result is recorded in Lemma 32. A couple of remarks about Lemma 31 are in order:

(i) The maximum degree $\binom{n}{2}$ of a polynomial in $H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})$ does not depend on $d$ .

(ii) Lemma 31 also holds for any polynomial $f$ all of whose monomials have degree greater than $\binom{n}{2}$ . This can be shown by expressing $f$ in terms of its homogeneous components.

Lemma 32

$H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})$ is a finite dimensional $\mathbb{R}$ -vector space. The maximum degree of a polynomial in $H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})$ is at most $\binom{n}{2}$ .
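To make Lemma 32 concrete, here is a minimal worked case for $n=2$ and $d=1$ (our choice of values). It assumes, in line with how $H_{\mathcal{S}_{n}}(V)$ is used in the proof of Lemma 36, that $H_{\mathcal{S}_{n}}(V)$ consists of the polynomials annihilated by $D_{p}$ for every $\mathcal{S}_{n}$ -invariant polynomial $p$ with zero constant term; the precise definition is given earlier in the paper and is not restated here.

```latex
% Assumed setting: n = 2, d = 1, variables x_1, x_2, G = S_2 swapping them.
% The invariants of positive degree are generated by x_1 + x_2 and x_1^2 + x_2^2,
% so (under the assumption stated above) f is harmonic iff
\[
  \Big(\frac{\partial}{\partial x_{1}}+\frac{\partial}{\partial x_{2}}\Big)f=0
  \qquad\text{and}\qquad
  \Big(\frac{\partial^{2}}{\partial x_{1}^{2}}+\frac{\partial^{2}}{\partial x_{2}^{2}}\Big)f=0 .
\]
% The first equation forces f(x_1, x_2) = g(x_1 - x_2) for a one-variable polynomial g;
% the second equation then reads 2 g'' = 0, so g is affine. Therefore
\[
  H_{\mathcal{S}_{2}}(\mathbb{R}^{2}\otimes\mathbb{R}^{1})
  =\operatorname{span}_{\mathbb{R}}\{\,1,\;x_{1}-x_{2}\,\},
  \qquad\text{with maximum degree } 1=\binom{2}{2}.
\]
```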

B.3 Module structure of $n$ -antisymmetric polynomials in terms of $H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})$

In this section, we establish a connection between the set of $n$ -antisymmetric polynomials $\mathcal{O}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})_{\chi}$ as an $\mathcal{O}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})^{\mathcal{S}_{n}}$ -module and the subspace $H_{\mathcal{S}_{n}}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})$ . This will allow us to quantify the minimum number of module generators of $\mathcal{O}(\mathbb{R}^{n}\otimes\mathbb{R}^{d})_{\chi}$ . We prove the first three results (Lemma 33, Proposition 34, and Lemma 35) in a general setting, where $V$ is an arbitrary vector space, $G$ is a finite subgroup of $GL(V)$ , and $\chi$ is a character of a representation of $G$ . When $V=\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ , $G=\mathcal{S}_{n}$ , and $\chi$ is the character of the sign representation of $\mathcal{S}_{n}$ , Lemma 35 yields the main result of this subsection.

Lemma 33

For every $k\in\mathbb{N}$ , $\mathcal{O}(V)^{(k)}\subset\mathcal{O}(V)^{G}H_{G}(V)$ . Thus, $\mathcal{O}(V)=\mathcal{O}(V)^{G}H_{G}(V)$ .

Proof We will use strong induction on $k$ for the proof. If $k=0$ , then the assertion is true since the constant polynomials are contained in $H_{G}(V)$ by definition. Assume the assertion holds for all $k\leq l$ . Since $\mathcal{O}(V)=\mathcal{O}(V)\mathcal{O}_{+}(V)^{G}\oplus H_{G}(V)$ , by comparing homogeneous parts of degree $l+1$ on both sides, we can say $\mathcal{O}(V)^{(l+1)}=\big{(}\mathcal{O}(V)\mathcal{O}_{+}(V)^{G}\big{)}^{(l+1)}\oplus H_{G}(V)^{(l+1)}$ . Now, $H_{G}(V)^{(l+1)}\subset\mathcal{O}(V)^{G}H_{G}(V)$ as $H_{G}(V)$ is homogeneous by Lemma 29(c), so we need to show that $\big{(}\mathcal{O}(V)\mathcal{O}_{+}(V)^{G}\big{)}^{(l+1)}\subset\mathcal{O}(V)^{G}H_{G}(V)$ . Now clearly $\big{(}\mathcal{O}(V)^{(k)}\mathcal{O}_{+}(V)^{G}\big{)}^{(l+1)}=\{0\}$ if $k>l$ , since every nonzero polynomial in $\mathcal{O}_{+}(V)^{G}$ has degree at least one. Therefore we conclude

\displaystyle\big{(}\mathcal{O}(V)\mathcal{O}_{+}(V)^{G}\big{)}^{(l+1)}=\sum_{k\leq l}\big{(}\mathcal{O}(V)^{(k)}\mathcal{O}_{+}(V)^{G}\big{)}^{(l+1)}\overset{(a)}{\subset}\sum_{k\leq l}\mathcal{O}(V)^{G}\mathcal{O}(V)^{(k)}\overset{(b)}{\subset}\mathcal{O}(V)^{G}H_{G}(V).

In the equation above, $\big{(}\mathcal{O}(V)^{(k)}\mathcal{O}_{+}(V)^{G}\big{)}^{(l+1)}=\mathcal{O}(V)^{(k)}\big{(}\mathcal{O}_{+}(V)^{G}\big{)}^{(j)}$ with $j+k=l+1$ . Moreover, for all $j\leq l+1$ , $(\mathcal{O}_{+}(V)^{G})^{(j)}\subset\mathcal{O}_{+}(V)^{G}\subset\mathcal{O}(V)^{G}$ because $\mathcal{O}_{+}(V)^{G}$ and $\mathcal{O}(V)^{G}$ are both homogeneous subspaces by Lemma 29(c). This explains the inclusion $(a)$ . Then $(b)$ follows from the inductive hypothesis, which gives $\mathcal{O}(V)^{(k)}\subset\mathcal{O}(V)^{G}H_{G}(V)$ for every $k\leq l$ , together with the fact that $\mathcal{O}(V)^{G}$ is closed under multiplication. Hence, we are done.
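As an illustration of Lemma 33, the following sketch decomposes two monomials in the smallest nontrivial case. It assumes $n=2$ , $d=1$ , $G=\mathcal{S}_{2}$ , and that $H_{G}(V)=\operatorname{span}_{\mathbb{R}}\{1,x_{1}-x_{2}\}$ in this case; the symbols $e_{1},e_{2}$ are introduced here only for the example.

```latex
% Assumed setting: n = 2, d = 1, G = S_2, with H_G(V) = span{1, x_1 - x_2} taken as given.
% Notation for this example only: e_1 = x_1 + x_2 and e_2 = x_1 x_2 (both S_2-invariant).
\begin{align*}
  x_{1}     &= \tfrac{1}{2}e_{1}\cdot 1+\tfrac{1}{2}\cdot(x_{1}-x_{2}),\\
  x_{1}^{2} &= \big(\tfrac{1}{2}e_{1}^{2}-e_{2}\big)\cdot 1+\tfrac{1}{2}e_{1}\cdot(x_{1}-x_{2}).
\end{align*}
% Check of the second line: (1/2)e_1^2 - e_2 = (x_1^2 + x_2^2)/2 and (1/2)e_1(x_1 - x_2) = (x_1^2 - x_2^2)/2,
% and these two expressions add up to x_1^2.
```

In both lines each coefficient is $\mathcal{S}_{2}$ -invariant and each factor on the right lies in the assumed $H_{G}(V)$ , which is the shape of decomposition the lemma asserts.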

Proposition 34

$H_{G}(V)\cap\mathcal{O}(V)_{\chi}$ as an $\mathbb{R}$ -vector space has a homogeneous polynomial basis.

Proof

By Lemma 29(c), both $H_{G}(V)$ and $\mathcal{O}(V)_{\chi}$ are homogeneous subspaces of $\mathcal{O}(V)$ , which means $H_{G}(V)\cap\mathcal{O}(V)_{\chi}$ is also homogeneous. Then $\phi\in H_{G}(V)\cap\mathcal{O}(V)_{\chi}$ if and only if every homogeneous component of $\phi$ is in the intersection. Since every vector space has a basis, let $\mathcal{V}=\{\phi_{i}\}_{i\in\mathcal{I}}$ (for some index set $\mathcal{I}$ ) be a basis of $H_{G}(V)\cap\mathcal{O}(V)_{\chi}$ , and let $\{\psi_{ij}\}_{i\in\mathcal{I},j\in\mathbb{N}}$ be the set of all homogeneous components of all the elements in $\mathcal{V}$ , i.e. $\psi_{ij}=\phi_{i}^{(j)}$ . Then it is clear that $\{\psi_{ij}\}_{i\in\mathcal{I},j\in\mathbb{N}}$ is a spanning set for $H_{G}(V)\cap\mathcal{O}(V)_{\chi}$ . Since every spanning set of a vector space contains a basis, our proposition is proven.

Lemma 35

If $\chi$ is the character of a one-dimensional, real representation of $G$ , then $\mathcal{O}(V)_{\chi}=\mathcal{O}(V)^{G}(H_{G}(V)\cap\mathcal{O}(V)_{\chi})$ . In other words, $H_{G}(V)\cap\mathcal{O}(V)_{\chi}$ generates $\mathcal{O}(V)_{\chi}$ as an $\mathcal{O}(V)^{G}$ -module.

Proof Let $P_{\chi}:\mathcal{O}(V)\to\mathcal{O}(V)_{\chi}$ be the linear operator given by

\displaystyle(P_{\chi}f)(v)=\frac{1}{|G|}\sum_{s\in G}\chi(s)f(s.v),\;\;f\in\mathcal{O}(V),v\in V.

Our first goal is to prove that $P_{\chi}$ is a projection operator. To make sure $P_{\chi}$ maps $\mathcal{O}(V)$ into $\mathcal{O}(V)_{\chi}$ , we need $\chi$ to be the character of a one-dimensional, real representation $\rho$ of $G$ , because then we have $\chi(g)\coloneqq\text{tr}(\rho(g))=\rho(g)$ for all $g\in G$ . Given that $G$ is a finite group, any $g\in G$ must have finite order, and hence $\rho(g)$ must be a real root of unity, i.e. $\rho(g)\in\{1,-1\}$ . So we must have $\rho(g)=\chi(g)=\chi(g^{-1})=\rho(g^{-1})$ . In that case, for any $g\in G$ , we have for all $v\in V$ ,

\begin{aligned}(P_{\chi}f)(g.v)&=\frac{1}{|G|}\sum_{s\in G}\chi(s)f(s.g.v)=\frac{1}{|G|}\sum_{h\in G}\chi(hg^{-1})f(h.v)=\frac{1}{|G|}\sum_{h\in G}\rho(hg^{-1})f(h.v)\\
&=\frac{1}{|G|}\sum_{h\in G}\rho(h)\rho(g^{-1})f(h.v)=\frac{\rho(g)}{|G|}\sum_{h\in G}\chi(h)f(h.v)=\chi(g)(P_{\chi}f)(v).\end{aligned}

Next, if $f\in\mathcal{O}(V)_{\chi}$ , then $(P_{\chi}f)(v)=\frac{1}{|G|}\sum_{s\in G}\chi(s)f(s.v)=\frac{1}{|G|}f(v)\sum_{s\in G}\chi(s)^{2}=f(v)$ . The last equality follows as $\chi(s)$ is a real root of unity, giving us $\chi(s)^{2}=1$ . This completes the proof for $P_{\chi}$ being a projection operator.

Next, $P_{\chi}$ commutes with any $G$ -invariant, constant coefficient differential operator on $\mathcal{O}(V)$ , so $P_{\chi}H_{G}(V)=H_{G}(V)\cap\mathcal{O}(V)_{\chi}$ . By Lemma 33, any $f\in\mathcal{O}(V)_{\chi}$ can be written as $f=\sum_{i=1}^{M}u_{i}h_{i}$ , where $u_{i}\in\mathcal{O}(V)^{G}$ and $h_{i}\in H_{G}(V)$ for all $i$ . Applying $P_{\chi}$ , we have for all $v\in V$ ,

\begin{aligned}f(v)=(P_{\chi}f)(v)&=\frac{1}{|G|}\sum_{s\in G}\chi(s)f(s.v)=\frac{1}{|G|}\sum_{s\in G}\chi(s)\left(\sum_{i=1}^{M}u_{i}(s.v)h_{i}(s.v)\right)\\
&=\frac{1}{|G|}\sum_{s\in G}\chi(s)\left(\sum_{i=1}^{M}u_{i}(v)h_{i}(s.v)\right)=\sum_{i=1}^{M}u_{i}(v)\left(\frac{1}{|G|}\sum_{s\in G}\chi(s)h_{i}(s.v)\right)\\
&=\sum_{i=1}^{M}u_{i}(v)(P_{\chi}h_{i})(v),\end{aligned}

which completes the proof, since each $P_{\chi}h_{i}$ lies in $P_{\chi}H_{G}(V)=H_{G}(V)\cap\mathcal{O}(V)_{\chi}$ .
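The mechanism of this proof can be seen in a small case. The sketch below is our own illustration, assuming $G=\mathcal{S}_{2}$ acting on $(x_{1},x_{2})$ by swapping, $\chi=\text{sgn}$ , and $H_{G}(V)=\operatorname{span}_{\mathbb{R}}\{1,x_{1}-x_{2}\}$ ; the particular decomposition of $f$ is deliberately redundant and is chosen only to show how $P_{\text{sgn}}$ removes the non-antisymmetric part of each $h_{i}$ .

```latex
% Assumed setting: G = S_2 swapping x_1 and x_2, \chi = sgn, H_G(V) = span{1, x_1 - x_2}.
\[
  (P_{\text{sgn}}f)(x_{1},x_{2})=\tfrac{1}{2}\big(f(x_{1},x_{2})-f(x_{2},x_{1})\big).
\]
% Take the antisymmetric polynomial f = x_1^2 - x_2^2 and the (illustrative) decomposition
% f = u_1 h_1 + u_2 h_2 with u_1 = x_1 + x_2, h_1 = 1 + (x_1 - x_2), u_2 = -(x_1 + x_2), h_2 = 1.
\begin{align*}
  P_{\text{sgn}}h_{1} &= x_{1}-x_{2}, \qquad P_{\text{sgn}}h_{2}=0,\\
  u_{1}\,P_{\text{sgn}}h_{1}+u_{2}\,P_{\text{sgn}}h_{2} &= (x_{1}+x_{2})(x_{1}-x_{2}) = x_{1}^{2}-x_{2}^{2} = f .
\end{align*}
```

After projection, only the antisymmetric element $x_{1}-x_{2}$ of the assumed $H_{G}(V)$ is needed, with the invariant coefficient $x_{1}+x_{2}$ .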

It is worth noting that Lemma 35 is also true when $\chi$ is the character of a one-dimensional, complex representation of $G$ and $V$ is $\mathbb{C}^{n}\otimes\mathbb{C}^{d}$ . In that case, we define $P_{\chi}$ as $P_{\chi}(f(v))=\frac{1}{|G|}\sum_{s\in G}\overline{\chi(s)}f(s\cdot v)$ . Similarly, we will have $\overline{\chi(g)}=\chi(g^{-1})$ , and for $f\in\mathcal{O}(V)_{\chi}$ , we get $P_{\chi}(f(v))=\frac{1}{|G|}f(v)\sum_{s\in G}|\chi(s)|^{2}=f(v)$ .

Let us specialize the setting to our context: $V=\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ , $G=\mathcal{S}_{n}$ , and $\chi=\text{sgn}$ . In Lemma 28, we proved that $\mathcal{O}(V)_{\text{sgn}}$ is a finitely generated $\mathcal{O}(V)^{\mathcal{S}_{n}}$ -module. Then in Lemma 35, we showed that $H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}}$ generates $\mathcal{O}(V)_{\text{sgn}}$ as an $\mathcal{O}(V)^{\mathcal{S}_{n}}$ -module. Thus, any basis of $H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}}$ (as an $\mathbb{R}$ -vector space) can be taken as a set of module generators of $\mathcal{O}(V)_{\text{sgn}}$ , and this basis can be taken to be homogeneous by Proposition 34. Furthermore, Lemma 32 shows that this basis is finite, which agrees with Lemma 28. Thus, the dimension of the $\mathbb{R}$ -vector space $H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}}$ gives us an upper bound on the minimum number of module generators for $\mathcal{O}(V)_{\text{sgn}}$ as an $\mathcal{O}(V)^{\mathcal{S}_{n}}$ -module. In fact, as the following lemma shows, it is exactly equal to the minimum number of module generators.

Lemma 36

Let $V=\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ , $G=\mathcal{S}_{n}$ and $\chi=\text{sgn}$ . Then the minimum number of generators needed to generate $\mathcal{O}(V)_{\text{sgn}}$ as a module over $\mathcal{O}(V)^{\mathcal{S}_{n}}$ is $\text{dim}(H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}})$ .

Proof If $\{f_{1},\ldots,f_{r}\}$ is a minimum-size set of generators for $\mathcal{O}(V)_{\text{sgn}}$ as an $\mathcal{O}(V)^{\mathcal{S}_{n}}$ -module, then by the minimality assumption, they must be linearly independent. Let $W$ be their linear span. We will first show that $\mathcal{O}(V)_{\text{sgn}}=W+\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}W$ . Suppose $\phi\in\mathcal{O}(V)_{\text{sgn}}$ . Then $\phi=\sum_{i=1}^{r}u_{i}f_{i}$ , where $u_{i}\in\mathcal{O}(V)^{\mathcal{S}_{n}}$ for every $i$ . We write each of the $u_{i}$ in terms of its homogeneous components as $u_{i}=\sum_{j\geq 0}u_{i}^{(j)}$ . Hence, we get $\phi=\sum_{i=1}^{r}u_{i}^{(0)}f_{i}+\sum_{i=1}^{r}(\sum_{j\geq 1}u_{i}^{(j)})f_{i}$ ; in this expression, the first term is in $W$ , and the second term is in $\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}W$ , because $u_{i}^{(j)}\in\mathcal{O}(V)^{\mathcal{S}_{n}}$ for all $i\in[r]$ and $j\geq 1$ by Lemma 29(c). One can easily verify that $\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}W=\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}\mathcal{O}(V)_{\text{sgn}}$ . Thus,

\text{dim}(\mathcal{O}(V)_{\text{sgn}}/\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}\mathcal{O}(V)_{\text{sgn}})\leq r.

(30)

We next claim that

\mathcal{O}(V)_{\text{sgn}}=\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}\mathcal{O}(V)_{\text{sgn}}\oplus\big{(}H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}}\big{)}.

(31)

We prove this claim at the end, but notice that this claim implies the result. This is true because from (30) and (31), we have $\text{dim}(H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}})\leq r$ ; however, also by the minimality assumption on $\{f_{1},\dots,f_{r}\}$ , we get from the comments immediately preceding this lemma, that $r\leq\text{dim}(H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}})$ .

It remains to prove the claim (31). We first note that $\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}\mathcal{O}(V)_{\text{sgn}}$ is a subspace of $\mathcal{O}(V)_{\text{sgn}}$ . Let $Z$ be the subspace of $\mathcal{O}(V)_{\text{sgn}}$ orthogonal to $\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}\mathcal{O}(V)_{\text{sgn}}$ with respect to the inner product $\langle\cdot,\cdot\rangle_{\mathcal{O}(V)}$ , so that we have the direct sum decomposition $\mathcal{O}(V)_{\text{sgn}}=\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}\mathcal{O}(V)_{\text{sgn}}\oplus Z$ . We will show that $Z=H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}}$ . Let us make two important observations:

(i) For any $p\in\mathcal{O}(V)$ , let $D_{p}$ denote the constant coefficient differential operator obtained by replacing each $x_{ij}$ in $p$ by $\frac{\partial}{\partial x_{ij}}$ for all $i\in[n]$ , $j\in[d]$ . Then for all $f,g,p\in\mathcal{O}(V)$ , we have $\langle D_{p}f,g\rangle_{\mathcal{O}(V)}=\langle f,pg\rangle_{\mathcal{O}(V)}$ (cf. Wallach, 2017, Lemma 3.105).

(ii) By Lemma 13, for any $p\in\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}$ and $f\in\mathcal{O}(V)_{\text{sgn}}$ , we have $D_{p}f\in\mathcal{O}(V)_{\text{sgn}}$ .
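The adjointness in observation (i) can be checked by hand in one variable. The sketch below assumes the normalization $\langle f,g\rangle_{\mathcal{O}(V)}=(D_{f}g)(0)$ , under which distinct monomials are orthogonal and $\langle x^{a},x^{a}\rangle=a!$ ; the inner product is defined earlier in the paper, so this normalization is an assumption made here for illustration and may differ from the paper's by a convention.

```latex
% Assumed normalization (for this example only): <f, g> = (D_f g)(0), so <x^a, x^b> = a! if a = b and 0 otherwise.
% One variable x, with p = x, f = x^2, g = x, so that D_p = d/dx.
\begin{align*}
  \langle D_{p}f,\,g\rangle &= \langle 2x,\,x\rangle = 2\cdot 1! = 2,\\
  \langle f,\,pg\rangle     &= \langle x^{2},\,x^{2}\rangle = 2! = 2 .
\end{align*}
% Both sides agree, as observation (i) predicts: multiplication by p is adjoint to D_p.
```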

Now, for all $f\in Z$ and $p\in\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}$ , $\langle D_{p}f,D_{p}f\rangle_{\mathcal{O}(V)}=\langle f,pD_{p}f\rangle_{\mathcal{O}(V)}=0$ using observations (i) and (ii), since $pD_{p}f\in\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}\mathcal{O}(V)_{\text{sgn}}$ is orthogonal to $Z$ ; hence, $D_{p}f=0$ , or equivalently $f\in H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}}$ . This shows $Z\subseteq H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}}$ . For the converse, assume that $f\in H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}}$ , and we want to show $f\in Z$ . Take any $g\in\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}\mathcal{O}(V)_{\text{sgn}}$ , i.e. it can be expressed as a finite sum $g=\sum_{i}u_{i}g_{i}$ , where $u_{i}\in\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}$ and $g_{i}\in\mathcal{O}(V)_{\text{sgn}}$ for every $i$ . By observation (i),

\langle f,g\rangle_{\mathcal{O}(V)}=\sum_{i}\langle f,u_{i}g_{i}\rangle_{\mathcal{O}(V)}=\sum_{i}\langle D_{u_{i}}f,g_{i}\rangle_{\mathcal{O}(V)}=0,

where the last equality follows because $u_{i}\in\mathcal{O}(V)^{\mathcal{S}_{n}}_{+}$ implies that $D_{u_{i}}$ is an $\mathcal{S}_{n}$ -invariant, constant coefficient differential operator that annihilates polynomials in $H_{\mathcal{S}_{n}}(V)$ . This proves $f\in Z$ .

Finally, we establish the main theorem using the results in this appendix:

Theorem 37

Let $V=\mathbb{R}^{n}\otimes\mathbb{R}^{d}$ , $G=\mathcal{S}_{n}$ , and $\chi$ be the character of the sign representation of $\mathcal{S}_{n}$ . Then $\mathcal{O}(V)_{\text{sgn}}=\mathcal{O}(V)^{\mathcal{S}_{n}}(H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}})$ , i.e. $H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}}$ generates $\mathcal{O}(V)_{\text{sgn}}$ as an $\mathcal{O}(V)^{\mathcal{S}_{n}}$ -module. The maximum degree of any polynomial in $H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}}$ is $\binom{n}{2}$ . The minimum number of module generators of $\mathcal{O}(V)_{\text{sgn}}$ is equal to the dimension of $H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}}$ as an $\mathbb{R}$ -vector space, and the module generators can be chosen to be a homogeneous polynomial basis of $H_{\mathcal{S}_{n}}(V)\cap\mathcal{O}(V)_{\text{sgn}}$ .
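For orientation, the statement of Theorem 37 can be worked out by hand in the smallest multi-coordinate case. The sketch below is our own illustration for $n=2$ and $d=2$ ; the symbols $s_{j}$ and $y_{j}$ are introduced only for this example, and the description of $H_{\mathcal{S}_{2}}(V)$ relies on the harmonic-polynomial characterization used in the proof of Lemma 36.

```latex
% Assumed setting: n = 2, d = 2, variables x_{ij} (row i = 1, 2; coordinate j = 1, 2),
% with S_2 swapping the two rows. Notation for this example only:
%   s_j = x_{1j} + x_{2j} (symmetric),   y_j = x_{1j} - x_{2j} (antisymmetric),   j = 1, 2.
% A polynomial in (s_1, s_2, y_1, y_2) is antisymmetric iff every monomial has odd total degree in y.
% Pulling one factor y_j out of each such monomial leaves an invariant cofactor, so
\[
  \mathcal{O}(V)_{\text{sgn}}
  =\mathcal{O}(V)^{\mathcal{S}_{2}}\,y_{1}+\mathcal{O}(V)^{\mathcal{S}_{2}}\,y_{2},
  \qquad
  H_{\mathcal{S}_{2}}(V)\cap\mathcal{O}(V)_{\text{sgn}}
  =\operatorname{span}_{\mathbb{R}}\{\,y_{1},\;y_{2}\,\}.
\]
% Both generators are homogeneous of degree 1 = \binom{2}{2}, and the minimum number of
% module generators in this case is dim span{y_1, y_2} = 2.
```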

References

Abrahamsen and Lin (2023) Nilin Abrahamsen and Lin Lin. Anti-symmetric barron functions and their approximation with sums of determinants, 2023.
Adeel et al. (2020) Ahsan Adeel, Mandar Gogate, and Amir Hussain. Contextual deep learning-based audio-visual switching for speech enhancement in real-world environments. Information Fusion, 59:163–170, 2020. ISSN 1566-2535. doi: 10.1016/j.inffus.2019.08.008. URL https://www.sciencedirect.com/science/article/pii/S1566253518306018.
Alzubaidi et al. (2021) Laith Alzubaidi, Jinglan Zhang, Amjad J. Humaidi, Ayad Al-Dujaili, Ye Duan, Omran Al-Shamma, J. Santamaría, Mohammed A. Fadhel, Muthana Al-Amidie, and Laith Farhan. Review of deep learning: concepts, cnn architectures, challenges, applications, future directions. Journal of Big Data, 8(1):53, Mar 2021. ISSN 2196-1115. doi: 10.1186/s40537-021-00444-8. URL https://doi.org/10.1186/s40537-021-00444-8.
Asif et al. (2021) Nurul A. Asif, Yeahia Sarker, Ripon K. Chakrabortty, Michael J. Ryan, Md. Hafiz Ahamed, Dip K. Saha, Faisal R. Badal, Sajal K. Das, Md. Firoz Ali, Sumaya I. Moyeen, Md. Robiul Islam, and Zinat Tasneem. Graph neural network: A comprehensive review on non-euclidean space. IEEE Access, 9:60588–60606, 2021. doi: 10.1109/ACCESS.2021.3071274.
Atiyah and Macdonald (1969) M. F. Atiyah and I. G. Macdonald. Introduction to commutative algebra. Addison-Wesley Publishing Co., Reading, Mass.-London-Don Mills, Ont., 1969.
Barron (1993) A.R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, 1993. doi: 10.1109/18.256500.
Bergeron (2009) François Bergeron. Algebraic combinatorics and coinvariant spaces. CMS Treatises in Mathematics. Canadian Mathematical Society, Ottawa, ON; A K Peters, Ltd., Wellesley, MA, 2009. ISBN 978-1-56881-324-0. doi: 10.1201/b10583. URL https://doi.org/10.1201/b10583.
Briand (2004) Emmanuel Briand. When is the algebra of multisymmetric polynomials generated by the elementary multisymmetric polynomials? Beiträge Algebra Geom., 45(2):353–368, 2004. ISSN 0138-4821.
Bronstein et al. (2021) Michael M Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478, 2021.
Cauchy (1905) Augustin Louis Cauchy. Œuvres complètes. Series 2. Volume 1. Cambridge Library Collection. Cambridge University Press, Cambridge, 1905. ISBN 978-1-108-00290-5. Reprint of the 1905 original.
Cayton (2004) Lawrence Cayton. Algorithms for manifold learning. Technical Report CS2008-0923, UCSD, 2004.
Chen et al. (2023) Chongyao Chen, Ziang Chen, and Jianfeng Lu. Representation theorem for multivariable totally symmetric functions, 2023.
Chen and Lu (2023) Ziang Chen and Jianfeng Lu. Exact and efficient representation of totally anti-symmetric functions, 2023.
Choo et al. (2020) Kenny Choo, Antonio Mezzacapo, and Giuseppe Carleo. Fermionic neural-network states for ab-initio electronic structure. Nature Communications, 11(1), May 2020. ISSN 2041-1723. doi: 10.1038/s41467-020-15724-9. URL http://dx.doi.org/10.1038/s41467-020-15724-9.
Cybenko (1989) George V. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2:303–314, 1989. URL https://api.semanticscholar.org/CorpusID:3958369.
Deng and Liu (2018) Li Deng and Yang Liu. A Joint Introduction to Natural Language Processing and to Deep Learning. Springer Singapore, Singapore, 2018. ISBN 978-981-10-5209-5. doi: 10.1007/978-981-10-5209-5. URL https://doi.org/10.1007/978-981-10-5209-5.
Deng et al. (2013) Li Deng, Jinyu Li, Jui-Ting Huang, Kaisheng Yao, Dong Yu, Frank Seide, Michael Seltzer, Geoff Zweig, Xiaodong He, Jason Williams, Yifan Gong, and Alex Acero. Recent advances in deep learning for speech research at microsoft. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 8604–8608, 2013. doi: 10.1109/ICASSP.2013.6639345.
Entezari et al. (2022) Rahim Entezari, Hanie Sedghi, Olga Saukh, and Behnam Neyshabur. The role of permutation invariance in linear mode connectivity of neural networks, 2022.
Espi et al. (2015) Miquel Espi, Masakiyo Fujimoto, Keisuke Kinoshita, and Tomohiro Nakatani. Exploiting spectro-temporal locality in deep learning based acoustic event detection. EURASIP Journal on Audio, Speech, and Music Processing, 2015(1):26, Sep 2015. ISSN 1687-4722. doi: 10.1186/s13636-015-0069-2. URL https://doi.org/10.1186/s13636-015-0069-2.
Garbin et al. (2020) Christian Garbin, Xingquan Zhu, and Oge Marques. Dropout vs. batch normalization: an empirical study of their impact to deep learning. Multimedia Tools and Applications, 79(19):12777–12815, May 2020. ISSN 1573-7721. doi: 10.1007/s11042-019-08453-9. URL https://doi.org/10.1007/s11042-019-08453-9.
Garsia and Haiman (1996) A. M. Garsia and M. Haiman. A remarkable $q,t$ -Catalan sequence and $q$ -Lagrange inversion. J. Algebraic Combin., 5(3):191–244, 1996. ISSN 0925-9899,1572-9192. doi: 10.1023/A:1022476211638. URL https://doi.org/10.1023/A:1022476211638.
Goodfellow et al. (2014) Ian Goodfellow, Yaroslav Bulatov, Julian Ibarz, Sacha Arnoud, and Vinay Shet. Multi-digit number recognition from street view imagery using deep convolutional neural networks. In ICLR2014, 2014.
Haglund et al. (2004) J. Haglund, M. Haiman, N. Loehr, J. B. Remmel, and A. Ulyanov. A combinatorial formula for the character of the diagonal coinvariants, 2004.
Haiman (2003) Mark Haiman. Combinatorics, symmetric functions, and Hilbert schemes. In Current developments in mathematics, 2002, pages 39–111. Int. Press, Somerville, MA, 2003. ISBN 1-57146-102-7.
Haiman (1994) Mark D. Haiman. Conjectures on the quotient ring by diagonal invariants. Journal of Algebraic Combinatorics, 3:17–76, 1994. URL https://api.semanticscholar.org/CorpusID:16526954.
Han et al. (2019) Jiequn Han, Linfeng Zhang, and Weinan E. Solving many-electron schrödinger equation using deep neural networks. Journal of Computational Physics, 399:108929, December 2019. ISSN 0021-9991. doi: 10.1016/j.jcp.2019.108929. URL http://dx.doi.org/10.1016/j.jcp.2019.108929.
Han et al. (2022) Jiequn Han, Yingzhou Li, Lin Lin, Jianfeng Lu, Jiefu Zhang, and Linfeng Zhang. Universal approximation of symmetric and anti-symmetric functions, 2022.
Hermann et al. (2020) Jan Hermann, Zeno Schätzle, and Frank Noé. Deep-neural-network solution of the electronic schrödinger equation. Nature Chemistry, 12(10):891–897, September 2020. ISSN 1755-4349. doi: 10.1038/s41557-020-0544-y. URL http://dx.doi.org/10.1038/s41557-020-0544-y.
Hornik (1991) Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991. ISSN 0893-6080. doi: 10.1016/0893-6080(91)90009-T. URL https://www.sciencedirect.com/science/article/pii/089360809190009T.
Hutter (2020) Marcus Hutter. On representing (anti)symmetric functions, 2020.
Jegelka (2022) Stefanie Jegelka. Theory of graph neural networks: Representation and learning, 2022.
Kane (2001) Richard Kane. Reflection groups and invariant theory, volume 5 of CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer-Verlag, New York, 2001. ISBN 0-387-98979-X. doi: 10.1007/978-1-4757-3542-0. URL https://doi.org/10.1007/978-1-4757-3542-0.
Khurana et al. (2021) Lokesh Khurana, Arun Chauhan, Mohd Naved, and Prabhishek Singh. Speech recognition with deep learning. Journal of Physics: Conference Series, 1854(1):012047, apr 2021. doi: 10.1088/1742-6596/1854/1/012047. URL https://dx.doi.org/10.1088/1742-6596/1854/1/012047.
Klus et al. (2021) Stefan Klus, Patrick Gelß, Feliks Nüske, and Frank Noé. Symmetric and antisymmetric kernels for machine learning problems in quantum physics and chemistry. Machine Learning: Science and Technology, 2(4):045016, August 2021. ISSN 2632-2153. doi: 10.1088/2632-2153/ac14ad. URL http://dx.doi.org/10.1088/2632-2153/ac14ad.
Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012. URL https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
Laird and Saul (1994) P. Laird and R. Saul. Automated feature extraction for supervised learning. In Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE World Congress on Computational Intelligence, pages 674–679 vol.2, 1994. doi: 10.1109/ICEC.1994.349977.
Lauriola et al. (2022) Ivano Lauriola, Alberto Lavelli, and Fabio Aiolli. An introduction to deep learning in natural language processing: Models, techniques, and tools. Neurocomputing, 470:443–456, 2022. ISSN 0925-2312. doi: 10.1016/j.neucom.2021.05.103. URL https://www.sciencedirect.com/science/article/pii/S0925231221010997.
Li et al. (2022) Zewen Li, Fan Liu, Wenjie Yang, Shouheng Peng, and Jun Zhou. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning Systems, 33(12):6999–7019, 2022. doi: 10.1109/TNNLS.2021.3084827.
Loshchilov and Hutter (2019) Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7.
Luo and Clark (2019) Di Luo and Bryan K. Clark. Backflow transformations via neural networks for quantum many-body wave functions. Physical Review Letters, 122(22), June 2019. ISSN 1079-7114. doi: 10.1103/physrevlett.122.226401. URL http://dx.doi.org/10.1103/PhysRevLett.122.226401.
Matsumura (1989) Hideyuki Matsumura. Commutative ring theory, volume 8 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, second edition, 1989. ISBN 0-521-36764-6. Translated from the Japanese by M. Reid.
Maziarka et al. (2019) Łukasz Maziarka, Marek Śmieja, Aleksandra Nowak, Jacek Tabor, Łukasz Struski, and Przemysław Spurek. Set Aggregation Network as a Trainable Pooling Layer, page 419–431. Springer International Publishing, 2019. ISBN 9783030367114. doi: 10.1007/978-3-030-36711-4_35. URL http://dx.doi.org/10.1007/978-3-030-36711-4_35.
Mora and Sala (2003) Teo Mora and Massimiliano Sala. On the gröbner bases of some symmetric systems and their application to coding theory. Journal of Symbolic Computation, 35(2):177–194, 2003. ISSN 0747-7171. doi: 10.1016/S0747-7171(02)00131-1. URL https://www.sciencedirect.com/science/article/pii/S0747717102001311.
Muhammad et al. (2021) Khan Muhammad, Salman Khan, Javier Del Ser, and Victor Hugo C. de Albuquerque. Deep learning for multigrade brain tumor classification in smart healthcare systems: A prospective survey. IEEE Transactions on Neural Networks and Learning Systems, 32(2):507–522, 2021. doi: 10.1109/TNNLS.2020.2995800.
Murphy et al. (2019) Ryan L. Murphy, Balasubramaniam Srinivasan, Vinayak Rao, and Bruno Ribeiro. Janossy pooling: Learning deep permutation-invariant functions for variable-size inputs, 2019.
Nachbin (1949) Leopoldo Nachbin. Sur les algebres denses de fonctions différentiables sur une variété. Comptes Rendus de l’Académie des Sciences de Paris, 228:1549–1551, 1949.
Otter et al. (2021) Daniel W. Otter, Julian R. Medina, and Jugal K. Kalita. A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems, 32(2):604–624, 2021. doi: 10.1109/TNNLS.2020.2979670.
Pfau et al. (2020) David Pfau, James S. Spencer, Alexander G. D. G. Matthews, and W. M. C. Foulkes. Ab initio solution of the many-electron schrödinger equation with deep neural networks. Physical Review Research, 2(3), September 2020. ISSN 2643-1564. doi: 10.1103/physrevresearch.2.033429. URL http://dx.doi.org/10.1103/PhysRevResearch.2.033429.
Prolla and Guerreiro (1976) João B. Prolla and Claudia S. Guerreiro. An extension of nachbin’s theorem to differentiable functions on banach spaces with the approximation property. Arkiv för Matematik, 14:251–258, 1976. URL https://api.semanticscholar.org/CorpusID:120704067.
Qi et al. (2017a) Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation, 2017a.
Qi et al. (2017b) Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space, 2017b.
Ramos-Pérez et al. (2021) Eduardo Ramos-Pérez, Pablo J. Alonso-González, and José Javier Núñez-Velázquez. Multi-transformer: A new neural network-based architecture for forecasting s&p volatility. Mathematics, 9(15), 2021. ISSN 2227-7390. doi: 10.3390/math9151794. URL https://www.mdpi.com/2227-7390/9/15/1794.
Sagan (2001) Bruce E. Sagan. The symmetric group, volume 203 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 2001. ISBN 0-387-95067-2. doi: 10.1007/978-1-4757-6804-6. URL https://doi.org/10.1007/978-1-4757-6804-6. Representations, combinatorial algorithms, and symmetric functions.
Shao et al. (2019) Taihua Shao, Yupu Guo, Honghui Chen, and Zepeng Hao. Transformer-based neural network for answer selection in question answering. IEEE Access, 7:26146–26156, 2019. doi: 10.1109/ACCESS.2019.2900753.
Silver et al. (2017) David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of go without human knowledge. Nature, 550(7676):354–359, Oct 2017. ISSN 1476-4687. doi: 10.1038/nature24270. URL https://doi.org/10.1038/nature24270.
Soelch et al. (2019) Maximilian Soelch, Adnan Akhundov, Patrick van der Smagt, and Justin Bayer. On deep set learning and the choice of aggregations. In Igor V. Tetko, Věra Kůrková, Pavel Karpov, and Fabian Theis, editors, Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation, pages 444–457, Cham, 2019. Springer International Publishing. ISBN 978-3-030-30487-4.
Soffer et al. (2019) Shelly Soffer, Avi Ben-Cohen, Orit Shimon, Michal Marianne Amitai, Hayit Greenspan, and Eyal Klang. Convolutional neural networks for radiologic images: A radiologist’s guide. Radiology, 290(3):590–606, 2019. doi: 10.1148/radiol.2018180547. URL https://doi.org/10.1148/radiol.2018180547. PMID: 30694159.
Stokes et al. (2020) James Stokes, Javier Robledo Moreno, Eftychios A. Pnevmatikakis, and Giuseppe Carleo. Phases of two-dimensional spinless lattice fermions with first-quantized deep neural-network quantum states. Physical Review B, 102(20), November 2020. ISSN 2469-9969. doi: 10.1103/physrevb.102.205122. URL http://dx.doi.org/10.1103/PhysRevB.102.205122.
Tian et al. (2020) Haiman Tian, Shu-Ching Chen, and Mei-Ling Shyu. Evolutionary programming based deep learning feature selection and network construction for visual data classification. Information Systems Frontiers, 22(5):1053–1066, Oct 2020. ISSN 1572-9419. doi: 10.1007/s10796-020-10023-6. URL https://doi.org/10.1007/s10796-020-10023-6.
Tjandra et al. (2017) Andros Tjandra, Sakriani Sakti, and Satoshi Nakamura. Listening while speaking: Speech chain by deep learning. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 301–308, 2017. doi: 10.1109/ASRU.2017.8268950.
Wagstaff et al. (2021) Edward Wagstaff, Fabian B. Fuchs, Martin Engelcke, Michael A. Osborne, and Ingmar Posner. Universal approximation of functions on sets, 2021.
Wallach (2021) Nolan Wallach. The representation of GL(k) on the alternants of minimal degree for the diagonal action of Sn on k copies of the permutation representation. 09 2021. URL https://www.researchgate.net/publication/354389796_The_representation_of_GLk_on_the_alternants_of_minimal_degree_for_the_diagonal_action_of_S_n_on_k_copies_of_the_the_permutation_representation.
Wallach (2017) Nolan R. Wallach. Geometric invariant theory. Universitext. Springer, Cham, 2017. ISBN 978-3-319-65905-3; 978-3-319-65907-7. doi: 10.1007/978-3-319-65907-7. URL https://doi.org/10.1007/978-3-319-65907-7. Over the real and complex numbers.
Wan et al. (2021) Liangtian Wan, Yuchen Sun, Lu Sun, Zhaolong Ning, and Joel J. P. C. Rodrigues. Deep learning based autonomous vehicle super resolution doa estimation for safety driving. IEEE Transactions on Intelligent Transportation Systems, 22(7):4301–4315, 2021. doi: 10.1109/TITS.2020.3009223.
Xie et al. (2022) Zeke Xie, Issei Sato, and Masashi Sugiyama. Understanding and scheduling weight decay, 2022. URL https://openreview.net/forum?id=J7V_4aauV6B.
Xu et al. (2019) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks?, 2019.
Zaheer et al. (2018) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, and Alexander Smola. Deep sets, 2018.
Zeleznik et al. (2021) Roman Zeleznik, Borek Foldyna, Parastou Eslami, Jakob Weiss, Ivanov Alexander, Jana Taron, Chintan Parmar, Raza M. Alvi, Dahlia Banerji, Mio Uno, Yasuka Kikuchi, Julia Karady, Lili Zhang, Jan-Erik Scholtz, Thomas Mayrhofer, Asya Lyass, Taylor F. Mahoney, Joseph M. Massaro, Ramachandran S. Vasan, Pamela S. Douglas, Udo Hoffmann, Michael T. Lu, and Hugo J. W. L. Aerts. Deep convolutional neural networks to predict cardiovascular risk from computed tomography. Nature Communications, 12(1):715, Jan 2021. ISSN 2041-1723. doi: 10.1038/s41467-021-20966-2. URL https://doi.org/10.1038/s41467-021-20966-2.
Zhang and Zhang (2021) Jie-Fang Zhang and Zhengya Zhang. Point-x: A spatial-locality-aware architecture for energy-efficient graph-based point-cloud deep learning. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO ’21, pages 1078–1090, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450385572. doi: 10.1145/3466752.3480081. URL https://doi.org/10.1145/3466752.3480081.
Zhang et al. (2018a) Linfeng Zhang, Jiequn Han, Han Wang, Roberto Car, and Weinan E. Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics. Physical Review Letters, 120(14), April 2018a. ISSN 1079-7114. doi: 10.1103/physrevlett.120.143001. URL http://dx.doi.org/10.1103/PhysRevLett.120.143001.
Zhang et al. (2018b) Linfeng Zhang, Jiequn Han, Han Wang, Wissam A. Saidi, Roberto Car, and Weinan E. End-to-end symmetry preserving inter-atomic potential energy model for finite and extended systems, 2018b.
Zhu et al. (2022) Zijiang Zhu, Zhenlong Hu, Weihuang Dai, Hang Chen, and Zhihan Lv. Deep learning for autonomous vehicle and pedestrian interaction safety. Safety Science, 145:105479, 2022. ISSN 0925-7535. doi: 10.1016/j.ssci.2021.105479. URL https://www.sciencedirect.com/science/article/pii/S0925753521003222.
Zweig and Bruna (2023) Aaron Zweig and Joan Bruna. Towards antisymmetric neural ansatz separation, 2023.

\begin{align*}
\left\|f-P\right\|_{\mathcal{C}^{k}}
&=\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|D^{\alpha}\left(f-P\right)\right|(\bm{x})\\
&=\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|D^{\alpha}\left(\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\text{sgn}(\sigma)\,((f-\hat{P})\circ\sigma)(\bm{x})\right)\right|\\
&=\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\text{sgn}(\sigma)\,D^{\alpha}((f-\hat{P})\circ\sigma)(\bm{x})\right|\\
&=\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\text{sgn}(\sigma)\,D^{\sigma.\alpha}(f-\hat{P})(\sigma(\bm{x}))\right|\\
&\leq\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\left|D^{\sigma.\alpha}(f-\hat{P})\right|(\sigma(\bm{x}))\\
&\leq\sum_{|\alpha|\leq k}\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\max_{\bm{x}\in\Omega^{n}}\left|D^{\sigma.\alpha}(f-\hat{P})\right|(\sigma(\bm{x}))\\
&=\sum_{|\alpha|\leq k}\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\max_{\bm{x}\in\Omega^{n}}\left|D^{\sigma.\alpha}(f-\hat{P})\right|(\bm{x})\\
&=\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|D^{\sigma.\alpha}(f-\hat{P})\right|(\bm{x})\\
&=\frac{1}{n!}\sum_{\sigma\in\mathcal{S}_{n}}\sum_{|\alpha|\leq k}\max_{\bm{x}\in\Omega^{n}}\left|D^{\alpha}(f-\hat{P})\right|(\bm{x})<\epsilon.
\end{align*}
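
As a hedged illustration of the step that moves $D^{\alpha}$ past the composition with $\sigma$, consider the case $n=2$ with $\sigma$ the transposition, assuming that $\sigma.\alpha$ denotes the multi-index $\alpha$ with its entries permuted by $\sigma$ (the precise convention is fixed elsewhere in the paper; $g$ below is a placeholder for $f-\hat{P}$). For a smooth $g$ and $\alpha=(1,0)$, the chain rule gives
\[
D^{(1,0)}(g\circ\sigma)(x_{1},x_{2})=\frac{\partial}{\partial x_{1}}\,g(x_{2},x_{1})=\left(D^{(0,1)}g\right)(x_{2},x_{1})=\left(D^{\sigma.\alpha}g\right)(\sigma(\bm{x})).
\]
The remaining equalities use two facts: $\Omega^{n}$ is invariant under coordinate permutations, so maximizing over $\sigma(\bm{x})$ is the same as maximizing over $\bm{x}$, and $\alpha\mapsto\sigma.\alpha$ is a bijection of the set of multi-indices of order at most $k$, so the sum over $\sigma.\alpha$ equals the sum over $\alpha$.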
