Learning Rate of Regularized Regression Associated with Zonal Translation Networks
Abstract
1. Introduction
1.1. Kernel Regularized Learning
1.2. Marcinkiewicz-Zygmund Setting (MZS)
2. The Properties of the Translation Networks on the Unit Sphere
2.1. Density
2.2. MZS on the Unit Sphere
2.3. The Reproducing Property
3. Application to Kernel Regularized Regression
3.1. Learning Framework
3.2. Error Decompositions
3.3. Convergence Rate for the K-Functional
3.4. The Learning Rate
3.5. Comments
4. Lemmas
5. Proofs of Theorems and Propositions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ran, X.; Sheng, B.; Wang, S. Learning Rate of Regularized Regression Associated with Zonal Translation Networks. Mathematics 2024, 12, 2840. https://doi.org/10.3390/math12182840