Accelerated Zero-Order SGD under High-Order Smoothness and Overparameterized Regime
Russian Journal of Nonlinear Dynamics, Vol. 20 (2024), no. 5, pp. 759-788.


We present a novel gradient-free algorithm for solving convex stochastic optimization problems, such as those encountered in medicine, physics, and machine learning (e.g., the adversarial multi-armed bandit problem), where the objective function can only be computed through numerical simulation, either as the result of a real experiment or as feedback provided by function evaluations from an adversary. We therefore assume that only black-box access to the function values of the objective is available, possibly corrupted by adversarial noise, either deterministic or stochastic. The noisy setup arises naturally when randomness is modeled within a simulation or introduced by computer discretization, when exact function values are withheld due to privacy issues, or when nonconvex problems are solved as convex ones with an inexact function oracle. By exploiting higher-order smoothness, which holds, e.g., for logistic regression, we improve on the performance of zero-order methods developed under the assumption of classical smoothness (i.e., a Lipschitz gradient). The proposed algorithm enjoys optimal oracle complexity and is designed for the overparameterization setup, i.e., when the number of model parameters is much larger than the size of the training dataset. Overparameterized models fit the training data perfectly while still generalizing well and outperforming underparameterized models on unseen data. We provide convergence guarantees for the proposed algorithm under both types of noise. Moreover, we estimate the maximum permissible adversarial noise level that maintains the desired accuracy in the Euclidean setup, and then extend our results to a non-Euclidean setup. Our theoretical results are verified on the logistic regression problem.
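To illustrate the kind of zero-order scheme the abstract describes, the sketch below shows a generic two-point, kernel-smoothed gradient estimator plugged into plain (non-accelerated) SGD. It is a minimal illustration, not the authors' algorithm: the kernel choice K(r) = 3r (valid for smoothness order 2), the step size, and all function names are assumptions made for the example; only noisy function values are used.

import numpy as np

def kernel_two_point_estimator(f, x, h, rng, kernel=lambda r: 3.0 * r):
    """Two-point zero-order gradient estimate with kernel smoothing.

    e is a random direction on the unit sphere, r is uniform on [-1, 1],
    and the kernel K(r) (here the illustrative order-2 choice K(r) = 3r)
    exploits higher-order smoothness of the objective. Only (possibly
    noisy) function values f(.) are queried.
    """
    d = x.size
    e = rng.normal(size=d)
    e /= np.linalg.norm(e)                  # uniform direction on the sphere
    r = rng.uniform(-1.0, 1.0)
    delta = f(x + h * r * e) - f(x - h * r * e)
    return (d / (2.0 * h)) * delta * kernel(r) * e

def zero_order_sgd(f, x0, steps=2000, lr=0.05, h=1e-2, seed=0):
    """Plain zero-order SGD driven by the estimator above (no acceleration)."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(steps):
        x -= lr * kernel_two_point_estimator(f, x, h, rng)
    return x

# Toy usage: minimize a smooth convex quadratic through function values only.
if __name__ == "__main__":
    f = lambda x: 0.5 * np.dot(x, x)
    x_final = zero_order_sgd(f, x0=np.ones(5))
    print(np.linalg.norm(x_final))          # should be close to 0

The accelerated method analyzed in the paper replaces the plain SGD update with an accelerated (momentum-type) scheme and tunes the smoothing parameter h to the noise level and smoothness order; the sketch only conveys how a gradient is built from two function evaluations.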
Keywords: zero-order optimization, gradient-free algorithms, high-order smoothness, overparametrization, kernel approximation
@article{ND_2024_20_5_a4,
     author = {G. K. Bychkov and D. M. Dvinskikh and A. V. Antsiferova and A. V. Gasnikov and A. V. Lobanov},
     title = {Accelerated {Zero-Order} {SGD} under {High-Order} {Smoothness} and {Overparameterized} {Regime}},
     journal = {Russian journal of nonlinear dynamics},
     pages = {759--788},
     publisher = {mathdoc},
     volume = {20},
     number = {5},
     year = {2024},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/ND_2024_20_5_a4/}
}
TY  - JOUR
AU  - G. K. Bychkov
AU  - D. M. Dvinskikh
AU  - A. V. Antsiferova
AU  - A. V. Gasnikov
AU  - A. V. Lobanov
TI  - Accelerated Zero-Order SGD under High-Order Smoothness and Overparameterized Regime
JO  - Russian journal of nonlinear dynamics
PY  - 2024
SP  - 759
EP  - 788
VL  - 20
IS  - 5
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/ND_2024_20_5_a4/
LA  - en
ID  - ND_2024_20_5_a4
ER  - 
%0 Journal Article
%A G. K. Bychkov
%A D. M. Dvinskikh
%A A. V. Antsiferova
%A A. V. Gasnikov
%A A. V. Lobanov
%T Accelerated Zero-Order SGD under High-Order Smoothness and Overparameterized Regime
%J Russian journal of nonlinear dynamics
%D 2024
%P 759-788
%V 20
%N 5
%I mathdoc
%U http://geodesic.mathdoc.fr/item/ND_2024_20_5_a4/
%G en
%F ND_2024_20_5_a4
G. K. Bychkov; D. M. Dvinskikh; A. V. Antsiferova; A. V. Gasnikov; A. V. Lobanov. Accelerated Zero-Order SGD under High-Order Smoothness and Overparameterized Regime. Russian Journal of Nonlinear Dynamics, Vol. 20 (2024), no. 5, pp. 759-788. http://geodesic.mathdoc.fr/item/ND_2024_20_5_a4/
