On Quasi-Convex Smooth Optimization Problems by a Comparison Oracle
Russian Journal of Nonlinear Dynamics, Vol. 20 (2024), no. 5, pp. 813-825.


In many machine learning models, optimization problems are challenging because the structure and properties of the objective functions are only partially known. This complicates the use of first-order algorithms, since gradient computations can be expensive or even impossible in various scenarios. For this reason, we resort to derivative-free (zeroth-order) methods. This paper is devoted to an approach to minimizing quasi-convex functions using only the comparison oracle recently proposed in [56]. This oracle compares the function values at two points and reports which is larger, so with the proposed approach, comparisons are all we need to solve the optimization problem under consideration. The proposed algorithm is based on comparison-based gradient direction estimation combined with comparison-based approximate normalized gradient descent. Normalized gradient descent is an adaptation of gradient descent that updates along the direction of the gradient rather than the gradient itself. We prove the convergence rate of the proposed algorithm when the objective function is smooth and strictly quasi-convex on $\mathbb{R}^n$: the algorithm needs $\mathcal{O}\left( \frac{n D^2}{\varepsilon^2} \log\left(\frac{n D}{\varepsilon}\right)\right)$ comparison queries to find an $\varepsilon$-approximate solution, where $D$ is an upper bound on the distance between all generated iterates and an optimal solution.
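A minimal Python sketch of the general idea described above, not the exact algorithm or parameters analyzed in the paper: the comparison oracle reveals only the sign of $f(x + \delta u) - f(x)$ for a random unit direction $u$, averaging these signed directions gives an estimate of the normalized gradient, and that estimate drives a normalized gradient descent step. The function names, sample counts, step size, and test function below are illustrative assumptions.

import numpy as np

def comparison_oracle(f, x, y):
    # The only information used: is f(x) larger than f(y)?
    return 1.0 if f(x) > f(y) else -1.0

def estimate_gradient_direction(f, x, num_samples=100, delta=1e-4, rng=None):
    # Comparison-based estimate of the normalized gradient at x:
    # for random unit vectors u, the comparison of f(x + delta*u) with f(x)
    # reveals the sign of the directional derivative; averaging sign * u
    # over many samples yields a vector aligned with grad f(x).
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros(x.size)
    for _ in range(num_samples):
        u = rng.standard_normal(x.size)
        u /= np.linalg.norm(u)
        g += comparison_oracle(f, x + delta * u, x) * u
    norm = np.linalg.norm(g)
    return g / norm if norm > 0 else g

def normalized_gradient_descent(f, x0, step_size=0.05, num_iters=200, **est_kwargs):
    # Normalized GD: move along the estimated gradient *direction* only,
    # never using gradient magnitudes or function values directly.
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(num_iters):
        x -= step_size * estimate_gradient_direction(f, x, **est_kwargs)
    return x

# Example on a smooth, strictly quasi-convex function (monotone transform of a convex one).
f = lambda x: np.log(1.0 + np.sum(x ** 2))
x_hat = normalized_gradient_descent(f, x0=np.ones(5))

With a fixed step size the iterates end up oscillating within a step-size-sized neighborhood of the minimizer; the paper's complexity bound corresponds to choosing the step size and the number of comparison samples as functions of $\varepsilon$, $n$, and $D$.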
Keywords: quasi-convex function, gradient-free algorithm, smooth function, normalized gradient descent, comparison oracle
@article{ND_2024_20_5_a6,
     author = {A. V. Gasnikov and M. S. Alkousa and A. V. Lobanov and Y. V. Dorn and F. S. Stonyakin and I. A. Kuruzov and S. R. Singh},
     title = {On {Quasi-Convex} {Smooth} {Optimization} {Problems} by a {Comparison} {Oracle}},
     journal = {Russian journal of nonlinear dynamics},
     pages = {813--825},
     publisher = {mathdoc},
     volume = {20},
     number = {5},
     year = {2024},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/ND_2024_20_5_a6/}
}
TY  - JOUR
AU  - A. V. Gasnikov
AU  - M. S. Alkousa
AU  - A. V. Lobanov
AU  - Y. V. Dorn
AU  - F. S. Stonyakin
AU  - I. A. Kuruzov
AU  - S. R. Singh
TI  - On Quasi-Convex Smooth Optimization Problems by a Comparison Oracle
JO  - Russian journal of nonlinear dynamics
PY  - 2024
SP  - 813
EP  - 825
VL  - 20
IS  - 5
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/ND_2024_20_5_a6/
LA  - en
ID  - ND_2024_20_5_a6
ER  - 
%0 Journal Article
%A A. V. Gasnikov
%A M. S. Alkousa
%A A. V. Lobanov
%A Y. V. Dorn
%A F. S. Stonyakin
%A I. A. Kuruzov
%A S. R. Singh
%T On Quasi-Convex Smooth Optimization Problems by a Comparison Oracle
%J Russian journal of nonlinear dynamics
%D 2024
%P 813-825
%V 20
%N 5
%I mathdoc
%U http://geodesic.mathdoc.fr/item/ND_2024_20_5_a6/
%G en
%F ND_2024_20_5_a6
A. V. Gasnikov; M. S. Alkousa; A. V. Lobanov; Y. V. Dorn; F. S. Stonyakin; I. A. Kuruzov; S. R. Singh. On Quasi-Convex Smooth Optimization Problems by a Comparison Oracle. Russian Journal of Nonlinear Dynamics, Vol. 20 (2024), no. 5, pp. 813-825. http://geodesic.mathdoc.fr/item/ND_2024_20_5_a6/

[1] Agrawal, A. and Boyd, S., “Disciplined Quasiconvex Programming”, Optim. Lett., 14:7 (2020), 1643–1657 | DOI | MR | Zbl

[2] Akhavan, A., Chzhen, E., Pontil, M., and Tsybakov, A. B., “A Gradient Estimator via L1-Randomization for Online Zero-Order Optimization with Two Point Feedback”, Proc. of the 36th Internat. Conf. on Neural Information Processing Systems (NIPS'22), Art. No. 558, 7685–7696

[3] Zh. Vychisl. Mat. Mat. Fiz., 63:9 (2023), 1458–1512 (Russian) | DOI | MR | Zbl

[4] Audet, Ch. and Dennis, J. E., Jr., “Mesh Adaptive Direct Search Algorithms for Constrained Optimization”, SIAM J. Optim., 17:1 (2006), 188–217 | DOI | MR | Zbl

[5] Bansal, S., Calandra, R., Xiao, T., Levine, S., and Tomlin, C., “Goal-Driven Dynamics Learning via Bayesian Optimization”, IEEE 56th Annual Conf. on Decision and Control (CDC'2017), 5168–5173

[6] Bach, F. and Perchet, V., “Highly-Smooth Zero-th Order Online Optimization”, Proc. of the 29th Conf. on Learning Theory (PMLR'2016), Vol. 49, 257–283

[7] Balasubramanian, K. and Ghadimi, S., “Zeroth-Order Nonconvex Stochastic Optimization: Handling Constraints, High Dimensionality, and Saddle Points”, Found. Comput. Math., 22:1 (2022), 35–76 | DOI | MR | Zbl

[8] Bergou, E. H., Gorbunov, E., and Richtárik, P., “Stochastic Three Points Method for Unconstrained Smooth Minimization”, SIAM J. Optim., 30:4 (2020), 2726–2749 | DOI | MR | Zbl

[9] Boyd, S. and Vandenberghe, L., Convex Optimization, Cambridge Univ. Press, New York, 2004, 716 pp. | MR | Zbl

[10] Brent, R., Algorithms for Minimization without Derivatives, Dover Books on Mathematics, Dover, Mineola, N.Y., 2002, xii, 195 pp. | MR | Zbl

[11] Bubeck, S. and Cesa-Bianchi, N., “Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems”, Found. Trends Mach. Learn., 5:1 (2012), 1–122 | DOI | MR | Zbl

[12] Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C., “Zoo: Zeroth Order Optimization Based Black-Box Attacks to Deep Neural Networks without Training Substitute Models”, Proc. of the 10th ACM Workshop on Artificial Intelligence and Security (Dallas, Tex., USA, Nov 2017), 15–26

[13] Choromanski, K., Iscen, A., Sindhwani, V., Tan, J., and Coumans, E., “Optimizing Simulations with Noise-Tolerant Structured Exploration”, IEEE Internat. Conf. on Robotics and Automation (ICRA, Brisbane, QLD, Australia, 2018), 2970–2977

[14] Conn, A., Scheinberg, K., and Vicente, L., Introduction to Derivative-Free Optimization, MOS/SIAM Series on Optimization, 8, MOS/SIAM, Philadelphia, Penn., 2009, x, 266 pp. | MR

[15] Dai, Zh., Low, B., and Jaillet, P., “Federated Bayesian Optimization via Thompson Sampling”, Adv. Neural Inf. Process Syst., 33 (2020), 9687–9699

[16] Dennis, J. E., Jr. and Torczon, V., “Direct Search Methods on Parallel Machines”, SIAM J. Optim., 1:4 (1991), 448–474 | DOI | MR | Zbl

[17] Duchi, J., Jordan, M., Wainwright, M., and Wibisono, A., “Optimal Rates for Zero-Order Convex Optimization: The Power of Two Function Evaluations”, IEEE Trans. Inf. Theory, 61:5 (2015), 2788–2806 | DOI | MR | Zbl

[18] Duchi, J., Hazan, E., and Singer, Y., “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization”, J. Mach. Learn. Res., 12:7 (2011), 2121–2159 | MR | Zbl

[19] Eppstein, D., “Quasiconvex Programming”, Combinatorial and Computational Geometry, Math. Sci. Res. Inst. Publ., 52, eds. J. E. Goodman, J. Pach, E. Welzl, Camb. Univ. Press, Cambridge, 2005, 287–331, xii, 616 pp. | MR | Zbl

[20] Fang, C., Li, C., Lin, Z., and Zhang, T., “SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator”, Proc. of the 32nd Conf. on Neural Information Processing Systems (NeurIPS, Montreal, Canada, Dec 2018), 11 pp.

[21] Fabian, V., “Stochastic Approximation of Minima with Improved Asymptotic Speed”, Ann. Math. Statist., 38:1 (1967), 191–200 | DOI | MR | Zbl

[22] Fenchel, W., Convex Cones, Sets, and Functions: Lecture Notes, Princeton Univ., Princeton N.J., 1953, 158 pp.

[23] Gao, J., Lanchantin, J., Soffa, M. L., and Qi, Y., “Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers”, Proc. of the IEEE Security and Privacy Workshops (SPW, San Francisco, Calif., USA, 2018), 50–56

[24] Gorbunov, E., Tupitsa, N., Choudhury, S., Aliev, A., Richtárik, P., Horváth, S., and Takáč, M., Methods for Convex $(L_0,\,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity, 2024, 51 pp., arXiv:2409.14989 [math.OC]

[25] Gorbunov, E., Bibi, A., Sener, O., Bergou, E. H., and Richtárik, P., “A Stochastic Derivative Free Optimization Method with Momentum”, Proc. of the 8th Internat. Conf. on Learning Representations (ICLR, Addis Ababa, Ethiopia, Apr 2020), 28 pp.

[26] Hazan, E., Levy, K., and Shalev-Shwartz, S., “Beyond Convexity: Stochastic Quasi-Convex Optimization”, Proc. of the 29th Conf. on Neural Information Processing Systems (NeurIPS, Montreal, Canada, Dec 2015), 1594–1602

[27] Ji, K., Wang, Z., Zhou, Y., and Liang, Y., “Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization”, Proc. of the 36th Internat. Conf. on Machine Learning (PMLR, Long Beach, Calif., USA, 2019), 3100–3109

[28] Ke, Q. and Kanade, T., “Uncertainty Models in Quasiconvex Optimization for Geometric Reconstruction”, Proc. of the IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, Vol. 1 (CVPR'06, New York, N.Y., USA, 2006), 1199–1205

[29] Ke, Q. and Kanade, T., “Quasiconvex Optimization for Robust Geometric Reconstruction”, IEEE Trans. Pattern Anal. Mach. Intell., 29:10 (2007), 1834–1847 | DOI

[30] Levy, K. Y., “Online to Offline Conversions, Universality and Adaptive Minibatch Sizes”, Proc. of the 31st Conf. on Neural Information Processing Systems (NeurIPS, Long Beach, Calif., USA, Dec 2017), 10 pp.

[31] Kiwiel, K. C., “Convergence and Efficiency of Subgradient Methods for Quasiconvex Minimization”, Math. Program., 90:1 Ser. A (2001), 1–25 | DOI | MR | Zbl

[32] Kingma, D. P. and Ba, J., “Adam: A Method for Stochastic Optimization”, Proc. of the Internat. Conf. on Learning Representations (ICLR, San Diego, Calif., USA, May 2015), 13 pp.

[33] Kolda, T. G., Lewis, R. M., and Torczon, V., “Optimization by Direct Search: New Perspectives on Some Classical and Modern Methods”, SIAM Rev., 45:3 (2003), 385–482 | DOI | MR | Zbl

[34] Konnov, I., “On Convergence Properties of a Subgradient Method”, Optim. Methods Softw., 18:1 (2003), 53–62 | DOI | MR | Zbl

[35] Laffont, J.-J. and Martimort, D., The Theory of Incentives: The Principal-Agent Model, Princeton Univ. Press, Princeton, N.J., 2002, 421 pp.

[36] Lobanov, A., Gasnikov, A., and Krasnov, A., Acceleration Exists! Optimization Problems When Oracle Can Only Compare Objective Function Values, 2024, 25 pp., arXiv:2402.09014

[37] Lobanov, A. and Gasnikov, A., “Accelerated Zero-Order SGD Method for Solving the Black Box Optimization Problem under “Overparametrization” Condition”, Proc. of the 14th Internat. Conf. “Optimization and Applications” (OPTIMA'2023, Petrovac, Montenegro, Sep 2023), 72–83 | MR | Zbl

[38] Luenberger, D. G., “Quasi-Convex Programming”, SIAM J. Appl. Math., 16:5 (1968), 1090–1095 | DOI | MR | Zbl

[39] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A, “Towards Deep Learning Models Resistant to Adversarial Attacks”, Proc. of the 6th Internat. Conf. on Learning Representations (ICLR'2018, Vancouver Convention Center, Vancouver, Canada, 2018), 23 pp.

[40] Karabag, M. O., Neary, C., and Topcu, U., “Smooth Convex Optimization Using Sub-Zeroth-Order Oracles”, AAAI-21 Technical Tracks 5: Proc. of the AAAI Conference on Artificial Intelligence, 35:5 (2021), 3815–3822 | DOI

[41] Nesterov, Yu., Lectures on Convex Optimization, Springer Optimization and Its Applications, 137, Springer, Cham, 2018, xxiii, 589 pp. | DOI | MR | Zbl

[42] Nesterov, Yu. and Spokoiny, V., “Random Gradient-Free Minimization of Convex Functions”, Found. Comput. Math., 17:2 (2017), 527–566 | DOI | MR | Zbl

[43] Nesterov, Yu., “Minimization Methods for Nonsmooth Convex and Quasiconvex Functions”, Matekon, 29 (1984), 519–531 | MR

[44] Nelder, J. and Mead, R., “A Simplex Method for Function Minimization”, Comput. J., 7:4 (1965), 308–313 | DOI | MR | Zbl

[45] Nikaidô, H., “On von Neumann's Minimax Theorem”, Pacific J. Math., 4 (1954), 65–72 | DOI | MR | Zbl

[46] Patel, K. K., Saha, A., Wang, L., and Srebro, N., “Distributed Online and Bandit Convex Optimization”, 36th Conf. on Neural Information Processing Systems (NeurIPS, New Orleans, La., USA, 2022), 17 pp.

[47] Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, B., and Swami, A., “Practical Black-Box Attacks against Machine Learning”, ASIA CCS'17: ACM Asia Conference on Computer and Communications Security (Abu Dhabi, United Arab Emirates, Apr 2017), eds. R. Karri, O. Sinanoglu, A.-R. Sadeghi, X. Yi, ACM, New York, N.Y., 2017, 506–519

[48] Rosenbrock, H. H., “An Automatic Method for Finding the Greatest or Least Value of a Function”, Comput. J., 3 (1960), 175–184 | DOI | MR

[49] Salimans, T., Ho, J., Chen, X., Sidor, Sz., and Sutskever, I., Evolution Strategies As a Scalable Alternative to Reinforcement Learning, 2017, 13 pp., arXiv:1703.03864 [stat.ML]

[50] Seiler, P. and Balas, G., “Quasiconvex Sum-of-Squares Programming”, Proc. of the 49th IEEE Conf. on Decision and Control (CDC, Atlanta, Ga., USA, Dec 2010), 3337–3342

[51] Tang, Z., Rybin, D., and Chang, T., “Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles”, ICML'2023 Workshop “The Many Facets of Preference-Based Learning” (Honolulu, H.I., USA, Jul 2023), 14 pp.

[52] Vankov, D., Rodomanov, A., Nedich, A., Sankar, L., and Stich, S., Optimizing $(L_0,\,L_1)$-Smooth Functions by Gradient Methods, 2024, 27 pp., arXiv:2410.10800 [math.OC]

[53] Varian, H., “Price Discrimination and Social Welfare”, Am. Econ. Rev., 75:4 (1985), 870–875 | MR

[54] Xu, W., Jones, C. N., Svetozarevic, B., Laughman, Ch. R., and Chakrabarty, A., “VABO: Violation-Aware Bayesian Optimization for Closed-Loop Control Performance Optimization with Unmodeled Constraints”, J. Process Control, 138 (2024), Art. 103212, 9 pp.

[55] Wolfstetter, E., Topics in Microeconomics: Industrial Organization, Auctions, and Incentives, Cambridge Univ. Press, Cambridge, 1999, 392 pp.

[56] Zhang, C. and Li, T., Comparisons Are All You Need for Optimizing Smooth Functions, 2024, 39 pp., arXiv:2405.11454 [cs.LG]

[57] Zhang, H., Xiong, H., and Gu, B., “Zeroth-Order Negative Curvature Finding: Escaping Saddle Points without Gradients”, 36th Conf. on Neural Information Processing Systems (NeurIPS, New Orleans, La., USA, 2022), 38332–38344