On Some Works of Boris Teodorovich Polyak on the Convergence of Gradient Methods and Their Development

S. S. Ablaev; Аблаев С. С.; A. N. Beznosikov; Безносиков А. Н.; A. V. Gasnikov; Гасников А. В.; D. M. Dvinskikh; Двинских Д. М.; A. V. Lobanov; Лобанов A. B.; S. M. Puchinin; Пучинин C. М.; F. S. Stonyakin; Стонякин Ф. С.

doi:10.31857/S0044466924040028

О некоторых работах Бориса Теодоровича Поляка по сходимости градиентных методов и их развитии

Авторы: Аблаев С.С.¹^,2, Безносиков А.Н.¹^,3, Гасников А.В.¹^,3^,4, Двинских Д.М.¹^,3^,4, Лобанов A.B.¹^,4, Пучинин C.М.¹, Стонякин Ф.С.¹^,2
Учреждения:
1. МФТИ
2. Крымский федеральный университет им. В.И. Вернадского
3. ИППИ РАН
4. ИСП РАН
Выпуск: Том 64, № 4 (2024)
Страницы: 587-626
Раздел: ОПТИМАЛЬНОЕ УПРАВЛЕНИЕ
URL: https://rjonco.com/0044-4669/article/view/665134
DOI: https://doi.org/10.31857/S0044466924040028
EDN: https://elibrary.ru/ZKLBSS
ID: 665134

Цитировать

Полный текст

Открытый доступ
Доступ закрыт

Доступ предоставлен
Доступ закрыт

Доступ платный или только для подписчиков

Аннотация
Полный текст
Об авторах
Список литературы
Дополнительные файлы
Статистика

Аннотация

Представлен обзор современного состояния субградиентных и ускоренных методов выпуклой оптимизации, в том числе при наличии помех и доступа к различной информации о целевой функции (значение функции, градиент, стохастический градиент, старшие производные). Для невыпуклых задач рассматривается условие Поляка-Лоясиевича и приводится обзор основных результатов. Рассматривается поведение численных методов при наличии острого минимума. Цель данного обзора — показать влияние работ Б. Т. Поляка (1935—2023) по градиентным методам оптимизации и их окрестностям на современное развитие численных методов оптимизации. Библ. 200. Табл. 1.

Ключевые слова

градиентный спуск, условие градиентного доминирования (Поляка-Лоясиевича), острый минимум, субградиентный метод Поляка-Шора, условие ранней остановки, метод тяжелого шарика Поляка, стохастический градиентный спуск

Полный текст

Список литературы

Поляк Б. Т. Градиентные методы минимизации функционалов // Ж. вычисл. матем. и матем. физ. 1963. Т. 3. № 4. С. 643—653.
Поляк Б. Т. Введение в оптимизацию. М.: Наука, 1983.
Немировский А.С., Поляк Б. Т., Цыпкин Я. З. Оптимальные алгоритмы стохастической оптимизации при мультипликативных помехах // Докл. АН СССР. 1985. Т. 284. С. 564—567.
Поляк Б.Т., Цыбаков А. Б. Оптимальные порядки точности поисковых алгоритмов стохастической оптимизации // Пробл. передачи информ. 1990. Т. 26. № 2. С. 45—53.
Поляк Б. Т. Новый метод типа стохастической аппроксимации // Автоматика и телемехан. 1990. № 7. С. 98—107.
Polyak B.T., Juditsky A. B. Acceleration of stochastic approximation by averaging // SIAM J. Control and Optimizat. 1992. V. 30. № 4. P. 838—855.
Nesterov Y., Polyak B. T. Cubic regularization of Newton method and its global performance // Math. Program. 2006. V. 108. № 1. P. 177—205.
Поляк Б. Т. Градиентные методы решения уравнений и неравенств // Ж. вычисл. матем. и матем. физ. 1964. Т. 4. № 6. С. 995—1005.
Поляк Б.Т. О некоторых способах ускорения сходимости итерационных методов // Ж. вычисл. матем. и матем. физ. 1964. Т. 4. № 5. С. 791—803.
Левитин Е.С., Поляк Б. Т. Методы минимизации при наличии ограничений // Ж. вычисл. матем. и матем. физ. 1966. Т. 6. № 5. С. 787—823.
Поляк Б. Т. Минимизация негладких функционалов // Ж. вычисл. матем. и матем. физ. 1969. Т. 9. № 3. С. 509—521.
Поляк Б. Т. Метод сопряженных градиентов в задачах на экстремум // Ж. вычисл. матем. и матем. физ. 1969. Т. 9. № 4. С. 807—821.
Поляк Б.Т., Цыпкин Я. З. Оптимальные псевдоградиентные алгоритмы адаптации // Автоматика и телемехан. 1980. № 8. С. 74—84.
Poljak B. Iterative algorithms for singular minimization problems // Nonlin. Program. Elsevier, 1981. P. 147—166.
Poljak B. T. Sharp minimum // “Generalized Lagrangians and applications”. 1982.
Гасников А. В. Научный путь Бориса Теодоровича Поляка. Оптимизация // Компьют. исслед. и моделирование. 2023. Т. 15. № 2. С. 235—243.
Fradkov A.L., Granichin O. N. Boris Teodorovich Polyak // Cybernet. and Phys. 2023. V. 12(1).
Polyak B. T. Subgradient methods: a survey of Soviet research // Nonsmooth Optimizat. 1978. V. 3. P. 5—29.
Shor N. Z. Minimization methods for non-differentiable functions. V. 3. Springer Science & Business Media, 2012.
Шор Н. З. Методы минимизации недифференцируемых функций и их применения. Киев: Наук. думка, 1979.
Drori Y., Teboulle M. An optimal variants of Kelley’s cutting-plane method // Math. Program. 2016. V. 160. № 1. P. 321—351.
Stochastic Polyak step-size for sgd: An adaptive learning rate for fast convergence / N. Loizou [et al.] // Inter. Conf. on Artific. Intelligence and Statist. PMLR. 2021. P. 1306—1314.
Wang X., Johansson M., Zhang T. Generalized Polyak step size for first order optimization with momentum // arXiv preprint arXiv:2305.12939; 2023.
Hazan E., Kakade S. Revisiting the Polyak step size // arXiv preprint arXiv:1905.00313; 2019.
Nesterov Y. Universal gradient methods for convex optimization problems // Math. Program. 2015. V. 152. № 1. P. 381—404.
Гасников А.В., Нестеров Ю. Е. Универсальный метод для задач стохастической композитной оптимизации // Ж. вычисл. матем. и матем. физ. 2018. Т. 58. № 1. С. 51—68.
Jiang X., Stich S. U. Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction // arXiv preprint arXiv:2308.06058v; 2023.
Поляк Б. Т. Один общий метод решения экстремальных задач // Докл. АН СССР. 1967. Т. 174. № 1. С. 33—36.
Huang Y., Lin Q. Single-Loop Switching Subgradient Methods for Non-Smooth Weakly Convex Optimization with Non-Smooth Convex Constraints // arXiv preprint. 2023.
Mirror descent and convex optimization problems with non-smooth inequality constraints / A. Bayandina [et al.] // Lect. Not. in Math. 2018. V. 2227. P. 181—213.
Lagae S. New efficient techniques to solve sparse structured linear systems, with applications to truss topology optimization. 2017.
Nesterov Y. Subgradient methods for huge-scale optimization problems // Math. Program. 2014. V. 146. № 1/2. P. 275—297.
Адаптивные алгоритмы зеркального спуска в задачах выпуклого программирования с липшицевыми ограничениями / Ф. С. Стонякин [и др.] // Тр. ИММ УрО РАН. 2018. Т. 24. № 2. С. 266—279.
Mirror descent for constrained optimization problems with large subgradient values of functional constraints / F. S. Stonyakin [et al.] // Comput. Res. and Model. 2020. V. 12. № 2. P. 301—317.
Адаптивные субградиентные методы для задач математического программирования с квазивыпуклыми функциями / С. С. Аблаев [и др.] // Тр. Ин-та матем. и мех. УрО РАН. 2023. Т. 29. № 3. С. 7—25.
Tiapkin D., Gasnikov A. Primal-dual stochastic mirror descent for MDPs // Inter. Conf. Artific. Intelligence and Statist. PMLR. 2022. P. 9723—9740.
Выпуклая оптимизация: учебное пособие / Е. А. Воронцова [и др.]. М.: МФТИ, 2021.
A Parameter-free and Projection-free Restarting Level Set Method for Adaptive Constrained Conver Optimization Under the Error Bound Condition / Q. Lin [et al.] // arXiv preprint. 2022.
Subgradient methods for sharp weakly convex functions / D. Davis [и др.] // J. of Optimizat. Theory and Appl. 2018. V. 179. P. 962—982.
Duchi J.C., Ruan F. Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval // Informat. and Inference: J. of the IMA. 2019. V. 8. № 3. P. 471—529.
Eldar Y.C., Mendelson S. Phase retrieval: stability and recovery guarantees // Appl. Comput. Harmon. Anal. 2014. V. 36. № 3. P. 473—494.
Li X. Nonconvex Robust Low-rank Matrix Recovery // arXiv preprint. 2018.
Дудов С.И., Осипцев М. А. Характеризация решения задач сильно-слабо выпуклого программирования // Матем. сб. 2021. Т. 212. № 6. С. 43—72.
Incremental Methods for Weakly Convex Optimization / X. Li [et al.] // OPT2020: 12th Ann. Workshop on Optimizat. for Mach. Learn. 2020.
Davis D., Drusvyatskiy D., Paquette C. The nonsmooth landscape of phase retrieval // IMA J. Numer. Analys. 2020. V. 40. № 4. P. 2652—2695.
Davis D., Drusvyatskiy D., Kellie M. Stochastic model-based minimization under high-order growth // arXiv preprint. 2018.
Субградиентные методы для слабо выпуклых и относительно слабо выпуклых задач с острым минимумом / Ф. С. Стонякин [и др.] // Компьют. исслед. и моделирование. 2023. Т. 15. № 2. С. 393—412.
Li Y., Sun Y., Chi Y. Low-Rank Positive Semidefinite Matrix Recovery from Corrupted Rank-One Measurements // IEEE Transact. Signal Proces. 2017. V. 65. P. 397—408.
Robust Principal Component Analysis / E. Candes [et al.] // J. of the ACM. 2011.
A Theory on the Absence of Spurious Solutions for Nonconvex and Nonsmooth Optimization / C. Josz [et al.] // NeurIPS. 2018. P. 2441—2449.
Lectures on convex optimization. V. 137 / Y. Nesterov [et al.]. Springer, 2018.
Немировский А.С., Юдин Д. Б. Сложность задач и эффективность методов оптимизации. Наука, 1979.
Евтушенко Ю. Г. Методы решения экстремальных задач и их применение в системах оптимизации. 1982.
Su W., Boyd S., Candes E. A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights // Adv. Neural Informat. Proces. System. 2014. V. 27.
Wilson A.C., Recht B., Jordan M. I. A Lyapunov analysis of accelerated methods in optimization // J. Machine Learn. Res. 2021. V. 22. № 1. P. 5040—5073.
Lojasiewicz S. Une propri´et´e topologique des sous-ensembles analytiques réels // Les ´equations aux d´erivées partielles. 1963. V. 117. P. 87—89.
57. Leżański T. Ü. Ber das Minimumproblem für Funktionale in Banachschen r¨aumen // Mathematische Annalen. 1963. V. 152. № 4. P. 271—274.
Karimi H., Nutini J., Schmidt M. Linear convergence of gradient and proximal-gradient methods under the Polyak-Lojasiewiсz condition // Machine Learning and Knowledge Discovery in Databases: Europ. Conf., ECML PKDD 2016, Riva del Garda, Italy, September 19—23, 2016, Proceed., Part I 16. Springer, 2016. P. 795—811.
Liu C., Zhu L., Belkin M. Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning // arXiv preprint arXiv:2003.00307; 2020. V. 7.
Fatkhullin I., Polyak B. Optimizing static linear feedback: Gradient method // SIAM J. Control and Optimizat. 2021. V. 59. № 5. P. 3887—3911.
Yue P., Fang C., Lin Z. On the lower bound of minimizing Polyak-Lojasiewiсz functions // The Thirty Sixth Annual Conference on Learning Theory. PMLR. 2023. P. 2948—2968.
Yang J., Kiyavash N., He N. Global convergence and variance-reduced optimization for a class of nonconvex-nonconcave minimax problems // arXiv preprint arXiv:2002.09621; 2020.
Garg K., Baranwal M. Fixed-Time Convergence for a Class of Nonconvex-Nonconcave Min-Max Problems // 2022 Eighth Indian Control Conference (ICC). IEEE. 2022. P. 19—24.
Solving a class of non-convex min-max games using iterative first order methods / M. Nouiehed [и др.] // Adv. Neural Inform. Proces. System. 2019. V. 32.
El Ghaoui L., Lebret H. Robust solutions to least-squares problems with uncertain data // SIAM J. Matrix Analys. and Appl. 1997. V. 18. № 4. P. 1035—1064.
Муратиди А.Я., Стонякин Ф. С. Правила остановки градиентного метода для седловых задач с двусторонним условием Поляка–Лоясиевича // arXiv preprint arXiv:2307.09921; 2023.
Бакушинский А.Б., Поляк Б. Т. О решении вариационных неравенств // Докл. АН СССР. 1974. Т. 219. С. 1038—1041.
Stonyakin F., Kuruzov I., Polyak B. Stopping rules for gradient methods for non-convex problems with additive noise in gradient // J. Optimizat. Theory and Appl. 2023. P. 1—21.
A theoretical and empirical comparison of gradient approximations in derivative-free optimization / A. S. Berahas [et al.] // Foundat. Comput. Math. 2022. V. 22. № 2. P. 507—560.
Conn A.R., Scheinberg K., Vicente L. N. Introduction to derivative-free optimization. SIAM, 2009.
Risteski A., Li Y. Algorithms and matching lower bounds for approximately-convex optimization // Adv. Neural Inform. Proces. System. 2016. V. 29.
Convex optimization in hilbert space with applications to inverse problems / A. Gasnikov [et al.] // arXiv preprint arXiv:1703.00267; 2017.
Kabanikhin S. I. Inverse and ill-posed problems: theory and applications. de Gruyter, 2011.
Devolder O., Glineur F., Nesterov Y. First-order methods of smooth convex optimization with inexact oracle // Math. Program. 2014. V. 146. P. 37—75.
Devolder O. Exactness, inexactness and stochasticity in first-order methods for large- scale convex optimization: дис… канд. / Devolder Olivier. CORE UCLouvain Louvain-la-Neuve, Belgium, 2013.
d’Aspremont A. Smooth optimization with approximate gradient // SIAM J. Optimizat. 2008. V. 19. № 3. P. 1171—1183.
Vasin A., Gasnikov A., Spokoiny V. Stopping rules for accelerated gradient methods with additive noise in gradient: тех. отч. / Berlin: Weierstraß-Institut fu¨r Angewandte Analysis und Stochastik, 2021.
Емелин И.В., Красносельский М. А. Правило останова в итерационных процедурах решения некорректных задач // Автоматика и телемехан. 1978. № 12. P. 59—63.
Carter R. G. On the global convergence of trust region algorithms using inexact gradient information // SIAM J. on Numer. Analys. 1991. V. 28. № 1. P. 251—265.
Гасников А. В. Современные численные методы оптимизации. Метод универсального градиентного спуска. М.: МЦНМО, 2021.
De Klerk E., Glineur F., Taylor A. B. On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions // Optimizat. Lett. 2017. V. 11. P. 1185—1199.
Puchinin S., Stonyakin F. Gradient-Type Method for Optimization Problems with Polyak-Lojasiewicz Condition: Relative Inexactness in Gradient and Adaptive Parameters Setting // arXiv preprint arXiv:2307.14101; 2023.
Convex optimization: Algorithms and complexity / S. Bubeck [et al.] // Foundat. and Trend. in Mach. Learn. 2015. V. 8. № 3/4. P. 231—357.
Cox B., Juditsky A., Nemirovski A. Decomposition techniques for bilinear saddle point problems and variational inequalities with affine monotone operators // J. Optimizat. Theory and Appl. 2017. V. 172. P. 402—435.
Гасников А., Гасникова Е. Модели равновесного распределения транспортных потоков в больших сетях. 2023.
Efficient numerical methods to solve sparse linear equations with application to pagerank / A. Anikin [и др.] // Optimizat. Method. and Software. 2022. V. 37. № 3. P. 907—935.
Bomze I. M., Rinaldi F., Zeffiro D. Frank–Wolfe and friends: a journey into projection-free first-order optimization methods // 4OR. 2021. V. 19. P. 313—345.
Conditional gradient methods / G. Braun [и др.] // arXiv preprint arXiv:2211.14103; 2022.
Zero-Order Stochastic Conditional Gradient Sliding Method for Non-smooth Convex Optimization / A. Lobanov [и др.] // arXiv preprint arXiv:2303.02778; 2023.
Vedernikov R., Rogozin A., Gasnikov A. Decentralized conditional gradient method over time-varying graphs // arXiv preprint arXiv:2307.10978; 2023.
Adaptive Variant of the Frank-Wolfe Algorithm for Convex Optimization Problems / G. Aivazian [et al.] // arXiv preprint arXiv:2307.16059; 2023.
Vial J.-P. Strong convexity of sets and functions // J. of Math. Economic. 1982. V. 9. № 1/2. P. 187—205.
Vial J.-P. Strong and weak convexity of sets and functions // Math. Operat. Res. 1983. V. 8. № 2. P. 231—259.
Половинкин Е. С. Сильно выпуклый анализ // Матем. сб. 199٦. Т. 187. № 2. С. 103—130.
Ito M., Lu Z., He C. A Parameter-Free Conditional Gradient Method for Composite Minimization under H¨older Condition // J. Machine Learn. Res. 2023. V. 24. P. 1—34.
Taylor A.B., Hendrickx J. M., Glineur F. Smooth strongly convex interpolation and exact worst-case performance of first-order methods // Math. Program. 2017. V. 161. P. 307—345.
Super-acceleration with cyclical step-sizes / B. Goujaud [et al.] // Inter. Conf. on Artificial Intelligence and Statist. PMLR. 2022. P. 3028—3065.
Немировский А.С. О регуляризующих свойствах метода сопряженных градиентов на некорректных задачах // Ж. вычисл. матем. и матем. физ. 1986. Т. 26. № 3. С. 332—347.
Acceleration methods / A. d’Aspremont, D. Scieur, A. Taylor [et al.] // Foundat. and Trends in Optimizat. 2021. V. 5. № 1/2. P. 1—245.
Scieur D., Pedregosa F. Universal average-case optimality of Polyak momentum // Inter. Conf. on Machine Learn. PMLR. 2020. P. 8565—8572.
Гельфанд И.М., Цетлин М. Л. Принцип нелокального поиска в системах автоматической оптимизации // Докл. АН СССР. 1961. Т. 137. С. 295—298.
Lessard L., Recht B., Packard A. Analysis and design of optimization algorithms via integral quadratic constraints // SIAM J. on Optimizat. 2016. V. 26. № 1. P. 57—95.
Ghadimi E., Feyzmahdavian H. R., Johansson M. Global convergence of the heavy-ball method for convex optimization // 2015 European control conference (ECC). IEEE. 2015. P. 310—315.
Goujaud B., Taylor A., Dieuleveut A. Provable non-accelerations of the heavy-ball method // arXiv preprint arXiv:2307.11291; 2023.
O’Donoghue B., Candes E. Adaptive restart for accelerated gradient schemes // Foundat. Comput. Math. 2015. V. 15. P. 715—732.
Danilova M., Kulakova A., Polyak B. Non-monotone behavior of the heavy ball method // Difference Equations and Discrete Dynamical Systems with Applications: 24th ICDEA, Dresden, Germany, May 21—25, 2018 24. Springer, 2020. P. 213—230.
Немировский А. Орт-метод гладкой выпуклой минимизации // Изв. АН СССР. Техн. кибернетика. 1982. № 2. P. 18—29.
Is local SGD better than minibatch SGD? / B. Woodworth [et al.] // Inter. Conf. Machine Learn. PMLR. 2020. P. 10334—10343.
The min-max complexity of distributed stochastic convex optimization with intermittent communication / B. E. Woodworth [et al.] // Conf. Learn. Theory. PMLR. 2021. P. 4386—4437.
Нестеров Ю. Е. Метод решения задачи выпуклого программирования со скоростью сходимости O(1/k2) // Докл. АН СССР. 1983. Т. 269. С. 543—547.
Lan G. First-order and stochastic optimization methods for machine learning. Vol. 1. Springer, 2020.
Lin Z., Li H., Fang C. Accelerated optimization for machine learning // Nature Singapore: Springer, 2020.
Peng W., Wang T. The Nesterov-Spokoiny Acceleration: o(1/kΘ2) Convergence without Proximal Operations // arXiv preprint arXiv:2308.14314; 2023.
Inexact model: A framework for optimization and variational inequalities / F. Stonyakin [et al.] // Optimizat. Meth. and Software. 2021. V. 36. № 6. P. 1155—1201.
Zhang Z., Lan G. Solving Convex Smooth Function Constrained Optimization Is As Almost Easy As Unconstrained Optimization // arXiv preprint arXiv:2210.05807; 2022.
Accelerated gradient methods with absolute and relative noise in the gradient / A. Vasin [et al.] // Optimizat. Method. and Software. 2023. P. 1—50.
Intermediate Gradient Methods with Relative Inexactness / N. Kornilov [et al.] // arXiv preprint arXiv:2310.00506; 2023.
Optimal gradient sliding and its application to optimal distributed optimization under similarity / D. Kovalev [et al.] // Adv. in Neural Informat. Proces. System. 2022. V. 35. P. 33494—33507.
Kovalev D., Gasnikov A., Malinovsky G. An Optimal Algorithm for Strongly Convex Min-min Optimization // arXiv preprint arXiv:2212.14439; 2022.
Optimal Algorithm with Complexity Separation for Strongly Convex-Strongly Concave Composite Saddle Point Problems / E. Borodich [et al.] // arXiv preprint arXiv:2307.12946; 2023.
Smooth monotone stochastic variational inequalities and saddle point problems: A survey / A. Beznosikov [et al.] // Europ. Math. Soc. Magazine. 2023. № 127. P. 15—28.
Nesterov Y. Implementable tensor methods in unconstrained convex optimization // Math. Program. 2021. V. 186. P. 157—183.
Monteiro R.D., Svaiter B. F. An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods // SIAM J. Optimizat. 2013. V. 23. № 2. P. 1092—1125.
Near optimal methods for minimizing convex functions with Lipschitz p-th derivatives / A. Gasnikov [et al.] // Conf. Learn. Theory. PMLR. 2019. P. 1392—1393.
Kovalev D., Gasnikov A. The first optimal acceleration of high-order methods in smooth convex optimization // Adv. Neural Inform. Proces. System. 2022. V. 35. P. 35339—35351.
Optimal and Adaptive Monteiro-Svaiter Acceleration / Y. Carmon [et al.] // Adv.n Neural Inform. Proces. System. V. 35 / Ed. S. Koyejo [et al.]. Curran Associates, Inc., 2022. P. 20338—20350. URL: https://proceedings.neurips.cc/.
Exploiting higher-order derivatives in convex optimization methods / D. Kamzolov [et al.] // arXiv preprint arXiv:2208.13190; 2022.
Bertsekas D., Tsitsiklis J. Parallel and distributed computation: numerical methods. Athena Scientific, 2015.
Recent theoretical advances in decentralized distributed convex optimization / E. Gorbunov [et al.] // High-Dimensional Optimization and Probability: With a View Towards Data Science. Springer, 2022. P. 253—325.
Кибардин В. М. Декомпозиция по функциям в задаче минимизации // Автоматика и телемех. 1979. № 9. С. 66—79.
Decentralized optimization over time-varying graphs: a survey / A. Rogozin [et al.] // arXiv preprint arXiv:2210.09719; 2022.
Decentralized Optimization Over Slowly Time-Varying Graphs: Algorithms and Lower Bounds / D. Metelev [et al.] // arXiv preprint arXiv:2307.12562; 2023.
Bao C., Chen L., Li J. The Global R-linear Convergence of Nesterov’s Accelerated Gradient Method with Unknown Strongly Convex Parameter // arXiv preprint arXiv:2308.14080; 2023.
Guminov S., Gasnikov A., Kuruzov I. Accelerated methods for weakly-quasi-convex optimization problems // Comput. Management Sci. 2023. V. 20. № 1. P. 1—19.
Algorithmic stochastic convex optimization / A. Beznosikov [и др.]. Springer, 2024.
Robbins H., Monro S. A stochastic approximation method // Ann. Math. Statistic. 1951. P. 400—407.
Ермольев Ю. Методы стохастического программирования. М.: Наука, 1976.
High-probability bounds for stochastic optimization and variational inequalities: the case of unbounded variance / A. Sadiev [et al.] // ICML. 2023.
Root-sgd: Sharp nonasymptotics and asymptotic efficiency in a single algorithm / C. J. Li [et al.] // Conf. Learn. Theory. PMLR. 2022. P. 909—981.
Невельсон М., Хасьминский Р. Стохастическая аппроксимация и рекуррентное оценивание. М.: Наука, 1972.
Fort G. Central limit theorems for stochastic approximation with controlled Markov chain dynamics // ESAIM: Probability and Statistic. 2015. V. 19. P. 60—80.
Bach F., Perchet V. Highly-smooth zero-th order online optimization // Conf. Learn. Theory. PMLR. 2016. P. 257—283.
Ruppert D. Efficient estimations from a slowly convergent Robbins-Monro process: тех. отч. / Cornell University Operations Research; Industrial Engineering. 1988.
Robust stochastic approximation approach to stochastic programming / A. Nemirovski [и др.] // SIAM J. Optimizat. 2009. V. 19. № 4. P. 1574—1609.
Nesterov Y. Primal-dual subgradient methods for convex problems // Math. Program. 2009. V. 120. № 1. P. 221—259.
Duchi J., Hazan E., Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. // J. Machine Learn. Res. 2011. V. 12. № 7.
Ivgi M., Hinder O., Carmon Y. DoG is SGD’s Best Friend: A Parameter-Free Dynamic Step Size Schedule. 2023. arXiv: 2302.12022 [cs.LG].
Cutkosky A., Defazio A., Mehta H. Mechanic: A Learning Rate Tuner // arXiv preprint arXiv:2306.00144; 2023.
Stich S. U. Unified optimal analysis of the (stochastic) gradient method // arXiv preprint arXiv:1907.04232; 2019.
Gorbunov E. Unified analysis of SGD-type methods // arXiv preprint arXiv:2303.16502; 2023.
Lan G. An optimal method for stochastic composite optimization // Math. Program. 2012. V. 133. № 1/2. P. 365—397.
The power of first-order smooth optimization for black-box non-smooth problems / A. Gasnikov [et al.] // Inter. Conf. Machine Learn. PMLR. 2022. P. 7241—7265.
Woodworth B.E., Srebro N. An even more optimal stochastic optimization algorithm: minibatching and interpolation learning // Adv. Neural Inform. Proces. System. 2021. V. 34. P. 7333—7345.
Accelerated stochastic approximation with state-dependent noise / S. Ilandarideva [и др.] // arXiv preprint arXiv:2307.01497; 2023.
Unixgrad: A universal, adaptive algorithm with optimal guarantees for constrained optimization / A. Kavis [et al.] // Adv. Neural Inform. Proces. System. 2019. V. 32.
Ene A., Nguyen H. L., Vladu A. Adaptive gradient methods for constrained convex optimization and variational inequalities // Proceed. of the AAAI Conf. Artificial Intelligence. 2021. V. 35. P. 7314—7321.
Nesterov Y. Efficiency of coordinate descent methods on huge-scale optimization problems // SIAM J. Optimizat. 2012. V. 22. № 2. P. 341—362.
Richtárik P., Takač M. On optimal probabilities in stochastic coordinate descent methods // Optimizat. Lett. 2016. V. 10. P. 1233—1243.
Qu Z., Richtárik P. Coordinate descent with arbitrary sampling I: Algorithms and complexity // Optimizat. Meth. and Software. 2016. V. 31. № 5. P. 829—857.
QSGD: Communication-efficient SGD via gradient quantization and encoding / D. Alistarh [et al.] // Adv. Neural Inform. Proces. System. 2017. V. 30.
On biased compression for distributed learning / A. Beznosikov [et al.] // arXiv preprint arXiv:2002.12410; 2020.
Schmidt M., Roux N. L. Fast convergence of stochastic gradient descent under a strong growth condition // arXiv preprint arXiv:1308.6370; 2013.
Vaswani S., Bach F., Schmidt M. Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron // 22nd Inter. Conf. on Artificial Intelligence and Statistic. PMLR. 2019. P. 1195—1204.
First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities / A. Beznosikov [et al.] // arXiv preprint arXiv:2305.15938; 2023.
Moulines E., Bach F. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning // Adv. Neural Inform. Proces. System. V. 24 / Ed. J. Shawe-Taylor [et al.]. Curran Associates, Inc., 2011. URL: https:// proceedings.neurips.cc/paper_files/paper/2011/file/40008b9a5380fcacce3976bf7c08af5b-.
SGD: General analysis and improved rates / R. M. Gower [et al.] // Inter. Conf. Machine Learn. PMLR. 2019. P. 5200—5209.
Ma S., Bassily R., Belkin M. The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning // Inter. Conf. Machine Learn. PMLR. 2018. P. 3325—3334.
Distributed learning with compressed gradient differences / K. Mishchenko [et al.] // arXiv preprint arXiv:1901.09269; 2019.
Gorbunov E., Hanzely F., Richtárik P. A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent // Inter. Conf. Artificial Intelligence and Statistic. PMLR. 2020. P. 680—690.
Defazio A., Bach F., Lacoste-Julien S. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives // Adv. Neural Inform. Proces. System. 2014. V. 27.
Johnson R., Zhang T. Accelerating stochastic gradient descent using predictive variance reduction // Adv. Neural Inform. Proces. System. 2013. V. 26.
Kovalev D., Horváth S., Richtárik P. Don’t jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop // Algorithm. Learn. Theory. PMLR. 2020. P. 451—467.
Hanzely F., Mishchenko K., Richt´arik P. SEGA: Variance reduction via gradient sketching // Adv. Neural Inform. Proces. System. 2018. V. 31.
Unified analysis of stochastic gradient methods for composite convex and smooth optimization / A. Khaled [и др.] // J. Optimizat. Theory and Appl. 2023. P. 1—42.
Stochastic gradient descent-ascent: Unified theory and new efficient methods / A. Beznosikov [et al.] // Inter. Conf. Artificial Intelligence and Statistic. PMLR. 2023. P. 172—235.
Maslovskii A.Y. A unified analysis of variational inequality methods: variance reduction, sampling, quantization, and coordinate descent // Comput. Math. and Math. Phys. 2023. V. 63. № 2. P. 147—174.
Explore aggressively, update conservatively: Stochastic extragradient methods with variable stepsize scaling / Y.-G. Hsieh [et al.] // Adv. Neural Inform. Proces. System. 2020. V. 33. P. 16223—16234.
Stochastic extragradient: General analysis and improved rates / E. Gorbunov [и др.] // Inter. Conf. Artificial Intelligence and Statistic. PMLR. 2022. P. 7865—7901.
Алгоритмы робастной стохастической оптимизации на основе метода зеркального спуска / А. В. Назин [и др.] // Автоматика и телемех. 2019. № 9. С. 64—90.
High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise / E. Gorbunov [и др.]. 2023. arXiv: 2310.01860 [math.OC].
Поляк Б.Т., Цыпкин Я. З. Псевдоградиентные алгоритмы адаптации и обучения. Автоматика и телемеханика // Автоматика и телемех. 1973. № 3. С. 45—68.
Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise / D. Jakoveti´c [et al.] // SIAM J. Optimizat. 2023. V. 33. № 2. P. 394—423.
Advancing the lower bounds: An accelerated, stochastic, second-order method with optimal adaptation to inexactness / A. Agafonov [et al.]. 2023. arXiv: 2309.01570 [math.OC].
Граничин О.Н., Поляк Б. Т. Рандомизированные алгоритмы оценивания и оптимизации при почти произвольных помехах. М.: Наука, 2003.
rock H. An automatic method for finding the greatest or least value of a function // Comput. J. 1960. V. 3. № 3. P. 175—184.
Kiefer J., Wolfowitz J. Stochastic estimation of the maximum of a regression function // Ann. Math. Statistic. 1952. P. 462—466.
Randomized gradient-free methods in convex optimization / A. Gasnikov [et al.] // arXiv preprint arXiv:2211.13566; 2022.
Lobanov A., Gasnikov A., Stonyakin F. Highly Smoothness Zero-Order Methods for Solving Optimization Problems under PL Condition // Ж. вычисл. матем. и матем. физ. 2023.
Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm / A. Akhavan [et al.] // arXiv preprint arXiv:2306.02159; 2023.
A theoretical and empirical comparison of gradient approximations in derivative-free optimization / A. S. Berahas [et al.] // Foundat. Comput. Math. 2022. V. 22. № 2. P. 507—560.
Akhavan A., Pontil M., Tsybakov A. Exploiting higher order smoothness in derivative-free optimization and continuous bandits // Adv. Neural Inform. Proces. System. 2020. V. 33. P. 9017—9027.
Novitskii V., Gasnikov A. Improved exploiting higher order smoothness in derivative-free optimization and continuous bandit // arXiv preprint arXiv:2101.03821; 2021.
Гасников А., Двуреченский П., Нестеров Ю. Стохастические градиентные методы с неточным оракулом // Тр. Московск. физ.-тех. ин-та. 2016. Т. 8. № 1 (29). С. 41—91.
Граничин О.Н., Иванский Ю. В., Копылова К. Д. Метод Б. Т. Поляка на основе стохастической функции Ляпунова для обоснования состоятельности оценок поискового алгоритма стохастической аппроксимации при неизвестных, но ограниченных помехах // Ж. вычисл. матем. и матем. физ. 2023.
Lobanov A., Bashirov N., Gasnikov A. The Black-Box Optimization Problem: Zero-Order Accelerated Stochastic Method via Kernel Approximation. 2023. arXiv: 2310.02371 [math.OC].
Learning supervised pagerank with gradient-based and gradient-free optimization methods / L. Bogolubsky [et al.] // Adv. Neural Inform. Proces. System. 2016. V. 29.
Noisy zeroth-order optimization for non-smooth saddle point problems / D. Dvinskikh [et al.] // Inter. Conf. Math. Optimizat. Theory and Operat. Res. Springer. 2022. P. 18—33.
Gradient-Free Federated Learning Methods with l_1 and l_2-Randomization for Non-Smooth Convex Stochastic Optimization Problems / A. Lobanov [et al.] // arXiv preprint arXiv:2211.10783; 2022.
Accelerated Zeroth-order Method for Non-Smooth Stochastic Convex Optimization Problem with Infinite Variance / N. Kornilov [et al.]. 2022.
Risteski A., Li Y. Algorithms and matching lower bounds for approximately-convex optimization // Adv. Neural Inform. Proces. System. 2016. V. 29.

Дополнительные файлы

Доп. файлы

Действие

1. JATS XML

Скачать

Имя пользователя
Пароль
Запомнить меня

Забыли пароль?	Регистрация

Имя пользователя
Пароль
Запомнить меня

Забыли пароль?	Регистрация

Том 65, № 9 (2025)

О некоторых работах Бориса Теодоровича Поляка по сходимости градиентных методов и их развитии

Полный текст

Аннотация

Ключевые слова

Полный текст

Об авторах

С. С. Аблаев

А. Н. Безносиков

А. В. Гасников

Д. М. Двинских

A. B. Лобанов

C. М. Пучинин

Ф. С. Стонякин

Список литературы

Дополнительные файлы