TY - JOUR
AU - D. B. Rokhlin
TI - $Q$-learning in a stochastic Stackelberg game between an uninformed leader and a naive follower
JO - Teoriâ veroâtnostej i ee primeneniâ
PY - 2019
SP - 53
EP - 74
VL - 64
IS - 1
UR - http://geodesic.mathdoc.fr/item/TVP_2019_64_1_a3/
LA - ru
ID - TVP_2019_64_1_a3
ER -