Sign rank versus Vapnik-Chervonenkis dimension
Sbornik. Mathematics, Tome 208 (2017) no. 12, pp. 1724-1757 Cet article a éte moissonné depuis la source Math-Net.Ru

Voir la notice de l'article

This work studies the maximum possible sign rank of sign $(N \times N)$-matrices with a given Vapnik-Chervonenkis dimension $d$. For $d=1$, this maximum is three. For $d=2$, this maximum is $\widetilde{\Theta}(N^{1/2})$. For $d >2$, similar but slightly less accurate statements hold. The lower bounds improve on previous ones by Ben-David et al., and the upper bounds are novel. The lower bounds are obtained by probabilistic constructions, using a theorem of Warren in real algebraic topology. The upper bounds are obtained using a result of Welzl about spanning trees with low stabbing number, and using the moment curve. The upper bound technique is also used to: (i) provide estimates on the number of classes of a given Vapnik-Chervonenkis dimension, and the number of maximum classes of a given Vapnik-Chervonenkis dimension—answering a question of Frankl from 1989, and (ii) design an efficient algorithm that provides an $O(N/\log(N))$ multiplicative approximation for the sign rank. We also observe a general connection between sign rank and spectral gaps which is based on Forster's argument. Consider the adjacency $(N \times N)$-matrix of a $\Delta$-regular graph with a second eigenvalue of absolute value $\lambda$ and $\Delta \leq N/2$. We show that the sign rank of the signed version of this matrix is at least $\Delta/\lambda$. We use this connection to prove the existence of a maximum class $C\subseteq\{\pm 1\}^N$ with Vapnik-Chervonenkis dimension $2$ and sign rank $\widetilde{\Theta}(N^{1/2})$. This answers a question of Ben-David et al. regarding the sign rank of large Vapnik-Chervonenkis classes. We also describe limitations of this approach, in the spirit of the Alon-Boppana theorem. We further describe connections to communication complexity, geometry, learning theory, and combinatorics. Bibliography: 69 titles.
Keywords: hyperplane arrangement, sign rank, half-spaces, linear classifiers, unbounded-error communication complexity.
Mots-clés : VC dimension
@article{SM_2017_208_12_a1,
     author = {N. Alon and Sh. Moran and A. Yehudayoff},
     title = {Sign rank versus {Vapnik-Chervonenkis} dimension},
     journal = {Sbornik. Mathematics},
     pages = {1724--1757},
     year = {2017},
     volume = {208},
     number = {12},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/SM_2017_208_12_a1/}
}
TY  - JOUR
AU  - N. Alon
AU  - Sh. Moran
AU  - A. Yehudayoff
TI  - Sign rank versus Vapnik-Chervonenkis dimension
JO  - Sbornik. Mathematics
PY  - 2017
SP  - 1724
EP  - 1757
VL  - 208
IS  - 12
UR  - http://geodesic.mathdoc.fr/item/SM_2017_208_12_a1/
LA  - en
ID  - SM_2017_208_12_a1
ER  - 
%0 Journal Article
%A N. Alon
%A Sh. Moran
%A A. Yehudayoff
%T Sign rank versus Vapnik-Chervonenkis dimension
%J Sbornik. Mathematics
%D 2017
%P 1724-1757
%V 208
%N 12
%U http://geodesic.mathdoc.fr/item/SM_2017_208_12_a1/
%G en
%F SM_2017_208_12_a1
N. Alon; Sh. Moran; A. Yehudayoff. Sign rank versus Vapnik-Chervonenkis dimension. Sbornik. Mathematics, Tome 208 (2017) no. 12, pp. 1724-1757. http://geodesic.mathdoc.fr/item/SM_2017_208_12_a1/

[1] N. Alon, J. H. Spencer, The probabilistic method, Wiley Ser. Discrete Math. Optim., 4th ed., John Wiley Sons, Inc., Hoboken, NJ, 2016, xiv+375 pp. | MR | Zbl

[2] N. Alon, “Eigenvalues and expanders”, Combinatorica, 6:2 (1986), 83–96 | DOI | MR | Zbl

[3] N. Alon, “Eigenvalues, geometric expanders, sorting in rounds, and Ramsey theory”, Combinatorica, 6:3 (1986), 207–219 | DOI | MR | Zbl

[4] N. Alon, “A parallel algorithmic version of the local lemma”, Random Structures Algorithms, 2:4 (1991), 367–378 | DOI | MR | Zbl

[5] N. Alon, J. Balogh, B. Bollobás, R. Morris, “The structure of almost all graphs in a hereditary property”, J. Combin. Theory Ser. B, 101:2 (2011), 85–110 | DOI | MR | Zbl

[6] N. Alon, P. Frankl, V. R{o}dl, “Geometrical realization of set systems and probabilistic communication complexity”, SFCS' 85 Proceedings of the 26th annual symposium on foundations of computer science (Portland, OR, 1985), IEEE Computer Soc., Washington, DC, 1985, 277–280 | DOI

[7] N. Alon, D. Haussler, E. Welzl, “Partitioning and geometric embedding of range spaces of finite Vapnik–Chervonenkis dimension”, SCG' 87 Proceedings of the third annual symposium on computational geometry (Waterloo, Ontario, Canada, 1987), ACM, New York, NY, 1987, 331–340 | DOI

[8] N. Alon, V. D. Milman, “Eigenvalues, expanders and superconcentrators (extended abstract)”, SFCS' 84 Proceedings of the 25th annual symposium on foundations of computer science (Singer Island, FL, 1984), IEEE Computer Soc., Washington, DC, 1984, 320–322 | DOI

[9] N. Alon, V. D. Milman, “$\lambda_1$, isoperimetric inequalities for graphs, and superconcentrators”, J. Combin. Theory Ser. B, 38:1 (1985), 73–88 | DOI | MR | Zbl

[10] N. Alon, L. Rónyai, T. Szabó, “Norm-graphs: variations and applications”, J. Combin. Theory Ser. B, 76:2 (1999), 280–290 | DOI | MR | Zbl

[11] R. P. Anstee, L. Rónyai, A. Sali, “Shattering news”, Graphs Combin., 18:1 (2002), 59–73 | DOI | MR | Zbl

[12] R. I. Arriaga, S. Vempala, “An algorithmic theory of learning: robust concepts and random projection”, Mach. Learn., 63:2 (2006), 161–182 | DOI | Zbl

[13] H.-J. Bandelt, V. Chepoi, A. Dress, J. H. Koolen, “Combinatorics of lopsided sets”, European J. Combin., 27:5 (2006), 669–689 | DOI | MR | Zbl

[14] R. Basri, P. F. Felzenszwalb, R. B. Girshick, D. W. Jacobs, C. J. Klivans, “Visibility constraints on features of 3d objects”, CVPR2009 IEEE conference on computer vision and pattern recognition (Miami, FL, 2009), IEEE Computer Soc., 2009, 1231–1238 | DOI

[15] Sh. Ben-David, N. Eiron, H. U. Simon, “Limitations of learning via embeddings in Euclidean half spaces”, J. Mach. Learn. Res., 3, Spec. Issue Comput. Learn. Theory (2002), 441–461 | MR | Zbl

[16] Sh. Ben-David, M. Lindenbaum, “Localization vs. identification of semi-algebraic sets”, Mach. Learn., 32:3 (1998), 207–224 | DOI | Zbl

[17] A. Beutelspacher, U. Rosenbaum, Projective geometry: from foundations to applications, Cambridge Univ. Press, Cambridge, 1998, x+258 pp. | MR | Zbl

[18] A. Bhangale, S. Kopparty, The complexity of computing the minimum rank of a sign pattern matrix, arXiv: 1503.04486

[19] A. Blumer, A. Ehrenfeucht, D. Haussler, M. K. Warmuth, “Classifying learnable geometric concepts with the Vapnik–Chervonenkis dimension (extended abstract)”, STOC' 86 Proceedings of the 18th annual ACM symposium on theory of computing (Berkeley, CA, 1986), ACM, New York, 1986, 273–282 | DOI

[20] B. Bollobás, A. J. Radcliffe, “Defect Sauer results”, J. Combin. Theory Ser. A, 72:2 (1995), 189–208 | DOI | MR | Zbl

[21] B. E. Boser, I. M. Guyon, V. N. Vapnik, “A training algorithm for optimal margin classifiers”, COLT' 92 Proceedings of the 5th annual ACM workshop on computational learning theory (Pittsburgh, PA, 1992), ACM, New York, 1992, 144–152 | DOI

[22] W. G. Brown, “On graphs that do not contain a Thomsen graph”, Canad. Math. Bull., 9 (1966), 281–285 | DOI | MR | Zbl

[23] Ch. J. C. Burges, “A tutorial on support vector machines for pattern recognition”, Data Min. Knowl. Discov., 2:2 (1998), 121–167 | DOI

[24] B. Chazelle, E. Welzl, “Quasi-optimal range searching in space of finite VC-dimension”, Discrete Comput. Geom., 4:5 (1989), 467–489 | DOI | MR | Zbl

[25] F. R. K. Chung, R. L. Graham, P. Frankl, J. B. Shearer, “Some intersection theorems for ordered sets and graphs”, J. Combin. Theory Ser. A, 43:1 (1986), 23–37 | DOI | MR | Zbl

[26] C. Cortes, V. Vapnik, “Support-vector networks”, Mach. Learn., 20:3 (1995), 273–297 | DOI | Zbl

[27] J. Dodziuk, “Difference equations, isoperimetric inequality and transience of certain random walks”, Trans. Amer. Math. Soc., 284:2 (1984), 787–794 | DOI | MR | Zbl

[28] Th. Doliwa, Gaojian Fan, H. U. Simon, S. Zilles, “Recursive teaching dimension, VC-dimension and sample compression”, J. Mach. Learn. Res., 15:1 (2014), 3107–3131 | MR | Zbl

[29] Th. Doliwa, H. U. Simon, S. Zilles, “Recursive teaching dimension, learning complexity, and maximum classes”, Algorithmic learning theory (Canberra, Australia, 2010), Lecture Notes in Comput. Sci., 6331, Lecture Notes in Artificial Intelligence, Springer, Berlin, 2010, 209–223 | DOI | MR | Zbl

[30] S. Floyd, M. Warmuth, “Sample compression, learnability, and the Vapnik–Chervonenkis dimension”, Mach. Learn., 21:3 (1995), 269–304 | DOI

[31] J. Forster, “A linear lower bound on the unbounded error probabilistic communication complexity”, J. Comput. System Sci., 65:4 (2002), 612–625 | DOI | MR | Zbl

[32] J. Forster, M. Krause, S. V. Lokam, R. Mubarakzjanov, N. Schmitt, H. U. Simon, “Relations between communication complexity, linear arrangements, and computational complexity”, FST TCS2001: Foundations of software technology and theoretical computer science (Bangalore, 2001), Lecture Notes in Comput. Sci., 2245, Springer, Berlin, 2001, 171–182 | DOI | MR | Zbl

[33] J. Forster, N. Schmitt, H. U. Simon, Th. Suttorp, “Estimating the optimal margins of embeddings in Euclidean half spaces”, Mach. Learn., 51:3 (2003), 263–281 | DOI | Zbl

[34] J. Forster, H. U. Simon, “On the smallest possible dimension and the largest possible margin of linear arrangements representing given concept classes”, Theoret. Comput. Sci., 350:1 (2006), 40–48 | DOI | MR | Zbl

[35] P. Frankl, “Traces of antichains”, Graphs Combin., 5:1 (1989), 295–299 | DOI | MR | Zbl

[36] B. Gärtner, E. Welzl, “Vapnik–Chervonenkis dimension and (pseudo-)hyperplane arrangements”, Discrete Comput. Geom., 12:4 (1994), 399–432 | DOI | MR | Zbl

[37] D. Haussler, “Sphere packing numbers for subsets of the Boolean $n$-cube with bounded Vapnik–Chervonenkis dimension”, J. Combin. Theory Ser. A, 69:2 (1995), 217–232 | DOI | MR | Zbl

[38] D. Haussler, E. Welzl, “$\varepsilon$-nets and simplex range queries”, Discrete Comput. Geom., 2:2 (1987), 127–151 | DOI | MR | Zbl

[39] Sh. Hoory, N. Linial, A. Wigderson, “Expander graphs and their applications”, Bull. Amer. Math. Soc. (N.S.), 43:4 (2006), 439–561 | DOI | MR | Zbl

[40] W. B. Johnson, J. Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space”, Conference in modern analysis and probability (New Haven, Conn., 1982), Contemp. Math., 26, Amer. Math. Soc., Providence, RI, 1984, 189–206 | DOI | MR | Zbl

[41] P. Keevash, The existence of designs, arXiv: 1401.3665

[42] J. Komlós, J. Pach, G. Woeginger, “Almost tight bounds for $\varepsilon$-nets”, Discrete Comput. Geom., 7:2 (1992), 163–173 | DOI | MR | Zbl

[43] I. Kremer, N. Nisan, D. Ron, “On randomized one-round communication complexity”, Comput. Complexity, 8:1 (1999), 21–49 | DOI | MR | Zbl

[44] E. Kushilevitz, N. Nisan, Communication complexity, Cambridge Univ. Press, Cambridge, 1997, xiv+189 pp. | DOI | MR | Zbl

[45] D. Kuzmin, M. K. Warmuth, “Unlabeled compression schemes for maximum classes”, J. Mach. Learn. Res., 8 (2007), 2047–2081 | MR | Zbl

[46] T. Lee, A. Shraibman, “An approximation algorithm for approximation rank”, CCC' 09 Proceedings of the 24th annual IEEE conference on computational complexity (Paris, 2009), IEEE Computer Soc., Los Alamitos, CA, 2009, 351–357 | DOI | MR

[47] N. Linial, A. Shraibman, “Learning complexity vs communication complexity”, Combin. Probab. Comput., 18:1-2 (2009), 227–245 | DOI | MR | Zbl

[48] S. V. Lokam, “Complexity lower bounds using linear algebra”, Found. Trends Theor. Comput. Sci., 4:1-2 (2009), 1–155 | DOI | MR | Zbl

[49] J. Matoušek, “Intersection graphs of segments and $\exists \mathbb {R}$”, arXiv: 1406.2636

[50] J. Matoušek, E. Welzl, L. Wernisch, “Discrepancy and approximations for bounded VC-dimension”, Combinatorica, 13:4 (1993), 455–466 | DOI | MR | Zbl

[51] N. E. Mnev, “The universality theorems on the classification problem of configuration varieties and convex polytopes varieties”, Topology and geometry — Rohlin seminar, Lecture Notes in Math., 1346, Springer, Berlin, 1989, 527–543 | DOI | MR | Zbl

[52] Sh. Moran, Shattering-extremal systems, arXiv: 1211.2980

[53] Sh. Moran, M. K. Warmuth, “Labeled compression schemes for extremal classes”, Algorithmic learning theory (Bari, 2016), Lecture Notes in Comput. Sci., 9925, Lecture Notes in Artificial Intelligence, Springer, [Cham], 2016, 34–49 ; arXiv: 1506.00165 | DOI | MR | Zbl

[54] A. Nilli, “On the second eigenvalue of a graph”, Discrete Math., 91:2 (1991), 207–210 | DOI | MR | Zbl

[55] R. Paturi, J. Simon, “Probabilistic communication complexity”, J. Comput. System Sci., 33:1 (1986), 106–123 | DOI | MR | Zbl

[56] A. A. Razborov, A. A. Sherstov, “The sign-rank of $\mathrm{AC}(0)$”, SIAM J. Comput., 39:5 (2010), 1833–1855 | DOI | MR | Zbl

[57] J. Richter-Gebert, “Mnëv's universality theorem revisited”, Sém. Lothar. Combin., 34 (1995), B34h, 15 pp. | MR | Zbl

[58] F. Rosenblatt, The perceptron — a perceiving and recognizing automaton, Tech. report No. 85-460-1, Cornell Aeronautical Laboratory, Inc., New York, 1957, ii+29 pp.

[59] B. I. P. Rubinstein, J. H. Rubinstein, “A geometric approach to sample compression”, J. Mach. Learn. Res., 13 (2012), 1221–1261 | MR | Zbl

[60] J. H. Rubinstein, B. I. P. Rubinstein, P. L. Bartlett, “Bounding embeddings of VC classes into maximum classes”, Measures of complexity, Springer, Cham, 2015, 303–325 ; arXiv: 1401.7388 | DOI | MR | Zbl

[61] N. Sauer, “On the density of families of sets”, J. Combinatorial Theory Ser. A, 13:1 (1972), 145–147 | DOI | MR | Zbl

[62] B. Schölkopf, A. Smola, K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem”, Neural Comput., 10:5 (1998), 1299–1319 | DOI

[63] A. A. Sherstov, “Halfspace matrices”, Comput. Complexity, 17:2 (2008), 149–178 | DOI | MR | Zbl

[64] A. A. Sherstov, “Communication complexity under product and nonproduct distributions”, Comput. Complexity, 19:1 (2010), 135–150 | DOI | MR | Zbl

[65] P. W. Shor, “Stretchabilltv of pseudolines is NP-hard”, Applied geometry and discrete mathematics, DIMACS Ser. Discrete Math. Theoret. Comput. Sci., 4, Amer. Math. Soc., Providence, RI, 1991, 531–554 | MR | Zbl

[66] V. N. Vapnik, A. Ya. Chervonenkis, “On the uniform convergence of relative frequencies of events to their probabilities”, Theory Probab. Appl., 16:2 (1971), 264–280 | DOI | MR | Zbl

[67] V. N. Vapnik, Statistical learning theory, Adapt. Learn. Syst. Signal Process. Commun. Control, John Wiley Sons, Inc., New York, 1998, xxvi+736 pp. | MR | Zbl

[68] H. E. Warren, “Lower bounds for approximation by nonlinear manifolds”, Trans. Amer. Math. Soc., 133 (1968), 167–178 | DOI | MR | Zbl

[69] E. Welzl, “Partition trees for triangle counting and other range searching problems”, SCG' 88 Proceedings of the 4th annual symposium on computational geometry (Urbana, IL, 1988), ACM, New York, 1988, 23–33 | DOI | MR