Testing the manifold hypothesis
Journal of the American Mathematical Society, Tome 29 (2016) no. 4, pp. 983-1049

Voir la notice de l'article provenant de la source American Mathematical Society

The hypothesis that high dimensional data tend to lie in the vicinity of a low dimensional manifold is the basis of manifold learning. The goal of this paper is to develop an algorithm (with accompanying complexity guarantees) for testing the existence of a manifold that fits a probability distribution supported in a separable Hilbert space, only using i.i.d. samples from that distribution. More precisely, our setting is the following. Suppose that data are drawn independently at random from a probability distribution $\mathcal {P}$ supported on the unit ball of a separable Hilbert space $\mathcal {H}$. Let $\mathcal {G}(d, V, \tau )$ be the set of submanifolds of the unit ball of $\mathcal {H}$ whose volume is at most $V$ and reach (which is the supremum of all $r$ such that any point at a distance less than $r$ has a unique nearest point on the manifold) is at least $\tau$. Let $\mathcal {L}(\mathcal {M}, \mathcal {P})$ denote the mean-squared distance of a random point from the probability distribution $\mathcal {P}$ to $\mathcal {M}$. We obtain an algorithm that tests the manifold hypothesis in the following sense. The algorithm takes i.i.d. random samples from $\mathcal {P}$ as input and determines which of the following two is true (at least one must be): There exists $\mathcal {M} \in \mathcal {G}(d, CV, \frac {\tau }{C})$ such that $\mathcal {L}(\mathcal {M}, \mathcal {P}) \leq C {\epsilon }.$ There exists no $\mathcal {M} \in \mathcal {G}(d, V/C, C\tau )$ such that $\mathcal {L}(\mathcal {M}, \mathcal {P}) \leq \frac {\epsilon }{C}.$ The answer is correct with probability at least $1-\delta$.
DOI : 10.1090/jams/852

Fefferman, Charles 1 ; Mitter, Sanjoy 2 ; Narayanan, Hariharan 3

1 Department of Mathematics, Princeton University, Princeton, New Jersey 08544
2 Laboratory for Information and Decision Systems, MIT, Cambridge, Massachusetts 02139
3 Department of Statistics and Department of Mathematics, University of Washington, Seattle, Washington 98195
@article{10_1090_jams_852,
     author = {Fefferman, Charles and Mitter, Sanjoy and Narayanan, Hariharan},
     title = {Testing the manifold hypothesis},
     journal = {Journal of the American Mathematical Society},
     pages = {983--1049},
     publisher = {mathdoc},
     volume = {29},
     number = {4},
     year = {2016},
     doi = {10.1090/jams/852},
     url = {http://geodesic.mathdoc.fr/articles/10.1090/jams/852/}
}
TY  - JOUR
AU  - Fefferman, Charles
AU  - Mitter, Sanjoy
AU  - Narayanan, Hariharan
TI  - Testing the manifold hypothesis
JO  - Journal of the American Mathematical Society
PY  - 2016
SP  - 983
EP  - 1049
VL  - 29
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/articles/10.1090/jams/852/
DO  - 10.1090/jams/852
ID  - 10_1090_jams_852
ER  - 
%0 Journal Article
%A Fefferman, Charles
%A Mitter, Sanjoy
%A Narayanan, Hariharan
%T Testing the manifold hypothesis
%J Journal of the American Mathematical Society
%D 2016
%P 983-1049
%V 29
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/articles/10.1090/jams/852/
%R 10.1090/jams/852
%F 10_1090_jams_852
Fefferman, Charles; Mitter, Sanjoy; Narayanan, Hariharan. Testing the manifold hypothesis. Journal of the American Mathematical Society, Tome 29 (2016) no. 4, pp. 983-1049. doi: 10.1090/jams/852

[1] Amari, Shun-Ichi, Nagaoka, Hiroshi Methods of information geometry 2000

[2] Baraniuk, Richard G., Wakin, Michael B. Random projections of smooth manifolds Found. Comput. Math. 2009 51 77

[3] Belkin, Mikhail, Niyogi, Partha Laplacian eigenmaps for dimensionality reduction and data representation Neural Comput. 2003

[4] Boissonnat, Jean-Daniel, Guibas, Leonidas J., Oudot, Steve Y. Manifold reconstruction in arbitrary dimensions using witness complexes Discrete Comput. Geom. 2009 37 70

[5] Bousquet, Olivier, Boucheron, Stã©Phane, Lugosi, Gã¡Bor Introduction to statistical learning theory 2004 169 207

[6] Bradley, P. S., Mangasarian, O. L. 𝑘-plane clustering J. Global Optim. 2000 23 32

[7] Brand, M. Charting a manifold NIPS 2002

[8] Caã±As, Guillermo D., Poggio, Tomaso, Rosasco, Lorenzo Learning manifolds with k-means and k-flats 2013

[9] Carlsson, Gunnar Topology and data Bull. Amer. Math. Soc. (N.S.) 2009 255 308

[10] Cheng, Siu-Wing, Dey, Tamal K., Ramos, Edgar A. Manifold reconstruction from point samples 2005 1018 1027

[11] Genovese, Christopher R., Perone-Pacifico, Marco, Verdinelli, Isabella, Wasserman, Larry Manifold estimation and singular deconvolution under Hausdorff loss Ann. Statist. 2012 941 963

[12] Genovese, Christopher R., Perone-Pacifico, Marco, Verdinelli, Isabella, Wasserman, Larry Minimax manifold estimation J. Mach. Learn. Res. 2012

[13] Clarkson, Kenneth L. Tighter bounds for random projections of manifolds 2008 39 48

[14] Coifman, Ronald R., Lafon, Stephane, Lee, Ann, Maggioni, Mauro, Nadler, Boaz, Warner, Frederick, Zucker, Steven Geometric diffusions as a tool for harmonic analysis and structure definition of data. Part II: Multiscale methods Proc. Natl. Acad. Sci. USA 2005

[15] Cox, Trevor F., Cox, Michael A. A. Multidimensional scaling 1994

[16] Dasgupta, Sanjoy, Freund, Yoav Random projection trees and low dimensional manifolds 2008 537 546

[17] David, Guy, Semmes, Stephen Analysis of and on uniformly rectifiable sets 1993

[18] Devroye, Luc, Gyã¶Rfi, Lã¡Szlo, Lugosi, Gã¡Bor A probabilistic theory of pattern recognition 1996

[19] Donoho, David L., Grimes, Carrie Hessian eigenmaps: locally linear embedding techniques for high-dimensional data Proc. Natl. Acad. Sci. USA 2003 5591 5596

[20] Dudley, R. M. Uniform central limit theorems 1999

[21] Federer, Herbert Curvature measures Trans. Amer. Math. Soc. 1959 418 491

[22] Fefferman, Charles Interpolation and extrapolation of smooth functions by linear operators Rev. Mat. Iberoamericana 2005 313 348

[23] Fefferman, Charles The 𝐶^{𝑚} norm of a function with prescribed jets. II Rev. Mat. Iberoam. 2009 275 421

[24] Genovese, Christopher R., Perone-Pacifico, Marco, Verdinelli, Isabella, Wasserman, Larry Nonparametric ridge estimation Ann. Statist. 2014 1511 1545

[25] Hastie, Trevor, Stuetzle, Werner Principal curves J. Amer. Statist. Assoc. 1989 502 516

[26] Hotelling, H. Analysis of a complex of statistical variables into principal components J. Educational Psychology 1933

[27] Iwen, Mark A., Maggioni, Mauro Approximation of points on low-dimensional manifolds via random linear projections Inf. Inference 2013 1 31

[28] Johnson, William B., Lindenstrauss, Joram Extensions of Lipschitz mappings into a Hilbert space 1984 189 206

[29] Jones, Peter W., Maggioni, Mauro, Schul, Raanan Universal local parametrizations via heat kernels and eigenfunctions of the Laplacian Ann. Acad. Sci. Fenn. Math. 2010 131 174

[30] Kambhatla, A., Leen, Todd K. Fast non-linear dimension reduction 1993

[31] Kã©Gl, Balã¡Zs, Krzyzak, Adam, Linder, Tamã¡S, Zeger, Kenneth Learning and design of principal curves IEEE Trans. Pattern Analysis and Machine Intelligence 2000

[32] Ledoux, Michel The concentration of measure phenomenon 2001

[33] Lerman, Gilad, Zhang, Teng Robust recovery of multiple subspaces by geometric 𝑙_{𝑝} minimization Ann. Statist. 2011 2686 2715

[34] Manifold learning theory and applications 2012

[35] Martinetz, Thomas, Schulten, Klaus Topology representing networks Neural Netw. 1994

[36] Massart, Pascal Some applications of concentration inequalities to statistics Ann. Fac. Sci. Toulouse Math. (6) 2000 245 303

[37] Maurer, Andreas, Pontil, Massimiliano 𝐾-dimensional coding schemes in Hilbert spaces IEEE Trans. Inform. Theory 2010 5839 5846

[38] Narasimhan, Raghavan Lectures on topics in analysis 1965

[39] Narayanan, Hariharan, Mitter, Sanjoy On the sample complexity of testing the manifold hypothesis NIPS 2010

[40] Narayanan, Hariharan, Niyogi, Partha On the sample complexity of learning smooth cuts on a manifol

[41] Newton, Nigel J. Infinite-dimensional manifolds of finite-entropy probability measures 2013 713 720

[42] Niyogi, Partha, Smale, Stephen, Weinberger, Shmuel Finding the homology of submanifolds with high confidence from random samples Discrete Comput. Geom. 2008 419 441

[43] Ozertem, Umut, Erdogmus, Deniz Locally defined principal curves and surfaces J. Mach. Learn. Res. 2011 1249 1286

[44] Pearson, K. On lines and planes of closest fit to systems of points in space Philos. Mag. 1901

[45] Roweis, S. T., Saul, L., Hinton, G. Global coordination of local linear models Adv. Neural Inf. Process. Syst. 2002

[46] Roweis, Sam T., Saul, Lawrence K. Nonlinear dimensionality reduction by locally linear embedding Science 2000

[47] Rudelson, M., Vershynin, R. Combinatorics of random processes and sections of convex bodies Ann. of Math. (2) 2006 603 648

[48] Shawe-Taylor, J., Christianini, N. Kernel methods for pattern analysis 2004

[49] Smola, Alex J., Williamson, Robert C., Mika, Sebastian, Schã¶Lkopf, Bernhard Regularized principal manifolds 1999 214 229

[50] Sridharan, Karthik, Srebro, Nathan Note on refined Dudley integral covering number bound

[51] Tenenbaum, J. B., Silva, V., Langford, J. C. A Global Geometric Framework for Nonlinear Dimensionality Reduction Science 2000

[52] Vaidya, Pravin M. A new algorithm for minimizing convex functions over convex sets Math. Programming 1996 291 341

[53] Vapnik, Vladimir Estimation of dependences based on empirical data 1982

[54] Weinberger, Kilian Q., Saul, Lawrence K. Unsupervised learning of image manifolds by semidefinite programming Int. J. Comput. Vision 2006

[55] Zhang, Zhenyue, Zha, Hongyuan Principal manifolds and nonlinear dimensionality reduction via tangent space alignment SIAM J. Sci. Comput. 2004 313 338

Cité par Sources :