The Chvátal–Sankoff problem as a problem of symbolic dynamics
Zapiski Nauchnykh Seminarov POMI, Representation theory, dynamical systems, combinatorial methods. Part XXXV, Tome 528 (2023), pp. 214-237 Cet article a éte moissonné depuis la source Math-Net.Ru

Voir la notice du chapitre de livre

Given two equally long, uniformly random binary strings, the expected length of their longest common subsequence (LCS) is asymptotically proportional to the strings' length. Finding the proportionality coefficient $\gamma$, i.e., the limit of the normalised LCS length for two random binary strings of length $n \to \infty$, is a natural problem, first posed by Chvátal and Sankoff in 1975, and as yet unresolved. This problem has relevance to diverse fields ranging from combinatorics and algorithm analysis to coding theory and computational biology. In a previous paper [47], we used methods of statistical mechanics, as well as some existing results on the combinatorial structure of LCS, to link constant $\gamma$ to the parameters of a certain stochastic particle process. Here, we complement this analysis by presenting a formulation of the problem in the language of symbolic dynamics and cellular automata, and reporting some preliminary results of a computational experiment aimed at improving the existing numerical estimates for $\gamma$. We also point out an error in the previous paper [47], which invalidates some of its claims on the properties of $\gamma$.
@article{ZNSL_2023_528_a12,
     author = {A. Tiskin},
     title = {The {Chv\'atal{\textendash}Sankoff} problem as a problem of symbolic dynamics},
     journal = {Zapiski Nauchnykh Seminarov POMI},
     pages = {214--237},
     year = {2023},
     volume = {528},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/ZNSL_2023_528_a12/}
}
TY  - JOUR
AU  - A. Tiskin
TI  - The Chvátal–Sankoff problem as a problem of symbolic dynamics
JO  - Zapiski Nauchnykh Seminarov POMI
PY  - 2023
SP  - 214
EP  - 237
VL  - 528
UR  - http://geodesic.mathdoc.fr/item/ZNSL_2023_528_a12/
LA  - en
ID  - ZNSL_2023_528_a12
ER  - 
%0 Journal Article
%A A. Tiskin
%T The Chvátal–Sankoff problem as a problem of symbolic dynamics
%J Zapiski Nauchnykh Seminarov POMI
%D 2023
%P 214-237
%V 528
%U http://geodesic.mathdoc.fr/item/ZNSL_2023_528_a12/
%G en
%F ZNSL_2023_528_a12
A. Tiskin. The Chvátal–Sankoff problem as a problem of symbolic dynamics. Zapiski Nauchnykh Seminarov POMI, Representation theory, dynamical systems, combinatorial methods. Part XXXV, Tome 528 (2023), pp. 214-237. http://geodesic.mathdoc.fr/item/ZNSL_2023_528_a12/

[1] A. Abboud, A. Backurs, V. Vassilevska Williams, “Tight hardness results for LCS and other sequence similarity measures”, Proceedings of FOCS, 2015, 59–78

[2] K. S. Alexander, “The rate of convergence of the mean length of the longest common subsequence”, Annal. Appl. Probab., 4:4 (1994), 1074–1082

[3] C. E. R. Alves, E. N. Cáceres, S. W. Song, “An all-substrings common subsequence algorithm”, Discrete Appl. Math., 156:7 (2008), 1025–1035

[4] R. A. Baeza-Yates, R. Gavaldà, G. Navarro, R. Scheihing, “Bounding the expected length of longest common subsequences and forests”, Theory of Computing Systems, 32:4 (1999), 435–452

[5] K. E. Batcher, “Sorting networks and their applications”, Proceedings of AFIPS, 32 (1968), 307–314

[6] A. Borodin, I. Corwin, V. Gorin, “Stochastic six-vertex model”, Duke Math. J., 165:3 (2016), 563–624

[7] K. Bringmann, M. Künnemann, “Multivariate fine-grained complexity of longest common subsequence”, Proceedings of ACM-SIAM SODA, 2018, 1216–1235

[8] B. Bukh, C. Cox, “Periodic words, common subsequences and frogs”, Annals Appl. Probab., 32:2 (2022), 1295–1332

[9] B. Bukh, V. Guruswami, J. Hastad, “An improved bound on the fraction of correctable deletions”, IEEE Transactions on Information Theory, 63:1 (2017), 93–103

[10] R. Bundschuh, “High precision simulations of the longest common subsequence problem”, European Phys. J. B, 22:4 (2001), 533–541

[11] J. Casse, “Probabilistic cellular automata with general alphabets possessing a Markov chain as an invariant distribution”, Adv. Appl. Probab., 48:2 (2016), 369–391

[12] P. Charalampopoulos, P. Gawrychowski, S. Mozes, O. Weimann, “An Almost Optimal Edit Distance Oracle”, Proceedings of ICALP, Leibniz International Proceedings in Informatics, 198, 2021, 48:1–48:20

[13] P. Charalampopoulos, T. Kociumaka, Sh. Mozes, “Dynamic string alignment”, Proceedings of CPM, LIPIcs, 161, 2020, 9:1–9:13

[14] V. Chvátal, D. Sankoff, “Longest common subsequences of two random sequences”, J. Appl. Probab., 12:2 (1975), 306–315

[15] M. Crochemore, C. S. Iliopoulos, Y. J. Pinzon, J. F. Reid, “A fast and practical bit-vector algorithm for the Longest Common Subsequence problem”, Inform. Proc. Letters, 80 (2001), 279–285

[16] M. Crochemore, G. M. Landau, M. Ziv-Ukelson, “A subquadratic sequence alignment algorithm for unrestricted score matrices”, SIAM J. Comput., 32 (2003), 1654–1673

[17] V. Dančík, Expected Length of Longest Common Subsequences, PhD thesis, University of Warwick, 1994

[18] J. Boutet De Monvel, “Extensive simulations for longest common subsequences: Finite size scaling, a cavity solution, and configuration space properties”, Europ. Phys. J. B, 7:2 (1999), 293–308

[19] J. G. Deken, “Some limit results for longest common subsequences”, Discrete Math., 26:1 (1979), 17–31

[20] Pablo A. Ferrari, “TASEP hydrodynamics using microscopic characteristics”, Probab. Surveys, 15 (2018), 1–27

[21] P. Gawrychowski, “Faster algorithm for computing the edit distance between SLP-compressed strings”, Proceedings of SPIRE, Lecture Notes in Computer Science, 7608, 229–236

[22] D. Hermelin, G. M. Landau, S. Landau, O. Weimann, “Unified Compression-Based Acceleration of Edit-Distance Computation”, Algorithmica, 65:2 (2013), 339–353

[23] H. Hyyr{ö}, “Mining bit-parallel LCS-length algorithms”, Proceedings of SPIRE, Lecture Notes in Computer Science, 10508, 2017, 214–220

[24] D. E. Knuth, The Art of Computer Programming: Sorting and Searching, v. 3, Addison Wesley, 1998

[25] T. Kriecherbauer, J. Krug, “A pedestrian's view on interacting particle systems, KPZ universality and random matrices”, J. Phys. A: Math. Theor., 43:40 (2010), 403001

[26] P. Krusche, A. Tiskin, “String comparison by transposition networks”, London Algorithmics 2008 Theory and Practice, Texts in Algorithmics, 11, College Publications, 2009

[27] G. S. Lueker, “Improved bounds on the average length of longest common subsequences”, J. ACM, 56:3 (2009), 17:1–17:38

[28] D. Lind, B. Marcus, An Introduction to Symbolic Dynamics and Coding, second edition, Cambridge University Press, 2021

[29] B. F. Logan, L. A. Shepp, “A variational problem for random Young tableaux”, Adv. Math., 26:2 (1977), 206–222

[30] J. Mairesse, I. Marcovici, “Around probabilistic cellular automata”, Theor. Computer Sci., 559 (2014), 42–72

[31] S. N. Majumdar, S. Nechaev, “Exact asymptotic results for the Bernoulli matching model of sequence alignment”, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 72:2 (2005), 020901

[32] J. Martin, P. Schmidt, “Multi-type TASEP in discrete time”, Latin Amer. J. Probab. Math. Statist., 8 (2011), 303–333

[33] W. J. Masek, M. S. Paterson, “A faster algorithm computing string edit distances”, J. Comput. System Sci., 20:1 (1980), 18–31

[34] U. Matarazzo, D. Tsur, M. Ziv-Ukelson, “Efficient all path score computations on grid graphs”, Theor Comput. Sci., 525 (2014), 138–149

[35] M. Paterson, V. Dančík, “Longest common subsequences”, Proceedings of MFCS, Lecture Notes in Computer Science, 841, 1994, 127–142

[36] K Petersen, A Quas, S Shin, “Measures of maximal relative entropy”, Ergodic Theory and Dynamical Systems, 23:1 (2003)

[37] P. A. Pevzner, M. S. Waterman, “Open combinatorial problems in computational molecular biology”, Proceedings of ISTCS, 1995, 158–173

[38] V. B. Priezzhev, G. M. Sch{ü}tz, “Exact solution of the Bernoulli matching model of sequence alignment”, J. Statist. Mech.: Theory and Experiment, 09 (2008), P09007

[39] N. Rajewsky, L. Santen, A. Schadschneider, M. Schreckenberg, “The asymmetric exclusion process: comparison of update procedures”, J. Statist. Phys., 92 (1998), 151–194

[40] D. Romik, The Surprising Mathematics of Longest Increasing Subsequences, Cambridge University Press, Cambridge, 2014

[41] H. Rost, “Non-equilibrium behaviour of a many particle process: Density profile and local equilibria”, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 58:1 (1981), 41–53

[42] Y. Sakai, “A substring–substring LCS data structure”, Theor. Comput. Sci., 753:2 (2019), 16–34

[43] Y. Sakai, “A fast algorithm for multiplying min-sum permutations”, Discrete Appl. Math., 159 (2011), 2175–2183

[44] S. Salsa, Partial Differential Equations in Action, UNITEXT, 99, Springer International Publishing, 2016

[45] J. P. Schmidt, “All highest scoring paths in weighted grid graphs and their application to finding all approximate repeats in strings”, SIAM J. Comput., 27:4 (1998), 972–992

[46] J. M. Steele, “An Efron-Stein inequality for nonsymmetric statistics”, Annals Statist., 14:2 (1986), 753–758

[47] A. Tiskin, “The Chvátal–Sankoff problem: Understanding random string comparison through stochastic processes”, Zap. Nauchn. Semin. POMI, 517, 2022, 191–224

[48] A. Tiskin, “Semi-local longest common subsequences in subquadratic time”, J. Disc. Algorithms, 6:4 (2008), 570–581

[49] A. Tiskin, “Periodic String Comparison”, Proceedings of CPM, Lecture Notes in Computer Science, 5577, 2009, 193–206

[50] A. Tiskin, “Towards Approximate Matching in Compressed Strings: Local Subsequence Recognition”, Proceedings of CSR, Lecture Notes in Computer Science, 6651, 2011, 401–414

[51] A. Tiskin, “Fast Distance Multiplication of Unit-Monge Matrices”, Algorithmica, 71 (2015), 859–888

[52] A. Tiskin, “Bounded-Length Smith–Waterman Alignment”, Proceedings of WABI, Leibniz International Proceedings in Informatics, 143, 2019, 16:1–16:12

[53] J. M. Steele, Probability Theory and Combinatorial Optimization, CBMS-NSF regional conference series in applied mathematics, 69, SIAM, 1997

[54] A. Tiskin, “Semi-local string comparison: Algorithmic techniques and applications”, Math. Comput. Sci., 1:4 (2008), 571–603

[55] A. Tiskin, “Communication vs synchronisation in parallel string comparison”, Proceedings of SPAA, 2020, 479–489

[56] A. M. Vershik, S. V. Kerov, “Asymptotics of the Plancherel measure of the symmetric group and the limiting form of Young tableaux”, Dokl. Akad. Nauk, 233:6 (1977), 1024–1027

[57] R. A. Wagner, M. J. Fischer, “The string-to-string correction problem”, J. ACM, 21:1 (1974), 168–173