A new practical linear space algorithm for the longest common subsequence problem
Kybernetika, Tome 38 (2002) no. 1, pp. 45-66
Cet article a éte moissonné depuis la source Czech Digital Mathematics Library

Voir la notice de l'article

This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths $m$ and $n$, $n\ge m$, on an alphabet of size $s$, we first present an algorithm which determines the length $p$ of an LCS in $O(ns+\min \lbrace mp,p(n-p)\rbrace )$ time and $O(ns)$ space. This result has been achieved before [ric94,ric95], but our algorithm is significantly faster than previous methods. We also provide a second algorithm which generates an LCS in $O(ns+\min \lbrace mp,m\log m+p(n-p)\rbrace )$ time while preserving the linear space bound, thus solving the problem posed in [ric94,ric95]. Experimental results confirm the efficiency of our method.
This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths $m$ and $n$, $n\ge m$, on an alphabet of size $s$, we first present an algorithm which determines the length $p$ of an LCS in $O(ns+\min \lbrace mp,p(n-p)\rbrace )$ time and $O(ns)$ space. This result has been achieved before [ric94,ric95], but our algorithm is significantly faster than previous methods. We also provide a second algorithm which generates an LCS in $O(ns+\min \lbrace mp,m\log m+p(n-p)\rbrace )$ time while preserving the linear space bound, thus solving the problem posed in [ric94,ric95]. Experimental results confirm the efficiency of our method.
Classification : 68Q25, 68W05, 68W32
Keywords: longest common subsequence problem; Rick’s algorithm
@article{KYB_2002_38_1_a2,
     author = {Goeman, Heiko and Clausen, Michael},
     title = {A new practical linear space algorithm for the longest common subsequence problem},
     journal = {Kybernetika},
     pages = {45--66},
     year = {2002},
     volume = {38},
     number = {1},
     mrnumber = {1899846},
     zbl = {1265.68363},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_2002_38_1_a2/}
}
TY  - JOUR
AU  - Goeman, Heiko
AU  - Clausen, Michael
TI  - A new practical linear space algorithm for the longest common subsequence problem
JO  - Kybernetika
PY  - 2002
SP  - 45
EP  - 66
VL  - 38
IS  - 1
UR  - http://geodesic.mathdoc.fr/item/KYB_2002_38_1_a2/
LA  - en
ID  - KYB_2002_38_1_a2
ER  - 
%0 Journal Article
%A Goeman, Heiko
%A Clausen, Michael
%T A new practical linear space algorithm for the longest common subsequence problem
%J Kybernetika
%D 2002
%P 45-66
%V 38
%N 1
%U http://geodesic.mathdoc.fr/item/KYB_2002_38_1_a2/
%G en
%F KYB_2002_38_1_a2
Goeman, Heiko; Clausen, Michael. A new practical linear space algorithm for the longest common subsequence problem. Kybernetika, Tome 38 (2002) no. 1, pp. 45-66. http://geodesic.mathdoc.fr/item/KYB_2002_38_1_a2/

[1] Aho A. V., Hirschberg D. S., Ullman J. D.: Bounds on the complexity of the longest common subsequence problem. J. Assoc. Comput. Mach. 23 (1976), 1, 1–12 | DOI | MR | Zbl

[2] Apostolico A.: Improving the worst–case performance of the Hunt–Szymanski strategy for the longest common subsequence of two strings. Inform. Process. Lett. 23 (1986), 63–69 | DOI | MR | Zbl

[3] Apostolico A.: Remarks on the Hsu–Du new algorithm for the longest common subsequence problem. Inform. Process. Lett. 25 (1987), 235–236 | DOI | MR

[4] Apostolico A., Guerra G.: The longest common subsequence problem revisited. Algorithmica 2 (1987), 315–336 | DOI | MR | Zbl

[5] Apostolico A., Browne, S., Guerra C.: Fast linear–space computations of longest common subsequences. Theoret. Comput. Sci. 92 (1992), 3–17 | DOI | MR | Zbl

[6] Chin F. Y. L., Poon C. K.: A fast algorithm for computing longest common subsequences of small alphabet size. J. Inform. Process. 13 (1990), 4, 463–469 | Zbl

[7] Chvátal V., Sankoff D.: Longest common subsequences of two random strings. J. Appl. Probab. 12 (1975), 306–315 | DOI | MR

[8] Dančík V., Paterson M.: Upper bounds for the expected length of a longest common subsequence of two binary sequences. In: Proceedings 11th Annual Symp. on Theoretical Aspects of Computer Science (Lecture Notes in Computer Science 775), Springer-Verlag, Berlin 1994, pp. 669–678 | MR | Zbl

[9] Dayhoff M. O.: Computer aids to protein sequence determination. J. Theoret. Biol. 8 (1965), 97–112 | DOI

[10] Dayhoff M. O.: Computer analysis of protein evolution. Sci. Amer. 221 (1969), 1, 86–95 | DOI

[11] Deken J. G.: Some limit results for longest common subsequences. Discrete Math. 26 (1979), 17–31 | DOI | MR | Zbl

[12] Dilworth R. P.: A decomposition theorem for partially ordered sets. Ann. of Math. 51 (1950), 161–166 | DOI | MR | Zbl

[13] Fu K. S., Bhargava B. K.: Tree systems for syntactic pattern recognition. IEEE Trans. Comput. C–22 (1973), 12, 1087–1099 | MR | Zbl

[14] Gallant J., Maier, D., Storer J. A.: On finding minimal length superstrings. J. Comput. System Sci. 20 (1980), 50–58 | DOI | MR | Zbl

[15] Goeman H.: Time and Space Efficient Algorithms for Decomposing Certain Partially Ordered Sets. PhD Thesis, Department of Computer Science, University of Bonn 1999. To appear in Bayreuther Mathematische Schriften | MR

[16] Hirschberg D. S.: A linear space algorithm for computing maximal common subsequences. Comm. ACM 18 (1975), 6, 341–343 | DOI | MR | Zbl

[17] Hirschberg D. S.: Algorithms for the longest common subsequence problem. J. Assoc. Comput. Mach. 24 (1977), 4, 664–675 | DOI | MR | Zbl

[18] Hsu W. J., Du M. W.: New algorithms for the LCS problem. J. Comput. System Sci. 29 (1984), 133–152 | DOI | MR | Zbl

[19] Hunt J. W., Szymanski T. G.: A fast algorithm for computing longest common subsequences. Comm. ACM 20 (1977), 5, 350–353 | DOI | MR | Zbl

[20] Kumar S. K., Rangan C. P.: A linear space algorithm for the LCS problem. Acta Inform. 24 (1987), 353–363 | DOI | MR | Zbl

[21] Lowrance R., Wagner R. A.: An extension of the string–to–string correction problem. J. Assoc. Comput. Mach. 22, (1975), 2, 177–183 | DOI | MR | Zbl

[22] Lu S. Y., Fu K. S.: A sentence–to–sentence clustering procedure for pattern analysis. IEEE Trans. Systems Man Cybernet. SMC–8, (1978), 5, 381–389 | DOI | MR | Zbl

[23] Maier D.: The complexity of some problem on subsequences and supersequences. J. Assoc. Comput. Mach. 25 (1978), 2, 322–336 | DOI | MR

[24] Masek W. J., Paterson M. S.: A faster algorithm for computing string edit distances. J. Comput. System Sci. 20 (1980), 1, 18–31 | DOI | MR

[25] Myers E. W.: An $O(ND)$ difference algorithm and its variations. Algorithmica 1 (1986), 251–266 | DOI | MR | Zbl

[26] Nakatsu N., Kambayashi, Y., Yajima S.: A longest common subsequence algorithm suitable for similar text strings. Acta Inform. 18 (1982), 171–179 | DOI | MR | Zbl

[27] Needleman S. B., Wunsch C. S.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Molecular Biol. 48 (1970), 443–453 | DOI

[28] Paterson M., Dančík V.: Longest common subsequences. In: Proceedings, 19th Intern. Symp. on Mathematical Foundations of Computer Science (Lecture Notes in Computer Science 841), Springer Verlag, Berlin 1994, pp. 127–142 | MR

[29] Rick C.: New Algorithms for the Longest Common Subsequence Problem. Research Report No. 85123–CS, Department of Computer Science, University of Bonn 1994

[30] Rick C.: A new flexible algorithm for the longest common subsequence problem. In: Proceedings, 6th Annual Symp. on Combinatorial Pattern Matching (Lecture Notes in Computer Science 937), Springer Verlag, Berlin 1995, pp. 340–351 | MR | Zbl

[31] Sankoff D., Cedergren R. J.: A test for nucleotide sequence homology. J. Molecular Biol. 77 (1973), 159–164 | DOI

[32] Sankoff D., Kruskal J. B.: Time Warps, String Edits, and Macromolecules: The Theory And Practice of Sequence Comparison. Addison–Wesley, Reading, MA 1983 | MR

[33] Ukkonen E.: Algorithms for approximate string matching. Inform. and Control 64 (1985), 100–118 | DOI | MR | Zbl

[34] Wagner R. A.: On the complexity of the extended string–to–string correction problem. In: Proceedings, 7th Ann. ACM Sympos. on Theory of Comput. 1975, pp. 218–223 | MR | Zbl

[35] Wagner R. A., Fischer M. J.: The string–to–string correction problem. J. Assoc. Comput. Mach. 21 (1974), 1, 168–173 | DOI | MR | Zbl

[36] Wong C. K., Chandra A. K.: Bounds for the string editing problem. J. Assoc. Comput. Mach. 28 (1976), 1, 13–18 | DOI | MR | Zbl

[37] Wu S., Manber U., Myers, G., Miller W.: An $O(NP)$ sequence comparison algorithm. Inform. Process. Lett. 35 (1990), 317–323 | DOI | MR | Zbl