Alignment of a hidden palindrome
Matematičeskaâ biologiâ i bioinformatika, Tome 19 (2024) no. 2, pp. 427-438

Voir la notice de l'article provenant de la source Math-Net.Ru

The aim of the work is to generalize known algorithms to solve new problems arising in bioinformatics. We consider algorithms for optimizing the edit distance between sequences, the first of which is known and the second is a hidden palindrome of arbitrary length. It is important that the length of the desired palindrome is determined as a result of optimization. In the first task, it is necessary to select a palindrome from the ensemble of palindromes defined by the second input sequence. In this case, the original sequences may not contain the desired palindrome entirely. But the second sequence contains half of the desired palindrome. The first input sequence is used for optimization. In another task, such a palindrome may be partial, that is, only a prefix is complementary to a suffix. Such a partial palindrome forms a hairpin. The new algorithms run in quadratic time, which is faster than exhaustive search of admissible palindromes. The algorithms essentially exploit the linear dependence of the edit distance on the length of a continuous deletion or insertion. On the other hand, the algorithm for solving the first task allows us to calculate the similarity of a given sequence to any palindrome. However, in general, comparing two different sequences does not reduce to finding palindromes in each of them. Fast search for suboptimal solutions is also discussed. Software implementations of the considered algorithms are created. They are available at lab6.iitp.ru/~/pali. Some examples of nucleotide sequences with degenerate inverted repeats are given. In particular, we consider inverted repeats in noncoding regions of plastid DNA in flowering plants as well as microRNA genes. The possible application of our method to the search for conservative secondary structures of RNA is also discussed.
@article{MBB_2024_19_2_a5,
     author = {O. A. Zverkov and A. V. Seliverstov and G. A. Shilovsky},
     title = {Alignment of a hidden palindrome},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {427--438},
     publisher = {mathdoc},
     volume = {19},
     number = {2},
     year = {2024},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MBB_2024_19_2_a5/}
}
TY  - JOUR
AU  - O. A. Zverkov
AU  - A. V. Seliverstov
AU  - G. A. Shilovsky
TI  - Alignment of a hidden palindrome
JO  - Matematičeskaâ biologiâ i bioinformatika
PY  - 2024
SP  - 427
EP  - 438
VL  - 19
IS  - 2
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MBB_2024_19_2_a5/
LA  - ru
ID  - MBB_2024_19_2_a5
ER  - 
%0 Journal Article
%A O. A. Zverkov
%A A. V. Seliverstov
%A G. A. Shilovsky
%T Alignment of a hidden palindrome
%J Matematičeskaâ biologiâ i bioinformatika
%D 2024
%P 427-438
%V 19
%N 2
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MBB_2024_19_2_a5/
%G ru
%F MBB_2024_19_2_a5
O. A. Zverkov; A. V. Seliverstov; G. A. Shilovsky. Alignment of a hidden palindrome. Matematičeskaâ biologiâ i bioinformatika, Tome 19 (2024) no. 2, pp. 427-438. http://geodesic.mathdoc.fr/item/MBB_2024_19_2_a5/

[1] V. I. Levenshtein, “Dvoichnye kody s ispravleniem vypadenii, vstavok i zameschenii simvolov”, Doklady AN SSSR, 163:4 (1965), 845–848 | Zbl | Zbl

[2] Leontev V. K., “Vosstanovlenie tsiklicheskikh slov po fragmentam”, Problemy peredachi informatsii, 48:2 (2012), 121–126 | Zbl | Zbl

[3] R. K. Tetuev, N. N. Nazipova, “Statisticheskaya model predskazaniya saitov svyazyvaniya TALEN s DNK na osnove skolzyaschego srednego”, Matem. biologiya i bioinform, 18:2 (2023), 621–645 | DOI | DOI

[4] L. Fu, Y. Cao, J. Wu, Q. Peng, Q. Nie, X. Xie, “UFold: fast and accurate RNA secondary structure prediction with deep learning”, Nucleic Acids Research, 50:3 (2022), e14 | DOI | DOI

[5] C. C. Chen, Y. M. Chan, “REDfold: accurate RNA secondary structure prediction using residual encoder-decoder network”, BMC Bioinformatics, 24 (2023), 1–13 | DOI | Zbl | DOI | Zbl

[6] L. N. Grinkevich, “Rol mikroRNK v obuchenii i dolgovremennoi pamyati”, Vavilovskii zhurnal genetiki i selektsii, 24:8 (2020), 885–896 | DOI | DOI

[7] K. V. Mikhailov, B. D. Efeykin, A. Y. Panchin, D. A. Knorre, M. D. Logacheva, A. A. Penin, M. S. Muntyan, M. A. Nikitin, O. V. Popova, O. N. Zanegina, M. Y. Vyssokikh, S. E. Spiridonov, V. V. Aleoshin, Y. V. Panchin, “Coding palindromes in mitochondrial genes of Nematomorpha”, Nucleic Acids Research, 47:13 (2019), 6858–6870 | DOI | DOI

[8] Nikolaeva O. V., Beregova A. M., Efeykin B. D., Miroliubova T. S., Zhuravlev A. Yu., A. Yu. Ivantsov, K. V. Mikhailov, S. E. Spiridonov, V. V. Aleoshin, “Expression of hairpin-enriched mitochondrial DNA in two hairworm species (Nematomorpha)”, International Journal of Molecular Sciences, 24:14 (2023) | DOI | DOI

[9] L. A. Miroshnichenko, N. A. Arefeva, Yu. P. Dzhioev, V. D. Gusev, A. Yu. Borisenko, S. V. Erdyneev, Yu. S. Bukin, “Struktura povtorov v genomakh salmonell”, Matem. biologiya i bioinform, 18:2 (2023), 602–620 | DOI | DOI

[10] V. A. Lyubetsky, O. A. Zverkov, L. I. Rubanov, Seliverstov A. V., “Modeling RNA polymerase competition: the effect of-subunit knockout and heat shock on gene transcription level”, Biology Direct, 6:3 (2011), 1–16 | DOI | DOI

[11] O. A. Zverkov, L. Yu. Rusin, A. V. Seliverstov, V. A. Lyubetskii, “Izuchenie vstavok pryamykh povtorov v mikroevolyutsii mitokhondrii i plastid rastenii na osnove klasterizatsii belkov”, Vestnik Moskovskogo universiteta. Seriya 16. Biologiya, 2013, no. 1, 8–13

[12] Alzamel M., Hampson C., Iliopoulos C. S., Lim Z., Pissis S., Vlachakis D., Watts S., “Maximal degenerate palindromes with gaps and mismatches”, Theoretical Computer Science, 978 (2023), 1–16 | DOI | MR | DOI | MR

[13] Needleman S. B., Wunsch Ch. D., “A general method applicable to the search for similarities in the amino acid sequence of two proteins”, Journal of Molecular Biology, 48:3 (1970), 443–453 | DOI | DOI

[14] V. O. Yankovskii, “Gruppy izometrii formalnykh yazykov otnositelno obobschennykh metrik Levenshteina”, Matematicheskie zametki, 116:2 (2024), 306–315 | DOI | Zbl | DOI | Zbl

[15] M.S. Uotermen (red.), Matematicheskie metody dlya analiza posledovatelnostei DNK, Mir, M., 1999, 349 pp.

[16] Backurs A., Indyk P., “Editdistance cannot be computedinstrongly subquadratic time (unless SETH is false)”, SIAM Journal on Computing, 47:3 (2018), 1087–1097 | DOI | MR | Zbl | DOI | MR | Zbl

[17] W. J. Masek, M. S. Paterson, “A faster algorithm computing string edit distances”, Journal of Computer and System Sciences, 20:1 (1980), 18–31 | DOI | MR | Zbl | DOI | MR | Zbl

[18] V. L. Arlazarov, E. A. Dinits, M. A. Kronrod, I. A. Faradzhev, “Ob ekonomnom postroenii tranzitivnogo zamykaniya orientirovannogo grafa”, Doklady AN SSSR, 194:3 (1970), 487–488 | MR | Zbl | MR | Zbl

[19] M. Crochemore, G. M. Landau, M. Ziv-Ukelson, “A subquadratic sequence alignment algorithm for unrestricted score matrices”, SIAM Journal on Computing, 32:6 (2003), 1654–1673 | DOI | MR | Zbl | DOI | MR | Zbl

[20] A. Tiskin, “Semi-local longest common subsequences in subquadratic time”, Journal of Discrete Algorithms, 6:4 (2008), 570–581 | DOI | MR | Zbl | DOI | MR | Zbl

[21] Akmal S., Jin C., “Near-optimal quantum algorithms for string problems”, Algorithmica, 85 (2023), 2260–2317 | DOI | MR | Zbl | DOI | MR | Zbl

[22] R. K. Tetuev, M. I. Pyatkov, A. N. Pankratov, “Parallelnyi algoritm globalnogo vyravnivaniya protyazhennykh aminokislotnykh i nukleotidnykh posledovatelnostei”, Matem.biologiya i bioinform, 12:1 (2017), 137–150 | DOI | DOI

[23] N. Mishin, D. Berezun, A. Tiskin, “Efficient parallel algorithms for string comparison”, ICPP '21: Proceedings of the 50th International Conference on Parallel Processing, 2021, 50, 1–10 | DOI | DOI

[24] A. Tiskin, “Periodic string comparison”, Combinatorial Pattern Matching. CPM 2009, Lecture Notes in Computer Science, 5577, eds. Kucherov G., Ukkonen E., Springer, Berlin–Heidelberg, 2009 | DOI | MR | Zbl | DOI | MR | Zbl

[25] B. A. Zolotov, N. S. Gaevoi, A. V. Tiskin, “Algoritmy sravneniya periodicheskikh strok”, IV Konferentsiya matematicheskikh tsentrov Rossii: sbornik tezisov (Sankt-Peterburg, 2024), 143 | MR | MR

[26] Sokol D., Benson G., Tojeira J., “Tandemrepeats over the edit distance”, Bioinformatics, 23:2 (2007), e30-e35 | DOI | DOI

[27] D. Sokol, J. Tojeira, “Speeding up the detection of tandem repeats over the edit distance”, Theoretical Computer Science, 525 (2014), 103–110 | DOI | MR | Zbl | DOI | MR | Zbl

[28] A. Tiskin, “Fast distance multiplication of unit-Monge matrices”, Algorithmica, 71 (2015), 859–888 | DOI | MR | Zbl | DOI | MR | Zbl

[29] A. Conte, R. Grossi, G. Punzi, T. Uno, “Enumeration of maximal common subsequences between two strings”, Algorithmica, 84 (2022), 757–783 | DOI | MR | Zbl | DOI | MR | Zbl

[30] T. Kociumaka, J. Radoszewski, T. Starikovskaya, “Longest common substring with approximately k mismatches”, Algorithmica, 81:6 (2019), 2633–2652 | DOI | MR | Zbl | DOI | MR | Zbl

[31] A. Amir, P. Charalampopoulos, S. P. Pissis, J. Radoszewski, “Dynamic and internal longest common substring”, Algorithmica, 82 (2020), 3707–3743 | DOI | MR | Zbl | DOI | MR | Zbl

[32] J. Ai, L. H. Sun, H. Che, R. Zhang, T. Z. Zhang, W. C. Wu, X. L. Su, X. Chen, G. Yang, K. Li, Wang N., Ban T., Bao Y. N., Guo F., Niu H. F., Zhu Y. L., Zhu X. Y., Zhao S. G., B. F. Yang, “MicroRNA-195 protects against dementia induced by chronic brain hypoperfusion via its anti-amyloidogenic effect in rats”, Journal of Neuroscience, 33:9 (2013), 3989–4001 | DOI | DOI

[33] R. B. Chanin, P. T. West, J. Wirbel, M. O. Gill, G. Z.M. Green, R. M. Park, N. Enright, A. M. Miklos, A. S. Hickey, E. F. Brooks, K. K. Lum, I. M. Cristea, A. S. Bhatt, “Intragenic DNAinversions expand bacterial coding capacity”, Nature, 634 (2024), 234–242 | DOI | DOI