Disambiguation of regular expressions with backreferences via term rewriting
Modelirovanie i analiz informacionnyh sistem, Tome 31 (2024) no. 4, pp. 426-445.

Voir la notice de l'article provenant de la source Math-Net.Ru

In this paper we focus on regular expressions with acyclic backreferences and treat them as a semiring satisfying certain theorems of Kleene algebra. Using these theorems as term rewriting rules, we introduce an algorithm for memory disambiguation of regular expressions. Furthermore, we demonstrate that the class of regexes with acyclic backreferences is closed under language reversal, in contrast to the generic backref-regexes, and provide the reversal algorithm, based on the disambiguation procedure. The results of our experiments revealed that, in certain cases, the matching time was significantly reduced when using the reversed expressions compared to the initial ones.
Keywords: extended regular expression, backreferences, Kleene algebra, capture group, reversal
Mots-clés : disambiguation.
@article{MAIS_2024_31_4_a2,
     author = {D. N. Ismagilova and A. N. Nepeivoda},
     title = {Disambiguation of regular expressions with backreferences via term rewriting},
     journal = {Modelirovanie i analiz informacionnyh sistem},
     pages = {426--445},
     publisher = {mathdoc},
     volume = {31},
     number = {4},
     year = {2024},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/MAIS_2024_31_4_a2/}
}
TY  - JOUR
AU  - D. N. Ismagilova
AU  - A. N. Nepeivoda
TI  - Disambiguation of regular expressions with backreferences via term rewriting
JO  - Modelirovanie i analiz informacionnyh sistem
PY  - 2024
SP  - 426
EP  - 445
VL  - 31
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MAIS_2024_31_4_a2/
LA  - en
ID  - MAIS_2024_31_4_a2
ER  - 
%0 Journal Article
%A D. N. Ismagilova
%A A. N. Nepeivoda
%T Disambiguation of regular expressions with backreferences via term rewriting
%J Modelirovanie i analiz informacionnyh sistem
%D 2024
%P 426-445
%V 31
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MAIS_2024_31_4_a2/
%G en
%F MAIS_2024_31_4_a2
D. N. Ismagilova; A. N. Nepeivoda. Disambiguation of regular expressions with backreferences via term rewriting. Modelirovanie i analiz informacionnyh sistem, Tome 31 (2024) no. 4, pp. 426-445. http://geodesic.mathdoc.fr/item/MAIS_2024_31_4_a2/

[1] Beesley, Kenneth R., “Kleene, a Free and Open-Source Language for Finite-State Programming”, Finite-State Methods and Natural Language Processing, 2012, 50–54

[2] Gelade, Wouter and Neven, Frank, “Succinctness of the Complement and Intersection of Regular Expressions”, ACM Transactions on Computational Logic, 13:1 (2012) | DOI | MR

[3] V. M. Glushkov, “The abstract theory of automata”, Russian Mathematical Surveys, 16:5 (1961), 3–62 (In Russian) | DOI | MR

[4] Angluin, Dana, “Finding patterns common to a set of strings”, Journal of Computer and System Sciences, 21:1 (1980), 46–62 | DOI | MR | Zbl

[5] Schmid, Markus L., “Characterising REGEX languages by regular languages equipped with factor-referencing”, Information and Computation, 249 (2016), 1–17 | DOI | MR | Zbl

[6] Goodman, Joshua, “Semiring Parsing”, Computational Linguistics, 25 (1999), 573–605 | MR

[7] Kozen, Dexter, “A Completeness Theorem for Kleene Algebras and the Algebra of Regular Events”, Information and Computation, 110:2 (1994), 366–390 | DOI | MR | Zbl

[8] Bruggemann-Klein, Anne and Wood, Derick, “One-Unambiguous Regular Languages”, Information and Computation, 140:2 (1998), 229–253 | DOI | MR | Zbl

[9] Berglund, Martin and van der Merwe, Brink, “Re-examining regular expressions with backreferences”, Theoretical Computer Science, 940 (2023), 66–80 | DOI | MR | Zbl

[10] Freydenberger, Dominik D. and Schmid, Markus L., “Deterministic regular expressions with back-references”, Journal of Computer and System Sciences, 105 (2019), 1–39 | DOI | MR | Zbl

[11] Li, Yeting and Chen, Zixuan and Cao, Jialun and Xu, Zhiwu and Peng, Qiancheng and Chen, Haiming and Chen, Liyuan and Cheung, Shing-Chi, “ReDoSHunter: A Combined Static and Dynamic Approach for Regular Expression DoS Detection”, 30th USENIX Security Symposium, 2021, 3847–3864

[12] Campeanu, Cezar and Salomaa, Kai and Yu, Sheng, “A Formal Study Of Practical Regular Expressions”, International Journal of Foundations of Computer Science, 14 (2003), 1007–1018 | DOI | MR | Zbl

[13] Chida, Nariyoshi and Terauchi, Tachio, “Repairing DoS Vulnerability of Real-World Regexes”, Proceedings of the IEEE Symposium on Security and Privacy (SP), 2022, 2060–2077 | DOI

[14] Reidenbach, Daniel and Schmid, Markus L., “Patterns with bounded treewidth”, Information and Computation, 239 (2014), 87–99 | DOI | MR | Zbl

[15] Schmid, Markus L., “Inside the Class of REGEX Languages”, Proceedings of the 16th International Conference on Developments in Language Theory, 2012, 73–84 | DOI | MR | Zbl

[16] Br{ü}ggemann-Klein, Anne, “Regular Expressions into Finite Automata”, Theoretical Computer Science, 120:2 (1993), 197–213 | DOI | MR

[17] Kahrs, Stefan and Runciman, Colin, “Simplifying regular expressions further”, Journal of Symbolic Computation, 109 (2022), 124–143 | DOI | MR | Zbl

[18] McClurg, Jedidiah and Claver, Miles and Garner, Jackson and Vossen, Jake and Schmerge, Jordan and Belviranli, Mehmet E., “Optimizing Regular Expressions via Rewrite-Guided Synthesis”, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2023, 426–438 | DOI

[19] Li, Yeting and Sun, Yecheng and Xu, Zhiwu and Cao, Jialun and Li, Yuekang and Li, Rongchen and Chen, Haiming and Cheung, Shing-Chi and Liu, Yang and Xiao, Yang, “RegexScalpel: Regular Expression Denial of Service (ReDoS) Defense by Localize-and-Fix”, 31st USENIX Security Symposium (USENIX Security 22), 2022, 4183–4200

[20] Chida, Nariyoshi and Terauchi, Tachio, “On Lookaheads in Regular Expressions with Backreferences”, IEICE Transactions on Information and Systems, E106–D:5 (2023), 959–975 | DOI | MR

[21] Uezato, Yuya, “Regular Expressions with Backreferences and Lookaheads Capture NLOG”, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024), 2024, 155:1–155:20 | DOI | MR