Hidden words statistics for large patterns
The electronic journal of combinatorics, Tome 28 (2021) no. 2
Cet article a éte moissonné depuis la source The Electronic Journal of Combinatorics website

Voir la notice de l'article

We study here the so called subsequence pattern matching also known as hidden pattern matching in which one searches for a given pattern $w$ of length $m$ as a subsequence in a random text of length $n$. The quantity of interest is the number of occurrences of $w$ as a subsequence (i.e., occurring in not necessarily consecutive text locations). This problem finds many applications from intrusion detection, to trace reconstruction, to deletion channel, and to DNA-based storage systems. In all of these applications, the pattern $w$ is of variable length. To the best of our knowledge this problem was only tackled for a fixed length $m=O(1)$. In our main result we prove that for $m=o(n^{1/3})$ the number of subsequence occurrences is normally distributed. In addition, we show that under some constraints on the structure of $w$ the asymptotic normality can be extended to $m=o(\sqrt{n})$. For a special pattern $w$ consisting of the same symbol, we indicate that for $m=o(n)$ the distribution of number of subsequences is either asymptotically normal or asymptotically log normal. After studying some special patterns (e.g., alternating) we conjecture that this dichotomy is true for all patterns. We use Hoeffding's projection method for $U$-statistics to prove our findings.
DOI : 10.37236/9452
Classification : 68R15, 60C05
Mots-clés : subsequence pattern matching

Svante Janson    ; Wojciech Szpankowski  1

1 Department of Computer Science Purdue University
@article{10_37236_9452,
     author = {Svante Janson and Wojciech Szpankowski},
     title = {Hidden words statistics for large patterns},
     journal = {The electronic journal of combinatorics},
     year = {2021},
     volume = {28},
     number = {2},
     doi = {10.37236/9452},
     zbl = {1474.68234},
     url = {http://geodesic.mathdoc.fr/articles/10.37236/9452/}
}
TY  - JOUR
AU  - Svante Janson
AU  - Wojciech Szpankowski
TI  - Hidden words statistics for large patterns
JO  - The electronic journal of combinatorics
PY  - 2021
VL  - 28
IS  - 2
UR  - http://geodesic.mathdoc.fr/articles/10.37236/9452/
DO  - 10.37236/9452
ID  - 10_37236_9452
ER  - 
%0 Journal Article
%A Svante Janson
%A Wojciech Szpankowski
%T Hidden words statistics for large patterns
%J The electronic journal of combinatorics
%D 2021
%V 28
%N 2
%U http://geodesic.mathdoc.fr/articles/10.37236/9452/
%R 10.37236/9452
%F 10_37236_9452
Svante Janson; Wojciech Szpankowski. Hidden words statistics for large patterns. The electronic journal of combinatorics, Tome 28 (2021) no. 2. doi: 10.37236/9452

Cité par Sources :