q-gram analysis and urn models
Discrete mathematics & theoretical computer science, DMTCS Proceedings vol. AC, Discrete Random Walks (DRW'03) (2003).

See the article record from the Episciences source

Words of fixed size $q$ are commonly referred to as $q$-grams. We consider the problem of $q$-gram filtration, a method commonly used to speed up sequence comparison. We are interested in the statistics of the number of $q$-grams common to two random texts (where multiplicities are not counted) in the non-uniform Bernoulli model. In the exact and dependent model, when omitting border effects, a $q$-gram in a random sequence depends on the $q-1$ preceding $q$-grams. In an approximate and independent model, we randomly draw a $q$-gram at each position, independently of the other positions. Using ball and urn models, we analyze the independent model. Numerical simulations show that this model is an excellent first-order approximation to the dependent model. We provide an algorithm to compute the moments.
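
The two models described in the abstract can be compared with a small Monte Carlo experiment. The sketch below is an illustration only and is not the algorithm of the paper: the function names, the four-letter alphabet and its letter probabilities are assumptions chosen for the example. In the independent model each $q$-gram $w$ is treated as an urn that receives each ball with probability $p_w$, so by linearity of expectation the mean number of common $q$-grams for texts with $n$ and $m$ positions is $\sum_w (1-(1-p_w)^n)(1-(1-p_w)^m)$; this elementary first-moment formula is computed alongside the simulation.

```python
import random
from itertools import product


def qgram_probs(letter_probs, q):
    """Probability of every q-gram under a Bernoulli (i.i.d. letters) model."""
    probs = {}
    for combo in product(letter_probs, repeat=q):
        p = 1.0
        for c in combo:
            p *= letter_probs[c]
        probs["".join(combo)] = p
    return probs


def dependent_qgrams(letter_probs, n, q, rng):
    """Distinct q-grams of one random text with n overlapping q-gram positions."""
    letters = list(letter_probs)
    weights = [letter_probs[c] for c in letters]
    text = "".join(rng.choices(letters, weights=weights, k=n + q - 1))
    return {text[i:i + q] for i in range(n)}


def independent_qgrams(probs, n, rng):
    """Distinct q-grams when one q-gram is drawn independently per position."""
    grams = list(probs)
    weights = [probs[g] for g in grams]
    return set(rng.choices(grams, weights=weights, k=n))


def expected_common_independent(probs, n, m):
    """Exact mean number of common q-grams in the independent (urn) model."""
    return sum((1 - (1 - p) ** n) * (1 - (1 - p) ** m) for p in probs.values())


def simulate(letter_probs, n, m, q, trials, seed=0):
    """Average number of common q-grams (multiplicities ignored) in both models."""
    rng = random.Random(seed)
    probs = qgram_probs(letter_probs, q)
    dep_total = ind_total = 0
    for _ in range(trials):
        a = dependent_qgrams(letter_probs, n, q, rng)
        b = dependent_qgrams(letter_probs, m, q, rng)
        dep_total += len(a & b)
        c = independent_qgrams(probs, n, rng)
        d = independent_qgrams(probs, m, rng)
        ind_total += len(c & d)
    return dep_total / trials, ind_total / trials


if __name__ == "__main__":
    # Hypothetical non-uniform letter distribution, chosen only for illustration.
    letters = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
    dep, ind = simulate(letters, n=50, m=50, q=3, trials=2000)
    exact = expected_common_independent(qgram_probs(letters, 3), 50, 50)
    print(f"dependent: {dep:.2f}  independent: {ind:.2f}  urn formula: {exact:.2f}")
```

Using sets discards multiplicities, as in the abstract. The dependent text is generated with $n + q - 1$ letters so that both models have the same number of $q$-gram positions, which sidesteps border effects in the comparison.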
@article{DMTCS_2003_special_248_a2,
     author = {Nicod\`eme, Pierre},
     title = {q-gram analysis and urn models},
     journal = {Discrete mathematics \& theoretical computer science},
     publisher = {mathdoc},
     volume = {DMTCS Proceedings vol. AC, Discrete Random Walks (DRW'03)},
     year = {2003},
     doi = {10.46298/dmtcs.3322},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.46298/dmtcs.3322/}
}
TY  - JOUR
AU  - Nicodème, Pierre
TI  - q-gram analysis and urn models
JO  - Discrete mathematics & theoretical computer science
PY  - 2003
VL  - DMTCS Proceedings vol. AC, Discrete Random Walks (DRW'03)
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/articles/10.46298/dmtcs.3322/
DO  - 10.46298/dmtcs.3322
LA  - en
ID  - DMTCS_2003_special_248_a2
ER  - 
%0 Journal Article
%A Nicodème, Pierre
%T q-gram analysis and urn models
%J Discrete mathematics & theoretical computer science
%D 2003
%V DMTCS Proceedings vol. AC, Discrete Random Walks (DRW'03)
%I mathdoc
%U http://geodesic.mathdoc.fr/articles/10.46298/dmtcs.3322/
%R 10.46298/dmtcs.3322
%G en
%F DMTCS_2003_special_248_a2
Nicodème, Pierre. q-gram analysis and urn models. Discrete mathematics & theoretical computer science, DMTCS Proceedings vol. AC, Discrete Random Walks (DRW'03) (2003). doi: 10.46298/dmtcs.3322. http://geodesic.mathdoc.fr/articles/10.46298/dmtcs.3322/
