Statistical Properties of Similarity Score Functions
Discrete mathematics & theoretical computer science, DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities (2006).

View the article record from the Episciences source.

In computational biology, many problems, such as pattern discovery, involve comparing several sequences (of nucleotides, proteins or genes, for instance). Algorithms that address these problems very often use score functions that reflect a notion of similarity between the sequences. The most efficient methods take advantage of theoretical knowledge of the typical behavior of these score functions, such as their mean, their variance and sometimes their asymptotic distribution, in a given probabilistic model. In this paper, we study a recent family of score functions, introduced in Mancheron 2003, that compares two words of the same length. Here, the similarity takes into account all matches and mismatches between the two sequences, not only the longest common subsequence as in classical algorithms such as BLAST or FASTA. Using generating functions, we provide closed formulas for the mean and the variance of these functions in an independent probabilistic model. Finally, we prove that every function in this family asymptotically behaves as a Gaussian random variable.
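To illustrate how a generating function yields a mean and a variance, here is a minimal sketch using a deliberately simple stand-in score: the number of positions at which two equal-length words agree, with every letter drawn independently from the same distribution. This is an assumption for illustration only; it is not the score family of Mancheron 2003, whose definition is not reproduced on this page. For this stand-in, one position is a match with probability q (the sum of the squared letter probabilities), so the probability generating function of the score over n positions is F(z) = ((1 - q) + qz)^n. The mean is F'(1) and the variance is F''(1) + F'(1) - F'(1)^2. The function names are hypothetical.

```python
from fractions import Fraction


def match_probability(letter_probs):
    """Probability q that two independently drawn letters coincide."""
    return sum(p * p for p in letter_probs)


def pgf_coefficients(n, q):
    """Coefficients c_k of F(z) = ((1 - q) + q z)^n for the stand-in match count.

    c_k is the probability that exactly k of the n positions are matches.
    """
    coeffs = [Fraction(1)]
    for _ in range(n):
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - q)  # this position is a mismatch
            nxt[k + 1] += c * q    # this position is a match
        coeffs = nxt
    return coeffs


def mean_and_variance(coeffs):
    """Mean and variance read off the PGF derivatives at z = 1."""
    d1 = sum(k * c for k, c in enumerate(coeffs))            # F'(1)
    d2 = sum(k * (k - 1) * c for k, c in enumerate(coeffs))  # F''(1)
    return d1, d2 + d1 - d1 * d1


# Uniform DNA alphabet, words of length 10.
q = match_probability([Fraction(1, 4)] * 4)
mean, var = mean_and_variance(pgf_coefficients(10, q))
print(mean, var)  # 5/2 15/8
```

Exact fractions keep the closed forms visible: the mean is nq and the variance is nq(1 - q). Because this stand-in is a sum of independent Bernoulli variables, its Gaussian limit follows from the classical central limit theorem; the paper's result covers its own, richer family.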
@article{DMTCS_2006_special_252_a26,
     author = {Bourdon, J\'er\'emie and Mancheron, Alban},
     title = {Statistical {Properties} of {Similarity} {Score} {Functions}},
     journal = {Discrete mathematics \& theoretical computer science},
     publisher = {mathdoc},
     volume = {DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities},
     year = {2006},
     doi = {10.46298/dmtcs.3502},
     language = {en},
     url = {http://geodesic.mathdoc.fr/articles/10.46298/dmtcs.3502/}
}
TY  - JOUR
AU  - Bourdon, Jérémie
AU  - Mancheron, Alban
TI  - Statistical Properties of Similarity Score Functions
JO  - Discrete mathematics & theoretical computer science
PY  - 2006
VL  - DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/articles/10.46298/dmtcs.3502/
DO  - 10.46298/dmtcs.3502
LA  - en
ID  - DMTCS_2006_special_252_a26
ER  - 
%0 Journal Article
%A Bourdon, Jérémie
%A Mancheron, Alban
%T Statistical Properties of Similarity Score Functions
%J Discrete mathematics & theoretical computer science
%D 2006
%V DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities
%I mathdoc
%U http://geodesic.mathdoc.fr/articles/10.46298/dmtcs.3502/
%R 10.46298/dmtcs.3502
%G en
%F DMTCS_2006_special_252_a26
Bourdon, Jérémie; Mancheron, Alban. Statistical Properties of Similarity Score Functions. Discrete mathematics & theoretical computer science, DMTCS Proceedings vol. AG, Fourth Colloquium on Mathematics and Computer Science Algorithms, Trees, Combinatorics and Probabilities (2006). doi: 10.46298/dmtcs.3502. http://geodesic.mathdoc.fr/articles/10.46298/dmtcs.3502/
