Rank-scaled metric clustering of amino-acid sequences
Matematičeskaâ biologiâ i bioinformatika, Vol. 7 (2012) no. 1, pp. 345-359.

See the article record from the source Math-Net.Ru

To address the problem of recognizing protein secondary structure, an algorithm for clustering amino-acid subsequences is developed. It reveals clusters using the pairwise distances between the subsequences. Its main distinction is that it does not require the complete pairwise distance matrix, which reduces the computational complexity. To run, the clustering needs only the ranks of the distances between subsequences. The algorithm is illustrated on synthetic data and on amino-acid sequences from the UniProt KB database.
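The idea of clustering from distance ranks without a full pairwise matrix can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes edit distance as the metric, assumes the cluster centers are already given, and uses hypothetical function names (`levenshtein`, `ranks`, `assign_by_rank`). It only shows that an assignment step can rely on the order of distances and needs n × k rather than n × n distance evaluations.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def ranks(values: Sequence[float]) -> List[int]:
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def assign_by_rank(subsequences: Sequence[str], centers: Sequence[str]) -> List[int]:
    """Assign each subsequence to the center whose distance has rank 0.

    Only len(subsequences) * len(centers) distances are computed, and the
    decision depends on their ranks, not on their magnitudes.
    """
    labels = []
    for s in subsequences:
        distances = [levenshtein(s, c) for c in centers]
        labels.append(ranks(distances).index(0))
    return labels


# Toy data (made up for illustration, not taken from UniProt KB).
subsequences = ["AAAGA", "AAAAA", "LLKLL", "LLLLL", "AAGAA"]
centers = ["AAAAA", "LLLLL"]  # assumed to be known in advance
print(assign_by_rank(subsequences, centers))  # prints [0, 0, 1, 1, 0]
```

Because only the ordering of distances matters here, any monotone rescaling of the metric would give the same assignment; how the centers are chosen and updated is left out of this sketch.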
@article{MBB_2012_7_1_a5,
     author = {V. Strijov and M. P. Kuznetsov and K. V. Rudakov},
     title = {Rank-scaled metric clustering of amino-acid sequences},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {345--359},
     publisher = {mathdoc},
     volume = {7},
     number = {1},
     year = {2012},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MBB_2012_7_1_a5/}
}
V. Strijov; M. P. Kuznetsov; K. V. Rudakov. Rank-scaled metric clustering of amino-acid sequences. Matematičeskaâ biologiâ i bioinformatika, Vol. 7 (2012) no. 1, pp. 345-359. http://geodesic.mathdoc.fr/item/MBB_2012_7_1_a5/
