Thematic text document segmentation
Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, no. 3 (2011), pp. 127-133
This article was harvested from the Math-Net.Ru source.

A method for automatic text segmentation and annotation is presented. It first discovers the themes present in the document collection and then splits each document according to these themes.
Keywords: text segmentation, natural language processing, information retrieval.
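The abstract gives only the outline of the method. The second step, splitting a document once the themes are known, can be illustrated with a minimal sketch. It assumes each theme is represented as a unigram word distribution, assigns every sentence the theme under which its words are most likely, and merges consecutive sentences with the same theme into one segment. This is not the author's algorithm: the theme-discovery step is replaced by hand-written distributions, and the names `segment`, `log_likelihood`, `tokenize`, the `floor` smoothing value and the example themes are all hypothetical.

```python
import math
import re
from typing import Dict, List, Tuple

# Hypothetical theme model: a unigram distribution over words.
# In the paper the themes are discovered from the whole collection;
# here they are supplied by hand to keep the sketch short.
Theme = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (ASCII letters only)."""
    return re.findall(r"[a-z]+", text.lower())


def log_likelihood(words: List[str], theme: Theme, floor: float = 1e-6) -> float:
    """Log-probability of a bag of words under a theme; unseen words get `floor`."""
    return sum(math.log(theme.get(w, floor)) for w in words)


def segment(sentences: List[str], themes: Dict[str, Theme]) -> List[Tuple[str, List[str]]]:
    """Label each sentence with its most likely theme, then merge runs of equal labels.

    Ties (e.g. a sentence with no known words) go to the first theme listed.
    """
    segments: List[Tuple[str, List[str]]] = []
    for sentence in sentences:
        words = tokenize(sentence)
        best = max(themes, key=lambda name: log_likelihood(words, themes[name]))
        if segments and segments[-1][0] == best:
            segments[-1][1].append(sentence)  # same theme: extend the current segment
        else:
            segments.append((best, [sentence]))  # theme changed: start a new segment
    return segments


if __name__ == "__main__":
    # Hypothetical hand-written themes standing in for the discovered ones.
    themes = {
        "sports": {"match": 0.3, "goal": 0.3, "team": 0.4},
        "finance": {"market": 0.3, "stock": 0.4, "price": 0.3},
    }
    doc = [
        "The team scored a late goal.",
        "It was a tense match.",
        "Stock prices fell as the market opened.",
    ]
    for label, sents in segment(doc, themes):
        print(label, len(sents))  # prints "sports 2", then "finance 1"
```

Assigning themes sentence by sentence keeps the sketch simple; a practical segmenter would usually score larger blocks or penalize frequent theme switches so that a single ambiguous sentence does not start a new segment.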
@article{VSPUI_2011_3_a12,
     author = {A. N. Mishenin},
     title = {Thematic text document segmentation},
     journal = {Vestnik Sankt-Peterburgskogo universiteta. Prikladna\^a matematika, informatika, processy upravleni\^a},
     pages = {127--133},
     year = {2011},
     number = {3},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/VSPUI_2011_3_a12/}
}
A. N. Mishenin. Thematic text document segmentation. Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, no. 3 (2011), pp. 127-133. http://geodesic.mathdoc.fr/item/VSPUI_2011_3_a12/