Semi-automatic generation of linear event extraction patterns for free texts
Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki, Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki, Vol. 155 (2013) no. 4, pp. 99-108. This article was harvested from the Math-Net.Ru source.

In this paper we describe a semi-automatic approach to generating event extraction patterns for free texts. The algorithm consists of four steps: we automatically extract candidate events from a corpus of free-text documents, cluster them using dependency-based parse tree paths, validate random samples from each cluster, and generate linear patterns from the positive event clusters. We compare the approach with a system that uses handcrafted patterns.
Keywords: event extraction, linear patterns, regular expressions, TextMARKER, RUTA
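
The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: step one (candidate extraction) is assumed to have already produced `(tokens, dependency_path)` pairs, clustering is simplified to exact matching on the path string, the human validator is stood in for by a hypothetical `judge` callback, and the pattern generator is a naive column-wise generalization into a regular expression. All function names and the example data are made up for illustration.

```python
import random
import re
from collections import defaultdict

def cluster_by_path(candidates):
    """Step 2 (simplified): group candidates by identical dependency path."""
    clusters = defaultdict(list)
    for tokens, dep_path in candidates:
        clusters[dep_path].append(tokens)
    return dict(clusters)

def validate(clusters, judge, sample_size=2, threshold=0.5, seed=0):
    """Step 3: keep clusters whose random sample is mostly accepted by `judge`."""
    rng = random.Random(seed)
    positive = {}
    for path, members in clusters.items():
        sample = rng.sample(members, min(sample_size, len(members)))
        accepted = sum(1 for tokens in sample if judge(tokens))
        if accepted / len(sample) >= threshold:
            positive[path] = members
    return positive

def linear_pattern(members):
    """Step 4 (naive): keep tokens shared by all members, wildcard the rest."""
    length = len(members[0])
    same_len = [m for m in members if len(m) == length]
    parts = []
    for column in zip(*same_len):
        if len(set(column)) == 1:
            parts.append(re.escape(column[0]))  # token common to every member
        else:
            parts.append(r"\S+")                # varying slot becomes a wildcard
    return r"\s+".join(parts)

# Hypothetical output of step 1: token sequences with their dependency paths.
candidates = [
    (["Acme", "acquired", "Initech"], "nsubj<-acquired->dobj"),
    (["Globex", "acquired", "Hooli"], "nsubj<-acquired->dobj"),
    (["Acme", "likes", "coffee"], "nsubj<-likes->dobj"),
]

def judge(tokens):
    """Stand-in for the manual check of a sampled candidate."""
    return "acquired" in tokens

clusters = cluster_by_path(candidates)
positive = validate(clusters, judge)
patterns = {path: linear_pattern(members) for path, members in positive.items()}
```

In this toy run only the "acquired" cluster passes validation, and its two members generalize to a single linear pattern with wildcards in the argument positions. A real system would need a softer path similarity measure and a pattern language richer than plain regular expressions.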
@article{UZKU_2013_155_4_a9,
     author = {D. Dzendzik and S. Serebryakov},
     title = {Semi-automatic generation of linear event extraction patterns for free texts},
     journal = {U\v{c}\"enye zapiski Kazanskogo universiteta. Seri\^a Fiziko-matemati\v{c}eskie nauki},
     pages = {99--108},
     year = {2013},
     volume = {155},
     number = {4},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/UZKU_2013_155_4_a9/}
}
TY  - JOUR
AU  - D. Dzendzik
AU  - S. Serebryakov
TI  - Semi-automatic generation of linear event extraction patterns for free texts
JO  - Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki
PY  - 2013
SP  - 99
EP  - 108
VL  - 155
IS  - 4
UR  - http://geodesic.mathdoc.fr/item/UZKU_2013_155_4_a9/
LA  - en
ID  - UZKU_2013_155_4_a9
ER  - 
%0 Journal Article
%A D. Dzendzik
%A S. Serebryakov
%T Semi-automatic generation of linear event extraction patterns for free texts
%J Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki
%D 2013
%P 99-108
%V 155
%N 4
%U http://geodesic.mathdoc.fr/item/UZKU_2013_155_4_a9/
%G en
%F UZKU_2013_155_4_a9
D. Dzendzik; S. Serebryakov. Semi-automatic generation of linear event extraction patterns for free texts. Učënye zapiski Kazanskogo universiteta. Seriâ Fiziko-matematičeskie nauki, Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki, Vol. 155 (2013) no. 4, pp. 99-108. http://geodesic.mathdoc.fr/item/UZKU_2013_155_4_a9/

[1] Soderland S., “Learning Information Extraction Rules for Semi-Structured and Free Text”, Machine Learning, 34:1–3 (1999), 233–272

[2] Li Y., Krishnamurthy R., Raghavan S., Vaithyanathan S., Jagadish H. V., “Regular expression learning for information extraction”, EMNLP'08, Proc. Conf. on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA, USA, 2008, 21–30

[3] Agichtein E., Gravano L., “Snowball: extracting relations from large plain-text collections”, DL'00, Proc. Fifth ACM Conf. Digital libraries, ACM, N.Y., USA, 2000, 85–94

[4] Bach N., Badaskar S., A Review of Relation Extraction, 2007, URL: http://www.cs.cmu.edu/~nbach/papers/A-survey-on-Relation-Extraction.pdf

[5] McDonald R., Extracting Relations from Unstructured Text, Technical Report MS-CIS-05-06, 2005, URL: http://www.ryanmcd.com/papers/MS-CIS-05-06.pdf

[6] Yangarber R., Grishman R., Tapanainen P., “Automatic Acquisition of Domain Knowledge for Information Extraction”, COLING'00, Proc. 18th Conf. on Computational linguistics, v. 2, Association for Computational Linguistics, Stroudsburg, PA, USA, 2000, 940–946

[7] Brin S., “Extracting Patterns and Relations from the World Wide Web”, WebDB'98, Selected papers from the Int. Workshop on The World Wide Web and Databases, Springer-Verlag, London, UK, 1999, 172–183

[8] Etzioni O., Banko M., Soderland S., Weld D. S., “Open information extraction from the web”, Communications of the ACM, 51:12 (2008), 68–74

[9] Etzioni O., Cafarella M., Downey D., Kok S., Popescu A.-M., Shaked T., Soderland S., Weld D. S., Yates A., “Web-scale information extraction in KnowItAll: (preliminary results)”, WWW'04, Proc. 13th Int. Conf. on World Wide Web, ACM, N.Y., USA, 2004, 100–110

[10] Yates A., Banko M., Broadhead M., Cafarella M. J., Etzioni O., Soderland S., “TextRunner: Open Information Extraction on the Web”, NAACL-Demonstrations'07, Proc. Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, Association for Computational Linguistics, Stroudsburg, PA, USA, 2007, 25–26

[11] Kluegl P., Atzmueller M., Puppe F., “Integrating the Rule-Based IE Component TextMarker into UIMA”, Proc. LWA, 2008, 73–77