Semantic Textual Similarity on Brazilian Portuguese: An approach based on language-mixture models
Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Volume 15 (2019) no. 2, pp. 235-244. This article was harvested from the Math-Net.Ru source.

The literature describes Semantic Textual Similarity (STS) as a fundamental part of many Natural Language Processing (NLP) tasks. STS approaches depend on the availability of lexical-semantic resources. There have been several efforts to improve the lexical-semantic resources for the English language, and the state of the art reports a large number of applications for this language. Brazilian Portuguese linguistic resources, when compared with English ones, do not offer the same availability of relations and content, which causes a loss of precision in STS tasks. Therefore, the current work presents an approach that combines Brazilian Portuguese and English lexical-semantic ontology resources to exploit the full potential of the linguistic relations of both languages and to generate a language-mixture model for measuring STS. We evaluated the proposed approach on a well-known and respected Brazilian Portuguese STS dataset, which brought to light some considerations about mixture models and their relation to ontology language semantics.
Keywords: Semantic Textual Similarity, natural language processing, computational linguistics, ontologies.
@article{VSPUI_2019_15_2_a6,
     author = {A. Silva and A. Lozkins and L. R. Bertoldi and S. Rigo and V. M. Bure},
     title = {Semantic {Textual} {Similarity} on {Brazilian} {Portuguese:} {An} approach based on language-mixture models},
     journal = {Vestnik Sankt-Peterburgskogo universiteta. Prikladna\^a matematika, informatika, processy upravleni\^a},
     pages = {235--244},
     year = {2019},
     volume = {15},
     number = {2},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/VSPUI_2019_15_2_a6/}
}
TY  - JOUR
AU  - A. Silva
AU  - A. Lozkins
AU  - L. R. Bertoldi
AU  - S. Rigo
AU  - V. M. Bure
TI  - Semantic Textual Similarity on Brazilian Portuguese: An approach based on language-mixture models
JO  - Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ
PY  - 2019
SP  - 235
EP  - 244
VL  - 15
IS  - 2
UR  - http://geodesic.mathdoc.fr/item/VSPUI_2019_15_2_a6/
LA  - en
ID  - VSPUI_2019_15_2_a6
ER  - 
%0 Journal Article
%A A. Silva
%A A. Lozkins
%A L. R. Bertoldi
%A S. Rigo
%A V. M. Bure
%T Semantic Textual Similarity on Brazilian Portuguese: An approach based on language-mixture models
%J Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ
%D 2019
%P 235-244
%V 15
%N 2
%U http://geodesic.mathdoc.fr/item/VSPUI_2019_15_2_a6/
%G en
%F VSPUI_2019_15_2_a6
A. Silva; A. Lozkins; L. R. Bertoldi; S. Rigo; V. M. Bure. Semantic Textual Similarity on Brazilian Portuguese: An approach based on language-mixture models. Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Volume 15 (2019) no. 2, pp. 235-244. http://geodesic.mathdoc.fr/item/VSPUI_2019_15_2_a6/

[1] Gomaa W. H., Fahmy A. A., “A survey of text similarity approaches”, Intern. Journal of Computer Applications, 68:13 (2013), 13–18 | DOI

[2] Freire J., Pinheiro V., Feitosa D., “LEC UNIFOR no ASSIN: FlexSTS-Um framework para Similaridade Semantica Textual”, PROPOR-Intern. conference on the Computational Processing of Portuguese (Tomar, Portugal, 2016) (accessed: 03.08.2018) http://proper206.di.fc.ul.pt/

[3] Barbosa L., Cavalin P., Kormaksson M., Guimaraes V., “Blue Man Group at ASSIN: Using distributed representations for semantic similarity and entailment recognition”, PROPOR-Intern. conference on the Computational Processing of Portuguese (Tomar, Portugal, 2016) (accessed: 03.08.2018) http://proper206.di.fc.ul.pt/

[4] Ferreira R., Lins R. D., Simske S. J., Freitas F., Riss M., “Assessing sentence similarity through lexical, syntactic and semantic analysis”, Computer Speech & Language, 39 (2016), 1–28 | DOI

[5] Hartmann N. S., “Solo queue at ASSIN: Combinando abordagens tradicionais e emergentes”, Linguamática, 8:2 (2016), 59–64

[6] Cer D., Diab M., Agirre E., Lopez-Gazpio I., Specia L., Semeval-2017 Task 1: Semantic Textual Similarity-multilingual and cross-lingual focused evaluation, 2017, arXiv: 1708.00055 | DOI

[7] Hartmann N., Fonseca E., Shulby C., Treviso M., Rodrigues J., Aluisio S., Portuguese word embeddings: Evaluating on word analogies and natural language tasks, arXiv: 1708.06025

[8] Silva A., Rigo S., Alves I. M., Barbosa J., “Avaliando a similaridade semântica entre frases curtas através de uma abordagem híbrida”, Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology (Uberlândia, 2017), 93–102

[9] Pradhan N., Gyanchandani M., Wadhvani R., “A review on text Similarity Technique used in IR and its application”, Intern. Journal of Computer Applications, 120:9 (2015), 29–34 | DOI

[10] Chen F., Lu C., Wu H., Li M., “A semantic similarity measure integrating multiple conceptual relationships for web service discovery”, Expert Systems with Applications, 67 (2017), 19–31 | DOI

[11] Berrahou S. L., Buche P., Dibie J., Roche M., “Xart: Discovery of correlated arguments of $n$-ary relations in text”, Expert Systems with Applications, 73 (2017), 115–124 | DOI

[12] Ferreira R., Cavalcanti G. D., Freitas F., Lins R. D., Simske S. J., Riss M., “Combining sentence similarities measures to identify paraphrases”, Computer Speech & Language, 47 (2018), 59–73 | DOI

[13] Yanaka H., Mineshima K., Martinez-Gomez P., Bekki D., Determining Semantic Textual Similarity using natural deduction proofs, 2017, arXiv: 1707.08713

[14] Kajiwara T., Bollegala D., Yoshida Y., Kawarabayashi K. I., “An iterative approach for the global estimation of sentence similarity”, PLoS ONE, 12:9 (2017), e0180885 | DOI

[15] Brychcín T., Svoboda L., “UWB at Semeval-2016 Task 1: Semantic Textual Similarity using lexical, syntactic, and semantic information”, Proceedings of the 10th Intern. Workshop on Semantic Evaluation (SemEval-2016) (San Diego, 2016), 588–594

[16] Kashyap A., Han L., Yus R., Sleeman J., Satyapanich T., Gandhi S., Finin T., “Robust semantic text similarity using LSA, machine learning, and linguistic resources”, Language Resources and Evaluation, 50:1 (2016), 125–161 | DOI

[17] Oliveira Alves A., Rodrigues R., Gonçalo Oliveira H., “ASAPP: Alinhamento Semântico Automático de Palavras aplicado ao Português (eng. ASAPP: Automatic semantic alignment for phrases applied to Portuguese)”, Linguamática, 8:2 (2016), 43–58 | MR

[18] Cavalcanti A. P., de Mello R. F. L., Ferreira M. A. D., Rolim V. B., Tenório J. V. S., “Statistical and semantic features to measure sentence similarity in Portuguese”, 2017 Brazilian conference on Intelligent Systems (BRACIS), IEEE, 342–347

[19] Mikolov T., Chen K., Corrado G., Dean J., Efficient estimation of word representations in vector space, 2013, arXiv: 1301.3781

[20] Fialho P., Marques R., Martins B., Coheur L., Quaresma P., Measuring Semantic Similarity and Recognizing Textual Entailment, INESC-ID@ASSIN

[21] Faruqui M., Tsvetkov Y., Rastogi P., Dyer C., Problems with evaluation of word embeddings using word similarity tasks, 2016, arXiv: 1605.02276

[22] Paiva V., Rademaker A., Melo G., “Openwordnet-pt: An open brazilian wordnet for reasoning”, COLING 2012 (Mumbai, 2012)

[23] Miller G. A., “WordNet: a lexical database for English”, Communications of the ACM, 38:11 (1995), 39–41 | DOI

[24] Lozkins A., Bure V. M., “The probabilistic method of finding the local-optimum of clustering”, Vestnik of Saint Petersburg University. Series 10. Applied Mathematics. Computer science. Control Processes, 2016, no. 1, 28–37