Exploring the effectiveness of methods for persona extraction

K. Zaitsev

Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part IV, Volume 540 (2024), pp. 61-81. This article was harvested from the source Math-Net.Ru.

Abstract

The paper presents a study of methods for extracting information about dialogue participants and evaluates their performance in Russian. To train models for this task, the Multi-Session Chat dataset was translated into Russian using multiple translation models, which improved data quality. A metric based on the F-score concept is presented to evaluate the effectiveness of the extraction models. The metric uses a trained classifier to identify the dialogue participant to whom a persona belongs. Experiments were conducted on MBart, FRED-T5, Starling-7B (which is based on Mistral), and Encoder2Encoder models. The results demonstrated that all models exhibited an insufficient level of recall in the persona extraction task. Incorporating the NCE loss improved the model's precision at the expense of its recall. Furthermore, increasing the model's size led to better persona extraction.
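
As a rough illustration of the F-score idea mentioned in the abstract, the following minimal sketch computes precision, recall, and F-score for one participant's extracted personas. It is not the paper's metric: the names `persona_f_score`, `matches`, and `exact_match` are hypothetical, and the `matches` callable merely stands in for the trained classifier described above, whose details are not given here.

```python
from typing import Callable, Sequence, Tuple


def persona_f_score(
    extracted: Sequence[str],
    gold: Sequence[str],
    matches: Callable[[str, str], bool],
    beta: float = 1.0,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-beta) for one dialogue participant.

    `matches` is a hypothetical stand-in for a learned judge that decides
    whether an extracted persona corresponds to a reference persona.
    """
    # An extracted persona counts as correct if it matches any gold persona.
    correct_extracted = sum(any(matches(e, g) for g in gold) for e in extracted)
    # A gold persona counts as recovered if any extracted persona matches it.
    recovered_gold = sum(any(matches(e, g) for e in extracted) for g in gold)

    precision = correct_extracted / len(extracted) if extracted else 0.0
    recall = recovered_gold / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0

    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


def exact_match(a: str, b: str) -> bool:
    """Toy matcher (assumption): case-insensitive string equality."""
    return a.strip().lower() == b.strip().lower()


if __name__ == "__main__":
    extracted = ["I have a dog", "I live in Moscow"]
    gold = ["i have a dog", "I like tea", "I am a teacher", "I play chess"]
    print(persona_f_score(extracted, gold, exact_match))
```

In this toy case one of two extracted personas is correct and one of four reference personas is recovered, so precision is higher than recall, which mirrors the kind of precision/recall trade-off the abstract reports.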

Export
How to cite

@article{ZNSL_2024_540_a3,
     author = {K. Zaitsev},
     title = {Exploring the effectiveness of methods for persona extraction},
     journal = {Zapiski Nauchnykh Seminarov POMI},
     pages = {61--81},
     year = {2024},
     volume = {540},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a3/}
}

TY  - JOUR
AU  - K. Zaitsev
TI  - Exploring the effectiveness of methods for persona extraction
JO  - Zapiski Nauchnykh Seminarov POMI
PY  - 2024
SP  - 61
EP  - 81
VL  - 540
UR  - http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a3/
LA  - en
ID  - ZNSL_2024_540_a3
ER  -

%0 Journal Article
%A K. Zaitsev
%T Exploring the effectiveness of methods for persona extraction
%J Zapiski Nauchnykh Seminarov POMI
%D 2024
%P 61-81
%V 540
%U http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a3/
%G en
%F ZNSL_2024_540_a3

K. Zaitsev. Exploring the effectiveness of methods for persona extraction. Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part IV, Volume 540 (2024), pp. 61-81. http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a3/

References

[1] S. Banerjee and A. Lavie, “METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments”, Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization (Ann Arbor, Michigan), Association for Computational Linguistics, 2005, 65–72

[2] E. Chu, P. Vijayaraghavan, and D. Roy, “Learning Personas from Dialogue with Attentive Memory Networks”, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (Brussels, Belgium), Association for Computational Linguistics, 2018, 2638–2646

[3] Y. Chen, Y. Liu, L. Chen, and Y. Zhang, “DialogSum: A Real-Life Scenario Dialogue Summarization Dataset”, Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (Online), Association for Computational Linguistics, 2021, 5062–5074

[4] M.R. Costa-jussà et al., No Language Left Behind: Scaling Human-Centered Machine Translation, 2022, arXiv: 2207.04672

[5] D. Dementieva, D. Moskovskiy, D. Dale, and A. Panchenko, “Exploring Methods for Cross-Lingual Text Style Transfer: The Case of Text Detoxification”, Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Nusa Dua, Bali), v. 1, Association for Computational Linguistics, 2023, 1083–1101

[6] Y. Gao et al., Retrieval-Augmented Generation for Large Language Models: A Survey, 2023, arXiv: 2312.10997

[7] B. Gliwa, I. Mochol, M. Biesek, and A. Wawer, “SAMSum Corpus: A Human-Annotated Dialogue Dataset for Abstractive Summarization”, Proceedings of the 2nd Workshop on New Frontiers in Summarization (Hong Kong, China), Association for Computational Linguistics, 2019, 70–79

[8] E.J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-Rank Adaptation of Large Language Models”, International Conference on Learning Representations, 2022

[9] A.Q. Jiang et al., Mistral 7B, 2023, arXiv: 2310.06825

[10] G. Lee, V. Hartmann, J. Park, D. Papailiopoulos, and K. Lee, “Prompted LLMs as Chatbot Modules for Long Open-Domain Conversation”, Findings of the Association for Computational Linguistics: ACL 2023 (Toronto, Canada), Association for Computational Linguistics, 2023, 4536–4554

[11] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “BART: Denoising Sequence-to-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension”, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Online), Association for Computational Linguistics, 2020, 7871–7880

[12] P. Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”, Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS'20) (Red Hook, NY, USA), Curran Associates Inc., 2020, 793, 9459–9474

[13] C.-Y. Lin, “ROUGE: A Package for Automatic Evaluation of Summaries”, Text Summarization Branches Out (Barcelona, Spain), Association for Computational Linguistics, 2004, 74–81

[14] Y. Liu, J. Gu, N. Goyal, X. Li, S. Edunov, M. Ghazvininejad, M. Lewis, and L. Zettlemoyer, “Multilingual Denoising Pre-Training for Neural Machine Translation”, Transactions of the Association for Computational Linguistics, 8 (2020), 726–742

[15] V. Mikhailov, T. Shamardina, M. Ryabinin, A. Pestova, I. Smurov, and E. Artemova, “RuCoLA: Russian Corpus of Linguistic Acceptability”, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (Abu Dhabi, United Arab Emirates), Association for Computational Linguistics, 2022, 5207–5227

[16] N. Muennighoff et al., “Crosslingual Generalization through Multitask Finetuning”, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Toronto, Canada), Association for Computational Linguistics, 2023, 15991–16111

[17] A. van den Oord, Y. Li, and O. Vinyals, Representation Learning with Contrastive Predictive Coding, 2019, arXiv: 1807.03748

[18] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: A Method for Automatic Evaluation of Machine Translation”, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (Philadelphia, Pennsylvania, USA), Association for Computational Linguistics, 2002, 311–318

[19] I. Proskurina, E. Artemova, and I. Piontkovskaya, “Can BERT Eat RuCoLA? Topological Data Analysis to Explain”, Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023) (Dubrovnik, Croatia), Association for Computational Linguistics, 2023, 123–137

[20] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P.J. Liu, “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”, Journal of Machine Learning Research, 21:1 (2020), 140

[21] S. Rothe, S. Narayan, and A. Severyn, “Leveraging Pre-Trained Checkpoints for Sequence Generation Tasks”, Transactions of the Association for Computational Linguistics, 8 (2020), 264–280

[22] J. Tiedemann, “Parallel Data, Tools, and Interfaces in OPUS”, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) (Istanbul, Turkey), European Language Resources Association (ELRA), 2012, 2214–2218

[23] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is All You Need”, Advances in Neural Information Processing Systems, 30 (Long Beach, CA, USA), 2017, 5998–6008

[24] L. Wang, N. Yang, X. Huang, L. Yang, R. Majumder, and F. Wei, Multilingual E5 Text Embeddings: A Technical Report, 2024, arXiv: 2402.05672

[25] J. Xu, A. Szlam, and J. Weston, “Beyond Goldfish Memory: Long-Term Open-Domain Conversation”, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Dublin, Ireland), v. 1, Association for Computational Linguistics, 2022, 5180–5197

[26] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, and C. Raffel, “mT5: A Massively Multilingual Pre-Trained Text-to-Text Transformer”, Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Online), Association for Computational Linguistics, 2021, 483–498

[27] S. Zhang, E. Dinan, J. Urbanek, A. Szlam, D. Kiela, and J. Weston, “Personalizing Dialogue Agents: I Have a Dog, Do You Have Pets Too?”, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Melbourne, Australia), v. 1, Association for Computational Linguistics, 2018, 2204–2213

[28] T. Zhang, V. Kishore, F. Wu, K.Q. Weinberger, and Y. Artzi, “BERTScore: Evaluating Text Generation with BERT”, International Conference on Learning Representations, 2020

[29] W. Zhou, Q. Li, and C. Li, “Learning to Predict Persona Information for Dialogue Personalization Without Explicit Persona Description”, Findings of the Association for Computational Linguistics: ACL 2023 (Toronto, Canada), Association for Computational Linguistics, 2023, 2979–2991

[30] B. Zhu, E. Frick, T. Wu, H. Zhu, K. Ganesan, W.L. Chiang, J. Zhang, and J. Jiao, Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF, 2023, https://openreview.net/forum?id=GqDntYTTbk#discussion

[31] D. Zmitrovich et al., A Family of Pretrained Transformer Language Models for Russian, 2023, arXiv: 2309.10931
