Towards Russian summarization: can architecture solve data limitations problems?

A. Akhmetgareeva; A. Abramov; I. Kuleshov; V. Leschuk; A. Fenogenova

Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part IV, Volume 540 (2024), pp. 5-26

This article was harvested from the source Math-Net.Ru

Abstract

In this work, we investigate the automatic summarization problem, focusing on its significance, challenges, and methodologies, particularly in the context of the Russian language. We highlight the limitations of current evaluation metrics and datasets, representing diverse summarization scenarios. We study various approaches, including the formats of supervised fine-tuning, a comparison of models designed for Russian and those with cross-lingual capabilities, and the influence of reinforcement learning alignment on the final results. Contributions of this work include an examination of the summarization task for the Russian language, publication of a new instruction-based dataset and the best open-source model, and insights for further advances in the field.

How to cite

@article{ZNSL_2024_540_a0,
     author = {A. Akhmetgareeva and A. Abramov and I. Kuleshov and V. Leschuk and A. Fenogenova},
     title = {Towards {Russian} summarization: can architecture solve data limitations problems?},
     journal = {Zapiski Nauchnykh Seminarov POMI},
     pages = {5--26},
     year = {2024},
     volume = {540},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a0/}
}

TY  - JOUR
AU  - A. Akhmetgareeva
AU  - A. Abramov
AU  - I. Kuleshov
AU  - V. Leschuk
AU  - A. Fenogenova
TI  - Towards Russian summarization: can architecture solve data limitations problems?
JO  - Zapiski Nauchnykh Seminarov POMI
PY  - 2024
SP  - 5
EP  - 26
VL  - 540
UR  - http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a0/
LA  - en
ID  - ZNSL_2024_540_a0
ER  -

%0 Journal Article
%A A. Akhmetgareeva
%A A. Abramov
%A I. Kuleshov
%A V. Leschuk
%A A. Fenogenova
%T Towards Russian summarization: can architecture solve data limitations problems?
%J Zapiski Nauchnykh Seminarov POMI
%D 2024
%P 5-26
%V 540
%U http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a0/
%G en
%F ZNSL_2024_540_a0

A. Akhmetgareeva; A. Abramov; I. Kuleshov; V. Leschuk; A. Fenogenova. Towards Russian summarization: can architecture solve data limitations problems? Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part IV, Volume 540 (2024), pp. 5-26. http://geodesic.mathdoc.fr/item/ZNSL_2024_540_a0/

