Multilingual Pretrained based Multi-feature Fusion Model for English Text Classification
Computer Science and Information Systems, Tome 22 (2025) no. 1.

See the article record from the source: the Computer Science and Information Systems website

Deep learning methods have been widely applied to English text classification in recent years, achieving strong performance. However, current methods face two significant challenges: (1) they struggle to capture long-range contextual structure within text sequences, and (2) they do not adequately integrate linguistic knowledge into representations to enhance classifier performance. To this end, a novel multilingual pre-training based multi-feature fusion method for English text classification (MFFMP-ETC) is proposed. Specifically, MFFMP-ETC consists of multilingual feature extraction, multi-level structure learning, and multi-view representation fusion. It employs Multilingual BERT as a deep semantic extractor to introduce linguistic information into representation learning, which makes text representations notably more robust. MFFMP-ETC then integrates Bi-LSTM and TextCNN into the multilingual pre-training architecture to capture the global and local structure of English texts, modelling bidirectional contextual semantic dependencies and multi-granularity local semantic dependencies. Meanwhile, MFFMP-ETC devises multi-view representation fusion within invariant semantic learning to aggregate consistent and complementary information across views. By synergistically combining Multilingual BERT's deep semantic features, Bi-LSTM's bidirectional context processing, and TextCNN's local feature extraction, the model offers a more comprehensive and effective solution for capturing long-distance dependencies and nuanced contextual information in text classification. Finally, results on three datasets show that MFFMP-ETC establishes a new baseline in terms of accuracy, sensitivity, and precision, verifying its effectiveness and advancement in text classification.
Keywords: Multi-feature fusion, multilingual pretrained model, English text classification, multi-level structure learning
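The abstract describes a three-branch design: a multilingual encoder (Multilingual BERT) supplies token-level semantic features, a Bi-LSTM branch models global bidirectional context, a TextCNN branch extracts multi-granularity local features, and the resulting views are fused before classification. A minimal PyTorch sketch of such a pipeline is shown below; this is not the paper's implementation, and all layer names, sizes, and the mean/max pooling choices are illustrative assumptions (the encoder itself is abstracted as precomputed token embeddings).

```python
import torch
import torch.nn as nn

class MultiFeatureFusionClassifier(nn.Module):
    """Hypothetical sketch of a BERT + Bi-LSTM + TextCNN fusion classifier.

    Takes token embeddings (e.g. Multilingual BERT outputs) and fuses a
    global Bi-LSTM view with multi-granularity TextCNN views.
    Hyperparameters are illustrative, not taken from the paper.
    """

    def __init__(self, hidden=768, lstm_hidden=128, n_filters=64,
                 kernel_sizes=(3, 4, 5), n_classes=2):
        super().__init__()
        # Global view: bidirectional context over the token sequence
        self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True,
                              bidirectional=True)
        # Local views: 1-D convolutions of several widths (multi-granularity)
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, n_filters, k) for k in kernel_sizes)
        fused_dim = 2 * lstm_hidden + n_filters * len(kernel_sizes)
        self.classifier = nn.Linear(fused_dim, n_classes)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, hidden), e.g. from mBERT
        lstm_out, _ = self.bilstm(token_embeddings)
        global_view = lstm_out.mean(dim=1)            # (batch, 2*lstm_hidden)
        x = token_embeddings.transpose(1, 2)          # (batch, hidden, seq_len)
        local_views = [torch.relu(c(x)).max(dim=2).values  # max-over-time pool
                       for c in self.convs]
        # Multi-view fusion by concatenation of global and local features
        fused = torch.cat([global_view] + local_views, dim=1)
        return self.classifier(fused)

model = MultiFeatureFusionClassifier()
logits = model(torch.randn(2, 16, 768))  # 2 texts, 16 tokens each
```

Concatenation is the simplest fusion choice; the paper's invariant-semantics fusion presumably uses a learned aggregation over the views instead.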
@article{CSIS_2025_22_1_a6,
     author = {Ruijuan Zhang},
     title = {Multilingual {Pretrained} based {Multi-feature} {Fusion} {Model} for {English} {Text} {Classification}},
     journal = {Computer Science and Information Systems},
     publisher = {mathdoc},
     volume = {22},
     number = {1},
     year = {2025},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2025_22_1_a6/}
}