Extracting factual information about the pandemic from open Internet sources
Matematičeskaâ biologiâ i bioinformatika, Volume 17 (2022) no. 2, pp. 423-440.

See the article record from the source Math-Net.Ru

Multi-agent models of the spread of infectious diseases require a large amount of diverse source data, most of which is not directly accessible. One of the key problems in designing such models is therefore the development of tools for obtaining data from various sources. This article presents approaches for extracting, from text messages published on the Internet, both the parameter values that describe the functioning of the simulated society and statistical data on the course of the pandemic. The proposed method and its software implementation provide intelligent search of open-source information on the Internet and processing of unstructured data. The data collected in this way are used to set the parameters of a mathematical model, which makes it possible to study various scenarios and predict the progress of the epidemic in specific regions. The proposed approach rests on two main technologies: regular expressions and analysis using machine learning methods. Regular expressions allow high-speed text processing, but their applicability is limited by a strong dependence on context. Machine learning can adapt to the information context of a message, but the analysis takes a relatively large amount of time. To improve the accuracy of the analysis and offset the shortcomings of each approach, ways of combining these technologies are proposed. The article presents the results of optimizing the algorithms for obtaining the necessary data.
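The trade-off described in the abstract (fast but context-dependent regular expressions, slower but more adaptive machine learning) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the pattern `CASES_RE`, the function names, and the keyword-based `toy_fallback` are all hypothetical, and the fallback is only a placeholder standing in for a trained model. The sketch tries the cheap regular expression first and calls the slower analyzer only when the pattern finds nothing.

```python
import re
from typing import Callable, Optional

# Hypothetical pattern (an assumption, not taken from the article):
# an integer, optionally with space or comma thousands separators,
# followed by "cases", "new cases", or "new confirmed cases".
CASES_RE = re.compile(
    r"(?<!\d)(?P<count>\d{1,3}(?:[ ,]\d{3})+|\d+)\s+(?:new\s+)?(?:confirmed\s+)?cases",
    re.IGNORECASE,
)


def extract_by_regex(text: str) -> Optional[int]:
    """Fast path: return the case count if the fixed pattern matches."""
    match = CASES_RE.search(text)
    if match is None:
        return None
    # Remove thousands separators before converting to an integer.
    return int(re.sub(r"[ ,]", "", match.group("count")))


def toy_fallback(text: str) -> Optional[int]:
    """Placeholder for a slower, context-aware analyzer (e.g. an ML model).

    Assumption: here it is only a keyword heuristic, so that the
    sketch stays self-contained and runnable.
    """
    keywords = ("infected", "diagnosed", "tested positive")
    if any(keyword in text.lower() for keyword in keywords):
        number = re.search(r"\d+", text)
        return int(number.group()) if number else None
    return None


def extract_cases(text: str, fallback: Callable[[str], Optional[int]]) -> Optional[int]:
    """Try the regular expression first; use the fallback only on a miss."""
    value = extract_by_regex(text)
    return value if value is not None else fallback(text)


if __name__ == "__main__":
    print(extract_cases("Moscow reported 2,315 new cases on Monday.", toy_fallback))
    print(extract_cases("The number of infected people rose to 480.", toy_fallback))
    print(extract_cases("No update was published today.", toy_fallback))
```

In this arrangement the expensive analyzer runs only on messages that the pattern cannot handle, which is one plausible way to combine the two technologies; the article itself may combine them differently.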
@article{MBB_2022_17_2_a1,
     author = {E. Yu. Akulinina and A. L. Karmanov and N. A. Teplykh and V. V. Vlasov and V. I. Baluta and S. S. Varykhanov and A. A. Karandeev and V. P. Osipov and Yu. G. Rykov and B. N. Chetverushkin},
     title = {Extracting factual information about the pandemic from open {Internet} sources},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {423--440},
     publisher = {mathdoc},
     volume = {17},
     number = {2},
     year = {2022},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MBB_2022_17_2_a1/}
}
TY  - JOUR
AU  - E. Yu. Akulinina
AU  - A. L. Karmanov
AU  - N. A. Teplykh
AU  - V. V. Vlasov
AU  - V. I. Baluta
AU  - S. S. Varykhanov
AU  - A. A. Karandeev
AU  - V. P. Osipov
AU  - Yu. G. Rykov
AU  - B. N. Chetverushkin
TI  - Extracting factual information about the pandemic from open Internet sources
JO  - Matematičeskaâ biologiâ i bioinformatika
PY  - 2022
SP  - 423
EP  - 440
VL  - 17
IS  - 2
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MBB_2022_17_2_a1/
LA  - ru
ID  - MBB_2022_17_2_a1
ER  - 
%0 Journal Article
%A E. Yu. Akulinina
%A A. L. Karmanov
%A N. A. Teplykh
%A V. V. Vlasov
%A V. I. Baluta
%A S. S. Varykhanov
%A A. A. Karandeev
%A V. P. Osipov
%A Yu. G. Rykov
%A B. N. Chetverushkin
%T Extracting factual information about the pandemic from open Internet sources
%J Matematičeskaâ biologiâ i bioinformatika
%D 2022
%P 423-440
%V 17
%N 2
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MBB_2022_17_2_a1/
%G ru
%F MBB_2022_17_2_a1
E. Yu. Akulinina; A. L. Karmanov; N. A. Teplykh; V. V. Vlasov; V. I. Baluta; S. S. Varykhanov; A. A. Karandeev; V. P. Osipov; Yu. G. Rykov; B. N. Chetverushkin. Extracting factual information about the pandemic from open Internet sources. Matematičeskaâ biologiâ i bioinformatika, Volume 17 (2022) no. 2, pp. 423-440. http://geodesic.mathdoc.fr/item/MBB_2022_17_2_a1/
