About algorithms for searching computer information
Prikladnaya Diskretnaya Matematika. Supplement, no. 16 (2023), pp. 126-129.

Voir la notice de l'article provenant de la source Math-Net.Ru

Due to the large growth in the volume of data, there are many problems with information processing. If the search for information is a key task of the system, for example, the search for malware by antivirus programs, you need to know the principles of its organization. In this paper, we study the algorithms for searching a substring in a string: the naive search algorithm, the Knuth — Morris — Pratt algorithm, the Boyer — Moore algorithm, the Rabin — Karp algorithm, as well as wildcards applicable to them (substitution characters, ‘‘matching’’ with any character or their sequence). As a result, a C# program has been developed and implemented to search for files by various parameters (file name, extension, size and content) using the above algorithms and methods. The program allows you to scan a given directory to search for malware. Computational experiments were carried out, including changing the maximum number of characters of the sample and text, the corresponding conclusions were drawn. The overall best file search time (it is enough to find the first occurrence) turned out to be using the Boyer — Moore algorithm, the worst — using the Rabin — Karp algorithm. To search for files for small given data and parameters, you can use naive search, for medium and large data and parameters for small samples it is better to use the Knuth — Morris — Pratt algorithm, for large ones — Boyer — Moore algorithm.
Keywords: Boyer — Moore algorithm, cybersecurity, file search, Knuth — Morris — Pratt algorithm, Rabin — Karp algorithm, scanning, substring search in a string.
@article{PDMA_2023_16_a31,
     author = {A. V. Zharkova and A. G. Musugalieva},
     title = {About algorithms for searching computer information},
     journal = {Prikladnaya Diskretnaya Matematika. Supplement},
     pages = {126--129},
     publisher = {mathdoc},
     number = {16},
     year = {2023},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/PDMA_2023_16_a31/}
}
TY  - JOUR
AU  - A. V. Zharkova
AU  - A. G. Musugalieva
TI  - About algorithms for searching computer information
JO  - Prikladnaya Diskretnaya Matematika. Supplement
PY  - 2023
SP  - 126
EP  - 129
IS  - 16
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/PDMA_2023_16_a31/
LA  - ru
ID  - PDMA_2023_16_a31
ER  - 
%0 Journal Article
%A A. V. Zharkova
%A A. G. Musugalieva
%T About algorithms for searching computer information
%J Prikladnaya Diskretnaya Matematika. Supplement
%D 2023
%P 126-129
%N 16
%I mathdoc
%U http://geodesic.mathdoc.fr/item/PDMA_2023_16_a31/
%G ru
%F PDMA_2023_16_a31
A. V. Zharkova; A. G. Musugalieva. About algorithms for searching computer information. Prikladnaya Diskretnaya Matematika. Supplement, no. 16 (2023), pp. 126-129. http://geodesic.mathdoc.fr/item/PDMA_2023_16_a31/

[1] Gasfild D., Stroki, derevya i posledovatelnosti v algoritmakh: Informatika i vychislitelnaya biologiya, Nevskii dialekt, BKhV-Peterburg, SPb., 2003

[2] Kormen T., Leizerson Ch., Riverst M., Algoritmy: postroenie i analiz, Izd-vo «Vilyams», M., 2005

[3] Makkonnell Dzh., Osnovy sovremennykh algoritmov, Tekhnosfera, M., 2004

[4] GOST R 51188-98. Zaschita informatsii. Ispytaniya programmnykh sredstv na nalichie kompyuternykh virusov. Tipovoe rukovodstvo, IPK Izdatelstvo standartov, M., 2003

[5] ClamAV Download, , 2017 http://www.clamav.net/downloads