Application of Benford's law for quality assessment of preventive screening data
Matematičeskaâ biologiâ i bioinformatika, Volume 17 (2022) no. 2, pp. 230–249.

See the article record from the source Math-Net.Ru

Benford's law, an empirical law that describes the probability of occurrence of particular first significant digits in many real-life distributions, is used to identify anomalies in various kinds of data. Our aim was to test whether Benford's law can be used to assess the quality of mass preventive screening data, using bioelectrical impedance analysis (BIA) data from Moscow health centers as an example. As shown earlier, such data are characterized by a high level of contamination with artificially generated and falsified records. The database of BIA measurements compiled for 2010–2019 contained 1,361,019 measurement records for examined persons aged 5 to 96 years. Application of the expert quality assessment algorithm, which served as the reference for evaluating the effectiveness of the Benford analysis, revealed a high percentage of incorrect data (66.5%), dominated by falsified data. To characterize the degree of compliance of the data with Benford's law, the mean absolute deviations of the frequency distributions of the first and of the first two significant digits from their expected values, and the chi-squared statistics, for the tenth powers of the standardized resistance, reactance, and resistance index values were assessed for each health center. A significant correlation was observed between the deviation of the data from Benford's law and the percentage of incorrect data identified by the expert quality assessment algorithm ($\rho_{\mathrm{max}}$ = 0.66 and 0.62 for the mean absolute deviations and $\chi^2$ statistics, respectively, based on the resistance value and the first significant digit). It is suggested that deviation of the BIA data from Benford's law is a sufficient, but not a necessary, condition for their contamination. For health centers in which most of the incorrect data were repeated measurements of the same person presented as measurements of different people, the data were in good agreement with Benford's law. Where the incorrect data were dominated by measurements of the calibration block, software emulations of BIA measurements, and outliers, Benford's law made it possible to rank health centers effectively by the level of data authenticity.
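The two conformity measures named in the abstract can be illustrated with a minimal Python sketch for the first significant digit. It assumes the standard Benford probabilities P(d) = log10(1 + 1/d) for d = 1, ..., 9, takes the mean absolute deviation as the average of |observed − expected| over the nine digits, and uses the usual Pearson form of the chi-squared statistic. The function names and the test data are hypothetical and are not taken from the article; the standardization and tenth-power transformation of the BIA values mentioned above are not reproduced, since their details are not given here.

```python
import math
from collections import Counter


def first_digit(x: float) -> int:
    """Return the first significant digit (1-9) of a nonzero finite number."""
    x = abs(float(x))
    if x == 0.0 or not math.isfinite(x):
        raise ValueError("value must be nonzero and finite")
    # In scientific notation the leading significant digit comes first.
    return int(f"{x:.15e}"[0])


# Expected first-digit probabilities under Benford's law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def observed_frequencies(values):
    """Return ({digit: relative frequency}, sample size) for a non-empty sample."""
    counts = Counter(first_digit(v) for v in values)
    n = sum(counts.values())
    return {d: counts.get(d, 0) / n for d in range(1, 10)}, n


def benford_mad(values) -> float:
    """Mean absolute deviation of observed first-digit frequencies from Benford's."""
    obs, _ = observed_frequencies(values)
    return sum(abs(obs[d] - BENFORD[d]) for d in BENFORD) / len(BENFORD)


def benford_chi2(values) -> float:
    """Pearson chi-squared statistic against Benford's law (8 degrees of freedom)."""
    obs, n = observed_frequencies(values)
    return n * sum((obs[d] - BENFORD[d]) ** 2 / BENFORD[d] for d in BENFORD)


# Hypothetical illustration data (not from the article):
powers_of_two = [2.0 ** k for k in range(1000)]  # spans many orders of magnitude
uniform_like = list(range(100, 1000))            # every first digit equally frequent

print(benford_mad(powers_of_two), benford_chi2(powers_of_two))
print(benford_mad(uniform_like), benford_chi2(uniform_like))
```

In this illustration the powers of two conform closely to Benford's law, while the three-digit integers, whose first digits are uniformly distributed, deviate strongly on both measures. Computing the same two statistics separately for each health center would give the kind of per-center ranking the abstract describes.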
@article{MBB_2022_17_2_a0,
     author = {O. A. Starunova and S. G. Rudnev and A. E. Ivanova and V. G. Semenova and V. I. Starodubov},
     title = {Application of {Benford's} law for quality assessment of preventive screening data},
     journal = {Matemati\v{c}eska\^a biologi\^a i bioinformatika},
     pages = {230--249},
     publisher = {mathdoc},
     volume = {17},
     number = {2},
     year = {2022},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MBB_2022_17_2_a0/}
}
O. A. Starunova; S. G. Rudnev; A. E. Ivanova; V. G. Semenova; V. I. Starodubov. Application of Benford's law for quality assessment of preventive screening data. Matematičeskaâ biologiâ i bioinformatika, Volume 17 (2022) no. 2, pp. 230–249. http://geodesic.mathdoc.fr/item/MBB_2022_17_2_a0/
