Python fuzzing for trustworthy machine learning frameworks
Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part II–2, Volume 530 (2023), pp. 38-50. This article was harvested from the Math-Net.Ru source.

Ensuring the security and reliability of machine learning frameworks is crucial for building trustworthy AI-based systems. Fuzzing, a popular technique in the secure software development lifecycle (SSDLC), can be used to develop secure and robust software. Popular machine learning frameworks such as PyTorch and TensorFlow are complex and written in multiple programming languages, including C/C++ and Python. We propose a dynamic analysis pipeline for Python projects using the Sydr-Fuzz toolset. Our pipeline includes fuzzing, corpus minimization, crash triaging, and coverage collection. Crash triaging and severity estimation are important steps to ensure that the most critical vulnerabilities are addressed promptly. Furthermore, the proposed pipeline is integrated into GitLab CI. To identify the most vulnerable parts of the machine learning frameworks, we analyze their potential attack surfaces and develop fuzz targets for PyTorch, TensorFlow, and related projects such as h5py. Applying our dynamic analysis pipeline to these targets, we discovered three new bugs and proposed fixes for them.
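
To make the notion of a Python fuzz target concrete, here is a minimal sketch, not the authors' actual harness. It assumes Atheris as the fuzzing engine (the abstract names only the Sydr-Fuzz toolset) and h5py as the library under test. The helper names `make_fuzz_target`, `parse_hdf5`, and `main`, as well as the choice of which exceptions count as expected, are hypothetical.

```python
import io
import sys
from typing import Callable, Tuple, Type


def make_fuzz_target(
    parse: Callable[[bytes], object],
    expected_errors: Tuple[Type[BaseException], ...],
) -> Callable[[bytes], None]:
    """Wrap a parser so that only unexpected exceptions reach the fuzzer."""

    def test_one_input(data: bytes) -> None:
        try:
            parse(data)
        except expected_errors:
            # Malformed input that is rejected cleanly is not a finding.
            return
        # Any other exception propagates and is reported as a crash.

    return test_one_input


def parse_hdf5(data: bytes) -> None:
    """Hypothetical target: open the fuzzer's bytes as an in-memory HDF5 file."""
    import h5py  # imported lazily so the sketch loads without h5py installed

    with h5py.File(io.BytesIO(data), "r") as f:
        # Walk every object so more than the file header gets parsed.
        f.visit(lambda name: None)


def main() -> None:
    """Hand the target to Atheris (assumes the atheris package is installed)."""
    import atheris

    # Import the library under instrumentation so Python-level coverage
    # feedback is available to the fuzzer.
    with atheris.instrument_imports():
        import h5py  # noqa: F401

    # Assumption: OSError and ValueError are treated as "expected" rejections.
    target = make_fuzz_target(parse_hdf5, (OSError, ValueError))
    atheris.Setup(sys.argv, target)
    atheris.Fuzz()
```

Calling `main()` from a script starts the fuzzing loop. Coverage feedback from the native (C/C++) part of a library would additionally require building that extension with compiler instrumentation, which this sketch does not show.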
@article{ZNSL_2023_530_a3,
     author = {I. Yegorov and E. Kobrin and D. Parygina and A. Vishnyakov and A. Fedotov},
     title = {Python fuzzing for trustworthy machine learning frameworks},
     journal = {Zapiski Nauchnykh Seminarov POMI},
     pages = {38--50},
     year = {2023},
     volume = {530},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/ZNSL_2023_530_a3/}
}
TY  - JOUR
AU  - I. Yegorov
AU  - E. Kobrin
AU  - D. Parygina
AU  - A. Vishnyakov
AU  - A. Fedotov
TI  - Python fuzzing for trustworthy machine learning frameworks
JO  - Zapiski Nauchnykh Seminarov POMI
PY  - 2023
SP  - 38
EP  - 50
VL  - 530
UR  - http://geodesic.mathdoc.fr/item/ZNSL_2023_530_a3/
LA  - en
ID  - ZNSL_2023_530_a3
ER  - 
%0 Journal Article
%A I. Yegorov
%A E. Kobrin
%A D. Parygina
%A A. Vishnyakov
%A A. Fedotov
%T Python fuzzing for trustworthy machine learning frameworks
%J Zapiski Nauchnykh Seminarov POMI
%D 2023
%P 38-50
%V 530
%U http://geodesic.mathdoc.fr/item/ZNSL_2023_530_a3/
%G en
%F ZNSL_2023_530_a3
I. Yegorov; E. Kobrin; D. Parygina; A. Vishnyakov; A. Fedotov. Python fuzzing for trustworthy machine learning frameworks. Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part II–2, Volume 530 (2023), pp. 38-50. http://geodesic.mathdoc.fr/item/ZNSL_2023_530_a3/
