On simplifying expressions with mixed Boolean-arithmetic
Modelirovanie i analiz informacionnyh sistem, Vol. 30 (2023), no. 2, pp. 140-159.


Mixed Boolean-Arithmetic expressions (MBA-expressions) over $t$ integer $n$-bit variables are often used for program obfuscation. Obfuscation replaces short expressions with longer equivalent ones that are presumed to take an analyst more time to examine. The paper shows that linear MBA-expressions can be simplified (that is, their number of terms reduced) with a technique similar to information set decoding of linear codes. Based on this technique, two simplification algorithms are constructed: one that finds an expression of minimum length and one that reduces the length of an expression. The length reduction algorithm is then used to build an algorithm that estimates the resistance of an MBA-expression to simplification. We experimentally estimate how the average number of terms in the linear MBA-expression returned by the simplification algorithms depends on $n$, on the number of decoding iterations, and on the cardinality of the set of Boolean functions over which a linear combination with the minimum number of nonzero coefficients is sought. For all considered $t$ and $n$, the experiments show that if a linear MBA-expression contained $r=1,2,3$ terms before obfuscation, then, with probability close to one, the developed algorithms recover from its obfuscated version an equivalent expression with at most $r$ terms. This is the main difference between information set decoding and the well-known techniques for simplifying linear MBA-expressions, which aim to reduce the number of terms to at most $2^t$. We also found that for randomly generated linear MBA-expressions, as $n$ grows, the average number of terms in the returned expression tends to $2^t$ and does not differ from the average number of terms in the expressions returned by the known simplification algorithms.
In particular, the results make it possible to determine the values of $t$ and $n$ for which the average number of terms in the simplified linear MBA-expression is not less than a given value.
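The reduction to decoding can be illustrated with a small Python sketch. It is not the author's algorithm, only a Prange-style analogue written under stated assumptions. It relies on the standard observation that a linear MBA-expression $\sum_i a_i f_i(x)$, with each $f_i$ applied bitwise, is determined modulo $2^n$ by its values on the $2^t$ one-bit inputs. Writing the truth tables of the candidate functions as the columns of a 0/1 matrix $A$ and those values as a signature vector $\mathbf s$, simplification becomes the search for a vector $\mathbf a$ with the fewest nonzero entries such that $A\mathbf a \equiv \mathbf s \pmod{2^n}$, much like finding a minimum-weight error vector for a given syndrome. Each iteration picks $2^t$ random candidate functions (playing the role of an information set), solves the square system modulo $2^n$ when it is invertible, and keeps the sparsest solution found. The parameters `T = 2` and `N_BITS = 8`, the iteration count, the candidate set (all nonzero Boolean functions of $t$ variables) and all function names are hypothetical choices for illustration.

```python
import itertools
import random

# Hypothetical illustration parameters: t = 2 variables, n = 8-bit words.
T = 2
N_BITS = 8
MOD = 1 << N_BITS

# Candidate set F: every nonzero Boolean function of T variables, stored as its
# truth table (entry k is the value on the one-bit input whose variable v is bit v of k).
# The all-ones function evaluates bitwise to 2^n - 1 = -1, so it also covers constants.
FUNCS = [tt for tt in itertools.product((0, 1), repeat=1 << T) if any(tt)]


def tt_of(f):
    """Truth table of a Boolean function given as a Python function of T bits."""
    return tuple(f(*[(k >> v) & 1 for v in range(T)]) & 1 for k in range(1 << T))


def signature(terms):
    """Values of sum a_i * f_i on the 2^T one-bit inputs, reduced mod 2^n."""
    return tuple(sum(a * tt[k] for a, tt in terms) % MOD for k in range(1 << T))


def eval_mba(terms, xs):
    """Evaluate sum a_i * f_i(xs) on n-bit words, applying each f_i bit by bit."""
    total = 0
    for a, tt in terms:
        word = 0
        for j in range(N_BITS):
            k = sum(((xs[v] >> j) & 1) << v for v in range(T))
            word |= tt[k] << j
        total += a * word
    return total % MOD


def solve_square(cols, s):
    """Solve sum_c a_c * cols[c] == s (mod 2^n) when len(cols) == len(s).

    Gauss-Jordan elimination with odd (hence invertible) pivots; returns None
    when the chosen columns are singular modulo 2.
    """
    m = len(s)
    M = [[cols[c][r] for c in range(m)] + [s[r]] for r in range(m)]
    for c in range(m):
        piv = next((r for r in range(c, m) if M[r][c] % 2 == 1), None)
        if piv is None:
            return None
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, MOD)  # odd numbers are units mod 2^n
        M[c] = [(v * inv) % MOD for v in M[c]]
        for r in range(m):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(x - f * y) % MOD for x, y in zip(M[r], M[c])]
    return [M[r][m] for r in range(m)]


def simplify(s, funcs=FUNCS, iters=300, seed=0):
    """Prange-style search: random 2^T-column subsets, keep the sparsest solution."""
    rng = random.Random(seed)
    best = None
    for _ in range(iters):
        idx = rng.sample(range(len(funcs)), len(s))
        a = solve_square([funcs[i] for i in idx], s)
        if a is None:
            continue  # chosen columns are not an "information set"
        terms = [(coef, funcs[i]) for coef, i in zip(a, idx) if coef]
        if best is None or len(terms) < len(best):
            best = terms
    return best


# Usage (T = 2 only): x + y hidden behind two identically-zero bitwise identities,
# 5*((x|y) - (x^y) - (x&y)) and 7*((x&~y) + (x&y) - x).
X, Y = tt_of(lambda x, y: x), tt_of(lambda x, y: y)
OR, XOR = tt_of(lambda x, y: x | y), tt_of(lambda x, y: x ^ y)
AND, ANDN = tt_of(lambda x, y: x & y), tt_of(lambda x, y: x & ~y)
obf = [(-6 % MOD, X), (1, Y), (5, OR), (-5 % MOD, XOR), (2, AND), (7, ANDN)]

best = simplify(signature(obf))
print(len(obf), "terms ->", len(best), "terms:", best)
```

Here the six-term obfuscated version of $x+y$ has signature $(0,1,1,2)$. No single term $a f(x,y)$ can take both values 1 and 2 on the one-bit inputs, so two terms is the minimum, and the search should typically return a two-term expression such as $x+y$ or $(x\oplus y)+2(x\wedge y)$, depending on the random choices. Only odd pivots are needed because a square matrix over $\mathbb Z_{2^n}$ is invertible exactly when it is invertible modulo 2.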
Keywords: code obfuscation, decoding by information sets, MBA-expressions, simplification of MBA-expressions.
@article{MAIS_2023_30_2_a2,
     author = {Y. V. Kosolapov},
     title = {On simplifying expressions with mixed {Boolean-arithmetic}},
     journal = {Modelirovanie i analiz informacionnyh sistem},
     pages = {140--159},
     publisher = {mathdoc},
     volume = {30},
     number = {2},
     year = {2023},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MAIS_2023_30_2_a2/}
}
Y. V. Kosolapov. On simplifying expressions with mixed Boolean-arithmetic. Modelirovanie i analiz informacionnyh sistem, Vol. 30 (2023), no. 2, pp. 140-159. http://geodesic.mathdoc.fr/item/MAIS_2023_30_2_a2/
