Vector graphics generation with LLMs: approaches and models
Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part II–2, Vol. 530 (2023), pp. 24–37
This article was harvested from the Math-Net.Ru source.

The task of generating vector graphics with AI is under-researched. Recently, large language models (LLMs) have been successfully applied to many downstream tasks; for example, modern LLMs achieve remarkable quality in code generation and are openly accessible. This study compares approaches to vector graphics generation with LLMs, namely ChatGPT (GPT-3.5) and GPT-4. GPT-4 shows noticeable improvements over ChatGPT. Both models easily generate geometric primitives but struggle with even simple objects: the results produced by GPT-4 visually resemble the prompts but are inaccurate, although GPT-4 is able to correct its output according to instructions. Recognizing an object from an SVG image is also challenging for both models; only primitive objects are recognized correctly.
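For illustration only (this sketch is not taken from the article): the abstract reports that both models easily generate SVG geometric primitives, so the snippet below shows the kind of single-primitive SVG markup in question, together with a basic well-formedness check using only the Python standard library. The markup, the function name, and the check itself are assumptions made for this example, not the authors' evaluation procedure.

# Illustrative sketch only: an SVG geometric primitive of the kind the
# abstract says both models handle well, plus a minimal well-formedness
# check with the Python standard library (no external APIs assumed).
import xml.etree.ElementTree as ET

# Hypothetical LLM output containing a single geometric primitive (a circle).
svg_markup = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="red"/>
</svg>"""

def is_well_formed_svg(markup: str) -> bool:
    """Return True if the markup parses as XML and its root is an <svg> element."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return False
    # With the xmlns declaration, the root tag is "{http://www.w3.org/2000/svg}svg".
    return root.tag.endswith("svg")

if __name__ == "__main__":
    print(is_well_formed_svg(svg_markup))  # expected output: True

A check like this only verifies syntactic validity; as the study notes, syntactically valid SVG output can still depict the prompted object inaccurately.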
@article{ZNSL_2023_530_a2,
     author = {B. Timofeenko and V. Efimova and A. Filchenkov},
     title = {Vector graphics generation with {LLMs:} approaches and models},
     journal = {Zapiski Nauchnykh Seminarov POMI},
     pages = {24--37},
     year = {2023},
     volume = {530},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/ZNSL_2023_530_a2/}
}
B. Timofeenko; V. Efimova; A. Filchenkov. Vector graphics generation with LLMs: approaches and models. Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part II–2, Vol. 530 (2023), pp. 24–37. http://geodesic.mathdoc.fr/item/ZNSL_2023_530_a2/
