@article{ZNSL_2023_529_a10,
author = {V. Firsanova},
title = {What do text-to-image models know about the languages of the world?},
journal = {Zapiski Nauchnykh Seminarov POMI},
pages = {157--175},
year = {2023},
volume = {529},
language = {en},
url = {http://geodesic.mathdoc.fr/item/ZNSL_2023_529_a10/}
}
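For reference, a minimal LaTeX usage sketch of the entry above (assuming it is saved in a bibliography file named, illustratively, references.bib):

\documentclass{article}
\begin{document}
% Cite the record by its BibTeX key from the entry above.
Text-to-image models handle the world's scripts unevenly \cite{ZNSL_2023_529_a10}.
\bibliographystyle{plain}
% Assumes the BibTeX entry above was saved as references.bib.
\bibliography{references}
\end{document}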
V. Firsanova. What do text-to-image models know about the languages of the world? Zapiski Nauchnykh Seminarov POMI, Investigations on applied mathematics and informatics. Part II–1, Volume 529 (2023), pp. 157–175. http://geodesic.mathdoc.fr/item/ZNSL_2023_529_a10/
[1] Twitter, 2023 (Last accessed 12 Mar 2023), https://twitter.com/BarneyFlames/status/1531736708903051265
[2] S. R. Borgwaldt, T. Joyce, Typology of writing systems, Benjamins Current Topics, 51, John Benjamins Publishing, 2013
[3] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., “Language models are few-shot learners”, Advances in Neural Information Processing Systems, 33 (2020), 1877–1901
[4] K. Clark, U. Khandelwal, O. Levy, C. D. Manning, What does BERT look at? An analysis of BERT's attention, 2019, arXiv: 1906.04341
[5] Craiyon, 2023 (Last accessed 15 Mar 2023), https://www.craiyon.com/
[6] Common Crawl, Statistics of Common Crawl monthly archives, 2023 (Last accessed 14 Mar 2023), https://commoncrawl.github.io/cc-crawl-statistics/plots/languages
[7] K. Crowson, VQGAN+CLIP tutorial, 2023 (Last accessed 17 Mar 2023), https://colab.research.google.com/github/justinjohn0306/VQGAN-CLIP/blob/main/VQGAN
[8] K. Crowson, S. Biderman, D. Kornis, D. Stander, E. Hallahan, L. Castricato, E. Raff, “VQGAN-CLIP: Open domain image generation and editing with natural language guidance”, European Conference on Computer Vision, Springer, 2022, 88–105
[9] P. T. Daniels, W. Bright, The world's writing systems, Oxford University Press, 1996
[10] G. Daras, A. G. Dimakis, Discovering the hidden vocabulary of DALLE-2, 2022, arXiv: 2206.00169
[11] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, 2018, arXiv: 1810.04805
[12] P. Esser, R. Rombach, B. Ommer, “Taming transformers for high-resolution image synthesis”, Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, 12873–12883
[13] Ethnologue, Languages of the world, 2023 (Last accessed 14 Mar 2023), https://www.ethnologue.com/
[14] AI Forever, Kandinsky 2.0, 2023 (Last accessed 12 Mar 2023), https://github.com/ai-forever/Kandinsky-2.0
[15] I. Gelb, A study of writing, University of Chicago Press, Chicago, 1952
[16] H. He, X. Chen, C. Wang, J. Liu, B. Du, D. Tao, Y. Qiao, Diff-Font: Diffusion model for robust one-shot font generation, 2022, arXiv: 2212.05895
[17] J. Hewitt, C. D. Manning, “A structural probe for finding syntax in word representations”, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, v. 1, Long and Short Papers, 2019, 4129–4138
[18] S. Hochreiter, J. Schmidhuber, “Long short-term memory”, Neural Computation, 9:8 (1997), 1735–1780
[19] HuggingFace, Diffusers, 2023 (Last accessed 17 Mar 2023), https://github.com/huggingface/diffusers
[20] V. Liu, L. B. Chilton, “Design guidelines for prompt engineering text-to-image generative models”, Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022, 1–23
[21] P. Lyu, X. Bai, C. Yao, Z. Zhu, T. Huang, W. Liu, “Auto-encoder guided GAN for Chinese calligraphy synthesis”, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), v. 1, IEEE, 2017, 1095–1100
[22] OpenAI, ChatGPT, 2023 (Last accessed 12 Mar 2023), https://openai.com/blog/chatgpt
[23] OpenAI, DALL·E, 2023 (Last accessed 15 Mar 2023), https://labs.openai.com/
[24] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al., “Training language models to follow instructions with human feedback”, Advances in Neural Information Processing Systems, 35 (2022), 27730–27744
[25] V. A. Plungyan, “Modern linguistic typology”, Herald of the Russian Academy of Sciences, 81 (2011), 101–113
[26] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., “Learning transferable visual models from natural language supervision”, International conference on machine learning, PMLR, 2021, 8748–8763
[27] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al., “Language models are unsupervised multitask learners”, OpenAI Blog, 1:8 (2019), 9
[28] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, M. Chen, Hierarchical text-conditional image generation with clip latents, 2022, arXiv: 2204.06125
[29] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, “High-resolution image synthesis with latent diffusion models”, Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, 10684–10695
[30] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. G. Lopes, B. K. Ayan, T. Salimans, et al., “Photorealistic text-to-image diffusion models with deep language understanding”, Advances in Neural Information Processing Systems, 35 (2022), 36479–36494
[31] V. Sanh, A. Webson, C. Raffel, S. H. Bach, L. Sutawika, Z. Alyafeai, A. Chaffin, A. Stiegler, T. Le Scao, A. Raja, et al., Multitask prompted training enables zero-shot task generalization, 2021, arXiv: 2110.08207
[32] R. Tang, A. Pandey, Z. Jiang, G. Yang, K. Kumar, J. Lin, F. Ture, What the DAAM: Interpreting Stable Diffusion using cross attention, 2022, arXiv: 2210.04885
[33] W3Techs, Usage statistics of content languages for websites, 2023 (Last accessed 14 Mar 2023), https://w3techs.com/technologies/overview/content_language
[34] H. Zhang, J. Y. Koh, J. Baldridge, H. Lee, Y. Yang, “Cross-modal contrastive learning for text-to-image generation”, Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, 833–842