Learning abstract visual reasoning via task decomposition: A case study in Raven progressive matrices
International Journal of Applied Mathematics and Computer Science, Vol. 34 (2024), no. 2, pp. 309-321.


Learning to perform abstract reasoning often requires decomposing the task in question into intermediate subgoals that are not specified upfront, but need to be autonomously devised by the learner. In Raven progressive matrices (RPMs), the task is to choose one of the available answers given a context, where both the context and answers are composite images featuring multiple objects in various spatial arrangements. As this high-level goal is the only guidance available, learning to solve RPMs is challenging. In this study, we propose a deep learning architecture based on the transformer blueprint which, rather than directly making the above choice, addresses the subgoal of predicting the visual properties of individual objects and their arrangements. The multidimensional predictions obtained in this way are then directly juxtaposed to choose the answer. We consider a few ways in which the model parses the visual input into tokens and several regimes of masking parts of the input in self-supervised training. In experimental assessment, the models not only outperform state-of-the-art methods but also provide interesting insights and partial explanations about the inference. The design of the method also makes it immune to biases that are known to be present in some RPM benchmarks.
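The final step of the abstract, where the predicted properties are "directly juxtaposed to choose the answer", can be illustrated with a small sketch. It rests on assumptions that are not from the paper. The model is assumed to output, for the missing panel, one probability distribution per discrete property (the names `type`, `size` and `color` are hypothetical). Each candidate answer is assumed to be available as a set of discrete property values. The candidate with the highest log-likelihood under the predicted distributions is then selected. The functions `answer_score` and `choose_answer` are illustrative helpers, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation (assumption): for the missing panel, the model
# outputs one probability distribution per property, indexed by the value.
Prediction = Dict[str, Sequence[float]]
# Hypothetical representation (assumption): a candidate answer described by
# the discrete value of each property.
Answer = Dict[str, int]


def answer_score(prediction: Prediction, answer: Answer, eps: float = 1e-9) -> float:
    """Log-likelihood of the answer's property values under the predicted distributions."""
    return sum(math.log(prediction[prop][value] + eps) for prop, value in answer.items())


def choose_answer(prediction: Prediction, answers: List[Answer]) -> int:
    """Return the index of the candidate whose properties best match the prediction.

    Each candidate is scored against the prediction on its own, never
    against the other candidates.
    """
    scores = [answer_score(prediction, a) for a in answers]
    return max(range(len(answers)), key=scores.__getitem__)


if __name__ == "__main__":
    prediction = {
        "type": [0.1, 0.8, 0.1],   # most likely value 1
        "size": [0.2, 0.2, 0.6],   # most likely value 2
        "color": [0.7, 0.2, 0.1],  # most likely value 0
    }
    answers = [
        {"type": 0, "size": 2, "color": 0},
        {"type": 1, "size": 2, "color": 0},
        {"type": 1, "size": 0, "color": 2},
    ]
    print(choose_answer(prediction, answers))  # → 1
```

In this sketch every candidate is scored independently against the prediction. A selection rule of this shape therefore cannot exploit regularities in how the answer set itself was constructed. That is one plausible reading of why the abstract describes the method as immune to benchmark biases, though the paper's actual mechanism may differ.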
Keywords: abstract visual reasoning, Raven progressive matrices, machine learning, problem decomposition
@article{IJAMCS_2024_34_2_a9,
     author = {Kwiatkowski, Jakub and Krawiec, Krzysztof},
     title = {Learning abstract visual reasoning via task decomposition: {A} case study in {Raven} progressive matrices},
     journal = {International Journal of Applied Mathematics and Computer Science},
     pages = {309--321},
     publisher = {mathdoc},
     volume = {34},
     number = {2},
     year = {2024},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/IJAMCS_2024_34_2_a9/}
}
TY  - JOUR
AU  - Kwiatkowski, Jakub
AU  - Krawiec, Krzysztof
TI  - Learning abstract visual reasoning via task decomposition: A case study in Raven progressive matrices
JO  - International Journal of Applied Mathematics and Computer Science
PY  - 2024
SP  - 309
EP  - 321
VL  - 34
IS  - 2
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/IJAMCS_2024_34_2_a9/
LA  - en
ID  - IJAMCS_2024_34_2_a9
ER  - 
%0 Journal Article
%A Kwiatkowski, Jakub
%A Krawiec, Krzysztof
%T Learning abstract visual reasoning via task decomposition: A case study in Raven progressive matrices
%J International Journal of Applied Mathematics and Computer Science
%D 2024
%P 309-321
%V 34
%N 2
%I mathdoc
%U http://geodesic.mathdoc.fr/item/IJAMCS_2024_34_2_a9/
%G en
%F IJAMCS_2024_34_2_a9
Kwiatkowski, Jakub; Krawiec, Krzysztof. Learning abstract visual reasoning via task decomposition: A case study in Raven progressive matrices. International Journal of Applied Mathematics and Computer Science, Vol. 34 (2024), no. 2, pp. 309-321. http://geodesic.mathdoc.fr/item/IJAMCS_2024_34_2_a9/
