Detection of human body parts on the image using the neural networks and the attention model
Journal of the Belarusian State University. Mathematics and Informatics, Vol. 2 (2022), pp. 94-106.

See the article record from the source Math-Net.Ru

Human body part detection is a challenging task with many applications. In this paper, we propose an algorithm that detects human body parts in images using the OpenPose neural network and an attention model. The novelty of the algorithm is that it combines a convolutional neural network, which uses a non-parametric representation to associate body parts with the people in an image, with an attention model that learns to focus on specific regions of the input image. The algorithm is part of the Smart Cropping system, developed by the authors to crop the required items of clothing from images and prepare e-commerce catalogues.
Keywords: Human body parts detection; attention model; convolutional neural network; Smart Cropping.
@article{BGUMI_2022_2_a8,
     author = {V. V. Sorokina and S. V. Ablameyko},
     title = {Detection of human body parts on the image using the neural networks and the attention model},
     journal = {Journal of the Belarusian State University. Mathematics and Informatics},
     pages = {94--106},
     publisher = {mathdoc},
     volume = {2},
     year = {2022},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/BGUMI_2022_2_a8/}
}
TY  - JOUR
AU  - V. V. Sorokina
AU  - S. V. Ablameyko
TI  - Detection of human body parts on the image using the neural networks and the attention model
JO  - Journal of the Belarusian State University. Mathematics and Informatics
PY  - 2022
SP  - 94
EP  - 106
VL  - 2
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/BGUMI_2022_2_a8/
LA  - ru
ID  - BGUMI_2022_2_a8
ER  - 
%0 Journal Article
%A V. V. Sorokina
%A S. V. Ablameyko
%T Detection of human body parts on the image using the neural networks and the attention model
%J Journal of the Belarusian State University. Mathematics and Informatics
%D 2022
%P 94-106
%V 2
%I mathdoc
%U http://geodesic.mathdoc.fr/item/BGUMI_2022_2_a8/
%G ru
%F BGUMI_2022_2_a8
V. V. Sorokina; S. V. Ablameyko. Detection of human body parts on the image using the neural networks and the attention model. Journal of the Belarusian State University. Mathematics and Informatics, Vol. 2 (2022), pp. 94-106. http://geodesic.mathdoc.fr/item/BGUMI_2022_2_a8/

[1] Y. Chen, Y. Tian, M. He, “Monocular human pose estimation: a survey of deep learning-based methods”, Computer Vision and Image Understanding, 192 (2020), 102897 | DOI

[2] E-J. Rolley-Parnell, D. Kanoulas, A. Laurenzi, B. Delhaisse, L. Rozo, D. G. Caldwell, “Bi-manual articulated robot teleoperation using an external RGB-D range sensor”, 15th International conference on control, automation, robotics and vision (Singapore), Institute of Electrical and Electronics Engineers, 2018, 298–304 | DOI

[3] H. Murdock, The ultimate eCommerce product image guide for 2021, Threekit Inc, 2020 https://www.threekit.com/blog/ecommerce-product-image-guide-2020

[4] S. V. Ablameiko, V. V. Krasnoproshin, V. A. Obraztsov, “Modeli i tekhnologii raspoznavaniya obrazov s prilozheniem v intellektualnom analize dannykh”, Vestnik BGU. Fizika. Matematika. Informatika, 3 (2011), 62–72 | MR | Zbl

[5] Z. Liu, J. Zhu, J. Bu, C. Chen, “A survey of human pose estimation: the body parts parsing based methods”, Journal of Visual Communication and Image Representation, 32 (2015), 10–19 | DOI

[6] D. C. Luvizon, D. Picard, H. Tabia, 2018 IEEE/CVF conference on computer vision and pattern recognition (Salt Lake City, USA), Conference Publishing Services, IEEE Computer Society, Los Alamitos, 2018, 5137–5146 | DOI

[7] E. Insafutdinov, “DeeperCut: a deeper, stronger, and faster multi-person pose estimation model”, Computer vision – ECCV 2016. 14th European conference (Amsterdam, The Netherlands), Lecture Notes in Computer Science, 9910, Springer, Cham, 2016, 34–50 | DOI

[8] H.-S. Fang, S. Xie, Y.-W. Tai, C. Lu, “RMPE: regional multi-person pose estimation”, 2017 IEEE International conference on computer vision (Venice, Italy), Institute of Electrical and Electronics Engineers, 2017, 2353–2362 | DOI

[9] X. Chu, W. Yang, W. Ouyang, C. Ma, A. L. Yuille, X. Wang, “Multi-context attention for human pose estimation”, 2017 IEEE conference on computer vision and pattern recognition (Honolulu, USA), Institute of Electrical and Electronics Engineers, 2017, 5669–5678 | DOI

[10] K. He, G. Gkioxari, P. Dollár, R. Girshick, “Mask R-CNN”, 2017 IEEE International conference on computer vision (Venice, Italy), Institute of Electrical and Electronics Engineers, 2017, 2980–2988 | DOI

[11] A. Toshev, C. Szegedy, “DeepPose: human pose estimation via deep neural networks”, 2014 IEEE conference on computer vision and pattern recognition (Columbus, USA), Institute of Electrical and Electronics Engineers, 2014, 1653–1660 | DOI

[12] J. Tompson, A. Jain, Y. LeCun, C. Bregler, “Joint training of a convolutional network and a graphical model for human pose estimation”, 28th annual conference on Neural Information Processing Systems (Montreal, Canada), Advances in Neural Information Processing Systems, 27, Curran Associates Inc, Red Hook, 2015, 1799–1807

[13] J. Tompson, R. Goroshin, A. Jain, Y. LeCun, C. Bregler, “Efficient object localization using convolutional networks”, 2015 IEEE conference on computer vision and pattern recognition (Boston, USA), Institute of Electrical and Electronics Engineers, 2015, 648–656 | DOI | MR

[14] Y. Yang, D. Ramanan, “Articulated human detection with flexible mixtures of parts”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35:12 (2013), 2878–2890 | DOI

[15] Z. Cao, T. Simon, S. Wei, Y. Sheikh, “Realtime multi-person 2D pose estimation using part affinity fields”, 2017 IEEE conference on computer vision and pattern recognition (Honolulu, USA), Institute of Electrical and Electronics Engineers, 2017, 1302–1310 | DOI

[16] D. Bahdanau, K. H. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, 2016, 15 pp., arXiv: 1409.0473

[17] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2015, 14 pp., arXiv: 1409.1556

[18] F. Wang, D. M. J. Tax, Survey on the attention based RNN model and its applications in computer vision, 2016, 42 pp., arXiv: 1601.06823

[19] K. He, X. Zhang, S. Ren, J. Sun, “Deep residual learning for image recognition”, 2016 IEEE conference on computer vision and pattern recognition (Las Vegas, USA), Conference Publishing Services, IEEE Computer Society, Los Alamitos, 2016, 770–778 | DOI

[20] T. Sim, S. Baker, M. Bsat, “The CMU Pose, Illumination, and Expression (PIE) database”, Proceedings of Fifth IEEE International conference on automatic face gesture recognition (Washington, USA), Institute of Electrical and Electronics Engineers, 2002, 53–58 | DOI