See the record of this article from the source Library of Science
@article{IJAMCS_2020_30_3_a0,
  author = {Wang, Yong and Zhang, Dongfang and Dai, Guangming},
  title = {Classification of high resolution satellite images using improved {U-Net}},
  journal = {International Journal of Applied Mathematics and Computer Science},
  pages = {399--413},
  publisher = {mathdoc},
  volume = {30},
  number = {3},
  year = {2020},
  language = {en},
  url = {http://geodesic.mathdoc.fr/item/IJAMCS_2020_30_3_a0/}
}
TY - JOUR
AU - Wang, Yong
AU - Zhang, Dongfang
AU - Dai, Guangming
TI - Classification of high resolution satellite images using improved U-Net
JO - International Journal of Applied Mathematics and Computer Science
PY - 2020
SP - 399
EP - 413
VL - 30
IS - 3
PB - mathdoc
UR - http://geodesic.mathdoc.fr/item/IJAMCS_2020_30_3_a0/
LA - en
ID - IJAMCS_2020_30_3_a0
ER -
%0 Journal Article
%A Wang, Yong
%A Zhang, Dongfang
%A Dai, Guangming
%T Classification of high resolution satellite images using improved U-Net
%J International Journal of Applied Mathematics and Computer Science
%D 2020
%P 399-413
%V 30
%N 3
%I mathdoc
%U http://geodesic.mathdoc.fr/item/IJAMCS_2020_30_3_a0/
%G en
%F IJAMCS_2020_30_3_a0
Wang, Yong; Zhang, Dongfang; Dai, Guangming. Classification of high resolution satellite images using improved U-Net. International Journal of Applied Mathematics and Computer Science, Volume 30 (2020) no. 3, pp. 399-413. http://geodesic.mathdoc.fr/item/IJAMCS_2020_30_3_a0/
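The export records above are plain structured text and can be reused programmatically. The short Python sketch below is illustrative only (it is not part of the article, and its regular expressions are a simplifying assumption that only handles a single flat BibTeX entry laid out like the one above); it loads the BibTeX record into a dictionary.

import re

# BibTeX export record as shown on the geodesic.mathdoc.fr page above.
BIBTEX = """@article{IJAMCS_2020_30_3_a0,
  author = {Wang, Yong and Zhang, Dongfang and Dai, Guangming},
  title = {Classification of high resolution satellite images using improved {U-Net}},
  journal = {International Journal of Applied Mathematics and Computer Science},
  pages = {399--413},
  volume = {30},
  number = {3},
  year = {2020},
  url = {http://geodesic.mathdoc.fr/item/IJAMCS_2020_30_3_a0/}
}"""

def parse_bibtex_entry(entry):
    # Citation key: the text between "{" and the first "," in the header line.
    key = re.search(r"@\w+\{([^,]+),", entry).group(1)
    # "name = {value}" pairs, one per line; assumes the flat layout shown above.
    fields = dict(re.findall(r"(\w+)\s*=\s*\{(.*?)\},?\s*\n", entry))
    return {"key": key, **fields}

record = parse_bibtex_entry(BIBTEX)
print(record["key"], "-", record["title"], "(" + record["year"] + ")")

Running the sketch prints the citation key, title, and year; the RIS record could be handled in the same spirit by splitting each line on its two-letter tag.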
[1] Badrinarayanan, V., Kendall, A. and Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12): 2481–2495.
[2] Bei, Z., Bo, H. and Zhong, Y. (2017). Transfer learning with fully pretrained deep convolution networks for land-use classification, IEEE Geoscience and Remote Sensing Letters PP(99): 1–5.
[3] Caesar, H., Uijlings, J. and Ferrari, V. (2018). COCO-Stuff: Thing and stuff classes in context, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, pp. 1209–1218.
[4] Carreira, J., Caseiro, R., Batista, J. and Sminchisescu, C. (2012). Semantic segmentation with second-order pooling, European Conference on Computer Vision, Firenze, Italy, pp. 430–443.
[5] Carreira, J. and Sminchisescu, C. (2011). CPMC: Automatic object segmentation using constrained parametric min-cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence 34(7): 1312–1328.
[6] Castelluccio, M., Poggi, G., Sansone, C. and Verdoliva, L. (2015). Land use classification in remote sensing images by convolutional neural networks, Acta Ecologica Sinica 28(2): 627–635.
[7] Chandra, S. and Kokkinos, I. (2016). Fast, exact and multi-scale inference for semantic image segmentation with deep Gaussian CRFs, European Conference on Computer Vision, Amsterdam, The Netherlands, pp. 402–418.
[8] Chandra, S., Usunier, N. and Kokkinos, I. (2017). Dense and low-rank Gaussian CRFs using deep embeddings, IEEE International Conference on Computer Vision, Venice, Italy, pp. 5103–5112.
[9] Chao, P., Zhang, X., Gang, Y., Luo, G. and Jian, S. (2017). Large kernel matters—Improve semantic segmentation by global convolutional network, IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, pp. 4353–4361.
[10] Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K. and Yuille, A.L. (2017a). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40(4): 834–848.
[11] Chen, L.-C., Papandreou, G., Schroff, F. and Adam, H. (2017b). Rethinking atrous convolution for semantic image segmentation, arXiv 1706.05587.
[12] Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F. and Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation, Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, pp. 801–818.
[13] Cleve, C., Kelly, M., Kearns, F.R. and Moritz, M. (2008). Classification of the wildland–urban interface: A comparison of pixel- and object-based classifications using high-resolution aerial photography, Computers, Environment and Urban Systems 32(4): 317–326.
[14] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S. and Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, pp. 3213–3223.
[15] Fu, J., Jing, L., Wang, Y. and Lu, H. (2019). Stacked deconvolutional network for semantic segmentation, IEEE Transactions on Image Processing PP(99): 1–1.
[16] Fulkerson, B., Vedaldi, A. and Soatto, S. (2009). Class segmentation and object localization with superpixel neighborhoods, IEEE International Conference on Computer Vision, Kyoto, Japan, pp. 670–677.
[17] Gang, C., Weng, Q., Hay, G.J. and He, Y. (2018). Geographic object-based image analysis (GEOBIA): Emerging trends and future opportunities, GIScience & Remote Sensing 55(2): 159–182.
[18] Gibbons, J. and Chakraborti, S. (2011). The Wilcoxon rank-sum test and confidence interval, Nonparametric Statistical Inference 59(4): 290–293.
[19] Gong, C., Han, J., Lei, G., Liu, Z., Bu, S. and Ren, J. (2015). Effective and efficient midlevel visual elements-oriented land-use classification using VHR remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 53(8): 4238–4249.
[20] Grauman, K. and Darrell, T. (2005). Pyramid match kernels: Discriminative classification with sets of image features, 10th IEEE International Conference on Computer Vision, Beijing, China, pp. 1458–1465.
[21] He, K., Zhang, X., Ren, S. and Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 37(9): 1904–1916.
[22] Kim, J.H., Lee, H., Hong, S.J., Kim, S., Park, J., Hwang, J.Y. and Choi, J.P. (2018). Objects segmentation from high-resolution aerial images using U-Net with pyramid pooling layers, IEEE Geoscience and Remote Sensing Letters 16(1): 115–119.
[23] Lazebnik, S., Schmid, C. and Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, NY, USA, pp. 2169–2178.
[24] Lin, G., Milan, A., Shen, C. and Reid, I. (2017). RefineNet: Multi-path refinement networks with identity mappings for high-resolution semantic segmentation, IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, pp. 1925–1934.
[25] Liu, W., Rabinovich, A. and Berg, A. (2015). ParseNet: Looking wider to see better, arXiv 1506.04579.
[26] Long, J., Shelhamer, E. and Darrell, T. (2015). Fully convolutional networks for semantic segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, pp. 3431–3440.
[27] Maggiori, E., Tarabalka, Y., Charpiat, G. and Alliez, P. (2016). Convolutional neural networks for large-scale remote sensing image classification, IEEE Transactions on Geoscience and Remote Sensing 55(2): 645–657.
[28] Marcos, D., Volpi, M., Kellenberger, B. and Tuia, D. (2018). Land cover mapping at very high resolution with rotation equivariant CNNs: Towards small yet accurate models, ISPRS Journal of Photogrammetry and Remote Sensing 145(5): 96–107.
[29] Marmanis, D., Schindler, K., Wegner, J., Galliani, S., Datcu, M. and Stilla, U. (2016). Classification with an edge: Improving semantic image segmentation with boundary detection, ISPRS Journal of Photogrammetry and Remote Sensing 135(7): 158–172.
[30] Mi, Z. and Hu, X. (2017). Learning dual multi-scale manifold ranking for semantic segmentation of high-resolution images, Remote Sensing 9(5): 500.
[31] Miao, L., Zang, S., Bing, Z., Li, S. and Wu, C. (2014). A review of remote sensing image classification techniques: The role of spatio-contextual information, European Journal of Remote Sensing 47(1): 389–411.
[32] Noh, H., Hong, S. and Han, B. (2015). Learning deconvolution network for semantic segmentation, IEEE International Conference on Computer Vision, Santiago, Chile, pp. 1520–1528.
[33] Peng, D., Zhang, Y. and Guan, H. (2019). End-to-end change detection for high resolution satellite images using improved UNet++, Remote Sensing 11(11): 1382.
[34] Pohlen, T., Hermans, A., Mathias, M. and Leibe, B. (2017). Full-resolution residual networks for semantic segmentation in street scenes, IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, pp. 4151–4160.
[35] Razavian, A.S., Azizpour, H., Sullivan, J. and Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition, IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, pp. 806–813.
[36] Ren, S., He, K., Girshick, R. and Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6): 1137–1149.
[37] Ronneberger, O., Fischer, P. and Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation, International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, pp. 234–241.
[38] Scott, G.J., England, M.R., Starms, W.A., Marcum, R.A. and Davis, C.H. (2017). Training deep convolutional neural networks for land-cover classification of high-resolution imagery, IEEE Geoscience and Remote Sensing Letters 14(9): 1638–1642.
[39] Sharma, A., Liu, X., Yang, X. and Shi, D. (2017). A patch-based convolutional neural network for remote sensing image classification, Neural Networks 95(7): 19.
[40] Shotton, J., Johnson, M. and Cipolla, R. (2008). Semantic texton forests for image categorization and segmentation, Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, pp. 1–8.
[41] Shotton, J., Winn, J., Rother, C. and Criminisi, A. (2009). TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context, International Journal of Computer Vision 81(1): 2–23.
[42] Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition, arXiv 1409.1556.
[43] Sivic, J. and Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos, Proceedings of the 9th IEEE International Conference on Computer Vision, Nice, France, Vol. 2, pp. 1470–1477.
[44] Tao, L., Abd-Elrahman, A., Morton, J. and Wilhelm, V.L. (2018). Comparing fully convolutional networks, random forest, support vector machine, and patch-based deep convolutional neural networks for object-based wetland mapping using images from small unmanned aircraft system, GIScience & Remote Sensing 55(2): 243–264.
[45] Vemulapalli, R., Tuzel, O., Liu, M.Y. and Chellappa, R. (2016). Gaussian conditional random field network for semantic segmentation, IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, pp. 3224–3233.
[46] Volpi, M. and Tuia, D. (2017). Dense semantic labeling of subdecimeter resolution images with convolutional neural networks, IEEE Transactions on Geoscience and Remote Sensing 55(2): 881–893.
[47] Yang, H., Yu, B., Luo, J. and Chen, F. (2019). Semantic segmentation of high spatial resolution images with deep neural networks, GIScience & Remote Sensing 56(5): 1–20.
[48] Zhang, C., Xin, P., Li, H., Gardiner, A. and Atkinson, P.M. (2018a). A hybrid MLP-CNN classifier for very fine resolution remotely sensed image classification, ISPRS Journal of Photogrammetry and Remote Sensing 140(7): 133–144.
[49] Zhang, P., Ke, Y., Zhang, Z., Wang, M., Li, P. and Zhang, S. (2018b). Urban land use and land cover classification using novel deep learning models based on high spatial resolution satellite imagery, Sensors 18(11): 3717.
[50] Zhao, H., Shi, J., Qi, X., Wang, X. and Jia, J. (2017). Pyramid scene parsing network, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, pp. 6230–6239.
[51] Zhao, W. and Du, S. (2016). Learning multiscale and deep representations for classifying remotely sensed imagery, ISPRS Journal of Photogrammetry and Remote Sensing 113(3): 155–165.
[52] Zhou, B., Hang, Z., Puig, X., Fidler, S., Barriuso, A. and Torralba, A. (2017). Scene parsing through ADE20K dataset, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, pp. 633–641.
[53] Tu, Z. and Bai, X. (2010). Auto-context and its application to high-level vision tasks and 3D brain image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 32(10): 1744–1757.