Columnar database coprocessor for computing cluster system
Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Vol. 4 (2015) no. 4, pp. 5-31. This article was harvested from the Math-Net.Ru source.


This paper addresses the design and implementation of a columnar coprocessor (CCOP) for relational DBMSs. CCOP is built on the columnar data storage model and is designed for large computing cluster systems. It can use both CPUs and MIC manycore coprocessors. CCOP maintains columnar indices with surrogate keys, stored in distributed main memory. The indices are partitioned according to the domain-interval model. On data warehouse workloads, CCOP demonstrates much higher performance than row-stores.
Keywords: columnar coprocessor, CCOP, distributed columnar indices, computing cluster systems, manycore coprocessors, domain-interval fragmentation, MIC architecture.
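The abstract's two central ideas, a column index keyed by surrogate keys and partitioning by intervals of the attribute's domain, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of row positions as surrogate keys, the sample data, and the split points are all assumptions made for illustration.

```python
from bisect import bisect_right


def build_column_index(column):
    """Return a column index: (surrogate_key, value) pairs sorted by value.

    Assumption: the surrogate key is simply the row position in the base table.
    """
    return sorted(enumerate(column), key=lambda pair: pair[1])


def fragment_by_domain_intervals(index, boundaries):
    """Split a column index into len(boundaries) + 1 fragments.

    `boundaries` is a sorted list of split points over the attribute's domain.
    Fragment i holds the values v with boundaries[i-1] <= v < boundaries[i];
    the first and last fragments are open-ended.
    """
    fragments = [[] for _ in range(len(boundaries) + 1)]
    for key, value in index:
        fragments[bisect_right(boundaries, value)].append((key, value))
    return fragments


# Hypothetical data: one attribute column of a table.
prices = [40, 7, 19, 88, 55, 7]
index = build_column_index(prices)
fragments = fragment_by_domain_intervals(index, boundaries=[20, 60])

for node, fragment in enumerate(fragments):
    print(node, fragment)
```

In a cluster setting, each fragment would presumably be placed in the main memory of a different node. Because fragments are defined by value ranges, two indices split on the same boundaries can be processed fragment by fragment without exchanging data between nodes.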
@article{VYURV_2015_4_4_a0,
     author = {E. V. Ivanova and L. B. Sokolinsky},
     title = {Columnar database coprocessor for computing cluster system},
     journal = {Vestnik \^U\v{z}no-Uralʹskogo gosudarstvennogo universiteta. Seri\^a Vy\v{c}islitelʹna\^a matematika i informatika},
     pages = {5--31},
     year = {2015},
     volume = {4},
     number = {4},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/VYURV_2015_4_4_a0/}
}
E. V. Ivanova; L. B. Sokolinsky. Columnar database coprocessor for computing cluster system. Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Vol. 4 (2015) no. 4, pp. 5-31. http://geodesic.mathdoc.fr/item/VYURV_2015_4_4_a0/
