Modeling and querying facts with period timestamps in data warehouses
International Journal of Applied Mathematics and Computer Science, Vol. 29 (2019) no. 1, pp. 31-49.

See the article record from the source Library of Science

In this paper, we study various ways of representing and querying fact data that are time-stamped with a time period in a data warehouse. The main focus is on how to represent the time periods associated with the facts in order to support convenient and efficient aggregations over time. We propose three distinct logical models that represent time periods as sets of all time points in a period (instant model), as pairs of start and end time points of a period (period model), and as atomic units that are explicitly stored in a new period dimension (period* model). The period dimension is enriched with information about the days of each period, thereby combining the former two models. We use four different classes of aggregation queries to analyze query formulation, query execution, and query performance over the three models. An extensive empirical evaluation on synthetic and real-world datasets and the analysis of the query execution plans reveal that the period* model is the best choice in terms of runtime and space for all four query classes.
Keywords: data warehouse, time period, logical model
Keywords (Polish): hurtownia danych, odcinek czasu, model logiczny
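
The three representations described in the abstract can be illustrated with a small in-memory sketch. This is not the schema or the SQL used in the paper: the sample facts, the names `instant_rows`, `period_rows`, `period_dim`, `period_days`, and `star_rows`, and the "facts valid per day" query are all assumptions made for illustration. The third model, the one with an explicit period dimension, is written period* here to distinguish it from the second.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical facts: (fact_id, first_day, last_day), both ends inclusive.
FACTS = [
    (1, date(2019, 1, 1), date(2019, 1, 3)),
    (2, date(2019, 1, 2), date(2019, 1, 4)),
    (3, date(2019, 1, 1), date(2019, 1, 3)),
]


def days(first, last):
    """Yield every day in the closed range [first, last]."""
    d = first
    while d <= last:
        yield d
        d += timedelta(days=1)


# Instant model: one row per (fact, day) covered by the fact's period.
instant_rows = [(fid, d) for fid, first, last in FACTS for d in days(first, last)]

# Period model: one row per fact, carrying the start and end day.
period_rows = list(FACTS)

# Period* model (assumed layout): each distinct period is stored once in a
# period dimension, its days are listed once, and facts reference the period id.
period_dim = {}   # (first_day, last_day) -> period_id
period_days = {}  # period_id -> list of days in that period
star_rows = []    # (fact_id, period_id)
for fid, first, last in FACTS:
    pid = period_dim.setdefault((first, last), len(period_dim) + 1)
    period_days.setdefault(pid, list(days(first, last)))
    star_rows.append((fid, pid))


def count_instant(rows):
    """Facts valid per day: a plain group-by on the day column."""
    return Counter(d for _, d in rows)


def count_period(rows, query_days):
    """Facts valid per day: an overlap test against every query day."""
    return Counter(
        d for d in query_days for _, first, last in rows if first <= d <= last
    )


def count_star(rows, days_of_period):
    """Facts valid per day: count facts per period, then spread over its days."""
    facts_per_period = Counter(pid for _, pid in rows)
    result = Counter()
    for pid, n in facts_per_period.items():
        for d in days_of_period[pid]:
            result[d] += n
    return result


query_days = list(days(date(2019, 1, 1), date(2019, 1, 4)))
per_day_instant = count_instant(instant_rows)
per_day_period = count_period(period_rows, query_days)
per_day_star = count_star(star_rows, period_days)
print(per_day_instant == per_day_period == per_day_star)  # prints True
```

In this toy data the instant representation needs nine fact rows, the period representation three, and the period* representation three fact rows plus two shared periods. The sketch only shows the trade-off between storage and query formulation and says nothing about the runtimes measured in the paper.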
@article{IJAMCS_2019_29_1_a2,
     author = {Mahlknecht, Giovanni and Dign\"os, Anton and Kozmina, Natalija},
     title = {Modeling and querying facts with period timestamps in data warehouses},
     journal = {International Journal of Applied Mathematics and Computer Science},
     pages = {31--49},
     publisher = {mathdoc},
     volume = {29},
     number = {1},
     year = {2019},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/IJAMCS_2019_29_1_a2/}
}
Mahlknecht, Giovanni; Dignös, Anton; Kozmina, Natalija. Modeling and querying facts with period timestamps in data warehouses. International Journal of Applied Mathematics and Computer Science, Vol. 29 (2019) no. 1, pp. 31-49. http://geodesic.mathdoc.fr/item/IJAMCS_2019_29_1_a2/
