Efficient storage, retrieval and analysis of poker hands: An adaptive data framework
International Journal of Applied Mathematics and Computer Science, Vol. 27 (2017) no. 4, pp. 713-726.

See the article record from the source Library of Science

In online gambling, poker hands are one of the most popular and fundamental units of the game state and can be considered objects comprising all the events that pertain to a single hand played. When tens of millions of poker hands are produced daily and need to be stored and analysed quickly, relational databases no longer provide high scalability and stable performance. The purpose of this paper is to present an efficient way of storing and retrieving poker hands in a big data environment. We propose a new, read-optimised storage model that offers significant data access improvements over traditional database systems as well as existing Hadoop file formats such as ORC, RCFile or SequenceFile. Through index-oriented partition elimination, our file format reduces the number of file splits that need to be accessed and improves query response time by up to three orders of magnitude in comparison with other approaches. In addition, our file format supports a range of new indexing structures to facilitate fast row retrieval at the split level. Both index types operate independently of the Hive execution context and allow other big data computational frameworks such as MapReduce or Spark to benefit from the optimised data access path to the hand information. Moreover, we present a detailed analysis of our storage model and its supporting index structures, and of how they are organised in the overall data framework. We also describe in detail how predicate-based expression trees are used to build effective file-level execution plans. Our experimental tests, conducted on a production cluster holding nearly 40 billion hands that span over 4000 partitions, show that multi-way partition pruning outperforms other existing file formats, resulting in faster query execution times and better cluster utilisation.
Keywords: big data, storage model design, data architecture, data access, path optimization
Keywords (translated from Polish): data set, data architecture, data sharing, area optimization
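
The abstract describes predicate-based expression trees that drive partition elimination, so that only file splits which may hold matching hands are read. The following is a minimal Python sketch of that general idea, not the authors' implementation: it assumes each split carries per-column minimum and maximum values, which is a swapped-in stand-in for the paper's index structures. All names (`Cmp`, `And`, `Or`, `may_match`, `prune`, the column names and the split labels) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Assumed per-split statistics: column name -> (min, max) of stored values.
SplitStats = Dict[str, Tuple[int, int]]


@dataclass(frozen=True)
class Cmp:
    """Leaf node: compare one column against a constant."""
    column: str
    op: str  # one of "==", "<", "<=", ">", ">="
    value: int


@dataclass(frozen=True)
class And:
    left: "Pred"
    right: "Pred"


@dataclass(frozen=True)
class Or:
    left: "Pred"
    right: "Pred"


Pred = Union[Cmp, And, Or]


def may_match(pred: Pred, stats: SplitStats) -> bool:
    """Return False only when the split provably holds no matching row."""
    if isinstance(pred, And):
        return may_match(pred.left, stats) and may_match(pred.right, stats)
    if isinstance(pred, Or):
        return may_match(pred.left, stats) or may_match(pred.right, stats)
    if pred.column not in stats:
        return True  # no statistics, so the split cannot be ruled out
    lo, hi = stats[pred.column]
    if pred.op == "==":
        return lo <= pred.value <= hi
    if pred.op == "<":
        return lo < pred.value
    if pred.op == "<=":
        return lo <= pred.value
    if pred.op == ">":
        return hi > pred.value
    if pred.op == ">=":
        return hi >= pred.value
    raise ValueError(f"unsupported operator: {pred.op}")


def prune(pred: Pred, splits: Dict[str, SplitStats]) -> List[str]:
    """Keep only the splits that still have to be read for this predicate."""
    return [name for name, stats in splits.items() if may_match(pred, stats)]


# Hypothetical statistics for three splits of a hand table.
splits = {
    "split-0": {"hand_id": (1, 1000), "player_id": (10, 50)},
    "split-1": {"hand_id": (1001, 2000), "player_id": (40, 90)},
    "split-2": {"hand_id": (2001, 3000), "player_id": (5, 20)},
}

# hand_id >= 1500 AND player_id == 45
query = And(Cmp("hand_id", ">=", 1500), Cmp("player_id", "==", 45))
print(prune(query, splits))
```

In this sketch the conjunction leaves only `split-1` to be scanned. The check is deliberately conservative: a split is skipped only when its statistics prove that no row can satisfy the predicate, so pruning never changes the query result.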
@article{IJAMCS_2017_27_4_a3,
     author = {Gorawski, M. and Lorek, M.},
     title = {Efficient storage, retrieval and analysis of poker hands: {An} adaptive data framework},
     journal = {International Journal of Applied Mathematics and Computer Science},
     pages = {713--726},
     publisher = {mathdoc},
     volume = {27},
     number = {4},
     year = {2017},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/IJAMCS_2017_27_4_a3/}
}
Gorawski, M.; Lorek, M. Efficient storage, retrieval and analysis of poker hands: An adaptive data framework. International Journal of Applied Mathematics and Computer Science, Vol. 27 (2017) no. 4, pp. 713-726. http://geodesic.mathdoc.fr/item/IJAMCS_2017_27_4_a3/
