Representative sampling algorithm for database systems based on the partitioned parallelism
    
    
  
  
  
      
      
      
        
Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Volume 3 (2014) no. 4, pp. 36-50
    
  
  
  
  
  
    
      
      
        
      
      
      
    See the article record from the source Math-Net.Ru
            
Sampling is a popular approach to processing very large databases in a wide range of applications, e.g. data mining, histogram construction, and query execution cost estimation. Using a sample instead of the original database can reduce the accuracy of the results, but this is offset by a reduction in processing time. Representative sampling preserves certain characteristics of the database in the sample. However, existing algorithms for representative sampling cannot be used for parallel database systems because they do not take into account how the data are distributed across the compute nodes of a cluster system. In this paper we propose a representative sampling algorithm for parallel relational database systems based on partitioned parallelism. We present the results of computational experiments with the proposed algorithm, which show that it adequately preserves the representative properties of a database distributed across the nodes of a cluster system.
			
            
            
            
          
        
      
                  
                    
                    
                    
                    
                    
                      
Keywords: 
relational databases, parallel database systems, representative sampling.
                    
                  
                
                
                @article{VYURV_2014_3_4_a1,
     author = {D. D. Yantsen and M. L. Zymbler},
     title = {Representative sampling algorithm for database systems based on the partitioned parallelism},
     journal = {Vestnik \^U\v{z}no-Uralʹskogo gosudarstvennogo universiteta. Seri\^a Vy\v{c}islitelʹna\^a matematika i informatika},
     pages = {36--50},
     publisher = {mathdoc},
     volume = {3},
     number = {4},
     year = {2014},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/VYURV_2014_3_4_a1/}
}
                      
                      
TY  - JOUR
AU  - D. D. Yantsen
AU  - M. L. Zymbler
TI  - Representative sampling algorithm for database systems based on the partitioned parallelism
JO  - Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika
PY  - 2014
SP  - 36
EP  - 50
VL  - 3
IS  - 4
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/VYURV_2014_3_4_a1/
LA  - ru
ID  - VYURV_2014_3_4_a1
ER  -
%0 Journal Article
%A D. D. Yantsen
%A M. L. Zymbler
%T Representative sampling algorithm for database systems based on the partitioned parallelism
%J Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika
%D 2014
%P 36-50
%V 3
%N 4
%I mathdoc
%U http://geodesic.mathdoc.fr/item/VYURV_2014_3_4_a1/
%G ru
%F VYURV_2014_3_4_a1
D. D. Yantsen; M. L. Zymbler. Representative sampling algorithm for database systems based on the partitioned parallelism. Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Volume 3 (2014) no. 4, pp. 36-50. http://geodesic.mathdoc.fr/item/VYURV_2014_3_4_a1/
