De-duplication on the backup system with information storage in a database
Modelirovanie i analiz informacionnyh sistem, Volume 24 (2017) no. 2, pp. 215-226.

See the article record from the source Math-Net.Ru

Backup is one of the measures used to prevent the loss of data held on digital media. It can be performed manually, by copying data to external media, or automatically on a schedule by special software. In remote backup systems, data are saved over the network to a remote repository. Such systems are multi-user and process large amounts of data, so the shared storage may hold files that contain identical fragments. De-duplication is the mechanism that eliminates such repeated data. It is a data compression method in which copies are searched for across the entire dataset rather than within a single file. The main advantage of this technology is a significant saving of disk space. However, eliminating repeated data can significantly reduce the speed of saving and restoring information. This article addresses the problem of implementing such a mechanism in a backup system that stores information in a relational database. It considers an example implementation of such a system that works in two modes: with de-duplication of data and without it. The article presents a class diagram for the client part of the application and describes the tables of the backend database and the relationships between them. The author proposes an algorithm for saving data with de-duplication and reports the results of comparative tests of the speed of the saving and restoring algorithms with relational database management systems from different vendors.
Keywords: file, de-duplication, data, backup, database.
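
The idea summarized in the abstract, searching for copies across the whole dataset and keeping the data in a relational database, can be illustrated with a minimal sketch. This is not the author's algorithm or schema, which this record does not reproduce. The table names `blocks` and `file_blocks`, the fixed 4096-byte block size, the SHA-256 digest, and the use of SQLite are all assumptions made for illustration.

```python
import hashlib
import sqlite3

BLOCK_SIZE = 4096  # assumed fixed block size; the abstract does not state one


def open_store(path=":memory:"):
    """Create the two hypothetical tables used by this sketch."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS blocks (
            digest TEXT PRIMARY KEY,     -- SHA-256 of the block contents
            data   BLOB NOT NULL
        );
        CREATE TABLE IF NOT EXISTS file_blocks (
            file_name TEXT    NOT NULL,
            seq       INTEGER NOT NULL,  -- position of the block in the file
            digest    TEXT    NOT NULL REFERENCES blocks(digest),
            PRIMARY KEY (file_name, seq)
        );
        """
    )
    return db


def save_file(db, name, content):
    """Save content block by block, storing each distinct block only once."""
    with db:  # one transaction per file
        db.execute("DELETE FROM file_blocks WHERE file_name = ?", (name,))
        for seq, start in enumerate(range(0, len(content), BLOCK_SIZE)):
            block = content[start:start + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # A block already present anywhere in the store is not written again.
            db.execute(
                "INSERT OR IGNORE INTO blocks (digest, data) VALUES (?, ?)",
                (digest, block),
            )
            db.execute(
                "INSERT INTO file_blocks (file_name, seq, digest) VALUES (?, ?, ?)",
                (name, seq, digest),
            )


def restore_file(db, name):
    """Rebuild a file by joining its block list with the shared block table."""
    rows = db.execute(
        "SELECT b.data FROM file_blocks AS f "
        "JOIN blocks AS b ON b.digest = f.digest "
        "WHERE f.file_name = ? ORDER BY f.seq",
        (name,),
    )
    return b"".join(row[0] for row in rows)


def stored_block_count(db):
    """Number of distinct blocks physically kept in the store."""
    return db.execute("SELECT COUNT(*) FROM blocks").fetchone()[0]
```

In this sketch, two files that share a leading block cause that block to be stored once and referenced twice. The extra digest computation and lookup on every block is one plausible source of the slowdown in saving that the abstract mentions.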
@article{MAIS_2017_24_2_a6,
     author = {S. M. Taranin},
     title = {De-duplication on the backup system with information storage in a database},
     journal = {Modelirovanie i analiz informacionnyh sistem},
     pages = {215--226},
     publisher = {mathdoc},
     volume = {24},
     number = {2},
     year = {2017},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MAIS_2017_24_2_a6/}
}
S. M. Taranin. De-duplication on the backup system with information storage in a database. Modelirovanie i analiz informacionnyh sistem, Volume 24 (2017) no. 2, pp. 215-226. http://geodesic.mathdoc.fr/item/MAIS_2017_24_2_a6/
