Integration of missing data imputation tools for time series in real-time mode into a relational DBMS
Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Vol. 14 (2025) no. 1, pp. 30-46
See the article record from the source Math-Net.Ru
The article addresses the problem of integrating time series imputation into relational database management systems (RDBMS). It proposes ImputeDB, a method that integrates neural network-based imputation models into the PostgreSQL RDBMS so that they run in real time. Missing values are imputed by triggers, that is, stored functions that the RDBMS kernel executes automatically when new data is inserted. When a trigger fires, missing values are replaced with synthetic ones generated by a neural network model. With this method, a database application programmer can build missing-value imputation into the standard time series processing pipeline inside PostgreSQL, without relying on external services. The approach comprises four components implemented as user-defined functions (UDFs) in Python and PL/Python: Trigger Constructor, Model Manager, Model Storage, and Imputer. The Trigger Constructor creates the triggers that automatically impute missing values in inserted data. The Model Manager trains the neural network models, and the Model Storage saves them in a file-based repository. The Imputer synthesizes missing values using the trained models. Experiments evaluated the performance of ImputeDB by measuring the processing time of data insertion with automatic gap imputation as a function of time series dimensionality, under two scenarios: single and multiple insertions. The imputation models covered several neural network architectures, including recurrent neural networks, autoencoders, and transformers. The results show that, as time series dimensionality increases and the overhead from network requests and data transfer grows, ImputeDB delivers superior performance. Specifically, the system achieved an efficiency gain of 22.5% compared with another approach, while preserving the accuracy of the employed imputation methods.
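The interplay of the four components named in the abstract can be illustrated with a minimal, self-contained Python sketch. It is not the article's implementation: it runs outside PostgreSQL, the `Table` class and the `before_insert` hook are hypothetical stand-ins for a database table and its insert trigger, and a per-column mean replaces the neural network models so that the example runs with the standard library only. All class and function names below are assumptions made for illustration.

```python
import math
import pickle
import tempfile
from pathlib import Path


def is_missing(value):
    """Treat None and NaN as gaps in the series."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class ModelStorage:
    """File-based repository: one pickle file per model name."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name, model):
        with open(self.root / f"{name}.pkl", "wb") as fh:
            pickle.dump(model, fh)

    def load(self, name):
        with open(self.root / f"{name}.pkl", "rb") as fh:
            return pickle.load(fh)


class ModelManager:
    """Trains a model and hands it to the storage.

    ASSUMPTION: the "model" is a list of per-column means, a trivial
    stand-in for the neural networks used in the article.
    """

    def __init__(self, storage):
        self.storage = storage

    def train(self, name, rows):
        means = []
        for col in range(len(rows[0])):
            observed = [r[col] for r in rows if not is_missing(r[col])]
            means.append(sum(observed) / len(observed) if observed else 0.0)
        self.storage.save(name, means)


class Imputer:
    """Loads a trained model and fills the gaps in a single row."""

    def __init__(self, storage, name):
        self.model = storage.load(name)

    def impute(self, row):
        return tuple(
            self.model[i] if is_missing(v) else v for i, v in enumerate(row)
        )


def make_insert_trigger(imputer):
    """Trigger Constructor stand-in: builds a hook that runs on every insert."""

    def before_insert(row):
        return imputer.impute(row)

    return before_insert


class Table:
    """Hypothetical in-memory table that calls its trigger before storing a row."""

    def __init__(self):
        self.rows = []
        self.trigger = None

    def insert(self, row):
        if self.trigger is not None:
            row = self.trigger(row)
        self.rows.append(tuple(row))


def demo():
    history = [(1.0, 10.0), (3.0, None), (2.0, 20.0)]
    with tempfile.TemporaryDirectory() as tmp:
        storage = ModelStorage(tmp)
        ModelManager(storage).train("sensor_model", history)
        table = Table()
        table.trigger = make_insert_trigger(Imputer(storage, "sensor_model"))
        table.insert((None, 12.0))          # gap in the first column
        table.insert((4.0, float("nan")))   # gap in the second column
        return table.rows
```

Calling `demo()` trains the stand-in model on three historical rows, attaches the hook to the table, and inserts two rows with gaps; the stored rows come back with the gaps replaced by the column means learned from the history. In a PostgreSQL setting the hook would correspond to a trigger function, and the insert path would be an ordinary `INSERT` statement.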
Keywords:
time series, DBMS, PostgreSQL, missing value imputation, neural networks.
@article{VYURV_2025_14_1_a1,
author = {A. A. Yurtin},
title = {Integration of missing data imputation tools for time series in real-time mode into a relational {DBMS}},
journal = {Vestnik \^U\v{z}no-Uralʹskogo gosudarstvennogo universiteta. Seri\^a Vy\v{c}islitelʹna\^a matematika i informatika},
pages = {30--46},
publisher = {mathdoc},
volume = {14},
number = {1},
year = {2025},
language = {ru},
url = {http://geodesic.mathdoc.fr/item/VYURV_2025_14_1_a1/}
}
TY - JOUR
AU - A. A. Yurtin
TI - Integration of missing data imputation tools for time series in real-time mode into a relational DBMS
JO - Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika
PY - 2025
SP - 30
EP - 46
VL - 14
IS - 1
PB - mathdoc
UR - http://geodesic.mathdoc.fr/item/VYURV_2025_14_1_a1/
LA - ru
ID - VYURV_2025_14_1_a1
ER -
%0 Journal Article
%A A. A. Yurtin
%T Integration of missing data imputation tools for time series in real-time mode into a relational DBMS
%J Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika
%D 2025
%P 30-46
%V 14
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/VYURV_2025_14_1_a1/
%G ru
%F VYURV_2025_14_1_a1
A. A. Yurtin. Integration of missing data imputation tools for time series in real-time mode into a relational DBMS. Vestnik Ûžno-Uralʹskogo gosudarstvennogo universiteta. Seriâ Vyčislitelʹnaâ matematika i informatika, Vol. 14 (2025) no. 1, pp. 30-46. http://geodesic.mathdoc.fr/item/VYURV_2025_14_1_a1/