Theoretically effective asymptotically optimal universal coding of partially defined sources
Prikladnaâ diskretnaâ matematika, no. 1 (2020), pp. 30-56.

Voir la notice de l'article provenant de la source Math-Net.Ru

Let $A_0=\{a_i: i\in M\}$ be a finite alphabet of basic symbols, $A=\{a_T: T\in\mathcal{T}\}$, $\mathcal{T}\subseteq 2^M$, — an alphabet of underdetermined symbols. Any symbol $a_i$, $i\in T$, is considered to be a specification of the symbol $a_T$. The symbol $a_M$, denoted by $*$, is called indefinite. An underdetermined source $X$ generates symbols $a_T\in A$ independently with probabilities $p_T$. The entropy of the source $X$ is the quantity $$ \mathcal{H}(X)= \min_Q\left(-\textstyle\sum\limits_{T\in\mathcal{T}} p_T\log\sum\limits_{i\in T}q_i\right), $$ where the minimum is taken over the set of probability vectors $Q=(q_i, i\in M)$, $\log x=\log_2x$. For source $X$, we consider separable block coding with a block length of $n$. Encoding $K$ of underdetermined words $v\in A^n$ should ensure that the code $K(v)$ of word $v$ allow one to recover some specification of $v$. Coding is universal if it is independent of the probabilities $p_T$, $T\in\mathcal{T}$. Characteristic of coding quality is average code length $$ \bar l^{(n)}=\frac{1}{n}\textstyle\sum\limits_{v\in A^n}p(v)|K(v)|, $$ where $p(v)=p_{T_1}\ldots p_{T_n}$ is the probability of the appearance of the word $v=a_{T_1}\ldots a_{T_n}$ at the source output, $|K(v)|$ is the codeword length for $v$. Earlier, the author established that for arbitrary source $X$ and for any block coding method, the inequality $\bar l^{(n)}\ge\mathcal{H}(X)$ holds, and that there is universal block coding, for which $\bar l^{(n)}\le\mathcal{H}(X)+\text{O}\Bigl(\dfrac{\log n}{n}\Bigr)$. The upper bound here is obtained by random coding which only establishes existence of the estimation, but does not provide an appropriate algorithm. Important for applications are issues of complexity of procedures. Coding method considered theoretically effective if the complexities of coding and decoding are estimated by a some polynomial of the size $n$ of problem. This paper presents a polynomial coding method for underdetermined sources whose alphabet has the form $A=A_0\cup\{*\}=\{a_0, a_1,\ldots,a_{m-1},*\}$. Such sources are called partially defined. For them, entropy can be explicitly expressed: $$ \mathcal{H}(P)=(1-p_*)\log(1-p_*)-\textstyle\sum\limits_{i=0}^{m-1}p_i\log p_i. $$ The main content of the paper is the proof of the following result. For a partially defined source $X$, there is an universal method of block coding that provides an estimate of the average length of the code $$ \bar l^{(n)}\le\mathcal{H}(P) + \text{O}\Bigl(\frac{\log \log n}{\log^{1/2}n}\Bigr) $$ and allows you to encode and decode using RAM-programs with complexity O$(n^2)$. An analogue of this result is also valid for other types of undetermined sources whose entropy is explicitly representable.
Keywords: underdetermined source, partially defined source, universal coding, polynomial method, source entropy, quasientropy of a word, frequency class, combinatorial entropy, representative set.
@article{PDM_2020_1_a3,
     author = {L. A. Sholomov},
     title = {Theoretically effective asymptotically optimal universal coding of partially defined sources},
     journal = {Prikladna\^a diskretna\^a matematika},
     pages = {30--56},
     publisher = {mathdoc},
     number = {1},
     year = {2020},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/PDM_2020_1_a3/}
}
TY  - JOUR
AU  - L. A. Sholomov
TI  - Theoretically effective asymptotically optimal universal coding of partially defined sources
JO  - Prikladnaâ diskretnaâ matematika
PY  - 2020
SP  - 30
EP  - 56
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/PDM_2020_1_a3/
LA  - ru
ID  - PDM_2020_1_a3
ER  - 
%0 Journal Article
%A L. A. Sholomov
%T Theoretically effective asymptotically optimal universal coding of partially defined sources
%J Prikladnaâ diskretnaâ matematika
%D 2020
%P 30-56
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/PDM_2020_1_a3/
%G ru
%F PDM_2020_1_a3
L. A. Sholomov. Theoretically effective asymptotically optimal universal coding of partially defined sources. Prikladnaâ diskretnaâ matematika, no. 1 (2020), pp. 30-56. http://geodesic.mathdoc.fr/item/PDM_2020_1_a3/

[1] Gallager R. G., Information Theory and Reliable Communication, Wiley Publ., N.Y., 1968, 608 pp. | Zbl

[2] Sholomov L. A., “Elements of the underdetermined information theory”, Prikladnaya Diskretnaya Matematika. Prilozhenie, 2009, no. 2, 18–42 (in Russian)

[3] Sholomov L. A., “Compression of partial defined information”, Nelineinaya dinamika i upravlenie, 4, Fizmatlit, M., 2004, 377–396 (in Russian)

[4] Aho A., Hopcroft J., Ulman J., The Design and Analisis of Computer Algorithms, Addison-Wesley Publ. Co., 1976, 480 pp. | MR

[5] Sholomov L. A., “Entropy of a system of partially defined sequences with nested domains”, Nelineinaya Dinamika i Upravlenie, 3, Fizmatlit Publ., M., 2003, 305–320 (in Russian)

[6] Kolmogorov A. N., “Three approaches to the quantitative definition of information”, Problems Inform. Transmissions, 1965, no. 1, 1–7 | MR | Zbl

[7] Chashkin A. V., Discrete Mathematics, Academia, M., 2012, 352 pp. (in Russian)

[8] Shannon C. E., “The synthesis of two-terminal switching circuits”, Bell Syst. Techn. J., 28:1 (1949), 59–98 | DOI | MR

[9] Lupanov O. B., “On a certain approach to the synthesis of control systems — the principle of local coding”, Problemy Kibernetiki, 14, Nauka Publ., M., 1965, 31–110 (in Russian)

[10] Nechiporuk E. I., “Complexity of gating circuits which are realized by Boolean matrices with undetermined elements”, Soviet Phis. Dokl., 1966, no. 10, 591–593

[11] Sholomov L. A., “Realization of partial Boolean functions by circuits from functional elements.”, Systems Theory Res., 1971, no. 21, 211–223 | MR | Zbl

[12] Andreev A. E., “On the complexity of the realization of Boolean functions by circuits of functional elements”, Discrete Math. Appl., 1:3 (1991), 251–261 | DOI | MR | Zbl | Zbl

[13] Chashkin A. V., “Methods for computing partial Boolean functions”, Diskretnye Modeli v Teorii Upravlyayushchih Sistem, VII Mezhdunarodnaya Konferentsiya, MAKS Press, M., 2006, 390–404 (in Russian)

[14] Sholomov L. A., “On functionals characterizing the complexity of systems of undetermined Boolean functions”, Systems Theory Res., 1970, no. 20, 123–140 | MR

[15] Andreev A. E., Clementi A. E. F., Rolim J. D. P., “Worst-case hardness suffices for derandomization: A new method for hardness–randomness trade-offs”, LNCS, 1256, 1997, 177–187 | MR | Zbl

[16] Miltersen P. B., On the Shannon function for partially defined Boolean functions, , 1999 http://www.brics.dk/b̃romille/Papers/index.html

[17] Madatyan H. A., “On the implementation of not everywhere defined $k$-valued matrices of a given “density” by valve circuits of depth two”, Metody Diskretnogo Analiza v Teorii Bulevyx Funktsiy i Skhem, 35, Institute of Mathematics Publ. House, Novosibirsk, 1980, 71–82 (in Russian) | MR | Zbl

[18] Andreev A. E., Complexity of Nondeterministic Functions, BRICS report. Ser. RS-94-2, Febr. 1994, 47 pp.

[19] Chashkin A. V., “Computing of underdetermined functions”, Sbornik Lektsiy Molodezhnyh Nauchnyh Shkol, Diskretnaya Matematika i ee Prilozheniya, VI, Keldysh Inst. of Appl. Math. Publ. House, M., 2011, 29–40 (in Russian)

[20] Krichevsky R. E., “Occam's razor, partially specified Boolean functions, string matching, and independent sets”, Information and Computation, 1994, no. 108, 158–174 | DOI | MR | Zbl

[21] Krichevsky R., Universal Compression and Retrieval, Kluwer Acad. Publ., 2010, 219 pp. | MR | MR

[22] Sholomov L. A., “Encoding of partially defined discrete memoryless sources”, Doclady Mathematics, 70:1 (2004), 651–653 | MR | Zbl

[23] Vasil'ev Yu. L., Glagolev V. V., “Metric properties of disjunctive normal forms”, Discrete Mathematics and Mathematical Problems of Cybernetics, 1, Nauka, M., 1974, 99–148 (in Russian)

[24] Cramer H., Mathematical Methods of Statistics, Prinston University Press, 1946, 575 pp. | MR | Zbl