For many complicated real life systems, we can measure many different
characteristics: e.g., to describe the state of a human body, we can
measure temperature, blood pressure, blood sugar contents, etc. If the
resulting characteristics $x_1,\ldots,x_n$ were independent of each
other, i.e., if the set of possible combinations $(x_1,...,x_n)$ was a cube
in $n-$dimensional space, then it would have been extremely difficult
to design a corresponding control. Luckily, the characteristics are
* not* independent. Combinations that are actually possible form a
small subset of this cube, and we can usually classify these
combinations into few groups ({\it clusters}); e.g., for a human body,
clusters correspond, crudely speaking, to different illnesses.

If we know clusters, then the control is very simple: we just pick an appropriate control for each cluster, and then, for each set of measurement results, find what cluster it belongs to ({\it diagnose} the situation), and then apply the control corresponding to this cluster.

The main problem is, therefore, to find the clusters. There exist many classification algorithms. A cluster is a set of points in $n-$dimensional space that are close to each other; therefore, classification algorithms usually operate with {\it distances} between the points $\vec x=(x_1\,\ldots,x_n)$.

The larger the dimension $n$ of the characteristic space, the more data the algorithm needs to process, and hence, the slower the classification. It is, therefore, desirable to decrease the dimension $n$.

In principle, the amount of information to process can be decreased if we take into consideration that measurements are usually imprecise; therefore, we do not measure the values $x_i$ with absolute accuracy, and hence, we do not get the distances precisely. So, we would like to map the original space $R^n$ into a lower-dimensional space so that distances will be approximately preserved.

Let's look for a linear map $(x_1,...,x_n)\to\sum x_ie_i$ for some vectors $e_i$ from the desired lower-dimensional space. This mapping preserves distance iff $e_i\cdot e_j=\delta_{ij}$ for all $i$ and $j$ (here, $\delta_{ij}=1$ when $i=j$ and $=0$ otherwise). We want to preserve the distance with some accuracy $\varepsilon>0$. So, we want to find $e_i$ for which $|e_i\cdot e_j-\delta_{ij}|\le\varepsilon$ for all $i$ and $j$.

In the paper under review, it is shown that for a fixed $\varepsilon$, for every $m$, in an $m-$dimensional space, we can choose $n(m)$ vectors $e_i$ with this property, where $n(m)$ grows exponentially with $m$ (i.e., $n(m)\sim a^m$ for some $a(\varepsilon)>1$). Thus, for fixed accuracy, the size of the state space can be drastically reduced.

The inequality $|e_i\cdot e_j-\delta_{ij}|\le\varepsilon$ only guarantees that the distances between unit coordinate vectors are approximately preserved. What we need is an estimate for arbitrary vectors $\vec x$ and $\vec y$, i.e., that $\sum (x_i-y_i)^2\approx (\sum (x_i-y_i)e_i)^2$. This estimate can be obtained if we take into consideration that objects usually have few abnormal ($\ne$ average) characteristics. Let's denote the maximal possible number of such abnormal characteristics by $A$. We can re-scale the characteristics so that "normal" would mean $x_i=0$, and deviations are $\le 1$. Then, for the difference $d_i=x_i-y_i$, we get $$|\,|\sum d_ie_i|^2-\sum d_i^2|=|\sum d_id_ie_i\cdot e_j-\sum d_id_j\delta_{ij}|\le $$ $$\sum_{i,j}|d_i|\cdot |d_j|\cdot |e_i\cdot e_j-\delta_{ij}|\le\sum |d_i|\cdot |d_j|\varepsilon=\varepsilon(\sum |d_i|)^2.$$ Since each of the two vectors $x_i$, $y_i$ has at most $A$ non-zero components of size $\le 1$, we have $\sum |x_i|\le A$, $\sum |y_i|\le A$, and hence, $\sum |d_i|=\sum|x_i-y_i|\le 2A$. Hence, $$|\,|\sum d_ie_i|^2-\sum d_i^2|\le 4 A^2\varepsilon.$$

A commentary by V. Kreinovich