Efficient Truncated Block-Diagonal SVD with Heap-Based Value Selection
Introduction
Truncated singular value decompositions (SVDs) are a very useful tool in data science, physics and far beyond, since they provide low-rank approximations of input data that can even be proven to be optimal under certain conditions. [CITE]
The singular value decomposition (SVD) is one of the cornerstones of numerical linear algebra.
Given a matrix $A \in \mathbb{C}^{m \times n}$, it provides a factorization $A = U \Sigma V^\dagger$, where $U$ and $V$ have orthonormal columns and $\Sigma$ is a diagonal matrix with non-negative real entries, conventionally sorted from largest to smallest.
These entries are referred to as the singular values of $A$.
Why do we care?
The SVD exposes the intrinsic rank structure of a matrix.
By keeping only the largest singular values and discarding the rest, we obtain the best low-rank approximation to $A$ of that rank.
This property makes the truncated SVD invaluable in compression (images, video, audio), denoising, and tensor network algorithms, where we only need the most significant directions in the data.
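Concretely, writing the factorization in terms of its singular triplets $(\sigma_i, u_i, v_i)$ (the singular values and the corresponding columns of $U$ and $V$), the rank-$k$ truncation and its error read

$$
A_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^\dagger,
\qquad
\lVert A - A_k \rVert_F^2 = \sum_{i > k} \sigma_i^2,
$$

and the Eckart–Young theorem guarantees that no matrix of rank at most $k$ achieves a smaller error.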
As an example, we can have a look at the singular values of an image of a cup of coffee, treating the red, green and blue channels as three separate matrices:
We can immediately see that a few of the values are quite large, while a relatively large number of values are comparatively small.
This is where the truncation, or compression, comes in: we keep only the $k$ largest singular values (together with the corresponding columns of $U$ and $V$) and discard the rest.
A straightforward approach might be to simply perform the (untruncated) SVD first, and then post-truncate the resulting factors $U$, $\Sigma$ and $V$.
In order to compute the SVD, we can typically make use of LAPACK routines that are widely available, so I'll assume here that there is a straightforward way to obtain the full decomposition.
Of interest to us, though, is that the singular values are typically returned in sorted order, conventionally from largest to smallest.
Making use of this ordering makes the truncation rather straightforward, as we may simply discard the smallest (last) values until our desired criterion is reached.
In code, this boils down to a small helper `svd_trunc(A, k)` that keeps only the first `k` singular triplets, and a routine `rank_approx(img, k)` that applies it to each colour channel of the image:
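(The original listings are not reproduced in full here; the following is my own minimal sketch in Julia, assuming `LinearAlgebra.svd` and an image stored as a height × width × 3 floating-point array.)

```julia
using LinearAlgebra

# Keep only the k largest singular triplets of A.
# Since the values in F.S come out sorted from largest to smallest,
# truncating is just a matter of slicing the first k columns and entries.
function svd_trunc(A, k)
    F = svd(A)
    k = min(k, length(F.S))    # guard against asking for more values than exist
    return F.U[:, 1:k], F.S[1:k], F.V[:, 1:k]
end

# Rank-k approximation of an image, treating each colour channel
# (third array dimension) as a separate matrix.
function rank_approx(img, k)
    out = similar(img, Float64)
    for c in axes(img, 3)
        U, S, V = svd_trunc(img[:, :, c], k)
        out[:, :, c] = U * Diagonal(S) * V'
    end
    return out
end
```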
During one of my research projects, I ran into the issue of performing the same analysis on a block-diagonal input matrix.
This particular structure is interesting since it allows for an efficient implementation of the SVD itself: each of the blocks can be decomposed separately, which can not only be parallelized efficiently, but also greatly reduces the overall cost of the algorithm.
Consider the following example: we compare decomposing a single dense $N \times N$ matrix with decomposing a block-diagonal matrix of the same total size, consisting of $b$ blocks of size $(N/b) \times (N/b)$.
The cost of an SVD scales cubically with the size of the input matrix, giving a cost of $\mathcal{O}(N^3)$ for the dense case.
For the block-diagonal matrix we can instead decompose each of the $b$ blocks separately, at a cost of $\mathcal{O}((N/b)^3)$ per block.
This leads to a total cost of $b \cdot \mathcal{O}((N/b)^3) = \mathcal{O}(N^3 / b^2)$, a reduction by a factor of $b^2$.
We can of course verify this as well, using random input data as a benchmark:
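(The original benchmark is not included here; the following is my own rough sketch using BenchmarkTools, with an arbitrary choice of $N = 1024$ and $b = 8$.)

```julia
using LinearAlgebra, BenchmarkTools

N, b = 1024, 8                       # total size and number of diagonal blocks
n = N ÷ b                            # size of each individual block

A = randn(N, N)                      # dense matrix of the full size
blocks = [randn(n, n) for _ in 1:b]  # the diagonal blocks of a block-diagonal matrix

@btime svd($A)            # one large decomposition: O(N^3)
@btime map(svd, $blocks)  # b small decompositions: b * O((N/b)^3)
```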
There is however a slight hiccup in the truncation now, as the singular values are no longer provided in a single sorted list: each block only returns its own values in sorted order.
Therefore, if we wish to select the largest $k$ values overall, we can no longer simply keep the first $k$ entries and discard the rest.
We may therefore distill the problem as follows: given a collection of vectors, each sorted from largest to smallest, determine how many values to keep from each vector such that together they make up the $k$ largest values overall. For example, given the vectors $[5, 3, 1]$ and $[4, 2]$ with $k = 3$, we should keep two values from the first vector and one from the second.
The “Need” for Efficient Truncation
One naive way to solve this is to simply concatenate all the values, sort the combined list, and from that ordering distill the number of values to keep from each of the input vectors.
If we ignore some of the overhead and focus only on the scaling, we have to sort an array containing all $n$ singular values, which costs $\mathcal{O}(n \log n)$.
This could look like:
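(The original listing is not included here; the following is my own sketch, taking a vector of per-block singular-value vectors and returning how many values to keep from each block.)

```julia
# Given a vector `Ss` of singular-value vectors (one per block, each sorted from
# largest to smallest), return how many values to keep from each block so that
# the k largest values overall are retained.
function truncate_naive(Ss, k)
    # tag every value with the index of the block it came from,
    # then sort all of them at once
    tagged = [(s, i) for (i, S) in enumerate(Ss) for s in S]
    sort!(tagged; by = first, rev = true)

    counts = zeros(Int, length(Ss))
    for (_, i) in tagged[1:min(k, length(tagged))]
        counts[i] += 1
    end
    return counts
end
```

For the small example above, `truncate_naive([[5, 3, 1], [4, 2]], 3)` returns `[2, 1]`.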
If you are following along closely, you may have realized that this could really just be the solution, and you could stop reading here.
In particular, the improvements presented below are not expected to actually affect the runtime of the final algorithm.
The truth is that the runtime will be dominated by the time spent computing the SVDs, not computing the truncation, as is already apparent from the difference in scaling.
In fact, the scaling even predicts that for larger input sizes, the relative cost of the truncation will only diminish further.
Nevertheless, if for some reason you are like me and are somewhat interested in algorithms and data structures, you might agree that we could possibly do better.
I do want to strongly emphasize, however, that the value here is mostly educational: the maintenance burden and potential for bugs of more sophisticated approaches do not necessarily outweigh the actual performance increase.
There are two main components to the naive approach that seem somewhat unnecessary, or that might be improved upon.
The first thing to consider is that it might be possible to make use of the structure of the provided vectors, each of which is already sorted.
The second consideration is whether it is really necessary to first instantiate the full vector of values, only to discard most of it again afterwards.
In the next sections we address both of these points.
Merge-Sort
Max-Heap
Conclusion
References
Appendix: Code