Truncated singular value decompositions (SVDs) are a very useful tool in data science, physics and far beyond, since they provide low-rank approximations of some input data that can even be proven to be optimal under certain conditions.
The singular value decomposition (SVD) is one of the cornerstones of numerical linear algebra. Given a matrix $A \in \mathbb{R}^{m \times n}$, the SVD factors it as $A = U \Sigma V^T$, where $U$ and $V$ are orthogonal matrices, and $\Sigma$ is diagonal with non-negative entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$. These entries are referred to as the singular values of $A$.
Why do we care? The SVD exposes the intrinsic rank structure of a matrix. By keeping only the $k$ largest singular values and discarding the rest, we obtain the best rank-$k$ approximation to $A$ in terms of both the spectral and Frobenius norms, as proven by the Eckart–Young theorem. This property makes truncating the SVD invaluable in compression (images, video, audio), denoising, and tensor network algorithms, where we only need the most significant directions in the data.
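As a small illustration of this optimality, here is a sketch using Julia's LinearAlgebra standard library (the random data is purely illustrative): the spectral-norm error of the truncated SVD equals the first discarded singular value.

```julia
using LinearAlgebra

A = rand(100, 100)            # purely illustrative random data
U, S, V = svd(A)              # singular values in S, sorted from largest to smallest

k = 10
A_k = U[:, 1:k] * Diagonal(S[1:k]) * V[:, 1:k]'   # best rank-k approximation

# Eckart–Young: the spectral-norm error equals the first discarded singular value
opnorm(A - A_k) ≈ S[k + 1]    # true (up to floating-point noise)
```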
As an example, we can have a look at the singular values of an image of a cup of coffee, treating the red, green and blue channels separately as matrices:
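A minimal sketch of how such a plot could be produced, assuming Images.jl and Plots.jl are available and the photo is stored as `coffee.jpg` (the filename and plotting details here are purely illustrative):

```julia
using Images, LinearAlgebra, Plots

img = load("coffee.jpg")                 # hypothetical filename
channels = channelview(img)              # 3 × height × width array of raw channel values

plt = plot(; yscale = :log10, xlabel = "index", ylabel = "singular value")
for (name, channel) in zip(("red", "green", "blue"), eachslice(channels, dims = 1))
    plot!(plt, svdvals(Float64.(channel)); label = name)
end
plt
```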
We can immediately see that a few of the values are quite large, while a relatively large number of them are comparatively small. This is where truncation, or compression, comes in: by keeping only the largest values we can reduce the required storage, while still keeping the image largely intact.
A straightforward approach might be to simply perform the (untruncated) SVD first, and then post-truncate the resulting $U$, $\Sigma$ and $V$. To compute the SVD itself, we can typically make use of LAPACK routines that are widely available, so I'll assume here that there is a straightforward way to obtain them. What matters for us is that the singular values are typically returned in sorted order, conventionally from largest to smallest. Making use of this convention, the truncation becomes rather straightforward: we simply discard the smallest (last) values until our desired criterion is reached.
```julia
using LinearAlgebra

function svd_trunc(A, k)
    U, S, V = svd(A)
    # keep only the k largest singular values and corresponding singular vectors
    return U[:, 1:k] * Diagonal(S[1:k]) * V[:, 1:k]'
end
```
```julia
using Images

function rank_approx(img, k)
    channels = map(eachslice(channelview(img), dims = 1)) do channel
        # work in floating point, then clamp to [0, 1] to retain a valid image
        return clamp01!(svd_trunc(float.(channel), k))
    end
    return colorview(RGB, channels...)
end
```
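With these two functions in place, compressing the image (again using the hypothetical `coffee.jpg`) boils down to something like:

```julia
img = load("coffee.jpg")   # hypothetical filename
rank_approx(img, 50)       # keep only the 50 largest singular values per channel
```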
During one of my research projects, I ran into the issue of doing the same analysis for a block-diagonal input matrix. This particular structure is interesting since it allows for an efficient implementation of the SVD itself: each of the blocks can be decomposed separately, which can not only be parallelized efficiently, but also greatly reduces the overall cost of the algorithm. Consider the following example: we compare decomposing a single $N \times N$ matrix with a block-diagonal matrix of the same size, containing $b$ equal-sized blocks of size $N/b \times N/b$. The cost of an SVD scales cubically with the size of the input matrix, giving a cost of $O(N^3)$ for decomposing the dense matrix. For the block-diagonal matrix however, we have to perform $b$ different SVDs, but each of them is a factor $b$ smaller. This leads to a cost of $b \cdot O((N/b)^3) = O(N^3 / b^2)$, which should thus be a factor $b^2$ faster. We can of course verify this as well, using random input data as a benchmark:
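A minimal sketch of such a benchmark, assuming BenchmarkTools.jl is available (the sizes `N = 1000` and `b = 10` are just illustrative), could look like:

```julia
using LinearAlgebra, BenchmarkTools

N, b = 1000, 10                                # illustrative sizes
A = rand(N, N)                                 # dense N × N matrix
blocks = [rand(N ÷ b, N ÷ b) for _ in 1:b]     # diagonal blocks of a block-diagonal matrix of the same size

@btime svd($A)            # one large SVD: O(N³)
@btime map(svd, $blocks)  # b small SVDs: O(N³ / b²) in total
```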
There is however a slight hiccup in the truncation now, as the singular values are no longer returned in globally sorted order. If we wish to select the $k$ largest singular values, we therefore have to do a bit of extra bookkeeping. We may distill the problem as follows: given $b$ vectors of (descending) sorted values, find the (local) number of values to keep for each of the input vectors in order to end up with the (globally) $k$ largest values.
The “Need” for Efficient Truncation
One naive way to solve this is to simply concatenate all the values, sort that combined list, and from that ordering distill the number of values to keep for each of the input vectors. If we ignore some of the overhead and focus only on the scaling, we have to sort an array of $N$ values in total, so we obtain a cost of $O(N \log N)$. This could look like:
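Here is a sketch of what this naive approach might look like (the function name `truncate_naive` and its exact interface are just illustrative):

```julia
# Given a collection `Ss` of descendingly sorted vectors and a number `k` of values
# to keep globally, return how many values to keep from each vector.
function truncate_naive(Ss, k)
    # tag every value with the index of the vector it came from, then sort globally
    tagged = [(value, i) for (i, S) in enumerate(Ss) for value in S]
    sort!(tagged; by = first, rev = true)      # O(N log N) for N values in total

    # count how many of the k globally largest values belong to each input vector
    counts = zeros(Int, length(Ss))
    for (_, i) in tagged[1:k]
        counts[i] += 1
    end
    return counts
end
```

For example, `truncate_naive([S1, S2, S3], k)` would return three counts that sum to `k`, one for each block.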
If you are following along closely, you may have realized that this could really just be the solution, and you could stop reading here. In particular, the improvements presented below are not expected to meaningfully affect the runtime of the full algorithm. The truth is that the runtime will be dominated by the time spent computing the SVDs, not by the truncation, as is already apparent from the difference in scaling. In fact, the scaling even predicts that for larger input sizes, the relative cost of the truncation will only diminish. Nevertheless, if you are like me and somewhat interested in algorithms and data structures, you might agree that we could possibly do better. I do want to strongly emphasize however that the value here is mostly educational, and the maintenance burden and potential for errors of more sophisticated approaches do not necessarily outweigh the actual performance increase.
There are two main components of the naive approach that seem somewhat wasteful, or that might be improved upon. The first is that we make no use of the structure of the input vectors, which are each already sorted. The second is that it may not be necessary to first instantiate the full concatenated vector of values, only to discard it again afterwards. In the next sections we aim to address both of these points.