Optimal low-rank matrix approximation

by John D. Cook
Matrix compression

Suppose you have an m by n matrix A, where m and n are very large, that you'd like to compress. That is, you'd like to come up with an approximation of A that takes less data to describe.

For example, consider a high resolution photo represented as a matrix of grayscale values. An approximation to the matrix could be a lower resolution representation that takes less space.

I heard a rumor many years ago that space probes would compress images by interpreting an image as a matrix and sending back a few eigenvalues and eigenvectors. That sounded fascinating, but what about images that aren't square? If a matrix M is not square, then you can't have Mx = λx for a scalar λ because the left and right sides of the equation would have different dimensions.

Truncated SVD

Approximating a rectangular matrix requires something more general than eigenvalues and eigenvectors, namely singular values and singular vectors.

Suppose our matrix A has the singular value decomposition

$$A = U \Sigma V^*$$

This can be written as

$$A = \sum_{i=1}^{p} \sigma_i u_i v_i^*$$

where the σi are the singular values and p is the number of singular values that are non-zero. The ui and vi are the ith columns of U and V respectively. Singular values are nonnegative, and listed in decreasing order.
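As a quick sanity check (my own numpy sketch, not from the original post), you can verify this rank-one sum numerically:

```python
import numpy as np

# A small random rectangular matrix; the dimensions are arbitrary.
A = np.random.rand(5, 3)

# full_matrices=False gives the economy SVD, so U is 5x3 and Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as the sum of rank-one terms sigma_i * u_i * v_i^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))

assert np.allclose(A, A_rebuilt)
```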

We can form an approximation to A by truncating the sum above. Instead of taking all the singular values, and their corresponding left and right singular vectors, we only take the k largest singular values and their corresponding vectors.

$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^*$$

This is called the truncated SVD.
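In numpy this might look like the following minimal sketch (the function name truncated_svd and the matrix dimensions are mine, chosen for illustration):

```python
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation of A, keeping the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Singular values come back sorted in decreasing order,
    # so the first k terms are the k largest.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.rand(100, 60)
Ak = truncated_svd(A, 10)
print(np.linalg.matrix_rank(Ak))  # 10
```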

We started by assuming A had dimensions m by n and that both were large. Storing A requires mn numbers. Storing Ak requires only k(1 + m + n) numbers. This is a big savings if k is much smaller than m and n.
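For example, if m = n = 1,000 and k = 20, storing A takes 1,000,000 numbers while storing Ak takes 20(1 + 1,000 + 1,000) = 40,020 numbers, roughly a 25-fold reduction.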

Eckart-Young theorem

So how good an approximation is Ak? It turns out to be optimal in the least squares sense. This is the content of the Eckart-Young theorem, which says that the best least squares (2-norm) approximation of A by a rank k matrix is given by Ak.

Not only that, the theorem says the 2-norm error is given by the first singular value that we didn't use, i.e.

$$\| A - A_k \|_2 = \sigma_{k+1}$$
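You can confirm this numerically (again a sketch of my own, not code from the post): the spectral norm of the residual matches the first discarded singular value.

```python
import numpy as np

A = np.random.rand(100, 60)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: the 2-norm (spectral norm) error is the first
# singular value we discarded, sigma_{k+1} = s[k] with 0-based indexing.
assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])
```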

Related post

Singular value decomposition and pseudoinverse
