The Way of Data: Eigenvalues and Eigenvectors
I’ve heard the terms eigenvalues and eigenvectors countless times, but if I’m being honest, I’ve never truly grasped their meaning. That’s kind of a bummer, as they seem to appear everywhere, from pure mathematics to machine learning, which makes them feel like something worth understanding.
This is my attempt to break them down in a way that makes sense, not just mathematically but intuitively. Let’s start from the basics and build up from there.
Introduction
Let’s start with the basics:
- The eigenvalues of a matrix $A$ are obtained by solving the equation $\det(A - \lambda I) = 0$, called the characteristic equation, with $I$ the identity matrix and $\lambda$ the variable representing the eigenvalues.
- The eigenvectors $v$ of $A$ are obtained by solving $(A - \lambda I)v = 0$ for each eigenvalue $\lambda$ (a quick numerical check of both definitions follows this list).
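As a quick sanity check of these two definitions, here is a minimal NumPy sketch; the matrix `A` below is an arbitrary illustrative choice of mine, not a canonical example.

```python
import numpy as np

# Arbitrary symmetric 2x2 matrix, chosen only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are the eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # Defining relation of an eigenpair: A v = lambda v.
    assert np.allclose(A @ v, lam * v)
    # Characteristic equation: det(A - lambda I) = 0.
    assert np.isclose(np.linalg.det(A - lam * np.eye(2)), 0.0)
```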
What does this look like with a basic example?
Take a $2 \times 2$ matrix $A$ (a concrete example is worked out in the sketch after this list):
- We first compute $A - \lambda I$.
- Then the determinant $\det(A - \lambda I)$ and its roots, which give us the eigenvalues $\lambda_1$ and $\lambda_2$.
- Now we want to find the corresponding eigenvectors $v_1$ and $v_2$.
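Here is a minimal NumPy sketch of those three steps; the numbers in `A` are my own illustrative choice (symmetric, so the eigenvalues come out real), not a canonical example.

```python
import numpy as np

# Illustrative 2x2 matrix.
a, b, c, d = 2.0, 1.0, 1.0, 2.0
A = np.array([[a, b],
              [c, d]])

# Steps 1-2: det(A - lambda I) = lambda^2 - (a + d) lambda + (ad - bc),
# and its roots are the eigenvalues.
lam1, lam2 = np.roots([1.0, -(a + d), a * d - b * c])

# Step 3: an eigenvector spans the null space of (A - lambda I); the
# right-singular vector associated with the (near-)zero singular value gives it.
def eigenvector_for(lam):
    _, _, vt = np.linalg.svd(A - lam * np.eye(2))
    return vt[-1]

v1, v2 = eigenvector_for(lam1), eigenvector_for(lam2)
assert np.allclose(A @ v1, lam1 * v1)
assert np.allclose(A @ v2, lam2 * v2)
print(lam1, lam2)  # approximately 3 and 1 for this particular choice of A
```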
You might be tempted to ask: “But what does this actually look like?”
(Figure: the two vectors before the transformation and after the transformation.)
In the figure, $v_1$ is our initial eigenvector and the other vector is a random one. See what happens when we multiply both vectors by our matrix $A$: both are scaled, but only the random vector is rotated.
That’s it! An eigenvector is a vector that, when multiplied by its corresponding matrix, has its direction unchanged (or reversed).
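A small numerical illustration of that statement, again with an illustrative matrix of my own choosing: the direction (angle) of an eigenvector is preserved by the multiplication, whereas a generic vector gets rotated as well as scaled.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # same illustrative matrix as before

def direction(v):
    """Angle of a 2D vector, in radians."""
    return np.arctan2(v[1], v[0])

_, eigenvectors = np.linalg.eig(A)
v = eigenvectors[:, 0]                  # an eigenvector of A
u = np.array([1.0, 0.0])                # an arbitrary, non-eigen vector

print(direction(v), direction(A @ v))   # same angle: v is only scaled
print(direction(u), direction(A @ u))   # different angle: u is rotated as well
```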
In truth, we should be talking about the eigenvalues and eigenvectors of a linear transformation, which is a more general notion than a mere matrix. We’ll keep our focus on matrices for now.
The intuition behind the characteristic equation
Great, now we understand a bit better what eigenvalues and eigenvectors represent. But why are eigenvalues the roots of this characteristic equation?
Well, let’s approach our problem the other way around: we want to find a vector $v$ that does not undergo rotation when multiplied by $A$. In other words, we want to find $v$ such that $$Av = \lambda v$$ We multiply $v$ by a scalar $\lambda$ because we only care about rotation; scaling is allowed.
By rearranging the terms, we find that this is equivalent to $$(A - \lambda I)v = 0$$
An obvious solution is the zero vector $v = 0$ (i.e., the trivial solution).
We do not care about this one, as it does not tell us anything about $A$, and we would therefore like this equation to have non-zero solutions. In other words, we want $A - \lambda I$ to be a singular matrix, that is, to have no inverse (because if $A - \lambda I$ were invertible, the linear transformation mapping $v$ to $(A - \lambda I)v$ would be bijective, and $v = 0$ would be the only solution). We know that a matrix is singular if and only if its determinant equals zero.
There we have it:
For $(A - \lambda I)v = 0$ to have non-zero solutions, $\lambda$ must satisfy $\det(A - \lambda I) = 0$.
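To see what this condition produces in practice, here is the determinant written out for a generic $2 \times 2$ matrix (my notation; the $n \times n$ case works the same way and yields a degree-$n$ polynomial):

$$
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\quad\Rightarrow\quad
\det(A - \lambda I)
= \begin{vmatrix} a - \lambda & b \\ c & d - \lambda \end{vmatrix}
= (a - \lambda)(d - \lambda) - bc
= \lambda^2 - (a + d)\lambda + (ad - bc)
$$

Setting this quadratic to zero and solving for $\lambda$ gives the eigenvalues, which is exactly the computation sketched in the example above.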
As we wrap up this introduction, one might be tempted to ask what’s next. And I could not really blame them… I mean, we just explained what eigenvalues and eigenvectors represent, how to compute them, and why this computation actually makes sense.
However, think about it for a second. Sure, it’s cool to know that there exist vectors that do not rotate when a given linear transformation is applied to them. But isn’t there more to it? Why would someone look for such a specific thing without further utility? To that I would answer that mathematics has its reasons that reason ignores, but if it were just for the sake of finding some mathematical bizarrerie, we would probably not find these two notions in fields like machine learning or physics.