Introduction
The answer kind of surprised me. The derivation was trivial, once I learned what exactly this matrix is used for. I mean in the big picture of 3D rendering. Being sort of, like a really tiny bit nervous about forgetting all this I decided to write it down.
Transforms for x & y
When I first attempted to derive this matrix
the only thing I knew was that it’s supposed to transform coordinates from
camera frustum into an NDC coordinate space: $[-1,1]\times[-1,1]\times[-1,1]$.
That’s literally the first thing I read on some
answer on SO. Then there was a link to the derivation. I followed it, and got
staggered by a page full of some middle-school algebra. It looked overwhelming.
It wasn’t complicated, but I wasn’t a hundred percent sure I’m going to be able
to follow author’s reasoning. But then I thought: “it’s cool, in its essence
it’s just a transform between two linear subspaces” – and I just read about
those. I felt confident, I had Hoffman & Kuntz on my side (I thought so).
So, I started reading into it. Coasted through
the derivation of formulas for
projections.
I mean from the image above it’s quite clear
why for any given x/y the following holds (I’m gonna give only formula for ‘x’,
for ‘y’ it’s the same):
Note: $z_n$ denotes $n$, or near plane.
However, soon I hit a brick wall. There were line equations, and stuff got plugged into those, in a manner, which I found jarring. I tried to imagine, where these lines would lie, and failed. Have you ever had this desperate feeling, when you want to yell at the monitor: “what the hell does that even mean?!”, but you know that no one will answer? I had been drowning in it for a couple of seconds and then I put two and two together.
By that point I’ve already projected
my $x$ coordinates onto the near plane, and was quite sure that its value had
very little reason of being less then $l$ or more than $r$(or vice a versa
depending on your coordinate system orientation). Also, I knew that NDC “wants”
it to be in between -1 and 1. And then I just happen to know this formula, for
rescaling a value in one range into a new one:
$$val_{new}=\frac{new_{high}-new_{low}}{range_{high}-range_{low}}(val_{range}-range_{low})+new_{low}$$
Here $range$ denotes the input range, which we want to transform from, in our case for $x_g$ coordinate it is $[r,l]$; the $new$ correspondingly denotes the range we are transforming to, in our case it’s NDCs $[-1,1]$.
This formula looks almost exactly like a line equation: $y=mx+b$, except for the fact that $x = (val_{range} – range_{low})$. That’s why I’ve always felt more comfortable thinking about it in terms of ranges, instead of in terms of lines. It goes something like: “suppose I have a value between 1 and 2, but I want it to be between 0 and 1, what should I do?” Obviously, I have to subtract 1 from my input – this way it will automatically lay between 0 and 1. In this manner, toying with simple ranges, I’ve always been able to arrive at the formula above – it’s all about squeezing and stretching and shifting a little to left and a little to the right. I feel like here it’s best to fiddle with the concrete examples, and leave the words like “line equation” out of the picture.
However, reiterating on the example above, how do I know that I should
subtract and scale, instead of, say use a function like:
Or perhaps something, that is less seemingly ludicrous, and has the desired range? Why
would I choose linear transform, out of all the possible functions?
Currently, I
don’t have the exact answer for this question. I’m fairly certain, that it has
to do with the fact that linear transform doesn’t “significantly alter the
behavior" of the function it’s applied to. For instance, if I had a function $f(x)$, and I wanted to get a new function:
$g(x) = m*f(x)+b$,
then the derivative of g would be different from f only by a constant multiplier:
$$\frac{dg}{dx}=m\frac{df}{dx}$$
I’m quite sure that it’s a big deal. I feel like it might be fun to toss the boring rescaling formula out of this derivation, and replace it with something more unstable and fun, but we will stick with what we have for the time being, and do the fun thing in another post.
Anyhow, plugging the equations for $x_n$, $y_n$ in the rescaling formula above yields:
Note: I omitted some of the trivial transformations you need to arrive at this specific form of expressions.
Important Digression
To finish
up the derivation, at this point, we need to recall what we are deriving.
Specifically, we are looking for the matrix representing the following
transformation (spoiler alert!):
We have already
partially filled out this matrix with the expressions to transform $x_g$ and $y_g$ to
the NDC space:
Plugging
them into matrix gives us:
We place the
coefficients so, that multiplication will yield:
Dividing
that by $z_g$ will give us the NDC coordinates. Now, all we have to do is
understand, why in the world would we divide by $z_g$. Or at least find a decent
reason to write it off to. To do so, let’s start with an intriguing question.
Isn’t it peculiar, that we have a four component vector, to represent something in 3D? What is $w$? The answer to this question lies in understanding, what are homogeneous coordinates. Now, understanding this is really quite simple. Imagine a point in space, and a plane, which lies above it. We draw a ray from that point- it will cross the plane at some coordinates: $(x_c,y_c,z_c)$. For further convenience, let’s say, that this plane lies parallel to x/y plane of the orthonormal coordinate basis with the beginning at the said point, and that it satisfies an equation: $z = 1$, in this basis (which together with the point of origin gives an affine coordinate system). All of that would look something like this:
Now, when
we have said and drawn this for our convenience, let’s get back to our
arbitrary ray, which starts from the aforementioned point, and crosses a plane
at some coordinates $(x_c, y_c, z_c)$ (point $A$ on the image above). Since our plane
satisfies the equation $z = 1$, the coordinates of the crossing will be $(x_c, y_c,
1)$. Now let’s take any other point on this line, say $B$, with coordinates: $(x_l,
y_l, z_l)$. It is clear, that if we divide each of these by $z_l$, we’d get $(\frac{x_l}{z_l}, \frac{y_l}{z_l}, 1)$ – a point on our plane. What is a little less clear is
that these will be the coordinates of the very same point $A$. If we take point $C$, and transform its coordinates in a similar manner, we would again, get the
coordinates of $A$. And, even more generally, if we take any point on this line,
and divide its coordinates we would end up in $A$. This property, allows us to
map (bijectively I daresay) (almost) each ray originating in point $O$ to a
certain point on a plane. The coordinates of a point (say $B$ or $C$) belonging to
a certain ray in this construct are called “homogeneous” coordinates (“hom-” –
same, “genos” – kin).
Homogeneous coordinates are
important in 3D graphics since they very naturally model the idea of rendering a
3D scene as being observed from a camera point of view. In this case a point $O$ will be the camera and lines going out of this point – are rays of light, which
intersect some object in a 3D space. Of course, in reality rays of light go
into the camera, instead of going out of it, but we will omit this detail, as
insignificant.
You may
have noticed, how in the discussion of the homogeneous coordinates we were
talking about triplets: $(x_l, y_l, z_l)$. And here is why – x and y coordinates
are enough, if we want a way of mapping rays onto points, however, for rendering
a 3D scene we would usually need a z coordinate. Thus we would need a four component vector.
But the fact that it’s four component doesn’t really change its essence – you
still divide (meaning rendering pipeline does this for you) each of the
components by the fourth one, to get what you need:
This
thought actually gives us the fourth row of the transformation matrix:
And, even more so
– since we know that $z$coordinate will be inevitable divided by $z_g$, we can get
a head start on deriving expression for $z$ coordinate:
Note: later by the application of the same rescaling formula we used to get NDC coordinates, x/y coords will be rescaled from $\left[-1,1\right]\times \left[-1,1\right]$ to $\left[0,screen_{width}\right]\times\left[0,screen_{height}\right]$. Z coordinate will be used in more esoteric ways to create some calming illusions, of shadows, volume and whatnot.
Wrapping it up
Using the insights we gained in the previous section, let’s find the final row of a matrix, you know, the i,j,k,l in this: $$ \left[\begin{array}& i & j & k & l\end{array}\right]\left[\begin{array}{c}x_{g} \\ y_{g} \\ z_{g} \\ w\end{array}\right] $$Conclusion
This derivation was built kind of backwards – firstly we found the expressions for the projections of $x_g$ and $y_g$ onto the near plane (by the way intrinsically performing perspective projection), then we rescaled them, and only afterwards we have written down what we are looking for, and figured, that we’d have to divide by everything by $z_g$. After that I placed into all of this a concept of homogenous coordinates – to explain the division, and added info about occlusion and shadows, to explain why matrix is 4x4 (if you omit depth test, z coordinate is no longer needed and the matrix would be 3x3). There is a more direct way of deriving this matrix – it involves multiplying several scaling matrices. Find it if you prefer a more direct approach. Derive a matrix for orthogonal projection if you like doing homework. Basically, just follow your heart (responsibly).
No comments:
Post a Comment