I Love Affine Transforms
It’s true. I first discovered them when I was in 10th grade and trying to figure out how to rotate an object. I was using the following fomulae:
x’ = x cos(t) – y sin(t)
y’ = x sin(t) + y cos(t)
This is very close to an affine transform. A full Affine transform looks like this:
x’ = ax + cy + e
y’ = bx + dy + f
The nice thing about this is that you can pop it into a matrix like this:
| a b 0 |
| c d 0 |
| e f 1 |
This is nice because when you have a point, (x, y), you can write it as a matrix, [x y 1] and do matrix multiplication to get x’ and y’.
So why do I love affine transforms? Because they make graphics operations a whole lot easier. Consider PDF pages. A page in PDF consists of a media box and an optional crop box and an orientation that determines how a page should be rendered. Now, PDF says that there are only 4 page orientations, so if you’re thinking that when you’re rendering a page you could just case out the 4 orientations and 2 states of crop boxedness giving you 8 cases of transformations to write. 8 cases – that’s not bad – you can do that. But that’s only half. You’re doing a transformation from PDF page space to, say, image space. You’ll probably want to do the reverse. OK – that’s only another 8 cases. 16 cases – bigger, but manageable, right? Wrong. Trust me – you’ll screw one up.
Let me help out with a picture:
The top page is a conceptual PDF page. The origin is in the lower left. The dashed box is the crop box, showing the imageable area.
The bottom pages show how the top page could be pixel rendered in each of the 4 orientations. In these bottom boxes, the origin is in the upper left. The question is, how do we build the matrix to make this happen?
The first thing I did was write out an affine transform to scale the image from the pixel rendered size to the PDF size:
x’ = xs x + 0y + 0
y’ = 0x + ys y + 0
Then I wrote an Affine transform to transform an upper right origin image to a lower left origin image:
x’’ = 1x’ + 0y’ + cbl
y’’ = 0x’ + –1y’ + (cbh + cbb)
(where cbl, cbh, and cbb are the left margin, total height, and bottom margin of the crop box).
From there, we substitute x’ and y’ into the second equation and simplify:
x’’ = xs x + 0y’ + cbl
y” = 0x’ + –ys y + (cbh + cbb)
What a get from this is a matrix:
| xs 0 0 |
| 0 -ys 0 |
| cbl cbh+cbb 1 |
which will do the transform for me.
Doing similar math gives me the following transforms for each of the rest of the rotations:
x’’ = 0x = ys y + (cbw + cbl)
y’’ = –xs x + (cbh + cbb)
x’’ = –xs x + 0y + (cbl + cbw)
y’’ = 0x + ys y + cbb
x’’ = 0x + ys y + cbl
y’’ = xs x + 0y + cbb
these terms then go into the equations. Notice that whether or not I have a crop box is immaterial. I can choose either crop box (if present) or media box and it’s the same. This means that I’ve taken my 8 cases and reduced it to 4.
Now, there’s an easier way to do this. I could’ve done three matrices: scale, rotate, translate and instead composed the matrices and I would’ve gotten the same answer at the end. Nice. It’s the same process, actually, just treating the equations as matrices and doing the linear algebra.
The final step is to get the remaining 4 cases – the reverse transformation. Well, it turns out that we don’t want to think about those. Sure we could do this same work, but instead we’ll calculate the inverse of the forward transformation using linear algebra. To that, given the following matrix:
| a b c |
| d e f |
| g h i |
You need the the following:
| ie – hf –(ib – hc) fb - ec |
| –(id – gf) ia – gc –(fa – dc) |
| (hd – ge) –(ha – gb) ea - db |
and then multiply the whole matrix by 1/det where det is the determinant of the original matrix. The determinant is defined like this:
det = (a * e * i) + (b * f * g) + (c * d * h) - (a * f * h) - (b * d * i) - (c * e * g)
Now, the nice thing about this is that we know that c and f are 0 and i will be 1, so we can simplify the determinant to this:
det = (a * e) - (b * d)
and our inverse matrix looks like this when we simplify:
| e/det –b/det 0 |
| –d/det a/det 0 |
| (hd-ge)/det –(ha-gb)/det 1 |
And the only check we need to make is to ensure that det is non-zero (divide by zero is BAD!).
The question is, how would be get a 0 determinant? There are a few things that would cause it and they are all degenerate transforms. For example, using a 0 scale or shearing to a line segment. The original points are not recoverable.
With that, I’m going to give you my own matrix transform code which is very similar to the one in .NET, except that it works with doubles as the matrix elements and is a little better about handling many common matrix actions and classifying a given matrix.
public enum TransformType
{
None = 0,
Identity = 0,
Translate,
Scale,
NonUniformScale,
Rotate,
Skew,
Other
}
public class Transform
{
private double[] _arr;
private TransformType _transformType;
public Transform()
{
_arr = new double[6]{1.0, 0.0, 0.0, 1.0, 0.0, 0.0 };
_transformType = TransformType.Identity;
}
public Transform(double[] arr)
{
if (arr.Length != 6)
throw new ArgumentOutOfRangeException("arr");
_arr = arr;
Classify();
}
public Transform(double a, double b, double c, double d, double e, double f)
{
_arr = new double[6]{a, b, c, d, e, f};
Classify();
}
private void Classify()
{
// scale is [ sx 0 0 sy 0 0]
if (_arr[1] == 0.0 && _arr[2] == 0.0 && _arr[4] == 0.0 && _arr[5] == 0.0)
{
if (_arr[0] == _arr[3])
{
if (_arr[0] == 0.0)
_transformType = TransformType.Other;
else if (_arr[0] == 1.0)
{
_transformType = TransformType.Identity;
}
_transformType = TransformType.Scale;
}
else
{
_transformType = TransformType.NonUniformScale;
}
return;
}
// translation is [1 0 0 1 tx ty]
if (_arr[0] == 1.0 && _arr[3] == 1.0 && _arr[1] == 0.0 && _arr[2] == 0.0)
{
_transformType = TransformType.Translate;
return;
}
// rotation is [cos sin -sin cos 0 0]
if (_arr[4] == 0.0 && _arr[5] == 0.0 && _arr[0] == _arr[3] && _arr[1] == -_arr[2])
{
_transformType = TransformType.Rotate;
return;
}
// skew is [1 tan(x) tan(y) 1 0 0]
if (_arr[0] == 1.0 && _arr[3] == 1.0 && _arr[4] == 0.0 && _arr[5] == 0.0)
{
_transformType = TransformType.Skew;
}
}
public double[] Matrix { get { return _arr; } }
public double this[int i] { get { return _arr[i]; } set { _arr[i] = value; } }
public TransformType TransformType { get { Simplify(); return _transformType; } }
public void Transform(double x, double y, out double xprime, out double yprime)
{
xprime = _arr[0] * x + _arr[2] * y + _arr[4];
yprime = _arr[1] * x + _arr[3] * y + _arr[5];
}
public PointF Transform(PointF src)
{
double xp, yp;
Transform(src.X, src.Y, out xp, out yp);
return new PointF((float)xp, (float)yp);
}
public void Concat(Transform t)
{
double a, b, c, d, e, f;
double A, B, C, D, E, F;
a = t[0];
b = t[1];
c = t[2];
d = t[3];
e = t[4];
f = t[5];
A = _arr[0];
B = _arr[1];
C = _arr[2];
D = _arr[3];
E = _arr[4];
F = _arr[5];
_arr[0] = a*A + b*C;
_arr[1] = a*B + b*D;
_arr[2] = c*A + d*C;
_arr[3] = c*B + d*D;
_arr[4] = e*A + f*C;
_arr[5] = e*B + f*D;
Classify();
}
public static Transform Scale(double s) { return new Transform(s, 0.0, 0.0, s, 0.0, 0.0); }
public static Transform Scale(double x, double y) { return new Transform(x, 0.0, 0.0, y, 0.0, 0.0); }
public static Transform Translate(double x, double y) { return new Transform(1.0, 0.0, 0.0, 1.0, x, y); }
public static Transform Rotate(double theta)
{
double ct = Math.Cos(theta); double st = Math.Sin(theta);
return new Transform(ct, st, -st, ct, 0.0, 0.0);
}
public static Transform Skew(double x, double y)
{
return new Transform(1.0, Math.Tan(x), Math.Tan(y), 1.0, 0.0, 0.0);
}
// determinant of a 3x3 matrix
// in reality, our 2x3 matrix is really 3 x 3
// | A B 0 | | a b c |
// | C D 0 | -> | d e f |
// | E F 1 | | g h i |
//
// where a-f correspond to _arr[0]-_arr[5]
public double Determinant()
{
//double a, b, c, d, e, f, g, h, i;
//a = _arr[0];
//b = _arr[1];
//c = 0;
//d = _arr[2];
//e = _arr[3];
//f = 0;
//g = _arr[4];
//h = _arr[5];
//i = 1.0;
// return (a * e * i) + (b * f * g) + (c * d * h) - (a * f * h) - (b * d * i) - (c * e * g);
// simplify:
//return (a * e) - (b * d);
// simplify again:
return (_arr[0] * _arr[3]) - (_arr[1] * _arr[2]);
}
public bool IsInvertable() { return Determinant() != 0.0; }
public Transform GetInverse()
{
double det = Determinant();
if (det == 0.0)
throw new OverflowException("unable to calculate the inverse of the matrix - determinant is 0");
// this is the standard inversion process of a general 3x3 matrix, but
// since c and f are 0 and i is 1, we can eliminate a pile of terms
// by letting them drop out
// it is derived from getting the adjoint matrix and dividing each element
// by the determinant.
// Certain terms drop out natually and this code represents a simplification
// of the process that ignores terms that aren't used in the result.
double a, b, d, e, g, h;
a = _arr[0];
b = _arr[1];
//c = 0;
d = _arr[2];
e = _arr[3];
//f = 0;
g = _arr[4];
h = _arr[5];
//i = 0;
// call the more direct constructor, use fewer temps
return new Transform(new double[] {
e / det, -b / det,
-d / det, a / det,
(h*d - g*e) / det, -(h*a - g*b) / det });
//double A, B, C, D, E, F;
//A = e/det;
//B = -b/det;
//C = -d/det;
//D = a/det;
//E = (h*d - g*e)/det;
//F = -(h*a - g*b)/det;
//return new Transform(A, B, C, D, E, F);
}
}