TIFF: Where Theory and Practice Collide
This is an article about trivia surrounding the TIFF file format and what happens when widely available applications don’t behave well.
TIFF is a container format designed to house images compressed by various means, along with metadata about those images. It can represent many different data formats and data types. The problem is that the specification is very detailed, and there may be dozens of ways to represent the same image. To start with, data may be stored in either little-endian or big-endian byte order, which instantly doubles the number of ways an image can be stored.
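As a small illustration of the byte-order point, here is a minimal sketch that inspects the first four bytes of a classic TIFF file. It relies only on the header layout from the TIFF 6.0 specification ("II" or "MM", followed by the number 42 in that byte order). The class and method names are made up for this example and are not part of DotImage.

```csharp
using System;

static class TiffHeader
{
    // Hypothetical helper: returns true for little-endian ("II") files and
    // false for big-endian ("MM") files. Throws if the bytes are not a
    // classic TIFF header.
    public static bool IsLittleEndian(byte[] header)
    {
        if (header == null || header.Length < 4)
            throw new ArgumentException("Need at least 4 bytes.", "header");

        bool little = header[0] == (byte)'I' && header[1] == (byte)'I';
        bool big = header[0] == (byte)'M' && header[1] == (byte)'M';
        if (!little && !big)
            throw new FormatException("Not a TIFF byte-order mark.");

        // The magic number 42 is stored in the byte order just announced.
        int magic = little
            ? header[2] | (header[3] << 8)
            : (header[2] << 8) | header[3];
        if (magic != 42)
            throw new FormatException("Missing TIFF magic number 42.");

        return little;
    }
}
```

Every multi-byte value read after the header has to honor this flag, which is why a reader ends up with two code paths for nearly everything.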
One particular variation within a TIFF is the photometric interpretation of image data: in short, whether the minimum value for a pixel is black or white. It turns out that the Windows Picture and Fax Viewer has a bug that produces an egregious error when a multipage TIFF contains black and white images and at least one page uses a “min is black” representation. It displays all pages inverted, whether or not they are “min is black.”
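To see which interpretation each page of a file declares, you can walk the file's image directories and read the PhotometricInterpretation tag (tag 262; 0 means WhiteIsZero, 1 means BlackIsZero). The following is a rough sketch under a few assumptions: it handles classic TIFF only (not BigTIFF), ignores sub-IFDs, and treats every top-level directory as a page. The names are hypothetical and it does not use DotImage.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class TiffPhotometric
{
    const int PhotometricInterpretationTag = 262;

    // Returns one value per top-level IFD: 0 = WhiteIsZero, 1 = BlackIsZero,
    // other values for color images, or -1 if the tag is absent.
    public static List<int> ReadPerPage(Stream s)
    {
        byte[] header = ReadAt(s, 0, 8);
        bool little;
        if (header[0] == 'I' && header[1] == 'I') little = true;
        else if (header[0] == 'M' && header[1] == 'M') little = false;
        else throw new FormatException("Not a TIFF byte-order mark.");
        if (U16(header, 2, little) != 42)
            throw new FormatException("Missing TIFF magic number 42.");

        List<int> pages = new List<int>();
        HashSet<long> seen = new HashSet<long>();
        long ifd = U32(header, 4, little);
        while (ifd != 0)
        {
            // Guard against malformed files whose IFD chain is circular.
            if (!seen.Add(ifd))
                throw new FormatException("IFD chain loops back on itself.");

            // An IFD is a 2-byte entry count, 12-byte entries, then a
            // 4-byte offset to the next IFD (0 marks the last one).
            int count = U16(ReadAt(s, ifd, 2), 0, little);
            byte[] body = ReadAt(s, ifd + 2, count * 12 + 4);
            int photometric = -1;
            for (int i = 0; i < count; i++)
            {
                int entry = i * 12;
                // The tag is a single SHORT, stored in the first two
                // bytes of the entry's 4-byte value field.
                if (U16(body, entry, little) == PhotometricInterpretationTag)
                    photometric = U16(body, entry + 8, little);
            }
            pages.Add(photometric);
            ifd = U32(body, count * 12, little);
        }
        return pages;
    }

    static byte[] ReadAt(Stream s, long offset, int length)
    {
        byte[] buffer = new byte[length];
        s.Seek(offset, SeekOrigin.Begin);
        int got = 0;
        while (got < length)
        {
            int n = s.Read(buffer, got, length - got);
            if (n <= 0) throw new EndOfStreamException();
            got += n;
        }
        return buffer;
    }

    static int U16(byte[] b, int i, bool little)
    {
        return little ? b[i] | (b[i + 1] << 8) : (b[i] << 8) | b[i + 1];
    }

    static long U32(byte[] b, int i, bool little)
    {
        return little
            ? b[i] | ((long)b[i + 1] << 8) | ((long)b[i + 2] << 16) | ((long)b[i + 3] << 24)
            : ((long)b[i] << 24) | ((long)b[i + 1] << 16) | ((long)b[i + 2] << 8) | b[i + 3];
    }
}
```

Passing a FileStream opened on a multipage file returns one entry per page. A list that contains any 1 for a black and white document describes the situation that trips up the viewer.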
I created a test app that looks like this:
AtalaImage blackIsZero = GetBlackIsZeroImage();
AtalaImage whiteIsZero = GetWhiteIsZeroImage();
SaveIntoTiff(whiteIsZero, blackIsZero, "nonnormalized.tif");
NormalizeToMinIsWhite(whiteIsZero);
NormalizeToMinIsWhite(blackIsZero);
SaveIntoTiff(whiteIsZero, blackIsZero, "minwhitenormalized.tif");
NormalizeToMinIsBlack(whiteIsZero);
NormalizeToMinIsBlack(blackIsZero);
SaveIntoTiff(whiteIsZero, blackIsZero, "minblacknormalized.tif");
GetBlackIsZeroImage() and GetWhiteIsZeroImage() both return the same image, just with opposite senses of the photometric interpretation of black and white. In this case, the image should be a white square with a black cross on it, and each file should represent two pages of the same image. DotImage opens and displays every page in every file correctly, but if you open nonnormalized.tif or minblacknormalized.tif with the Picture and Fax Viewer, it will display all pages as a black background with a white cross. In the case of nonnormalized.tif, the order in which the pages are added doesn’t matter; the result is the same either way: two incorrect pages.
It would be nice to be able to take the high road and simply say, “we’re right, Microsoft is wrong.” The problem is that the Picture and Fax Viewer is installed on every machine with Windows XP and DotImage is not. Instead, we have to take the pragmatic view: if you generate TIFF files with black and white pages, we recommend that you encode the pages in min-is-white format for maximum compatibility. This is sad, because the TIFF spec is very flexible in allowing document producers to choose the format that is easiest for them. The onus is on the consumer of TIFF data to interpret it correctly. This is a very hard task (after all, we do it), and I don’t harbor any anger toward Microsoft for not getting it right in all cases, but it is frustrating to have to compensate for it in a very common case.
Therefore, here is code to “correct” a 1-bit black and white image so that it is min-is-white:
public static void NormalizeToMinIsWhite(AtalaImage image)
{
    Normalize(image, true);
}

// Counterpart called by the test app above; presumably the same
// routine with the flag reversed.
public static void NormalizeToMinIsBlack(AtalaImage image)
{
    Normalize(image, false);
}

static void Normalize(AtalaImage image, bool minIsWhite)
{
    if (image == null)
        throw new ArgumentNullException("image");

    // Only 1-bit images have a min-is-white / min-is-black sense to fix.
    if (image.PixelFormat != PixelFormat.Pixel1bppIndexed)
        return;

    // Nothing to do if the image already has the requested sense.
    bool isWhiteZero = IsWhiteIsZero(image);
    if (minIsWhite == isWhiteZero)
        return;

    InvertCommand invert = new InvertCommand();
    if (!invert.InPlaceProcessing)
        throw new InvalidOperationException("This should never happen; we're really depending on this being in place processing.");

    // Flip the pixel data, then swap the palette entries so the image
    // still looks the same.
    invert.Apply(image);
    Color col0 = image.Palette.GetEntry(0);
    image.Palette.SetEntry(0, image.Palette.GetEntry(1));
    image.Palette.SetEntry(1, col0);
}

static bool IsWhiteIsZero(AtalaImage image)
{
    // sleazy - get the luminance of the palette 0 and 1 and compare, brighter is more white than black
    int luma0 = GetLuminance(image, 0);
    int luma1 = GetLuminance(image, 1);
    return luma0 > luma1;
}

static int GetLuminance(AtalaImage image, int paletteIndex)
{
    Color c = image.Palette.GetEntry(paletteIndex);
    return (int)(c.R * 0.3 + c.G * 0.59 + c.B * 0.11);
}
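NormalizeToMinIsWhite() modifies the image it is given: when the image is not already min-is-white, it inverts the pixel data in place and swaps the two palette entries. Use it cautiously if other code holds a reference to the same image.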
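To make it concrete why inverting the pixels and swapping the palette leaves the picture looking the same, here is a small stand-alone sketch that works on raw packed bits instead of an AtalaImage. It assumes most-significant-bit-first packing (the TIFF default fill order) and uses characters in place of palette colors. Render and InvertAndSwap are hypothetical names for this illustration, not DotImage calls.

```csharp
using System;

static class OneBitDemo
{
    // Render one row of packed 1-bit pixels (most significant bit first)
    // through a two-entry palette of display characters.
    public static string Render(byte[] packed, int width, char[] palette)
    {
        char[] row = new char[width];
        for (int x = 0; x < width; x++)
        {
            int bit = (packed[x / 8] >> (7 - x % 8)) & 1;
            row[x] = palette[bit];
        }
        return new string(row);
    }

    // Flip every stored bit and exchange the two palette entries, in place.
    // The stored representation changes; the rendered result does not.
    public static void InvertAndSwap(byte[] packed, char[] palette)
    {
        for (int i = 0; i < packed.Length; i++)
            packed[i] = (byte)(packed[i] ^ 0xFF);
        char tmp = palette[0];
        palette[0] = palette[1];
        palette[1] = tmp;
    }
}
```

With a palette of { '.', '#' } (index 0 is white) and the single byte 0xE7, Render produces "###..###". After InvertAndSwap the byte is 0x18 and the palette is { '#', '.' }, and Render produces the same string. Only the stored sense of zero has changed, which is exactly what a reader that ignores the photometric interpretation gets wrong.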