Limit Your Memoization, Please
Memoization is a great pattern. We’ve known about it for years outside of functional programming. It is essentially a combination of a cache and a function pointer. For some input, x, you look up x in the cache and if it’s there, you return the found value. If not, you call the function on x, cache the result then return it. This is especially exciting when you look at F# where working with functions as data is natural.
Referring to Don Syme’s and Matthew Podwysocki’s blogs, you can find some canonical examples of memoization. Each of these, and the version in Chris Smith’s Programming F# book are based on Dictionary objects and end up looking like this:
let memoize (f: 'a -> 'b)=
let cache = new Dictionary<_, _>()
fun x ->
let ok, res = cache.TryGetValue(x)
if ok then res
else
let res = f x
cache.[x] <- res
res
which is OK as far as demonstrating how memoization works for a CS 200 level class or demonstrating the power of using memoization on expensive functions. It is, however, a solidly poor choice when dealing with real world problems.
Therefore, I’m going to present a real-world problem that lends itself to memoization and is not something you would ever, ever want to ship to clients using the above pattern.
Since we’re in image processing, we care a lot about how our algorithms perform. Image processing is a funny domain because things that you might dismiss as cheap blow up when then need to be performing hundreds of millions of times – especially things that involve memory.
Consider this problem – I have a 24 bit continuous color image that I would like represented as an 8 bit paletted image, with a specific palette applied. To do this, I need a function that given a color gives me the best matching entry in that palette.
Working with Color objects can be a bit of a bummer and having to do abstract Palette look ups will also slow us down, so we’ll build an array of tuples that matches a palette:
let inline tupledColor (c:Color) = ((int c.B), (int c.G), (int c.R))
let paletteToTupleArray (sourcePalette:Palette) =
let limit = sourcePalette.Colors-1
let result:(int*int*int)[] = Array.zeroCreate sourcePalette.Colors
for i in 0..limit do
result.[i] <- sourcePalette.GetEntry(i) |> tupledColor
result
I’m also promoting the byte values of the colors to ints since in .NET bytes and ints are internally 4 bytes *anyway* and the math with be less bogged down.
For matching two colors, we need the distance between them. For this problem, using Euclidean distance is good enough, but we’ll use the square of the distance to avoid square roots:
let inline delta a b = a - b
let inline colorDistanceSquared (b1, g1, r1) (b2, g2, r2) =
let db = delta b1 b2
let dg = delta g1 g2
let dr = delta r1 r2
db * db + dg * dg + dr * dr
and in this, you can see one of the reasons I like F# – inline functions. I can write my delta function and use it without the cost of a runtime function call and it makes the color distance function (itself inline) easier to read.
Now we have a function to look up the color. This is a tail-recursive helper function that returns the closest match. bestDist and bestIndex are accumulators of the best results so far, i is our current index, color is the color we’re searching for and colors is our palette. Note that I could eliminate the limit parameter and use colors.Length, but I’d rather use a local than a member. There’s probably a clever way to do with with fold or find, but I think this is good enough:
let rec findBestColorLookup bestDist bestIndex i limit (color:int*int*int) (colors:(int*int*int)[]) =
if i >= limit then bestIndex
else
let dist = colorDistanceSquared color colors.[i]
if dist = 0 then i
elif dist < bestDist then findBestColorLookup dist i (i + 1) limit color colors
else findBestColorLookup bestDist bestIndex (i+1) limit color colors
Then our final findBestColor looks like this:
let findBestColor (b, g, r) (colors:(int*int*int)[]) = findBestColorLookup worstDistance 0 0 colors.Length (b, g, r) colors
Now I can define a type to act as a color matcher for a given palette:
type colorMatcher (sourcePalette:Palette) =
let colors = paletteToTupleArray sourcePalette
member this.matchColor (b, g, r) =
findBestColor (b, g, r) colors
which is nice, but will run like a dog, running the palette array. This is a perfect time to use memoizing to make it faster. Here’s the problem – in a 24 bit image, you will have a possible 224 colors to match. Even a moderate subset of that will be heinously expensive in a memoized version of this function. The expense comes from the cost of expanding the underlying hashtable every time you overload it. So let’s extend the memoization pattern to include a clamp to keep this from going berserk:
let clampedMemoize (f: 'a -> 'b) limit =
let cache = new Dictionary<_, _>()
fun x ->
let ok, res = cache.TryGetValue(x)
if ok then res
else
let res = f x
if cache.Count < limit then cache.[x] <- res
res
which now will ignore the cache when you start crossing the limit. This lets us keep memory use from getting out of control. Now I can rewrite my color matcher with memoization:
type colorMatcher (sourcePalette:Palette) =
let colors = paletteToTupleArray sourcePalette
let memoFind = clampedMemoize (fun (b, g, r) -> findBestColor (b, g, r) colors) paletteCacheLimit
member this.matchColor (b, g, r) =
memoFind (b, g, r)
Now if I do some refactoring to allow partial function application I can neaten up the code a bit:
let findBestColor (colors:(int*int*int)[]) (b, g, r) = findBestColorLookup worstDistance 0 0 colors.Length (b, g, r) colors
type colorMatcher (sourcePalette:Palette) =
let colors = paletteToTupleArray sourcePalette
let memoFind = clampedMemoize (findBestColor colors) paletteCacheLimit
member this.matchColor (b, g, r) =
memoFind (b, g, r)
one thing is left to investigate – we’re hashing a tuple. We don’t have to. We could rebind those bits into a single int and use that as a key in our dictionary. This is a way of saying, “hey, I know more about this problem domain than you do, so let me do the hashing. Here’s a clamped memoizer with an outboard hash:
let clampedMemoizeWithHasher (f: 'a -> 'b) (h: 'a -> int) limit =
let cache = new Dictionary<int, _>()
fun x ->
let hash = h x
let ok, res = cache.TryGetValue(hash)
if ok then res
else
let res = f x
if cache.Count < limit then cache.[hash] <- res
res
At this point as well, this type of solution might benefit from having the hashed color put into a balanced binary tree instead of hashing.
So the lesson in this (and in many CS problems) is that you need to really understand the specific domain of your problem before applying a sweeping technique like memoization.