Steve's Tech Talk : Limit Your Memoization, Please

Limit Your Memoization, Please

Memoization is a great pattern. We’ve known about it for years outside of functional programming. It is essentially a combination of a cache and a function pointer. For some input, x, you look up x in the cache and if it’s there, you return the found value. If not, you call the function on x, cache the result then return it. This is especially exciting when you look at F# where working with functions as data is natural.

Referring to Don Syme’s and Matthew Podwysocki’s blogs, you can find some canonical examples of memoization. Each of these, and the version in Chris Smith’s Programming F# book are based on Dictionary objects and end up looking like this:

    let memoize (f: 'a -> 'b)=

        let cache = new Dictionary<_, _>()

        fun x ->

            let ok, res = cache.TryGetValue(x)

            if ok then res

            else

                let res = f x

                cache.[x] <- res

                res

which is OK as far as demonstrating how memoization works for a CS 200 level class or demonstrating the power of using memoization on expensive functions. It is, however, a solidly poor choice when dealing with real world problems.

Therefore, I’m going to present a real-world problem that lends itself to memoization and is not something you would ever, ever want to ship to clients using the above pattern.

Since we’re in image processing, we care a lot about how our algorithms perform. Image processing is a funny domain because things that you might dismiss as cheap blow up when then need to be performing hundreds of millions of times – especially things that involve memory.

Consider this problem – I have a 24 bit continuous color image that I would like represented as an 8 bit paletted image, with a specific palette applied. To do this, I need a function that given a color gives me the best matching entry in that palette.

Working with Color objects can be a bit of a bummer and having to do abstract Palette look ups will also slow us down, so we’ll build an array of tuples that matches a palette:

    let inline tupledColor (c:Color) = ((int c.B), (int c.G), (int c.R))

   

    let paletteToTupleArray (sourcePalette:Palette) =

        let limit = sourcePalette.Colors-1

        let result:(int*int*int)[] = Array.zeroCreate sourcePalette.Colors

        for i in 0..limit do

            result.[i] <- sourcePalette.GetEntry(i) |> tupledColor

        result

I’m also promoting the byte values of the colors to ints since in .NET bytes and ints are internally 4 bytes *anyway* and the math with be less bogged down.

For matching two colors, we need the distance between them. For this problem, using Euclidean distance is good enough, but we’ll use the square of the distance to avoid square roots:

    let inline delta a b = a - b


    let inline colorDistanceSquared (b1, g1, r1) (b2, g2, r2) =

        let db = delta b1 b2

        let dg = delta g1 g2

        let dr = delta r1 r2

        db * db + dg * dg + dr * dr

and in this, you can see one of the reasons I like F# – inline functions. I can write my delta function and use it without the cost of a runtime function call and it makes the color distance function (itself inline) easier to read.

Now we have a function to look up the color. This is a tail-recursive helper function that returns the closest match. bestDist and bestIndex are accumulators of the best results so far, i is our current index, color is the color we’re searching for and colors is our palette. Note that I could eliminate the limit parameter and use colors.Length, but I’d rather use a local than a member. There’s probably a clever way to do with with fold or find, but I think this is good enough:

    let rec findBestColorLookup bestDist bestIndex i limit (color:int*int*int) (colors:(int*int*int)[]) =

        if i >= limit then bestIndex

        else

            let dist = colorDistanceSquared color colors.[i]

            if dist = 0 then i

            elif dist < bestDist then findBestColorLookup dist i (i + 1) limit color colors

            else findBestColorLookup bestDist bestIndex (i+1) limit color colors

Then our final findBestColor looks like this:

    let findBestColor (b, g, r) (colors:(int*int*int)[]) = findBestColorLookup worstDistance 0 0 colors.Length (b, g, r) colors

Now I can define a type to act as a color matcher for a given palette:

    type colorMatcher (sourcePalette:Palette) =

        let colors = paletteToTupleArray sourcePalette 

        member this.matchColor (b, g, r) =

            findBestColor (b, g, r) colors

which is nice, but will run like a dog, running the palette array. This is a perfect time to use memoizing to make it faster. Here’s the problem – in a 24 bit image, you will have a possible 2²⁴ colors to match. Even a moderate subset of that will be heinously expensive in a memoized version of this function. The expense comes from the cost of expanding the underlying hashtable every time you overload it. So let’s extend the memoization pattern to include a clamp to keep this from going berserk:

    let clampedMemoize (f: 'a -> 'b) limit =

        let cache = new Dictionary<_, _>()

        fun x ->

            let ok, res = cache.TryGetValue(x)

            if ok then res

            else

                let res = f x

                if cache.Count < limit then cache.[x] <- res

                res

which now will ignore the cache when you start crossing the limit. This lets us keep memory use from getting out of control. Now I can rewrite my color matcher with memoization:

    type colorMatcher (sourcePalette:Palette) =

        let colors = paletteToTupleArray sourcePalette 

        let memoFind = clampedMemoize (fun (b, g, r) -> findBestColor (b, g, r) colors) paletteCacheLimit

        member this.matchColor (b, g, r) =

            memoFind (b, g, r)

Now if I do some refactoring to allow partial function application I can neaten up the code a bit:

    let findBestColor (colors:(int*int*int)[]) (b, g, r) = findBestColorLookup worstDistance 0 0 colors.Length (b, g, r) colors


    type colorMatcher (sourcePalette:Palette) =

        let colors = paletteToTupleArray sourcePalette 

        let memoFind = clampedMemoize (findBestColor colors) paletteCacheLimit

        member this.matchColor (b, g, r) =

            memoFind (b, g, r)

one thing is left to investigate – we’re hashing a tuple. We don’t have to. We could rebind those bits into a single int and use that as a key in our dictionary. This is a way of saying, “hey, I know more about this problem domain than you do, so let me do the hashing. Here’s a clamped memoizer with an outboard hash:

    let clampedMemoizeWithHasher (f: 'a -> 'b) (h: 'a -> int) limit =

        let cache = new Dictionary<int, _>()

        fun x ->

            let hash = h x

            let ok, res = cache.TryGetValue(hash)

            if ok then res

            else

                let res = f x

                if cache.Count < limit then cache.[hash] <- res

                res

At this point as well, this type of solution might benefit from having the hashed color put into a balanced binary tree instead of hashing.

So the lesson in this (and in many CS problems) is that you need to really understand the specific domain of your problem before applying a sweeping technique like memoization.

Published Monday, July 26, 2010 4:18 PM by Steve Hawley

Comments

Monday, July 26, 2010 5:05 PM by Twitter Trackbacks for Steve's Tech Talk : Limit Your Memoization, Please [atalasoft.com] on Topsy.com

# Twitter Trackbacks for Steve's Tech Talk : Limit Your Memoization, Please [atalasoft.com] on Topsy.com

PingBack from http://topsy.com/www.atalasoft.com/cs/blogs/stevehawley/archive/2010/07/26/limit-your-memoization-please.aspx?utm_source=pingback&utm_campaign=L2

Friday, July 30, 2010 3:05 PM by Rick Minerich's Development Wonderland

# F# Discoveries This Week 07/30/2010

It’s Friday and once again I’ve gathered up more F# content for your viewing pleasure.  If you happen

Anonymous comments are disabled

Steve's Tech Talk

This Blog

Syndication

Search

Navigation

Tags

Recent Posts

Archives

Limit Your Memoization, Please

Comments

# Twitter Trackbacks for Steve's Tech Talk : Limit Your Memoization, Please [atalasoft.com] on Topsy.com

# F# Discoveries This Week 07/30/2010