Steve's Tech Talk : Building Pure Managed DotImage

We recently released DotImage 10, and if you’ve been following us for a while, you know that we are committed to building the best .NET imaging components. Since I started at Atalasoft, I have been looking at how to move as much of our internals as possible into entirely managed code. From the very beginning, it was a daunting task: I estimated that the entire project would take around three person-years of engineering time. That raises the question, “is it really worth it?”

On the downside:

Managed code runs, on average, 1.5x slower than unmanaged code. In image processing, this time stacks up quickly: operations are routinely repeated billions of times, so we really have to keep an eye on costs.
Translating to managed code appears to add little value, since the code appears to do the same thing and no features are added.

On the plus side:

DotImage will run on the client in Silverlight applications, as well as in hosted .NET environments where unmanaged code isn’t allowed
Managed code is far more stable; array bounds checking alone is a big win
Managed code is simpler to author and simpler to deploy
Managed code is future-proof across different processors and operating systems
Managed code is easier to scale across multiple cores/CPUs

How did we do this? I’ve been playing a chess game with our API over the past five years, carefully and strategically refactoring our image processing code to use slightly different abstractions. In DotImage 4.0, I refactored the ImageCommand class to turn image processing into a boilerplate task. In DotImage 5.0, I added the PixelAccessor and PixelMemory abstractions to abstract away how pixel memory is addressed. In DotImage 9.0, I fully deprecated the ImageData IntPtr for accessing PixelMemory (I think this is the first API that I ever “broke” on purpose in DotImage). Also in DotImage 9.0, I added the notion of locking and unlocking PixelMemory before accessing it, and formalized how memory is allocated. All of these steps were necessary to provide a platform base that was ready to be built on a managed platform.
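The lock-before-access rule is easy to illustrate outside DotImage. The following is a minimal F# sketch, not DotImage's API: `ManagedPixelMemory`, `Lock`, `GetScanline`, `SetScanline` and `LockCount` are hypothetical names. Pixels live in a managed `byte[]`, so no `IntPtr` ever escapes, and `Lock` returns an `IDisposable` so that an F# `use` binding guarantees the matching unlock.

```fsharp
open System

/// Hypothetical managed pixel store (not DotImage's PixelMemory): pixels live
/// in a byte[] and must be locked before any scanline is read or written.
type ManagedPixelMemory(rowStride: int, height: int) =
    let pixels : byte[] = Array.zeroCreate (rowStride * height)
    let mutable lockCount = 0

    // Fail fast on any access made without holding a lock.
    let checkLocked () =
        if lockCount = 0 then invalidOp "PixelMemory must be locked before access"

    /// Number of outstanding locks.
    member _.LockCount = lockCount

    /// Lock the memory; disposing the returned token unlocks it exactly once.
    member _.Lock() : IDisposable =
        lockCount <- lockCount + 1
        let released = ref false
        { new IDisposable with
            member _.Dispose() =
                // Guard against double-dispose so the count stays balanced.
                if not released.Value then
                    released.Value <- true
                    lockCount <- lockCount - 1 }

    /// Copy scanline y into dest.
    member _.GetScanline(y: int, dest: byte[]) =
        checkLocked ()
        Array.blit pixels (y * rowStride) dest 0 rowStride

    /// Overwrite scanline y with src.
    member _.SetScanline(y: int, src: byte[]) =
        checkLocked ()
        Array.blit src 0 pixels (y * rowStride) rowStride
```

A caller writes `use token = mem.Lock()` at the top of a scope and then reads or writes scanlines; the unlock happens when the scope ends, even if an exception is thrown.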

These refactorings laid the groundwork. The next step was to apply a set of porting strategies. My goal was to reuse as much code as possible in the port. That meant using our regular C# code as-is (a slam dunk), removing any unsafe code, porting C/C++ (in some cases choosing to use the new code in both the managed and unmanaged ports), adapting APIs where possible (Silverlight doesn’t have System.Drawing, which means no Rectangle, Point, Size or Color objects), writing unit tests to ensure that the output matches, running benchmarks to find and eliminate bottlenecks, and so on.
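Adapting APIs for Silverlight mostly means supplying small replacements for the missing System.Drawing value types. Here is a minimal sketch of one such replacement, assuming only integer rectangles are needed. `Rect`, `Intersect` and `Contains` are hypothetical names, and this is not necessarily how DotImage handles it.

```fsharp
/// Hypothetical stand-in for System.Drawing.Rectangle on platforms
/// (such as Silverlight) that lack System.Drawing.
[<Struct>]
type Rect =
    { X: int; Y: int; Width: int; Height: int }

    member r.Right = r.X + r.Width
    member r.Bottom = r.Y + r.Height
    member r.IsEmpty = r.Width <= 0 || r.Height <= 0

    static member Empty = { X = 0; Y = 0; Width = 0; Height = 0 }

    /// Overlapping area of two rectangles, or Empty if they don't overlap.
    static member Intersect(a: Rect, b: Rect) =
        let x = max a.X b.X
        let y = max a.Y b.Y
        let right = min a.Right b.Right
        let bottom = min a.Bottom b.Bottom
        if right > x && bottom > y then
            { X = x; Y = y; Width = right - x; Height = bottom - y }
        else
            Rect.Empty

    /// True if the point lies inside; right and bottom edges are exclusive.
    member r.Contains(px: int, py: int) =
        px >= r.X && px < r.Right && py >= r.Y && py < r.Bottom
```

Making it a struct record keeps it a cheap value type, like the System.Drawing original, and gives structural equality for free, which is convenient in unit tests.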

Finally, there is a new secret weapon in our arsenal. A fair amount of DotImage 10 is written in F#.

Yes, you read that right. When Rick Minerich was working here, he became an F# MVP, and he is a big proponent of F#. I evaluated it and found that for some tasks it performed better than C# while maintaining suitable readability, and that it lent itself better to certain algorithms (octree-based color quantization stands out). It was far from perfect. For example, I wrote a very straightforward LRU scanline cache in F# based on F#'s library types and found that the cache was bottlenecking in the generic collection classes I was using. Nothing within my control would make those classes perform better, so I reimplemented the cache in side-effect-heavy C#. After that, the cost of cache operations vanished from my benchmarks, which is how it should be.
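The fix described above was written in C#; to keep the examples in one language, here is the same kind of mutable design sketched in F#. It is an assumption about the general approach, not the DotImage cache: `ScanlineCache`, `GetRow` and the `loadRow` callback are hypothetical. A `Dictionary` maps row indices to nodes in a `LinkedList` ordered from most to least recently used, so lookups, promotions and evictions are all constant-time, and a hit only relinks an existing node.

```fsharp
open System.Collections.Generic

/// Hypothetical LRU cache of scanlines keyed by row index (a sketch, not DotImage code).
type ScanlineCache(capacity: int, loadRow: int -> byte[]) =
    do if capacity < 1 then invalidArg "capacity" "capacity must be at least 1"
    // Most recently used rows sit at the front of the list.
    let order = LinkedList<int * byte[]>()
    let index = Dictionary<int, LinkedListNode<int * byte[]>>()

    member _.Count = index.Count

    /// Return row y, loading it on a miss and evicting the least recently used row when full.
    member _.GetRow(y: int) : byte[] =
        match index.TryGetValue(y) with
        | true, node ->
            // Hit: move the existing node to the front.
            order.Remove(node)
            order.AddFirst(node)
            snd node.Value
        | _ ->
            if index.Count >= capacity then
                // Evict the least recently used row from both structures.
                let lru = order.Last
                order.RemoveLast()
                index.Remove(fst lru.Value) |> ignore
            let row = loadRow y
            let node = order.AddFirst((y, row))
            index.[y] <- node
            row
```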

On the good side, we were able to heavily leverage inline functions in F# to get code like this:

    // Given a palette, build a lookup table mapping each palette index to
    // its 16-bit gray (luminance) value, stored as [| low byte; high byte |]
    let luminance16ArrayFromPalette (p:Palette) =
        [|
            for i in 0..(p.Colors - 1) do
                let lum = (byte (p.GetEntry(i) |> tupledColor |> luminance)) |> eightBitTo16Bits
                yield [| loByte lum; hiByte lum |]
        |]

Given a Palette, this code gives me back a lookup table of 16-bit gray equivalents. tupledColor converts a Color object into a tuple, luminance calculates the luminance value of a tupled color, and eightBitTo16Bits converts an 8-bit pixel value into a 16-bit pixel value. loByte and hiByte do exactly what their names say. Since each of these is an inline function, the F# optimizer can actually do something useful with the code. In my experience so far, the C# optimizer doesn’t do much, if anything.

So why do we care about this? It’s that lurking 1.5x managed code cost. In my measurements, C# compiled to IL and then to the target CPU does about 1.5x the work of C++ compiled directly to the target CPU. Quite honestly, for a language targeting a virtual machine, this is a very low cost. With F#, we were able to address this cost using inlining, code profiling, scanline caching, memoization and other techniques. In many cases we ended up with code that ran as fast as the equivalent C++ code, and in some cases faster.
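The post doesn't show the helper functions, so the following sketch supplies hypothetical definitions that let the example run on its own; DotImage's versions may differ. It assumes colors are `(r, g, b)` byte tuples (standing in for the output of `tupledColor`), approximates luminance with integer Rec. 601 weights (77, 150 and 29, which sum to 256), and widens 8-bit values to 16 bits by multiplying by 257, so that 0 maps to 0 and 255 maps to 65535.

```fsharp
// Hypothetical helper definitions; DotImage's own versions may differ.

/// Integer approximation of Rec. 601 luma; the weights sum to 256.
let inline luminance (r: byte, g: byte, b: byte) =
    (77 * int r + 150 * int g + 29 * int b) >>> 8

/// Widen an 8-bit sample to 16 bits by replicating it (0 -> 0, 255 -> 65535).
let inline eightBitTo16Bits (v: byte) = uint16 v * 257us

let inline loByte (v: uint16) = byte (v &&& 0xFFus)
let inline hiByte (v: uint16) = byte (v >>> 8)

/// Lookup table: palette index -> 16-bit gray, stored as [| low byte; high byte |].
let luminance16ArrayFromColors (colors: (byte * byte * byte)[]) =
    [|
        for c in colors do
            let lum = byte (luminance c) |> eightBitTo16Bits
            yield [| loByte lum; hiByte lum |]
    |]
```

Because every helper is `inline`, the compiler can expand them directly into the comprehension instead of emitting a call per palette entry.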

You have to be aware of costs. For example, when converting an image from one pixel format to another, you might think that the right way is to write a generic pixel sequence that, given a function, maps source pixel format values to destination pixel format values. Then you just write 132 tiny functions that get selected for passing in. The problem is that the cost of the sequence and the cost of the repeated function application are too high. By too high, I mean that they show up in the benchmarks at all. For most operations, I found that working on a scanline basis was the appropriate approach. So a typical pixel format transform looks like this:

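    // width and yLimit are not defined in this excerpt; assumed here to cover the whole image
    let width = sourceImage.Width
    let yLimit = sourceImage.Height - 1
    let destImage = new AtalaImage(width, sourceImage.Height, targetPixelFormat)
    // one reusable buffer for the current source scanline
    let (srcRow:byte[]) = Array.zeroCreate(sourceImage.RowStride)
    // row-level transform chosen for this source/destination format pair
    let (rowTransform:PixelTransformFunction) = this.getPixelTransform sourceImage destImage
    use paSrc = sourceImage.PixelMemory.AcquirePixelAccessor()
    use paDst = destImage.PixelMemory.AcquirePixelAccessor()
    for y in 0..yLimit do
        paSrc.GetReadOnlyScanline(y, srcRow)
        let destRow = paDst.AcquireNextScanline()
        rowTransform srcRow destRow width
    destImage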

rowTransform is a function that operates on a source row of bytes, a destination row of bytes and the width of the row in pixels. getPixelTransform is a match expression that uses partial function application to return a transform function with a uniform signature.
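To make the cost argument concrete, here are the two styles side by side for an 8-bit gray to 24-bit BGR row. Both are illustrative sketches with hypothetical names, not DotImage code. The per-pixel version pays for a sequence step, a closure call and a small array allocation for every pixel; the scanline version is a single loop writing straight into the destination buffer.

```fsharp
// Two ways to express an 8-bit gray -> 24-bit BGR conversion (sketches, not DotImage code).

/// Generic per-pixel style: a pixel sequence plus a mapping function.
/// Each pixel costs a sequence step, a closure call and an array allocation.
let convertPerPixel (mapPixel: byte -> byte[]) (src: byte[]) : byte[] =
    src |> Seq.collect mapPixel |> Seq.toArray

/// Scanline style: one loop over the row, writing directly into the destination.
let grayRowToBgr (srcRow: byte[]) (destRow: byte[]) (width: int) =
    for x in 0 .. width - 1 do
        let v = srcRow.[x]
        let d = x * 3
        destRow.[d] <- v       // blue
        destRow.[d + 1] <- v   // green
        destRow.[d + 2] <- v   // red
```

Both produce identical bytes, which is exactly the kind of output-matching check a unit test can make when one implementation replaces another.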

For example, to transform an 8-bit gray row to a 24-bit color row, I used the following function:

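    // colors is a lookup table: colors.[g] holds the destination bytes for gray value g
    member private this.grayToBgr (colors:byte[][]) (srcRow:byte[]) (destRow:byte[]) (width:int) =
        let bytesPerPixel = colors.[0].Length
        for x in 0..(width-1) do
            let destX = x * bytesPerPixel
            // copy the table entry for this gray pixel into the destination row
            Array.blit colors.[int srcRow.[x]] 0 destRow destX bytesPerPixel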

You’ll note that this function has more arguments than I described. That’s OK, because in getPixelTransform this function is bound to the color lookup table as its first argument, leaving a function with the signature that we want. Because the number of bytes per pixel comes from the table entries, the same function also works for gray to BGRA, gray to 16-bit-per-channel BGR and gray to 16-bit-per-channel BGRA.
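Here is a minimal sketch of how a match plus partial application can yield uniform row transforms. It is not DotImage's getPixelTransform: it dispatches on a hypothetical `Format` union rather than on images, and `PixelTransformFunction` is assumed to abbreviate `byte[] -> byte[] -> int -> unit`. The 256-entry tables hold the destination bytes for each gray value, so the same `grayToBgr` serves both 3-byte and 4-byte outputs.

```fsharp
/// Assumed row-transform signature: source row, destination row, width in pixels.
type PixelTransformFunction = byte[] -> byte[] -> int -> unit

/// Hypothetical format tags; DotImage's real PixelFormat enum is not used here.
type Format = Gray8 | Bgr24 | Bgra32

/// Table-driven row copy: colors.[g] holds the destination bytes for gray value g.
let grayToBgr (colors: byte[][]) (srcRow: byte[]) (destRow: byte[]) (width: int) =
    let bytesPerPixel = colors.[0].Length
    for x in 0 .. width - 1 do
        Array.blit colors.[int srcRow.[x]] 0 destRow (x * bytesPerPixel) bytesPerPixel

/// 256-entry lookup tables for 8-bit gray -> BGR and -> BGRA (opaque alpha).
let grayToBgrTable = Array.init 256 (fun v -> let b = byte v in [| b; b; b |])
let grayToBgraTable = Array.init 256 (fun v -> let b = byte v in [| b; b; b; 255uy |])

/// Pick a transform for a format pair. Partial application binds the table,
/// leaving a function with the uniform PixelTransformFunction signature.
let getPixelTransform (src: Format) (dest: Format) : PixelTransformFunction =
    match src, dest with
    | Gray8, Bgr24 -> grayToBgr grayToBgrTable
    | Gray8, Bgra32 -> grayToBgr grayToBgraTable
    | _ -> failwithf "no transform from %A to %A" src dest
```

Adding another gray target is then a matter of building one more table and one more match case; the row loop itself never changes.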

I’m sure F# experts will cringe at my code (loops instead of sequences or recursion), but sometimes a loop is really just a loop no matter how you dress it up. Where things got really nice was in pushing the functional side of the language, such as memoization; I blogged earlier about the importance of limiting memoization.
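The earlier post isn't reproduced here. As an assumption about what limiting memoization can look like, here is a sketch that caps the number of cached results: once `maxEntries` results are stored, further inputs are computed but not kept, so memory cannot grow without bound. `memoizeBounded` is a hypothetical name.

```fsharp
open System.Collections.Generic

/// Hypothetical bounded memoizer: caches at most maxEntries results.
/// Inputs seen after the cap is reached are computed but not stored.
let memoizeBounded (maxEntries: int) (f: 'a -> 'b) : 'a -> 'b =
    let cache = Dictionary<'a, 'b>()
    fun x ->
        match cache.TryGetValue(x) with
        | true, v -> v
        | _ ->
            let v = f x
            if cache.Count < maxEntries then cache.[x] <- v
            v
```

A cap this simple keeps whatever arrived first; an LRU policy like the scanline cache's would be the natural refinement when the working set shifts over time.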

At this point, I will issue a formal apology for all the arguments I had with my CS professors who pushed functional programming heavily. I pushed back. Hard. The main reason I was so stubborn about FP was not that I didn’t understand the techniques or the potential gains in code reliability. The ends simply didn’t justify the cost of the means in my mind. I now believe that the technologies have finally leveled out, and that in a statically typed functional language you are on even or better ground than in an imperative language.

This is not to say that we didn’t have issues with F#. I found several compiler bugs, for which we got quick workarounds from Don Syme and his team. I also ran into some interesting .NET interoperability challenges, but in the end I was able to meet one of my prime rules for picking F#: any object written in F# should have method signatures identical to its C# equivalent, so that our customers don’t need to know or care which .NET language was used under the hood. The code should work, it should work well, and there should be no surprises.
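Here is a sketch of what a C#-friendly F# surface can look like, using a hypothetical `GrayLookup` class. Public members take tupled parameters (the form the F# component design guidelines recommend for APIs consumed from other .NET languages), use .NET arrays rather than F# lists, and return `unit` where C# expects `void`, so the compiled signatures match what a C# author would write.

```fsharp
/// Hypothetical F# class whose public surface mirrors a C# equivalent.
type GrayLookup(entries: byte[]) =
    do if entries.Length <> 256 then invalidArg "entries" "expected 256 entries"

    /// Seen from C# as: byte Map(byte value)
    member _.Map(value: byte) : byte = entries.[int value]

    /// Seen from C# as: void MapRow(byte[] src, byte[] dest, int width)
    member _.MapRow(src: byte[], dest: byte[], width: int) : unit =
        for x in 0 .. width - 1 do
            dest.[x] <- entries.[int src.[x]]
```

Curried functions, F# lists and options can still be used freely inside the implementation; the rule only constrains what crosses the public boundary.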

This release is just the beginning. DotImage 10 is great and it will be getting better. Trust me.

Published Monday, August 01, 2011 1:35 PM by Steve Hawley

Comments

Wednesday, August 17, 2011 10:10 PM by RickM

# re: Building Pure Managed DotImage

Hi Steve,

I'm glad to hear that F# worked out well for you in your journey towards a fully managed DotImage 10.0. I'd love to see a post with some benchmarks of where its performance is both good and bad. In the best case the community might be able to point out ways to make it faster; in the worst, we'll all just learn a thing or two about writing fast F#.

Cheers!

-Rick