Chasing Your Tail in F#
I don’t like the looping constructs in F#. They don’t feel as thought out as the rest of the language and they feel more like Pascal looping constructs, which have their own issues.
In C-like languages, a loop looks like this:
for (start-exp; condition; repeated-exp) { /* ... */ }
which in most cases looks like this in practice:
int sum = 0;
for (int i=0; i < arr.Length; i++)
sum += SomeOperation(arr[i]);
this is an idiomatic expression in C-like languages for “sum the result of a function on an each element of an array”. In F#, you would probably use a fold instead – but ignoring the specifics of this case, let’s look at the F# equivalent loop:
let mutable sum = 0
for i = 0 to arr.Length do
sum <- SomeOperation arr.[i]
Terrific. Except for two things. First, sum is mutable which is something you want to avoid and second, this loop is wrong. It has an off-by-one error in the looping construct since F# for-loops starts and ends are inclusive. This is a Pascal-ism and it works in Pascal, because of the ways that Pascal arrays are declared:
TYPE MyArr = ARRAY[1..10] OF INTEGER;
(* ... *)
VAR I, SUM:INTEGER;
BEGIN
SUM := 0;
FOR I := 1 TO 10 DO
SUM := SUM + SomeOperation(arr[I]);
END;
This works because Pascal arrays are defined in terms of inclusive ranges, not in terms of an overall size. Since F# uses .NET arrays, the arrays will be defined in terms of an overall size. Therefore, the main for-loop construct is inappropriate for the language. I would suggest that F# instead include the following construct:
for i = 0 upto arr.Length do
sum <- SomeOperation arr.[i]
This mates with the native arrays better and is less likely to induce the off-by-one error. The actual choice of keyword is unimportant. The clash in meanings between upto and downto is not lost on me – using ‘until’ would be as good a choice.
Unfortunately, looping constructs in F# do not allow for short circuits. There is neither break nor continue nor is there a way to return a value early from a loop. This is because there is no real return value but rather an implicit equivalence of the value of an expression with the value of the function.
So to work around this issue, when I have loops like this, I write them using tail recursion. I can take the for loop above and write it like this instead:
let rec tailFunc arr i i_end acc =
if i >= i_end then acc
else someFunc arr (i + 1) i_end (acc + SomeOperation(arr.[i]))
tailFunc arr 0 arr.Length 0
What I’ve done is broken the loop into its own tail-recursive function. I put the end case first since it is essentially chaff and should be ignored. This kind of loop runs very efficiently and consumes nearly no stack space. This tells me that this is an effective idiom for the language, and I get the benefit of not having to remember to use arr.Length-1, so I don’t get an off-be-one error.
In addition, this construct does lend itself to short-circuiting. Let’s consider the following C# function:
int ShortCircuitOperation(int[] arr)
{
int sum = 0;
for (int i=0; i < arr.Length; i++) {
int opresult = SomeOperation(arr[i]);
if (opresult < 0)
break;
sum += opresult;
}
return sum;
}
This is essentially, sum all results of the operations on an arr up until the the operation returns a negative result. Writing this in F# tail-recursively becomes this:
let rec ShortCircuitOperation (arr:int[]) i i_end acc =
if i >= i_end then acc
let opresult = SomeOperation(arr.[i])
if opresult < 0 then acc
else ShortCircuitOperation arr (i + 1) i_end (acc + opresult)
// ...
let answer = ShortCircuitOperation a 0 a.Length 0
This function can’t be written with F# for-loops as is (although you could do it with a seq, I suppose). Again, the important part is not the example itself but the concept: I’m running a loop that is supposed to run some number of times, but may in fact have to stop early and I can’t know that a priori.
Where the rubber hits the road is that I wrote a version of adaptive threshold in F# such that it is pixel-per-pixel identical to our existing adaptive threshold command. The code contains no for-loops. Except for image data and public class properties, the data within the code is immutable – there are no explicit side effects. It does contain 15 tail-recursive functions that perform the various operations. It runs about 1.5x slower than the equivalent C++ version, but instead of going through pointers to memory and doing pointer juggling, it is pure managed code and never touches a pointer and goes through an abstraction layer to get at pixels. So even with the pixel memory abstraction, it’s still reasonably performant.
To give you an idea of how this stiches into dotImage, here is a function that computes the minimum and maximum pixel values in an 8-bit gray image:
let rec RowMinMax (row: byte[]) (x:int) (x_end:int) (minval:int) (maxval:int) =
if x >= x_end then minval,maxval
else
let value = (int row.[x])
RowMinMax row (x + 1) xlimit (min minval value) (max maxval value)
let rec MinMaxXY (pa:PixelAccessor) (y:int) (y_end:int) (minval:int) (maxval:int) (row:byte[]) =
if y >= y_end then minval,maxval
else
pa.GetReadOnlyScanline(y, row)
let currmin,currmax = RowMinMax row 0 row.Length minval maxval
MinMaxXY pa (y + 1) ylimit currmin currmax row
let MinMax (pm:PixelMemory) =
use pa = pm.AcquirePixelAccessor()
let row = Array.zeroCreate<byte> pm.RowStride
MinMaxXY pa 0 pm.Height 255 0 row
// ...
let minval,maxval = MinMax atalaImage.PixelMemory
the resulting code is performant, fairly easy to read and safe.