Seek and Ye Shall Find
I was running some benchmarks on some heavy stream-handling code that I wrote. It’s part of a simple state-machine lexer that tokenizes input. One bottleneck I found was surprising – read on.
The lexer uses one of my favorite patterns: peek. This uses the ability to look ahead precisely one byte in a stream to help determine the end of a token. I had an obvious implementation of peek in my scanner:
private int LLPeekChar()
{
    int i = _stream.ReadByte();
    if (i >= 0)
        _stream.Seek(-1, SeekOrigin.Current);
    return i;
}
This is straightforward, but I had quite a surprise when I ran a substantial chunk of text through my lexer:
Routine Name | Time (s) | Time with Children (s) | Shared Time (%) | Hit Count
LLPeekChar   | 1.98     | 1.98                   | 100.00          | 473683
Now, most people would think that 2 seconds for 473,683 iterations is not bad. That’s .004 milliseconds per call. I wasn’t happy, though – peek is not a heavy operation, and I knew when I wrote it that I would be unhappy with it, because Seek is expensive. So expensive that adding this guard to any FileStream.Seek-heavy code will make a huge difference:
if (position != _stm.Position)
    _stm.Seek(position, SeekOrigin.Begin);
I’m not making this up – seeking to the position you’re already at costs more than checking for that case first. If you hate the two-argument Seek as much as I do and write an extension method for the most common case, I recommend you write it this way:
public static class StreamExtensions
{
    public static long Seek(this Stream stm, long position)
    {
        // Seeking to the position you're already at is surprisingly
        // expensive on FileStream, so skip the call when nothing would move.
        if (position == stm.Position)
            return position;
        return stm.Seek(position, SeekOrigin.Begin);
    }
}
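With the extension in scope, call sites collapse to the single-argument form and pick up the guard for free. A quick, purely illustrative before-and-after – the stm and tokenStart names are made up for this example:

// Before: two-argument form, seeks even when the stream is already there.
stm.Seek(tokenStart, SeekOrigin.Begin);

// After: single-argument extension, skips the underlying Seek when nothing would move.
stm.Seek(tokenStart);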
I decided to rework this by writing an adapter class on Stream. The class wraps a Stream but adds the ability to un-get a single byte, not unlike the C standard I/O function ungetc().
What I do is build a Stream subclass that is constructed with another Stream. To keep it simple, it will be read-only.
Since the code is brief, I’ll put in the entire implementation. The code uses a single-byte buffer (this is easily extended, but I really don’t need or want the overhead of a full Stack&lt;byte&gt; – exercise left to the reader, and all that). I also use a magic number to tell whether the buffered byte is valid. You could use a bool flag, but again – simple. The only thing baroque in this implementation is that I keep track of how much to adjust the stream’s Position property. I could test the buffer and compute the adjustment that way, but a separate offset seems less error prone to me, especially if the code grows in the future.
using System;
using System.IO;

namespace Atalasoft.StreamTesting
{
    internal class UngetStream : Stream
    {
        // Marker for "no byte buffered" - distinct from -1, which ReadByte
        // uses to signal end of stream.
        const int kInvalidByte = -2;

        Stream _stm;
        long _streamOffset = 0;  // correction applied to Position while a byte is buffered
        int _b = kInvalidByte;   // the single ungotten byte, or kInvalidByte

        public UngetStream(Stream stm)
        {
            _stm = stm;
        }

        public override bool CanRead
        {
            get { return true; }
        }

        public override bool CanWrite
        {
            get { return false; }
        }

        public override int ReadByte()
        {
            // Hand back the buffered byte first, if there is one.
            if (_b >= 0)
            {
                int val = _b;
                _b = kInvalidByte;
                _streamOffset = 0;
                return val;
            }
            return _stm.ReadByte();
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            int nRead = 0;
            // Drain the buffered byte before touching the underlying stream.
            if (_b >= 0)
            {
                buffer[offset++] = (byte)_b;
                count--;
                _b = kInvalidByte;
                _streamOffset = 0;
                nRead++;
            }
            if (count > 0)
            {
                return nRead + _stm.Read(buffer, offset, count);
            }
            return nRead;
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            // Any explicit seek invalidates the buffered byte.
            _b = kInvalidByte;
            _streamOffset = 0;
            return _stm.Seek(offset, origin);
        }

        public override long Position
        {
            get
            {
                // While a byte is buffered, we are logically one byte behind
                // the underlying stream.
                return _streamOffset + _stm.Position;
            }
            set
            {
                Seek(value, SeekOrigin.Begin);
            }
        }

        public override long Length
        {
            get { return _stm.Length; }
        }

        public void Unget(byte b)
        {
            if (_b >= 0)
                throw new IOException("attempt to unget twice");
            _b = b;
            _streamOffset = -1;
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            // Read-only wrapper.
            throw new NotSupportedException();
        }

        public override void SetLength(long value)
        {
            // Read-only wrapper.
            throw new NotSupportedException();
        }

        public override void Flush()
        {
            // Nothing to flush - this wrapper never writes.
        }

        public override bool CanSeek
        {
            get { return true; }
        }
    }
}
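Here’s a minimal sketch of the adapter in use, just to show the semantics. The file name is a placeholder, and this assumes the UngetStream class above is in scope:

// Sketch only - "input.txt" is a placeholder.
using (Stream raw = File.OpenRead("input.txt"))
{
    UngetStream stm = new UngetStream(raw);

    int b = stm.ReadByte();          // consume one byte
    if (b >= 0)
        stm.Unget((byte)b);          // push it back; Position drops by one

    int again = stm.ReadByte();      // same byte again, no Seek involved
    Console.WriteLine(b == again);   // True for any non-empty input
}

Note that Position tracks the logical position – it steps back by one while a byte is ungotten – so callers that mix Unget with Position checks still see what they expect.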
This is just a two-state entity: ungotten byte pending or not pending. So my implementation of LLPeekChar changes to this:
private int LLPeekChar()
{
    int i = _stream.ReadByte();
    if (i >= 0)
        _stream.Unget((byte)i);
    return i;
}
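The only other change is that the scanner’s _stream field has to be an UngetStream (or otherwise expose Unget) so that call resolves. A sketch of what that wiring might look like – the class and parameter names here are assumptions, not the actual lexer source:

// Assumed wiring - names are illustrative, not the actual lexer.
internal class Lexer
{
    // _stream is now an UngetStream rather than the raw input stream,
    // so LLPeekChar can push a byte back instead of seeking.
    private readonly UngetStream _stream;

    public Lexer(Stream input)
    {
        _stream = new UngetStream(input);
    }
}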
Now when I measure the performance, I get the following result:
Routine Name | Time (s) | Time with Children (s) | Shared Time (%) | Hit Count
LLPeekChar   | 0.01     | 0.03                   | 43.92           | 473683
The performance is now 66x faster, even including the cost of Unget (the Time with Children column). More importantly, a 2-second hit in the app’s total running time vanishes into the realm of the imperceptible, as it should. The total time per iteration is now 6.3 × 10⁻⁵ milliseconds, or 63 nanoseconds. I can live with that. Can you?
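The numbers above came from a profiler, but if you want to see the same shape of result without one, a rough Stopwatch harness like the one below will do it. Everything in it is illustrative – the file name, the iteration count, and the two peek helpers are stand-ins, not my lexer:

// Rough, illustrative comparison harness - not the profiler run above.
// Assumes the UngetStream class from this post is in scope.
using System;
using System.Diagnostics;
using System.IO;

static class PeekComparison
{
    static int PeekWithSeek(Stream stm)
    {
        int i = stm.ReadByte();
        if (i >= 0)
            stm.Seek(-1, SeekOrigin.Current);  // the expensive version
        return i;
    }

    static int PeekWithUnget(UngetStream stm)
    {
        int i = stm.ReadByte();
        if (i >= 0)
            stm.Unget((byte)i);                // the cheap version
        return i;
    }

    static void Main()
    {
        const int iterations = 473683;         // the hit count from the profile above
        using (FileStream fs = File.OpenRead("input.txt"))  // placeholder file
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int n = 0; n < iterations; n++)
                PeekWithSeek(fs);
            Console.WriteLine("Seek-based peek:  {0} ms", sw.ElapsedMilliseconds);

            fs.Position = 0;
            UngetStream stm = new UngetStream(fs);
            sw = Stopwatch.StartNew();
            for (int n = 0; n < iterations; n++)
                PeekWithUnget(stm);
            Console.WriteLine("Unget-based peek: {0} ms", sw.ElapsedMilliseconds);
        }
    }
}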
As a final note, I did not preemptively optimize this code. I spent a full day testing, which led to some heavy refactoring – a routine that had been taking 14 seconds of my total runtime is now also down to 0.03 seconds. Once I had dealt with everything else significant within my control, LLPeekChar was the biggest remaining time thief, and this type of optimization is clean and simple.