Some Managed/Unmanaged Exception Tricks
I just spent five days tracking down a Heisenbug. I've tracked down a lot of these over the years, not because I create a lot of them but because I'm good at finding them.
In this particular case, I had a set of unit tests that were failing on our build server but did not fail consistently on my machine. That is not a good sign to start with. Eventually, I managed to get a consistent failure, which allowed me to discover that managed code was throwing an SEHException somewhere. What this really means is that something low-level threw an exception that .NET caught and tried to turn into a managed exception, but it had no real information about what the exception was.
What I wanted to do was this: go to the Exceptions window in Visual Studio, check the right exception or set of exceptions, and break when it was thrown. The trouble was that the moment I tried this, the bug no longer reproduced. I could neither trap the exception when it was thrown nor determine which specific exception it was. I tried installing a structured exception translator with _set_se_translator to catch the exception, but that never got hit.
Since I did know that the exception was being thrown from unmanaged code, I could trap it using try { } catch (...) { }. I employed divide and conquer to get to the exception. I did this by taking a known method that threw and dividing it into two blocks. Given an original method like this:
{ // original method
    // code block A - code omitted
    // code block B - code omitted
}
I transformed it into this:
{ // new method
    try {
        // code block A - code omitted
    } catch (...) {
        throw "fail.";
    }
    try {
        // code block B - code omitted
    } catch (...) {
        throw "fail.";
    }
}
Then I set a breakpoint on each throw. If the first breakpoint got hit, code block A was throwing; otherwise it was B. Then I subdivided further until I found the line responsible, which in turn was a method call. Lather, rinse, repeat. This is the time to be plodding and consistent, not insightful. I kept at it until I found that some code in the STL was throwing std::bad_alloc. Plain and simple: I was out of memory. I altered the top-level code to catch std::bad_alloc, verified that it caught the exception, and then improved all the error handling up into managed code.
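Catching std::bad_alloc at the top level can be sketched roughly as follows. This is a minimal illustration, not the actual code from the project: NativeStatus, ProcessData, ProcessDataNoThrow and the simulateExhaustion flag are all hypothetical names invented for the example. The idea is that no C++ exception escapes the native entry point, and the caller gets a status it can turn into a meaningful managed error.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical status codes handed back across the native/managed boundary.
enum class NativeStatus { Ok, OutOfMemory, UnknownFailure };

// Hypothetical stand-in for the deep native work. It throws std::bad_alloc
// deterministically when simulateExhaustion is true, so the failure path
// can be exercised without actually exhausting memory.
void ProcessData(std::size_t elementCount, bool simulateExhaustion) {
    if (simulateExhaustion) {
        throw std::bad_alloc();
    }
    std::vector<int> buffer(elementCount, 1);  // may itself throw std::bad_alloc
    (void)buffer;
}

// Top-level entry point: every C++ exception is translated into a status code.
NativeStatus ProcessDataNoThrow(std::size_t elementCount,
                                bool simulateExhaustion) noexcept {
    try {
        ProcessData(elementCount, simulateExhaustion);
        return NativeStatus::Ok;
    } catch (const std::bad_alloc&) {
        return NativeStatus::OutOfMemory;     // the specific case found above
    } catch (...) {
        return NativeStatus::UnknownFailure;  // anything else, still contained
    }
}
```

A caller would check the result, for example treating ProcessDataNoThrow(16, true) == NativeStatus::OutOfMemory as the cue to raise an out-of-memory error on the managed side instead of letting an opaque SEHException surface.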
The problem here was not std::bad_alloc being thrown (although it needed to be caught). The problem was that somehow I was running out of memory when I was pretty sure I had enough. I did some checking and found that this particular block of code had a "large block cache" where it would reuse large blocks of allocated memory when possible, but until it did, they hung around. I didn't like this, so I made sure the cache was routinely flushed on chunky API boundaries. Then I had a look at the unit tests and found that many of them had been written without disposing IDisposable objects that were holding onto large chunks of memory. No doubt the heap was getting badly fragmented between the objects that were never disposed and the large block cache. Problem now solved on all fronts.
Since test runs took upwards of 10-15 minutes each, I spent the time in between catching up on my reading. Specifically, I was reading CLR via C# by Jeffrey Richter. In it I found a nice little gem. Suppose you have the following chunk of code:
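To make the cache fix concrete, here is a rough sketch of a large block cache that is flushed whenever a coarse-grained API call returns. It is an assumption about the general shape of such a cache, not the real implementation: LargeBlockCache, FlushOnExit and ProcessRequest are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical large block cache: released blocks are kept for reuse
// and keep their memory until Flush() is called.
class LargeBlockCache {
public:
    using Block = std::vector<unsigned char>;

    // Reuse the first cached block that is big enough, else allocate a new one.
    Block Acquire(std::size_t size) {
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->capacity() >= size) {
                Block block = std::move(*it);
                free_.erase(it);
                block.resize(size);  // capacity suffices, so no reallocation
                return block;
            }
        }
        return Block(size);
    }

    // Hand a block back; its memory stays allocated inside the cache.
    void Release(Block block) { free_.push_back(std::move(block)); }

    // Drop every cached block so the memory goes back to the heap.
    void Flush() {
        free_.clear();
        free_.shrink_to_fit();
    }

    std::size_t CachedBlockCount() const { return free_.size(); }

private:
    std::vector<Block> free_;
};

// RAII guard: flushes the cache when the enclosing scope exits,
// whether it returns normally or unwinds because of an exception.
class FlushOnExit {
public:
    explicit FlushOnExit(LargeBlockCache& cache) : cache_(cache) {}
    ~FlushOnExit() { cache_.Flush(); }
    FlushOnExit(const FlushOnExit&) = delete;
    FlushOnExit& operator=(const FlushOnExit&) = delete;

private:
    LargeBlockCache& cache_;
};

// Hypothetical "chunky" API entry point.
void ProcessRequest(LargeBlockCache& cache) {
    FlushOnExit guard(cache);
    auto first = cache.Acquire(1 << 20);
    cache.Release(std::move(first));
    auto second = cache.Acquire(1 << 19);  // reuses the 1 MiB block
    cache.Release(std::move(second));
}   // guard's destructor flushes the cache here
```

Using a guard object rather than an explicit Flush() call at the end of the function means the cache is also emptied on the exception path, which matters when the failure being handled is itself an out-of-memory condition.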
using System;

class Program
{
    static void Main(string[] args)
    {
        try
        {
            ThrowingMethod();
        }
        catch // note: no exception variable is declared here
        {
            Console.WriteLine("Something awful happened.");
        }
    }

    static void ThrowingMethod()
    {
        throw new ArgumentException("For no good reason");
    }
}
If you set a breakpoint on the Console.WriteLine, you can examine the exception by entering $exception into a watch window in the debugger, even though the catch block never names it.
This is really handy! Unfortunately, it doesn't work in unmanaged C++ inside a catch (...) { }, but I'll take what I can get.
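For ordinary C++ exceptions there is a partial substitute: inside a catch (...) block you can rethrow the in-flight exception with a bare throw; and catch it again by type. The sketch below is only an illustration; DescribeCurrentException and RunAndDescribe are hypothetical helper names. It can only recognize the types it lists, so I would expect anything else, including a structured exception that reaches catch (...), to fall through to the last branch.

```cpp
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Call only from inside a catch block: rethrows the exception currently
// being handled and describes it as well as it can.
std::string DescribeCurrentException() {
    try {
        throw;  // rethrow whatever is in flight
    } catch (const std::bad_alloc&) {
        return "std::bad_alloc (out of memory)";
    } catch (const std::exception& e) {
        return std::string("std::exception: ") + e.what();
    } catch (const char* message) {
        return std::string("string literal: ") + message;
    } catch (...) {
        return "unknown non-standard exception";
    }
}

// Hypothetical helper: runs a callable and reports what, if anything, it threw.
template <typename Callable>
std::string RunAndDescribe(Callable&& work) {
    try {
        work();
        return "no exception";
    } catch (...) {
        return DescribeCurrentException();
    }
}
```

With this in place, a breakpoint on the return inside the catch (...) of RunAndDescribe lets you inspect the description string in the debugger, which is not as nice as $exception but better than nothing.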