Woo-hoo! Compiler Bug!
I just finished tracking down a Visual C++ compiler bug. It lives in the optimizer in VC 2003 and 2005 and does not exist in VC 2008. It only happens with optimization turned on (i.e., a Release build).
Finding this bug was interesting. A customer reported a defect in which an exception was raised when trying to load a particular image. The image had an error in it and was causing a throw deep within the codec. This looked like an easy bug to find. Unfortunately for me, the unit tests passed on my machine but not on the build server. This turned a routine bug into a nightmare bug - the debugger was not likely to help me. What I ended up doing was building the library that (most likely) contained the bug in release and using it in the debug build on my machine. This allowed me to reproduce the bug (and proved that it was a release-only bug). Using divide and conquer, I found where the exception was being thrown and, to my great surprise, where the catch was being totally missed.
The bug is this:
If you have a try/catch block in C++ that only calls "pure" C code, the optimizer sees an opportunity to remove the catch block, since C can't throw. If, however, the C code calls back into C++ and that code throws, the exception will blow past the handler.
It is arguable that C++ code called from C should never throw without catching before returning to the caller. If the catch block is outside the C code, the C code never gets a chance to clean up after itself, which could cause memory or resource leaks. In this particular case, the function being called was a function pointer acting as an event handler. Basically, it was signaling to the caller, "something bad has happened - I've cleaned up, now it's your turn".
To illustrate how this happens, I've created a minimal code example to reproduce it. This code is a simple C++ app that calls a C function, PerformOperation, with a C++ callback. PerformOperation prints a message, then calls the callback if it's non-null. The C++ implementation of the callback throws an empty class to signal a failure. The calling code should catch this error and report failure. In debug, you get correct behavior. In release, you get an unhandled exception.
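The leak concern can be sketched in a few lines. This is only an illustration with made-up names (ProcessWithCallback, ThrowingCallback, g_liveBuffers); it is not code from the codec, and the C-style routine is compiled as C++ here purely so the sketch fits in one file. The point is that C code has no destructors, so any cleanup after the callback is simply skipped when an exception passes through it.

```cpp
#include <cstdlib>

// Hypothetical names throughout; none of this is from the real codec.
typedef void (*fCallback)();

static int g_liveBuffers = 0;   // buffers allocated but not yet freed

// Written in C style (malloc/free, no destructors), as a C library would be.
static void ProcessWithCallback(fCallback cb)
{
    void *buffer = std::malloc(256);
    if (!buffer)
        return;
    ++g_liveBuffers;
    if (cb)
        cb();                   // if this throws, the cleanup below never runs
    std::free(buffer);
    --g_liveBuffers;
}

struct CallbackError { };

static void ThrowingCallback()
{
    throw CallbackError();
}

// Returns the number of buffers still allocated after the exception is caught.
int CountLeakedBuffers()
{
    g_liveBuffers = 0;
    try {
        ProcessWithCallback(ThrowingCallback);
    }
    catch (const CallbackError &) {
        // The handler runs, but the buffer inside ProcessWithCallback
        // was never freed.
    }
    return g_liveBuffers;
}
```

Calling CountLeakedBuffers() reports one buffer still outstanding, even though the exception itself was handled. In my case the callback had already cleaned up before signaling, which is why throwing across the C code was tolerable.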
File - Perf.h - a header describing an external operation in C:
#ifndef _H_Perf
#define _H_Perf
#ifdef __cplusplus
extern "C" {
#endif
typedef void (*fPerformer)();
extern void PerformOperation(fPerformer pf);
#ifdef __cplusplus
} // __cplusplus
#endif
#endif
File - Perf.c - an implementation of the function PerformOperation:
#include "Perf.h"
#include <stdio.h>
#ifdef __cplusplus
extern "C" {
#endif
void PerformOperation(fPerformer pf)
{
    printf("Performing operation...");
    if (pf)
        pf();
}
#ifdef __cplusplus
}
#endif
File - OptimizerBug.cpp - calls the performer from C++ with a C++ callback that throws:
#include <iostream>
#include <tchar.h>
#include "Perf.h"
using namespace std;
class PerfError { };
static void CPPPerform()
{
    throw PerfError();
}
int _tmain(int argc, _TCHAR* argv[])
{
    bool fail = false;
    try {
        PerformOperation(CPPPerform);
    }
    catch (PerfError) {
        fail = true;
    }
    cout << (fail ? "fail." : "pass.");
    return 0;
}
Here's what happens: in a debug build, you will see "Performing operation...fail.", which is the correct output. In a release build, the code crashes with an unhandled exception. I also need to stress that the function PerformOperation needs to live in its own file. If it lives in OptimizerBug.cpp, the optimizer goes even further: it notices that PerformOperation can be inlined and, since it's only used once, that the callback can be inlined too and will always throw. It's a nice chunk of call->call optimization, but it makes the bug go away. If the implementation is in a different file, the optimizer doesn't inline.
Here's the assembly output for the release build in VC 2005:
; 20 : bool fail = false;
; 21 : try {
; 22 : PerformOperation(CPPPerform);
;
; Here's the call to PerformOperation
;
push OFFSET ?CPPPerform@@YAXXZ ; CPPPerform
call _PerformOperation
; 23 : }
; 24 : catch (PerfError) {
; 25 : fail = true;
; 26 : }
; 27 :
; 28 : cout << (fail ? "fail." : "pass.");
;
; and here's the call to cout, operator << - you'll notice
; that the fail = true is not here.
;
push OFFSET ??_C@_05MFHHNNDH@pass?4?$AA@
push OFFSET ?cout@std@@3V?$basic_ostream@DU?$char_traits@D@std@@@1@A ; std::cout
call ??$?6U?$char_traits@D@std@@@std@@YAAAV?$basic_ostream@DU?$char_traits@D@std@@@0@AAV10@PBD@Z ; std::operator<<<std::char_traits<char> >
add esp, 12 ; 0000000cH
; 29 :
; 30 : return 0;
xor eax, eax
There are workarounds. The most basic is to refactor so that a C++ callback called from C never throws. Where this is not possible, the routine with the catch block needs to be surrounded by:
#ifdef NDEBUG
#pragma optimize("", off)
#endif
// ... the routine containing the try/catch goes here ...
#ifdef NDEBUG
#pragma optimize("", on)
#endif
This bug is NOT fixed by changing the catch to catch(...) - the optimizer will take out the handler no matter what.
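For the refactoring workaround (never letting an exception cross the C boundary), one possible shape is sketched here. The names SafePerform, DoPerform, RunOperation and g_callbackFailed are my own inventions for illustration, and PerformOperation is repeated inline only so the sketch is self-contained; in the real example it stays in Perf.c. The try/catch now wraps C++ code only, so the optimizer has no reason to assume nothing can throw.

```cpp
#include <cstdio>

// In the example above these live in Perf.h / Perf.c and are compiled as C;
// they are repeated here only to keep the sketch in one file.
extern "C" {
typedef void (*fPerformer)();
void PerformOperation(fPerformer pf)
{
    std::printf("Performing operation...");
    if (pf)
        pf();
}
}

class PerfError { };

// Hypothetical flag that carries the failure back across the C boundary.
static bool g_callbackFailed = false;

// The work that may throw (the body of CPPPerform in the original example).
static void DoPerform()
{
    throw PerfError();
}

// The callback actually handed to C: no exception ever escapes it.
static void SafePerform()
{
    try {
        DoPerform();
    }
    catch (...) {
        g_callbackFailed = true;
    }
}

// Returns true if the operation failed.
bool RunOperation()
{
    g_callbackFailed = false;
    PerformOperation(SafePerform);
    return g_callbackFailed;
}
```

The caller checks the return value of RunOperation() instead of catching PerfError. A file-scope flag is the simplest thing that works for a sketch like this; it is not thread-safe, and a real event handler would more likely pass the status through whatever context the C API provides.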
On a more meta level, I want to talk more about bugs that happen only in release and not in debug. These are among the most frustrating bugs, as it looks like you have to shed your main tool for tracking them down - your debugger. In my case, I was able to isolate the behavior and build that particular component in release. You can still use the debugger in release, but it's not as useful as you might think, since the optimizer may reorder operations and you will see some truly bizarre behavior. For example, I watched the execution of an if (condition) statement where condition was false - and the debugger stepped into the block (!!). This was because a lot of the method had been rearranged by the optimizer to reduce size and increase speed. I find it easier to use the Disassembly window in Visual Studio so I can better see what the compiled code is. While doing this, I spotted the missing catch block - quite the WTF.