Making the same mistakes

There is a mistake in software engineering that we, as engineers, keep making again and again and again.  We apparently are really bad about resource management or we're really naive.  I can't decide which.  Probably both.

We already learned that we were way too trusting of memory management, so most modern systems use garbage collection to handle that for us.  As it turns out, most of these systems are pretty heinous about resource management that doesn't fall into the memory category.  So in Java, we created the finalize() method, except that there's no guarantee that it will ever be called, so it's useless for releasing resources.  In .NET we can use the IDisposable pattern (see IDisposable Made E-Z) or create some other kind of convention.

This is the problem.  When you create a convention, you need to trust that forever and ever, all clients of your code will correctly adopt this convention and follow it to the letter.  The problem is that we're people and we're just not very good about following directions.

I really like stack allocated variables in C++ in that I can make something happen when they're constructed and also something can happen when they go out of scope.  This is tremendous and it bugs me that they don't really exist in C#.  In fact, I'd like to see the ability to make stack allocated variables other than structs so that I can make them come and go automatically and reliably.  This makes it so easy to prevent resource leaks because you can follow an Acquire/Release model implicitly in the lifetime of a call.
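
As a rough illustration of that Acquire/Release model, here is a minimal sketch in standard C++.  The names Log, ScopedResource, UseResource and RunDemo are made up for this example and don't come from any real library.  The guard acquires in its constructor and releases in its destructor, so the release happens whether the function returns normally or throws.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical resource tracker: records each step so the ordering is visible.
struct Log {
    std::vector<std::string> entries;
    int held = 0;  // number of resources currently acquired
};

// RAII guard: acquires in the constructor, releases in the destructor.
class ScopedResource {
public:
    explicit ScopedResource(Log &log) : m_log(log)
    {
        ++m_log.held;
        m_log.entries.push_back("acquire");
    }
    ~ScopedResource()
    {
        --m_log.held;
        m_log.entries.push_back("release");
    }

    // Non-copyable, so the release happens exactly once.
    ScopedResource(const ScopedResource &) = delete;
    ScopedResource &operator=(const ScopedResource &) = delete;

private:
    Log &m_log;
};

// Uses the resource; may leave by returning or by throwing.
void UseResource(Log &log, bool fail)
{
    ScopedResource guard(log);
    log.entries.push_back("work");
    if (fail)
        throw std::runtime_error("something went wrong");
}

// Runs UseResource once and returns the recorded steps.
std::vector<std::string> RunDemo(bool fail)
{
    Log log;
    try {
        UseResource(log, fail);
    } catch (const std::runtime_error &) {
        log.entries.push_back("caught");
    }
    assert(log.held == 0);  // nothing is still held on either path
    return log.entries;
}
```

Note that on the failing path the release is recorded before the catch block runs, because the guard is destroyed while the stack unwinds.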

The .NET event pattern is another case of this.  When you add an event handler to an event, IntelliSense is really nice about offering to implement the full call to construct your new event delegate and then to also implement the stubs for the method itself.  This is great, except that I've noticed that most of the time this code is not at all what you really want.

Using our OcrEngine as a model, you could do something like:

private OcrDocument DoRecognize(OcrEngine engine, ImageSource source)
{
    engine.DocumentProgress += new OcrDocumentProgressEventHandler(_engine_DocumentProgress);
    return engine.Recognize(source);
}


Which feels natural and IntelliSense saves you a bunch of typing.  The problem is that if this method gets called several times with the same engine object, it will attach many event handlers onto it, none of which will get released, nor can they be.  There is no way to effectively iterate over the listeners to an event and find one, because we don't know what to look for.  So instead, you need to add and remove the handlers:

private OcrDocument DoRecognize(OcrEngine engine, ImageSource source)
{
    OcrDocumentProgressEventHandler docHandler = new OcrDocumentProgressEventHandler(_engine_DocumentProgress);
    engine.DocumentProgress += docHandler;
    OcrDocument doc = engine.Recognize(source);
    engine.DocumentProgress -= docHandler;
    return doc;
}


Now this will work better, but the code looks substantially worse.  And it's also wrong - if there is an exception in the processing (and trust me, there is a lot that can go wrong), the exception will blow through this without cleaning up anything, so your engine will still have an extra event handler on it.  OK, try three:

private OcrDocument DoRecognize(OcrEngine engine, ImageSource source)
{

    OcrDocumentProgressEventHandler docHandler = new OcrDocumentProgressEventHandler(_engine_DocumentProgress);
    engine.DocumentProgress += docHandler;
    try {
        OcrDocument doc = engine.Recognize(source);
        return doc;
    }
    finally {
        engine.DocumentProgress -= docHandler;
    }
}


This will work just fine, but the code looks horrible.

In C++, I'd be inclined to make a helper class that installs the event handler on construction and removes it on destruction, so the code might look like this:

//...
private:
OcrDocument *DoRecognize(OcrEngine *engine, ImageSource *source)
{
    DocProgressInstaller installer(engine, new OcrDocumentProgressEventHandler(_engine_DocumentProgress));
    return engine->Recognize(source);
}
//...

class DocProgressInstaller {
public:
    // Attach the handler for the lifetime of this object.
    DocProgressInstaller(OcrEngine *engine, OcrDocumentProgressEventHandler *handler) : m_engine(engine), m_handler(handler)
    {
        m_engine->DocumentProgress += m_handler;
    }
    // Detach on scope exit, including when an exception unwinds the stack.
    virtual ~DocProgressInstaller() { m_engine->DocumentProgress -= m_handler; }
private:
    OcrEngine *m_engine;
    OcrDocumentProgressEventHandler *m_handler;
};



And lo and behold, the code where this is used looks very nice and is readable.  This tedious code is hidden behind the scenes and operates correctly during a throw.  But we're talking C# here, so it doesn't work.
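
For anyone who wants to watch the installer idea behave under a throw, here is a self-contained sketch in plain standard C++.  It is only an approximation: ProgressEvent, Engine, ProgressInstaller and HandlersLeftAfter are hypothetical stand-ins written for this example, not the real OcrEngine or .NET event types.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for a .NET-style multicast event.
class ProgressEvent {
public:
    using Handler = std::function<void(int)>;
    using Token = std::list<Handler>::iterator;

    // Adds a handler and returns a token that identifies it for removal.
    Token Add(Handler h) { return m_handlers.insert(m_handlers.end(), std::move(h)); }
    void Remove(Token t) { m_handlers.erase(t); }
    void Raise(int percent) { for (auto &h : m_handlers) h(percent); }
    std::size_t Count() const { return m_handlers.size(); }

private:
    std::list<Handler> m_handlers;  // list iterators stay valid across inserts
};

// Hypothetical engine: reports progress and may fail partway through.
struct Engine {
    ProgressEvent DocumentProgress;

    int Recognize(bool fail)
    {
        DocumentProgress.Raise(50);
        if (fail)
            throw std::runtime_error("recognition failed");
        DocumentProgress.Raise(100);
        return 42;  // placeholder result
    }
};

// Installs a handler on construction and removes it on destruction.
class ProgressInstaller {
public:
    ProgressInstaller(ProgressEvent &ev, ProgressEvent::Handler h)
        : m_event(ev), m_token(ev.Add(std::move(h))) {}
    ~ProgressInstaller() { m_event.Remove(m_token); }

    ProgressInstaller(const ProgressInstaller &) = delete;
    ProgressInstaller &operator=(const ProgressInstaller &) = delete;

private:
    ProgressEvent &m_event;
    ProgressEvent::Token m_token;
};

// The calling code stays two lines long, as in the example above.
int DoRecognize(Engine &engine, bool fail, int &lastPercent)
{
    ProgressInstaller installer(engine.DocumentProgress,
                                [&lastPercent](int p) { lastPercent = p; });
    return engine.Recognize(fail);
}

// Calls DoRecognize repeatedly on one engine and returns how many
// handlers are still attached afterwards.
std::size_t HandlersLeftAfter(int calls, bool fail)
{
    Engine engine;
    int lastPercent = 0;
    for (int i = 0; i < calls; ++i) {
        try {
            DoRecognize(engine, fail, lastPercent);
        } catch (const std::runtime_error &) {
            // The installer's destructor already ran during unwinding.
        }
        assert(engine.DocumentProgress.Count() == 0);
    }
    return engine.DocumentProgress.Count();
}
```

Calling DoRecognize many times on the same engine leaves no handlers behind, whether Recognize succeeds or throws, which is exactly the property the first C# attempt lacks.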

Again, if I could have the ability to make variables do something nice when they go out of scope, I'd be very happy - so happy that I'd willingly accept some new keywords in the language to make that apparent and to allow some new semantics.  I imagine that you could either use stack or local as the keyword and use the following semantics:

Stack variables can only be passed to a method that declares the corresponding parameter as a stack variable (a la ref and out).  Stack variables are destroyed when the variable goes out of scope.  Stack variables cannot be assigned to, from, or returned.

That's a lot of restrictions but it does some great stuff: it allows me, the developer, to decide where the memory comes from and takes a fair amount of pressure off the garbage collector (and that's a great thing, IMHO.  I talked with a former co-worker who had done a project that involved doing static analysis of garbage collected programs and he created a version that would do stack allocation and he got substantial performance increases that way).  It allows resource management to happen automatically on clear boundaries.  It makes my code easier to read.

Wanting to make something come and go at particular points in time is a recurring CS problem.  Shouldn't it be an easy one to do right instead of an easy one to do wrong?
Published Thursday, October 26, 2006 3:34 PM by Steve Hawley

Comments

Thursday, February 15, 2007 11:53 PM by jimbot

# re: Stack-allocated variables

Steve,

   Since you want to use stack-allocated variables, you might want to look at C#'s "stackalloc" keyword. A method that uses it must be declared as "unsafe". It can only be used when initializing local variables. It allocates memory from the stack, not the heap. This memory is not managed by the garbage collector, nor can it be moved by it. This keyword is discussed on pgs. 544-545 of Schildt's "C# 2.0: The Complete Reference", 2nd ed.

   Since only a pointer to a given type is defined, use of this construct is less sophisticated than defining a class. There's no destructor with custom-written code that's executed when this stack memory is allocated.

   You might want to look at this link, which describes using "stackalloc" to allocate and access 2-dimensional arrays:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=550586&SiteId=1

   It should be useful for your 2-dimensional image processing work. Note that the memory deallocation may be deferred until execution returns from the method, and not necessarily upon exit of the block containing the "stackalloc". Enjoy!

                               Jimbot

Friday, February 16, 2007 7:42 PM by jimbot

# re: Stack-allocated variables

Oops, I made an error at the end of my 2nd paragraph. I should have said:

There's no destructor with custom-written code that's executed when this stack memory is DE-allocated.

                           Jimbot

Monday, February 19, 2007 9:58 PM by jimbot

# re: Stack-allocated variables

Steve,

   The default stack size reserved (from virtual memory) upon creation of a thread is 1 MB. Images with large dimensions and wide color resolution per pixel can exceed this size. Therefore, a program that uses the unsafe "stackalloc" construct should be run in a thread whose reserved stack size can be specified to be as large as necessary. The following page discusses an overload (new to .NET 2.0) to the Thread class's constructor with an additional parameter for specifying the maximum stack size:

http://geekswithblogs.net/gyoung/archive/2006/05/02/76961.aspx

   The link below (which is within the above page) says in its "Remarks" section to avoid using this overload! The reason given is that this overload should not be used to alleviate a stack overflow. Rather, the overconsumption of stack space must be programmer error, and should be corrected. Enjoy!

http://msdn2.microsoft.com/en-US/library/5cykbwz4.aspx

                             jimbot

Monday, February 19, 2007 10:16 PM by jimbot

# re: Stack-allocated variables

Steve,

   In my first reply, regarding how the deallocation of memory (allocated via "stackalloc") may be deferred until execution returns from the method, and not necessarily upon exit of the block containing the "stackalloc", the invocation of Dispose on object(s) specified within the parentheses of a using statement/block exhibits similar behavior. Dispose is not necessarily called upon exit of the block. At the latest, Dispose() will be called upon exit of the encompassing method. I should provide a link that discusses and backs up this behavior of "using".

                            jimbot

Wednesday, April 18, 2007 11:29 PM by jimbot

# Other applications of stack: flood fill

   These URLs discuss stack-based flood fills:

http://student.kuleuven.be/~m0216922/CG/floodfill.html#4-Way_Method_With_Stack

http://drowningintechnicaldebt.com/blogs/shawnweisfeld/archive/2006/12/04/Stack-Based-Flood-Fill-Algorithm.aspx

   This Wikipedia page discusses a stack-based/recursive implementation:

http://en.wikipedia.org/wiki/Flood_fill

                            jimbot

