Welcome to Atalasoft Community Sign in | Help

Solving the Expression Problem with OOP

Yesterday I watched a presentation by Dr. Ralf Lämmel about The Expression Problem.  The problem is, in a nutshell, how do you build an extensible data model and an extensible operation model that meets three goals:

  1. Code-level modularization
  2. Separate compilation
  3. Static type safety

In watching his presentation, I was perturbed by his dismissal of OO approaches to solve this problem.  To my eyes he used strawman arguments by presenting several techniques which don’t work.  The reason that I found this upsetting is that I have worked on many large systems that were built in C that had all of these things.  One of the benefits of working at Adobe in the 1990’s was that I got to work on and look at four different modular extensibility models: PhotoShop, Acrobat, Premiere, and Illustrator.  Each of these presented fairly complicated data models and allowed the addition of extensions that manipulated existing objects, extended them, presented new objects, and new functionality.  All of this was done in a way that was type-safe and had to have modularization and separate compilation.  And while the extension model in PhotoShop was fairly primitive, the Premiere extension model was such that the actual application for Premiere was nothing more than a host for plug-ins and a small amount of code to facilitate communication with an abstract operating system layer.  The first time I saw that, my jaw dropped that the actual application was nearly nothing.

So how is this accomplished?  In the expression problem, you can do this by making a clear separation between your data and your operations.  In fact, it’s a fairly good model to think of the work needed as an orthogonal grid:

  Operations    
Data  Eval PrettyPrint Compile
Add EvalAdd PPAdd CompileAdd
Sub EvalSub PPSub CompileSub
Mul EvalMul PPMul SompileMul

You see that if I add a new data type, I need to add a set of operations as well.  Similarly, if I add a new operation, I need to define that operation of each existing data type.

The overall operational model is one that I like to think of as Everlasting Gobstopper (or jaw breaker, if you prefer).  Here is what a jawbreaker looks like when you cut it open:

Jawbreaker[1] It’s a set of shells layered on top of each other.  This is the model that we’ll work with (it’s actually a little weirder than this), in that any given delivery of code is considered to be a solidified API and extensions that build on this work with ONE particular shipment in order to add to it.  Each new shipment is a new set of extensions that build a new delicious layer on the existing code.  To do otherwise is to create a very hard to support system – it can be done, but there will be runtime checks for inconsistency and contingency plans to handle them.

How can we make this work?  Let’s start with a base Expr class:

public class Expr { }

and then let’s create a repository for them:

public ExprRepository {
   public ExprRepository() { Expressions = new List<ExprPackage>(); } 
   public List<ExprPackage> Expressions { get; private set; }
}

I haven’t defined ExprPackage.  It probably will be little more than an Expr type but it could be a factory method as well.  Now I’m going to define an interface and a class to represent operations:

public interface IVerbPackage { }
public class Verb {
    public Verb() { Implementations = new VerbRespository(); }
    public string Name { get; set; }
    public VerbRepository Implementations { get; private set; }
    public Type ReturnType { get; set; }
}

So the question is, what is a VerbRepository?  A VerbRespository is a collection of objects that contain a Type object for the class EType where EType inherits from Expr and a MethodInfo for a method that has the following signature, where RType is the return type.  I’m not going to define that – I’ll hand wave it away.  It could be a Dictionary mapping EType to an object that contains the MethodInfo – that would be fine.

public RType SomeMethod(Dispatcher d, Verb v, ETtype e);

So now we have some holes to fill in.  I just introduced a Dispatcher object.  What does that do?  Simply, it executes a particular verb for a particular expression type:

public class Dispatcher {
    public object Dispatch(Verb v, Expr e) {
        MethodInfo mi = v.Implemtations.GetMethod(e.Type);
        if (mi == null) { throw new ArgumentException("Incomplete implementation for Verb " + v.Name;); }
        object target = v.Implementations.GetInstance(e.Type);
        object[] args = new object[] { this, v, e };
        // in actuality, this needs to be in a try/catch
        object returnVal = mi.Invoke(target, args);
        return returnVal;
    }
}
        

So now I have code that based on the type of e looks up the appropriate method implementation for the given verb and executes it.

So how do create verbs?  Like this:

public PrettyPrintImpl : IVerbPackage {
    [Verb("PrettyPrint")]
    public string Add(Dispatcher d, Verb v, ExprAdd e)
    {
        string lhs = d.Dispatch(v, e.Left) as string;
        string rhs = d.Dispatch(v, e.Right) as string;
        return lhs + " + " + rhs;
    }    
}

The hosting application looks at all available assemblies and pulls any classes that implement IVerbPackage, then it extracts MethodInfo objects that are decorated with VerbAttribute and uses the string to label the verb (there are other ways to do this).  Now my implentation is type strong.  The only thing I don’t like is the cast – it’s cruft.  By defining the first method that implements a verb named PrettyPrint, I’m defining how all verbs should act.  I should instead define Dispatch like this:

public T Dispatch<T>(Verb v, Expr e) {
    MethodInfo mi = v.Implementations.GetMethod(e.Type);
    if (mi == null) { throw new ArgumentException("..."); }
    object target = v.Implementations.GetInstance(e.Type);
    object[] args = new object[] { this, v, e };
    object returnVal = mi.Invoke(target, args);
    return (T)returnVal;
}

This lets me rewrite PrettyPrint like this:

[Verb("PrettyPrint")]
public string Add(Dispatcher d, Verb b, ExprAdd e)
{
    string lhs = d.Dispatch<string>(v, e.Left);
    string rhc = d.Dispatch<string>(v, e,Right);
    return lhs + " + " + rhs;
}

I’ve moved the cast to a more appropriate place and now I’m honoring the contract.  I will get a runtime error if I use the wrong flavor of Dispatch – sorry, that’s a practical limitation.

So what happens when the world starts?

Some code scans a directory for assemblies and finds all assemblies that have classes that inherit from Expr and classes that implement IVerbPackage.  The system attempts to build the orthogonal collection of Verbs for the given Expr.  When this is complete, the system can now start executing verbs on the data.  What do you do on non-orthogonal Verbs?  There are a number of choices.  The most obvious ones are nothing or throw – I can’t say I like either of these.  Another option might be to allow a method to define default behavior – another method maybe decorated as [VerbDefault(“name”)] and to use this, if it exists, for the verb, otherwise disqualify the Verb.

There are three bad things about this implementation.  The first is that I could easily forget to implement a given Verb for a given Expr.  This is avoidable – imagine that Dispatcher contained in it a method called DefineVerb(Type t, string name, ProgrammingLanguage lang), which when executed generates an entire verb set for a given Expr type t.  This is effectively using the extension system to write your next extension.  Awesome.

The second bad thing is two developers who are in parallel writing new extensions – one writes a new verb, the other writes a new Expr.  Both extensions can’t co-exist because they didn’t know about each other when they were written.  This is effectively applying two new layers onto the gobstopper simultaneously.  Reject one – or either could fix the incomplete implementation.  Having default verb implementations goes a long way to help this problem.  I understand that this is an ongoing problem with people who create complicated Ruby Gems.

The final problem is human error on calling Dispatch.  What if I dispatch on the wrong type?  Fail.  Can’t the compiler catch this?  The better way, I think is to try to code to avoid the problem.  One way is to instead of providing a dispatcher, provide a delegate that does the dispatching and is thus fixed on return type.  The code that scans for verbs would only accept methods of this form:

public ReturnType Method(Dispatcher d, Func<Dispatcher, Verb, Expr, ReturnType> del, Expr e);

And in this, I eliminate the Dispatch method in Dispatcher as I’m passing it in to the function.  Now my prettyprint looks like this:

[Verb("PrettyPrint")]
public string Add(Dispatcher d, Func<Dispatcher, Verb, Expr, string> del, Verb v, AddExpr e)
{
    string lhs = del(d, v, e.Left);
    string rhs = del(d, v, e,Right);
    return lhs + " + " + rhs;
}

And tada, I have compiler/runtime enforced type safety.  Now, where does that Func<> come from?  I write it on the fly. That’s not exactly beginner code, but it’s straight forward to do.

I would like to point out now that for a production system, this extensibility model not sufficient.  Generally speaking the best models operate in terms of published APIs.  Clients of an API request from a host an API by name (or type) and by version.  This allows APIs to exist with legacy versions as well as revisions.  This gives you non-breaking legacy support for (nearly) free.  Also, any good extension model should have the facility to load/unload extensions on the fly.  This requires an event model to allow clients to deal with their APIs going away.  Finally there is a missing tier in the Expression Problem – there is the sense of data and operation, but missing is the notion of meaning.  For example, Add(Expr1, Expr2) is perfectly valid as long as Expr1 and Expr2 are numeric, but what if Expr1 is and array?  A function declaration?  A sequence expression?  An argument list?  There is nothing to handle this well in the model.  Nor is there a coherent generic factory model for constructing expressions.  Hopefully that would be addressed too.

Extensibility is not a new concept in software engineering.  By looking at the tools that are available in modern OOP systems, it is possible to create a solution to the Expression Problem that is compact, type-safe, and readable – all elements of practical engineering.

Published Thursday, August 12, 2010 10:03 AM by Steve Hawley

Comments

Anonymous comments are disabled