Even More IEnumerable<T> Fun
This post is going to cover how to use (and abuse) extension methods to make it easier to write compilers and interpreters or to write code metrics tools.
Right now, it’s straightforward to loop over a set of paths, load the assemblies that can be loaded, and then loop over their types and methods. The code ends up fairly ugly, though. I know because I’ve written that code for unit testing.
I have a chunk of code that, from a set of assemblies, gives me a list of the concrete classes that inherit from ImageCommand. It is a static method called ImageCommandForEachExcept; essentially, it runs a delegate on each class. The code is straightforward, but it is ugly. I wrote it against the .NET 1.1 framework and was limited by the tools I had at the time.
Here is what I wanted to write (had it been available at the time):
public static IEnumerable<ImageCommand> GetImageCommands(IEnumerable<Type> types)
{
    // Keep only public, concrete classes derived from ImageCommand.
    var imageCommands = from t in types
                        where t.IsPublic && !t.IsAbstract && !t.IsInterface
                              && t.IsSubclassOf(typeof(ImageCommand))
                        select t;
    Type[] emptyTypes = new Type[0];
    foreach (Type t in imageCommands)
    {
        // Skip types that lack a public parameterless constructor.
        ConstructorInfo ci = t.GetConstructor(emptyTypes);
        if (ci == null)
            continue;
        ImageCommand command = ci.Invoke(null) as ImageCommand;
        if (command == null)
            continue;
        yield return command;
    }
}
From here, I can do:
foreach (ImageCommand command in GetImageCommands(myAssembly.GetTypes())) { … }
and that’s fairly beautiful, as far as code goes, but I want more. Given a folder, I’d like to be able to get all the types from all the assemblies within that folder, so I created the following class:
public class AssemblyEnum : IEnumerable<Assembly>
{
    private IEnumerable<string> _paths;

    private static bool IsDll(string path)
    {
        string ext = Path.GetExtension(path);
        return ext != null && ext.ToLower().EndsWith("dll");
    }

    // Accepts either a path to a single dll or a folder containing dlls.
    public AssemblyEnum(string path)
    {
        if (IsDll(path))
        {
            _paths = new string[] { path };
        }
        else
        {
            _paths = Directory.GetFiles(path, "*.dll");
        }
    }

    public AssemblyEnum(IEnumerable<string> paths)
    {
        _paths = paths;
    }

    public IEnumerator<Assembly> GetEnumerator()
    {
        foreach (string path in _paths)
        {
            if (!IsDll(path))
                continue;
            Assembly assem = null;
            try
            {
                // Assembly.LoadFile requires an absolute path.
                assem = Assembly.LoadFile(Path.GetFullPath(path));
            }
            catch
            {
                // Deliberately ignored: anything that fails to load is skipped.
            }
            if (assem != null)
                yield return assem;
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
AssemblyEnum, given a path or a set of paths, enumerates all the DLLs and attempts to load an assembly from each one.
Given that, I create the following extension methods:
public static IEnumerable<Assembly> Assemblies(this IEnumerable<string> paths)
{
    return new AssemblyEnum(paths);
}

public static IEnumerable<Assembly> Assemblies(this string path)
{
    return new AssemblyEnum(path);
}
which let me do this:
var assemblies = @"path\to\my\folder".Assemblies();
Now, given an Assembly, I can get all the types out of it with the following extension methods:
public static IEnumerable<Type> Types(this Assembly assem)
{
    Type[] types = null;
    try
    {
        types = assem.GetTypes();
    }
    catch
    {
        // If the types can't be loaded, treat the assembly as empty.
        types = new Type[0];
    }
    return types;
}

public static IEnumerable<Type> Types(this IEnumerable<Assembly> assemblies)
{
    return assemblies.SelectMany(x => x.Types());
}
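The catch-all in Types() throws away every type in an assembly as soon as one of them fails to load. As a sketch of an alternative (the name LoadableTypes is made up for this example and is not part of the original code), the exception that GetTypes() raises, ReflectionTypeLoadException, carries the types that did load in its Types property, with null in the slots that failed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class SafeTypeExtensions
{
    // Hypothetical variant of Types(): keeps whatever types did load
    // instead of discarding the whole assembly on failure.
    public static IEnumerable<Type> LoadableTypes(this Assembly assem)
    {
        try
        {
            return assem.GetTypes();
        }
        catch (ReflectionTypeLoadException ex)
        {
            // ex.Types has one slot per type; slots for types that
            // failed to load are null.
            return ex.Types.Where(t => t != null);
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // Use the assembly that defines System.String as a known-good input.
        int count = typeof(string).Assembly.LoadableTypes().Count();
        Console.WriteLine(count > 0);
    }
}
```

Whether partial results are preferable to an empty list depends on the tool; for code metrics a partial list is probably more useful than nothing.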
These two Types() methods let me do this:
var types = @"path\to\my\folder".Assemblies().Types();
which gives me an IEnumerable<Type> for all types in all the assemblies. It also sidesteps a problem with Assembly.GetTypes(): it throws a ReflectionTypeLoadException when any type in the assembly can’t be loaded. Honestly, though, I don’t like the .Assemblies().Types() chain. I’d like to shorten it a little, so let’s create a few more extension methods:
public static IEnumerable<Type> Types(this string path)
{
    return path.Assemblies().SelectMany(x => x.Types());
}

public static IEnumerable<Type> Types(this IEnumerable<string> paths)
{
    return paths.Assemblies().SelectMany(x => x.Types());
}
Now I can simply do this:
var types = @"path\to\my\folder".Types();
which is a heinous abuse of extension methods – a string going to a list of types?! Yikes – the problem really is that a path can be represented by a string, but a string is not really a path.
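One way to take the edge off that abuse, sketched here under the assumption that a small wrapper type is acceptable, is to hang the extension methods on a dedicated path type instead of on every string. The AssemblyPath class and its DllFiles() method below are hypothetical names invented for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical wrapper: marks a string as "a path to a dll or a folder of dlls".
public sealed class AssemblyPath
{
    public string Value { get; }

    public AssemblyPath(string value)
    {
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("Path must not be empty.", nameof(value));
        Value = value;
    }

    // Lists candidate dll files: the path itself if it names a .dll,
    // otherwise the .dll files directly inside the folder.
    public IEnumerable<string> DllFiles()
    {
        if (string.Equals(Path.GetExtension(Value), ".dll", StringComparison.OrdinalIgnoreCase))
            return new[] { Value };
        return Directory.EnumerateFiles(Value, "*.dll");
    }
}
```

With a type like this, Types() and Methods() could extend AssemblyPath rather than string, so an arbitrary string such as a user name would no longer offer a Types() method. The cost is one extra constructor call at the start of each chain.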
Now, before you think that I have gone far enough, let me throw in four more extension methods:
public static IEnumerable<MethodInfo> Methods(this string path)
{
    return path.Types().SelectMany(x => x.GetMethods());
}

public static IEnumerable<MethodInfo> Methods(this IEnumerable<string> paths)
{
    return paths.SelectMany(x => x.Methods());
}

public static IEnumerable<MethodInfo> Methods(this IEnumerable<Assembly> assemblies)
{
    return assemblies.Types().SelectMany(x => x.GetMethods());
}

public static IEnumerable<MethodInfo> Methods(this Type type)
{
    return type.GetMethods();
}
These let me get all the Methods from a path to a dll (or path to a folder of dlls), a collection of paths to dlls, a collection of assemblies, or a type.
So now, I can do this:
int publicCount = @"path\to\my\folder".Types().Where(x => x.IsPublic).Count();
which gives me the count of all the public types in all the assemblies in the folder. I can also get a list of all the public methods in public types by doing this:
var methods = @"path\to\my\folder".Methods().Where(x => x.IsPublic && x.DeclaringType.IsPublic);
Because Methods() is built from nested calls to SelectMany, it is lazy: no assembly is loaded and no method is enumerated until something iterates the result, so building the query is cheap. So consider this: I’m writing a compiler or interpreter with type inference. If I have an IEnumerable<ObjectsICanCallWithinThisScope>, then finding a matching set of methods is a LINQ operation. In fact, writing a linker is a LINQ operation.
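The laziness is easy to observe. The sketch below uses a counting stand-in for Types() (the names LazyDemo, CountingTypes and TypesVisited are invented for this example) and runs against the assembly that defines System.String, so it needs no folder of DLLs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class LazyDemo
{
    // Counts how many types have actually been visited.
    public static int TypesVisited;

    // Hypothetical stand-in for Types(): yields types one at a time
    // and records each visit.
    public static IEnumerable<Type> CountingTypes(this Assembly assem)
    {
        foreach (Type t in assem.GetTypes())
        {
            TypesVisited++;
            yield return t;
        }
    }

    public static void Main()
    {
        Assembly core = typeof(string).Assembly;

        // Building the query does no work yet.
        IEnumerable<MethodInfo> methods =
            core.CountingTypes().SelectMany(t => t.GetMethods());
        Console.WriteLine(TypesVisited);   // 0

        // Asking for one method visits only as many types as needed.
        MethodInfo first = methods.First();
        Console.WriteLine(TypesVisited);
    }
}
```

Note that the work is deferred rather than eliminated: a full Count() or ToList() still loads every assembly and walks every type, and it does so again each time the query is enumerated.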
Given earlier work here on making streams or tokens enumerable, I’m thinking that a compiler should start to look like this under the hood:
IEnumerable<AstNode> BuildAst(Stream stm, Grammar g)
{
    IEnumerable<Token> scanner = new Tokenizer(stm, g);
    Parser p = new Parser(scanner, g);
    return p.Parse();
}
Then you can write a code generator that operates only on IEnumerable<AstNode>. You can write optimizers that operate on IEnumerable<AstNode>; nanopass becomes a closer reality because enumerating the tree is comparatively easy. You can find nodes that are candidates for constant folding (arithmetic on two constant operands) via a LINQ query:
from p in nodes.OfType<AstArithmeticNode>() where IsConstantExpr(p.LeftChild) && IsConstantExpr(p.RightChild) select p;
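To make that query concrete, here is a minimal self-contained sketch. Every type in it (AstNode, AstConstantNode, AstVariableNode, AstArithmeticNode) and the IsConstantExpr helper are hypothetical stand-ins invented for illustration; a real compiler would have richer node types and a less naive notion of "constant":

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical node types, for illustration only.
public abstract class AstNode { }

public sealed class AstConstantNode : AstNode
{
    public int Value { get; }
    public AstConstantNode(int value) { Value = value; }
}

public sealed class AstVariableNode : AstNode
{
    public string Name { get; }
    public AstVariableNode(string name) { Name = name; }
}

public sealed class AstArithmeticNode : AstNode
{
    public char Op { get; }
    public AstNode LeftChild { get; }
    public AstNode RightChild { get; }
    public AstArithmeticNode(char op, AstNode left, AstNode right)
    {
        Op = op; LeftChild = left; RightChild = right;
    }
}

public static class AstQueries
{
    // Naive assumption: only literal constants count as constant expressions.
    public static bool IsConstantExpr(AstNode node)
    {
        return node is AstConstantNode;
    }

    // Arithmetic nodes whose operands are both constants.
    public static IEnumerable<AstArithmeticNode> ConstantOperandNodes(IEnumerable<AstNode> nodes)
    {
        return from p in nodes.OfType<AstArithmeticNode>()
               where IsConstantExpr(p.LeftChild) && IsConstantExpr(p.RightChild)
               select p;
    }

    public static void Main()
    {
        var nodes = new List<AstNode>
        {
            new AstArithmeticNode('*', new AstConstantNode(2), new AstConstantNode(8)),
            new AstArithmeticNode('+', new AstVariableNode("x"), new AstConstantNode(1)),
            new AstConstantNode(5),
        };
        Console.WriteLine(ConstantOperandNodes(nodes).Count());   // 1
    }
}
```

OfType<AstArithmeticNode>() does the type test and the cast in one step, which is what lets the where clause reach LeftChild and RightChild without a cast.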