More IEnumerable<T> Fun
This blog post will be about a practical example of using IEnumerable<T> to make solving common problems easier.
Here’s a common abstract problem – walk a tree of nodes visiting each node and possibly perform an operation on a node’s contents. An concrete version of that is finding one or more files within a file system.
The typical way to do that is with recursion. And to you I say, that way madness lies. The implementation, however is simple:
public class RecursiveDirectoryWalker
{
public static List<string> WalkPath(string path, Func<string, bool> directoryFilter, Func<string, bool> fileFilter)
{
if (path == null)
throw new ArgumentNullException("path");
List<string> paths = new List<string>();
WalkPath(path, directoryFilter, fileFilter, paths);
return paths;
}
private static void WalkPath(string path, Func<string, bool> directoryFilter, Func<string, bool> fileFilter, List<string> paths)
{
string[] files = Directory.GetFiles(path);
foreach (string file in files)
{
if (fileFilter == null || fileFilter(file))
{
paths.Add(file);
}
}
string[] dirs = Directory.GetDirectories(path);
foreach (string dir in dirs)
{
if (directoryFilter == null || directoryFilter(dir))
WalkPath(dir, directoryFilter, fileFilter, paths);
}
}
}
In this code, I use a helper routine to pass a list of "found paths”. A found path is a path for which the fileFilter returns true (or always if fileFilter is null). The helper routine takes a list of strings to which it will add found files, rather than returning them. This makes the recursion somewhat easier to write.
The problem here is the recursion. If you have any kind of realistic stack limit, I will deliver you a file system which will overflow your stack. Plain and simple, you can never ship production code based on this routine. It is a time bomb waiting to happen.
Instead, routines like this should be either heap based or tail recursive. Since C#, at present, doesn’t support tail recursion, I’ll cover heap-based. To do heap based, the typical approach is to use a heap-based stack rather than the VM’s stack. This is usually thought of as a stack of current state (or the deltas to return you to your state). There’s still another approach, which works well for this problem: a queue of work to do. The task is this: get the head of the work queue, process the files found there, then for each directory found enqueue it as work to do later.
I’m going to go one step further still and implement this directory walker non-recursively and as IEnumerable<T>:
public class DirectoryWalker : IEnumerable<string>
{
private string _seedPath;
Func<string, bool> _directoryFilter, _fileFilter;
public DirectoryWalker(string seedPath) : this(seedPath, null, null)
{
}
public DirectoryWalker(string seedPath, Func<string, bool> directoryFilter, Func<string, bool> fileFilter)
{
if (seedPath == null)
throw new ArgumentNullException(seedPath);
_seedPath = seedPath;
_directoryFilter = directoryFilter;
_fileFilter = fileFilter;
}
public IEnumerator<string> GetEnumerator()
{
Queue<string> directories = new Queue<string>();
directories.Enqueue(_seedPath);
Queue<string> files = new Queue<string>();
while (files.Count > 0 || directories.Count > 0)
{
if (files.Count > 0)
{
yield return files.Dequeue();
}
if (directories.Count > 0)
{
string dir = directories.Dequeue();
string[] newDirectories = Directory.GetDirectories(dir);
string[] newFiles = Directory.GetFiles(dir);
foreach (string path in newDirectories)
{
if (_directoryFilter == null || _directoryFilter(path))
directories.Enqueue(path);
}
foreach (string path in newFiles)
{
if (_fileFilter == null || _fileFilter(path))
files.Enqueue(path);
}
}
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
Instead of using static methods, I now have a class which implement IEnumerable<string>. In the class, I also include a seed path for where to start searching and the filters.
GetEnumerator is simplicity – there are two queues, a queue of files and a queue of directories. While the queue of files is not empty, we yield return the head. Otherwise, if the queue of directories is non-empty, we dequeue a directory and use that as a seed for finding more files and more directories.
The surprising thing is that both solutions are pretty close to the same length. The non-recursive version is also quite readable, thanks to the use of yield return. I especially like how it transforms the client code, in that I can now write something like this:
foreach (string s in new DirectoryWalker(@"C:\tfsroot\ISLib 8.0", null, (x => x.EndsWith(".obj"))))
{
Console.WriteLine(s);
}
Which is nice, short and sweet. Also, the recursive version accumulates the results into a list and returns it whereas this version works piecemeal. That isn’t to say that we couldn’t build the recursive version around IEnumerable<T> – we can. It just fits nicely into the state-machine style of the non-recursive version. The other nice thing about using IEnumerable<T> is that many directory search routines carry in a predicate to ask whether or not you should cancel the search and to provide UI feedback/IO coverup – this is not necessary here, just break out of the for loop.
One thing I brushed over is the filter functions. I’m using Func<T, TResult> to shorthand up a delegate definition that I can use for asking the question “do I want this directory/file”. This type of function is called a predicate. In my initial version, I made specific delegates, but I decided to use the generic Func<> instead. For the usage example, I put in a lambda expression for the predicate “does the path end with .obj”. In practice, it might be better to use Path.GetExtension() to check how it ends and use a case insensitive comparison.