Working around findstr's regexp limitations

It’s my job to solve difficult problems involving Exchange Server, and this often involves many different kinds of tracing. Almost daily, I find myself needing to parse huge amounts of text to find the relevant information. For one issue alone, I currently have over 20 GB of traces in the form of text files.

Usually, I can get by with findstr. This handy little tool is included with Windows, is very fast, and supports regular expressions… sort of. Running findstr /? produces this quick reference:

[Image: output of findstr /? showing the regular expression quick reference]

It tells us to refer to the online documentation for full information on findstr regular expressions, but if you go there, you’ll find the same ten options listed. If you’ve ever looked at any regexp documentation, you know there are a lot more options than this. The regexp quick reference on MSDN lists over 70.

Eventually, I ran into an issue where the lack of full regexp support in findstr was a showstopper. I really, really needed to OR two regular expressions and get the matches combined in a single set of results, in chronological order from the top of the trace to the bottom. With findstr’s regular expression syntax, there is apparently no way to express this, because it doesn’t support the bar character, which represents an OR in a regexp.

A quick search led to a helpful StackOverflow thread (which was closed as “not constructive” for some reason), but it seems the tools of choice for most people are GUI tools: grepWin or PowerGREP. A few people mentioned using plain old grep via Cygwin or GnuWin32. I have Cygwin installed on one of my machines, but that seems like a lot of stuff to install just to search a text file.

Maybe it’s just me, but when it looks like I need to run an installer to accomplish a very basic task, I start cringing and looking for other options. When you support Windows software for a living, you spend a significant chunk of your life clicking through setup screens. That’s one of the reasons I’m in love with Chocolatey. If you haven’t tried Chocolatey, you should. It will significantly reduce the number of setup screens in your life. Check out Boxstarter too, while you’re at it.

Via Chocolatey, I stumbled across a nice little utility called BareGrep, which hit almost every checkbox on my wish list: it’s a tiny exe, has no installer, and accepts command line parameters. Unfortunately, it displays the results in a GUI, which is a deal-breaker.

Finally, I decided the best option for me was to reinvent the wheel using PowerShell and .NET. My initial script was very simple and just did this:

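# $file should hold the full path to the trace file.
$regex = New-Object System.Text.RegularExpressions.Regex("Some Pattern")
$reader = New-Object System.IO.StreamReader($file)
# ReadLine() returns $null at the end of the file.
while ($null -ne ($buffer = $reader.ReadLine()))
{
    if ($regex.IsMatch($buffer))
    {
        # Emit each matching line to the pipeline.
        $buffer
    }
}
$reader.Close()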
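
As a small illustration of the OR case that motivated all this, here is a sketch of the same IsMatch test using the bar character, which the .NET regex syntax does accept. The two patterns and the sample trace lines are made up for the example; they are not from my real traces.

```powershell
# Hypothetical patterns and sample lines, for illustration only.
$regex = New-Object System.Text.RegularExpressions.Regex('Error 0x[0-9A-F]+|Timeout after \d+ ms')

$lines = @(
    '10:00:01 Connection opened',
    '10:00:02 Error 0x8004010F returned',
    '10:00:03 Heartbeat',
    '10:00:04 Timeout after 3000 ms'
)

# A line is kept if either alternative matches; the original order is preserved.
$hits = @($lines | Where-Object { $regex.IsMatch($_) })
$hits
```

Because the lines are tested one at a time in file order, hits for both patterns end up interleaved chronologically, which is exactly what I needed.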

I ended up making it a little fancier in order to validate parameters and support multiple files and recursion. It would probably run faster if I converted it to C#, but so far the PowerShell version has been fast enough that it doesn’t matter much.
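
To give a rough idea of what “a little fancier” can look like, here is a minimal sketch, not the actual script. The function name Select-RegexLine and the parameter names -Pattern, -Path and -Recurse are assumptions chosen for this example, and it assumes PowerShell 3.0 or later (for the -File switch on Get-ChildItem).

```powershell
# Hypothetical sketch: the function and parameter names are illustrative assumptions.
function Select-RegexLine {
    [CmdletBinding()]
    param(
        # Reject patterns that are not valid .NET regular expressions.
        [Parameter(Mandatory = $true)]
        [ValidateScript({ $null = [regex]$_; $true })]
        [string] $Pattern,

        # One or more files or directories to search.
        [Parameter(Mandatory = $true)]
        [ValidateNotNullOrEmpty()]
        [string[]] $Path,

        # Also search subdirectories.
        [switch] $Recurse
    )

    $regex = New-Object System.Text.RegularExpressions.Regex($Pattern)

    # Resolve the paths to files; with -Recurse, subdirectories are included.
    $files = Get-ChildItem -Path $Path -File -Recurse:$Recurse

    foreach ($f in $files) {
        $reader = New-Object System.IO.StreamReader($f.FullName)
        try {
            while ($null -ne ($line = $reader.ReadLine())) {
                if ($regex.IsMatch($line)) {
                    # Prefix each hit with the file name it came from.
                    '{0}: {1}' -f $f.Name, $line
                }
            }
        }
        finally {
            # Release the file handle even if something goes wrong mid-file.
            $reader.Close()
        }
    }
}
```

With something like this in place, a call such as Select-RegexLine -Pattern 'foo|bar' -Path C:\Traces -Recurse > results.txt (path made up) writes the combined hits to a file for further processing.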

Maybe one day findstr will get updated with better regexp support. Until then, I’m using this script. I still have BareGrep in my path, as well. The GUI results aren’t that bad when I don’t need to pipe the output to a new file and process it further.