Regex unit test suite?

I’m reading Mastering Regular Expressions, by Jeffrey Friedl. Regular expressions come in a lot of different flavors and dialects. In reading the book, I realized that when I’ve used grep for text searching, my regex has sometimes failed because I was using the + metacharacter, which grep doesn’t support in its default basic-regex mode! (GNU grep wants \+ there, or the -E option. I’m using Cygwin’s GNU grep 2.5.3.)

Wouldn’t it be nice to have a regex unit test suite that you could run a utility against and see for certain what metacharacters it supports?  I’m envisioning something sort of like a “configure” script, except instead of storing configuration settings it would just print them to the screen.

Some settings that might be useful:

  • Does this tool support the + metacharacter?
  • For grouping, should I use ( ) or \( \) ?
  • Does this tool support the {min,max} (or \{min,max\}) syntax?
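
As a rough sketch of what such a probe could look like, here is a small Python version. The probe table, the function names, and the sample strings are all made up for illustration. The matcher shown only wraps Python’s own re module; testing grep or another tool would mean swapping in a different matcher function (for example one that shells out to the tool), which is not shown here.

```python
import re

# Each probe is (feature name, pattern, sample text that should match
# if the feature is supported). All entries are illustrative.
PROBES = [
    ("+ quantifier", r"^ab+c$", "abbc"),
    ("( ) grouping", r"^(ab)*c$", "ababc"),
    (r"\( \) grouping", r"^\(ab\)*c$", "ababc"),
    ("{min,max} interval", r"^a{2,3}$", "aa"),
    (r"\{min,max\} interval", r"^a\{2,3\}$", "aa"),
]


def python_re_matcher(pattern, text):
    """Matcher for Python's re module: True if pattern matches text."""
    return re.search(pattern, text) is not None


def run_probes(matcher, probes=PROBES):
    """Run every probe through matcher and return {feature: bool}.

    A pattern the tool rejects outright counts as "not supported".
    """
    results = {}
    for name, pattern, text in probes:
        try:
            results[name] = bool(matcher(pattern, text))
        except Exception:
            results[name] = False
    return results


if __name__ == "__main__":
    for feature, ok in run_probes(python_re_matcher).items():
        print(f"{feature:25} {'yes' if ok else 'no'}")
```

Each probe relies on the fact that an unsupported metacharacter is usually treated as a literal, so the sample text simply fails to match. That is an assumption about the tool under test; a tool that errors out instead is handled by the except clause.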

[Update 1/15/2009: I’m now in Chapter 4 of Jeff Friedl’s Mastering Regular Expressions, and by now I know of other things I’d like to test:

  • Lazy quantifiers: ??, *?, +?, {min,max}?
  • Possessive quantifiers: *+, ++, ?+, {min,max}+
  • Atomic grouping: (?>…)
  • Which kind of regex engine does the tool use: Traditional NFA, DFA, or POSIX NFA?

]
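
A few of these can be probed the same way. The sketch below again targets only Python’s re module, and the helper names are hypothetical. The possessive and atomic checks use a pattern that must fail to match if the feature really works, so that a tool which silently reads a*+ as something else is not counted as supporting it. The alternation check is in the spirit of the engine tests Friedl describes: a Traditional NFA takes the first alternative that works, while a DFA or POSIX NFA returns the longest match. Telling a DFA from a POSIX NFA would need a further test (probably a timing one), which I haven’t attempted here.

```python
import re

# Sentinel meaning "the engine rejected the pattern".
UNSUPPORTED = object()


def first_match(pattern, text):
    """Return the first matched substring, None if nothing matches,
    or UNSUPPORTED if the pattern does not compile."""
    try:
        m = re.search(pattern, text)
    except re.error:
        return UNSUPPORTED
    return m.group(0) if m else None


def probe_advanced():
    """Return {feature: bool} for a few advanced features."""
    return {
        # A lazy .*? should stop at the first closing bracket.
        "lazy *?": first_match(r"<.*?>", "<a><b>") == "<a>",
        # A possessive a*+ keeps every 'a', so the trailing 'a' cannot match.
        "possessive *+": first_match(r"^a*+a$", "aaa") is None,
        # Same idea with an atomic group.
        "atomic (?>...)": first_match(r"^(?>a*)a$", "aaa") is None,
        # Leftmost-first alternation stops at 'nfa'; leftmost-longest
        # engines would return 'nfa not' instead.
        "leftmost-first alternation":
            first_match(r"nfa|nfa not", "nfa not") == "nfa",
    }


if __name__ == "__main__":
    for feature, ok in probe_advanced().items():
        print(f"{feature:30} {'yes' if ok else 'no'}")
```

The possessive and atomic results depend on which Python version runs the script, since older versions of re reject both syntaxes.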

Though I’m calling it a suite, a fairly monolithic single file of tests would probably be sufficient. It seems that a separate version of the suite would need to be made for each language, but all the same tests would be there in each version…

Has anyone done something like this already, I wonder?
