A simplifying regex

Regular expressions can get pretty complex. I’m in the process of trying to master them, but from time to time I worry that I should just give up on the concept: perhaps there are generally easier ways of accomplishing the same thing, and regexes are a waste of time.

Then I see an example like today’s that renews my belief that it’s worthwhile to master regexes.

Background

A co-worker had a method that received a String of comma-separated values. Inside the method, he called split() with a comma as the delimiter. A simplified version of the class in question:


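import java.util.Arrays;
import java.util.List;

public class Splitter {
    // Splits on every comma; whitespace around the values is kept as-is.
    public List<String> splitBasic(String s) {
        return Arrays.asList(s.split(","));
    }
}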

…and its test:


import static org.junit.Assert.assertEquals;

import java.util.List;
import org.junit.Test;

// The imports above assume JUnit 4.
public class SplitterTest {
    @Test
    public void testSplitBasic() {
        List<String> splitString = new Splitter().splitBasic("bob, sam,harry");
        assertEquals(3, splitString.size());
        assertEquals("bob", splitString.get(0));
        assertEquals(" sam", splitString.get(1));
        assertEquals("harry", splitString.get(2));
    }
}

This test passes.  So far, so good.

The Problem

He wanted to enhance the split method to support escaping a comma within the string. When the split method encountered an escaped comma (“\,”), it should not treat the comma as a delimiter (and it should eat the backslash).

So we would want this test to pass:


        List<String> splitString = new Splitter().splitSmart("bob\\, sam,harry");
        assertEquals(2, splitString.size());
        assertEquals("bob, sam", splitString.get(0));
        assertEquals("harry", splitString.get(1));

splitSmart(): a Non-Regex Implementation

He had a working implementation that looked something like this:


    public List<String> splitSmart(String s) {
        List<String> list = new ArrayList<String>();

        String concat = "";
        boolean concatenating = false;
        for (String x : s.split(",")) {
            if (x.endsWith("\\")) {
                concat += x.substring(0, x.length() - 1) + ",";
                concatenating = true;
            } else if (concatenating) {
                concat += x;
                list.add(concat);
                concat = "";  // reset, so a later escaped comma starts from an empty buffer
                concatenating = false;
            } else {
                list.add(x);
                concatenating = false;
            }
        }
        return list;
    }

This makes the test pass, but my co-worker was not happy with it.  Too clunky.

A Simplification using a Regex

I had to think about it for a few minutes, but eventually it came to me that what we wanted as a delimiter was a comma not preceded by a backslash.  Looks like a great opportunity to use… negative lookbehind!

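The regex way to say “a comma not preceded by a backslash” is:

    (?<!\\),

(?<!X) is the regex way of saying “not preceded by X” (in this case X is a backslash, which has to be escaped as \\) and the comma is the thing to match. Inside a Java string literal each of those backslashes has to be doubled again, so the pattern is written "(?<!\\\\),".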
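To see the negative lookbehind on its own, here is a minimal sketch. The class name LookbehindDemo and the method splitOnUnescapedCommas are hypothetical names chosen for illustration; the pattern is the regex (?<!\\), written as a Java string literal, where every regex backslash is doubled once more.

```java
import java.util.Arrays;
import java.util.List;

final class LookbehindDemo {
    // Regex (?<!\\),  matches a comma that is NOT preceded by a backslash.
    // In a Java string literal each regex backslash is doubled again.
    static final String UNESCAPED_COMMA = "(?<!\\\\),";

    // Hypothetical helper: splits only on commas that are not escaped.
    static List<String> splitOnUnescapedCommas(String s) {
        return Arrays.asList(s.split(UNESCAPED_COMMA));
    }

    public static void main(String[] args) {
        // The literal "bob\\, sam,harry" is the text: bob\, sam,harry
        for (String part : splitOnUnescapedCommas("bob\\, sam,harry")) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Note that the split alone leaves the backslash in the first piece (bob\, sam); removing it is a separate step.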

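Now we can simplify splitSmart() down to this:

    public List<String> splitSmart(String s) {
        List<String> list = new ArrayList<String>();

        for (String x : s.split("(?<!\\\\),")) {
            list.add(x.replaceAll("\\\\(?=,)", ""));
        }
        return list;
    }

The replaceAll() call uses a second regex to remove a backslash that is followed by a comma, using positive lookahead!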
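For anyone who wants to run the split-and-replace idea end to end, the following is a self-contained sketch rather than my co-worker’s exact code. The class name SmartSplitDemo and the sample input are assumptions made for illustration; the two patterns are a negative lookbehind for the split and a positive lookahead for the replace.

```java
import java.util.ArrayList;
import java.util.List;

final class SmartSplitDemo {
    static List<String> splitSmart(String s) {
        List<String> parts = new ArrayList<>();
        // Negative lookbehind: split only on commas not preceded by a backslash.
        for (String part : s.split("(?<!\\\\),")) {
            // Positive lookahead: remove a backslash only when a comma follows it.
            parts.add(part.replaceAll("\\\\(?=,)", ""));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Input text: bob\, sam,harry\,joe,a\b
        for (String part : splitSmart("bob\\, sam,harry\\,joe,a\\b")) {
            System.out.println("[" + part + "]");
        }
    }
}
```

A backslash that is not followed by a comma (as in a\b) is left alone. One limitation of this approach: there is no way to express a value that ends in a literal backslash, because any backslash directly before a comma is treated as an escape.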

Conclusion

My co-worker was pleased to use the regex-totin’ split-n-replace version of the code.  We both agreed it looked cleaner and simpler, even with the somewhat odd-looking lookbehind syntax and the double-escaped backslashes.  For my part, I was happy to be able to apply my regex learning to help someone.  :)


  1. #1 by Timur Alhimenkov on January 27, 2009 - 4:17 pm

    Wow! Thank you!
    I always wanted to write in my blog something like that. Can I take part of your post to my blog?
    Of course, I will add backlink?

    Regards, Timur Alhimenkov

  2. #2 by danielmeyer on January 28, 2009 - 11:27 am

    Timur,
    Sure, that would be fine.
    -Daniel-
