Java Regex: Dynamic Replacements with Lambda Expressions

Regular expressions are a powerful tool for finding patterns in strings, and static replacements are relatively easy to implement. Dynamic replacements, where every match needs its own computed replacement, are more interesting.

Static replacements

Suppose we want to convert all ISO dates in a string into the European format DD.MM.YYYY for our users.

String input = "Lorem ipsum 2023-11-07 dolor sit 2021-09-14 amet.";
Pattern isoDatePattern = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
String output = isoDatePattern.matcher(input).replaceAll("$3.$2.$1");
// "Lorem ipsum 07.11.2023 dolor sit 14.09.2021 amet."

For a static replacement, we can refer to each capturing group of a match with the dollar notation: $1 for the first group, $2 for the second, and so on.
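For reference, here is a minimal self-contained version of the static replacement. It also shows two related details of the replacement syntax. $0 refers to the whole match. A literal dollar sign must be escaped with a backslash (written "\\$" in Java source). The class and method names are made up for this sketch.

```java
import java.util.regex.Pattern;

class StaticDateReplacement {

    // Three capturing groups: year ($1), month ($2), day ($3)
    static final Pattern ISO_DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Reorders the groups into DD.MM.YYYY
    static String toEuropeanFormat(String input) {
        return ISO_DATE.matcher(input).replaceAll("$3.$2.$1");
    }

    // $0 is the whole match; "\\$" in Java source is the replacement text \$, i.e. a literal dollar sign
    static String tagDates(String input) {
        return ISO_DATE.matcher(input).replaceAll("[\\$date:$0]");
    }

    public static void main(String[] args) {
        String input = "Lorem ipsum 2023-11-07 dolor sit 2021-09-14 amet.";
        System.out.println(toEuropeanFormat(input));
        // Lorem ipsum 07.11.2023 dolor sit 14.09.2021 amet.
        System.out.println(tagDates(input));
        // Lorem ipsum [$date:2023-11-07] dolor sit [$date:2021-09-14] amet.
    }
}
```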

But suppose we now want to show our users the dates in a long form, depending on their locale. The result should look something like this:

Locale locale = Locale.US;
String input = "Lorem ipsum 2023-11-07 dolor sit 2021-09-14 amet.";
String output = // ...
// "Lorem ipsum November 7, 2023 dolor sit September 14, 2021 amet."

In this case, we cannot simply rearrange the groups. Instead, we have to execute additional code for each match. Depending on the Java version we are using, we have several options.

First, let's extract the actual conversion logic into a separate method. The logic is independent of the replacement options, which we will look at in a moment, and it keeps the following code examples short.

String extractAndFormatDateWithLocale(MatchResult isoDate, Locale locale) {
    int year = Integer.parseInt(isoDate.group(1));
    int month = Integer.parseInt(isoDate.group(2));
    int dayOfMonth = Integer.parseInt(isoDate.group(3));
    LocalDate date = LocalDate.of(year, month, dayOfMonth);

    DateTimeFormatter dateFormatter = DateTimeFormatter
            .ofLocalizedDate(FormatStyle.LONG)
            .withLocale(locale);

    return date.format(dateFormatter);
}
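Two details about this method are worth seeing in running code. First, Matcher implements MatchResult, so we can pass the matcher itself right after a successful find(). Second, the pattern also matches digit sequences that are not valid dates, such as 2023-02-30, and LocalDate.of(..) throws a DateTimeException for those. The following sketch makes the method static and adds a hypothetical wrapper, formatOrKeep, that leaves such matches unchanged. The wrapper is an illustration, not part of the original approach.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateFormatting {

    // Same logic as extractAndFormatDateWithLocale above, made static for this sketch
    static String extractAndFormatDateWithLocale(MatchResult isoDate, Locale locale) {
        LocalDate date = LocalDate.of(
                Integer.parseInt(isoDate.group(1)),
                Integer.parseInt(isoDate.group(2)),
                Integer.parseInt(isoDate.group(3)));
        return date.format(DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG).withLocale(locale));
    }

    // Hypothetical wrapper: keeps the original text if the digits do not form a valid date
    static String formatOrKeep(MatchResult isoDate, Locale locale) {
        try {
            return extractAndFormatDateWithLocale(isoDate, locale);
        } catch (DateTimeException e) {
            return isoDate.group(); // e.g. "2023-02-30" stays as it is
        }
    }

    public static void main(String[] args) {
        Pattern isoDatePattern = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
        Matcher isoDates = isoDatePattern.matcher("2023-11-07 and 2023-02-30");
        while (isoDates.find()) {
            // The Matcher is itself a MatchResult describing the current match
            System.out.println(formatOrKeep(isoDates, Locale.US));
        }
        // November 7, 2023
        // 2023-02-30
    }
}
```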

By the way, I don't like that almost all tutorials and blog posts about regular expressions use magic numbers. Let's face it: group(3) is neither readable nor maintainable. We'd have to go back to the pattern and count parentheses first. I wrote about an alternative in this blog post: One Step Towards Maintainable Regular Expressions In Java
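One widely used way to avoid group numbers is named capturing groups, available since Java 7. This is not necessarily the approach from the linked post. The sketch below uses ${name} in a static replacement and group(String) for programmatic access. It calls group("year") on the Matcher because MatchResult only gained group(String) in Java 20. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroups {

    // Named groups document themselves: no counting of parentheses needed
    static final Pattern ISO_DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Static replacement: ${name} refers to a named group
    static String toEuropeanFormat(String input) {
        return ISO_DATE.matcher(input).replaceAll("${day}.${month}.${year}");
    }

    // Programmatic access via Matcher.group(String), available since Java 7
    static String firstYear(String input) {
        Matcher matcher = ISO_DATE.matcher(input);
        return matcher.find() ? matcher.group("year") : null;
    }

    public static void main(String[] args) {
        String input = "Lorem ipsum 2023-11-07 dolor sit 2021-09-14 amet.";
        System.out.println(toEuropeanFormat(input)); // Lorem ipsum 07.11.2023 dolor sit 14.09.2021 amet.
        System.out.println(firstYear(input));        // 2023
    }
}
```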

Java 8: Using StringBuilder

Let's start with Java 8. The obvious option is to iterate over the matches and build the output piece by piece. The text between matches is copied unchanged, and each match is replaced by the formatted date. With start() and end() we get the start index (inclusive) and the end index (exclusive) of each match.
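To make these indexes concrete, this small sketch collects the start and end index of every date in our sample input. Like String.substring(..), end() is exclusive, so input.substring(start, end) yields exactly the matched text. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchIndexes {

    // Collects "start-end" for each match; end() is exclusive, like String.substring
    static List<String> matchRanges(String input) {
        Matcher matcher = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})").matcher(input);
        List<String> ranges = new ArrayList<>();
        while (matcher.find()) {
            ranges.add(matcher.start() + "-" + matcher.end());
        }
        return ranges;
    }

    public static void main(String[] args) {
        String input = "Lorem ipsum 2023-11-07 dolor sit 2021-09-14 amet.";
        System.out.println(matchRanges(input));     // [12-22, 33-43]
        System.out.println(input.substring(12, 22)); // 2023-11-07
    }
}
```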

Locale locale = Locale.US;
String input = "Lorem ipsum 2023-11-07 dolor sit 2021-09-14 amet.";

Pattern isoDatePattern = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
Matcher isoDates = isoDatePattern.matcher(input);
StringBuilder outputBuilder = new StringBuilder();
int lastEnd = 0;
while (isoDates.find()) {
    outputBuilder.append(input, lastEnd, isoDates.start());
    String formattedDate = extractAndFormatDateWithLocale(isoDates, locale);
    outputBuilder.append(formattedDate);
    lastEnd = isoDates.end();
}
outputBuilder.append(input, lastEnd, input.length());

String output = outputBuilder.toString();
// "Lorem ipsum November 7, 2023 dolor sit September 14, 2021 amet."

That's a lot of code for such a simple task. Let's see how we can make it more readable.

Using appendReplacement

Java 8 offers another option: appendReplacement, which has been part of Matcher since Java 1.4. Let's take a look at the code first.

Locale locale = Locale.US;
String input = "Lorem ipsum 2023-11-07 dolor sit 2021-09-14 amet.";

Pattern isoDatePattern = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
Matcher isoDates = isoDatePattern.matcher(input);
StringBuffer outputBuffer = new StringBuffer();
while (isoDates.find()) {
    String formattedDate = extractAndFormatDateWithLocale(isoDates, locale);
    // quoteReplacement() escapes $ and \, which appendReplacement would otherwise interpret
    isoDates.appendReplacement(outputBuffer, Matcher.quoteReplacement(formattedDate));
}
isoDates.appendTail(outputBuffer);

String output = outputBuffer.toString();
// "Lorem ipsum November 7, 2023 dolor sit September 14, 2021 amet."

The code has become a little shorter. In particular, we no longer have to deal with indexes.

Two method calls are interesting here: Matcher.appendReplacement(..) and Matcher.appendTail(..).

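The appendReplacement(..) method does two things. First, it appends all characters from the end of the previous match (or from the start of the input) up to the beginning of the current match to the StringBuffer. Then it appends the replacement (here: formattedDate). The replacement is not copied verbatim, though. As in replaceAll(String), $ introduces a group reference and \ escapes the next character. If the replacement may contain these characters, wrap it in Matcher.quoteReplacement(..) first.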
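Why escaping matters is easiest to see with a replacement that contains a dollar sign. In the sketch below, the {price} placeholder and the fill method are hypothetical. Without quoteReplacement(..), "$ 5" is parsed as a group reference, and appendReplacement(..) rejects it with an IllegalArgumentException. With quoteReplacement(..), the value is inserted literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementDemo {

    // Hypothetical placeholder syntax: {price}
    static final Pattern PLACEHOLDER = Pattern.compile("\\{price\\}");

    static String fill(String template, String value, boolean quote) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer output = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement escapes $ and \ so the value is inserted literally
            String replacement = quote ? Matcher.quoteReplacement(value) : value;
            matcher.appendReplacement(output, replacement);
        }
        matcher.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(fill("Total: {price}", "$ 5", true)); // Total: $ 5
        try {
            fill("Total: {price}", "$ 5", false);
        } catch (IllegalArgumentException e) {
            // "$" followed by a space is not a valid group reference
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```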

The appendTail(..) method appends the remaining characters, from the end of the last match to the end of the input, to the StringBuffer.

Important: If you use this approach, don't forget the appendTail(..) call after the while loop. Otherwise, the characters after the last match are lost.
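The effect of a missing appendTail(..) call is easy to reproduce. In this sketch, each date is replaced by a fixed "<date>" marker, and the tail is appended only if withTail is true. Without it, the trailing " amet." silently disappears. The names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendTailDemo {

    static String replaceDates(String input, boolean withTail) {
        Matcher matcher = Pattern.compile("\\d{4}-\\d{2}-\\d{2}").matcher(input);
        StringBuffer output = new StringBuffer();
        while (matcher.find()) {
            // Copies the text before the match, then the replacement
            matcher.appendReplacement(output, "<date>");
        }
        if (withTail) {
            // Copies everything after the last match
            matcher.appendTail(output);
        }
        return output.toString();
    }

    public static void main(String[] args) {
        String input = "Lorem ipsum 2023-11-07 dolor sit 2021-09-14 amet.";
        System.out.println(replaceDates(input, true));  // Lorem ipsum <date> dolor sit <date> amet.
        System.out.println(replaceDates(input, false)); // Lorem ipsum <date> dolor sit <date>
    }
}
```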

In Java 8, however, this option only works with StringBuffer. Java 9 added overloads of appendReplacement(..) and appendTail(..) that accept a StringBuilder.
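On Java 9 or later, the same loop works with a StringBuilder. To stay self-contained, this sketch uses the simple DD.MM.YYYY replacement instead of extractAndFormatDateWithLocale. The class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StringBuilderOverload {

    // Requires Java 9+: appendReplacement(StringBuilder, String) and appendTail(StringBuilder)
    static String toEuropeanFormat(String input) {
        Matcher matcher = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})").matcher(input);
        StringBuilder output = new StringBuilder();
        while (matcher.find()) {
            String replacement = matcher.group(3) + "." + matcher.group(2) + "." + matcher.group(1);
            matcher.appendReplacement(output, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.println(toEuropeanFormat("Lorem ipsum 2023-11-07 dolor sit 2021-09-14 amet."));
        // Lorem ipsum 07.11.2023 dolor sit 14.09.2021 amet.
    }
}
```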

This option is a little shorter than using string indexes. However, the while loop and the call to appendTail(..) are still boilerplate code that we would have to write every time.
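If we are stuck on Java 8, we can at least write that boilerplate only once. The sketch below shows a hypothetical helper, replaceAllDynamically (my name, not a JDK method). It hides the find/appendReplacement/appendTail loop behind a Function<MatchResult, String>. Unlike a plain appendReplacement call, it quotes the function's result, so $ and \ always end up literally in the output.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexReplacements {

    private RegexReplacements() {}

    // Hypothetical helper: wraps the find/appendReplacement/appendTail loop (Java 8 compatible)
    static String replaceAllDynamically(Pattern pattern, String input,
                                        Function<MatchResult, String> replacer) {
        Matcher matcher = pattern.matcher(input);
        StringBuffer output = new StringBuffer();
        while (matcher.find()) {
            // toMatchResult() hands the replacer an immutable snapshot of the current match
            String replacement = replacer.apply(matcher.toMatchResult());
            matcher.appendReplacement(output, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(output);
        return output.toString();
    }

    public static void main(String[] args) {
        Pattern isoDate = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
        String output = replaceAllDynamically(isoDate,
                "Lorem ipsum 2023-11-07 dolor sit 2021-09-14 amet.",
                m -> m.group(3) + "." + m.group(2) + "." + m.group(1));
        System.out.println(output); // Lorem ipsum 07.11.2023 dolor sit 14.09.2021 amet.
    }
}
```

With this helper, the date example shrinks to a single call such as replaceAllDynamically(isoDatePattern, input, isoDate -> extractAndFormatDateWithLocale(isoDate, locale)).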

Java 9+: replaceAll with Lambda Expressions

Java 9 added an overload of the well-known Matcher.replaceAll method that accepts a Function<MatchResult, String>, so we can pass a lambda expression.

Locale locale = Locale.US;
String input = "Lorem ipsum 2023-11-07 dolor sit 2021-09-14 amet.";

Pattern isoDatePattern = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
String output = isoDatePattern.matcher(input)
                              .replaceAll(isoDate -> extractAndFormatDateWithLocale(isoDate, locale));
// "Lorem ipsum November 7, 2023 dolor sit September 14, 2021 amet."

Short and concise.

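The behavior is similar to replaceAll(String): the passed Function (our lambda expression) is called once per match with a MatchResult, and its return value replaces the match. Java takes care of iterating and concatenating behind the scenes. One caveat: the returned string is treated like a replacement string, so $ and \ are interpreted. If the result may contain them, pass it through Matcher.quoteReplacement(..).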
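The escaping caveat matters as soon as the replacement comes from data. This sketch fills hypothetical {name} placeholders from a map. One value contains a dollar sign. Without quoteReplacement(..), "$3.50" would be read as a reference to group 3 and fail. Placeholders without a value are kept unchanged. The names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllFunctionDemo {

    // Hypothetical placeholder syntax: {name}
    static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Requires Java 9+ for Matcher.replaceAll(Function)
    static String fill(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(match -> {
            // Unknown placeholders fall back to the matched text itself
            String value = values.getOrDefault(match.group(1), match.group());
            // The returned string is parsed like a replacement string, so escape $ and \
            return Matcher.quoteReplacement(value);
        });
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("item", "Coffee");
        values.put("price", "$3.50");
        System.out.println(fill("{item} costs {price} ({tax})", values));
        // Coffee costs $3.50 ({tax})
    }
}
```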

Wrapping Up

Whichever option you use, first separate the matching of the string into its parts from the actual replacement logic. This makes the code easier to read.
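One way to push that separation a bit further is sketched below; it is not the only way to structure it. Parsing the match into a LocalDate and formatting the LocalDate are two independent methods. Only a one-line replacer connects them to the regex. The method names are hypothetical. Function.andThen(Matcher::quoteReplacement) adds the escaping without touching the business logic.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatedConcerns {

    static final Pattern ISO_DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Step 1: turn the raw match into a domain object
    static LocalDate parseIsoDate(MatchResult match) {
        return LocalDate.of(Integer.parseInt(match.group(1)),
                Integer.parseInt(match.group(2)),
                Integer.parseInt(match.group(3)));
    }

    // Step 2: format the domain object, independent of any regex
    static String formatLong(LocalDate date, Locale locale) {
        return date.format(DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG).withLocale(locale));
    }

    // Java 9+: glue both steps together for Matcher.replaceAll(Function)
    static String localizeDates(String input, Locale locale) {
        Function<MatchResult, String> replacer = match -> formatLong(parseIsoDate(match), locale);
        return ISO_DATE.matcher(input).replaceAll(replacer.andThen(Matcher::quoteReplacement));
    }

    public static void main(String[] args) {
        System.out.println(localizeDates("Lorem ipsum 2023-11-07 dolor sit 2021-09-14 amet.", Locale.US));
        // Lorem ipsum November 7, 2023 dolor sit September 14, 2021 amet.
    }
}
```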

And since Java 9, Matcher.replaceAll(Function) is a very elegant way to call a function for each match and replace the match with the function's result. This lets us focus on the business logic instead of the iteration boilerplate.
