Make a Perl-style regex interpreter behave like a basic or extended regex interpreter

Tags: grep regex java
Question!

I am writing a tool to help students learn regular expressions. I will probably be writing it in Java.

The idea is this: the student types in a regular expression and the tool shows which parts of a text will get matched by the regex. Simple enough.

But I want to support several different regex "flavors" such as:

  • Basic regular expressions (think: grep)
  • Extended regular expressions (think: egrep)
  • A subset of Perl regular expressions, including the character classes \w, \s, etc.
  • Sed-style regular expressions

Java has the java.util.regex package, but it supports only Perl-style regular expressions, which are a superset of the basic and extended REs. What I think I need is a way to take any given regular expression and escape the meta-characters that aren't part of a given flavor. Then I could give it to a Pattern object and it would behave as if it had been written for the selected RE interpreter.

For example, given the following regex:

^\w+[0-9]{5}-(\d{4})?$

As a basic regular expression, it would be interpreted as:

^\\w\+[0-9]\{5\}-\(\\d\{4\}\)\?$

As an extended regular expression, it would be:

^\\w+[0-9]{5}-(\\d{4})?$

And as a Perl-style regex, it would be the same as the original expression.

Is there a "regular expression for regular expressions" that I could run through a regex search-and-replace to quote the non-meta characters? What else could I do? Are there alternative Java classes I could use?



Answers

If your target is a Unix / Linux system, why not just shell out to the definitive host of each regex? That is, use grep for BRE, egrep for ERE, perl for PCRE, etc. The only thing your module would need to do is the UI. Most of the decent regex testers that I have seen use a variant of this approach.
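As a rough illustration of the shell-out idea, here is a minimal Java sketch. It assumes a grep binary on the PATH that accepts the -o, -G and -E options (GNU and BSD grep do; -o and -G are not required by POSIX). The GrepRunner class and its matches method are hypothetical names made up for this example, and the code writes all input before reading any output, so it is only suitable for short sample texts.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the answer): asks the system grep what it matches.
final class GrepRunner {

    // flavorFlag is assumed to be "-G" (basic syntax) or "-E" (extended syntax).
    static List<String> matches(String flavorFlag, String regex, String text) {
        try {
            Process grep = new ProcessBuilder("grep", "-o", flavorFlag, "-e", regex)
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            // Send the sample text to grep's stdin, then close it so grep sees EOF.
            try (OutputStream stdin = grep.getOutputStream()) {
                stdin.write(text.getBytes(StandardCharsets.UTF_8));
            }
            // With -o, grep prints each matched substring on its own line.
            List<String> found = new ArrayList<>();
            try (BufferedReader stdout = new BufferedReader(
                    new InputStreamReader(grep.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = stdout.readLine()) != null) {
                    found.add(line);
                }
            }
            grep.waitFor();
            return found;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for grep", e);
        }
    }
}
```

A call such as GrepRunner.matches("-E", "[0-9]+", text) would return the runs of digits in the text. With "-G" the unescaped + is an ordinary character, so the same pattern would look for a digit followed by a literal plus sign, which is exactly the kind of difference the tool is meant to show.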

If you want yet another library suggestion, look at TRE for the BRE / ERE / POSIX / AWK part. It does not support back references, so PCRE / Python / Ruby / JS / Java is out...

By : dawg


If you want your students to learn regex, why not use a freely available tool such as The Regex Coach (http://www.weitz.de/regex-coach/)? It is pretty good for learning and evaluating regexes.

Also look at this SO thread on a similar issue: http://stackoverflow.com/questions/89718/is-there-anything-like-regexbuddy-in-the-open-source-world

BR,
~A

By : anjanb


I have written something similar: http://stackoverflow.com/questions/172303/is-there-a-regular-expression-to-detect-a-valid-regular-expression#172316

You could take part of that expression and match each token separately:

[^?+*{}()[\]\\]                # literal characters
\\[A-Za-z]                     # Character classes
\\\d+                          # Back references
\\\W                           # Escaped characters
\[\^?(?:\\.|[^\\])+?\]         # Character classes
\((?:\?[:=!>]|\?<[=!])?        # Beginning of a group
\)                             # End of a group
(?:[?+*]|\{\d+(?:,\d*)?\})\??  # Repetition
\|                             # Alternation

For each match, you could have some dictionary of appropriate replacements in the target flavor.
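A minimal Java sketch of this token-and-replace idea follows. It is not the answerer's code: the FlavorTranslator class, the Flavor enum and the toJava method are hypothetical names, the tokenizer is reduced to three of the alternatives above (bracket expression, backslash escape, any other character), and the replacement rules are only those needed to reproduce the two translations given in the question. POSIX classes such as [[:alpha:]] and GNU extensions such as \+ are deliberately not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: rewrites a BRE/ERE-style pattern for java.util.regex.
final class FlavorTranslator {

    enum Flavor { BRE, ERE, PERL }

    // Simplified tokenizer: bracket expression | backslash escape | any single character.
    private static final Pattern TOKEN = Pattern.compile(
            "\\[\\^?(?:\\\\.|[^\\\\])+?\\]"
            + "|\\\\."
            + "|.",
            Pattern.DOTALL);

    static String toJava(String regex, Flavor flavor) {
        if (flavor == Flavor.PERL) {
            return regex; // java.util.regex is already Perl-style
        }
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(regex);
        while (m.find()) {
            String tok = m.group();
            if (tok.length() == 1) {
                // Unescaped ? + { } ( ) | are ordinary characters in a BRE.
                boolean literalInBre = "?+{}()|".indexOf(tok.charAt(0)) >= 0;
                out.append(flavor == Flavor.BRE && literalInBre ? "\\" + tok : tok);
            } else if (tok.charAt(0) == '\\') {
                char c = tok.charAt(1);
                if (Character.isLetter(c)) {
                    // Perl shorthand such as \w or \d: neutralize it, as in the question.
                    out.append("\\\\").append(c);
                } else if (flavor == Flavor.BRE && "(){}".indexOf(c) >= 0) {
                    // \( \) \{ \} are the grouping and interval operators in a BRE.
                    out.append(c);
                } else {
                    out.append(tok); // back references and escaped punctuation pass through
                }
            } else {
                out.append(tok); // bracket expression, copied unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "^\\w+[0-9]{5}-(\\d{4})?$";
        System.out.println(toJava(src, Flavor.BRE)); // ^\\w\+[0-9]\{5\}-\(\\d\{4\}\)\?$
        System.out.println(toJava(src, Flavor.ERE)); // ^\\w+[0-9]{5}-(\\d{4})?$
    }
}
```

The if/else chain plays the role of the "dictionary of appropriate replacements"; a real tool would probably keep one table per flavor and cover more token types (anchors, alternation, back references) than this sketch does.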


