Make a Perl-style regex interpreter behave like a basic or extended regex interpreter

I am writing a tool to help students learn regular expressions. I will probably be writing it in Java.

The idea is this: the student types in a regular expression and the tool shows which parts of a text will get matched by the regex. Simple enough.
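
A minimal Java sketch of that core feature, assuming the student's pattern has already been compiled with java.util.regex. The class MatchHighlighter and its highlight method are hypothetical names. The method wraps each match in square brackets. In a GUI, the same Matcher.start() and Matcher.end() offsets could drive coloring instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchHighlighter {

    /** Returns the text with every match of the pattern wrapped in [ and ]. */
    static String highlight(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0; // end offset of the previous match
        while (m.find()) {
            out.append(text, last, m.start()) // unmatched text before this match
               .append('[')
               .append(m.group())             // the matched text itself
               .append(']');
            last = m.end();
        }
        out.append(text, last, text.length()); // trailing unmatched text
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: f[oo] b[oo] far
        System.out.println(highlight(Pattern.compile("o+"), "foo boo far"));
    }
}
```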

But I want to support several different regex "flavors" such as:

  • Basic regular expressions (think: grep)
  • Extended regular expressions (think: egrep)
  • A subset of Perl regular expressions, including the character classes \w, \s, etc.
  • Sed-style regular expressions

Java has the java.util.regex package (Pattern and Matcher), but it supports only Perl-style regular expressions, which are a superset of the basic and extended REs. What I think I need is a way to take any given regular expression and escape the metacharacters that aren't part of a given flavor. Then I could pass it to Pattern.compile and it would behave as if it were written for the selected RE interpreter.

For example, given the following regex:


As a basic regular expression, it would be interpreted as:


As an extended regular expression, it would be:


And as a Perl-style regex, it would be the same as the original expression.
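
For the BRE case, escaping alone is not quite enough. In a POSIX BRE, \( \) and \{ \} are the grouping and interval operators, while in java.util.regex those escapes are literals. The sketch below swaps those escapes and quotes ( ) { } + ? and |, which are ordinary characters in a POSIX BRE. BreToJava and breToJava are hypothetical names. The sketch makes several assumptions: bracket expressions contain no backslashes, POSIX classes such as [[:digit:]] are not translated, and GNU extensions such as \+ and \| are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BreToJava {

    // Ordinary in a BRE unless escaped; metacharacters in Java unless escaped.
    private static final String SWAPPED = "(){}";
    // Ordinary in a POSIX BRE, so they must be quoted for Java.
    private static final String LITERAL_IN_BRE = "+?|";

    /** Translates a subset of POSIX BRE syntax into a java.util.regex pattern string. */
    static String breToJava(String bre) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bre.length(); i++) {
            char c = bre.charAt(i);
            if (c == '\\' && i + 1 < bre.length()) {
                char next = bre.charAt(++i);
                if (SWAPPED.indexOf(next) >= 0) {
                    out.append(next);              // \( \) \{ \} become ( ) { }
                } else {
                    out.append('\\').append(next); // e.g. \. or \1 keep their meaning
                }
            } else if (SWAPPED.indexOf(c) >= 0 || LITERAL_IN_BRE.indexOf(c) >= 0) {
                out.append('\\').append(c);        // ( ) { } + ? | are literals in a BRE
            } else {
                out.append(c);                     // everything else is copied as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String translated = breToJava("a\\(b\\)\\{2\\}c+"); // the BRE a\(b\)\{2\}c+
        System.out.println(translated);                     // prints: a(b){2}c\+
        Matcher m = Pattern.compile(translated).matcher("xabbc+y");
        System.out.println(m.find() ? m.group() : "no match"); // prints: abbc+
    }
}
```

Only the operators that differ between the two flavors are rewritten. Anchors, ., * and back references such as \1 largely share their meaning and pass through unchanged. Edge cases, such as a leading * being literal in a BRE, are not handled.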

Is there a "regular expression for regular expressions" that I could run through a regex search-and-replace to quote the characters that are not metacharacters in the chosen flavor? What else could I do? Are there alternative Java classes I could use?

Asked by: Luke533 | Posted: 28-01-2022

Answer 1

Alternatively, you could use Jakarta ORO.

This supports the following regex 'flavors':

  • Perl5 compatible regular expressions
  • AWK-like regular expressions
  • glob expressions

Answered by: Ned743 | Posted: 01-03-2022

Answer 2

Check out this post for a "regular expression for regular expressions": Is there a regular expression to detect a valid regular expression?

You can use this as a basis for your module.
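
If the goal is only to check that the student's input is a valid Java-flavor pattern, a simpler route may be to let the regex compiler decide. java.util.regex has no recursion construct, so a single Java regex cannot track arbitrarily nested parentheses. The sketch below catches PatternSyntaxException instead. RegexValidator and syntaxError are hypothetical names. This validates Java syntax only, not BRE or ERE syntax.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexValidator {

    /** Returns an error description if regex is not a valid java.util.regex pattern. */
    static Optional<String> syntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            // getIndex() is the approximate error position, or -1 if unknown
            return Optional.of(e.getDescription() + " near index " + e.getIndex());
        }
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"a(b|c)*", "a(b|c*"}) {
            System.out.println(candidate + " -> " + syntaxError(candidate).orElse("valid"));
        }
    }
}
```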

Answered by: Kellan664 | Posted: 01-03-2022

Answer 3

I have written something similar: Is there a regular expression to detect a valid regular expression?

You could take part of that expression and match each token separately:

[^?+*{}()\[\]\\|]              # literal characters
\\[A-Za-z]                     # Character classes
\\\d+                          # Back references
\\\W                           # Escaped characters
\[\^?(?:\\.|[^\\])+?\]         # Character class
\((?:\?[:=!>]|\?<[=!])?        # Beginning of a group
\)                             # End of a group
(?:[?+*]|\{\d+(?:,\d*)?\})\??  # Repetition
\|                             # Alternation

For each match, you could have some dictionary of appropriate replacements in the target flavor.
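
A minimal Java sketch of that token-plus-dictionary idea. It joins the token patterns above with | and walks the input one token at a time using Matcher.lookingAt(). The class name and the PERL_TO_ERE dictionary are hypothetical. The dictionary is just one illustration: it rewrites Perl shorthands as POSIX bracket expressions for an ERE target. Map.of requires Java 9 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokenTranslator {

    // The token patterns from the answer as Java string literals. Java tries the
    // alternatives left to right and keeps the first one that matches at a position.
    private static final Pattern TOKEN = Pattern.compile(String.join("|",
            "\\\\[A-Za-z]",                        // shorthand classes such as \w
            "\\\\\\d+",                            // back references
            "\\\\\\W",                             // escaped characters such as \.
            "\\[\\^?(?:\\\\.|[^\\\\])+?\\]",       // character classes [...]
            "\\((?:\\?[:=!>]|\\?<[=!])?",          // beginning of a group
            "\\)",                                 // end of a group
            "(?:[?+*]|\\{\\d+(?:,\\d*)?\\})\\??",  // repetition
            "\\|",                                 // alternation
            "[^?+*{}()\\[\\]\\\\|]"));             // literal characters

    // Hypothetical dictionary for an ERE target. Only standalone tokens are
    // replaced; shorthands inside [...] are part of a larger token and stay as-is.
    static final Map<String, String> PERL_TO_ERE = Map.of(
            "\\d", "[[:digit:]]",
            "\\s", "[[:space:]]",
            "\\w", "[[:alnum:]_]");

    static String translate(String regex, Map<String, String> dictionary) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(regex);
        int pos = 0;
        while (pos < regex.length()) {
            m.region(pos, regex.length());
            // lookingAt() needs the token to start exactly at pos; if no token
            // pattern applies (e.g. "\_" or a stray "{"), take a single character
            String token = m.lookingAt() ? m.group() : regex.substring(pos, pos + 1);
            out.append(dictionary.getOrDefault(token, token));
            pos += token.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: [[:alnum:]_]+@[[:digit:]]{2,3}
        System.out.println(translate("\\w+@\\d{2,3}", PERL_TO_ERE));
    }
}
```

Every token pattern consumes at least one character, so the loop always advances. A dictionary for another flavor would simply map different tokens.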

Answered by: Chloe401 | Posted: 01-03-2022

Answer 4

If your target is a Unix/Linux system, why not just shell out to the definitive host of each flavor, i.e., use grep for BRE, egrep for ERE, perl for PCRE, and so on? Then your module would only need to handle the UI. Most of the decent regex testers I have seen use a variant of this approach.
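
A minimal sketch of the shell-out approach from Java. It assumes a grep on the PATH that supports -o, which GNU and BSD grep do but POSIX does not require. It uses grep -E in place of the egrep alias. GrepRunner and grepMatches are hypothetical names, and String.lines() requires Java 11 or later.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class GrepRunner {

    /** Runs grep -o on the text and returns each matched substring, one per element. */
    static List<String> grepMatches(String pattern, String text, boolean extended)
            throws IOException, InterruptedException {
        List<String> command = new ArrayList<>(List.of("grep", "-o"));
        if (extended) {
            command.add("-E");   // read the pattern as an ERE instead of a BRE
        }
        command.add("-e");       // stops a pattern starting with "-" being read as an option
        command.add(pattern);    // passed as one argument, so no shell quoting is involved

        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        // Writing all input before reading output is fine for small texts;
        // large inputs would need a separate thread so the pipes cannot fill up.
        try (OutputStream stdin = process.getOutputStream()) {
            stdin.write(text.getBytes(StandardCharsets.UTF_8));
        }
        String output = new String(process.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        int exit = process.waitFor();
        if (exit > 1) {          // 0 = matches found, 1 = no match, 2 = error
            throw new IOException("grep exited with status " + exit);
        }
        return output.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(grepMatches("ab+", "ab+ abbb", false)); // BRE: + is literal
        System.out.println(grepMatches("ab+", "ab+ abbb", true));  // ERE: + is a quantifier
    }
}
```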

If you want yet another library suggestion, look at TRE for the BRE / ERE / POSIX / AWK part. It does not support back references, so PCRE / Python / Ruby / JS / Java is out...

Answered by: Kate200 | Posted: 01-03-2022

Answer 5

If you want your students to learn regexes, why not use a freely available tool such as The Regex Coach? It is pretty good for learning and evaluating regexes.

Look at this SO thread on a similar issue:


Answered by: Jack624 | Posted: 01-03-2022

