ANTLR Hello World! - Arithmetic Expression Parser


Ever wondered how programming languages understand what you write? This article reveals the answer: language parsing, also called syntax analysis or syntactic analysis. It is the process of analyzing a string of symbols, whether in a natural language, a computer language, or a data structure, according to the rules of a formal grammar. The following diagram depicts the language parsing process:

Language Parser

As the diagram shows, the language parser (part of the compiler) takes the source code as input, validates it against the language grammar, and produces an Abstract Syntax Tree (AST), which represents the source code as a tree structure.
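To make the process more concrete, the following plain-Java sketch does by hand what a parser generator automates for a grammar that accepts exactly one "NUMBER operator NUMBER" expression: a lexer stage turns characters into tokens, and a parser stage checks the token sequence against the rule and builds a small tree. The class and method names (HandWrittenParser, tokenize, parse, OperationNode) are hypothetical and do not come from ANTLR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical hand-written lexer and parser for the single rule:
// NUMBER operator NUMBER. Not ANTLR code; for illustration only.
public class HandWrittenParser {

    static final Set<String> OPERATORS = Set.of("+", "-", "*", "/");

    // The tree produced by the parser: an operator node with two number leaves.
    static final class OperationNode {
        final double left;
        final String operator;
        final double right;

        OperationNode(double left, String operator, double right) {
            this.left = left;
            this.operator = operator;
            this.right = right;
        }

        @Override
        public String toString() {
            return "(" + operator + " " + left + " " + right + ")";
        }
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Stage 1 (lexer): characters -> tokens, skipping spaces and tabs.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = source.length();
        while (i < n) {
            char c = source.charAt(i);
            if (c == ' ' || c == '\t') {
                i++;
            } else if (isDigit(c)) {
                int start = i;
                while (i < n && isDigit(source.charAt(i))) {
                    i++;
                }
                // Optional fractional part: a dot followed by at least one digit
                if (i + 1 < n && source.charAt(i) == '.' && isDigit(source.charAt(i + 1))) {
                    i++;
                    while (i < n && isDigit(source.charAt(i))) {
                        i++;
                    }
                }
                tokens.add(source.substring(start, i));
            } else if (OPERATORS.contains(String.valueOf(c))) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at position " + i);
            }
        }
        return tokens;
    }

    // Stage 2 (parser): validate the tokens against the rule and build the tree.
    static OperationNode parse(List<String> tokens) {
        boolean valid = tokens.size() == 3
                && isDigit(tokens.get(0).charAt(0))
                && OPERATORS.contains(tokens.get(1))
                && isDigit(tokens.get(2).charAt(0));
        if (!valid) {
            throw new IllegalArgumentException("syntax error: expected NUMBER operator NUMBER, got " + tokens);
        }
        return new OperationNode(Double.parseDouble(tokens.get(0)), tokens.get(1), Double.parseDouble(tokens.get(2)));
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("2 + 5");
        System.out.println(tokens);        // [2, +, 5]
        System.out.println(parse(tokens)); // (+ 2.0 5.0)
    }
}
```

Even for this tiny grammar, the hand-written version has to track positions, handle the optional fractional part, and report its own errors; a tool like ANTLR generates equivalent (and far more general) code from a grammar file.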

ANTLR (ANother Tool for Language Recognition) is a tool for defining such a grammar and automatically generating a parser from it. It also provides two high-level design patterns for traversing the resulting tree: Visitor and Listener. (Strictly speaking, ANTLR 4 produces a parse tree rather than an AST, but the two terms are used loosely in this article.) ANTLR is used by several languages and frameworks, including Ballerina, Siddhi, and Presto SQL. This article introduces ANTLR with a hello world application that evaluates basic arithmetic expressions given as strings.

If you are developing a complete calculator, you may want to consider the exp4j library (I have already written an article on how to use exp4j: Android: Simple Calculator in Kotlin). In this article, we will create a calculator that supports only addition, subtraction, multiplication, and division of two numbers. Let's get our hands dirty.

Requirements: Java 8 or later (this article uses Java 11), Apache Maven, and IntelliJ IDEA.

Step 1:
Create a new Maven project in IntelliJ IDEA with the group id com.javahelps.antlr and the artifact id antlr-demo.

Step 2:
Add the antlr4-runtime dependency as shown below:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.javahelps.antlr</groupId>
    <artifactId>antlr-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.antlr</groupId>
            <artifactId>antlr4-runtime</artifactId>
            <version>4.7.2</version>
        </dependency>
    </dependencies>

</project>

Step 3:
Add the maven-compiler-plugin to set the Java version (1.8 or later; I am using Java 11 here) and the antlr4-maven-plugin to generate the parser and other required classes automatically.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.javahelps.antlr</groupId>
    <artifactId>antlr-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.antlr</groupId>
            <artifactId>antlr4-runtime</artifactId>
            <version>4.7.2</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
                <configuration>
                    <source>11</source>
                    <target>11</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.antlr</groupId>
                <artifactId>antlr4-maven-plugin</artifactId>
                <version>4.7.2</version>

                <executions>
                    <execution>
                        <goals>
                            <goal>antlr4</goal>
                        </goals>
                        <configuration>
                            <listener>false</listener>
                            <visitor>true</visitor>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

        </plugins>
    </build>

</project>

By default, the antlr4-maven-plugin looks for grammar files in the src/main/antlr4 folder and generates the required classes when you build the project. In the configuration, we ask ANTLR to generate the visitor classes but not the listener classes.

The Visitor pattern lets you decide which nodes to visit and in what order, which covers most requirements, and its visit methods can return a value. It is especially useful for complex grammars, because you can skip the parts of the tree you do not need. With the Listener pattern, on the other hand, ANTLR's ParseTreeWalker walks the entire tree depth-first and calls the enter and exit methods you implemented in the listener whenever it reaches the corresponding node. Either pattern works for this example, but in my experience the Visitor is more powerful than the Listener, so this article only shows how to use the Visitor pattern.
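The difference between the two patterns can be illustrated without ANTLR. The following plain-Java sketch uses a hypothetical two-node tree (Num and BinOp, not ANTLR classes) and evaluates it twice: once with a visitor whose calls return values, and once with a listener whose void callbacks are driven by a depth-first walker, similar in spirit to ANTLR's ParseTreeWalker. The nested expression goes beyond what this article's grammar accepts; it is only there to show the traversal order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Plain-Java sketch contrasting the two traversal styles. Node, Num, BinOp
// and Listener are hypothetical stand-ins, not ANTLR's generated classes.
public class VisitorVsListener {

    interface Node {}

    static final class Num implements Node {
        final double value;
        Num(double value) { this.value = value; }
    }

    static final class BinOp implements Node {
        final Node left;
        final String operator;
        final Node right;
        BinOp(Node left, String operator, Node right) {
            this.left = left;
            this.operator = operator;
            this.right = right;
        }
    }

    static double apply(String operator, double l, double r) {
        switch (operator) {
            case "+": return l + r;
            case "-": return l - r;
            case "*": return l * r;
            case "/": return l / r;
            default: throw new UnsupportedOperationException(operator);
        }
    }

    // Visitor style: each call returns a value, and the visitor itself
    // decides which children to visit and how to combine their results.
    static double visit(Node node) {
        if (node instanceof Num) {
            return ((Num) node).value;
        }
        BinOp op = (BinOp) node;
        return apply(op.operator, visit(op.left), visit(op.right));
    }

    // Listener style: callbacks return nothing; a walker drives the traversal.
    interface Listener {
        default void enterBinOp(BinOp node) {}
        default void exitBinOp(BinOp node) {}
        default void onNum(Num node) {}
    }

    // Depth-first walk that fires enter/exit events for every node.
    static void walk(Listener listener, Node node) {
        if (node instanceof Num) {
            listener.onNum((Num) node);
            return;
        }
        BinOp op = (BinOp) node;
        listener.enterBinOp(op);
        walk(listener, op.left);
        walk(listener, op.right);
        listener.exitBinOp(op);
    }

    // Evaluating with a listener needs an explicit stack for intermediate results.
    static final class EvalListener implements Listener {
        final Deque<Double> stack = new ArrayDeque<>();

        @Override
        public void onNum(Num node) {
            stack.push(node.value);
        }

        @Override
        public void exitBinOp(BinOp node) {
            double right = stack.pop();
            double left = stack.pop();
            stack.push(apply(node.operator, left, right));
        }
    }

    public static void main(String[] args) {
        // 2 * (3 + 4)
        Node tree = new BinOp(new Num(2), "*", new BinOp(new Num(3), "+", new Num(4)));
        System.out.println(visit(tree)); // 14.0

        EvalListener listener = new EvalListener();
        walk(listener, tree);
        System.out.println(listener.stack.pop()); // 14.0
    }
}
```

The listener needs an explicit stack to pass intermediate results between callbacks, while the visitor simply returns them. That is one reason a visitor is a natural fit for an evaluator like the one built in this article.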

Step 4:
Create the folder hierarchy antlr4/com/javahelps/antlrdemo/calculator inside the src/main folder. The antlr4-maven-plugin uses this folder hierarchy to determine the package name of the generated classes. In the next step, we will create an ANTLR grammar file in it.

Step 5:
Create a file named Calculator.g4 with the following content in the folder created above.
IntelliJ IDEA users may want to install the ANTLR v4 Grammar Plugin for syntax highlighting and rule navigation.
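grammar Calculator;

// EOF forces the parser to consume the whole input, so trailing text
// such as the "+ 3" in "2 + 5 + 3" is reported instead of silently ignored
operation
    : left=NUMBER operator='+' right=NUMBER EOF
    | left=NUMBER operator='-' right=NUMBER EOF
    | left=NUMBER operator='*' right=NUMBER EOF
    | left=NUMBER operator='/' right=NUMBER EOF
    ;

// One or more digits, optionally followed by a fractional part, e.g. 5 or 3.14
NUMBER
   : [0-9]+ ('.' [0-9]+)?
   ;

// Spaces and tabs go to the hidden channel, so the parser never sees them
WS : (' ' | '\t')+ -> channel(HIDDEN);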


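Let's dive into the grammar definition. An ANTLR grammar has two kinds of building blocks: lexer rules (tokens) and parser rules. Lexer rule names start with an uppercase letter (by convention they are written in all caps), and parser rule names start with a lowercase letter. This grammar defines two tokens. WS matches spaces and tabs and sends them to the hidden channel, so the parser ignores them. NUMBER is defined by a regular-expression-like rule that matches non-negative integers and decimals such as 5 or 3.14; negative numbers are not supported. The literals '+', '-', '*' and '/' used in the parser rule become implicit tokens.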
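The NUMBER rule accepts one or more digits, optionally followed by a dot and one or more digits. As a rough illustration (an approximation with java.util.regex, not how ANTLR's generated lexer works internally; NumberTokenDemo is a hypothetical name), the following sketch checks which strings form a single NUMBER token:

```java
import java.util.regex.Pattern;

// Approximates the NUMBER lexer rule with java.util.regex for illustration only;
// ANTLR generates its own lexer and does not use this pattern.
public class NumberTokenDemo {

    // One or more digits, optionally followed by '.' and one or more digits
    static final Pattern NUMBER = Pattern.compile("[0-9]+(\\.[0-9]+)?");

    static boolean isNumberToken(String text) {
        return NUMBER.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String text : new String[] {"5", "3.14", "007", "-2", ".5", "5."}) {
            System.out.println(text + " -> " + isNumberToken(text));
        }
    }
}
```

Running it prints true for 5, 3.14 and 007, and false for -2, .5 and "5.". In the actual lexer, -2 is split into the implicit '-' token followed by NUMBER, which the operation rule then rejects, and the lone dot in "5." triggers a token recognition error.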

The parser rule operation matches an addition, subtraction, multiplication, or division of two numbers. The operands are labeled left and right, and the operator is labeled operator, so that they are easy to access later in the visitor implementation; ANTLR turns these labels into fields of the generated OperationContext class.

Step 6:
Create a new package com.javahelps.antlrdemo.calculator in the src/main/java folder.

Step 7:
Create a new class CalculatorVisitorImpl inside that package with the following code:
package com.javahelps.antlrdemo.calculator;

// Evaluates the parse tree; the type parameter makes every visit method return a Double
public class CalculatorVisitorImpl extends CalculatorBaseVisitor<Double> {

    @Override
    public Double visitOperation(CalculatorParser.OperationContext ctx) {
        // operator stays null when the parser could not match any alternative (e.g. "5 # 3")
        if (ctx.operator == null) {
            throw new UnsupportedOperationException("An operator of +, -, /, * is required to perform the operation");
        }
        // left, operator and right are the labels defined in Calculator.g4
        String operator = ctx.operator.getText();
        double left = Double.parseDouble(ctx.left.getText());
        double right = Double.parseDouble(ctx.right.getText());

        switch (operator) {
            case "+":
                return left + right;
            case "-":
                return left - right;
            case "/":
                return left / right;
            case "*":
                return left * right;
            default:
                throw new UnsupportedOperationException("Calculator does not support " + operator);
        }
    }
}

This class extends CalculatorBaseVisitor, which ANTLR generates in the same package (under the target/generated-sources/antlr4 folder). If IntelliJ reports that CalculatorBaseVisitor cannot be found, open the IntelliJ terminal (or an external terminal in the project folder) and run the following command to build the project and generate the parser and visitor classes:
mvn clean install

If the error persists, right-click the project and select Maven > Reimport to add the newly generated sources to the project. The visitOperation method is called when the visitor reaches the operation parser rule defined in the grammar. It reads the operator, converts both operands to double, and returns the result of the corresponding arithmetic operation.
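A switch statement is the simplest way to dispatch on the operator. As an alternative design, the sketch below keeps the operators in a lookup table of java.util.function.DoubleBinaryOperator lambdas, so supporting a new operator means adding one map entry. OperatorTable and apply are hypothetical names, the code is plain Java independent of ANTLR, and Map.of requires Java 9 or later. It also makes one property of the visitor visible: because the operands are doubles, division by zero does not throw but yields Infinity.

```java
import java.util.Map;
import java.util.function.DoubleBinaryOperator;

// Hypothetical table-driven alternative to the switch in visitOperation.
public class OperatorTable {

    // Each supported operator symbol maps to the arithmetic it performs
    private static final Map<String, DoubleBinaryOperator> OPERATORS = Map.of(
            "+", (l, r) -> l + r,
            "-", (l, r) -> l - r,
            "*", (l, r) -> l * r,
            "/", (l, r) -> l / r);

    static double apply(String operator, double left, double right) {
        DoubleBinaryOperator op = OPERATORS.get(operator);
        if (op == null) {
            throw new UnsupportedOperationException("Calculator does not support " + operator);
        }
        return op.applyAsDouble(left, right);
    }

    public static void main(String[] args) {
        System.out.println(apply("+", 2, 5)); // 7.0
        System.out.println(apply("/", 5, 3)); // 1.6666666666666667
        System.out.println(apply("/", 1, 0)); // Infinity
    }
}
```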

Step 8:
Create another class Calculator in the same package with the following code:
package com.javahelps.antlrdemo.calculator;

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CodePointCharStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

public class Calculator {

    public static void main(String[] args) {

        Calculator calculator = new Calculator();
        System.out.println(calculator.calculate("2 + 5"));  // 7.0
        System.out.println(calculator.calculate("2 * 5"));  // 10.0
        System.out.println(calculator.calculate("5 - 3"));  // 2.0
        System.out.println(calculator.calculate("5 / 3"));  // 1.6666666666666667

        // Invalid input: ANTLR reports the syntax errors on stderr, starting with
        // "line 1:2 token recognition error at: '#'", and the visitor then
        // rejects the incomplete parse tree because no operator was matched
        try {
            System.out.println(calculator.calculate("5 # 3"));
        } catch (UnsupportedOperationException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }

    // Wraps the source string in a CharStream and evaluates it
    private Double calculate(String source) {
        CodePointCharStream input = CharStreams.fromString(source);
        return compile(input);
    }

    // Characters -> tokens -> parse tree -> result
    private Double compile(CharStream source) {
        CalculatorLexer lexer = new CalculatorLexer(source);
        CommonTokenStream tokenStream = new CommonTokenStream(lexer);
        CalculatorParser parser = new CalculatorParser(tokenStream);
        ParseTree tree = parser.operation();
        CalculatorVisitorImpl visitor = new CalculatorVisitorImpl();
        return visitor.visit(tree);
    }
}

The compile method takes a CharStream, creates a lexer from that stream, wraps the lexer in a token stream, and parses the tokens into a tree (the AST), starting from the operation rule. It then visits that tree using the CalculatorVisitorImpl we implemented in Step 7. The calculate method converts a String input into a CharStream object.

Valid input prints the expected result. Input that does not conform to the grammar, on the other hand, causes a syntax error, which ANTLR reports on the standard error stream. Programming languages present such errors as compile-time errors. Of course, real compilers like javac check much more, such as types, but syntax validation is always required, and ANTLR takes care of it in our project.

If you are interested in developing a complete Calculator using ANTLR, check this example grammar provided in the official ANTLR GitHub repository: Calculator Grammar.

You can clone this project from the GitHub repository:




If you have any questions, feel free to comment below. I will try my best to answer your questions.