Mathematical Expression Parsers in Java and C++

Basic Expression Parsing

Click here for advanced expression parsing

When writing your own calculator it is necessary to build a converter that can transform an input mathematical expression such as ( 1 + 8 ) – ( ( 3 * 4 ) / 2 ), into a format that is more suited for evaluation by computers.

When evaluating expressions such as the one above (known as “infix notation“), that which appears simple and intuitive to us humans, is usually not so straightforward to implement in a programming language.

The shunting-yard algorithm is a method for parsing mathematical expressions written in infix notation to Reverse Polish Notation (RPN). The RPN notation is different to infix notation in that every operator (+, -, * etc) comes after the operands (numbers) and there are no parentheses (brackets).

So ( 3 * 4 ) for example becomes 3 4 *.

When given an input string in Reverse Polish Notation, it is then possible to employ a simple algorithm based around the use of a stack in order to evaluate the RPN expression and determine the mathematical result.

The Shunting-yard algorithm

This pseudocode shows how the shunting-yard algorithm converts an expression given in conventional infix notation into Reverse Polish Notation:

For each token
{
    If (token is a number)
    {
        Add number to the output queue
    }
	
    If (token is an operator eg +,-,*...) 
    {
        While (stack not empty AND 
               stack top element is an operator)
        {
            If ((token = left associative AND 
                 precedence 

Implementing the the shunting-yard algorithm in Java

The Java function to implement the shunting yard algorithm described above is as follows:

    // Convert infix expression format into reverse Polish notation
    public static String[] infixToRPN(String[] inputTokens) 
    {
        ArrayList<String> out = new ArrayList<String>();
        Stack<String> stack = new Stack<String>();
        
        // For each tokens
        for (String token : inputTokens) 
        {
            // If token is an operator
            if (isOperator(token)) 
            {  
                // While stack not empty AND stack top element 
                // is an operator
                while (!stack.empty() && isOperator(stack.peek())) 
                {                    
                    if ((isAssociative(token, LEFT_ASSOC)         && 
                         cmpPrecedence(token, stack.peek()) 

Evaluating expressions in Reverse Polish Notation

Once we have obtained our input expression string in RPN format, we evaluate the expression using a simple algorithm, also based around the use of a stack. Pseudocode as follows:

For each token
{
    If (token is a number)
    {
        Push value onto stack
    }

    If (token is an operator)
    {
        Pop 2 top values from the stack
        Evaluate operator using popped values as args
        Push result onto stack
    }
}

The single value remaining on the stack is the evaluation. Java code for this is as follows:

 public static double RPNtoDouble(String[] tokens)
 {        
    Stack<String> stack = new Stack<String>();
        
    // For each token
    for (String token : tokens) 
    {
        // If the token is a number push it onto the stack
        if (!isOperator(token)) 
        {
            stack.push(token);                
        }
        else
        {
            // Pop the two top entries
            Double d2 = Double.valueOf( stack.pop() );
            Double d1 = Double.valueOf( stack.pop() );
                
            //Get the result
            Double result = token.compareTo("+") == 0 ? d1 + d2 : 
                            token.compareTo("-") == 0 ? d1 - d2 :
                            token.compareTo("*") == 0 ? d1 * d2 :
                                                        d1 / d2;               
                                
            // Push result onto stack
            stack.push( String.valueOf( result ));                                                
        }                        
    }        
        
    return Double.valueOf(stack.pop());
}

Example

Consider the expression

( 1 + 2 ) * ( 3 / 4 ) - ( 5 + 6 ).

Passing this as a string argument to the infixToRPN function will yield this RPN format:

1 2 + 3 4 / * 5 6 + -
.

It is then simply a matter of evaluating the RPN format to give the desired mathematical result of -8.75.

Putting it all together: Java Implementation of Basic Expression Parser

Full Java code listing as follows:

package expressionparser;

import java.util.*;
 
public class ExpressionParser 
{
    // Associativity constants for operators
    private static final int LEFT_ASSOC  = 0;
    private static final int RIGHT_ASSOC = 1;
 
    // Operators
    private static final Map<String, int[]> OPERATORS = new HashMap<String, int[]>();
    static 
    {
        // Map
        OPERATORS.put("+", new int[] { 0, LEFT_ASSOC });
        OPERATORS.put("-", new int[] { 0, LEFT_ASSOC });
        OPERATORS.put("*", new int[] { 5, LEFT_ASSOC });
        OPERATORS.put("/", new int[] { 5, LEFT_ASSOC });        
    }
 
    // Test if token is an operator
    private static boolean isOperator(String token) 
    {
        return OPERATORS.containsKey(token);
    }
 
    // Test associativity of operator token
    private static boolean isAssociative(String token, int type) 
    {
        if (!isOperator(token)) 
        {
            throw new IllegalArgumentException("Invalid token: " + token);
        }
        
        if (OPERATORS.get(token)[1] == type) {
            return true;
        }
        return false;
    }
 
    // Compare precedence of operators.    
    private static final int cmpPrecedence(String token1, String token2) 
    {
        if (!isOperator(token1) || !isOperator(token2)) 
        {
            throw new IllegalArgumentException("Invalid tokens: " + token1
                    + " " + token2);
        }
        return OPERATORS.get(token1)[0] - OPERATORS.get(token2)[0];
    }
 
    // Convert infix expression format into reverse Polish notation
    public static String[] infixToRPN(String[] inputTokens) 
    {
        ArrayList<String> out = new ArrayList<String>();
        Stack<String> stack = new Stack<String>();
        
        // For each token
        for (String token : inputTokens) 
        {
            // If token is an operator
            if (isOperator(token)) 
            {  
                // While stack not empty AND stack top element 
                // is an operator
                while (!stack.empty() && isOperator(stack.peek())) 
                {                    
                    if ((isAssociative(token, LEFT_ASSOC)         && 
                         cmpPrecedence(token, stack.peek()) 

C++ Implementation of Basic Expression Parser

#include <iostream>
#include <sstream>
#include <list>
#include <stack>
#include <map>
#include <string>
#include <vector>
#include <iterator>
#include <stdlib.h>
      
const int LEFT_ASSOC  = 0;    
const int RIGHT_ASSOC = 1; 
      
// Map the different operators: +, -, *, / etc  
typedef std::map< std::string, std::pair< int,int > > OpMap;   
typedef std::vector<std::string>::const_iterator cv_iter;
typedef std::string::iterator s_iter;

const OpMap::value_type assocs[] =   
    {  OpMap::value_type( "+", std::make_pair<int,int>( 0, LEFT_ASSOC ) ),    
       OpMap::value_type( "-", std::make_pair<int,int>( 0, LEFT_ASSOC ) ),      
       OpMap::value_type( "*", std::make_pair<int,int>( 5, LEFT_ASSOC ) ),      
       OpMap::value_type( "/", std::make_pair<int,int>( 5, LEFT_ASSOC ) ) };  

const OpMap opmap( assocs, assocs + sizeof( assocs ) / sizeof( assocs[ 0 ] ) );  

// Test if token is an pathensesis  
bool isParenthesis( const std::string& token)        
{        
    return token == "(" || token == ")";      
}      

// Test if token is an operator        
bool isOperator( const std::string& token)        
{        
    return token == "+" || token == "-" ||      
           token == "*" || token == "/";      
}      

// Test associativity of operator token        
bool isAssociative( const std::string& token, const int& type)        
{            
    const std::pair<int,int> p = opmap.find( token )->second;        
    return p.second == type;        
}        

// Compare precedence of operators.        
int cmpPrecedence( const std::string& token1, const std::string& token2 )        
{        
    const std::pair<int,int> p1 = opmap.find( token1 )->second;      
    const std::pair<int,int> p2 = opmap.find( token2 )->second;      

    return p1.first - p2.first;        
}        

// Convert infix expression format into reverse Polish notation        
bool infixToRPN( const std::vector<std::string>& inputTokens,   
                 const int& size,   
                 std::vector<std::string>& strArray )        
{     
    bool success = true;    

    std::list<std::string> out;      
    std::stack<std::string> stack;      

    // While there are tokens to be read    
    for ( int i = 0; i 

So when running the above code as a simple Windows Console application using an example input expression of ( 1 + 2) * ( 3 / 4 )-(5+6) we get the following RPN tokens and mathematical result:

ExpressionParserCpp1

If an inappropriate input expression is used, one with a mis-match in the number of left/right parentheses such as ( 1 + 2 * ( 3 / 4 )-(5+6), then we get an error:

ExpressionParserCpp2

And if we insert an additional minus sign '-' so that the original expression used is now -( 1 + 2) * ( 3 / 4 )-(5+6), the algorithm recognizes that this should get treated as a unary minus operator, resulting in the following output:

ExpressionParserCpp3

UPDATE: 13 April 2013
Extending the algorithm to handle further mathematical expressions

So far our implementation has been applied to handle basic expressions based on standard mathematical operators +, -, *, / etc. The following downloadable C++ code makes a number of further improvements to handle additional mathematical operators sin, cos, tan, log, exp etc, as well as much more complicated subexpressions.

Additional sanity checks are included to make sure there are no mismatched numbers of parentheses, as well as some additional work on the tokenization of RPM strings to handle unary minus operators occuring at positions where an expression is expected eg -8 + 5 = -3 or 11 ^ -7 = 5.13158e-08.

All examples in this code have been verified using Google's online calculator. To use Google's built-in calculator simply enter your mathematical expression into the search box. For example:

GoogleCalculator

Some examples are shown in the following table:

Expression (infix) RPN (postfix) Result
exp( 1.11 ) 1.11 exp 3.034358
sin( cos( 90 * pi / 180 ) ) 90 pi * 180 / cos sin 0.000001
34.5*(23+1.5)/2 34.5 23 1.5 + * 2 / 422.625000
5 + ((1 + 2) * 4) - 3 5 1 2 + 4 * + 3 - 14
( 1 + 2 ) * ( 3 / 4 ) ^ ( 5 + 6 ) 1 2 + 3 4 / 5 6 + ^ * 0.126705
3/2 + 4*(12+3) 3 2 / 4 12 3 + * + 61.5
PI*pow(9/2,2) PI 9 2 / 2 pow * 63.617197
((2*(6-1))/2)*4 2 6 1 - * 2 / 4 * 20
ln(2)+3^5 2 ln 3 5 ^ + 243.693147
11 ^ -7 11 -7 ^ 5.13158e-08
cos ( ( 1.3 + 1 ) ^ ( 1 / 3 ) ) - log ( -2 * 3 / -14 ) 1.3 1 + 1 3 / ^ cos -2 3 * -14 / log - 0.616143
1 * -sin( Pi / 2) 1 Pi 2 / -sin * -1
-8 + 5 -8 5 + -3
1 - (-2^2) - 1 1 1 -2 2 ^ * - 1 - 4

Download the C++ code for handling further mathematical expressions:

The download contains the Visual Studio 2010 code:


The file main.cpp is an example usage of applying the Tokenize method of the ExpressionParser class to first prime the input expression string into a suitable format, which is subsequently converted to Reverse Polish Notation using the InfixToRPN method.

If successful, the reverse Polish Notation is then evaluated to give the mathematical result.

Example usage shown here:

#include "ExpressionParser.h"

// Print iterators in a generic way  
template<typename T, typename InputIterator>  
void Print( const std::string& message,  
            const InputIterator& itbegin,     
            const InputIterator& itend,     
            const std::string& delimiter)    
{    
    std::cout 

Example outputs:

parser1

parser2

Download below. Please contact me if you need any help in using the software.


Latest Comments

  1. vahid 24 January 2012
    • Andy 24 January 2012
      • Yasmin Fathy 10 November 2013
        • happyuk 10 November 2013
          • Yasmin Fathy 10 November 2013
          • happyuk 10 November 2013
        • happyuk 10 November 2013
  2. rakesh kumar 23 February 2012
    • Andy 23 February 2012
  3. rkrenn 9 March 2012
    • Andy 10 March 2012
  4. Kapil Yadav 22 July 2012
    • Andy 22 July 2012
  5. Andy 30 August 2012
    • Andy 30 August 2012
  6. Shaine padilla 12 September 2012
    • Andy 12 September 2012
  7. vinny 10 October 2012
    • Andy 5 November 2012
    • happyuk 14 April 2013
  8. Solo 5 November 2012
    • Andy 5 November 2012
  9. entropyMD 6 November 2012
    • happyuk 14 April 2013
  10. Andy 6 November 2012
  11. Dag 13 November 2012
    • happyuk 14 April 2013
  12. shlomi 31 December 2012
    • happyuk 14 April 2013
  13. Mikail 21 October 2013
    • happyuk 23 October 2013
  14. Pingback: Visual Studio causing problems in c++ code 2 September 2014
  15. STVG 9 March 2016
    • Andy 11 March 2016
  16. Théophile 12 June 2016

Leave a Reply