Skip to main content

Writing An Parser

· 2 min read
Abhinn Pandey

In the realm of code representation, our primary goal is to devise a format that's easily generated by the parser and effortlessly consumed by the interpreter. When deciphering an arithmetic expression like "1 + 2 * 3 - 4," your brain employs the order of operations (PEMDAS - Parentheses, Exponents, Multiplication and Division, Addition and Subtraction) to evaluate it.

Mentally, you organize this expression in a tree structure. In this tree representation, the leaf nodes contain numbers, while interior nodes represent operators with branches corresponding to their operands.

To evaluate an arithmetic expression from this tree, you start from the leaves, computing the values of their subtrees and moving upwards—performing a post-order traversal. The process involves:

Evaluating the tree from the bottom up.

A. Initiating with the entire tree, evaluate the bottom-most operation, 2 * 3.

B. Then, compute the + operation.

C. Proceed with the - operation.

D. Finally, arrive at the ultimate answer.

AST_TREE

Given an arithmetic expression, drawing this type of tree structure is straightforward. Similarly, when provided with such a tree, evaluating it becomes effortless. Hence, it seems intuitive that a workable representation of our code could be a tree mirroring the grammatical structure—specifically, the nesting of operators in the language.

However, in order to define this grammar precisely, we delve into the realm of formal grammars. Just as there are lexical grammars for individual characters and lexemes, we now shift our focus to syntactic grammars, operating at a higher level of granularity. Here, each "letter" in the alphabet corresponds to an entire token, and a "string" signifies a sequence of tokens—essentially, an entire expression.

Let's clarify this transition using a table:

TerminologyLexical GrammarSyntactic Grammar
Alphabet isCharactersTokens
A string isToken or lexemeExpression
Implemented byScannerParser

This shift in focus from lexical to syntactic grammar helps us define the rules and structure of the expressions within the language more precisely, laying the groundwork for the interpreter.