Open In App

How to Parse S-expressions in JavaScript?

Last Updated : 30 Jul, 2024
Comments
Improve
Suggest changes
Like Article
Like
Report

The most simple and powerful notation used in Lisp programming is S-expressions and they are composed of atoms, say numbers or symbols or strings and lists (which too are S-expressions) parsing S-expressions in JavaScript can be done using a few basic techniques.

What are S-expressions?

To represent nested list data structures, we use s-expressions, it has extensive applications in Lisp as well as other functional programming languages, an s-expression is either an atom or a list of s-expressions.

Examples:

Atom: 42, hello, +
List: (1 2 3), (define (square x) (* x x))

Approach

To implement an S-expression parser several things have to be done:

  • Breaks the input string into tokens.
  • Builds a tree out of these tokens representing the s-expression.

Steps to Parse S-expressions in JavaScript

Step 1: Tokenizing the Input

In the first step, you have to break the input string into tokens.

function tokenize(input) {
return input
.replace(/\(/g, ' ( ')
.replace(/\)/g, ' ) ')
.trim()
.split(/\s+/);
}

Step 2: Parsing the Tokens

A function is going to be written by us, that will take a list of tokens and parse them into a tree structure.

function parse(tokens) {
if (tokens.length === 0) {
throw new SyntaxError("Unexpected EOF");
}

let token = tokens.shift();

if (token === '(') {
let list = [];
while (tokens[0] !== ')') {
list.push(parse(tokens));
}
tokens.shift(); // remove ')'
return list;
} else if (token === ')') {
throw new SyntaxError("Unexpected )");
} else {
return atom(token);
}
}

function atom(token) {
if (!isNaN(token)) {
return Number(token);
} else {
return token;
}
}

Step 3: Using the Parser

Having established our tokenizer and parser functions and we are now in a position to employ them in parsing an S-expression.

function parseSExpression(input) {
const tokens = tokenize(input);
return parse(tokens);
}

// Test the parser
const input = "(define (square x) (* x x))";
const parsed = parseSExpression(input);
console.log(JSON.stringify(parsed, null, 2));

Example 1: In the given below example we have a square function that takes one argument x in order to return its value squared, a simple function definition in Lisp like fashion is indicated by S-expression here.

JavaScript
function tokenize(input) {
    return input
        .replace(/\(/g, ' ( ')
        .replace(/\)/g, ' ) ')
        .trim()
        .split(/\s+/);
}

function parse(tokens) {
    if (tokens.length === 0) {
        throw new SyntaxError("Unexpected EOF");
    }

    let token = tokens.shift();

    if (token === '(') {
        let list = [];
        while (tokens[0] !== ')') {
            list.push(parse(tokens));
        }
        tokens.shift(); // remove ')'
        return list;
    } else if (token === ')') {
        throw new SyntaxError("Unexpected )");
    } else {
        return atom(token);
    }
}

function atom(token) {
    if (!isNaN(token)) {
        return Number(token);
    } else {
        return token;
    }
}

function parseSExpression(input) {
    const tokens = tokenize(input);
    return parse(tokens);
}

// Example 1
const input1 = "(define (square x) (* x x))";
const parsed1 = parseSExpression(input1);
console.log("Example 1 - Input S-expression:", input1);
console.log("Example 1 - Parsed S-expression:", JSON.stringify(parsed1, null, 2));

Output
Example 1 - Input S-expression: (define (square x) (* x x))
Example 1 - Parsed S-expression: [
  "define",
  [
    "square",
    "x"
  ],
  [
    "*",
    "x",
    "x"
  ]
]

Example 2: For instance, we are dealing with the parsing of a conditional expression, basically, S-expression represents if statement that checks if x > 0, when true it adds 1 to x else it subtracts 1 from x.

JavaScript
function tokenize(input) {
    return input
        .replace(/\(/g, ' ( ')
        .replace(/\)/g, ' ) ')
        .trim()
        .split(/\s+/);
}

function parse(tokens) {
    if (tokens.length === 0) {
        throw new SyntaxError("Unexpected EOF");
    }

    let token = tokens.shift();

    if (token === '(') {
        let list = [];
        while (tokens[0] !== ')') {
            list.push(parse(tokens));
        }
        tokens.shift(); // remove ')'
        return list;
    } else if (token === ')') {
        throw new SyntaxError("Unexpected )");
    } else {
        return atom(token);
    }
}

function atom(token) {
    if (!isNaN(token)) {
        return Number(token);
    } else {
        return token;
    }
}

function parseSExpression(input) {
    const tokens = tokenize(input);
    return parse(tokens);
}
// Example 2
const input3 = "(if (> x 0) (+ x 1) (- x 1))";
const parsed3 = parseSExpression(input3);
console.log("Example 2 - Input S-expression:", input3);
console.log("Example 2 - Parsed S-expression:", JSON.stringify(parsed3, null, 2));

Output
Example 2 - Input S-expression: (if (> x 0) (+ x 1) (- x 1))
Example 2 - Parsed S-expression: [
  "if",
  [
    ">",
    "x",
    0
  ],
  [
    "+",
    "x",
    1
  ],
  [
    "-",
    "x",
    1
  ...

Conclusion

In conclusion if you want to parse S-expressions in JavaScript you have to tokenize the input string then recursively parse tokens into a tree structure, this resultant structure can be used for other processing like interpreting or evaluating the S-expression and this parser handles basic S-expressions but can be modified to accommodate more complex syntaxes and features as required.


Next Article

Similar Reads