Instaspec

Specify, parse, and transform data in easy-mode.

Instaspec is a concise and easy rule syntax for parsing and transforming data that separates the shape of data from predicates.

Instaspec syntax imitates the comfortable interface of EBNF (à la Instaparse) to describe data. Grammars are translated to Specs. A construction helper that uses the same syntax handles transformations.

Q: What’s the difference between a cat and a comma?

A: One has claws at the end of its paws, and the other is a pause at the end of a clause.

Table of Contents

Usage
Grammars
API
Rationale

Usage

Instaspec is a Clojure(Script) library that requires either Spec or Malli as a provided dependency.

Adding dependencies

Choose a prerequisite dependency:

metosin/malli {:mvn/version "0.8.9"}

Add Instaspec as a dependency:

app.hummi/instaspec {:mvn/version "0.0.1"}

Parsing

Require instaspec in your namespace:

(ns example.core
  (:require [instaspec.malli :as is]))

Define a grammar:

(def hiccup-grammar
  '{element (or literal tree)
    tree [tag attrs? element*]
    literal (or <nil> <boolean> <number> <string>)
    tag <keyword>
    attrs <map>})

Create a parser from the grammar:

(def hiccup-parser
  (is/parser hiccup-grammar))

Given some data to parse:

(def svg-data
  [:svg {:viewBox [0 0, 10 10]}
   "hello world!"
   [:g
    [:circle {:cx 1, :cy 2}]
    [:rect {:width 2, :height 3}]]])

Grammars are symmetrical: the same syntax that parses data can take the resulting AST and transform it:

(is/rewrite '{tree     [tag element*]
              literal  nil}
            (hiccup-parser svg-data))
;=>
[:svg :g :circle :rect]

In the above example we extract only tags, and remove literals.

TODO: ~ should invoke functions.

TODO: Aggregation can be done by closing over rewrite functions, but it might be nice to have a more friendly version of that. Perhaps everything should be an aggregation, and provide a default for replacement.

TODO: maybe provide several flavors of transformers?

TODO: Instaparse allows users to remove parts of the AST with <> annotation. This might be useful?

How does it work?

Parsing the data binds names from your grammar:

(hiccup-parser svg-data)
;=>
[tree {tag :svg,
       attrs? {:viewBox [0 0 10 10]},
       element* [[literal "hello world!"]
                 [tree {tag :g,
                        attrs? nil,
                        element* [[tree {tag :circle,
                                         attrs? {:cx 1, :cy 2},
                                         element* []}]
                                  [tree {tag :rect,
                                         attrs? {:width 2, :height 3},
                                         element* []}]]}]]}]

This is a convenient AST that can be reassembled with is/rewrite. Normally you shouldn’t need to look at the output of a parser created with is/parser.

Handling errors

When parsing fails, an ex-info is thrown. If you prefer some other behavior, rebind the is/*parse-fail* function:

(binding [is/*parse-fail* (fn my-fail [schema data]
                            (println "Oh no, this is bad...")
                            :fail)]
  (hiccup-parser svg-data))
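When rebinding is more than you need, the thrown ex-info can also be caught at the call site. The following is a minimal sketch, assuming the JVM (in ClojureScript the catch clause would name a different class) and assuming that a bare map is rejected by the grammar. The helper name parse-or-report is made up for illustration, and the contents of ex-data are not documented here, so the sketch only passes them through:

```clojure
(ns example.errors
  (:require [instaspec.malli :as is]))

(def hiccup-parser
  (is/parser '{element (or literal tree)
               tree [tag attrs? element*]
               literal (or <nil> <boolean> <number> <string>)
               tag <keyword>
               attrs <map>}))

;; Hypothetical helper: returns the AST on success, or a map
;; describing the failure instead of letting the ex-info escape.
(defn parse-or-report
  [data]
  (try
    (hiccup-parser data)
    (catch clojure.lang.ExceptionInfo e
      {:error   (ex-message e)
       :details (ex-data e)})))

;; A map is neither a literal nor a tree, so this should take
;; the catch branch.
(parse-or-report {:not "hiccup"})
```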

Validating

Use is/validator in the same way you would use is/parser. A validator only returns true or false.
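A short sketch of what that looks like, assuming is/validator accepts a grammar exactly as is/parser does (the name hiccup-valid? is chosen for this example only):

```clojure
(ns example.validate
  (:require [instaspec.malli :as is]))

(def hiccup-valid?
  (is/validator '{element (or literal tree)
                  tree [tag attrs? element*]
                  literal (or <nil> <boolean> <number> <string>)
                  tag <keyword>
                  attrs <map>}))

;; Well-formed hiccup: expected to be accepted.
(hiccup-valid? [:g [:circle {:cx 1, :cy 2}]])

;; A bare map is neither a literal nor a tree: expected to be rejected.
(hiccup-valid? {:not "hiccup"})
```

Unlike a parser, a validator does not throw on bad input, so it suits guard clauses and assertions.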

Generating

See is/generate.

Conveniences

Define a function:

;; TBD
(is/defn [props? not-props] ...)

Grammars

Grammars are data. A grammar consists of name/rule pairs (similar to bindings). Rules may contain predicates and boolean logic. Predicates are wrapped in <>.

Names occurring in sequences treat the ?, +, and * suffixes as regex operators. Names that do not resolve are treated as bindings.

In the example:

'{hiccup (or tree literal)
  tree [tag attrs? hiccup*]
  literal (or <nil> <boolean> <number> <string>)
  tag <keyword>
  attrs <map>}

<nil> is a predicate (resolves to the nil? function)
attrs? is the name for an optional value in a sequence
hiccup* will create a sequence of 0 or more matches
or is a special operator

Under consideration:

<nil>|<boolean> may get mapped to (or <nil> <boolean>)

API

is/rewrite is the primary interface to transform with.

is/parser creates a parser only. Parsers return a hiccup abstract syntax tree (AST).

Rationale

Sequence specifications are clearest when kept separate from predicates. EBNF provides clearer sequence expressions than s-expression RegEx. EBNF decomposes grammar and names those decompositions, which is useful for both parsing and processing.

Spec (Hickey) and similar libraries implement s-expression based RegEx interfaces to specify and parse data. These libraries are powerful.

Instaparse (Engelberg) is easy to use. But it parses text, not data. Much of the convenience is due to a superior interface: users specify grammars in EBNF rather than s-expressions.

Instaspec provides the convenience of Instaparse for data parsing by translating EBNF style grammar into popular data specification libraries.

Meander (Holdbrooks) shows that substitution expressions are an expressive way to construct outputs from parsed inputs. Other libraries tend to leave it up to the user to figure out ways to process what was parsed.

Instaspec provides a convenient abstraction for traversing and processing an AST based upon the names used to construct the EBNF grammar.

Problem

The s-expression interface to existing data parsing libraries conflates sequence parsing with predicates and named value capture. Expressions are deeply nested annotations that correctly define the objective, but are inscrutable. The user interface has been a barrier to adoption of these powerful libraries.
Beyond specifying and parsing, the user still has the job of transforming the data. Expressing this processing often leads to repetition as it requires a custom tree traversal implementation of the same structure already specified.

Solution

Decomplect sequences from predicates and named value capture. Instaspec is a mapping from EBNF-style grammar to Spec library s-expressions. EBNF consists of rules. The first rule defines a valid value in terms of other rules. A rule can only be a sequence, disjunction, or predicate. This restriction prevents complecting.
Provide implicit destructuring for node handling and recursion.

Goals

Ease of use

Sequences look like the sequence: tree [tag attr? child*]
Sequences only contain names and sequence features (?, +, *)
Declare predicates separately: tag keyword?
Create bindings for the names from the grammar

Augment existing libraries

Don’t try to reinvent or replace them
Don’t limit extensibility and composability
Do try to make existing libraries easier to use with expressive sugar.

Parsing text: Instaparse

EBNF is clear, concise, and precise
Instaparse just works! I can’t imagine how else I’d be able to make a parser
Predicates are literals or string regex rules
Supports different styles of parsing
I wish data parsing libraries were more like Instaparse

But not suitable for data:

Input must be text.
The resulting AST needs to be processed, often according to the same rules you already defined for parsing. So now you are back where you started: parsing again but now with data input instead of text input. This can be partially alleviated by using node transformations for simple nodes like numbers.

Parsing data: Specification libraries

There are many data parsing libraries to choose from:

Library	Primary Author
clojure/spec.alpha	richhickey
metosin/malli	ikitommi
prismatic/schema	w01fe
clojure/core.match	swannodette
noprompt/meander.epsilon	noprompt
cgrand/seqexp	cgrand
jclaggett/seqex	jclaggett

These libraries are high quality and powerful
Malli and Schema implement data driven specs
Meander has substitution, which is helpful for output transformations
Consuming nested bindings can be a challenge in all libraries
Seqex allows you to explicitly consume the bindings any way you’d like to
Seqex is very composable and extensible (it’s all functions)
Explaining why a match fails is an excellent feature that most libraries provide

Motivation: Make sequences clearer

Let’s consider specification of function arguments.

Basic case: Reagent components often benefit from accepting optional attributes to apply to their node (in a case this simple, a second arity could provide the optionality instead):

(defn hexagon
  "attrs must be a map, radius must be a number"
  [attrs? radius]
  [:g (merge {:stroke "green"} attrs)
   [:path {:d (make-points radius)}]])
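For comparison, the same optional-then-required argument shape can be written directly with clojure.spec's regex operators. This is a sketch of what the [attrs? radius] vector stands for, not the code Instaspec generates; the spec name ::hexagon-args is made up for illustration.

```clojure
(ns example.hexagon-spec
  (:require [clojure.spec.alpha :as s]))

;; attrs is an optional map, radius is a required number.
;; The sequence shape, the predicates, and the capture names
;; are all interleaved in one expression.
(s/def ::hexagon-args
  (s/cat :attrs (s/? map?)
         :radius number?))

(s/conform ::hexagon-args [{:fill "red"} 10])
;=> {:attrs {:fill "red"}, :radius 10}

(s/conform ::hexagon-args [10])
;=> {:radius 10}

(s/valid? ::hexagon-args ["not a radius"])
;=> false
```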

Slightly more complicated case: Depth First Search over a hiccup tree. The sequence part should only contain sequency things.

tree := tag attrs? child*
tag := keyword?
attrs := map?
child := string? | tree
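For contrast, here is roughly how those four rules look when written by hand as a Malli regex schema, adapted from the hiccup example in Malli's documentation. The registry key "tree" and the branch names (:tag, :attrs, :children, :text) are chosen for this sketch, and Instaspec's actual translation may differ.

```clojure
(ns example.hiccup-malli
  (:require [malli.core :as m]))

;; tree  := tag attrs? child*
;; child := string? | tree
(def tree-schema
  [:schema {:registry {"tree" [:catn
                               [:tag keyword?]
                               [:attrs [:? map?]]
                               [:children [:* [:orn
                                               [:text string?]
                                               ;; :schema wraps the recursive ref so a
                                               ;; nested vector is matched as one child
                                               [:tree [:schema [:ref "tree"]]]]]]]}}
   "tree"])

(m/validate tree-schema [:g {:stroke "green"} "label" [:path {:d "M0 0"}]])
;=> true

(m/validate tree-schema [:g 42])
;=> false
```

The sequence shape is still there, but it has to be read through the operators, predicates, and branch names wrapped around it.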

Limitations

The S-expression Regex interface was selected by library authors for good reasons. S-expressions allow libraries flexibility beyond what EBNF can express. Instaspec cannot expose the full range of capabilities that data parsing libraries have. Even so, the subset of capabilities that it does expose is substantial and useful.

References

Many of the aforementioned parsing libraries draw inspiration from General Parser Combinators in Racket (gll) (Vegard Øye)

The rationale for sequence expressions is explained in Illuminated Macros talk (Chris Houser, Jonathan Claggett)

The rationale for specifications is explained in Spec rationale (Rich Hickey)

Seqex parsing is explained in Structure and Interpretation of Malli Regex Schemas (Jaakkola)

Motivations for Malli are explained in Malli: Inside Data-driven Schemas (Reiman)

Instaparse is explained in Instaparse: What if context-free grammars were as easy to use as regular expressions? (Engelberg)
