dbt-prql #375

@max-sixty

I've been thinking of ways that people could really start using PRQL practically. Somehow, something needs to transform their PRQL into SQL between the user and the DB.

Some ideas:

  • A specific interactive environment, a bit like what PyPrql have built / are building
  • An editor plugin, similar to what Malloy have built
  • A plugin to a batch tool, such as dbt (Thoughts on dbt #13)

They are all compelling! It might be that the dbt plugin is the approach that can get the most traction the fastest. While it might not take advantage of some of the things that PRQL allows in the long term (auto-complete, type inference), it also has much lower requirements: just compile the query at the point where dbt would otherwise compile the SQL.

dbt would be the natural choice given how widely it's used.

A problem with dbt packages is that IIUC they don't allow for executing arbitrary code — they can only execute macros, which have a very defined scope.

So a couple of alternatives:

  1. dbt package + import hack: We write something that patches the dbt jinja context to allow a prql macro which compiles the contents to SQL, e.g. {{% prql ... %}}
      • The advantage of this is that it's exactly like running existing dbt models, but people can write PRQL.
      • The disadvantage is that it requires extreme Python hackery; check out @mtkennerly's approach to patching poetry here. Ours would be a bit easier than this, since we don't need to account for multiple installation approaches, but it wouldn't be simple.
      • We'd be using dbt's internal APIs, which could break on dbt upgrades.
      • I'm not sure whether the dbt folks would approve? (@kwigley @drewbanin ?) I'm sure they'd approve of us building on top of dbt, but not so sure about patching their library's jinja environment...
  2. Wrapper tool: We write a tool which pre-compiles files for dbt, like prql dbt models/ -- dbt run -m foo
      • The advantage is that the implementation would be extremely simple.
      • The disadvantage is that it adds a layer of indirection to dbt. prql might need to understand dbt's file structure and walk the file tree much as dbt already does, in order to compile .prql files to .sql so that dbt sees the .sql files. Processes which run dbt now need to run a different command.
      • Up a level, it offloads complexity onto the user, rather than encapsulating it in the tool, which is something we should try to avoid.
      • But possibly it's the practical approach in the short term.
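
To make the wrapper-tool option concrete, here is a minimal sketch of the pre-compile step: walk a models directory and write a .sql file next to each .prql file. The function name `compile_models` and the `compile_prql` callable are hypothetical; the callable stands in for whatever PRQL compiler binding ends up being used, and nothing here relies on dbt's or PyPrql's actual APIs.

```python
from pathlib import Path
from typing import Callable, List


def compile_models(models_dir: Path, compile_prql: Callable[[str], str]) -> List[Path]:
    """Write a .sql file next to every .prql file found under models_dir.

    compile_prql is a placeholder for a real PRQL-to-SQL compiler (an assumption,
    not an existing API). Returns the paths of the .sql files that were written.
    """
    written: List[Path] = []
    # rglob walks the tree recursively, so nested model folders are covered too.
    for prql_path in sorted(models_dir.rglob("*.prql")):
        sql_path = prql_path.with_suffix(".sql")
        sql_path.write_text(compile_prql(prql_path.read_text()))
        written.append(sql_path)
    return written
```

After a step like this, the existing dbt command could in principle run unchanged against the generated .sql files. Existing .sql models are left alone, since only .prql files are matched.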

Either of these requires a way of allowing jinja in PRQL — probably we could just treat {{ }} as comments without giving up anything in the language. (Hopefully people don't need as much jinja with PRQL, given we have abstractions like functions, but we don't need to be strict about it)
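
One possible way to let jinja pass through, sketched below, is a different technique from treating {{ }} as comments inside the compiler: swap each {{ ... }} span for a placeholder identifier before compiling, then restore it in the generated SQL. All names here (`mask_jinja`, `unmask_jinja`, `compile_with_jinja`, the `__jinja_N__` placeholder format) are hypothetical, and the sketch assumes the compiler emits unknown identifiers unchanged.

```python
import re
from typing import Callable, Dict, Tuple

# Matches jinja expressions such as {{ ref('foo') }} (non-greedy, may span lines).
JINJA_EXPR = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def mask_jinja(source: str) -> Tuple[str, Dict[str, str]]:
    """Replace each {{ ... }} span with a placeholder and remember the original."""
    saved: Dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        key = f"__jinja_{len(saved)}__"  # hypothetical placeholder format
        saved[key] = match.group(0)
        return key

    return JINJA_EXPR.sub(_swap, source), saved


def unmask_jinja(text: str, saved: Dict[str, str]) -> str:
    """Put the original jinja expressions back in place of the placeholders."""
    for key, original in saved.items():
        text = text.replace(key, original)
    return text


def compile_with_jinja(source: str, compile_prql: Callable[[str], str]) -> str:
    """Compile PRQL while carrying jinja expressions through untouched.

    compile_prql stands in for a real compiler (an assumption, not an existing API).
    """
    masked, saved = mask_jinja(source)
    return unmask_jinja(compile_prql(masked), saved)
```

The output still contains the jinja expressions, so dbt would render them as usual when it compiles the resulting SQL.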

The compilation step could be implemented either by calling out to PyPrql, or by embedding the Rust binary in a Python package; this is something dbt do already with https://github.com/dbt-labs/dbt-extractor.
