Workflow Overview

You can develop even complex reporting workflows in a simple, straightforward way, all from inside your notebooks.

What is a workflow?

Here, a (data) workflow means an executable process that performs one or more tasks in a specific order.

In most cases, this means getting data from somewhere, transforming it, and then (possibly only under certain conditions) sending it somewhere else.

You might also call this a data pipeline or a DAG (Directed Acyclic Graph).

Here's a simple example:

[Figure: A basic data workflow]

Notebooks are Workflows

Every notebook is a workflow, provided it has at least one executable block (that is, a SQL query, an API call, or a message block).
This happens automatically: you don't need to convert a notebook into a workflow or build workflows somewhere else. Notebooks are compiled into executable DAGs behind the scenes.

The example graph above could look like this in a notebook:

  • a SQL query
  • an Email block, using data from the SQL query

[Figure: A simple reporting workflow in a notebook]

On execution, the system detects that the email needs the results of the query, so it runs the query first and sends the email only afterwards. The order of the blocks on the page is irrelevant.
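The rule that dependencies, not page position, decide the execution order can be illustrated with Python's standard-library graphlib module. This is a minimal sketch, not the scheduler notebooks actually use; the block names query_1 and email_1 and the dictionary format are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# Hypothetical notebook, with blocks listed in page order (email first).
# Each block maps to the set of blocks it depends on: the email uses the
# query's results, so it depends on query_1.
dependencies = {
    "email_1": {"query_1"},
    "query_1": set(),
}

def execution_order(deps):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
print(order)  # ['query_1', 'email_1']
```

Even though the email block comes first on the page, the query is scheduled first because the email depends on it.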

What is a DAG?

DAG is an acronym for Directed Acyclic Graph, which is a fancy way of saying that a workflow:

  • can have tasks that depend on each other (graph)
  • cannot have circular dependencies (acyclic)
  • has dependencies that point in one direction only, so its tasks run in a specific order (directed)

Probably the most familiar example of a DAG is a spreadsheet: cells can reference each other, but not circularly, and the spreadsheet works out the order in which to calculate all the cells.
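The acyclic rule can be checked much like a spreadsheet rejects circular references. The sketch below assumes a hypothetical dictionary that maps each block to the blocks it uses and relies on graphlib's cycle detection; it does not show how notebooks themselves report a circular dependency.

```python
from graphlib import TopologicalSorter, CycleError

def is_acyclic(deps):
    """Return True if the dependency graph has no circular references."""
    try:
        # prepare() checks the graph for cycles without executing anything.
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        # err.args[1] lists the nodes that form the cycle.
        print("circular dependency:", " -> ".join(err.args[1]))
        return False
    return True

# Valid: query_2 uses the results of query_1.
print(is_acyclic({"query_2": {"query_1"}, "query_1": set()}))  # True

# Invalid: query_1 and query_2 each use the other's results.
print(is_acyclic({"query_1": {"query_2"}, "query_2": {"query_1"}}))
```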

Notebooks are DAGs

Translated to a notebook this means:

  • Blocks can reference the contents of other blocks (e.g., the SQL code or the results of a query can be used in another block). This creates a dependency between the blocks, and together these dependencies form a graph.
  • Blocks cannot have circular dependencies. For example, if the results of query_1 are used in query_2, you can't use results from query_2 in query_1.
  • Blocks are executed in a specific order, determined by the dependencies between them. Blocks with no dependencies on each other can run at the same time, whereas a block that depends on another block can only run after it.
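Running independent blocks at the same time can be sketched by grouping blocks into "waves": each wave contains every block whose dependencies have all finished. This assumes a hypothetical four-block notebook and only computes the grouping; it does not actually run anything in parallel.

```python
from graphlib import TopologicalSorter

def execution_waves(deps):
    """Group blocks into waves; blocks within one wave have no pending
    dependencies and could therefore run at the same time."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        # Pretend every block in this wave finished, unlocking the next wave.
        sorter.done(*ready)
    return waves

# Hypothetical notebook: query_3 combines two independent queries,
# and an email is sent with the result of query_3.
notebook = {
    "query_1": set(),
    "query_2": set(),
    "query_3": {"query_1", "query_2"},
    "email_1": {"query_3"},
}
print(execution_waves(notebook))
# [['query_1', 'query_2'], ['query_3'], ['email_1']]
```

query_1 and query_2 share a wave because neither depends on the other, while query_3 and email_1 must each wait for the wave before them.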

The good news is that you don't need to worry about any of this when working in a notebook: compiling it into a DAG and executing the blocks in the right order happens automatically in the background.

How to build workflows in notebooks

To build a workflow in a notebook, you need to:

  1. define executable tasks, e.g. SQL queries, email messages, API calls, or Slack messages
  2. (optionally) reference data from one task in other tasks, which creates dependencies between tasks
  3. (optionally) define custom execution logic, e.g. running a task only under a certain condition or repeating a task multiple times.

Tasks are defined by blocks in the notebook, while dependencies and execution logic are specified using Jinja templating syntax.

Defining Tasks

The following executable tasks are currently supported:

  • SQL Queries
  • API Requests
  • Emails
  • Slack Messages

To add one of these tasks, add a block of that type to a notebook.

Task Dependencies

Task dependencies are created when data from one block is used in another block.

For example:

  • The value of a parameter block called parameter_1 can be referenced in a SQL query like this: {{ parameter_1 }}
  • The SQL statement of a block query_1 can be referenced in another query like this: {{ query_1 }}
  • The results of query_1 can be referenced like this: {{ }} (which returns a JSON representation of the results table)
  • The value in the first row of the column example in the results of query_1 is referenced like this: {{[0]['example'] }}
  • Similarly, the results of an API request are referenced like this: {{ }}
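To make the link between references and dependencies concrete, here is a rough sketch that scans each block's text for {{ name }} references and treats every referenced block as a dependency. The regular expression, the block contents, and the dictionary format are hypothetical; real template parsing is more involved, and this is not necessarily how notebooks resolve references.

```python
import re

# Hypothetical, simplified pattern: captures the identifier at the start of
# a {{ ... }} expression, so "{{ query_1 }}" yields "query_1".
REFERENCE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")

def find_dependencies(blocks):
    """Map each block name to the set of other blocks its text references."""
    deps = {}
    for name, text in blocks.items():
        referenced = set(REFERENCE.findall(text))
        # Keep only references to other blocks that exist in this notebook.
        deps[name] = {ref for ref in referenced if ref in blocks and ref != name}
    return deps

# Hypothetical notebook: a parameter, a query using it, and a query that
# embeds the SQL statement of the first query as a subquery.
blocks = {
    "parameter_1": "2024-01-01",
    "query_1": "SELECT * FROM orders WHERE created_at >= '{{ parameter_1 }}'",
    "query_2": "SELECT count(*) FROM ({{ query_1 }}) AS recent_orders",
}
print(find_dependencies(blocks))
# {'parameter_1': set(), 'query_1': {'parameter_1'}, 'query_2': {'query_1'}}
```

The resulting mapping has the same shape as the dependency graphs used in the sketches above, so it could be handed to graphlib to obtain an execution order.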

Execution Logic

Execution logic is expressed in Jinja syntax by wrapping executable blocks in Jinja control statements such as {% if %} and {% for %}.

For example:

  • You can add conditional branching logic like this:
    {% if some_condition == true %}
    do this
    {% else %}
    do that
    {% endif %}

  • Or loop over a set of tasks:
    {% for i in range(0,10) %}
    do a task using {{ i }}  
    {% endfor %}
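For readers more used to Python than Jinja, the control flow of the two templates above corresponds roughly to the plain-Python sketch below. run_task and the task labels are hypothetical placeholders that only record which tasks would run; in a notebook, this logic is written in Jinja around the blocks themselves.

```python
def run_task(name, log):
    """Hypothetical stand-in for executing a block; it only records the call."""
    log.append(name)

def run_workflow(some_condition):
    log = []
    # Conditional branch: exactly one of the two tasks runs.
    if some_condition:
        run_task("do this", log)
    else:
        run_task("do that", log)
    # Loop: range(0, 10) yields i = 0, 1, ..., 9.
    for i in range(0, 10):
        run_task(f"task {i}", log)
    return log

print(run_workflow(True)[:3])  # ['do this', 'task 0', 'task 1']
```

Note that range(0, 10) stops before 10, so the loop body runs ten times, with i taking the values 0 through 9.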

[Figure: Execution logic as it could look in a notebook]
