Workflow YAML reference

Reference for the workflow export/import YAML format, including node fields, edges, and version differences.

This page documents the YAML format that Transformation Builder uses for exporting and importing workflows. Use this reference when you need to share workflows between environments, store them in version control, or integrate with external runtime systems.

The format has two versions. Version 2 is the current format that Transformation Builder produces on export. It organizes nodes as a flat array in topological order (cte_nodes) with a separate edges array for connections. Version 1 is an older format that uses a nodes key with inline depends_on references for connections. Transformation Builder can import both versions, but always exports version 2.

Export

Produces a YAML file in version 2 format. You can trigger an export from the Export button in the Transformation Builder toolbar, or through the API. The exported file can be stored in a repository, shared with colleagues, or passed to an external runtime for execution.

Import

Accepts both version 2 (the current format) and version 1 (an earlier format with depends_on-based connections). Use the Import function in the Builder and select a .yaml or .yml file.

How the export is formed

The export generator builds the YAML file through four steps:

  1. Topological sort of nodes. The graph of nodes and edges is sorted so that source nodes come first, followed by transformation nodes in dependency order, and finally the Output node. If the graph contains a cycle, a fallback order (by node list position) is used.

  2. Source list per node. For each node, the generator collects an ordered list of predecessor node IDs from the edges where target equals the current node's ID. For example, an SQL Transform node with two inputs will have a sources list containing exactly two IDs in the correct order.

  3. cte_nodes array. Each node is written as a record in the cte_nodes array, in topological order. See cte_nodes structure below for field details.

  4. Top-level YAML assembly. The generator combines the cte_nodes array with the edges list, workflow metadata, and optional fields (viewport, schedule) into the final document.

Top-level keys

The root of a version 2 YAML file contains the following keys:

Key
Presence
Description

version

Always

Format version number. Always 2 for current exports.

name

Always

Workflow name.

description

Always

Workflow description. May be an empty string.

cte_nodes

Always

Array of node records in topological order. See cte_nodes structure.

edges

Always

Array of edge objects. See edges structure.

output_node_id

When an Output node exists

The ID of the Output node in the workflow.

viewport

When set in the workflow

Canvas position and zoom level: { x, y, zoom }. Used to restore the visual layout on import.

schedule

When schedule is enabled

Execution schedule: { cron, timezone }. Default cron is 0 0 * * *, default timezone is UTC.

cte_nodes structure

Each entry in the cte_nodes array represents one node in the workflow graph.

Field
Description

id

Unique node identifier (string).

type

Node type. One of: telematics, business, filter, resample, sql, arithmetic, custom, output.

label

Display name shown on the canvas. Falls back to the node ID if not set.

description

Node description. May be an empty string.

position

Canvas coordinates as { x, y }.

sources

Ordered list of predecessor node IDs, derived from the graph edges. Empty for source nodes.

params

Node configuration parameters. The specific fields depend on the node type. See the Transformation Builder documentation for parameter details per node type.

width, height

Optional. Canvas dimensions for the node, included only when explicitly set.

Params cleaning. The available_tables and available_columns fields are removed from params during export. These fields are populated at runtime when the Builder connects to the database and should not be stored in YAML.

SQL type with multiple sources. When a node of type sql has two or more sources, the export adds a join_spec field to the record. This is an array with one element containing the join configuration:

The type value is taken from the node's join_type parameter (converted to lowercase), and on_condition is taken from join_condition. For SQL nodes with two sources, the join information appears in both params and join_spec.

edges structure

The edges array defines connections between nodes in the workflow graph.

Each edge is an object with two fields:

Field
Description

source

The ID of the node where the edge originates.

target

The ID of the node where the edge terminates.

Edge IDs from the Builder interface are not preserved in the export. On import, new edge IDs are generated automatically.

Import

Version detection

The Builder determines the YAML format version using the following logic:

  • If the root contains version: 2 or the key cte_nodes, the file is processed as version 2.

  • Otherwise, version 1 is expected.

Version 2 import

The Builder iterates the cte_nodes array in order. For each record:

  • The id and type are read. The type is converted to lowercase. Nodes with type python are skipped without raising an error.

  • Parameters are read from the params key (or config as a fallback). For sql-type nodes, if a join_spec field is present in the record, it is assigned to the node's join configuration.

  • The edges array is parsed into source-target pairs, and new edge IDs are generated.

  • The viewport and schedule fields are preserved if present in the YAML.

Version 1 import (backward compatibility)

Version 1 files use a nodes key (array or dictionary) and optionally an edges array or depends_on fields within each node.

The Builder processes version 1 files as follows:

  • Supported node types are the same as version 2: telematics, business, filter, resample, sql, arithmetic, custom, output.

  • Connections between nodes can be defined in two ways: a top-level edges array, or a depends_on list within each node.

  • The inputs and outputs fields on each node are normalized to objects with { name, type } structure.

  • If edges include sourceHandle or targetHandle port identifiers, the import adjusts node ports accordingly for correct display in the Builder interface.

YAML structure template

The following template shows the complete structure of a version 2 YAML file with annotations:

Example

The following example shows a complete version 2 workflow that reads telematics sensor data, joins it with sensor descriptions from the business schema, applies an arithmetic transformation to convert a column type, and writes the results to an output table.

This workflow performs the following steps:

  1. node-telematics-1 reads device_id, device_time, and value columns from the inputs table in the raw_telematics_data schema.

  2. node-business-1 reads sensor_id, device_id, and sensor_label from the sensor_description table in the raw_business_data schema.

  3. node-sql-1 joins the two sources on device_id using a LEFT JOIN, selecting all telematics columns plus the sensor_label from the business source.

  4. node-arithmetic-1 adds a computed column value_num by casting the text value column to a numeric type.

  5. node-output-1 configures the result to be written to the enriched_vehicle_metrics table with append mode, using device_id and device_time as the primary key.

circle-info

The export does not include available_tables or available_columns in params. These fields are populated dynamically when the Builder connects to the database. For SQL nodes with two sources, join information appears in both params and join_spec.

Next steps

Last updated

Was this helpful?