Workflow YAML reference
Reference for the workflow export/import YAML format, including node fields, edges, and version differences.
This page documents the YAML format that Transformation Builder uses for exporting and importing workflows. Use this reference when you need to share workflows between environments, store them in version control, or integrate with external runtime systems.
The format has two versions. Version 2 is the current format that Transformation Builder produces on export. It organizes nodes as a flat array in topological order (cte_nodes) with a separate edges array for connections. Version 1 is an older format that uses a nodes key with inline depends_on references for connections. Transformation Builder can import both versions, but always exports version 2.
How the export is formed
The export generator builds the YAML file through four steps:
Topological sort of nodes. The graph of nodes and edges is sorted so that source nodes come first, followed by transformation nodes in dependency order, and finally the Output node. If the graph contains a cycle, a fallback order (by node list position) is used.
Source list per node. For each node, the generator collects an ordered list of predecessor node IDs from the edges where target equals the current node's ID. For example, an SQL Transform node with two inputs will have a sources list containing exactly two IDs in the correct order.
cte_nodes array. Each node is written as a record in the cte_nodes array, in topological order. See cte_nodes structure below for field details.
Top-level YAML assembly. The generator combines the cte_nodes array with the edges list, workflow metadata, and optional fields (viewport, schedule) into the final document.
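The first two steps can be sketched in Python as follows. This is a minimal illustration, not the Builder's actual implementation: it assumes Kahn's algorithm for the sort, assumes edges are dicts with source and target keys that only reference known node IDs, and uses the hypothetical function names topological_order and sources_for.

```python
from collections import deque


def topological_order(node_ids, edges):
    """Return node IDs in dependency order (Kahn's algorithm).

    If the graph contains a cycle, fall back to the original list order,
    mirroring the fallback described in the text.
    """
    indegree = {n: 0 for n in node_ids}
    successors = {n: [] for n in node_ids}
    for edge in edges:
        successors[edge["source"]].append(edge["target"])
        indegree[edge["target"]] += 1

    # Nodes without predecessors (source nodes) are emitted first.
    queue = deque(n for n in node_ids if indegree[n] == 0)
    ordered = []
    while queue:
        current = queue.popleft()
        ordered.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # A cycle leaves some nodes unemitted; use list position instead.
    if len(ordered) != len(node_ids):
        return list(node_ids)
    return ordered


def sources_for(node_id, edges):
    """Collect predecessor IDs, in edge order, for one node."""
    return [e["source"] for e in edges if e["target"] == node_id]
```

With this sketch, a source node ends up with an empty sources list, and a two-input SQL node receives exactly two IDs in the order the edges are listed.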
Top-level keys
The root of a version 2 YAML file contains the following keys:
version (always present): Format version number. Always 2 for current exports.
name (always present): Workflow name.
description (always present): Workflow description. May be an empty string.
output_node_id (present when an Output node exists): The ID of the Output node in the workflow.
viewport (present when set in the workflow): Canvas position and zoom level: { x, y, zoom }. Used to restore the visual layout on import.
schedule (present when the schedule is enabled): Execution schedule: { cron, timezone }. Default cron is 0 0 * * *, default timezone is UTC.
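The conditional presence of these keys can be sketched in Python as below. The function name assemble_document, the shape of the workflow dict, and the enabled flag on the schedule are assumptions made for illustration. Only the resulting key names and the defaults come from this reference.

```python
def assemble_document(workflow, cte_nodes, edges):
    """Build the top-level mapping of a version 2 export (illustrative)."""
    doc = {
        "version": 2,
        "name": workflow["name"],
        # The description is always written, possibly as an empty string.
        "description": workflow.get("description", ""),
    }

    # output_node_id is only written when an Output node exists.
    output_ids = [n["id"] for n in cte_nodes if n["type"] == "output"]
    if output_ids:
        doc["output_node_id"] = output_ids[0]

    # viewport is only written when it is set in the workflow.
    if workflow.get("viewport") is not None:
        doc["viewport"] = workflow["viewport"]

    # schedule is only written when enabled ("enabled" is an assumed flag).
    schedule = workflow.get("schedule")
    if schedule and schedule.get("enabled"):
        doc["schedule"] = {
            "cron": schedule.get("cron", "0 0 * * *"),
            "timezone": schedule.get("timezone", "UTC"),
        }

    doc["cte_nodes"] = cte_nodes
    doc["edges"] = edges
    return doc
```

Serializing the returned mapping with a YAML library of your choice would then yield a document with the keys listed above.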
cte_nodes structure
Each entry in the cte_nodes array represents one node in the workflow graph.
id: Unique node identifier (string).
type: Node type. One of: telematics, business, filter, resample, sql, arithmetic, custom, output.
label: Display name shown on the canvas. Falls back to the node ID if not set.
description: Node description. May be an empty string.
position: Canvas coordinates as { x, y }.
sources: Ordered list of predecessor node IDs, derived from the graph edges. Empty for source nodes.
params: Node configuration parameters. The specific fields depend on the node type. See the Transformation Builder documentation for parameter details per node type.
width, height: Optional. Canvas dimensions for the node, included only when explicitly set.
Params cleaning. The available_tables and available_columns fields are removed from params during export. These fields are populated at runtime when the Builder connects to the database and should not be stored in YAML.
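If you generate or post-process exports outside the Builder, the same cleaning can be approximated with a small Python helper. The name clean_params is hypothetical, and the sketch assumes the two fields sit at the top level of params.

```python
# Fields that are populated at runtime and should not be stored in YAML.
RUNTIME_ONLY_FIELDS = ("available_tables", "available_columns")


def clean_params(params):
    """Return a copy of params without the runtime-only fields."""
    return {k: v for k, v in params.items() if k not in RUNTIME_ONLY_FIELDS}
```

The helper returns a new dict and leaves the input untouched, so the in-memory node configuration is unaffected.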
SQL type with multiple sources. When a node of type sql has two or more sources, the export adds a join_spec field to the record. This is an array with one element containing the join configuration, including the fields type and on_condition.
The type value is taken from the node's join_type parameter (converted to lowercase), and on_condition is taken from join_condition. For SQL nodes with two sources, the join information appears in both params and join_spec.
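The derivation of join_spec from params can be sketched as follows. The function name add_join_spec and the empty-string defaults for missing parameters are assumptions. The field names type, on_condition, join_type, and join_condition are taken from the description above.

```python
def add_join_spec(record):
    """Return a copy of a cte_nodes record, adding join_spec for multi-source SQL nodes."""
    result = dict(record)
    if result.get("type") == "sql" and len(result.get("sources", [])) >= 2:
        params = result.get("params", {})
        # One-element array; type is lowercased, on_condition copied as-is.
        result["join_spec"] = [
            {
                "type": str(params.get("join_type", "")).lower(),
                "on_condition": params.get("join_condition", ""),
            }
        ]
    return result
```

Because params is left unchanged, the join information ends up in both params and join_spec, as noted above.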
edges structure
The edges array defines connections between nodes in the workflow graph.
Each edge is an object with two fields:
source: The ID of the node where the edge originates.
target: The ID of the node where the edge terminates.
Edge IDs from the Builder interface are not preserved in the export. On import, new edge IDs are generated automatically.
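A rough Python sketch of this round trip is shown below. The function names and the generated ID format (edge- followed by a random hex string) are assumptions. The Builder's real ID scheme is not documented here.

```python
import uuid


def export_edges(builder_edges):
    """Keep only source and target; Builder edge IDs are dropped."""
    return [{"source": e["source"], "target": e["target"]} for e in builder_edges]


def import_edges(yaml_edges):
    """Attach a freshly generated ID to each imported edge (assumed ID format)."""
    return [
        {
            "id": f"edge-{uuid.uuid4().hex}",
            "source": e["source"],
            "target": e["target"],
        }
        for e in yaml_edges
    ]
```

Since IDs are regenerated, anything that refers to an edge by ID should not be expected to survive an export followed by an import.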
Import
Version detection
The Builder determines the YAML format version using the following logic:
If the root contains version: 2 or the key cte_nodes, the file is processed as version 2.
Otherwise, version 1 is expected.
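Expressed in Python, the detection rule amounts to a single check on the parsed root mapping. The function name detect_version is hypothetical, and the sketch assumes the file has already been parsed into a dict.

```python
def detect_version(doc):
    """Return 2 if the root has version: 2 or a cte_nodes key, otherwise 1."""
    if doc.get("version") == 2 or "cte_nodes" in doc:
        return 2
    return 1
```

Note that under this rule a file with a cte_nodes key is treated as version 2 even when the version key is missing.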
Version 2 import
The Builder iterates the cte_nodes array in order. For each record:
The id and type are read. The type is converted to lowercase. Nodes with type python are skipped without raising an error.
Parameters are read from the params key (or config as a fallback). For sql-type nodes, if a join_spec field is present in the record, it is assigned to the node's join configuration.
The edges array is parsed into source-target pairs, and new edge IDs are generated.
The viewport and schedule fields are preserved if present in the YAML.
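The per-record handling can be sketched in Python as below. This is an approximation for readers building their own importer. The function name import_v2_nodes and the shape of the returned node dicts (including storing the join configuration under a join_spec key) are assumptions.

```python
def import_v2_nodes(doc):
    """Read cte_nodes records in order and return simplified node dicts."""
    nodes = []
    for record in doc.get("cte_nodes", []):
        node_type = str(record["type"]).lower()

        # Nodes of type python are skipped silently.
        if node_type == "python":
            continue

        # Prefer params; fall back to config when params is absent.
        params = record.get("params")
        if params is None:
            params = record.get("config", {})

        node = {"id": record["id"], "type": node_type, "params": params}

        # For sql nodes, carry over join_spec when the record provides one.
        if node_type == "sql" and "join_spec" in record:
            node["join_spec"] = record["join_spec"]

        nodes.append(node)
    return nodes
```

Iterating in file order preserves the topological order that the export wrote.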
Version 1 import (backward compatibility)
Version 1 files use a nodes key (array or dictionary) and optionally an edges array or depends_on fields within each node.
The Builder processes version 1 files as follows:
Supported node types are the same as version 2: telematics, business, filter, resample, sql, arithmetic, custom, output.
Connections between nodes can be defined in two ways: a top-level edges array, or a depends_on list within each node.
The inputs and outputs fields on each node are normalized to objects with { name, type } structure.
If edges include sourceHandle or targetHandle port identifiers, the import adjusts node ports accordingly for correct display in the Builder interface.
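The two ways of expressing connections in version 1 can be reduced to a single edge list, as in the Python sketch below. Several details are assumptions rather than documented behavior: that a dictionary-form nodes key is keyed by node ID, that depends_on entries are predecessor node IDs, and that a non-empty top-level edges array takes precedence over depends_on. The function name v1_edges is hypothetical.

```python
def v1_edges(doc):
    """Derive source-target pairs from a parsed version 1 document."""
    # Assumption: an explicit edges array wins over depends_on references.
    if doc.get("edges"):
        return [{"source": e["source"], "target": e["target"]} for e in doc["edges"]]

    nodes = doc.get("nodes", [])
    # Assumption: in dictionary form, each key is the node ID.
    if isinstance(nodes, dict):
        nodes = [{"id": key, **(value or {})} for key, value in nodes.items()]

    edges = []
    for node in nodes:
        # Each depends_on entry becomes an edge pointing at the current node.
        for dependency in node.get("depends_on", []):
            edges.append({"source": dependency, "target": node["id"]})
    return edges
```

The output has the same shape as the version 2 edges array, which makes it straightforward to process both versions with the same downstream code.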
YAML structure template
The following template shows the complete structure of a version 2 YAML file with annotations:
Example
The following example shows a complete version 2 workflow that reads telematics sensor data, joins it with sensor descriptions from the business schema, applies an arithmetic transformation to convert a column type, and writes the results to an output table.
This workflow performs the following steps:
node-telematics-1 reads device_id, device_time, and value columns from the inputs table in the raw_telematics_data schema.
node-business-1 reads sensor_id, device_id, and sensor_label from the sensor_description table in the raw_business_data schema.
node-sql-1 joins the two sources on device_id using a LEFT JOIN, selecting all telematics columns plus the sensor_label from the business source.
node-arithmetic-1 adds a computed column value_num by casting the text value column to a numeric type.
node-output-1 configures the result to be written to the enriched_vehicle_metrics table with append mode, using device_id and device_time as the primary key.
The export does not include available_tables or available_columns in params. These fields are populated dynamically when the Builder connects to the database. For SQL nodes with two sources, join information appears in both params and join_spec.
Next steps
Transformation Builder: Learn how to design workflows using the visual interface.
Transformation layer: Understand how processed data is organized into schemas and how to query it.