Data Builder¶
The Data Builder is data-conductor's powerful SQL editor and data transformation tool. It allows you to create, test, and manage SQL data steps that form the building blocks of your data pipelines.
Complete tour of the Data Builder interface
Overview¶
The Data Builder provides:
- SQL Editor with syntax highlighting and auto-completion
- Real-time query execution with immediate results
- Pipeline variables for dynamic SQL queries
- Version management with diff comparison
- Preview mode with variable substitution
- Multiple database integrations
Main Data Builder interface showing the SQL editor
Creating Data Steps¶
Starting a New Data Step¶
- Navigate to Data Builder from the main menu
- Click + New to create a new data step
- You'll see the SQL editor interface with several tabs:
    - Editor: Write and edit your SQL
    - Preview: See SQL with variables replaced
    - Results: View execution results
    - Diff: Compare with other versions (when available)
Creating a new data step
Basic Information¶
Fill in the following fields:
- Name: A descriptive name for your data step
- Description: Optional description of what this step does
- Integration: Select which database to connect to
Basic data step configuration
SQL Editor Features¶
Syntax Highlighting and Auto-completion¶
The Monaco editor provides:
- SQL syntax highlighting
- Auto-completion for keywords
- Error detection and highlighting
- Multiple cursor support
- Find and replace functionality
Advanced SQL editor with syntax highlighting
Writing SQL Queries¶
Write your SQL queries in the editor. Here's a simple example:
SELECT
    customer_id,
    customer_name,
    order_date,
    total_amount,
    status
FROM orders
WHERE order_date >= '2024-01-01'
    AND status IN ('completed', 'shipped')
ORDER BY order_date DESC
LIMIT 100
Writing a SQL query in the editor
Pipeline Variables¶
Pipeline variables make your SQL queries dynamic and reusable.
Working with pipeline variables
Defining Variables¶
Variables use mustache syntax: {{VARIABLE_NAME}}
SELECT
    customer_id,
    customer_name,
    order_date,
    total_amount
FROM orders
WHERE order_date >= '{{START_DATE}}'
    AND order_date <= '{{END_DATE}}'
    AND region = '{{REGION}}'
    AND total_amount >= {{MIN_AMOUNT}}
ORDER BY order_date DESC
LIMIT {{LIMIT}}
SQL query using pipeline variables
Variable Configuration¶
Click the Variables button to manage your variables:
- Variable Name: The name used in your SQL (without braces)
- Default Value: Value used when no specific value is provided
- Test Value: Value used for testing and preview
- Type: String, Number, Boolean, or JSON
Configuring pipeline variables
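For example, suppose you define ROW_LIMIT as a Number variable with a default value of 1000 and a test value of 10 (hypothetical values for illustration). You then reference it in the editor like any other variable:
SELECT *
FROM orders
LIMIT {{ROW_LIMIT}}  -- test value (10) applies in Preview and test runs; the default (1000) applies when no value is supplied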
Variable Types¶
String Variables
WHERE customer_name = '{{CUSTOMER_NAME}}'
Number Variables
WHERE total_amount >= {{MIN_AMOUNT}}
LIMIT {{ROW_LIMIT}}
Boolean Variables
WHERE active = {{IS_ACTIVE}}
JSON Variables
WHERE metadata @> '{{FILTER_JSON}}'
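The @> containment operator in this example is PostgreSQL-specific (for json/jsonb columns). A JSON variable's test value might be a small object such as {"status": "active"} (a hypothetical value), in which case the substituted SQL would read:
WHERE metadata @> '{"status": "active"}'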
Preview Mode¶
The Preview tab shows your SQL with all variables replaced with their test values.
Preview showing SQL with variables replaced
This helps you:
- Verify variable substitution is working correctly
- See the actual SQL that will be executed
- Debug variable-related issues
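For example, if the test values were START_DATE = 2024-01-01, END_DATE = 2024-03-31, REGION = EMEA, MIN_AMOUNT = 100, and LIMIT = 50 (hypothetical values), the Preview tab for the Pipeline Variables query shown earlier would display:
SELECT
    customer_id,
    customer_name,
    order_date,
    total_amount
FROM orders
WHERE order_date >= '2024-01-01'
    AND order_date <= '2024-03-31'
    AND region = 'EMEA'
    AND total_amount >= 100
ORDER BY order_date DESC
LIMIT 50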
Executing Queries¶
Running Your SQL¶
- Write your SQL query in the Editor tab
- Configure any variables needed
- Click the Execute button
- Switch to the Results tab to see output
Executing a SQL query and viewing results
Viewing Results¶
The Results tab displays:
- Execution time in milliseconds
- Row count of results returned
- Data table with horizontal scrolling
- Error messages if the query failed
Viewing query execution results
Error Handling¶
If your query fails, you'll see:
- The specific error message from the database
- Line numbers where applicable
- Suggestions for common fixes
Handling SQL query errors
Data Return Options¶
Configure how your data step behaves in pipelines:
Configuring data return options
Return Data to Pipeline¶
When enabled, your query results can be used by other pipeline steps:
- Format: Choose JSON, CSV, or XML output
- Variable Name: Name to use when referencing this data in other steps
Example Usage¶
If you set the variable name to customer_data, other pipeline steps can reference it:
SELECT * FROM pipeline_data
WHERE customer_id IN ({{customer_data.customer_ids}})
Version Management¶
data-conductor automatically manages versions of your data steps.
Understanding data step versioning
Version Creation¶
A new version is created when:
- You save changes to a data step that's actively used in deployments
- The system detects the SQL has been modified
- You're updating a data step that has running pipelines
Version Comparison¶
Use the Diff tab to compare versions:
Comparing different versions of a data step
The diff view shows:
- Added lines in green
- Removed lines in red
- Modified lines highlighted
- Context lines for reference
Database Integrations¶
Supported Databases¶
data-conductor supports multiple database types:
- PostgreSQL
- MySQL
- Microsoft SQL Server
- SQLite
- BigQuery (Google Cloud)
- Snowflake
- Amazon Redshift
Supported database integrations
Testing Connections¶
Always test your database connections:
- Select your integration from the dropdown
- The editor will validate the connection
- Error messages appear if the connection fails
Testing database connections in the editor
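A quick way to confirm that a newly selected integration is reachable is to run a trivial query before anything heavier (a general SQL practice; adjust the syntax for your database):
SELECT 1 AS connection_check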
Advanced Features¶
SQL Optimization Tips¶
Use LIMIT for Testing
SELECT * FROM large_table
WHERE condition = '{{VALUE}}'
LIMIT 10  -- Remove or increase for production
Index-Friendly Queries
-- Good: Uses index
WHERE created_date >= '{{START_DATE}}'
-- Avoid: Prevents index usage
WHERE DATE(created_date) >= '{{START_DATE}}'
Variable Placement
-- Good: Parameterized
WHERE status = '{{STATUS}}'
-- Avoid: unquoted string substitution risks syntax errors and injection (though data-conductor sanitizes values)
WHERE status = {{STATUS_RAW}}
Performance Monitoring¶
Monitor query performance:
- Execution Time: Shown in results
- Row Count: Indicates data volume
- Database Load: Check with your DBA for long-running queries
Monitoring query performance
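For long-running queries, most databases can report their execution plan. In PostgreSQL, for example, you could run the statement below directly against the database (a general database technique, not a data-conductor feature) to see where time is spent:
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE order_date >= '2024-01-01'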
Common Patterns¶
Date Range Queries¶
SELECT *
FROM events
WHERE event_date >= '{{START_DATE}}'
    AND event_date < '{{END_DATE}}'
Incremental Processing¶
SELECT *
FROM transactions
WHERE last_modified > '{{LAST_PROCESSED_TIME}}'
ORDER BY last_modified
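A common companion to this pattern (a general approach, not a data-conductor-specific feature) is to capture the newest timestamp you just processed so it can be supplied as the next LAST_PROCESSED_TIME:
SELECT MAX(last_modified) AS new_last_processed_time
FROM transactions
WHERE last_modified > '{{LAST_PROCESSED_TIME}}'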
Conditional Logic¶
SELECT
    customer_id,
    CASE
        WHEN total_orders > {{VIP_THRESHOLD}} THEN 'VIP'
        WHEN total_orders > {{REGULAR_THRESHOLD}} THEN 'Regular'
        ELSE 'New'
    END as customer_tier
FROM customer_summary
Data Aggregation¶
SELECT
    region,
    DATE_TRUNC('{{PERIOD}}', order_date) as period,
    COUNT(*) as order_count,
    SUM(total_amount) as revenue
FROM orders
WHERE order_date >= '{{START_DATE}}'
GROUP BY region, DATE_TRUNC('{{PERIOD}}', order_date)
ORDER BY period DESC, revenue DESC
Troubleshooting¶
Common Issues¶
Variable Not Substituted
- Check that the variable name matches exactly (case-sensitive)
- Ensure the variable is defined in the Variables panel
- Verify a test value is provided
Connection Timeout
- The query may be too complex or slow
- Check database server performance
- Consider adding a LIMIT clause for testing
Permission Denied
- Verify the database user has the required permissions
- Check table/schema access rights
- Confirm the integration credentials are correct
Syntax Errors
- Use the Preview tab to see the final SQL
- Check for missing quotes around string variables
- Verify database-specific SQL syntax
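For the missing-quotes case in particular, the Preview tab makes the problem easy to spot. A string variable used without quotes substitutes into the SQL as a bare token (illustrative example):
-- Fails on most databases: the substituted value is treated as an identifier or invalid expression
WHERE customer_name = {{CUSTOMER_NAME}}
-- Works: the substituted value becomes a string literal
WHERE customer_name = '{{CUSTOMER_NAME}}'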
Getting Help¶
If you encounter issues:
- Check the Results tab for specific error messages
- Use Preview to verify variable substitution
- Test with simpler queries first
- Consult your database documentation
- Contact your administrator for permission issues
Next Steps¶
Now that you understand the Data Builder:
- Create your first pipeline with multiple data steps
- Set up deployments to run your data steps automatically
- Learn about pipeline variables for advanced use cases
- Explore API integrations to connect external systems
Ready to deploy your data steps? Continue to Deployment Manager!