Data Builder

The Data Builder is data-conductor's powerful SQL editor and data transformation tool. It allows you to create, test, and manage SQL data steps that form the building blocks of your data pipelines.

Data Builder Overview Complete tour of the Data Builder interface

Overview

The Data Builder provides:

  • SQL Editor with syntax highlighting and auto-completion
  • Real-time query execution with immediate results
  • Pipeline variables for dynamic SQL queries
  • Version management with diff comparison
  • Preview mode with variable substitution
  • Multiple database integrations

Data Builder Interface Main Data Builder interface showing the SQL editor

Creating Data Steps

Starting a New Data Step

  1. Navigate to Data Builder from the main menu
  2. Click + New to create a new data step
  3. You'll see the SQL editor interface with several tabs:
  4. Editor: Write and edit your SQL
  5. Preview: See SQL with variables replaced
  6. Results: View execution results
  7. Diff: Compare with other versions (when available)

New Data Step Creating a new data step

Basic Information

Fill in the required fields:

  • Name: A descriptive name for your data step
  • Description: Optional description of what this step does
  • Integration: Select which database to connect to

Data Step Form Basic data step configuration

SQL Editor Features

Syntax Highlighting and Auto-completion

The Monaco editor provides: - SQL syntax highlighting - Auto-completion for keywords - Error detection and highlighting - Multiple cursor support - Find and replace functionality

SQL Editor Features Advanced SQL editor with syntax highlighting

Writing SQL Queries

Write your SQL queries in the editor. Here's a simple example:

SELECT
    customer_id,
    customer_name,
    order_date,
    total_amount,
    status
FROM orders
WHERE order_date >= '2024-01-01'
    AND status IN ('completed', 'shipped')
ORDER BY order_date DESC
LIMIT 100

SQL Query Example Writing a SQL query in the editor

Pipeline Variables

Pipeline variables make your SQL queries dynamic and reusable.

Pipeline Variables Working with pipeline variables

Defining Variables

Variables use mustache syntax: {{VARIABLE_NAME}}

SELECT
    customer_id,
    customer_name,
    order_date,
    total_amount
FROM orders
WHERE order_date >= '{{START_DATE}}'
    AND order_date <= '{{END_DATE}}'
    AND region = '{{REGION}}'
    AND total_amount >= {{MIN_AMOUNT}}
ORDER BY order_date DESC
LIMIT {{LIMIT}}

SQL with Variables SQL query using pipeline variables

Variable Configuration

Click the Variables button to manage your variables:

  1. Variable Name: The name used in your SQL (without braces)
  2. Default Value: Value used when no specific value is provided
  3. Test Value: Value used for testing and preview
  4. Type: String, Number, Boolean, or JSON

Variable Configuration Configuring pipeline variables

Variable Types

String Variables

WHERE customer_name = '{{CUSTOMER_NAME}}'

Number Variables

WHERE total_amount >= {{MIN_AMOUNT}}
LIMIT {{ROW_LIMIT}}

Boolean Variables

WHERE active = {{IS_ACTIVE}}

JSON Variables

WHERE metadata @> '{{FILTER_JSON}}'

Preview Mode

The Preview tab shows your SQL with all variables replaced with their test values.

Preview Mode Preview showing SQL with variables replaced

This helps you: - Verify variable substitution is working correctly - See the actual SQL that will be executed - Debug variable-related issues

Executing Queries

Running Your SQL

  1. Write your SQL query in the Editor tab
  2. Configure any variables needed
  3. Click the Execute button
  4. Switch to the Results tab to see output

Query Execution Executing a SQL query and viewing results

Viewing Results

The Results tab displays: - Execution time in milliseconds - Row count of results returned - Data table with horizontal scrolling - Error messages if the query failed

Query Results Viewing query execution results

Error Handling

If your query fails, you'll see: - The specific error message from the database - Line numbers where applicable - Suggestions for common fixes

Query Error Handling SQL query errors

Data Return Options

Configure how your data step behaves in pipelines:

Data Return Options Configuring data return options

Return Data to Pipeline

When enabled, your query results can be used by other pipeline steps:

  • Format: Choose JSON, CSV, or XML output format
  • Variable Name: Name to use when referencing this data in other steps

Example Usage

If you set the variable name to customer_data, other pipeline steps can reference it:

SELECT * FROM pipeline_data
WHERE customer_id IN ({{customer_data.customer_ids}})

Version Management

data-conductor automatically manages versions of your data steps.

Version Management Understanding data step versioning

Version Creation

A new version is created when: - You save changes to a data step that's actively used in deployments - The system detects the SQL has been modified - You're updating a data step that has running pipelines

Version Comparison

Use the Diff tab to compare versions:

Version Diff Comparing different versions of a data step

The diff view shows: - Added lines in green - Removed lines in red - Modified lines highlighted - Context lines for reference

Database Integrations

Supported Databases

data-conductor supports multiple database types: - PostgreSQL - MySQL - Microsoft SQL Server - SQLite - BigQuery (Google Cloud) - Snowflake - Amazon Redshift

Database Types Supported database integrations

Testing Connections

Always test your database connections:

  1. Select your integration from the dropdown
  2. The editor will validate the connection
  3. Error messages appear if connection fails

Connection Test Testing database connections in the editor

Advanced Features

SQL Optimization Tips

Use LIMIT for Testing

SELECT * FROM large_table
WHERE condition = '{{VALUE}}'
LIMIT 10  -- Remove or increase for production

Index-Friendly Queries

-- Good: Uses index
WHERE created_date >= '{{START_DATE}}'

-- Avoid: Prevents index usage
WHERE DATE(created_date) >= '{{START_DATE}}'

Variable Placement

-- Good: Parameterized
WHERE status = '{{STATUS}}'

-- Avoid: SQL injection risk (though data-conductor sanitizes)
WHERE status = {{STATUS_RAW}}

Performance Monitoring

Monitor query performance: - Execution Time: Shown in results - Row Count: Indicates data volume - Database Load: Check with your DBA for long-running queries

Performance Metrics Monitoring query performance

Common Patterns

Date Range Queries

SELECT *
FROM events
WHERE event_date >= '{{START_DATE}}'
    AND event_date < '{{END_DATE}}'

Incremental Processing

SELECT *
FROM transactions
WHERE last_modified > '{{LAST_PROCESSED_TIME}}'
ORDER BY last_modified

Conditional Logic

SELECT
    customer_id,
    CASE
        WHEN total_orders > {{VIP_THRESHOLD}} THEN 'VIP'
        WHEN total_orders > {{REGULAR_THRESHOLD}} THEN 'Regular'
        ELSE 'New'
    END as customer_tier
FROM customer_summary

Data Aggregation

SELECT
    region,
    DATE_TRUNC('{{PERIOD}}', order_date) as period,
    COUNT(*) as order_count,
    SUM(total_amount) as revenue
FROM orders
WHERE order_date >= '{{START_DATE}}'
GROUP BY region, DATE_TRUNC('{{PERIOD}}', order_date)
ORDER BY period DESC, revenue DESC

Troubleshooting

Common Issues

Variable Not Substituted - Check variable name matches exactly (case-sensitive) - Ensure variable is defined in the Variables panel - Verify test value is provided

Connection Timeout - Query may be too complex or slow - Check database server performance - Consider adding LIMIT clause for testing

Permission Denied - Verify database user has required permissions - Check table/schema access rights - Confirm integration credentials are correct

Syntax Errors - Use the Preview tab to see final SQL - Check for missing quotes around string variables - Verify database-specific SQL syntax

Getting Help

If you encounter issues:

  1. Check the Results tab for specific error messages
  2. Use Preview to verify variable substitution
  3. Test with simpler queries first
  4. Consult your database documentation
  5. Contact your administrator for permission issues

Next Steps

Now that you understand the Data Builder:

Ready to deploy your data steps? Continue to Deployment Manager!