Data Builder¶
The Data Builder is data-conductor's powerful SQL editor and data transformation tool. It allows you to create, test, and manage SQL data steps that form the building blocks of your data pipelines.
Complete tour of the Data Builder interface
Overview¶
The Data Builder provides:
- SQL Editor with syntax highlighting and auto-completion
- Real-time query execution with immediate results
- Pipeline variables for dynamic SQL queries
- Version management with diff comparison
- Preview mode with variable substitution
- Multiple database integrations
Main Data Builder interface showing the SQL editor
Creating Data Steps¶
Starting a New Data Step¶
- Navigate to Data Builder from the main menu
- Click + New to create a new data step
- You'll see the SQL editor interface with several tabs:
    - Editor: Write and edit your SQL
    - Preview: See SQL with variables replaced
    - Results: View execution results
    - Diff: Compare with other versions (when available)
Creating a new data step
Basic Information¶
Fill in the following fields:
- Name: A descriptive name for your data step
- Description: Optional description of what this step does
- Integration: Select which database to connect to
Basic data step configuration
SQL Editor Features¶
Syntax Highlighting and Auto-completion¶
The Monaco editor provides:
- SQL syntax highlighting
- Auto-completion for keywords
- Error detection and highlighting
- Multiple cursor support
- Find and replace functionality
Advanced SQL editor with syntax highlighting
Writing SQL Queries¶
Write your SQL queries in the editor. Here's a simple example:
SELECT
    customer_id,
    customer_name,
    order_date,
    total_amount,
    status
FROM orders
WHERE order_date >= '2024-01-01'
    AND status IN ('completed', 'shipped')
ORDER BY order_date DESC
LIMIT 100
Writing a SQL query in the editor
Pipeline Variables¶
Pipeline variables make your SQL queries dynamic and reusable.
Working with pipeline variables
Defining Variables¶
Variables use mustache syntax: {{VARIABLE_NAME}}
SELECT
    customer_id,
    customer_name,
    order_date,
    total_amount
FROM orders
WHERE order_date >= '{{START_DATE}}'
    AND order_date <= '{{END_DATE}}'
    AND region = '{{REGION}}'
    AND total_amount >= {{MIN_AMOUNT}}
ORDER BY order_date DESC
LIMIT {{LIMIT}}
SQL query using pipeline variables
Variable Configuration¶
Click the Variables button to manage your variables:
- Variable Name: The name used in your SQL (without braces)
- Default Value: Value used when no specific value is provided
- Test Value: Value used for testing and preview
- Type: String, Number, Boolean, or JSON
Configuring pipeline variables
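For example, suppose you define ROW_LIMIT as a Number variable with a default value of 1000 and a test value of 10 (hypothetical values for illustration). You then reference it in the editor like any other variable:
SELECT *
FROM orders
LIMIT {{ROW_LIMIT}}  -- test value (10) applies in Preview and test runs; the default (1000) applies when no value is supplied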
Variable Types¶
String Variables
WHERE customer_name = '{{CUSTOMER_NAME}}'
Number Variables
WHERE total_amount >= {{MIN_AMOUNT}}
LIMIT {{ROW_LIMIT}}
Boolean Variables
WHERE active = {{IS_ACTIVE}}
JSON Variables
WHERE metadata @> '{{FILTER_JSON}}'
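The @> containment operator in this example is PostgreSQL-specific (for json/jsonb columns). A JSON variable's test value might be a small object such as {"status": "active"} (a hypothetical value), in which case the substituted SQL would read:
WHERE metadata @> '{"status": "active"}'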
Preview Mode¶
The Preview tab shows your SQL with all variables replaced with their test values.
Preview showing SQL with variables replaced
This helps you:
- Verify variable substitution is working correctly
- See the actual SQL that will be executed
- Debug variable-related issues
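For example, if the test values were START_DATE = 2024-01-01, END_DATE = 2024-03-31, REGION = EMEA, MIN_AMOUNT = 100, and LIMIT = 50 (hypothetical values), the Preview tab for the Pipeline Variables query shown earlier would display:
SELECT
    customer_id,
    customer_name,
    order_date,
    total_amount
FROM orders
WHERE order_date >= '2024-01-01'
    AND order_date <= '2024-03-31'
    AND region = 'EMEA'
    AND total_amount >= 100
ORDER BY order_date DESC
LIMIT 50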
Executing Queries¶
Running Your SQL¶
- Write your SQL query in the Editor tab
- Configure any variables needed
- Click the Execute button
- Switch to the Results tab to see output
Executing a SQL query and viewing results
Viewing Results¶
The Results tab displays:
- Execution time in milliseconds
- Row count of results returned
- Data table with horizontal scrolling
- Error messages if the query failed
Viewing query execution results
Error Handling¶
If your query fails, you'll see:
- The specific error message from the database
- Line numbers where applicable
- Suggestions for common fixes
Handling SQL query errors
Data Return Options¶
Configure how your data step behaves in pipelines:
Configuring data return options
Return Data to Pipeline¶
When enabled, your query results can be used by other pipeline steps:
- Format: Choose JSON, CSV, or XML output
- Variable Name: Name to use when referencing this data in other steps
Example Usage¶
If you set the variable name to customer_data, other pipeline steps can reference it:
SELECT * FROM pipeline_data
WHERE customer_id IN ({{customer_data.customer_ids}})
Version Management¶
data-conductor automatically manages versions of your data steps.
Understanding data step versioning
Version Creation¶
A new version is created when:
- You save changes to a data step that's actively used in deployments
- The system detects the SQL has been modified
- You're updating a data step that has running pipelines
Version Comparison¶
Use the Diff tab to compare versions:
Comparing different versions of a data step
The diff view shows:
- Added lines in green
- Removed lines in red
- Modified lines highlighted
- Context lines for reference
Database Integrations¶
Supported Databases¶
data-conductor supports multiple database types:
- PostgreSQL
- MySQL
- Microsoft SQL Server
- SQLite
- BigQuery (Google Cloud)
- Snowflake
- Amazon Redshift
Supported database integrations
Testing Connections¶
Always test your database connections:
- Select your integration from the dropdown
- The editor will validate the connection
- Error messages appear if the connection fails
Testing database connections in the editor
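A quick way to confirm that a newly selected integration is reachable is to run a trivial query before anything heavier (a general SQL practice; adjust the syntax for your database):
SELECT 1 AS connection_check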
Advanced Features¶
SQL Optimization Tips¶
Use LIMIT for Testing
SELECT * FROM large_table
WHERE condition = '{{VALUE}}'
LIMIT 10  -- Remove or increase for production
Index-Friendly Queries
-- Good: Uses index
WHERE created_date >= '{{START_DATE}}'
-- Avoid: Prevents index usage
WHERE DATE(created_date) >= '{{START_DATE}}'
Variable Placement
-- Good: Parameterized
WHERE status = '{{STATUS}}'
-- Avoid: unquoted string substitution risks syntax errors and injection (though data-conductor sanitizes values)
WHERE status = {{STATUS_RAW}}
Performance Monitoring¶
Monitor query performance:
- Execution Time: Shown in results
- Row Count: Indicates data volume
- Database Load: Check with your DBA for long-running queries
Monitoring query performance
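For long-running queries, most databases can report their execution plan. In PostgreSQL, for example, you could run the statement below directly against the database (a general database technique, not a data-conductor feature) to see where time is spent:
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE order_date >= '2024-01-01'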
Common Patterns¶
Date Range Queries¶
SELECT *
FROM events
WHERE event_date >= '{{START_DATE}}'
    AND event_date < '{{END_DATE}}'
Incremental Processing¶
SELECT *
FROM transactions
WHERE last_modified > '{{LAST_PROCESSED_TIME}}'
ORDER BY last_modified
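A common companion to this pattern (a general approach, not a data-conductor-specific feature) is to capture the newest timestamp you just processed so it can be supplied as the next LAST_PROCESSED_TIME:
SELECT MAX(last_modified) AS new_last_processed_time
FROM transactions
WHERE last_modified > '{{LAST_PROCESSED_TIME}}'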
Conditional Logic¶
SELECT
    customer_id,
    CASE
        WHEN total_orders > {{VIP_THRESHOLD}} THEN 'VIP'
        WHEN total_orders > {{REGULAR_THRESHOLD}} THEN 'Regular'
        ELSE 'New'
    END as customer_tier
FROM customer_summary
Data Aggregation¶
SELECT
    region,
    DATE_TRUNC('{{PERIOD}}', order_date) as period,
    COUNT(*) as order_count,
    SUM(total_amount) as revenue
FROM orders
WHERE order_date >= '{{START_DATE}}'
GROUP BY region, DATE_TRUNC('{{PERIOD}}', order_date)
ORDER BY period DESC, revenue DESC
Troubleshooting¶
Common Issues¶
Variable Not Substituted
- Check that the variable name matches exactly (case-sensitive)
- Ensure the variable is defined in the Variables panel
- Verify a test value is provided
Connection Timeout
- The query may be too complex or slow
- Check database server performance
- Consider adding a LIMIT clause for testing
Permission Denied
- Verify the database user has the required permissions
- Check table/schema access rights
- Confirm the integration credentials are correct
Syntax Errors
- Use the Preview tab to see the final SQL
- Check for missing quotes around string variables
- Verify database-specific SQL syntax
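For the missing-quotes case in particular, the Preview tab makes the problem easy to spot. A string variable used without quotes substitutes into the SQL as a bare token (illustrative example):
-- Fails on most databases: the substituted value is treated as an identifier or invalid expression
WHERE customer_name = {{CUSTOMER_NAME}}
-- Works: the substituted value becomes a string literal
WHERE customer_name = '{{CUSTOMER_NAME}}'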
Getting Help¶
If you encounter issues:
- Check the Results tab for specific error messages
- Use Preview to verify variable substitution
- Test with simpler queries first
- Consult your database documentation
- Contact your administrator for permission issues
Next Steps¶
Now that you understand the Data Builder:
- Create your first pipeline with multiple data steps
- Set up deployments to run your data steps automatically
- Learn about pipeline variables for advanced use cases
- Explore API integrations to connect external systems
Ready to deploy your data steps? Continue to Deployment Manager!