Frequently Asked Questions

Find quick answers to common questions about data-conductor. If you don't find what you're looking for, check our Troubleshooting Guide or contact your administrator.

General Questions

What is data-conductor?

Q: What is data-conductor and what does it do?

A: data-conductor is a comprehensive data pipeline and workflow platform that enables you to:
- Build SQL-based data transformation pipelines
- Schedule automated executions using CRON
- Expose workflows as secure API endpoints
- Monitor and manage pipeline executions
- Implement IP-based security and access controls

Getting Started

Q: How do I get started with data-conductor?

A: Follow these steps:
1. Complete organization setup
2. Configure database integrations
3. Create your first data step
4. Set up your first deployment

Q: Do I need technical skills to use data-conductor?

A: Basic SQL knowledge is helpful for the Data Builder, but many features are accessible to non-technical users:
- Technical users: Can build complex SQL data steps and manage integrations
- Business users: Can manage deployments, monitor executions, and configure scheduling
- Administrators: Can manage organization settings, security, and user access

Data Builder

SQL and Variables

Q: What databases does data-conductor support?

A: data-conductor supports multiple database types:
- PostgreSQL
- MySQL
- Microsoft SQL Server
- SQLite
- BigQuery (Google Cloud)
- Snowflake
- Amazon Redshift

Q: How do pipeline variables work?

A: Pipeline variables use mustache syntax ({{VARIABLE_NAME}}) and are replaced at execution time:

SELECT * FROM orders
WHERE order_date >= '{{START_DATE}}'
  AND region = '{{REGION}}'
LIMIT {{ROW_LIMIT}}

Variables can be:
- Strings: Automatically quoted for SQL safety
- Numbers: Used directly without quotes
- Booleans: True/false values
- JSON: Complex data structures
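To make the substitution and typing rules concrete, here is a minimal Python sketch of mustache-style replacement with type-aware quoting; the function name and exact quoting behavior are illustrative assumptions, not data-conductor's actual implementation:

```python
import json
import re

def render_variables(sql: str, variables: dict) -> str:
    """Replace {{NAME}} placeholders with type-aware SQL literals (illustrative)."""
    def replace(match):
        value = variables[match.group(1)]
        if isinstance(value, bool):          # check bool before int/float
            return "TRUE" if value else "FALSE"
        if isinstance(value, (int, float)):  # numbers: inlined without quotes
            return str(value)
        if isinstance(value, (dict, list)):  # JSON: serialized, then quoted
            return "'" + json.dumps(value).replace("'", "''") + "'"
        return "'" + str(value).replace("'", "''") + "'"  # strings: quoted, quotes escaped
    return re.sub(r"\{\{(\w+)\}\}", replace, sql)

sql = "SELECT * FROM orders WHERE region = {{REGION}} LIMIT {{ROW_LIMIT}}"
print(render_variables(sql, {"REGION": "EMEA", "ROW_LIMIT": 100}))
# SELECT * FROM orders WHERE region = 'EMEA' LIMIT 100
```

Note that because strings are quoted automatically in this sketch, the template references the variable bare ({{REGION}}) rather than wrapping it in quotes.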

Q: Can I test my SQL before deploying?

A: Yes! Use these testing features:
- Execute button: Run queries immediately
- Preview tab: See SQL with variables replaced
- Results tab: View query output and performance metrics
- Variables panel: Test with different variable values

Data Sources

Q: How do I connect to my database?

A: Database connections are set up by administrators in the Integrations section:
1. Navigate to Integrations (admin panel)
2. Select your database type
3. Provide connection details (host, port, credentials)
4. Test the connection
5. Save the integration

Q: Can I connect to multiple databases?

A: Yes, you can configure multiple database integrations and select which one to use for each data step. This allows you to:
- Connect to different databases for different purposes
- Separate development and production databases
- Use specialized databases for different data types

Deployment Manager

Scheduling

Q: How do CRON schedules work?

A: CRON expressions define when deployments run using five fields:

* * * * *
│ │ │ │ │
│ │ │ │ └── Day of Week (0-6, Sunday = 0)
│ │ │ └──── Month (1-12)
│ │ └────── Day of Month (1-31)
│ └──────── Hour (0-23)
└────────── Minute (0-59)

Common examples:
- 0 9 * * * (daily at 9 AM)
- 0 9 * * MON (Mondays at 9 AM)
- */15 * * * * (every 15 minutes)
- 0 1 1 * * (monthly on the 1st at 1 AM)
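A minimal Python sketch of how a five-field CRON expression can be matched against a timestamp (illustrative only: it supports *, */step, and plain numbers, not ranges, lists, or names like MON):

```python
from datetime import datetime

def field_matches(spec: str, value: int) -> bool:
    if spec == "*":
        return True
    if spec.startswith("*/"):                # step values, e.g. */15
        return value % int(spec[2:]) == 0
    return int(spec) == value                # plain numbers only

def cron_matches(expr: str, when: datetime) -> bool:
    minute, hour, dom, month, dow = expr.split()
    return all(field_matches(spec, value) for spec, value in [
        (minute, when.minute),
        (hour, when.hour),
        (dom, when.day),
        (month, when.month),
        (dow, when.isoweekday() % 7),        # CRON convention: Sunday = 0
    ])

print(cron_matches("0 9 * * *", datetime(2024, 1, 15, 9, 0)))      # True
print(cron_matches("*/15 * * * *", datetime(2024, 1, 15, 9, 45)))  # True
```

A real scheduler evaluates the expression once per minute against the current time; this sketch shows only the field-matching logic.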

Q: What timezone are schedules in?

A: All schedules use your organization's timezone, configured in Organization Settings. Daylight saving time is handled automatically.

Q: What happens if a pipeline is still running when the next execution is scheduled?

A: This depends on your "Single Instance" setting:
- Enabled (recommended): New execution waits until the current one completes
- Disabled: Multiple executions can run simultaneously (may cause conflicts)
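The single-instance idea can be sketched with a simple lock. This illustrative Python version skips an overlapping trigger rather than queueing it, and the class and method names are invented for the example, not data-conductor's API:

```python
import threading

class SingleInstanceRunner:
    """Allow at most one pipeline run at a time (skips overlapping triggers)."""

    def __init__(self):
        self._lock = threading.Lock()

    def trigger(self, pipeline) -> bool:
        # Non-blocking acquire: if a run already holds the lock, skip this trigger.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            pipeline()
            return True
        finally:
            self._lock.release()

runner = SingleInstanceRunner()
print(runner.trigger(lambda: None))  # True -- no other run in progress
```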

Execution Types

Q: What's the difference between the three deployment types?

A: The three types serve different purposes:
- Execute Pipeline: Run immediately for testing or one-time tasks
- CRON Deployment: Schedule recurring executions
- API Deployment: Expose as an HTTP endpoint for external triggers

Q: Can I trigger a pipeline manually?

A: Yes, use "Execute Pipeline" in the Deployment Manager for immediate execution. This is useful for:
- Testing new pipelines
- Running ad-hoc analysis
- Debugging issues
- One-time data processing

Security and Access

Authentication

Q: How does authentication work?

A: data-conductor uses JWT (JSON Web Token) authentication with optional IP-based access control:
1. User login: Email/password authentication
2. Session tokens: Secure session management
3. API tokens: For programmatic access
4. IP filtering: Optional trusted IP address restrictions
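To illustrate the structure of a JWT, here is a generic HS256 sign/verify sketch using only the Python standard library; this is not data-conductor's actual token format or signing code:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"  # three dot-separated base64url segments

def verify_jwt(token: str, secret: bytes):
    header, body, sig = token.split(".")
    expected = b64url(hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch: reject the token
    return json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))

token = sign_jwt({"sub": "user@example.com"}, b"server-secret")
print(verify_jwt(token, b"server-secret"))  # {'sub': 'user@example.com'}
print(verify_jwt(token, b"wrong-secret"))   # None
```

The key property shown here: the server can verify a token without a database lookup, and any tampering with the payload invalidates the signature.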

Q: What are API tokens used for?

A: API tokens authenticate:
- API deployment endpoints
- External system integrations
- Programmatic access to data-conductor
- Webhook and automation systems

Generate tokens in Organization → API Tokens and store them securely.
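Calling an API deployment with a token might look like the following sketch; the URL, path, and token value are placeholders invented for illustration, not real data-conductor endpoints:

```python
import urllib.request

# Hypothetical endpoint and token for illustration only; real URLs come from
# your API deployment and real tokens from Organization → API Tokens.
API_URL = "https://data-conductor.example.com/api/deployments/order-sync/execute"
API_TOKEN = "dc_example_token"

request = urllib.request.Request(
    API_URL,
    method="POST",
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
    data=b'{"REGION": "EMEA"}',
)
# urllib.request.urlopen(request) would send it; omitted here so the
# sketch stays runnable without a live server.
```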

IP Security

Q: How does IP security work?

A: IP security restricts access to your data-conductor instance:
- Define trusted IP addresses or ranges
- Support for IPv4 and IPv6
- CIDR notation for ranges (e.g., 192.168.1.0/24)
- Automatic blocking of untrusted IPs
- Comprehensive audit logging

Q: What IP formats are supported?

A: All standard IP formats:
- Single IPv4: 192.168.1.100
- IPv4 CIDR: 192.168.1.0/24
- Single IPv6: 2001:db8::1
- IPv6 CIDR: 2001:db8::/32
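Checking a client address against a trusted list can be sketched with Python's standard ipaddress module; this is illustrative, not data-conductor's enforcement code:

```python
import ipaddress

# Example trusted list mixing the formats above.
TRUSTED = [
    ipaddress.ip_network("192.168.1.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

def is_trusted(client_ip: str) -> bool:
    addr = ipaddress.ip_address(client_ip)
    # Membership checks across IPv4/IPv6 simply return False, so mixed
    # lists are safe.
    return any(addr in net for net in TRUSTED)

print(is_trusted("192.168.1.42"))  # True
print(is_trusted("10.0.0.5"))      # False
print(is_trusted("2001:db8::1"))   # True
```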

Permissions

Q: What user roles are available?

A: Three main roles with different access levels:

Feature                | Admin | User | Viewer
---------------------- | ----- | ---- | ------
Create/edit data steps | ✓     | ✓    |
Create deployments     | ✓     | ✓    |
View executions        | ✓     | ✓    | ✓
Organization settings  | ✓     |      |
User management        | ✓     |      |
Security settings      | ✓     |      |

Q: Can I share data steps with other users?

A: Data steps are generally visible to all users in your organization, but editing permissions depend on your role. Administrators can manage sharing settings and access controls.

Performance and Limits

Query Performance

Q: How can I optimize slow queries?

A: Several optimization strategies:

Query Optimization:

-- Use LIMIT for testing
SELECT * FROM large_table LIMIT {{TEST_LIMIT}}

-- Index-friendly date filters
WHERE created_date >= '{{START_DATE}}'  -- Good
WHERE DATE(created_date) = '{{TARGET_DATE}}'  -- Avoid

-- Specific column selection
SELECT customer_id, total_amount  -- Good
SELECT *  -- Avoid for large tables

Database Optimization:
- Create indexes on frequently filtered columns
- Use appropriate data types
- Consider table partitioning for large datasets
- Monitor query execution plans

Q: Are there limits on query size or execution time?

A: Limits depend on your data-conductor instance configuration:
- Query timeout: Typically 5-30 minutes (configurable)
- Result size: Usually limited to prevent memory issues
- Concurrent executions: Limited to manage system resources
- API request size: Typically 10MB for JSON payloads

Contact your administrator for specific limits in your environment.

Data Volume

Q: Can data-conductor handle large datasets?

A: Yes, with proper optimization:
- Use LIMIT clauses during development
- Implement pagination for large result sets
- Consider incremental processing for large data
- Use database-specific optimizations (partitioning, indexing)
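The pagination point above can be sketched with keyset pagination, shown here against an in-memory SQLite table; the table and column names are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 [(f"event-{i}",) for i in range(10)])

def fetch_pages(conn, page_size):
    """Keyset pagination: filter on the last seen id instead of OFFSET,
    which stays efficient as the table grows."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size)).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]

pages = list(fetch_pages(conn, 4))
print([len(page) for page in pages])  # [4, 4, 2]
```

OFFSET-based pagination works too, but the database still scans and discards all skipped rows, so keyset pagination is usually preferred for large tables.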

Q: How do I handle incremental data processing?

A: Common patterns for incremental processing:

-- Process only new/updated records
SELECT * FROM orders
WHERE last_modified > (
  SELECT MAX(last_processed_timestamp)
  FROM processing_log
  WHERE pipeline_name = 'order_processing'
)

-- Date-based incremental processing
SELECT * FROM events
WHERE event_date >= '{{LAST_PROCESSED_DATE}}'
  AND event_date < '{{CURRENT_DATE}}'
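The watermark pattern from the first query above can be demonstrated end to end. This SQLite sketch reuses the table and column names from the SQL, while the Python wrapper and seed data are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, last_modified TEXT);
    CREATE TABLE processing_log (pipeline_name TEXT, last_processed_timestamp TEXT);
    INSERT INTO orders VALUES (1, '2024-01-01'), (2, '2024-01-10'), (3, '2024-01-20');
    INSERT INTO processing_log VALUES ('order_processing', '2024-01-05');
""")

def process_new_orders(conn):
    # Read the watermark, fetch only newer rows, then advance the watermark
    # so the next run skips everything already processed.
    rows = conn.execute("""
        SELECT id, last_modified FROM orders
        WHERE last_modified > (
            SELECT MAX(last_processed_timestamp) FROM processing_log
            WHERE pipeline_name = 'order_processing')
        ORDER BY last_modified
    """).fetchall()
    if rows:
        conn.execute("UPDATE processing_log SET last_processed_timestamp = ? "
                     "WHERE pipeline_name = ?", (rows[-1][1], "order_processing"))
    return rows

print(process_new_orders(conn))  # [(2, '2024-01-10'), (3, '2024-01-20')]
print(process_new_orders(conn))  # [] -- already caught up
```

Advancing the watermark only after a successful run is what makes the pipeline safe to re-run after a failure.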

Troubleshooting

Common Issues

Q: My pipeline failed. How do I debug it?

A: Follow this debugging process:
1. Check the Results tab for specific error messages
2. Use the Preview tab to verify variable substitution
3. Test with simpler queries to isolate the issue
4. Check database connectivity and permissions
5. Review audit logs for security-related issues

Q: Variables aren't being replaced in my SQL. Why?

A: Common variable issues:
- Case sensitivity: Variable names must match exactly
- Missing configuration: Ensure variables are defined in the Variables panel
- Syntax errors: Check for typos in {{VARIABLE_NAME}}
- Quote issues: Strings need quotes, numbers don't

Q: My CRON deployment isn't running. What should I check?

A: CRON troubleshooting checklist:
- ✅ Deployment is active
- ✅ CRON expression is valid (test at crontab.guru)
- ✅ Timezone is correct
- ✅ No single-instance conflicts
- ✅ System resources are available

Error Messages

Q: What does "Connection timeout" mean?

A: Connection timeouts indicate:
- Database server is unreachable
- Query is taking too long to execute
- Network connectivity issues
- Firewall blocking connections

Solutions:
- Verify database server status
- Check network connectivity
- Optimize query performance
- Contact your database administrator

Q: I'm getting "Permission denied" errors. How do I fix this?

A: Permission errors can be caused by:
- Database permissions: User lacks required database access
- Application permissions: User role doesn't allow the action
- IP restrictions: Your IP isn't in the trusted list
- Token issues: API token is invalid or expired

Best Practices

Development

Q: What are the best practices for building pipelines?

A: Follow these guidelines:

Development Process:
1. Start with simple queries and add complexity gradually
2. Test thoroughly with small datasets first
3. Use meaningful names and descriptions
4. Document your variable purposes
5. Implement proper error handling

Performance:
1. Use LIMIT clauses during development
2. Create appropriate database indexes
3. Monitor execution times and optimize
4. Consider incremental processing for large datasets

Security:
1. Use least-privilege database permissions
2. Implement IP restrictions where appropriate
3. Rotate API tokens regularly
4. Monitor audit logs for suspicious activity

Production

Q: How should I organize my deployments?

A: Deployment organization best practices:
- Naming: Use clear, descriptive names
- Scheduling: Stagger executions to avoid resource conflicts
- Environments: Separate development and production
- Monitoring: Set up alerts for failures
- Documentation: Maintain deployment documentation

Q: How often should I backup my configurations?

A: Backup recommendations:
- Weekly: Export deployment configurations
- Before major changes: Backup current state
- Quarterly: Full system backup review
- After incidents: Document lessons learned

Support and Resources

Getting Help

Q: Where can I get help if I'm stuck?

A: Help resources in order of preference:
1. This FAQ and Troubleshooting Guide
2. Organization administrator for internal support
3. Browser console logs for technical debugging
4. System documentation for detailed guidance

Q: How do I report a bug or request a feature?

A: Contact your organization administrator with:
- Detailed description of the issue or request
- Steps to reproduce (for bugs)
- Business justification (for features)
- Screenshots or logs (when applicable)
- Expected vs. actual behavior (for bugs)

Training and Resources

Q: Are there training resources available?

A: Available learning resources:
- Getting Started Guide: getting-started.md
- First Pipeline Tutorial: first-pipeline.md
- Feature Documentation: Detailed guides for each component
- Video Tutorials: Embedded throughout the documentation
- Organization Training: Contact your administrator

Q: How do I stay updated on new features?

A: Stay informed through:
- Release notes from your administrator
- Documentation updates
- Training sessions
- User community discussions


Still Need Help?

If you didn't find the answer you were looking for:

  1. Check the Troubleshooting Guide for detailed problem-solving steps
  2. Contact your organization administrator for internal support
  3. Review the Getting Started guide for setup issues
  4. Explore feature-specific documentation for detailed guidance

Remember to include specific error messages, steps to reproduce issues, and relevant screenshots when seeking help!