Best Practices

Use Custom SQL for filters and aggregations

Rather than exporting a full table and filtering in your destination, write a Custom SQL query in Coupler.io. This reduces data transfer, keeps your destination lean, and makes your data flow self-documenting.
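For instance, instead of exporting a full orders table, a Custom SQL query can aggregate it down to one row per day before it ever leaves Redshift. This is a sketch assuming a hypothetical `orders` table with `order_date` and `total_usd` columns:

```sql
-- Aggregate daily revenue for the trailing 90 days
-- instead of exporting the full orders table.
SELECT
    order_date,
    COUNT(*)       AS order_count,
    SUM(total_usd) AS revenue_usd
FROM orders
WHERE order_date >= DATEADD(day, -90, CURRENT_DATE)
GROUP BY order_date
ORDER BY order_date;
```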

Test queries in Redshift first

Before pasting a Custom SQL query into Coupler.io, run it directly in your Redshift query editor (e.g., Redshift Query Editor v2). This lets you verify the results and execution time without waiting on a full data flow run.
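While testing, you can also prefix the query with `EXPLAIN` to inspect the execution plan without actually running it. A minimal sketch, assuming a hypothetical `orders` table:

```sql
-- Inspect the plan (scans, joins, redistribution) before
-- committing the query to a scheduled data flow.
EXPLAIN
SELECT customer_id, SUM(total_usd) AS lifetime_value
FROM orders
GROUP BY customer_id;
```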

Use lowercase schema and table names

Redshift folds unquoted identifiers to lowercase. To avoid case-sensitivity issues, use lowercase names in your Custom SQL, or quote mixed-case identifiers (e.g., `"MyTable"`).
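A quick illustration with a hypothetical table. Note that by default Redshift also downcases quoted identifiers unless the `enable_case_sensitive_identifier` configuration is turned on, so lowercase names are the safest choice:

```sql
-- Unquoted identifiers are folded to lowercase,
-- so these two statements resolve to the same table:
SELECT id FROM MyTable;
SELECT id FROM mytable;

-- Quoting preserves case only when
-- enable_case_sensitive_identifier is set to true:
SELECT id FROM "MyTable";
```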

Data refresh and scheduling

Schedule after your data load window

If your Redshift cluster processes data at 2 AM, schedule your Coupler.io refresh for 2:30 AM (your timezone) to ensure fresh data. Add a buffer for large ETL jobs.

Run a manual test first

Always execute a manual run of your data flow before scheduling. This catches connection issues, query errors, and unexpected data volumes before automation kicks in.

Performance optimization

Avoid SELECT *; name your columns

Explicitly list the columns you need: `SELECT id, name, email FROM customers`. This is faster, makes your query clearer, and reduces accidental data leaks if new columns are added to the table.

Filter early in the query

Use WHERE clauses to reduce rows before aggregating or joining: `SELECT ... FROM orders WHERE order_date >= DATEADD(day, -30, CURRENT_DATE)`. Filtering in Redshift is faster than filtering in your destination.

Use SORTKEY and DISTKEY for large tables

If you're querying the same large table repeatedly, ask your Redshift administrator to set SORTKEY and DISTKEY columns that match your filter and join patterns. This can dramatically speed up queries.
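As a sketch of what that request might look like, here is hypothetical DDL for an `orders` table that is usually joined on `customer_id` and filtered by `order_date` (column names and types are illustrative):

```sql
-- Distribute on the most common join key and sort on the
-- column most often used in WHERE filters.
CREATE TABLE orders (
    order_id    BIGINT,
    customer_id BIGINT,
    order_date  DATE,
    total_usd   NUMERIC(12, 2)
)
DISTKEY (customer_id)
SORTKEY (order_date);
```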

Common pitfalls

Do

  • Test your data flow with a manual run before scheduling

  • Write readable Custom SQL with comments for complex logic

  • Monitor the first few scheduled runs to confirm data arrives correctly

  • Keep destination sheets or tables organized with clear column headers

Don't

  • Export raw tables without filtering — plan your queries in advance

  • Use hardcoded dates in WHERE clauses; instead, use CURRENT_DATE or DATEADD() for dynamic ranges

  • Run multiple large data flows simultaneously on a single Redshift cluster during peak hours

  • Store sensitive credentials (passwords, API keys) in your Custom SQL queries — use Coupler.io's connection settings
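To illustrate the hardcoded-dates point above, a dynamic WHERE clause keeps a scheduled data flow current without manual edits (hypothetical `orders` table):

```sql
-- Hardcoded (goes stale as time passes):
-- WHERE order_date >= '2024-01-01'

-- Dynamic (always the trailing 30 days):
SELECT order_id, order_date, total_usd
FROM orders
WHERE order_date >= DATEADD(day, -30, CURRENT_DATE);
```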
