# Best Practices

## Recommended setup

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th></tr></thead><tbody><tr><td><strong>Use glob patterns for scalability</strong></td><td>Instead of hardcoding file names, use patterns like `monthly/*.csv` or `exports/202[34]/*.parquet`. When new files arrive in those paths, they'll sync automatically on the next refresh without manual configuration changes.</td></tr><tr><td><strong>Organize files by date or version</strong></td><td>Store exports in folders like `data/2024/`, `data/2025/` or `reports/v1/`, `reports/v2/`. This makes glob patterns more predictable and lets you target specific batches without syncing outdated versions.</td></tr><tr><td><strong>Choose the right format for your workflow</strong></td><td>Use Parquet for large analytical datasets (better compression, native types). Use CSV for small datasets and Excel exports. Use JSONL for streaming event data or API responses.</td></tr></tbody></table>
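To see what a glob pattern would select before committing to it, you can dry-run the matching locally. This is a minimal sketch using Python's standard `fnmatch` module against hypothetical blob paths (the file names below are invented for illustration; actual matching is done by Coupler.io on your container):

```python
from fnmatch import fnmatch

# Hypothetical blob paths standing in for a container listing.
blobs = [
    "monthly/jan.csv",
    "monthly/feb.csv",
    "exports/2022/sales.parquet",
    "exports/2023/sales.parquet",
    "exports/2024/sales.parquet",
]

def match(pattern, paths):
    """Return the paths a glob pattern would select."""
    return [p for p in paths if fnmatch(p, pattern)]

print(match("monthly/*.csv", blobs))              # both monthly CSVs
print(match("exports/202[34]/*.parquet", blobs))  # 2023 and 2024 only, not 2022
```

Note how `202[34]` targets specific year folders: when 2025 exports start arriving, widening the character class is a one-character change rather than a new flow.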

## Data refresh and scheduling

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th></tr></thead><tbody><tr><td><strong>Use start date for incremental syncs</strong></td><td>Set a start date to sync only files modified since a checkpoint. This avoids redundant transfers on frequent refreshes: unchanged files are skipped, saving time and bandwidth.</td></tr><tr><td><strong>Schedule refreshes to match your export cadence</strong></td><td>If your system exports files daily at 2 AM, schedule Coupler.io refreshes for 3 AM. If exports happen weekly on Fridays, schedule weekly refreshes for Saturday morning.</td></tr></tbody></table>
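Conceptually, a start-date checkpoint is a filter on each file's last-modified timestamp. The sketch below models that with hypothetical file listings and an assumed inclusive cutoff (the exact boundary semantics are Coupler.io's, not shown here):

```python
from datetime import datetime, timezone

# Hypothetical (path, last_modified) pairs, as a storage listing might return.
files = [
    ("data/2025/day1.csv", datetime(2025, 1, 1, tzinfo=timezone.utc)),
    ("data/2025/day2.csv", datetime(2025, 1, 2, tzinfo=timezone.utc)),
    ("data/2025/day3.csv", datetime(2025, 1, 3, tzinfo=timezone.utc)),
]

# The checkpoint: only files modified on or after this date are synced.
start_date = datetime(2025, 1, 2, tzinfo=timezone.utc)

to_sync = [path for path, modified in files if modified >= start_date]
print(to_sync)  # day2 and day3 only
```

Advancing `start_date` after each successful refresh is what keeps repeat syncs cheap.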

## Performance optimization

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th></tr></thead><tbody><tr><td><strong>Prefer Parquet for large datasets</strong></td><td>Parquet is columnar, compressed, and preserves data types, so it syncs faster than CSV and reduces the storage footprint in your destination.</td></tr><tr><td><strong>Split multi-sheet Excel workbooks</strong></td><td>Create separate data flows for each sheet you need, or pre-export sheets as individual CSVs in Azure. This avoids confusion and makes transformations clearer.</td></tr></tbody></table>
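If you pre-export sheets as individual CSVs, combining them back into one table is straightforward as long as the headers line up. This is a minimal stdlib sketch of that append step (the sheet contents are invented; in practice Coupler.io's Append transformation does this for you):

```python
import csv
import io

# Two hypothetical per-sheet CSV exports sharing the same header.
jan = "region,revenue\nEMEA,100\nAPAC,200\n"
feb = "region,revenue\nEMEA,150\nAPAC,250\n"

def append_csvs(*sources):
    """Concatenate CSV texts row by row, keeping a single header."""
    out = io.StringIO()
    writer = None
    for src in sources:
        reader = csv.reader(io.StringIO(src))
        header = next(reader)          # consume each file's header
        if writer is None:
            writer = csv.writer(out, lineterminator="\n")
            writer.writerow(header)    # write the header exactly once
        for row in reader:
            writer.writerow(row)
    return out.getvalue()

combined = append_csvs(jan, feb)
print(combined)
```

The key detail is that every source must share the header schema; mismatched columns are exactly the kind of confusion that per-sheet flows avoid.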

## Common pitfalls

{% columns %}
{% column %}

#### Do

* Test your glob pattern with a subset of files first
* Verify file paths in the Azure Portal Storage Browser before entering them in Coupler.io
* Use line-delimited JSONL (one JSON object per line), not pretty-printed JSON
* Set a start date on refreshes to avoid re-syncing unchanged files
* Use Append transformations to combine multiple CSV or Parquet files into one table
{% endcolumn %}

{% column %}

#### Don't

* Mix multiple file formats in one glob pattern (e.g., don't glob `data/*.*` and expect both CSV and Parquet to work identically — create separate flows)
* Assume Excel sheets with multiple tabs sync all tabs — specify one sheet per flow
* Include the storage account name in the container or file path (container name is enough)
* Rely on file creation time for filtering — start date uses modification time
* Leave glob patterns overly broad (e.g., `**/*`) if you have thousands of unrelated files — be specific to avoid noise and timeouts
{% endcolumn %}
{% endcolumns %}
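The JSONL point above trips people up often, so here is the difference made concrete. Line-delimited output puts one complete JSON object on each line, which is what line-based parsers expect; pretty-printing spreads a single value across many lines and breaks that assumption (the event records below are invented for illustration):

```python
import json

events = [{"id": 1, "type": "click"}, {"id": 2, "type": "view"}]

# JSONL: one complete, self-contained JSON object per line.
jsonl = "\n".join(json.dumps(e) for e in events)

# Pretty-printed JSON: one value spread over many lines; a line-based
# parser cannot recover a record from any single line of this.
pretty = json.dumps(events, indent=2)

print(jsonl)
```

A quick sanity check before uploading: every line of your file should parse on its own with `json.loads`.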

{% hint style="danger" %}
**Regenerating a storage account key invalidates the old key immediately.** If you regenerate the key Coupler.io is using, update it in Coupler.io right away or your data flows will fail on the next refresh.
{% endhint %}
