# Best Practices

## Recommended setup

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th></tr></thead><tbody><tr><td><strong>Use glob patterns for scalability</strong></td><td>Instead of hardcoding file names, use patterns like `monthly/*.csv` or `exports/202[34]/*.parquet`. When new files arrive in those paths, they'll sync automatically on the next refresh without manual configuration changes.</td></tr><tr><td><strong>Organize files by date or version</strong></td><td>Store exports in folders like `data/2024/`, `data/2025/` or `reports/v1/`, `reports/v2/`. This makes glob patterns more predictable and lets you target specific batches without syncing outdated versions.</td></tr><tr><td><strong>Choose the right format for your workflow</strong></td><td>Use Parquet for large analytical datasets (better compression, native types). Use CSV for small datasets and Excel exports. Use JSONL for streaming event data or API responses.</td></tr></tbody></table>
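To see what a glob pattern would select before committing to it, you can dry-run the matching locally. This is a minimal sketch using Python's standard `fnmatch` module against hypothetical blob paths (the file names below are invented for illustration; actual matching is done by Coupler.io on your container):

```python
from fnmatch import fnmatch

# Hypothetical blob paths standing in for a container listing.
blobs = [
    "monthly/jan.csv",
    "monthly/feb.csv",
    "exports/2022/sales.parquet",
    "exports/2023/sales.parquet",
    "exports/2024/sales.parquet",
]

def match(pattern, paths):
    """Return the paths a glob pattern would select."""
    return [p for p in paths if fnmatch(p, pattern)]

print(match("monthly/*.csv", blobs))              # both monthly CSVs
print(match("exports/202[34]/*.parquet", blobs))  # 2023 and 2024 only, not 2022
```

Note how `202[34]` targets specific year folders: when 2025 exports start arriving, widening the character class is a one-character change rather than a new flow.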

## Data refresh and scheduling

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th></tr></thead><tbody><tr><td><strong>Use start date for incremental syncs</strong></td><td>Set a start date to sync only files modified since a checkpoint. This avoids redundant transfers on frequent refreshes: unchanged files are skipped, saving time and bandwidth.</td></tr><tr><td><strong>Schedule refreshes to match your export cadence</strong></td><td>If your system exports files daily at 2 AM, schedule Coupler.io refreshes for 3 AM. If exports happen weekly on Fridays, schedule weekly refreshes for Saturday morning.</td></tr></tbody></table>
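Conceptually, a start-date checkpoint is a filter on each file's last-modified timestamp. The sketch below models that with hypothetical file listings and an assumed inclusive cutoff (the exact boundary semantics are Coupler.io's, not shown here):

```python
from datetime import datetime, timezone

# Hypothetical (path, last_modified) pairs, as a storage listing might return.
files = [
    ("data/2025/day1.csv", datetime(2025, 1, 1, tzinfo=timezone.utc)),
    ("data/2025/day2.csv", datetime(2025, 1, 2, tzinfo=timezone.utc)),
    ("data/2025/day3.csv", datetime(2025, 1, 3, tzinfo=timezone.utc)),
]

# The checkpoint: only files modified on or after this date are synced.
start_date = datetime(2025, 1, 2, tzinfo=timezone.utc)

to_sync = [path for path, modified in files if modified >= start_date]
print(to_sync)  # day2 and day3 only
```

Advancing `start_date` after each successful refresh is what keeps repeat syncs cheap.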

## Performance optimization

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th></tr></thead><tbody><tr><td><strong>Prefer Parquet for large datasets</strong></td><td>Parquet is columnar, compressed, and preserves data types, so it syncs faster than CSV and reduces the storage footprint in your destination.</td></tr><tr><td><strong>Split multi-sheet Excel workbooks</strong></td><td>Create separate data flows for each sheet you need, or pre-export sheets as individual CSVs in Azure. This avoids confusion and makes transformations clearer.</td></tr></tbody></table>
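If you pre-export sheets as individual CSVs, combining them back into one table is straightforward as long as the headers line up. This is a minimal stdlib sketch of that append step (the sheet contents are invented; in practice Coupler.io's Append transformation does this for you):

```python
import csv
import io

# Two hypothetical per-sheet CSV exports sharing the same header.
jan = "region,revenue\nEMEA,100\nAPAC,200\n"
feb = "region,revenue\nEMEA,150\nAPAC,250\n"

def append_csvs(*sources):
    """Concatenate CSV texts row by row, keeping a single header."""
    out = io.StringIO()
    writer = None
    for src in sources:
        reader = csv.reader(io.StringIO(src))
        header = next(reader)          # consume each file's header
        if writer is None:
            writer = csv.writer(out, lineterminator="\n")
            writer.writerow(header)    # write the header exactly once
        for row in reader:
            writer.writerow(row)
    return out.getvalue()

combined = append_csvs(jan, feb)
print(combined)
```

The key detail is that every source must share the header schema; mismatched columns are exactly the kind of confusion that per-sheet flows avoid.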

## Common pitfalls

{% columns %}
{% column %}

#### Do

* Test your glob pattern with a subset of files first
* Verify file paths in the Azure Portal Storage Browser before entering them in Coupler.io
* Use line-delimited JSONL (one JSON object per line), not pretty-printed JSON
* Set a start date on refreshes to avoid re-syncing unchanged files
* Use Append transformations to combine multiple CSV or Parquet files into one table
{% endcolumn %}

{% column %}

#### Don't

* Mix multiple file formats in one glob pattern (e.g., don't glob `data/*.*` and expect both CSV and Parquet to work identically — create separate flows)
* Assume Excel sheets with multiple tabs sync all tabs — specify one sheet per flow
* Include the storage account name in the container or file path (container name is enough)
* Rely on file creation time for filtering — start date uses modification time
* Leave glob patterns overly broad (e.g., `**/*`) if you have thousands of unrelated files — be specific to avoid noise and timeouts
{% endcolumn %}
{% endcolumns %}
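The JSONL point above trips people up often, so here is the difference made concrete. Line-delimited output puts one complete JSON object on each line, which is what line-based parsers expect; pretty-printing spreads a single value across many lines and breaks that assumption (the event records below are invented for illustration):

```python
import json

events = [{"id": 1, "type": "click"}, {"id": 2, "type": "view"}]

# JSONL: one complete, self-contained JSON object per line.
jsonl = "\n".join(json.dumps(e) for e in events)

# Pretty-printed JSON: one value spread over many lines; a line-based
# parser cannot recover a record from any single line of this.
pretty = json.dumps(events, indent=2)

print(jsonl)
```

A quick sanity check before uploading: every line of your file should parse on its own with `json.loads`.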

{% hint style="danger" %}
**Regenerating a storage account key invalidates the old key immediately.** If you regenerate the key Coupler.io is using, update it in Coupler.io right away or your data flows will fail on the next refresh.
{% endhint %}
