# Data Overview

Azure Blob Storage doesn't have predefined entities the way a CRM or an API does. Instead, you sync **files** from your container. The data structure depends entirely on the format and content of your blobs.

## File formats and structure

#### Structured formats

| Format      | Row/record structure                         | Data types                                    | Use case                                      |
| ----------- | -------------------------------------------- | --------------------------------------------- | --------------------------------------------- |
| **CSV**     | Comma-separated columns, first row = headers | Text, numbers (as strings unless converted)   | Sales transactions, customer lists, inventory |
| **JSONL**   | One JSON object per line                     | Nested objects, arrays, mixed types preserved | API logs, event streams, user activity        |
| **Parquet** | Binary columnar format                       | Native types (int, float, string, date, etc.) | Analytics data, large datasets, BI tools      |
| **Excel**   | Rows and columns in sheets                   | Mixed types per column                        | Budgets, forecasts, reports                   |
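
To make the row-based formats concrete, here is a minimal sketch that writes the same two records as CSV and as JSONL (the file names and field values are hypothetical). It also shows why the table above notes that CSV values land as strings unless converted, while JSONL preserves types:

```python
# Minimal sketch: the same records written as CSV vs. JSONL.
# File names and field values are hypothetical examples.
import csv
import json

records = [
    {"order_id": 1001, "customer": "Acme", "total": 249.90},
    {"order_id": 1002, "customer": "Globex", "total": 99.00},
]

# CSV: first row = headers; numbers come back as strings on read.
with open("orders.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["order_id", "customer", "total"])
    writer.writeheader()
    writer.writerows(records)

# JSONL: one complete JSON object per line; int/float types survive.
with open("orders.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```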

#### Unstructured format

| Format           | Content                        | Example use                              |
| ---------------- | ------------------------------ | ---------------------------------------- |
| **Unstructured** | Raw text, logs, markdown, HTML | Log files, error messages, documentation |

## Common import patterns

**Single file sync** — Pull one CSV or Excel file into Google Sheets for analysis.

**Multi-file append** — Use glob patterns like `monthly/*.csv` to match all monthly reports and append them into a single BigQuery table. This builds a historical archive without a separate data flow for each file.

**Nested folder sync** — Glob patterns like `data/**/*.parquet` sync Parquet files from every subfolder of the `data` directory, which is useful for tiered storage structures.
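
To see how these two patterns differ, here is a small sketch using Python's `glob` module, which follows the same `*` / `**` conventions (this illustrates the matching semantics, not Coupler.io's internal matcher; the file tree is hypothetical):

```python
# Sketch: which hypothetical blob paths each glob pattern matches.
import glob
import pathlib
import tempfile

# Build a tiny directory tree to stand in for a container.
root = pathlib.Path(tempfile.mkdtemp())
for p in ["report.csv", "monthly/2024-01.csv", "monthly/2024-02.csv",
          "data/2024/sales.parquet", "data/archive/2023/sales.parquet"]:
    f = root / p
    f.parent.mkdir(parents=True, exist_ok=True)
    f.touch()

def matches(pattern):
    return sorted(str(pathlib.Path(m).relative_to(root))
                  for m in glob.glob(str(root / pattern), recursive=True))

print(matches("*.csv"))              # ['report.csv'] -- root only
print(matches("monthly/*.csv"))      # both monthly files
print(matches("data/**/*.parquet"))  # Parquet files at any depth under data/
```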

**Format conversion** — Export data from a legacy system as Excel, sync it through Coupler.io, and land it in BigQuery as a properly typed table.

## Use cases by role

{% tabs %}
{% tab title="Data analysts" %}
Sync Parquet or CSV files from data lakes into BigQuery or Snowflake. Use glob patterns to match versioned exports (e.g., `exports/v2/*.parquet`) and Append transformations to build historical tables for trend analysis and modeling.
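
As a rough sketch of what the append pattern produces, here is the equivalent done locally with pandas (paths are hypothetical; Coupler.io's Append transformation does this for you on each sync):

```python
# Minimal sketch: stack versioned Parquet exports into one
# historical table. Reading Parquet also requires pyarrow.
import glob

import pandas as pd

paths = sorted(glob.glob("exports/v2/*.parquet"))
history = pd.concat((pd.read_parquet(p) for p in paths), ignore_index=True)
```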
{% endtab %}

{% tab title="Finance teams" %}
Pull monthly export files from accounting systems (exported as CSV or Excel) into Google Sheets or a financial data warehouse. Schedule weekly syncs to stay current with bank statements, invoices, and budget actuals.
{% endtab %}

{% tab title="Data engineers" %}
Ingest raw event logs (JSONL format) from your application into a data lake. Use start dates to sync only recent logs, reducing storage and processing overhead.
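
For a sense of what a modification-time filter does, here is a sketch using the `azure-storage-blob` SDK (the connection string, container name, and prefix are placeholders; Coupler.io applies this filter for you):

```python
# Sketch: list only blobs modified after a start date.
from datetime import datetime, timezone

from azure.storage.blob import BlobServiceClient

start_date = datetime(2024, 1, 1, tzinfo=timezone.utc)
service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("app-logs")

recent = [
    blob.name
    for blob in container.list_blobs(name_starts_with="events/")
    if blob.last_modified >= start_date
]
print(recent)
```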
{% endtab %}

{% tab title="Operations" %}
Sync inventory or capacity reports (Excel or CSV) from Azure into Looker Studio for real-time dashboarding. Combine with other data sources using Join transformations to correlate stock levels with sales trends.
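
A minimal sketch of the join step, done locally with pandas (file and column names are hypothetical; Coupler.io's Join transformation performs the equivalent during a sync):

```python
# Sketch: join inventory with sales on a shared key.
import pandas as pd

inventory = pd.read_csv("inventory.csv")  # e.g. columns: sku, stock_level
sales = pd.read_csv("sales.csv")          # e.g. columns: sku, units_sold
combined = inventory.merge(sales, on="sku", how="left")
```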
{% endtab %}
{% endtabs %}

## Platform-specific notes

* **Glob patterns are powerful** — `*.csv` matches files in the container root; `**/*.csv` matches files in nested folders; `data/2024/*.parquet` targets a specific year
* **Start date filters by modification time** — Files modified after the start date are synced; unchanged files are skipped on refresh
* **Excel syncs one sheet** — If your workbook has multiple sheets, create separate data flows for each or export individual sheets as CSVs
* **JSONL requires line-delimited JSON** — Each line must be a complete, valid JSON object, not pretty-printed across multiple lines; see the sketch after this list
* **Large files and performance** — Parquet files compress better than CSV; for very large datasets, consider splitting into multiple containers or using date-based folder structures
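
Here is the JSONL check referenced above: a minimal sketch that verifies each line of a file parses as a complete JSON object (the file name `events.jsonl` is a placeholder):

```python
# Minimal sketch: validate a file as JSONL (one JSON object per line).
import json

def check_jsonl(path):
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, though it's best to avoid them
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"line {lineno} is not valid JSON: {e}")
            if not isinstance(record, dict):
                raise ValueError(f"line {lineno} is not a JSON object")

check_jsonl("events.jsonl")
```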
