# Best Practices

## Recommended setup

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th></tr></thead><tbody><tr><td><strong>Pin your Dataset ID to a specific run</strong></td><td>Each Apify actor run produces a new dataset. Explicitly set the Dataset ID in your data flow to the dataset produced by the run you want — don't rely on the "default" dataset if you need point-in-time accuracy.</td></tr><tr><td><strong>Use Item collection website content crawlers for AI workflows</strong></td><td>If you're feeding scraped content into ChatGPT, Claude, Gemini, or Perplexity, this entity gives you clean Markdown-formatted text, which AI models process far more reliably than raw HTML or mixed fields.</td></tr><tr><td><strong>Append multiple runs into one table</strong></td><td>Use the Append transformation to stack item collections from multiple actor runs — for example, weekly SERP scrapes or recurring price checks — into a single historical table in BigQuery or Google Sheets.</td></tr></tbody></table>
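Before pinning an ID, it can help to peek at what that dataset actually holds. Here is a minimal sketch that pulls a small sample straight from Apify's dataset items endpoint; `YOUR_DATASET_ID` is a placeholder, and the token is assumed to live in an `APIFY_TOKEN` environment variable:

```python
import os

import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]  # your Apify API token
DATASET_ID = "YOUR_DATASET_ID"           # placeholder: the specific run's dataset ID

# Pull a small sample from this exact dataset (not the actor's "default"
# dataset) to confirm it holds the run you intend to pin.
resp = requests.get(
    f"https://api.apify.com/v2/datasets/{DATASET_ID}/items",
    params={"format": "json", "clean": "true", "limit": 5},
    headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
for item in resp.json():
    print(sorted(item.keys()))  # eyeball the fields before mapping them in a data flow
```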

## Data refresh and scheduling

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th></tr></thead><tbody><tr><td><strong>Sync after actor runs, not on a fixed clock</strong></td><td>Apify actors don't always run on a predictable schedule. Align your Coupler.io refresh schedule to run shortly after your actor is expected to complete, not on an arbitrary interval — otherwise you may pull stale data.</td></tr><tr><td><strong>Check item count before syncing large datasets</strong></td><td>Use the Datasets entity in a separate data flow to monitor `itemCount` and `modifiedAt`. This lets you verify the actor run completed before triggering a full item export.</td></tr></tbody></table>
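If you'd rather script that check than eyeball it in a second data flow, the same metadata is one GET request away on Apify's dataset endpoint. A rough sketch, assuming a placeholder `YOUR_DATASET_ID` and an item-count threshold you'd pick for your own actor:

```python
import os

import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]
DATASET_ID = "YOUR_DATASET_ID"   # placeholder dataset ID
EXPECTED_MIN_ITEMS = 100         # assumption: whatever "complete" means for your actor

resp = requests.get(
    f"https://api.apify.com/v2/datasets/{DATASET_ID}",
    headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
meta = resp.json()["data"]

# itemCount and modifiedAt are fields of Apify's Dataset object, the same
# values the Datasets entity surfaces in Coupler.io.
print(f"items: {meta['itemCount']}, last modified: {meta['modifiedAt']}")
if meta["itemCount"] >= EXPECTED_MIN_ITEMS:
    print("Run looks complete; safe to trigger the full item export.")
else:
    print("Item count is low; the actor run may still be in progress.")
```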

## Performance optimization

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th></tr></thead><tbody><tr><td><strong>Filter fields at the destination level</strong></td><td>Apify item collections can have dozens of fields depending on the actor. If you only need a few, use Coupler.io's column selection or transformation step to drop unused fields before they land in your destination, which keeps sheets and tables clean.</td></tr><tr><td><strong>Use BigQuery for large datasets</strong></td><td>Datasets with tens of thousands of items or more are better suited for BigQuery than Google Sheets. Sheets caps each spreadsheet at 10 million cells and slows down well before that with large payloads; BigQuery handles Apify's bulk output without issue.</td></tr></tbody></table>
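The same trimming is also possible at the source if you ever pull from the Apify API directly: the dataset items endpoint accepts a `fields` parameter listing the top-level fields to return. A sketch with hypothetical field names `url` and `text`:

```python
import os

import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]
DATASET_ID = "YOUR_DATASET_ID"  # placeholder dataset ID

# The fields parameter asks Apify to return only the listed top-level
# fields, so the payload stays small even when the actor emits dozens
# of columns per item. "url" and "text" are hypothetical field names.
resp = requests.get(
    f"https://api.apify.com/v2/datasets/{DATASET_ID}/items",
    params={"format": "json", "fields": "url,text", "limit": 1000},
    headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
    timeout=60,
)
resp.raise_for_status()
items = resp.json()
print(f"fetched {len(items)} items")
```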

## Common pitfalls

{% hint style="danger" %}
**Don't hardcode a Dataset ID and forget to update it.** Every actor run creates a new dataset. If your data flow keeps pointing to the same old ID, you'll keep pulling the same old data even after the actor has run many times since.
{% endhint %}
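If you script around the Apify API, one way to sidestep the stale-ID trap is to resolve the latest successful run's `defaultDatasetId` just before updating your data flow. A sketch, assuming a placeholder actor ID:

```python
import os

import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]
ACTOR_ID = "username~actor-name"  # placeholder: your actor's ID or user~name slug

# "Get last run" returns the actor's most recent run; status=SUCCEEDED
# skips runs that failed or are still in progress.
resp = requests.get(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/last",
    params={"status": "SUCCEEDED"},
    headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
run = resp.json()["data"]

# The fresh Dataset ID to paste into your Coupler.io data flow.
print(run["defaultDatasetId"])
```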

{% columns %}
{% column %}
**Do**

* Update the Dataset ID in your data flow after each new actor run if you need the latest output
* Use the Datasets entity to inspect `actRunId` and confirm which run produced the data
* Test with a manual run after changing the Dataset ID before re-enabling your schedule
{% endcolumn %}

{% column %}
**Don't**

* Assume the item schema is consistent across runs — actors can be updated and field names can change (a quick way to check is sketched after this list)
* Use the Item collection website content crawlers entity for non-crawler actors — it will return unexpected or missing fields
* Run data flows for very large datasets at high frequency without checking Apify's API rate limits on your plan
{% endcolumn %}
{% endcolumns %}
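Here is the field-set check mentioned above: a rough sketch, with placeholder dataset IDs, that diffs which top-level fields appear in samples from two runs before you append them:

```python
import os

import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]


def field_set(dataset_id: str) -> set:
    """Union of top-level field names across a 50-item sample of a dataset."""
    resp = requests.get(
        f"https://api.apify.com/v2/datasets/{dataset_id}/items",
        params={"format": "json", "clean": "true", "limit": 50},
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return {key for item in resp.json() for key in item}


# Placeholder IDs for, say, last week's run and this week's run.
old_fields = field_set("LAST_WEEKS_DATASET_ID")
new_fields = field_set("THIS_WEEKS_DATASET_ID")

# Any drift here would break an Append transformation that expects
# identical columns across runs.
print("fields added:  ", sorted(new_fields - old_fields))
print("fields removed:", sorted(old_fields - new_fields))
```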
