Best Practices
Recommended setup
Pin your Dataset ID to a specific run
Each Apify actor run produces a new dataset. Explicitly set the Dataset ID in your data flow to the run you want — don't rely on the "default" dataset if you need point-in-time accuracy.
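If you already know the run ID (shown in the Apify Console or returned when you start the run), you can look up that run's dataset ID before entering it in your data flow. Below is a minimal sketch against Apify's REST API; the token, run ID, and the use of the `requests` library are assumptions for illustration.

```python
import os
import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]   # personal API token from the Apify Console
RUN_ID = "YOUR_RUN_ID"                    # placeholder: the specific actor run to pin

# Each actor run stores its output in its own default dataset.
# GET /v2/actor-runs/{runId} returns the run object, including defaultDatasetId.
resp = requests.get(
    f"https://api.apify.com/v2/actor-runs/{RUN_ID}",
    params={"token": APIFY_TOKEN},
    timeout=30,
)
resp.raise_for_status()
run = resp.json()["data"]

# This is the Dataset ID to enter in the Coupler.io data flow.
print(run["defaultDatasetId"])
```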
Use Item collection website content crawlers for AI workflows
If you're feeding scraped content into ChatGPT, Claude, Gemini, or Perplexity, this entity gives you clean Markdown-formatted text that AI models process much better than raw HTML or mixed fields.
Append multiple runs into one table
Use the Append transformation to stack item collections from multiple actor runs — for example, weekly SERP scrapes or recurring price checks — into a single historical table in BigQuery or Google Sheets.
Data refresh and scheduling
Sync after actor runs, not on a fixed clock
Apify actors don't always run on a predictable schedule. Align your Coupler.io refresh schedule to run shortly after your actor is expected to complete, not on an arbitrary interval — otherwise you may pull stale data.
Check item count before syncing large datasets
Use the Datasets entity in a separate data flow to monitor `itemCount` and `modifiedAt`. This lets you verify the actor run completed before triggering a full item export.
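Outside of a data flow, the same check can be scripted against Apify's dataset metadata endpoint, which returns `itemCount` and `modifiedAt` without downloading the items themselves. A minimal sketch; the token and dataset ID are placeholders.

```python
import os
import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]
DATASET_ID = "YOUR_DATASET_ID"            # placeholder: dataset produced by the actor run

# GET /v2/datasets/{datasetId} returns metadata only, not the items.
resp = requests.get(
    f"https://api.apify.com/v2/datasets/{DATASET_ID}",
    params={"token": APIFY_TOKEN},
    timeout=30,
)
resp.raise_for_status()
meta = resp.json()["data"]

print(meta["itemCount"], meta["modifiedAt"])

# Only trigger the full item export once the count looks complete,
# e.g. itemCount > 0 and modifiedAt is recent.
```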
Performance optimization
Filter fields at the destination level
Apify item collections can include dozens of fields, depending on the actor. If you only need a few, use Coupler.io's column selection or a transformation step to drop unused fields before they reach your destination; this keeps sheets and tables clean.
Use BigQuery for large datasets
Datasets with tens of thousands of items or more are better suited for BigQuery than Google Sheets. Sheets has row limits and slows down with large payloads; BigQuery handles Apify's bulk output without issue.
Common pitfalls
Don't hardcode a Dataset ID and forget to update it. Every actor run creates a new dataset. If your data flow keeps pointing to the same old ID, you'll keep pulling the same old data even after the actor has run many times since.
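If you script the update, one option is to resolve the dataset ID from the actor's last successful run instead of hardcoding it. A sketch using Apify's last-run endpoint; the actor ID shown is a placeholder.

```python
import os
import requests

APIFY_TOKEN = os.environ["APIFY_TOKEN"]
ACTOR_ID = "username~actor-name"          # placeholder: actor ID in "user~name" form

# GET /v2/acts/{actorId}/runs/last returns the most recent run;
# status=SUCCEEDED restricts it to the last successful one.
resp = requests.get(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/last",
    params={"token": APIFY_TOKEN, "status": "SUCCEEDED"},
    timeout=30,
)
resp.raise_for_status()
last_run = resp.json()["data"]

# Paste this Dataset ID into the data flow to point it at the newest output.
print(last_run["defaultDatasetId"])
```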
Do
Update the Dataset ID in your data flow after each new actor run if you need the latest output
Use the Datasets entity to inspect actRunId and confirm which run produced the data
Test with a manual run after changing the Dataset ID before re-enabling your schedule
Don't
Assume the item schema is consistent across runs — actors can be updated and field names can change
Use the Item collection website content crawlers entity for non-crawler actors — it will return unexpected or missing fields
Run data flows for very large datasets at high frequency without checking Apify's API rate limits on your plan