# Apify Dataset

Apify is a web scraping and automation platform that lets you extract data from websites using pre-built or custom actors (scrapers). When an actor runs, it stores its output in a **dataset** — a structured collection of items you can export and analyze. Connecting Apify datasets to Coupler.io lets you pull that scraped data into your preferred destination automatically.

## Why connect Apify Dataset to Coupler.io?

* **Centralize scraped data** — move web-extracted data into Google Sheets, BigQuery, Excel, or Looker Studio without manual exports
* **Combine with other sources** — use Append or Join transformations to merge datasets from multiple actors or runs
* **Send to AI tools** — pipe scraped content into ChatGPT, Claude, Gemini, or Perplexity for analysis, summarization, or classification
* **Keep data fresh** — schedule recurring syncs so your destination always reflects the latest actor output

## Prerequisites

* An active Apify account with at least one completed actor run that has produced dataset output
* Your Apify API token (found in Apify Console under **Settings > Integrations > API token**)
* The Dataset ID of the dataset you want to export (found in **Storage > Datasets** in Apify Console)
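
Before setting up the data flow, it can help to confirm that the token and Dataset ID belong together. The sketch below only builds a request against the public Apify API (`GET /v2/datasets/{datasetId}`) with the token in an `Authorization: Bearer` header; it is not how Coupler.io itself connects. The helper name `build_dataset_request` and the placeholder values are illustrative assumptions.

```python
import urllib.request
from urllib.parse import quote

# Base URL of the public Apify API (v2).
APIFY_API_BASE = "https://api.apify.com/v2"


def build_dataset_request(api_token: str, dataset_id: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a dataset's metadata.

    Hypothetical helper for a manual pre-flight check; not part of Coupler.io.
    """
    url = f"{APIFY_API_BASE}/datasets/{quote(dataset_id, safe='')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_token}"},
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own token and Dataset ID.
    req = build_dataset_request("YOUR_API_TOKEN", "YOUR_DATASET_ID")
    print(req.full_url)
    # To send it, pass `req` to urllib.request.urlopen(); a successful
    # response suggests the token can read that dataset.
```

If the request fails with an authorization or not-found error when sent, re-check the token and the Dataset ID before entering them in Coupler.io.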

## Quick start

{% hint style="success" %}
To export items produced by a website content crawler actor, choose the **Item collection website content crawlers** entity. It maps the crawler's output fields, such as URL, title, and page content, automatically.
{% endhint %}

## How to connect

{% stepper %}
{% step %}
**Create a new data flow in Coupler.io.** From your Coupler.io dashboard, click **Add data flow** and search for **Apify Dataset** as your source.
{% endstep %}

{% step %}
**Enter your Apify API token.** In the source settings, paste your API token from Apify Console (**Settings > Integrations > API token**). Coupler.io uses this token to authenticate its requests to Apify on your behalf.
{% endstep %}

{% step %}
**Enter your Dataset ID.** In the **Dataset ID** field, paste the ID of the dataset you want to export. You can find it in Apify Console under **Storage > Datasets** — click on the dataset and copy the ID from the URL or the detail panel.
{% endstep %}

{% step %}
**Choose an entity.** Select the entity that matches what you want to export. The **Entities overview** table below describes what each one returns.
{% endstep %}

{% step %}
**Choose a destination.** Pick where you want your data to land — Google Sheets, Excel, BigQuery, Looker Studio, or an AI destination like ChatGPT, Claude, Gemini, Cursor, Perplexity, or OpenClaw.
{% endstep %}

{% step %}
**Run the data flow.** Click **Run** to execute a manual sync and confirm data is flowing correctly before setting up a schedule.
{% endstep %}
{% endstepper %}
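
The Dataset ID step mentions copying the ID from the dataset's URL. As a minimal sketch, the function below pulls out the path segment that follows `datasets`. It assumes the Console URL has the shape `https://console.apify.com/storage/datasets/<id>`; that shape, the helper name `dataset_id_from_url`, and the sample ID are assumptions for illustration, so verify against the URL you actually see.

```python
from urllib.parse import urlparse


def dataset_id_from_url(url: str) -> str:
    """Return the path segment that follows 'datasets' in a URL.

    Hypothetical helper; assumes the ID directly follows a 'datasets' segment.
    """
    # Split the path into non-empty segments, ignoring trailing slashes.
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        idx = segments.index("datasets")
        return segments[idx + 1]
    except (ValueError, IndexError):
        raise ValueError(f"No dataset ID found in URL: {url!r}") from None


if __name__ == "__main__":
    # Made-up ID, used only to show the call.
    print(dataset_id_from_url("https://console.apify.com/storage/datasets/abc123XYZ"))
```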

## Entities overview

| Entity                                       | What it returns                                                                                      |
| -------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| **Dataset collections**                      | A list of all datasets stored in your Apify account                                                  |
| **Datasets**                                 | Metadata and details for a specific dataset                                                          |
| **Item collection**                          | The raw items (rows) stored in a specific dataset                                                    |
| **Item collection website content crawlers** | Items from a website content crawler actor, with pre-mapped fields like URL, title, and page content |
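
To make the table above more concrete, the sketch below pairs each entity with the Apify API path that returns comparable data (`/v2/datasets`, `/v2/datasets/{datasetId}`, and `/v2/datasets/{datasetId}/items`). The pairing is an assumption based on the matching names in the Apify API reference, not a statement of how Coupler.io is implemented, and `ENTITY_PATHS` and `entity_url` are hypothetical names.

```python
from urllib.parse import urlencode

# Base URL of the public Apify API (v2).
APIFY_API_BASE = "https://api.apify.com/v2"

# Assumed mapping from Coupler.io entity names to Apify API paths.
ENTITY_PATHS = {
    "Dataset collections": "/datasets",
    "Datasets": "/datasets/{dataset_id}",
    "Item collection": "/datasets/{dataset_id}/items",
    # Assumed to read the same items endpoint, with extra field mapping
    # applied on the Coupler.io side.
    "Item collection website content crawlers": "/datasets/{dataset_id}/items",
}


def entity_url(entity: str, dataset_id: str = "", **params) -> str:
    """Build the Apify API URL that corresponds to an entity name."""
    path = ENTITY_PATHS[entity].format(dataset_id=dataset_id)
    query = f"?{urlencode(params)}" if params else ""
    return f"{APIFY_API_BASE}{path}{query}"


if __name__ == "__main__":
    # Preview URL for the first five items of a dataset (made-up ID).
    print(entity_url("Item collection", "abc123", format="json", limit=5))
```

Opening the items URL with a small `limit` (and your token in the request) is a quick way to preview the rows an **Item collection** export is likely to contain.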

