Data Overview

Apify datasets store the output of actor runs as structured collections of items, which are essentially rows of scraped or processed data. The exact fields in each item depend on the actor that produced it. Coupler.io supports four entities that let you access both dataset metadata and the raw item content.

Entities and what they return

| Entity | Description |
| --- | --- |
| Dataset collections | Returns a list of all datasets in your Apify account, including IDs, names, and creation dates |
| Datasets | Returns metadata for a specific dataset: item count, size, creation date, and last modified time |
| Item collections | Returns the full list of items (rows) stored in a specific dataset; fields vary by actor |
| Item collection website content crawlers | Returns items from website content crawlers with standardized fields like URL, page title, and crawled text |
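For orientation, the sketch below maps each entity to the public Apify API endpoint that most plausibly backs it. The endpoint paths are part of Apify's API v2, but the mapping itself is an assumption: this page does not state which calls Coupler.io makes internally, and the entity keys and the `entity_url` helper are hypothetical names used only for illustration.

```python
from typing import Optional
from urllib.parse import quote

APIFY_API_BASE = "https://api.apify.com/v2"


def entity_url(entity: str, dataset_id: Optional[str] = None) -> str:
    """Return the Apify API URL assumed to correspond to a Coupler.io entity."""
    if entity == "dataset_collections":
        # Lists every dataset in the account.
        return f"{APIFY_API_BASE}/datasets"

    if dataset_id is None:
        raise ValueError(f"{entity!r} needs a dataset ID")
    safe_id = quote(dataset_id, safe="")

    if entity == "datasets":
        # Metadata for one dataset.
        return f"{APIFY_API_BASE}/datasets/{safe_id}"
    if entity in ("item_collections", "item_collection_website_content_crawlers"):
        # Both item entities read the same items endpoint (assumption).
        return f"{APIFY_API_BASE}/datasets/{safe_id}/items"

    raise ValueError(f"Unknown entity: {entity!r}")
```

The two item entities differ in how the returned rows are interpreted, not in where they are read from, which is why they share a URL in this sketch.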

Available fields

Dataset collections fields

| Field | Description |
| --- | --- |
| id | Unique identifier for the dataset |
| name | Dataset name (if set) |
| createdAt | Timestamp when the dataset was created |
| modifiedAt | Timestamp of the last modification |
| accessedAt | Timestamp of the last access |
| itemCount | Number of items stored in the dataset |
| cleanItemCount | Number of items excluding empty or duplicate records |

Dataset metadata fields

| Field | Description |
| --- | --- |
| id | Dataset ID |
| name | Dataset name |
| userId | ID of the Apify user who owns the dataset |
| createdAt | Creation timestamp |
| modifiedAt | Last modified timestamp |
| itemCount | Total item count |
| cleanItemCount | Clean item count |
| actId | ID of the actor that created this dataset |
| actRunId | ID of the specific actor run that produced the data |
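As a rough illustration of how these metadata fields can be used for monitoring, the sketch below derives two simple quality figures from one metadata row. The `dataset_quality` helper and the sample values are made up for the example; only the field names come from the table above.

```python
def dataset_quality(dataset: dict) -> dict:
    """Summarize how many items in a dataset count as clean."""
    item_count = dataset.get("itemCount", 0)
    clean_count = dataset.get("cleanItemCount", 0)
    return {
        "id": dataset.get("id"),
        # Items that are stored but excluded from the clean count.
        "excluded_items": item_count - clean_count,
        # Share of clean items; 0.0 for an empty dataset to avoid dividing by zero.
        "clean_ratio": clean_count / item_count if item_count else 0.0,
    }


# Hypothetical metadata row, shaped like the Datasets entity.
sample = {"id": "example-dataset", "itemCount": 200, "cleanItemCount": 150}
summary = dataset_quality(sample)
```

Tracking `clean_ratio` alongside `modifiedAt` across runs gives an early signal when an actor starts producing more unusable records.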

Item collection fields

Fields vary depending on the actor that produced the dataset. Common fields include:

| Field | Description |
| --- | --- |
| url | The URL that was scraped |
| title | Page or record title |
| description | Short description or meta description |
| text | Extracted text content |
| price | Price (e-commerce actors) |
| imageUrl | Image URL |
| Custom fields | Any additional fields output by the actor |
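Because the schema depends on the actor, it can help to check which fields actually appear in a given item collection before building a report on it. The following is a minimal sketch of that check; the `field_coverage` helper and the sample items are hypothetical and not part of Coupler.io or Apify.

```python
from collections import Counter
from typing import Dict, Iterable


def field_coverage(items: Iterable[dict]) -> Dict[str, int]:
    """Count how many items contain each top-level field."""
    counts: Counter = Counter()
    for item in items:
        counts.update(item.keys())
    return dict(counts)


# Made-up items from two different kinds of actors.
items = [
    {"url": "https://example.com/p/1", "title": "Widget", "price": 9.99},
    {"url": "https://example.com/news/1", "title": "Launch", "text": "..."},
]
coverage = field_coverage(items)
```

Fields with a count lower than the number of items are missing from some rows and will show up as empty cells in a destination table.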

Item collection website content crawler fields

| Field | Description |
| --- | --- |
| url | Crawled page URL |
| title | Page title |
| text | Full extracted text from the page |
| markdown | Page content in Markdown format |
| metadata | Additional page metadata |
| crawl | Crawl metadata (depth, referrer URL, etc.) |
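The `metadata` and `crawl` fields hold nested values rather than single cells. One common way to fit such items into a flat table is to expand nested keys into dotted column names, as sketched below. This is an illustrative approach, not a description of how Coupler.io flattens data, and the keys inside `crawl` in the sample (`depth`, `referrerUrl`) are assumed names based on the description above.

```python
def flatten(item: dict, parent: str = "", sep: str = ".") -> dict:
    """Expand nested dictionaries into a single level of dotted keys."""
    flat = {}
    for key, value in item.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects such as `crawl` or `metadata`.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Made-up crawler item with one nested field.
page = {
    "url": "https://example.com/docs",
    "title": "Docs",
    "crawl": {"depth": 1, "referrerUrl": "https://example.com/"},
}
row = flatten(page)
```

Lists are left as-is in this sketch; whether to join them into a string or split them into rows depends on the destination.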

Common field combinations

  • Content audits: url + title + text from Item collections to review all scraped pages

  • Dataset monitoring: itemCount + cleanItemCount + modifiedAt from Datasets to track actor run output over time

  • AI analysis: url + markdown from Item collection website content crawlers, piped into ChatGPT or Claude for summarization

  • Cross-dataset comparison: use the Append transformation to stack item collections from multiple actor runs into one unified table
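To make the cross-dataset comparison concrete, the sketch below shows what stacking item collections with different schemas amounts to: the columns become the union of all fields, and missing values are left empty. It illustrates the idea only and is not Coupler.io's Append implementation; the `append_collections` helper, the `_source_run` column, and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def append_collections(
    collections: Dict[str, List[dict]],
) -> Tuple[List[str], List[dict]]:
    """Stack item collections into one table, labelling each row with its source."""
    # Build the union of fields, preserving first-seen order.
    columns: List[str] = []
    for items in collections.values():
        for item in items:
            for key in item:
                if key not in columns:
                    columns.append(key)

    rows: List[dict] = []
    for source, items in collections.items():
        for item in items:
            row = {"_source_run": source}
            # Fields absent from this item become None (an empty cell).
            row.update({col: item.get(col) for col in columns})
            rows.append(row)
    return ["_source_run"] + columns, rows


columns, rows = append_collections({
    "run_1": [{"url": "https://example.com/a", "price": 10}],
    "run_2": [{"url": "https://example.com/b", "title": "B"}],
})
```

Keeping a source label on every row is what makes later comparison between runs possible once the data sits in a single table.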

Use cases by role

  • Pull competitor pricing data scraped by an Apify actor into Google Sheets for weekly tracking

  • Send crawled website content to ChatGPT or Gemini for automated content gap analysis

  • Append item collections from multiple scraping runs to build a historical dataset of SERP results

Platform-specific notes

  • Item fields are actor-dependent: the schema of Item collections data will differ between actors (e.g., an e-commerce scraper vs. a news scraper)

  • The cleanItemCount may be lower than itemCount if the actor produced empty or duplicate records

  • Each actor run writes to its own default dataset, so re-running an actor creates a new dataset with a new ID

  • For website content crawlers, the markdown field is the most useful for feeding content into AI destinations

  • Very large datasets (millions of items) may require pagination, which Coupler.io handles automatically
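For readers curious what pagination over a large dataset involves, here is a minimal offset/limit loop. Apify's dataset items endpoint accepts `offset` and `limit` query parameters, but how Coupler.io pages through results internally is not documented here, so treat this as a generic sketch. The `fetch_page` callable is a stand-in for a real HTTP request, and `make_fake_fetcher` exists only so the example runs without network access.

```python
from typing import Callable, Iterator, List


def iter_items(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield every item by requesting consecutive pages until one comes back short."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        yield from page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            break
        offset += len(page)


def make_fake_fetcher(total: int) -> Callable[[int, int], List[dict]]:
    """Build an in-memory stand-in for an API call that honours offset and limit."""
    data = [{"n": i} for i in range(total)]

    def fetch_page(offset: int, limit: int) -> List[dict]:
        return data[offset:offset + limit]

    return fetch_page


all_items = list(iter_items(make_fake_fetcher(2500), page_size=1000))
```

In a real integration, `fetch_page` would wrap an authenticated request and the loop would typically add retry and rate-limit handling.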
