Data Overview
Apify datasets store the output of actor runs as structured collections of items — essentially rows of scraped or processed data. The exact fields in each item depend on the actor that produced them, but Coupler.io supports four entities that let you access both metadata and the raw item content.
Entities and what they return
Dataset collections
Returns a list of all datasets in your Apify account, including IDs, names, and creation dates
Datasets
Returns metadata for a specific dataset: item count, size, creation date, and last modified time
Item collections
Returns the full list of items (rows) stored in a specific dataset — fields vary by actor
Item collection website content crawlers
Returns items from website content crawlers with standardized fields like URL, page title, and crawled text
Available fields
Dataset collections fields
id
Unique identifier for the dataset
name
Dataset name (if set)
createdAt
Timestamp when the dataset was created
modifiedAt
Timestamp of the last modification
accessedAt
Timestamp of the last access
itemCount
Number of items stored in the dataset
cleanItemCount
Number of items excluding empty or duplicate records
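As an illustration, a Dataset collections record might look like the sample below (all field values are hypothetical); the gap between itemCount and cleanItemCount tells you how many empty or duplicate records a run produced:

```python
# Hypothetical Dataset collections record, shaped after the fields above.
sample_dataset = {
    "id": "abc123",  # illustrative placeholder, not a real dataset ID
    "name": "competitor-prices",
    "createdAt": "2024-05-01T09:00:00.000Z",
    "modifiedAt": "2024-05-08T09:05:00.000Z",
    "accessedAt": "2024-05-08T10:00:00.000Z",
    "itemCount": 120,
    "cleanItemCount": 114,
}

def dirty_records(dataset):
    """Number of empty or duplicate records in a dataset."""
    return dataset["itemCount"] - dataset["cleanItemCount"]

print(dirty_records(sample_dataset))  # 6
```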
Dataset metadata fields
id
Dataset ID
name
Dataset name
userId
ID of the Apify user who owns the dataset
createdAt
Creation timestamp
modifiedAt
Last modified timestamp
itemCount
Total item count
cleanItemCount
Clean item count
actId
ID of the actor that created this dataset
actRunId
ID of the specific actor run that produced the data
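Coupler.io retrieves this metadata for you, but as a rough sketch of the underlying call (the endpoint path and the "data" response wrapper are assumptions based on Apify's v2 REST API), the actor-run fields useful for enriching records can be pulled out like this:

```python
import json
from urllib.request import urlopen

def fetch_dataset_metadata(dataset_id, token):
    # Assumed Apify v2 endpoint; the payload is wrapped in a "data" object.
    url = f"https://api.apify.com/v2/datasets/{dataset_id}?token={token}"
    with urlopen(url) as resp:
        return json.load(resp)["data"]

def run_context(meta):
    """Pick out the actor-run fields used to enrich item records."""
    return {
        "actId": meta.get("actId"),
        "actRunId": meta.get("actRunId"),
        "createdAt": meta.get("createdAt"),
    }

# Offline example with a hypothetical metadata record:
meta = {"id": "abc123", "actId": "my~actor", "actRunId": "run123",
        "createdAt": "2024-05-01T09:00:00.000Z", "itemCount": 120}
print(run_context(meta)["actRunId"])  # run123
```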
Item collection fields
Fields vary depending on the actor that produced the dataset. Common fields include:
url
The URL that was scraped
title
Page or record title
description
Short description or meta description
text
Extracted text content
price
Price (e-commerce actors)
imageUrl
Image URL
Custom fields
Any additional fields output by the actor
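Because the schema is actor-dependent, downstream code should read these fields defensively. A minimal sketch (the item dicts are hypothetical) that normalizes items from two different actors onto the common fields above:

```python
# Hypothetical items from two different actors.
ecommerce_item = {"url": "https://shop.example/p/1", "title": "Widget",
                  "price": "19.99", "imageUrl": "https://shop.example/1.jpg"}
news_item = {"url": "https://news.example/a/1", "title": "Headline",
             "description": "Short summary", "text": "Full article text"}

COMMON_FIELDS = ("url", "title", "description", "text", "price", "imageUrl")

def normalize(item):
    """Map any actor's item onto the common fields, filling gaps with None."""
    return {field: item.get(field) for field in COMMON_FIELDS}

rows = [normalize(ecommerce_item), normalize(news_item)]
print(rows[1]["price"])  # None (the news actor outputs no price field)
```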
Item collection website content crawler fields
url
Crawled page URL
title
Page title
text
Full extracted text from the page
markdown
Page content in Markdown format
metadata
Additional page metadata
crawl
Crawl metadata (depth, referrer URL, etc.)
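Website content crawler items are uniform enough to feed straight into an AI destination. A sketch (the items shown are hypothetical) that assembles each page's markdown field into one prompt-ready document:

```python
# Hypothetical website content crawler items.
items = [
    {"url": "https://example.com/", "title": "Home",
     "markdown": "# Home\nWelcome.", "crawl": {"depth": 0}},
    {"url": "https://example.com/about", "title": "About",
     "markdown": "# About\nWho we are.", "crawl": {"depth": 1}},
]

def to_prompt(items):
    """Join each page's URL and markdown into a single text block."""
    sections = [f"Source: {it['url']}\n\n{it['markdown']}" for it in items]
    return "\n\n---\n\n".join(sections)

prompt = to_prompt(items)
print(prompt.count("Source:"))  # 2
```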
Common field combinations
Content audits: url + title + text from Item collections to review all scraped pages
Dataset monitoring: itemCount + cleanItemCount + modifiedAt from Datasets to track actor run output over time
AI analysis: url + markdown from Item collection website content crawlers, piped into ChatGPT or Claude for summarization
Cross-dataset comparison: use the Append transformation to stack item collections from multiple actor runs into one unified table
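The cross-dataset comparison above can be sketched in code: Coupler.io's Append transformation effectively stacks item collections, and tagging each row with its source run preserves provenance (the run IDs and items here are hypothetical):

```python
# Hypothetical item collections from two actor runs.
run_a = [{"url": "https://example.com/q1", "title": "Result 1"}]
run_b = [{"url": "https://example.com/q1", "title": "Result 1 (updated)"}]

def append_runs(runs):
    """Stack items from several runs into one table, keeping run provenance."""
    table = []
    for run_id, items in runs.items():
        for item in items:
            table.append({**item, "sourceRunId": run_id})
    return table

table = append_runs({"run-2024-05-01": run_a, "run-2024-05-08": run_b})
print(len(table))  # 2
```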
Use cases by role
Pull competitor pricing data scraped by an Apify actor into Google Sheets for weekly tracking
Send crawled website content to ChatGPT or Gemini for automated content gap analysis
Append item collections from multiple scraping runs to build a historical dataset of SERP results
Load raw item collections into BigQuery for transformation and analysis at scale
Join dataset metadata with item collections to enrich records with actor run context (run ID, creation date)
Schedule syncs to keep a data warehouse table updated after each actor run
Use Dataset collections to programmatically audit all datasets stored in an account
Export item collections to Excel or Looker Studio for stakeholder-ready reporting without writing custom export scripts
Pipe website content crawler output into Cursor or Claude for AI-assisted code or content generation workflows
Platform-specific notes
Item fields are actor-dependent — the schema of Item collection data will differ between actors (e.g., an e-commerce scraper vs. a news scraper)
The cleanItemCount may be lower than itemCount if the actor produced empty or duplicate records
Apify datasets are tied to a specific actor run; re-running an actor creates a new dataset with a new ID
For website content crawlers, the markdown field is the most useful for feeding content into AI destinations
Very large datasets (millions of items) may require pagination; Coupler.io handles this automatically
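Coupler.io paginates large datasets for you, but the underlying pattern looks roughly like this. The offset/limit parameters follow Apify's v2 dataset-items endpoint (an assumption); the fetch function is injectable so the loop can be demonstrated without network access:

```python
def iter_items(fetch_page, limit=1000):
    """Yield every item by requesting pages until a short page is returned."""
    offset = 0
    while True:
        page = fetch_page(offset=offset, limit=limit)
        yield from page
        if len(page) < limit:
            break
        offset += limit

# Fake fetcher standing in for GET /v2/datasets/{id}/items?offset=..&limit=..
data = [{"n": i} for i in range(2500)]
fake_fetch = lambda offset, limit: data[offset:offset + limit]

print(sum(1 for _ in iter_items(fake_fetch)))  # 2500
```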