# Data Overview

Box Data Extract surfaces content from your Box documents through four entities, which differ in how much processing Box AI applies. Each entity reads from the same folder but returns a different representation of the documents inside it.

## Entities

| Entity                               | What it returns                                                 |
| ------------------------------------ | --------------------------------------------------------------- |
| Stream text representation folders   | Plain text extracted from each document                         |
| Stream AI ask folders                | AI-generated answer to your custom prompt, per document         |
| Stream AI extract folders            | Key-value metadata extracted by Box AI based on your prompt     |
| Stream AI extract structured folders | Structured fields extracted by Box AI using a predefined schema |

## Fields by entity

#### Stream text representation folders

| Field         | Description                           |
| ------------- | ------------------------------------- |
| file\_id      | Unique identifier of the Box file     |
| file\_name    | Name of the file                      |
| text\_content | Full extracted text from the document |
| folder\_id    | ID of the parent folder               |

#### Stream AI ask folders

| Field      | Description                                 |
| ---------- | ------------------------------------------- |
| file\_id   | Unique identifier of the Box file           |
| file\_name | Name of the file                            |
| prompt     | The question sent to Box AI                 |
| answer     | The AI-generated response for that document |
| folder\_id | ID of the parent folder                     |

#### Stream AI extract folders

| Field           | Description                         |
| --------------- | ----------------------------------- |
| file\_id        | Unique identifier of the Box file   |
| file\_name      | Name of the file                    |
| prompt          | The extraction prompt used          |
| extracted\_data | Key-value pairs extracted by Box AI |
| folder\_id      | ID of the parent folder             |

#### Stream AI extract structured folders

| Field              | Description                                     |
| ------------------ | ----------------------------------------------- |
| file\_id           | Unique identifier of the Box file               |
| file\_name         | Name of the file                                |
| structured\_fields | Extracted fields matching the predefined schema |
| folder\_id         | ID of the parent folder                         |

## Common field combinations

* **file\_name + text\_content** — useful for full-text search pipelines or feeding content into AI destinations
* **file\_name + answer** — great for summarization or Q\&A dashboards built from document libraries
* **file\_name + structured\_fields** — ideal for turning contracts, invoices, or forms into structured database records
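To illustrate the last combination, here is a minimal Python sketch that flattens exported rows into database-style records. The row shape is an assumption: each row is taken to be a dict in which `structured_fields` is itself a dict, and the `vendor` and `amount` keys are hypothetical. The actual shape depends on your schema and destination.

```python
from typing import Any


def flatten_row(row: dict[str, Any]) -> dict[str, Any]:
    """Merge file_name with the keys of structured_fields into one flat record."""
    record: dict[str, Any] = {"file_name": row["file_name"]}
    # Assumption: structured_fields is a dict; a missing or None value is treated as empty.
    for key, value in (row.get("structured_fields") or {}).items():
        record[key] = value
    return record


# Hypothetical rows shaped like the "Stream AI extract structured folders" entity.
rows = [
    {"file_id": "1", "file_name": "invoice-001.pdf", "folder_id": "123456789",
     "structured_fields": {"vendor": "Acme", "amount": 120.5}},
    {"file_id": "2", "file_name": "invoice-002.pdf", "folder_id": "123456789",
     "structured_fields": None},
]

records = [flatten_row(r) for r in rows]
```

Note that a schema field named `file_name` would overwrite the file's own name in this sketch, so a prefix may be worth adding if your schema can collide with the built-in fields.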

## Use cases by role

{% tabs %}
{% tab title="Operations" %}

* Extract text from policy documents or SOPs and load into a searchable Google Sheet
* Use AI extract to pull key dates and parties from contracts into BigQuery for tracking
* Feed structured extraction results into an AI tool like Claude or ChatGPT for further processing
  {% endtab %}

{% tab title="Legal & Compliance" %}

* Run structured extraction on NDAs or agreements to capture counterparty, date, and jurisdiction fields
* Ask Box AI a compliance question across all files in a folder and collect the answers as a dataset
* Append results from multiple folder extractions to build a unified contract register
  {% endtab %}

{% tab title="Finance" %}

* Extract invoice data from scanned PDFs stored in Box using the structured AI entity
* Pull vendor names, amounts, and due dates into a spreadsheet automatically
* Use the Aggregate transformation in Coupler.io to summarize extracted invoice totals by vendor or period
  {% endtab %}
  {% endtabs %}
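For readers who want to see the logic behind the Finance example of summarizing invoice totals by vendor, the grouping can be sketched in plain Python. This is not how the Aggregate transformation is implemented; it only shows equivalent logic, and the `vendor` and `amount` keys are hypothetical field names that your extraction schema may not use.

```python
from collections import defaultdict
from typing import Any


def totals_by_vendor(records: list[dict[str, Any]]) -> dict[str, float]:
    """Sum the amount of each record, grouped by vendor name."""
    totals: defaultdict[str, float] = defaultdict(float)
    for rec in records:
        # Amounts may arrive as strings from extraction, so coerce to float.
        totals[rec["vendor"]] += float(rec["amount"])
    return dict(totals)


# Hypothetical extracted invoice records.
invoices = [
    {"vendor": "Acme", "amount": "100.0"},
    {"vendor": "Globex", "amount": 75.0},
    {"vendor": "Acme", "amount": 50.0},
]

summary = totals_by_vendor(invoices)
```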

## Platform-specific notes

* Box AI features (AI ask, AI extract, and AI extract structured) require a **Box AI-enabled plan**; not all Box accounts include this
* The Folder ID is a numeric string visible in the Box URL when you open a folder (e.g., `/folder/123456789`)
* Enabling **Recursive** will process all subfolders, which can significantly increase the number of API calls and processing time for large folder trees
* Each document in the folder is processed individually — very large files or folders with hundreds of files may take longer to sync
* Supported file types for text extraction depend on Box's representation API; PDFs, Word documents, and plain text files are typically supported
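
If you are copying Folder IDs out of many Box URLs, a small helper can pull the numeric part from the `/folder/<id>` path segment described above. This is a minimal sketch; the function name is hypothetical and the example host is only illustrative.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def folder_id_from_url(url: str) -> Optional[str]:
    """Return the numeric folder ID from a Box folder URL, or None if absent."""
    path = urlparse(url).path
    match = re.search(r"/folder/(\d+)", path)
    return match.group(1) if match else None


# Example URL for illustration only.
folder_id = folder_id_from_url("https://app.box.com/folder/123456789")
```

The ID is kept as a string rather than converted to an integer, matching the note that the Folder ID is a numeric string.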
