
# Data Overview

Box Data Extract exposes the content of your Box documents through four entities, which differ in how much Box AI processing is applied. Each entity reads the same folder but returns a different representation of the documents inside it.

## Entities

| Entity                               | What it returns                                                 |
| ------------------------------------ | --------------------------------------------------------------- |
| Stream text representation folders   | Plain text extracted from each document                         |
| Stream AI ask folders                | AI-generated answer to your custom prompt, per document         |
| Stream AI extract folders            | Key-value metadata extracted by Box AI based on your prompt     |
| Stream AI extract structured folders | Structured fields extracted by Box AI using a predefined schema |

## Fields by entity

#### Stream text representation folders

| Field         | Description                           |
| ------------- | ------------------------------------- |
| file\_id      | Unique identifier of the Box file     |
| file\_name    | Name of the file                      |
| text\_content | Full extracted text from the document |
| folder\_id    | ID of the parent folder               |

#### Stream AI ask folders

| Field      | Description                                 |
| ---------- | ------------------------------------------- |
| file\_id   | Unique identifier of the Box file           |
| file\_name | Name of the file                            |
| prompt     | The question sent to Box AI                 |
| answer     | The AI-generated response for that document |
| folder\_id | ID of the parent folder                     |

#### Stream AI extract folders

| Field           | Description                         |
| --------------- | ----------------------------------- |
| file\_id        | Unique identifier of the Box file   |
| file\_name      | Name of the file                    |
| prompt          | The extraction prompt used          |
| extracted\_data | Key-value pairs extracted by Box AI |
| folder\_id      | ID of the parent folder             |

#### Stream AI extract structured folders

| Field              | Description                                     |
| ------------------ | ----------------------------------------------- |
| file\_id           | Unique identifier of the Box file               |
| file\_name         | Name of the file                                |
| structured\_fields | Extracted fields matching the predefined schema |
| folder\_id         | ID of the parent folder                         |
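As a rough illustration of the tables above, the four row shapes can be written as Python `TypedDict` classes. The class names and the field types are assumptions made for this sketch (the tables list field names and descriptions only), and the sample row is invented, not real connector output.

```python
from typing import Any, Dict, TypedDict


class TextRepresentationRow(TypedDict):
    file_id: str
    file_name: str
    text_content: str
    folder_id: str


class AiAskRow(TypedDict):
    file_id: str
    file_name: str
    prompt: str
    answer: str
    folder_id: str


class AiExtractRow(TypedDict):
    file_id: str
    file_name: str
    prompt: str
    extracted_data: Dict[str, Any]  # assumed: key-value pairs held as a mapping
    folder_id: str


class AiExtractStructuredRow(TypedDict):
    file_id: str
    file_name: str
    structured_fields: Dict[str, Any]  # assumed: one key per schema field
    folder_id: str


# Hypothetical sample row for the AI ask entity.
sample_row: AiAskRow = {
    "file_id": "1001",
    "file_name": "nda-acme.pdf",
    "prompt": "Who is the counterparty?",
    "answer": "Acme Corp",
    "folder_id": "123456789",
}
```

Every entity shares `file_id`, `file_name`, and `folder_id`, so rows from different entities can be joined on `file_id` if the destination supports it.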

## Common field combinations

* **file\_name + text\_content** — useful for full-text search pipelines or feeding content into AI destinations
* **file\_name + answer** — great for summarization or Q\&A dashboards built from document libraries
* **file\_name + structured\_fields** — ideal for turning contracts, invoices, or forms into structured database records
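To make the first combination concrete, here is a minimal sketch of a keyword search over `file_name` and `text_content`. The `search_documents` helper and the sample rows are hypothetical; a real full-text pipeline would more likely use a search index in the destination.

```python
def search_documents(rows, keyword):
    """Return the names of files whose extracted text contains the keyword."""
    needle = keyword.casefold()  # case-insensitive comparison
    return [
        row["file_name"]
        for row in rows
        if needle in row["text_content"].casefold()
    ]


# Hypothetical rows shaped like the text representation entity.
rows = [
    {
        "file_id": "1",
        "file_name": "travel-policy.pdf",
        "text_content": "Employees must book travel 14 days ahead.",
        "folder_id": "123456789",
    },
    {
        "file_id": "2",
        "file_name": "onboarding.docx",
        "text_content": "New hires receive a laptop on day one.",
        "folder_id": "123456789",
    },
]

matches = search_documents(rows, "Travel")
print(matches)  # ['travel-policy.pdf']
```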

## Use cases by role

{% tabs %}
{% tab title="Operations" %}

* Extract text from policy documents or SOPs and load into a searchable Google Sheet
* Use AI extract to pull key dates and parties from contracts into BigQuery for tracking
* Feed structured extraction results into an AI tool like Claude or ChatGPT for further processing
  {% endtab %}

{% tab title="Legal & Compliance" %}

* Run structured extraction on NDAs or agreements to capture counterparty, date, and jurisdiction fields
* Ask Box AI a compliance question across all files in a folder and collect the answers as a dataset
* Append results from multiple folder extractions to build a unified contract register
  {% endtab %}

{% tab title="Finance" %}

* Extract invoice data from scanned PDFs stored in Box using the structured AI entity
* Pull vendor names, amounts, and due dates into a spreadsheet automatically
* Use the Aggregate transformation in Coupler.io to summarize extracted invoice totals by vendor or period
  {% endtab %}
  {% endtabs %}
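The Finance use case of summarizing invoice totals by vendor can be sketched locally as follows. This is not how the Aggregate transformation in Coupler.io is implemented; it only shows the same grouping idea. The `vendor` and `amount` keys inside `structured_fields` are assumed schema fields, and the rows are invented.

```python
from collections import defaultdict
from decimal import Decimal


def totals_by_vendor(rows):
    """Sum the assumed `amount` field per assumed `vendor` field."""
    totals = defaultdict(Decimal)  # Decimal() starts each vendor at 0
    for row in rows:
        fields = row["structured_fields"]
        # Convert via str so float inputs do not carry binary rounding noise.
        totals[fields["vendor"]] += Decimal(str(fields["amount"]))
    return dict(totals)


# Hypothetical rows shaped like the AI extract structured entity.
invoice_rows = [
    {"file_name": "inv-001.pdf", "structured_fields": {"vendor": "Acme", "amount": "100.50"}},
    {"file_name": "inv-002.pdf", "structured_fields": {"vendor": "Acme", "amount": "20"}},
    {"file_name": "inv-003.pdf", "structured_fields": {"vendor": "Globex", "amount": "75.25"}},
]

vendor_totals = totals_by_vendor(invoice_rows)
```

`Decimal` is used instead of `float` so that currency sums stay exact.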

## Platform-specific notes

* The Box AI entities (AI ask, AI extract, and AI extract structured) require a **Box AI-enabled plan**; not all Box accounts include this
* The Folder ID is a numeric string visible in the Box URL when you open a folder (e.g., `/folder/123456789`)
* Enabling **Recursive** will process all subfolders, which can significantly increase the number of API calls and processing time for large folder trees
* Each document in the folder is processed individually — very large files or folders with hundreds of files may take longer to sync
* Supported file types for text extraction depend on Box's representation API; PDFs, Word documents, and plain text files are typically supported
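If you copy folder URLs rather than IDs, a small helper can pull the numeric Folder ID out of the `/folder/<id>` path segment described above. The function name is hypothetical, and the host in the example URLs is only illustrative.

```python
import re
from typing import Optional

# Matches the numeric ID that follows "/folder/" in a Box folder URL.
_FOLDER_ID_PATTERN = re.compile(r"/folder/(\d+)")


def folder_id_from_url(url: str) -> Optional[str]:
    """Return the Folder ID as a string, or None if the URL has no folder segment."""
    match = _FOLDER_ID_PATTERN.search(url)
    return match.group(1) if match else None


folder_id = folder_id_from_url("https://app.box.com/folder/123456789")
print(folder_id)  # 123456789
```

The ID is kept as a string because it is an identifier, not a number to do arithmetic on.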

