Extract Subdocuments
This operation extracts subdocuments from a document or file.
Prerequisites
Ensure you have learned the Document Processing Book before using this operation. After learning the Book, publish your Agent and create a new Playground so that the change takes effect.
Overview
This operation extracts one or more subdocuments from a document or file. A subdocument is a subsection of a larger document.
Syntax
The syntax for this operation begins with the following line:
extract subdocuments from the document
Nested within this statement, you must specify the subdocuments to be extracted.
If the subdocuments to be extracted are not specified, a question will be raised.
Ways to Specify Subdocuments
To specify which subdocuments should be extracted, include one or more of the following lines, nested under the extract subdocuments from the document statement:
Defining the Start Page Marker
extract subdocuments from the document
the start page marker is "{start marker}"
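As a rough mental model, a start page marker splits the document so that each matching page begins a new subdocument. The Python sketch below illustrates that idea only. The function split_at_markers and the is_start predicate are hypothetical names, the predicate stands in for whatever matching the operation actually performs, and the handling of pages that come before the first match (grouped into their own leading subdocument) is an assumption, not documented behavior.

```python
from typing import Callable


def split_at_markers(
    pages: list[str], is_start: Callable[[str], bool]
) -> list[list[str]]:
    """Group pages into subdocuments, starting a new one at each marker page.

    Hypothetical helper: `is_start` is a stand-in for the real marker check.
    """
    subdocs: list[list[str]] = []
    for page in pages:
        # Open a new subdocument on a marker page, or for the very first page.
        if is_start(page) or not subdocs:
            subdocs.append([])
        subdocs[-1].append(page)
    return subdocs


pages = ["Invoice #1", "continued", "Invoice #2", "Invoice #3", "continued"]
result = split_at_markers(pages, lambda text: text.startswith("Invoice"))
print(result)
# [['Invoice #1', 'continued'], ['Invoice #2'], ['Invoice #3', 'continued']]
```

Here three pages match the stand-in marker check, so the five pages become three subdocuments.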
Specifying the Subdocument Size
extract subdocuments from the document
the subdocument size is {size}
Specifying the Subdocument Overlap Size
extract subdocuments from the document
the subdocument size is {size}
the subdocument overlap size is {overlap size}
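To see how the size and overlap size interact, the Python sketch below computes page ranges for a fixed-size split. It is an illustration under assumptions rather than the operation's actual implementation: the function name page_windows is hypothetical, pages are numbered from 1, and each subdocument is assumed to start (size minus overlap) pages after the previous one.

```python
def page_windows(total_pages: int, size: int, overlap: int = 0) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (first_page, last_page) ranges.

    Hypothetical helper: assumes consecutive subdocuments share `overlap` pages.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be between 0 and size - 1")

    step = size - overlap  # how far each new subdocument advances
    windows: list[tuple[int, int]] = []
    start = 1
    while start <= total_pages:
        end = min(start + size - 1, total_pages)
        windows.append((start, end))
        if end == total_pages:
            break  # the last page is covered, so stop
        start += step
    return windows


print(page_windows(12, size=5, overlap=1))
# [(1, 5), (5, 9), (9, 12)]
```

Under these assumptions, a 12-page document with a size of 5 and an overlap size of 1 yields three subdocuments, and pages 5 and 9 each appear in two of them.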
OpenAI Model Specification
Optionally, specify the OpenAI model to use for the extraction.
extract subdocuments from the document
the openai model is "{model}"
Data
Components
This table lists the names of the data components in this operation and their properties.
Label | Renamable | Required |
---|---|---|
the document | Yes | Required |
the start page marker | No | Optional |
the subdocument size | No | Optional |
the subdocument overlap size | No | Optional |
the openai model | No | Optional |
Values
The table below lists the parameters in the operation. Parameters are placeholders for data values. In the operation's syntax, replace the parameters with your own data values as needed.
Parameter | Description | Example Value | Required |
---|---|---|---|
start marker | Text indicating where a subdocument should begin. | Section 2 | Optional |
size | The maximum number of pages per subdocument. | 10 | Optional |
overlap size | The number of pages that overlap between consecutive subdocuments. | 1 | Optional |
model | The OpenAI model used for the extraction. | gpt-4o | Optional |
Examples
1. Using Start Page Marker and Additional Fields
extract subdocuments from the file
the start page marker is "Page containing the text 'Chapter'"
The first field is "invoice number"
The first field's format is "number"
The second field is "invoice date"
The second field's format is "string"
2. Using Subdocument Size & Overlap Size
extract subdocuments from the report
the subdocument size is 5
the subdocument overlap size is 1
3. Using Start Page Marker and OpenAI Model
extract subdocuments from the invoice
the start page marker is "Page is the beginning of a new invoice"
the openai model is "gpt-4o-mini"