Extract Multiple Subdocuments
Extracts multiple subdocuments from a document using markers or fixed-size chunking.
Overview
This procedure extracts multiple subdocuments from a large document, either by identifying recurring content patterns (such as invoices or chapters) or by splitting the document into fixed-size chunks with optional overlap. Each subdocument becomes a separate document that can be processed independently, which suits batch processing of files that contain many documents.
Make sure to add the Document Processing Book to your agent before using this automation procedure.
Examples
1. Extract Invoice Subdocuments with Field Extraction
Splits a batch invoice file into individual invoices and extracts key fields.
extract subdocuments from the document where
    the start page marker is "The beginning of a new invoice"
    the first field is "invoice number"
    the first field's format is "string"
    the second field is "invoice date"
    the second field's format is "string"
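As a rough illustration of marker-based splitting, the Python sketch below turns a list of detected start pages into page ranges, one per subdocument. It is not the procedure's implementation: the function name `split_at_start_pages`, the 1-based page numbering, and the choice to drop pages that precede the first marker are all assumptions made for this example.

```python
def split_at_start_pages(total_pages: int, start_pages: list[int]) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) ranges.

    Hypothetical helper: each subdocument runs from one detected start
    page up to the page before the next start page. Pages before the
    first marker are ignored here (an assumption, not documented behavior).
    """
    # Keep only valid page numbers, de-duplicated and in reading order.
    starts = sorted({p for p in start_pages if 1 <= p <= total_pages})
    ranges = []
    for i, first in enumerate(starts):
        # The last subdocument extends to the end of the file.
        last = starts[i + 1] - 1 if i + 1 < len(starts) else total_pages
        ranges.append((first, last))
    return ranges


# A 10-page batch file where new invoices were detected on pages 1, 4 and 8.
invoice_ranges = split_at_start_pages(10, [1, 4, 8])
print(invoice_ranges)
```

Under these assumptions, the three markers yield three subdocuments covering pages 1 to 3, 4 to 7, and 8 to 10.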
2. Extract Fixed-Size Chunks with Overlap
Splits a large report into 5-page chunks with a 1-page overlap.
extract subdocuments from the report where
    the subdocument size is 5
    the subdocument overlap size is 1
3. Extract Fixed-Size Chunks without Overlap
Splits a document into 5-page chunks with no overlap.
extract subdocuments from the document where
    the subdocument size is 5
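The fixed-size examples above can be pictured with a short Python sketch that computes which pages land in each chunk. It assumes the common reading of overlap, where each chunk starts `size - overlap` pages after the previous one, and it assumes the final chunk may be shorter than the others. The function name `chunk_page_ranges` and the 1-based page numbering are hypothetical; the procedure's actual boundary handling may differ.

```python
def chunk_page_ranges(total_pages: int, size: int, overlap: int = 0) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) ranges for each chunk.

    Hypothetical helper illustrating fixed-size chunking with overlap.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")

    step = size - overlap  # distance between consecutive chunk starts
    ranges = []
    start = 1
    while start <= total_pages:
        end = min(start + size - 1, total_pages)
        ranges.append((start, end))
        if end == total_pages:
            break  # the document is fully covered
        start += step
    return ranges


# 12-page report, 5-page chunks, 1-page overlap.
print(chunk_page_ranges(12, 5, 1))
# Same report with no overlap.
print(chunk_page_ranges(12, 5))
```

With an overlap of 1, the last page of each chunk is repeated as the first page of the next, which helps when content spans a chunk boundary. With the default overlap of 0, the chunks simply sit back to back.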
4. Extract Chapter-Based Subdocuments
Splits a document by identifying chapter beginnings.
extract subdocuments from the document where
    the start page marker is "Page containing the text 'Chapter'"
    the openai model is "gpt-4o"