Extract Multiple Subdocuments
Extracts multiple subdocuments from a document using markers or fixed-size chunking.
Overview
This procedure extracts multiple subdocuments from a large document in one of two ways: by identifying recurring content patterns (such as the start of each invoice or chapter), or by splitting the document into fixed-size chunks with optional overlap. Each subdocument becomes a separate document that can be processed independently, which suits batch processing of files that contain many documents.
Make sure to add the Document Processing Book to your agent before using this automation procedure.
Syntax
Below is a line-by-line overview of the automation syntax.
Examples
1. Extract Non-Contiguous Invoice Subdocuments
Extracts only the invoices from a mixed document containing both invoices and bill of lading (BOL) documents.
2. Extract Invoice Subdocuments with Field Extraction
Splits a batch invoice file into individual invoices and extracts key fields.
3. Extract Invoices with Inclusive End Marker
Extracts each invoice from its start page through the page containing the invoice total; that end page is included in the subdocument.
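The marker-based examples above share one idea: a start marker opens a subdocument, an optional end marker closes it, and pages that fall outside any open subdocument are skipped. The Python sketch below illustrates that logic only. It is not the procedure's own syntax or implementation; the function name `split_by_markers`, its parameters, and the sample page texts are all hypothetical.

```python
from typing import Callable, List, Optional


def split_by_markers(
    pages: List[str],
    is_start: Callable[[str], bool],
    is_end: Optional[Callable[[str], bool]] = None,
    include_end: bool = True,
) -> List[List[int]]:
    """Group 1-based page numbers into subdocuments (hypothetical helper).

    A page matching is_start opens a new subdocument. A page matching
    is_end closes the open subdocument and is kept only if include_end
    is True. Pages seen while no subdocument is open are skipped.
    """
    subdocs: List[List[int]] = []
    current: Optional[List[int]] = None

    for number, text in enumerate(pages, start=1):
        if is_start(text):
            # A new start marker closes any subdocument still open.
            if current is not None:
                subdocs.append(current)
            current = [number]
            # A single page can be both the start and the end.
            if is_end is not None and is_end(text):
                subdocs.append(current)
                current = None
            continue

        if current is None:
            continue  # e.g. BOL pages between two invoices

        if is_end is not None and is_end(text):
            if include_end:
                current.append(number)
            subdocs.append(current)
            current = None
        else:
            current.append(number)

    # Without an end marker, the last subdocument runs to the final page.
    if current is not None:
        subdocs.append(current)
    return subdocs


# Made-up sample: two invoices separated by bill of lading pages.
sample_pages = [
    "Invoice #1", "line items", "Invoice Total: 10",
    "Bill of Lading", "BOL details",
    "Invoice #2", "Invoice Total: 5",
]
invoices = split_by_markers(
    sample_pages,
    is_start=lambda text: text.startswith("Invoice #"),
    is_end=lambda text: "Invoice Total" in text,
)
print(invoices)  # [[1, 2, 3], [6, 7]]
```

In this sketch the BOL pages (4 and 5) are dropped because they appear after an end marker and before the next start marker, which is how non-contiguous subdocuments arise.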
4. Extract Fixed-Size Chunks with Overlap
Splits a large report into 5-page chunks with 1-page overlap.
5. Extract Fixed-Size Chunks without Overlap
Splits a document into 5-page chunks with no overlap.
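The arithmetic behind fixed-size chunking can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure's implementation: the function name `chunk_page_ranges` and its parameters are hypothetical, pages are assumed to be numbered from 1, and the final chunk is assumed to be shorter when the page count does not divide evenly.

```python
from typing import List, Tuple


def chunk_page_ranges(
    total_pages: int, chunk_size: int, overlap: int = 0
) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first_page, last_page) ranges.

    Each chunk starts chunk_size - overlap pages after the previous one,
    so consecutive chunks share exactly `overlap` pages.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    ranges: List[Tuple[int, int]] = []
    start = 1
    while start <= total_pages:
        end = min(start + chunk_size - 1, total_pages)
        ranges.append((start, end))
        if end == total_pages:
            break  # the last page is covered; avoid a redundant tail chunk
        start += step
    return ranges


# A 12-page document in 5-page chunks, with and without a 1-page overlap.
print(chunk_page_ranges(12, 5, overlap=1))  # [(1, 5), (5, 9), (9, 12)]
print(chunk_page_ranges(12, 5, overlap=0))  # [(1, 5), (6, 10), (11, 12)]
```

With a 1-page overlap, the last page of each chunk is repeated as the first page of the next, which helps when content spans a chunk boundary.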
6. Extract Chapter-Based Subdocuments
Splits a document into one subdocument per chapter by detecting where each chapter begins.
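When only a start marker is available, as with chapters, each subdocument can be taken to run up to the page before the next start. The Python sketch below shows that idea only. The regular expression, the name `chapter_ranges`, and the assumption that a chapter heading appears at the very beginning of a page's text are illustrative choices, not part of the procedure.

```python
import re
from typing import List, Tuple

# Assumed heading format: "Chapter <number>" at the start of the page text.
CHAPTER_HEADING = re.compile(r"^\s*chapter\s+\d+", re.IGNORECASE)


def chapter_ranges(pages: List[str]) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first_page, last_page) ranges per chapter."""
    starts = [
        number
        for number, text in enumerate(pages, start=1)
        if CHAPTER_HEADING.match(text)
    ]
    # Each chapter ends on the page before the next chapter starts;
    # the last chapter runs to the end of the document.
    ends = [next_start - 1 for next_start in starts[1:]] + [len(pages)]
    return list(zip(starts, ends))


# Made-up sample: a title page followed by two chapters.
book_pages = [
    "Title page",
    "Chapter 1 Introduction", "more text",
    "Chapter 2 Methods", "more text", "more text",
]
print(chapter_ranges(book_pages))  # [(2, 3), (4, 6)]
```

Pages before the first chapter heading, such as the title page here, belong to no subdocument in this sketch.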
