Extract Subdocument
Extracts a subdocument from a document or file based on page numbers or markers.
Overview
This procedure extracts a subdocument from a document or file. A subdocument is a contiguous range of pages taken from a larger document. You can specify the range to extract by page number or by markers that describe the content on the first and last pages. The extracted subdocument can then be processed independently in your automation workflow.
Make sure to add the Document Processing Book to your agent before using this automation procedure.
Syntax
Below is a line-by-line overview of the automation syntax.
the start page is x
What does it do?
Specifies the first page of the subdocument.
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace x with a starting page number (1-based index). Defaults to the first page.
Example
the start page is 2
the end page is y
What does it do?
Specifies the last page of the subdocument.
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace y with a page number (1-based index). Defaults to the last page.
Example
the end page is 5
the start page marker is "start-description"
What does it do?
Uses AI to find the starting page based on content description.
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace start-description with a description of what content marks the start.
Example
the start page marker is "Page containing the text 'Introduction'"
the end page marker is "end-description"
What does it do?
Uses AI to find the ending page based on content description (inclusive).
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace end-description with a description of what content marks the end.
Example
the excluded end page marker is "excluded-end-description"
What does it do?
Uses AI to find the ending page based on content description (exclusive).
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace excluded-end-description with a description of what content marks the end (page not included).
Example
the openai model is "openai-model"
What does it do?
Specifies the OpenAI model to use for marker-based extraction.
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace openai-model with a valid OpenAI model.
Example
Examples
1. Extract by Page Numbers
In this example, pages 2 through 5 are extracted as a single subdocument.
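A minimal sketch of what this could look like, assembled from the phrases documented in the Syntax section. The source name "the document" is a placeholder assumption standing in for {the source}; substitute whatever your automation calls the input.

```
extract subdocument from the document
    the start page is 2
    the end page is 5
```

Both page numbers are 1-based. Omitting either phrase falls back to the documented default: the first page for the start, the last page for the end.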
2. Extract by Content Markers
Extracts from the 'Introduction' section up to, but not including, the 'Conclusion' section.
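One way this might be written, using the start page marker together with the excluded end page marker so that the page matching the end description is left out. The source name "the document" and the wording of the end description are illustrative assumptions; the start description reuses the one shown in the Syntax section.

```
extract subdocument from the document
    the start page marker is "Page containing the text 'Introduction'"
    the excluded end page marker is "Page containing the text 'Conclusion'"
```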
3. Extract with Size Limit
Extracts starting from a specific form, limited to 1 page.
4. Extract by Page Numbers with OpenAI Model
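A hedged sketch combining page numbers with the optional model phrase. The source name "the document", the page numbers, and the model name "gpt-4o" are all illustrative assumptions; use a model that is available to your account.

```
extract subdocument from the document
    the start page is 2
    the end page is 5
    the openai model is "gpt-4o"
```

Per the Syntax section, the model phrase applies to marker-based extraction, so it is likely to matter most when the range is defined by markers rather than explicit page numbers.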
5. Extract Section by Inclusive End Marker
Extracts from the start of Section 2 through the end of Section 2 (inclusive).
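A possible form of this example, using the inclusive end page marker so that the page matching the end description is kept in the subdocument. The source name "the document" and both descriptions are hypothetical; adjust them to match how Section 2 actually appears in your file.

```
extract subdocument from the document
    the start page marker is "Page where Section 2 begins"
    the end page marker is "Page where Section 2 ends"
```

To stop on the page before the matching one instead, the excluded end page marker phrase could be used in place of the end page marker.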
