Extracts a subdocument from a document or file based on page numbers or markers.
Overview
This procedure extracts a subdocument from a document or file. A subdocument is a contiguous range of pages from a larger document. You can specify the range by page numbers or by markers that describe the content of the boundary pages. The extracted subdocument can then be processed independently in your automation workflow.
Make sure to add the Document Processing Book to your agent before using this automation procedure.
Syntax
Below is a line-by-line overview of the automation syntax.
extract subdocument from {the source}
What does it do?
Begins subdocument extraction from the specified source.
Where does it go?
This phrase should be written on a new line.
Is it required?
✅ Yes — This phrase is required.
Does it require data?
✅ Yes — Replace the source with a reference to a document or file.
the start page is x
What does it do?
Specifies the first page of the subdocument.
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace x with a starting page number (1-based index). Defaults to the first page.
Example
the start page is 2
the end page is y
What does it do?
Specifies the last page of the subdocument. This page is included.
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace y with a page number (1-based index). Defaults to the last page.
Example
the end page is 5
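The start and end page phrases describe a 1-based range in which both ends are kept. As a rough model of that arithmetic (not the procedure's implementation), the Python sketch below turns the two phrases into a list of page numbers. `select_pages` is a hypothetical helper, and rejecting a reversed or out-of-range request is an assumption of the sketch.

```python
def select_pages(total_pages, start_page=None, end_page=None):
    """Return the 1-based page numbers kept by a start/end page range.

    Hypothetical model: both bounds are 1-based and inclusive; an omitted
    bound defaults to the first or last page of the source. Raising on a
    reversed or out-of-range request is an assumption of this sketch.
    """
    first = 1 if start_page is None else start_page
    last = total_pages if end_page is None else end_page
    if not (1 <= first <= last <= total_pages):
        raise ValueError(f"invalid page range {first}-{last} for {total_pages} pages")
    return list(range(first, last + 1))  # +1 keeps the end page


# "the start page is 2" and "the end page is 5" on a 10-page document
print(select_pages(10, start_page=2, end_page=5))  # [2, 3, 4, 5]
# Only a start page: the range runs to the last page
print(select_pages(10, start_page=8))  # [8, 9, 10]
```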
the start page marker is "start-description"
What does it do?
Uses AI to find the first page of the subdocument from a description of that page's content.
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace start-description with a description of what content marks the start.
Example
the start page marker is "Page containing the text 'Introduction'"
the end page marker is "end-description"
What does it do?
Uses AI to find the last page of the subdocument from a description of that page's content. The matching page is included.
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace end-description with a description of what content marks the end.
Example
the end page marker is "Page containing the text 'Conclusion'"
the excluded end page marker is "excluded-end-description"
What does it do?
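Uses AI to find the page where the subdocument stops from a description of that page's content. The matching page is not included; the subdocument ends on the page before it.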
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace excluded-end-description with a description of the content on the first page after the subdocument. That page is not included.
Example
the excluded end page marker is "Page containing the text 'Next Section'"
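The marker phrases differ mainly in whether the matching end page is kept. The Python sketch below models that difference. The real procedure uses AI to interpret a free-text description; here a plain substring test (`find_marker`) stands in for it, and `extract_by_markers` is a hypothetical helper. Its handling of a missing start or end match, and its refusal to combine both end markers, are assumptions made for the sketch, not documented behavior.

```python
def find_marker(pages, marker, start_index=0):
    """Return the 0-based index of the first page at or after start_index
    whose text contains marker, or None if no page matches.

    Stand-in for the AI step: a plain substring test replaces the
    free-text page description the real procedure interprets.
    """
    for i in range(start_index, len(pages)):
        if marker in pages[i]:
            return i
    return None


def extract_by_markers(pages, start_marker=None, end_marker=None,
                       excluded_end_marker=None):
    """Return the contiguous run of pages selected by the markers.

    Hypothetical model only:
    - end_marker keeps its matching page (inclusive);
    - excluded_end_marker stops before its matching page (exclusive);
    - a missing end match runs to the last page (assumed behavior).
    """
    if end_marker is not None and excluded_end_marker is not None:
        raise ValueError("give at most one end marker")
    start = 0
    if start_marker is not None:
        start = find_marker(pages, start_marker)
        if start is None:
            raise ValueError(f"no page matches start marker {start_marker!r}")
    stop = len(pages)  # exclusive slice bound
    if end_marker is not None:
        end = find_marker(pages, end_marker, start)
        if end is not None:
            stop = end + 1  # keep the matching page
    elif excluded_end_marker is not None:
        # Search after the start page so the range keeps at least one page.
        end = find_marker(pages, excluded_end_marker, start + 1)
        if end is not None:
            stop = end  # drop the matching page
    return pages[start:stop]


sample_pages = ["Cover", "Introduction", "Methods", "Results", "Conclusion", "Appendix"]
print(extract_by_markers(sample_pages, start_marker="Introduction",
                         excluded_end_marker="Conclusion"))
# ['Introduction', 'Methods', 'Results']
print(extract_by_markers(sample_pages, start_marker="Introduction",
                         end_marker="Conclusion"))
# ['Introduction', 'Methods', 'Results', 'Conclusion']
```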
the subdocument size is n
What does it do?
Limits the subdocument to a maximum number of pages.
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace n with the maximum number of pages.
Example
the subdocument size is 3
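The size phrase caps the range rather than fixing its length. The sketch below assumes, as the "Extract with Size Limit" example suggests, that the cap counts forward from the start page, so the range can still end earlier at the end page. `select_with_size` is a hypothetical helper that works on 1-based page numbers only.

```python
def select_with_size(total_pages, start_page=1, end_page=None, max_pages=None):
    """Return 1-based page numbers for a range capped at max_pages pages.

    Hypothetical model: the cap counts forward from start_page, and the
    range ends sooner if end_page or the end of the source comes first.
    """
    if not 1 <= start_page <= total_pages:
        raise ValueError(f"start_page {start_page} is outside 1..{total_pages}")
    last = total_pages if end_page is None else end_page
    if max_pages is not None:
        if max_pages < 1:
            raise ValueError("max_pages must be at least 1")
        # The cap wins only when it ends the range before last.
        last = min(last, start_page + max_pages - 1)
    return list(range(start_page, last + 1))


print(select_with_size(10, start_page=3, max_pages=2))  # [3, 4]
print(select_with_size(10, start_page=2, end_page=5, max_pages=10))  # [2, 3, 4, 5]
```

Because the size is a maximum, a cap larger than the selected range leaves the range unchanged, as the second call shows.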
the openai model is "openai-model"
What does it do?
Specifies the OpenAI model to use for marker-based extraction.
Where does it go?
Indented under extract subdocument from {the source}.
Is it required?
❌ No — This phrase is optional.
Does it require data?
✅ Yes — Replace openai-model with the name of a valid OpenAI model.
Example
the openai model is "gpt-4o"
Examples
1. Extract by Page Numbers
In this example, pages 2 through 5 are extracted as a single subdocument.
extract subdocument from the document where
    the start page is 2
    the end page is 5
2. Extract by Content Markers
Extracts from the page containing 'Introduction' up to, but not including, the page containing 'Conclusion'.
extract subdocument from the document where
    the start page marker is "Page containing the text 'Introduction'"
    the excluded end page marker is "Page containing the text 'Conclusion'"
3. Extract with Size Limit
Starts at the page containing the merchant services bank account change request form and limits the subdocument to that single page.
extract subdocument from the file where
    the start page marker is "Page containing the text 'MERCHANT SERVICES BANK ACCOUNT CHANGE REQUEST FORM.'"
    the subdocument size is 1
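4. Extract by Page Numbers with an OpenAI Model
Extracts pages 2 through 5 of the transcript and sets the OpenAI model to gpt-4o-mini. The model setting is used for marker-based extraction.
extract subdocument from the transcript where
    the start page is 2
    the end page is 5
    the openai model is "gpt-4o-mini"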
5. Extract Section by Inclusive End Marker
Extracts from the start of Section 2 through the end of Section 2, including the page that matches the end marker.
extract subdocument from the document where
    the start page marker is "The start of Section 2"
    the end page marker is "The end of Section 2"