Extract Subdocument

Extracts a subdocument from a document or file based on page numbers or markers.

Overview

This procedure extracts a subdocument from a document or file. A subdocument is a contiguous range of pages from a larger document. You can specify the range with page numbers or with markers that describe the content of the boundary pages. The extracted subdocument can then be processed independently in your automation workflow.

Make sure to add the Document Processing Book to your agent before using this automation procedure.

Syntax

Below is a line-by-line overview of the automation syntax.

extract subdocument from {the source}

What does it do?

Begins subdocument extraction from the specified source

Where does it go?

This phrase should be written on a new line.

Is it required?

✅ Yes — This phrase is required.

Does it require data?

✅ Yes — Replace the source with a reference to a document or file.

the start page is x

What does it do?

Specifies the first page of the subdocument.

Where does it go?

Indented under extract subdocument from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Replace x with a starting page number (1-based index). Defaults to the first page.

Example

the start page is 2

the end page is y

What does it do?

Specifies the last page of the subdocument.

Where does it go?

Indented under extract subdocument from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Replace y with a page number (1-based index). Defaults to the last page.

Example

the end page is 5
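
To make the page-number semantics concrete, the following Python sketch shows one way 1-based, inclusive start and end pages with the documented defaults could map onto a list of pages. It is an illustration under assumptions, not the platform's implementation: the helper name `select_pages`, the list-of-strings page model, and the out-of-range behavior are all hypothetical.

```python
from typing import List, Optional


def select_pages(pages: List[str], start_page: Optional[int] = None,
                 end_page: Optional[int] = None) -> List[str]:
    """Return pages start_page..end_page, 1-based and inclusive (hypothetical helper)."""
    # Documented defaults: start at the first page, end at the last page.
    first = 1 if start_page is None else start_page
    last = len(pages) if end_page is None else end_page
    if first < 1 or last > len(pages) or first > last:
        # Assumption: out-of-range handling is not documented; this sketch rejects it.
        raise ValueError("page range out of bounds")
    # Convert the 1-based inclusive range to a 0-based, end-exclusive slice.
    return pages[first - 1:last]


document = ["p1", "p2", "p3", "p4", "p5", "p6"]
print(select_pages(document, start_page=2, end_page=5))  # ['p2', 'p3', 'p4', 'p5']
print(select_pages(document, start_page=5))              # ['p5', 'p6']
```

Under this model, a start page of 2 and an end page of 5 select four pages, because both boundary pages are included.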

the start page marker is "start-description"

What does it do?

Uses AI to find the starting page based on a description of its content.

Where does it go?

Indented under extract subdocument from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Replace start-description with a description of what content marks the start.

Example

the start page marker is "Page containing the text 'Introduction'"

the end page marker is "end-description"

What does it do?

Uses AI to find the ending page based on a description of its content. The matching page is included in the subdocument.

Where does it go?

Indented under extract subdocument from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Replace end-description with a description of what content marks the end.

Example

the end page marker is "Page containing the text 'Conclusion'"

the excluded end page marker is "excluded-end-description"

What does it do?

Uses AI to find the ending page based on a description of its content. The matching page is not included in the subdocument.

Where does it go?

Indented under extract subdocument from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Replace excluded-end-description with a description of what content marks the end (page not included).

Example

the excluded end page marker is "Page containing the text 'Next Section'"
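
The difference between the inclusive and the excluded end marker can be sketched in Python as follows. This is a simplified model, not the platform's implementation: a plain substring search stands in for the AI lookup, and the helper names `find_page` and `extract_between` are hypothetical.

```python
from typing import List


def find_page(pages: List[str], text: str) -> int:
    """Return the 1-based number of the first page containing text.

    Assumption: a substring match stands in for the AI-based marker lookup.
    """
    for number, content in enumerate(pages, start=1):
        if text in content:
            return number
    raise ValueError(f"no page contains {text!r}")


def extract_between(pages: List[str], start_text: str, end_text: str,
                    include_end: bool) -> List[str]:
    """Slice from the start marker page to the end marker page (hypothetical helper)."""
    start = find_page(pages, start_text)
    end = find_page(pages, end_text)
    # An inclusive end marker keeps the matched page;
    # an excluded end marker stops one page earlier.
    last = end if include_end else end - 1
    return pages[start - 1:last]


document = ["Cover", "Introduction ...", "Methods ...", "Conclusion ..."]
print(extract_between(document, "Introduction", "Conclusion", include_end=True))
# ['Introduction ...', 'Methods ...', 'Conclusion ...']
print(extract_between(document, "Introduction", "Conclusion", include_end=False))
# ['Introduction ...', 'Methods ...']
```

With the same two descriptions, the only difference between the two phrases in this model is whether the matched end page is kept.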

the subdocument size is n

What does it do?

Limits the subdocument to a maximum number of pages.

Where does it go?

Indented under extract subdocument from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Replace n with the maximum number of pages.

Example

the subdocument size is 3
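
As a rough illustration of the size limit, the Python sketch below caps a selection of pages at a maximum count. It assumes the limit keeps pages from the start of the selection, which the description above implies but does not state outright; the helper name `cap_size` is hypothetical.

```python
from typing import List


def cap_size(pages: List[str], max_pages: int) -> List[str]:
    """Keep at most max_pages pages, counted from the start (hypothetical helper)."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    # Slicing past the end is safe: shorter selections are returned unchanged.
    return pages[:max_pages]


selection = ["p2", "p3", "p4", "p5"]
print(cap_size(selection, 3))   # ['p2', 'p3', 'p4']
print(cap_size(selection, 10))  # ['p2', 'p3', 'p4', 'p5']
```

Because the size is a maximum, a selection that is already shorter than the limit is returned unchanged in this sketch.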

the openai model is "openai-model"

What does it do?

Specifies the OpenAI model to use for marker-based extraction.

Where does it go?

Indented under extract subdocument from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Replace openai-model with a valid OpenAI model.

Example

the openai model is "gpt-4o"

Examples

1. Extract by Page Numbers

In this example, pages 2 through 5 are extracted as a single subdocument.

extract subdocument from the document where
    the start page is 2
    the end page is 5

2. Extract by Content Markers

Extracts from the page containing 'Introduction' up to, but not including, the page containing 'Conclusion'.

extract subdocument from the document where
    the start page marker is "Page containing the text 'Introduction'"
    the excluded end page marker is "Page containing the text 'Conclusion'"

3. Extract with Size Limit

Extracts a single page, starting at the page that contains the specified form title.

extract subdocument from the file where
    the start page marker is "Page containing the text 'MERCHANT SERVICES BANK ACCOUNT CHANGE REQUEST FORM.'"
    the subdocument size is 1

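4. Extract by Page Numbers with OpenAI Model

Extracts pages 2 through 5 of the transcript and sets the OpenAI model. As described in the syntax section, the model is used for marker-based extraction.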

extract subdocument from the transcript where
    the start page is 2
    the end page is 5
    the openai model is "gpt-4o-mini"

5. Extract Section by Inclusive End Marker

Extracts Section 2, from its first page through its last page (inclusive).

extract subdocument from the document where
    the start page marker is "The start of the Section 2"
    the end page marker is "The end of the Section 2"
