Extract a Subdocument

This operation extracts a subdocument from a document or file.

📘

Prerequisites

Before using this operation, ensure you have learned the Document Processing Book. After learning the Book, publish your Agent and create a new Playground so that the Book takes effect.

Overview

A subdocument is a subsection of a larger document, such as the pages between an introduction and a conclusion, or a single invoice within a file that contains several invoices.

Syntax

The syntax for this operation begins with the statement "extract subdocument from the document". Nested (indented) within this statement, you must specify the subdocument to extract, either by page numbers or by page markers.

⚠️

If the subdocument to be extracted is not specified, a question will be raised.

Ways to Specify a Subdocument for Extraction

1. Page Numbers

Specify the start and/or end page numbers of the subdocument:

extract subdocument from the document
    the start page is {start}
    the end page is {end}
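To make the page-number rules concrete, here is a minimal Python sketch of how a start and end page could select pages from a document. It assumes 1-based, inclusive page numbers and applies the defaults listed in the Values table (first page and last page). The function name extract_by_pages and the list-of-strings page model are hypothetical illustrations, not part of the operation; the sketch rejects out-of-range values with ValueError, which is an assumption, since how the actual operation handles them is not documented here.

```python
from typing import List, Optional


def extract_by_pages(pages: List[str], start: Optional[int] = None,
                     end: Optional[int] = None) -> List[str]:
    """Hypothetical sketch: return pages start..end (1-based, inclusive)."""
    # Defaults mirror the documented behavior: first page and last page.
    first = 1 if start is None else start
    last = len(pages) if end is None else end
    # Assumption: an out-of-range or reversed range is treated as an error.
    if not 1 <= first <= last <= len(pages):
        raise ValueError(f"invalid page range {first}-{last} for {len(pages)} pages")
    # Convert 1-based inclusive bounds into a Python slice.
    return pages[first - 1:last]


doc = [f"page {n}" for n in range(1, 8)]  # a hypothetical 7-page document
print(extract_by_pages(doc, start=2, end=5))  # ['page 2', 'page 3', 'page 4', 'page 5']
print(extract_by_pages(doc, start=6))         # ['page 6', 'page 7']
```

Omitting the end page in this sketch runs the subdocument through the last page, matching the documented default.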

2. Start and End Markers

Use textual markers to define where the subdocument begins and/or ends:

Included End Marker

extract subdocument from the document
    the start page marker is "{start marker}"
    the end page marker is "{end marker}"

Excluded End Marker

extract subdocument from the document
    the start page marker is "{start marker}"
    the excluded end page marker is "{excluded end marker}"

ℹ️

Included vs. Excluded End Markers

  • An included end marker (or simply an end marker) includes the page that matches the marker in the subdocument.
  • An excluded end marker ends the subdocument just before the page that matches the marker, so that page is not included.
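The difference between the two kinds of end marker can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the operation's implementation: the real operation interprets markers as descriptions of a page (for example "Page is the beginning of an invoice"), whereas this sketch uses a plain substring test as a stand-in matcher. It also assumes that the end marker is searched for only after the start page, that the two end markers are alternatives, and that a missing end marker means the subdocument runs to the last page. The names extract_by_markers and contains are hypothetical.

```python
from typing import Callable, List, Optional

Matcher = Callable[[str, str], bool]


def contains(page: str, marker: str) -> bool:
    # Stand-in matcher: plain substring test (an approximation, see above).
    return marker in page


def extract_by_markers(pages: List[str], start_marker: Optional[str] = None,
                       end_marker: Optional[str] = None,
                       excluded_end_marker: Optional[str] = None,
                       matches: Matcher = contains) -> List[str]:
    if end_marker is not None and excluded_end_marker is not None:
        raise ValueError("use either an end marker or an excluded end marker")
    # Begin at the first page matching the start marker, else at the first page.
    start = 0
    if start_marker is not None:
        hits = [i for i, p in enumerate(pages) if matches(p, start_marker)]
        if not hits:
            raise ValueError(f"start marker not found: {start_marker!r}")
        start = hits[0]
    stop = len(pages)  # exclusive slice bound; default is through the last page
    for i in range(start + 1, len(pages)):
        if end_marker is not None and matches(pages[i], end_marker):
            stop = i + 1  # included end marker: keep the matching page
            break
        if excluded_end_marker is not None and matches(pages[i], excluded_end_marker):
            stop = i      # excluded end marker: stop before the matching page
            break
    return pages[start:stop]


pages = ["Cover", "Introduction", "Body", "Conclusion", "Appendix"]
print(extract_by_markers(pages, "Introduction", end_marker="Conclusion"))
# ['Introduction', 'Body', 'Conclusion']
print(extract_by_markers(pages, "Introduction", excluded_end_marker="Conclusion"))
# ['Introduction', 'Body']
```

The only difference between the two branches is whether the slice bound lands after (included) or on (excluded) the matching page.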

Optional: Specifying the OpenAI Model

Optionally, you can specify which OpenAI model is used for the extraction by adding this line within the statement:

    the openai model is "{model}"

Data

Components

This table lists the names of the data components in this operation and their properties.

| Label | Renamable | Required |
|---|---|---|
| the document | Yes | Required |
| the start page | No | Optional |
| the end page | No | Optional |
| the start page marker | No | Optional |
| the end page marker | No | Optional |
| the excluded end page marker | No | Optional |
| the openai model | No | Optional |

Values

The table below lists the parameters in the operation. Parameters are placeholders for data values. In the operation's syntax, replace the parameters with your own data values as needed.

| Parameter | Description | Example Value | Required |
|---|---|---|---|
| start | The starting page number. Defaults to the first page. | 2 | Optional |
| end | The ending page number. Defaults to the last page. | 5 | Optional |
| start marker | Text indicating where the subdocument should begin. | Section 2 | Optional |
| end marker | Text indicating where the subdocument should end. The page matching the marker is included. | Page is the beginning of an invoice | Optional |
| excluded end marker | Text indicating where the subdocument should end. The page matching the marker is excluded. | Page containing the text 'Conclusion' | Optional |
| model | The OpenAI model used for the extraction. | gpt-4o | Optional |

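Examples

Because the document component is renamable, these examples refer to it by other names, such as the file or the report.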

1. Extracting by Page Numbers

extract subdocument from the file
	the start page is 2
	the end page is 5

2. Extracting by Page Markers

Included End Marker

extract subdocument from the report
	the start page marker is "Page containing the text 'Introduction'"
	the end page marker is "Page containing the text 'Conclusion'"

Excluded End Marker

extract subdocument from the manuscript
	the start page marker is "Page is the beginning of an invoice"
	the excluded end page marker is "Page is the beginning of a different invoice"

3. Extracting by Page Numbers with an OpenAI Model

extract subdocument from the transcript
	the start page is 2
	the end page is 5
	the openai model is "gpt-4o-mini"

4. Extracting by Start Marker Only

extract subdocument from the document
	the start page marker is "Invoice 230320-01"