# Extract Subdocument

### Overview

This procedure extracts a **subdocument** from a document or file. A subdocument is a contiguous range of pages taken from a larger document. You can specify the range to extract with page numbers or with markers that describe the content. The extracted subdocument can then be processed independently in your automation workflow.

{% hint style="warning" %}
Make sure to add the **Document Processing Book** to your agent before using this automation procedure.
{% endhint %}

### Syntax

Below is a line-by-line overview of the automation syntax. Expand each line to learn more.

<details>

<summary><code>extract subdocument from {the source}</code></summary>

#### What does it do?

Begins subdocument extraction from the specified source.

#### Where does it go?

This phrase should be written on a **new line**.

#### Is it required?

✅ Yes — This phrase is **required**.

#### Does it require data?

✅ Yes — Replace **the source** with a reference to a document or file.

</details>

<details>

<summary><code>the start page is x</code></summary>

#### What does it do?

Specifies the first page of the subdocument.

#### Where does it go?

Indented under `extract subdocument from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Replace **x** with a starting page number (1-based index). Defaults to the first page.

#### Example

```
the start page is 2
```

</details>

<details>

<summary><code>the end page is y</code></summary>

#### What does it do?

Specifies the last page of the subdocument *(inclusive)*.

#### Where does it go?

Indented under `extract subdocument from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Replace **y** with a page number (1-based index). Defaults to the last page.

#### Example

```
the end page is 5
```

</details>

<details>

<summary><code>the start page marker is "start-description"</code></summary>

#### What does it do?

Uses AI to find the starting page based on a description of its content.

#### Where does it go?

Indented under `extract subdocument from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Replace **start-description** with a description of what content marks the start.

#### Example

```
the start page marker is "Page containing the text 'Introduction'"
```

</details>

<details>

<summary><code>the end page marker is "end-description"</code></summary>

#### What does it do?

Uses AI to find the ending page based on a description of its content. The matched page is included in the subdocument *(inclusive)*.

#### Where does it go?

Indented under `extract subdocument from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Replace **end-description** with a description of what content marks the end.

#### Example

```
the end page marker is "Page containing the text 'Conclusion'"
```

</details>

<details>

<summary><code>the excluded end page marker is "excluded-end-description"</code></summary>

#### What does it do?

Uses AI to find the ending page based on a description of its content. The matched page is not included in the subdocument *(exclusive)*.

#### Where does it go?

Indented under `extract subdocument from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Replace **excluded-end-description** with a description of what content marks the end *(page not included)*.

#### Example

```
the excluded end page marker is "Page containing the text 'Next Section'"
```

</details>

<details>

<summary><code>the subdocument size is n</code></summary>

#### What does it do?

Limits the subdocument to a maximum number of pages.

#### Where does it go?

Indented under `extract subdocument from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Replace **n** with the maximum number of pages.

#### Example

```
the subdocument size is 3
```

</details>

<details>

<summary><code>the openai model is "openai-model"</code></summary>

#### What does it do?

Specifies the OpenAI model to use for marker-based extraction.

#### Where does it go?

Indented under `extract subdocument from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Replace **openai-model** with a valid [OpenAI model](https://docs.kognitos.com/llms#available-llm-models).

#### Example

```
the openai model is "gpt-4o"
```

</details>

### Examples

#### 1. Extract by Page Numbers

In this example, pages 2 through 5 are extracted as a single subdocument.

```
extract subdocument from the document where
    the start page is 2
    the end page is 5
```
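
To make the page arithmetic concrete, here is a minimal Python sketch of how a 1-based, inclusive page range with the documented defaults could be resolved. The function `resolve_page_range` and its parameters are hypothetical names chosen for illustration; this models the behavior described above and is not the platform's implementation.

```python
from typing import List, Optional


def resolve_page_range(total_pages: int,
                       start_page: Optional[int] = None,
                       end_page: Optional[int] = None) -> List[int]:
    """Return the 1-based page numbers in the subdocument (hypothetical helper)."""
    # Documented defaults: start at the first page, end at the last page.
    first = 1 if start_page is None else start_page
    last = total_pages if end_page is None else end_page
    # Both bounds are inclusive, so pages 2 through 5 yield four pages.
    return list(range(first, last + 1))


print(resolve_page_range(10, start_page=2, end_page=5))  # [2, 3, 4, 5]
print(resolve_page_range(3))                              # [1, 2, 3]
```

The 10-page and 3-page documents are made-up inputs; only the 1-based indexing, the inclusive bounds, and the defaults come from the syntax reference.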

#### 2. Extract by Content Markers

Extracts from the page containing 'Introduction' up to, but not including, the page containing 'Conclusion'.

```
extract subdocument from the document where
    the start page marker is "Page containing the text 'Introduction'"
    the excluded end page marker is "Page containing the text 'Conclusion'"
```
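
The difference between `the end page marker` and `the excluded end page marker` comes down to whether the matched page is kept. The Python sketch below illustrates this, assuming the AI has already resolved each marker to a page number. The helper `last_page_for_marker` and the page numbers 3 and 8 are hypothetical and used only for illustration.

```python
def last_page_for_marker(marker_page: int, inclusive: bool) -> int:
    """Map the page matched by an end marker to the last extracted page."""
    # `the end page marker` keeps the matched page;
    # `the excluded end page marker` stops one page before it.
    return marker_page if inclusive else marker_page - 1


# Assumption: 'Introduction' was matched on page 3 and 'Conclusion' on page 8.
start, conclusion = 3, 8

excluded = list(range(start, last_page_for_marker(conclusion, inclusive=False) + 1))
included = list(range(start, last_page_for_marker(conclusion, inclusive=True) + 1))

print(excluded)  # [3, 4, 5, 6, 7]
print(included)  # [3, 4, 5, 6, 7, 8]
```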

#### 3. Extract with Size Limit

Starts at the page containing a specific form and limits the subdocument to 1 page.

```
extract subdocument from the file where
    the start page marker is "Page containing the text 'MERCHANT SERVICES BANK ACCOUNT CHANGE REQUEST FORM.'"
    the subdocument size is 1
```
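
As a rough illustration of the size limit, the following Python sketch caps a candidate page range at a maximum number of pages. It assumes the limit counts from the start page and keeps the earliest pages, which matches the example above but is not confirmed beyond it. The helper `apply_size_limit` and the page numbers are hypothetical.

```python
from typing import List


def apply_size_limit(pages: List[int], max_pages: int) -> List[int]:
    """Keep at most `max_pages` pages, counting from the start page (assumed)."""
    return pages[:max_pages]


# Assumption: the form was matched on page 4 of a 9-page file and no end was given,
# so the candidate range runs to the last page.
candidate = list(range(4, 9 + 1))

print(apply_size_limit(candidate, 1))  # [4]
```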

#### 4. Extract by Page Numbers with OpenAI Model

Extracts pages 2 through 5 and sets the OpenAI model to `gpt-4o-mini`.

```
extract subdocument from the transcript where
    the start page is 2
    the end page is 5
    the openai model is "gpt-4o-mini"
```

#### 5. Extract Section by Inclusive End Marker

Extracts from the start of Section 2 through the end of Section 2, including the page matched by the end marker.

```
extract subdocument from the document where
    the start page marker is "The start of Section 2"
    the end page marker is "The end of Section 2"
```
