# Extract Tables

### Overview

This procedure extracts structured table data from documents, text, and files. It uses AI to identify and parse table structures, even when they lack clear visual boundaries. You can tune the extraction with a description of the table, a choice of model, and processing modes.

{% hint style="warning" %}
Make sure to add the **Document Processing Book** to your agent before using this automation procedure.
{% endhint %}

### Syntax

Below is a line-by-line overview of the automation syntax. Expand each line to learn more.

<details>

<summary><code>extract a table from {the source}</code></summary>

#### What does it do?

Begins table extraction from the specified source.

#### Where does it go?

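This phrase should be written on a **new line**. In the examples on this page, the line ends with `where` and the remaining phrases are indented beneath it.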

#### Is it required?

✅ Yes — This phrase is **required**.

#### Does it require data?

✅ Yes — Replace **the source** with the document, text, or file from which to extract the table.
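
#### Example

A minimal sketch of the opening line in context, assuming the source is a document already defined in the automation. The `where` keyword and the indented description follow the pattern used in the Examples section of this page; the description text is illustrative.

```
extract a table from the document where
    the description is "A table with invoice number, date and amount columns"
```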

</details>

<details>

<summary><code>the description is "table-description"</code></summary>

#### What does it do?

Describes the table structure and content to be extracted.

#### Where does it go?

Indented under `extract a table from {the source}`.

#### Is it required?

✅ Yes — This phrase is **required**.

#### Does it require data?

✅ Yes — Replace **table-description** with a detailed description of the table you want to extract.

#### Example

```
the description is "A table with invoice number, date and amount columns"
```

</details>

<details>

<summary><code>the openai model is "openai-model"</code></summary>

#### What does it do?

Specifies the OpenAI model to use for extraction.

#### Where does it go?

Indented under `extract a table from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Replace **openai-model** with a valid OpenAI model name. The default is `gpt-4o`.

#### Example

```
the openai model is "gpt-4o"
```

</details>

<details>

<summary><code>the gemini model is "gemini-model"</code></summary>

#### What does it do?

Specifies the Gemini model to use for extraction.

#### Where does it go?

Indented under `extract a table from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Replace **gemini-model** with a valid Gemini model name. The default is `gemini-2.5-pro`.

#### Example

```
the gemini model is "gemini-2.0-flash"
```

</details>

<details>

<summary><code>the visual reference is the document</code></summary>

#### What does it do?

Specifies a visual reference to guide the extraction.

#### Where does it go?

Indented under `extract a table from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — A reference to the document must be defined in the automation.
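
#### Example

A short sketch of the phrase in context, assuming `the document` is already defined in the automation and is used as both the source and the visual reference, as in the first example in the Examples section. The description text is illustrative.

```
extract a table from the document where
    the description is "A table with invoice number, date and amount columns"
    the visual reference is the document
```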

</details>

<details>

<summary><code>the extraction mode is {"precise" | "no ocr"}</code></summary>

#### What does it do?

Specifies the extraction mode for improved accuracy.

#### Where does it go?

Indented under `extract a table from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Use "**precise**" to extract with location data for higher accuracy, or "**no ocr**" to skip OCR *(Optical Character Recognition)*.

#### Example

```
the extraction mode is "precise"
```

</details>

<details>

<summary><code>the subdocument size is s</code></summary>

#### What does it do?

Specifies the maximum number of pages per subdocument for large documents.

#### Where does it go?

Indented under `extract a table from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Replace **s** with a number representing the page limit per subdocument.

#### Example

```
the subdocument size is 5
```

</details>

<details>

<summary><code>the creativity is x</code></summary>

#### What does it do?

Adjusts the creativity of the response.

#### Where does it go?

Indented under `extract a table from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Replace **x** with a number between 0.0 and 1.0. Higher values produce more creative responses.

#### Example

```
the creativity is 0.2
```

</details>

<details>

<summary><code>the strict mode is "mode"</code></summary>

#### What does it do?

Controls whether the extraction runs a second pass that validates the table headers.

#### Where does it go?

Indented under `extract a table from {the source}`.

#### Is it required?

❌ No — This phrase is **optional**.

#### Does it require data?

✅ Yes — Replace **mode** with "**on**" for two-pass extraction with header validation, or "**off**" for single-pass extraction without validation. The default is `on`.

#### Example

```
the strict mode is "off"
```

</details>

### Examples

#### 1. Extract Invoice Table with Precise Mode

```
extract a table from the document where
    the openai model is "gpt-4o"
    the visual reference is the document
    the extraction mode is "precise"
    the description is "The table has invoice number, date and amount columns. Remove currency symbols from amounts. Keep only the first 4 digits of invoice numbers."
```

#### 2. Extract Simple Table from Text

```
extract a table from the text where
    the description is "A table with employee names and their departments"
    the gemini model is "gemini-2.0-flash"
```

#### 3. Extract Table from Large Document

```
extract a table from the document where
    the openai model is "gpt-4o"
    the subdocument size is 3
    the description is "Transaction history table with date, description, and amount columns"
```
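
#### 4. Extract Table with Low Creativity and Single-Pass Mode

A sketch that combines the optional creativity and strict mode phrases. It assumes these phrases can be indented alongside the description in the same way as the other optional phrases in the examples above. The values and the description text are illustrative and reuse the per-phrase examples from the Syntax section.

```
extract a table from the document where
    the description is "A table with employee names and their departments"
    the creativity is 0.2
    the strict mode is "off"
```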

