Extract Tables

Extracts table data from documents, text, or files using AI-powered analysis.

Overview

This procedure extracts structured table data from various sources including documents, text, and files. It uses AI to identify and parse table structures, even when they don't have clear visual boundaries. You can customize the extraction with specific descriptions, models, and processing modes for optimal accuracy.

Make sure to add the Document Processing Book to your agent before using this automation procedure.

Syntax

Below is a line-by-line overview of the automation syntax.

extract a table from {the source}

What does it do?

Begins table extraction from the specified source.

Where does it go?

This phrase should be written on a new line.

Is it required?

✅ Yes — This phrase is required.

Does it require data?

✅ Yes — Replace the source with the document, text, or file from which to extract the table.
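
Example

A minimal sketch, following the pattern of the examples at the end of this page, where the opening phrase ends with where and the required description is indented beneath it. The description text here is only illustrative.

extract a table from the document where
    the description is "A table with invoice number, date and amount columns"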

the description is "table-description"

What does it do?

Describes the table structure and content to be extracted.

Where does it go?

Indented under extract a table from {the source}.

Is it required?

✅ Yes — This phrase is required.

Does it require data?

✅ Yes — Replace table-description with a detailed description of the table you want to extract.

Example

the description is "A table with invoice number, date and amount columns"

the openai model is "openai-model"

What does it do?

Specifies the OpenAI model to use for extraction.

Where does it go?

Indented under extract a table from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Replace openai-model with a valid OpenAI model name. The default is gpt-4o.

Example

the openai model is "gpt-4o"

the gemini model is "gemini-model"

What does it do?

Specifies the Gemini model to use for extraction.

Where does it go?

Indented under extract a table from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Replace gemini-model with a valid Gemini model name. The default is gemini-2.5-pro.

Example

the gemini model is "gemini-2.0-flash"

the visual reference is the document

What does it do?

Specifies a visual reference to guide the extraction.

Where does it go?

Indented under extract a table from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — A reference to the document must be defined in the automation.
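
Example

A sketch of the phrase as it appears in the invoice example on this page. It assumes the document has already been defined earlier in the automation.

the visual reference is the document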

the extraction mode is {"precise" | "no ocr"}

What does it do?

Specifies the extraction mode for improved accuracy.

Where does it go?

Indented under extract a table from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Use "precise" to apply location data for higher accuracy, or "no ocr" to skip OCR (Optical Character Recognition).

Example

the extraction mode is "precise"

the subdocument size is s

What does it do?

Specifies the maximum number of pages per subdocument for large documents.

Where does it go?

Indented under extract a table from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Replace s with a number representing the page limit per subdocument.

Example

the subdocument size is 5

the creativity is x

What does it do?

Adjusts the creativity of the response.

Where does it go?

Indented under extract a table from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Replace x with a number between 0.0 and 1.0. Higher values produce more creative responses; lower values produce more consistent ones.

Example

the creativity is 0.2

the strict mode is "mode"

What does it do?

Controls whether the extraction runs a second pass that validates the table headers.

Where does it go?

Indented under extract a table from {the source}.

Is it required?

❌ No — This phrase is optional.

Does it require data?

✅ Yes — Replace mode with "on" for two-pass extraction with header validation, or "off" for single-pass extraction without validation. The default is "on".

Example

the strict mode is "off"

Examples

1. Extract Invoice Table with Precise Mode

extract a table from the document where
    the openai model is "gpt-4o"
    the visual reference is the document
    the extraction mode is "precise"
    the description is "The table has invoice number, date and amount columns. Remove currency symbols from amounts. Keep only the first 4 digits of invoice numbers."

2. Extract Simple Table from Text

extract a table from the text where
    the description is "A table with employee names and their departments"
    the gemini model is "gemini-2.0-flash"

3. Extract Table from Large Document

extract a table from the document where
    the openai model is "gpt-4o"
    the subdocument size is 3
    the description is "Transaction history table with date, description, and amount columns"
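
4. Extract Table with Low Creativity and Strict Mode Off

The three examples above do not use the creativity or strict mode phrases. The following sketch combines them using only the syntax documented on this page; the table description is illustrative, and how these two options interact in practice is not covered here.

extract a table from the document where
    the description is "A table with product name, quantity, and unit price columns"
    the creativity is 0.2
    the strict mode is "off"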

