Intelligent Document Processing (IDP)
Overview of the Intelligent Document Processing (IDP) integration.
The following documentation is for Intelligent Document Processing (IDP) v4.15.0.
Overview
Intelligent Document Processing (IDP) extracts structured data from documents using AI. This integration enables automated document analysis, data extraction, and intelligent processing workflows.
Setup
The following integrations need to be connected to your Kognitos workspace:
Intelligent Document Processing (IDP)
Steps
Follow these steps to connect the integration in Kognitos:
Configure
Add a name for the connection. If prompted, enter the authentication details, then click Connect.
Authentication
Use one of the following authentication methods to connect this integration in Kognitos. Each method has its own configuration requirements.
Connect using API Key
Connect to Anthropic API for document processing.
API Key
The Anthropic API key
sensitive
Connect using Service Account Credentials and Region
Connect to Google Vertex AI (Gemini) API for document processing.
Service Account Credentials
The Google service account credentials JSON, as a string
sensitive
Region
The Google Cloud region
text
Connect using API Key
Connect to OpenAI API for document processing.
API Key
The OpenAI API key
sensitive
Actions
The following actions are available in the Intelligent Document Processing (IDP) integration:
1. Analyze a thing
Analyze a document to extract its structure and content.
2. Classify a thing
Classify a document against user-defined topics or rules.
3. Extract data from a thing
Extract data from a document using field specifications.
4. Extract data from the documents
Extract data from multiple documents in a single LLM call.
5. Extract pages from a thing
Extract specified pages from a document.
6. Extract subdocument from a thing
Extract a subdocument from a document based on page numbers or markers.
7. Extract subdocuments from a thing
Extract subdocuments from a document based on markers or fixed size with overlap.
8. Extract table from a thing
Extract a table from a document using field specifications.
9. Merge pages into a document
Merge multiple page documents into a single consolidated document.
10. Merge subdocuments into a document
Merge multiple subdocuments into a single consolidated document.
11. Read content from a thing
Read/extract text content from a document.
Concepts
Document analysis specification
Input for document analysis (schema-free extraction). All fields are optional.
llm_model
[Optional, default: provider default] LLM model to use. Provider defaults: gemini-2.5-pro, gpt-4o, claude-sonnet-4-5.
optional[text]
analysis_mode
[Optional, default: "single_pass"] Analysis mode. Options:
- "single_pass": Sequential per-page extraction (simplest)
- "parallel": Parallel extraction with smart deduplication
- "plan_based": Plan-based extraction (best for complex docs)
optional[text]
verify
[Optional, default: False] If True, performs cross-model verification to detect hallucinations.
optional[boolean]
verification_strictness
[Optional, default: "high"] Strictness level for verification. Options: "high", "moderate", "low".
optional[text]
dpi
[Optional, default: 150] DPI for image processing.
optional[number? or text]
confidence_threshold
[Optional, default: 0] Minimum confidence score (0-100) for extracted elements. Behavior:
- When set to 0 (default): No threshold checking. All elements are returned regardless of confidence. The caller can inspect the confidence attribute on individual entities, key_value_pairs, and tables to decide how to handle low-confidence data.
- When set to 1-100: Acts as a hard limit. If any element (entity, key-value pair, or table) has confidence below this threshold, AnalysisError is raised. The exception contains:
  - analysis: The full DocumentAnalysis result (still usable)
  - low_confidence_elements: List of elements that failed
  - threshold: The threshold value that was not met
Example: Set confidence_threshold=80 to fail if any element has confidence below 80. Catch AnalysisError to access both the full result and the list of problematic elements.
optional[number]
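The threshold behavior described for confidence_threshold can be sketched in plain Python. This is an illustration only: the AnalysisError class below is a local stand-in that carries the three attributes named in the description, and check_threshold and the element dictionaries are hypothetical, not part of the integration.

```python
class AnalysisError(Exception):
    """Local stand-in for the error described above (not the integration's class)."""

    def __init__(self, analysis, low_confidence_elements, threshold):
        super().__init__(
            f"{len(low_confidence_elements)} element(s) below threshold {threshold}"
        )
        self.analysis = analysis
        self.low_confidence_elements = low_confidence_elements
        self.threshold = threshold


def check_threshold(analysis, threshold=0):
    """Hypothetical helper: return the analysis, or raise if any element is too low."""
    if threshold == 0:
        return analysis  # 0 disables threshold checking
    elements = (
        analysis.get("entities", [])
        + analysis.get("key_value_pairs", [])
        + analysis.get("tables", [])
    )
    low = [e for e in elements if e["confidence"] < threshold]
    if low:
        raise AnalysisError(analysis, low, threshold)
    return analysis


# Made-up analysis result with one low-confidence key-value pair.
analysis = {
    "entities": [{"text": "ACME Corp", "confidence": 95}],
    "key_value_pairs": [{"key": "Total", "value": "120.00", "confidence": 72}],
    "tables": [],
}

caught = None
try:
    check_threshold(analysis, threshold=80)
except AnalysisError as err:
    caught = err  # the full result is still available as caught.analysis

flagged = [e.get("key") or e.get("text") for e in caught.low_confidence_elements]
print(flagged)  # ['Total']
```

With the default threshold of 0 the same call returns the analysis unchanged, which leaves the decision about low-confidence elements to the caller.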
Idp document analysis
Complete analysis of a document. This is the output of schema-free document analysis, containing all detected elements including entities, tables, forms, and structure. All elements have a consistent structure with:
- element_type: For UI discrimination after serialization
- page_number: Page where element was found (1-indexed)
- bounding_box: Optional location on page
- confidence: Optional confidence score (0-100)
result_type
Top-level discriminator for API response type.
optional[text]
document_type
The type of document (e.g., invoice, receipt, form).
optional[text]
document_title
The title of the document if detected.
optional[text]
source
The source file name or identifier.
optional[text]
full_text
Full text content from markdown normalization.
optional[text]
confidence
Overall confidence score for the analysis (0-100).
optional[number]
verified
Whether the analysis has been verified.
optional[boolean]
hallucinations_detected
Whether hallucinations were detected during verification.
optional[boolean]
analysis_mode
The analysis mode used (single_pass, parallel, or plan_based).
optional[text]
Idp extracted table
A detected table in the document. This class is returned by both extract_table_from_thing (directly) and analyze_thing (as part of DocumentAnalysis.tables).
element_type
Always "table" regardless of which procedure produced it.
optional[text]
page_number
Page number where table was found (1-indexed).
optional[number]
confidence
Confidence score (0-100).
optional[number]
is_handwritten
Whether the table appears to be handwritten.
optional[boolean]
title
Table title/caption.
optional[text]
headers
Column headers if detected.
optional[list of text]
rows
Table rows (list of cell values).
optional[list of text]
num_rows
Number of rows.
optional[number]
num_cols
Number of columns.
optional[number]
arrow_table
PyArrow Table representation of the data.
optional[table?]
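As a rough illustration of how the headers and rows attributes fit together, the sketch below turns a table into a list of per-row records. It assumes each row arrives as a list of cell values in column order; table_to_records and the sample table are hypothetical and not part of the integration.

```python
def table_to_records(table):
    """Hypothetical helper: pair each row's cells with the column headers."""
    rows = table.get("rows") or []
    # Fall back to generic column names when no headers were detected.
    headers = table.get("headers") or [
        f"column_{i + 1}" for i in range(table.get("num_cols") or 0)
    ]
    return [dict(zip(headers, row)) for row in rows]


# Made-up table shaped like the attributes listed above.
sample_table = {
    "element_type": "table",
    "page_number": 2,
    "title": "Line items",
    "headers": ["Item", "Qty"],
    "rows": [["Widget", "3"], ["Gadget", "1"]],
    "num_rows": 2,
    "num_cols": 2,
}

records = table_to_records(sample_table)
print(records)  # [{'Item': 'Widget', 'Qty': '3'}, {'Item': 'Gadget', 'Qty': '1'}]
```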
Idp classification input
Input for document classification. All fields are optional. At least one of topics, prompt, or classification_rules (on procedure) must be provided.
topics
[Optional, default: None] List of ClassificationTopic objects to classify the document against. Highest priority if provided.
optional[list of idp classification topic]
prompt
[Optional, default: None] Classification instructions or context. If no topics provided, topics will be extracted from this prompt. Example: "Is this an invoice or a receipt?"
optional[text]
llm_model
[Optional, default: provider default] LLM model to use. Provider defaults: gemini-2.5-pro, gpt-4o, claude-sonnet-4-5.
optional[text]
include_reasoning
[Optional, default: True] Whether to include reasoning explanation for each classification decision.
optional[boolean]
confidence_threshold
[Optional, default: 50] Minimum confidence score (0-100) to consider a topic as matched.
optional[number]
Idp classification topic
A topic to classify the document against. Topics define what the classification should look for in a document. They can be simple labels or detailed criteria for the LLM to evaluate.
name
The topic name (e.g., "Contains PII", "Is Invoice").
optional[text]
description
Optional description to help understand the topic.
optional[text]
criteria
Optional specific criteria for evaluating this topic.
optional[text]
Idp classification result
Complete classification result for a document. Contains classification results for all evaluated topics, with an overall summary and confidence score.
result_type
Top-level API response discriminator.
optional[text]
source
Source document name or identifier.
optional[text]
List of classification results for each topic.
optional[list of idp topic classification]
summary
Overall summary of the classification results.
optional[text]
confidence
Overall confidence score (average across topics, 0-100).
optional[number]
processing_time_ms
Time taken to process the classification.
optional[number]
Idp topic classification
Classification result for a single topic. Represents whether a document matches a specific topic, with confidence scoring and optional reasoning.
element_type
Discriminator for this element type in API responses.
optional[text]
topic
The topic name that was evaluated.
optional[text]
matches
Whether the document matches this topic.
optional[boolean]
confidence
Confidence score (0-100) for this classification.
optional[number]
reasoning
Explanation for why this classification was made.
optional[text]
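To show how per-topic results relate to the overall result, here is a minimal sketch that collects the matched topics and averages the per-topic confidence scores. The sample data is made up, and the plain arithmetic mean is an assumption: the documentation says the overall confidence is an average across topics but does not specify rounding or weighting.

```python
from statistics import mean

# Made-up per-topic results shaped like the attributes listed above.
topic_results = [
    {"topic": "Is Invoice", "matches": True, "confidence": 92,
     "reasoning": "Contains an invoice number and a total due."},
    {"topic": "Contains PII", "matches": False, "confidence": 64,
     "reasoning": "No personal identifiers found."},
]

# Topics the document was judged to match.
matched_topics = [t["topic"] for t in topic_results if t["matches"]]

# Assumed: an unweighted mean of the per-topic confidence scores.
overall_confidence = mean(t["confidence"] for t in topic_results)

print(matched_topics)      # ['Is Invoice']
print(overall_confidence)  # 78
```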
Field extraction specification
Input for data extraction from documents. Specify extraction requirements using either fields or prompt (not both).
fields
[Required, one of] List of ExtractField objects defining what to extract from the document.
optional[list of json]
prompt
[Required, one of] Natural language description of what to extract. The LLM will parse this into fields automatically. Example: "Extract the invoice number, total amount, and vendor"
optional[text]
llm_model
[Optional, default: provider default] LLM model to use. Provider defaults: gemini-2.5-pro, gpt-4o, claude-sonnet-4-5.
optional[text]
common_default_value
[Optional, default: ""] Default value to use for all fields if no specific default is provided.
optional[text]
dpi
[Optional, default: 150] DPI for image processing.
optional[number? or text]
confidence_threshold
[Optional, default: 90] Minimum confidence score (0-100). Fields below this threshold cause ExtractionError.
optional[number]
generate_overlay
[Optional, default: False] If True, generates an annotated PDF with bounding boxes around extracted fields.
optional[boolean]
reconcile_locations
[Optional, default: True] If True, matches LLM-extracted boxes with precise PDF text locations for digital PDFs. Falls back to LLM locations for scanned documents.
optional[boolean]
verify
[Optional, default: False] If True, performs cross-model verification to detect hallucinations and validate extracted values exist in the source document.
optional[boolean]
verification_strictness
[Optional, default: "high"] Strictness level for verification. Options:
- "high": Zero tolerance, cell-by-cell exact matching
- "moderate": Allows formatting differences
- "low": Only flags clear hallucinations
optional[text]
raise_exception
[Optional, default: True] If True, raises ExtractionError when fields are missing or have low confidence. If False, returns partial results.
optional[boolean]
business_rules
[Optional, default: None] File (.txt, .md, .docx) containing extraction guidelines to include in the LLM prompt.
optional[file]
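The "fields or prompt, not both" rule can be checked before a specification is used. The sketch below is a hypothetical pre-flight check written for illustration; validate_extraction_spec is not part of the integration, and the dictionary layout simply mirrors the attribute names listed above.

```python
def validate_extraction_spec(spec):
    """Hypothetical check: exactly one of 'fields' or 'prompt', threshold in 0-100."""
    has_fields = bool(spec.get("fields"))
    has_prompt = bool(spec.get("prompt"))
    if has_fields == has_prompt:
        raise ValueError("Provide exactly one of 'fields' or 'prompt'")
    threshold = spec.get("confidence_threshold", 90)  # documented default is 90
    if not 0 <= threshold <= 100:
        raise ValueError("confidence_threshold must be between 0 and 100")
    return spec


# Field-based specification; 'rule' and 'default_value' are per-field attributes.
spec = validate_extraction_spec({
    "fields": [
        {"name": "invoice_number"},
        {"name": "total_amount",
         "rule": "Use the grand total, including tax",
         "default_value": "0"},
    ],
    "confidence_threshold": 85,
    "raise_exception": False,  # return partial results instead of raising
})

# Prompt-based alternative, using the example prompt from the description.
prompt_spec = validate_extraction_spec({
    "prompt": "Extract the invoice number, total amount, and vendor",
})
```

Supplying both keys, or neither, raises ValueError in this sketch; how the integration itself reports that case is not documented here.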
Extraction result
Wrapper for extraction results providing self-describing API response. This class wraps the list of extracted fields to provide a consistent, self-describing response format that matches the pattern used by DocumentAnalysis from analyze_thing. The UI can use result_type to determine how to render the response:
- "extraction_result" -> render as field list
- "document_analysis" -> render as structured analysis
result_type
Type discriminator for UI rendering ("extraction_result").
optional[text]
document
Primary document filename (for single-doc extraction).
optional[text]
document_count
Number of documents processed (for multi-doc).
optional[number]
confidence
Average confidence across all fields (0-100).
optional[number]
Document field
A dataclass representing an extraction field result. This class defines the structure for field results of data extraction, including the field name, extracted values, confidence, document source, and location. It shares common attributes with KeyValuePair from analysis for consistent API responses.
Common attributes with analysis models:
- element_type: Type discriminator for UI rendering
- page_number: 1-indexed page number
- bounding_box: Location on page (BoundingBox format)
- confidence: Confidence score (0-100)
- verification: Validation result
Extraction-specific attributes:
- name: Field name (equivalent to 'key' in KeyValuePair)
- values: List of extracted values
- document: Source document reference
- document_index: For multi-document extraction
element_type
Type of element for UI discrimination ("extracted_field").
optional[text]
name
The name of the extraction field.
optional[text]
values
List of extracted values for the field.
optional[list of list of any?]
document
Full document reference where the value was found.
optional[text]
document_index
0-based index of the document (multi-document only).
optional[number]
page_number
1-indexed page number where the value was found.
optional[number]
confidence
Confidence score (0-100).
optional[number]
Page extraction specification
Input for extracting specific pages from a document. All fields are optional.
start_page
[Optional, default: first page] First page to extract (1-based index).
optional[number]
end_page
[Optional, default: last page] Last page to extract (1-based index).
optional[number]
llm_model
[Optional, default: provider default] LLM model to use. Provider defaults: gemini-2.5-pro, gpt-4o, claude-sonnet-4-5.
optional[text]
dpi
[Optional, default: 150] DPI for image processing.
optional[number? or text]
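Because start_page and end_page are 1-based and default to the first and last page, mapping them onto a zero-based Python list takes a small adjustment. The sketch below shows one way to do it, assuming end_page is inclusive; resolve_page_range is a hypothetical helper, not part of the integration.

```python
def resolve_page_range(num_pages, start_page=None, end_page=None):
    """Hypothetical helper: apply the documented defaults and validate the range."""
    start = 1 if start_page is None else start_page
    end = num_pages if end_page is None else end_page
    if not 1 <= start <= end <= num_pages:
        raise ValueError(f"invalid page range {start}-{end} for {num_pages} pages")
    return start, end


pages = ["p1", "p2", "p3", "p4", "p5"]

start, end = resolve_page_range(len(pages), start_page=2, end_page=4)
# 1-based inclusive range -> 0-based half-open slice
selected = pages[start - 1:end]
print(selected)  # ['p2', 'p3', 'p4']
```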
Page extraction result
Result of extracting pages from a document.
Subdocument extraction specification
Input for extracting a subdocument based on pages or markers. All fields are optional. Use page numbers OR markers, not both.
llm_model
[Optional, default: provider default] LLM model to use. Provider defaults: gemini-2.5-pro, gpt-4o, claude-sonnet-4-5.
optional[text]
start_page
[Optional, default: first page] First page of the subdocument (1-based index).
optional[number]
end_page
[Optional, default: last page] Last page of the subdocument (1-based index).
optional[number]
start_page_marker
[Optional, default: None] Text/logic to find the starting page. Example: "Page containing 'Introduction'"
optional[text]
end_page_marker
[Optional, default: None] Text/logic to find the ending page (inclusive).
optional[text]
excluded_end_page_marker
[Optional, default: None] Text/logic to find the ending page (exclusive - page before this marker).
optional[text]
subdocument_size
[Optional, default: None] Maximum number of pages in the subdocument. Cannot be used with end markers.
optional[number]
dpi
[Optional, default: 150] DPI for image processing.
optional[number? or text]
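The two constraints stated for this specification (page numbers or markers but not both, and no subdocument_size together with an end marker) can be expressed as a small check. This is a sketch under those stated rules only; check_subdocument_spec is hypothetical, and the integration may enforce additional rules or report violations differently.

```python
PAGE_KEYS = ("start_page", "end_page")
END_MARKER_KEYS = ("end_page_marker", "excluded_end_page_marker")
MARKER_KEYS = ("start_page_marker",) + END_MARKER_KEYS


def check_subdocument_spec(spec):
    """Hypothetical check: return a list of rule violations (empty when valid)."""
    def is_set(key):
        return spec.get(key) is not None

    problems = []
    if any(map(is_set, PAGE_KEYS)) and any(map(is_set, MARKER_KEYS)):
        problems.append("use page numbers or markers, not both")
    if is_set("subdocument_size") and any(map(is_set, END_MARKER_KEYS)):
        problems.append("subdocument_size cannot be combined with an end marker")
    return problems


# Valid: a start marker plus a maximum size.
ok = check_subdocument_spec({
    "start_page_marker": "Page containing 'Introduction'",
    "subdocument_size": 3,
})

# Invalid: mixes a page number with a marker.
mixed = check_subdocument_spec({"start_page": 1, "end_page_marker": "Appendix"})

print(ok)     # []
print(mixed)  # ['use page numbers or markers, not both']
```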
Subdocument extraction result
Result of extracting a subdocument from a document.
result_type
Type discriminator for API response handling.
optional[text]
subdocument
The extracted subdocument as an IO object.
file
source_document
Reference to the source document.
optional[text]
start_page
Starting page number (1-based) of the subdocument.
optional[number]
end_page
Ending page number (1-based) of the subdocument.
optional[number]
Subdocuments extraction specification
Input for extracting multiple subdocuments using markers or fixed size. All fields are optional. Supports two extraction strategies:
1. Marker-based: Split at pages matching a text pattern
2. Fixed-size: Split into N-page chunks with optional overlap
llm_model
[Optional, default: provider default] LLM model to use. Provider defaults: gemini-2.5-pro, gpt-4o, claude-sonnet-4-5.
optional[text]
start_page
[Optional, default: first page] First page to process (1-based index).
optional[number]
end_page
[Optional, default: last page] Last page to process (1-based index).
optional[number]
start_page_marker
[Optional, default: None] Text/logic to find where each subdocument starts. Example: "INVOICE NUMBER"
optional[text]
end_page_marker
[Optional, default: None] Text/logic to find where each subdocument ends (inclusive). Requires start_page_marker.
optional[text]
excluded_end_page_marker
[Optional, default: None] Text/logic to find where each subdocument ends (exclusive). Requires start_page_marker.
optional[text]
subdocument_size
[Optional, default: None] Fixed number of pages per subdocument for chunking mode.
optional[number]
subdocument_overlap_size
[Optional, default: None] Number of overlapping pages between consecutive chunks. Requires subdocument_size.
optional[number]
dpi
[Optional, default: 150] DPI for image processing.
optional[number? or text]
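The fixed-size strategy with subdocument_size and subdocument_overlap_size can be pictured as a sliding window over the pages. The sketch below is one plausible reading, assuming each chunk starts (size minus overlap) pages after the previous one and the last chunk is cut off at the final page; chunk_page_ranges is hypothetical, and the integration's exact handling of the final chunk is not documented here.

```python
def chunk_page_ranges(num_pages, size, overlap=0):
    """Hypothetical helper: 1-based inclusive (start, end) ranges for each chunk."""
    if size < 1:
        raise ValueError("size must be at least 1")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    ranges = []
    start = 1
    while start <= num_pages:
        end = min(start + size - 1, num_pages)
        ranges.append((start, end))
        if end == num_pages:
            break  # the last page has been covered
        start += step
    return ranges


print(chunk_page_ranges(10, size=4, overlap=1))  # [(1, 4), (4, 7), (7, 10)]
print(chunk_page_ranges(5, size=2))              # [(1, 2), (3, 4), (5, 5)]
```

With an overlap of 1, the last page of each chunk is repeated as the first page of the next, which can help when a record straddles a chunk boundary.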
Subdocuments extraction result
Result of extracting multiple subdocuments from a document.
result_type
Type discriminator for API response handling.
optional[text]
source_document
Reference to the source document.
optional[text]
Table extraction specification
Input for table extraction from documents.
description
[Required] Description of the table to extract. Example: "Extract the Previous Employment History table"
text
llm_model
[Optional, default: provider default] LLM model to use. Provider defaults: gemini-2.5-pro, gpt-4o, claude-sonnet-4-5.
optional[text]
dpi
[Optional, default: 150] DPI for image processing.
optional[number? or text]
confidence_threshold
[Optional, default: 0] Minimum confidence score (0-100) for the extracted table. If below threshold, raises ExtractionError. Set to 0 to disable threshold checking.
optional[number]
raise_exception
[Optional, default: True] If True, raises ExtractionError when confidence is below threshold. If False, returns the result with low confidence instead of raising.
optional[boolean]
verify
[Optional, default: False] If True, performs cross-model verification to detect hallucinations and validate extracted table data exists in the source document.
optional[boolean]
verification_strictness
[Optional, default: "high"] Strictness level for verification. Options:
- "high": Zero tolerance, cell-by-cell exact matching
- "moderate": Allows formatting differences
- "low": Only flags clear hallucinations
optional[text]
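The interaction between confidence_threshold and raise_exception can be summarized in a few lines. The sketch below only models the documented combinations; ExtractionError here is a local stand-in, and accept_table is a hypothetical helper rather than the integration's implementation.

```python
class ExtractionError(Exception):
    """Local stand-in for the error named above (not the integration's class)."""


def accept_table(table, confidence_threshold=0, raise_exception=True):
    """Hypothetical helper: apply the threshold rules to an extracted table."""
    # A threshold of 0 disables checking entirely.
    below = confidence_threshold > 0 and table["confidence"] < confidence_threshold
    if below and raise_exception:
        raise ExtractionError(
            f"table confidence {table['confidence']} is below {confidence_threshold}"
        )
    return table  # returned as-is, even when confidence is low


# Made-up extracted table with a moderate confidence score.
table = {"element_type": "table", "title": "Employment history", "confidence": 70}

kept = accept_table(table)                                   # threshold disabled
lenient = accept_table(table, confidence_threshold=80,
                       raise_exception=False)                # low, but returned

try:
    accept_table(table, confidence_threshold=80)             # low, and raises
    strict_raised = False
except ExtractionError:
    strict_raised = True
```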
Idp table extraction result
Wrapper for table extraction results. This provides a consistent API response format for extract_table_from_thing, with a result_type discriminator for UI rendering.
Merge document result
Result of merging pages or subdocuments into a document.
result_type
Type discriminator for API response handling.
optional[text]
document
The merged document as an IO object.
file
document_name
Name of the merged document.
optional[text]
source_count
Number of source pages/subdocuments that were merged.
optional[number]
Read content result
Result of reading/extracting text content from a document.
result_type
Type discriminator for API response handling.
optional[text]
content
The extracted text content.
optional[text]
source_document
Reference to the source document.
optional[text]
Concept attribute specifications
key_value_pairs (idp document analysis)
element_type
optional[text]
page_number
optional[number]
bounding_box
optional[json]
confidence
optional[number]
is_handwritten
optional[boolean]
key
optional[text]
value
optional[number? or text]
verification
optional[json]
forms (idp document analysis)
element_type
optional[text]
page_number
optional[number]
bounding_box
optional[json]
confidence
optional[number]
is_handwritten
optional[boolean]
title
optional[text]
fields
optional[json]
entities (idp document analysis)
element_type
optional[text]
page_number
optional[number]
bounding_box
optional[json]
confidence
optional[number]
is_handwritten
optional[boolean]
entity_type
optional[text]
text
optional[text]
normalized_value
optional[any?]
context
optional[text]
verification
optional[json]
text_blocks (idp document analysis)
element_type
optional[text]
page_number
optional[number]
bounding_box
optional[json]
confidence
optional[number]
is_handwritten
optional[boolean]
content
optional[text]
style
optional[text]
level
optional[number]
items
optional[text]
list_type
optional[text]
lists (idp document analysis)
element_type
optional[text]
page_number
optional[number]
bounding_box
optional[json]
confidence
optional[number]
is_handwritten
optional[boolean]
list_type
optional[text]
items
optional[text]
metrics (idp document analysis)
num_pages
optional[number]
bounding_box (idp extracted table)
x
optional[number]
y
optional[number]
width
optional[number]
height
optional[number]
verification (idp extracted table)
is_valid
optional[boolean]
adjusted_confidence
optional[number]
value_found_in_document
optional[boolean]
issues
optional[text]
suggested_correction
optional[text]
metrics (idp classification result)
num_pages
optional[number]
fields (field extraction specification)
name
text
format
optional[text]
rule
optional[text]
default_value
optional[any?]
metrics (extraction result)
num_pages
optional[number]
bounding_box (document field)
x
optional[number]
y
optional[number]
width
optional[number]
height
optional[number]
verification (document field)
is_valid
optional[boolean]
adjusted_confidence
optional[number]
value_found_in_document
optional[boolean]
issues
optional[text]
suggested_correction
optional[text]
metrics (page extraction result)
num_pages
optional[number]
metrics (subdocument extraction result)
num_pages
optional[number]
metrics (subdocuments extraction result)
num_pages
optional[number]
metrics (idp table extraction result)
num_pages
optional[number]
metrics (merge document result)
num_pages
optional[number]
metrics (read content result)
num_pages
optional[number]