Document Processing
Learn how to process documents and extract information from them in Kognitos.
Prerequisites
Learning the Document Processing Book
Before you can extract information from documents, follow these steps to learn the Document Processing Book:
- In the left sidebar, click on Books.
- Use the search bar to find "Document Processing."
- Click on +Book to open the Add New Book pop-up.
- Click on Add to finish adding the book.
Introducing Documents and File Objects
To introduce a document or file object in your automation, use any of the following lines:
the file
get the file
use the file
the document
get the document
use the document
Which Keyword Should I Use?
For a detailed explanation of each keyword and how they differ, check out our Automation Basics guide.
Each of these lines raises a Question in Kognitos asking you to provide the file or document. The automation needs that object before it can continue.
To Upload a Local File
- Click on Select a method to open the drop-down menu.
- Select Upload files. Refer to the table below for supported file types.
- Upload your file or document and click Submit.
After uploading, you can perform additional operations on the document or file within the platform.
Category | Supported File Types for Upload |
---|---|
Text Documents | .pdf, .docx |
ERP Document | .edi |
Image | .jpeg, .jpg, .png, .tif, .tiff |
Data | .txt, .json, .yml, .yaml, .csv |
Spreadsheets | .xlsx, .xls, .csv |
HTML | .html |
Email | .eml |
Audio | .mp3, .wav |
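If you stage files with a script before uploading them, you can check extensions against the table first to catch unsupported files early. The Python sketch below is a hypothetical helper and is not part of Kognitos. It copies the table into a dictionary, labeling the .eml row "Email" as an assumption, and reports which categories list a file's extension. It only inspects the file name, not the file contents.

```python
from pathlib import Path

# Supported upload types copied from the table above.
# The .eml row is labeled "Email" here; treat that label as an assumption.
SUPPORTED_UPLOAD_TYPES = {
    "Text Documents": {".pdf", ".docx"},
    "ERP Document": {".edi"},
    "Image": {".jpeg", ".jpg", ".png", ".tif", ".tiff"},
    "Data": {".txt", ".json", ".yml", ".yaml", ".csv"},
    "Spreadsheets": {".xlsx", ".xls", ".csv"},
    "HTML": {".html"},
    "Email": {".eml"},
    "Audio": {".mp3", ".wav"},
}


def upload_categories(filename: str) -> list[str]:
    """Return every category that lists this file's extension (empty if none)."""
    suffix = Path(filename).suffix.lower()
    return [category for category, extensions in SUPPORTED_UPLOAD_TYPES.items()
            if suffix in extensions]


def is_supported(filename: str) -> bool:
    # Hypothetical pre-upload check: it looks only at the extension.
    return bool(upload_categories(filename))


for name in ["Invoice-0042.PDF", "scan.tiff", "notes.rtf"]:
    print(name, is_supported(name), upload_categories(name))
```

A .csv file appears in two categories because the table lists it under both Data and Spreadsheets.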
Operations for Document Processing
1. Extracting Fields from a Document
Use the get keyword to extract a specific field from a document or file.
Syntax
get the document's {field}
get the file's {field}
Input Parameters
field: The item from the document that you want to extract.
- Examples: invoice number, id, name, date of birth, supplier name
Examples
get the document's invoice number
get the document's tax id
get the file's supplier name
Don't Forget "the"!
When writing your statements, remember to include the keyword the, as it's necessary to introduce a new field in your automation.
Video Demonstration
This video shows how Kognitos simplifies extracting structured data from scanned documents. Using AI and tools such as Amazon Textract, you can retrieve specific values, tables, or other data points with natural-language commands.
What if Multiple Fields Share the Same Name?
If a document contains multiple fields with the same name, Kognitos will raise a Question prompting the user to select which value to use for an extracted field.
Example
Consider a document with the following information:
Customer Name: John Doe
Address: 1234 Elm Street
Address: 5678 Oak Avenue
If your automation is written like this:
the document
get the document's address
Kognitos will raise a Question: Multiple values found for address, please pick one. To respond, you can select Pick selected value from the drop-down menu and choose the desired address to use.
To avoid this prompt, use relative indicators like first, second, or last to specify which value to use:
get the document's first address
get the document's last address
For added clarity, you can also rename the fields:
get the document's first address as the home address
get the document's last address as the business address
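The Python model below makes this selection rule concrete. It is a sketch under our own assumptions and not Kognitos's implementation. The document is represented as a list of (label, value) pairs, and `get_field`, `ORDINALS`, and `AmbiguousField` are hypothetical names. If you give no indicator and the name is duplicated, the function raises an error where Kognitos would raise a Question. With an indicator such as first or last, it picks one match by position.

```python
from typing import Optional

# Relative indicators from the text, mapped to list positions (hypothetical helper).
ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}


class AmbiguousField(Exception):
    """Stands in for the Question Kognitos raises when a name matches several values."""


def get_field(fields: list[tuple[str, str]], name: str,
              ordinal: Optional[str] = None) -> str:
    # Collect every value whose label matches the requested name (case-insensitive).
    matches = [value for label, value in fields if label.lower() == name.lower()]
    if not matches:
        raise KeyError(name)
    if ordinal is None:
        if len(matches) > 1:
            # No indicator given: the choice is ambiguous, so ask instead of guessing.
            raise AmbiguousField(f"Multiple values found for {name}, please pick one")
        return matches[0]
    # An indicator such as "first" or "last" picks one match by position;
    # asking for a position that doesn't exist raises IndexError.
    return matches[ORDINALS[ordinal]]


# The example document from this section, as (label, value) pairs.
document = [
    ("Customer Name", "John Doe"),
    ("Address", "1234 Elm Street"),
    ("Address", "5678 Oak Avenue"),
]

home_address = get_field(document, "address", "first")      # "1234 Elm Street"
business_address = get_field(document, "address", "last")   # "5678 Oak Avenue"
```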
Don't Worry About Field Name Formats
Kognitos is flexible about extracting specific fields. For example, if a document contains a field called Trailer No., you can extract it by writing:
get the document's trailer number
Even though the field is labeled Trailer No. in the document, the automation recognizes the variation in naming, so you can refer to it as trailer number.
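Kognitos handles these naming variations with AI, so there is no fixed rule to copy. As a rough illustration only, the Python sketch below approximates the idea with a hypothetical abbreviation table. It normalizes both the document's label and the name you ask for (lowercasing, dropping punctuation, and expanding abbreviations) and then compares them. The abbreviation list, field values, and function names are all assumptions.

```python
import re
from typing import Optional

# Hypothetical abbreviation table; Kognitos uses AI rather than a fixed list like this.
ABBREVIATIONS = {"no": "number", "num": "number", "#": "number",
                 "qty": "quantity", "amt": "amount"}


def normalize(label: str) -> str:
    # Lowercase, split into word tokens (keeping "#"), and expand known abbreviations.
    tokens = re.findall(r"[a-z0-9#]+", label.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def find_field(fields: dict[str, str], requested: str) -> Optional[str]:
    # Compare normalized forms so "Trailer No." and "trailer number" line up.
    target = normalize(requested)
    for label, value in fields.items():
        if normalize(label) == target:
            return value
    return None


# Hypothetical labels and values as they might appear on a document.
fields = {"Trailer No.": "TRL-5521", "Invoice #": "INV-0042"}
print(find_field(fields, "trailer number"))
```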
Handling Extraction Failures
The automation may sometimes fail to extract a field from a document, for example when the field does not exist in the document or cannot be located. In these cases, Kognitos raises a Question asking you to Please provide the field. The table below outlines the available resolution options.
Resolution Option | Description |
---|---|
Write in answer | Manually enter a value for the requested field. |
Upload files | Upload a file for the required field. |
No value | Indicate that no value is needed at this time. |
Skip this step | Skip the field extraction step. |
Compute an answer | Open a Mini-Playground to test operations without affecting the main run. |
Retry | Retry the failed automation step, useful for issues like timeouts or slow APIs. |
Retry after an interval | Re-run the automation after a set period of time. |
Handling Exceptions
Learn more about handling exceptions in Kognitos with our exception handling guide.
2. Extracting Tables
To extract tables from a document, use the get keyword with tables.
Syntax
get the document's tables
If the document contains several sections that resemble tables, Kognitos returns them as a list. The sections below show how to narrow the result down to a specific table.
Extracting a Specific Table
If a document contains multiple tables, you can specify which one to extract by referencing a column name:
get the document's tables whose columns contain "Column Name"
get the above as the items table
You can also use relative keywords like first, second, or last to target specific tables:
get the document's first table
get the document's third table
get the document's last table
For more precise filtering, you can use multiple column names:
get the document's table whose columns contain "Column1", "Column2", and "Column3"
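You can model the column filter and the relative keywords in a few lines of Python. This is a hypothetical sketch and not Kognitos's API. Each table is a list of rows, and the first row holds the headers. A table matches when its header row contains every requested column name, compared without regard to case. The keywords first, third, and last index into the list of tables. Kognitos likely matches column names more flexibly than this exact comparison.

```python
# Each table is a list of rows; the first row holds the column headers.
Table = list[list[str]]

# Relative keywords from the text, mapped to list positions.
ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}


def tables_whose_columns_contain(tables: list[Table], *names: str) -> list[Table]:
    wanted = {name.lower() for name in names}
    # Keep a table only if its header row includes every requested column.
    return [table for table in tables
            if table and wanted <= {header.lower() for header in table[0]}]


def pick_table(tables: list[Table], ordinal: str) -> Table:
    # Raises IndexError if the document has fewer tables than requested.
    return tables[ORDINALS[ordinal]]


# Made-up tables standing in for what a document might contain.
tables = [
    [["Item", "Qty", "Unit Price"], ["Widget", "2", "9.50"]],
    [["Tax Code", "Rate"], ["VAT", "20%"]],
    [["Item", "Qty", "Unit Price", "Total"], ["Bolt", "100", "0.10", "10.00"]],
]

items_tables = tables_whose_columns_contain(tables, "Item", "Qty", "Unit Price")
last_table = pick_table(tables, "last")
```

Filtering by column names still returns a list, which matches the behavior described above: if more than one table qualifies, you may need a relative keyword or an extra column name to narrow it down.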
Extracting Data With Directional Keywords
Directional keywords can be used with document extraction to pinpoint specific lines.
- below: Looks beneath the reference line
- above: Looks above the reference line
- left: Looks to the left of the reference line
- right: Looks to the right of the reference line
Example
get the document's first line which contains "recipe"
get the lines below that as the recipe text
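The Python model below reproduces the example above, assuming the document has been flattened into a list of text lines. `first_line_containing`, `lines_below`, and `lines_above` are hypothetical helpers, and the sample page is made up. The sketch does not model left and right, because those depend on the horizontal layout of the page, which a plain list of lines doesn't capture.

```python
def first_line_containing(lines: list[str], text: str) -> int:
    # Index of the first line that contains the text, ignoring case.
    for index, line in enumerate(lines):
        if text.lower() in line.lower():
            return index
    raise ValueError(f"no line contains {text!r}")


def lines_below(lines: list[str], index: int) -> list[str]:
    # Everything after the reference line.
    return lines[index + 1:]


def lines_above(lines: list[str], index: int) -> list[str]:
    # Everything before the reference line.
    return lines[:index]


# A made-up page, already split into lines.
page = [
    "Sunday Lunch Menu",
    "Tomato soup recipe",
    "2 cups chopped tomatoes",
    "1 onion",
    "Salt to taste",
]

anchor = first_line_containing(page, "recipe")
recipe_text = lines_below(page, anchor)
```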
Video Example: Extracting Information
This video walks through an example of extracting information from an SAP Sales Order.