Document Processing

Learn how to process documents and extract information from them in Kognitos.

Prerequisites

Learning the Document Processing Book

Before you can extract information from documents, learn the Document Processing Book by following these steps:

  1. In the left sidebar, click on Books.
  2. Use the search bar to find "Document Processing."
  3. Click on +Book to open the Add New Book pop-up.
  4. Click on Add to finish adding the book.

Introducing Documents and File Objects

To introduce a document or file object in your automation, use any of the following lines:

the file
get the file
use the file
the document
get the document
use the document

🤔

Which Keyword Should I Use?

For a detailed explanation of each keyword and how they differ, check out our Automation Basics guide.

Each of these lines raises a Question in Kognitos asking you to "Please provide the file" or "Please provide the document", because the system needs the specified object before it can proceed.

To Upload a Local File

  1. Click on Select a method to open the drop-down menu.
  2. Select Upload files. Refer to the table below for supported file types.
  3. Upload your file or document and click Submit.

After uploading, you can perform additional operations on the document or file within the platform.

Category | Supported File Types for Upload
Text Documents | .pdf, .docx
ERP Document | .edi
Image | .jpeg, .jpg, .png, .tif, .tiff
Data | .txt, .json, .yml, .yaml, .csv
Spreadsheets | .xlsx, .xls, .csv
HTML | .html
Email | .eml
Audio | .mp3, .wav
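
If you want to check a file name against the table above before uploading, the lookup can be sketched in Python. This is only an illustration and not part of Kognitos: the function name `upload_categories` is hypothetical, and matching extensions case-insensitively is an assumption that the table does not state.

```python
from pathlib import Path

# Extensions copied from the supported-file-types table above.
SUPPORTED_EXTENSIONS = {
    "Text Documents": {".pdf", ".docx"},
    "ERP Document": {".edi"},
    "Image": {".jpeg", ".jpg", ".png", ".tif", ".tiff"},
    "Data": {".txt", ".json", ".yml", ".yaml", ".csv"},
    "Spreadsheets": {".xlsx", ".xls", ".csv"},
    "HTML": {".html"},
    "Email": {".eml"},
    "Audio": {".mp3", ".wav"},
}


def upload_categories(filename: str) -> list[str]:
    """Return every table category whose extension list includes this file.

    Assumption: extensions are compared case-insensitively.
    """
    extension = Path(filename).suffix.lower()
    return [
        category
        for category, extensions in SUPPORTED_EXTENSIONS.items()
        if extension in extensions
    ]


print(upload_categories("invoice.PDF"))  # ['Text Documents']
print(upload_categories("orders.csv"))   # ['Data', 'Spreadsheets']
print(upload_categories("archive.zip"))  # []
```

Note that .csv appears under both Data and Spreadsheets in the table, which is why the helper returns a list rather than a single category.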

Operations for Document Processing

1. Extracting Fields from a Document

To extract a field, use the get keyword followed by the document or file and the name of the field you want.

Syntax

get the document's {field}
get the file's {field}

Input Parameters

  1. field:
    • The item from the document you wish to extract.
    • Example: invoice number, id, name, date of birth, supplier name

Examples

get the document's invoice number
get the document's tax id
get the file's supplier name

🚧

Don't Forget "the"!

When writing your statements, remember to include the keyword "the", as it's necessary to introduce a new field in your automation.

Video Demonstration

The video showcases how Kognitos simplifies extracting structured data from scanned documents using advanced document processing capabilities. By leveraging AI and tools like Amazon Textract, users can flexibly retrieve specific values, tables, or other data points with natural language commands.

What if Multiple Fields Share the Same Name?

If a document contains multiple fields with the same name, Kognitos will raise a Question prompting the user to select which value to use for an extracted field.

Example

Consider a document with the following information:

Customer Name: John Doe
Address: 1234 Elm Street
Address: 5678 Oak Avenue

If your automation is written like this:

the document
get the document's address

Kognitos will raise a Question: Multiple values found for address, please pick one. To respond, you can select Pick selected value from the drop-down menu and choose the desired address to use.

To avoid this prompt, use relative indicators like first, second, or last to specify which value to use:

get the document's first address
get the document's last address

For added clarity, you can also rename the fields:

get the document's first address as the home address
get the document's last address as the business address
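
The selection rule described above can be modeled with a short Python sketch. It is a conceptual illustration only, not how Kognitos is implemented: `pick_value`, `AmbiguousFieldError`, and the `POSITIONS` mapping are hypothetical names, and the exception simply stands in for the Question that Kognitos raises.

```python
from typing import Optional


class AmbiguousFieldError(Exception):
    """Stands in for the "Multiple values found" Question (hypothetical)."""


# Relative indicators named in the text, mapped to list positions.
POSITIONS = {"first": 0, "second": 1, "last": -1}


def pick_value(field: str, values: list[str], position: Optional[str] = None) -> str:
    """Return one value for a field that may appear several times."""
    if position is not None:
        # A relative indicator removes the ambiguity.
        return values[POSITIONS[position]]
    if len(values) > 1:
        # Without an indicator, several matches require a human choice.
        raise AmbiguousFieldError(
            f"Multiple values found for {field}, please pick one."
        )
    return values[0]


# The two addresses from the example document above.
addresses = ["1234 Elm Street", "5678 Oak Avenue"]
home_address = pick_value("address", addresses, "first")
business_address = pick_value("address", addresses, "last")
print(home_address)      # 1234 Elm Street
print(business_address)  # 5678 Oak Avenue
```

Calling `pick_value("address", addresses)` without a position raises the error, which mirrors the prompt you would otherwise have to answer manually.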

Don't Worry About Field Name Formats

Kognitos is flexible about extracting specific fields. For example, if a document contains a field called Trailer No., you can extract it by writing:

get the document's trailer number

Even though the field in the document may be labeled Trailer No., the automation recognizes and understands variations in naming, allowing you to refer to it as trailer number.
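
To build intuition for why Trailer No. and trailer number can refer to the same field, here is a toy Python sketch of label normalization. It is an assumption made purely for illustration: the source does not describe how Kognitos matches names, and the `ABBREVIATIONS` table and `normalize_label` function are hypothetical.

```python
import re

# Hypothetical abbreviation table; NOT Kognitos's actual matching logic.
ABBREVIATIONS = {"no": "number"}


def normalize_label(label: str) -> str:
    """Lowercase a label, drop punctuation, and expand known abbreviations."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


print(normalize_label("Trailer No."))  # trailer number
print(normalize_label("Trailer No.") == normalize_label("trailer number"))  # True
```

A real matcher would need to handle far more variation than a fixed abbreviation table can cover; the sketch only shows the idea of comparing labels after normalizing them.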

Handling Extraction Failures

The automation may sometimes fail to extract a field from a document. This can happen if the field does not exist or cannot be found. In these cases, Kognitos will raise a Question asking you to "Please provide the field". The table below outlines the available resolution options.

Resolution Option | Description
Write in answer | Manually enter a value for the requested field.
Upload files | Upload a file for the required field.
No value | Indicate that no value is needed at this time.
Skip this step | Skip the field extraction step.
Compute an answer | Open a Mini-Playground to test operations without affecting the main run.
Retry | Retry the failed automation step, useful for issues like timeouts or slow APIs.
Retry after an interval | Re-run the automation after a set period of time.

📘

Handling Exceptions

Learn more about handling exceptions in Kognitos with our exception handling guide.


2. Extracting Tables

Extracting tables from documents is simple using the get and table keywords.

Syntax

get the document's tables

If there are multiple sections in the document that resemble tables, the system will return them as a list. Below, we show you how to narrow down to a specific table.

Extracting a Specific Table

If a document contains multiple tables, you can specify which one to extract by referencing a column name:

get the document's tables whose columns contain "Column Name"
get the above as the items table

You can also use relative keywords like first, second, or last to target specific tables:

get the document's first table
get the document's third table
get the document's last table

For more precise filtering, you can use multiple column names:

get the document's table whose columns contain "Column1", "Column2", and "Column3"
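
The column-based filtering above can be pictured with a small Python sketch. It is a conceptual model under stated assumptions, not Kognitos's implementation: the table structure (a dict with "columns" and "rows"), the function name `tables_with_columns`, and the sample data are all made up for illustration, and column names are assumed to match exactly.

```python
def tables_with_columns(tables: list[dict], *column_names: str) -> list[dict]:
    """Keep only the tables whose columns include every requested name.

    Assumption: names must match exactly, including case.
    """
    wanted = set(column_names)
    return [table for table in tables if wanted <= set(table["columns"])]


# Made-up sample data standing in for the tables found in a document.
tables = [
    {"columns": ["Item", "Quantity", "Price"], "rows": [["Widget", 2, 9.5]]},
    {"columns": ["Date", "Carrier"], "rows": [["2024-01-05", "Acme Freight"]]},
]

matches = tables_with_columns(tables, "Item", "Price")
items_table = matches[0]   # comparable to naming the result "the items table"
first_table = tables[0]    # comparable to "the document's first table"
last_table = tables[-1]    # comparable to "the document's last table"
print(items_table["columns"])  # ['Item', 'Quantity', 'Price']
print(last_table["columns"])   # ['Date', 'Carrier']
```

Listing more column names narrows the result, since a table must contain all of them to match.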

Extracting Data With Directional Keywords

Directional keywords can be used with document extraction to pinpoint specific lines.

  • below: Looks beneath the reference line
  • above: Looks above the reference line
  • left: Looks left of the reference line
  • right: Looks right of the reference line

Example

get the document's first line which contains "recipe"
get the lines below that as the recipe text
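
The two-step pattern in this example (find a reference line, then read relative to it) can be sketched in Python. This is a simplified model, not Kognitos's implementation: the function names and sample lines are hypothetical, the search is assumed to be case-insensitive, and "below" is assumed to mean every line after the reference line.

```python
def first_line_containing(lines: list[str], text: str) -> int:
    """Return the index of the first line that contains the text.

    Assumption: the match is case-insensitive.
    """
    for index, line in enumerate(lines):
        if text.lower() in line.lower():
            return index
    raise ValueError(f"No line contains {text!r}")


def lines_below(lines: list[str], index: int) -> list[str]:
    """Return every line after the reference line (an assumed reading of "below")."""
    return lines[index + 1:]


# Made-up sample lines standing in for a document's text.
lines = ["Grandma's kitchen notes", "Pancake recipe", "2 cups flour", "1 egg"]
reference = first_line_containing(lines, "recipe")
recipe_text = lines_below(lines, reference)
print(recipe_text)  # ['2 cups flour', '1 egg']
```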

Video Example: Extracting Information

This video walks through an example of extracting information from an SAP Sales Order.