Document Processing

The Book to rule all other Books. Extracting data from documents, whoo!


Documents can be the hardest part of an automation to plan for. Luckily, Kognitos offers a flexible way to extract data from even your most unconventional document types.

Learning the Document Processing Book

Navigate to the Department tab and learn the Document Processing Book so that your automations can extract fields from documents.

Reading in Your Documents

After learning the Document Processing Book, all you need to do to read in a document is reference a document or file in your automation, then upload the document when Kognitos prompts you for it. For example, writing any of the following lines will prompt Kognitos to ask you for a document:

get the document
use the document
the file
the document

Once your document has been uploaded to the playground, you can start querying the file to extract fields and tables.

Common Operations for Document Processing

Extracting Fields from a Document

Kognitos makes it easy to extract specific fields from a document. Once the document has been read in, write get the document's followed by the name the field is commonly known by. For example:

get the document's invoice number
get the document's tax id

Don't forget the 's after document or file when you write these statements:

get the file's supplier name
get the document's patient name

What if there are multiple fields with the same name?

Let's say the field address shows up twice in the same document. Whenever Kognitos detects multiple potential values, it prompts the user to select which value to use. If the document follows a consistent format, you can use relative indicators like first or last to guide your automation without being prompted.

For example:

get the document's first address as the shipping address
get the document's last address as the receiver address
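The selection rule described above can be pictured with a small Python sketch. It is only a conceptual model, not Kognitos's implementation: the `pick_value` function, the `NeedsUserChoice` exception, and the sample addresses are all hypothetical names invented for illustration.

```python
from typing import Optional


class NeedsUserChoice(Exception):
    """Hypothetical signal: several candidates exist and no position was given."""


def pick_value(candidates: list[str], position: Optional[str] = None) -> str:
    """Return one value from the candidates found for a field name."""
    if not candidates:
        raise LookupError("no value found for this field")
    if len(candidates) == 1:
        # Only one match, so there is nothing to disambiguate.
        return candidates[0]
    if position == "first":
        return candidates[0]
    if position == "last":
        return candidates[-1]
    # Several matches and no hint: this is where the user would be prompted.
    raise NeedsUserChoice(f"{len(candidates)} possible values; ask the user to pick one")


# Made-up sample data standing in for two address fields in one document.
addresses = ["12 Harbor Rd, Oakland", "98 Mill St, Reno"]
shipping_address = pick_value(addresses, "first")
receiver_address = pick_value(addresses, "last")
```

In this model, calling `pick_value(addresses)` without a position raises `NeedsUserChoice`, which mirrors the prompt you would see when the automation gives no first or last hint.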

Don't get hung up on field name formats

Kognitos is pretty flexible when it comes to field names. For example, if a document has a field called "Trailer No." and you want to extract it, all you have to write is:

get the document's trailer number

Note that the document may say Trailer No. while the automation says trailer number, and Kognitos is still smart enough to match the two.
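How Kognitos matches field names internally is not described here, but the idea can be illustrated with a simple Python sketch that normalizes both labels before comparing them. Everything in it is an assumption made for illustration: the `ABBREVIATIONS` table, the `normalize_label` and `labels_match` helpers, and the rule of expanding abbreviations word by word.

```python
import re

# Hypothetical abbreviation table; a real system would need far more entries.
ABBREVIATIONS = {"no": "number", "qty": "quantity", "amt": "amount"}


def normalize_label(label: str) -> str:
    """Lowercase a label, drop punctuation, and expand known abbreviations."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def labels_match(document_label: str, requested_name: str) -> bool:
    """Treat two labels as the same field when their normalized forms are equal."""
    return normalize_label(document_label) == normalize_label(requested_name)


trailer_matches = labels_match("Trailer No.", "trailer number")
unrelated_matches = labels_match("Trailer No.", "tax id")
```

Under these assumptions, "Trailer No." and "trailer number" both normalize to the same string, so `trailer_matches` is true while `unrelated_matches` is false.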

Extracting Tables

Getting tables out of documents is easy using the get and table keywords! For example, to extract all tables from a document, you can run:

get the document's tables

When you write get the document's tables and multiple parts of the document resemble a table, they are returned to you as a list. The sections below show how to narrow the result down to a specific table.

Getting a Table With a Specific Column in It

Let's say there are multiple tables in a document. You can specify which table you want by column name, which makes it easier to assign:

get the document's tables whose columns contain "Description"
get the above as the line items table

You can filter this way with multiple column names too:

get the document's table whose columns contain "Description", "Date", and "Qty"
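The column filter above behaves like a subset check: a table qualifies only if it contains every column you name. The Python sketch below models that idea under stated assumptions. The dictionary layout for a table, the `tables_with_columns` helper, the sample tables, and the case-insensitive comparison are all hypothetical and not taken from Kognitos itself.

```python
def tables_with_columns(tables: list[dict], required: list[str]) -> list[dict]:
    """Keep only the tables whose columns include every required name."""
    wanted = {name.lower() for name in required}
    return [
        table
        for table in tables
        # Subset test: every wanted name must appear among the table's columns.
        if wanted <= {column.lower() for column in table["columns"]}
    ]


# Made-up tables, described only by their column headers.
tables = [
    {"columns": ["Description", "Date", "Qty", "Price"]},
    {"columns": ["Description", "Notes"]},
    {"columns": ["Tax", "Total"]},
]

with_description = tables_with_columns(tables, ["Description"])
line_items = tables_with_columns(tables, ["Description", "Date", "Qty"])
```

Here `with_description` still holds two tables, while naming all three columns narrows `line_items` to one, which is why adding more column names is a handy way to pin down the table you want.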

Getting a Table Based Off Location in the Document

You can also use positional keywords such as first, second, third, and last when pulling tables from a document. For example:

get the document's first table
get the document's third table
get the document's last table
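Positional keywords can be thought of as indexes into the list of tables, in the order they appear in the document. The Python sketch below illustrates that reading. The `ORDINALS` mapping, the `table_at` helper, and the sample table names are hypothetical, and the sketch does not claim to show how Kognitos resolves these keywords.

```python
# Hypothetical subset of positional keywords mapped to list indexes.
ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}


def table_at(tables: list[str], ordinal: str) -> str:
    """Return the table at the position named by an ordinal keyword."""
    index = ORDINALS[ordinal]
    try:
        return tables[index]
    except IndexError:
        # For example, asking for the third table of a two-table document.
        raise LookupError(f"the document has no {ordinal} table") from None


# Made-up table names, listed in the order they appear in a document.
tables = ["header table", "line items table", "totals table"]

first_table = table_at(tables, "first")
third_table = table_at(tables, "third")
last_table = table_at(tables, "last")
```

In a three-table document the third and last keywords point at the same table, so last is the safer choice when the number of tables varies from one document to the next.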