Document Extraction: Key Components

This page explains how document data extraction, also called Intelligent Document Processing (IDP), works.

Where to start

🚧

Before building a process that involves document understanding and extraction, make sure you have learned the OCR Book in your department.

How Can I Extract Data From a Document?

Tools you can use to extract information from a document:

  • Query
  • Tables
  • Other ways to extract information from a document
    • Lines based
    • GPT (Large Language Model)
    • Labels

For example:

get the document
get the document's "invoice number", "total", "invoice date"
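
To make the label-based approach more concrete, here is a minimal Python sketch of the general idea: find a label in the OCR text and take the value that follows it on the same line. This is not how Kognitos implements extraction. The helper name `extract_by_labels`, the sample text, and the `label: value` layout are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional


def extract_by_labels(ocr_text: str, labels: List[str]) -> Dict[str, Optional[str]]:
    """Hypothetical helper: return the text that follows each label on its line."""
    results: Dict[str, Optional[str]] = {}
    for label in labels:
        # Match the label case-insensitively, an optional ':' or '#',
        # then capture the rest of that line as the value.
        pattern = re.compile(rf"{re.escape(label)}\s*[:#]?\s*(.+)", re.IGNORECASE)
        match = pattern.search(ocr_text)
        results[label] = match.group(1).strip() if match else None
    return results


# Made-up OCR output, used only to exercise the helper.
sample = """ACME Corp
Invoice Number: INV-1042
Invoice Date: 2024-03-15
Total: $1,250.00
"""

fields = extract_by_labels(sample, ["invoice number", "total", "invoice date"])
```

A label that does not appear in the text maps to `None`, so the caller can decide whether to fall back to another approach such as a query or a table lookup.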

Choosing an OCR Engine

Kognitos makes it extremely easy to apply OCR to a document. Our out-of-the-box OCR engines include:

  • AWS Textract
  • Google
  • Azure
  • OpenAI

Once you have learned these Books, you can specify which OCR engine to use. Unless you specify otherwise, Textract is used by default. The following pages cover this in more detail.
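
The default rule described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not Kognitos code: the function `choose_ocr_engine` and the lowercase engine identifiers are hypothetical names chosen for this example.

```python
from typing import Optional

# Identifiers loosely based on the engine list above; they are illustrative only.
SUPPORTED_ENGINES = {"textract", "google", "azure", "openai"}
DEFAULT_ENGINE = "textract"


def choose_ocr_engine(requested: Optional[str] = None) -> str:
    """Return the requested engine, or the default when none is given."""
    if requested is None:
        return DEFAULT_ENGINE
    # Normalize so that "Open AI" and "openai" are treated the same.
    name = requested.strip().lower().replace(" ", "")
    if name not in SUPPORTED_ENGINES:
        raise ValueError(f"Unsupported OCR engine: {requested!r}")
    return name
```

Rejecting unknown names with an error, rather than silently falling back to the default, is a design choice of this sketch; it keeps a typo from quietly switching engines.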