Creating contextual learnings based on the document

1. About Context Based Learning

Context Based Learning (CBL) is a powerful feature that uses text embeddings to streamline your document handling process. It translates key information in documents, such as field names, column names, and large words, into numerical values. These values can then be compared between two documents to determine their similarity.
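As a rough illustration of comparing two documents through their numerical values, the sketch below computes the cosine similarity of two small vectors. The choice of cosine similarity, the function name, and the example vectors are all assumptions made for illustration; this document does not state which similarity measure or embedding model CBL uses internally.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; real embeddings have many more dimensions.
invoice_vendor_a = [0.9, 0.1, 0.3]
invoice_vendor_a_again = [0.88, 0.12, 0.31]
invoice_vendor_b = [0.1, 0.9, 0.2]

same_vendor = cosine_similarity(invoice_vendor_a, invoice_vendor_a_again)
other_vendor = cosine_similarity(invoice_vendor_a, invoice_vendor_b)
```

In this toy data, the two vectors for the same vendor point in nearly the same direction and score close to 1, while the vector for the other vendor scores much lower.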

CBL can gauge similarity between documents, accurately detect distinct document types, and raise an exception when a new document type is found.

This technology allows CBL to detect different document types (e.g., from different vendors previously processed by our system) and apply learnings specific to each document type.

How is this feature useful?

  • Automatically classifies different document types.
  • Handles new document types on demand, without the need for manual procedure modification.
  • Eliminates the need for complex if-else structures, reducing process length and complexity.
  • Provides learning suggestions for incoming exceptions, making your workflow even smoother.
  • Learns dynamically from user interactions and improves over time.

The goal of CBL is to make your document handling process as efficient and accurate as possible. It's designed to learn and adapt based on the documents it encounters, continually improving its performance over time.

2. Confidence Score in CBL

In CBL, a confidence score is used to determine the similarity between documents. This score measures how similar two documents are, based on the numerical values assigned to their key information through vector embeddings.

The default confidence score threshold in CBL is 95% (0.95). This means that for a learning to be applied, the match between the current document and the learned document must be 95% or higher. This high threshold ensures that the applied learnings are highly relevant and accurate. We have prioritized tuning our internal logic to yield a higher percentage match, rather than lowering the threshold. However, if you need to adjust this threshold, here’s how you can do it.

Manual configuration of the confidence score

Add the step below to specify a custom confidence score:

the minimum document similarity is 0.92

The step above sets the similarity threshold to 0.92, i.e. the matching threshold becomes 0.92 instead of the default 0.95.

Note: It's advisable not to lower the similarity threshold below 0.90 unless you are absolutely sure of what you are doing. A lower threshold introduces false positives.
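To make the effect of the threshold concrete, here is a minimal sketch of a threshold check. The function name and the sample scores are hypothetical and are not part of CBL; only the 0.95 default and the 0.92 custom value come from this document.

```python
DEFAULT_THRESHOLD = 0.95  # default stated in this document


def should_apply_learning(similarity, threshold=DEFAULT_THRESHOLD):
    """Return True when the similarity meets or exceeds the threshold."""
    return similarity >= threshold


# Hypothetical similarity scores for three incoming documents.
scores = [0.97, 0.93, 0.89]

applied_default = [s for s in scores if should_apply_learning(s)]
applied_custom = [s for s in scores if should_apply_learning(s, threshold=0.92)]
```

With the default threshold, only the 0.97 document qualifies. At 0.92, the 0.93 document qualifies as well, which is how a lower threshold can let less similar documents through.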

Remember, the goal of the confidence score is to ensure the accuracy and efficiency of CBL. It's a crucial part of how CBL learns and adapts to handle your documents effectively.


3. How to Use CBL

3.1 Setup

To start using CBL, follow these steps:

  • Make sure to include the line "get the file as a scanned document" in your procedure.

The modifier “scanned document” ensures that CBL is applied for that line.

3.2 Writing Procedures

When writing procedures, write each document extraction on a single line. The extractions, or 'gets', should always raise an exception. For example:

get the attachment as a scanned document
imagine the document’s ocr info
get the document’s ocr info’s billing number
get the document’s ocr info’s shipper
get the document’s ocr info’s shipping date

CBL works by detecting the presence of a document in any exception, so the word “document” should always be present in each extraction line.

3.3 Handling Exceptions

In the event of an exception, follow these steps:

  1. If this is the first instance with no learnings, select the answer type from the drop-down. If you want to teach a method, choose "Compute an answer".

    Note: You can also choose other options, such as "Write Answer" or "Skip".

  2. This opens a mini playground, a smaller version of our procedure run. Try running your method in the mini playground.

  3. If the result is as expected, click Next. The document is shown if it's a CBL-based exception. You can learn the answer as usual, or apply the answer to that instance only.

  4. Next time, when the exception occurs for a similar document type (context), at the same step-path, the learning will be applied for the document automatically.
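The steps above imply that a learning is tied to both a step-path and a document context. The sketch below shows one way such a lookup could work. The `Learning` class, the `find_learning` function, the step-path strings, and the sample answers are hypothetical and do not describe CBL's actual storage or API.

```python
from dataclasses import dataclass


@dataclass
class Learning:
    step_path: str  # where in the procedure the exception occurred
    doc_type: str   # label for the document the learning came from
    answer: str     # the learned answer or technique


def find_learning(learnings, step_path, similarity_by_doc_type, threshold=0.95):
    """Return the best-matching learning for this step-path, or None."""
    best = None
    best_score = threshold
    for learning in learnings:
        if learning.step_path != step_path:
            continue  # learnings only apply at the step-path they were taught on
        score = similarity_by_doc_type.get(learning.doc_type, 0.0)
        if score >= best_score:
            best, best_score = learning, score
    return best


# Hypothetical learnings and similarity scores for one incoming document.
learnings = [
    Learning("step 3", "vendor A invoice", "use the 'Bill No.' field"),
    Learning("step 3", "vendor B invoice", "use the 'Invoice #' field"),
]
scores = {"vendor A invoice": 0.98, "vendor B invoice": 0.41}

match = find_learning(learnings, "step 3", scores)
no_match = find_learning(learnings, "step 4", scores)
```

When no learning is returned, as for "step 4" here, the exception would go to the user as in step 1.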

Note: CBL is applied to a learning when the exception occurs at any step that contains “document” as a keyword (except “get the file as a scanned document”).
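The keyword rule in this note can be sketched as a simple check. The function name is hypothetical, and excluding every step that contains "as a scanned document" is an assumption generalized from the one excluded line named above; CBL's actual matching logic may differ.

```python
import re

# Assumption: any step using this modifier is the setup line and is excluded.
SETUP_MODIFIER = "as a scanned document"


def is_cbl_candidate(step):
    """Return True if a step mentions "document" and is not the setup line."""
    text = step.lower()
    if SETUP_MODIFIER in text:
        return False
    # \b treats the apostrophe in "document’s" as a word boundary.
    return re.search(r"\bdocument\b", text) is not None


setup_line = is_cbl_candidate("get the file as a scanned document")
extraction_line = is_cbl_candidate("get the document’s ocr info’s shipper")
unrelated_line = is_cbl_candidate("get the invoice total")
```

Here only the extraction line qualifies. The setup line is excluded by the modifier, and the last step never mentions the keyword.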

4. Modifying CBL Learnings

CBL is designed to learn and adapt over time, but there may be instances where it doesn't yield the expected results. In such cases, you have the ability to manually correct it.

Here's how:

  1. If a learning is applied using CBL and it fails to yield a value (learning fails): You can manually correct it by providing the value directly or trying out a new learning in the mini playground.
  2. If a learning is applied using CBL and yields an invalid value: This may cause downstream steps to fail. In such cases, consider editing the technique so that it handles both the previous cases and the new ones.

Remember, the goal of CBL is to continually improve its performance over time. Your manual corrections play a crucial role in this learning process. If you encounter any other cases, don't hesitate to let our engineering team know!

5. Compatibility of CBL

CBL is currently designed to work with OCR documents processed with AWS Textract and Azure Form Recognizer.

Remember, the goal of CBL is to streamline your document handling process, regardless of the document type. As we continue to develop and improve CBL, we aim to expand its compatibility to include even more document types and data structures.