PDF

Learn how to operate on PDF files in Kognitos.

PDF Operations

Getting Information from a PDF

Getting a Label

To retrieve a label from a PDF, use the following syntax:

get the pdf
get the pdf's label where
    the label is "Invoice Number"

Getting a Field

To retrieve a field from a PDF, use the following syntax:

get the pdf
get the pdf's field where
    the field contains "Date"

Setting or Changing Information in a PDF

This section covers how to set or change field values in a PDF file.

Setting a Field to a String

To set a field in a PDF to a string value, use the following syntax:

get the pdf
the field is "Date"
set the pdf's field to "2023-01-01"

Changing a Field to a String

To change a field in a PDF to a string value, use the following syntax:

get the pdf
the field is "Name"
change the pdf's field to "John Doe"

Setting a Field to a Number

To set a field in a PDF to a numeric value, use the following syntax:

get the pdf
the field name is "Total Amount"
set the pdf's field to 150

Changing a Field to a Number

To change a field in a PDF to a numeric value, use the following syntax:

get the pdf
the field name is "Page Count"
change the pdf's field to 5

Saving a PDF

This section covers how to save a PDF file after making changes.

Saving to a Local Path

To save a PDF to a local path, use the following syntax:

get the pdf
save the pdf to a file with
    the target is "/local/path/to/save/the/pdf"

Saving to an S3 URL

To save a PDF to an S3 URL, use the following syntax:

get the pdf
save the pdf to a file with
    the target is "s3://bucket-name/path/to/save/the/pdf"
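
Editing and saving are shown as separate steps above. The following is a minimal sketch of the two chained in one automation. It assumes the lines compose in sequence exactly as they appear in the individual examples; the field name, value, and target path are placeholders.

```
get the pdf
the field is "Date"
set the pdf's field to "2023-01-01"
save the pdf to a file with
    the target is "s3://bucket-name/path/to/save/the/pdf"
```

The same pattern should apply to a local path by swapping the target value.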

Removing Duplicates from a PDF

This section covers how to remove duplicates from a PDF file.

Removing Duplicates with a Confidence Threshold

To remove duplicates from a PDF with a specified confidence threshold, use the following syntax:

get the pdf
the department's duplicate confidence threshold is 0.95
remove duplicates from the pdf

Removing Duplicates without a Confidence Threshold

To remove duplicates from a PDF without specifying a confidence threshold, use the following syntax:

get the pdf
remove duplicates from the pdf

Convert from Word Document to PDF

This operation converts a Word document (.doc or .docx) to Portable Document Format (.pdf):

the file is the document
read the file as a pdf

Convert Picture to PDF

To convert a picture (.jpg or .png) to Portable Document Format (.pdf), use the following syntax:

convert a file to a pdf file with
    the file is the picture

Merge (Document)

This operation combines multiple PDFs into a single PDF file:

get the attachments
get the above as the scanned documents
merge the scanned documents into a single document where
    the document name is "statements.pdf"

Working with .tif and .tiff Files

To work with .tif or .tiff files in Kognitos, you can convert them to PDF format and use PDF operations on them. To convert to PDF, use the following syntax:

read a file as a pdf
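
Once converted, the file should be usable with the PDF operations described earlier. The sketch below chains the conversion with a field lookup. It assumes the converted result can then be referred to as "the pdf", which this page does not confirm, and "the scanned image" is a hypothetical name for the .tif file.

```
the file is the scanned image
read the file as a pdf
get the pdf's field where
    the field contains "Date"
```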