Working with .pdf Files

Overview

This manual is divided into several sections, each focusing on a specific aspect of the PDF Manipulation Library.

Introduction

Kognitos provides the PDF Manipulation Library for working with PDF files. The library lets you open PDF documents, retrieve information from them, set and change field values, save them, and remove duplicates. This section gives an overview of the PDF Manipulation Library and lists the prerequisites for using it.

Prerequisites for Using the PDF Manipulation Library

Before you start using the PDF Manipulation Library, make sure the following prerequisites are met:

  1. Access to PDF Files: Ensure you have access to the PDF files you intend to manipulate. Files you want to open must be stored on a cloud storage service such as Amazon S3.
  2. Basic Understanding of PDF Structure: While the library simplifies interactions, having a basic understanding of PDF structure (e.g., fields, labels) will help you automate more effectively.

Once these prerequisites are met, you are ready to start using the PDF Manipulation Library to automate your work with PDF documents.


Setting Up PDF Manipulation

Opening a PDF

Before you can work with a PDF file in the PDF Manipulation Library, you must open it. This section describes how to open a PDF file.

Required Information

To open a PDF, you will need the following information:

  1. PDF File URL: The URL of the PDF file you want to open. This must be a cloud storage URL, such as an Amazon S3 URL.

Step-by-Step Process

  1. Specify the PDF File URL: Store the URL of the PDF file in a string, then open the PDF at that string:

    the string is "s3://mybucket/myfolder/mypdf.pdf"
    open the pdf at the string
    

    Replace s3://mybucket/myfolder/mypdf.pdf with the URL of your own PDF file.

  2. Verify That the PDF Opened: When the command runs, the library attempts to open the PDF file at the provided URL. If the PDF opens successfully, you can proceed with further operations such as retrieving information, setting values, or saving the PDF. If opening fails, check that the URL is correct and that you have access to the file.
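Because opening a PDF requires a cloud storage URL, it can help to check the URL's shape before passing it to the library. The following Python sketch is not part of Kognitos. parse_s3_url is a hypothetical helper that splits an Amazon S3 URL into its bucket and object key and rejects anything else, such as a local path. It assumes S3 is the cloud storage service and checks only the form of the URL, not whether the object exists.

```python
from urllib.parse import urlparse


def parse_s3_url(url: str) -> tuple[str, str]:
    """Split an s3://bucket/key URL into (bucket, key).

    Raises ValueError for anything that is not an s3:// URL with both a
    bucket and an object key ending in .pdf (for example, a local path).
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URL: {url!r}")
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")  # drop the leading slash from the path
    if not bucket or not key:
        raise ValueError(f"S3 URL must include a bucket and an object key: {url!r}")
    if not key.lower().endswith(".pdf"):
        raise ValueError(f"expected a .pdf object: {url!r}")
    return bucket, key


print(parse_s3_url("s3://mybucket/myfolder/mypdf.pdf"))  # ('mybucket', 'myfolder/mypdf.pdf')
```

A check like this catches a mistyped or local path early, before the open command is attempted.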


Working with PDF Information

Retrieving information from PDF files is a core function of the PDF Manipulation Library. This section covers how to get labels and fields from a PDF.

Getting Information from a PDF

Getting a Label

To retrieve a label from a PDF, use the following command:

get the pdf
get the pdf's label where
    the label is "Invoice Number"

Getting a Field

To retrieve a field from a PDF, use the following command:

get the pdf
get the pdf's field where
    the field contains "Date"
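The two retrieval examples match names differently. The label example uses "is", while the field example uses "contains". The Python sketch below illustrates that difference under two assumptions: "is" means an exact name match, and "contains" means a case-sensitive substring match. It models a PDF's labels and fields as a plain dictionary with hypothetical names and values. get_exact and get_containing are illustrative helpers, not Kognitos commands.

```python
def get_exact(entries: dict[str, str], name: str) -> str:
    """Return the value whose name equals `name` exactly (assumed meaning of "is")."""
    if name not in entries:
        raise KeyError(f"no entry named {name!r}")
    return entries[name]


def get_containing(entries: dict[str, str], text: str) -> dict[str, str]:
    """Return all entries whose name contains `text` (assumed meaning of "contains")."""
    return {name: value for name, value in entries.items() if text in name}


# Hypothetical form contents, for illustration only.
form = {
    "Invoice Number": "INV-001",
    "Invoice Date": "2023-01-01",
    "Due Date": "2023-02-01",
}

print(get_exact(form, "Invoice Number"))  # INV-001
print(get_containing(form, "Date"))       # {'Invoice Date': '2023-01-01', 'Due Date': '2023-02-01'}
```

As the second call shows, a substring match can return more than one entry. Keep this in mind when choosing a search term.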

Setting or Changing Information in a PDF

This section covers how to set or change field values in a PDF file.

Setting a Field to a String

To set a field in a PDF to a string value, use the following command:

get the pdf
the field is "Date"
set the pdf's field to "2023-01-01"

Changing a Field to a String

To change a field in a PDF to a string value, use the following command:

get the pdf
the field is "Name"
change the pdf's field to "John Doe"

Setting a Field to a Number

To set a field in a PDF to a numeric value, use the following command:

get the pdf
the field name is "Total Amount"
set the pdf's field to 150

Changing a Field to a Number

To change a field in a PDF to a numeric value, use the following command:

get the pdf
the field name is "Page Count"
change the pdf's field to 5
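In these examples, the new value can be either a string or a number. The Python sketch below models setting and changing a field on a dictionary of field names to values. set_field is a hypothetical helper, not a Kognitos command, and the sketch treats setting and changing the same way. It assumes text form fields, whose values a PDF holds as text, so numbers are converted with str() before they are stored.

```python
from typing import Union

FieldValue = Union[str, int, float]


def set_field(fields: dict[str, str], name: str, value: FieldValue) -> None:
    """Set or change an existing text field, storing the value as a string.

    Numbers are converted with str(). Raises KeyError for an unknown field
    name instead of silently creating a new field.
    """
    if name not in fields:
        raise KeyError(f"no field named {name!r}")
    fields[name] = str(value)


# Hypothetical form using the field names from the examples above.
form = {"Date": "", "Name": "", "Total Amount": "", "Page Count": ""}
set_field(form, "Date", "2023-01-01")
set_field(form, "Name", "John Doe")
set_field(form, "Total Amount", 150)
set_field(form, "Page Count", 5)
print(form)  # {'Date': '2023-01-01', 'Name': 'John Doe', 'Total Amount': '150', 'Page Count': '5'}
```

Raising an error for an unknown name means a typo in a field name is reported instead of going unnoticed.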

Saving a PDF

This section covers how to save a PDF file after making changes.

Saving to a Local Path

To save a PDF to a local path, use the following command:

get the pdf
save the pdf to a file with
    the target is "/local/path/to/save/the/pdf"

Saving to an S3 URL

To save a PDF to an S3 URL, use the following command:

get the pdf
save the pdf to a file with
    the target is "s3://bucket-name/path/to/save/the/pdf"
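The two save examples differ only in the target string. The Python sketch below shows one way such a target could be classified before saving. It is not Kognitos code: resolve_target and save_pdf_bytes are hypothetical helpers. The sketch assumes that a target starting with s3:// names an S3 location and that any other target is a local file path. The S3 branch deliberately stops short of uploading, because that would need an AWS SDK and credentials.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_target(target: str) -> tuple[str, str, str]:
    """Classify a save target as ("s3", bucket, key) or ("local", "", path)."""
    parsed = urlparse(target)
    if parsed.scheme == "s3":
        bucket, key = parsed.netloc, parsed.path.lstrip("/")
        if not bucket or not key:
            raise ValueError(f"S3 target needs a bucket and an object key: {target!r}")
        return ("s3", bucket, key)
    # Anything without an s3:// scheme is treated as a local file path.
    return ("local", "", target)


def save_pdf_bytes(data: bytes, target: str) -> None:
    """Write PDF bytes to a local target; S3 uploads are left out of this sketch."""
    kind, bucket, location = resolve_target(target)
    if kind == "s3":
        # Hypothetical: an upload call from an AWS SDK would go here.
        raise NotImplementedError(f"upload to bucket {bucket!r}, key {location!r} not implemented")
    path = Path(location)
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    path.write_bytes(data)
```

Keeping classification separate from writing makes it possible to check a target before any file is touched.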

Removing Duplicates from a PDF

This section covers how to remove duplicates from a PDF file.

Removing Duplicates with a Confidence Threshold

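To remove duplicates from a PDF using a specific confidence threshold, set the department's duplicate confidence threshold before removing duplicates. The following example uses a threshold of 0.95: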

get the pdf
the department's duplicate confidence threshold is 0.95
remove duplicates from the pdf
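This section does not specify what the library compares or how it computes its confidence score. As a rough illustration only, the Python sketch below makes two assumptions. First, each item is a text string, for example the extracted text of a page. Second, the similarity ratio from Python's standard difflib.SequenceMatcher stands in for confidence. An item is dropped when its similarity to an item already kept is at or above the threshold. remove_duplicates is a hypothetical helper, not the library's implementation.

```python
from difflib import SequenceMatcher


def remove_duplicates(items: list[str], threshold: float) -> list[str]:
    """Keep the first occurrence of each item and drop later near-duplicates.

    An item counts as a duplicate when its SequenceMatcher similarity ratio
    (0.0 to 1.0) with any already-kept item is at or above `threshold`.
    Every item is compared with every kept item, so the cost grows quadratically.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    kept: list[str] = []
    for item in items:
        is_duplicate = any(
            SequenceMatcher(None, item, other).ratio() >= threshold for other in kept
        )
        if not is_duplicate:
            kept.append(item)
    return kept


# Hypothetical page texts, for illustration only.
pages = [
    "Invoice Number: INV-001 Total: 150",
    "Invoice Number: INV-001 Total: 150",  # exact copy of the first page
    "Invoice Number: INV-002 Total: 275",  # similar, but a different invoice
]

print(len(remove_duplicates(pages, 0.95)))  # 2
print(len(remove_duplicates(pages, 0.90)))  # 1
```

At 0.95, only the exact copy is removed. At 0.90, the second invoice is removed as well, because its text has a similarity ratio of about 0.91 to the first. A lower threshold therefore removes more items, at the risk of discarding ones that are merely similar.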

Removing Duplicates without a Confidence Threshold

To remove duplicates from a PDF without specifying a confidence threshold, use the following command:

get the pdf
remove duplicates from the pdf

Glossary of Terms

  • PDF (Portable Document Format): A file format developed by Adobe that allows documents to be presented in a manner independent of application software, hardware, and operating systems.
  • Field: An interactive element in a PDF form where users can enter data.
  • Label: A text element in a PDF that identifies or describes a field or section.
  • S3 (Amazon Simple Storage Service): A scalable object storage service provided by Amazon Web Services (AWS) for storing and retrieving any amount of data at any time.

By following these guidelines, you can effectively manage PDF files using the PDF Manipulation Library, from opening and retrieving information to setting values, saving, and removing duplicates as needed.