File Conversion and Manipulation

Manipulation of different file types

You can convert files to different formats and manipulate them when working with Kognitos.

Convert from Word (.docx & .doc) to PDF

Many tools require a document to be in PDF format for processing. Kognitos can convert a Word file (.doc or .docx) to Portable Document Format (.pdf). Here is an example of a command that performs this conversion:

the file is the document
read the file as a pdf

Convert a picture (.jpg / .png) to PDF

In many cases, a picture needs to be converted to a PDF prior to processing. This capability is built into Kognitos. Here is an example of how to convert a picture to PDF:

convert a file to a pdf file with
	the file is the picture

Converting CSV to JSON

To convert a CSV file to JSON format, use the following function:

import new_library

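# Path of the CSV file to read and of the JSON file to write
csv_file_path = 'path/to/your/file.csv'
json_file_path = 'path/to/your/file.json'

# Source path first, destination path second
new_library.convert_csv_to_json(csv_file_path, json_file_path)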
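
For readers who want to see what such a conversion involves, here is a minimal sketch that uses only Python's standard csv and json modules. It is not the implementation of new_library.convert_csv_to_json. The helper name csv_to_json_sketch is hypothetical, and the choice to emit a JSON array of objects keyed by the header row is an assumption made for illustration.

```python
import csv
import json


def csv_to_json_sketch(csv_file_path, json_file_path):
    """Hypothetical helper: write the rows of a CSV file as a JSON array of objects."""
    # newline="" lets the csv module handle line endings itself.
    with open(csv_file_path, newline="", encoding="utf-8") as src:
        # DictReader uses the first row as the keys for every following row.
        rows = list(csv.DictReader(src))

    with open(json_file_path, "w", encoding="utf-8") as dst:
        json.dump(rows, dst, indent=2)

    return rows
```

In this sketch every value is read as a string, so numeric columns would need an explicit conversion if types matter downstream.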

Converting JSON to CSV

To convert a JSON file to CSV format, use the following function:

import new_library

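# Path of the JSON file to read and of the CSV file to write
json_file_path = 'path/to/your/file.json'
csv_file_path = 'path/to/your/file.csv'

# Source path first, destination path second
new_library.convert_json_to_csv(json_file_path, csv_file_path)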
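
The reverse direction can be sketched with the standard library as well. This is an illustration only, not the implementation of new_library.convert_json_to_csv. The helper name json_to_csv_sketch is hypothetical, and the sketch assumes the JSON file holds an array of flat objects (no nested values).

```python
import csv
import json


def json_to_csv_sketch(json_file_path, csv_file_path):
    """Hypothetical helper: write a JSON array of flat objects as CSV rows."""
    with open(json_file_path, encoding="utf-8") as src:
        records = json.load(src)

    # Collect column names in first-seen order so no key is dropped.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_file_path, "w", newline="", encoding="utf-8") as dst:
        # restval="" fills in keys that a given record lacks.
        writer = csv.DictWriter(dst, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)

    return fieldnames
```

Collecting the column names across all records, rather than from the first record alone, keeps the sketch from failing when later objects carry extra keys.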

Merge (Document)

You may occasionally need to combine multiple PDFs into a single document. Here is an example in which all input files are merged into a single file:

get the attachments
get the above as the scanned documents
merge the scanned documents into a single document where
	the document name is "statements.pdf"

Data Extraction

Extracting Data from XML

To extract data from an XML file, use the following function:

import new_library

xml_file_path = 'path/to/your/file.xml'
data = new_library.extract_data_from_xml(xml_file_path)
print(data)
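
As a rough picture of what extracting data from XML can look like, the sketch below uses Python's standard xml.etree.ElementTree module. It does not describe how new_library.extract_data_from_xml works or what it returns. The helper name extract_data_from_xml_sketch is hypothetical, and the sketch assumes a simple layout in which each child of the root element is one record whose own children are its fields.

```python
import xml.etree.ElementTree as ET


def extract_data_from_xml_sketch(xml_file_path):
    """Hypothetical helper: return one dict per child of the root element."""
    root = ET.parse(xml_file_path).getroot()
    records = []
    for item in root:
        # Start with the element's attributes, then add each child's text by tag name.
        record = dict(item.attrib)
        for field in item:
            record[field.tag] = (field.text or "").strip()
        records.append(record)
    return records
```

Deeper nesting or repeated field names within one record would need a richer structure than the flat dictionaries used here.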

Text Processing

Transcribing Text

To transcribe text from an audio file, use the following function:

import new_library

audio_file_path = 'path/to/your/file.wav'
transcription = new_library.transcribe_audio(audio_file_path)
print(transcription)