Text Manipulations

Overview

This manual is divided into several sections, each focusing on a specific aspect of Text Manipulation in Kognitos.

Introduction

In Kognitos, you can perform a wide range of operations on text strings, including extraction, conversion, and formatting.

Prerequisites for Using Text Manipulation in Kognitos

Before you can start using Text Manipulation, two prerequisites need to be met:

  1. Kognitos Account: You need an active Kognitos account. If you do not have one, you can request one by writing to us at [email protected]
  2. Basic Understanding of Text Data: While Kognitos simplifies interactions, a basic understanding of text data and common string operations will help you automate more effectively.

Text Manipulation and Extraction

To manipulate text or extract elements from it, first provide the text in double quotes, then add the operation you want Kognitos to perform on it.

Note: To perform any action on a text, string, sentence, or paragraph, always enclose it in double quotation marks " "


Extracting Elements from Text

To extract various elements from a given text, you can use the following commands.

Learn more about Document Extraction in Kognitos here

Extract lines from a text input

Consider this paragraph: "Today is a good day. It will be sunny. I will step out." To extract lines from this text input, use:

the text is "Today is a good day. It will be sunny. I will step out."
get the text's lines

Extract words from a text input

Consider this line: "Today is a good day." To extract words from this text input, use:

the text is "Today is a good day."
get the text's words

Extract characters from a text input

Consider this line: "Today is a good day." To extract characters from this text input, use:

the text is "Today is a good day."
get the text's characters

Extract URLs from a text input

Consider this line: "Visit https://www.kognitos.com." To extract the URL from this text input, use:

the text is "Visit https://www.kognitos.com."
get the text's URL

Extract numbers from a text input

Consider this line: "I will run 5 laps of this field." To extract numbers from this text input, use:

the text is "I will run 5 laps of this field."
get the text's number

Extract specific partial text/substrings from a text input

You can use regular expressions to extract specific substrings from a text input. In Kognitos, the rules that describe the set of strings you want to match and extract are called patterns. Add any Python regular expression in double quotes after the word pattern, as in the example below.

the thing is "substrings"
the thing's pattern is "error [0-9]+"
get the text's thing
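Because patterns are Python regular expressions, one way to check a pattern before using it in Kognitos is to try it in plain Python first. The sketch below is only an illustration: the sample text is made up, and it assumes the pattern behaves under Python's `re` module the same way it does inside Kognitos.

```python
import re

# Hypothetical sample text, not taken from the Kognitos documentation.
sample_text = "Job failed with error 404, then error 500 after retry."

# Same pattern as in the example above:
# the word "error", a space, then one or more digits.
pattern = r"error [0-9]+"

# findall returns every non-overlapping match as a list of strings.
matches = re.findall(pattern, sample_text)
print(matches)
```

If the list contains more or fewer substrings than expected, adjust the pattern before adding it to your automation.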

Learn more about using Regular Expressions here


Extract Hashtags from Text

Consider this sentence: "Check out #Python and #coding!"
To extract '#Python' and '#coding' from it, use:

the text is "Check out #Python and #coding!"
extract hashtags from the text

Or simply

extract hashtags from "Check out #Python and #coding!"

Extract Whole Numbers from Text

Consider this sentence: "There are 4 apples and 5 oranges."
To extract '4' and '5' from it, use:

the text is "There are 4 apples and 5 oranges."
extract whole numbers from the text

Or simply

extract whole numbers from "There are 4 apples and 5 oranges."

Extract Percentages from Text

Consider this sentence: "The project is 75% complete, with 25% remaining." To extract the percentages from this text, use:

extract percentages from "The project is 75% complete, with 25% remaining."

The result should be: 75% and 25%


Extract Emails from Text

To extract the email addresses from a text, use the commands below. For example, they extract '[email protected]' and '[email protected]' from "Please contact us at [email protected] or [email protected] for further assistance."

Example:

the text is "Please contact us at [email protected] or [email protected] for further assistance."
extract emails from the text

Or simply

extract emails from "Please contact us at [email protected] or [email protected] for further assistance."

Note: In this scenario text and string can be used interchangeably.

Start and End text check

To check if a string or a text starts with a specific substring, use the following commands:

the string is "Hello, world!"
if the string is started by "Hello" then
	<action>

the text is "Good morning, everyone!"
if the text is started by "Good" then
	<action>

To check if a string ends with a specific substring, use the following commands:

the string is "Hello, world!"  
if the string is terminated by "world!" then
	<action>

the text is "Hello, world!"  
if the text is ended by "world!" then
	<action>

Note: In this scenario text and string can be used interchangeably.
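For readers who think in Python, the start and end checks above correspond roughly to `str.startswith` and `str.endswith`. This is an analogy only, not a description of how Kognitos implements the checks. In particular, the Python methods are case sensitive, and this page does not state whether the Kognitos checks are.

```python
greeting = "Hello, world!"

# Exact, case-sensitive prefix and suffix checks.
starts_with_hello = greeting.startswith("Hello")
ends_with_world = greeting.endswith("world!")

# A prefix that differs only in case does not match in Python.
starts_with_lowercase = greeting.startswith("hello")

print(starts_with_hello, ends_with_world, starts_with_lowercase)
```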

Text/String Length

To get the length of a string, use the following commands:

Option 1

get the string's length where
    the string is "Hello, world!"

Option 2

the string is "BT 20000"
get the string's length

Note: Spaces between words are counted as part of the length.
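As a rough cross-check of what to expect, the Python sketch below counts the characters in the two example strings, spaces included. It assumes Kognitos counts characters the same way Python's `len` does, which this page does not confirm beyond the note about spaces.

```python
# The two example strings used above.
samples = ["Hello, world!", "BT 20000"]

# len counts every character, including spaces and punctuation.
lengths = {sample: len(sample) for sample in samples}

for sample, length in lengths.items():
    print(repr(sample), length)
```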

Note: In this scenario text and string can be used interchangeably.

Text/String Case Conversion

To convert the case of a string, use the following commands:

To convert all characters in a string or text to lowercase, use:

the title is "HELLO WORLD"
get the title's lowercase

To convert all characters to UPPERCASE, use:

the message is "hello world"
get the message's uppercase

To convert to Titlecase (capitalise the first letter of each word), use any of the forms below:

the string is "hello world"
get the string's titlecase

the string is "this is a test"
get the string's titlecase

the string is "hello world"
the string's titlecase

the string is "good morning, everyone!"
the string's titlecase
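The three conversions have close Python counterparts, shown here as a sketch for comparison. It is an assumption that Kognitos produces the same results, especially for title case around punctuation, so treat the Python output as a guide rather than a guarantee.

```python
# Lowercase, uppercase, and title case of the example strings above.
lower = "HELLO WORLD".lower()
upper = "hello world".upper()

# str.title capitalises the first letter of each word.
title = "good morning, everyone!".title()

print(lower)
print(upper)
print(title)
```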

Note: In this scenario text and string can be used interchangeably.

Find/Match a substring within a Text

To check if a string contains a specific substring or number, use the following commands:

Option 1

the message is "This is a test string"
if "test" is in the message then
	<action>

Option 2

the number is 123
the string is "The total is 12345"
if the number is in the string then
	<action>

Using Regular Expression

Using regular expressions, you can check whether a string matches a pattern, extract the matching substring, or remove the matching part from a string.

Option 1

the string is "[email protected]"
the regular expression is "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
if the string is matched by the regular expression then
	<action>

Option 2

the text is "123-45-6789"
the regular expression is "^\d{3}-\d{2}-\d{4}$"
get the text's substring which matches the regular expression

Option 3

the regular expression is "[0-9]+"
the string is "The order number is 12345."
remove the regular expression from the string
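The three options above map onto three common regular expression operations: matching, extracting, and removing. The Python sketch below shows each one with the `re` module. The email address is a hypothetical placeholder, the email pattern here includes an escaped dot before the top-level domain, and the whole sketch assumes Kognitos evaluates patterns the way Python does.

```python
import re

# Option 1: does the whole string match the pattern?
# "jane.doe@example.com" is a made-up address used only for illustration.
email_pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
is_email = re.match(email_pattern, "jane.doe@example.com") is not None

# Option 2: pull out the substring that matches the pattern.
id_pattern = r"^\d{3}-\d{2}-\d{4}$"
found = re.search(id_pattern, "123-45-6789")
substring = found.group(0) if found else None

# Option 3: remove every match of the pattern from the string.
cleaned = re.sub(r"[0-9]+", "", "The order number is 12345.")

print(is_email, substring, repr(cleaned))
```

Note that removing the digits leaves the surrounding spaces and punctuation untouched, so the cleaned string still ends with a space followed by a full stop.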

Using Removal method

The following variations of the remove command are available.

To remove all occurrences of a word

the target is "Hello, world! world"
the object is "world"
remove the object from the target where
    the remove strategy is "all"

To remove the first occurrence of a word (second, third, etc. can also be used)

the text is "Hello, world! World world"
the object is "world"
remove the object from the text where
  the remove strategy is "first"

Note: Removal is case sensitive. If the removal object is 'world' and the text contains 'World', the capitalised word is ignored.
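The difference between the "all" and "first" strategies, and the effect of case sensitivity, can be seen in a small Python sketch. It uses `str.replace` as a stand-in for the Kognitos remove command. That is an assumption: Kognitos may handle the leftover spaces differently.

```python
text = "Hello, world! World world"
word = "world"

# "all" strategy: remove every exact-case occurrence.
removed_all = text.replace(word, "")

# "first" strategy: remove only the first exact-case occurrence.
removed_first = text.replace(word, "", 1)

# "World" (capital W) survives in both results because matching is case sensitive.
print(repr(removed_all))
print(repr(removed_first))
```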


Using Replacing method

Use the commands below to replace one string with another within a line or a sentence.

Option 1

the text is "Hello, world!"
the thing is "world"
the string is "KOG"
replace "{the thing}" with the "{the string}" in the text

Option 2

the text is "I love programming in Python."
replace "Python" with "C" in the text

Replacing content in a Word document

the template # <---- uploads a docx file
fill in the template
  the template marker is "<>" # <--- needed for any marker other than "{}", the default
  the date is the month
  the dollars is the rupees
  the colour is the style
  the value is "Steve"
use the answer as the rendered file

Splitting the Text [With and without Delimiter]

Split the text into parts, either on a delimiter you specify or on the default delimiter.

With Delimiter

the string is "apple,banana,cherry"
split the string
    the delimiter is ","

Without Delimiter (the default delimiter is a blank space)

the string is "hello world"
split the string
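For comparison, here is how the same two splits look in Python. This is a sketch of the expected result shape, assuming Kognitos returns the parts in order as a list. Python's no-argument `split` splits on any run of whitespace, which may differ slightly from the Kognitos default of a single blank space.

```python
# With an explicit delimiter.
fruits = "apple,banana,cherry".split(",")

# Without a delimiter: Python splits on whitespace.
words = "hello world".split()

print(fruits)
print(words)
```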

Text/String to Number Conversion & Vice versa

To convert a string to a number, use the following command:

the string is "12345"
get the string as a number

In this scenario, text and number are not interchangeable. An ID or code such as 12479 may be treated as a number, which means mathematical operations can be performed on it. Because it is an ID or code, you may want it to behave like text instead. To convert a number to a string, use either of the following commands:

the number is 12345

get the number as a string

#OR

the number is a text

Date Conversion

To convert dates to different formats, use the following commands:

Convert a date to the format 'March 15, 2023'

the date is "2023-03-15"
convert the date to format with
    the format is "%B %d, %Y"

Convert a date to the format '2023-03-15'

the date is "March 15, 2023"
convert the date to format with
    the format is "%Y-%m-%d"

Convert a date to ISO format

Option 1

the date is "March 15, 2023"
convert the date to "iso format"

Option 2

the date is "15/03/2023"
the department's zone is "UTC"
convert the date to "iso format"
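The format strings in this section use percent-style directives such as %B, %d, and %Y, which look like the ones used by Python's `strftime`. Assuming they carry the same meaning in Kognitos (this page does not say so explicitly), the Python sketch below shows what each directive produces. It also assumes an English locale for the month name, and that "iso format" refers to the date-only form YYYY-MM-DD.

```python
from datetime import datetime

# %Y = four-digit year, %m = two-digit month, %d = two-digit day,
# %B = full month name.
parsed = datetime.strptime("2023-03-15", "%Y-%m-%d")
long_form = parsed.strftime("%B %d, %Y")

# Going the other way: parse the long form, then emit an ISO 8601 date.
from_long = datetime.strptime("March 15, 2023", "%B %d, %Y")
iso_from_long = from_long.date().isoformat()

# A day-first date with slashes needs its own input format.
from_slashes = datetime.strptime("15/03/2023", "%d/%m/%Y")
iso_from_slashes = from_slashes.date().isoformat()

print(long_form)
print(iso_from_long)
print(iso_from_slashes)
```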

Text/String with Citation

To get a string with a citation, that is, to add the word 'citation: ' when the string is used as a reference to a document, use the following command:

the string is "This is an example sentence."
get the string with a citation

Text/Strings Concatenation

To concatenate multiple strings with a specific delimiter, use the following commands:

Option 1

the strings are "apple", "banana", "cherry"
get strings with comma

Option 2

the strings are "New York", "Los Angeles", "Chicago"
get strings with dash

Join Texts with a Delimiter

the texts are "apple", "banana", "cherry"
the delimiter is ", "
join the texts with the delimiter
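Joining is the inverse of splitting: the delimiter is placed between each pair of neighbouring items, but not before the first or after the last. The Python sketch below shows the result for the example above, assuming Kognitos joins the same way.

```python
texts = ["apple", "banana", "cherry"]
delimiter = ", "

# The delimiter appears only between items.
joined = delimiter.join(texts)
print(joined)
```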

Trim Whitespace from Text

the text is "   Hello, World!   "
trim whitespace from the text

Or simply

trim whitespace from "   Hello, World!   "

Markdown Headings Extraction

To extract headings from markdown text, use the following command:

the markdown text is "
# Heading 1
Some text under heading 1.
## Heading 2
Some more text under heading 2.
"
extract headings from the markdown text

Language Detection

Detecting Text Language

Detect the Language of a Text

the text is "Bonjour le monde"
detect the text's language

Get the Language of a Text

the text is "Hello World"
get the text's language

By following these guidelines, you can effectively manage and manipulate text data using the Kognitos Text Manipulation Library, from extracting elements and converting cases to matching patterns and concatenating strings.


Working with Number Operations

You can perform various operations on numeric data. If your value is behaving like text, you may first have to convert it to a number. See the steps below.

Number conversion from String

To convert a string representation into a decimal number, use the following utility function:

convert "123" to a number

Number Modification

Increase a Number by a Percentage

increase the number 100 by 20 percentage

Reduce a Number by a Percentage

reduce 100 by 20 percent

Round a Number to a Specific Precision

the number is 3.145234
round the number with
    the precision is 2
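The arithmetic behind these three commands can be written out in a few lines of Python. The helper names below are hypothetical and exist only for this illustration. The rounding result assumes Kognitos rounds to the nearest value at the given precision, as Python's `round` does for this input.

```python
def increase_by_percent(value, percent):
    """Hypothetical helper: add `percent` percent of value to value."""
    return value + value * percent / 100


def reduce_by_percent(value, percent):
    """Hypothetical helper: subtract `percent` percent of value from value."""
    return value - value * percent / 100


increased = increase_by_percent(100, 20)
reduced = reduce_by_percent(100, 20)

# Precision 2 means two digits after the decimal point.
rounded = round(3.145234, 2)

print(increased, reduced, rounded)
```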

Number Sorting and Finding

Sort a List of Numbers

the numbers are 3, 1, 4, 1, 5, 9, 2, 6
sort the numbers

Find the Maximum Number in a List

the numbers are 34, 78, 12, 89, 23
find the maximum number in the numbers

Find the Minimum Number in a List

the numbers are 45, 22, 78, 3, 90
find the minimum number in the numbers
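To sanity-check the expected results for the lists above, the same operations can be run in Python. The sketch assumes that Kognitos sorts in ascending order by default, which this page does not state.

```python
# Sorting keeps duplicates and, in Python, orders ascending by default.
ascending = sorted([3, 1, 4, 1, 5, 9, 2, 6])

largest = max([34, 78, 12, 89, 23])
smallest = min([45, 22, 78, 3, 90])

print(ascending)
print(largest, smallest)
```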


Glossary of Terms

  • API (Application Programming Interface): A set of rules and protocols for building and interacting with software applications. Kognitos API allows external services to communicate with Kognitos to retrieve or modify data.
  • String: A sequence of characters used to represent text.
  • Regular Expression: A sequence of characters that define a search pattern, often used for string matching and manipulation.
  • Delimiter: A character or sequence of characters used to specify the boundary between separate, independent regions in plain text or other data streams.
  • ISO Format: A standardized format for representing dates and times, defined by the International Organization for Standardization (ISO).
  • Attribute: A property or characteristic of a string, such as its length or case.
  • Procedure: A defined sequence of actions or operations; an automation in Kognitos.

By leveraging the Kognitos Text Manipulation Library, users can perform a wide range of text operations efficiently and effectively, enhancing their data processing and automation capabilities.