Text Manipulations
Overview
This manual is divided into several sections, each focusing on a specific aspect of Text Manipulation in Kognitos.
Introduction
Users can perform a wide range of operations on text strings in Kognitos, including extraction, conversion, formatting, and more.
Prerequisites for Using Text Manipulation in Kognitos
Before you start, two prerequisites need to be met:
- Kognitos Account: You need an active Kognitos account. If you do not have one, you can request one by writing to us at [email protected]
- Basic Understanding of Text Data: While Kognitos simplifies interactions, having a basic understanding of text data and common operations on strings will help you automate more effectively with Kognitos.
Text Manipulation and Extraction
To manipulate text or extract elements from it, first provide the text in double quotes, then add the operation you want Kognitos to perform on it.
Always enclose the text, string, sentence, or paragraph you want to act on in double quotation marks (" ").
Extracting Elements from Text
To extract various elements from a given text, you can use the following commands.
Learn more about Document Extraction in Kognitos here
Extract lines from a text input
Consider this paragraph: "Today is a good day. It will be sunny. I will step out." To extract lines from this text input, use:
the text is "Today is a good day. It will be sunny. I will step out."
get the text's lines
Extract words from a text input
Consider this line: "Today is a good day." To extract words from this text input, use:
the text is "Today is a good day."
get the text's words
Extract characters from a text input
Consider this line: "Today is a good day." To extract characters from this text input, use:
the text is "Today is a good day."
get the text's characters
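For readers who think in code, the word and character extractions above behave roughly like the Python sketch below. It is an analogy only: how Kognitos actually tokenizes text (for example, whether punctuation stays attached to a word) is an assumption here.

```python
# Rough Python analogue of "get the text's words" and "get the text's characters".
# Assumption: words are separated by whitespace; Kognitos may handle punctuation differently.
text = "Today is a good day."

words = text.split()       # split on runs of whitespace
characters = list(text)    # one entry per character, spaces included

print(words)
print(len(characters))
```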
Extract URLs from a text input
Consider this line: "Visit https://www.kognitos.com". To extract the URL from this text input, use:
the text is "Visit https://www.kognitos.com."
get the text's URL
Extract numbers from a text input
Consider this line: "I will run 5 laps of this field." To extract numbers from this text input, use:
the text is "I will run 5 laps of this field."
get the text's number
Extract specific partial text/substrings from a text input
You can use regular expressions to extract substrings from a text input. A regular expression specifies the rules for the set of strings you want to match; in Kognitos these rules are referred to as patterns. Add any Python regular expression in double quotes after pattern, as in the example below.
the thing is "substrings"
the thing's pattern is "error [0-9]+"
get the text's thing
Learn more about using Regular Expressions here
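Because patterns are Python regular expressions, it can help to try one in plain Python before using it in a procedure. The sketch below applies the pattern from the example to a made-up sample sentence; the sample text is hypothetical, and Kognitos may return matches in a different shape.

```python
import re

# Hypothetical sample text; the pattern is the one used in the example above.
text = "job failed with error 404, then error 500 on retry"
pattern = r"error [0-9]+"

# findall returns every non-overlapping match, in order of appearance.
matches = re.findall(pattern, text)
print(matches)
```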
Extract Hashtags from Text
Consider this sentence: "Check out #Python and #coding!"
To extract '#Python' and '#coding' from this text, use:
the text is "Check out #Python and #coding!"
extract hashtags from the text
Or simply
extract hashtags from "Check out #Python and #coding!"
Extract Whole Numbers from Text
Consider this sentence: "There are 4 apples and 5 oranges."
To extract '4' and '5' from this text, use:
the text is "There are 4 apples and 5 oranges."
extract whole numbers from the text
Or simply
extract whole numbers from "There are 4 apples and 5 oranges."
Extract Percentages from Text
Consider this sentence: "The project is 75% complete, with 25% remaining." To extract the percentages from this text, use:
extract percentages from "The project is 75% complete, with 25% remaining."
The result should be: 75% & 25%
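The hashtag, whole-number, and percentage extractions can be approximated with regular expressions. The patterns in this sketch are illustrative assumptions, not the rules Kognitos uses internally, so edge cases (decimals, negative numbers, hashtags with punctuation) may differ.

```python
import re

# Illustrative patterns only; Kognitos' own extraction rules are not documented here.
hashtags = re.findall(r"#\w+", "Check out #Python and #coding!")
whole_numbers = re.findall(r"\b\d+\b", "There are 4 apples and 5 oranges.")
percentages = re.findall(
    r"\d+(?:\.\d+)?%", "The project is 75% complete, with 25% remaining."
)

print(hashtags)
print(whole_numbers)
print(percentages)
```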
Extract Emails from Text
To extract email addresses from a text, use the command below. For example, it extracts '[email protected]' and '[email protected]' from "Please contact us at [email protected] or [email protected] for further assistance."
Example:
the text is "Please contact us at [email protected] or [email protected] for further assistance."
extract emails from the text
Or simply
extract emails from "Please contact us at [email protected] or [email protected] for further assistance."
Note: In this scenario text and string can be used interchangeably.
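As a point of comparison, email extraction is commonly done with a pattern like the one below. Both the addresses and the pattern are hypothetical stand-ins; real-world email syntax is broader than this simple expression covers.

```python
import re

# Hypothetical addresses, used only for illustration.
text = "Please contact us at support@example.com or sales@example.org for further assistance."

# Simple email-like pattern: local part, "@", domain, a dot, and a top-level domain.
email_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
emails = re.findall(email_pattern, text)
print(emails)
```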
Start and End text check
To check if a string or a text starts with a specific substring, use the following commands:
the string is "Hello, world!"
if the string is started by "Hello" then
<action>
the text is "Good morning, everyone!"
if the text is started by "Good" then
<action>
To check if a string ends with a specific substring, use the following commands:
the string is "Hello, world!"
if the string is terminated by "world!" then
<action>
the text is "Hello, world!"
if the text is ended by "world!" then
<action>
Note: In this scenario text and string can be used interchangeably.
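The start and end checks correspond to prefix and suffix tests. A minimal Python sketch follows; note that Python's checks are case sensitive, and whether Kognitos behaves the same way is an assumption.

```python
string = "Hello, world!"

# Prefix and suffix checks; both are case sensitive in Python.
starts_with_hello = string.startswith("Hello")
ends_with_world = string.endswith("world!")

if starts_with_hello and ends_with_world:
    print("both checks passed")
```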
Text/String Length
To get the length of a string, use the following commands:
Option 1
get the string's length where
the string is "Hello, world!"
Option 2
the string is "BT 20000"
get the string's length
Note: The length includes the spaces between words.
Note: In this scenario text and string can be used interchangeably.
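To see how spaces and punctuation contribute to the count, here is the same measurement in Python, offered as an analogy for the two examples above.

```python
# Every character counts toward the length, including spaces and punctuation.
greeting_length = len("Hello, world!")
code_length = len("BT 20000")

print(greeting_length, code_length)
```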
Text/String Case Conversion
To convert the case of a string, use the following commands:
To convert all characters in a string or text to lowercase, use:
the title is "HELLO WORLD"
get the title's lowercase
the string is "hello world"
get the string's titlecase
the string is "this is a test"
get the string's titlecase
the string is "hello world"
the string's titlecase
the string is "good morning, everyone!"
the string's titlecase
To convert to UPPERCASE characters, use:
the message is "hello world"
get the message's uppercase
To convert to title case (capitalize each word), use:
the string is "good morning, everyone!"
the string's titlecase
Note: In this scenario text and string can be used interchangeably.
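The three case conversions map onto familiar string methods. This Python sketch shows the expected shape of each result; treating Kognitos title case as equivalent to Python's `str.title()` is an assumption.

```python
# Lowercase, uppercase, and title case of the sample strings used above.
lowercase = "HELLO WORLD".lower()
uppercase = "hello world".upper()
titlecase = "good morning, everyone!".title()  # capitalizes the first letter of each word

print(lowercase)
print(uppercase)
print(titlecase)
```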
Find/Match a substring within a Text
To check if a string contains a specific substring or number, use the following commands:
Option 1
the message is "This is a test string"
if "test" is in the message then
<action>
Option 2
the number is 123
the string is "The total is 12345"
if the number is in the string then
<action>
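Both options are substring checks. In the sketch below the number is converted to text before the check, which is presumably what happens in Option 2; that conversion step is an assumption about Kognitos, not documented behavior.

```python
message = "This is a test string"
contains_word = "test" in message

# Option 2 analogue: compare the number's digits against the text.
number = 123
string = "The total is 12345"
contains_number = str(number) in string  # "123" is found inside "12345"

print(contains_word, contains_number)
```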
Using Regular Expression
Regular expressions let you match part of a string and extract the matching substrings.
Option 1
the string is "[email protected]"
the regular expression is "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
if the string is matched by the regular expression then
<action>
Option 2
the text is "123-45-6789"
the regular expression is "^\d{3}-\d{2}-\d{4}$"
get the text's substring which matches the regular expression
Option 3
the regular expression is "[0-9]+"
the string is "The order number is 12345."
remove the regular expression from the string
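Options 2 and 3 can be checked in Python with the same patterns. This is a sketch of the matching and removal steps only; how Kognitos treats the whitespace left behind after a removal is not covered by the source.

```python
import re

# Option 2 analogue: return the substring that matches the whole pattern.
text = "123-45-6789"
match = re.search(r"^\d{3}-\d{2}-\d{4}$", text)
matched_substring = match.group(0) if match else None

# Option 3 analogue: remove every match of the pattern from the string.
string = "The order number is 12345."
cleaned = re.sub(r"[0-9]+", "", string)

print(matched_substring)
print(cleaned)
```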
Using Removal method
The following variations are available:
To remove all occurrences of a word
the target is "Hello, world! world"
the object is "world"
remove the object from the target where
the remove strategy is "all"
To remove only the first occurrence of a word (second, third, and so on can also be used)
the text is "Hello, world! World world"
the object is "world"
remove the object from the text where
the remove strategy is "first"
Note: The removal word is case sensitive. If you specify 'world' as the removal object but the text contains 'World', that word is ignored.
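The "all" and "first" strategies resemble Python's `str.replace` with and without a count. The sketch below also shows the case-sensitive behavior: the capitalized 'World' is left alone. Whether Kognitos tidies leftover spaces is unknown, so none are removed here.

```python
# "all" strategy: remove every occurrence of the object.
target = "Hello, world! world"
removed_all = target.replace("world", "")

# "first" strategy: remove only the first occurrence (count=1).
text = "Hello, world! World world"
removed_first = text.replace("world", "", 1)

print(repr(removed_all))
print(repr(removed_first))
```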
Using Replacing method
Use the commands below to replace one string with another within a line or sentence.
Option 1
the text is "Hello, world!"
the thing is "world"
the string is "KOG"
replace "{the thing}" with the "{the string}" in the text
Option 2
the text is "I love programming in Python."
replace "Python" with "C" in the text
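In Python terms, both options are plain substring replacements. A short sketch using the same sample values:

```python
# Option 1 analogue: the search and replacement values come from variables.
thing = "world"
string = "KOG"
option_1 = "Hello, world!".replace(thing, string)

# Option 2 analogue: literal values.
option_2 = "I love programming in Python.".replace("Python", "C")

print(option_1)
print(option_2)
```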
Replacing content in a word document
the template # <---- uploads a docx file
fill in the template
the template marker is "<>" # <--- needed for any marker other than "{}", the default
the date is the month
the dollars is the rupees
the colour is the style
the value is "Steve"
use the answer as the rendered file
Splitting the Text [With and without Delimiter]
Split the text on the required delimiter.
With Delimiter
the string is "apple,banana,cherry"
split the string
the delimiter is ","
Without Delimiter (the default delimiter is a blank space)
the string is "hello world"
split the string
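The two forms of splitting look like this in Python, shown as an analogy. Python's no-argument `split()` breaks on any run of whitespace, which may be slightly broader than the blank-space default described above.

```python
# With an explicit delimiter.
fruits = "apple,banana,cherry".split(",")

# Without a delimiter: split on whitespace.
words = "hello world".split()

print(fruits)
print(words)
```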
Text/String to Number Conversion & Vice versa
To convert a string to a number, use the following command:
the string is "12345"
get the string as a number
Here, text has a distinct meaning. An ID or code number such as 12479 may be treated as a number (you can perform mathematical operations on it), but because it is an ID or code you may want it to behave like text.
To convert a number to a string, use the following command:
the number is 12345
get the number as a string
#OR
the number is a text
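One reason to keep IDs as text is that numeric conversion discards formatting such as leading zeros. The Python sketch below illustrates the idea; the ID value is hypothetical.

```python
# String to number, and number back to string.
as_number = int("12345")
as_string = str(12345)

# Hypothetical ID with leading zeros: converting it to a number loses them.
id_code = "00123"
id_as_number = int(id_code)

print(as_number + 1)    # arithmetic works on the number
print(as_string + "-A") # concatenation works on the string
print(id_as_number)
```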
Date Conversion
To convert dates to different formats, use the following commands:
Convert a date to the format 'March 15, 2023'
the date is "2023-03-15"
convert the date to format with
the format is "%B %d, %Y"
Convert a date to the format '2023-03-15'
the date is "March 15, 2023"
convert the date to format with
the format is "%Y-%m-%d"
Convert a date to ISO format
Option 1
the date is "March 15, 2023"
convert the date to "iso format"
Option 2
the date is "15/03/2023"
the department's zone is "UTC"
convert the date to "iso format"
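The format strings use the same codes as Python's `strftime` (`%B` full month name, `%d` day, `%Y` four-digit year), so a format can be tried out in Python first. The sketch assumes an English locale for month names and shows a date-only ISO result; whether Kognitos includes a time or zone in its ISO output is not stated in the source.

```python
from datetime import datetime

# Parse an ISO-style date and render it in long form.
date = datetime.strptime("2023-03-15", "%Y-%m-%d")
long_form = date.strftime("%B %d, %Y")

# Parse other layouts and render them as ISO dates.
iso_from_long = datetime.strptime("March 15, 2023", "%B %d, %Y").date().isoformat()
iso_from_slash = datetime.strptime("15/03/2023", "%d/%m/%Y").date().isoformat()

print(long_form)
print(iso_from_long, iso_from_slash)
```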
Text/String with Citation
To get a string with a citation, that is, to add a 'citation: ' reference to the source document, use the following command:
the string is "This is an example sentence."
get the string with a citation
Text/Strings Concatenation
To concatenate multiple strings with a specific delimiter, use the following commands:
Option 1
the strings are "apple", "banana", "cherry"
get strings with comma
Option 2
the strings are "New York", "Los Angeles", "Chicago"
get strings with dash
Join Texts with a Delimiter
the texts are "apple", "banana", "cherry"
the delimiter is ", "
join the texts with the delimiter
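Joining with a delimiter is the inverse of splitting. A Python equivalent of the example above:

```python
texts = ["apple", "banana", "cherry"]
delimiter = ", "

# Place the delimiter between consecutive items.
joined = delimiter.join(texts)
print(joined)
```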
Trim Whitespace from Text
the text is " Hello, World! "
trim whitespace from the text
Or simply
trim whitespace from " Hello, World! "
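Trimming removes whitespace from both ends and leaves interior spaces untouched, as this Python sketch of the same example shows.

```python
text = " Hello, World! "

# strip() removes leading and trailing whitespace only.
trimmed = text.strip()
print(repr(trimmed))
```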
Markdown Headings Extraction
To extract headings from markdown text, use the following command:
the markdown text is "
# Heading 1
Some text under heading 1.
## Heading 2
Some more text under heading 2.
"
extract headings from the markdown text
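A heading line in markdown starts with one to six `#` characters followed by a space. The sketch below finds such lines with a regular expression; returning the heading text without the `#` markers is an assumption about the output shape, not documented Kognitos behavior.

```python
import re

markdown_text = """
# Heading 1
Some text under heading 1.
## Heading 2
Some more text under heading 2.
"""

# Match lines that begin with 1-6 '#' characters, then capture the heading text.
headings = re.findall(r"^#{1,6}[ \t]+(.+)$", markdown_text, flags=re.MULTILINE)
print(headings)
```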
Language Detection
Detecting Text Language
Detect the Language of a Text
the text is "Bonjour le monde"
detect the text's language
Get the Language of a Text
the text is "Hello World"
get the text's language
By following these guidelines, you can effectively manage and manipulate text data using the Kognitos Text Manipulation Library, from extracting elements and converting cases to matching patterns and concatenating strings.
Working with Number Operations
You can perform various functions on numeric data. If your value is behaving like text, you may first have to convert it to a number. See the steps below.
Number conversion from String
To convert a string representation into a decimal number, use the following utility function:
convert "123" to a number
Number Modification
Increase a Number by a Percentage
increase the number 100 by 20 percentage
Reduce a Number by a Percentage
reduce 100 by 20 percent
Round a Number to a Specific Precision
the number is 3.145234
round the number with
the precision is 2
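The arithmetic behind these three operations is sketched below in Python. The helper names are hypothetical, and Kognitos' rounding mode for exact ties is not specified in the source.

```python
def increase_by_percent(number, percent):
    """Add the given percentage of the number to itself."""
    return number + number * percent / 100


def reduce_by_percent(number, percent):
    """Subtract the given percentage of the number from itself."""
    return number - number * percent / 100


increased = increase_by_percent(100, 20)
reduced = reduce_by_percent(100, 20)
rounded = round(3.145234, 2)  # precision of 2 decimal places

print(increased, reduced, rounded)
```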
Number Sorting and Finding
Sort a List of Numbers
the numbers are 3, 1, 4, 1, 5, 9, 2, 6
sort the numbers
Find the Maximum Number in a List
the numbers are 34, 78, 12, 89, 23
find the maximum number in the numbers
Find the Minimum Number in a List
the numbers are 45, 22, 78, 3, 90
find the minimum number in the numbers
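For reference, the same sorting and min/max lookups in Python, using the lists from the examples above. Ascending order is assumed for the sort.

```python
# Sort in ascending order; duplicates are kept.
sorted_numbers = sorted([3, 1, 4, 1, 5, 9, 2, 6])

maximum = max([34, 78, 12, 89, 23])
minimum = min([45, 22, 78, 3, 90])

print(sorted_numbers)
print(maximum, minimum)
```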
Glossary of Terms
- API (Application Programming Interface): A set of rules and protocols for building and interacting with software applications. Kognitos API allows external services to communicate with Kognitos to retrieve or modify data.
- String: A sequence of characters used to represent text.
- Regular Expression: A sequence of characters that define a search pattern, often used for string matching and manipulation.
- Delimiter: A character or sequence of characters used to specify the boundary between separate, independent regions in plain text or other data streams.
- ISO Format: A standardized format for representing dates and times, defined by the International Organization for Standardization (ISO).
- Attribute: A property or characteristic of a string, such as its length or case.
- Procedure: A defined sequence of actions or operations; an automation in Kognitos.
By leveraging the Kognitos Text Manipulation Library, users can perform a wide range of text operations efficiently and effectively, enhancing their data processing and automation capabilities.