Extract Data
Extracts data from text, documents, or files using large language models (LLMs).
Overview
This procedure extracts data from text, images, documents, and files. It uses an LLM to identify and retrieve the requested content, making the information in those sources easy to access and work with.
Before using this procedure, add the Document Processing Book to your agent. After the agent learns the Book, create a new Playground for the change to take effect.
Syntax
Below is a line-by-line overview of the automation syntax.
Examples
1. Extract Multiple Fields from a Document
extract data from the document
the dpi is 144
the openai model is "gpt-4o-latest"
the first field is "po number"
the first field's format is "string"
the first field's rule is "the po number has 10 characters"
the second field is "due date"
the second field's format is "date"
the second field's rule is "format should be DD/MM/YY"
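The two field rules in this example are instructions to the model, so it can be useful to check the returned values afterwards. The following is a minimal Python sketch of such a check, not part of the procedure itself: the function names `is_valid_po_number` and `is_valid_due_date` are hypothetical, and it assumes the extracted values arrive as plain strings.

```python
import re
from datetime import datetime


def is_valid_po_number(value: str) -> bool:
    """Check the rule 'the po number has 10 characters'."""
    return len(value) == 10


def is_valid_due_date(value: str) -> bool:
    """Check the rule 'format should be DD/MM/YY'."""
    # Require exactly two digits per part, so "1/2/23" is rejected.
    if not re.fullmatch(r"\d{2}/\d{2}/\d{2}", value):
        return False
    try:
        # Reject impossible calendar dates such as 31/02/23.
        datetime.strptime(value, "%d/%m/%y")
    except ValueError:
        return False
    return True
```

A value that fails either check could be flagged for manual review instead of being passed downstream.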
2. Extract Data from Text
the text is "The amount for the invoice number 123456 is Rs.1000."
extract data from the text
the first field is "invoice number"
the first field's format is "number"
the second field is "invoice amount"
the second field's format is "string"
the second field's rule is "keep just the amount without the currency"
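To make the intent of the rule "keep just the amount without the currency" concrete, here is a small deterministic Python sketch of the same transformation. It only illustrates the expected shape of the result; the procedure itself relies on the model, and the helper name `strip_currency` is hypothetical.

```python
import re
from typing import Optional


def strip_currency(raw: str) -> Optional[str]:
    """Return the first numeric amount in `raw`, without currency markers."""
    # Digits with optional thousands separators and an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    return match.group(0) if match else None
```

Applied to the amount in the example text, `strip_currency("Rs.1000.")` yields `"1000"`, a string, which matches the "string" format declared for the field.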
3. Extract Multiple Fields from Text (Using Default Values)
the text is "The invoice date is 21 jan 2023"
extract data from the text
the gemini model is "gemini-2.0-flash"
the common default value is 5678
the first field is "invoice number"
the first field's default value is 1234
the second field is "invoice date"
the third field is "invoice location"
the third field's default value is "San Jose"
the fourth field is "invoice amount"
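In this example only the invoice date appears in the text, so the other three fields depend on default values. The Python sketch below shows one plausible way the defaults could resolve. It assumes that a field-specific default takes precedence over the common default, which this page does not state explicitly. The helper `resolve_value` and the dictionaries are hypothetical, and the extracted date string is a stand-in for whatever the model actually returns.

```python
from typing import Any, Dict, Optional


def resolve_value(extracted: Optional[Any],
                  field_default: Optional[Any] = None,
                  common_default: Optional[Any] = None) -> Optional[Any]:
    """Pick the extracted value, else the field default, else the common default."""
    if extracted is not None:
        return extracted
    if field_default is not None:
        return field_default
    return common_default


# Stand-in for what the model finds in "The invoice date is 21 jan 2023".
# Only the date appears in the text; None marks a field that was not found.
extracted: Dict[str, Optional[Any]] = {
    "invoice number": None,
    "invoice date": "21 jan 2023",
    "invoice location": None,
    "invoice amount": None,
}

# Field-level defaults from the example; fields without one are omitted.
field_defaults: Dict[str, Any] = {
    "invoice number": 1234,
    "invoice location": "San Jose",
}
common_default = 5678

resolved = {
    name: resolve_value(value, field_defaults.get(name), common_default)
    for name, value in extracted.items()
}
```

Under that assumption, "invoice number" resolves to 1234, "invoice location" to "San Jose", and "invoice amount", which has no default of its own, falls back to the common default 5678.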