Fallback strategies
Use Sensible's fallback mechanisms to solve missing or inaccurately extracted data. You can set fallbacks at different levels of granularity:
- You can fall back from one field to another, allowing you to try multiple methods or prompts to extract a target piece of data from a single document.
- You can fall back from one config to another, allowing you to try multiple configs on a single document and pick the winning config. Specify fallback configs using fingerprints.
For examples of both strategies, see the following sections.
Fallback fields
Use fallback fields to define alternate methods of extracting the same target data. This enables you to use a single configuration to extract data from similar documents whose formatting can vary.
Example 1
If a company's explanation of benefits lists the patient name near the phrase "received for" in other cases and near the phrase "claimant" in others, you can write fallbacks like the following:
{
"fields": [
{
/* first look for patient name near phrase "recieved for" */
"id": "patient_name",
"anchor": "recieved for"
"method": {
"id": "label",
"position": "right"
}
},
/* if that fails, look for it in a row near the phrase "claimant" */
{
"id": "patient_name",
"anchor": "claimant"
"method": {
"id": "row",
"position": "right"
}
},
]
}
Example 2: LLM prompt fallbacks
Sometimes a field works for the majority of documents in a document type, but returns null or an inaccurate response (a "false positive") for a minority of documents. This situation is most common with LLM-based methods. Rather than rewrite the LLM prompt, which can cause regressions, create fallbacks targeted at the failing documents.
For example, you parse automotive repair invoices. For most auto shops' invoices, the prompt total parts price
extracts a total price. For Andy & Son's car shop's invoices, this prompt returns null
or it returns an inaccurate response, for example, a subtotal. Through experimenting, you find a successful prompt: What is the 'total parts price'. The total parts price will be labeled 'total parts' or something semantically similar. It's not a value that can be summed from the line items on the invoice.
To create a fallback field for Andy & Son's shop, create two fields with the same ID:
{
"fields": [
/*
fallback fields in query groups must be defined in
separate groups, so define two single-member groups
*/
{
/* if Sensible doesn't find the anchor text
"Andy & Son's" in the invoice,
it returns null for the parts_total_price field in this group
and falls back to the parts_total_price in the next group.
Use this fallback behavior to create a detailed prompt for
Andy & Son's invoices and more general prompt
for all other invoices*/
"anchor": "Andy & Son's",
"method": {
"id": "queryGroup",
"queries": [
{
"id": "parts_total_price",
"description": "What is the 'total parts price'. The total parts price will be labeled 'total parts' or something semantically similar. It's not a value that can be summed from the line items on the invoice",
"type": "string"
}
]
}
},
{
"method": {
"id": "queryGroup",
"queries": [
{
/* this field runs only if the previous
parts_total_price field returns null */
"id": "parts_total_price",
"description": "total parts price",
"type": "string"
}
]
}
}
]
}
Fallback fields can be of any kind. For example, you can fallback from an LLM-based field to a layout-based field, or from a computed field to a section group. For more information, see Field query object.
Fallback configs
Use fallback configs to capture long-tail document variations in a document type.
Example 1
For example, say you extract data from automotive repair invoices. You have high volume from 5 auto shops, and a long tail of low-volume invoices from hundreds of other shops. In this case, define a layout-based config for each of your top 5 auto shops to take advantage of layout-based methods' speed and deterministic behavior, and define one catch-all LLM-based config for the long tail.
-
To define a layout-based config for each of your top 5 auto shops, take the following steps:
- Define fingerprints for each auto shop's config. For example, if Andy and Son's is one of your top 5 shops, then include fingerprints for phrases that occur in those invoices:
{ "fingerprint": { "tests": [ { "text": "ANDY AND SON'S", "type": "includes", "isCaseSensitive": true } ] },
-
Leverage the consistent formatting in each of the top vendors to extract data. For example, if Andy and Son's always labels the repaired vehicle's VIN number with the text
VIN #:
, then define a field similar to the following:{ "fields": [ { "id": "vehicle_VIN", "anchor": "VIN #:", "method": { "id": "label", "position": "below" } } ] }
-
To define an LLM-based config for the long tail, take the following steps:
-
Don't define fingerprints.
-
Define the same field IDs as in previous configs using LLM-based methods. For example:
{ "method": { "id": "queryGroup", "queries": [ { "id": "vehicle_VIN", "description": "vehicle VIN" } ] } }
-
Updated 9 days ago