Field query object
A field is the basic SenseML query unit for extracting a piece of document data. The output of a field is a JSON key-value pair that structures the extracted data.
Here is a simple example of a field:
{
  "fields": [
    {
      "id": "name_of_output_key",
      "anchor": "an anchor is some text to match. An anchor can be an array of matches",
      "method": {
        "id": "label",
        "position": "below"
      }
    }
  ]
}
The following image shows this example in the Sensible app:
As the preceding image shows, this is the output of the example field:
{
  "name_of_output_key": {
    "type": "string",
    "value": "Below the matching anchor, this is the data to extract. The anchor is a label for this data."
  }
}
Parameters
The Field object has the following top-level parameters:
Parameter | Value | Description |
---|---|---|
id (required) | string | Sensible uses the ID as the key in the structured key/value output. In the API response, this output is in the parsed_document section. To specify fallbacks, use the same ID in multiple fields. Succeeding fields act as fallbacks if the first returns null. For example, to capture differences in wording between document revisions, define two fields with the same ID that anchor on synonymous text, each of which may be present or absent in a given revision. |
anchor (required) | object | The anchor identifies one or more lines of text in the document at which Sensible starts executing a method. Can be a string, Match object, or array of Match objects. For more information, see Anchor object and Match object. |
method (required) | object | The method describes how Sensible expands from the anchor and extracts the target data. For more information, see Methods and Method object. |
type | see Types | The data type to extract, for example, a currency, an address, or a custom type you define. The structured output includes the type information. If the field captures other data in addition to the data matching the type, Sensible suppresses the additional data from the output. For more information, see Types. |
match | first, last, all, mostFrequent | If there are multiple matches for the anchor, specifies which one to use. This parameter applies to the anchor's Match parameter, not to the Start or Stop parameters. - all returns an array of matched values under a single key, for example: { "name_of_output_key": [ { "type": "string", "value": "extracted data for first match" }, { "type": "string", "value": "extracted data for second match" } ] } - mostFrequent returns the most frequently matched value. This is useful for text that OCR extracts from poor-quality scans or photographs. For example, if a scanned document contains repeated data for a field anchored on "1 Wages", but due to OCR errors the matched values are 21050.20, 21850.20, 21850.20, and 21850.58, this option returns the most frequent, and therefore the most likely correct, value: 21850.20. |
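As a sketch of the mostFrequent option described in the preceding table, the following field anchors on the "1 Wages" text from that description. The field ID wages and the use of the row method with a position of right are assumptions about the document layout, chosen for illustration rather than taken from a real config:

```json
{
  "fields": [
    {
      "id": "wages",
      "anchor": "1 Wages",
      "match": "mostFrequent",
      "method": {
        "id": "row",
        "position": "right"
      }
    }
  ]
}
```

Given the four OCR'd values listed in the table, a field like this would be expected to return 21850.20, since that value occurs twice and the others once each.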
Examples
The following example shows all the top-level parameters of the Field object:
{
  "fields": [
    {
      "id": "name_of_output_key",
      "anchor": "text to match",
      "type": "accountingCurrency",
      "match": "last",
      "method": {
        "id": "row",
        "position": "right"
      }
    }
  ]
}
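The fallback behavior described for the id parameter can be sketched in the same way. In the following hypothetical config, both fields share the ID policy_number. The anchor text and the label method's position are illustrative assumptions, not taken from a real document:

```json
{
  "fields": [
    {
      "id": "policy_number",
      "anchor": "policy number",
      "method": {
        "id": "label",
        "position": "below"
      }
    },
    {
      "id": "policy_number",
      "anchor": "policy no",
      "method": {
        "id": "label",
        "position": "below"
      }
    }
  ]
}
```

If the first field returns null because a document revision uses the abbreviated wording, the second field acts as the fallback and populates the same output key.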