SenseML
SenseML is a query language that lets you extract structured data from PDF documents. A field is the basic SenseML query unit for extracting a piece of document data. The output of a field is a JSON key-value pair that structures the extracted data.
Here's a simple example of a field:
```json
{
  "fields": [
    {
      "id": "name_of_output_key",
      "anchor": "an anchor is some text to match. An anchor can be an array of matches",
      "method": {
        "id": "label",
        "position": "below"
      }
    }
  ]
}
```
The following image shows this example in the Sensible app:
As the preceding image shows, here's the output of the example field:
```json
{
  "name_of_output_key": {
    "type": "string",
    "value": "Below the matching anchor, this is the data to extract. The anchor is a label for this data."
  }
}
```
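Because a field's output is plain JSON, downstream code can read it with any JSON parser. The following is a minimal Python sketch that assumes you already have the output above as a string (how you retrieve it from Sensible is outside this sketch). The helper name `extracted_values` is hypothetical. It relies only on the shape shown above: the field ID is the top-level key, and each result carries a `type` and a `value`.

```python
import json

# The example field's output, copied from above. In practice you would
# receive this JSON from Sensible; fetching it is outside this sketch.
output_json = """
{
  "name_of_output_key": {
    "type": "string",
    "value": "Below the matching anchor, this is the data to extract. The anchor is a label for this data."
  }
}
"""


def extracted_values(output: dict) -> dict:
    """Map each field ID (the output key) to its extracted value (hypothetical helper)."""
    values = {}
    for field_id, result in output.items():
        # Defensive check (an assumption, not documented behavior):
        # skip entries that are not objects with a "value" key.
        if isinstance(result, dict) and "value" in result:
            values[field_id] = result["value"]
    return values


output = json.loads(output_json)
print(output["name_of_output_key"]["type"])  # string
print(extracted_values(output))
```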
The example field shows the following key concepts:
concept | description |
---|---|
field | A query that extracts data relative to matched text. Its ID becomes the key for the extracted data; in this example, `name_of_output_key`. |
anchor | Matched text that narrows down the location in the PDF from which to extract data; in this example, "an anchor is some text to match...". |
method | Defines how to expand out from the anchor and extract data. In this example, the Label method extracts the data below the anchor (`"position": "below"`). For a list of methods, see Methods. |
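To show how field, anchor, and method fit together, here is a minimal Python sketch that assembles the example field as a dictionary and serializes it. The helper name `label_field` is hypothetical and not part of any Sensible SDK; the only SenseML keys it uses are the ones from the example field earlier on this page. Generating the JSON this way also rules out hand-editing mistakes such as trailing commas, which JSON does not allow.

```python
import json


def label_field(field_id: str, anchor, position: str = "below") -> dict:
    """Build one SenseML field that uses the Label method (hypothetical helper)."""
    return {
        "id": field_id,    # becomes the key in the extracted output
        "anchor": anchor,  # text to match; per the example, can also be an array
        "method": {"id": "label", "position": position},
    }


config = {
    "fields": [
        label_field(
            "name_of_output_key",
            "an anchor is some text to match. An anchor can be an array of matches",
        )
    ]
}

# json.dumps never emits trailing commas, so the result is valid JSON.
print(json.dumps(config, indent=2))
```

This sketch does not validate values against SenseML's schema; it only guarantees well-formed JSON with the same structure as the example.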
For a more complete SenseML example, see the SenseML introduction.