Welcome! Sensible is a developer-first platform for extracting structured data from documents such as business forms in PDF format. Use Sensible to build document-automation features into your vertical SaaS products.

With Sensible's SenseML language, you can write extraction queries for any document and get back key facts as JSON:

{
    "street_address": {
        "value": "1234 ABC COURT",
        "type": "address"
    },
    "included_appliances": [
        {
            "value": "washers",
            "type": "string"
        },
        {
            "value": "dryers",
            "type": "string"
        },
        {
            "value": "refrigerators",
            "type": "string"
        }
    ]
}
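In the output above, each extracted value is wrapped in an object with `value` and `type` keys, and a list-valued field holds an array of such objects. If your application needs only the raw values, a small client-side helper can flatten this shape. The following Python sketch assumes the output has already been parsed into a dictionary. `flatten_output` is a hypothetical helper, not part of any Sensible SDK, and treating a `null` field as "nothing found" is an assumption of the sketch.

```python
import json

# The example output above, as a JSON string.
SAMPLE_OUTPUT = """
{
    "street_address": {"value": "1234 ABC COURT", "type": "address"},
    "included_appliances": [
        {"value": "washers", "type": "string"},
        {"value": "dryers", "type": "string"},
        {"value": "refrigerators", "type": "string"}
    ]
}
"""


def flatten_output(parsed: dict) -> dict:
    """Hypothetical helper: replace each {"value", "type"} wrapper with its value.

    Assumes every field is a single wrapper object, a list of wrapper
    objects, or None (assumed here to mean the field found nothing).
    """
    flat = {}
    for key, field in parsed.items():
        if field is None:
            flat[key] = None
        elif isinstance(field, list):
            flat[key] = [item["value"] for item in field]
        else:
            flat[key] = field["value"]
    return flat


flat = flatten_output(json.loads(SAMPLE_OUTPUT))
print(flat)
# {'street_address': '1234 ABC COURT', 'included_appliances': ['washers', 'dryers', 'refrigerators']}
```

Dropping the `type` keys is a design choice for convenience; keep the wrappers if downstream code needs to distinguish, for example, an address from a plain string.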

Sensible is highly configurable. You can extract data in minutes by using GPT-4 and other large language models (LLMs), or you can get fine-grained control with Sensible's visual, layout-based rules. By combining layout- and LLM-based extraction methods, Sensible supports the entire document landscape, from consistently laid-out, highly structured business forms to free-form, variable legal contracts.

Configurable data extraction

Configure your extractions using SenseML, Sensible's document-specific query language. SenseML combines the latest LLM techniques with visual layout-based rules to extract document primitives like rows, tables, checkboxes, sections, and more as JSON.

With SenseML, you can:

  • Preprocess documents by correcting layout metadata problems, removing unwanted pages, and more, so that Sensible has a clean, standardized text representation of the document from which to extract structured data in a later step. For more information, see Preprocessors.
  • Use "methods" to extract document primitives, like rows, columns, tables, boxes, checkbox status, and more. You can also parse extracted data into types such as currencies, dates, addresses, or your own custom types. For more information, see Layout-based methods.
  • Post-process extracted document data. For example:

A field is the basic SenseML query unit for extracting a piece of document data. The output of a field is a JSON key-value pair that structures the extracted data. SenseML is the basis for Sensible's extraction workflow.
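To picture how fields map to output keys, here is a toy Python model loosely inspired by SenseML fields: each field has an ID that becomes the output key, an anchor phrase that locates the value, and a type label. The `Field` class, the `extract` function, and the simple "line starts with the anchor" rule are all hypothetical. Real SenseML fields are configuration that Sensible processes with much richer layout- and LLM-based methods. Returning `None` for a field that finds nothing is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    """Hypothetical stand-in for a SenseML field (not Sensible's real schema)."""
    id: str               # becomes the key in the JSON output
    anchor: str           # text that marks where the value appears
    type: str = "string"  # type label attached to the extracted value


def extract(fields: list[Field], lines: list[str]) -> dict[str, Optional[dict]]:
    """Toy extraction: take the text after the first line that starts with the anchor."""
    output: dict[str, Optional[dict]] = {}
    for field in fields:
        output[field.id] = None  # assumption: no match yields null
        for line in lines:
            if line.lower().startswith(field.anchor.lower()):
                value = line[len(field.anchor):].strip(" :")
                output[field.id] = {"value": value, "type": field.type}
                break
    return output


lines = ["Property address: 1234 ABC COURT", "Seller: Jane Doe"]
fields = [Field("street_address", "Property address", "address"), Field("buyer", "Buyer")]
print(extract(fields, lines))
# {'street_address': {'value': '1234 ABC COURT', 'type': 'address'}, 'buyer': None}
```

Each field contributes exactly one key-value pair to the output, which mirrors how the `street_address` key appears in the JSON example earlier on this page.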

Here's an example of a field that extracts a table:

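Downstream code often needs table output as rows rather than columns. The following Python sketch assumes a column-oriented table result: a `columns` list in which each column has an `id` and a `values` list of `value`/`type` objects. That shape is an assumption here, so check the SenseML reference for the exact output of the table method you use. `table_to_rows` and the sample data are hypothetical.

```python
from itertools import zip_longest


def table_to_rows(table: dict) -> list[dict]:
    """Hypothetical helper: pivot column-oriented table output into row dicts.

    Assumes table["columns"] is a list of {"id": ..., "values": [...]} objects,
    where each entry in "values" is a {"value", "type"} object or None.
    Columns of unequal length are padded with None.
    """
    columns = table["columns"]
    ids = [column["id"] for column in columns]
    rows = []
    # zip_longest walks all columns in lockstep, one row at a time.
    for cells in zip_longest(*(column["values"] for column in columns)):
        rows.append({
            column_id: (cell["value"] if cell is not None else None)
            for column_id, cell in zip(ids, cells)
        })
    return rows


# Hypothetical sample in the assumed output shape.
sample = {
    "columns": [
        {"id": "appliance", "values": [
            {"value": "washers", "type": "string"},
            {"value": "dryers", "type": "string"},
        ]},
        {"id": "quantity", "values": [
            {"value": "2", "type": "string"},
            {"value": "1", "type": "string"},
        ]},
    ]
}
print(table_to_rows(sample))
# [{'appliance': 'washers', 'quantity': '2'}, {'appliance': 'dryers', 'quantity': '1'}]
```

Padding with `None` rather than truncating keeps a row visible even when one column extracted fewer cells than the others.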

For more information about SenseML, see SenseML reference introduction.

Devops platform for document data extraction

See the following image for a high-level overview of Sensible's document data extraction workflow:

For more information about this diagram, see Devops platform.

Learn more

To use the Sensible platform, you'll: