October 2021

The last month marked a large release, with support for extracting multiple documents from a single PDF, a new feature for extracting complex, repeating document sections, and several other enhancements.

New feature: PDF portfolios

You can now extract multiple documents packaged into a single PDF (a PDF "portfolio") with a single API call that lists multiple document types. Just list the types of documents found in the portfolio, and Sensible returns a list of extracted documents, so you can write validations for each document in the portfolio separately. Sensible automatically splits the portfolio into documents using our newly enhanced fingerprints, which let you specify text that appears on the first and last pages of a document. For more information, see the docs.
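To make the idea of splitting by start-page and end-page text concrete, here is a minimal Python sketch. It is not Sensible's implementation or API: the `Fingerprint` class, the `split_portfolio` function, and the sample page text are all hypothetical, and the sketch assumes each page is already available as plain text.

```python
from dataclasses import dataclass


@dataclass
class Fingerprint:
    """Hypothetical stand-in for a document type's fingerprint."""
    doc_type: str
    start_text: str  # text expected on the first page of the document
    end_text: str    # text expected on the last page of the document


def split_portfolio(pages, fingerprints):
    """Split page texts into (doc_type, first_page, last_page) ranges."""
    documents = []
    current = None  # (fingerprint, start index) of the document being read
    for index, text in enumerate(pages):
        if current is None:
            # Look for a fingerprint whose start text appears on this page.
            for fp in fingerprints:
                if fp.start_text in text:
                    current = (fp, index)
                    break
        if current is not None and current[0].end_text in text:
            # The end text closes the document, possibly on its start page.
            documents.append((current[0].doc_type, current[1], index))
            current = None
    return documents


# Made-up page text for illustration only.
pages = [
    "Certificate of Insurance ... page 1",
    "... end of certificate",
    "Loss Run Report ... page 1",
    "claims continued",
    "... end of loss run",
]
fingerprints = [
    Fingerprint("certificate", "Certificate of Insurance", "end of certificate"),
    Fingerprint("loss_run", "Loss Run Report", "end of loss run"),
]
documents = split_portfolio(pages, fingerprints)
print(documents)
```

Each tuple in the result names a document type and the zero-based page range it covers, which is the point at which per-document validations could run.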

New feature: Sections

Extract complex, repeating sections from a document using the new Sections feature. For example, you can extract an array of unprocessed_claims objects from a loss run document.

You can skip missing information in sections, nest sections inside sections, and configure complex start and stop conditions for sections and their ranges in the document. For more information, see the docs.
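As a rough illustration of what extracting repeating sections involves, the following Python sketch groups lines into sections at each start line and tolerates missing fields. It is a conceptual sketch, not SenseML or Sensible's algorithm: the `extract_sections` function, the field names, and the sample lines are assumptions made up for this example.

```python
import re


def extract_sections(lines, start_pattern, field_patterns):
    """Group lines into repeating sections and pull fields from each one."""
    sections = []
    current = None
    for line in lines:
        if re.search(start_pattern, line):
            # A start line opens a new section and closes the previous one.
            current = []
            sections.append(current)
        if current is not None:
            current.append(line)

    results = []
    for section in sections:
        text = "\n".join(section)
        record = {}
        for field_id, pattern in field_patterns.items():
            found = re.search(pattern, text)
            # Missing information becomes None instead of failing the section.
            record[field_id] = found.group(1) if found else None
        results.append(record)
    return results


# Made-up loss run lines; the second claim has no date of loss.
lines = [
    "Loss run report",
    "Claim number: 1001",
    "Date of loss: 2021-03-04",
    "Claim number: 1002",
    "Claim number: 1003",
    "Date of loss: 2021-07-19",
]
unprocessed_claims = extract_sections(
    lines,
    start_pattern=r"^Claim number:",
    field_patterns={
        "claim_number": r"Claim number: (\d+)",
        "date_of_loss": r"Date of loss: (\S+)",
    },
)
print(unprocessed_claims)
```

The result is one object per claim, with `None` standing in for the date the second claim lacks.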

New feature: Any match

We've introduced a new match type, the Any match, in addition to our Regex, Simple, and First matches. With the Any match, you can list an array of matches for synonymous or alternate terms, and return a match for any of the terms. For example:

{
  "fields": [
    {
      "anchor": {
        "match": {
          "type": "any",
          "matches": [
            {
              "type": "equals",
              "text": "load value"
            },
            {
              "type": "regex",
              "pattern": "cargos? value"
            }
          ]
        }
      },
      "id": "cargo",
      "method": {
        "id": "passthrough"
      }
    }
  ]
}

For more information, see the docs.
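The semantics of the configuration above can be sketched in a few lines of Python: a line qualifies as an anchor if at least one of the listed matches accepts it. The helper names `equals`, `regex`, and `any_of` are hypothetical, and the case-insensitive comparison in `equals` is an assumption of this sketch rather than documented Sensible behavior.

```python
import re


def equals(text):
    """Match a line whose text equals the given string (case-insensitive here)."""
    return lambda line: line.strip().lower() == text.lower()


def regex(pattern):
    """Match a line that contains the given regular expression."""
    compiled = re.compile(pattern)
    return lambda line: compiled.search(line) is not None


def any_of(*matches):
    """Match a line if at least one of the listed matches accepts it."""
    return lambda line: any(match(line) for match in matches)


# Mirrors the two alternatives in the example configuration.
cargo_anchor = any_of(equals("load value"), regex("cargos? value"))

# Made-up document lines for illustration.
lines = ["Policy number", "load value", "total cargo value", "Premium"]
anchors = [line for line in lines if cargo_anchor(line)]
print(anchors)
```

Listing synonyms this way keeps one field definition for documents that label the same value differently.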

Improvement: Phone number type

We've expanded our existing types to include phone numbers. For more information, see the docs.

Improvement: Web app UX

In the Sensible app, you can now search for configurations and reference documents. While editing SenseML, you can also quickly switch between configurations.

Improvement: Better page rotation detection

We improved how Sensible corrects for rotated pages (for example, when a document is photographed or scanned at an angle) and fixed related bugs. Sensible now handles rotated pages that contain a mix of horizontally and vertically aligned text (for example, vertical bar chart labels).