December 2023

In the last month, we released an open-source Python SDK that offers convenient access to the Sensible API, added advanced configuration options for the LLM-powered List and NLP Table methods, and made minor improvements to other features.

New feature: Python SDK

In addition to our Node SDK, you can now use our new open-source Python SDK to extract data from documents and to classify documents. For more information, see the Python SDK quickstart and SDK documentation.

Improvement: Choose your LLM model for the List method

You can now specify which LLM model to use to extract lists. The new LLM Engine parameter uses the "fast" model by default. To troubleshoot missing data in the extracted list, choose the "thorough" model. For details about the LLM models Sensible uses, see the List method.
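As a rough illustration, the following Python sketch builds a List field configuration that switches from the default "fast" model to the "thorough" model. It assumes that the LLM Engine parameter is written as an "llmEngine" object with a "mode" key and that list properties are objects with "id" and "description" keys. These key names, the helper list_field, and the sample field are assumptions for illustration, so confirm the exact syntax in the List method reference.

```python
import json
from typing import Dict, List

# Hypothetical helper: builds a List method field as a Python dict.
# ASSUMPTION: the "llmEngine"/"mode" key names and the property shape are
# illustrative; confirm them against the List method documentation.
VALID_MODES = ("fast", "thorough")


def list_field(field_id: str, description: str,
               properties: List[Dict[str, str]], mode: str = "fast") -> Dict:
    """Return a List field config that uses the given LLM engine mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {
        "id": field_id,
        "method": {
            "id": "list",
            "description": description,
            "properties": properties,
            # "fast" is the default; try "thorough" when list items go missing.
            "llmEngine": {"mode": mode},
        },
    }


if __name__ == "__main__":
    field = list_field(
        "vehicles",
        "vehicles covered by the policy",
        [{"id": "make", "description": "vehicle make"},
         {"id": "model", "description": "vehicle model"}],
        mode="thorough",
    )
    print(json.dumps(field, indent=2))
```

Keeping the mode as a single argument makes it easy to rerun the same field with "thorough" only when the "fast" results are incomplete.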

Improvement: Advanced configuration for multi-page LLM-based tables

To troubleshoot automatic multi-page NLP Table extraction, use the new Page Span Threshold parameter. By default, if a table occurs near the top or bottom of a page, the NLP Table method automatically searches the surrounding pages for a continuation of the table. You can now loosen the definitions of "top" and "bottom". For example, when footer text pushes the end of a multi-page table up toward the center of the page, loosen the bottom threshold so that the table still counts as ending near the bottom. For more information, see NLP Table method.
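The following Python sketch illustrates the idea behind top and bottom thresholds. It is not Sensible's implementation: the names TableBounds, pages_to_search, top_threshold, and bottom_threshold, and the default of 0.15, are hypothetical and are not the actual Page Span Threshold syntax or defaults. Positions are expressed as fractions of page height, and page indexes are 0-based.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableBounds:
    """Vertical extent of a table on its page, as fractions of page height.

    0.0 is the top edge of the page and 1.0 is the bottom edge.
    """
    top: float
    bottom: float


def pages_to_search(page: int, bounds: TableBounds, page_count: int,
                    top_threshold: float = 0.15,
                    bottom_threshold: float = 0.15) -> List[int]:
    """Return 0-based indexes of neighboring pages to check for a continuation.

    Hypothetical heuristic: a table that starts near the top may continue from
    the previous page, and a table that ends near the bottom may continue on
    the next page. Larger thresholds loosen what counts as "near".
    """
    candidates: List[int] = []
    if bounds.top <= top_threshold and page > 0:
        candidates.append(page - 1)
    if bounds.bottom >= 1.0 - bottom_threshold and page < page_count - 1:
        candidates.append(page + 1)
    return candidates


if __name__ == "__main__":
    # A table whose end is pushed up to 75% of the page height by a footer.
    bounds = TableBounds(top=0.40, bottom=0.75)
    print(pages_to_search(2, bounds, page_count=5))                        # []
    print(pages_to_search(2, bounds, page_count=5, bottom_threshold=0.3))  # [3]
```

With the tighter default, the footer hides the continuation; loosening only the bottom threshold brings the next page back into the search without changing how the top of the table is treated.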

Improvement: Merged cells in Fixed Table and Table methods

The Table and Fixed Table methods now handle merged cells in tables by populating "empty" cells with the merged value. Sensible applies this behavior only when you specify the Stop parameter for these methods, which is a recommended best practice.

For example, the following table in a document contains merged cells:

The following image of extracted data shows that Sensible now populates "merged" cells when you specify the Stop parameter:

Without the Stop parameter, Sensible leaves 3 out of every 4 rows empty in the class column, and every other row empty in the student column.
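To make the effect on extracted rows concrete, here is a simplified Python sketch: each empty cell in a merged column takes the value of the nearest non-empty cell above it. This is not Sensible's implementation, which uses the table's layout to find merged regions. The sketch simply assumes that every empty cell in the listed columns belongs to a vertical merge, and the function name and sample rows are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[str]]


def fill_merged_cells(rows: List[Row], merged_columns: List[int]) -> List[Row]:
    """Return a copy of rows with empty cells in merged_columns filled from above.

    Simplifying assumption: any empty cell (None or "") in these columns is part
    of a vertical merge that starts at the nearest non-empty cell above it.
    """
    filled = [list(row) for row in rows]  # copy so the input is not mutated
    for col in merged_columns:
        current: Optional[str] = None
        for row in filled:
            if row[col] in (None, ""):
                row[col] = current  # inherit the merged value
            else:
                current = row[col]  # a new merged region starts here
    return filled


if __name__ == "__main__":
    # Hypothetical table: columns are class, student, grade.
    rows = [
        ["Biology", "Ana", "A"],
        [None, None, "B"],
        [None, "Ben", "A"],
        [None, None, "C"],
    ]
    for row in fill_merged_cells(rows, merged_columns=[0, 1]):
        print(row)
```

Before filling, these sample rows show the same pattern described above: three of every four class cells and every other student cell are empty.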

For an example, see Merged cells example.

Improvement: Portfolios return verbose output

Sensible now supports verbose output for portfolios (files that contain multiple documents) as well as for single-document files. Use verbose output to get metadata about the source text for an extracted field, for example, to troubleshoot extraction from scanned, OCRed text.