In the last month, we released several new features, including support for extracting from image documents (PNG and JPEG), a powerful new column-recognition capability in our Sections feature, the ability to add your own custom data types, and more.
You can now use the Sensible API to extract data from PNG and JPEG images of documents, not just PDFs. Sensible OCRs photos of documents and then processes them exactly as it would other documents.
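As a rough sketch of what sending an image instead of a PDF might look like, the snippet below builds (but does not send) an HTTP request whose `Content-Type` matches the file extension. The base URL, the path shape, and the bearer-token header are assumptions to confirm against the Sensible API reference, and `build_extract_request` is a hypothetical helper, not part of any Sensible SDK.

```python
import os
import urllib.request

# Assumed endpoint shape; confirm against the Sensible API reference.
API_BASE = "https://api.sensible.so/v0/extract"

# File extensions mapped to the content types named in these notes.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}


def build_extract_request(document_type, filename, file_bytes, api_key):
    """Build a POST request for a PDF, PNG, or JPEG without sending it."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in CONTENT_TYPES:
        raise ValueError(f"unsupported file type: {filename}")
    return urllib.request.Request(
        url=f"{API_BASE}/{document_type}",
        data=file_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": CONTENT_TYPES[extension],
        },
    )
```

To send the request, you could pass the result to `urllib.request.urlopen`. Under these assumptions, the only difference from a PDF upload is the content type.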
We added power and flexibility to sections with the ability to recognize columns in a section ("vertical sections"). This allows you to extract from a wide variety of complex table-like and column-like layouts, including tables nested in tables, table grids, and tables with both row labels and column labels.
To extract repeated tables in vertical sections, we also added the ability to define relative coordinates in the Text Table method with the new Columns Relative To Anchor parameter.
You're no longer limited by the types that Sensible defines; you can define your own custom type using regular expressions, for example for invoice numbers or time durations.
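To give a feel for the kind of regular expressions a custom type might rely on, here is a small Python sketch for the two examples mentioned above. The patterns and the `INV-` prefix are hypothetical and shown outside of SenseML; how you declare a custom type in a config is not covered here.

```python
import re

# Hypothetical formats; adjust the patterns to match your own documents.
INVOICE_NUMBER = re.compile(r"\bINV-\d{4,}\b")
DURATION = re.compile(
    r"\b(\d+)\s*(?:hrs?|hours?)\s+(\d+)\s*(?:mins?|minutes?)\b",
    re.IGNORECASE,
)


def find_invoice_numbers(text):
    """Return every substring that looks like an invoice number."""
    return INVOICE_NUMBER.findall(text)


def parse_duration_minutes(text):
    """Return the first 'H hrs M min' duration as total minutes, or None."""
    match = DURATION.search(text)
    if match is None:
        return None
    hours, minutes = (int(group) for group in match.groups())
    return hours * 60 + minutes
```

Testing a pattern like this against sample text before adding it to a config is a quick way to catch values that are matched too broadly or missed.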
In the Sensible app's extraction list, you can now view each extraction in a portfolio separately: click the extraction, then navigate to each PDF in the portfolio.
In the SenseML editor, the right pane now displays additional tabs so you can view not just the extraction, but also any validation messages or errors.
Scanned documents can have off-white, mottled backgrounds. Sensible now automatically adjusts for such backgrounds when recognizing boxes and checkboxes.
With the new Paragraph method, we added document layout recognition to enable you to extract paragraphs from column or multi-page layouts.
You can now customize how the Currency type is recognized, with options for delimiting characters, currency characters, and the decimal place, for example to recognize non-USA currencies.
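The sketch below illustrates why these three options matter when reading amounts written in different locales. It is not Sensible's implementation, and the function and parameter names (`parse_currency`, `currency_symbols`, `thousands_separator`, `decimal_separator`) are hypothetical stand-ins for the options described above.

```python
from decimal import Decimal


def parse_currency(text, currency_symbols="$", thousands_separator=",",
                   decimal_separator="."):
    """Parse a currency string into a Decimal using configurable characters."""
    cleaned = text.strip()
    # Drop every configured currency character, wherever it appears.
    for symbol in currency_symbols:
        cleaned = cleaned.replace(symbol, "")
    # Remove grouping characters first, then normalize the decimal mark.
    cleaned = cleaned.replace(thousands_separator, "").strip()
    cleaned = cleaned.replace(decimal_separator, ".")
    # Raises decimal.InvalidOperation if the remaining text is not a number.
    return Decimal(cleaned)
```

For example, `parse_currency("1.234,56 €", currency_symbols="€", thousands_separator=".", decimal_separator=",")` treats the period as a grouping character and the comma as the decimal mark, which the USA-style defaults would misread.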
You can now standardize capitalization for the output of the Name type.
We improved handling of multi-page fixed tables: headers that repeat on subsequent pages are now removed from the output.
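To picture the effect on the output, consider this minimal sketch. It assumes each page of a table arrives as a list of rows and that the first row of the first page is the header; it illustrates the behavior only and is not how Sensible implements it. `merge_table_pages` is a hypothetical name.

```python
def merge_table_pages(pages):
    """Combine per-page rows into one table, keeping the header only once.

    pages: list of pages, each a list of rows (a row is a list of cell strings).
    """
    if not pages or not pages[0]:
        return []
    header = pages[0][0]
    merged = list(pages[0])
    for page in pages[1:]:
        # Skip the leading row of a later page only if it repeats the header.
        rows = page[1:] if page and page[0] == header else page
        merged.extend(rows)
    return merged
```

Given a two-page table where both pages start with the same header row, the merged result contains that header once, followed by the data rows from both pages in order.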