Supported file types

File types

Sensible supports the following file types:

File format | Extraction context | Extraction method

Extraction context columns:
  • Sensible app's Extract tab
  • Single-file extraction with SDKs or API
  • Portfolio extraction with SDKs or API
  • Classification by type with SDKs or API

Extraction method columns:
  • Methods that require rendering non-text image pixels²
  • NLP Table method, Fixed Table method³
  • Extraction of text that requires OCR

File formats:
  • PDF
  • Microsoft Word (DOC and DOCX)
  • Spreadsheet formats¹ (XLSX, XLS, and CSV)
  • Single-page image formats¹ (JPEG, PNG)
  • Multi-page image formats¹ (TIFF)
  • Email bodies

  1. All OCR settings are inapplicable for Microsoft Excel and CSV.

  2. Methods that require rendering an image include pixel-based methods, such as the Box, Checkbox, Nearest Checkbox, and Signature methods; multimodal LLM-based methods; and image coordinates returned by the Document Range method.

  3. As alternatives to these Table methods, use the Fixed Table method or the List method.
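
As a rough illustration of the format list and note 1 above, the following sketch maps a file extension to the format family named on this page and reports whether note 1 marks OCR settings as inapplicable. The names `FORMAT_FAMILIES`, `format_family`, and `ocr_settings_inapplicable` are hypothetical and aren't part of any Sensible SDK. The sketch covers only the extensions this page names, and it omits email bodies because they have no file extension.

```python
from pathlib import Path
from typing import Optional

# Extensions and format families as named on this page.
# Hypothetical lookup table, not part of any Sensible SDK.
FORMAT_FAMILIES = {
    "pdf": "PDF",
    "doc": "Microsoft Word",
    "docx": "Microsoft Word",
    "xlsx": "Spreadsheet",
    "xls": "Spreadsheet",
    "csv": "Spreadsheet",
    "jpeg": "Single-page image",
    "png": "Single-page image",
    "tiff": "Multi-page image",
}


def format_family(filename: str) -> Optional[str]:
    """Return the format family for a file name, or None if it isn't listed."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return FORMAT_FAMILIES.get(extension)


def ocr_settings_inapplicable(filename: str) -> bool:
    """Return True when note 1 applies, that is, for spreadsheet formats."""
    return format_family(filename) == "Spreadsheet"


if __name__ == "__main__":
    for name in ("invoice.pdf", "rates.CSV", "scan.tiff", "notes.txt"):
        print(name, format_family(name), ocr_settings_inapplicable(name))
```

A check like this could run before you upload a file, so that unlisted extensions are caught locally instead of in an API response.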

File sizes

Sensible supports the following file sizes:

Operation | Size limit for /extract/{doc-type} API endpoint | Size limit for asynchronous calls
Single-document file extraction | under 4.5 MB, or under 30 seconds processing time | 6 GB
Portfolio extraction | n/a | 6 GB
Classification | 4.5 MB | 4.5 MB
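
The limits in the table above can drive a simple routing decision on the client side. The sketch below is one way to express it, under stated assumptions: `Operation`, `SIZE_LIMITS`, and `pick_call_mode` are hypothetical names, megabytes and gigabytes are taken as decimal units (the page doesn't say which), and every limit is read conservatively as "strictly under". The 30-second processing-time limit can't be known from file size, so the sketch ignores it.

```python
from enum import Enum

# Assumption: decimal units. The page doesn't state decimal or binary.
MB = 1_000_000
GB = 1_000_000_000


class Operation(Enum):
    SINGLE_DOCUMENT = "single-document file extraction"
    PORTFOLIO = "portfolio extraction"
    CLASSIFICATION = "classification"


# (synchronous limit, asynchronous limit) in bytes; None stands for "n/a".
SIZE_LIMITS = {
    Operation.SINGLE_DOCUMENT: (int(4.5 * MB), 6 * GB),
    Operation.PORTFOLIO: (None, 6 * GB),
    Operation.CLASSIFICATION: (int(4.5 * MB), int(4.5 * MB)),
}


def pick_call_mode(operation: Operation, size_bytes: int) -> str:
    """Return "sync" or "async" for a file size, or raise if it is too large.

    Only file size is checked. A small file can still exceed the
    30-second processing limit of the synchronous endpoint.
    """
    sync_limit, async_limit = SIZE_LIMITS[operation]
    if sync_limit is not None and size_bytes < sync_limit:
        return "sync"
    if size_bytes < async_limit:
        return "async"
    raise ValueError(
        f"{size_bytes} bytes exceeds the size limits for {operation.value}"
    )


if __name__ == "__main__":
    print(pick_call_mode(Operation.SINGLE_DOCUMENT, 2 * MB))
    print(pick_call_mode(Operation.SINGLE_DOCUMENT, 50 * MB))
    print(pick_call_mode(Operation.PORTFOLIO, 2 * MB))
```

Preferring the synchronous path whenever it fits is a design choice of this sketch. A client that expects long processing times might route everything through asynchronous calls instead.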

Notes

  • Word documents (DOC and DOCX): Sensible converts the document to PDF before processing it.
  • Email bodies: Sensible converts the body to PDF before processing it.
  • Spreadsheet documents (XLSX, XLS, and CSV): Sensible extracts text directly from the file without OCR. Sensible represents the text both internally and in the Sensible app's editor as follows:
    • Standardizes the formatting of all text in the file. Each cell contains exactly one line.
    • Standardizes cell height at 0.25 inches and cell width at 1 inch. Overflow text in a cell is still available for extraction, but isn't viewable in the Sensible app editor unless you click a line in the rendered document to view its details.
    • Standardizes the maximum page height at 15 inches. Sensible splits longer sheets into consecutive pages.
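
To see how the spreadsheet layout rules above interact, the following sketch estimates which page a spreadsheet row lands on. It assumes that every row takes exactly one 0.25-inch cell height and that a page has no margins or header space, which gives 15 / 0.25 = 60 rows per page. Sensible's actual pagination may differ. The function names are hypothetical.

```python
import math

CELL_HEIGHT_IN = 0.25      # standardized cell height, from the notes above
MAX_PAGE_HEIGHT_IN = 15.0  # standardized maximum page height

# Assumption: no margins or header rows, so rows fill the full page height.
ROWS_PER_PAGE = int(MAX_PAGE_HEIGHT_IN / CELL_HEIGHT_IN)


def page_count(total_rows: int) -> int:
    """Estimate how many pages a sheet with total_rows rows is split into."""
    return math.ceil(total_rows / ROWS_PER_PAGE)


def locate_row(row_index: int) -> tuple[int, float]:
    """Map a zero-based row index to (zero-based page, offset from page top in inches)."""
    page, row_on_page = divmod(row_index, ROWS_PER_PAGE)
    return page, row_on_page * CELL_HEIGHT_IN


if __name__ == "__main__":
    print(ROWS_PER_PAGE)    # 60
    print(page_count(150))  # 3
    print(locate_row(125))  # (2, 1.25)
```

An estimate like this can help when you author position-based extraction rules for a long sheet and want a first guess at where a row appears in the rendered document.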