Supported file types

File types

Sensible supports the following file types:

File type:
  • PDF
  • Microsoft Word (DOC and DOCX)
  • Spreadsheet formats [1] (XLSX and CSV)
  • Single-page image formats [1] (JPEG, PNG)
  • Multi-page image formats [1] (TIFF)

Context:
  • Sensible app's Extract tab
  • Single-file extraction with SDKs or API
  • Portfolio extraction with SDKs or API
  • Classification by type with SDKs or API

Extraction method:
  • Methods that require rendering non-text image pixels [2]
  • NLP Table method, Fixed Table method [3]
  • Extraction of text that requires OCR

  1. All OCR settings are inapplicable for Microsoft Excel and CSV.

  2. Methods that require rendering an image include pixel-based methods (such as the Box, Checkbox, Nearest Checkbox, and Signature methods), multimodal LLM-based methods, and image coordinates returned by the Document Range method.

  3. As alternatives to these Table methods, use the Fixed Table method or the List method.
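
As a rough illustration of the file-type list above, the following sketch maps a file's extension to one of the listed categories before you submit it. The helper names `file_category` and `ocr_settings_inapplicable` are hypothetical, the `.jpg` and `.tif` spellings are assumed aliases that this page doesn't confirm, and an extension check is only a heuristic for the actual file contents.

```python
from pathlib import Path
from typing import Optional

# Categories follow the file-type list above. The ".jpg" and ".tif"
# entries are assumed aliases, not something this page states.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".doc": "Microsoft Word",
    ".docx": "Microsoft Word",
    ".xlsx": "Spreadsheet",
    ".csv": "Spreadsheet",
    ".jpeg": "Single-page image",
    ".jpg": "Single-page image",
    ".png": "Single-page image",
    ".tiff": "Multi-page image",
    ".tif": "Multi-page image",
}


def file_category(path: str) -> Optional[str]:
    """Return the category for a path, or None if the extension isn't listed."""
    return SUPPORTED_EXTENSIONS.get(Path(path).suffix.lower())


def ocr_settings_inapplicable(path: str) -> bool:
    """Footnote 1: OCR settings are inapplicable for spreadsheet files."""
    return file_category(path) == "Spreadsheet"
```

For example, `file_category("invoice.PDF")` returns `"PDF"`, and `file_category("notes.txt")` returns `None`, which you could treat as a signal to convert the file first.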

File sizes

Sensible supports the following file sizes:

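Operation                       | Size limit for /extract/{doc-type} API endpoint   | Size limit for asynchronous calls
--------------------------------|---------------------------------------------------|----------------------------------
Single-document file extraction | Under 4.5 MB, or under 30 seconds processing time | 6 GB
Portfolio extraction            | n/a                                               | 6 GB
Classification                  | 4.5 MB                                            | 4.5 MB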
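
The size limits above can be turned into a small pre-flight check that decides whether a file is a candidate for the synchronous endpoint or needs an asynchronous call. This is a minimal sketch, not Sensible's own logic: `pick_call_mode` and the operation keys are hypothetical names, decimal megabytes and gigabytes are assumed, and whether a file of exactly the limit size is accepted is a guess. The 30-second processing-time limit can't be checked from file size alone, so it isn't modeled here.

```python
MB = 1_000_000        # assumption: decimal megabytes
GB = 1_000_000_000    # assumption: decimal gigabytes

# (synchronous limit, asynchronous limit) in bytes, from the size table.
# None means the synchronous endpoint isn't available for that operation.
LIMITS = {
    "single": (4_500_000, 6 * GB),
    "portfolio": (None, 6 * GB),
    "classification": (4_500_000, 4_500_000),
}


def pick_call_mode(operation: str, size_bytes: int) -> str:
    """Return "sync" or "async" for a file, or raise if it is too large."""
    sync_limit, async_limit = LIMITS[operation]
    if sync_limit is not None and size_bytes < sync_limit:
        return "sync"
    if size_bytes <= async_limit:
        return "async"
    raise ValueError(
        f"{size_bytes} bytes exceeds the {operation} limit of {async_limit} bytes"
    )
```

A 10 MB file passed as `pick_call_mode("single", 10 * MB)` yields `"async"`, while the same file under `"classification"` raises `ValueError` because both classification limits are 4.5 MB.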

Notes

  • For DOC and DOCX documents, Sensible converts the document to PDF before processing it.
  • For spreadsheet documents (XLSX and CSV), Sensible extracts text directly from the file without OCR. Sensible represents the text both internally and in the Sensible app's editor as follows:
    • Standardizes the formatting of all text in the file. Each cell contains exactly one line.
    • Standardizes cell height at 0.25'' tall and cell width at 1''. Overflow text in a cell is still available for extraction but isn't viewable in the Sensible app editor unless you click on a line in the rendered document to view its details.
    • Standardizes the maximum page height at 15 inches. Sensible splits longer sheets into consecutive pages.
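
To make the spreadsheet layout rules concrete, the sketch below estimates where a cell lands after Sensible standardizes the sheet. It assumes each spreadsheet row occupies exactly one 0.25-inch cell, that pages have no margins or headers, and that coordinates start at the top-left corner; none of these details are confirmed by this page, and `locate_cell` is a hypothetical helper. Under those assumptions, a 15-inch page holds 15 / 0.25 = 60 rows.

```python
from typing import Tuple

CELL_HEIGHT_IN = 0.25       # standardized cell height, in inches
CELL_WIDTH_IN = 1.0         # standardized cell width, in inches
MAX_PAGE_HEIGHT_IN = 15.0   # maximum page height, in inches

# Assumes no margins, so rows fill the whole page height.
ROWS_PER_PAGE = int(MAX_PAGE_HEIGHT_IN / CELL_HEIGHT_IN)


def locate_cell(row: int, col: int) -> Tuple[int, float, float]:
    """Map a zero-based (row, col) to (page index, x inches, y inches).

    x and y give the assumed top-left corner of the cell on its page.
    """
    page, row_on_page = divmod(row, ROWS_PER_PAGE)
    return page, col * CELL_WIDTH_IN, row_on_page * CELL_HEIGHT_IN
```

With these assumptions, the 61st row of a sheet (`row=60`) starts the second page, so `locate_cell(60, 2)` gives page index 1 at 2 inches from the left edge and 0 inches from the top.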