Supported file types
File types
Sensible supports the following file types:
In the following table, the first four columns after the file format are extraction contexts, and the last three are extraction methods.
File format | Sensible app's Extract tab | Single-file extraction with SDKs or API | Portfolio extraction with SDKs or API | Classification by type with SDKs or API | Methods that require rendering non-text image pixels² | NLP Table method, Fixed Table method³ | Extraction of text that requires OCR |
---|---|---|---|---|---|---|---|
PDF | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Microsoft Word (DOC and DOCX) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Spreadsheet formats¹ (XLSX, XLS, and CSV) | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
Single-page image formats¹ (JPEG, PNG) | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
Multi-page image formats¹ (TIFF) | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ |
Email bodies | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
1. OCR settings don't apply to Microsoft Excel and CSV files.
2. Methods that require rendering an image include pixel-based methods, such as the Box, Checkbox, Nearest Checkbox, and Signature methods; multimodal LLM-based methods; and image coordinates returned by the Document Range method.
3. As alternatives to these Table methods, use the Fixed Table method or the List method.
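The support table can be encoded as a small lookup so that a script can check a file before submitting it. The following is a minimal sketch, not part of any Sensible SDK: the column keys, group names, and the `is_supported` helper are hypothetical, and only rows whose file extensions are named in the table are included.

```python
from pathlib import Path

# Hypothetical shorthand keys for the seven support columns, in table order.
COLUMNS = (
    "extract_tab", "single_file", "portfolio", "classification",
    "pixel_methods", "table_methods", "ocr_text",
)

# One row per file-format group, transcribed from the table (True = supported).
SUPPORT = {
    "word":        (True,  True, True,  True, True,  True,  True),
    "spreadsheet": (True,  True, False, True, False, False, False),
    "image":       (False, True, False, True, True,  True,  True),
    "tiff":        (False, True, False, True, False, False, True),
}

# Only the extensions that the table names explicitly.
EXTENSION_GROUPS = {
    ".doc": "word", ".docx": "word",
    ".xlsx": "spreadsheet", ".xls": "spreadsheet", ".csv": "spreadsheet",
    ".jpeg": "image", ".png": "image",
    ".tiff": "tiff",
}


def is_supported(filename: str, column: str) -> bool:
    """Return True if the table marks this file type as supported for the column."""
    ext = Path(filename).suffix.lower()
    group = EXTENSION_GROUPS.get(ext)
    if group is None:
        raise ValueError(f"extension {ext!r} is not covered by this sketch")
    return SUPPORT[group][COLUMNS.index(column)]
```

For example, `is_supported("invoices.csv", "portfolio")` returns `False`, which matches the ❌ for portfolio extraction in the spreadsheet row.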
File sizes
Sensible supports the following file sizes:
Operation | Size limit for /extract/{doc-type} API endpoint | Size limit for asynchronous calls |
---|---|---|
Single-document file extraction | under 4.5 MB, or under 30 seconds of processing time | 6 GB |
Portfolio extraction | n/a | 6 GB |
Classification | 4.5 MB | 4.5 MB |
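A client can use these limits to decide which call style to attempt before uploading. The sketch below is illustrative only: the `allowed_calls` helper and the operation names are hypothetical, megabytes and gigabytes are assumed to be decimal (10^6 and 10^9 bytes), "under" is read as a strict bound while the other limits are read as inclusive, and the 30-second processing limit is ignored because it can't be known from file size alone.

```python
MB = 1_000_000   # assumption: decimal megabytes
GB = 1_000 * MB  # assumption: decimal gigabytes

SYNC_EXTRACT_LIMIT = int(4.5 * MB)    # /extract/{doc-type}: "under 4.5 MB"
ASYNC_LIMIT = 6 * GB                  # asynchronous extraction calls
CLASSIFICATION_LIMIT = int(4.5 * MB)  # same limit for both call styles


def allowed_calls(operation: str, size_bytes: int) -> list[str]:
    """Return the call styles ('sync', 'async') whose size limit the file meets."""
    if operation == "single":
        sync_ok = size_bytes < SYNC_EXTRACT_LIMIT
        async_ok = size_bytes <= ASYNC_LIMIT
    elif operation == "portfolio":
        sync_ok = False  # the table lists n/a for the synchronous endpoint
        async_ok = size_bytes <= ASYNC_LIMIT
    elif operation == "classification":
        sync_ok = async_ok = size_bytes <= CLASSIFICATION_LIMIT
    else:
        raise ValueError(f"unknown operation: {operation!r}")
    return [name for name, ok in (("sync", sync_ok), ("async", async_ok)) if ok]
```

An empty list means the file exceeds every limit for that operation. A small file can still fail a synchronous extraction if processing takes 30 seconds or longer, so a caller may want to fall back to an asynchronous call in that case.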
Notes
- Word documents (DOC and DOCX): Sensible converts the document to PDF before processing it.
- Email bodies: Sensible converts the body to PDF before processing it.
- Spreadsheet documents (XLSX, XLS, and CSV): Sensible extracts text directly from the file without OCR. Sensible represents the text both internally and in the Sensible app's editor as follows:
  - Standardizes the formatting of all text in the file. Each cell contains exactly one line.
  - Standardizes cell height at 0.25 inches and cell width at 1 inch. Overflow text in a cell is still available for extraction but isn't viewable in the Sensible app editor unless you click on a line in the rendered document to view its details.
  - Standardizes the maximum page height at 15 inches. Sensible splits longer sheets into consecutive pages.
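The standardized spreadsheet dimensions imply a simple mapping from a cell's row and column to a page and a position on that page. The sketch below works through that arithmetic under stated assumptions: zero-based row and column indices, no page margins, and pages filled with whole rows only. The `locate_cell` helper is hypothetical, and Sensible's actual page-splitting behavior may differ.

```python
CELL_HEIGHT_IN = 0.25      # standardized cell height, in inches
CELL_WIDTH_IN = 1.0        # standardized cell width, in inches
MAX_PAGE_HEIGHT_IN = 15.0  # maximum page height, in inches

# Assumption: a page holds as many whole rows as fit in the maximum height.
ROWS_PER_PAGE = int(MAX_PAGE_HEIGHT_IN / CELL_HEIGHT_IN)


def locate_cell(row: int, col: int) -> tuple[int, float, float]:
    """Map a zero-based (row, col) to (page index, x in inches, y in inches)."""
    page, row_on_page = divmod(row, ROWS_PER_PAGE)
    x = col * CELL_WIDTH_IN
    y = row_on_page * CELL_HEIGHT_IN
    return page, x, y
```

Under these assumptions each page holds 60 rows, so the row at index 60 of a sheet lands at the top of the second page.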