Quickstart PDF to Excel
In this quickstart, extract data from an example tax form PDF and convert the document data to a spreadsheet with no coding involved.
- If you instead want a guided tour of SenseML concepts so you can extract data from your own custom documents, see Getting started.
- If you want a low-code way to send document data to destinations other than Excel or CSV, such as emails, Google Sheets, or databases, see Sensible's Zapier integration.
If you're trying to convert a PDF into an Excel spreadsheet, you often find tools that copy the PDF's visual layout into a spreadsheet, with no meaningful relationship between the extracted text and the underlying cells.
In contrast, this tutorial shows you how to use Sensible to convert document tables, checkboxes, paragraphs, and even complex repeating section layouts into meaningfully labeled column/row pairs and linked sheets. You can convert documents formatted as PDFs, PNGs, TIFFs, and JPEGs.
Extract sample document data
Get an account at sensible.so.
Navigate to Sensible's open-source configuration library to choose an example document type. For this tutorial, select Tax forms.
Select Clone to account to copy the example tax forms, along with the configurations that extract data from them, to your account.
Download the following example tax form:
Example PDF Download link
Navigate to the quick extraction tab.
Upload the document you downloaded in the previous step.
Select tax_forms in the Document type dropdown and click Run extraction.
Sensible extracts data from the document and displays it as JSON in the pane to the right.
Convert to spreadsheet
- Click the Download icon to convert the extracted document data to Excel.
The following spreadsheet shows the example output:
- (Optional) View the document and its configuration in the Sensible app at https://app.sensible.so/editor/?d=tax_forms&c=1040_2021&g=1040_2021_sample to explore or tweak the SenseML rules for extracting data from this tax form.
Compile PDFs into one spreadsheet
To combine multiple PDFs into one multi-document spreadsheet, use the Sensible API.
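As a rough illustration of what that API call could look like, the following Python sketch builds (but doesn't send) a request for one spreadsheet compiled from several extractions. It rests on assumptions not stated in this quickstart: that the base URL is `https://api.sensible.so/v0`, that a `generate_excel` endpoint accepts a comma-separated list of extraction IDs in the path, and that authentication uses a bearer token. The function name `build_excel_request` and the IDs are hypothetical. Check Sensible's API reference before relying on any of these details.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: base URL and endpoint path should be verified against
# Sensible's API reference; they are not given in this quickstart.
API_BASE = "https://api.sensible.so/v0"


def build_excel_request(extraction_ids, api_key):
    """Build, without sending, a GET request for one compiled spreadsheet.

    extraction_ids: IDs of extractions you already ran, one per document.
    api_key: your API key (bearer-token auth is an assumption here).
    """
    if not extraction_ids:
        raise ValueError("at least one extraction ID is required")
    # Percent-encode each ID, then join them with commas into one path segment
    # (the comma-separated format is assumed).
    joined = ",".join(quote(str(i), safe="") for i in extraction_ids)
    return Request(
        f"{API_BASE}/generate_excel/{joined}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

To try it, call `build_excel_request(["id-1", "id-2"], "YOUR_API_KEY")` with placeholder IDs and inspect `full_url` on the result. Sending the request with `urllib.request.urlopen` is left out on purpose, because the shape of the response isn't described in this quickstart.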
- For more information about how Sensible converts JSON document extractions to Excel, see SenseML to spreadsheet reference.
- See the Getting started guide to learn how to extract from your custom documents or tweak Sensible's open-source configuration library.