Quickstart: PDF to Excel

In this quickstart, you extract data from an example tax form PDF and convert it to a spreadsheet, with no coding required.

  • If you instead want a guided tour of SenseML concepts so that you can extract data from your own custom documents, see Getting started.
  • If you want a low-code way to convert document data to formats other than Excel or CSV, such as emails, Google Sheets, or databases, see Sensible's Zapier integration.

Introduction

If you're trying to convert a PDF into an Excel spreadsheet, you'll often find tools that copy the PDF's visual layout into a spreadsheet without any meaningful relationship between the extracted text and the underlying cells.

In contrast, this tutorial shows you how to use Sensible to convert document tables, checkboxes, paragraphs, and even complex repeating section layouts into meaningfully labeled column/row pairs and linked sheets. You can convert documents formatted as PDFs, PNGs, TIFFs, and JPEGs.

Extract sample document data

  1. Get an account at sensible.so.

  2. Navigate to Sensible's open-source configuration library to choose an example document type. For this tutorial, select Tax forms.

  3. Select Clone to account to copy the example tax forms, and the configurations for extracting data from them, to your account.

  4. Download the following example tax form:

    Example PDF: Download link
  5. Navigate to the quick extraction tab.

  6. Upload the document you downloaded in the previous step.

  7. Select tax_forms in the Document type dropdown and click Run extraction.

    Sensible extracts data from the document and displays it as JSON in the pane to the right.

Convert to spreadsheet

  1. Click the Download icon to convert the extracted document data to Excel.

The following spreadsheet shows the example output:

  2. (Optional) View the document and its configuration in the Sensible app at https://app.sensible.so/editor/?d=tax_forms&c=1040_2021&g=1040_2021_sample to explore or tweak the SenseML rules for extracting data from this tax form.

Compile PDFs into one spreadsheet

To combine multiple PDFs into one multi-document spreadsheet, use the Sensible API.
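The following Python sketch does not call the Sensible API. It only illustrates what "one multi-document spreadsheet" means: several already-extracted documents merged into one CSV, with one row per document and one column per field. It assumes each document's output is a dict that maps field names to objects with a "value" key (an assumption modeled on the JSON shown in the quick extraction pane), and the field names in the usage example are hypothetical. Nested outputs such as tables are not flattened into linked sheets here.

```python
import csv
import io


def flatten_fields(parsed_document):
    """Map each field name to its scalar value.

    Assumes (hypothetically) that each field is either None or a dict
    with a "value" key; any other shape is passed through unchanged.
    """
    row = {}
    for name, field in parsed_document.items():
        if isinstance(field, dict) and "value" in field:
            row[name] = field["value"]
        else:
            row[name] = field  # e.g. None for a field that wasn't found
    return row


def compile_to_csv(documents):
    """Merge several extracted documents into one CSV string, one row each."""
    rows = [flatten_fields(doc) for doc in documents]

    # The union of all field names, in first-seen order, becomes the header.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    buffer = io.StringIO()
    # restval fills in fields a document lacks; csv writes None as "".
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical field names and values, for illustration only.
    docs = [
        {"name": {"type": "string", "value": "Jane Doe"},
         "wages": {"type": "currency", "value": 50000}},
        {"name": {"type": "string", "value": "John Roe"}, "refund": None},
    ]
    print(compile_to_csv(docs))
```

Taking the union of field names keeps every document in the same sheet even when some fields are missing from some documents; those cells are simply left empty.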

Next