
Use the box method to grab lines inside a box. This method works by default with boxes that have a light background and dark, continuous borders.

Parameters

Note: For the full list of parameters available for this method, see Global parameters for methods. The following table only shows parameters most relevant to or specific to this method.

id (required)
  value: box

position
  value: right, left, below, above. default: center of the anchor line's bounding box
  description: Unlike the Position parameter for the Label method, this parameter doesn't specify where to find the data to grab relative to the anchor. Instead, it helps Sensible fine-tune how to recognize the box that surrounds the data you want to grab.
  The Position parameter defines a little cluster of pixels relative to the anchor line, from which to expand outward until dark pixels signifying the box border are found.
  Why override the default position? One reason is the uncommon circumstance where Sensible happens to start the box recognition inside a letter in the anchor text that has a continuous border, like "O" or "D". In this case, Sensible interprets the letter as a little box and looks no further.
  For an example of how to use this parameter, see the following Examples section.

offsetX
  value: number (decimals allowed). default: 0
  description: The offset in inches along the X axis, from the starting point defined by the Position parameter, from which to expand out and recognize another box. You can use offset X and Y coordinates as an alternative to the Offset Boxes parameter. Coordinates are faster in terms of performance, but are more sensitive to inconsistent box positioning across PDFs and require more configuration. For an example of how to use this parameter, see the following Examples section.

offsetY
  value: number (decimals allowed). default: 0
  description: The offset in inches along the Y axis, from the starting point defined by the Position parameter, from which to expand out and recognize another box. You can use offset X and Y coordinates as an alternative to the Offset Boxes parameter. Coordinates are faster in terms of performance, but are more sensitive to inconsistent box positioning across PDFs and require more configuration. For an example of how to use this parameter, see the following Examples section.

offsetBoxes
  value: object. default: none
  description: The offset, in boxes, from the anchoring box. For example, use this parameter for tables or grids formatted such that continuous borders surround every cell. This parameter does not work for isolated boxes that are separated from each other by whitespace. In other words, the boxes must share borders. Has these parameters:
  - direction: The Direction to search in, relative to the starting box.
  - number: The number of boxes to offset by.
  For an example of how to use this parameter, see the following Examples section.

darknessThreshold
  value: number between 0 and 1. default: 0.9
  description: The brightness threshold below which to consider a pixel a box boundary (white is 1.0). Configure this parameter when you have a box with a dark background. For an example of how to use this parameter, see the following Examples section.
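
To make the Position description more concrete, here is a minimal Python sketch of expanding outward from a starting pixel until dark pixels are reached. It is an illustration only, not Sensible's implementation: the flood-fill approach, the `recognize_box` and `draw_rect` names, and the tiny hand-built page are all assumptions made for this example. It shows why starting inside a closed letter such as "O" yields a tiny box.

```python
from collections import deque


def recognize_box(image, start, darkness_threshold=0.9):
    """Expand outward from `start` (row, col) until dark pixels stop the growth.

    `image` is a list of rows of brightness values, where white is 1.0.
    Returns (top, left, bottom, right) of the light region that was reached,
    or None when `start` itself sits on a dark pixel.
    """
    rows, cols = len(image), len(image[0])
    if image[start[0]][start[1]] < darkness_threshold:
        return None
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        # Visit all eight neighbours, not just the four cardinal ones.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and (nr, nc) not in seen and image[nr][nc] >= darkness_threshold:
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return (min(r for r, _ in seen), min(c for _, c in seen),
            max(r for r, _ in seen), max(c for _, c in seen))


def draw_rect(image, top, left, bottom, right):
    """Draw a one-pixel-wide dark rectangle outline."""
    for c in range(left, right + 1):
        image[top][c] = image[bottom][c] = 0.0
    for r in range(top, bottom + 1):
        image[r][left] = image[r][right] = 0.0


# Hypothetical 9x13 page: one large box containing a small closed shape,
# standing in for a letter such as "O" in the anchor text.
page = [[1.0] * 13 for _ in range(9)]
draw_rect(page, 0, 0, 8, 12)   # the box that surrounds the data
draw_rect(page, 3, 3, 5, 5)    # the closed letter

inside_letter = recognize_box(page, (4, 4))   # start inside the letter
beside_letter = recognize_box(page, (4, 8))   # start next to the letter
```

In this sketch, `inside_letter` covers only the single light pixel enclosed by the letter, while `beside_letter` spans the whole interior of the surrounding box. Moving the starting point is the effect the Position parameter is described as having.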

Examples

Simple box

The following image shows a config that grabs a dollar amount from a box in a 1099 form, based on anchor text matching in the box:

The config for the preceding example is:

{
 "fields": [
   {
     "id": "rents_income",
     "type": "currency",
     "method": {
       "id": "box",
     },
     "anchor": "rents"
   }
 ]
}

Dark box

The following image shows extracting text from a box with a dark background and light text using the darknessThreshold parameter:

You can try out this example yourself in the Sensible app using the following downloadable PDF and config:

Example PDF for dark boxes: Download link

The config for the preceding example is:

{
  "fields": [
    {
      "id": "dark_box",
      "method": {
        "id": "box",
        "darknessThreshold": 0.8
      },
      "anchor": "dark box with light text"
    }
  ]
}
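
The effect of lowering the threshold can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Sensible's code: the `is_border` and `interior_width` helpers and the brightness values in `row` are made up for the example. The only rule taken from this page is that a pixel counts as a box boundary when its brightness is below the threshold.

```python
def is_border(brightness, darkness_threshold=0.9):
    """A pixel is treated as a box boundary when it is darker than the threshold."""
    return brightness < darkness_threshold


def interior_width(row, start, darkness_threshold=0.9):
    """Count the contiguous non-border pixels around column `start` in one pixel row."""
    if is_border(row[start], darkness_threshold):
        return 0
    left = start
    while left > 0 and not is_border(row[left - 1], darkness_threshold):
        left -= 1
    right = start
    while right < len(row) - 1 and not is_border(row[right + 1], darkness_threshold):
        right += 1
    return right - left + 1


# Hypothetical pixel row through a shaded box: white page (1.0), dark border (0.1),
# shaded background (0.85), dark border (0.1), white page (1.0).
row = [1.0, 0.1, 0.85, 0.85, 0.85, 0.85, 0.1, 1.0]

width_with_default = interior_width(row, start=3)                          # threshold 0.9
width_with_lowered = interior_width(row, start=3, darkness_threshold=0.8)  # threshold 0.8
```

With the default threshold, the shaded background itself counts as a boundary, so `width_with_default` is 0. With the threshold lowered to 0.8, the background is treated as the box interior and `width_with_lowered` is 4.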

Offset boxes

The following image shows extracting text from boxes using the Offset Boxes parameter:

You can try out this example yourself in the Sensible app using the following downloadable PDF and config:

Example PDF for offset boxes: Download link

The config for the preceding example is:

{
  "fields": [
    {
      "id": "auto_limit_in_policy_1",
      "anchor": "auto only",
      "match": "first",
      "method": {
        "id": "box",
        "offsetBoxes": {
          "direction": "right",
          "number": 1
        }
      }
    },
    {
      "id": "injury_limit_in_policy_2",
      "anchor": "dollar amount",
      "match": "last",
      "method": {
        "id": "box",
        "offsetBoxes": {
          "direction": "below",
          "number": 2
        }
      }
    },
    {
      "id": "oddly_formatted_boxes",
      "anchor": "spanning multiple",
      "method": {
        "id": "box",
        "offsetBoxes": {
          "direction": "below",
          "number": 3
        }
      }
    }
  ]
}

The oddly formatted boxes in this example help illustrate how Sensible recognizes offset boxes after the first box:

  1. Recognize the starting box by finding the dark pixel borders of the box. Expand out from the starting position (in this case, the green dot is the default starting position). The expansion is in all directions, not just the cardinal directions shown by the red arrows in the image.

  2. Find the border that is shared with the next box, as specified by the Direction parameter (below, in this case). Pick the middle of that border in terms of the starting box's dimensions.

  3. Offset just a little from the shared border to get inside the next box, then expand out from that position (shown as a green dot in the Sensible app) to recognize the next box's borders.

  4. Repeat steps 2 and 3 for the next box, and so on.
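
The steps above can be sketched in Python as follows. This is a simplified illustration, not Sensible's implementation: the flood fill, the helper names (`recognize_box`, `next_start`, `offset_box`), and the small grid of three stacked boxes are assumptions for this example, and the sketch assumes the starting pixel is light.

```python
from collections import deque

STEPS = {"right": (0, 1), "left": (0, -1), "below": (1, 0), "above": (-1, 0)}


def recognize_box(image, start, threshold=0.9):
    """Step 1: flood-fill light pixels from `start`; return (top, left, bottom, right)."""
    rows, cols = len(image), len(image[0])
    seen, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        for nr in (r - 1, r, r + 1):
            for nc in (c - 1, c, c + 1):
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in seen and image[nr][nc] >= threshold):
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    rs, cs = [r for r, _ in seen], [c for _, c in seen]
    return (min(rs), min(cs), max(rs), max(cs))


def next_start(image, box, direction, threshold=0.9):
    """Steps 2 and 3: cross the shared border and return a pixel inside the next box."""
    top, left, bottom, right = box
    dr, dc = STEPS[direction]
    # Step 2: middle of the shared border, measured along the current box.
    r = {"below": bottom, "above": top}.get(direction, (top + bottom) // 2)
    c = {"right": right, "left": left}.get(direction, (left + right) // 2)
    # Step 3: cross the border and stop on the first light pixel past it.
    while True:
        r, c = r + dr, c + dc
        if not (0 <= r < len(image) and 0 <= c < len(image[0])):
            raise ValueError(f"no box found {direction} of {box}")
        if image[r][c] >= threshold:
            return (r, c)


def offset_box(image, start, direction, number):
    """Step 4: repeat steps 2 and 3 `number` times from the starting box."""
    box = recognize_box(image, start)
    for _ in range(number):
        box = recognize_box(image, next_start(image, box, direction))
    return box


# Hypothetical 13x7 image: three stacked boxes that share horizontal borders.
grid = [[1.0] * 7 for _ in range(13)]
for r in range(13):
    for c in range(7):
        if r % 4 == 0 or c in (0, 6):
            grid[r][c] = 0.0

third_box = offset_box(grid, (2, 3), "below", 2)
```

Starting in the top box and offsetting by two boxes below lands in the third box, whose interior in this grid is rows 9 to 11 and columns 1 to 5. Asking for a third offset raises an error in this sketch because no further box shares a border.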

When boxes are complex (inconsistently sized, spanned, or aligned, as in the preceding image), Sensible's methods for recognizing boxes can be correspondingly complex. In such cases, use the Sensible app to visually examine the anchor matches (orange boxes), starting positions (green dots), box matches (green boxes), and method matches (blue boxes) to understand Sensible's behavior. Also keep in mind that another approach might be a better fit for such a complex scenario. See the following example for one such approach.

Box coordinates

You can use the Offset X and Offset Y parameters as an alternative to the Offset Boxes parameter. See Parameters for a description of tradeoffs.

The following image shows the same PDF as the Offset Boxes example, but uses X and Y coordinates to find the offset boxes instead:

The red arrows show the specified offsets from the initial positions (green dots) to the new positions from which to expand out and recognize a box. The green dots move as you adjust the coordinates, which are in inches, so you can visually tweak your measurements in the Sensible app.
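
As a rough illustration of what an offset in inches means for the starting point, the following Python sketch converts inch offsets to a pixel position. The `shifted_start` name and the 72 pixels-per-inch resolution are assumptions for this example, not values taken from Sensible. The axis directions (positive X to the right, positive Y downward) are inferred from how the coordinate config pairs with the "right" and "below" directions of the Offset Boxes example.

```python
def shifted_start(start_px, offset_x_in=0.0, offset_y_in=0.0, dpi=72):
    """Move a starting point (x, y) in pixels by offsets given in inches.

    `dpi` is an assumed rendering resolution, used only to convert inches to pixels.
    """
    x, y = start_px
    return (x + round(offset_x_in * dpi), y + round(offset_y_in * dpi))


# Hypothetical default starting point at pixel (100, 200).
moved_right = shifted_start((100, 200), offset_x_in=1.5)
moved_down = shifted_start((100, 200), offset_y_in=1.0)
```

Because the offsets are fixed distances rather than box counts, the moved point only lands in the intended box when that box sits in the same place on every PDF, which is the tradeoff described under Parameters.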

You can try out this example yourself in the Sensible app using the following downloadable PDF and config:

Example PDF for box coordinates: Download link

This example uses the following config:

{
  "fields": [
    {
      "id": "auto_limit_in_policy_1",
      "anchor": "auto only",
      "match": "first",
      "method": {
        "id": "box",
        "offsetX": 1.5,
        "offsetY": 0.0
      }
    },
    {
      "id": "injury_limit_in_policy_2",
      "anchor": "dollar amount",
      "match": "last",
      "method": {
        "id": "box",
        "offsetX": 0.0,
        "offsetY": 1.0
      }
    },
    {
      "id": "oddly_formatted_boxes",
      "anchor": "spanning multiple",
      "method": {
        "id": "box",
        "offsetX": 1.0,
        "offsetY": 2.5
      }
    }
  ]
}

Troubleshoot box recognition

Use the Position parameter to fine-tune box recognition.

For example, in the following image, the config tells Sensible to find the box by expanding outward from the left edge of the anchor line's boundaries ("position": "left") until it finds dark borders. But the starting position (the green dot) is right on the box border itself, so Sensible can't recognize the box:

However, if you start from the right edge of the anchor line's boundaries instead ("position": "right"), there's enough whitespace between the anchor and the box border for Sensible to recognize the box:

You can try out this example yourself in the Sensible app using the following downloadable PDF and config:

Example PDF for box recognition: Download link

This example uses the following config to recognize the box and filter out an unwanted string:

{
  "fields": [
    {
      "id": "box_position_test",
      "anchor": "big anchor text",
      "method": {
        "id": "box",
        "position": "right",
        "wordFilters": ["cramped"]
      }
    }
  ]
}
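
To show why the starting position matters, here is a small Python sketch of picking a starting point from the anchor line's bounding box and checking whether it lands on a border. The exact points chosen for each position value, the helper names (`start_point`, `can_start`), and the toy image are assumptions for illustration and may differ from what Sensible does internally.

```python
def start_point(anchor_box, position=None):
    """Pick a starting pixel (x, y) from the anchor's (left, top, right, bottom) box."""
    left, top, right, bottom = anchor_box
    mid_x, mid_y = (left + right) // 2, (top + bottom) // 2
    points = {
        None: (mid_x, mid_y),      # default: center of the anchor line's bounding box
        "left": (left, mid_y),
        "right": (right, mid_y),
        "above": (mid_x, top),
        "below": (mid_x, bottom),
    }
    return points[position]


def can_start(image, point, darkness_threshold=0.9):
    """Expansion can only begin from a pixel that is not itself a border pixel."""
    x, y = point
    return image[y][x] >= darkness_threshold


# Hypothetical 5x12 image of one box; only its border pixels are dark.
image = [[1.0] * 12 for _ in range(5)]
for y in range(5):
    for x in range(12):
        if y in (0, 4) or x in (0, 11):
            image[y][x] = 0.0

# Anchor text cramped against the left border of the box.
anchor_box = (0, 1, 6, 3)

left_ok = can_start(image, start_point(anchor_box, "left"))
right_ok = can_start(image, start_point(anchor_box, "right"))
```

In this toy image `left_ok` is False because the left edge of the anchor coincides with the box border, while `right_ok` is True because the right edge falls on whitespace inside the box.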

Notes

The Box method is similar to the Region method, but requires less configuration and is slightly slower. Use the Region method instead of the Box method if the borders of the box are compromised in some way (incomplete, discontinuous, or otherwise broken, for example).

Updated 15 days ago

