Box
Extract lines inside a box. This method works by default with boxes that have a light background and dark, continuous borders.
Parameters
Note: For additional parameters available for this method, see Global parameters for methods. The following table shows parameters most relevant to or specific to this method.
key | value | description |
---|---|---|
id (required) | box | Extracts all lines in a box. If you define an anchor that's outside the box borders, then use offset parameters to define a point that's inside the box borders so that Sensible recognizes the box. |
position | right , left , below , above . default: center of the anchor line's bounding box | Use this parameter to fine tune box recognition. Defines the starting point for the box recognition relative to the anchor. For example, right specifies starting at the midpoint of the anchor line's right boundary, and below specifies starting at the midpoint of the anchor line's bottom boundary. Sensible searches outward from this point until it finds dark pixels signifying the box border. For an example of how to use this parameter, see the following Examples section. |
offsetX | number in inches default: 0 | Searches for a box starting at a point offset from the point defined by the Position parameter. Positive values offset to the right, negative values offset to the left. For an example of how to use this parameter, see the following Examples section. |
offsetY | number in inches default: 0 | Searches for a box starting at a point offset from the point defined by the Position parameter. Positive values offset down the page, negative values offset up the page. For an example of how to use this parameter, see the following Examples section. |
offsetBoxes | object. default: none | Recognize a box offset from the point defined in the Position parameter by a number of contiguous boxes that share borders. For example, use this parameter for tables or grids where borders surround every cell. Contains the following parameters: - direction : The direction to search in (above, below, right, left , relative to the starting box.- number : The number of boxes to offset by.For an example of how to use this parameter, see the following Examples section. |
darknessThreshold | number between 0 and 1. default: 0.9 | The brightness threshold below which to consider a pixel a box boundary. White is 1.0. Configure this parameter for checkboxes with dark backgrounds relative to the surrounding background. If the document has a white background, the default value is 0.9. If the document has dark or mottled background, for example as the result of a scan, then Sensible automatically chooses a default value based on the amount of contrast in the document. For an example of how to use this parameter, see the following Examples section. |
includeAnchor | true , false . default: false | If true, includes anchors lines that are inside the box borders in the method output. Ignores anchor lines that are outside box borders. |
Examples
Simple box
The following example shows extracting a dollar amount from a box in a 1099 form, based on anchor text matching in the box.
Config
{
"fields": [
{
/* find the first box in the document
containing the text 'rents'
and return all the other text in the box */
"id": "rents_income",
"type": "currency",
"method": {
"id": "box",
},
"anchor": "rents"
}
]
}
Example document
The following image shows the example document used with this example config:
Example document | Download link |
---|
Output
{
"rents_income": {
"source": "4,200",
"value": 4200,
"unit": "$",
"type": "currency"
}
}
Dark box
The following example shows extracting text from a box with a dark background and light text using the darknessThreshold
parameter.
Config
{
"fields": [
{
"id": "dark_box",
"method": {
"id": "box",
"darknessThreshold": 0.8
},
"anchor": "dark box with light text",
}
]
}
Example document
The following image shows the example document used with this example config:
Example document | Download link |
---|
Output
{
"dark_box": {
"type": "string",
"value": "Here’s some more text"
}
}
Offset boxes
The following example shows recognizing boxes relative to other boxes using the Box Offset parameter.
Config
{
"fields": [
{
"id": "auto_limit_in_policy_1",
"anchor": "auto only",
"match": "first",
"method": {
"id": "box",
"offsetBoxes": {
"direction": "right",
"number": 1
}
}
},
{
"id": "injury_limit_in_policy_2",
"anchor": "dollar amount",
"match": "last",
"method": {
"id": "box",
"offsetBoxes": {
"direction": "below",
"number": 2
}
}
},
{
"id": "offset_boxes",
"anchor": "spanning multiple",
"method": {
"id": "box",
"offsetBoxes": {
"direction": "below",
"number": 3
}
}
},
]
}
Example document
The following image shows the example document used with this example config:
Example document | Download link |
---|
Output
{
"auto_limit_in_policy_1": {
"type": "string",
"value": "$2,000"
},
"injury_limit_in_policy_2": {
"type": "string",
"value": "$4,000"
},
"offset_boxes": {
"type": "string",
"value": "Third offset box"
}
}
Notes
The following image illustrates how Sensible recognizes offset boxes after the first box:
- Recognize the starting box by searching for the dark borders of the box. The search starts at the green dot defined by the Position parameter. The search expansion is in all directions, not just the cardinal directions shown by the red arrows in the image.
- Find a bottom border (
"direction": "below"
) that's shared with the next box. Choose a point ( represented as the second green dot) on the bottom border that's in the middle of the starting box's border and that's just inside the next box's borders. - Search from that point to recognize the next box's borders.
- Repeat steps 2 and 3 for the next box.
When boxes are complex (inconsistently sized, spanned, or aligned, as in the preceding image), Sensible's methods for recognizing boxes can be correspondingly complex. In such cases, use the Sensible app to visually examine and understand the extraction. Or, see the following example for an alternative approach.
Box coordinates
You can use the Offset X and Offset Y parameters:
- to anchor on text outside the target box, for example, if all the box's contents are variable.
- as an alternative to the Offset Boxes parameter. Offsets provide faster performance, but are more sensitive to inconsistent box positioning across documents and require more configuration.
The following example shows the same document as the Offset Boxes example, but uses distances in inches rather than boxes to define the offsets.
Config
{
"fields": [
{
"id": "auto_limit_in_policy_1",
"anchor": "auto only",
"match": "first",
"method": {
"id": "box",
"offsetX": 1.5,
"offsetY": 0.0
}
},
{
"id": "injury_limit_in_policy_2",
"anchor": "dollar amount",
"match": "last",
"method": {
"id": "box",
"offsetX": 0.0,
"offsetY": 1.0
}
},
{
"id": "oddly_formatted_boxes",
"anchor": "spanning multiple",
"method": {
"id": "box",
"offsetX": 1.0,
"offsetY": 2.5
}
},
]
}
Example document
The following image shows the example document used with this example config:
Example document | Download link |
---|
The red arrows in the preceding image show the offsets in inches from the point defined by the Position parameter. The green dots move as you adjust the inches coordinates, so you can visually tweak your measurements in the Sensible app.
Output
{
"auto_limit_in_policy_1": {
"type": "string",
"value": "$2,000"
},
"injury_limit_in_policy_2": {
"type": "string",
"value": "$6,000"
},
"oddly_formatted_boxes": {
"type": "string",
"value": "Third offset box"
}
}
Troubleshoot box recognition
Use the Position parameter to fine tune box recognition.
PROBLEM
In the following error example, Sensible searches for borders starting from the middle point of the anchor line's left boundary ("position": "left"
). Since the point (the green dot) overlaps the box border, Sensible can't recognize the box.
Config
{
"fields": [
{
"id": "box_test",
"anchor": "big anchor text",
"method": {
"id": "box",
"position": "left",
"wordFilters": [
"cramped"
]
}
}
]
}
Example document
The following image shows the example document used with this example config:
Example document | Download link |
---|
SOLUTION
If you specify "position": "right"
, the green dot is far enough inside the borders that Sensible recognizes the box:
Config
{
"fields": [
{
"id": "box_test",
"anchor": "big anchor text",
"method": {
"id": "box",
"position": "right",
"wordFilters": [
"cramped"
]
}
}
]
}
Example document
Output
{
"box_test": {
"type": "string",
"value": " with some text to extract"
}
}
Notes
The Box method is an alternative to the Region method that requires less configuration and is slightly slower. Use the Region method instead of the Box method for faster performance, or if the borders of a box are incomplete or discontinuous.
Updated 1 day ago