Corrects the size of text in documents whose size varies, for example as a result of being photographed at different distances. ID cards and receipts are common examples of such documents. This preprocessor enables coordinates-based methods, such as the Region or Text Table methods, to work with such unpredictably scaled documents. For alternatives to this preprocessor, see Page transformation preprocessors.

Parameters

key | value | description
type (required) | scale |
samples | array of objects | Array of example objects containing font heights for text matches in documents at 100% scale. Sensible compares the actual height of each match against its example, then takes the average of the ratios and uses it to rescale the whole document. Sensible recommends the following practices:
- Choose samples for which the font height does not vary relative to other font heights in the document. For example, don't create a sample that can match both a heading 1 and a heading 4 style.
- Choose samples that appear on each page, such as headers or footers.
Each example object has the following parameters:
- match: a Match object
- targetHeight: the height, in inches, of the matched text at 100% scale.
perPage | boolean | If true, Sensible rescales each page individually against the Target Height parameter, averaging over the matches on that page rather than over the whole document. For example, if a tax return contains multiple W-2 forms, each scanned at an unpredictable scale, set this parameter to true and match on text such as "Wage and Tax" and the W-2 titles in each W-2 form.
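
The averaging described for the samples and perPage parameters can be sketched as follows. This is a minimal illustration, not Sensible's implementation: the function names and the measured heights are hypothetical, and it assumes each ratio is the target height divided by the measured height, so that multiplying a page by the result brings the text to 100% scale.

```python
from statistics import mean


def scale_factor(samples):
    """Average the target/actual height ratios of the matched samples.

    `samples` is a list of (actual_height, target_height) pairs in inches.
    The ratio direction (target / actual) is an assumption for illustration.
    """
    return mean(target / actual for actual, target in samples)


def per_page_scale_factors(samples_by_page):
    """Compute one factor per page, as with perPage set to true.

    Pages with no matches are skipped in this sketch.
    """
    return {
        page: scale_factor(samples)
        for page, samples in samples_by_page.items()
        if samples
    }


# Hypothetical measurements: one match per page, targetHeight of 0.22 inches.
measured = {
    0: [(0.11, 0.22)],  # text appears at half size
    1: [(0.22, 0.22)],  # text already at 100% scale
    2: [(0.44, 0.22)],  # text appears at double size
}

# One factor for the whole document (perPage false or omitted).
whole_document = scale_factor(
    [pair for samples in measured.values() for pair in samples]
)

# One factor per page (perPage true).
per_page = per_page_scale_factors(measured)
```

With these made-up measurements, the per-page factors differ for every page, while the whole-document factor is a single compromise value that fits none of the pages exactly. That is the situation in which setting perPage to true helps.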

Examples

The following example shows how to use the Per Page parameter to rescale an ID card that appears at a different size on each page, where the second page contains the target size to standardize on.

Config

{
  "preprocessors": [
    {
      "type": "scale",
      "perPage": true,
      "samples": [
        {
          "match": {
            "type": "includes",
            "text": "First",
            "isCaseSensitive": true
          },
          "targetHeight": 0.22
        }
      ]
    }
  ],
  "fields": [
    {
      "id": "white_house_tenure",
      "anchor": "tenure",
      "match": "all",
      "method": {
        "id": "region",
        "start": "below",
        "offsetX": -1.7,
        "offsetY": 0,
        "width": 1.5,
        "height": 0.6
      }
    }
  ]
}
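
To see why the region in this config depends on rescaling, the following sketch computes where the same region would have to sit on a page that has not been rescaled. It is an illustration only: `region_on_unscaled_page` is a hypothetical helper, and it assumes the region's offsets and dimensions are inches measured at 100% scale.

```python
def region_on_unscaled_page(region, page_scale):
    """Return a copy of `region` with offsets and size multiplied by `page_scale`.

    `page_scale` is the size of the page content relative to 100% scale,
    for example 0.5 for a page photographed at half size (hypothetical value).
    """
    geometry_keys = ("offsetX", "offsetY", "width", "height")
    scaled = dict(region)
    for key in geometry_keys:
        scaled[key] = region[key] * page_scale
    return scaled


# Region from the example config, defined against the 100% scale page.
region = {
    "id": "region",
    "start": "below",
    "offsetX": -1.7,
    "offsetY": 0,
    "width": 1.5,
    "height": 0.6,
}

# On a page at half size, the matching area shrinks and moves toward the anchor.
half_size = region_on_unscaled_page(region, 0.5)
```

Because a single config holds only one set of region coordinates, rescaling each page to a common size lets the same region apply to every page.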

Example document

The following image shows the example document used with this example config:

Example document | Download link

Output

{
  "white_house_tenure": [
    {
      "type": "string",
      "value": "1940-1945"
    },
    {
      "type": "string",
      "value": "1940-1945"
    },
    {
      "type": "string",
      "value": "1940-1945"
    }
  ]
}