Remove page

Removes pages that contain the specified text.

Parameters

key | value | description
type (required) | removePage |
match (required) | Match object | Sensible removes the page that contains this text.
matchAll | boolean | If true, removes all pages that contain the text specified by the Match parameter.
pageOffset | number. default: 0 | The zero-indexed number of the page to remove, counting from the page that contains the text matched by the Match parameter. To remove a single page offset from the first page of the document, rather than offset from matched text, specify "match": { "type": "first" }.
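For the first-page case described in the pageOffset row, a preprocessor might look like the following sketch. It assumes that pageOffset counts forward from page one as described above, so a value of 1 targets the document's second page. The offset value is an arbitrary choice for illustration, not a recommended setting.

```json
{
  "preprocessors": [
    {
      /* illustrative: remove the second page of the document,
         counting a zero-indexed offset of 1 from the first page */
      "type": "removePage",
      "match": {
        "type": "first"
      },
      "pageOffset": 1
    }
  ]
}
```

With the default pageOffset of 0, the same match would instead target the first page itself.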

Examples

The following example removes all pages that have a large-font "Appendix A" header, so that the Rider A amount is extracted from the Appendix B pages instead.

Config

{
  "preprocessors": [
    {
      /* remove all pages containing 
         the large-font header "Appendix A" */
      "type": "removePage",
      "match": {
        "type": "equals",
        "text": "Appendix A",
        "minimumHeight": 0.2
      },
      "matchAll": true
    }
  ],
  "fields": [
    /* extract Rider A from Appendix B,
       not from Appendix A */
    {
      "id": "rider_A",
      "anchor": "rider a",
      "type": "currency",
      "method": {
        "id": "label",
        "position": "right"
      }
    }
  ]
}

Example document

Output

{
  "rider_A": {
    "source": "$600",
    "value": 600,
    "unit": "$",
    "type": "currency"
  }
}