Paragraph

Extracts paragraphs in various layouts, including paragraphs in columns and paragraphs that span pages.

Parameters

Note: For the full list of parameters available for this method, see Global parameters for methods. The following table shows the parameters that are most relevant or specific to this method.

| key | value | description |
| --- | --- | --- |
| id (required) | paragraph | |

Examples

Config

{
  "fields": [
    {
      "id": "repair_completion",
      "anchor": {
        "match": {
          "text": "may not repair",
          "type": "includes"
        }
      },
      "method": {
        "id": "paragraph"
      }
    },
    {
      "id": "lead_warning_spans_pages",
      "type": "paragraph",
      "anchor": {
        "match": [
          {
            "text": "lead warning statement",
            "type": "startsWith"
          }
        ]
      },
      "method": {
        "id": "paragraph"
      }
    }
  ]
}

Example document
The following image shows the example document used with this example config:

Example document: Download link

Output

{
  "repair_completion": {
    "type": "string",
    "value": "1. Tenant may not repair or cause to be repaired any condition, regardless of the cause, without Landlord's permission. All decisions regarding repairs, including the completion of any repair, whether to repair or replace the item, and the selection of contractors, will be at Landlord's sole discretion."
  },
  "lead_warning_spans_pages": {
    "type": "string",
    "value": "LEAD WARNING STATEMENT: Housing built before 1978 may contain lead-based paint. Lead from paint, paint chips, and dust can pose health hazards if not managed properly. Lead exposure is especially harmful to young children and pregnant women. Before renting a home built before 1978, landlords must disclose the presence of any known lead- based paint and/or lead-based paint hazards in the dwelling. Tenants must also receive a federally approved pamphlet on lead poisoning prevention."
  }
}
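
Downstream code usually needs only the extracted text for each field. The following Python sketch shows one way to pull the values out of output shaped like the example above. The helper name `field_values` is hypothetical, the values are shortened copies of the example output, and the handling of `null` fields is an assumption about how a field with no match is returned.

```python
import json

# Shortened copy of the example output above; the values are abbreviated here.
raw_output = """
{
  "repair_completion": {
    "type": "string",
    "value": "1. Tenant may not repair or cause to be repaired any condition"
  },
  "lead_warning_spans_pages": {
    "type": "string",
    "value": "LEAD WARNING STATEMENT: Housing built before 1978 may contain lead-based paint."
  }
}
"""


def field_values(output_json: str) -> dict:
    """Map each field id to its extracted value (hypothetical helper).

    Assumption: a field that found no match is returned as null,
    so it maps to None here instead of raising an error.
    """
    parsed = json.loads(output_json)
    return {
        field_id: (field or {}).get("value")
        for field_id, field in parsed.items()
    }


values = field_values(raw_output)
for field_id, value in values.items():
    print(f"{field_id}: {value}")
```

Because each paragraph is returned as a single string, a paragraph that spans pages or columns in the source document needs no reassembly on the consuming side.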

Notes

This method uses document layout to detect paragraphs. In contrast, the Document Range method extracts all the text between an upper and a lower bound.
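
To make the contrast concrete, the following Python sketch compares the two ideas on made-up sample lines. It is a toy illustration under stated assumptions, not the product's actual algorithm: the `Line` class, the `paragraph_at` and `range_between` helpers, the coordinates, and the `max_gap_ratio` threshold are all hypothetical, and the sketch ignores columns and page breaks.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    top: float     # y-coordinate of the line's top edge (arbitrary units)
    height: float  # line height in the same units


def _index_of(lines, needle):
    """Index of the first line whose text contains needle (case-insensitive)."""
    for i, line in enumerate(lines):
        if needle.lower() in line.text.lower():
            return i
    raise ValueError(f"no line contains {needle!r}")


def _same_paragraph(upper, lower, max_gap_ratio):
    """Treat adjacent lines as one paragraph when the vertical gap is small."""
    gap = lower.top - (upper.top + upper.height)
    return gap <= max_gap_ratio * upper.height


def paragraph_at(lines, anchor_text, max_gap_ratio=0.5):
    """Layout-based: grow outward from the anchor line while spacing stays tight."""
    start = end = _index_of(lines, anchor_text)
    while start > 0 and _same_paragraph(lines[start - 1], lines[start], max_gap_ratio):
        start -= 1
    while end < len(lines) - 1 and _same_paragraph(lines[end], lines[end + 1], max_gap_ratio):
        end += 1
    return " ".join(line.text for line in lines[start:end + 1])


def range_between(lines, start_text, stop_text):
    """Bounds-based: every line from the start match up to, not including, the stop match."""
    start = _index_of(lines, start_text)
    stop = start + 1 + _index_of(lines[start + 1:], stop_text)
    return " ".join(line.text for line in lines[start:stop])


# Made-up sample lines: two tightly spaced lines, a gap, then two more lines.
lines = [
    Line("1. Tenant may not repair or cause to be repaired", top=100, height=10),
    Line("any condition without Landlord's permission.", top=112, height=10),
    Line("2. Tenant must report damage promptly.", top=140, height=10),
    Line("Signatures", top=180, height=10),
]

print(paragraph_at(lines, "may not repair"))
print(range_between(lines, "may not repair", "signatures"))
```

In this sketch, `paragraph_at` stops at the wide gap before the second numbered item, while `range_between` returns everything up to the line that contains the lower bound, including the second item.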