Image processing

You have the following options for processing non-text images in documents:

Use case | Method
Extract structured data from an image with an LLM. For example, extract facts about a photo of a building, such as its color and whether it's multistory or single-story. | Use the Query Group method with the Multimodal Engine parameter configured.
Extract an image from a known region as an encoded string. For example, use this option when your documents contain complex charts from which neither LLM-based nor layout-based methods can reliably extract structured data. Extract the chart as an image and render it for a human to interpret. | Use the Region method with the As Image parameter configured.
Search for non-labeled, non-text images in a range. For example, search for unlabeled photos of houses in a real estate document. This option returns the images' coordinates, which you can then use to render the images. | Use the Document Range method with the Include Images parameter configured.

Notes

  • Sensible's rectangular coordinates for images follow these conventions:

    • they're relative to a (0, 0) origin at the top-left corner of the page (not a bottom-left origin, which is the convention in, for example, the popular PDF.js library)

    • they're in inches. To convert inches to pixels, multiply each coordinate by your PPI setting. For example, at a PPI setting of 72, an x-coordinate of 3.156 inches is ~227 pixels (72 PPI * 3.156 inches ≈ 227.2).

    • they're ordered clockwise from top left: (top left), (top right), (bottom right), (bottom left)

  • This topic is about processing non-text images. For information about processing text images, see OCR.
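
The coordinate conventions in these notes can be sketched in a few lines of Python. This is a minimal illustration, not Sensible's API: the function names, the default PPI of 72, and the representation of an image's corners as a list of four (x, y) tuples in inches are all assumptions made for the example. The sketch converts inches to pixels, turns the four clockwise corners into a pixel bounding box, and flips a y-coordinate for tools that expect a bottom-left origin.

```python
from typing import List, Tuple

# Hypothetical representation: one (x, y) pair in inches per corner.
Point = Tuple[float, float]


def inches_to_pixels(value_in: float, ppi: float = 72) -> int:
    """Convert a coordinate in inches to the nearest whole pixel."""
    return round(value_in * ppi)


def corners_to_pixel_box(corners: List[Point], ppi: float = 72) -> Tuple[int, int, int, int]:
    """Turn four corners, ordered clockwise from top left, into a
    (left, top, right, bottom) pixel box measured from the top-left origin."""
    (left, top), _top_right, (right, bottom), _bottom_left = corners
    return (
        inches_to_pixels(left, ppi),
        inches_to_pixels(top, ppi),
        inches_to_pixels(right, ppi),
        inches_to_pixels(bottom, ppi),
    )


def flip_to_bottom_left_origin(y_in: float, page_height_in: float) -> float:
    """Re-express a top-left-origin y-coordinate (inches) for a tool that
    measures y upward from the bottom of the page."""
    return page_height_in - y_in


# Illustrative values only: (top left), (top right), (bottom right), (bottom left).
corners = [(3.156, 1.0), (5.5, 1.0), (5.5, 2.75), (3.156, 2.75)]
print(corners_to_pixel_box(corners))          # (227, 72, 396, 198)
print(flip_to_bottom_left_origin(1.0, 11.0))  # 10.0
```

The resulting box can be passed to whatever cropping or rendering tool you use, as long as that tool also measures from the top-left corner; otherwise, flip the y-coordinates first.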