Smart Data Extraction with ReportMiner: Automating the Creation of Extraction Models

The extraction model is at the heart of extracting data from unstructured documents with ReportMiner. The model is essentially a set of data-matching patterns, and these patterns identify the desired data within the document.

Typically, these patterns are built by carefully studying the document for recurring structure and then applying appropriate pattern identifiers. The resulting pattern identifies the block of data to extract, known as the data region. The next step is to define the data fields by marking them inside a sample of that region.
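To make the idea of a model concrete, here is a minimal Python sketch, assuming a fixed-width text report. The class names `FieldSpec` and `ExtractionModel`, the regular expression used as the region pattern, and the column positions are all hypothetical illustrations; this is not ReportMiner's internal format or API. The region pattern decides which lines belong to the data region, and each field is a column span within those lines.

```python
import re
from dataclasses import dataclass


# Hypothetical sketch: these classes illustrate the idea of an extraction model;
# they are NOT ReportMiner's internal format or API.
@dataclass
class FieldSpec:
    name: str
    start: int  # zero-based column where the field begins
    end: int    # column just past the field's last character (exclusive)


@dataclass
class ExtractionModel:
    region_pattern: re.Pattern  # identifies lines that belong to the data region
    fields: list[FieldSpec]

    def extract(self, report: str) -> list[dict[str, str]]:
        """Return one dict per region line, keyed by field name."""
        records = []
        for line in report.splitlines():
            if self.region_pattern.match(line):
                records.append(
                    {f.name: line[f.start:f.end].strip() for f in self.fields}
                )
        return records


# Usage: a tiny fixed-width report with a header and a totals line.
report = "\n".join([
    "INVOICE REPORT",
    "1001  Widget      4    9.99",
    "1002  Gadget     12    3.50",
    "Total                 81.96",
])
model = ExtractionModel(
    region_pattern=re.compile(r"\d{4}\s"),  # region lines start with a 4-digit ID
    fields=[FieldSpec("id", 0, 4), FieldSpec("name", 6, 16),
            FieldSpec("qty", 16, 19), FieldSpec("price", 19, 27)],
)
print(model.extract(report))
```

The header and totals lines do not match the region pattern, so only the two item lines become records. Building both the pattern and the column spans by hand is exactly the manual work described above.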

Astera recognized early on that creating data regions and fields manually can be time-consuming and error-prone. Consequently, we set out to develop a solution that automates the steps required to build an extraction model, eliminating the need to study the data and devise a matching pattern by hand.

As shown in the figure below, once you have loaded your report, all you need to do is select a couple of sample lines that belong in your region, and a positive (green) marker will appear next to your selection.

ReportMiner automatically scans the text for a matching pattern, highlights the matching lines, and creates the region for you. To make adjustments, click the green marker, or click to the left of a highlighted line that has no marker. A negative (red) marker appears, which un-highlights that line and excludes it from the region.
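One generic way to grow a region from a few selected lines is to generalize each line into a coarse "shape" and then highlight every line that shares a shape with a positive sample, minus any lines marked negative. The Python sketch below illustrates that technique under stated assumptions: `line_shape` and `auto_region` are hypothetical helpers, and this is not a description of how ReportMiner actually detects patterns.

```python
import re
from collections.abc import Set


def line_shape(line: str) -> str:
    """Generalize a line into a coarse shape: digit runs -> '9', letter runs -> 'A',
    whitespace runs -> one space. Punctuation is kept as-is."""
    shape = re.sub(r"\d+", "9", line)
    shape = re.sub(r"[A-Za-z]+", "A", shape)
    return re.sub(r"\s+", " ", shape).strip()


def auto_region(lines: list[str], positive: Set[int],
                negative: Set[int] = frozenset()) -> list[int]:
    """Return indices of lines assigned to the region (hypothetical helper).

    positive: indices of the sample lines the user selected (green markers)
    negative: indices the user explicitly excluded (red markers)
    """
    sample_shapes = {line_shape(lines[i]) for i in positive}
    return [i for i, line in enumerate(lines)
            if line_shape(line) in sample_shapes and i not in negative]


lines = [
    "INVOICE REPORT",
    "1001  Widget      4    9.99",
    "1002  Gadget     12    3.50",
    "1003  Gizmo       7    1.25",
    "Total                 90.71",
]
print(auto_region(lines, positive={1}))               # one green marker
print(auto_region(lines, positive={1}, negative={3}))  # plus one red marker
```

Selecting a single item line highlights all three item lines because they share the shape `9 A 9 9.9`, while the header and totals lines do not. Exact-shape matching is deliberately strict: an item whose name contains two words would produce a different shape and would need to be added as another positive sample.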

As with regions, ReportMiner's new automation feature can also create fields for you by scanning the region's sample for repeating patterns of data.
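For fixed-width text, a simple heuristic for finding repeating field positions is to look for character columns that are blank in every sample line and treat each run of non-blank columns as a field. The Python sketch below shows that column-gap heuristic; `detect_fields` is a hypothetical helper, and the technique is an illustrative assumption rather than ReportMiner's documented algorithm.

```python
def detect_fields(sample_lines: list[str]) -> list[tuple[int, int]]:
    """Return (start, end) column spans (end exclusive) of candidate fields.

    Assumes sample_lines is non-empty. A column counts as a separator when it
    is blank in every sample line; each run of non-separator columns is a field.
    """
    width = max(len(line) for line in sample_lines)
    padded = [line.ljust(width) for line in sample_lines]
    blank = [all(line[c] == " " for line in padded) for c in range(width)]

    spans, start = [], None
    for c, is_blank in enumerate(blank):
        if not is_blank and start is None:
            start = c                      # a field begins
        elif is_blank and start is not None:
            spans.append((start, c))       # a field ends at the first blank column
            start = None
    if start is not None:                  # field runs to the end of the line
        spans.append((start, width))
    return spans


sample = [
    "1001  Widget      4    9.99",
    "1002  Gadget     12    3.50",
]
spans = detect_fields(sample)
print(spans)
print([[line[s:e].strip() for s, e in spans] for line in sample])
```

Using several sample lines matters here: the quantity column is only recognized as two characters wide because the second line contains `12`. The heuristic is naive, so a value with a single internal space (such as a two-word product name) would be split into two fields unless some other sample line fills that column.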

All you need to do is right-click in the data area and select “Auto Create Fields.”

ReportMiner will scan the source file and create fields automatically, as shown below.

With the new smart creation of regions and fields in ReportMiner 6.4, you no longer have to spend tedious hours creating regions and fields manually to extract the data you need. With a few mouse clicks, you can move on to the most important part of your project: leveraging the extracted data to increase your business efficiency.