Using ReportMiner to Extract Business Information From Printed Documents Part 2: Main Region

Last week we talked about how ReportMiner can be used to build an extraction model quickly and easily, and demonstrated how to create the Header in a report model.

Now that the definition of the Header has been created, the next step is to create the Main Region of the report. The Main Region starts with the Customer Name and then includes Account Number, Contact Name, and, finally, specific order details.

Select the Main Region in the Report Definition Editor, then right-click it and select “Add Data Region” from the context menu.

This will add a new node “Data” in the Report Browser. This new node has no fields at this point.

ReportMiner has assigned the default vertical size of this region as 23 lines based on your selection. This number can be adjusted as needed by using the Line Count input under the toolbar.

Next, identify the starting point of the region. Place the cursor at the position where the text ‘CUSTOMER:’ begins as shown in the screenshot below and enter CUSTOMER: in the pattern text input.

The Report Definition Editor highlights every occurrence of the Data Region in the report. Remember that the height of the region can be adjusted by using the Line Count input.
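To make the idea of a pattern-plus-line-count region concrete, here is a minimal Python sketch. It is only an illustration of the concept, not how ReportMiner works internally. The function name find_regions, the sample text, and the rule that the pattern must begin at a fixed column are all assumptions made for this example.

```python
def find_regions(lines, pattern, column=0, line_count=3):
    """Return (start_index, region_lines) for every line where
    `pattern` begins at `column`; each region spans `line_count` lines."""
    regions = []
    for i, line in enumerate(lines):
        if line.startswith(pattern, column):
            regions.append((i, lines[i:i + line_count]))
    return regions


# Hypothetical report text, not taken from the actual sample file.
sample = [
    "CUSTOMER: Acme Corp",
    "ACCOUNT: 1001",
    "CONTACT: J. Smith",
    "",
    "CUSTOMER: Globex",
    "ACCOUNT: 1002",
    "CONTACT: A. Jones",
]

regions = find_regions(sample, "CUSTOMER:", column=0, line_count=3)
for start, block in regions:
    print(start, block[0])
```

Changing line_count in this sketch plays the same role as adjusting the Line Count input: the starting lines stay the same, but each region covers more or fewer lines.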

Rename the region CustomerData. Now the report has two regions: Header and CustomerData.

Next, identify the fields making up the CustomerData region. The fields can either be assigned manually or created with the Auto Create Fields feature.

To manually add a field, highlight a field with the mouse cursor, right-click it and select Add Field. A new field is added to the Report Browser. The Report Definition Editor shows all the occurrences of this field in the report.

To automatically add fields, right-click within the region and select Auto Create Fields. You can then modify, rename, add or delete fields as necessary.

Next, the order data is defined. Notice that each customer can have one or more orders, and each order may have several items in it. In ReportMiner this is called a Collection. Order data is located within the CustomerData region defined above. In other words, the CustomerData region is also a container for order details.

Select the CustomerData node in the Report Browser. Right-click it and select Add Collection Data Region. This will add a new region under the CustomerData node. The default name here is Data, which can be renamed OrderData to make it more descriptive.

Now, define the starting point of the new region.

Type ORDER NUMBER: in the text pattern input.

The Report Definition Editor highlights all instances of the OrderData region.

Right-click anywhere within the region and select Auto Create Fields. This creates the Order Number and Ship Date fields, named Field_0 and Field_1 respectively. Give these fields more user-friendly names.

As described earlier, a customer can have more than one order, which in ReportMiner is called a Collection of items. Whenever a node has a collection of items, its “Is Collection” property needs to be turned on. Notice that the appearance of the icon for the ORDER node in the Report Browser changes to help identify this node as a Collection. When a Collection Data Region is added via the context menu, the “Is Collection” property is enabled automatically.
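For readers who think in terms of data structures, a Collection can be pictured as a list of child records attached to a parent record. The Python sketch below is one way to model that relationship. The class and attribute names and the sample values are hypothetical and are not part of ReportMiner.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    order_number: str
    ship_date: str


@dataclass
class Customer:
    name: str
    account_number: str
    contact_name: str
    # The collection: zero or more Order records per customer.
    orders: List[Order] = field(default_factory=list)


# Hypothetical values used only to show the parent/child shape.
customer = Customer("Acme Corp", "1001", "J. Smith")
customer.orders.append(Order("A-100", "01/15/2014"))
customer.orders.append(Order("A-101", "02/03/2014"))
print(len(customer.orders))
```

A region without the “Is Collection” property would correspond to a single child record rather than a list.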

Now, create the definition of Order Items. Select the CustomerData node in the Report Browser. Add a new Collection Data Region in the Report Definition Editor in the same manner as before.

Specify the text pattern that will identify the order items. In this example, a digit from the Quantity value followed by a space character identifies a line containing an order item. To that end, enter “Match any digit” and then “Match any blank character,” as shown below.
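The same idea can be expressed as a regular expression. The sketch below approximates “Match any digit” followed by “Match any blank character” as one digit followed by a space or tab, tested at a fixed column. This mapping to regex syntax is an assumption for illustration, and the sample lines and the helper is_item_line are hypothetical.

```python
import re

# One digit followed by one blank (space or tab); an approximation of the
# two pattern tokens described above, not ReportMiner's own syntax.
ITEM_PATTERN = re.compile(r"[0-9][ \t]")


def is_item_line(line, column=0):
    """True if the pattern matches starting exactly at `column`."""
    return ITEM_PATTERN.match(line, column) is not None


# Hypothetical lines, not taken from the actual sample report.
lines = [
    "ORDER NUMBER: A-100",
    "2 CD   Greatest Hits",
    "TOTAL",
    "5 DVD  Concert Film",
]
item_lines = [ln for ln in lines if is_item_line(ln)]
print(item_lines)
```

Note that the position of the pattern matters in this sketch: a two-digit quantity such as "12 CD" matches only when the pattern is placed on the last digit (column 1), not at column 0.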

Next, rename the new region OrderDetails and auto create fields using the procedure demonstrated earlier.

This action adds six fields: media type, quantity, description, label/no, unit price, and amount. By default the fields are named Field_0, Field_1, and so on, so it is necessary to rename them.

The sample report does not have a footer. If necessary, a footer can be added in the same way the Header and Data Regions were added above.

The report definition has now been created. Report definitions are used by ReportMiner to correctly parse, interpret and assign data as it is fed from the report source. Report definitions are assigned an *.rmd extension.

Save the report model by clicking the Save icon on the main toolbar. Now the data can be previewed to see how it is parsed by ReportMiner.

Click the magnifying glass icon on the top toolbar. This opens the Data Preview window, showing the entire report structure with the actual values for all the fields that have been defined above. Now we’ll take a look at some additional functionality that Centerprise offers to help you customize your report.

Selecting Fields and Regions

To select a field, left-click on it in the Report Browser’s tree. The field is highlighted in yellow in the Report Definition Editor. Some of the more common field properties are displayed in the top pane of the editor.

To select a region, click on it in the Report Browser’s tree. The region is highlighted in light purple in the Report Definition Editor, and the fields in the selected region are also highlighted in darker purple. The top pane shows the properties that are applicable for the region.

Managing Field and Region Properties

To view and update all other properties of a field or a region, right-click on a field (or region) inside Report Browser, and select Edit Field (or Edit Region) from the context menu.

Field properties can also be accessed by right-clicking the field in the Report Definition Editor and selecting ‘Field Properties…’ from the context menu.

Renaming Fields and Regions

To rename a field, double-click it on the tree in the Report Browser and enter a new name.

To rename a region, double-click it on the tree in the Report Browser and enter a new name.

A field or a region can also be renamed by entering the new name in the Name input on the top pane.

Deleting Fields and Regions

To delete a field, right-click it in Report Browser or Report Definition Editor and select Delete Field.

To delete a region, right-click on a region (or a field inside the region), and select Delete Region from the context menu. This action will also delete any fields in that region.

Customizing Fields

After a field has been created, its start position can be changed by moving it a number of characters to the left or to the right. Right-click on a field and select “Move Field Marker Right One Character” or “Move Field Marker Left One Character” from the context menu. Repeat as needed to move the field the desired number of characters.

The field length can be changed by selecting “Decrease Field Length by one character” or “Increase Field Length by one character” from the context menu. Repeat as many times as needed to change the field length by the desired number of characters.
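A field can be thought of as a start column plus a length applied to each matching line. The Python sketch below shows how nudging the marker or the length by one character changes the extracted value. The FieldMarker class, the zero-based columns, the sample line, and the assumption that moving the marker keeps the length unchanged are all hypothetical and are used only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FieldMarker:
    start: int   # zero-based column where the field begins
    length: int  # number of characters in the field

    def extract(self, line):
        """Return the trimmed text covered by this marker."""
        return line[self.start:self.start + self.length].strip()


# Hypothetical line, not taken from the actual sample report.
line = "ORDER NUMBER: A-100    SHIP DATE: 01/15/2014"

marker = FieldMarker(start=15, length=4)               # starts one column too late
moved = replace(marker, start=marker.start - 1)        # move marker left one character
widened = replace(moved, length=moved.length + 1)      # increase length by one character

for m in (marker, moved, widened):
    print(repr(m.extract(line)))
```

Each step corresponds to one use of the context menu commands described above, which is why several repetitions may be needed to line a field up with its data.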

To automatically determine the field length based on the available sample data, right-click a field and select “Auto determine field length” from the context menu, or click the corresponding icon on the top toolbar.

You can also move all fields within the same region left or right at once. To do so, right-click on a region or field and select “Move All Field Markers Left One Character” or “Move All Field Markers Right One Character”.

To undo any action in the editor, use the Undo dropdown menu on the toolbar or press CTRL + Z.

Identifying Text Patterns for Fields and Regions

The following options are available to help you create a text pattern that will identify the starting point of a field, or a region.

Report Options

To change report options, click the report options icon on the report toolbar.

The following options are available:

Sample File Path – provides the path to the sample report file that you want to use for creating your report model.

Line Count – controls how many lines are loaded from the sample file.

Other useful options are Tab Size (default value of 8), Font and Numeric Format.
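Tab Size matters for position-based fields because a tab character occupies a variable number of columns. The short Python sketch below shows the general effect using the standard str.expandtabs method. It assumes that the Tab Size option behaves like conventional tab-stop expansion, which this article does not confirm, and the sample line is hypothetical.

```python
# Hypothetical tab-separated line, not taken from the actual sample report.
raw = "QTY\tDESCRIPTION\tAMOUNT"

expanded_8 = raw.expandtabs(8)  # tab stops every 8 columns
expanded_4 = raw.expandtabs(4)  # tab stops every 4 columns

# The same text lands in different columns depending on the tab size.
print(expanded_8.index("DESCRIPTION"), expanded_8.index("AMOUNT"))
print(expanded_4.index("DESCRIPTION"), expanded_4.index("AMOUNT"))
```

If field markers were defined against one tab size and the report is later read with another, the columns would no longer line up, so it is worth settling this option before defining fields.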

Now the sample report can be previewed based on the report model that has been created. To preview it, click the magnifying glass icon on the Report toolbar. The report displays in the Data Preview pane as shown in the screenshot below.

Now that the report has been created and previewed, add it to a dataflow so the entire source report can be read and fed to a destination object.

Go to File -> New -> Dataflow. This creates a new dataflow.

Using the Toolbox pane, expand the Sources category and select Report Source.

Drag and drop Report Source onto the Designer.

Double-click the ReportModel1 object just added (or right-click it and select Properties) to open the Properties dialog.

In the Properties dialog, enter the path to the report source file and the path to the report model. The report model location should point to the report model created and saved earlier.

Click OK to close the dialog. The ReportModel1 object shows the report structure according to the report model we created.

The tree nodes may need to be expanded to see all the child nodes under the root node.

The new report source is now ready to feed data to the downstream objects in the dataflow.