Category Archives: ReportMiner7 Highlight Reel

Multi-Column Support

Many documents have newspaper-style formatting, which contains more than one column with a repeating pattern of records. As a result, the layout is more complex than a single-column document, and extracting useful information can pose a challenge.

ReportMiner 7 now provides a Multi-Column layout option to handle documents with multiple columns. In the past, if a document had more than one column with a repeating pattern, it was very difficult to extract information from all the columns cleanly and efficiently. The cause was the way the software looks for information: it scans the data in horizontal sweeps, so a single sweep crosses every column on the line and picks up records from different columns together. With the latest version of ReportMiner, you can now process multi-column documents within minutes and get editable, searchable data.

Here’s how to use ReportMiner 7’s Multi-Column feature: 

First, load your multi-column document in ReportMiner.

Add a Data Region to create matching patterns.

In this case, we’ll create a matching pattern for Names and Phone Numbers in the document.

Next, add Data Fields for Names and Phone Numbers in the document.

When previewed, the data is displayed accurately in a list format.
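To make the idea of a matching pattern concrete, here is a minimal Python sketch of the same kind of extraction on plain text. It is only an analogy: ReportMiner's patterns are defined in its own designer, not as regular expressions, and the sample text, the phone number format, and the names `SAMPLE`, `RECORD`, and `extract_records` are all assumptions made for this illustration.

```python
import re

# Hypothetical sample text: one record per line, a name followed by a phone number.
SAMPLE = """\
John Smith      (555) 123-4567
Mary Jones      (555) 987-6543
Page 1 of 3
"""

# A line counts as a record only if it ends with a phone number in the
# assumed "(ddd) ddd-dddd" format, separated from the name by 2+ spaces.
RECORD = re.compile(r"^(?P<name>.+?)\s{2,}(?P<phone>\(\d{3}\) \d{3}-\d{4})\s*$")


def extract_records(text):
    """Return (name, phone) tuples for every line that matches the pattern."""
    records = []
    for line in text.splitlines():
        match = RECORD.match(line)
        if match:
            records.append((match.group("name").strip(), match.group("phone")))
    return records


for name, phone in extract_records(SAMPLE):
    print(name, "|", phone)
```

Lines that do not fit the pattern, such as the page footer in the sample, are skipped, which is the same role a matching pattern plays when it picks out the records in a Data Region.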

As seen in the screenshot below, a blank bar appears as soon as you check off the Multi-Column option in the Data Region. Click on the bar and a black dotted vertical line will appear indicating a column boundary. If a line is placed incorrectly, click on it within the bar to remove it and try again. Make sure that the line is flush with the left side of the first column of characters in your document.

Since there are three columns in the sample document, another column boundary is added just before the start of the second column. The records in the marked columns have now been identified.
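The effect of column boundaries can be sketched in a few lines of Python. This is not how ReportMiner is implemented; it is a simplified illustration that assumes fixed-width text, a hypothetical two-column page, and boundaries given as character offsets. The names `split_columns`, `LEFT`, `RIGHT`, and `PAGE` are made up for the example.

```python
def split_columns(lines, boundaries):
    """Slice each line at the given character offsets and read column by column.

    `boundaries` holds the starting offset of each column, mirroring boundary
    lines placed flush with the left edge of each column.
    """
    starts = sorted(boundaries)
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    columns = []
    for start, end in zip(starts, ends):
        # Take this column's slice of every line and drop empty slices.
        cells = [line[start:end].rstrip() for line in lines]
        columns.append([cell for cell in cells if cell.strip()])
    # Flatten so the result reads like a single-column document.
    return [cell for column in columns for cell in column]


# Hypothetical two-column page: the second column starts at offset 20.
LEFT = ["Ann Lee   555-0101", "Cy Dunn   555-0103"]
RIGHT = ["Bob Ray   555-0102", "Di Fox    555-0104"]
PAGE = [left.ljust(20) + right for left, right in zip(LEFT, RIGHT)]

for record in split_columns(PAGE, [0, 20]):
    print(record)
```

Read line by line, each row of `PAGE` mixes one record from each column. Once the page is sliced at the boundaries, every record sits on its own line, so a single-column pattern can match it.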

Preview your data and export it to a destination file type of your choice with easy access to the extracted information.

From one column to multiple columns, Astera can extract information with ease. Thanks to ReportMiner 7, your data is more accessible than ever before.

Optical Character Recognition Support

Welcome to The Highlight Reel, Astera Software’s blog series on ReportMiner 7’s newest features.

ReportMiner 7 now offers built-in Optical Character Recognition (OCR). Combined with our sophisticated pattern-based text extraction functionality, ReportMiner can seamlessly unlock data trapped in scanned documents.

How does it work in ReportMiner?

ReportMiner uses OCR as a preprocessing step to get the text equivalent of the images found in scanned PDF documents. Once the equivalent text is available, the rest of the process is exactly the same as for other text-based documents. Let’s review the OCR process for PDF documents containing textual information as images:

  1. After selecting File > New > Report Model, set the path to the PDF document that you would like to run OCR on.

Make sure that the “Run OCR” option is checked, so that ReportMiner will run OCR on the document.

Note the zoom level option, which defaults to 100%.

Selecting an appropriate zoom level gives you both speed and accuracy. If the image containing text is very small, increasing the image size can improve text recognition accuracy. Adjust the zoom level until you get the desired results.

  2. Below is a screenshot of the PDF document we are trying to read using ReportMiner.

  3. As soon as you select “OK”, ReportMiner will start running OCR on the document.

  4. As shown below, ReportMiner grabs the textual information from the PDF document and displays it on the screen.

Now that your document is digitized, ReportMiner can process it. You can create a report model, define data regions with matching patterns, capture the data, and export it to your desired destination.
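To see why the zoom level in step 1 trades speed against accuracy, consider what enlarging an image involves. The sketch below is a generic nearest-neighbour upscaler in plain Python, not ReportMiner's actual resampling method, which this post does not describe. The function name `zoom_bitmap` and the tiny sample glyph are assumptions for illustration only.

```python
def zoom_bitmap(pixels, zoom_percent):
    """Enlarge a 2D grid of pixel values using nearest-neighbour sampling."""
    if zoom_percent <= 0:
        raise ValueError("zoom_percent must be positive")
    scale = zoom_percent / 100
    src_h, src_w = len(pixels), len(pixels[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    # Each output pixel copies the source pixel it maps back onto.
    return [
        [
            pixels[min(src_h - 1, int(y / scale))][min(src_w - 1, int(x / scale))]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]


# Hypothetical 2x2 glyph: 1 is ink, 0 is background.
glyph = [[0, 1],
         [1, 0]]

for row in zoom_bitmap(glyph, 200):
    print(row)
```

At 200% every source pixel becomes a 2x2 block, so small characters cover more pixels for the recognizer to work with. The image also holds four times as many pixels, which is why a larger zoom level tends to cost processing time.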

Be sure to check back every Thursday for more highlights.

Microsoft Word and Rich Text Format Support

ReportMiner 7 now includes support for Microsoft Word (.doc and .docx) and RTF formats, so you can extract information efficiently and easily from more source file types than ever before. You can now process invoices, purchase orders, receipts, forms, and other Word- or RTF-formatted files with ReportMiner 7.

The screenshots below illustrate .docx extension support in ReportMiner.

Select File > New > Report Model and choose your source document.

As seen in the screenshot below, the .docx file opens and is ready for processing.

You can now continue to create your report model.

We at Astera Software are proud to take every step towards making our products the best that they can be. Be sure to check back every Thursday for more highlights.