Microsoft Word and Rich Text Format Support

Welcome to The Highlight Reel, Astera Software’s blog series on ReportMiner 7’s newest features.

ReportMiner 7 now includes support for Microsoft Word (Doc and Docx) and RTF formats: enjoy efficient and easy information extraction from more source files than ever before. Now you can process invoices, purchase orders, receipts, forms and other Word/RTF-formatted files with ReportMiner 7.

The screenshots below illustrate .docx extension support in ReportMiner.

Select File > New > Report Model and choose your source document.

As seen in the screenshot below, the .docx file opens and is ready for processing.

You can now continue to create your report model.

We at Astera Software are proud to take every step towards making our products the best that they can be. Be sure to check back every Thursday for more highlights.

Astera Introduces ReportMiner 7

Astera is excited to announce that ReportMiner 7 is now available.  New innovative features make it one of the most powerful software solutions for document content extraction and integration on the market.

Astera’s ReportMiner enables users to extract data from non-tabular data documents. Users define pattern-based extraction models to extract desired data, transform, and send it to a variety of destination formats, such as spreadsheets, relational databases, and XML files. ReportMiner comes with advanced integration features such as workflow orchestration and scheduling with batch and real-time modes, making it a complete solution for end-to-end unstructured data integration.

This latest release builds on our software’s signature ease of use and intuitiveness with features intended to capture a multitude of new source types and structures.

Key new features in ReportMiner 7 include:

  • OCR for PDFs: This feature is introduced to handle documents containing images. With built-in OCR feature, ReportMiner can read data from scanned PDFs directly within the software.
  • Microsoft Word and RTF support: ReportMiner can now extract data from Microsoft Word and RTF documents.
  • Multi-column layouts: Multi-column (newspaper-style) layouts are now supported for extraction. Visually mark the columns and ReportMiner will automatically apply the extraction logic to each of the columns.
  • Action functions: These context-based functions give control of the raw text while creating formula fields.
  • Forward and backward append: “Append” data regions can work in both directions. User can append data to all preceding records or all succeeding records.
  • Flat Preview: Where applicable, ReportMiner now offers a flattening of the preview grid so users can envision how their data will look when exported to a flat structure such as an Excel spreadsheet.
  • User interface enhancements: With a sleek new interface, brand new visual themes, and many user interface enhancements, ReportMiner 7 offers an entirely custom experience in data extraction.

Pricing and Availability

ReportMiner 7 is available immediately. For those interested in giving ReportMiner 7 a test drive, a free trial can be downloaded here. If you’re a current customer and want to upgrade, click here.

A single-user, one-year license of ReportMiner Express is priced at $750. The price includes training and bundle discounts are available.

The OCR version is priced at $1,000 per year, per user. The software can be purchased here.

Trade Partner Management in EDIConnect

EDIConnect is a complete solution for accurate and efficient bi-directional EDI data integration. It offers EDI data integration capability in an intuitive user interface with visual tools to build bi-directional integration. This blog will discuss how to set up a trade partner relationship in EDIConnect, including how to define incoming and outbound condition maps, how to define separators for the segment elements, and how to customize standards and use them inside the trade partner definition.

First, let’s review the EDIConnect process for receiving and sending messages.

1_EDI1

You define your trade partner and specify the settings to receive incoming messages.

Once you receive the message, it goes into the system, where you can validate it and generate acknowledgements such as X12, 997, and 999 or Tier One.

Now let’s take a closer look at the process for defining a trade partner and specifying the settings to receive incoming messages.

Begin by creating a new trade partner. Go to file/new and click on EDI Trade Partner Profile.

2_EDI1

In the screen, you can specify the partner name and also specify your settings, inbound maps, outbound maps, and sequences.

3_EDI1

Let’s look at these options one by one.

First, select the dialect for the partner, in this case X12.

4_EDI1

Within the X12 settings, you can specify information for the interchange headers, functional groups, validation settings, and acknowledgement.

5_EDI1

Within the Functional Group you can specify GS properties as shown below.

6_EDI1

Similarly, for Interchange Header you can specify all ISA properties, as well as separators for your data elements, as shown below.

7_EDI1

For the validation settings, you can specify what should be validated when a message is received for the partner.

8_EDI1

You can specify acknowledgements such as what kind of acknowledgement you want to convey when the messages arrive, such as FunctionalAckTransactionSet, GenerateAk2ForAcceptedTransactionSet, or SendTechnicalAcknowledgement.

9_EDI1

Now we’ll move on to how to specify inbound maps. This is where you specify how specific transactions from the partner are treated and how many you are expecting from that partner. For example, from this partner I am expecting only 850 transactions.

10_EDI1

So I’m going to add one 850 as an incoming transaction set and specify how I want it to be processed in the system. You can have a standard 850 or you can have a customized version of it. In the case of a customized version, you will pick your customized 850 version as a mapping so that when the messages arrive the customized information is used to process that message.

11_EDI1

Similarly, for outbound messages you specify what maps you are going to send out to this partner and if there is any customization, in which case you map the standard message to the customized message in the customization window as shown below.

12_EDI1

Now we’ll move on to sequences. All the sequence generation for this EDI partner is controlled by the properties specified in this tab.

13_EDI1

Here you specify where your sequence object resides, your database information, and your functional group control numbers, interchange control numbers, and conditions of control numbers.

Now that you know how to set up a trade partner relationship, our next blog will show you how to use the EDI builder and EDI destination file in EDIConnect.

Creating a Complex Dataflow in Centerprise – Part 2

Part 2 – Routing Data to Multiple Destinations

In Part 1 we have learned how to use the Join transformation and different types of maps such as the Expression map and Function map. Now we will use the Router to send data into two different tables depending on the routing information.

We want to decide the destination of the loan depending on its origin state. Take the State field from the transformation window and drag and drop it into the Route, which maps it to the State field in the Route.

26

Open the router properties and go to the next page, here you can add the rules to decide the different routes. For example, for the California loans we can write a simple rule such as “State equals CA.”

27

Click on the new rule icon in the left top of the rule window and the next rule we will write is “State does not equal CA.”

28

The result in the Router window is that there are two different outgoing nodes available for mapping, CA and Out_Of_State. This enables us to put each set of data in a different table.

29

Now we can go ahead and create the destination tables for the routed loans. We need to create one for California loans and one for Out_Of_State loans.  You can drag and drop the database table destination from the toolbox onto the designer, however, you can also use a shortcut for creating a database table source or destination using the database browser. You simply select the database browser underneath the toolbox and point to one of the existing connections.

30

The browser will then show you all the existing databases, including tables and views.

31

Select the tables folder, which shows all the tables in the database and we can see the California loans table.

32

Select the California loans table, press shift, and drag and drop it onto the designer to create the destination.

33

Follow the same steps to create the Out_Of_State Loans destination.

34

If we expand the CA_Loans designation, we can see that we have all the fields and we can do our map from the Route to each destination. Drag the CA field from the router to the destination, and do the same for the Out_Of_State_Loans.

35

So with these few clicks we have created the scenario we started with in the very first image. We have the Loans and Taxes sources, we did the join, we did the calculation for the Address and Name Parse function, and, finally, we did the routing to send the loans to two different destination tables.

36

However, as you can see if you compare this to the first image at the beginning of the blog, there are a couple of lookups along with the address calculation and name parsing, as well as a data quality rule to check the data coming from the tax source.

37

If we open the preview window for the tax data, we can see that for some loans the tax is showing zero.

38

We need to check whether the source data is correct or not, so we add a data quality rule, which can be found in the toolbox transformations. Drag and drop it onto the designer and do the mapping as with any other transformation. Open the properties window and a rule can be written to specify that the property tax cannot be zero.

39

Now do a preview on the loan tax join. Since the data is passing through the data quality rule you can see that the data quality rule has flagged all the errors.

40

Another thing that has been done is to add a profile of the tax data. If you do a preview you can see that for the LoanID and the PropertyTax fields the information about the all the values has been collected.

41

Since the profile is like any other source, it can be mapped to an Excel spreadsheet destination and when the dataflow is run, it puts the information in the Excel sheet, which becomes our report. So along with doing the data transfer, a report on the tax data is provided as well.

This is a quick overview has demonstrated how to create a complex dataflow in Centerprise. In Part 1 we learned how to combine data from two sources using a join, send the data for transformation and mapping, and create a function.  In Part 2 we learned how to route our data to two different destination tables. For more information on creating workflows and subflows, visit our Astera TV Workflows and Subflows playlist.

Creating a Complex Dataflow in Centerprise – Part 1

Part 1 –Join Transformations and Functions

Our last post (Creating an Integration Flow in Centerprise) described how to create a simple dataflow in Centerprise. In this two-part blog we will show you how to build a more complex dataflow including maps, transformations, data quality rules, and data profiling.

The figure below shows a more complex dataflow.

37

In this example we are working with two source files, one contains information about home loans and the other contains information about the property tax for the corresponding home loans. We need to combine these two pieces of data and do some conversions by running some calculations on attributes. In the end we want to route the data to two different destination tables, depending on the origin of the home loan: if it is from California it goes to the California Loans table, otherwise it goes to the Out-of-State Loans table. Alongside this, we need to check the data quality for the loan data and again for the tax data. We also need to profile the tax data so that it can be sent to an Excel file and output as a report.

In order to design the dataflow shown above, we begin by clicking on the New Dataflow button to create a new dataflow. First we look at the data—both loan data and tax data. In the previous blog, Creating Simple Dataflows, we learned how to create our source simply by dragging and dropping from the toolbox onto the designer and specifying properties.  However, there is also a shortcut to create sources directly. Simply drag and drop the Loans and Tax Excel files directly from the Explorer window to the designer.

2

Centerprise does the rest. It has created the source, knows where the file comes from, and has done the layout. When you click on the chevron you can see all the data columns from the source file.

3

Click on preview and you can see all your data in the preview window.

12

Now do the same thing with the Tax file. When you preview your tax data, you can see the property tax information for each of the loans.

5

Next we want to combine the two sources. To do this we use the Join transformation. Drag and drop the Join transformation onto the designer.

6

When you click on the chevron, you can see that the transformation doesn’t have any elements.

7

We want to take all the elements from both the Loans and the Tax sources and combine them in the Join transformation. Drag and drop the Loans top node into the Join window. You can see that Centerprise has automatically created and mapped all the fields.

8

To add the two Tax fields to the join, drag and drop each field to the Join window and Centerprise automatically adds the fields and maps them.

9

Note that since there are now two LoanId fields, Centerprise has appended the one from the Tax source to LoanID_1.

Now we have all the fields required for the join. If we right click on the Join window and select Properties, we can see all the fields from both Loans and Taxes.

10

Click on the blue arrow at the top left of the window to go to the next page, where we will specify what kind of join we want. Choose a simple inner join, then in the Sort Left and Sort Right inputs specify the key that will be used for the join. For Loans it is the LoanID and for Taxes it is the LoanID_1.

11

Click OK and our join is ready. When we preview the data we can see that for each of the loans the property tax and loan information is joined.

12

So with a few clicks we have joined our two sources.

The next step is to use our join as a source for our transformation and maps.  Drag and drop the Expression Map from the toolbox onto the designer.

13

This is used to do calculations and any kind of combining of data. In this example we see that the Loans information has the Borrower Name, State, and Zip Code. We want to combine these three fields into one field and call it “Address” in our destination. Since we are going to be routing to two different destinations, our natural next step is to add a router.

Drag and drop a router from the toolbox onto the designer.  The router becomes the next destination.

14

Next, drag and drop the three fields we want to combine (Borrower Name, State, and Zip Code) from our Join window to the expression window.

15

Then open the expression properties window, click on the blue arrow next button and we are presented with the rules writer, which allows us to write any kind of rule. You can see the functions drop down menu has a large selection of functions that can be used for writing rules such as logical, conversion, date/time, name and address parsing, math, etc.

17

In this example we have a very simple concatenation so we will write the rule starting with Name, then a comma, then State, then a space, then the Zip Code, which is an integer. Since we are doing a concatenation of the strings we will use a conversion function to convert the Zip Code from an integer to string.

18

Click on OK and our value is ready for output. We take this value and drag and drop it to our destination. You can see the value is now in the destination.

19

At this point we can do a preview and see how our data is really going to work. You can see that the Name, State, and Zip Code have been combined the way we wanted: Name, comma, State, space, Zip Code. This is how you can write simple rules and simple calculations for data conversion.

20

Next we want to create a function. We start by dragging and dropping a function from the toolbox onto the designer.

21

We have the Name field in our join, but our destination uses FirstName and LastName fields, so we need to take the Name field and split it into FirstName and LastName. For that we will use the Name Parsing function. Click on the function properties and choose Name and Address Parsing from the drop-down menu. Then select the Parse Name function and click OK.

ParseName

 

When you expand the function, you can see that a list of possible name related field options is available.

23

Drag and drop the name field from the Join window to the left side of the function to create the input, which then we have the options on the right side for the output. Drag and drop FirstName and LastName fields from the function window to the destination.

24

When you preview, you can see that Centerprise has taken the names from the transformation and split them into first name and last name.

25

This is how you can use functions and expressions. Part 2 of this blog coming next week will explain how to route the data we have transformed to multiple destinations.

Creating an Integration Flow in Centerprise

Integration flows are the foundation of any data integration project. Centerprise Data Integrator has built-in automation features that make this oftentimes complex process so easy that non-technical business users can create flows with minimal or no IT support.

In this example we will create a simple integration flow, called a dataflow, using an Excel source and putting the data in a database table. This is a common task used often for moving data from documents to databases so that it can be used downstream for operations and business intelligence.

To create a new dataflow, go to the file menu and select new/dataflow.

Flow_1

In the toolbox on the left side, you can see items such as sources, destinations, maps, transformations, and more.

To create a source, point to your source and drag and drop it onto the designer. In this example, since the source is an Excel workbook, we will drag and drop the Excel workbook source item onto the designer.

Flow_2

Next we need to specify the properties of the source. Right click on the source to open the properties window, which presents a wizard where we can specify all the properties for the data source.

Flow_3

In this window we specify where the file path for the source is located by clicking on the File Path button and pointing to the source file in the Explorer window.

Flow_4

Move to the next page by clicking on the right arrow button in the top left corner of the source window.

Flow_5

Centerprise opens a window that shows the layout of all the source fields from the Excel file. The application automatically identifies all the fields from the source and their corresponding data types.

Flow_6

Click OK and you can see your source in the designer. Click on the chevron in the upper right corner and the window expands to show all the fields from your source.

Flow_7

Now that the source is ready, you can preview your data by right clicking and selecting Preview Data. Centerprise has read the data from the Excel source and at the bottom of the window you can see how it looks inside the source.

Flow_8

Next we want to create a destination. We go to the destination table and from the toolbox drag and drop the table destination onto the designer.

Flow_9

Again, right click on Properties and in the Options dialog box you specify your credentials and the location of your database table.

Flow_10

Here you choose which type of database you are working with depending on your destination and input your credentials for that database type. In this example, we select the SQL server and input our credentials (or you can choose a recently used connection), then we click the test connection button to ensure that the connection works.

Flow_11

Now we move to the next page by clicking on the right arrow button in the top left corner. This opens a window that asks for information about the table into which we are going to write. We can choose an existing table or create a new table. In this case we will create a new table and leave the default options.

Flow_12

Again, click the right arrow to go to the next page, which shows us the layout of the destination.

Flow_13

Now our source as well as our destination is ready and we will map the two together. For the mapping, we will use the auto mapping feature of Centerprise. To do this, we drag and drop the entire source node at the top of the input to the output.

Flow_14

You can see that Centerprise has automatically created all the maps and that for each field in the source there is a line that goes to the matching field in the destination. This very simple map from source to destination will take the data as it is in the source and put it in the destination.

Flow_15

We have just created a simple dataflow by mapping our fields from our source Excel file to our destination database. Now we will give it a name and save it on our system so we can go ahead and run the dataflow.

For that, we use a very simple method. On the top left of the screen we click the drop down list next to Servers, which will show all the servers installed on the machine. In this case we choose the server Development.

Flow_16

We click on the green arrow to the right to start the dataflow.

Flow_17

At the bottom of the page the Job Progress window will show you the progress.

Flow_18

Click on the database button and you can see the results of your dataflow. This example shows that there were 83 records and they were all processed to the database destination with no errors.

Flow_19

That’s how easy it is to create a simple dataflow in Centerprise. The capabilities of the software extend far beyond simple processes to encompass the most complex of structured and unstructured data sources. Our next blog will show you how Centerprise can be used to create more complex dataflows.

Parameterization in Centerprise Data Integrator

Parameters play a very important role in reusability and configurability of dataflows. An extensive parameterization capability ensures that dataflows and workflows can be invoked in multiple situations, saving time and enhancing return on investment.

A common scenario would be if you wanted to use an existing dataflow for a file that has the same structure but data from a different source. This would be the perfect opportunity to use parameters.

In this example, we will change the source file to a different file and change the parameters to specify an effective date for our data quality rules.

We begin by dragging and dropping the parameter onto the dataflow, then open the parameter property dialog box.

Param_fig1

We specify a new parameter and call it “effective date.” Chose the data type and give it a default value of December 31.

Param_fig2

Once the specifications are set, the parameter is available for mapping.

Param_fig3

In this example the data quality rule was working on property tax and checking whether the property tax was zero or not.

Param_fig4

Now we want to add an effective date. We want to apply this parameter to our data quality rule to say that it won’t start until the effective date is matched and we want to specify this effective date from outside the dataflow. So we go ahead and do the mapping so the data quality rule has the effective date. Next, we go to the data quality rules dialog box and check “if effective date is greater than today, then always return true, otherwise, check for this rule.”

Param_fig5

That means that it is going to check this rule only when it becomes effective. You can specify any effective date from outside now and control its behavior, so this data quality rule is now dependent on a specific date.

We can then take this file and in the job scheduler schedule a new job and point to the newly created dataflow with parameters. When we go to the job parameters tab we can see all the implicit and explicit parameters.

Param_fig7

If we select our user-defined parameter, we can see the specified default value of December 31.

Param_fig8

Say we decide we don’t want this rule to be effective until March 31. We can select that date from the calendar on the right side.

Param_fig10

This tells the application not to use the data quality rule before March 31. That is how the behavior of the dataflow can be controlled from outside the dataflow.

Implicitly, the software has scanned and has figured out that the source has two file paths: loans and tax.

Param_fig11

I can point to a different file and change to a different file path.

Param_fig12

The same thing can be done on the destination side, enabling you to use the same flow for a totally different set of data.

You can see parameterization and other useful getting started videos on Astera TV at http://www.astera.com/astera-tv/getting-started.

ReportMiner Has Been Named a 2015 Trend-Setting Product by KMWorld Magazine

KM World Trend Setting Product 2015We are excited to share with our blog readers that our industry-leading ReportMiner data extraction software has been named a 2015 Trend-Setting Product by KMWorld Magazine!

KMWorld Editor-in-Chief Hugh McKellar commented that, “In each and every case, the thoughtfulness and elegance of the software certainly warrants deep examination. Depending on customer needs, the products on the list can dramatically boost organizational performance. The products identified fulfill the ultimate goal of knowledge management—delivering the right information to the right people at the right time.”

The panel, which consists of editorial colleagues, market and technology analysts, KM theoreticians, practitioners, customers and a select few savvy users (in a variety of disciplines), reviewed more than 200 vendors, whose combined product lineups include more than 1,000 separate offerings.

ReportMiner’s user-friendly interface enables business users with little or no technical background to easily accomplish a wide range of data extraction tasks without employing expensive IT resources. Smart features such as automated name and address parsing and auto creation of data extraction patternsautomate many time-consuming manual tasks, saving time and increasing data quality. You can find out more at astera.com/reportminer-data-extraction-software.

We Have a Winner!

contest winnerWe just finished our second campaign to post customer reviews on a software reviews portal and we are excited to announce that Phil Nacamuli of Leximantix has won the iPad drawing. We had a great response to this campaign and want to thank all of our customers who took the time to post a review of Centerprise or ReportMiner.

You can access all of our reviews on our customer testimonial page. Here are some that stood out for us in this latest campaign:

Prasad Sunkara of Vish Group

“Centerprise at its best!”

Centerprise at its best! Easy for business users. We were able to train new employees and get them to speed within a week. Using Centerprise as an ETL tool is working out great. Especially working with uncommon data sources! The tool is easy to use, very affordable to own and returns are very high, including increased speed of delivery of data, business user-driven data delivery and rapid prototyping.

Dawn Bauer of Farmers Mutual Hail Insurance

“Fast and Simple. All-in-one package”

Centerprise is perfect for dumping data from an ODS system into a data warehouse for reporting. It is fast, easy and simple to use. You can have a dataflow up and running in mere minutes compared to some other tools on the market.

Don Smith of Software Solutions

“WOW Why did we wait soooo long?”

Centerprise allows us to create a reusable template to standardize the information that we use to validate conversions. Standardizing the reporting data from two systems is an awesome strength that can be developed out of Centerprise. We have seen that the use of this package saved us 2 hours per conversion but now since the templates are already developed it is saving us more. This reduces our lead time for conversions, and increases our efficiencies as a conversion team.

Mario Ferrer of Achievers

 “Astera Centerprise rules!”

I love Centerprise and I believe it has a lot of potential. It’s incredibly easy to use. People with no previous ETL experience can start building simple mappings very quickly. It’s incredibly easy to install and maintain. After I first installed Centerprise, I was able to start working in just a few minutes. I am particularly impressed with how much Centerprise simplifies transformations that require more work in other ETL tools. The perfect example is the SCD transformation, which handles all the logic in an Slowly Changing Dimension. Even with a market leader like Informatica, the SCD logic has to be manually built. With Centerprise this can be built in just a few minutes, and it impacts every mapping.