Creating a Complex Dataflow in Centerprise – Part 1

Part 1 – Join Transformations and Functions

Our last post (Creating an Integration Flow in Centerprise) described how to create a simple dataflow in Centerprise. In this two-part blog we will show you how to build a more complex dataflow including maps, transformations, data quality rules, and data profiling.

The figure below shows a more complex dataflow.


In this example we are working with two source files: one contains information about home loans and the other contains the property tax for those loans. We need to combine these two pieces of data and run some calculations on attributes to convert them. In the end we want to route the data to two different destination tables, depending on the origin of the home loan: if it is from California it goes to the California Loans table; otherwise it goes to the Out-of-State Loans table. Alongside this, we need to check the data quality for the loan data and again for the tax data. We also need to profile the tax data so that it can be sent to an Excel file and output as a report.
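
For readers who think in code, the routing decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how Centerprise implements its router; the `State` field name and the two-letter code "CA" are assumptions made for the example.

```python
def route(loan: dict) -> str:
    """Pick a destination table name for one loan record."""
    # Assumption: the State field holds a two-letter code such as "CA".
    if loan["State"] == "CA":
        return "California Loans"
    return "Out-of-State Loans"


# Hypothetical records, made up for illustration.
sample_loans = [
    {"LoanID": 1, "State": "CA"},
    {"LoanID": 2, "State": "NV"},
]
destinations = [route(loan) for loan in sample_loans]
```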

To design the dataflow shown above, we begin by clicking on the New Dataflow button. First we look at the data, both loan data and tax data. In the previous blog, Creating an Integration Flow in Centerprise, we learned how to create a source by dragging and dropping it from the toolbox onto the designer and specifying its properties. However, there is also a shortcut to create sources directly: simply drag and drop the Loans and Tax Excel files from the Explorer window onto the designer.


Centerprise does the rest. It has created the source, knows where the file comes from, and has done the layout. When you click on the chevron you can see all the data columns from the source file.


Click on preview and you can see all your data in the preview window.


Now do the same thing with the Tax file. When you preview your tax data, you can see the property tax information for each of the loans.


Next we want to combine the two sources. To do this we use the Join transformation. Drag and drop the Join transformation onto the designer.


When you click on the chevron, you can see that the transformation doesn’t have any elements.


We want to take all the elements from both the Loans and the Tax sources and combine them in the Join transformation. Drag and drop the Loans top node into the Join window. You can see that Centerprise has automatically created and mapped all the fields.


To add the two Tax fields to the join, drag and drop each field to the Join window and Centerprise automatically adds the fields and maps them.


Note that since there are now two LoanID fields, Centerprise has appended _1 to the one from the Tax source, naming it LoanID_1.

Now we have all the fields required for the join. If we right click on the Join window and select Properties, we can see all the fields from both Loans and Taxes.


Click on the blue arrow at the top left of the window to go to the next page, where we will specify what kind of join we want. Choose a simple inner join, then in the Sort Left and Sort Right inputs specify the key that will be used for the join. For Loans it is the LoanID and for Taxes it is the LoanID_1.


Click OK and our join is ready. When we preview the data we can see that for each of the loans the property tax and loan information is joined.


So with a few clicks we have joined our two sources.
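
If it helps to see the idea outside the designer, here is a minimal Python sketch of what an inner join on the loan key does. It is not Centerprise's implementation; the sample rows, the `PropertyTax` field name, and the `inner_join` helper are hypothetical. The sketch also mimics the renaming we saw earlier, where a clashing field from the second source gets a `_1` suffix.

```python
# Hypothetical sample rows; the values are made up for illustration.
loans = [
    {"LoanID": 1, "Name": "Jane Doe", "State": "CA", "ZipCode": 90210},
    {"LoanID": 2, "Name": "John Smith", "State": "NV", "ZipCode": 89101},
    {"LoanID": 3, "Name": "Ann Lee", "State": "CA", "ZipCode": 94105},
]
taxes = [
    {"LoanID": 1, "PropertyTax": 4200.0},
    {"LoanID": 2, "PropertyTax": 1850.0},
]


def inner_join(left, right, left_key, right_key):
    """Return one merged row for every matching (left, right) pair."""
    # Index the right side by its key so each left row is matched quickly.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    joined = []
    for l_row in left:
        # Left rows without a match are dropped: that is what makes it "inner".
        for r_row in index.get(l_row[left_key], []):
            merged = dict(l_row)
            for field, value in r_row.items():
                # A field name that already exists gets a "_1" suffix.
                name = f"{field}_1" if field in merged else field
                merged[name] = value
            joined.append(merged)
    return joined


joined_rows = inner_join(loans, taxes, "LoanID", "LoanID")
```

Loan 3 has no tax record in this made-up data, so it does not appear in `joined_rows`.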

The next step is to use our join as a source for our transformation and maps.  Drag and drop the Expression Map from the toolbox onto the designer.


This is used to do calculations and any kind of combining of data. In this example we see that the Loans information has the Borrower Name, State, and Zip Code. We want to combine these three fields into one field and call it “Address” in our destination. Since we are going to be routing to two different destinations, our natural next step is to add a router.

Drag and drop a router from the toolbox onto the designer.  The router becomes the next destination.


Next, drag and drop the three fields we want to combine (Borrower Name, State, and Zip Code) from our Join window to the expression window.


Then open the expression properties window, click on the blue arrow next button and we are presented with the rules writer, which allows us to write any kind of rule. You can see the functions drop down menu has a large selection of functions that can be used for writing rules such as logical, conversion, date/time, name and address parsing, math, etc.


In this example we have a very simple concatenation so we will write the rule starting with Name, then a comma, then State, then a space, then the Zip Code, which is an integer. Since we are doing a concatenation of the strings we will use a conversion function to convert the Zip Code from an integer to string.


Click on OK and our value is ready for output. We take this value and drag and drop it to our destination. You can see the value is now in the destination.


At this point we can do a preview and see how our data is really going to work. You can see that the Name, State, and Zip Code have been combined the way we wanted: Name, comma, State, space, Zip Code. This is how you can write simple rules and simple calculations for data conversion.
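
The same rule can be written as a short Python sketch. This is not the syntax of the Centerprise rules writer; the function and parameter names are hypothetical, and `str()` stands in for the conversion function that turns the integer Zip Code into a string.

```python
def build_address(name: str, state: str, zip_code: int) -> str:
    """Combine Name, State, and Zip Code as: Name, comma, State, space, Zip Code."""
    # The Zip Code is an integer, so it must be converted before concatenation.
    return name + ", " + state + " " + str(zip_code)


address = build_address("Jane Doe", "CA", 90210)
```

One design note: a Zip Code stored as an integer loses any leading zero, so in Python `str(zip_code).zfill(5)` would be a safer conversion for five-digit codes.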


Next we want to create a function. We start by dragging and dropping a function from the toolbox onto the designer.


We have the Name field in our join, but our destination uses FirstName and LastName fields, so we need to take the Name field and split it into FirstName and LastName. For that we will use the Name Parsing function. Click on the function properties and choose Name and Address Parsing from the drop-down menu. Then select the Parse Name function and click OK.



When you expand the function, you can see that a list of possible name related field options is available.


Drag and drop the Name field from the Join window to the left side of the function to create the input. The output options then appear on the right side. Drag and drop the FirstName and LastName fields from the function window to the destination.


When you preview, you can see that Centerprise has taken the names from the transformation and split them into first name and last name.
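
To make the idea concrete, here is a deliberately naive Python sketch of splitting a name into first and last name. The built-in Parse Name function handles far more cases than this; the `parse_name` helper below is hypothetical and simply treats the first word as the first name and the last word as the last name.

```python
def parse_name(full_name: str) -> dict:
    """Naively split a full name into FirstName and LastName."""
    parts = full_name.split()
    if not parts:
        return {"FirstName": "", "LastName": ""}
    if len(parts) == 1:
        return {"FirstName": parts[0], "LastName": ""}
    # Anything between the first and last word (middle names, initials) is ignored.
    return {"FirstName": parts[0], "LastName": parts[-1]}


parsed = parse_name("Jane Q. Doe")
```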


This is how you can use functions and expressions. Part 2 of this blog, coming next week, will explain how to route the data we have transformed to multiple destinations.

Creating an Integration Flow in Centerprise

Integration flows are the foundation of any data integration project. Centerprise Data Integrator has built-in automation features that make this oftentimes complex process so easy that non-technical business users can create flows with minimal or no IT support.

In this example we will create a simple integration flow, called a dataflow, using an Excel source and putting the data in a database table. This is a common task used often for moving data from documents to databases so that it can be used downstream for operations and business intelligence.

To create a new dataflow, go to the file menu and select new/dataflow.


In the toolbox on the left side, you can see items such as sources, destinations, maps, transformations, and more.

To create a source, point to your source and drag and drop it onto the designer. In this example, since the source is an Excel workbook, we will drag and drop the Excel workbook source item onto the designer.


Next we need to specify the properties of the source. Right click on the source to open the properties window, which presents a wizard where we can specify all the properties for the data source.


In this window we specify the file path for the source by clicking on the File Path button and pointing to the source file in the Explorer window.


Move to the next page by clicking on the right arrow button in the top left corner of the source window.


Centerprise opens a window that shows the layout of all the source fields from the Excel file. The application automatically identifies all the fields from the source and their corresponding data types.


Click OK and you can see your source in the designer. Click on the chevron in the upper right corner and the window expands to show all the fields from your source.


Now that the source is ready, you can preview your data by right clicking and selecting Preview Data. Centerprise has read the data from the Excel source and at the bottom of the window you can see how it looks inside the source.


Next we want to create a destination. In this example the destination is a database table, so from the toolbox we drag and drop the table destination onto the designer.


Again, right click and select Properties, and in the Options dialog box specify your credentials and the location of your database table.


Here you choose which type of database you are working with depending on your destination and input your credentials for that database type. In this example, we select SQL Server and input our credentials (or you can choose a recently used connection), then click the test connection button to ensure that the connection works.


Now we move to the next page by clicking on the right arrow button in the top left corner. This opens a window that asks for information about the table into which we are going to write. We can choose an existing table or create a new table. In this case we will create a new table and leave the default options.


Again, click the right arrow to go to the next page, which shows us the layout of the destination.


Now our source as well as our destination is ready and we will map the two together. For the mapping, we will use the auto mapping feature of Centerprise. To do this, we drag and drop the entire source node at the top of the input to the output.


You can see that Centerprise has automatically created all the maps and that for each field in the source there is a line that goes to the matching field in the destination. This very simple map from source to destination will take the data as it is in the source and put it in the destination.
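
Conceptually, auto mapping pairs each source field with the destination field that matches it. The Python sketch below shows one way to think about that pairing. It is an illustration only: matching by field name, and ignoring case while doing so, are assumptions for this example rather than a description of how Centerprise decides on a match, and the field names are made up.

```python
def auto_map(source_fields, destination_fields):
    """Pair each source field with the destination field of the same name."""
    # Assumption: names are compared case-insensitively.
    dest_by_name = {name.lower(): name for name in destination_fields}
    return {
        src: dest_by_name[src.lower()]
        for src in source_fields
        if src.lower() in dest_by_name
    }


# Hypothetical field lists.
mapping = auto_map(
    ["CustomerID", "Name", "City"],
    ["customerid", "Name", "Region"],
)
```

In this made-up example, `City` has no matching destination field, so it is left unmapped.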


We have just created a simple dataflow by mapping our fields from our source Excel file to our destination database. Now we will give it a name and save it on our system so we can go ahead and run the dataflow.

For that, we use a very simple method. On the top left of the screen we click the drop down list next to Servers, which will show all the servers installed on the machine. In this case we choose the server Development.


We click on the green arrow to the right to start the dataflow.


At the bottom of the page the Job Progress window will show you the progress.


Click on the database button and you can see the results of your dataflow. This example shows that there were 83 records and they were all processed to the database destination with no errors.


That’s how easy it is to create a simple dataflow in Centerprise. The capabilities of the software extend far beyond simple processes to encompass the most complex of structured and unstructured data sources. Our next blog will show you how Centerprise can be used to create more complex dataflows.

Parameterization in Centerprise Data Integrator

Parameters play a very important role in reusability and configurability of dataflows. An extensive parameterization capability ensures that dataflows and workflows can be invoked in multiple situations, saving time and enhancing return on investment.

A common scenario would be if you wanted to use an existing dataflow for a file that has the same structure but data from a different source. This would be the perfect opportunity to use parameters.

In this example, we will change the source file to a different file and change the parameters to specify an effective date for our data quality rules.

We begin by dragging and dropping the parameter onto the dataflow, then open the parameter property dialog box.


We specify a new parameter and call it “effective date.” Choose the data type and give it a default value of December 31.


Once the specifications are set, the parameter is available for mapping.


In this example the data quality rule was working on property tax and checking whether the property tax was zero or not.


Now we want to add an effective date. We want to apply this parameter to our data quality rule to say that it won’t start until the effective date is matched and we want to specify this effective date from outside the dataflow. So we go ahead and do the mapping so the data quality rule has the effective date. Next, we go to the data quality rules dialog box and check “if effective date is greater than today, then always return true, otherwise, check for this rule.”


That means that it is going to check this rule only when it becomes effective. You can specify any effective date from outside now and control its behavior, so this data quality rule is now dependent on a specific date.
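
The logic of this rule can be sketched in Python as follows. This is not the expression syntax used in the data quality rules dialog box; the function name is hypothetical, the years in the usage lines are made up, and we assume the original rule passes a record when its property tax is not zero.

```python
from datetime import date


def property_tax_rule(property_tax: float, effective_date: date, today: date) -> bool:
    """Return True when the record passes the data quality rule."""
    # Before the effective date the rule is not enforced, so every record passes.
    if effective_date > today:
        return True
    # Assumption: once effective, a record passes only if its property tax is not zero.
    return property_tax != 0


# Hypothetical dates: the rule is checked on September 1, 2015.
not_yet_effective = property_tax_rule(0, date(2015, 12, 31), date(2015, 9, 1))
already_effective = property_tax_rule(0, date(2015, 3, 31), date(2015, 9, 1))
```

Passing the effective date in as an argument mirrors the point of the parameter: the same rule behaves differently depending on a value supplied from outside.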

We can then take this file and, in the job scheduler, schedule a new job that points to the newly created dataflow with parameters. When we go to the job parameters tab we can see all the implicit and explicit parameters.


If we select our user-defined parameter, we can see the specified default value of December 31.


Say we decide we don’t want this rule to be effective until March 31. We can select that date from the calendar on the right side.


This tells the application not to use the data quality rule before March 31. That is how the behavior of the dataflow can be controlled from outside the dataflow.

The software has also scanned the dataflow and implicitly figured out that the sources have two file paths: loans and tax.


We can point to a different file by changing the file path.


The same thing can be done on the destination side, enabling you to use the same flow for a totally different set of data.

You can see parameterization and other useful getting started videos on Astera TV at

ReportMiner Has Been Named a 2015 Trend-Setting Product by KMWorld Magazine

We are excited to share with our blog readers that our industry-leading ReportMiner data extraction software has been named a 2015 Trend-Setting Product by KMWorld Magazine!

KMWorld Editor-in-Chief Hugh McKellar commented that, “In each and every case, the thoughtfulness and elegance of the software certainly warrants deep examination. Depending on customer needs, the products on the list can dramatically boost organizational performance. The products identified fulfill the ultimate goal of knowledge management—delivering the right information to the right people at the right time.”

The panel, which consists of editorial colleagues, market and technology analysts, KM theoreticians, practitioners, customers and a select few savvy users (in a variety of disciplines), reviewed more than 200 vendors, whose combined product lineups include more than 1,000 separate offerings.

ReportMiner’s user-friendly interface enables business users with little or no technical background to easily accomplish a wide range of data extraction tasks without employing expensive IT resources. Smart features such as automated name and address parsing and auto creation of data extraction patterns automate many time-consuming manual tasks, saving time and increasing data quality. You can find out more at

We Have a Winner!

We just finished our second campaign to post customer reviews on a software reviews portal and we are excited to announce that Phil Nacamuli of Leximantix has won the iPad drawing. We had a great response to this campaign and want to thank all of our customers who took the time to post a review of Centerprise or ReportMiner.

You can access all of our reviews on our customer testimonial page. Here are some that stood out for us in this latest campaign:

Prasad Sunkara of Vish Group

“Centerprise at its best!”

Centerprise at its best! Easy for business users. We were able to train new employees and get them to speed within a week. Using Centerprise as an ETL tool is working out great. Especially working with uncommon data sources! The tool is easy to use, very affordable to own and returns are very high, including increased speed of delivery of data, business user-driven data delivery and rapid prototyping.

Dawn Bauer of Farmers Mutual Hail Insurance

“Fast and Simple. All-in-one package”

Centerprise is perfect for dumping data from an ODS system into a data warehouse for reporting. It is fast, easy and simple to use. You can have a dataflow up and running in mere minutes compared to some other tools on the market.

Don Smith of Software Solutions

“WOW Why did we wait soooo long?”

Centerprise allows us to create a reusable template to standardize the information that we use to validate conversions. Standardizing the reporting data from two systems is an awesome strength that can be developed out of Centerprise. We have seen that the use of this package saved us 2 hours per conversion but now since the templates are already developed it is saving us more. This reduces our lead time for conversions, and increases our efficiencies as a conversion team.

Mario Ferrer of Achievers

“Astera Centerprise rules!”

I love Centerprise and I believe it has a lot of potential. It’s incredibly easy to use. People with no previous ETL experience can start building simple mappings very quickly. It’s incredibly easy to install and maintain. After I first installed Centerprise, I was able to start working in just a few minutes. I am particularly impressed with how much Centerprise simplifies transformations that require more work in other ETL tools. The perfect example is the SCD transformation, which handles all the logic in a Slowly Changing Dimension. Even with a market leader like Informatica, the SCD logic has to be manually built. With Centerprise this can be built in just a few minutes, and it impacts every mapping.

Centerprise Best Practices: Working With the High Volume Data Warehouse

Data warehouses and data marts provide the business intelligence needed for timely and accurate business decisions. But data warehousing comes with a unique set of challenges revolving around huge volumes of data and maintenance of data keys.

Centerprise is the ideal solution for transferring and transforming large amounts of records from a transactional database to a data warehouse. It provides all the functionality needed for today’s demanding data storage and analysis requirements, including sophisticated ETL features that ensure data quality, superior performance, usability, scalability, change data capture for fast throughput, and wide connectivity to popular databases and file formats.

A new whitepaper from Astera provides best practices to be kept in mind during the entirety of the development process in order to make certain data warehousing projects will be successful. Topics include data quality, data profiling, validation, logging, translating into star schema, options for related tables, and performance considerations. Download your free copy here!

Rule-Based Filtering for Export in ReportMiner

Often when exporting data from an extraction process, only certain information is needed. It can be a time-consuming and complex process to export all the extracted data and then delete the unwanted data from the destination.

ReportMiner solves this problem quickly and easily with its rule-based filtering for export feature. All you need to do is create your export setting, type your rule-based filter in the expression window as shown in the figure below, and then verify the rule by clicking on the compile button. In this case, the user only wanted to export data for sofas, so the expression is ITEM = SOFA.


ReportMiner will export only the records that meet the criteria of your expression. In this case, two records that pertain to sofas were exported.
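
As a rough illustration of what such a filter does, here is a small Python sketch that keeps only the records matching ITEM = SOFA. It does not use ReportMiner's expression syntax, and the sample records and the `QTY` field are made up for the example.

```python
# Hypothetical extracted records.
records = [
    {"ITEM": "SOFA", "QTY": 2},
    {"ITEM": "TABLE", "QTY": 1},
    {"ITEM": "SOFA", "QTY": 1},
    {"ITEM": "LAMP", "QTY": 4},
]


def export_filter(record: dict) -> bool:
    """Python equivalent of the rule ITEM = SOFA."""
    return record["ITEM"] == "SOFA"


# Only records that satisfy the rule reach the export destination.
exported = [record for record in records if export_filter(record)]
```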


To learn more about this feature, view our Rule-Based Filtering From Export Settings video, part of our ReportMiner Tutorial Series at

Saving Time and Ensuring Data Quality with ReportMiner Automatic Name and Address Parsing

Many times people have a single address field from a data source that has all the address information in the one field. They need to parse out the individual sections of the address into separate fields so it can be loaded into a database and/or combined with information from different sources. Often there are thousands of records that need to be parsed and to do this manually is a time-consuming and error-prone task, putting your data quality and reliability at risk.

Astera’s ReportMiner data extraction software automatically parses name and address data with a few simple clicks, ensuring your data quality and saving you resource time and money.

ReportMiner breaks up name and address data into separate components such as Name: prefix, first, middle, last, suffix and Address: street, suite, city, state, zip, country.

Once your Data Region has been created, you simply highlight the name area, right click and select “Add Name Field.” You do the same for addresses: Highlight the address area, right click and select “Add Address Field (US).” ReportMiner will automatically create your name and address fields by breaking them up into individual fields.
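
To show what breaking an address into individual fields means, here is a simplified Python sketch. It is not ReportMiner's parser, which copes with many more layouts; the pattern below assumes one specific, hypothetical format, "street, city, ST 12345", and returns nothing for anything else.

```python
import re

# Assumed layout: "<street>, <city>, <two-letter state> <5-digit or ZIP+4 code>".
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*"
    r"(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def parse_us_address(text: str):
    """Split a single-line US address into fields, or return None if it does not fit."""
    match = ADDRESS_PATTERN.match(text)
    return match.groupdict() if match else None


fields = parse_us_address("123 Main St, Springfield, IL 62704")
```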


For more information on creating data regions and fields in ReportMiner, check out our blog Smart Data Extraction with ReportMiner: Automating Creation of Extraction Models.

Extract Valuable Data from PDFs With ReportMiner

PDF (portable document format) files were developed in the early 1990s to enable computer users with different platforms and software tools to share documents with a fixed layout of text and graphics. Because they are independent of application software, hardware, and operating systems, PDFs have become a popular way to share documents. All that is needed is a PDF reader, available for free download on the Internet.

In this day and age, however, data lives on, even if it’s trapped inside a PDF. Businesses need PDF data to combine with other data and use in spreadsheets or databases, and integrate it with other applications or use it for business intelligence.

Astera’s ReportMiner data extraction software offers many capabilities for PDF data extraction in an easy-to-use interface that doesn’t require code writing. The tool enables users to easily extract data by simply creating an extraction layout and exporting to the destination of their choice. ReportMiner does all the heavy lifting by automatically recognizing data patterns and creating necessary data regions and fields.

In addition, users are able to use their extracted data to take advantage of the product’s advanced transformation, quality, and scrubbing features.

To extract information from a PDF file in ReportMiner, simply upload a PDF and create a report model by selecting what needs to be extracted and specifying a pattern within the report.

ReportMiner also has a preview feature so that users can make sure everything is being extracted as intended. Once the layout is complete, users have the option to export to Excel, CSV, or a chosen database. The report model can also be opened in a dataflow to apply transformations to the data.

For more information on specifying regions and fields and exporting data, check out these blogs:

Smart Data Extraction with ReportMiner: Automating Creation of Extraction Models

Exporting Data in ReportMiner