Creating a Complex Dataflow in Centerprise – Part 2

Part 2 – Routing Data to Multiple Destinations

In Part 1 we learned how to use the Join transformation and different types of maps, such as the Expression map and the Function map. Now we will use the Router transformation to send data to two different tables depending on the routing information.

We want to decide the destination of each loan based on its origin state. Take the State field from the transformation window and drag and drop it onto the Router, which maps it to the State field in the Router.

Open the Router properties and go to the next page, where you can add the rules that decide the different routes. For example, for California loans we can write a simple rule such as “State equals CA.”

Click on the new rule icon at the top left of the rule window and write the next rule, “State does not equal CA.”
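
Conceptually, the two rules act as complementary filters over the incoming records. The sketch below illustrates the idea in pandas; the DataFrame and its values are made up for illustration, and this is not Centerprise rule syntax.

```python
import pandas as pd

# Hypothetical joined loan records; values are illustrative only.
loans = pd.DataFrame({
    "LoanID": [101, 102, 103],
    "State": ["CA", "TX", "CA"],
})

# Rule 1: "State equals CA" -> records sent down the CA route
ca_route = loans[loans["State"] == "CA"]

# Rule 2: "State does not equal CA" -> records sent down the Out_Of_State route
out_of_state_route = loans[loans["State"] != "CA"]
```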

The result in the Router window is that there are two different outgoing nodes available for mapping, CA and Out_Of_State. This enables us to put each set of data in a different table.

Now we can go ahead and create the destination tables for the routed loans: one for California loans and one for Out_Of_State loans. You can drag and drop the database table destination from the toolbox onto the designer; however, there is also a shortcut for creating a database table source or destination using the database browser. Simply select the database browser underneath the toolbox and point to one of the existing connections.

The browser will then show you all the existing databases, including tables and views.

Select the Tables folder, which shows all the tables in the database, including the California Loans table.

Select the California Loans table, hold down Shift, and drag and drop it onto the designer to create the destination.

Follow the same steps to create the Out_Of_State Loans destination.

If we expand the CA_Loans destination, we can see that all the fields are there and we can map from the Router to each destination. Drag the CA node from the Router to the CA_Loans destination, and do the same for Out_Of_State_Loans.

With these few clicks we have created the scenario we started with in the very first image: the Loans and Taxes sources, the join, the Address calculation and the Name Parse function, and, finally, the routing that sends the loans to two different destination tables.

However, as you can see if you compare this to the first image at the beginning of the blog, there are a couple of lookups along with the address calculation and name parsing, as well as a data quality rule to check the data coming from the tax source.

If we open the preview window for the tax data, we can see that for some loans the tax is showing zero.

We need to check whether the source data is correct, so we add a data quality rule, which can be found under transformations in the toolbox. Drag and drop it onto the designer and do the mapping as with any other transformation. Then open the properties window and write a rule specifying that the property tax cannot be zero.
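
The rule itself is just a condition evaluated for every incoming record. As a rough illustration of the logic (not Centerprise rule syntax; only the field names come from the example):

```python
import pandas as pd

# Hypothetical tax records; a PropertyTax of 0 indicates a bad source value.
tax = pd.DataFrame({"LoanID": [101, 102, 103], "PropertyTax": [1200.0, 0.0, 950.0]})

# Data quality rule: property tax cannot be zero. Flag violations rather than dropping them.
tax["RuleViolated"] = tax["PropertyTax"] == 0
print(tax[tax["RuleViolated"]])  # the records the rule would flag as errors
```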

Now preview the loan/tax join. Since the data passes through the data quality rule, you can see that the rule has flagged all the errors.

We have also added a profile of the tax data. If you do a preview, you can see that summary information about all the values has been collected for the LoanID and PropertyTax fields.

Since the profile behaves like any other source, it can be mapped to an Excel spreadsheet destination. When the dataflow runs, the profile information is written to the Excel sheet, which becomes our report. So along with performing the data transfer, the dataflow also produces a report on the tax data.
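
Conceptually, the profile gathers summary statistics for each selected field and then flows to the Excel destination like any other data. A rough pandas equivalent of “profile the tax data and write the report” might look like the sketch below; the file names are placeholders.

```python
import pandas as pd

tax = pd.read_excel("Tax.xlsx")  # placeholder source file

# Collect summary statistics (count, min, max, etc.) for the profiled fields.
profile = tax[["LoanID", "PropertyTax"]].describe()

# The profile maps to an Excel destination, which becomes the report.
profile.to_excel("TaxProfileReport.xlsx")
```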

This quick overview has demonstrated how to create a complex dataflow in Centerprise. In Part 1 we learned how to combine data from two sources using a join, send the data through transformations and maps, and create a function. In Part 2 we learned how to route the data to two different destination tables. For more information on creating workflows and subflows, visit our Astera TV Workflows and Subflows playlist.

Creating a Complex Dataflow in Centerprise – Part 1

Part 1 – Join Transformations and Functions

Our last post (Creating an Integration Flow in Centerprise) described how to create a simple dataflow in Centerprise. In this two-part blog we will show you how to build a more complex dataflow including maps, transformations, data quality rules, and data profiling.

The figure below shows a more complex dataflow.

In this example we are working with two source files, one contains information about home loans and the other contains information about the property tax for the corresponding home loans. We need to combine these two pieces of data and do some conversions by running some calculations on attributes. In the end we want to route the data to two different destination tables, depending on the origin of the home loan: if it is from California it goes to the California Loans table, otherwise it goes to the Out-of-State Loans table. Alongside this, we need to check the data quality for the loan data and again for the tax data. We also need to profile the tax data so that it can be sent to an Excel file and output as a report.

In order to design the dataflow shown above, we begin by clicking on the New Dataflow button to create a new dataflow. First we look at the data, both loan data and tax data. In the previous blog, Creating an Integration Flow in Centerprise, we learned how to create a source simply by dragging and dropping from the toolbox onto the designer and specifying its properties. However, there is also a shortcut for creating sources directly: simply drag and drop the Loans and Tax Excel files from the Explorer window onto the designer.

Centerprise does the rest. It has created the source, knows where the file comes from, and has done the layout. When you click on the chevron you can see all the data columns from the source file.

Click on preview and you can see all your data in the preview window.

Now do the same thing with the Tax file. When you preview your tax data, you can see the property tax information for each of the loans.

Next we want to combine the two sources. To do this we use the Join transformation. Drag and drop the Join transformation onto the designer.

When you click on the chevron, you can see that the transformation doesn’t have any elements.

We want to take all the elements from both the Loans and the Tax sources and combine them in the Join transformation. Drag and drop the Loans top node into the Join window. You can see that Centerprise has automatically created and mapped all the fields.

To add the two Tax fields to the join, drag and drop each field to the Join window and Centerprise automatically adds the fields and maps them.

Note that since there are now two LoanID fields, Centerprise has renamed the one from the Tax source to LoanID_1.

Now we have all the fields required for the join. If we right click on the Join window and select Properties, we can see all the fields from both Loans and Taxes.

Click on the blue arrow at the top left of the window to go to the next page, where we will specify what kind of join we want. Choose a simple inner join, then in the Sort Left and Sort Right inputs specify the key that will be used for the join. For Loans it is the LoanID and for Taxes it is the LoanID_1.
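
In other words, the configuration amounts to an inner join keyed on the loan ID from each source. A rough pandas equivalent, with placeholder file names:

```python
import pandas as pd

loans = pd.read_excel("Loans.xlsx")  # placeholder file names for the two sources
tax = pd.read_excel("Tax.xlsx")

# The Tax key was renamed to LoanID_1 to avoid the duplicate field name.
tax = tax.rename(columns={"LoanID": "LoanID_1"})

# Simple inner join: Loans.LoanID matched against Tax.LoanID_1.
joined = loans.merge(tax, left_on="LoanID", right_on="LoanID_1", how="inner")
```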

Click OK and our join is ready. When we preview the data we can see that for each of the loans the property tax and loan information is joined.

So with a few clicks we have joined our two sources.

The next step is to use our join as a source for our transformation and maps.  Drag and drop the Expression Map from the toolbox onto the designer.

The Expression map is used for calculations and for combining data in any way. In this example, the Loans information includes the Borrower Name, State, and Zip Code. We want to combine these three fields into a single field called “Address” in our destination. Since we are going to be routing to two different destinations, our natural next step is to add a router.

Drag and drop a router from the toolbox onto the designer.  The router becomes the next destination.

Next, drag and drop the three fields we want to combine (Borrower Name, State, and Zip Code) from our Join window to the expression window.

Then open the expression properties window and click on the blue arrow (next) button to bring up the rules writer, which allows us to write any kind of rule. The functions drop-down menu offers a large selection of functions that can be used for writing rules, such as logical, conversion, date/time, name and address parsing, math, and more.

In this example we have a very simple concatenation, so we will write the rule starting with Name, then a comma, then State, then a space, and then the Zip Code, which is an integer. Since we are concatenating strings, we will use a conversion function to convert the Zip Code from an integer to a string.
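
The resulting expression just concatenates the three values with the required punctuation, converting the integer Zip Code to a string first. A plain Python illustration of the same logic (the actual Centerprise function names may differ):

```python
def build_address(name: str, state: str, zip_code: int) -> str:
    """Concatenate Name, comma, State, space, and the Zip Code converted to a string."""
    return name + ", " + state + " " + str(zip_code)

print(build_address("Jane Smith", "CA", 94105))  # Jane Smith, CA 94105
```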

Click on OK and our value is ready for output. We take this value and drag and drop it to our destination. You can see the value is now in the destination.

At this point we can do a preview and see how our data is really going to work. You can see that the Name, State, and Zip Code have been combined the way we wanted: Name, comma, State, space, Zip Code. This is how you can write simple rules and simple calculations for data conversion.

Next we want to create a function. We start by dragging and dropping a function from the toolbox onto the designer.

We have the Name field in our join, but our destination uses FirstName and LastName fields, so we need to take the Name field and split it into FirstName and LastName. For that we will use the Name Parsing function. Click on the function properties and choose Name and Address Parsing from the drop-down menu. Then select the Parse Name function and click OK.
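
The idea behind the Parse Name function is to split one full-name value into its component fields. A simplified sketch of that idea (a real parser also handles prefixes, suffixes, and middle names, which this ignores):

```python
def parse_name(full_name: str) -> tuple:
    """Naively split a full name into (FirstName, LastName)."""
    parts = full_name.strip().split()
    first_name = parts[0] if parts else ""
    last_name = parts[-1] if len(parts) > 1 else ""
    return first_name, last_name

print(parse_name("John Q. Public"))  # ('John', 'Public')
```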

When you expand the function, you can see the list of possible name-related field options.

Drag and drop the Name field from the Join window to the left side of the function to create the input; the output options then appear on the right side. Then drag and drop the FirstName and LastName fields from the function window to the destination.

When you preview, you can see that Centerprise has taken the names from the transformation and split them into first name and last name.

This is how you can use functions and expressions. Part 2 of this blog, coming next week, will explain how to route the data we have transformed to multiple destinations.

Creating an Integration Flow in Centerprise

Integration flows are the foundation of any data integration project. Centerprise Data Integrator has built-in automation features that make this oftentimes complex process so easy that non-technical business users can create flows with minimal or no IT support.

In this example we will create a simple integration flow, called a dataflow, using an Excel source and putting the data in a database table. This is a common task for moving data from documents to databases so that it can be used downstream for operations and business intelligence.

To create a new dataflow, go to the File menu and select New > Dataflow.

In the toolbox on the left side, you can see items such as sources, destinations, maps, transformations, and more.

To create a source, point to your source and drag and drop it onto the designer. In this example, since the source is an Excel workbook, we will drag and drop the Excel workbook source item onto the designer.

Next we need to specify the properties of the source. Right click on the source to open the properties window, which presents a wizard where we can specify all the properties for the data source.

In this window we specify the file path for the source by clicking on the File Path button and pointing to the source file in the Explorer window.

Move to the next page by clicking on the right arrow button in the top left corner of the source window.

Centerprise opens a window that shows the layout of all the source fields from the Excel file. The application automatically identifies all the fields from the source and their corresponding data types.

Click OK and you can see your source in the designer. Click on the chevron in the upper right corner and the window expands to show all the fields from your source.

Now that the source is ready, you can preview your data by right clicking and selecting Preview Data. Centerprise has read the data from the Excel source and at the bottom of the window you can see how it looks inside the source.

Next we want to create a destination. From the toolbox, we drag and drop the database table destination onto the designer.

Again, right click and select Properties; in the options dialog box you specify your credentials and the location of your database table.

Here you choose the type of database you are working with, depending on your destination, and input your credentials for that database type. In this example, we select SQL Server and input our credentials (or you can choose a recently used connection), then we click the Test Connection button to ensure that the connection works.
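
The test-connection step simply verifies that the credentials and server details work before any data is written. A rough stand-alone equivalent using SQLAlchemy; the connection string is a placeholder, not the one from the example:

```python
from sqlalchemy import create_engine, text

# Placeholder SQL Server connection string; substitute real credentials and server.
engine = create_engine(
    "mssql+pyodbc://user:password@server/database?driver=ODBC+Driver+17+for+SQL+Server"
)

# Equivalent of clicking Test Connection: open a connection and run a trivial query.
with engine.connect() as connection:
    connection.execute(text("SELECT 1"))
print("Connection succeeded")
```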

Now we move to the next page by clicking on the right arrow button in the top left corner. This opens a window that asks for information about the table into which we are going to write. We can choose an existing table or create a new table. In this case we will create a new table and leave the default options.

Again, click the right arrow to go to the next page, which shows us the layout of the destination.

Now that both our source and our destination are ready, we will map the two together. For the mapping, we will use the auto-mapping feature of Centerprise. To do this, we drag and drop the entire source node at the top of the input onto the output.

You can see that Centerprise has automatically created all the maps and that for each field in the source there is a line that goes to the matching field in the destination. This very simple map from source to destination will take the data as it is in the source and put it in the destination.
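
For orientation, the whole simple dataflow (read the Excel workbook, map every field one-to-one, and write the rows into a new database table) is conceptually similar to the sketch below. The file name, table name, and connection string are placeholders.

```python
import pandas as pd
from sqlalchemy import create_engine

source = pd.read_excel("SourceWorkbook.xlsx")  # placeholder Excel source

# Placeholder SQL Server connection; substitute real credentials and server.
engine = create_engine(
    "mssql+pyodbc://user:password@server/database?driver=ODBC+Driver+17+for+SQL+Server"
)

# One-to-one mapping: write the source fields as-is into a newly created table.
source.to_sql("DestinationTable", engine, if_exists="fail", index=False)
```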

We have just created a simple dataflow by mapping our fields from our source Excel file to our destination database. Now we will give it a name and save it on our system so we can go ahead and run the dataflow.

For that, we use a very simple method. At the top left of the screen we click the drop-down list next to Servers, which shows all the servers installed on the machine. In this case we choose the Development server.

We click on the green arrow to the right to start the dataflow.

At the bottom of the page the Job Progress window will show you the progress.

Click on the database button and you can see the results of your dataflow. This example shows that there were 83 records, all of which were written to the database destination with no errors.

That’s how easy it is to create a simple dataflow in Centerprise. The capabilities of the software extend far beyond simple processes to encompass the most complex of structured and unstructured data sources. Our next blog will show you how Centerprise can be used to create more complex dataflows.

Parameterization in Centerprise Data Integrator

Parameters play a very important role in reusability and configurability of dataflows. An extensive parameterization capability ensures that dataflows and workflows can be invoked in multiple situations, saving time and enhancing return on investment.

A common scenario would be if you wanted to use an existing dataflow for a file that has the same structure but data from a different source. This would be the perfect opportunity to use parameters.

In this example, we will change the source file to a different file and change the parameters to specify an effective date for our data quality rules.

We begin by dragging and dropping the parameter onto the dataflow, then open the parameter property dialog box.

We specify a new parameter and call it “effective date.” Choose the data type and give it a default value of December 31.

Once the specifications are set, the parameter is available for mapping.

In this example the data quality rule was working on property tax and checking whether the property tax was zero or not.

Now we want to add an effective date. We want to apply this parameter to our data quality rule so that the rule does not take effect until the effective date is reached, and we want to specify this effective date from outside the dataflow. So we go ahead and do the mapping so that the data quality rule receives the effective date. Next, we go to the data quality rules dialog box and set the check to “if effective date is greater than today, then always return true; otherwise, check this rule.”

That means the rule is checked only once it becomes effective. You can now specify any effective date from outside the dataflow and control the rule’s behavior, so this data quality rule is dependent on a specific date.
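
Put another way, the combined check behaves like the function below: before the effective date the rule always passes, and once the date is reached the property tax condition is enforced. This is just a sketch of the logic, not Centerprise syntax.

```python
from datetime import date

def tax_rule_passes(property_tax: float, effective_date: date, today: date) -> bool:
    """Return True if the record passes the data quality rule on the given day."""
    if effective_date > today:
        return True               # rule not yet effective: always pass
    return property_tax != 0      # rule in force: property tax cannot be zero

print(tax_rule_passes(0.0, date(2016, 3, 31), date(2016, 1, 15)))  # True, rule not yet in force
print(tax_rule_passes(0.0, date(2016, 3, 31), date(2016, 4, 1)))   # False, rule now enforced
```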

We can then take this file, schedule a new job in the job scheduler, and point it to the newly created dataflow with parameters. When we go to the Job Parameters tab, we can see all the implicit and explicit parameters.

If we select our user-defined parameter, we can see the specified default value of December 31.

Say we decide we don’t want this rule to be effective until March 31. We can select that date from the calendar on the right side.

This tells the application not to use the data quality rule before March 31. That is how the behavior of the dataflow can be controlled from outside the dataflow.

Implicitly, the software has also scanned the dataflow and figured out that the sources have two file paths: Loans and Tax.

We can point to a different file by changing the file path.

The same thing can be done on the destination side, enabling you to use the same flow for a totally different set of data.

You can see parameterization and other useful getting started videos on Astera TV at http://www.astera.com/astera-tv/getting-started.

Centerprise Best Practices: Working With the High Volume Data Warehouse

Data warehouses and data marts provide the business intelligence needed for timely and accurate business decisions. But data warehousing comes with a unique set of challenges revolving around huge volumes of data and maintenance of data keys.

Centerprise is the ideal solution for transferring and transforming large amounts of records from a transactional database to a data warehouse. It provides all the functionality needed for today’s demanding data storage and analysis requirements, including sophisticated ETL features that ensure data quality, superior performance, usability, scalability, change data capture for fast throughput, and wide connectivity to popular databases and file formats.

A new whitepaper from Astera provides best practices to keep in mind throughout the development process to help ensure that data warehousing projects succeed. Topics include data quality, data profiling, validation, logging, translating into a star schema, options for related tables, and performance considerations. Download your free copy here!

Rule-Based Filtering for Export in ReportMiner

Often when exporting data from an extraction process, only certain information is needed. It can be a time-consuming and complex process to export all the extracted data and then delete the unwanted data from the destination.

ReportMiner solves this problem in a quick and easy way with its rule-based filtering for export feature. All you need to do is create your export setting, type your rule-based filter into the expression window as shown in the figure below, and then verify the rule by clicking on the compile button. In this case, the user only wanted to export data for sofas, so the expression is ITEM = SOFA.

ReportMiner will export only the records that meet the criteria of your expression. In this case, two records that pertain to sofas were exported.
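
Conceptually, the filter is evaluated per record at export time, so only rows that satisfy the expression are written to the destination. A small illustration of the same idea (the data and file name are made up):

```python
import pandas as pd

extracted = pd.DataFrame({"ITEM": ["SOFA", "CHAIR", "SOFA"], "QTY": [2, 5, 1]})

# Apply the export filter ITEM = SOFA: only matching records are exported.
to_export = extracted[extracted["ITEM"] == "SOFA"]
to_export.to_excel("SofaExport.xlsx", index=False)  # placeholder destination file
```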

To learn more about this feature, view our Rule-Based Filtering From Export Settings video, part of our ReportMiner Tutorial Series at http://www.astera.com/report-miner/tutorial-center.

Saving Time and Ensuring Data Quality with ReportMiner Automatic Name and Address Parsing

Often a data source has a single address field that contains all the address information. The individual sections of the address need to be parsed out into separate fields so the data can be loaded into a database and/or combined with information from other sources. Often there are thousands of records that need to be parsed, and doing this manually is a time-consuming and error-prone task that puts your data quality and reliability at risk.

Astera’s ReportMiner data extraction software automatically parses name and address data with a few simple clicks, ensuring your data quality and saving you resource time and money.

ReportMiner breaks up name and address data into separate components such as Name: prefix, first, middle, last, suffix and Address: street, suite, city, state, zip, country.

Once your Data Region has been created, you simply highlight the name area, right click and select “Add Name Field.” You do the same for addresses: Highlight the address area, right click and select “Add Address Field (US).” ReportMiner will automatically create your name and address fields by breaking them up into individual fields.

For more information on creating data regions and fields in ReportMiner, check out our blog Smart Data Extraction with ReportMiner: Automating Creation of Extraction Models.

Extract Valuable Data from PDFs With ReportMiner

PDF (portable document format) files were developed in the early 1990s to enable computer users with different platforms and software tools to share documents with a fixed layout of text and graphics. Because they are independent of application software, hardware, and operating systems, PDFs have become a popular way to share documents. All that is needed is a PDF reader, available for free download on the Internet.

In this day and age, however, data lives on, even if it’s trapped inside a PDF. Businesses need to combine PDF data with other data, use it in spreadsheets or databases, integrate it with other applications, or use it for business intelligence.

Astera’s ReportMiner data extraction software offers many capabilities for PDF data extraction in an easy-to-use interface that doesn’t require code writing. The tool enables users to easily extract data by simply creating an extraction layout and exporting to the destination of their choice. ReportMiner does all the heavy lifting by automatically recognizing data patterns and creating necessary data regions and fields.

In addition, users are able to use their extracted data to take advantage of the product’s advanced transformation, quality, and scrubbing features.

To extract information from a PDF file in ReportMiner, simply upload a PDF and create a report model by selecting what needs to be extracted and specifying a pattern within the report.

ReportMiner also has a preview feature so that users can make sure everything is being extracted as intended. Once the layout is complete, users have the option to export to Excel, CSV, or a chosen database. The report model can also be opened in a dataflow to apply transformations to the data.

For more information on specifying regions and fields and exporting data, check out these blogs:

Smart Data Extraction with ReportMiner: Automating Creation of Extraction Models

Exporting Data in ReportMiner

Human-Readable Reports and the Data Trapped Within

Often reports are produced with the intention that they will be printed and read by human eyes. In today’s data-driven world, however, some or all of that physical data needs to be transformed into electronic data that can be integrated into enterprise applications for operational and business intelligence use.

IT is usually tasked with extracting the important data trapped within human-readable reports. This complex process involves coding and writing scripts that identify data patterns in the underlying reports. Since the requirements for what data needs to be extracted and how it will be used typically come from the business side of the enterprise, the process also involves multiple back-and-forth rounds between the business department and IT.

There must be an easier way. How about a software-based solution that automatically extracts the desired data and can be used by the business department with little or no IT involvement? What would such a solution look like?

Our new whitepaper describes the anatomy of a solution that eliminates the complexity of traditional data extraction methods. You can download it free here.

Exporting Data in ReportMiner

Once you’ve built your extraction model in ReportMiner to extract desired data from unstructured sources such as printed or spool documents, you need to send it somewhere so that it becomes meaningful and useful to your business.   With ReportMiner, you can map and export data to almost any destination you want, including databases like SQL Server, Access, MySQL, PostgreSQL, and any ODBC-compatible database, as well as formats such as fixed length, delimited, Excel, and XML.

In this blog we’ll show you how to quickly export your data to an Excel file, where you’ll be able to analyze it, add it to your database, and if it is important over the long term, to your data warehouse.

After you’ve prepared your extraction model, you need to make sure everything is set up correctly for exporting. You do this by selecting the Preview icon and checking in the preview window to make sure everything is the way you’d like it to look.

When you are sure your setup is correct, choose the Excel icon: Create New Export Setting and Run (to Excel).

The pop up window will enable you to specify where you want to save the file.

Once you’ve saved the file, there are options available, including First Row Contains Header, specifying the worksheet if you have multiple worksheets in your Excel file, or appending to a file that already exists.

After selecting your options, a new popup will allow you to change the layout of your file. For example, you might want to make changes to the column header such as spacing, change the order of the fields, or change formats such as date format.

After exporting, you can check to make sure all your records were exported in the progress window on the left side. On the right side, there is a link to open the file in Excel.

There you can see all of your exported data in spreadsheet format and check to make sure all your changes to headers, field position, etc. are reflected in the exported data.

That’s it! You’re done!