Dataflows: The Cornerstone of Data Integration (Examples 6-20)

Example 11. This subflow contains complex logic for processing dates. Most of the input fields are passed through from Subflow Input to Subflow Output with no additional processing. The date fields, on the other hand, are processed using a set of expressions, date functions, and lookup lists to create the custom date representation required by the application.
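
For readers who prefer to see the idea in code, the snippet below is a rough Python sketch of the same pattern, not the subflow itself; the field names (CustomerId, OrderDate), the date format, and the quarter lookup list are assumptions made for illustration.

```python
from datetime import datetime

# Illustrative lookup list: maps a month number to a custom period label.
MONTH_LOOKUP = {1: "Q1", 2: "Q1", 3: "Q1", 4: "Q2", 5: "Q2", 6: "Q2",
                7: "Q3", 8: "Q3", 9: "Q3", 10: "Q4", 11: "Q4", 12: "Q4"}

def process_record(record):
    """Pass most fields through unchanged; rebuild only the date fields."""
    output = dict(record)                                  # pass-through fields
    parsed = datetime.strptime(record["OrderDate"], "%Y-%m-%d")
    output["OrderYear"] = parsed.year
    output["FiscalPeriod"] = f"{parsed.year}-{MONTH_LOOKUP[parsed.month]}"
    output["OrderDateFormatted"] = parsed.strftime("%d %b %Y")
    return output

print(process_record({"CustomerId": 42, "OrderDate": "2024-05-17"}))
```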

Example 12. This dataflow calls a subflow (Subflow1) to process customer names. The customer names are read from a database table and passed to the subflow, which contains the name-parsing logic. The parsed names returned by the subflow are written to a delimited file. The subflow acts as a black box, hiding its inner logic from the calling dataflow. This design simplifies the overall visual diagram and allows the architect to focus on what the subflow does rather than how it does it.
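
The black-box idea can be sketched in plain Python as follows. This is only an illustration under assumed names (parse_name standing in for the subflow, CustomerName for the source field); it is not the actual subflow logic.

```python
import csv

def parse_name(full_name):
    """'Subflow': the parsing logic hidden behind a single callable."""
    parts = full_name.strip().split()
    first = parts[0] if parts else ""
    last = parts[-1] if len(parts) > 1 else ""
    return {"FirstName": first, "LastName": last}

def run_dataflow(customer_rows, output_path):
    """Calling 'dataflow': feeds names to the subflow, writes a delimited file."""
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["FirstName", "LastName"])
        writer.writeheader()
        for row in customer_rows:
            writer.writerow(parse_name(row["CustomerName"]))

run_dataflow([{"CustomerName": "Jane Q. Doe"}], "parsed_names.csv")
```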

Example 13. This dataflow reads a set of related database tables using a multi-table query (also known as a data model). It then applies a data quality rule to filter out unwanted records from the parent source table. The remaining records are deleted from a destination table; this is achieved by mapping to the Input_Delete node on the destination database object. Finally, the same records are also inserted into a second database table (the DatabaseDest2 object).
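
A rough Python sketch of the equivalent logic is shown below, using SQLite as a stand-in for the actual database. The table names (Customers, Orders, DatabaseDest1, DatabaseDest2) and the quality rule itself are assumptions for illustration only.

```python
import sqlite3

def quality_rule(record):
    """Illustrative data quality rule: keep only records with a usable email."""
    return "@" in (record["Email"] or "")

def run_dataflow(conn):
    conn.row_factory = sqlite3.Row
    # Read related parent/child tables, similar in spirit to a data-model query.
    rows = [dict(r) for r in conn.execute(
        "SELECT c.CustomerId, c.Email FROM Customers c "
        "JOIN Orders o ON o.CustomerId = c.CustomerId")]
    passing = [r for r in rows if quality_rule(r)]          # data quality rule
    # Delete the surviving records from the first destination table...
    conn.executemany("DELETE FROM DatabaseDest1 WHERE CustomerId = ?",
                     [(r["CustomerId"],) for r in passing])
    # ...and insert the same records into the second destination table.
    conn.executemany(
        "INSERT INTO DatabaseDest2 (CustomerId, Email) VALUES (?, ?)",
        [(r["CustomerId"], r["Email"]) for r in passing])
    conn.commit()
```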

Example 14. This is a simple dataflow reading two delimited files. A single field, Loan_ID, is read from the first source and written to a private message queue (MQDest1). The field is fed to the message queue ‘as is’, since no additional transformations are applied as it travels from source to destination.

In contrast, all fields are read from the second source and inserted into a database table. As with the first source, the fields are written ‘as is’ because they are connected from source to destination using direct maps.
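
The following Python sketch mirrors the two straight-through paths, assuming illustrative file names, a Customers destination table, and an in-process queue standing in for the private message queue.

```python
import csv
import queue
import sqlite3

loan_queue = queue.Queue()          # stand-in for the private message queue (MQDest1)

def load_first_source(path="loans.csv"):
    """Source 1: only the Loan_ID field is read and queued, unchanged."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            loan_queue.put(row["Loan_ID"])

def load_second_source(conn, path="customers.csv"):
    """Source 2: every field is inserted into a database table, also unchanged."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames                      # direct, field-for-field mapping
        for row in reader:
            conn.execute(
                f"INSERT INTO Customers ({', '.join(columns)}) "
                f"VALUES ({', '.join('?' for _ in columns)})",
                [row[c] for c in columns])
    conn.commit()
```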

Example 15. This dataflow reads database data using a custom SQL query (SQLQuerySource1). The fields are then written to a DB2 table. Additional processing is applied to the DEPENDENTCARE field using a custom expression object (Ex_Start_Date). A constant value is written to the SKIPPED_COUNT field, and dynamic parameter values for Collection_Name and Package_Name are written to their corresponding destination fields. The same record is also written to an Excel file. A data profile is created by the Profile1 object to capture the status of each record, including any successes, errors, warnings, or other conditions.
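
A simplified Python sketch of this flow is shown below, with SQLite standing in for DB2 and a CSV file standing in for the Excel copy; the table names, the Ex_Start_Date logic, and the parameter values are assumptions for illustration.

```python
import csv
import sqlite3
from collections import Counter

PARAMS = {"Collection_Name": "HR_Load", "Package_Name": "Nightly"}   # illustrative parameters

def ex_start_date(value):
    """Stand-in for the Ex_Start_Date expression applied to DEPENDENTCARE."""
    return (value or "").strip().upper()

def run_dataflow(conn, copy_path="records_copy.csv"):
    conn.row_factory = sqlite3.Row
    profile = Counter()                                       # per-record status profile
    with open(copy_path, "w", newline="") as copy_file:
        writer = None
        for row in conn.execute("SELECT * FROM Benefits"):    # custom SQL query source
            record = dict(row)
            try:
                record["DEPENDENTCARE"] = ex_start_date(record["DEPENDENTCARE"])
                record["SKIPPED_COUNT"] = 0                   # constant value
                record.update(PARAMS)                         # parameter-driven fields
                cols, marks = ", ".join(record), ", ".join("?" for _ in record)
                conn.execute(f"INSERT INTO BenefitsDest ({cols}) VALUES ({marks})",
                             list(record.values()))
                if writer is None:                            # spreadsheet-style copy
                    writer = csv.DictWriter(copy_file, fieldnames=list(record))
                    writer.writeheader()
                writer.writerow(record)
                profile["success"] += 1
            except Exception:
                profile["error"] += 1
    conn.commit()
    return dict(profile)
```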

Example 16. This dataflow transforms a hierarchical XML source into three flat destinations: an Excel file, a delimited file, and a database table. No additional processing or transformations are applied to the fields as they are written to the destinations (the fields are mapped using direct mapping). Notice that each destination receives data from a different node inside the hierarchical source structure. As a result, a different number of records will likely be written to each destination. Also notice that the database destination table receives fields from two nodes, row and val, in effect creating a join of sorts on the incoming data.
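
The effect of mapping different nodes of a hierarchy to flat destinations can be illustrated with a small Python sketch; the XML structure and field names below are assumptions, not the actual source.

```python
import xml.etree.ElementTree as ET

# Illustrative hierarchy; the real source structure will differ.
XML = """
<report>
  <row id="1"><val name="City">Austin</val><val name="State">TX</val></row>
  <row id="2"><val name="City">Reno</val><val name="State">NV</val></row>
</report>
"""

root = ET.fromstring(XML)

# Destination 1: one record per <row> node.
rows = [{"id": row.get("id")} for row in root.iter("row")]

# Destination 2: one record per <val> node (a deeper node, so more records).
vals = [{"name": v.get("name"), "value": v.text} for v in root.iter("val")]

# Destination 3: fields from both <row> and <val>, effectively a join.
joined = [{"id": row.get("id"), "name": v.get("name"), "value": v.text}
          for row in root.iter("row") for v in row.iter("val")]

print(len(rows), len(vals), len(joined))   # different record counts per destination
```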

Example 17. This simple dataflow reads the contents of each file in the selected folder and writes them to a database table. The entire stream containing a file’s content is written to the Value field of the destination table. An additional field, FullPath, stores the network path of the file whose content is being written to the Value field. The file’s content is read by the built-in ReadFileBytes function, which accepts a file path as its input parameter and returns the file’s content as its output value.
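
A minimal Python sketch of the same pattern is shown below, with a local read_bytes helper standing in for the built-in ReadFileBytes function; the folder name, database, and table layout are assumptions for illustration.

```python
import pathlib
import sqlite3

conn = sqlite3.connect("files.db")
conn.execute("CREATE TABLE IF NOT EXISTS FileStore (FullPath TEXT, Value BLOB)")

def read_bytes(path):
    """Analog of the built-in ReadFileBytes function: file path in, raw content out."""
    return pathlib.Path(path).read_bytes()

for path in pathlib.Path("incoming").glob("*"):            # the selected folder
    if path.is_file():
        conn.execute("INSERT INTO FileStore (FullPath, Value) VALUES (?, ?)",
                     (str(path.resolve()), read_bytes(path)))
conn.commit()
```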

Example 18. This dataflow reads a database table, filters records with City = ‘Los Angeles’, and updates those records in a destination database table. The source database connection is managed by a shared connection object, DbConnection1. This setup lets the user quickly update the connection when deploying the dataflow in a new environment.
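
The sketch below shows the same pattern in Python, with a small helper standing in for the shared connection object so the connection details live in one place; the database, table, and field names are assumptions.

```python
import sqlite3

def get_shared_connection(db_path="crm.db"):
    """Stand-in for the shared connection object (DbConnection1): one place to
    change when the dataflow is deployed to a new environment."""
    return sqlite3.connect(db_path)

def run_dataflow():
    conn = get_shared_connection()
    conn.row_factory = sqlite3.Row
    # Read the source table and keep only the Los Angeles records.
    matches = [dict(r) for r in conn.execute("SELECT * FROM Customers")
               if r["City"] == "Los Angeles"]
    # Update the corresponding records in the destination table.
    for rec in matches:
        conn.execute(
            "UPDATE CustomersDest SET City = ?, Phone = ? WHERE CustomerId = ?",
            (rec["City"], rec["Phone"], rec["CustomerId"]))
    conn.commit()
```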

Example 19. This dataflow reads a report file and applies a set of expressions and built-in functions to process the AccountNumber, FileDate, and Amount fields. A set of parameters with default values is configured to feed into the ImportDescription expression, as well as to write directly to several destination fields. The AccountNumber field is checked by the IsNumeric function to determine whether it is a number and is then converted to the Integer data type before being written to the AccountNumber destination field. The FileDate field is processed by the ParseDate function, which extracts the Year, Month, and Date components from the date value. The output of ParseDate is fed into the FormatDate function to convert the date to the desired format. Finally, the transformed values are written to a delimited file using a custom field-to-field mapping.
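
The transformation chain can be sketched in Python as follows, with standard-library calls standing in for IsNumeric, ParseDate, and FormatDate; the date format, parameter defaults, and destination field mapping are assumptions for illustration.

```python
import csv
from datetime import datetime

PARAMS = {"ImportDescription": "Monthly statement import"}   # illustrative default value

def is_numeric(value):
    """Stand-in for the IsNumeric check."""
    return value.strip().lstrip("-").isdigit()

def transform(record):
    out = {}
    # AccountNumber: numeric check, then conversion to an integer.
    out["AccountNumber"] = (int(record["AccountNumber"])
                            if is_numeric(record["AccountNumber"]) else None)
    # FileDate: parse the date, then rewrite it in the desired format.
    parsed = datetime.strptime(record["FileDate"], "%m/%d/%Y")   # ParseDate analog
    out["FileDate"] = parsed.strftime("%Y-%m-%d")                # FormatDate analog
    out["Amount"] = float(record["Amount"])
    out["Description"] = PARAMS["ImportDescription"]             # parameter-driven field
    return out

with open("statements_out.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["AccountNumber", "FileDate",
                                           "Amount", "Description"])
    writer.writeheader()
    writer.writerow(transform({"AccountNumber": "1002",
                               "FileDate": "05/17/2024", "Amount": "18.40"}))
```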

Example 20. In this dataflow, data comes from two sources: a flat file and a database table. The incoming data from the two sources is merged into a single record set, which is then sorted and finally processed by a database write strategy that controls which action (such as insert, update, or delete) should take place for each record going into the destination database table. A designated field in the destination table also receives a constant value for all records coming from the database write strategy.
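
A condensed Python sketch of the merge, sort, and write-strategy portion is shown below (SQLite and CSV stand in for the actual sources and destination); the table layout, the sort key, and the constant LoadFlag field are assumptions for illustration.

```python
import csv
import sqlite3

def write_strategy(conn, record):
    """Decide, per record, whether to insert or update the destination table."""
    exists = conn.execute("SELECT 1 FROM AccountsDest WHERE AccountId = ?",
                          (record["AccountId"],)).fetchone()
    if exists:
        conn.execute("UPDATE AccountsDest SET Balance = ?, LoadFlag = ? "
                     "WHERE AccountId = ?",
                     (record["Balance"], "BATCH", record["AccountId"]))
        return ("update", record["AccountId"])
    conn.execute("INSERT INTO AccountsDest (AccountId, Balance, LoadFlag) "
                 "VALUES (?, ?, ?)",
                 (record["AccountId"], record["Balance"], "BATCH"))  # LoadFlag: constant field
    return ("insert", record["AccountId"])

def run_dataflow(conn, file_path="accounts.csv"):
    with open(file_path, newline="") as f:
        file_rows = list(csv.DictReader(f))                          # source 1: flat file
    conn.row_factory = sqlite3.Row
    table_rows = [dict(r) for r in
                  conn.execute("SELECT AccountId, Balance FROM AccountsSrc")]  # source 2: table
    merged = file_rows + table_rows                                  # Merge object
    merged.sort(key=lambda r: r["AccountId"])                        # Sort object
    record_log = [write_strategy(conn, r) for r in merged]           # record-level log
    conn.commit()
    return record_log
```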

The data from the two sources is also fed into a second destination, which is an XML file. The XML file receives a snapshot of all records merged from the dataflow sources. The records are written to the XML file unsorted, just as they are output by the Merge object.

Two types of logs are present in the dataflow to support logging and profiling of the data. A record-level log collects the status of the records created by the database write strategy. A data profile object also captures summary and statistical data for the values going into the destination database table. Finally, an expression is applied to some of the source values, and the expression result is collected by a field profile object, which makes it easy to preview statistics at dataflow design time for quality control or debugging purposes.

Also present in the dataflow is a shared connection, which allows the database connection to be reused by other objects if needed. Finally, parameters and ContextInfo objects provide additional context information, such as the server name or the user account running the dataflow.