Dataflows – The Cornerstone of Data Integration Part 3

Part 3 – Managing Dataflow Layout

Auto Layout

The Auto Layout feature in Centerprise arranges the objects on the dataflow to improve its visual representation. To invoke Auto Layout, click the icon on the Dataflow toolbar, or open the Dataflow menu and select Auto Layout. You can also move an object manually by holding down the left mouse button over the object title and dragging it to a new location.

Expand All/Collapse All

To show only object names in the dataflow and hide their field layouts, click the icon on the Dataflow toolbar, or open the Dataflow menu and select Collapse All. When an object is collapsed, the maps to and from the object are shown as a single line. To see how fields are mapped, expand the object. To show object names as well as their field layouts, click the icon on the Dataflow toolbar, or open the Dataflow menu and select Expand All. To collapse or expand a single object, click the icon in the top right corner of the object box.

Zoom In/Zoom Out/Fit to Screen

You can adjust the display size of the dataflow by selecting Zoom In, Zoom Out, and Fit to Screen. You can also select a custom zoom percentage using the Zoom % input on the Dataflow toolbar.

Auto Size All

Auto Size All adjusts the size of object boxes, either removing extra white space or increasing the vertical footprint to eliminate the vertical scroll bar. To auto resize a single object, right click on it and select Resize to Fit from the context menu. You can also resize any object manually by dragging any corner of the object box with the mouse pointer.

The Use Orthogonal Links option reorganizes the links between objects in a grid.

Linking and Mapping Objects

Most objects on the dataflow have ports. Some have only input ports, others have only output ports, and still others have both. An input port on an object signifies that incoming data can be fed to the object. Because source objects cannot receive data, they do not have input ports.

Example:

An output port on an object signifies that the object can send data to another object.

Example:

An object that has both an input port and an output port can receive and send data. An example of such an object is a destination object that is also a source for another destination object. Most transformation objects also have both types of ports.

There are two types of ports: field ports and node ports. A field port allows you to map to an individual field. A node port allows you to map to a node, including all fields and child nodes in the selected node. The main node located at the very top of the object box is essentially the root node that spans the entire tree (or all fields in the case of a flat layout).

To quickly create field maps between two objects, drag and drop a node output port of the upstream object to the node input port of a downstream object.

If the downstream object already has fields, the fields with the same name will be mapped between the two objects. If the downstream object has an empty field layout, it will get the same layout as the upstream object, and all the fields will be mapped between the two objects.
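The node-to-node mapping rule described above can be sketched in a few lines of Python. This is only an illustration, not Centerprise code: the function name auto_map, the use of plain field-name lists for layouts, and exact (case-sensitive) name matching are all assumptions made for the example.

```python
def auto_map(upstream_fields, downstream_fields):
    """Return (downstream_layout, maps) for a node-to-node drop.

    Hypothetical model: a layout is a list of field names and a map is a
    (source_field, destination_field) pair.
    """
    if not downstream_fields:
        # Empty downstream layout: copy the upstream layout and map every field.
        layout = list(upstream_fields)
        return layout, [(name, name) for name in layout]

    # Existing downstream layout: map only the fields whose names match.
    # Exact, case-sensitive matching is an assumption of this sketch.
    downstream_names = set(downstream_fields)
    maps = [(name, name) for name in upstream_fields if name in downstream_names]
    return list(downstream_fields), maps


layout, maps = auto_map(["Id", "Name", "City"], ["Name", "Id", "Phone"])
print(layout)  # ['Name', 'Id', 'Phone']
print(maps)    # [('Id', 'Id'), ('Name', 'Name')]
```

In this sketch, City and Phone stay unmapped because neither has a counterpart with the same name on the other side.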

By default, maps copy source values to the destination, and, if needed, convert the value to the data type of the destination element. If you want to copy metadata for the field, you can change the map type to copy field name, field data type, or length to the destination.
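As a rough illustration of the default map behavior (copy the value, converting it to the destination data type when needed), consider the Python sketch below. The type names, the CONVERTERS table, and the apply_map function are hypothetical and do not reflect how Centerprise performs conversions internally.

```python
from datetime import date


def to_date(value):
    """Accept a date as-is, otherwise parse an ISO string such as 2020-01-31."""
    return value if isinstance(value, date) else date.fromisoformat(str(value))


# Hypothetical destination data types and their converters.
CONVERTERS = {
    "integer": int,
    "real": float,
    "string": str,
    "date": to_date,
}


def apply_map(value, destination_type):
    """Copy a source value, converting it to the destination type if needed."""
    if value is None:
        return None  # nothing to convert
    return CONVERTERS[destination_type](value)


print(apply_map("42", "integer"))  # 42
print(apply_map(3, "string"))      # 3
```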

To remove all fields from a node, right click the node and select Remove All Elements from the context menu. This action will also unmap any fields mapped to the fields being removed. To remove a map between two fields, right click on the map and select Delete, or left-click on the map and press the DEL key on the keyboard. To remove incoming maps for a node, right click on the node and select Remove All Inbound Maps.

For objects with tree layouts in the collapsed state, a single map link will be shown for the entire node, which could make it difficult to see how fields are mapped.  The Find Map From To capability automatically expands and positions the two trees inside the object box showing how the fields are mapped. Simply right click on a map and select Find Map From To.

A similar feature is available when you need to identify to or from which field or fields a given field is mapped. To find the destination field or fields for a given source field, right click on the field and select Find Elements Mapped To. To find the source field for a given field, right click on the field and select Find Element Mapped From.
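Conceptually, these two lookups are searches over the list of maps in opposite directions. The Python sketch below assumes a map can be modeled as a (source_field, destination_field) pair; the function names and sample field names are invented for the illustration. It follows the wording above, where a source field may feed several destination fields while a destination field has one source.

```python
# Hypothetical maps between two objects: (source_field, destination_field).
maps = [
    ("Customers.Id", "Orders.CustomerId"),
    ("Customers.Id", "Audit.CustomerId"),
    ("Customers.Name", "Orders.CustomerName"),
]


def find_elements_mapped_to(maps, source_field):
    """Return every destination field fed by the given source field."""
    return [dst for src, dst in maps if src == source_field]


def find_element_mapped_from(maps, destination_field):
    """Return the source field feeding the given field, or None if unmapped."""
    return next((src for src, dst in maps if dst == destination_field), None)


print(find_elements_mapped_to(maps, "Customers.Id"))
# ['Orders.CustomerId', 'Audit.CustomerId']
print(find_element_mapped_from(maps, "Orders.CustomerName"))
# Customers.Name
```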

Setting Object Properties

To open an object’s properties, double click on the object’s title or right click on the object and select Properties from the context menu. An example of a destination database table’s Properties screen is shown in the figure below.

While in the Properties screen, you can navigate the wizard pages by pressing the backward or forward icons. You can also switch to the Properties of another object on the dataflow by selecting an object in the Editing dropdown menu.

Creating Field Layouts

Most objects on the dataflow have field layouts. Field layouts can be either flat or hierarchical (tree layouts).  Depending on the layout type, the field layout is displayed as a flat list or a hierarchy of fields inside the object box.  The following is an example of a flat field layout.

Below is an example of a tree field layout.

To navigate a tree field layout, expand or collapse the tree nodes as needed using the + or –  icons.

There are four ways to create a field layout for your object.

1. Auto populate the field layout based on a source’s content.  For example, for a delimited source file, Centerprise will read the file to derive the field layout, including the data type of each field.

2. Create the field layout based on the layout of another object on the dataflow (this does not apply to source objects).

To create a field layout based on another object’s field layout, grab the node output port of the object whose layout you wish to replicate and drop it on the node input of the object. This action will also map all fields inside the node (including child nodes) between the two objects.

Using this feature, you can create field layouts for the entire tree, or a selected node only.

This feature is available only for nodes with no fields added yet.  To clear a node that already has fields, right click on the node and select Remove All Elements. This action will also unmap all fields inside the node (including child nodes) between the two objects.

3. Add a single field to the layout by dropping the field on the <New Element> placeholder. This action will also map the field between the two objects.

4. Manually create or edit a field layout by opening an object’s properties and going to the appropriate field layout screen (Source Fields screen for source objects, and Destination Fields screen for destination objects). Here, you can also change field data types and the order of fields and specify null/not-null properties, among other actions.
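To give a feel for the first option, deriving a layout from the content of a delimited file, here is a minimal Python sketch. It is not how Centerprise implements the feature: the infer_layout and infer_type functions, the three type names, and the rule of picking the narrowest type that fits every sampled value are assumptions made for the example.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type that fits every non-empty sample value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"  # no evidence, fall back to the widest type
    for name, cast in (("integer", int), ("real", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def infer_layout(text, delimiter=","):
    """Read a header row plus sample rows and return (field, type) pairs."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data)) if data else [()] * len(header)
    return [(name, infer_type(col)) for name, col in zip(header, columns)]


sample = "Id,Name,Amount\n1,Ann,10.5\n2,Bob,7\n"
print(infer_layout(sample))
# [('Id', 'integer'), ('Name', 'string'), ('Amount', 'real')]
```

A layout derived this way is only as good as the sample it was read from, which is one reason the fourth option, editing data types by hand, remains useful.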

Copying Field Layouts

You can copy an entire layout from one object and paste it into another object on the same or different dataflow. This way you can quickly replicate a set of fields between two objects.

To copy a field layout, right click a node in the object whose layout you want to copy and select Copy Layout from the context menu. Then, right click a node in the target object and select one of the following options from the context menu:

  • Paste Layout (Add Member) adds a new node to the existing layout, keeping the existing structure unchanged.
  • Paste Layout (Replace) replaces the existing layout with the layout being copied.
  • Paste Layout (Add Elements) adds fields to the existing layout, keeping existing fields unchanged.
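The difference between the three paste options is easier to see on a small model. The Python sketch below represents a layout node as a dictionary with a name, a list of fields, and a list of child nodes; this representation and the function names are hypothetical. Skipping fields that already exist by name in Add Elements is also an assumption of the sketch.

```python
import copy


def make_node(name, fields=(), children=()):
    """Build a hypothetical layout node."""
    return {"name": name, "fields": list(fields), "children": list(children)}


def paste_add_member(target, copied):
    """Add the copied node as a new child; the existing structure is unchanged."""
    target["children"].append(copy.deepcopy(copied))


def paste_replace(target, copied):
    """Replace the target's layout with the copied layout."""
    target["fields"] = list(copied["fields"])
    target["children"] = copy.deepcopy(copied["children"])


def paste_add_elements(target, copied):
    """Append the copied fields; fields already present are left as they are."""
    for field in copied["fields"]:
        if field not in target["fields"]:
            target["fields"].append(field)


copied = make_node("Address", ["Street", "City"])
target = make_node("Customer", ["Id", "City"])
paste_add_elements(target, copied)
print(target["fields"])  # ['Id', 'City', 'Street']
```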

Deleting Fields

To delete a field from the layout, either right click on the field inside the object’s box and select Remove Element (this function is not available for source objects) or open the object’s properties, go to the appropriate field layout screen and remove the field from the grid by selecting the field and pressing the DEL key on the keyboard.

General Options

The General Options screen contains the options common to most objects on the dataflow. A key option is Clear Incoming Record Messages.

Clear Incoming Record Messages

When this option is on, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow. The Comments input allows you to enter comments associated with this object.
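The effect of this option can be pictured with a short Python sketch in which each record carries a list of messages as it moves through the dataflow. The record structure, the run_object function, and the check_amount rule are hypothetical and serve only to illustrate the behavior described above.

```python
def run_object(record, check, clear_incoming_messages=False):
    """Process one record and return it with its message list updated."""
    # With the option on, messages from upstream objects are dropped first.
    messages = [] if clear_incoming_messages else list(record["messages"])
    message = check(record["data"])
    if message is not None:
        messages.append(message)
    return {"data": record["data"], "messages": messages}


def check_amount(data):
    """Hypothetical validation performed by the current object."""
    return "Amount is negative" if data["Amount"] < 0 else None


incoming = {"data": {"Amount": -5}, "messages": ["Name was truncated"]}
print(run_object(incoming, check_amount)["messages"])
# ['Name was truncated', 'Amount is negative']
print(run_object(incoming, check_amount, clear_incoming_messages=True)["messages"])
# ['Amount is negative']
```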

Tools for Previewing and Monitoring Dataflow

The dataflow user interface provides many tools that are helpful in previewing, debugging, and monitoring your dataflow.

Job Progress

The Job Progress window displays the status of the dataflow as it is being executed.  This window also provides links to any error log files and the data profiler files. You can view the Job Progress window by clicking View>Job Progress or using the shortcut key Ctrl+Alt+T.

Data Preview

The Data Preview window displays a sample of records for the selected object. You can view the Data Preview window by clicking View>Data Preview or using the shortcut key Ctrl+Alt+P. You can also right click on any object’s properties and select Preview Data from the context menu.

Quick Profile

The Data Statistics window displays statistical information for a sample of records from the selected object. You can view the Data Statistics window by clicking View>Quick Profile or using the shortcut key Ctrl+Alt+A. You can also right click on any object’s properties and select Quick Profile from the context menu.

Verifying Dataflow

Verifying a dataflow lists any errors or warnings present in the dataflow design. Correct any such errors or warnings and verify your dataflow again to ensure there are no errors. To verify a dataflow, click the icon on the main Toolbar. Verification results will be displayed in the Verify window. To stop verification while it is still in progress, click the icon on the Verify window Toolbar.

Running Dataflow

To run your dataflow, click the icon on the main Toolbar. The dataflow will run on the server that is selected in the Server input on the main Toolbar. To stop a dataflow that is currently running, click the icon on the Job Progress window toolbar.

Next week we will present the first of four sets of dataflow examples.