Posts Tagged ‘data transformation’

Source Control In Centerprise

Monday, August 2nd, 2010

With Centerprise’s file system approach, it was always possible to put metadata files such as transfers and dataflows under source control. But now, we’ve added a built-in, fully functional source control client right in the studio UI. If you’re familiar with Microsoft’s Visual Studio, you probably already know the concept of integrated source control and how much not having to switch between applications means for your overall productivity. Centerprise follows the same model, so you can now work at the file level (dataflows) as well as the project level (Centerprise projects), all within the same UI and all under source control.

Combined with Centerprise’s project feature, the new source control features are a perfect fit for a more collaborative, team-based approach to data integration projects. With source control built in, you can work on a dataflow or refactor a complicated one, check in your project, and when your colleague clicks the “Get Latest” button, he or she will immediately see your work and be on the same page. Some of the basic features include:

  • Check-in and check-out for all project files such as dataflows, workflows, SQL files, etc.
  • Get specific versions of files and projects including the latest version
  • View change history
  • Conflict resolution for overlapping changes
  • Undo pending changes
  • Full source control explorer

Currently, Centerprise uses Microsoft’s Team Foundation Server as its version control back end; support for other systems such as Rational ClearCase and Subversion is scheduled for an upcoming point release. Please contact sales@astera.com for more info.

Centerprise 5 Beta Release is now available

Wednesday, June 2nd, 2010

I am pleased to announce that Centerprise Data Integrator 5 beta is now available for download. This release is a major upgrade that adds sophisticated dataflow and workflow designers. We are very excited about this release and the value it can add to our customers’ businesses.

Centerprise 5 is an attractive alternative both to the exorbitant cost of traditional data integration vendors and to the not-ready-for-primetime products offered at the lower end of the market.

Over the past year, we have had extensive discussions with customers from a variety of industries, including financial services, pharmaceuticals, healthcare, utilities, and government. The Centerprise team used this feedback and other research to improve the product from many perspectives. Usability, always a hallmark of Centerprise, has been improved further with the addition of flow designers that provide drag-and-drop capabilities, unlimited undo/redo, cut/copy/paste, and parameterization, among others. Performance has been enhanced by increasing parallelism and optimizing a number of areas, including database writes and file reads. A new set of APIs enables customers and partners to extend Centerprise by adding new sources, destinations, transformations, and custom functions.

Here are the key features:

  • A Dataflow designer that supports complex data integration flows and features a full complement of transformations, including lookups, expressions, functions, aggregate, sort, join, normalize, denormalize, union, route, filter, and others.
  • Subflows to create reusable dataflow components that can be plugged into dataflows or other subflows.
  • Single-click WYSIWYG data view to preview data at any stage in a dataflow.
  • Integrated data quality validation and profiling.
  • Integrated drag and drop environment with unlimited undo/redo, cut/copy/paste, automatic layout building, auto map creation, one click element addition, and more.
  • A visual Workflow designer for defining job orchestration. The Workflow designer lets you define job sequence, routing, and dependencies.
  • Restart capability to resume a job from the point of failure.
  • Workflow provides built-in tasks to run Dataflows, Workflows, SQL, or other programs, perform file system and FTP actions, send mail, and more. Additional tasks can be created using the Centerprise APIs.
  • Built-in job scheduler to start jobs at recurring intervals including hourly, daily, weekly, and monthly. Jobs can also be triggered based on file drop and through APIs.
  • High-performance parallel processing engine optimized to deliver the performance and scalability required to efficiently process very high data volumes.

If you would like to participate in the Centerprise beta program, please register here:

Or you can call us at 1-888-77-ASTERA (1-805-579-004) or email sales@astera.com.

Usability Matters

Friday, April 16th, 2010

I often read user surveys on corporate software products and see usability ranked fourth or fifth as a selection criterion for these products. Is it any wonder then that corporate software still remains difficult to learn and use even as consumer hardware and software products are undergoing a major revolution in usability?

If you compare today’s phones, consumer-oriented websites, and other technology products with those from just five years ago, you will notice remarkable leaps in the overall user experience. On the other hand, most business software products look like they were designed in the early 1990s or earlier. It is not uncommon to have to go through bulky user manuals and weeks-long training before these products can be used.

While researching data integration products recently, I found that many of them were difficult to use and had a steep learning curve. Some of the products I came across had more than 200 buttons visible on the screen!

I believe that the benefits of superior usability are generally understated and underestimated. A well-designed product improves productivity in ways small and large. An intuitive product seems familiar even to a new user; in most cases, you just know how to use it. In companies where tasks are split between business users and IT staff mainly because business users lack expertise in certain tools, a well-designed product can blur those lines by enabling business users to perform these tasks directly.

At Astera, we put a great deal of emphasis on making our products easy to learn and use. Our goal is to build data conversion and data integration products that business experts can use while providing all the hooks and features required for serious development. With “Hermes”, the upcoming Centerprise version, we have preserved the same intuitive, clutter-free user interface while adding a great deal of power.

For instance, dataflow and workflow designers support cut, copy, paste, unlimited undo/redo, dynamic creation of layouts, global replacement for database connections and file paths, and numerous other usability features that are notably absent from other data integration products.

Other notable features are one-click WYSIWYG instant Data Preview and Quick Profile. Both are invaluable debugging aids that speed up development and testing.

Data transformation in a data integration job using the next Centerprise version, “Hermes”

Tuesday, April 13th, 2010

Most data integration jobs revolve around gathering data from disparate sources, cleaning it, making it conform to standards by means of data mapping and data transformations, and then sending it to a destination such as a data warehouse, a data mart, or an operational data store. One of the key challenges here is the transformation, the ‘T’ of ETL. Hermes offers an extensive set of built-in data field mapping and record-level transformation constructs. In this post I am going to talk about field mapping; I’ll cover record-level transformations in a later post.

Data Field Mapping

In a regular migration job, you need to map the source to the target. Based on the layout of the destination, the user picks which source field maps to which destination field. But this straight field-to-field movement is not enough in most cases, so Hermes offers several other ways of mapping data fields in addition to the direct field-to-field map.
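To make the baseline concrete, here is a minimal Python sketch of a direct field-to-field map, written outside Centerprise. It assumes records are plain dictionaries; the field names and the `apply_direct_map` helper are hypothetical and do not reflect how Hermes stores maps internally.

```python
from typing import Any, Dict

# Hypothetical mapping: destination field name -> source field name.
DIRECT_MAP: Dict[str, str] = {"CustomerId": "CustomerId", "Name": "ContactName"}


def apply_direct_map(source: Dict[str, Any], field_map: Dict[str, str]) -> Dict[str, Any]:
    """Copy each mapped source field, unchanged, into its destination field.

    Source fields that are not mapped are simply not carried over.
    """
    return {dest: source[src] for dest, src in field_map.items()}


row = {"CustomerId": "ALFKI", "ContactName": "Maria Anders", "Fax": "030-0076545"}
print(apply_direct_map(row, DIRECT_MAP))  # {'CustomerId': 'ALFKI', 'Name': 'Maria Anders'}
```

Every other map type discussed below can be thought of as replacing the plain copy `source[src]` with some computation on the source values.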

To illustrate the other forms of mapping, I have used a simple data migration job. The scenario involves transferring a company’s customer information for the first quarter of the year to a database table.
[Figure: snapshot of the visual dataflow design]

Below I describe what each data field mapping type does, followed by how I used it in this sample.

Expression Map

Backed by Astera’s rules engine, the expression map lets you write Excel-like expressions that perform complex calculations on source fields and then assign the result to a destination field.

In our example, the source data comes with the customer address split into individual pieces, but our destination table has only one address column, which takes the full address. I used an expression map to combine the five source fields (Street_address, city, state, zip, and country) into one full address and assign it to the destination.
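As a rough illustration, the same concatenation can be sketched in Python. This is not Centerprise expression syntax; the `full_address` helper is hypothetical, and it assumes state and zip are joined with a space and the remaining parts with commas, skipping any blank field.

```python
from typing import Dict


def full_address(rec: Dict[str, str]) -> str:
    """Concatenate the five address fields into one string, skipping blanks.

    Rough Python analogue of an Excel-style concatenation expression.
    """
    # State and zip are joined with a space, e.g. "IL 62701".
    state_zip = f"{rec.get('state', '')} {rec.get('zip', '')}".strip()
    parts = [rec.get("Street_address", ""), rec.get("city", ""), state_zip, rec.get("country", "")]
    # Join the non-blank parts with commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


src = {"Street_address": "100 Main St", "city": "Springfield", "state": "IL",
       "zip": "62701", "country": "United States"}
print(full_address(src))  # 100 Main St, Springfield, IL 62701, United States
```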

List Lookup

A list lookup stores the lookup list with the mapping metadata and uses it to look up a destination value for each incoming source value.

The source data in the example contains full country names, but the destination table accepts only a two-letter country code. I used a list lookup for this purpose and created entries such as UK for United Kingdom, US for United States, and so on. This lookup is attached to the source country field, and the result of the lookup goes to the country field of the destination.
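A list lookup behaves much like a dictionary lookup. The Python sketch below assumes a hypothetical `COUNTRY_CODES` list and `lookup_country` helper, and returns a default value when a country is not listed; how Hermes itself handles unmatched values is not covered here.

```python
from typing import Dict, Optional

# Hypothetical lookup list, kept alongside the mapping definition.
COUNTRY_CODES: Dict[str, str] = {
    "United Kingdom": "UK",
    "United States": "US",
    "Germany": "DE",
}


def lookup_country(name: str, default: Optional[str] = None) -> Optional[str]:
    """Return the two-letter code for a full country name, or `default` if unlisted."""
    return COUNTRY_CODES.get(name.strip(), default)


print(lookup_country("United Kingdom"))  # UK
print(lookup_country("Atlantis"))        # None
```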

Hermes offers another way to store this lookup information, called a database table lookup. A database lookup works essentially the same way as a list lookup; the only difference is that the lookup list is stored in a database table.
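To show the database variant, the sketch below keeps the same kind of entries in a table, using Python’s built-in `sqlite3` module with an in-memory database as a stand-in for a real one. The `CountryLookup` table and the `build_lookup_table` and `db_lookup` helpers are hypothetical.

```python
import sqlite3
from typing import Optional


def build_lookup_table() -> sqlite3.Connection:
    """Create an in-memory database holding a hypothetical CountryLookup table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CountryLookup (CountryName TEXT PRIMARY KEY, Code TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO CountryLookup (CountryName, Code) VALUES (?, ?)",
        [("United Kingdom", "UK"), ("United States", "US")],
    )
    conn.commit()
    return conn


def db_lookup(conn: sqlite3.Connection, name: str, default: Optional[str] = None) -> Optional[str]:
    """Same contract as a list lookup, but the entries live in a database table."""
    row = conn.execute("SELECT Code FROM CountryLookup WHERE CountryName = ?", (name,)).fetchone()
    return row[0] if row is not None else default


conn = build_lookup_table()
print(db_lookup(conn, "United States"))  # US
```

Keeping the entries in a table means the list can be maintained without editing the mapping itself.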

SQL Statement Map

Using the SQL Statement map, you can run a SQL query or a stored procedure with some of its parameters taking values from source fields, and assign the output to the destination.

In the example, one requirement was to get the total sales amount for the quarter for each customer record, where sales data is stored in a separate table. I created a SQL query for this purpose that reads:

select TotalSales from ActiveCustomers where contactname = '@CustomerId'

Now I connect CustomerId from the source to this SQL map, where it serves as the parameter for the query. The map gets the total sales for the customer and assigns the result to the destination field TotalSales.
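The per-record query can be sketched in Python with `sqlite3`. Two assumptions: the `ActiveCustomers` schema here is invented for illustration and filters on a hypothetical `CustomerId` column, and the value is passed as a bound `?` parameter rather than spliced into a quoted string, so quotes inside a value cannot break the query. `total_sales_map` is a hypothetical helper, not a Centerprise API.

```python
import sqlite3
from typing import Optional


def make_sales_db() -> sqlite3.Connection:
    """In-memory stand-in for the ActiveCustomers table (schema is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ActiveCustomers (CustomerId TEXT PRIMARY KEY, TotalSales REAL)")
    conn.executemany(
        "INSERT INTO ActiveCustomers (CustomerId, TotalSales) VALUES (?, ?)",
        [("ALFKI", 1250.5), ("ANATR", 480.0)],
    )
    conn.commit()
    return conn


def total_sales_map(conn: sqlite3.Connection, customer_id: str) -> Optional[float]:
    """Run the query for one record, binding the source CustomerId as a parameter."""
    row = conn.execute(
        "SELECT TotalSales FROM ActiveCustomers WHERE CustomerId = ?", (customer_id,)
    ).fetchone()
    return row[0] if row is not None else None


conn = make_sales_db()
for record in [{"CustomerId": "ALFKI"}, {"CustomerId": "ZZZZZ"}]:
    # Assign the query result to the destination field; unknown customers get None.
    record["TotalSales"] = total_sales_map(conn, record["CustomerId"])
    print(record)
```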

Function Map

A function map offers a list of Financial, Name Parsing, Regular Expression, Date Time, String, Logical, and Conversion functions. These functions take input from source fields, and their output can be assigned to a destination field.

I used the GetLastName function from the function list to get the customer’s last name. In the dataflow, this map takes ContactName from the source, parses the contact’s full name, and assigns the result, the last name, to the LastName field of the destination.
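Centerprise’s name-parsing rules are not described here, so the following is only a simple Python heuristic standing in for a function like GetLastName. It assumes names arrive in “First [Middle] Last [Suffix]” order and drops a few common generational suffixes; `get_last_name` and `SUFFIXES` are hypothetical.

```python
# Hypothetical name-parsing heuristic; assumes "First [Middle] Last [Suffix]" order.
SUFFIXES = {"jr", "jr.", "sr", "sr.", "ii", "iii", "iv"}


def get_last_name(full_name: str) -> str:
    """Return the last name from a full name, ignoring trailing generational suffixes."""
    # Split on whitespace, strip stray commas, and discard empty tokens.
    tokens = [t.strip(",") for t in full_name.split()]
    tokens = [t for t in tokens if t]
    # Drop suffixes such as "Jr." but never the only remaining token.
    while len(tokens) > 1 and tokens[-1].lower() in SUFFIXES:
        tokens.pop()
    return tokens[-1] if tokens else ""


print(get_last_name("Maria Anders"))        # Anders
print(get_last_name("John Q. Smith, Jr."))  # Smith
```

Names written as “Last, First” would need a different rule, which is one reason a packaged name-parsing function is useful.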

Constant Value Map

A constant value map can be used to assign a constant value to a destination field.

For the example transfer of customer records, we needed to record which quarter the data belongs to, so I used a constant value map to assign the constant value “Q1 2010” as the data period for each customer.
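In code terms, a constant value map just writes the same value into one field of every record. A minimal Python sketch, where `apply_constant_map` and the `DataPeriod` field name are hypothetical:

```python
from typing import Any, Dict, List


def apply_constant_map(records: List[Dict[str, Any]], field: str, value: Any) -> List[Dict[str, Any]]:
    """Return copies of the records with `field` set to the same constant value."""
    return [{**rec, field: value} for rec in records]


customers = [{"CustomerId": "ALFKI"}, {"CustomerId": "ANATR"}]
tagged = apply_constant_map(customers, "DataPeriod", "Q1 2010")
print(tagged)
# [{'CustomerId': 'ALFKI', 'DataPeriod': 'Q1 2010'}, {'CustomerId': 'ANATR', 'DataPeriod': 'Q1 2010'}]
```

Returning copies rather than mutating the input leaves the source records unchanged, which makes the step easy to test in isolation.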

Using this sample, we looked at the different field mapping options available in Hermes for transforming source data at the field level before writing it to a destination. Hermes also offers several record-level transformations, such as merge, sort, union, join, and distinct. I’ll talk about record-level transformations in my next blog entry.