Archive for the ‘Centerprise Team Blog’ Category

Centerprise 5.0 Released!

Wednesday, September 1st, 2010

Centerprise Data Integrator 5.0 is now available.   If you’ve been participating in our beta program, we thank you and invite you to download the released version of Centerprise.  You can find it at the same download location you were given for the beta version or you can contact sales@astera.com.

Source Control In Centerprise

Monday, August 2nd, 2010

With Centerprise’s file system approach, putting metadata files such as transfers and dataflows under source control was always possible. But now we’ve added a built-in, fully functional source control client right in the studio UI. If you’re familiar with Microsoft’s Visual Studio, you’re probably already familiar with the concept of integrated source control and know how much not having to switch between applications means to your overall productivity. Centerprise follows the same model and now allows you to work at the file level (dataflows) as well as the project level (Centerprise projects), all within the same UI and all under source control.

Combined with Centerprise’s project feature, the new source control features are a perfect fit, resulting in a more collaborative, team-based approach to data integration projects. With source control built in, you can work on a dataflow, refactor a complicated dataflow, and check in your project; when your colleague clicks the “get latest” button, he or she will instantly see your work and thus be on the same page. Some of the basic features include:

  • Check-in, check-out on all project files such as dataflows, workflows, SQL files, etc.
  • Get specific versions of files and projects including the latest version
  • View change history
  • Conflict resolution for when changes overlap
  • Undo pending changes
  • Full source control explorer

Currently, Centerprise works with Microsoft’s Team Foundation Server for its version control back-end, but other systems such as Rational ClearCase and Subversion are scheduled to be included in an upcoming point release. Please contact sales@astera.com for more info.

Slowly Changing Dimensions in Centerprise 5

Friday, June 11th, 2010

Maintaining dimension tables in a data mart is quite a chore. We have talked to customers who spend a great deal of time writing SQL scripts, stored procedures, or other code to perform this function. Often, the code is written or duplicated for each dimension table and must be modified regularly to accommodate changing business requirements. Typically, this custom code performs poorly for all but small tables. I am sure that this brings a great deal of excitement to their working day.

Well, I guess they will have to find that excitement elsewhere because with Centerprise 5, we are taking all that fun out of dimension table maintenance. Centerprise’s Slowly Changing Dimension Write Strategy automates this function and eliminates all that work. All you need to do is to set up your SCD structure using a simple UI. SCD Strategy supports Type 1 and Type 2 slowly changing dimensions and provides support for multiple row versioning strategies including effective/expiration dates, current row flag, and version number. The following screenshot shows an example of SCD definition.

As you can see, all you have to do is specify the role that each field plays in the dimension table. These roles can be surrogate key, business key, SCD1 value, SCD2 value, effective/expiration dates, active value field, etc. That’s it. Once the roles are defined, you can use Centerprise’s preview feature to see exactly how the incoming data would be processed by the SCD strategy. Here is a preview of the dimension table update in the above screenshot.

Centerprise’s SCD Strategy is powered by a high-speed, parallel engine that compares incoming data against the data in the table and, based on the differences between the values, performs an SCD1 update, an SCD2 update/insert, a simple update, or skips the row if no material changes were found. The engine is designed to efficiently process even the largest of dimension tables.
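To make the comparison logic concrete, here is a minimal Python sketch of how a Type 1 / Type 2 decision might be made for a single incoming record. This is not Centerprise's engine or API: the field names (`customer_id`, `email`, `city`), the `is_current` flag, and the `apply_scd` helper are all assumptions chosen for illustration, and the sketch processes one row at a time rather than in parallel.

```python
from datetime import date

# Hypothetical role assignments; in Centerprise these are defined in the SCD UI.
BUSINESS_KEY = "customer_id"
SCD1_FIELDS = ["email"]   # overwritten in place, no history kept
SCD2_FIELDS = ["city"]    # a change creates a new row version


def _new_version(dimension, incoming, today):
    """Build a new current row version with a simple sequential surrogate key."""
    row = dict(incoming)
    row.update(
        surrogate_key=len(dimension) + 1,
        effective_date=today,
        expiration_date=None,
        is_current=True,
    )
    return row


def apply_scd(dimension, incoming, today):
    """Apply one incoming record to a list of dimension rows; return the action taken."""
    current = next(
        (r for r in dimension
         if r[BUSINESS_KEY] == incoming[BUSINESS_KEY] and r["is_current"]),
        None,
    )
    if current is None:
        # Unknown business key: insert the first version of the row.
        dimension.append(_new_version(dimension, incoming, today))
        return "insert"
    if any(current[f] != incoming[f] for f in SCD2_FIELDS):
        # SCD2: expire the current row, then insert a new current version.
        current["is_current"] = False
        current["expiration_date"] = today
        dimension.append(_new_version(dimension, incoming, today))
        return "scd2"
    if any(current[f] != incoming[f] for f in SCD1_FIELDS):
        # SCD1: overwrite the changed values on every version of this key.
        for row in dimension:
            if row[BUSINESS_KEY] == incoming[BUSINESS_KEY]:
                for f in SCD1_FIELDS:
                    row[f] = incoming[f]
        return "scd1"
    return "skip"  # no material change


dim = []
record = {"customer_id": 1, "email": "a@example.com", "city": "Austin"}
apply_scd(dim, record, date(2010, 6, 1))                         # first load
apply_scd(dim, {**record, "city": "Dallas"}, date(2010, 6, 11))  # city changed
```

After the two calls, `dim` holds two versions of customer 1: the expired Austin row and the current Dallas row, which is the effective/expiration date plus current row flag versioning described earlier.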

Here is a simple dataflow

I encourage you to try this firsthand using the trial version of Centerprise and you will see the ease and speed of this component. Please contact us at sales@astera.com or 1-888-77-ASTERA for a trial download.

Centerprise 5 Beta Release is now available

Wednesday, June 2nd, 2010

I am pleased to announce that Centerprise Data Integrator 5 beta is now available for download. This release represents a major upgrade and adds sophisticated dataflow and workflow designers. We are very excited about this release and the great deal of value it can add to our customers’ business.

Centerprise 5 represents an attractive alternative to the exorbitant cost of traditional data integration vendors and to the not-ready-for-primetime products offered at the lower end of the market.

Over the past year, we have had extensive discussions with customers from a variety of industries including financial services, pharmaceutical, healthcare, utilities, and government. The Centerprise team used this feedback and other research to develop and improve the product from many perspectives. Usability, always a hallmark of Centerprise, has been improved further with the addition of flow designers that provide drag and drop capabilities, unlimited undo/redo, cut/copy/paste, and parameterization, among others. Performance has been enhanced by further increasing parallelism and optimizing a number of areas including database writes and file reads. A new set of APIs enables customers and partners to extend Centerprise by adding new sources, destinations, transformations, and custom functions.

Here are the key features:

  • A Dataflow designer that supports complex data integration flows and features a full complement of transformations including lookups, expressions, functions, aggregate, sort, join, normalize, denormalize, union, route, filter, and others.
  • Subflows to create reusable dataflow components that can be plugged into dataflows or other subflows.
  • Single-click WYSIWYG data view capability to preview data at any stage in the dataflow.
  • Integrated data quality validation and profiling.
  • Integrated drag and drop environment with unlimited undo/redo, cut/copy/paste, automatic layout building, auto map creation, one click element addition, and more.
  • A visual Workflow designer for defining job orchestration. The Workflow designer provides the functionality to define job sequence, routing, and dependencies.
  • Restart capability to resume a job from the point of failure.
  • Built-in workflow tasks to run a Dataflow, Workflow, SQL, or other programs, perform file system actions and FTP actions, send mail, and others. Additional tasks can be created using Centerprise APIs.
  • Built-in job scheduler to start jobs at recurring intervals including hourly, daily, weekly, and monthly. Jobs can also be triggered based on file drop and through APIs.
  • High-performance parallel processing engine optimized to deliver the performance and scalability required to efficiently process very high data volumes.

If you would like to participate in the Centerprise beta program, please register here:

Or you can call us at 1-888-77-ASTERA (1-805-579-004) or email sales@astera.com.

Upcoming Centerprise Upgrade and Microsoft.Net 4.0

Tuesday, April 27th, 2010

Earlier this month, Microsoft released Visual Studio 2010. Along with VS 2010, Microsoft also released .Net version 4.0, which brings new features and improvements in a number of areas. The Centerprise team has been working on a major new upgrade for some time now. Last fall, when Microsoft released a preview version of Visual Studio 2010, we evaluated it and decided to develop the next version of Centerprise on .Net 4.0.

For the next generation of Centerprise product, our key goals are:

• Powerful data integration functionality including enterprise grade dataflow and workflow designers, first class support for dimension and fact table loading, and extensive data quality features.
• A highly parallel data integration engine that scales to take advantage of the increasingly large number of CPUs and cores in today’s machines and delivers the performance and throughput needed to handle very large data sets.
• An extensible platform that makes it easy to add new sources, destinations, transformations, and functions while providing a secure runtime environment.

Centerprise has been a parallel processing engine since version 2.0. We have always focused on performance and scalability as key design goals. Centerprise’s parallel framework has provided an excellent foundation for the integration engine. Employing the new parallel programming extensions in .Net 4.0, we have made substantial improvements to our framework, and the entire product has become significantly more efficient and scalable. Parallelism pervades every aspect of the data integration engine, including new multithreaded algorithms for sorting, file reading and parsing, database writes, profiling, and transformations such as join, lookup, aggregation, and others.

We are planning to use the .Net extensibility framework in future versions of Centerprise to provide a secure and powerful platform for customers and third-party developers. This includes the ability to add new data sources, transformations, and workflow tasks.

Hermes will also feature an extensive set of .Net APIs to enable our customers and partners to integrate Centerprise as part of their solutions. This includes triggering and monitoring jobs, creating functions and custom transformations, adding new data sources and destinations, and much more.

New Workflow Coming to Centerprise

Tuesday, April 20th, 2010

As you may know, we’re busy working on the next edition of Centerprise Data Integrator.  Part of that effort includes a brand new way we will be handling data workflows.  In the currently released version of Centerprise, you pick your data source and then your destination.  Anything beyond this fixed flow is handled with a combination of plug-ins and batch transfers.  The upcoming version of Centerprise opens this up completely.

The new workflow in Centerprise will allow various actions to be laid out in a flowchart-like fashion and executed sequentially. Need to upload a file to an FTP site and then send out an email on the completion of your dataflow processing? No problem. Just drag and drop these two tasks onto our new workflow diagram and link them together in the desired sequence. You can create pretty much any workflow process imaginable with this new flexibility and, as always, we’re allowing for custom workflow actions written in .NET for further extensibility. Our very propensity for extensibility is a big part of why we’re adding this feature set to the product.

In the last two iterations of Centerprise, customers have taken advantage of our API and written some pretty clever plug-ins to, in essence, write their own workflow into the data transfer process. While we’re glad our customers find this feature useful, we know a need when we see one. So we’re introducing the usual workflow suspects such as FTP, Email, Run SQL Statement, and Run Exe, as well as some looping and switching mechanisms. We’re very excited about this effort and we think you will be too. Let us know what you think and tell us anything you’d like to see in our new workflow engine.

[Image: workflowshot]

The image above shows a simple example. In this scenario, the first task downloads a file from an FTP directory (all properties for this task are set in a separate editor in exactly the same way as for dataflow tasks). After the file is downloaded from the FTP directory, it is copied to a folder that services a dataflow, which is the third task in the queue. The next action is the “Decision”, a simple expression that routes the flow depending on the results of the dataflow. If the dataflow was successful, an SQL script is run on a database. If not, an email is sent to the appropriate individual.
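For readers who think in code, the same five-step sequence can be sketched as ordinary control flow in Python. This is only an illustration of the routing logic, not how Centerprise represents or runs workflows; `run_workflow` and its five callable parameters are hypothetical names, and the tasks are passed in as functions so the sketch runs without an FTP server or database.

```python
def run_workflow(download_file, copy_file, run_dataflow, run_sql, send_email):
    """Run the tasks in the order described above; return the names of the tasks executed."""
    executed = []

    download_file()              # task 1: fetch the file from the FTP directory
    executed.append("download")

    copy_file()                  # task 2: copy it to the folder the dataflow reads from
    executed.append("copy")

    succeeded = run_dataflow()   # task 3: assumed to return True on success
    executed.append("dataflow")

    # "Decision" task: route on the dataflow result.
    if succeeded:
        run_sql()                # success branch: run the SQL script
        executed.append("sql")
    else:
        send_email()             # failure branch: notify the appropriate individual
        executed.append("email")
    return executed


def noop():
    """Stand-in for a real task."""


success_path = run_workflow(noop, noop, lambda: True, noop, noop)
failure_path = run_workflow(noop, noop, lambda: False, noop, noop)
```

Here `success_path` ends with the SQL task and `failure_path` ends with the email task, mirroring the two branches out of the Decision.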

It’s pretty simple and extremely flexible.  This combined with the fact that you can write your own workflow actions should make this component a lifesaver.  I’ll be writing more about this component in the coming weeks.

Usability Matters

Friday, April 16th, 2010

I often read user surveys on corporate software products and see usability ranked fourth or fifth as a selection criterion for these products. Is it any wonder then that corporate software still remains difficult to learn and use even as consumer hardware and software products are undergoing a major revolution in usability?

If you compare today’s phones, consumer-oriented websites, and other technology products with those from just five years ago, you will notice remarkable leaps in the overall user experience. On the other hand, most business software products look like they were designed around the early 1990s or before. It is not uncommon to have to go through bulky user manuals and weeks-long training before these products can be used.

While researching data integration products recently, I found that many of these products were difficult to use and had a significant learning curve. Some of the products I came across had over 200 buttons visible on the screen!

I believe that the benefits of superior usability are generally understated and underestimated. A well-designed product improves productivity in ways small and large. An intuitive product seems familiar even to a new user. In most cases, you just know how to use it. For companies where job functions are frequently distributed between business users and IT staff primarily because business users do not have expertise in certain tools, a well-designed product can blur those lines by enabling business users to perform these tasks directly.

At Astera, we put a great deal of emphasis on making our products easy to learn and use. Our goal is to build data conversion and data integration products that can be used by business experts while providing all the hooks and features required for serious development. With “Hermes”, the upcoming Centerprise version, we have preserved the same intuitive, clutter-free user interface while adding a great deal of power.

For instance, dataflow and workflow designers support cut, copy, paste, unlimited undo/redo, dynamic creation of layouts, global replacement for database connections and file paths, and numerous other usability features that are notably absent from other data integration products.

Other notable features are one click WYSIWYG instant preview and Quick Profile. Data Preview and Quick Profile are invaluable debugging aids that speed up your development and testing.

Data transformation in a data integration job using the next Centerprise version, “Hermes”

Tuesday, April 13th, 2010

Most data integration jobs revolve around gathering data from disparate sources, cleaning it, making it conform to standards by means of data mapping and data transformations, and then sending it to a destination such as a data warehouse, a data mart, or an operational data store. One of the key challenges here is the transformation, or the ‘T’ of ETL. Hermes offers an exhaustive set of built-in data field mapping and data record transformation constructs. In this post I am going to talk about field mapping. I’ll talk about record-level transformations at a later date.

Data Field Mapping

In a regular migration job, one needs to map the source to the target. Based on the layout of the destination, the user picks which field of the source is going to be mapped to which field of the destination. But this straight movement is not enough in most cases. Hermes offers various ways to map data fields, apart from the direct field-to-field map.

To illustrate other forms of mapping, I have taken an example of a simple data migration job. This scenario involves transfer of a company’s customers’ information from the first quarter of the year to a database table. A snapshot of the visual data-flow design looks like this -

I am going to describe the functionality of each of the data field mapping types, followed by how I have used them in this sample.

Expression Map

Backed by Astera’s rules engine, expression map provides the capability to write Excel-like expressions and do complex calculations based on source fields and then assign them to a destination field.

In our example, source data comes with individual pieces of information about the customer address, but our destination data table has only one column for address and it takes the full address. I have used the expression map for this purpose and combined the five source fields – Street_address, city, state, zip, and country – into one full address and then assigned it to the destination.
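The effect of that expression map is roughly equivalent to the Python sketch below. It does not use Centerprise's expression syntax, which this post does not show; `full_address` is a hypothetical helper, the sample record is made up, and the comma separator is an assumption. Only the five source field names come from the example.

```python
ADDRESS_FIELDS = ("Street_address", "city", "state", "zip", "country")


def full_address(record):
    """Join the five address parts into one string, skipping missing or blank parts."""
    parts = (str(record.get(name) or "").strip() for name in ADDRESS_FIELDS)
    return ", ".join(part for part in parts if part)


# Made-up sample record for illustration.
customer = {
    "Street_address": "1 Main St",
    "city": "Springfield",
    "state": "IL",
    "zip": "62701",
    "country": "United States",
}
address = full_address(customer)
```

Skipping blank parts avoids stray separators such as ", ," when a source record is missing its state or zip.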

List Lookup

A list lookup stores the lookup list with the metadata and uses it to look up a destination value for each incoming source value.

The source data in the example contains the full country names, but the destination table accepts only a two-letter code for the country. I have used a list lookup for this purpose and created a lookup with entries such as UK for United Kingdom, US for United States, and so on. This lookup is attached to the country field of the source, and the result of the lookup goes to the country field of the destination.

Hermes offers another way to store this lookup information, called a database table lookup. A database lookup essentially works the same way as a list lookup, the only difference being that the lookup list is stored in a database table.
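Conceptually, a list lookup behaves like a small dictionary kept alongside the mapping. The Python sketch below uses the two entries mentioned above; `COUNTRY_CODES`, `lookup_country`, and the `default` fallback for unmatched values are assumptions for illustration, not Centerprise features described in this post.

```python
# Lookup list from the example: full country name -> two-letter code.
COUNTRY_CODES = {
    "United Kingdom": "UK",
    "United States": "US",
}


def lookup_country(name, default=None):
    """Return the code for a country name, or `default` if the name is not in the list."""
    return COUNTRY_CODES.get(name.strip(), default)


code = lookup_country("United Kingdom")
missing = lookup_country("Atlantis", default="??")
```

A database table lookup would follow the same pattern, with the dictionary replaced by a query against the table that holds the list.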

SQL Statement Map

Using the SQL Statement map, you can run a SQL query or a stored procedure with some of its parameters taking their values from source fields; the output is then assigned to the destination.

For the example, one requirement was to get the total sales amount for the quarter for each of the customer records, where sales data is stored in a separate table. I have created an SQL query for this purpose that reads –

select TotalSales from ActiveCustomers where contactname = '@CustomerId'

Now, I connect the CustomerId from the source to this SQL Map, where it works as the parameter for the SQL query. The map gets the total sales for the customer and assigns the value to the destination field TotalSales.
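Outside of Centerprise, the same per-record parameterized lookup can be sketched in Python with the standard `sqlite3` module. The table and column names come from the query above; the in-memory database, the sample row, and the `total_sales_for` helper are assumptions for illustration, and the `?` placeholder is SQLite's parameter syntax rather than the `@CustomerId` form used by the SQL Statement map.

```python
import sqlite3


def total_sales_for(conn, customer_id):
    """Run the lookup query for one source record; return TotalSales or None if no row matches."""
    row = conn.execute(
        "SELECT TotalSales FROM ActiveCustomers WHERE contactname = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


# Made-up data so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ActiveCustomers (contactname TEXT, TotalSales REAL)")
conn.execute("INSERT INTO ActiveCustomers VALUES (?, ?)", ("ALFKI", 1250.0))

sales = total_sales_for(conn, "ALFKI")
```

Binding the value as a parameter, instead of pasting it into the SQL string, keeps source data from being interpreted as SQL.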

Function Map

A function map offers a list of Financial, Name Parsing, Regular Expression, Date time, String, Logical, and Conversion functions, where these functions take input from the source fields and the output of the function can be assigned to a destination field.

I have used the GetLastName function from the function list to get the last name of the customer. As you can see in the picture, this map takes the ContactName from the source and assigns the result (the last name, obtained by parsing the contact’s full name) to the LastName field of the destination.
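As a rough idea of what such a name-parsing function does, here is a deliberately naive Python stand-in. It is not the implementation of Centerprise's GetLastName, whose exact rules this post does not describe; `get_last_name` simply takes the last whitespace-separated token and makes no attempt to handle suffixes or "Last, First" formats.

```python
def get_last_name(full_name):
    """Return the last whitespace-separated token of a name, or "" for a blank name."""
    tokens = full_name.split()
    return tokens[-1] if tokens else ""


last_name = get_last_name("Maria Anders")
```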

Constant Value Map

A constant value map can be used to assign a constant value to a destination field.

For the example transfer of customer records, we needed to record which quarter the data belongs to, so I have used a constant value map. It assigns the constant value “Q1 2010” as the data period for each customer.

Using this sample, we looked at different field mapping options available in Hermes to transform the source data at the field level before putting it into a destination. Hermes offers several record level transformations, such as merge, sort, union, join, distinct, etc.  I’ll talk about the record level transformations in my next blog entry.

Upcoming Centerprise Upgrade

Thursday, April 1st, 2010

We are working on developing the next generation of the Centerprise platform. Since launching Centerprise in early 2008, we have made continual improvements to the product, incorporating feedback from customers in numerous industries.

The next version, code named “Hermes”, represents a major upgrade of the product providing high-end dataflow, workflow, and data quality features. We have used feedback gathered over the past two years to redesign the user interface and introduce a number of new concepts and features.

High performance and superior usability have been the hallmarks of Centerprise from day one. Every feature is continually tested for usability and refined to ensure ease of learning and use. The server features a parallel processing engine to deliver high performance and scalability. Hermes continues that tradition.

We are targeting Q3 for the production release. Preview releases are planned for early May. Initially, we will be working with select customers, and later on we will expand the preview program to a larger group.

Over the next few months, the Centerprise team will use this blog to discuss various aspects of the product, including features, performance and scalability, programmability, and technology.

Here are the key characteristics of the upcoming release:

• A scalable and multithreaded engine that represents the state of the art in parallel processing. The engine has been designed to support massive parallelism with minimal blocking or starvation. This means that Centerprise scales to support ever-increasing volumes and takes full advantage of today’s multicore and multiprocessor hardware.

• Drag-and-drop Dataflow Designer enabling creation of sophisticated dataflows. Dataflow features include join, sort, merge, union, route, normalize, high speed database loading, slowly changing dimension support, change data capture, and much, much more.

• Workflow designer to support job sequencing and dependencies.

• An intuitive, clutter-free user experience that greatly improves productivity and affords a short learning curve.

• Extensive data quality features including rule-based data quality checks, data correction, data profiling, access to error information while mapping, and more.

Over the next few months, the team will be discussing these features in greater detail.