Big Data Toronto 2018: Post-Event Highlights

Team Astera participated in Big Data Toronto 2018 as an exhibitor and had a truly great experience networking with data scientists, practitioners, and thought leaders from across North America and beyond. We took the opportunity to introduce our end-to-end, agile data warehouse solution, DWAccelerator, and received a great response from the attendees.

Event Overview

Going strong in its third year, Big Data Toronto is among the most acclaimed big data and analytics conferences and expos in Canada. Over 4,000 innovators, 60+ exhibiting brands, and 100+ speakers gathered under one roof to exchange technical insights and share an outlook on future technology trends and best practices in AI and data science.

The conference, held at the Metro Toronto Convention Centre, was a two-day event covering key topics including digital transformation, data governance, data management, predictive analytics, advanced machine learning, cybersecurity and privacy, and more.

Highlights of the Event

Big Data Toronto 2018 was a thought-provoking and insightful journey for our team. Here is what we learned from the conference:

Agile Data Warehousing Practices

Data warehousing, as a field, remains highly relevant for businesses in the big data era, although expectations have changed significantly over time. Agility in the design and development of data warehouses has become a key requirement. Businesses are diligently seeking solutions that accelerate core data warehousing functions, such as data integration, analysis, and cleansing for business intelligence.

Intelligent Data Extraction

Businesses are increasingly interested in agile data warehousing solutions that offer effective data extraction from complex structured and unstructured sources, as well as traditional ones. They need accurate, intelligible results quickly, so that business intelligence tools can deliver analytical reports that let them adapt to changing market dynamics and make sound decisions about future endeavors.

DWAccelerator – The Solution to Major Data Warehousing Problems

Massive setup costs, time-consuming processes, and a lack of agility and flexibility are some of the key obstacles encountered during BI project implementation. DWAccelerator can very well be the answer to these issues. It is an all-round solution that instills agility and automation in data warehouse design and development without requiring the user to write a single line of code.

Our COO, Jay Mishra, conducted a session on “Accelerating your Data Warehouse Project” and shed light on the problems businesses face when setting up a data warehouse and the ways to speed up the process. He presented DWAccelerator and showed that while a data warehouse built with traditional techniques takes anywhere from a few weeks to several months, it can be done in under an hour with our automated solution. Standout features of DWAccelerator, such as data model-based mapping, automatic field matching, flow generation, and the concept of virtual data models, were well received.

We Met Prospective Partners and Clients

The innovative features and powerful data warehousing capabilities of DWAccelerator garnered much attention from attendees. We talked to many data professionals and showcased how our product can automate the entire data warehousing process and cut down the costs and time required. This generated a lot of interest in our product, inspiring several new partners and prospective clients to join hands with Astera.

Overall, Big Data Toronto 2018 offered a plethora of insights and concepts that are likely to pave the way for future technologies and trends. In addition, Canada’s #1 big data and AI conference proved to be a great opportunity for us to meet new clients and partners and gain exposure for the latest addition to our product line, DWAccelerator.

PDF-Based Data Extraction Made Easy with ReportMiner

Businesses have long used the PDF format for exchanging data because of its convenience and reliability; commonly exchanged PDF documents include purchase orders, invoices, financial statements, and valuation reports. However, manually extracting data from PDFs is a challenging task. In this blog, we discuss how businesses can liberate important business data from PDFs with automated PDF data extraction.

Challenges of PDF Data Extraction

Many businesses find data extraction from PDF documents challenging because the data is unstructured. Previously, businesses relied on the IT department to perform this task, which increased the burden on IT personnel and led to delays in data exchange.

In most cases, the requirement is to extract data not from a single file but from a batch of similarly structured files. Manual extraction is then not only time-consuming but also error-prone. A data extraction tool can reduce the manual effort required and save time by automating extraction from PDF documents.

Since an organization receives PDF documents in different formats, such as scanned PDFs, text-based PDFs, and PDF forms, a desirable data extraction solution should be able to deal with all of them.
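
As a rough illustration of what automating the batch case involves, here is a minimal sketch using the open-source pypdf library (not an Astera product); the folder and file names are hypothetical:

```python
from pathlib import Path

from pypdf import PdfReader  # open-source PDF library, used here for illustration

# Pull the raw text out of every similarly structured, text-based PDF in a
# folder, the step an extraction template would then parse into fields.
def extract_text(pdf_path: Path) -> str:
    reader = PdfReader(str(pdf_path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)

for pdf_path in sorted(Path("invoices").glob("*.pdf")):  # hypothetical folder
    text = extract_text(pdf_path)
    print(f"{pdf_path.name}: {len(text)} characters extracted")
```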

How ReportMiner Makes PDF-Based Data Extraction Painless

Astera offers a data extraction solution for all PDF-based documents. ReportMiner’s automated data extraction features make it easy to create and deploy end-to-end integration solutions for any use case involving data extraction from PDF sources.

Featuring a user-friendly interface, the solution is designed around a visual, drag-and-drop environment and does not require any coding or scripting.

  • Text-based PDFs: ReportMiner can read directly through text-based PDFs and extract the required data based on the designed extraction template.
  • Scanned or Image-only PDFs: Some source documents that companies receive, such as scanned invoices, are image-only PDFs. ReportMiner’s OCR capability creates a text equivalent of the images stored in these documents; from that point onward, the extraction process is identical to that for text-based PDFs (see the sketch after this list).
  • PDF Forms: In some cases, businesses also deal with PDF Forms to collect important information such as customer details. ReportMiner enables extraction of data from these forms and makes critical business data available for further use.
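
For the scanned, image-only case, the general OCR step looks roughly like the sketch below, shown with the open-source pdf2image and pytesseract libraries rather than ReportMiner’s own OCR engine; the file name is hypothetical:

```python
from pdf2image import convert_from_path  # renders PDF pages as images (needs poppler installed)
import pytesseract                       # wrapper around the Tesseract OCR engine

# Render each page of an image-only PDF and OCR it into plain text.
# From here on, extraction proceeds exactly as for a text-based PDF.
def ocr_pdf(path: str) -> str:
    pages = convert_from_path(path)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)

text = ocr_pdf("scanned_invoice.pdf")  # hypothetical scanned invoice
```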

Crucial business data is often trapped in PDF documents. ReportMiner enables businesses to liberate data from different types of PDFs with its extensive data extraction features. Streamlined PDF data extraction, combined with the ability to automate the process, helps businesses save time and gain access to mission-critical information promptly.

Download our whitepaper, ‘Liberating Data from PDF Documents’, to learn how ReportMiner can help businesses extract business data for further processing.

The Dilemma of Build vs. Buy – How It Applies to Enterprise Software

If Shakespeare were an IT manager, the famous question ‘To be, or not to be’ would have been ‘To build, or to buy’. In fact, the phenomenon of DIY-ing something versus buying a commercial product is not limited to enterprise software. IKEA runs an entire business providing utility to ‘build’ proponents and DIY enthusiasts. But while building furniture can be fun, building enterprise-level software is not.

Build vs Buy

Like any other business decision, the choice between building software in-house and buying a commercial product is significantly influenced by the total cost of each approach and the return on investment. If you’re facing this dilemma, the comparison below summarizes the prospects and consequences of both approaches.

Cost of deployment
  • Build: Hiring a team of developers, designers, and programmers to build the solution
  • Buy: License fee of the product plus deployment costs

Time to market
  • Build: Time to develop the product, perform QA analysis, fix any patches or bugs found, and deploy the solution
  • Buy: Product development, QA analysis, and patch fixes are already handled by the solution provider, so the solution can be deployed directly; the only time required is for configuring and installing the product

Ongoing maintenance and support costs
  • Build: A dedicated team of IT professionals must be on board for ongoing product support and maintenance
  • Buy: Updates, maintenance, and customer support are handled by the solution provider, though the provider may charge a fee for these services

Learning curve
  • Build: A steep learning curve is usually associated with a custom-developed product
  • Buy: Commercial products are designed for a wide audience with varying levels of technical skill, so in most cases they are more intuitive and user-friendly

When is ‘Building’ the right approach?

Building software in-house will benefit your business if:

  • The software is going to give you sustainable competitive advantage
  • No other available solution can meet your business needs
  • The endpoints from which your business collects data are not volatile or prone to frequent change
  • You have substantial resources to cover the costs associated with building and maintaining the software

When is ‘Buying’ the right approach?

You should opt for buying commercial software if:

  • Building software is not your core business and will not yield any competitive advantage
  • You have limited resources and you would rather invest them in improving your core business activities
  • There are solutions available that address the challenges your business is facing
  • You are looking for a quick solution that can be immediately deployed

The IT manager at Brickell Bank, formerly known as Espirito Santo Bank, faced challenges migrating broker data from an MS Access database to an IBM mainframe data warehouse. Learn more about the approach he opted for and the other factors that influence the build vs. buy decision by downloading the free white paper.

An Automated Approach to Modeling Your Slowly Changing Dimensions

Business data is inherently susceptible to change with the passage of time, and those changes affect the business in different ways. In a data warehouse, the effect of time on dimensions and facts requires careful study if the repository is to meet the business intelligence objective of delivering up-to-date information to decision makers.

The question is: how best to handle these changes?

Developing a dimensional model that captures the different states of your data with respect to time is a key objective of an Enterprise Data Warehouse. For measures in fact tables, we can use date dimensions and link them with foreign keys. For dimensions, the complexity of handling changes increases greatly: each step of the Slowly Changing Dimension (SCD) flow must traditionally be hand-coded using multiple, complex SQL statements. The implementation is lengthy and complex, and it affects the business’ ability to maintain its data quickly and reliably – which is always a critical consideration.
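
To see why, here is a minimal sketch of what a single hand-coded Type 2 change involves, using SQLite for brevity and hypothetical table and column names; a production implementation would also need batching, transactions, and performance tuning:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id    TEXT,                               -- business key
    contact_name   TEXT,                               -- SCD2: history is tracked
    effective_from TEXT,
    effective_to   TEXT,                               -- NULL means still current
    is_current     INTEGER
);
""")

def apply_scd2(conn, customer_id, new_name, load_date):
    """Expire the current row if the tracked column changed, then insert a new one."""
    row = conn.execute(
        "SELECT contact_name FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1", (customer_id,)).fetchone()
    if row is not None and row[0] == new_name:
        return  # no change detected, nothing to do
    if row is not None:
        # Mark the existing record as expired by date and status.
        conn.execute(
            "UPDATE dim_customer SET effective_to = ?, is_current = 0 "
            "WHERE customer_id = ? AND is_current = 1", (load_date, customer_id))
    conn.execute(
        "INSERT INTO dim_customer (customer_id, contact_name, effective_from, "
        "effective_to, is_current) VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_name, load_date))
    conn.commit()

apply_scd2(conn, "C001", "Maria Anders", "2018-06-01")
apply_scd2(conn, "C001", "Ana Trujillo", "2018-07-15")  # expires the old row, inserts a new one
```

And this sketch covers only one tracked column on one table; real dimensions multiply this logic across many columns, update strategies, and performance considerations.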

Slowly Changing Dimensions in Centerprise

Compared to the traditional hand-coded approach to the slowly changing dimension flow, Astera offers an automated implementation using a completely drag-and-drop interface. Source data is mapped to an SCD object in Centerprise, which pushes system-generated SQL statements directly to the target data warehouse (read: Pushdown Optimization Mode in Centerprise) based on the field layouts defined by the user. Each column in the user’s table can be designated as Surrogate Key, Business Key, SCD1, SCD2, and so on (see below) within the component’s properties in Centerprise. The platform automatically handles the update strategy, performance considerations, routing, and complex joins on the backend, as long as the SCD field types shown in the screen below are defined correctly.

Field Layout - Slowly Changing Dimensions component

SCD Object Properties in Centerprise

Automating Type 1 & 2 Slowly Changing Dimension Implementation

Centerprise supports both Type 1 and Type 2 SCDs, updating records without and with maintaining history, respectively.

SCD Type 1

This type deals with updates to the dimension table when preserving history is not a consideration and you need to replace old values in your table with recent ones.

To use SCD Type 1 in Centerprise, mark your column as ‘SCD1 – Update’ in the Layout Fields menu of the SCD object, as seen in the screenshot above for the ‘Contact Title’ column.
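
Conceptually, Type 1 amounts to a plain overwrite of the tracked column. Here is a hedged sketch of the equivalent SQL, again with hypothetical names and SQLite for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id TEXT, contact_title TEXT)")
conn.execute("INSERT INTO dim_customer VALUES ('C001', 'Owner')")

# SCD Type 1: the old value is simply replaced; no history row is kept.
conn.execute(
    "UPDATE dim_customer SET contact_title = ? WHERE customer_id = ?",
    ("Sales Manager", "C001"))
conn.commit()
```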

SCD Type 2

This type deals with changes in your dimension that need to be tracked. A new record is inserted with each change, and the existing record is marked as expired, by date, version, or status.

To use SCD Type 2 in Centerprise, mark your chosen column as ‘SCD2 – Update and Insert’, as seen in the screenshot above for the ‘ContactName’ column; behind the scenes, this drives the compare, expire, and insert steps sketched earlier.

Push-Down Optimization

Once the layout is defined and the flow executed, the Astera SCD transformation generates the SQL code necessary to compare, join, route, and insert data in your target dimension and pushes the transformation logic down to the database for processing.

Using this approach, the maintenance of large dimensions is significantly faster because all the processing is done by the database, rather than the Centerprise server performing the operations and going back and forth with the database to read, compare, and write the data.

To learn more about the automated Slowly Changing Dimensions component in Centerprise and how to use it to manage your dimensions, download the white paper: How to Manage Slowly Changing Dimensions Using Centerprise.

Pushdown Optimization Mode in Centerprise Data Integrator

How does Pushdown Optimization mode work in Centerprise?

Moving data containing millions of records between a source, an ETL server, and a target database can be a time-consuming process. When the source and target databases reside on the same server, unnecessary data movement and delays can be prevented by applying transformations in pushdown optimization mode.

Pushdown optimization mode pushes the transformation logic down to the source or target database: the Centerprise integration server translates the applied transformation logic into automatically generated SQL queries. This eliminates the need to extract data from the source, migrate it to staging tables on an ETL server for transformation, and then load the transformed data into the target database. As a result, performance improves significantly and data is readily available to end users.
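
The following sketch illustrates the idea in miniature, independent of Centerprise’s actual generated SQL: the transformation runs inside the database as a single INSERT ... SELECT instead of rows being staged on a separate ETL server (table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_orders (order_id INT, quantity INT, unit_price REAL, status TEXT);
CREATE TABLE target_orders  (order_id INT, total_amount REAL);
INSERT INTO staging_orders VALUES (1, 2, 9.99, 'complete'), (2, 1, 5.00, 'pending');
""")

# The whole transformation is one statement executed in-database:
# no rows ever leave the database engine.
conn.execute("""
    INSERT INTO target_orders (order_id, total_amount)
    SELECT order_id, quantity * unit_price   -- transformation logic runs in-database
    FROM   staging_orders
    WHERE  status = 'complete'
""")
print(conn.execute("SELECT * FROM target_orders").fetchall())  # [(1, 19.98)]
```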

Types of Pushdown Mode

There are two types of pushdown optimization modes:

  1. Full pushdown optimization mode
  2. Partial pushdown optimization mode

In full pushdown optimization mode, the Centerprise integration server executes the entire job in pushdown mode. In partial pushdown mode, the transformation logic is pushed down to either the source or the target database, depending on the transformation logic and the database provider.

Database Providers supported in Pushdown Mode by Centerprise

Centerprise supports the following database providers:

  1. MySQL
  2. SQL
  3. Oracle
  4. Postgres
  5. MSSQL

Verify Pushdown Mode

Certain transformation logic cannot be executed in pushdown mode. The ‘Verify Pushdown Mode’ feature in Centerprise identifies the transformation logic that can be pushed down to the source or destination database.

To learn more about Pushdown Optimization mode in Centerprise and its use cases, download the white paper Centerprise Automated Pushdown Optimization.

Optimizing Business Capabilities with Data Integration Software

Businesses are increasingly adopting a data-driven culture. The significant surge in the volume of exchanged data indicates that the trend is creating a paradigm shift – a shift from a manufacturing economy to an information economy. To put this in perspective, Google processes petabytes of information every hour, and The Economist recently declared data the world’s most valuable resource, ahead of oil.

“The world’s most valuable resource is no longer oil, but data.”

-The Economist

But the true utility of any resource comes from its consumption, or the value it delivers to consumers. The same principle applies to data. To gain maximum utility from data, businesses must be able to integrate incoming data from disparate sources quickly and reliably, and make that information available to the relevant stakeholders, both internally and externally. Your business needs a data integration tool to perform this task efficiently.

A data integration tool can optimize your current business capabilities in the following ways:

By extracting data from structured and unstructured sources

Incoming data can be structured, semi-structured, poly-structured, or unstructured. For instance, many organizations exchange information via text-based PDF files, PDF forms, and scanned PDF images. But the data contained in PDF files is unstructured and must be extracted before it can inform crucial business decisions. A data integration tool can automate the extraction process and integrate the extracted data with internal systems for further processing and analysis.

By integrating data from hierarchical files

Integrating data from flat files is comparatively easy, but business users face challenges when they try to extract, parse, and integrate information from hierarchical data files such as XML, JSON, EDI, and COBOL. To perform hierarchical data integration, business users rely on IT, which increases the burden on IT teams. A data integration tool can effectively bridge this gap between business executives and IT.
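
As a small, Centerprise-independent illustration of what such hand-coding involves, the sketch below flattens a nested JSON order document into tabular rows; the structure and field names are hypothetical:

```python
import json

# One order with a nested customer object and a repeating items array.
order = json.loads("""
{"order_id": 1001,
 "customer": {"name": "Acme Corp"},
 "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 5}]}
""")

# Flatten: one output row per line item, with parent fields repeated.
rows = [
    {"order_id": order["order_id"],
     "customer": order["customer"]["name"],
     "sku": item["sku"],
     "qty": item["qty"]}
    for item in order["items"]
]
print(rows)  # two flat rows ready to load into a relational table
```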

Learn how Centerprise Data Integrator enables business users to work with hierarchical data, without the need for custom coding and programming, by downloading the whitepaper Hierarchical Data Integration for Business Users.

By making data readily available to business users

A data integration tool with a user-friendly interface and a comprehensive library of built-in functions can help limit reliance on IT. It makes data readily available to business users, who can then work with the information and derive business insights without delay. Additionally, data integration tools can automate the ETL process, which eliminates the need for manual integration and significantly reduces the chance of errors.

The performance of a business is optimized when the executives are more focused on making critical business decisions rather than collecting and integrating the data.

By checking data quality

A data integration tool cleanses and validates incoming data, ensuring its trustworthiness. Poor-quality data can adversely affect business insights, which can prove expensive for the business.

Overall, a data integration tool that simplifies the ETL process is an investment organizations should make to stay relevant in today’s data-driven business environment. It can benefit the business in more ways than one. By bridging the gap between IT and business executives, it enables an efficient division of workload. It empowers business users to derive insights from data by giving them prompt access to it. And when executives delegate the task of data integration and extraction to software, they can focus on more critical aspects of the business. The result is faster and more accurate business decisions, minimized costs, and increased revenue.

Astera’s Centerprise Data Integrator is a complete data integration solution that provides these benefits, and more, to its users. Its user-friendly interface and visual drag-and-drop environment eliminate the need for manual scripting and enable business users to work with data without relying on IT. Contact Astera’s sales and support teams for more information.

Streamlining Data Extraction

Most crucial business data is stored in unstructured formats, while machines require structured data for processing. Businesses need data extraction tools to bridge this gap.

Data extraction has evolved with technology, from manual extraction to complete automation. Constant innovation in this field is making data extraction easier, more flexible, and more scalable for users.

Automation of Data Extraction

Previously, organizations depended heavily on manual data extraction. In some cases, the IT department was responsible for writing custom scripts to extract data points; in others, employees manually read through every document to extract data. In both cases, the data required further massaging based on the needs of end users, delaying business decisions.

Today, the key goal of a data extraction tool is to automate the entire process for its users. Template-based data extraction is a popular route to automation that gives users greater control: incoming documents are converted using extraction templates that can be reused for documents with similar layouts. Moreover, modern tools provide a Graphical User Interface (GUI) for creating these templates, enabling business users to extract data on their own, without scripting or coding.
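
Here is a toy sketch of the template idea, independent of any particular product: a reusable pattern pulls the same fields from every document that shares a layout (the invoice layout and field names are hypothetical):

```python
import re

# A reusable "extraction template" for one invoice layout.
INVOICE_TEMPLATE = re.compile(
    r"Invoice No:\s*(?P<number>\S+).*?"
    r"Date:\s*(?P<date>\d{2}/\d{2}/\d{4}).*?"
    r"Total:\s*\$(?P<total>[\d,.]+)",
    re.DOTALL)

def extract(text):
    """Apply the template to one document; returns a dict of fields or None."""
    match = INVOICE_TEMPLATE.search(text)
    return match.groupdict() if match else None

sample = "Invoice No: INV-204\nDate: 06/15/2018\nTotal: $1,250.00"
print(extract(sample))  # {'number': 'INV-204', 'date': '06/15/2018', 'total': '1,250.00'}
```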

Beyond templates, technologies like Natural Language Processing (NLP) enable computers to understand free-form text and make it analyzable through part-of-speech tagging, deep learning, text analytics, and other methods. Tools that leverage Machine Learning (ML) use algorithms to understand text structures and word morphology.

Automating the data extraction process brings several benefits for businesses. Some of them are listed below:

  • Saves Time and Effort

Reusability of extraction templates for similar documents saves time and effort.

  • Faster Decision Making

Data can be processed in real time. This makes meaningful data readily available for business analysis, ensuring faster decision-making.

  • Streamlined Document Processing

Data patterns are used for recognizing documents and can allow for automatic classification of documents.

Conclusion

Automation has reshaped the business landscape. In today’s dynamic environment, it is important for businesses to focus on the quality and accessibility of their data to stay ahead of competitors. Accurate data can be made available in real time through automation of the data extraction process.

For more information about how data extraction has evolved to meet modern business needs, download the whitepaper ‘State of the Art of Data Extraction’.

Common Challenges of COBOL Data Extraction and How Centerprise Addresses Them

Although technologies like Ruby, Hadoop, and cloud computing continue to dominate headlines, a large number of businesses still rely on legacy technologies. Many businesses, particularly those operating in the banking and insurance sectors, use COBOL-based solutions.

According to Reuters, over 220 billion lines of COBOL code are in use today. As a result, a tremendous amount of data remains tied up in legacy systems. For any legacy modernization and BI initiative to be successful, this data must be integrated, transformed, and offloaded onto an analytics platform.

While extracting data from COBOL-based legacy applications is essential for improved decision-making, it remains a challenge for most businesses for two primary reasons:

  • Shortage of COBOL Skills

There is a growing gap between the number of skilled COBOL programmers and the number of organizations relying on the language. The average age of COBOL programmers is 55, and 70 percent of universities favor newer technologies like Java, C++, Linux, and UNIX over COBOL.

  • Need for Custom Programming

Analyzing data by directly querying the mainframe is a complex process. It requires custom development and can therefore be time-consuming and costly, with billing based on MIPS.

To address these two challenges, businesses need a solution that can fuel their data integration efforts while ensuring data quality and reducing the need for hand-coded processes.

How Centerprise Facilitates COBOL Data Extraction

Centerprise is a complete data integration solution that allows users to import data from a variety of sources, including legacy systems, transform it, and write it to a destination of their choice. With its user-friendly, drag-and-drop interface and unparalleled data mapping capabilities, Centerprise makes the process of extracting data from COBOL-based systems simple, quick, and cost-effective.

Centerprise offers complete support for COBOL data extraction with the functionality to:

  • Read a COBOL File — Centerprise features a high-speed COBOL file reader that can efficiently process large COBOL files.
  • Parse a Copybook — The built-in copybook parser reads a COBOL copybook and automatically builds the layout; a minimal sketch of the kind of fixed-length parsing this automates follows the list. When a copybook is not available, users can import a COBOL data file as a fixed-length file and manually define field markers, data types, and numeric formats.
  • Identify USAGE, REDEFINES, and OCCURS — Centerprise offers support for different clauses used in a COBOL data file, including REDEFINES, OCCURS, and USAGE, such as COMP, COMP-3, COMP-5.
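
To give a feel for the work the copybook parser automates, here is a hand-rolled sketch that reads one fixed-length record according to a copybook-style layout; the field names, offsets, and picture clauses are hypothetical:

```python
# Layout derived from a (hypothetical) copybook:
#   05 CUST-ID    PIC X(6).
#   05 CUST-NAME  PIC X(20).
#   05 BALANCE    PIC 9(7)V99.   (implied decimal point)
LAYOUT = [
    ("cust_id",   0,  6),
    ("cust_name", 6, 20),
    ("balance",  26,  9),
]

def parse_record(record: bytes) -> dict:
    fields = {}
    for name, offset, length in LAYOUT:
        fields[name] = record[offset:offset + length].decode("ascii").strip()
    # Apply the implied decimal point from the V99 picture clause.
    fields["balance"] = int(fields["balance"]) / 100
    return fields

sample = b"C00042ACME SUPPLY CO      000123450"
print(parse_record(sample))
# {'cust_id': 'C00042', 'cust_name': 'ACME SUPPLY CO', 'balance': 1234.5}
```

Real copybooks add REDEFINES, OCCURS, and packed (COMP-3) fields, which is exactly where hand-coding becomes error-prone.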

Once a COBOL data file has been imported, users can leverage the code-free, drag-and-drop environment of Centerprise to transform and write data to a destination of their choice.

Download our whitepaper to learn how Centerprise can help you combine legacy COBOL data with modern data streams and get a unified view of your information assets.

Automate Partner Data Exchange and Integration through Astera’s Customer Portal

Effective business intelligence demands proper data collection and integration processes. Modern businesses receive data in different formats over a variety of communication protocols, and efficient collaboration and tracking of data exchanges are more important than ever before.

As data volumes increase, the maintenance and management of multiple files from disparate sources takes its toll on IT teams. File-based partner data exchanges need to be fast, reliable, and secure, especially in large-scale operations. Manual data entry and file exchange over channels like email are no longer sufficient. Businesses need a centralized platform where customers, partners, suppliers, and employees can upload data files, ensure compliance, and track file status.

Enter the new Astera Cloud Customer Portal.

Key features include:

Centralized Uploading

Partners upload data files directly to the portal URL using their unique login credentials. Validation and cleansing take place automatically on the backend Centerprise server using custom integration flows, and a response file is generated at the end for both the admin and users.

File Tracking

The Customer Portal comes with built-in file tracking: everyone involved stays up to date on file statuses in real time via the Dashboard. For unsuccessful uploads, the response file contains error codes and explanations. Status emails can also be configured for all account types. This ensures all necessary files are uploaded correctly after data validation, allowing the final workflow to run as scheduled.

Security

Secure Sockets Layer (SSL) and file encryption are supported on the Customer Portal. Uploaded encrypted files are automatically decrypted, processed, and then re-encrypted by Centerprise before being sent back.
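
As a rough illustration of that decrypt-process-re-encrypt cycle, here is a sketch using the open-source cryptography library’s Fernet symmetric encryption; the portal’s actual mechanism is not specified here, and the file contents are hypothetical:

```python
from cryptography.fernet import Fernet  # open-source symmetric encryption

# One shared key for the illustration; real deployments manage keys separately.
key = Fernet.generate_key()
f = Fernet(key)

uploaded = f.encrypt(b"customer_id,amount\n42,100.00\n")  # partner encrypts the file
plaintext = f.decrypt(uploaded)                           # portal decrypts it
processed = plaintext.upper()                             # placeholder for real processing
response = f.encrypt(processed)                           # response is re-encrypted
```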

Scalability

The Customer Portal is built to handle unlimited data volumes and draws on Centerprise’s powerful scalability. Cloud servers can also be added for reliable load balancing.

White Label Capability

Astera Cloud’s Customer Portal is customizable, with the ability to use your company logo and domain name. Get all the functionality of Centerprise while maintaining brand cohesion.

For a live demo and information on pricing for our partner data exchange and integration solution, call us at +1 888-772-7837 or email us at sales@astera.com

Business Intelligence Tools and Centerprise

Astera Software’s Centerprise Data Integrator is an easy-to-use, robust data integration tool that can be used for ETL tasks, data quality, and profiling. Centerprise integrates seamlessly with various data visualization and reporting tools to create a full-stack Business Intelligence process.

What is BI Software?

BI software is typically designed to analyze, transform, and prepare data for reporting. BI tools use previously stored data, which may or may not reside in a data warehouse or data mart. They enable users to transform raw data into legible information that helps businesses make better decisions, which in turn drives growth and increased revenue.

Centerprise is a complete data integration solution that can be used as a standalone product or integrated with other BI tools, such as predictive analytics and data visualization tools, to achieve a complete data mining and analytics process.

Centerprise provides a scalable, powerful, and robust platform that supports many processes, including data extraction, data scrubbing, data warehousing, and data migration.

Some of the benefits of Centerprise include:

  • Support for complex hierarchical data
  • Easy-to-use graphical interface
  • Well-suited for business as well as technical users
  • High-performance and scalable
  • Extensive set of built-in data transformation features
  • Data profiling

If you’re interested in seeing Astera Software products in action, check out our monthly webinar series! You can find out more on the Astera Software Events page.