Current ETL/ELT Framework
ETL/ELT has so far been an integral component of BI implementations of all sizes. Its sole purpose is to capture, synthesize, and consolidate data from a wide variety of enterprise sources. Its ultimate goal is to manage a centralized data store from which business users can draw insights to monitor and manage the performance of their business. Within this framework, it is always a challenge to balance processing ever-increasing data volumes against delivering timely insights to end users. Moving data from Point A to Point D (in the diagram shown below), i.e. the Time to Insights, is undoubtedly the most critical factor in a successful BI implementation.
(ETL: Extract Transform Load; ELT: Extract Load Transform; BI: Business Intelligence)
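The difference between the two acronyms is the order of the last two steps. In ETL the data is transformed before it is loaded into the data store; in ELT it is loaded raw and transformed inside the target store. The minimal sketch below assumes an in-memory SQLite database stands in for the centralized data store and a hard-coded list stands in for a source extract; the table names, sample rows, and the `clean` function are hypothetical and only illustrate the ordering.

```python
import sqlite3

# Hypothetical source extract: (order_id, region, amount as text)
SOURCE_ROWS = [(1, " east ", "100.50"), (2, "West", "20"), (3, "EAST", "79.50")]


def clean(row):
    """Hypothetical transformation: normalize the region, cast the amount to float."""
    order_id, region, amount = row
    return order_id, region.strip().lower(), float(amount)


def run_etl(conn):
    # ETL: transform in the pipeline first, then load the already-clean rows.
    conn.execute("CREATE TABLE sales_etl (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", [clean(r) for r in SOURCE_ROWS])


def run_elt(conn):
    # ELT: load the raw rows as-is, then transform inside the target store with SQL.
    conn.execute("CREATE TABLE sales_raw (order_id INTEGER, region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", SOURCE_ROWS)
    conn.execute(
        """CREATE TABLE sales_elt AS
           SELECT order_id, LOWER(TRIM(region)) AS region, CAST(amount AS REAL) AS amount
           FROM sales_raw"""
    )


def totals_by_region(conn, table):
    # Table name comes from our own code, not user input, so f-string formatting is safe here.
    query = f"SELECT region, ROUND(SUM(amount), 2) FROM {table} GROUP BY region ORDER BY region"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
print(totals_by_region(conn, "sales_etl"))  # [('east', 180.0), ('west', 20.0)]
print(totals_by_region(conn, "sales_elt"))  # [('east', 180.0), ('west', 20.0)]
```

Both paths end with the same consolidated result; what changes is where the transformation work runs (in the pipeline for ETL, in the target store for ELT), which in turn affects how quickly raw data becomes available and where the processing load falls.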
Challenges in BI 2.0
We are currently experiencing phenomenal growth in data (hence the three Vs you may have heard about in one of those meetings with your data team a while back: Velocity, Variety, and Volume!). The traditional ETL/ELT framework was designed for batch-oriented, report-centric, system-focused data. It is no longer capable or responsive enough to handle an ever-increasing amount of data that arrives in a wide variety of structures (or with no structure at all) and at dizzying speed. Here are some startling statistics: “about 1.7 megabytes of new information will be created every second for every human being on the planet”, and “every minute up to 300 hours of video are uploaded to YouTube alone”.
Information consumers in a BI 2.0 world are also much more demanding. They want to extract insights quickly from many different sources, regardless of the underlying data semantics and technologies. It is no longer acceptable to wait hours for ETL processing to complete, because business decisions cannot be made on stale data.
Data Integration in BI 2.0
ETL/ELT should now be part of a more comprehensive framework that allows robust orchestration of data movements within an enterprise and of data interactions with its clients. The new framework should be able to accommodate data arriving from many different channels (from real-time weblogs and clickstreams to batched deliveries). It should be ready to ingest data of varying structure (from structured to unstructured) and be governed with a balance between managed and unmanaged/ad hoc datasets. In addition to system-defined data flows built with enterprise ETL/ELT tools such as IBM DataStage, the framework should support user-defined data flows, built and managed by users with self-service data preparation tools such as Alteryx.
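One way to picture these requirements is a simple dataset registry that records, for each incoming feed, its delivery channel (streaming or batch), its structure, and whether it is managed (a system-defined flow, e.g. built in an enterprise ETL/ELT tool) or ad hoc (a user-defined flow, e.g. built in a self-service tool). The following is a minimal Python sketch under those assumptions; the class names, landing-zone names, routing rules, and example feeds are all hypothetical and are not taken from any product's API.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    STREAMING = "streaming"  # e.g. real-time weblogs, clickstreams
    BATCH = "batch"          # e.g. scheduled file deliveries


class Structure(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


class Governance(Enum):
    MANAGED = "managed"  # system-defined flow (enterprise ETL/ELT tool)
    AD_HOC = "ad hoc"    # user-defined flow (self-service data preparation tool)


@dataclass(frozen=True)
class Feed:
    name: str
    channel: Channel
    structure: Structure
    governance: Governance


def route(feed: Feed) -> str:
    """Hypothetical routing rules: choose a landing zone for a feed."""
    if feed.governance is Governance.AD_HOC:
        # Ad hoc datasets stay in a sandbox until promoted to a managed flow.
        return "sandbox"
    if feed.channel is Channel.STREAMING:
        return "stream-processing"
    if feed.structure is Structure.STRUCTURED:
        return "warehouse"
    # Managed batch data that is not tabular lands raw and is transformed later (ELT-style).
    return "raw-landing"


# Hypothetical example feeds covering each routing branch.
feeds = [
    Feed("web_clickstream", Channel.STREAMING, Structure.SEMI_STRUCTURED, Governance.MANAGED),
    Feed("erp_orders", Channel.BATCH, Structure.STRUCTURED, Governance.MANAGED),
    Feed("support_emails", Channel.BATCH, Structure.UNSTRUCTURED, Governance.MANAGED),
    Feed("analyst_survey_upload", Channel.BATCH, Structure.STRUCTURED, Governance.AD_HOC),
]

for feed in feeds:
    print(f"{feed.name}: {route(feed)}")
```

Keeping channel, structure, and governance as separate attributes lets the same registry hold both system-defined and user-defined flows, while the routing policy decides where each one lands.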
Below is a high-level depiction of an alternative BI 2.0 Data Integration Framework.