What is ELT?
What is ELT? It is one of the most common questions on the minds of any analyst of developer starting out on their data warehousing journey. ELT is the acronym for Extract, Load, Transform – a process that is critical to data warehousing. Over the last decade components of the analytics stack like data warehouses, storage, compute have seen tremendous improvement. These technological advances have allowed us to re-look at the traditional Extract (E), Transform (T), and Load (L) process.
ELT is an evolution of the traditional ETL systems designed to make better use of the inexpensive and scalable compute and storage resources offered by modern cloud data warehouses. In a nutshell, in the ELT setup, raw data is first loaded into the data warehouse and then transformed into the warehouse.
What are the benefits of ELT?
The following can be listed as the major advantages of having a modern ELT setup:
- Efficient use of data warehouse resources – By pushing down transformations to the database layer, an ELT setup can take advantage of the scalability of the cloud resources to perform transformations within the data warehouse.
- Flexibility in data transformation – An ELT operation typically involves replication of data from the source system to the data warehouse with minimal modifications, if any. As a result, the raw data is available in the data warehouse ready to be transformed to address any number of use cases. The same copy of the raw data can also be changed multiple times or altered on the fly using SQL or other programming languages.
- No more data set rigidity – Traditional ETL models force-fed data into pre-defined tables which were difficult, slow, and expensive to change. An ELT strategy allows for rapid experimentation because of the availability of raw data in the data warehouse. While storage and processing costs were influential factors that drove a lot of decisions in the ETL world, cloud technologies and innovative payment models are the driving factor in an era dominated by ELT strategies.
What are the everyday use cases that benefit from ELT?
ELT is mostly used for the following occasions:
- Massive data volumes – Data such as the ones generated from sensors, system activity captured in log files, etc. benefit significantly from adopting an ELT strategy. ELT approach works well because the volume and velocity of data sent to the warehouse make the application of transformations not only challenging but also expensive and slow. Once the raw data gets replicated to the warehouse or object storage, changes can then be applied in the data warehouse using tools explicitly written to handle large data volumes stored in a database system.
- Handling unstructured data – Hadoop and MapReduce have ushered in the era of leveraging commodity hardware to store and process large volumes of unstructured data. These systems rely on extracting the data from source systems and replicating or loading that data to directly to a distributed storage cluster. Clusters of compute nodes process this stored data at a later time, transform the data, and may, in some cases, load the data to another database used for reporting purposes.
- Schema on read vs Schema on write – In many instances, analysts may require the schemas not to be rigid and appreciate the flexibility provided by leveraging SQL views on top of raw data tables in the warehouse. Traditional ETL models followed a schema on write approach whereas ELT lends itself favourably to a Schema on read strategy. A schema on read approach encourages data exploration and experimentation by eliminating the constraints imposed by a pre-defined schema.
What are the common use cases that benefit from ETL?
While ELT has become a dominant strategy in data warehousing, there are still situations where an ETL approach is still a better fit.
- Data enrichment – In scenarios where you want data enriched before persistence in a data warehouse, an ETL approach may be more favourable.
- Data Cleansing and Filtering – In many cases, you may want to run some cleansing routines or filter out data that is not required to be available in the data warehouse. In such cases, applying light transformation logic before data persistence would result in cost savings.
Can ETL and ELT solutions coexist?
A modern ETL tool should support both the ETL and ELT strategies. A decrease in storage costs and increase the level of automation-related to database management have led to ELT becoming the dominant strategy in recent times. However, there are still situations in which taking the ETL approach may prove to be a cleaner approach. For example, suppose you don’t want PII data from source replicated to the data warehouse. In that case, you can either mask it at origin or hide it during data transfer and before loading in the data warehouse. Another example is when incoming data needs to be filtered before loading in the data warehouse.
How can Daton help accelerate your data warehouse build-out?
Daton is a modern cloud data pipeline designed to replicate data to a cloud data warehouse with the utmost ease. Daton is the cheapest data pipeline in the market which has built-in support for 100+ applications, databases, files, cloud storages, analytics, CRM, Customer support, and many others. Analysts can replicate data from any source to any destination (BigQuery, Snowflake, Redshift), without writing a single line of code and in a matter of minutes. Sign up for a free trial of Daton and kickstart your analytics journey.