What is your excuse to not have a data warehouse?

Posted By: administrator
Posted On: 26 Nov 2019

Well, if you are one of those who are still debating about the technical reasons to justify having a data warehouse, then you might as well skip this article. But, it is quite possible that someone on your team thinks otherwise and therefore it might still be a good reason for you to read on. In any case, it will get you thinking.

There have traditionally been three primary factors for not having a data warehouse and using product databases or excel spreadsheets as the reporting solutions. 

Let’s talk about each of these in detail to see if an argument based on these three factors holds any water or not.

Cost  

The foremost reason is the cost of maintaining a data warehouse. The cost of a data warehousing solution is usually broken down into three constituent parts:

Cost Breakdown: Data Warehouse Software  

Many database vendors have been in the business for a long time. But, I am not going to focus on an on-premise solution because that is the route you want to do; this article is not for you. 

Now, let us shed some light on popular cloud data warehouses like Google BigQuery, Amazon Redshift, and Oracle Autonomous DB.

Google BigQuery:    Google has an attractive pricing model. You can read a detailed article about it here. The basics for their on-demand plans are that you may get charged for:

  1. the amount of data stored and 
  2. how much data you process while querying data

Assume a scenario where you have 

  1. 1Tb of storage. 
    1. This is a lot of data for most SMBs, but let’s go with it.
  2. Two users who run reports or queries 20 times a day and each time they process 10GB of data for their reporting bringing the total data processed to 200GB per day.
    1. Again, this is very conservative. The amount processed can be brought down with good partition planning and other strategies.

For the scenario listed above, Bigquery might cost you about $45/Mo. You can go to the pricing calculator and try it out for yourself. 

image-3-1024x644

Oracle Autonomous DataWarehouse:   Oracle’s Autonomous Data Warehouse runs on its own Exadata Converged Infrastructure. Exadata is the same infrastructure that runs on-premises data warehouses of some of the largest companies on the planet. Oracle has made Exadata available in the cloud in small slices along with a tried and tested database with features and functionality. We’ll leave the technical discussion for another day, but let’s get a look at the cost.

Oracle’s Autonomous DB is priced based on compute resources used and the storage utilized. You have to buy minimum storage of 1TB at the time of writing this article.

The compute, though, can be turned off an on depending on when you want reports run on the service.

For this pricing example, I’ve chosen a compute usage time of two hours per day. By plugging these numbers into the oracle cloud estimator that you can find here, you will the price come up to $378. You get a lot more for this price, like inbuilt ML capability, robust SQL, a fully managed service, a Jupyter notebook, and a data visualization tool and a lot more.

image-4-1024x429
Oracle Autonomous DataWarehouse
Oracle Autonomous DataWarehouse

AWS Redshift: AWS is the first public cloud to offer a cloud-native service dedicated to data warehousing. The service is called Redshift, and it is the most popular cloud data warehouse. It boasts thousands of companies small and large as its customers. AWS Redshift is priced based on compute resources used and the storage utilized. You don’t have an option to buy a VM with a 1TB option, so we have to go for a bigger server, which impacts your price.

The compute, though, can be turned off an on depending on when you want reports run on the service.

For this pricing example, I’ve chosen a compute usage time of 730 hours per month. Please note that in contrast, we only selected 2 hours per day for Oracle. Part of the reason for this choice is that the pricing calculator by Amazon doesn’t allow for a scenario planning as the oracle calculator does. By plugging these numbers into the AWS cloud estimator that you can find here, you will the price come up to $644.05. This pricing is much higher, and you don’t have the option to stop using the compute node unless you terminate the Redshift cluster. Refer to the billing section in this FAQ link.

AWS Redshift

Cost Breakdown: Data Warehouse Administration and Maintenance  

Category Google BigQuery Oracle Autonomous DB AWS Redshift

CategoryGoogle BigQueryOracle Autonomous DBAWS Redshift
Fully Managed by Cloud VendorYesYesPartially
Patch OwnershipYesYesPartially
Upgrades OwnershipYesYesPartially
Performance TuningNoYesNo
Backup and RecoveryYesYesYes

As the table above highlights, Oracle and Google offer the best when it comes to offering a fully managed service with Oracle going one step further and even tuning your poor performing queries, typically a job performed by a DBA. It is safe to say that Google and Oracle services won’t require a DBA; however, it is always good to have a person knowledgeable about databases handy even if it is not full-time. Redshift won’t require a full-time DBA, but we need some maintenance to keep it humming.

In summary, depending on the software you select, you may not be spending any time or money on maintaining the data warehouse. THE SAVINGS FROM AUTOMATION ARE SIGNIFICANT, unprecedented until just a couple of years ago. 

Cost Breakdown: ETL Software  

Traditionally, the ETL market was dominated by enterprise players like Informatica, Oracle, Talend, and some open-source alternatives. Both these methodologies require heavy lifting on the part of the company adopting these technologies. ETL engineers had to be hired who could stitch together data from various sources systems using tools provided by the vendors listed above. What that meant was that analysts, business users, and visualization developers had to wait on the ETL engineering team or IT team to build out the pipeline before they can start working on the data. This methodology, although robust, introduced delays into the process and was not amenable to the fast-changing market conditions and the demands of the end-users. Read our article on what is a data pipeline for more details on the challenges and the opportunity. Fortunately, a rise in cloud computing, REST APIs, and cloud-based data warehouses have created opportunities for companies like Saras Analytics to simplify the architecture, accelerate the development of data warehouses, and timely availability of data to the business users, visualization developers, and analysts. These users can sign up for products like Daton, just like they would sign up for any SaaS Application, configure the data sources from where they expect data, configure the cloud data warehouse, and that is all. Data replication starts almost immediately, and your raw data or curated data sets become available in a few hours. 

Cost – $150 for most SMB businesses. 

Time  

Another vital reason for not considering data warehousing is Time. To create a data warehouse, the customer has to follow the process in the picture below:

Building an on-premise data warehouse:  

  1. Procure data warehouse and hardware to run the database (Usually takes months of negotiation)
  2. Procure an ETL tool (Usually takes months of negotiation)
  3. Install the data warehouse (A few weeks)
  4. Install the ETL tool (A few weeks)
  5. Deploy engineering resources to build data pipelines (Usually takes months)
  6. Create data models (A few weeks)
  7. Create data visualizations (A few weeks)

Building a cloud data warehouse with traditional ETL tools:  

  1. Sign up for a cloud data warehouse (Takes minutes) 
  2. Procure an ETL tool (Usually takes months of negotiation)
  3. Install the ETL tool (A few days)
  4. Deploy engineering resources to build data pipelines  (Usually takes months)
  5. Create data models (A few weeks)
  6. Create data visualizations (A few weeks)

Building a cloud data warehouse with Daton:

  1. Sign up for a cloud data warehouse (Takes minutes) 
  2. Sign up for Daton (Takes minutes)
  3. Configure sources (Takes minutes)
  4. Create data models (A few weeks)
  5. Create data visualizations (A few weeks)

As you can see, using a cloud data warehouse with Daton accelerates the time to data model development, which is really where any value is added to the business. Customers can get to data model development the same day they configure Daton. 

As an example, if a traditional approach took three months to get to the data modeling stage, using Daton gets customers to that stage in a day and sometimes even less than that.

Resources  

Resources, though the last but an important excuse that is given not to have a data warehousing solution. In a traditional data warehousing world, building a data warehouse required the following resources. Depending on the size and complexity of the data warehouse, some additional resources may be necessary, but for a majority of the companies, these roles typically would suffice. 

ResourceOn-PremisesCloud Data Warehouse with Traditional ETL toolsCloud Data Warehouse with Daton
Database administratorYesNoNo
System AdministratorYesNoNo
Network AdministratorYesNoNo
ETL Tool AdministratorYesYesNo
ETL DeveloperYesYesNo
Data Model DeveloperYesYesYes
Visualization DeveloperYesYesYes

When companies were building out data warehouses, it was typical of them to either hire different resources for the roles listed above or blend some of these roles into a smaller set of roles. With Daton and cloud data warehousing, the need for administrative roles and roles that cannot be attributed directly to business value generation have greatly diminished. Hence, many engineers in the administration roles, the behind the scenes heroes as we like to call them, are now pursuing software development or data analysis roles, among others. To showcase their skills tied to business value generation in an age where the size of their teams is shrinking, cloud data warehouses and Daton offer a great advantage.

The technology costs of a data warehouse have dropped from hundreds of thousands of dollars to a few hundred dollars a month. The number of resources required to build a functional data warehouse has come down to one resource with SQL skills. The time to go-live with a functional data warehouse has also gone down from months/years to weeks. We, at Saras Analytics, firmly believe that data has the power to transform businesses. Having supported hundreds of businesses in their data warehousing endeavors, we have the right product and team to support your data warehouse initiative. 

Sign up for a free trial of Daton today and kick start your data warehouse practice.

Avatar
Krishna

Sign up for a free trial of Daton today.

Take your analytics game to the next level

×
-